In testimony before the United States House of Representatives on Tuesday, Adam Meyers, senior VP of counter adversary operations at CrowdStrike, apologized for the worldwide technology outage this past summer that disabled millions of Microsoft Windows machines across healthcare and other industries.
In his appearance before the House Homeland Security Committee’s subcommittee on cybersecurity and infrastructure protection, Meyers answered lawmakers’ technical questions and provided assurances about updated protocols.
Though some subcommittee members acknowledged the company’s “humility” in addressing the error after the incident, they asked pointed questions about what the company has done since to ensure it’s not repeated.
WHY IT MATTERS
Representative Eric Swalwell, D-California, co-chair of the subcommittee, reiterated the need for a high degree of assurance that CrowdStrike will employ rigorous quality assurance processes.
“A global IT outage that impacts every sector of the economy is a catastrophe that we would expect to see in a movie,” and one perpetrated by a malicious nation adversary, said Mark Green, chairman of the House Homeland Security Committee.
“To add insult to injury, the largest IT outage in history was due to a mistake,” one that left many Americans and allies around the world unable to call 911, grounded while traveling or facing delayed healthcare as scheduled medical procedures were canceled, he said.
“We have undertaken a full review of our systems and begun implementing plans to bolster our content update procedures so that we emerge from this experience as a stronger company,” Meyers assured in his testimony.
CrowdStrike Falcon uses artificial intelligence and machine learning models to act on the latest advanced cyber threats and then pushes out content updates to its customers so their systems will recognize those threats and defend against them.
In CrowdStrike’s External Technical Root Cause Analysis, dated August 6, the company outlined its findings and mitigations to ensure that the error – which Green pointed out shut down 8.5 million devices and caused $5.4B in losses – is not repeated.
A key item that Meyers addressed during the 90-minute session was the now-staged deployment of such updates to minimize widespread failures, such as system crashes.
Content updates “that have passed canary testing are to be successively promoted to wider deployment rings or rolled back if problems are detected,” CrowdStrike said in the analysis, which Meyers described for lawmakers as concentric rings.
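For readers who want a concrete picture, the ring-based promotion described in the analysis can be sketched in a few lines of Python. This is an illustrative sketch only, not CrowdStrike’s implementation: the ring names, the `staged_rollout` function and the `is_healthy` check are all hypothetical, chosen to show how an update might move outward ring by ring and stop as soon as a problem is detected.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical ring names, ordered from the narrowest audience to the widest.
RINGS: Sequence[str] = ("canary", "early_adopters", "broad", "general")


def staged_rollout(
    update_id: str,
    is_healthy: Callable[[str, str], bool],
    rings: Sequence[str] = RINGS,
) -> Tuple[str, List[str]]:
    """Promote an update one ring at a time; stop at the first unhealthy ring.

    is_healthy(update_id, ring) is an assumed health check supplied by the
    caller. Returns (status, rings_reached), where status is "deployed" if
    every ring passed or "rolled_back" if a problem was detected.
    """
    reached: List[str] = []
    for ring in rings:
        reached.append(ring)
        if not is_healthy(update_id, ring):
            # Problem detected: halt promotion and report a rollback.
            return "rolled_back", reached
    return "deployed", reached


# Example: a hypothetical update that misbehaves once it reaches the second ring.
status, reached = staged_rollout("update-1", lambda u, r: r != "early_adopters")
print(status, reached)
```

In this sketch, a fault caught in an early ring never reaches the wider rings, which is the point of staging a release rather than pushing it to every machine at once.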
Green acknowledged that CrowdStrike has “shown the right attitude” since the incident, but he wanted to know if artificial intelligence initiated the update. It did not, Meyers explained.
Congressman Mike Ezell, R-Mississippi, asked why a manual solution was required to get systems back up and running, given the dearth of skilled workers in many areas, such as his rural district.
Meyers said that while he drove 10 hours to get a customer back up and running, the bulk of the recovery came when CrowdStrike deployed an automated process the following day.
Morgan Luttrell, R-Texas, wanted to know, “Where exactly did [the Content Validator] fail?”
“The validator itself has been in place for over a decade and we’ve released 10-12 of these updates every day,” Meyers explained.
“It tested clean, or good, and that’s why it was allowed to roll out,” Meyers said, an outcome the company’s August 6 analysis attributed to a logic error. But he said a rule with faulty logic that caused the sensor to fail would now be detected under CrowdStrike’s revised procedures.
THE LARGER TREND
The faulty update pushed to Windows early that Friday morning in July caused millions of computers to freeze on the infamous “blue screen of death,” disrupting care delivery at hospitals, health systems and medical practices in the U.S. and worldwide.
Providers began working manually to provide patient care in the absence of access to electronic health records and other mission-critical IT systems. While most affected healthcare organizations recovered from the CrowdStrike outage within days, the incident highlighted how third-party technology disruptions can impact patient care.
“This incident demonstrates the interconnected nature of our broad ecosystem – global cloud providers, software platforms, security vendors and other software vendors and customers,” Microsoft acknowledged in a statement July 20.
In a July contributed piece for Healthcare IT News, Christopher Frenz, information security officer and AVP of IT Security at Mount Sinai South Nassau, said the CrowdStrike outage should serve as a wake-up call for health system IT and security leaders: Security controls don’t just fail in major events, but are always at risk. That’s why hospitals and other providers need to invest in security architectures that provide control resiliency, he said.
ON THE RECORD
“On behalf of everyone at CrowdStrike, I want to apologize,” Meyers said in his statement to lawmakers. “We are deeply sorry this happened and are determined to prevent it from happening again. We appreciate the incredible round-the-clock efforts of our customers and partners who, working alongside our teams, mobilized immediately to restore systems and bring many back online within hours.
“I can assure you that we continue to approach this with a great sense of urgency,” he added.
Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.