Last week’s technology outage – the world’s worst to date – shocked many and exposed the fundamental fragility of our digital networks, but for those working on the front lines of cyber security, it was just a taste of things to come.
In the often secretive world of network protection, where well-heeled clients are kept out of the media and details of hacks and paid bounties are swept under the rug, Friday’s “blue screen of death” meltdown was not entirely unexpected.
“These things are going to happen periodically, and we’re just going to have to prepare for them to somehow recover as quickly and smartly as we can,” says Jeffrey Dodson, a leading cybersecurity expert in Washington, DC.
“No matter how much money we pour into it, or how much process, or how much oversight, occasionally mistakes will be made.”
After the chaos created by last week’s flawed CrowdStrike software update, which crashed millions of computers around the world, disrupting air travel, hospitals, banks and more, it might seem inconceivable that such a problem would arise again.
But Mr. Dodson, known in the industry as “JC”, is well placed to comment. Once in charge of security for defense multinational BAE Systems, he is now a partner with an exclusive referral-only global risk advisory firm.
Security customers like companies and governments face a dilemma, Mr. Dodson says. They need to trust third-party vendors like CrowdStrike with the keys to the systems they protect, but after last Friday, they don’t.
“Let’s just say we don’t trust them…what do we do?” he says.
“I think that’s the right question, but am I going to try another product and maybe that product isn’t as good?
“Or do I basically accept that … these things will happen?”
Why does the internet suddenly feel so weak?
An outage like this has been coming for years. As the internet grew and online networks proliferated, Mr. Dodson and others observed two overlapping trends.
First, cybercrime became big business, with well-resourced hacking squads backed by nation states collecting billions in rewards and causing trillions in damage each year.
The internet was also changing and becoming more vulnerable to major outages. What looked like a diverse collection of websites and platforms from the outside was, when you looked under the hood, increasingly standardized and homogeneous.
Many companies and organizations shared the same accounting software, payroll services, website hosting, third-party security or cloud-based storage. Apple’s iCloud, for example, dominated the mobile storage market.
Most internet traffic passed through the networks of a few large companies. If these deep, mostly hidden layers were compromised, the internet broke. If attackers gained access to these shared layers, they could infect many computers.
There were warning signs.
For example, in 2017, a new type of ransomware called NotPetya infected thousands of computers around the world. It is believed to have spread by masquerading as a legitimate update to widely used accounting software. More on how it went down, and its lingering effects, later.
The rise of cybercrime fueled the second trend: internet security. Security vendors like CrowdStrike asked customers for more privileged access to the computers they were paid to protect. Known as “endpoint security,” this new generation of threat detection dived deep into operating systems.
Here, at the heart of vital global information flows, these threat detectors received automatic updates several times a day, telling them how to detect the latest type of malware or block a newly discovered vulnerability.
To avoid delays, updates occurred on millions of client computers almost simultaneously.
And CrowdStrike is the world’s largest endpoint security company.
Last Friday, the wheels fell off. The hidden layer was compromised, though (unlike, say, NotPetya) it appears to have been through human error rather than malicious activity.
A CrowdStrike update froze clients’ computers.
The frantic, high-risk, dangerous work of preventing a global outage caused a global outage.
Mr. Dodson puts it this way: “There is no free lunch.”
Network protection carries a risk.
“There will always be some amount of risk that you work with in whatever path you choose.”
No retreat from cybersecurity’s fight to the death
So what’s the plan now?
The sight of blue error screens on supermarket self-checkout machines and roadside signs opened the public’s eyes to the internet’s house of cards and the extent of the covert security apparatus needed to protect its networks.
As internet traffic resumed, the conversation turned to what it would take to prevent another major outage.
Some questioned the wisdom of effectively giving a third-party vendor like CrowdStrike the keys to the core of an operating system, known as the “kernel.” This allowed the Texas-based company to remotely and freely update software running inside the kernel.
In addition to the inherent risk of this arrangement, when the kernel failed after a faulty update, as it did last Friday, it could only be repaired through manual access. Administrators had to connect a physical keyboard to each affected system, boot into safe mode, remove the corrupt CrowdStrike update, and then reboot.
Over the weekend, Microsoft released a recovery tool to speed up the process, although it could still take weeks to bring all affected sectors back online.
Would preventing another outage mean rolling back the privileged system access given to CrowdStrike or its competitors?
Johanna Weaver, founding director of the Tech Policy Design Centre at the Australian National University, reacts with horror at the idea.
“If they don’t have that level of endpoint access, they’re not able to stop threats before they get into customer networks.”
Robert Potter, managing partner of Canberra-based Cyber Activities Group, sees it the same way. With cybersecurity locked in a death grip with its adversaries, there is no turning back.
“The tricky secret of antivirus is that it needs the same permissions as malware. To stop malware, you need root permissions and administrator privileges.”
Diffusion of risk
After last week, security customers will protect themselves from the risk by dividing their network into segments, each running security from different vendors, Mr. Dodson says. If an attack or accident takes out one segment, the others can limp along.
“But that’s a luxury that only big companies can afford,” he says.
Some customers may require that third-party security updates be rolled out first to other, less critical customers, to test whether the updates are defective before they reach their own systems.
“[They may say] ‘If it works for the commercial world, then I’m willing to accept it. I don’t want to be your guinea pig.’”
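In code, the idea of letting less critical customers go first might look something like the rough Python sketch below. It is only an illustration of a staged rollout, not a description of how CrowdStrike or any other vendor distributes updates. The `Customer` class, its `tier` field and the `staged_rollout` and `apply_update` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Customer:
    name: str
    tier: int  # hypothetical ranking: a higher number means more critical systems


def staged_rollout(
    customers: List[Customer],
    apply_update: Callable[[Customer], bool],
) -> Dict[str, List[str]]:
    """Push an update tier by tier, least critical customers first.

    apply_update is a stand-in for whatever installs the update and
    reports success (True) or failure (False). Once any customer in a
    tier fails, every later tier is held back.
    """
    updated: List[str] = []
    failed: List[str] = []
    held_back: List[str] = []
    halted = False

    for tier in sorted({c.tier for c in customers}):
        group = [c for c in customers if c.tier == tier]
        if halted:
            # An earlier tier failed, so this tier never receives the update.
            held_back.extend(c.name for c in group)
            continue
        for customer in group:
            if apply_update(customer):
                updated.append(customer.name)
            else:
                failed.append(customer.name)
        if failed:
            halted = True

    return {"updated": updated, "failed": failed, "held_back": held_back}


# Made-up fleet: the update is assumed to crash on one low-tier customer.
fleet = [
    Customer("hospital", 3),
    Customer("retailer", 1),
    Customer("cafe", 1),
    Customer("airline", 2),
]
result = staged_rollout(fleet, lambda c: c.name != "cafe")
print(result)
```

In this made-up run the faulty update reaches only the two lowest-tier customers, and the airline and hospital are never touched. The trade-off is the delay that simultaneous updates were designed to avoid: the most critical customers wait longest for protection against a new threat.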
Potter agrees that some organizations will end up with multiple cybersecurity vendors.
“The consensus among most cyber executives after Friday is … diversity in software is key.”
But even with the best measures in place, we cannot completely eliminate the risk of another outage, says Professor Weaver. In fact, she believes “it will happen again.”
“It’s inevitable,” she says.
And the next one could be much worse than last week’s.
“The thing that keeps me up at night is an outage that disrupts telecoms at the same time as an outage that disrupts power, water and transportation, which means food,” says Professor Weaver.
“Then you have a society that quickly descends into chaos.”
Analog shield against digital outages
The benchmark for a nation-damaging cyberattack is 2017’s NotPetya, which targeted Ukraine.
On June 27, a series of powerful cyberattacks hit hospitals, energy companies, airports, banks, ATMs, transportation and federal government agencies. They even knocked out the radiation monitoring system at Ukraine’s Chernobyl nuclear power plant.
NotPetya wiped out computers, froze commerce and paralyzed government functions for days, but Ukraine was able to fall back on manual workarounds to keep critical infrastructure running and limit the damage, Professor Weaver says.
After Friday’s outage, Professor Weaver called on the Australian Government to consider ways of operating critical infrastructure without the use of IT services, including adding analog backups.
“If the telecom is down and the media is offline, how do we communicate with the public?” she says.
For the telecom network, adding an analog backup may involve reinstalling copper wires (which carry an analog signal) in some areas.
“If we had copper wire … it would allow limited functionality in a time of crisis,” says Professor Weaver.
“It’s not just about faster, faster NBN [National Broadband Network].”
Amendments to the Security of Critical Infrastructure Act, which passed in two installments in 2021 and 2022, go some way to protecting critical infrastructure from cyberattacks.
Companies in sectors from transportation to energy and defense to food must notify the government of cyber incidents and develop risk management plans.
“But the legislation does not cover the wider issue of resilience,” says Professor Weaver.
“How can we make sure the network continues to work when these types of outages occur?”
She says that before last week, her warnings of a national disaster were dismissed as “catastrophizing”.
“Now we can see that these are real questions that we must answer calmly.
“Do we as a nation accept that this is going to happen and put measures in place that will help prevent chaos?”