On Thursday 6th December, 2018, I realized how dependent I was on my mobile phone having an internet connection. That particular day, I was out and about away from Wi-Fi networks. The first time I noticed I had no connectivity was when I used my phone to check if my train was on time.
As I got close to London, I realized I was not the only person who did not have data services on their devices, as I overheard a few people commenting on no connectivity.
During the day, I found I wasn’t able to access my email, WhatsApp and other services that depended on 3G or 4G connectivity. I felt isolated and cut off. Only when I found a coffee shop and connected to Wi-Fi did my phone spring to life with lots of push notifications, messages and emails.
It was around that time that I learned O2’s 4G network had gone offline earlier in the day.
Although this event did not significantly affect my life that day, I wondered what impact the outage had on the millions of other people who were on the same mobile phone network or on the networks of affiliates such as Giffgaff, Tesco Mobile and Lycamobile. There are surely organizations, such as emergency services, that depend on constant service from their mobile provider.
I know that Transport for London’s electronic timetable displays at bus stops stopped working, as they relied on the network at the time of the service interruption. And as taxi firms use the network to authorize card payments, these companies might have lost fares.
This outage didn’t just impact UK customers, either. SoftBank in Japan was affected, and it’s believed other mobile operators around the globe also experienced some downtime.
O2 and news agencies did well to keep affected customers informed, reassuring them that the teams were working around the clock to rectify the problem.
We are aware our customers are unable to use data this morning. Our technical teams are working on the issue with high priority. We are really sorry and working as hard and as fast as we can to fix this. Please keep an eye on our status checker: https://t.co/O8fb26fNIv
— O2 in the UK (@O2) December 6, 2018
By the following day, service had been fully restored, and an explanation of what caused the outage came out very quickly.
The Root Cause: SSL Certificates
O2 and Ericsson issued a joint statement of apology. Ericsson was quick to identify the root cause of the outage: an expired certificate that its software failed to flag before it was too late.
From this global impact, we see how important it is to ensure certificates don’t expire. SSL certificates are small files that digitally bind cryptographic keys to an organization’s identity. They help ensure confidentiality and integrity between systems. Unlike services that renew automatically, an SSL certificate has a set expiry date and must be replaced before that date passes.
Once a certificate expires, systems that rely on it may refuse to connect, and services can stop functioning. For an end user browsing the internet, this normally shows up as a warning that the site’s certificate has expired.
A business that lets its certificates expire loses the trust of its customers and of the services that depend on it. “If this happened once, will it happen again?” people might wonder. A loss in sales and revenue could follow, as could damage to the corporate brand and reputation, thereby putting the business at risk.
Was This Avoidable?
Without a shadow of a doubt, this latest outage could have been avoided. Ericsson identified that two software versions of the tool that managed the certificates had caused the issue, which should have been identified and rectified before the outage happened.
Companies should check which critical services use certificates and make sure there is a documented and executed process in place to ensure those certificates remain valid. Websites that use SSL certificates are a common example. If a site’s certificate expires, users are warned that the site is no longer secure, and transactions may not be processed. If you are using solutions provided by third parties, regular audits should confirm that they have a documented certificate management process.
Common frameworks such as ISO 27001 cater to such requirements. Technology can also play a significant role. Vulnerability management tools, or policy and change detection tools, can help audit the estate to identify expired or expiring certificates.
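To make the idea of auditing for expired or expiring certificates concrete, here is a minimal sketch in Python that uses only the standard library. It is an illustration under stated assumptions, not a replacement for a proper certificate management tool: the function names, the 30-day warning window and the example dates are all hypothetical choices of mine, not anything O2 or Ericsson used.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the 'notAfter' field of the certificate a server presents.

    Note: the default context verifies the certificate, so a server whose
    certificate has ALREADY expired will raise ssl.SSLCertVerificationError
    here. A real audit should treat that error as an "expired/invalid" result.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between 'now' (timezone-aware) and the expiry timestamp.

    'not_after' uses the format found in certificates as reported by the
    ssl module, e.g. "Jun  1 12:00:00 2030 GMT". Negative means expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def classify(days_left: int, warn_days: int = 30) -> str:
    """Label a certificate; warn_days is an assumed, adjustable threshold."""
    if days_left < 0:
        return "expired"
    if days_left <= warn_days:
        return "expiring"
    return "ok"
```

In practice you might loop over an inventory of hostnames, call `fetch_not_after(host)` for each one, pass the result to `days_until_expiry(..., datetime.now(timezone.utc))`, and raise an alert for anything that `classify` does not label as "ok". Keeping the date arithmetic separate from the network call makes the logic easy to test without touching a live server.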
You can learn more about how to manage certificate expiration effectively here.