What is cyber resilience? If you look up the definition in the Oxford Dictionary, resilience alone is defined as “the capacity to recover quickly from difficulties; toughness.” If you narrow the definition down to cyber resilience, the emphasis shifts from recovering to maintaining. As noted on Wikipedia, it becomes “the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation.”
I spoke with Matt Torrens, the COO at Sprout IT, regarding resilience. He gave me the following definition: “A true cyber resilience approach blends protection, detection, response and recovery to form an organization-wide, collaborative strategy.” As part of this definition, all three elements of the cybersecurity triad—confidentiality, integrity, and availability—are vital to an organization’s resilience. Resilience is essentially a holistic approach to preparing for, responding to, and recovering from an incident.
Here are some additional thoughts from Matt:
To protect businesses from cyber threats, we must first be able to recognize risks (combining threats and vulnerabilities) and then define solutions to help manage those risks. Response and recovery plans may take many different forms, but they should always aim to enable the organization to rally with minimal financial or reputational damage. When it comes to cyber security in general, organizations across all sectors still tend to emphasize protection over response and recovery. Although cyber insurance has become more commonplace in the last few years, many organizations have still not considered at all how they would respond to a major attack.
From my experience, cyber resilient organizations are ones that put thought into planning, explicitly record decisions and alignments within their risk register, and consistently carry out testing to validate that these decisions are sound. Cyber resilience requires ongoing dedication so that when disaster strikes, whether through a malicious actor, human error, or even a natural disaster, the organization is able to maintain at least minimal services and recover to full operations without completely depleting its resources.
Organizations that are looking to enhance their cyber resilience can begin by working within three areas: preparing for, responding to, and recovering from incidents.
Area 1: Prepare for incidents
- Principle of Least Privilege (PoLP): Simply put, this step consists of providing the access that is required and restricting everything else. Feel as if you have heard this over and over again? Unfortunately, whilst vital to resilience, PoLP is often missed entirely. It might sound logical and simple, but the challenge is that to do it properly, an organization must completely understand its employees’ roles and responsibilities, along with what each transaction requires from systems, services, and people.
- Prioritization of assets: On top of understanding employees’ roles and responsibilities, an organization has to prioritize each part. Assets include data but also departments and people. Consider a scenario in which the office must be relocated. Which department will be moved first? Which team must be settled in and back online first to maintain operations, and which teams are not business-critical in this reduced scope? When you are able to return to the office, which department do you move first to validate the new setup? It should be one of the lowest-priority teams, not the top ones, so that any problems have minimal impact on operations.
- Depth of controls and testing: Security and privacy are not switches that you turn on and then walk away from. They are long-term goals that require consistent testing, holistic involvement, and training. By implementing controls based on the risks identified in the organization’s threat map and then validating them through a variety of testing, such as red team exercises, simulations, table-top exercises, and disaster recovery scenarios, an organization can strengthen its responses and identify gaps.
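Two of the preparation ideas above lend themselves to a short sketch. The Python below shows a deny-by-default permission check, which is one common way to express least privilege, and an ordering of departments by business priority, used once for the relocation and reversed for the later return to the office. The role names, permission strings, department names, and the "1 is most critical" priority scale are hypothetical illustrations, not drawn from any particular product or framework.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission mapping. In practice it would be derived
# from each role's responsibilities and the transactions it actually needs.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "finance": {"read:ledger", "write:ledger"},
    "support": {"read:tickets", "write:tickets"},
}


def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: allow only permissions explicitly granted to the role.

    Unknown roles get an empty set, so they are denied everything.
    """
    return permission in ROLE_PERMISSIONS.get(role, set())


@dataclass
class Department:
    name: str
    priority: int  # hypothetical scale: 1 = most business-critical


def relocation_order(departments: list[Department]) -> list[str]:
    """Relocation: bring the most critical departments back online first."""
    return [d.name for d in sorted(departments, key=lambda d: d.priority)]


def return_order(departments: list[Department]) -> list[str]:
    """Return to the office: move the least critical departments first, so any
    problems with the office are found before critical operations move."""
    return [d.name for d in sorted(departments, key=lambda d: d.priority, reverse=True)]


if __name__ == "__main__":
    print(is_allowed("finance", "write:ledger"))  # True
    print(is_allowed("support", "write:ledger"))  # False
    depts = [Department("Marketing", 3), Department("Finance", 1), Department("Support", 2)]
    print(relocation_order(depts))  # ['Finance', 'Support', 'Marketing']
    print(return_order(depts))      # ['Marketing', 'Support', 'Finance']
```

The hard part, as noted above, is not the check itself but filling in the mapping and the priorities accurately, which is why both depend on understanding roles and responsibilities first.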
Area 2: Respond to incidents
- Incident handling: When incidents happen, a resilient organization will respond with a strong team and a rehearsed approach. This allows for a faster response, reduced cost, and possible mitigation of further damage. Organizations need to know what to do, have the controls in place to provide the required intelligence, and be aware of what further capabilities they can use to respond effectively to an incident. It’s impossible to achieve this type of response simply by implementing policies and procedures. The response team will need to practice and simulate likely incidents so that they know what to do and have already identified their tools during the preparation phase.
- Third-party response: At times, an organization is unable to respond with only its in-house team. A third-party incident, for instance, may require it to work across teams. As part of the preparation phase, teams will likely have practiced executing this type of response; however, when responding to an incident, the organization must already have contracts and documentation to share with these teams. A pre-existing relationship can provide a massive benefit. Where no such relationship exists, having the right documentation and controls in place to support the investigation will still help.
- Notifications: During an incident, the organization will be required to decide when and how to notify internal and external parties. In some situations, there are legal requirements, such as notifying the ICO within 72 hours under the GDPR, which add another layer of complexity. Many organizations I have interacted with are torn over notifying the public: disclosure invites a negative response, but it also preserves the transparency that consumers value. Further still, some organizations were not able to choose when to disclose, or learned of a breach from an external party. For me, providing notification early, and in an appropriately worded and transparent way, is a massive benefit when attempting to salvage one’s reputation following a security incident.
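The 72-hour window is simple arithmetic, but it helps to compute the deadline the moment the organization becomes aware of a personal data breach. The minimal Python sketch below assumes the clock starts at that moment, as GDPR Article 33 describes, and requires timezone-aware timestamps so the deadline is unambiguous; the function names are hypothetical. It only captures the outer limit: the regulation also expects notification without undue delay.

```python
from datetime import datetime, timedelta, timezone

# GDPR Article 33: notify the supervisory authority without undue delay and,
# where feasible, not later than 72 hours after becoming aware of the breach.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority (e.g. the ICO)."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the 72-hour limit; a negative value means it has passed."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


if __name__ == "__main__":
    aware = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
    print(notification_deadline(aware))  # 2024-03-04 09:00:00+00:00
    print(hours_remaining(aware, datetime(2024, 3, 2, 9, 0, tzinfo=timezone.utc)))  # 48.0
```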
Area 3: Recover from incidents
- Returning to Business as Usual (BAU): Some incidents require office relocation, while others require an organization to purchase new hardware and software licenses or even hire full-time employees in long-term or permanent positions. Whatever the requirements, a vital part of returning to BAU is that the organization must first know what business as usual looks like in its particular case. It must also determine whether it has the financial resources necessary for recovery. All resources are finite: a budget runs out, and the workday has only so many hours in it. When validating its capability to respond to an incident, has the organization actually considered how it will return to BAU?
- Lessons learned: Incidents happen, and even practiced teams make mistakes and find failures or gaps within their processes. A resilient organization addresses these shortcomings through formal analysis. Team members must feel safe to identify their own failures, those of others, and those of processes in order to improve the overall resilience posture. To that end, it’s important for leaders to document actions taken and then follow up on them. They should also use recent events and knowledge to empower the workforce and develop a more effective solution to the problem at hand. They need to set the tone.
- Remediation: This is my favorite part of my industry, especially when dealing with real-life incidents, because the requirements are concrete and, hopefully, documented. Remediation isn’t about meeting the bare minimum or throwing money at a problem in the hope it will go away. Remediation is about identifying gaps within the organization and dealing with them. Is it a lack of a holistic view of security? Is it never having had a proper risk assessment? Is it a skills gap within your existing teams? Is it the absence of a SIEM or a lack of effective logging? Remediation wears a variety of hats, from enhancements made by the defensive side, to validations performed by the offensive team, to leaders refreshing business processes and teaching non-technical teams how to protect themselves.
If resilience is the capacity to recover quickly, and cyber resilience is about maintaining an acceptable level of operations in the face of a challenge, then resilient organizations must prioritize a holistic understanding of their people, processes, and technology. They must then effectively document and continuously validate their processes throughout the life cycle. Resilient organizations do this so that they can withstand, respond to, and recover from security incidents when the time comes.
Editor’s Note: The opinions expressed in this guest author article are solely those of the contributor, and do not necessarily reflect those of Tripwire, Inc.