In this sixth article in the CSM series, we examine the Change Control use case. The first installment of this series provided a general overview of continuous security monitoring, and the second article explained how CSM can help your organization react better to threats.
In the third article, we discussed the challenges of gaining full visibility into your environment. The fourth article looked at classifying your network assets, and the fifth examined specific attack use cases.
The Change Control Use Case
Beyond detecting attacks, a key capability of using CSM for Change Control is isolating unplanned, non-malicious changes so you can figure out why they occurred outside normal change processes. You can also verify planned and authorized changes to close the operational process loop.
Before we discuss the data sources you need, we should mention monitoring frequency. As with the attack use case, the NIST definition — monitor as frequently as you need to — fits here as well. For highly critical devices you want to look for changes continuously, because if the device is attacked or suffers a bad change the result could be data loss.
As we mentioned under the attack use case, automation is critical to maintaining a consistent and accurate monitoring process. Automate wherever you can to minimize human effort, increase efficiency, and reduce human error.
To evaluate a specific change you will want to collect the following data sources:
- Assets: As we discussed above, you cannot monitor what you don’t know about; without knowing how critical an asset is, you cannot choose the most appropriate way to monitor it. This requires an ongoing (dare we say, ‘continuous’) discovery capability to detect new devices appearing on your network, as well as a mechanism for profiling and classifying them.
- Work orders: A key aspect of change control is distinguishing authorized changes from unauthorized ones. To do that you need to know which changes are part of a patch, update, or maintenance request. That requires a link to your work management system to learn whether a device was scheduled for work.
- Patching process: Sometimes installing security patches is outside the purview of the operations group and is instead handled by the security function. Not that we think that’s the right way to run things, but not all operational processes are managed in the same system. If different systems are used to manage the work involved in changes and patches, you need visibility into both.
- Configurations: This use case is all about determining differentials in configurations and software loaded on devices, and using that information to figure out whether you’re under attack or it’s an operational process problem. This requires the ability to assess the configuration of devices, and to store a change history so you can review deltas to pinpoint exactly what any specific change did and when.
- File integrity: Another indication of a change is when a system or other sensitive file changes. You should be able to pinpoint when a file is changed, by whom, and whether it’s authorized. We have always been fans of more data rather than less, so if you can collect device forensics, more detailed event logs, and/or network full packet captures — do that.
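To make the configuration and file integrity items above concrete, here is a minimal sketch of detecting deltas between two snapshots. It assumes a snapshot is simply a mapping from file path to a SHA-256 digest; the function names `snapshot` and `diff_snapshots` are hypothetical, not part of any particular CSM product. Note that a hash comparison only tells you that something changed. Answering when, by whom, and whether it was authorized still requires event logs and work order data.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each file path to the SHA-256 digest of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def diff_snapshots(baseline, current):
    """Return which paths were added, removed, or modified since the baseline."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(p for p in shared if baseline[p] != current[p]),
    }
```

Storing each snapshot with a timestamp would give you the change history needed to review deltas over time.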
Unlike the attack use case, which shows more variation in how you evaluate alerts generated by the monitoring process, the decision flow for change control is straightforward:
1. Detect change: Through your security monitoring initiative you will be notified that a change happened on a device you are watching.
2. Is this change authorized? Next you will want to cross-reference any changes against the work management system(s) managing all the operational changes in your environment. It is important you link your operational tracking systems with the CSM environment — otherwise you will spend a lot of time investigating authorized changes. We understand these systems tend to be run by different operational groups, but to have a fully functional process those walls need to be broken down.
3. If authorized, was the change completed successfully? If the change was completed then move on. Nothing else to see here. The hope is this verification can be done in an automated fashion to ensure you aren’t spending time validating stuff that already completed successfully, so your valuable (and expensive) humans can spend their time dealing with exceptions. If the change failed for some reason, you need to send that information back into the work management system (perhaps some fancy DevOps thing, or your trouble ticket system) to have the work done again.
4. If not authorized, is it an attack? At this point you need to do a quick triage to figure out whether this is an attack warranting further investigation or escalation, or merely an operational failure. Context is important for determining whether it’s an ongoing attack.
5. If it’s an attack, investigate: If you determine it’s an attack you need to investigate. We dealt with this process in both Incident Response Fundamentals and React Faster and Better.
6. If it’s not an attack, figure out who screwed up: If you made it to this point the good news is that your unauthorized change is an operational mishap rather than an attack. So you need to figure out why the mistake happened and take corrective measures within the change process to ensure it doesn’t happen again.
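The decision flow above can be sketched as a small routing function. This is an illustration only: the `Change` fields and `Action` names are hypothetical, and in practice the `authorized` flag would come from cross-referencing your work management system(s) while `looks_like_attack` would come from triage.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CLOSE = "authorized change verified, nothing else to see"
    REOPEN_WORK_ORDER = "authorized change failed, send back to work management"
    INVESTIGATE_ATTACK = "unauthorized change looks like an attack, investigate"
    FIX_PROCESS = "operational mishap, find the cause and correct the process"


@dataclass
class Change:
    device: str
    authorized: bool                 # step 2: matched a work order or patch record
    completed_ok: bool = False       # step 3: only meaningful when authorized
    looks_like_attack: bool = False  # step 4: only meaningful when unauthorized


def route_change(change: Change) -> Action:
    """Walk steps 2 through 6 of the flow for a change that was already detected."""
    if change.authorized:
        return Action.CLOSE if change.completed_ok else Action.REOPEN_WORK_ORDER
    if change.looks_like_attack:
        return Action.INVESTIGATE_ATTACK
    return Action.FIX_PROCESS
```

Keeping the routing this simple is what makes it a good candidate for automation, so people only see the exceptions.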
One further clarification on the distinction between the attack and change control use cases. If you have only implemented the change use case and collected the data appropriate for it, then your visibility into what malware is doing and how broadly it has spread up to this point will be limited.
But that doesn’t mean starting with change control provides no value for detecting attacks. An alert of an unauthorized change can give you a heads-up for an imminent issue — you just may not have the data to fully investigate it.
The entire point of any monitoring initiative is to make better decisions on what needs to be done and how to allocate resources. First let’s take in-process attacks off the table — they were covered in the attack use case, and obviously take priority over pretty much everything else.
So how do you determine whether it’s an attack? Look at this in terms of attack surface. Does the change make the device easier to attack or control? If so it is effectively an attack. Some operational failures result in increased attack surface and so should be handled as attacks, even if the actor wasn’t malicious.
This focus on attack surface takes intent out of the equation, enabling simpler and more objective analysis. An innocent operational failure that increases attack surface isn’t any less of a problem than a malicious action.
The device is more exposed than it was before the change and needs to be remediated. That’s why we favor the attack use case as the basis for security monitoring, with a simplification to deal with change control and compliance.
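One way to picture the attack surface test is as a set comparison. The sketch below assumes, purely for illustration, that a device's exposure can be summarized as a set of strings such as listening ports or enabled services; real attack surface is broader than that, and the function name `new_exposures` is hypothetical.

```python
def new_exposures(before, after):
    """Return exposures present after the change that were absent before it."""
    return sorted(set(after) - set(before))


before = {"tcp/22", "tcp/443"}
after = {"tcp/22", "tcp/443", "tcp/3389"}

added = new_exposures(before, after)
print(added)        # ['tcp/3389']
print(bool(added))  # True: handle as an attack, whatever the intent
```

Notice that nothing in the check asks who made the change or why, which is the point of taking intent out of the analysis.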
In case of an operational mishap you have a further decision to make: when to roll it back. That depends on the nature of the change, the criticality of the device, and whether the rollback can be automated.
For changes that don’t increase attack surface there is less urgency to roll back, unless the change broke an application or otherwise impacted availability. So operational mishaps can be put back into the stack of work and processed according to the other operational processes managing workflow in the organization.
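The rollback decision can be sketched along the same lines. The three priority labels and the exact way the factors are combined are our own illustrative assumptions; the only parts taken from the discussion above are the inputs (attack surface, availability impact, device criticality, and whether rollback can be automated) and the idea that changes with no exposure or availability impact go back into the normal stack of work.

```python
def rollback_priority(increases_attack_surface, impacts_availability,
                      device_critical, rollback_automated):
    """Suggest how soon to roll back an unauthorized, non-malicious change."""
    if not (increases_attack_surface or impacts_availability):
        # No added exposure and nothing broken: handle with other operational work.
        return "normal work queue"
    if device_critical or rollback_automated:
        # High stakes, or cheap to undo: do it right away.
        return "roll back now"
    return "schedule rollback"
```

Your own thresholds will differ, but writing them down in one place makes the policy easy to review and adjust.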
In the next article in the CSM series, we will examine the compliance use case… Stay Tuned!
Editor’s Note: This post is a series of excerpts from the Continuous Security Monitoring whitepaper developed by Mike Rothman of Securosis, and was developed independently and objectively using the Securosis Totally Transparent Research process. The entire paper is available here.
About the Author: Securosis Analyst/President Mike Rothman’s bold perspectives and irreverent style are invaluable as companies determine effective strategies to grapple with the dynamic security threatscape. Mike specializes in the sexy aspects of security — such as protecting networks and endpoints, security management, and compliance. Mike is one of the most sought-after speakers and commentators in the security business, and brings a deep background in information security. After 20 years in and around security, he’s one of the guys who “knows where the bodies are buried” in the space. Mike published The Pragmatic CSO in 2007 to introduce technically oriented security professionals to the nuances of what is required to be a senior security professional. He can be reached at mrothman (at) securosis (dot) com.