In my previous article, we discussed how organizations are changing the way IT resources are deployed and managed. We covered three methods in particular: automated image creation and deployment, immutable image deployment and containers. We’ll now explore how organizations can make the most of these methods in a dynamic environment.
Dealing with Change when the Targets are Moving
In a dynamic environment, the assets that you’re monitoring change often. One Tripwire customer is on-boarding and off-boarding approximately 800 systems every day! You would never have seen that much system churn 20 years ago, when Solaris ruled the datacenter and systems were rolled out slowly and ran multiple applications for years at a time. Today, systems may go into production for just a few hours before they are destroyed. But even for those few hours, they must be monitored, and we need to ensure that they were configured correctly when they started (SCM checks) and that no serious vulnerabilities are present (IP360 check).
The IP360 Axon agent will start a check of a system the first time it comes up. The Tripwire Enterprise Axon agent can baseline and run an SCM check when it comes up by notifying the TE Console that it’s there. At that point, classification and baselining can begin. Managing assets this way and getting monitoring set up right away requires automation. When new assets come into the system, they must be immediately classified:
- What type of system is this?
- What application(s) is it running?
- What policy does it follow? (CIS, SOX, PCI, DISA, etc.)
- Who gets alerts?
- And there are others.
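The classification questions above can be expressed as a small rule table. What follows is a minimal sketch in Python, not Tripwire's API: the `Asset` fields, the tag names, the team names and the `classify` function are all hypothetical, chosen only to show how a newly registered asset could be mapped to tags without a person in the loop.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A newly registered system. All fields are hypothetical examples."""
    hostname: str
    os_family: str                                   # e.g. "linux", "windows"
    applications: list[str] = field(default_factory=list)
    handles_card_data: bool = False


def classify(asset: Asset) -> set[str]:
    """Answer the on-boarding questions and return the tags to apply."""
    # What type of system is this?
    tags = {f"os:{asset.os_family}"}

    # What application(s) is it running?
    tags.update(f"app:{app}" for app in asset.applications)

    # What policy does it follow? (illustrative mapping only)
    tags.add("policy:CIS")
    if asset.handles_card_data:
        tags.add("policy:PCI")

    # Who gets alerts? Route by application, with a default owner.
    if "payments-api" in asset.applications:
        tags.add("alerts:payments-team")
    else:
        tags.add("alerts:ops")
    return tags


new_asset = Asset("web-0412", "linux", ["nginx", "payments-api"],
                  handles_card_data=True)
print(sorted(classify(new_asset)))
```

Keeping the rules in one function means the same logic runs for every asset, which matters when hundreds of systems arrive each day.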
Once classified, Tripwire can apply tags to the asset (“node” in Tripwire terminology). Once tagged, the Tripwire Rules that apply to each area can be baselined, allowing monitoring to begin. Once monitoring has begun, Tripwire can alert you to changes and initiate a new vulnerability scan when a file is changed on the system.
In an environment that uses immutable OS images, systems generally live for time frames that may be too short for running scheduled checks, so notifications should happen immediately if a change is detected. A change is an incident. So, when the environment requires it, real-time change tracking can also be turned on for some or all of the monitored files at on-boarding time.
Immutable images are often destroyed. Whenever a change needs to be made to an image, a new version of that application image is created, updates and all. The running image is destroyed, and the new image is created; it should be tested before it goes into the image store and is started, as described above. Changes to the image itself are usually tracked in GitHub or an equivalent repository. When the image is started, it is tested again to ensure that the configuration is correct, no vulnerabilities are present and the files that make up the image match the expected files in GitHub. All of this testing and confirming must be done automatically. If anything has changed between the deployment of the image and the time the image is run, an alert should be raised. You want this confirmation because an image in a repository could be modified by a process other than your usual CI/CD pipeline, and you need to confirm that this has not happened before running the image.
For the images that are destroyed, the asset needs to be cleaned up in the Tripwire console. Any change information should be stored in a long-term repository (such as Splunk or QRadar).
In this way, old information that is no longer relevant to your running environment isn’t filling up your Tripwire database. This process is called off-boarding. In off-boarding, the asset is confirmed as terminated (by checking a CMDB, a cloud management console or a time-based mechanism) and then deleted from the Tripwire Console.
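The off-boarding decision described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Tripwire interface: the `Node` record, the `find_terminated` function, the `max_silence` threshold and the idea of receiving a set of live IDs from a CMDB or cloud console are all hypothetical. Archiving the change history and deleting the node are left to whatever interfaces your environment provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Node:
    """A monitored asset as the console knows it (hypothetical shape)."""
    node_id: str
    last_seen: datetime


def find_terminated(nodes, live_ids, now, max_silence=timedelta(hours=6)):
    """Return the nodes that should be off-boarded.

    A node qualifies if the CMDB or cloud console no longer lists it
    (it is absent from live_ids), or if it has been silent for longer
    than max_silence (the time-based mechanism).
    """
    return [
        n for n in nodes
        if n.node_id not in live_ids or now - n.last_seen > max_silence
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
nodes = [
    Node("i-aaa", now - timedelta(minutes=5)),   # listed and recently seen
    Node("i-bbb", now - timedelta(minutes=5)),   # no longer listed
    Node("i-ccc", now - timedelta(hours=9)),     # listed, but silent too long
]
for node in find_terminated(nodes, live_ids={"i-aaa", "i-ccc"}, now=now):
    # Archive change history to long-term storage first, then delete the node.
    print("off-board:", node.node_id)
```

Archiving before deleting preserves the audit trail even though the asset itself is gone.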
Reporting on Dynamically Changing Assets
When the way in which assets are deployed, used and updated moves away from systems that run for months at a time to systems that exist for only a few hours, reporting needs to change as well. When static servers predominated, tracking SCM and change management over time was a great way to measure how well your program was doing. Trending and remediation metrics were also important. But when remediation means updating a code base and never updating the running instances, what exactly are you reporting on? The running systems are all in a standard state, and you don’t update them directly. Vulnerabilities or SCM failures go right to the DevOps team to fix in their code so they can rerun the pipeline. (Failures can go right into JUnit, for instance.) Does management need to know about failures that are fixed before they go into production? The risk/compliance teams really just need a view of the current system images:
- Are there any vulnerabilities/SCM failures in running images?
- Are there any vulnerabilities/SCM failures in images that are in the image store but not running?
- How long have they been open since they were reported to the DevOps teams?
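As a rough illustration, the three questions above reduce to a simple summary over the open findings. The sketch below is in Python and assumes a hypothetical `Finding` record with the fields shown; the image names and dates are made-up sample data, and where the findings come from is outside its scope.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Finding:
    """One open vulnerability or SCM failure (hypothetical record)."""
    image: str
    kind: str            # "vulnerability" or "scm"
    running: bool        # True if an instance of the image is running
    reported_on: date    # when the DevOps team was notified


def current_view(findings, today):
    """Summarise open findings the way the three questions ask."""
    running = [f for f in findings if f.running]
    stored = [f for f in findings if not f.running]
    oldest = max(((today - f.reported_on).days for f in findings), default=0)
    return {
        "open_in_running_images": len(running),
        "open_in_stored_images": len(stored),
        "oldest_open_days": oldest,
    }


findings = [
    Finding("payments-api:41", "vulnerability", True, date(2024, 3, 1)),
    Finding("payments-api:40", "scm", False, date(2024, 2, 20)),
]
print(current_view(findings, today=date(2024, 3, 4)))
```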
Things of interest that you will want to know over time:
- How many vulnerabilities were found and fixed during release in the past week/month/quarter? (This is a way to show that the DevOps team is keeping things clean and actually finding these issues before they go into production.)
- How many SCM failures were detected in systems at instantiation over the past week/month/year?
- Which applications have the most failures detected?
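These trend questions are counts over a time window, grouped in different ways. The following Python sketch assumes a hypothetical event log of `(date, application, stage, kind)` tuples; the field layout, the stage names and the sample entries are illustrative only.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical event log: (date detected, application, stage, kind).
# stage is "release" (caught in the pipeline) or "instantiation".
events = [
    (date(2024, 3, 1), "payments-api", "release", "vulnerability"),
    (date(2024, 3, 2), "payments-api", "instantiation", "scm"),
    (date(2024, 3, 3), "web-frontend", "release", "vulnerability"),
    (date(2024, 1, 15), "web-frontend", "instantiation", "scm"),
]


def trend(events, today, days):
    """Count findings from the last `days` days, grouped as in the list above."""
    cutoff = today - timedelta(days=days)
    recent = [e for e in events if e[0] >= cutoff]
    return {
        "vulns_caught_at_release": sum(
            1 for _, _, stage, kind in recent
            if stage == "release" and kind == "vulnerability"),
        "scm_failures_at_instantiation": sum(
            1 for _, _, stage, kind in recent
            if stage == "instantiation" and kind == "scm"),
        # Applications ordered from most to fewest findings.
        "failures_by_application": Counter(
            app for _, app, _, _ in recent).most_common(),
    }


print(trend(events, today=date(2024, 3, 4), days=7))
```

Calling `trend` with `days=30` or `days=90` gives the monthly and quarterly views from the same log.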
In the end, shifting left will make you more secure and reduce the number of vulnerabilities hanging out in your environment. Shifting left with ITSM in mind, which means still controlling change, performing verification steps and maintaining separation of duties, will ensure that security holes and problems don’t creep back into your systems through neglect and an over-reliance on trust in the development team.