
Today’s post is all about Control 14 of the CSIS 20 Critical Security Controls – Maintenance, Monitoring, and Analysis of Audit Logs (the last post pertained to Control 13).  Here I’ll explore the (29) requirements I’ve parsed out of the control (I used the PDF version, but the online version is here) and offer my thoughts on what I’ve found [*].

Key Takeaways

  1. Enable (centralized) Logging. QED.
  2. Review Logs Regularly.  And don’t just “look” at them.  Use a tool to help you make sense of it all; this isn’t the realm for manual processes.  Find vendors that offer content out of the box or that can be easily customized for your specific needs.  Remember this: tools should be force multipliers.  They should help one security analyst do the work of several in the same amount of time.
  3. Take The Time.  Getting this right is important, but if you take the time to get some of the other Controls in place (i.e. Controls 1, 2, 3, and 10) and you’re using “gold” images/configuration files, then you’re actually in a really good place.  Just make sure the “gold” information is appropriately configured, and that part of your asset deployment process includes validating centralized logging is taking place.

Potential Areas Of Improvement

  1. Just Some Minor Cleanup. All things considered, I feel this is a decently written control.  It doesn’t stray too far from what I expected its target subject matter to be, and it is, for the most part, clear (though a few areas could use some crystallization).
  2. Oh, and Metrics Clarification.  I’d like to see some clearer metrics.  I’ll be the first to admit that I am not a statistician or a “metrics guy” by trade or training.  But I have been learning.  The metrics given throughout these Controls are probably intended to suggest that the controls are, in effect, working – that they’re doing some good.  But the specific measures, and the sample size from which those measures are taken, matter very much to your ability to generalize to the entire enterprise.  We have resource constraints, that’s true, but if the sample size is too small, your conclusions will be unreliable at best.

Requesting Feedback On

  • Requirement 11
  • Requirement 19

Requirement Listing

  1. Description: Each organization should include at least two synchronized time sources (i.e., Network Time Protocol or NTP) from which all servers and network equipment retrieve time information on a regular basis so that timestamps in logs are consistent.
    • Notes: If you use one internal and another external, be sure the one you use internally doesn’t reference the same external NTP source the others are using. A ‘regular’ basis is something that you’ll need to determine for your organization. What you need to be concerned with is your ‘drift’ tolerance. Default tolerance for Kerberos in Windows is five minutes. Consult benchmark sources for specific guidance.
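The drift check at the heart of this requirement can be sketched in a few lines. This is a minimal illustration, not a monitoring tool: the host names and timestamps are hypothetical, and the five-minute tolerance is just the Windows Kerberos default mentioned above.

```python
from datetime import datetime, timedelta

# Kerberos' default clock-skew tolerance on Windows is five minutes.
DRIFT_TOLERANCE = timedelta(minutes=5)

def hosts_exceeding_drift(reference: datetime, host_times: dict) -> list:
    """Return names of hosts whose reported clock drifts beyond tolerance."""
    return [
        host for host, reported in host_times.items()
        if abs(reported - reference) > DRIFT_TOLERANCE
    ]

# Hypothetical sample: two hosts within tolerance, one drifted seven minutes.
ref = datetime(2012, 6, 1, 12, 0, 0)
times = {
    "web01": datetime(2012, 6, 1, 12, 0, 30),
    "db01": datetime(2012, 6, 1, 12, 1, 0),
    "app01": datetime(2012, 6, 1, 12, 7, 0),
}
print(hosts_exceeding_drift(ref, times))  # ['app01']
```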
  2. Description: Validate audit log settings for each hardware device and the software installed on it, ensuring that logs include a date, timestamp, source addresses, destination addresses, and various other useful elements of each packet and/or transaction.
    • Notes: Don’t limit yourself to network devices, which might reasonably be the first thing that comes to mind when you see ‘hardware device.’ At least one of the ‘various other’ useful elements should be any user-specific and/or application-specific information. The more context you have around what generated a given record the better picture you’ll have going forward.
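Validation here boils down to checking parsed records for the required fields. A minimal sketch, with field names of my own choosing (your log pipeline will have its own schema):

```python
# Required fields are assumptions drawn from the requirement's wording.
REQUIRED_FIELDS = {"date", "timestamp", "source_address", "destination_address"}

def missing_fields(record: dict) -> set:
    """Return the required fields absent from a parsed log record."""
    return REQUIRED_FIELDS - record.keys()

record = {
    "date": "2012-06-01",
    "timestamp": "12:00:30Z",
    "source_address": "10.0.0.5",
    "user": "jdoe",  # user context: one of the 'various other' useful elements
}
print(sorted(missing_fields(record)))  # ['destination_address']
```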
  3. Description: Systems should record logs in a standardized format such as syslog entries or those outlined by the Common Event Expression initiative. If systems cannot generate logs in a standardized format, log normalization tools can be deployed to convert logs into such a format.
    • Notes: This one is particularly interesting to me because I have been somewhat involved in the CEE efforts. I believe that standardized, logical log formats should be embraced with one or more syntactical bindings (i.e. XML, JSON, something else). If you doubt the benefit of this, ask your SIEM guys what a pain in the rear normalization rules can be – how much time they have wasted on them – and you’ll get a better idea.
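To make the normalization pain concrete, here is a rough sketch of turning a classic BSD-syslog line into a structured record. The regex and field names are illustrative assumptions, not anything from the CEE specification:

```python
import json
import re

# Normalize a classic BSD-syslog line into a structured record.
SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<app>[\w./-]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<msg>.*)$"
)

def normalize(line: str) -> dict:
    match = SYSLOG_RE.match(line)
    if not match:
        raise ValueError("unrecognized log format: " + line)
    return match.groupdict()

line = "Jun  1 12:00:30 web01 sshd[1234]: Failed password for root from 10.0.0.9"
print(json.dumps(normalize(line), indent=2))
```

Multiply this by every vendor's ad-hoc format and you see why your SIEM team resents normalization rules.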
  4. Description: Ensure that all systems that store logs have adequate storage space for the logs generated on a regular basis, so that log files will not fill up between log rotation intervals.
    • Notes: This is a basic requirement, but it’s one that can easily be overlooked. Operationally speaking, you should ensure not only that you have enough space at the outset of the asset’s lifecycle, but throughout. As time moves on, it’s easier to fill up space we thought we’d never use.
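A back-of-the-envelope runway calculation is often enough to catch this before it bites. The figures below are hypothetical:

```python
def days_of_runway(free_bytes: int, daily_log_bytes: int) -> float:
    """Days until log volume fills the remaining space at the current rate."""
    return free_bytes / daily_log_bytes

# Hypothetical figures: 50 GB free, 2 GB of logs per day, weekly rotation.
runway = days_of_runway(50 * 2**30, 2 * 2**30)
rotation_interval_days = 7
print(runway)                           # 25.0
print(runway > rotation_interval_days)  # True: space outlasts the rotation window
```

Recompute this periodically; the daily rate in year three rarely matches the rate at deployment.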
  5. Description: The logs must be archived and digitally signed on a periodic basis.
    • Notes: At least archive with a well-defined process and paper trail. It may not always be feasible for you to digitally sign – when you decide to do this, you’ve got key management issues with which to contend. Signing sounds great, but there are other ways to ensure the integrity. Evaluate the methods that are right for you.
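One of those "other ways": record a cryptographic digest of each archive and store it separately from the logs themselves (a digest kept alongside the archive can be rewritten by the same attacker who rewrites the logs). A minimal sketch:

```python
import hashlib

def archive_digest(data: bytes) -> str:
    """SHA-256 digest recorded apart from the archive as an integrity check."""
    return hashlib.sha256(data).hexdigest()

archived = b"Jun  1 12:00:30 web01 sshd[1234]: session opened\n"
digest = archive_digest(archived)
print(digest)

# Later verification: recompute and compare before trusting the archive.
assert archive_digest(archived) == digest
```

Unlike signing, this carries no key-management burden, but it also proves nothing about *who* produced the digest; weigh that trade-off.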
  6. Description: Develop a log retention policy to make sure that the logs are kept for a sufficient period of time. As APT (advanced persistent threat) continues to stealthily break into systems, organizations are often compromised for several months without detection. The logs must be kept for a longer period of time than it takes an organization to detect an attack so they can accurately determine what occurred.
    • Notes: Some organizations will have log retention policies prescribed to them, at least for certain systems. If you believe your organization is prone to advanced attacks, then you might want to retain logs from a wider set of your assets for a longer period of time (it depends on how slow and low you believe your adversaries to be, and a balance between that and e-discovery requirements).
  7. Description: All remote access to a network, whether to the DMZ or the internal network (i.e., VPN, dial-up, or other mechanism), should be logged verbosely.
    • Notes: My assumption on this requirement is that they mean ‘remote’ with respect to the entire perimeter of the organization. One could argue that, given two internal segments A and B, accessing segment B from segment A could be considered ‘remote’ from B’s frame of reference.
  8. Description: Operating systems should be configured to log access control events associated with a user attempting to access a resource (e.g., a file or directory) without the appropriate permissions.
    • Notes: This is a fairly standard requirement that should be reflected in most benchmarks. Some of the more stringent benchmarks will tell you to additionally log successful attempts to access objects within the system. The rub here, however, is in your definition of the term ‘resource.’ The requirement gives an example of file or directory, but what about named pipes, mutexes, sockets, and other system resources?
  9. Description: Failed logon attempts must also be logged.
    • Notes: As with the previous requirement, this is fairly standard practice, and some benchmarks will have you log successful attempts as well.
  10. Description: Security personnel and/or system administrators should run biweekly reports that identify anomalies in logs. They should then actively review the anomalies, documenting their findings.
    • Notes: For my tastes, this isn’t often enough. With the right toolset in place this should happen at least daily and with automation, such that red flags are brought to the attention of the administrator.
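A daily automated pass doesn't have to be elaborate to beat a biweekly manual one. Here is a toy sketch that flags event types spiking well above a baseline; the threshold multiplier and event names are illustrative assumptions, not recommendations:

```python
from collections import Counter

def anomalies(today: Counter, baseline: Counter, multiplier: int = 3) -> list:
    """Flag event types whose daily count far exceeds the baseline count."""
    return sorted(
        event for event, count in today.items()
        if count > multiplier * baseline.get(event, 0)
    )

baseline = Counter({"failed_logon": 20, "service_created": 1})
today = Counter({"failed_logon": 250, "service_created": 1})
print(anomalies(today, baseline))  # ['failed_logon']
```

Note that an event type with no baseline at all gets flagged on first sight, which is exactly the behavior you want for something like unexpected service creation.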
  11. Description: Network boundary devices, including firewalls, network-based IPS, and inbound and outbound proxies, should be configured to verbosely log all traffic (both allowed and blocked) arriving at the device.
    • Notes: What’s the difference between this requirement and requirement 7? Is all of this presumed to be happening below layer 7 and the previous requires full session verbose auditing?
  12. Description: For all servers, organizations should ensure that logs are written to write-only devices or to dedicated logging servers running on separate machines from hosts generating the event logs, lowering the chance that an attacker can manipulate logs stored locally on compromised machines.
    • Notes: Centralized logging is awesome, but don’t forget about the contingency of having that connection down for a time. Ensure that your design has a mechanism in place to write-only to local storage then pick up sending events to the central location at a later time. This is another reason file integrity management and configuration management software that continuously monitor all of your servers are critically important – you’ll know when a write-only file changes.
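The spool-and-forward contingency I'm describing looks roughly like this. The transport is a stand-in (`send` here is whatever your agent actually uses), and the outage is simulated:

```python
# Sketch: try the central collector first, fall back to a local spool,
# flush the spooled backlog once the collector recovers.
class Forwarder:
    def __init__(self, send):
        self.send = send
        self.spool = []  # local fallback storage during an outage

    def emit(self, record: str) -> None:
        try:
            self.send(record)
        except ConnectionError:
            self.spool.append(record)

    def flush(self) -> None:
        # Keep each record spooled until its send actually succeeds.
        while self.spool:
            self.send(self.spool[0])
            self.spool.pop(0)

# Simulate an outage, then recovery.
delivered = []
down = True

def send(record):
    if down:
        raise ConnectionError
    delivered.append(record)

fwd = Forwarder(send)
fwd.emit("event-1")   # collector down: spooled locally
down = False
fwd.emit("event-2")   # collector up: delivered directly
fwd.flush()           # spooled backlog delivered
print(delivered)      # ['event-2', 'event-1']
```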
  13. Description: Deploy a SIM/SEM (security information management/security event management) or log analytic tools for log aggregation and consolidation from multiple machines and for log correlation and analysis.
    • Notes: Be very prepared, if you do this, to spend human resources on operations. If you’re a small shop, you might consider finding an outsourced provider to do this for you, because this is, with the present state of tools in the market, not an inexpensive operation. Tools are getting better each year, and new categories of tools are being invented to help make this type of detection easier. So, pay attention to the marketplace whether you decide this is right for you or not.
  14. Description: Using the SIM/SEM tool, system administrators and security personnel should devise profiles of common events from given systems so that they can tune detection to focus on unusual activity, avoid false positives, more rapidly identify anomalies, and prevent overwhelming analysts with insignificant alerts.
    • Notes: Do not do this alone. If you buy a SIM/SEM (we call it SIEM), you should rely as much as possible on your vendor to provide high-quality, out-of-the-box profiles for you. SIEMs and audit logging solutions should be ‘business aware’ to help you then customize the default profiles to your specific needs. In this realm, as with configuration management, content really is king.
  15. Description: Carefully monitor for service creation events. On Windows systems, many attackers use psexec functionality to spread from system to system. Creation of a service is an unusual event and should be monitored closely.
    • Notes: The first question I had here is: Why is this labeled as ‘advanced’ in the Control document? To me, this is something that most benchmarks recommend, even for the ‘lightly’ secured systems. The last sentence of the requirement really should make this something you do sooner rather than later.
  16. Description: The system must be capable of logging all events across the network.
    • Notes: The only way you’re going to test this is statistically, by inference from a sample. There’s no way to prove it outright, as the tests below (see ‘requirement’ 21) indicate. What you should do is ignore the recommendation in the Control Test 21 and instead ensure that you have an adequate sample size in your test to make a realistic statement about your entire population. Find someone in finance who can help you with the statistical end of things if you don’t grok this stuff.
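If you want a starting point before you find that someone in finance: Cochran's formula gives the sample size needed to estimate a proportion (say, the fraction of systems actually logging) within a chosen margin of error. A sketch with the usual 95%/±5% defaults:

```python
import math

def sample_size(confidence_z: float = 1.96, margin: float = 0.05,
                p: float = 0.5) -> int:
    """Cochran's formula: samples needed to estimate a proportion p
    within +/- margin at the confidence level implied by the z-score."""
    return math.ceil(confidence_z**2 * p * (1 - p) / margin**2)

# 95% confidence, +/-5% margin, worst-case variability (p = 0.5):
print(sample_size())  # 385
```

Compare 385 with the 26 devices Control Test 21 asks for, and the sample-size complaint in this post should make more sense.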
  17. Description: The logging must be validated across both network-based and host-based systems.
    • Notes: Self-explanatory. But I don’t like the terminology. I’d much prefer the asset hierarchy in NIST’s Asset Identification specification to be used by this Control Framework, such that all IT assets are validated. If you’re creating a more comprehensive operational risk management program, then you could add non-technical logging as well (e.g., sign-in sheets).
  18. Description: Any event must generate a log entry that includes a date, timestamp, source address, destination address, and other details about the packet.
    • Notes: As with requirement 16, ensure your sample size is adequate enough to allow appropriate inferences. If your sample size is too small, you’re not going to be able to infer that ‘all’ of your log entries contain these things.
  19. Description: Any activity performed on the network must be logged immediately to all devices along the critical path.
    • Notes: What, exactly, is the ‘critical path,’ and what are ‘all devices’ sitting on it? I honestly don’t know. Any activity is a LOT of activity, and the implication is that log entries generated by these activities are to be sent to multiple destinations. To me this is unclear. Anyone have anything they care to comment on this?
  20. Description: When a device detects that it is not capable of generating logs (due to a log server crash or other issue), it must generate an alert or e-mail for enterprise administrative personnel within 24 hours.
    • Notes: This timeframe seems absurd. Notify immediately or ASAP (within, say, 5 minutes). If the system is critical, shut down gracefully and securely until the situation is rectified. If the system is not critical, a proper risk assessment will tell you what to do (i.e. continue operating for X period of time, shut down appropriately, or modify system behavior in some other way).
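One way to get from 24 hours down to minutes is a simple heartbeat check over your centralized logs: any host that has gone quiet for longer than the allowed window raises a flag. The five-minute window and host data below are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Allowed logging silence before a host is flagged (illustrative choice).
SILENCE_LIMIT = timedelta(minutes=5)

def silent_hosts(now: datetime, last_seen: dict) -> list:
    """Hosts whose most recent log entry is older than the silence limit."""
    return sorted(h for h, ts in last_seen.items() if now - ts > SILENCE_LIMIT)

now = datetime(2012, 6, 1, 12, 10, 0)
last_seen = {
    "web01": datetime(2012, 6, 1, 12, 9, 0),   # logged a minute ago
    "db01": datetime(2012, 6, 1, 11, 30, 0),   # silent for forty minutes
}
print(silent_hosts(now, last_seen))  # ['db01']
```

What you do with the flag (page someone, shut the system down gracefully, degrade service) is the risk-assessment question discussed above.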
  21. Description: To evaluate the implementation of Control 14 on a periodic basis, an evaluation team must review the security logs of various network devices, servers, and hosts. At a minimum the following devices must be tested: two routers, two firewalls, two switches, 10 servers, and 10 client systems.
    • Notes: No notes other than to ensure appropriate sample size.
  22. Description: The testing team should use traffic-generating tools to send packets through the systems under analysis to verify that the traffic is logged.
    • Notes: No notes other than to ensure appropriate sample size.
  23. Description: The evaluation team must verify that the system generates audit logs and, if not, an alert or e-mail notice regarding the failed logging must be sent within 24 hours.
    • Notes: No notes other than to ensure appropriate sample size.
  24. Description: It is important that the team verify that all activity has been detected.
    • Notes: No notes other than to ensure appropriate sample size.
  25. Description: The evaluation team must verify that the system provides details of the location of each machine, including information about the asset owner.
    • Notes: No notes other than to ensure appropriate sample size.

Other Controls Reviewed In This Series

Footnotes

A method and format explanation can be found at the beginning of Control 1.

Editor’s Note: This article was written by a former contributor to The State of Security who now resides with a non-profit group with an excellent reputation. We thank him for his opinions and perspective, and wish we could acknowledge him directly for his outstanding efforts on this series.
