At Tripwire, we have recently seen increasing interest from our customers in being able to match up file changes found by our products with threat intelligence that comes from a variety of external sources. We have run into a common issue here when we have gotten down to the implementation details, and so I write this post as a plea to all the new, emerging and growing providers of threat intelligence in the world:
We need multiple hash types for every file you are sharing intelligence about.
Let me start with some background. The term “threat intelligence” can mean everything from an IP address associated with malicious actors to sophisticated modeling of advanced threats. A lot of that intelligence is network-centric data that is not immediately applicable to what you can see on a specific system, but one frequently encountered observable (a term you’ll start to hear more often, as it’s part of the common threat intelligence vocabulary) is a file hash.
Is the file hash the end-all-be-all of systems-level threat intelligence?
Definitely not. File hashes are easy to change, so even if we start with them, it is very important that we quickly move up the food chain of an attack and observe the more sophisticated indicators that are harder to change. But sharing threat intelligence is still very much in its infancy, and file hashes are a great starting point: they are easy to gather, easy to share, and they can identify repeated attempts at an identical attack that reuse the same binaries, malware, or other file content. If you want to start integrating threat intelligence into systems-level monitoring, this is where you start.
So, what’s the problem?
The use of file hashes has evolved with more sophisticated algorithms over the years, and that means we have lost a single accepted digest standard. From the early days of CRC32, to a long period of MD5 standardization, to the modern migration to SHA-x (SHA-1, SHA-256, SHA-512, and the next-generation SHA-3 undergoing standardization), this has been a changing area of cryptography. Without getting into which algorithm should be used, the history, and why, there is no debate that there are multiple hashes in use today in different environments.
Threat intelligence providers have similarly chosen different hashes to share as part of their threat intelligence feeds. No adopted standard dictates which hashing scheme threat intelligence should use, which becomes obvious when you start to pull in intelligence from different sources. I have seen different sources of intelligence that offer MD5, SHA-1, and SHA-256.
Now, if your reaction to a piece of intelligence is to go create file hashes from files, or you have a system in place that already is generating these file hashes and you want to compare that against your inbound intelligence, you have only one option: generate multiple file hashes for each file.
Most monitoring in place today was not built with many reasons to do this, so generating the additional hashes usually consumes more time on a system. It is possible to generate multiple hashes in a single pass through a file, but that isn’t very common today. And even if that problem is solved, several times as much hash data must now be collected and stored.
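To make the single-pass idea concrete, here is a minimal sketch in Python using the standard `hashlib` module. The function name `multi_hash`, the default algorithm list, and the chunk size are my own illustrative choices, not something taken from any particular product.

```python
import hashlib


def multi_hash(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return {algorithm: hex digest} for one file, reading it only once.

    Hypothetical helper for illustration: each chunk read from disk is fed
    to every hash object, so the I/O cost is paid a single time.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The file is read once, but the CPU cost of each algorithm and the storage for each digest are still paid per hash type, which is the overhead described above.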
So, now we are incurring an increased monitoring cost to deal with this inbound threat intelligence circumstance. Is that the end of the world? No, but it isn’t ideal. It requires reconfiguring and expanding a broad set of monitoring already in place across thousands of companies and millions of systems, and there is an easier solution.
Threat intelligence providers should recognize this common circumstance, and instead of providing hashes in the format most convenient for the provider, provide the hashes in every format useful for a consumer. Today, that is:
- MD5
- SHA-1
- SHA-256
- (And maybe SHA-512 would be a good choice for future-proofing)
Now the intelligence can be consumed into existing systems, no matter which hash is in use. Ultimately, that is good for the providers, who gain easier and more numerous ways for their intelligence to be leveraged in monitoring, and it is good for the consumers. It is also quite minimal work at the front end of intelligence sharing, compared with a lot more work on the back end. That one file being shared only needs to be re-hashed once by the provider, instead of the hundreds of millions of re-hashes that would need to happen on consumer systems to accommodate different standards.
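As a rough sketch of what this looks like on both sides, the Python below has a provider hash a sample once under several algorithms, and a consumer match on whichever hash type it already collects. The record layout and the names `make_indicator`, `build_index`, and `match` are hypothetical, made up for illustration; they do not represent any real feed format.

```python
import hashlib

# Assumed set of hash types a provider might publish.
ALGORITHMS = ("md5", "sha1", "sha256")


def make_indicator(name, content):
    """Provider side: hash the sample once under every algorithm."""
    hashes = {alg: hashlib.new(alg, content).hexdigest() for alg in ALGORITHMS}
    return {"name": name, "hashes": hashes}


def build_index(indicators):
    """Consumer side: map every (algorithm, digest) pair to an indicator name."""
    index = {}
    for indicator in indicators:
        for alg, digest in indicator["hashes"].items():
            index[(alg.lower(), digest.lower())] = indicator["name"]
    return index


def match(index, local_hashes):
    """Return the names of indicators matching any locally available hash."""
    keys = ((alg.lower(), digest.lower()) for alg, digest in local_hashes.items())
    return {index[key] for key in keys if key in index}
```

A consumer whose monitoring only records SHA-1 can call `match(index, {"sha1": digest})` and still get a hit, without generating any new hash types on its own systems.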
Perhaps the industry will end up on a single standard here, and this advice will be unnecessary in time. But for today, if you are sharing threat intelligence and want to play nicely with the thousands of organizations out there that are interested in consuming it into their existing systems controls, consider taking action on this.
For those intelligence providers that are already doing this – Thank you!