Trust Is Not A Control (And Neither Is Luck): Critiquing The Fannie Mae Critiques
One of the best things I’ve read lately was Hal Pomeranz’s article “Change Controls: Ur Doin It Rong.” Hal wrote it after reading the FBI affidavit describing how Rajendrasinh Makwana, a former consultant at Fannie Mae, allegedly planted malicious code on Fannie Mae’s servers after he had been terminated.
What made this article so interesting was that Hal pointed out something that often seems to be a blind spot for information security: a risk that is hidden in plain sight, poses a clear and present danger to business and information security objectives, and is routinely overlooked.
That issue is change control. Hal writes convincingly:
The information in the FBI Agent’s affidavit that really made me sit up and take notice was the following:
“On October 29, 2008, SK, [a Fannie Mae] senior Unix engineer, discovered malicious script embedded within a pre-existing, legitimate script… It was only by chance that SK scrolled down to the bottom of the legitimate script to discover the malicious script.”
In other words, Fannie Mae got very, very lucky here. What I want to know is why Fannie Mae had to trust to luck to detect an attack that (again according to the FBI affidavit), “would have caused millions of dollars of damage and reduced if not shut down operations at [Fannie Mae] for at least a week.”
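To make the luck problem concrete: SK found the malicious code only because he happened to scroll to the bottom of the script. A mechanical comparison against a known-good copy doesn’t depend on where in the file the change lands. Below is a minimal Python sketch using the standard library’s `difflib`. It assumes a trusted baseline copy of each script is kept somewhere its editors can’t modify; the function name `added_lines` and the script contents are hypothetical, not anything taken from the affidavit.

```python
import difflib


def added_lines(baseline: str, current: str) -> list[str]:
    """Return every line in `current` that is not in the known-good `baseline`.

    difflib.ndiff marks inserted or changed lines with a "+ " prefix, so an
    addition is reported wherever it occurs in the file: top, middle, or far
    below the last screenful a reviewer happened to read.
    """
    return [
        line[2:]  # strip the "+ " marker
        for line in difflib.ndiff(baseline.splitlines(), current.splitlines())
        if line.startswith("+ ")
    ]


# Hypothetical example: a legitimate script with one line appended at the end.
baseline = "#!/bin/sh\nsync_servers\nlog_status\n"
current = baseline + "run_payload\n"
print(added_lines(baseline, current))  # ['run_payload']
```

The appended line is reported whether it sits at the end of a long file or in the middle of it. A human reviewer, by contrast, has to read the entire file to get the same assurance.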
When a bunch of us discussed Hal’s provocative article on Twitter, there were some interesting rebuttals and conclusions. However, some of those conclusions just didn’t sit right with me, such as “Humans as the best IDS (intrusion detection system)!” and “That system administrator is a hero!” These statements are partly true and difficult to disagree with, but they can lead to strange conclusions.
I picked up the phone and called Hal, whom I’ve known since 1999 but hadn’t talked to in about a year. (Thank you, Twitter.) As we deconstructed the arguments, I slowly started to understand what was bothering me, and why some of these critiques are simply wrong, or at least need to be stated more precisely.
It boils down to this: preventing and detecting failures (whether operational or information security failures) can’t be the responsibility of the individual. It must be the responsibility of the institution.
Just as trust is not really a control, neither is luck.
It’s difficult to disagree that humans are the best IDS. (Hal and I discussed how Cliff Stoll’s famous “Cuckoo’s Egg” story began when a 75-cent accounting discrepancy on a time-sharing system at Lawrence Berkeley Laboratory prompted him to ask why the books didn’t balance. That question led to the discovery of a genuine espionage operation.) And it’s difficult to argue that heroism doesn’t produce better outcomes than no heroism at all.
But you can’t rely on trust or luck.
So what would the thought process be for creating, or verifying that we have, an effective control environment, where responsibility rests with the institution rather than with an individual getting lucky? I think it would go something like this:
- We learn that certain IT services must be operating in order to conduct critical business operations (e.g., 4,000 servers running a mission-critical application)
- We identify a key risk: the IT service becomes unavailable (e.g., due to failure, sabotage, or human error), causing business disruption
- We ask, “What could go wrong in that IT service to cause that event?” For instance:
  - Environment failure causes loss of functionality (e.g., power failure, external network failure)
  - An application or infrastructure change causes incorrect or lost functionality (e.g., a changed config file, a new application release)
  - A malicious change or sabotage is introduced (e.g., a script is added)
- We flip these risk statements around to craft our control objectives. Focusing on change control:
  - All changes are implemented following a change management process, and any unauthorized or untested changes deployed into production are identified
- We then design the preventive, detective, and corrective controls to achieve the control objectives:
  - Preventive: we have a change management process that enforces authorization and testing requirements
  - Detective: all production changes are detected, and management reconciles them against authorized changes (here’s where Tripwire fits in)
  - Corrective/Deterrent: management takes decisive action when unauthorized or undocumented changes occur, establishing “tone at the top” and accountability
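The detective control above (detect every production change, then reconcile it against authorized changes) can be sketched in a few lines of Python. This illustrates the idea only; it is not how Tripwire or any other product implements it. The function names, the snapshot format, and the `authorized_changes` mapping of change-ticket IDs to file paths are all hypothetical, and the baseline snapshot is assumed to be stored where the people making changes can’t alter it.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each monitored file path to the SHA-256 digest of its contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def detect_changes(baseline, current):
    """Return the set of paths added, removed, or modified between two snapshots."""
    changed = set(baseline.keys() ^ current.keys())  # files that appeared or vanished
    changed |= {p for p in baseline.keys() & current.keys() if baseline[p] != current[p]}
    return changed


def reconcile(detected, authorized_changes):
    """Split detected changes into (authorized, unauthorized) path sets.

    `authorized_changes` is a hypothetical mapping of change-ticket ID to the
    set of paths that approved change was allowed to touch.
    """
    allowed = set().union(*authorized_changes.values())
    return detected & allowed, detected - allowed


# Hypothetical snapshots (digests shortened for readability).
baseline = {"/etc/app.conf": "a1", "/opt/app/run.sh": "b2"}
current = {"/etc/app.conf": "a1", "/opt/app/run.sh": "ff", "/opt/app/new.sh": "c3"}
ok, unauthorized = reconcile(detect_changes(baseline, current), {"CHG-1": {"/opt/app/new.sh"}})
print(sorted(unauthorized))  # ['/opt/app/run.sh']
```

The `unauthorized` set is what feeds the corrective/deterrent step: each entry is a change nobody can account for. Matching on path alone is a simplification; a real reconciliation would likely also check when the change happened and whether the resulting file matches what the approved change was supposed to produce.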
(Remember: after benchmarking more than 1,000 IT organizations, ITPI research found that “detecting changes” and “defined consequences for intentional, unauthorized and undocumented changes” are two very accurate predictors of IT and information security performance! The article is from SEI/CMU.)
Now, that’s a control environment that would make IT operations, information security and auditors happy!