My cohort Gene Kim and I were in San Francisco yesterday, meeting with a number of journalists. We discussed trends, virtualization, IT risks, and things of that sort, and the recurring question was which new risks and threats were going to be a problem in the next year.
Everyone wanted to know about the next big thing, but the fact of the matter is that most of our IT problems will come from little things, not from some big “happening.”
The things that will bite us will be the things that always bite us. And they are all rooted in what I like to call “IT Hygiene.” Here are some examples:
Everyone is worried about the bad guys. Yes, they are out there. But what we need to pay attention to are the misconfigured bits of gear (servers, firewalls, laptops, etc.) that allow the bad guys in.
Misconfiguration can include fat-fingered mistakes, unpatched systems, systems not reviewed according to your policies, inconsistent configurations, and more.
The bigger issue is that most IT departments don’t have a systematic way to figure out what’s misconfigured. The result? Risks aren’t known, much less managed.
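To make "a systematic way" a little more concrete, here is a minimal sketch of the idea: write down the settings your policy expects, then compare every system against that baseline and report the differences. The setting names, values, and host names below are made up for illustration, and a real environment would collect the actual configurations with its own tooling.

```python
from typing import Any

# Hypothetical baseline: the settings your policy says every server must have.
BASELINE = {
    "ssh_root_login": "no",
    "firewall_enabled": True,
    "patch_level": "2024-05",
}

# Placeholder shown when a host does not report a setting at all.
MISSING = "<missing>"


def find_drift(baseline: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (expected, found)} for every setting that deviates."""
    drift = {}
    for setting, expected in baseline.items():
        found = actual.get(setting, MISSING)
        if found != expected:
            drift[setting] = (expected, found)
    return drift


def audit_fleet(baseline: dict[str, Any], fleet: dict[str, dict[str, Any]]) -> dict[str, dict]:
    """Map each host name to its drift, leaving out hosts that match the baseline."""
    report = {}
    for host, config in fleet.items():
        drift = find_drift(baseline, config)
        if drift:
            report[host] = drift
    return report


if __name__ == "__main__":
    # Hypothetical fleet: one compliant host, one misconfigured host.
    fleet = {
        "web01": dict(BASELINE),
        "db01": {"ssh_root_login": "yes", "firewall_enabled": True},
    }
    for host, drift in audit_fleet(BASELINE, fleet).items():
        for setting, (expected, found) in drift.items():
            print(f"{host}: {setting} expected {expected!r}, found {found!r}")
```

The point of the sketch is the habit, not the code: once the expected state is written down, unknown risks become a list you can work through.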
Too many surgeons, too many scalpels
Most organizations we’ve benchmarked have serious problems with Segregation of Duties (SoD): people with too much access for their role, shared administrator accounts, inadequate review of access lists, too many people with production access, and so on.
Gene likens it to an operating room where everyone has a scalpel – not a recipe for a successful patient outcome.
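An access review along these lines can be partly automated. The sketch below assumes you can export your access list as (person, account, role) records; the role names and the pairs treated as conflicting are hypothetical and would come from your own policy. It flags accounts used by more than one person and people who hold roles that should be kept apart.

```python
from collections import defaultdict

# Hypothetical pairs of roles that one person should not hold together.
CONFLICTING_ROLES = [
    ("developer", "production_admin"),
    ("change_requester", "change_approver"),
]


def review_access(grants):
    """Review an access list given as (person, account, role) tuples.

    Returns (shared_accounts, conflicts):
      shared_accounts maps an account to the sorted list of people using it,
      conflicts maps a person to the conflicting role pairs they hold.
    """
    people_per_account = defaultdict(set)
    roles_per_person = defaultdict(set)
    for person, account, role in grants:
        people_per_account[account].add(person)
        roles_per_person[person].add(role)

    # An account with more than one person behind it is a shared account.
    shared_accounts = {
        account: sorted(people)
        for account, people in people_per_account.items()
        if len(people) > 1
    }

    # A conflict exists when a person holds both roles of a forbidden pair.
    conflicts = {}
    for person, roles in roles_per_person.items():
        held = [pair for pair in CONFLICTING_ROLES if set(pair) <= roles]
        if held:
            conflicts[person] = held
    return shared_accounts, conflicts


if __name__ == "__main__":
    grants = [
        ("alice", "alice", "developer"),
        ("alice", "root", "production_admin"),
        ("bob", "root", "production_admin"),
        ("carol", "carol", "change_requester"),
    ]
    shared, conflicts = review_access(grants)
    print("Shared accounts:", shared)
    print("Role conflicts:", conflicts)
```

Run on a schedule, a report like this turns "inadequate review of access lists" into a short list of scalpels to collect.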
People who don’t know what’s expected
If you don’t have documented policies, shame on you. You will get the wild, wild west you’ve created for yourself.
If you have documented policies but nobody knows about them, shame on you again. How can people adhere to a policy they are unaware of? At home, I sometimes discover my wife’s expectations only after I’ve failed to meet them in some way.
That kind of violation-driven training is not an effective way to communicate IT policy.
It’s best to have documented policies, consistently communicated, and supported by technology (workflow, controls, automation, common “runbooks” and so forth) that makes it easy to do things right, and harder to do things wrong.
From a user perspective, this is important as well – do your desktop users know what to do if they think their system is infected? Do they know what a phishing email looks like? Would they click on a bogus “Your computer is infected” popup? Educate them.
People who get away with not following the rules
If you have rules but there are no consequences for breaking them, your rules will not be effective. If you need to get tougher about enforcing your rules, there is an ordered way to manage through this, typically with a three-strikes kind of model:
- First violations are treated as a coaching opportunity: help people understand how things should have been done, and show them where to find the rules that govern them.
- Second violations should receive some kind of disciplinary action – a notation in a review, perhaps some time doing “grunt work,” or some other kind of un-fun thing.
- Third violations mean the employee isn’t responding well to coaching, and should be moved into a role that prevents them from making changes to your infrastructure (and that role may be in another company, if the offense is bad enough). Think of it as taking the keys away from a teenager who’s demonstrated he can’t behave responsibly with the family car.
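If you track violations in a ticketing or HR system, the escalation above is simple enough to encode so that it gets applied consistently. This is only a sketch of the three-strikes idea; the class name, the response labels, and the choice to count violations per employee without any expiry are my assumptions, not a prescribed process.

```python
from collections import Counter

# Response for the first and second violation; anything beyond that
# falls through to the third-strike response below.
RESPONSES = {
    1: "coach",
    2: "discipline",
}
THIRD_STRIKE = "remove change access"


def response_for(violation_count: int) -> str:
    """Return the response for an employee's Nth violation (N starts at 1)."""
    if violation_count < 1:
        raise ValueError("violation_count must be at least 1")
    return RESPONSES.get(violation_count, THIRD_STRIKE)


class ViolationLog:
    """Counts violations per employee and reports the matching response."""

    def __init__(self) -> None:
        self._counts = Counter()

    def record(self, employee: str) -> str:
        """Record one violation and return how it should be handled."""
        self._counts[employee] += 1
        return response_for(self._counts[employee])


if __name__ == "__main__":
    log = ViolationLog()
    for _ in range(3):
        print(log.record("pat"))
```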
Poor understanding of risks and how to manage them
This is a particular blind spot with the early adopters we’ve seen. They jump into the “next big thing” technology without understanding the risks to the organization. And you’ll have a hard time managing the “unknown unknowns” of IT risk.
You might look before you leap, and spend time researching and testing the heck out of your new technology to help identify the risks. Or, you can always wait a while until someone else has discovered and documented the risks.
Inability to tell when any of the above are happening
Many of the issues in the list above persist because people don’t have enough visibility, expertise, or situational awareness to detect them.
An informed, trained IT staff is vital, but they must have a backstop of IT controls that are in place and effective, and management that will hold them accountable. If you want to know how to do that, Gene has written books on the subject: The Visible Ops Handbook and Visible Ops Security.
There is hope…
I’m sure I can come up with more examples, but I think that’s enough for now. Think about the fact that the little things can (and probably will) cause you more problems than a big event in the next year. Since most of these are rooted in human behavior, that is likely the way it will remain for the foreseeable future.
Incidentally, there is light at the end of the tunnel. You see, in addition to writing those books I mentioned, Gene founded a company built around helping you systematically audit and control these aspects of your systems. The goal is an operationally effective, compliant, and secure infrastructure that keeps things running while controlling risks to the business.