I was just reading an article about how an updated set of virus definitions from McAfee took down a lot of corporate systems running Windows XP SP3 today.
It seems the update causes a nasty “endless reboot” cycle, and requires hands-on assistance from support to correct the problem. Ouch – that’s gonna leave a mark…
But I’m not here to pick on McAfee – mistakes happen in all software, and any time you deploy new software to millions of PCs that are at the mercy of the person at the keyboard, you’re likely to encounter some problems.
No, I’m here to talk about the tendency to blindly catapult automatic updates into herds of systems without testing the updates first. In other words, automation.
Automation, in itself, is not a bad thing. However, automation is dangerous when it’s used without discipline. Some of the attitudes I’ve observed around automation sound something like this:
- “I understand automated deployment is risky, so I test the software I deploy on a representative sample of my systems before doing large-scale automated deployment. Even then, I do the push in batches so I can contain the work if I run into a problem.”
- “I understand automated deployment is risky so I test the software I deploy on a few systems before I push the software to the rest of my systems.”
- “Automation is a real time-saver! Push, push, push!”
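The first attitude above – canary systems first, then batches you can halt – is easy to sketch in code. Here’s a minimal illustration of that idea; the `deploy` and `healthy` callables and all the names are hypothetical placeholders, not any real deployment tool’s API:

```python
# Hypothetical sketch of a disciplined rollout: deploy to a few canary
# systems first, then push in fixed-size batches, halting the push if
# failures exceed a threshold. All names here are illustrative.

def staged_rollout(hosts, deploy, healthy, canary_size=2, batch_size=5,
                   max_failures=1):
    """Deploy to canaries, then batches; stop early on too many failures.

    Returns (deployed, halted): the hosts actually updated, and whether
    the push was aborted before reaching everyone.
    """
    deployed, failures = [], 0
    canaries, rest = hosts[:canary_size], hosts[canary_size:]

    # Canary phase: any unhealthy canary aborts the entire push.
    for host in canaries:
        deploy(host)
        deployed.append(host)
        if not healthy(host):
            return deployed, True

    # Batched phase: contain the blast radius one batch at a time.
    for i in range(0, len(rest), batch_size):
        for host in rest[i:i + batch_size]:
            deploy(host)
            deployed.append(host)
            if not healthy(host):
                failures += 1
        if failures > max_failures:
            return deployed, True  # halt before starting the next batch

    return deployed, False
```

The point isn’t the code itself – it’s that “push, push, push” skips both the canary check and the halt condition, so the first bad update reaches every machine before anyone notices.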
Of course, the tendency to blindly push new software out into the environment can be exacerbated by how lots of IT shops measure “success” – based on activity. Have you ever had an objective that sounds like “Deploy all patches within 24 hours of availability” in your MBOs?
We often automate “in the name of security,” but to me that seems like taking the easy way out. I’d rather see people weigh the risks and benefits of any change, including new software rollouts and patches, do more than cursory testing, and ensure that a recovery plan is in place in case the patch hits the fan.
I’m curious: how do these things look in your organization?