Last week, I had great fun presenting a webinar with Mike Poor from Intelguardians on “Understanding And Mitigating Virtualization Security Risks in VMware ESX.” For me, the best kind of webinar is interactive, with a bunch of thought-provoking questions being fielded throughout the presentation. This webinar was definitely one of those. (BTW, you can access the recorded webcast here.)
We had so many interesting questions and comments that we couldn’t get to all of them. But the big surprise to me was how often people asked whether it is safe to put production and test environments on the same ESX cluster. The unspoken context was whether the risks of VM escape and of network isolation being bypassed make this unwise.
(This isn’t VMware-specific, of course. It applies to any VMM/hypervisor/etc.)
Mike and I were in complete agreement: if you have high-value business assets that have security implications (e.g., confidentiality, integrity, availability) or compliance requirements (e.g., regulatory, contractual, financial reporting), then putting preproduction and production VMs on the same ESX server/cluster may be an example of living very dangerously.
But, it’s not for the reasons that you might first think!
1) Misconfiguration risk: The most significant security risk of putting production and preproduction on the same platform doesn’t come from VM escape or from network isolation not working as designed. Instead, the risk comes from network isolation being misconfigured, or being accidentally disabled by the VM admin. Even if network isolation were perfectly implemented, you’re only a couple of right-clicks away (or maybe even one semicolon away) from accidentally disabling it.
I can personally point to a couple of times in my own career when I’ve accidentally disabled huge chunks of functionality in databases, firewalls, operating systems and applications. Sometimes it was because I “just need to disable it for a second” but then forgot to re-enable it, and sometimes it was just by accident (i.e., “all firewalls look the same”). Of course, that has never, ever happened to you, right? :-)
2) Account administration risks: Because the administrative accounts and roles would span preproduction and production, there now exists the risk of accidentally giving developers the ability to add/modify/delete production VMs. Again, this creates another potential accidental route for developers to access production that may not have existed before. (See my previous posting on “The Sometimes Fun, But Scary, Risks Of VM Administrator Access”. The risks are around segregation of duties in both the accounts and account roles.)
3) Low cost of just getting another server and ESX license for preproduction: If this IT service is really running a critical business process with significant security or compliance implications, and significant business consequences if we screw it up, then isn’t it worth spending $10-20K on another server and the associated ESX license so that developers can have their own environment, safely partitioned from production?
4) Ease of migrating/promoting preproduction to production: Because virtualization makes it so easy to copy and promote VMs into production, there’s really no valid excuse not to separate them if there are genuine security or compliance risks.
(And if you’ve ever had two windows open, one to production and one to preproduction, and accidentally rebooted the wrong one, now you have your operational reason to separate them! What, that’s never happened to you? :-)
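To make risks 1 and 2 a bit more concrete, here is a minimal sketch in Python of the kind of check you could run against an inventory export. Everything in it is an assumption for illustration: the `VM` and `Grant` records, the `"prod"`/`"preprod"` and `"dev"`/`"ops"` labels, and the sample host and port group names are all hypothetical, and it does not call any VMware API. It simply flags hosts or virtual networks that carry both environments, and developers who can modify production VMs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    env: str          # hypothetical label: "prod" or "preprod"
    host: str         # ESX host the VM runs on
    port_group: str   # virtual network the VM is attached to


@dataclass(frozen=True)
class Grant:
    user: str
    team: str         # hypothetical label: "dev" or "ops"
    env: str          # environment the grant applies to
    can_modify: bool  # may add/modify/delete VMs in that environment


def mixed(vms, key):
    """Return the values of `key` (e.g. "host") that carry more than one environment."""
    envs = defaultdict(set)
    for vm in vms:
        envs[getattr(vm, key)].add(vm.env)
    return sorted(value for value, seen in envs.items() if len(seen) > 1)


def developers_with_prod_write(grants):
    """Return developers who are able to modify production VMs."""
    return sorted(
        {g.user for g in grants if g.team == "dev" and g.env == "prod" and g.can_modify}
    )


# Made-up sample data standing in for a real inventory export.
vms = [
    VM("billing-db", "prod", "esx01", "pg-prod"),
    VM("billing-db-test", "preprod", "esx01", "pg-prod"),
    VM("web-test", "preprod", "esx02", "pg-test"),
]
grants = [
    Grant("alice", "ops", "prod", True),
    Grant("bob", "dev", "preprod", True),
    Grant("bob", "dev", "prod", True),
]

print("Hosts mixing environments:", mixed(vms, "host"))
print("Port groups mixing environments:", mixed(vms, "port_group"))
print("Developers with production write access:", developers_with_prod_write(grants))
```

With the sample data, the test VM sitting on the production port group and the developer account with production rights both get flagged. A check like this only catches the mistake after someone has made it, which is exactly why physically separate hardware is the safer default.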
Conclusion: In short, as long as there’s the risk of human error and accidental misconfiguration, having all your data segmentation and partitioning rely on software configuration settings may present more risk than you think. And if that risk can be mitigated just by getting some more hardware and another ESX license, there’s really no good reason to take it.