On Network World’s Microsoft Subnet, there is a very solid article, “9 Myths of Microsoft Virtualization - Busted or Confirmed.” It’s actually an interview with Mike Neil, Microsoft’s general manager of virtualization, and it is a fun read.

One of the myths I was very interested in was this one:

No. 3. VMware says that its memory overcommitment feature actually makes its wares cheaper in production environments in terms of total-cost-of ownership than Microsoft’s products (and Xen Server, too). Microsoft (and several users I’ve talked to) say this is a myth … although I’ve also heard that Microsoft is working on a similar feature. Is the "memory overcommitment" a myth and if so, why?

I got even more interested when I read Mr. Neil’s response:

So first off, how many IT pros configure their production servers to overcommit anything? Customers want an SLA and they want to know what resources are being consumed by a VM. Memory costs continue to come down and the number of DIMM sockets are going up, making this argument moot. We are focused on the efficient use of resources and using those resources dynamically — pooling the memory of the whole machine and dynamically balancing the memory between all of the VMs, instead of overcommiting a resource that can lead to bottlenecks. So, you can see the caveats on using overcommit in a production environment. As to Microsoft’s plans for new memory, we don’t look at it as "overcommit" we look at it as "dynamic memory." We want to provide the same benefit without the risk. Watch for future details.

So – what’s the real deal? Hey, you VMware practitioners (read that as “paying customer practitioners”): how big a deal is memory overcommit to you? And (be straight with me) how much do you actually use it in production?

  • Shawn E

    How about High Availability for a real use case?

    In a small two-host environment (which is exactly what Microsoft is going after), providing HA for all VMs means you can only use half the memory on each host; otherwise you can't protect all of your VMs if one host fails.

    With 3 hosts you need 33% headroom, with 4 hosts 25%, and so on – simple math.

    In the VMware world, with overcommitment you don't have to reserve that much. Ballooning and page sharing will give you the extra memory headroom required if a host goes down (and DRS will take care of proper placement as well). This doesn't mean you can fill each host past 100% and still maintain HA, but the usable fraction is definitely higher than 50%.
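    A minimal sketch of that headroom arithmetic (assuming N identical hosts, no overcommit, and the goal of surviving the loss of one host):

    ```python
    # Hypothetical illustration: with N identical hosts and no memory
    # overcommit, each host must keep 1/N of its RAM free so the surviving
    # hosts can absorb the VMs of a failed host.

    def usable_fraction(num_hosts: int) -> float:
        """Fraction of each host's memory you can use and still tolerate
        the loss of one host."""
        return (num_hosts - 1) / num_hosts

    for n in (2, 3, 4):
        print(f"{n} hosts: use up to {usable_fraction(n):.0%}, keep {1 / n:.0%} headroom")

    # 2 hosts: use up to 50%, keep 50% headroom
    # 3 hosts: use up to 67%, keep 33% headroom
    # 4 hosts: use up to 75%, keep 25% headroom
    ```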

    Just another case of Microsoft downplaying a feature they don't have.

    What else is new?

  • Hi, thanks for bringing this to our attention.

    So, about the “memory overcommitment” thing: why do we always assume it has to be applied in fully mission-critical, CIA/NASA-grade production environments?

    What if you have, for example, VMware Lab Manager running in “production” to support your IT department's R&D work? Isn't that an important, and still a production, system in your infrastructure? Sure it is. So don't you want the flexibility that memory overcommitment gives you there? We are not going to flood our hypervisors with VMs all over the place, but we are certainly looking for cost savings and the best possible use of our resources.

    We do have this scenario in our environment, and I definitely see this as an important feature that VMware must keep and Microsoft must add in Hyper-V 2.0.

    Regards

  • Just proof that MS doesn't get it.

    "So first off, how many IT pros configure their production servers to overcommit anything?"

    Uh, when you virtualize you overcommit everything. You are running more vCPUs in your VMs than you have physical CPUs. You have more virtual NICs than physical NICs. You have more virtual I/O adapters than physical I/O adapters. Memory is just another physical resource that you multiplex just like all of the other resources.

    What's even more comical is Mike Neil's use of the phrase "dynamic memory". He describes it as:

    "We are focused on the efficient use of resources and using those resources dynamically — pooling the memory of the whole machine and dynamically balancing the memory between all of the VMs"

    That's exactly what memory overcommit is. You hand out more memory to VMs than you have physical memory, and then give that limited physical memory to the VMs that need it most. If all of the VMs need all of their memory at the same time, that's when things start swapping and slowing down. In practice that rarely happens in a VMware environment, because Transparent Page Sharing (TPS) matches up identical pages of memory and keeps only a single copy in physical RAM.
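    To put rough numbers on that, here is a hypothetical sketch (the VM sizes and host specs below are made up purely for illustration):

    ```python
    # Hypothetical example: configured (virtual) resources per VM versus
    # the physical box. The overcommit ratio is simply configured / physical.

    vms = {
        "web01":   {"vcpus": 4, "mem_gb": 8},
        "web02":   {"vcpus": 4, "mem_gb": 8},
        "db01":    {"vcpus": 8, "mem_gb": 16},
        "build01": {"vcpus": 4, "mem_gb": 8},
    }
    host = {"cpus": 8, "mem_gb": 32}

    vcpu_total = sum(v["vcpus"] for v in vms.values())
    mem_total = sum(v["mem_gb"] for v in vms.values())

    print(f"vCPU overcommit:   {vcpu_total}:{host['cpus']} ({vcpu_total / host['cpus']:.2f}x)")
    print(f"Memory overcommit: {mem_total}GB:{host['mem_gb']}GB ({mem_total / host['mem_gb']:.2f}x)")

    # vCPU overcommit:   20:8 (2.50x)
    # Memory overcommit: 40GB:32GB (1.25x)
    ```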

    At any rate, this debate has come around many times, and time and again it has been shown that the majority of customers use overcommit techniques in production. Hopefully those same customers will come here and comment yet again.

  • Hi

    May I refer you to my article on memory overcommit in ESX and Hyper-V: http://www.gabesvirtualworld.com/?p=104

    Also, there is a big difference between good memory overcommit and bad memory overcommit, which I also touch on in my article, but maybe not extensively enough:

    Say you have 20 VMs, each configured with 4GB of RAM, but they don't always use it. After running these VMs for a week, you see that on average you need 45GB of RAM, with a maximum usage of 55GB.

    With a host of 96GB RAM there is no overcommit.

    With a host of 48GB RAM there is BAD overcommit, because there will be situations where you need 55GB and then the host has to swap memory out to disk.

    With a host of 64GB RAM there is good memory overcommit, because there is always enough memory to cover the maximum RAM needed, but you're still saving (80 - 64) 16GB of RAM.
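    A small sketch of that arithmetic, using the same numbers as above (20 VMs x 4GB configured = 80GB, observed weekly peak of 55GB):

    ```python
    # Illustration of the "good vs. bad overcommit" rule of thumb above:
    # overcommit is fine as long as the host still covers the observed peak.

    configured_gb = 20 * 4   # total RAM the VMs are configured with (80GB)
    peak_gb = 55             # highest combined usage seen over a week

    def classify(host_gb: int) -> str:
        if host_gb >= configured_gb:
            return "no overcommit"
        if host_gb >= peak_gb:
            return f"good overcommit (saves {configured_gb - host_gb}GB, peak still fits in RAM)"
        return "BAD overcommit (peak demand exceeds RAM, the host will swap to disk)"

    for host_gb in (96, 48, 64):
        print(f"{host_gb}GB host: {classify(host_gb)}")

    # 96GB host: no overcommit
    # 48GB host: BAD overcommit (peak demand exceeds RAM, the host will swap to disk)
    # 64GB host: good overcommit (saves 16GB, peak still fits in RAM)
    ```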

    Often, when reading posts from Microsoft about memory overcommit, they stress that you never want to use overcommit because you would then be swapping memory to disk. I agree this is NOT the kind of overcommit you want to use. You, as an ESX admin, should make sure this never happens by keeping an eye on the metrics that tell you whether your host needs to swap memory.

    As long as you never let your host swap to disk, go ahead and use memory overcommit and save some money.

    Gabrie

  • Excellent additions to the dialog, everyone. My goal was to get the dialog going beyond just "what MSFT says" and I appreciate the extra input.

    Some comments, in no particular order:

    1) I was never implying that I was looking for some "CIA/NASA environment" – I was simply looking for production implementations since I know from our company's use that we have different levels of risk tolerance in pre-production environments than in production, even if the production systems are "unimportant."

    2) We didn't delete your comment Mike D, but we have moderation turned on because of the recent proliferation of comment spam. We recently added the Captcha aspect to reduce that and may turn on unmoderated comments at some point in the future if it looks like it's working.

    3) Shawn, of COURSE it's MSFT downplaying a feature they don't have <g>. But sometimes that is defensible, and sometimes it's not – that's why I was looking for some customer perspective. Both VMware and MSFT have internal and external evangelists who will defend their respective "sides" on hot issues like overcommit, VMotion, etc., and I find that customers, while often having a favored technology, will spend more time determining the impact of using (or avoiding) a feature based on business requirements. That kind of evaluation is, in my experience, also more likely to happen for production use – another reason I was asking specifically about production use of overcommit.

    4) Gabe – excellent article! I hadn't read it before, so thanks for sharing the link here. You're on my personal blogroll now!

    Thanks again everyone for helping to advance the dialog around this topic.