Thursday 4 August 2011

Transparent Page Sharing - efficiency of disabling Large Pages


Finally, I found some free time to see for myself how efficient Transparent Page Sharing really is. To do so, I had to disable Large Page support on all 10 hosts of our vSphere 4.1 cluster.

Let me briefly describe our virtual server farm. We have 10 ESXi hosts with two Xeon 5650 CPUs in each host. Six hosts are equipped with 96 GB of RAM and four hosts with 48 GB, which gives us 768 GB in total. Since all of these CPUs use a hardware-assisted MMU, the ESXi hosts aggressively back the virtual machines' RAM with Large Pages. Therefore, TPS won't kick in unless there is memory contention on a host.

We run 165 virtual machines with 493 GB of assigned vRAM. According to Figure 1, over the last week we had 488 GB of consumed RAM and 33 GB of shared RAM. When I first started looking into TPS I was surprised to see any shared memory at all while Large Pages were enabled. However, I now know that these 33 GB are just zero pages, which are shared as soon as a VM is powered on without ever being backed by physical RAM on the ESXi host.
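
As a side note, the same cluster-wide numbers can be pulled with a script instead of reading them off the charts. The sketch below uses pyVmomi QuickStats and is only an illustration of the idea (the vCenter address and credentials are placeholders); the figures in this post actually come from Veeam Monitor.

```python
# Rough sketch: sum assigned, consumed and shared memory over all powered-on VMs.
# All QuickStats values are reported in MB; credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only: skips certificate checks
si = SmartConnect(host="vcenter.example.local",
                  user="administrator", pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    vms = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True).view

    assigned_mb = consumed_mb = shared_mb = 0
    for vm in vms:
        if vm.runtime.powerState != vim.VirtualMachinePowerState.poweredOn:
            continue
        qs = vm.summary.quickStats
        assigned_mb += vm.summary.config.memorySizeMB or 0  # configured vRAM
        consumed_mb += qs.hostMemoryUsage or 0   # machine memory backing the VM
        shared_mb += qs.sharedMemory or 0        # pages collapsed by TPS (incl. zero pages)

    print(f"Assigned vRAM: {assigned_mb / 1024:.0f} GB")
    print(f"Consumed RAM : {consumed_mb / 1024:.0f} GB")
    print(f"Shared RAM   : {shared_mb / 1024:.0f} GB")
finally:
    Disconnect(si)
```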



In Figure 2 we can see that CPU usage has been very low across the whole cluster for the last week. According to VMware, enabling Large Pages can be quite effective with regard to CPU usage; VMware's documents show about 25% lower CPU usage with LP enabled. Considering that our CPU usage has been only about 3%, I saw no problem with letting it grow a little to get the benefits of TPS.



It took me a while to disable LP support on all hosts and then cycle each host through maintenance mode, one by one, so the VMs would start taking advantage of TPS. In Figure 3 you can see the memory stats for the period when I was still moving VMs around. The shared memory value is not final yet, but you can already see how quickly the ESXi hosts scan and share the VMs' memory.
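
For reference, the same change can also be scripted rather than done through the vSphere Client. Below is a rough pyVmomi sketch of how that could look; the vCenter address and credentials are placeholders, and the setting involved is the Mem.AllocGuestLargePage advanced option (1 = large pages on, 0 = off). Existing VMs only drop their large-page backing once they are moved off the host and back (or power-cycled), which is why each host still has to be cycled through maintenance mode afterwards.

```python
# Rough sketch only: disable guest large-page backing on every host via pyVmomi.
# The vCenter name and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only: skips certificate checks
si = SmartConnect(host="vcenter.example.local",
                  user="administrator", pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    hosts = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True).view

    for host in hosts:
        adv = host.configManager.advancedOption
        current = adv.QueryOptions("Mem.AllocGuestLargePage")[0]
        print(f"{host.name}: Mem.AllocGuestLargePage = {current.value}")
        if current.value != 0:
            # Older pyVmomi builds may need the value wrapped as a long;
            # a plain 0 works on recent versions.
            adv.UpdateOptions(changedValue=[vim.option.OptionValue(
                key="Mem.AllocGuestLargePage", value=0)])
            print(f"{host.name}: large page support disabled")
finally:
    Disconnect(si)
```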


Since we disabled Large Page support, we should see a higher rate of TLB misses, which in turn should mean more CPU cycles spent walking two sets of page tables (a rough worst-case estimate follows below):

  1. The guest's page tables, to map a virtual address to a guest physical address
  2. The nested page tables, to map a guest physical address to a host physical address
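
To get a feel for why that walk becomes more expensive, here is a back-of-envelope calculation (an illustration only, not a measurement from our cluster). Every access to a guest page-table entry is itself a guest physical address that has to be translated through the nested page table, so with four-level tables in both dimensions a single TLB miss can cost up to 24 page-table references.

```python
# Illustrative arithmetic only: worst-case page-table references per TLB miss.
def walk_references(guest_levels: int, nested_levels: int) -> int:
    """Each of the guest_levels lookups needs a full nested walk plus the read
    of the guest entry itself, and the final guest physical address needs one
    more nested walk: guest_levels * (nested_levels + 1) + nested_levels."""
    return guest_levels * (nested_levels + 1) + nested_levels

print(walk_references(4, 0))   # 4  - native 4-level paging, no virtualization
print(walk_references(4, 4))   # 24 - nested paging with 4 KB pages everywhere
print(walk_references(3, 3))   # 15 - 2 MB large pages skip one level per side
```

That shorter walk (plus better TLB coverage) is exactly what we give up in exchange for TPS, which is why VMware quotes a CPU cost for running without Large Pages.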

However, if you look at Figure 4 you won't see any increase in CPU usage. The spikes you can see there are mostly caused by the massive wave of vMotions from host to host.



Unfortunately, I couldn't do all 10 hosts at once during working hours, as we run some virtualized MS Failover Clusters in our vSphere environment and I needed to fail over cluster resources before moving a cluster node off its ESXi host, so the last host was only finished about 5 hours later. You can see the last host completing around 7:20 pm in Figure 5. Five hours after the change was implemented we already had 214 GB of shared memory, and consumed memory had decreased from 488 GB to 307 GB, saving us roughly 180 GB already. As you can see in Figure 6, there is still no CPU usage penalty from the disabled Large Pages.



In the couple of screenshots below you can see the stats 24 hours after the change was made.



I have seen some posts on the VMware Communities where people were asking about the CPU impact of running with Large Pages disabled. The conclusion of this post should be a good answer to all those questions: switching off Large Pages saved us 210 GB of RAM while CPU usage still stays very low.
With the vSphere 5 release the benefits of TPS become less significant, but I guess a lot of companies will be staying with vSphere 4.1 for a while. Also, the new vRAM entitlements have made TPS a bit more useful.

Nevertheless, it is difficult to generalize the performance impact of disabling Large Pages. I wouldn't recommend doing it without proper testing, as TPS efficiency can vary significantly depending on the specifics of your vSphere environment.

Update: I have been using Veeam Monitor to collect all these stats in the Cluster view, but today I also checked the vCenter performance stats and discovered that the Shared and Consumed values differ. However, if I check the Resources stats (which include all VMs) in Veeam Monitor, I get the same figures as in vCenter. According to vCenter the amount of shared memory is even higher: 280 GB against 250 GB in Veeam Monitor. That basically means more than 50% of physical memory savings!


If you find this post useful please share it with any of the buttons below. 

Tuesday 2 August 2011

Multiple complaints forced VMware to change vRAM Entitlements

As far as I am aware, right after VMware started receiving tons of negative feedback regarding the new vSphere 5 licensing model, they began reviewing their clients' current vRAM usage. I guess they questioned quite a lot of clients to compare the amount of physical RAM installed with the vRAM actually assigned, and drew the appropriate conclusions.

So here are the latest rumors, and I personally believe they will come true.

  • vRAM entitlement per Enterprise and Enterprise Plus license is going to be doubled, to 64GB and 96GB respectively.
  • vRAM entitlement per Essentials and Essentials Plus license will be increased from 24GB to 32GB.
  • Maximum vRAM for Essentials/Essentials Plus will be increased to 192GB.
  • A maximum of 96GB per VM will be counted against your vRAM pool limit, even if you assign, say, 1TB to that VM.

Honestly, I think it is a very adequate response to all the criticism VMware has received over the last few weeks, but I am wondering why the VMware marketing guys couldn't foresee such a reaction. Perhaps it shows that VMware was not fully aware of how their products were being used in client companies. I guess VMware will now arrange this kind of resource usage review on a regular basis.

Taken from here: http://derek858.blogspot.com/2011/07/impending-vmware-vsphere-50-license.html

Update: The official announcement of the vRAM entitlement changes is expected on the 3rd of August.

Update 1: Official announcement is here - http://blogs.vmware.com/rethinkit/2011/08/changes-to-the-vram-licensing-model-introduced-on-july-12-2011.html

Here is a comparison table against the changes that were announced on the 12th of July.


Another nice thing to mention: vRAM usage will be calculated as the average amount used over the last 12 months. So rare spikes in vRAM usage will slightly increase the average, but you will not have to pay for such spikes perpetually.
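
A toy example of what that averaging means in practice (illustrative numbers only; the exact calculation is VMware's, not mine):

```python
# Illustration only: one spiky month barely moves a 12-month average.
monthly_vram_gb = [400] * 11 + [700]     # steady 400 GB, one 700 GB spike

peak = max(monthly_vram_gb)                            # previously a spike like this set your requirement
average = sum(monthly_vram_gb) / len(monthly_vram_gb)  # new model: 12-month average

print(f"Peak vRAM usage  : {peak} GB")        # 700 GB
print(f"12-month average : {average:.0f} GB") # 425 GB
```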


If you find this post useful please share it with any of the buttons below.