Real-world virtualization benchmarking: the best server CPUs compared
by Johan De Gelas on May 21, 2009 3:00 AM EST - Posted in IT Computing
Inquisitive Minds Want to Know
Tynopik, one of our readers, commented: "Is Nehalem better at virtualization simply because it's a faster CPU? Or are the VM-specific enhancements making a difference?" For some IT professionals that might not matter, but many of our readers are very keen (rightfully so!) to understand the "why" and "how". Which characteristics make a certain CPU a winner in vApus Mark I? And what will we learn as we make further progress with our stress testing, profiling, and benchmarking research for virtualization in general?
Understanding how the individual applications behave would be very interesting, but this is close to impossible with our current stress test scenario. We give each of the four VMs four virtual CPUs, and there are only eight physical CPUs available. The result is that the VMs steal time from each other and thus influence each other's results. It is therefore easier to zoom in on the total scores rather than the individual scores. We measured the following numbers with ESXtop:
| Dual Opteron 8389 2.9GHz CPU Usage | Percentage of CPU time |
| --- | --- |
| Web portal VM1 | 19.8 |
| Web portal VM2 | 19.425 |
| OLAP VM | 27.2125 |
| OLTP VM | 27.0625 |
| Total "Work" | 93.5 |
| "Pure" Hypervisor | 1.9375 |
| Idle | 4.5625 |
The "pure" hypervisor percentage is calculated as what is left after subtracting the work that is done in the VMs and the "idle worlds". The work done in the VMs includes the VMM, which is part of the hypervisor. It is impossible, as far as we know, to determine the exact amount of time spent in the guest OS and in the hypervisor. That is the reason why we speak of "pure" hypervisor work: it does not include all the hypervisor work, but it is the part that happens in the address space of the hypervisor kernel.
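The arithmetic behind this is straightforward. A minimal sketch, using the Opteron figures from the table above (the dictionary and variable names are ours, not ESXtop's):

```python
# ESXtop per-VM CPU usage percentages for the dual Opteron 8389,
# copied from the table above.
vm_usage = {
    "Web portal VM1": 19.8,
    "Web portal VM2": 19.425,
    "OLAP VM": 27.2125,
    "OLTP VM": 27.0625,
}
idle = 4.5625  # percentage spent in the "idle worlds"

# Total "work" is simply the sum of the VM worlds (VMM included).
total_work = sum(vm_usage.values())

# "Pure" hypervisor time is whatever is left after subtracting
# the VM worlds and the idle worlds from 100%.
pure_hypervisor = 100 - total_work - idle

print(f"Total 'work': {total_work}%")          # 93.5%
print(f"'Pure' hypervisor: {pure_hypervisor}%")  # 1.9375%
```

Note that this is a residual: it captures only the time spent in the address space of the hypervisor kernel, not the VMM time hidden inside the VM worlds.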
Notice that the ESX scheduler is pretty smart: it gives the more intensive OLAP and OLTP VMs more physical CPU time. You could say that those VMs "steal" a bit of time from the web portal VMs. The Nehalem-based Xeon shows very similar numbers when it comes to CPU usage:
| Dual Xeon X5570 CPU Usage (no Hyper-Threading) | Percentage of CPU time |
| --- | --- |
| Web portal VM1 | 18.5 |
| Web portal VM2 | 17.88 |
| OLAP VM | 27.88 |
| OLTP VM | 27.89 |
| Total "Work" | 92.14 |
| "Pure" Hypervisor | 1.2 |
| Idle | 6.66 |
With Hyper-Threading, we see something interesting. VMware ESXtop does not count the "Hyper-Threading CPUs" as real CPUs but does see that the CPUs are utilized better:
| Dual Xeon X5570 CPU Usage (Hyper-Threading Enabled) | Percentage of CPU time |
| --- | --- |
| Web portal VM1 | 20.13 |
| Web portal VM2 | 20.32 |
| OLAP VM | 28.91 |
| OLTP VM | 28.28 |
| Total "Work" | 97.64 |
| "Pure" Hypervisor | 1.04 |
| Idle | 1.32 |
Idle time is reduced from 6.7% to 1.3%.
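A quick back-of-the-envelope comparison of the two Xeon X5570 tables (the numbers are copied from above; the variable names are ours) makes the Hyper-Threading effect explicit:

```python
# ESXtop totals for the dual Xeon X5570, from the tables above.
no_ht = {"work": 92.14, "hypervisor": 1.2, "idle": 6.66}
ht    = {"work": 97.64, "hypervisor": 1.04, "idle": 1.32}

# Hyper-Threading converts idle time almost entirely into useful work.
idle_reduction = no_ht["idle"] - ht["idle"]  # 5.34 percentage points
extra_work = ht["work"] - no_ht["work"]      # 5.5 percentage points

print(f"Idle time drops by {idle_reduction:.2f} points")
print(f"Useful 'work' rises by {extra_work:.2f} points")
```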
The Xeon 54XX: no longer a virtualization wretch
It's also interesting that VMmark tells us the Shanghais and Nehalems are running circles around the relatively young Xeon 54xx platform, while our vApus Mark I suggests that the Xeon 54xx, though not the first choice for virtualization, is nevertheless a viable platform for consolidation. The ESXtop numbers you just saw give us some valuable clues, and the Xeon 54xx "virtualization revival" is a result of the way we test now. Allow us to explain.
In our case, we have eight physical cores with four VMs and four vCPUs each, so on average the hypervisor has to allocate two physical CPUs to each virtual machine. ESXtop shows us that the scheduler plays it smart: in many cases, a VM gets one dual-core die on the Xeon 54xx, and cache coherency messages are exchanged via a very fast shared L2 cache. ESXtop indicates quite a few "core migrations" but never "socket migrations". In other words, the ESX scheduler keeps the virtual machines on the same cores as much as possible, keeping the L2 cache "warm". In this scenario, the Xeon 5450 can leverage a formidable weapon: the very fast and large 6MB L2 cache that each pair of cores shares. In contrast, on Nehalem two cores working on the same VM have to content themselves with a tiny 512KB of L2 and a slower, smaller slice of L3 cache (4MB per two cores). The way we test right now is probably the best case for the Xeon 54xx "Harpertown". We'll update with two and three tile results later.
Quad Opteron: room for more
Our current benchmark scenario is not taxing enough for a quad Opteron server:
| Quad Opteron 8389 CPU Usage | Percentage of CPU time |
| --- | --- |
| Web portal VM1 | 14.70625 |
| Web portal VM2 | 14.93125 |
| OLAP VM | 23.75 |
| OLTP VM | 23.625 |
| Total "Work" | 77.0125 |
| "Pure" Hypervisor | 2.85 |
| Idle | 21.5625 |
Still, we were curious how a quad machine would handle our virtualization workload, even at only 77% CPU load. Be warned that the numbers below are not fully accurate, but they give some initial ideas.
Despite the fact that we are only using 77% of the four CPUs compared to the 94-97% on the Intel systems, the quad socket machine remains out of reach of the dual CPU systems. The quad Shanghai server outperforms the best dual socket Intel setup by 31% and improves on its dual socket sibling by 58%. We expect that once we run with two or three "tiles" (8 or 12 VMs), the quad socket machine will probably outperform the dual Shanghai by, roughly estimated, 90%. Again, this is a completely different picture than what we see in VMmark.
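That 90% figure is deliberately conservative. A naive extrapolation, assuming throughput scales linearly with CPU load until the quad machine reaches roughly the same ~95% utilization as the dual socket systems (an assumption on our part, since real scaling is rarely perfectly linear), lands in the same ballpark:

```python
# Measured: the quad Shanghai is 58% faster than the dual Shanghai
# while using only 77% of its CPUs (vs. ~95% on the dual systems).
quad_speedup_measured = 1.58
quad_load = 0.77
dual_load = 0.95  # roughly what the dual socket systems reach

# Naive assumption: throughput scales linearly with CPU load.
projected = quad_speedup_measured * (dual_load / quad_load)
print(f"Projected quad vs. dual Shanghai: +{(projected - 1) * 100:.0f}%")
```

The linear model projects roughly +95%, slightly above our rough 90% estimate; scheduling overhead with 8 or 12 VMs will likely eat into that.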
66 Comments
JohanAnandtech - Friday, May 22, 2009
Most of the time, the number of sessions on TS is limited by the amount of memory. Can you give some insight into what you are running inside a session? If it is light on CPU or I/O resources, sizing will be based on the amount of memory per session only.

dragunover - Thursday, May 21, 2009
Would be interesting if this was done on desktop CPUs with price/performance ratios.

jmke - Thursday, May 21, 2009
Nope, that would not be interesting at all. You don't want desktop motherboards, RAM, or CPUs in your server room; nor do you run ESX at home. So there's no point in testing the performance of desktop CPUs.
simtex - Thursday, May 21, 2009
Why so harsh? Virtualization will eventually become a part of desktop users' everyday life. Imagine tabbing between different virtual machines like you tab between pages in your browser. You might have a secure VM for your web applications, a fast VM for your games, another for streaming music and maybe capturing television. All on one computer, which you seldom have to reboot because everything runs virtualized.
Azsen - Monday, May 25, 2009
Why would you run all those applications on your desktop in VMs? Surely they would just be separate application processes running under the one OS.

flipmode - Thursday, May 21, 2009
Speaking from the perspective of how the article can be most valuable, it is definitely better off sticking to true server hardware for the time being. For desktop users, it is a curiosity that "may eventually" impart some useful data. The tests are immediately valuable for servers and current server hardware; they are merely of academic interest for desktop users, on hardware that will be outdated by the time virtualization truly becomes a mainstream desktop scenario.
And I do not think he was being harsh, I think he was just being as brief as possible.