Affordable storage for the SME, part one
by Johan De Gelas on November 7, 2007 4:00 AM EST - Posted in IT Computing
Latency and Further Analyses
So let us delve a bit deeper into our SQLIO benchmarking. What kind of latency may we expect from these systems? In many cases latency (seek time + 1/2 rotation) will be the bottleneck. To understand the behavior of our different storage systems better, we tested with both a 2GB and a 20GB file.
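As a quick sanity check of that formula, the theoretical random access time of a disk can be estimated from its average seek time and rotational speed. The short Python sketch below uses assumed, typical numbers for a 15,000 RPM SAS drive; they are illustrative figures, not measured values from our test disks.

# Rough estimate of random access latency: average seek time plus half a rotation.
# The seek time and RPM below are assumed, typical 15K RPM SAS values, not measurements.
def random_access_latency_ms(avg_seek_ms, rpm):
    rotation_ms = 60000.0 / rpm           # time for one full rotation in ms
    return avg_seek_ms + rotation_ms / 2  # seek + half a rotation on average

print(random_access_latency_ms(3.8, 15000))  # roughly 5.8 ms per random access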
Bandwidth doesn't really get any lower when you access the hard disk sequentially in our configuration. This is a result of "zone recording": the tracks in the outer zone of the hard disk contain a lot more sectors (and thus data) than the inner tracks. As we only tested with a 20GB file on a total of 500GB (7x 73GB) of disk space, all disk activity took place in the fast outer zone of the disks.
Latency is very low as we hardly need to move the heads of the hard disk; most of the time the heads stay on the outer tracks. When the heads do have to move, they only make a very small jump from one track to an adjacent one. Random access is a lot more interesting...
With a 20GB file, the chance that the actuator has to move the heads to reach the next random block is a lot higher than with a 2GB file. In addition, the head movements become longer and are no longer just short strokes.
It now becomes clear why the iSCSI SLES target performs so well at random reads. Look at the latency at 2GB: it's "impossibly low", as it is lower than that of the DAS configuration. This indicates that there is more cache activity going on than in the DAS configuration. Since both use the same RAID controller, this extra cache activity is not happening at the level of the RAID controller but at the OS level. Indeed, when we looked at the buffers of the SLES installation, we saw them grow quickly from a few KB to 1708MB. This means that the majority of the 2GB of RAM in the Intel SSR212MC2 is caching the 2GB test file. Once we moved to 20GB, this Linux file system caching could not help anymore and probably adds latency instead of lowering it. The Microsoft iSCSI target software does not seem to use this kind of caching.
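To put a number on the effect, the average latency the initiator sees can be modeled as a weighted mix of page cache hits and real disk accesses. The Python sketch below uses assumed hit ratios and service times purely for illustration; they are not figures taken from our benchmark runs.

# Simplified model of how the Linux page cache lowers the average latency
# seen by the initiator. Hit ratio and service times are assumed values.
def avg_latency_ms(hit_ratio, cache_ms, disk_ms):
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms

# 2GB test file: nearly all of it fits in the ~2GB of RAM of the target.
print(avg_latency_ms(0.95, 0.2, 6.0))  # about 0.5 ms: "impossibly low"
# 20GB test file: the cache barely helps and latency converges to disk latency.
print(avg_latency_ms(0.05, 0.2, 6.0))  # about 5.7 ms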
This has an interesting consequence: the iSCSI SLES target is very attractive if you want the best random performance on a relatively small database. In that case you can try to put as much memory as possible in your iSCSI storage rack. The other side of the coin is that once the cache is too small, performance decreases quickly.
RAID 6?
As the Promise system was the only one with RAID 6, we did not put all our results in graphs. Our testing shows that RAID 6 is in almost every circumstance about 5 to 10% slower than RAID 5. For many people that will be a small price to pay, as a failed disk no longer means that the array is unprotected until a replacement disk is installed. When a RAID 5 array has to be rebuilt, the disks are accessed very intensively, and as such they are more prone to fail.
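For reference, the trade-off can be quantified directly from the number of disks: RAID 6 sacrifices one extra disk of capacity and pays the textbook small-write penalty of six I/Os per write instead of four. The sketch below uses the 73GB drives from our setup; the write-penalty figures are the standard theoretical values, not measurements from the Promise system.

# Usable capacity and textbook small-write penalty for RAID 5 versus RAID 6.
# Drive count and size match our test setup; penalties are theoretical values.
def usable_capacity_gb(disks, disk_gb, parity_disks):
    return (disks - parity_disks) * disk_gb

disks, disk_gb = 7, 73
print("RAID 5:", usable_capacity_gb(disks, disk_gb, 1), "GB usable, 4 I/Os per small write")
print("RAID 6:", usable_capacity_gb(disks, disk_gb, 2), "GB usable, 6 I/Os per small write")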
Management Interface
While it is not the focus of this article, we should mention that both the Intel storage server and the Promise VTrak E310f run a web server that offers management access to the storage server configuration via your LAN or the Internet. Promise provides a very extensive GUI that guides users through all the possible options, a CLI, and a menu-driven CLU. The CLI or CLU can be accessed via a relatively fast 115,200 bps serial connection. (We don't have fond memories of accessing the Cisco IOS via a 9600 bps interface.)
You use the CLI or CLU to set up the password and the network IP, after which you can configure the disk array in a very nice GUI.
Besides diagnostics, disk array management, and user management, it is also possible to set up several other services, such as email alerts (via a mail server) that warn you if one of the drives fails and whether the hot spare has been used.
Intel's software is a bit more sober; you won't find flashing red lights going off on a picture of the rack if something is wrong. However, the Intel RAID Web Console does a great job of quickly showing all the technical data you need, such as stripe size, caching policies, etc.
21 Comments
Anton Kolomyeytsev - Friday, November 16, 2007 - link
Guys, I really appreciate you throwing away StarWind! Without even letting people know what configuration you used: did you enable caching, did you use flat image files, did you map the whole disk rather than a partition, which initiator did you use (StarPort or MS iSCSI), did you apply the recommended TCP stack settings, etc.? Probably it's our problem, as we've managed to release something people cannot properly configure, but why didn't you contact us to tell us you had issues so we could help you sort them out? With WinTarget R.I.P. (and MS selling its successor through OEMs only), StarWind thrown out, and SANmelody and IPStor not even mentioned (and they are key players!), I think your review is pretty useless... Most people are looking for software solutions when you're talking about an "affordable SAN". Do you plan to have a second round?
Thanks once again and keep doing a great job! :)
Anton Kolomyeytsev
CEO, Rocket Division Software
Johnniewalker - Sunday, November 11, 2007 - link
If you get a chance, it would be great to see what kind of performance you get out of an iSCSI HBA, like the one from QLogic. When it gets down to it, the DAS numbers are great for a baseline, but what if you have 4+ servers running those IO tests? That's what shared storage is for anyhow. Then compare the aggregate IO vs. the DAS numbers?
For example, can 4 servers each hit 25MB/s in the SQLIO random read 8KB test, for a total of 100MB/s? How much is CPU utilization reduced with one or more iSCSI HBAs in each server vs. the software drivers? Where/how does the number of spindles move these numbers? At what point does the number of disks overwhelm one iSCSI HBA, two iSCSI HBAs, one FC HBA, two FC HBAs, and one or two SCSI controllers?
IMHO iSCSI is the future. Most switches are cheap enough that you can easily build a separate dedicated iSCSI network. You'd be doing that if you went with Fibre Channel anyhow, but at a much higher expense (and an additional learning curve) if you don't already have it, right?
Then all we need is someone with a really nice GUI to manage the system - a nice purdy web interface that runs on a virtual machine somewhere and shows at a glance the health, performance, and utilization of your system(s).
System(s) have Zero faults.
Volume(s) are at 30.0 Terabytes out of 40.00 (75%)
CPU utilization is averaging 32% over the last 15 minutes.
Memory utilization is averaging 85% over the last 15 minutes.
IOs peaked at 10,000 (50%) and averaged 5,000 (25%) over the last 15 minutes.
Pinch me!
-johhniewalker
afan - Friday, November 9, 2007 - link
You can get one of the recently released 10Gbps PCI-E TCP/IP cards for <$800, and they support iSCSI. Here's one example:
http://www.intel.com/network/connectivity/products...
The chip might be used by Myricom and others (I'm not sure), and there's a Linux and a BSD driver - a nice selling point.
10Gb Ethernet is what should really change things.
They look amazing on paper -- I'd love to see them tested:
http://www.intel.com/network/connectivity/products...
JohanAnandtech - Saturday, November 10, 2007 - link
The problem is that currently you only have two choices: expensive CX4 copper, which is short range (<15m) and not very flexible (the cables are a bit like InfiniBand cables), or optical fiber cabling. Both the HBAs and the cables are rather expensive and require rather expensive switches (still less than FC, but still). So the price gap with FC is a lot smaller. Of course you get a bit more bandwidth (though I fear you won't get much more than 5Gbit; that has to be tested, of course), and you do not need to learn FC. Personally, I would rather wait for 10Gbit over UTP Cat 6... But I am open to suggestions as to why the current 10Gbit would be very interesting too.
afan - Saturday, November 10, 2007 - link
Thanks for your answer, J. First, as far as I know, CX4 cables aren't as cheap as Cat-x, but they aren't so expensive as to be a showstopper. If you need more length, you can go for fibre cables -- which go _really_ far:
http://www.google.com/products?q=cx4+cable...
I think the CX4 card (~$800) is pretty damn cheap for what you get (and remember it doesn't have PCI-X limitations).
Check out the Intel marketing buzz on iSCSI and the stuff they're doing to speed up TCP/IP, too. It's good reading, and I'd love to see the hype tested in the real world.
I agree with you that UTP Cat 6 would be much better: more standardized, much cheaper, better range, etc. I know that, but if this is what we've got now, so be it, and I think it's pretty killer - though I haven't tested it : ).
Dell, Cisco, HP, and others have CX4 adapters for their managed switches - they aren't very expensive and go right to the backplane of the switch.
Here are some Dell switches that support CX4, at least:
http://www.dell.com/content/products/compare.aspx/...
These are the current 10GbE Intel flavors:
copper: Intel® PRO/10GbE CX4 Server Adapter
fibre:
Intel® PRO/10GbE SR Server Adapter
Intel® PRO/10GbE LR Server Adapter
Intel® 10 Gigabit XF SR Server Adapters
One PITA is the limited number of x8 PCI-E slots in most server mobos.
keep up your great reporting.
best, nw
somedude1234 - Wednesday, November 7, 2007 - link
First off, great article. I'm looking forward to the rest of this series. From everything I've read coming out of MS, the StorPort driver should provide better performance. Any reason why you chose to go with SCSIPort? Emulex offers drivers for both on their website.
JohanAnandtech - Thursday, November 8, 2007 - link
Thanks. It is something that Tijl and I will look into and report back on in the next article.
Czar - Wednesday, November 7, 2007 - link
Love that AnandTech is going in this direction :D Really looking forward to your iSCSI article. I've only used fibre-connected SANs; we have an IBM DS6800 at work :) Never used iSCSI, but I'm very interested in it. What I have heard so far is that it's mostly just very good for development purposes, not for production environments. And that you should turn off, I think, CHAP or whatever it's called on the switches, so the iSCSI SAN doesn't flood the network with "are you there" traffic when it transfers to the iSCSI target.
JohanAnandtech - Thursday, November 8, 2007 - link
Just wait a few weeks :-). AnandTech IT will become much more than just one of the many tabs :-)
We will look into it, but I think it should be enough to place your iSCSI storage on a non-blocking switch on a separate VLAN. Or am I missing something?
Czar - Monday, November 12, 2007 - link
Think I found it:
http://searchstorage.techtarget.com/generic/0,2955...
"Common Ethernet switch ports tend to introduce latency into iSCSI traffic, and this reduces performance. Experts suggest deploying high-performance Ethernet switches that sport fast, low-latency ports. In addition, you may choose to tweak iSCSI performance further by overriding "auto-negotiation" and manually adjusting speed settings on the NIC and switch. This lets you enable traffic flow control on the NIC and switch, setting Ethernet jumbo frames on the NIC and switch to 9000 bytes or higher -- transferring far more data in each packet while requiring less overhead. Jumbo frames are reported to improve throughput as much as 50%. "
This is what I was talking about.
Really looking forward to the next article :)