Intel Xeon 3.6 2MB vs AMD Opteron 252 Database Test
by Jason Clark & Ross Whitehead on February 14, 2005 8:00 AM EST
Posted in: IT Computing
"Order Entry" Stress Test: Measuring Enterprise Class Performance
One complaint that we've historically received regarding our Forums database test is that it isn't strenuous enough for some Enterprise customers to make a good decision based on the results.
In our infinite desire to please everyone, we worked very closely with a company that could provide us with a truly Enterprise Class SQL stress application. We cannot reveal the identity of the Corporation that provided us with the application because of non-disclosure agreements in place. As a result, we will not go into specifics of the application, but rather provide an overview of its database interaction so that you can grasp the profile of this application, and understand the results of the tests better (and how they relate to your database environment).
We will use an Order Entry system as an analogy for how this test interacts with the database. All interaction with the database is via stored procedures. The main stored procedures used during the test are:
sp_AddOrder - inserts an Order
sp_AddLineItem - inserts a Line Item for an Order
sp_UpdateOrderShippingStatus - updates a status to "Shipped"
sp_AssignOrderToLoadingDock - inserts a record to indicate from which Loading Dock the Order should be shipped
sp_AddLoadingDock - inserts a new record to define an available Loading Dock
sp_GetOrderAndLineItems - selects all information related to an Order and its Line Items
The above is only intended as an overview of the stored procedure functionality; obviously, the stored procedures also perform other validation and audit operations.
Each Order had a random number of Line Items, ranging from one to three. The Line Items chosen for an Order were also randomized, drawn from a pool of approximately 1500 line items.
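The order randomization described above can be sketched roughly as follows. This is an illustrative Python sketch, not the test application itself (which was written in C# and is under NDA). The names make_order and POOL_SIZE are hypothetical, the pool size is taken as exactly 1500 for simplicity, and whether the real test allowed the same Line Item twice in one Order is not stated, so distinct items are an assumption here.

```python
import random

POOL_SIZE = 1500  # article says "approximately 1500"; the exact value is an assumption


def make_order(rng, pool_size=POOL_SIZE):
    """Return the line-item ids for one order: one to three items from the pool."""
    count = rng.randint(1, 3)  # inclusive on both ends
    # sample() picks distinct ids; the article does not say whether repeats were possible
    return rng.sample(range(1, pool_size + 1), count)


rng = random.Random(2005)  # fixed seed so the sketch is repeatable
orders = [make_order(rng) for _ in range(5)]
for items in orders:
    print(items)
```

A seeded generator keeps the sketch repeatable; the article does not say whether the real test fixed its random seed between runs.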
Each test was run for 10 minutes and was repeated three times, and the average of the three runs was used. The ratio of reads to writes was held at 10 reads for every write. We debated for a long while about which ratio of reads to writes would best serve the benchmark, and we decided that there was no correct answer. So, we went with 10.
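As a rough illustration of the two rules above (a fixed 10:1 read/write mix, and a reported score that is the mean of three runs), here is a small Python sketch. It is not the test application's code. The names operation_mix and average_of_runs are hypothetical, and the three numbers passed to average_of_runs are placeholders chosen for the arithmetic, not measured results.

```python
from itertools import cycle, islice
from statistics import mean

READS_PER_WRITE = 10  # ratio stated in the article


def operation_mix(reads_per_write=READS_PER_WRITE):
    """Yield an endless stream of operations: N reads followed by one write."""
    return cycle(["read"] * reads_per_write + ["write"])


def average_of_runs(run_scores):
    """Average the scores of repeated runs (the article used three)."""
    return mean(run_scores)


ops = list(islice(operation_mix(), 22))  # two full read/write cycles
reads = ops.count("read")
writes = ops.count("write")
print(reads, writes)

# Placeholder scores only, NOT results from the article.
placeholder_average = average_of_runs([100.0, 110.0, 120.0])
print(placeholder_average)
```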
The application was developed using C#, and all database connectivity was accomplished using ADO.NET and 20 threads - 10 for reading and 10 for inserting.
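The thread layout can be pictured with the following Python sketch. The actual client was C# with ADO.NET, so this is only an analogy under stated assumptions: run_stress and fake_call are hypothetical names, fake_call merely sleeps in place of a stored procedure call, and the run lasts a fraction of a second rather than 10 minutes. The sketch only shows the 10 reader / 10 writer split and does not try to enforce any read/write ratio.

```python
import threading
import time

READER_THREADS = 10  # from the article
WRITER_THREADS = 10  # from the article


def run_stress(duration_s, read_op, write_op):
    """Run reader and writer threads for duration_s seconds and count operations."""
    counts = {"read": 0, "write": 0}
    lock = threading.Lock()
    stop = threading.Event()

    def worker(kind, op):
        done = 0
        while not stop.is_set():
            op()
            done += 1
        with lock:  # merge this thread's tally into the shared totals
            counts[kind] += done

    threads = [threading.Thread(target=worker, args=("read", read_op))
               for _ in range(READER_THREADS)]
    threads += [threading.Thread(target=worker, args=("write", write_op))
                for _ in range(WRITER_THREADS)]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return counts, len(threads)


def fake_call():
    """Stand-in for a database round trip; sleeps instead of calling a stored procedure."""
    time.sleep(0.001)


counts, n_threads = run_stress(0.2, fake_call, fake_call)
print(n_threads, counts)
```

Each worker keeps a private counter and merges it once at the end, so the lock is not contended during the timed portion of the run.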
To ensure that IO was not the bottleneck, each test was started with an empty database that had been pre-expanded so that auto-grow activity did not occur during the test. Additionally, a gigabit switch was used between the client and the server. During the execution of the tests, no other applications or monitoring software were running on the server. Task Manager, Profiler, and Performance Monitor were used when establishing the baseline for the test, but never during execution of the tests.
At the beginning of testing on each platform, both the server and client workstation were rebooted to ensure a clean and consistent environment. The database was always copied to the 8-disk RAID 0 array with no other files present to ensure that file placement and fragmentation were consistent between runs. In between each of the three tests, the database was deleted, and the empty one was copied again to the clean array. SQL Server was not restarted.
97 Comments
Jason Clark - Monday, February 14, 2005
An article we are contemplating is desktop parts in a SQL test, and web. Lots of folks in smaller organizations, and even medium ones to some extent, build their own boxes. Interesting?
Regs - Monday, February 14, 2005
Thanks for clarifying, #24. For some odd reason I was thinking about the differences between the branch predictor of an A64 and Intel, and I got in over my head. But you are right about the cache, spatial and temporal locality.
rivieracadman - Monday, February 14, 2005
I would suspect that the raw speed of the Xeon coupled with the larger cache to reduce latency would make the Xeon perform well in any benchmark that was both threaded and dealt with small data sets, such as reads, queries, and searches. On the other hand, the Opteron, due to its lower memory access overhead and sheer bandwidth, would do better in areas with large data sets such as data transfers, data recovery, and large complex calculations. If this is correct, which you have pretty much confirmed, then I would suspect that the Opteron would do better in the web server tests as long as the pages served were larger than, say, 15K. Not that this is any magical number, but the Xeon would have to pull from memory more at this point.
As for the HT bus, I wouldn't think you would use the entire 1GHz bus on a database benchmark. You really need to perform some workstation benchmarks to fill the bus.
Since everyone else here is adding to the wish list, I would like to see a real world combined query, read, change, write benchmark. I think the Xeon does better when searching and reading because of its sheer speed, but the Opteron would do better when a record is altered and resubmitted to a database. This is more of a real world example in my opinion, and since both are architecturally different, it would allow both CPUs to show their true colors in what would be considered everyday tasks.
blckgrffn - Monday, February 14, 2005
Having repetitive data is what having cache is all about. The long pipelined architecture of the P4 needs the large local cache to minimize time-expensive RAM lookups, to compensate for the time-expensive deep pipe operations that get tossed when mis-predicted. So, the 2MB cache could help the Prescott in many places and is not limited only to SQL. I think that we can probably look at the EE P4s and get a feeling for what the new Prescotts will bring to the table, but we can hope that all of those additions that were made to the Prescott core are allowed to shine with more cache present.
fitten - Monday, February 14, 2005
#9, this was a server benchmark test. Servers are about stability and such. Anyone who overclocks a critical server (database, etc.) should be fired on the spot. They may do overclocking tests in the workstation review that was mentioned.
Ross Whitehead - Monday, February 14, 2005
#20 - I agree the AMD's instructions/clock count is high, but we were surprised that the 25% increase in HT did not provide any measurable difference.
Regs - Monday, February 14, 2005
*Their IPC counts are higher*
Need more coffee
Regs - Monday, February 14, 2005
#11 - I doubt there will be a performance gain for games with just added cache. The problem with the Prescott is its low IPC core and leakage. Anyways, apps on the desktop use a lot of repeatedly used data arrays with similar instruction sets. So why would the CPU core benefit from a larger L2 cache for games when it's just going to be the similar type of code it just processed?
#16 - AMD's are not bandwidth starved. Their high instructions per clock count are higher. So the pipeline is a lot shorter, which means it does not run the risk of pipeline stalls if it is not fed enough data from the bus, unlike the Intel.
Jason Clark - Monday, February 14, 2005
mickyb, Quad 3.6 Xeon systems don't exist as far as I know; correct me if I'm wrong. Quad Xeon systems are still the 400MHz FSB Xeons that are clocked at most 3GHz.
mickyb - Monday, February 14, 2005
The Intel Xeon has always been competitive. You guys are thinking about gaming. I would like to see 4-way performance and see a graph of benchmarks compared to number of CPUs. AMD has previously done well in this area.