Original Link: https://www.anandtech.com/show/1101
AMD Opteron Coverage - Part 4: Desktop Performance
by Anand Lal Shimpi on April 24, 2003 10:57 PM EST - Posted in CPUs
The past week has been an exciting one for the IT community; the release of AMD's Opteron microprocessor has restored competitive balance between the two microprocessor giants. Although we have a very large IT user base, a good number of our more than 3 million monthly readers couldn't care less about the performance of a 2-way Opteron under SQL Server.
The enthusiast community is still waiting for the Athlon 64, and tempting it with the Opteron's performance without giving any idea of how a single-processor Opteron would fare on the desktop is simply cruel.
The temptation continues when you realize that although none of the AMD-chipset based Opteron boards have an AGP slot, boards based on NVIDIA's nForce3 Pro chipset will. For those of you who haven't read Part 1 of our Opteron coverage, NVIDIA's nForce3 Pro chipset is a single-chip solution for uniprocessor Opteron workstation and enthusiast-class PCs; as we just mentioned, it comes with an AGP 8X slot.
ASUS is NVIDIA's launch partner for the nForce3 Pro, but unfortunately official review samples of their nForce3 Pro boards won't be ready for another month or two. "Great," you're probably thinking: after all that temptation, there's no way for us to show you how competitive the Opteron would be in a desktop scenario.
The biggest obstacle to showing you what sort of desktop performance to expect from the Opteron is that none of the current platforms have an AGP slot. With a reasonably fast PCI graphics card, however, we can still run a solid Opteron vs. Athlon XP comparison that hints at the performance to expect down the line from Opteron/nForce3 and Athlon 64 platforms.
And that is the focus of today's article: to give you an idea of the performance the Opteron processor itself will be able to offer in non-server environments.
Turning a Server Board into a Desktop Solution
As we mentioned in the introduction to this piece, our biggest limitation was that the slow on-board graphics of the server motherboards we were using would not only prevent us from running any sort of 3D tests, but it would also significantly slow down 2D application performance.
The PCI GeForce4 MX 440 was our video card of choice
Without an AGP slot, we were left with one option: a PCI video card. The GeForce4 MX 440 is available in a PCI version, so we equipped all of our test beds with it. Since this review focuses on overall CPU performance, we ran our gaming tests at 640x480 to minimize the graphics bottleneck, a reasonable tradeoff given that we are interested solely in CPU performance. If you remember, up until the release of the GeForce4 Ti 4600 and the Radeon 9700 Pro, almost all of our gaming tests for CPU reviews were conducted at 640x480.
Predicting Athlon 64 Performance?
One thing to keep in mind while looking at the performance comparisons we're about to show you is that the Opteron is not to the Athlon 64 what the Xeon is to the Pentium 4: there are significant performance differences between the Opteron and the Athlon 64.
As a quick recap of the architectural changes, here's what makes an Opteron different:
The pincount of the Opteron alone should give you an indication that it is a noticeably different chip than the Athlon 64. While the desktop Athlon 64 weighs in with a plentiful 754 pins, the Opteron has no less than 940 pins. What are the additional pins being used for?
While the desktop Athlon 64 has only a single HyperTransport link, each Opteron CPU has three links: two for connecting to other processors and one for connecting to I/O bridges (e.g. the South Bridge or a PCI-X bridge).
The next difference between the Athlon 64 and the Opteron is that the Opteron features a 144-bit wide DDR memory interface, compared with the Athlon 64's 64-bit DDR memory controller. The 144-bit bus is more than twice as wide as the Athlon 64's, yet it offers only twice the memory bandwidth, because 16 of the 144 bits carry ECC check data rather than memory data; the Opteron's memory controller only supports ECC memory.
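The bandwidth claim follows from simple arithmetic: peak bandwidth is the number of data bytes moved per transfer times the transfer rate. The sketch below assumes both controllers run at DDR333 (the speed of the registered DIMMs in our Opteron test bed) and treats 128 of the Opteron's 144 bits as data; the function and constant names are our own, for illustration only.

```python
def peak_bandwidth_mb_s(data_bits: int, mega_transfers: float) -> float:
    """Peak theoretical bandwidth in MB/s: data bytes per transfer x transfers per second."""
    return data_bits / 8 * mega_transfers


OPTERON_BUS_BITS = 144   # total interface width, including ECC check bits
OPTERON_DATA_BITS = 128  # data portion of the Opteron interface
ATHLON64_DATA_BITS = 64  # desktop Athlon 64 data width

# Assumption: both interfaces run at DDR333 (333 million transfers per second).
ecc_bits = OPTERON_BUS_BITS - OPTERON_DATA_BITS
opteron = peak_bandwidth_mb_s(OPTERON_DATA_BITS, 333)
athlon64 = peak_bandwidth_mb_s(ATHLON64_DATA_BITS, 333)

print(f"ECC check bits: {ecc_bits}")
print(f"Opteron:   {opteron:.0f} MB/s")
print(f"Athlon 64: {athlon64:.0f} MB/s")
print(f"Ratio:     {opteron / athlon64:.1f}x")
```

At DDR333 this works out to roughly 5.3GB/s for the Opteron versus 2.7GB/s for a 64-bit interface; the 2x ratio holds at any memory speed the two share.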
Although the Athlon 64 and Opteron both feature a 128KB L1 cache (64KB instruction cache, 64KB data cache), the Athlon 64 will be available with both large and small L2 caches (1MB and 256KB, respectively), while the Opteron will stick with 1MB.
To sum things up, the main difference you have to worry about is that with the Athlon 64, you'll have essentially half the memory bandwidth at your disposal; so in memory bandwidth intensive applications, expect Athlon 64 performance to be lower at a given clock speed than what we can show you here on an Opteron.
Also keep in mind that the motherboards we're testing with are not tuned to extract every last ounce of performance; they are set up for stable, reliable 24/7 operation. Overall performance will be higher on nForce3 Pro boards aimed at the enthusiast market, as well as on Athlon 64 motherboards once the chip is released.
The Test
We kept the number of comparison candidates in this review to a minimum, as we don't want to make any sweeping generalizations about the performance of 1P Opteron workstations or Athlon 64 until we're given the appropriate platforms to make final judgments upon. However, what we did want to do was the following:
- Compare the Opteron to the Athlon XP on a clock for clock basis,
- Compare the Opteron to the fastest desktop Pentium 4, and
- Compare the Opteron to the fastest desktop Athlon XP
The first bullet is the most critical one to establish; we can compare the Opteron 244 (1.80GHz) to an Athlon XP 2200+ (1.80GHz) and get a good idea of how much of a benefit we get from the architectural improvements to the core and the on-die memory controller, but how are we to know whether the performance increase is coming from the Opteron architecture or the larger L2 cache?
To determine which applications are simply benefiting from a larger cache, we threw a hypothetical 1.80GHz Barton processor (an Athlon XP 3000+ underclocked to 1.80GHz) into the mix. You'll remember from our Athlon XP 3000+ Review that the Barton core simply adds another 256KB to the Athlon XP's L2 cache, making it the perfect candidate for determining whether a benchmark is cache sensitive from the Athlon's perspective.
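The logic of this three-way comparison can be expressed as ratios of same-clock scores. The sketch below uses made-up scores (not results from this review) for higher-is-better benchmarks, and `split_gain` is a hypothetical helper. Note that the Barton's 512KB L2 is still half the Opteron's 1MB, so the remainder still includes whatever the jump from 512KB to 1MB contributes; this is a rough screen for cache sensitivity, not a clean separation.

```python
from __future__ import annotations


def split_gain(thoroughbred: float, barton: float, opteron: float) -> tuple[float, float, float]:
    """Split the Opteron's gain over a same-clock Athlon XP into two multiplicative parts.

    All scores are higher-is-better and measured at the same clock speed.
    Returns (total, cache, remainder) as fractions, where
    (1 + total) == (1 + cache) * (1 + remainder).
    """
    total = opteron / thoroughbred - 1   # Opteron vs. the 256KB-L2 Athlon XP
    cache = barton / thoroughbred - 1    # what the extra 256KB of L2 alone buys
    remainder = opteron / barton - 1     # everything the Barton's extra cache does not explain
    return total, cache, remainder


# Hypothetical scores for illustration only.
total, cache, remainder = split_gain(thoroughbred=100.0, barton=101.5, opteron=127.0)
print(f"total {total:.1%}, cache {cache:.1%}, remainder {remainder:.1%}")
```

A small cache term, as in this hypothetical case, flags a benchmark as largely insensitive to L2 size, so most of the Opteron's advantage there has to come from elsewhere.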
With all of that said, let's get to the test configuration:
Windows XP Professional Test Bed
Hardware Configuration
CPU:          AMD Athlon XP 3000+ (2.167GHz) Barton
              AMD Athlon XP 2200+ (1.80GHz) Barton - Underclocked 3000+
              AMD Athlon XP 2200+ (1.80GHz)
              AMD Opteron 244 (1.80GHz) - Uniprocessor only
              Intel Pentium 4 3.0C (3.0GHz) - Hyper-Threading Enabled
Motherboard:  ASUS A7N8X - NVIDIA nForce2 Chipset (green bars)
              Intel D875PBZ - Intel 875P Chipset (blue bars)
              Rioworks HDAMA - AMD 8000 Chipset (red bars)
RAM:          2 x 256MB DDR400 CAS2 Corsair XMS3200 DIMMs
              2 x 256MB Registered ECC DDR333 Corsair DIMMs
Sound:        None
Hard Drive:   120GB Western Digital Special Edition 8MB Cache ATA/100 HDD
Video Card:   NVIDIA GeForce4 MX 440 SE PCI
If you're looking for synthetic benchmarks that deal with the performance of the L1/L2 caches and the on-die memory controller of the Opteron, be sure to read Part 1 of our coverage that details the architecture behind the processor as well as some low-level benchmarks of just those areas.
Content Creation & General Usage Performance
There's less than a 2% performance difference between the 1.80GHz Barton and Thoroughbred Athlon XP cores, meaning that this benchmark won't benefit much from the Opteron's larger cache. That said, the 27% performance increase over an identically clocked Athlon XP is nothing to scoff at; the Opteron 244 is even faster than the Athlon XP 3000+.
Unfortunately for AMD, it will take much more than a 1.80GHz Athlon 64 in order to dethrone the Pentium 4 here; the 3.0C has no less than a 22% performance advantage over the Opteron 244.
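For a rough sense of "much more," assume performance scales perfectly linearly with clock speed, an optimistic assumption that real processors rarely meet and that ignores the Athlon 64's narrower memory bus. Under that assumption, the clock needed to erase the 3.0C's 22% lead is simply the Opteron 244's clock times 1.22; `clock_to_match` is a hypothetical helper for illustration.

```python
def clock_to_match(base_clock_ghz: float, deficit: float) -> float:
    """Clock needed to close a relative performance deficit, assuming perfectly linear clock scaling."""
    return base_clock_ghz * (1 + deficit)


OPTERON_244_GHZ = 1.80
P4_LEAD = 0.22  # Pentium 4 3.0C's advantage over the Opteron 244 in this benchmark

needed = clock_to_match(OPTERON_244_GHZ, P4_LEAD)
print(f"about {needed:.2f} GHz")
```

That is roughly 2.2GHz in the best case; because clock scaling is never perfectly linear, the real figure would be higher.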
Under business applications, the situation changes dramatically. For starters, these applications are much more influenced by cache size, as the Barton vs. Thoroughbred comparison shows. The Pentium 4 also doesn't fare too well in this test, as most business applications (and integer code in general) are branch-heavy, favoring architectures with shorter pipelines and smaller branch mispredict penalties.
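The pipeline argument can be made concrete with a standard back-of-the-envelope model: cycles lost per instruction equal branch frequency times mispredict rate times mispredict penalty. Every number below is hypothetical, chosen only to show that doubling the penalty doubles the loss; a deeper pipeline also permits higher clock speeds, so this captures only one side of the tradeoff.

```python
def mispredict_cpi(branch_fraction: float, mispredict_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction lost to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 1 in 5 instructions is a branch, 5% of branches mispredict.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Hypothetical penalties for a shorter and a deeper pipeline.
short_pipe = mispredict_cpi(BRANCH_FRACTION, MISPREDICT_RATE, penalty_cycles=10)
long_pipe = mispredict_cpi(BRANCH_FRACTION, MISPREDICT_RATE, penalty_cycles=20)
print(f"shorter pipeline: {short_pipe:.2f} CPI lost, deeper pipeline: {long_pipe:.2f} CPI lost")
```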
At 1.80GHz, the Opteron 244 is able to outperform its identically clocked Athlon XP sibling by a healthy 21%, just barely outperforming the Athlon XP 3000+.
Gaming Performance - Unreal Tournament 2003
Because we were using such a slow GPU, we didn't force maximum detail settings; instead, we let UT pick the best settings for the GeForce4 MX 440 and stuck with them.
Things really got interesting when we looked at gaming performance: while all three Athlon XPs performed basically on par with each other, the Opteron was able to distance itself quite well from the rest of the pack. Judging by the Athlon XP scores alone, you would think that we were GPU limited, but the Opteron's 12% lead shows that the Athlon XPs are also platform limited.
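A simple way to see why flat scores don't prove a GPU limit is to model frame time as whichever of the platform (CPU, chipset and memory) or the GPU takes longer per frame. The per-frame costs below are hypothetical, and the model ignores overlap and driver effects: if the Athlon XPs all hit the same platform cost, they tie with each other even though the GPU still has headroom, and only a faster platform reveals that headroom.

```python
def frames_per_second(platform_ms: float, gpu_ms: float) -> float:
    """Frame rate when the slower of the platform and the GPU sets the frame time."""
    return 1000.0 / max(platform_ms, gpu_ms)


GPU_MS = 8.0  # hypothetical GPU cost per frame at 640x480

# Hypothetical platform costs per frame: identical for the Athlon XPs, lower for the Opteron.
platforms = {"Athlon XP A": 10.0, "Athlon XP B": 10.0, "Opteron": 8.9}

for name, platform_ms in platforms.items():
    limiter = "GPU" if GPU_MS >= platform_ms else "platform"
    print(f"{name}: {frames_per_second(platform_ms, GPU_MS):.1f} fps ({limiter} limited)")
```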
The botmatch test is much more CPU bound, so we see a slightly larger spread of scores; the Opteron keeps the crown with no less than a 25% performance advantage over an identically clocked Athlon XP. We attribute most of this advantage to the CPU's on-die memory controller, as well as to the Opteron's improved branch prediction unit.
Gaming Performance (continued)
Quake III-based games also seem to favor the very low-latency memory access the Opteron can offer.
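Why latency matters can be sketched with the textbook average memory access time (AMAT) formula: hit time plus miss rate times miss penalty. The latencies below are hypothetical, not measurements of either platform; they only illustrate that integrating the memory controller on die shrinks the penalty that every L2 miss pays.

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for requests reaching the L2: hit time + miss rate x miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


# Hypothetical values for illustration only.
L2_HIT_NS = 10.0
L2_MISS_RATE = 0.05
EXTERNAL_CONTROLLER_NS = 120.0  # memory reached through a separate northbridge
ON_DIE_CONTROLLER_NS = 80.0     # memory controller integrated on the CPU

external = amat_ns(L2_HIT_NS, L2_MISS_RATE, EXTERNAL_CONTROLLER_NS)
on_die = amat_ns(L2_HIT_NS, L2_MISS_RATE, ON_DIE_CONTROLLER_NS)
print(f"external: {external:.1f} ns, on-die: {on_die:.1f} ns")
```

The higher a game's miss rate, the more the penalty term dominates, which is consistent with memory-heavy game engines benefiting most.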
The domination continues under Jedi Knight II. At first we thought something was wrong with the Athlon XP test bed, but no amount of reinstalling, switching motherboards or reconfiguring the system could get any of the Athlon XPs to perform any better. When paired with a PCI GeForce4 MX 440 at such a low resolution, the Athlon XP platform ends up bottlenecked very early on, for reasons we could not pin down.
If these benchmarks are any indication of Athlon 64 performance, then we can look forward to one excellent gaming processor this Fall...
Video Encoding Performance - DivX/Xmpeg 4.5
What was once reserved for "professional" use only has now become a task for many home PCs: media encoding. Today's media encoding requirements are more demanding than ever, and encoding remains one of the most intensive tasks you can run on your PC.
We'll start off with a "quick" conversion of a DVD rip (specifically, Chapter 40 of the Star Wars Episode I DVD) to a DivX MPEG-4 file. We used the latest DivX codec (5.03) in conjunction with Xmpeg 4.5 to perform the encoding at 720x480.
We set the encoding speed to Fastest, disabled audio processing and left all of the remaining settings at their defaults. We recorded the last frame rate reported during the encoding process as the progress bar hit 100%.
Media encoding has always been a strength of the Pentium 4 architecture, and despite the improvements to the Opteron, its fundamental architecture is still very K7-like. The end result is that although performance improves a bit in our DivX encoding test, the Pentium 4 with Hyper-Threading enabled remains well out of AMD's reach.
Video Encoding Performance - Windows Media Encoder 9.0
For our next video encoding test, we used Windows Media Encoder 9.0 to encode the same chapter from the Star Wars Episode I DVD into a 2Mbps VBR WMV file using Media Encoder's built-in 2Mbps DVD VBR settings. Results are reported in minutes to encode; lower is better.
The situation is a bit more positive for AMD in our Windows Media Encoder 9 test; the Pentium 4 is still untouchable though.
3D Rendering Performance - 3dsmax R5
For our 3ds max 5 benchmarks we chose all of the benchmark scenes that ship with the product - SinglePipe2.max, Underwater_Environment_Finished.max, 3dsmax5_rays.max, cballs2.max and vol_light2.max.
The performance improvements the Opteron offers over the Athlon XP range from negligible to noticeable under 3dsmax.
3D Rendering Performance - Lightwave 3D 7.5
While 3dsmax 5 is SSE2 optimized, its level of optimization is nowhere near what NewTek reported for Lightwave upon releasing version 7.0b: the new SSE2-optimized version improved performance by more than 20% in every one of NewTek's supplied benchmark scenes.
We chose three benchmarks: two of the less SSE2-optimized scenes and one more heavily optimized scene, to get an idea of the potential for Pentium 4 users running heavily optimized applications.
In the SSE2-optimized Lightwave test scenes, the Opteron's SSE2 implementation brings performance dangerously close to that of the Pentium 4.
Here we actually have a case where the Opteron is outperformed by the Athlon XP, although the difference is small enough that it's probably due to normal variations in the benchmark and the lack of a performance-tuned motherboard.
Once again we see that in SSE2 environments, the Opteron's SSE2 engine isn't bad at all.
Final Words
To summarize, here's a chart showing the performance improvement the Opteron offered over an identically clocked Athlon XP:
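Improvements like these have to account for the direction of each metric: frame rates and scores are higher-is-better, while encoding times in minutes are lower-is-better. A minimal sketch of one consistent convention follows, using hypothetical values and a hypothetical `improvement` helper; it is not necessarily the exact method behind the chart.

```python
def improvement(opteron: float, athlon_xp: float, higher_is_better: bool = True) -> float:
    """Opteron speedup over an identically clocked Athlon XP, as a fraction."""
    if higher_is_better:
        # Scores and frame rates: a bigger number is faster.
        return opteron / athlon_xp - 1
    # Times: speedup is the ratio of the old time to the new time.
    return athlon_xp / opteron - 1


# Hypothetical results for illustration only.
fps_gain = improvement(125.0, 100.0)                        # 125 fps vs. 100 fps
time_gain = improvement(8.0, 10.0, higher_is_better=False)  # 8 minutes vs. 10 minutes
print(f"{fps_gain:.0%} {time_gain:.0%}")
```

Computing the time case as (10 - 8) / 10 would report only 20% and understate the speedup relative to the equivalent frame-rate result.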
Considering that most of our tests weren't overly sensitive to cache size increases and aren't especially memory bandwidth intensive, these performance figures suggest very strong potential for the Athlon 64. If AMD can launch the Athlon 64, even at 2GHz, it could be competitive with a Pentium 4 3.0C across the board; we'll have to wait and see what AMD will need to remain competitive once Prescott hits later this year.
These numbers also speak very highly of the performance we can expect out of Opteron/nForce3 Pro systems; for gamers without a budget, the Opteron/nForce3 may be an interesting platform to look into, although we'll reserve final judgment until we actually test gaming environments with a higher performing AGP card.
The final takeaway from this performance review is that the Opteron's SSE2 performance is respectable; it looks like AMD will finally be able to compete in areas that were previously Intel-dominated. There will still be some situations, such as video encoding, where the Pentium 4's architecture will continue to dominate. Remember that at its core, the Opteron is still very much a K7; no amount of cache or low-latency memory access will change the fact that some applications simply run faster on Intel's NetBurst architecture.
With that said, we conclude our coverage of AMD's latest processor in what has turned out to be "Opteron Week." There has been talk in the lab about doing a fifth part to the series; we'll see later whether the moons are in proper alignment for that. For now, we hope you enjoyed the coverage, and be sure to take a look at all of the articles in the series:
- AMD's Opteron Coverage - Part 1: Intro to Opteron/K8 Architecture
- AMD's Opteron Coverage - Part 2: Enterprise Performance
- AMD's Opteron Coverage - Part 3: The First Servers Arrive