Original Link: https://www.anandtech.com/show/2458



Dancin' with the Devil

Just about every review site has taken a unique approach to motherboard testing at one time or another, especially when it comes to overclocking. Over the last few months, we have taken a slightly different approach with the performance-oriented motherboards by offering additional technical information and a few BIOS guides to help end-users set up the ever more complicated BIOS functions that are available on today's boards. Our primary directive is to ensure everything we achieve in the cozy confines of the labs is fully repeatable with the same off-the-shelf retail components and settings.

Our basic test criterion when overclocking the Intel enthusiast boards is to set FSB speeds in the 400-465FSB region. We ensure tRD is set as low as possible while keeping component voltages to a minimum for stable long-term use. This type of overclocking might not sound very sexy - Patrick Swayze would dump us fast for not doing the Argentine tango with high FSB rates. However, our recent forays into dissecting DDR2 and DDR3-based motherboard overclocking have shown that tRD (Read Delay) is the most important tunable BIOS option available to overclockers seeking measurable performance improvements from Core 2 processors.

Those who wish to brush up on what the Northbridge strap setting and tRD are, how they both work, and why we are enamored with these settings are advised to read this article first and then parse through this one for additional background information. Most overclockers are thoroughly aware that increasing processor voltage is required to allow speed scaling, but may fail to realize that raising MCH voltage is also necessary, particularly when using a higher processor multiplier. The primary reason is the additional data throughput the chipset buses must handle as FSB levels rise.

Early implementations of the P965 chipset allowed FSB scaling to reach levels that were unheard of (in conjunction with the Core 2 Duo) by increasing the overall chipset latency at certain FSB points. Quite often, the deficit in chipset latencies at higher FSB speeds was enough to nullify a processor core speed increase of 100MHz or more - meaning that a higher FSB overclock would trail the performance of a lower CPU speed using a lower FSB and higher CPU multiplier (due to Northbridge strap latency changes). It was not until the P965 was about to be replaced by the P35 that we finally began to see the release of high-performance 965 boards - namely the abit Quad GT and the DFI P965-S "Dark". These boards locked the 1066 Northbridge strap while allowing Front Side Bus speeds near or above 500FSB that retained linear performance scaling to some degree.



To date, we have noticed that properly tuned Intel P35/X38/X48 chipsets also feature near-linear performance scaling as FSB speeds rise. In fact, although NB strap changes are available manually, the straps really do nothing at this point but allow the use of different memory dividers. We are finding that the increase in Northbridge voltage needed when moving from a 6x to a 7x multiplier is small, if it is needed at all, but when we step over to the 8x multiplier (at equivalent settings) a major jump in VMCH is usually required to hold the applied overclock "stable".

This is true even on the latest top-end motherboards featuring the X38/X48 chipsets. Therefore, using an 8x or 9x multiplier to show a high FSB says more about the board's and CPU's capabilities than using a 6x multiplier to show off a high FSB. If a board can achieve a high stable FSB with a higher multiplier, it makes sense that a lower multiplier will have no problem achieving the same or greater FSB speeds.

For the seasoned overclocker nothing we have said above is anything new or groundbreaking. Our main goal though is to remain realistic for our readers, so that we show what is really possible rather than something that can only be held together with chewing gum and sticky tape for 5 minutes for that impressive SuperPi 1M screenshot.



Doing the Salsa…

One of the biggest obstacles faced by users when they are overclocking is heat. It's a well-known fact that cooler component temperatures usually lead to greater stability. The primary sources of heat in an overclocked system are the CPU, the GPU, the Northbridge, the CPU PWM supply, the memory modules (when overvolted), and finally the Southbridge. All of these areas will require active cooling to some degree to prevent temperatures from reaching levels that either shorten component lifespan or induce instability.

We already know that the X38 chipset runs hot when it's overvolted, as will the 680i/780i and to a slightly lesser degree the X48. Passive cooling solutions are all the rage for those who like a quiet waltz, but when aiming for high-speed overclocks they are a definite "no-no". Motherboard manufacturers are trying hard to dissipate as much of this heat as possible by using increasingly elaborate heatpipe-based cooling solutions, but they still require active airflow to maximize cooling when systems are overclocked.

While overheating due to inadequate cooling can cause system instability, the other major issue when overvolting is the effect it can have on component lifespan. The level of overvoltage that can be "safely" applied to a specific component varies. As Intel and AMD reduce the die sizes of their processors, the percentage of overvoltage required for speed scaling decreases. To a certain degree, there will be a reduction in overclocking ceiling on the entry-level processors, especially with the higher base FSB speeds.



While the overall voltage requirement for stock speeds is lower on a smaller die, the level of tolerance a component has to prolonged overvoltage also decreases. Although the 45nm processors scale very well with voltage, anything past 1.36V is out of warranty on Intel's end. Still, we have seen users (ourselves included) provide these CPUs with in excess of 1.5V under load, and in some cases we have heard of processor failure when subjected to these voltages for extended periods. Regardless, most good examples of an E8400/E8500 will scale to 4GHz near 1.3Vcore, while additional speed beyond this point requires a non-linear rise in voltage for each additional MHz. We expect users who are lucky enough to stumble upon the best silicon will be able to run up to 4.2GHz while remaining at the upper end of Intel's warranty voltage range.

The question becomes: what is the best way to achieve a performance overclock based upon sound technical knowledge, and are we still able to get the most from our systems without exposing our components to harmful voltages or heat levels?

Those of us who are looking for the best overall CPU performance are leaning towards the X38/X48 boards at this time. Of course, we want to see what the chipsets can do, where they operate at their best, and why. Our article today will start the process of answering these questions, hopefully in a way that makes sense. We are going to take one of ASUS' top-level boards - the X38-based Maximus Extreme - and see what makes it tick. We will follow up later with additional results from the likes of Gigabyte, MSI, DFI, abit, and Foxconn in our X48 launch article.

Of course, we must stress that the very fact that there is performance variability between chipsets suggests there can be variances between motherboards (even the same model), so once again it comes down to luck whether your MCH is average, good, or exceptional. We hope some of what we cover here today shows why we aim for modest FSB speeds at lower tRD settings in our board guides rather than going for high FSB with loose memory/chipset performance setups that are usually good for nothing other than a few pretty screenshots and glitzy marketing campaigns. Speaking of which, when does "Dancing with the Stars" return to the airwaves?



Two-Stepping with the Test Bed…

ASUS Maximus Extreme - Dual-Core Overclocking / Benchmark Testbed
Processor: Intel Core 2 Duo E6850; Intel Core 2 Duo E8500
CPU Voltage: Various
Cooling: Swiftech Apogee GTX, Thermochill PA120.3 radiator, dual Laing DDC Ultra pumps in series, 1/2" ID (3/4" OD) Tygon tubing, 3x Panaflo 120x38mm fans @ 7-12V in push configuration for CPU, 1x Panaflo 120x25mm fan for cooling the MCH
Power Supply: OCZ Pro Xstream 1000W
Memory: OCZ DDR3 PC3-14400 (DDR3-1800) Platinum Edition (2GB/4GB)
Memory Settings: Various
Video Cards: MSI 8800GTS-512
Video Drivers: NVIDIA 169.28
Hard Drive: Western Digital 7200RPM 250GB - WD2500KS
Optical Drives: Plextor PX-755A
Case: Lian Li 75
BIOS: 0803
Operating System: Windows XP Professional SP2, Windows Vista Ultimate 64-bit

[Ed: While I'm here, let me apologize for all the ballroom dance references; I strike a pretty impressive pose in a tail suit, but even I wouldn't go so far as to do ballroom and overclocking analogies. ;-)]

Our review of the ASUS Maximus Extreme is located here, and our coverage of the P5E3 Premium is here, for those who want further details on these boards.

We started our testing using a 65nm E6850 - a processor that is capable of reaching high FSB speeds with relative ease. Once the groundwork for testing was in place, we moved over to Intel's latest and greatest dual-core, the Penryn-based E8500. Although many users currently run or are considering quad-core processors, we have already covered much of their overclocking ability by using them as the basis for our recent performance motherboard reviews. Dual-core processors allow higher FSB overclocking potential, and as we are concentrating on FSB-related performance and VMCH scaling today, using these processors is the logical choice.

The dual-core processors are far easier to overclock and provide a lower overall thermal output when overvolted - not to mention that they are far kinder to PWM voltage supply circuits when overclocked. Although we are using a DDR3-based motherboard here, we should begin to see additional DDR2-based X48 boards become available in the coming weeks. We may venture into similar testing on DDR2 boards and quad-core processors later if our readers find this type of article interesting.

We used a Lian Li 75 case for these particular tests. Northbridge cooling comes from a 120mm fan, which also provides additional airflow to the PWM MOSFETs and memory modules. Passive cooling of the X38 chipset results in system instability with as little as 1.29VMCH (9x400FSB) once its temperature reaches 48°C during stress testing, pretty much ruling out any kind of serious overclocking without airflow over the components.



E6850 and VMCH/FSB/tRD do the Disco


[Charts: E6850 VMCH/FSB/tRD results at 6x548FSB, 7x544FSB, 7x548FSB, 8x500FSB, and 9x445FSB]


We ran a series of quick stability tests using OCCT. Although OCCT will pass at 6x548FSB with as little as 1.41VMCH, stepping up to the 7x multiplier requires a significant increase in MCH voltage. Interestingly enough, backing off a mere 4MHz on the FSB (to 544FSB) brings the required MCH voltage down to 1.54V.

Using anything beyond 544FSB requires an absolute brute-force approach to get the board stable enough to generate more than a few screenshots. CPUs with lower locked multipliers will naturally need these higher FSB speeds to reach processor overclocks that match CPUs with higher or unlocked multipliers.

This assumes the processor itself is capable of running high FSB speeds, and finding one that can is about as tough as doing the foxtrot with a Sumo wrestler. Let's move over to the E8500 and see what happens to the achievable FSB rates. We will also compare performance points between multipliers and FSB speeds.



Judges Scores for the E8500 Quickstep

[Charts: E8500 benchmark results for Crysis (two charts), Unreal Tournament 3 (two charts), Cinebench 10, and WinRAR 3.70 cache & memory]

In the Crysis test, full stability is not achievable at 500FSB using the 8x multiplier, and we could barely finish benchmarks at the 7x multiplier. We tried a number of methods to get beyond this issue, including Clock Twister adjustments, a 2N command rate, and even an increase in CAS latency and memory sub-timing adjustments. All of our endeavors failed; even with reduced performance levels we still had system instability.

This may be due to the rather coarse GTL adjustments offered by the Maximus Extreme, but more likely the problem is that Crysis is a very unforgiving benchmark at high FSB/multiplier combinations. Therefore, a 45nm processor with anything less than an 8.5x multiplier is going to struggle to achieve a stable 4GHz on this particular board when running this application.
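The 8.5x threshold follows from simple arithmetic: core clock equals multiplier times FSB, so the FSB needed for a target clock is the target divided by the multiplier. A minimal Python sketch; the function name and the list of multipliers are our own choices for illustration:

```python
# Sketch: FSB (base clock, MHz) required to hit a target core clock.
# Core clock = CPU multiplier x FSB, so FSB = target / multiplier.

TARGET_MHZ = 4000  # the 4GHz goal discussed in the text


def fsb_for_target(multiplier: float, target_mhz: float = TARGET_MHZ) -> float:
    """Return the FSB in MHz needed to reach target_mhz at this multiplier."""
    return target_mhz / multiplier


if __name__ == "__main__":
    for mult in (7, 8, 8.5, 9, 9.5):
        print(f"{mult:>4}x -> {fsb_for_target(mult):6.1f}FSB")
```

At 8.5x the requirement is roughly 471FSB, whereas 8x needs a full 500FSB, the point at which Crysis stability fell apart on this board.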

We chose to run Crysis at high detail levels, as we believe owners of an 8800 GTS 512 will want to do the same. At these settings a mere 100MHz extra on the processor does nothing for frame rates, as the game is obviously GPU limited. At medium detail, we did notice a slight separation of 3~5 FPS between the multiplier settings.

In our other benchmarks, UT3 and CINEBENCH actually score better with the lower tRD of 6 at 445FSB in comparison to a tRD setting of 7 at 500FSB. However, WinRAR shows a different result with the 8x multiplier delivering better scores. Given that WinRAR depends a lot on memory bandwidth, that result is not too surprising. In additional application testing across several games and video/audio programs, we generally found the 9x multiplier scored better - sometimes in a measurable fashion - while always remaining stable.



E8500 FSB/VMCH Polka Party


The voltages shown in these tables are the minimum we can expect to use to keep each configuration completely stable. Games like Crysis represent one of the toughest system loads we have ever seen from a program. The redline E8500 results at 500 FSB are not completely stable due to Crysis, even though we can pass a myriad of other stress tests.

At 500FSB, other games and video/audio/office applications produced a mixed bag of results, but these same applications ran perfectly fine at the 9x/9.5x multipliers. We would still recommend the 9x multiplier for everyday operation, as the extra voltage (and increased thermal output) needed to ensure stability at the 9.5x multiplier is a greater penalty than the very slight increase in application performance it brings.


We also checked the overclocking ability of the Maximus Extreme with 4GB of memory. The total load capacitance of four memory modules limits the X38 chipset to a 2N command rate above DDR3-1600 at CAS 7. A slight bump in VMCH is required to hold things together with 4GB of memory compared to 2GB.

Now that we have this base data, let's take a quick look at why using a tRD setting of 6 in the 440-470FSB region makes so much sense with dual-core CPUs like the E8400/E8500.



The Calculator Hustle

The first reason we believe higher FSB rates are not always better for system performance can be shown using the following calculation, which gives the actual tRD time in nanoseconds:

$$\text{tRD time (ns)} = \frac{\text{tRD} \times 1000}{\text{FSB (MHz)}}$$



We first dissected this equation in our ASUS Rampage Formula article, and it remains true today. To recap, a lower tRD time yields faster read data transfer between the memory and CPU.

If we use the equation above to compare the actual tRD times at 500FSB and 445FSB, we see that a tRD of 6 at 445FSB results in a shorter delay (13.48ns) than a tRD of 7 at 500FSB (14.00ns). This is exactly why the somewhat sensitive UT3 graphics engine performs worse at 8x500 with the higher tRD of 7 (see the E8500 benchmark results earlier in this article), to give one example. We notice the same pattern in any application that is latency sensitive.
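The comparison can be reproduced with the formula implied by this section's figures (tRD time in ns = tRD × 1000 / FSB, which yields the quoted 13.483ns for a tRD of 6 at 445FSB). This Python sketch is illustrative only; `trd_time_ns` is a hypothetical helper name:

```python
def trd_time_ns(trd: int, fsb_mhz: float) -> float:
    """tRD delay in ns: tRD clocks times the FSB clock period (1000 / FSB)."""
    return trd * 1000.0 / fsb_mhz


# Roughly 4GHz either way: 9 x 445 = 4005MHz, 8 x 500 = 4000MHz
for label, trd, fsb in [("9x445, tRD 6", 6, 445), ("8x500, tRD 7", 7, 500)]:
    print(f"{label}: {trd_time_ns(trd, fsb):.2f}ns")
```

Despite the near-identical core clock, the lower FSB pairing comes out about half a nanosecond ahead on every read.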

In order to equal the tRD cycle time of 6 at 445FSB we can rearrange the equation above to find the equivalent FSB requirement at a tRD of 7 quite easily:

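$$\text{tRD time (tRD 6 @ 445FSB)} = \frac{6 \times 1000}{445} = 13.483\ \text{ns}$$

$$\frac{13.483\ \text{ns}}{7\ (\text{experimental tRD})} = 1.926\ \text{ns}$$

Therefore:

$$\text{FSB} = \frac{1000}{1.926\ \text{ns}} \approx 519\ \text{MHz}$$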
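The same rearrangement generalizes: a new tRD setting matches a reference tRD time when FSB = new tRD × reference FSB / reference tRD. A hedged Python sketch of that calculation (`equivalent_fsb` is a name we made up for illustration):

```python
def equivalent_fsb(ref_trd: int, ref_fsb: float, new_trd: int) -> float:
    """FSB (MHz) at which new_trd gives the same tRD time as ref_trd at ref_fsb."""
    target_ns = ref_trd * 1000.0 / ref_fsb   # e.g. tRD 6 @ 445FSB, ~13.483ns
    period_ns = target_ns / new_trd          # FSB clock period needed, ~1.926ns
    return 1000.0 / period_ns                # convert the period back to MHz


fsb = equivalent_fsb(ref_trd=6, ref_fsb=445, new_trd=7)
print(f"tRD 7 needs about {fsb:.1f}FSB to match tRD 6 at 445FSB")
```

Because the factors of 1000 cancel, the result is simply 7 × 445 / 6, or about 519.2FSB, in line with the hand calculation.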

Even if the CPU is capable of the additional 152MHz (using an 8x multiplier), the VMCH requirement to achieve a tRD cycle time of 13.48ns at 519FSB would quite simply be off the scale for 24/7 use. At 500FSB and a "Moderate" setting in the ASUS performance-enhancing "Clock Twister" BIOS option, we already need 1.65VMCH and can't even begin to run games like Crysis or other applications such as Nero Recode 2. Yet 445FSB at a tRD of 6 still leaves enough headroom to allow further scaling if the processor is capable. It makes far more sense to attempt 9x461, which is still within reach of our 1.65VMCH limit as shown in the graph below.



Not only is 461FSB possible, but we also manage to reduce the tRD cycle time even further - down to about 13.02ns (6 × 1000 / 461).



A tRD of 5 is not available on this board; even if it were, it would not be without side effects. The first problem is that 400FSB at a tRD of 5 gives us a tRD cycle time of 12.5ns. This is certainly great, but if we look at the graphs above, it is quite possible that such a low tRD cycle time would leave no remaining Northbridge voltage headroom for scaling much beyond 410FSB (a tRD cycle time of 12.19ns). We verified this on the ASUS P5E3 Premium board we have in the labs, which does allow a tRD of 5. At a tRD of 5, reaching DDR3-1600 requires the 2:1 memory ratio, and running CAS 6 at that speed requires significant VDimm due to the very "tight" CAS latency. Relaxing this to CAS 7 results in a large performance drop.
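To compare the operating points mentioned so far side by side, the following Python sketch ranks them by tRD time using the same tRD × 1000 / FSB formula. The FSB/tRD pairs are taken from this section; note that the tRD 5 entries were only settable on the P5E3 Premium, and the sketch says nothing about whether a given point is actually VMCH-stable.

```python
def trd_time_ns(trd: int, fsb_mhz: float) -> float:
    """tRD delay in ns: tRD clocks times the FSB clock period (1000 / FSB)."""
    return trd * 1000.0 / fsb_mhz


# (FSB in MHz, tRD setting) pairs discussed in this section
points = [(445, 6), (500, 7), (461, 6), (400, 5), (410, 5)]

# Print from shortest to longest read delay
for fsb, trd in sorted(points, key=lambda p: trd_time_ns(p[1], p[0])):
    print(f"{fsb}FSB, tRD {trd}: {trd_time_ns(trd, fsb):.2f}ns")
```

The tRD 5 points top the list on paper, while the 500FSB/tRD 7 combination is the slowest of the group.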

Let's take a look at the actual CAS (Column Address Strobe) memory latencies at various memory speeds to see what the optimum primary memory CAS value is for performance, voltage (VDimm), and system scaling potential.



CALWI waltzes off into the sunset

The second factor for determining a system operating point is to find the optimum memory CAS setting known as "CALWI". Further technical insight into "CALWI" performance tuning is available here.

We have already determined that our optimum field of operation is between 440-470FSB at tRD settings of 6 using this motherboard. Now we need to look at how CAS settings and memory speed scaling affect memory CAS (Column Address Strobe) latency. Actual CAS latency is measured in nanoseconds (ns), and it should come as no surprise that lower values mean faster performance.

As always, this performance comes at the expense of both your wallet and higher memory voltages (depending in most cases on how good your memory actually is). The graph below uses the CALWI equation to plot CAS latency values at various FSB rates and also shows the minimum VDimm required for stability at a given CAS latency:



Since we are aiming for an FSB speed in the region of 450FSB (to maintain a low and stable tRD cycle time), using the 2:1 memory divider ratio gives us a memory speed of 1800MHz - absolutely perfect for CAS 7 at 7.77ns with 2V on the memory. Should we wish to get a little adventurous and shoot for higher FSB rates, we can get closer to a CAS latency of 7.5ns while staying under 2.1V on the memory modules, but this is not recommended for 24/7 operation. Using any other memory divider at 450FSB places the memory speed too low, negating the benefits of DDR3 memory.
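The CALWI arithmetic used here (CAS latency in ns = CAS × 2000 / memory speed, the same form as the CAS 8 example in this section) and the memory speed at the 2:1 divider can be sketched in Python. We assume, based on the figures quoted in this article (450FSB gives DDR3-1800 and 400FSB gives DDR3-1600 at 2:1), that the DDR3 data rate equals FSB × 2 × the DRAM:FSB ratio. The helper names are hypothetical.

```python
def ddr3_speed(fsb_mhz: float, dram_fsb_ratio: float) -> float:
    """DDR3 data rate in MHz, assuming speed = FSB x 2 x DRAM:FSB ratio."""
    return fsb_mhz * 2 * dram_fsb_ratio


def cas_latency_ns(cas: int, speed_mhz: float) -> float:
    """Absolute CAS latency in ns: CAS clocks x clock period (clock = speed / 2)."""
    return cas * 2000.0 / speed_mhz


def fsb_for_cas_latency(cas: int, target_ns: float, ratio: float = 2.0) -> float:
    """FSB (MHz) needed at the given ratio for CAS `cas` to reach target_ns."""
    speed = cas * 2000.0 / target_ns
    return speed / (2 * ratio)


speed = ddr3_speed(450, 2.0)  # 2:1 divider at 450FSB
print(f"450FSB @ 2:1 -> DDR3-{speed:.0f}, CAS 7 = {cas_latency_ns(7, speed):.2f}ns")
print(f"CAS 7 at 7.5ns needs about {fsb_for_cas_latency(7, 7.5):.1f}FSB")
print(f"CAS 9 at DDR3-1900 = {cas_latency_ns(9, 1900):.2f}ns")
```

The CAS 7 result prints as 7.78ns (7.77ns above is the same value truncated to two decimals), and reaching 7.5ns at CAS 7 takes roughly 467FSB, near the top of the 440-470FSB window.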

This leads us to ask why some memory vendors are releasing memory modules binned at CAS 9 DDR3-1900 as "performance" memory. You can use the equation above to work out that CAS 9 at DDR3-1900 gives us a pitiful 9.47ns - which we would hardly call "performance" memory. As things stand in the chipset business right now, CAS 9 does not make a good dance partner, unless you like a slow Waltz with your partner stepping on your toes.

The ugly aspect of high FSB overclocking rears its head again at 500FSB. If we use the 2:1 memory divider ratio, we end up with a memory speed of 2000MHz. This requires a minimum of CAS 8 (which we could not yet get stable at this speed) and a Command Rate of 2N, bringing additional performance loss - two kicks in the shin. Using the next best divider brings us to a memory speed of 1600MHz. If we care about overall performance, we do not want to be stuck with a tRD of 7 and CAS 6 memory with little potential for additional scaling.

CAS 8 is quite ugly too. In order to achieve anywhere near the latency of CAS 7 at 1800MHz (7.77ns), we need a memory speed in the region of 2060MHz at CAS 8:

$$\frac{8 \times 2000}{2060} \approx 7.77\ \text{ns}$$
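Rearranging the CALWI equation gives the memory speed that a given CAS setting needs to match a target latency: speed = CAS × 2000 / target. The Python sketch below uses a hypothetical helper name and the same assumption as before (speed = FSB × 2 × 2 at the 2:1 divider) to show how far CAS 8 and CAS 9 have to be pushed to match CAS 7 at DDR3-1800.

```python
def speed_for_latency(cas: int, target_ns: float) -> float:
    """Memory speed (MHz) at which CAS `cas` yields target_ns of latency."""
    return cas * 2000.0 / target_ns


target_ns = 7 * 2000.0 / 1800  # CAS 7 at DDR3-1800, about 7.78ns

for cas in (7, 8, 9):
    speed = speed_for_latency(cas, target_ns)
    fsb_at_2_to_1 = speed / 4  # assumes speed = FSB x 2 x 2 at the 2:1 divider
    print(f"CAS {cas}: DDR3-{speed:.0f} (about {fsb_at_2_to_1:.0f}FSB at 2:1)")
```

CAS 8 needs roughly DDR3-2057, consistent with the figure of about 2060MHz above. At the 2:1 divider that corresponds to about 514FSB, well outside the 440-470FSB operating window.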

Based upon how the X38 (and X48 in early testing) chipset scales and current DDR3 memory limits, running a speed of 2060MHz is impossible at 2.0V VDimm and a safe MCH voltage. Now you can see why our chosen 440-470FSB operating point with a dual-core CPU makes perfect sense on the ASUS Maximus Extreme and other X38/X48 boards in general.

Quick Thoughts

As you can tell from the tone of this article, we care quite a bit about overall system performance, not to mention stability. Even if the gains are small, why not do things properly? What this little journey has shown us is that the X38 (and the similar X48) performs best at around 460FSB with higher-multiplier processors.

For those who aspire to find the holy grail of 4GHz using a dual-core 45nm processor, we think the E8400 and a 9X multiplier is the sweet spot based on current test results. We will go into additional detail on the E8500 in an upcoming article that will feature "RDWI" - no, that is not a new driving while intoxicated offense, but detailed information about optimum tRD scaling windows.

In the end, we believe that a balanced and optimized platform is much more important than one that shows off high FSB speeds at the expense of performance, thermals, and stability. After all, would you rather have a balanced dance partner that can perform a variety of dances from ballet to hip-hop, or do you want someone that only looks good tap dancing after a few Red Bulls but quickly burns out?
