DDR vs. Dual Channel RDRAM
by Bert McComas, InQuest Market Research
Introduction
Since
its introduction in September 1999, Direct Rambus has been cursed with
high prices, low availability, technical troubles and questionable performance.
As Intel’s
primary RDRAM platform, the 820 chip set has similarly fallen under
widespread criticism in the media and on the web. In the channel, system
and board makers are reporting an inventory buildup of these hard-to-sell
820 platforms. With the 820 chip set on the ropes, PC133 enters
the mainstream almost entirely uncontested. Even Intel’s contrived resistance
to PC133 is scheduled to disappear shortly with the introduction of
Solano (815).

The Platforms
In November and
December of 1999, various independent sources reported very favorable
hands-on benchmark results for DDR using first silicon of Micron’s Samurai
DDR chip. After further optimization, Micron is validating production-worthy
silicon. We had the good fortune of being able to spend a week
with this platform in our lab.

Linpack MFLOPS
As one of the oldest
and most trusted benchmarks in existence, the Linpack MFLOPS benchmark
evaluates memory-limited double precision floating point performance.
By varying the size of the data matrix, the performance impact of the
L1, L2 and DRAM can be observed. Our charts exclude results that are
dependent entirely on L1 and L2 performance, focusing instead on DRAM
limited performance with dataset sizes ranging from 512KBytes to 1.5MBytes.
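To put the dataset range in perspective, the arithmetic below estimates the order of a square double precision matrix that fills a given dataset size, and applies the standard Linpack operation count of 2n^3/3 + 2n^2 to convert a solve time into MFLOPS. This is a minimal sketch, not the benchmark's own code: the function names are hypothetical, and it assumes the dataset is a single dense n x n matrix of 8-byte doubles.

```python
import math

BYTES_PER_DOUBLE = 8  # IEEE 754 double precision


def matrix_order_for_dataset(dataset_bytes: int) -> int:
    """Largest n such that an n x n matrix of doubles fits in dataset_bytes.

    Assumption: the dataset is one dense square matrix and nothing else.
    """
    return math.isqrt(dataset_bytes // BYTES_PER_DOUBLE)


def linpack_mflops(n: int, seconds: float) -> float:
    """Convert a solve time into MFLOPS using the standard Linpack count."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e6


# Dataset range quoted in the text: 512 KBytes to 1.5 MBytes.
small_n = matrix_order_for_dataset(512 * 1024)   # 256
large_n = matrix_order_for_dataset(1536 * 1024)  # 443
```

Under these assumptions the charted range corresponds to matrices of roughly order 256 to 443, all of which are larger than Coppermine's 256KB cache.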

Stream
We ran Stream under DOS, Win98 and WinNT4. As with Linpack, results were recorded only after a clean boot to the OS. Under DOS, DDR delivers a DRAM performance advantage of nearly 20% on average.
Under Win98SE, the performance delta increases to nearly 30%, favoring DDR.
Under NT4, the performance delta shrinks to less than 4%, this time favoring the 840 chip set.
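For readers unfamiliar with how Stream arrives at its figures, the sketch below shows the bookkeeping behind the four standard STREAM kernels (Copy, Scale, Add, Triad). Each kernel touches a fixed number of 8-byte doubles per loop iteration, and bandwidth is the bytes moved divided by elapsed time. The function names are hypothetical, and the `percent_delta` helper assumes deltas are expressed relative to the slower platform, which the text does not state. A pure Python loop is far too slow to measure DRAM, so this illustrates the calculation only, not the DOS and Windows builds used here.

```python
BYTES_PER_DOUBLE = 8

# Arrays touched per iteration by each STREAM kernel (reads plus writes).
ARRAYS_TOUCHED = {
    "copy": 2,   # a[i] = b[i]
    "scale": 2,  # a[i] = q * b[i]
    "add": 3,    # a[i] = b[i] + c[i]
    "triad": 3,  # a[i] = b[i] + q * c[i]
}


def triad(b, c, q):
    """Reference Triad kernel: one multiply and one add per element."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def bandwidth_mb_per_s(kernel: str, n: int, seconds: float) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for n elements in `seconds`."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n
    return bytes_moved / seconds / 1e6


def percent_delta(a: float, b: float) -> float:
    """Advantage of a over b as a percentage of b (assumed convention)."""
    return (a - b) / b * 100.0
```

Because Copy and Scale move two arrays per iteration while Add and Triad move three, the same loop time yields different bandwidth figures depending on the kernel.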
The NT4 results introduce a performance aberration (compared to the other versions of Stream) that I am not fully able to explain. After a bit more testing, we confirmed that when the Micron platform was configured with four 128M registered DIMMs, its Wstream-NT performance increased, essentially eliminating the performance delta between the two systems. This indicates that Wstream under NT benefits from wide (4-way) interleaving, and may not be as sensitive to latency as other applications and benchmarks. It is an interesting case, and we should watch carefully how other tests turn out when comparing Win98 and WinNT.

WinTune Memory Bandwidth Test
Version 4 of WinTune was used to evaluate many aspects of DRAM performance under Win98 and NT. In this case there was no real difference in the results between the two operating systems, though under NT the results fluctuated less from run to run than under Win98.
The results show
a remarkable advantage for DDR, particularly for Write and Copy activity.
Reads showed essentially no difference, though the 840 did actually
exceed DDR in 2Mbyte reads by 0.5%.
The overall memory
score quoted in the table above is produced by the benchmark. This score
is 14.3%, slightly lower than the averages generated for this table.
WinTune generates its overall bandwidth score as an average of all other
bandwidth scores measured in MB/s (including a processor centric 4Kbyte
number that was excluded from this analysis).

SysMark 2000
The most comprehensive
and reliable business application benchmark in the industry is SysMark
2000. It loads and runs a dozen leading applications for basic business
productivity and for advanced content creation.
The 840 outperforms
DDR in only two applications - Premiere and Microsoft Media Encoder.
Interestingly, these two applications are very closely related. Both
perform batch oriented video file compression. These two applications
fall into the category of professional or semi-professional content
creation applications, along with Bryce and Elastic Reality. They are
generally not intended for the casual user of a home or business
PC. These software tools are in use by a relatively small user base
of professional computer graphics artists – as compared to the vast
number of users that rely on the other business and personal productivity
applications tested in SysMark 2000.

ZD CPUmark99
As a key element of Intel’s ICOMP index, CPUmark has proven to be a reputable test of the processor’s integer performance and cached memory performance, independent of graphics or hard disk.
The 1.2% performance delta is quite respectable considering the small amount of DRAM activity produced by this benchmark. Because Coppermine’s cache is reduced to 256KB, smaller than on previous high-end processors, DRAM performance differences can be identified using this benchmark.

3D Game Performance – Expendable
Using
the popular game demo Expendable, we tested at two different screen
resolutions. It turns out that at either resolution, this game is still
primarily CPU limited in its performance. As resolution increases from
640x480x16 to 1024x768x32, the overall accelerator fill rate demand
increases by more than 5X. The GeForce DDR fill rate capacity is so
high that there is only a very small frame rate delta between these
two resolutions.
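The "more than 5X" figure follows from simple frame buffer arithmetic. As a rough check, the sketch below assumes fill demand scales with pixels per frame times bytes per pixel, and ignores overdraw, Z-buffer and texture traffic. The function name is hypothetical.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to write every pixel of one frame once."""
    return width * height * bits_per_pixel // 8


low = frame_bytes(640, 480, 16)    # 614,400 bytes per frame
high = frame_bytes(1024, 768, 32)  # 3,145,728 bytes per frame
ratio = high / low                 # 5.12

print(f"{low} -> {high} bytes per frame, {ratio:.2f}x")
```

The pixel count grows by 2.56X and the color depth doubles, giving the 5.12X increase in bytes written per frame.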

3D Game Performance - Quake3 Arena
Quake is clearly
the most enduring game and perhaps the most credible game benchmark
in the industry. The platforms were configured under Win98 with identical
drivers using the nVidia Quadro accelerator with all performance features
enabled - including AGP fastwrites. With all chip set register and Windows
registry settings left in their default modes, the DDR platform was
physically verified by Micron to properly run AGP fastwrite cycles.
As such, Micron’s DDR platform is the only non-Intel platform that I
know of to report fastwrite compatibility.
As one would expect, DDR’s advantage is at its highest at lower resolutions (which are less fill rate limited) and also under Demo2 rather than Demo1 (because of the difference in CPU load described above). Overall, Micron’s chip set delivers a 6.6% – 8.6% advantage over the 840 in this test. As resolution increases, one would expect this delta to shrink as performance becomes almost entirely accelerator limited.

3D WinBench 2000
With the platforms configured identically to the Quake3 tests above, but with screen resolution set to 1024x768x16, ZD’s 3D WinBench 2000 also demonstrates a significant performance delta, again favoring DDR. In the floating point centric processor tests the performance delta is small, but in the accelerated game script tests DDR pulls ahead by over 4%.

MCAD Workstation Performance - Viewperf
Under NT4, we tested Viewperf using the Diamond FireGL1 accelerator at 1024x768x32. We took the best score of three runs, but saw very little variation from run to run. In this case, the 840 came out ahead by an average of 0.8%.
As seen in the table below, in the DX-05 test, the 840’s lead peaks at 2.7%, but in all other tests the performance differences are 1% or less.

ZD Serverbench Performance
ZD Serverbench measures
sustained server throughput based on a varying number of simultaneously
active client PCs. Using 100Mbit Ethernet, up to 20 client sessions
placed long-term continuous demand on the server. Both platforms demonstrated
nearly equivalent performance, though some very small differential was
observed.
DDR wins over the
840 by a small margin when the number of clients is low, sustaining
an equal or better position up to 12 clients. At 16 and 20 clients,
the 840 displays an advantage. If these results prove reproducible,
they might be attributed to the 840’s PCI implementation rather than to
its DRAM performance characteristics.

Summary
In
the overwhelming majority of cases, DDR exceeds the performance of dual
channel RDRAM, at times by a very substantial margin. There are several
cases where there is very little difference, and finally a few where
the 840 pulls ahead by a small margin.

Bert McComas can be reached by e-mail at mccomas@inqst.com or by telephone at 480-813-7785.