With
the long anticipated release of Intel's new 820 chip set, the world
has its first opportunity to evaluate the system level performance impact
of DRDRAM. Until now, publicly available performance reports on DRDRAM
have compared Rambus to PC100 or PC133 SDRAM.
According
to these reviews, the benchmarkable performance difference between Rambus
and SDRAM is usually between 0 and 2%. When differences attributable
to DRAM performance are discovered, the advantage seems to flip-flop
unpredictably between SDRAM and Rambus. Amid this confusion and lack
of clarity, the majority of users will tend to gravitate toward the
low cost solution.
Performance Markets
Yet,
there are key market segments that will seek and pay for premium performance
in DRAM (and other aspects of platform performance). For example, the
overwhelming majority of server vendors have chosen DDR SDRAM to enable
the next level of performance. Similarly, 3D graphics chip makers have
selected DDR for their upcoming high end products. The force behind
these decisions has been a combination of cost, performance, capacity
and infrastructure continuity.
However, there are two remaining performance driven market segments
for which a clear transition direction has not yet emerged. To be specific,
workstation users and PC enthusiasts have not yet embarked on a clear
migration toward PC133, DDR or Rambus. Users in these market segments
have demonstrated a reluctant willingness to sacrifice infrastructure
continuity and cost in the name of performance. Currently, they are
seeking the performance advantage that would move them in one direction
or another.
DDR SDRAM
The next generation SDRAM standard, DDR, uses a double data rate clocking
technique to push its peak burst bandwidth to 2.1 GB/s, as compared
to 1 GB/s for PC133 and 1.6 GB/s for Rambus. To the extent that some
assume DRAM bandwidth to be a possible future inhibitor to system performance,
DDR should be able to fill the bill. But most will agree that future
performance potential, or "headroom", is not nearly as important
as a solid and consistent performance advantage that can be demonstrated
on today's processors, platforms and applications.
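The peak figures quoted above follow from simple arithmetic: transfers per second multiplied by bus width. The short sketch below reproduces them. The bus widths and transfer rates are the commonly cited interface parameters and are assumptions here, since they are not stated in this report; the function and table names are purely illustrative.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec: float, bus_width_bits: int) -> float:
    """Return peak burst bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


# Assumed interface parameters (commonly cited figures, not from this report).
INTERFACES = {
    "PC133 SDRAM": (133, 64),  # one transfer per 133 MHz clock, 64-bit bus
    "PC266 DDR": (266, 64),    # two transfers per 133 MHz clock, 64-bit bus
    "PC800 RDRAM": (800, 16),  # 800 MT/s on a single 16-bit Rambus channel
}

PEAK = {
    name: peak_bandwidth_gbs(rate, width)
    for name, (rate, width) in INTERFACES.items()
}

for name, gbs in PEAK.items():
    print(f"{name}: {gbs:.2f} GB/s")
```

Under these assumptions the results round to the 1 GB/s, 2.1 GB/s and 1.6 GB/s figures quoted for PC133, DDR and Rambus. These are burst peaks only and say nothing about access latency.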
To clarify the point, "headroom" is marketing gobbledygook
for DRAM bandwidth that significantly exceeds the practical demand of
current processors. This is not necessarily synonymous with an immediate
performance advantage. When it comes to DRAM, immediate performance
advantages are usually derived through optimized DRAM access latency.
With the anticipation
that DDR SDRAM might potentially offer both (increased headroom and
an immediate performance advantage), the market has been keenly awaiting
DDR platforms for a thorough performance evaluation.
Micron Technology
For weeks, it has been rumored and reported that Micron Technology has
been developing a high performance DDR based north bridge chip for use
with Intel's PIII class processors. Such a product would seem well suited
to a broad range of platforms and market segments including workstations,
small workgroup servers, application servers and the high-end enthusiast
PC market.
Micron's commitment to this development effort seems fueled by its interest
in enabling DDR SDRAM in the market. As such, this north bridge IC will
probably find its way to market through a number of different yet complementary
avenues. This will begin to unfold before the end of 1999.
Platform Configurations
Micron supplied InQuest with an early development board based on first
silicon of the north bridge IC and first silicon Micron DDR SDRAMs.
Though the silicon is still very fresh, the system was rock solid at
266MHz with three of the four memory slots occupied with buffered 64MB
DIMMs.
Based on industry specifications, it should be possible to manufacture
unbuffered systems with two or three DIMM slots. Buffered DIMMs will
be required for 266MHz operation using 4 DIMMs. Hitachi is currently
manufacturing 256Mbit DDR SDRAMs that will enable single DIMM capacities
of 512Mbytes and stacked DIMMs of 1Gbyte each.
Intel supplied
InQuest with a pre-production 3 RIMM version of its Vancouver motherboard
and PIII-733 processor. Samsung offered two double sided 256Mbyte 800MHz
RIMMs, each containing 16 128Mbit RDRAMs. The 820+DRDRAM platform has
nearly reached production worthiness, and we expect that 820 based systems
should become generally available in Q1 of 2000, more than 14 months
and perhaps six revisions after its first silicon.
CPU-DRAM Performance
To evaluate bandwidth headroom to the processor, we chose the well-known
StreamD benchmark released by the University of Virginia. It is a popular
cross platform benchmark that evaluates effective bandwidth from DRAM
to the processor. StreamD is the most reproducible and precise benchmark
of its kind. Its margin of error is regularly under 1%.
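For readers unfamiliar with the Stream family, the sketch below shows the four kernels as they are defined in the standard STREAM benchmark (Copy, Scale, Add, Triad) and how a bandwidth figure is derived from the bytes each kernel moves. It is a minimal illustration only: it assumes 8-byte elements, it does not model the Copy32 and Copy64 variants reported by StreamD, and the function names are hypothetical. A Python loop measures interpreter overhead rather than DRAM, so this shows the accounting, not a usable benchmark.

```python
# Bytes moved per element: (arrays read + arrays written) * 8-byte doubles.
BYTES_PER_ELEMENT = {"copy": 16, "scale": 16, "add": 24, "triad": 24}


def run_kernel(name, a, b, c, q=3.0):
    """Apply one STREAM-style kernel in place to equal-length lists a, b, c."""
    n = len(a)
    if name == "copy":       # c = a
        for i in range(n):
            c[i] = a[i]
    elif name == "scale":    # b = q * c
        for i in range(n):
            b[i] = q * c[i]
    elif name == "add":      # c = a + b
        for i in range(n):
            c[i] = a[i] + b[i]
    elif name == "triad":    # a = b + q * c
        for i in range(n):
            a[i] = b[i] + q * c[i]
    else:
        raise ValueError(f"unknown kernel: {name}")


def bandwidth_mb_s(name, n_elements, seconds):
    """Effective bandwidth in MB/s for one timed pass of the named kernel."""
    return BYTES_PER_ELEMENT[name] * n_elements / seconds / 1e6
```

A real run would time each kernel over arrays much larger than the processor caches and report the best of several passes through `bandwidth_mb_s`.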
The results here
are remarkably decisive. DDR beats the 820+Rambus by a significant margin
in all of the tests, exceeding 30% in some cases and averaging to a
24.4% performance advantage for this benchmark.
It is interesting
to note DDR's significant performance advantage of 20% and 34% in the
Copy32 and Copy64 functions. It is popularly believed that one of the weaknesses
of SDRAM (including DDR) is its longer bus turnaround latency. As such,
it is somewhat surprising that Rambus does not produce a more competitive
score on these tests. Also, when comparing platform bandwidth figures
for Copy32 vs. Copy64, one would expect that Copy64 data rates should
be equal to or higher than the Copy32 figure. This is true for DDR,
but in the 820 platform, Copy64 is actually 10% slower than Copy32 performance.
Recently, a Windows version of the Stream benchmark has been developed,
known as WSTREAM.EXE. The precision and consistency of this test are
not nearly as high as StreamD. It regularly suffers from a compound
precision error rate of up to 30%. In addition, this program is documented
by its developer as being inaccurate under Windows NT Server 4.0. Though
I have less confidence in these numbers, I include them here for completeness.
Scores were recorded
for 10 benchmark runs on each platform with 200 iterations. The benchmark
was launched several times after a clean Win98 boot, then again after
loading and unloading numerous large applications. The widely varying
results from all of these tests were averaged to generate the figures
in the chart above.
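The averaging procedure described here can be sketched as follows. This is a minimal illustration, assuming a simple arithmetic mean over all recorded runs and a min-to-max spread as the measure of variation; the report does not state exactly how its spread was computed, and the helper names are hypothetical.

```python
from statistics import mean


def summarize_runs(scores):
    """Return (mean, spread) where spread is (max - min) as a % of the mean."""
    if not scores:
        raise ValueError("at least one score is required")
    avg = mean(scores)
    spread_pct = (max(scores) - min(scores)) / avg * 100
    return avg, spread_pct


def advantage_pct(score, baseline):
    """Percentage by which score exceeds baseline (negative if it trails)."""
    return (score / baseline - 1) * 100


# Placeholder values for illustration only, NOT measurements from this report.
hypothetical_runs = [100.0, 110.0, 90.0]
avg, spread = summarize_runs(hypothetical_runs)
print(f"mean {avg:.1f}, spread {spread:.0f}% of mean")
```

When the spread is a large fraction of the mean, as it is with WSTREAM.EXE, a small difference between two platform averages carries little weight.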
In this benchmark DDR exceeds the performance of the 820 platform by
an average of 2.7%. The only test where DDR falls behind (by 0.5%) is
in the TRIAD test. It is probably no coincidence that Intel refers
only to the TRIAD test when making reference to this benchmark in its
presentation material at IDF and in its white papers. (Also, for an
unknown reason Intel refers to the WSTREAM.EXE benchmark as StreamNT.)
Even though the platform performance differences seem muted in this
version of the benchmark, it is still clear that DDR pulls ahead, showing
particular strength again in the copy function.
Intel Platform Tests
Intel's Platform Test program includes two bandwidth evaluation programs.
The first is a concurrent CPU/AGP/DRAM bandwidth test, and the second
is an AGP bandwidth stress test.
The first test has proven to be unreliable in the past, but Intel has
released an updated version (v1.2). Versions 1.0 and 1.1 report results
that are clearly in error. Though this is not an exhaustive evaluation,
it seems from the results gathered here that the test may no longer
report results that are mathematically impossible.
The Platform Bandwidth test reports results that are essentially the
same for the DDR platform and the 512Mbyte Camino configuration. It
is interesting to observe, however, that the Camino score improves by
a very reproducible 2% after reducing its configuration to 256Mbytes
of RDRAM. More on this later.
At the Fall '99
IDF, Intel offered its effective bandwidth analysis for DDR and Rambus
as shown on the left portion of the chart above. If we use Intel's Platform
Test results as an indication, Intel's estimates may be in need of serious
revision. Micron's PC266 DDR outperformed Intel's estimates for 200MHz
DDR by a whopping 58%, while the 820 under-performed by 15%. For clarification,
on the right half of the diagram above, InQuest offers an enhanced version
of Intel's chart reflecting the actual test results reported in this
document using Intel's benchmarks.
AGP Bandwidth Analysis
The second part of Intel's Platform Test program evaluates AGP to DRAM
bandwidth by saturating the AGP bus with Execute Mode texturing activity.
Initially, this test seemed uninteresting because it demonstrated no
significant performance differentiation between the various platforms
available at the time of its original release. Essentially, all AGP2x
systems scored about 30fps, while all AGP4x systems scored about 40fps.
Micron's DDR platform adds a bit of spice to the situation. This benchmark
reveals some very interesting performance differences between DDR and
Rambus, with an interesting twist in performance based on RDRAM capacity.

Here it can be seen
that the DDR platform significantly outperforms Camino+Rambus. For a
long time, it has been widely presumed that Intel's implementation of
AGP and its data path to DRAM is better than any other chip set or architecture
available. In this case DDR outperforms the 820 with 256MB of RDRAM
by 13.1%, and outperforms the 512MB configuration by 19.8%.
A side note
- here again is evidence that Camino performance diminishes when configured
with 512Mbytes, as compared to 256Mbytes of RDRAM. In this case the
loss is 5.6%.
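The 5.6% figure can be cross-checked against the two DDR margins quoted above. The sketch below assumes that "outperforms by X%" means the DDR score equals (1 + X/100) times the other platform's score; the function name is illustrative.

```python
def implied_loss_pct(adv_over_a_pct, adv_over_b_pct):
    """Return how much slower config B is than config A, in percent.

    Both arguments are the reference platform's advantage over A and B,
    with the reference score normalized to 1.0.
    """
    score_a = 1 / (1 + adv_over_a_pct / 100)
    score_b = 1 / (1 + adv_over_b_pct / 100)
    return (1 - score_b / score_a) * 100


# DDR leads the 256MB RDRAM configuration by 13.1% and the 512MB one by 19.8%.
loss = implied_loss_pct(13.1, 19.8)
print(f"512MB vs 256MB RDRAM: {loss:.1f}% slower")
```

Under that assumption the two DDR margins imply the same loss of roughly 5.6% for the 512MB configuration, so the three figures are mutually consistent.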
Next, in order to ensure that the huge DDR advantage demonstrated in
the test above was not a fluke, I also ran tests using Intel's IBASES
v1.5. This program evaluates execute mode frame rates as it copies multiple
textures per frame via AGP to the display buffer. Ranging from one texture
per frame all the way up to 256 textures per frame, the DDR platform
delivers a solid performance advantage at every step as AGP texturing
demand is increased. The median improvement that DDR offers over RDRAM
is 11.6%.
This
solidly substantiates the results observed using the AGP portion of Intel's
Platform Tests. It can be stated with certainty that Micron's DDR platform
offers significantly better AGP to DRAM bandwidth than Camino with Rambus.
Other Observations: DDR vs. Rambus
An application benchmark analysis of DDR vs. RDRAM will be forthcoming.
A quick pass on both platforms with several function specific benchmarks
such as CPUmark99, 3Dmark Max, Intel Media Benchmark and games such as
Expendable has revealed a very small but consistently measurable performance
advantage for DDR over RDRAM. This work will be conducted more exhaustively
in the near future.
Considering
the early state of this DDR development and validation platform, there
are several pending optimizations that could further expand its performance
lead. Foremost on the list is the use of unbuffered DIMMs rather than
the buffered DIMMs used in these tests. This will have the effect of removing
one clock of latency from the DRAM subsystem. It is reasonable to expect
many benchmarks to show an immediate benefit from such a change.
Also, the use of larger DDR SDRAM configurations will have a positive
impact on system performance for application benchmarks (as compared to
the 128M and 192M configurations used in this exercise).
In the same vein, it has been interesting to observe that enlarging RDRAM
configurations on the 820 platform seems to have the opposite effect.
This is very likely due to the limitation in the 820 that prevents it
from maintaining more than 16 memory chips in the 'on' state. When more
than 16 RDRAMs are used in a system, a performance penalty can arise due
to power management as demonstrated in these benchmarks. This may be a
difficult problem to circumvent in the performance sensitive markets.
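The device arithmetic behind this observation is easy to verify from the RIMM description given earlier (256Mbyte modules built from 128Mbit RDRAMs). The sketch below is illustrative only; the 16-device limit is taken from the text above and the function names are hypothetical.

```python
MAX_ACTIVE_DEVICES = 16  # limit attributed to the 820 in the text above


def rdram_device_count(total_mbytes: int, device_mbits: int = 128) -> int:
    """Number of RDRAM devices needed to build the given capacity."""
    return total_mbytes * 8 // device_mbits


def devices_over_limit(total_mbytes: int, device_mbits: int = 128) -> int:
    """Devices that cannot be kept in the 'on' state at the same time."""
    count = rdram_device_count(total_mbytes, device_mbits)
    return max(0, count - MAX_ACTIVE_DEVICES)


for mbytes in (256, 512):
    print(mbytes, "MB:", rdram_device_count(mbytes), "devices,",
          devices_over_limit(mbytes), "over the limit")
```

A single 256Mbyte RIMM sits exactly at the limit with 16 devices, while the 512Mbyte configuration holds 32, so half of its devices cannot be kept on at any one time.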
Summary Analysis
The 10-30% performance advantage for DDR over Rambus demonstrated in the
benchmark scores above is truly astounding. This performance advantage
will immediately be appreciated in server platforms, in high-end graphics
and in other high performance systems.
Even the low
end PC may stand to benefit very soon from DDR technology as UMA (shared
memory) platforms appear in the year 2000. Indeed, DDR may be the catalyst
that enables UMA to grow from the bottom of the market to the mainstream
beginning in the second half of 2000.
Regardless of the application or target platform, the single key element
in the DRAM performance equation is to deliver superior memory performance
without adding significantly to system cost. DDR may be the technology
to deliver this combination to the mainstream.
Bert McComas
can be reached by e-mail at mccomas@inqst.com
or by telephone at 480-813-7785.