
HP abandons Itanium2 in workstations -- NUMA/HT v. NUMA/Infiniband v. FSB/AGTL+



NOTE:  You'all be sure to pick up the 2004 November issue of Sys Admin
if you're interested in Opteron (hint, hint ;-).


From: dsavage@peaknet.net
> Last Friday InfoWorld on-line and ZDNet each ran stories about HP halting
> production of their Itanium-based workstations. Their decision is hardly
> surprising, given the low sales of HP zx2000 and zx6000 workstations.
> The real story here is that HP and Intel co-designed the Itanium chip to
> be binary compatible with HP's Palo Alto (PA-)RISC processor and to
> provide an evolutionary pathway for all those legacy HP systems. With HP's
> departure from the Itanium workstation market, that chip's only future
> lies in SMP servers and clustered arrays -- hardly a mass market. And many
> existing PA-RISC customers must now begin planning for end-of-life
> conversions.

Yep.  "Itanic" was a computer science ideal that flopped in silicon. 
Forget IA-32 (x86) compatibility, it bombed at its own instruction set
architecture (ISA) with EPIC and Predication.  Run-time optimizations
like out-of-order execution, register renaming and solid branch
prediction, combined with mitigating the impact when a branch mispredict
occurs, are not optional.  IA-64 is being so modified as of this moment.

Even more ironic was that using former Digital FX!32 binary translation
technology (from Alpha), IA-64 executes IA-32 instructions faster than
using its silicon compatibility.

The only thing "Itanic" offers is the Infiniband interconnect.  But who
is going to pay for it?  Sure, it's better than the current GTL+ used in
IA-32, even IA-32e (aka EM64T) which isn't really 64-bit from the
interconnect standpoint (see below).  But what does it offer over that
"other interconnect" -- ya know, Digital's former Lightspeed Data
Transport, now adopted by virtually _everyone_ outside of Intel as AMD
HyperTransport?

Because Intel put all its eggs in its EPIC-Predication basket, they have
_yet_ to redesign a new, full IA-32 core since 1994 in the Pentium Pro. 
Pentium 4 was an 18-month "extend the pipes forever so it scales beyond
1.5GHz, add more extensions beyond Pentium 3" retrofit.  IA-32e is a
half-baked attempt given the continued reliance on AGTL+ interconnect
with the "front side bottleneck" (FSB).

The only hope for Intel is to now accelerate the convergence of
Infiniband as an Interconnect for IA-32e.  Until then, I consider IA-32e
to _still_ be a 32-bit/4GB platform.

> Craig Barrett's imminent departure as Intel's CEO probably has more to do
> with his approaching retirement age than the Itanium debacle, but look for
> a major shakeup in their product line when a new CEO takes the helm. AMD's
> Athlon processors now power more than half of all PCs sold. Their Opteron
> and Athlon64 x86-64 processors are headed toward dominance in the
> workstation and server markets, despite eleventh hour efforts by Intel to
> compete with Xeon processors with 64-bit extensions grafted on.

Which still have the AGTL+ interconnect as an issue.

> No amount of collusion with Microsoft to delay the release of 64-bit Windows
> will forestall the inevitable.

Actually, Microsoft's "64-bit problem" is the same one they had back in
the NT 3.1 days.  Win32 code is just as "inportable" to Win64, as Win16
was before it.  And although .NET was supposed to solve this problem for
transitioning to Win64, just like Win32"s" (remember that one?) was for
Win16, the end result is that XP 64-bit Edition is nothing more than NT
3.1 all over again.

Yeah, that's WoW -- Windows on Windows, now only Win32 on Win64, just
like NT 3.1 had Win16 on Win32.  All it does is run the AMD in "64-bit
Long Mode," while _all_ major libraries and applications are 32-bit,
doing WoW calls.  If you want to run _true_ 64-bit applications, it is
up to the software vendor to include the _required_ 64-bit libraries.

Microsoft "passing the buck," as usual.

GNU/POSIX, on the other hand, has been largely 64-bit "clean" (let alone
far less "endian ignorant") for a long time.  As such, accommodating AMD
Long Mode with lib[32] and lib64 co-existing is almost second
nature.  There are a few issues, but it's why 

Anyone who has seen even the gaming benchmarks of XP 64-bit with UT2004
64-bit v. the same on Linux/x86-64 has to laugh.  The former is slower,
the latter kicks massive ass (up to 40% faster).  And that's just 64-bit
computational performance with no "Pentium deoptimizations."

Once you add in 2 or, better yet, 4-way HyperTransport for I/O, Xeon is
anemic and doesn't scale in comparison.  You have to go Itanic to get
the same as Opteron, and then we're back to the $$$ issue.  Heck, HP's
own Proliant DL585 (Quad-Opteron) _pasted_ its own Proliant DL580
(Quad-Xeon) at _32-bit_ Windows Services (TPC-C, Exchange, SAP DB) even
when using under 4GB of memory -- all for 33% less in price!

Now imagine that with a _true_ NUMA/x86-64 kernel, more than 4GB memory
and _properly_ designed I/O (NIC on one CPU/HT, storage on another
CPU/HT).

> To survive, Intel must go back to the drawing boards and develop an
> Opteron-compatible CPU that's faster and cheaper than AMD's. And they
> must do it by 2006.

That seems to be their plan.

A _full_ x86-64 processor using Itanic's Infiniband, instead of
HyperTransport, including a full I/O MMU (possibly superior?) and other
goodies.  I think "Yamhill" itself was a two-team project -- one to just
"get out" EM64T with 64-bit extensions, and one to _really_ build a true
64-bit "total platform" competitor.

From the superscalar standpoint, Opteron/Athlon64 is not that much
different than 32-bit Athlon.  It's the interconnect that's the key. 
Intel has _nothing_ equivalent in Xeon, and Itanic is far too costly.


From: Robert Citek <rwcitek@alum.calberkeley.org>
> What's the difference between the Opteron and the Athlon64?
> Are they not the same thing?

Slightly modified cores for different interconnect capabilities.  All
cores _are_ capable of dual-DDR memory channels, but packaging 

The two original cores were "Sledgehammer" (Opteron 800) and
"Clawhammer" (all others, Opteron 200, 100 and all Athlon64, FX,
etc...).  The former scales to 8+ processors.  The latter is designed
for 1-2 processors.

The most current 1-2P core is "Newcastle."  Although it is still built
on a 0.13um SOI fabrication feature size, the layout and efficiency
(including power) is drastically improved.  The Opteron 150, 250 and
low-power 144, 244 are based on it.

I can't remember the equivalent codename for the up-to-8P core, but it
resulted in both the Opteron 850 as well as the low-power Opteron 844.

The only thing that saves Intel is that the capital of AMD can't match
Intel's R&D.  New fabs cost billions of dollars, so AMD has very
limited resources.  AMD's flagship fab continues to be only Dresden, and
they have 3 others of trailing edge fabrication capability in Israel and
the US -- although a new one is coming on-line somewhere in the US I
believe.  Intel has 17+ fabs in Malaysia, and a half-dozen of those are
well beyond anything AMD has even coming on-line.  It's all about $$$.

AMD previously tried to have Taiwanese UMC pick up their slack, but UMC
is definitely only providing trailing edge fabrication.  In my own
dealings with Taiwanese companies like TSMC, they'll ship you a completely
useless set of packaged ICs.  At one point, we told them to stop
manufacturing and send us the raw wafers.  Upon inspection along with
software simulations, we discovered not only their lithography process
was flawed for our chip, but their SRAM design we contracted them for
was also non-working.

IBM is AMD's new partner.  IBM is still the leader in new fabrication
techniques, even over Intel.  But IBM's Power ISA initiatives will
question their dedication to AMD in 2005+.


From: Louis Elrod <louis.elrod@bluecurveconsulting.com>
> They are actually fairly similar but yet different. The overall
> architecture is the same but the Opteron is designed to work with
> ECC/Registered memory and SMP configurations where as the Athlon 64 is
> designed to be a next generation desktop processor.

Not exactly.  That's a packaging consideration, not a processor one.  It
has nothing to do with the chips themselves.  So let's reduce it to
"platform."

Socket-940:  1-8 processors, 2 Registered*1* DDR + 1-3 HT*2*/processor
Socket-939:  1 processor, 2 DDR + 1 HT/processor
Socket-754:  1 processor, 1 DDR + 1 HT/processor

NOTES:

*1* A common misconception is that ECC is required for Socket-940. 
_Only_ registered memory is required (although most are typically ECC
too).

*2* AFAIK, the Opteron 100 has 1 HyperTransport channel, Opteron 200 has
2 and the Opteron 800 has 3.  I've seen enthusiast sites report this as
1-3-4, instead of 1-2-3.  I think it might have to do with the
interpretation of 2-8P channels, because I consider 1 internal for
multiprocessing (and only list the external channels that actually
connect to other CPUs or HyperTransport I/O peripheral chips).

BTW, there is no longer anything such as "front side bus" in AMD x86-64.

The DDR channels have their own, dedicated clock from the CPU.  You can
use anything from DDR200 (PC1600) to DDR400 (PC3200) without affecting
any other clock in the system.  BTW, JEDEC recommendations on maximum
banks (DIMMS) of DDR Synchronous DRAM are as follows:

                            Unregistered          Registered
100MHz DDR (200/PC1600)     6 banks (3 DIMMs)     12 banks (6 DIMMs)
133MHz DDR (266/PC2100)     4 banks (2 DIMMs)      8 banks (4 DIMMs)
166MHz DDR (333/PC2700)     4 banks (2 DIMMs)      8 banks (4 DIMMs)
200MHz DDR (400/PC3200)     2 banks (1 DIMM)       4 banks (2 DIMMs)
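
The DDR200-DDR400 and PC1600-PC3200 names in the table above follow from
the memory clock.  A minimal sketch of that arithmetic, assuming a
standard 64-bit (8-byte) DIMM data path and two transfers per clock.  The
function names are made up for illustration, and the PC ratings are the
rounded marketing figures, so the middle two rows only match
approximately:

```python
from fractions import Fraction

BYTES_PER_TRANSFER = 8  # assumption: 64-bit DIMM data path


def ddr_rating(clock_mhz):
    """Transfers per second in MT/s: DDR moves data on both clock edges."""
    return clock_mhz * 2


def peak_mbps(clock_mhz):
    """Peak transfer rate of one DDR channel in MB/s."""
    return ddr_rating(clock_mhz) * BYTES_PER_TRANSFER


# The nominal "133MHz" and "166MHz" clocks are really 400/3 and 500/3 MHz.
clocks = [("100MHz", Fraction(100)), ("133MHz", Fraction(400, 3)),
          ("166MHz", Fraction(500, 3)), ("200MHz", Fraction(200))]

for label, clock in clocks:
    # int() truncates, which is how the DDR266/DDR333 names are formed.
    print(label, "DDR%d" % int(ddr_rating(clock)),
          "about %d MB/s" % round(peak_mbps(clock)))
```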

Recommended signaling on the HyperTransport links is as follows:  

  Socket-940:    400MHz DDR @ 32+32-bit = 6.4GBps DTR (x 1-3 _per_ CPU)
  Socket-939:    500MHz DDR @ 32+32-bit = 8.0GBps DTR (x 1 on 1 CPU)
  Socket-754:    400MHz DDR @ 32+32-bit = 6.4GBps DTR (x 1 on 1 CPU)

Now that's the maximum DTR for each HT link.  Only Socket-940 has
multiple links per CPU, and multiple CPUs, and only with Opteron 200/800.
You will typically only get that rate between CPUs with Opteron 200/800.
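
The per-link figures above can be reproduced with a few lines.  This is
only a sketch under my reading of the notation: "DDR" means two
transfers per clock, "32+32-bit" means 32 bits in each direction, and
the DTR is the sum of both directions.  The function name is
hypothetical:

```python
def ht_link_gbps(clock_mhz, bits_each_way):
    """Aggregate (both directions) rate of one HT link in GB/s.

    Assumes two transfers per clock and a link that is
    bits_each_way wide in each direction.
    """
    transfers_per_sec = clock_mhz * 1_000_000 * 2
    bytes_each_way = bits_each_way // 8
    one_direction = transfers_per_sec * bytes_each_way
    return 2 * one_direction / 1e9


for socket, clock in [("Socket-940", 400), ("Socket-939", 500),
                      ("Socket-754", 400)]:
    print(socket, ht_link_gbps(clock, 32), "GBps per link")
```

The same function applies to narrower or slower links; only the clock
and width arguments change.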

When you see 600, 800 or 1,000MHz "Front Side Bus" thrown around for AMD
x86-64 processors, they are _only_ talking about the clock of the
HyperTransport interface.  That's _only_ for I/O or, in the case of
Opteron 200/800, inter-CPU communication.  That's not an accurate
performance rating of the interconnect, and it doesn't take into account
the fact that there are _multiple_ "front side buses": not only to memory,
but also _multiple_ HyperTransport channels in Opteron 200/800.  Which
brings me to my next point.

I mean, who cares if you have a 256-bit "wide" Dual or Quad-Xeon "front
side bus" to memory and the northbridge?  Even AGTL+ @ an 800MHz (Quad
200MHz) FSB x 256-bit resulting in 25.6GBps is the _total_ for the
_entire_ platform, not to mention that _all_ CPU, memory and I/O contend
for the _same_ "bus" (unless it is a costly, proprietary NUMA design,
which HP and a few others do offer for mega-$$$).
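
To put a number on the shared-bus point, here is a rough sketch that
takes the 800MHz x 256-bit figure above at face value and splits it
evenly between CPUs.  The even split is my simplifying assumption (real
contention also includes memory and I/O traffic, so the share per CPU
would be lower still), and the function names are hypothetical:

```python
def fsb_total_gbps(effective_mhz, width_bits):
    """Peak rate of one shared bus in GB/s."""
    bytes_per_transfer = width_bits // 8
    return effective_mhz * 1_000_000 * bytes_per_transfer / 1e9


def per_cpu_share_gbps(total_gbps, cpus):
    """Even split of a shared bus; ignores memory and I/O traffic on it."""
    return total_gbps / cpus


total = fsb_total_gbps(800, 256)
for cpus in (1, 2, 4):
    print("%d CPU(s): %.1f GBps each of %.1f GBps total"
          % (cpus, per_cpu_share_gbps(total, cpus), total))
```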

Using Opteron 800, I'm getting 2@3.2GBps+3@6.4GBps = 25.6GBps on _each_
CPU.  Using AMD's Quad-Opteron 800 reference design, which is
implemented in the HP Proliant DL585, that's a _true_ 8@6.4GBps between
CPUs, 4@6.4GBps directly to memory, and 2-4@2.4GBps (see below) to the
AMD8131/8132 PCI-X channels.  You're basically talking over 50GBps of
aggregate, actual "system" interconnect, and 5-10GBps of aggregate
"peripheral" interconnect to I/O and, optionally, graphics.

AMD's own AMD8151 (AGP3.0) and AMD8131/8132 (PCI-X1.0/2.0)
HyperTransport "tunnel chips" run at only 300MHz DDR @ 16+16-bit HT,
resulting in an aggregate of 2.4GBps DTR.  That's more than enough for
any peripheral.  AMD's AMD8111 (PCI/Legacy PC) HyperTransport "bridge
chip" runs at 150-300MHz DDR @ 8+8-bit HT, resulting in an aggregate of
0.6-1.2GBps.

Most non-AMD HyperTransport tunnel/bridge chips use similar rates.  They run
at 300-500MHz DDR @ 16+16 or 32+32 HT at the graphics, PCI-X or
PCI-Express "northbridge" (sans memory in x86-64), and then a 150-250MHz
@ 8+8 HT to the Legacy PCI, PC and optional PCI-Express x1 channels in
the "southbridge."  In fact, using 250MHz@8+8-bit (1GBps) HyperTransport
in the north-south link has been around for almost 3 years, even in
chipsets for Intel processors -- e.g., SiS MuTIOL and nVidia MCP.  It
replaces the old 32/64-bit@33/66MHz PCI link (0.125-0.5GBps) used in
Intel PIIX/ICH2 (ICH4+ use PCI-Express x4 for 1GBps), ViA V.Link, etc...

For Opteron 200 systems, there is _nothing_ stopping mainboard
manufacturers from attaching an nVidia AGP/PCI-Express "chipset" to one
Opteron 200's free (for I/O) HyperTransport channel (the other link
being to the other CPU), and then an AMD8131/8132 PCI-X1.0/2.0
HyperTransport "tunnel" to another Opteron 200's free HyperTransport
channel.  In fact, IWill has done this now -- resulting in the "best of
all worlds" in I/O.

Sun Microsystems takes it even farther in their new 1100/2100z
workstations.  They put as much peripheral interconnect for I/O in the
2-way 2100z Opteron 200 as AMD does in HP's 4-way DL585 Opteron 800
(AMD's reference 4-way design).  Sun even has a "segmentable" board,
where you can upgrade the bottom half from AGP/PCI-X to PCI-Express at a
later date (the top portion still has its own PCI-X/PCI).


From: Steven Pritchard <steve@silug.org>
> Almost.  The short answer is the Opteron is a server (and high-end
> workstation) processor, and the Athlon64 is a consumer processor.  The
> long answer is that the Opterons and Athlon64s use the same core
> (64-bit, but with full 32-bit compatibility[*]).  The differences
> between the Opterons and the various Athlon64s are the socket type
> used (940-pin, 939-pin, or 754-pin), memory compatibility, and cache
> size.  The various Opteron models are numbered based on their SMP

FYI, sorry to be anal, but the term "SMP" really hasn't been accurate
since the Athlon MP.  SMP is limited to Intel GTL (P), GTL+
(PPro-P3/Xeon), AGTL+ (P4-5/Xeon).  It connotes a symmetric, shared
bus with absolutely no processor affinity of any kind.  If you think I'm
being overly anal, just ask HP, IBM and other high-end PC OEMs if they
like their own, proprietary NUMA implementations of Xeon being called
SMP.  ;-ppp

Digital EV6 (Slot-B Alpha 264, Socket-462 Athlon MP) is cross-bar MP.
AMD NUMA/HyperTransport (Socket-940 Opteron) is point-to-point NUMA MP.

Athlon MP emulates SMP.  But there are mainboards with BIOSes that, when
combined with Linux/Athlon, are not actually SMP.  That's because the
underlying EV6 architecture is a 40-bit platform (remember, it can
support 64-bit Alpha 264 ;-).  SMP is completely inaccurate as of the
48-bit Opteron platform, even when running a 32-bit OS that is
completely ignorant of processor affinity for anything.  It's still
used.

Although Opteron can "appear" to be SMP, even in a 32-bit OS with no
program or I/O processor affinity, it is definitely not SMP.  
I/O affinity, beyond traditional NUMA implementations with processor
affinity for programs, thanx to Linux/x86-64's full and native support
for the x86-64's I/O MMU, is why, technically speaking, Opteron makes
Xeon "its bitch."

Intel markets AGTL+ as a 50-bit platform.  It's a 32-bit platform with
36-bit processor address extensions (PAE) and some 50-bit capabilities. 
But make no mistake, you pass 32-bit/4GB, even with EM64T processors
running Linux/x86-64, and you are definitely _not_ getting
hardware-level addressing.  It's wholly unsafe to do so.  There is no
I/O MMU in EM64T, which is why Linux/x86-64 modified for EM64T does
anything beyond 4GB in _software_.

So, again, consider even EM64T Xeon to be a 32-bit/4GB platform.
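
For scale, here is what the address widths mentioned in this post work
out to.  A quick sketch; the widths are simply the ones quoted above
(32-bit, 36-bit PAE, 40-bit EV6, 48-bit Opteron) and the helper names
are hypothetical:

```python
def addressable(bits):
    """Bytes reachable with an address of the given width."""
    return 2 ** bits


def human(n):
    """Format a byte count using binary (1024-based) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n < 1024:
            return "%d%s" % (n, unit)
        n //= 1024
    return "%dEB" % n


for bits in (32, 36, 40, 48):
    print("%d-bit: %s" % (bits, human(addressable(bits))))
```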

> Throw the Socket 754 Semprons in the mix, and I get really confused.  :-)

CHEAT:  Socket-754 Sempron is just a remarked Socket-754 Newcastle with
cache and instructions disabled.  Socket-462 Sempron is just a remarked
Athlon Model 10 (Barton w/256KB L2) period.

Both have models set to P4/P5-_Celeron_, instead of the P4/P5.

> Oh, and some of that table came from here:
>   http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_9485_9487^10248,00.html
> [*] By "64-bit, but with full 32-bit compatibility", what I meant was
>     that these processors all will run in either 32-bit mode (I have a
>     customer running RH9/i386 on some Opterons), "mixed" 32/64-bit
>     mode (64-bit kernel, 32-bit userspace), or full 64-bit mode
>     (64-bit kernel, 64-bit userspace).  For reasons I won't go into
>     right now, running 64-bit all the way is best, but 32-bit runs
>     just fine (and quite quickly, faster than similarly clocked Athlon
>     XPs).

Of course, because of the _removal_ of the "Front Side Bottleneck."


-- 
Bryan J. Smith                                  b.j.smith@ieee.org 
------------------------------------------------------------------ 
"Communities don't have rights. Only individuals in the community
 have rights. ... That idea of community rights is firmly rooted
 in the 'Communist Manifesto.'" -- Michael Badnarik


