
IBM PowerPC970 changes everything (and FSB means _nothing_ outside of Intel now) -- WAS: I hate Linux



On Sat, 2004-01-31 at 16:39, Mike Connor wrote:
> I noticed the Super Computer status as well-- those supposedly have a 1GHz 
> Front Side Bus compared to a max of 533MHz from Intel and probably the 
> fastest FPU out there-- they should crunch and move the result very well.  
> But how often do you run a mostly floating point application? (games will hand 
> that off to the video card)  I'll go read about the results before I say 
> any more though.

First off, FSB means _nothing_ when you move away from traditional Intel
design.  Intel continues to interconnect _everything_ at the
"northbridge" with a "synchronized" clock.  If one node needs to run
slower, _everything_ runs slower.

SIDE NOTE:  Intel _does_ use a 200MHz quad-pumped = 800MHz effective FSB
on some single-processor Socket-478 P4 systems, but it does _not_ on its
multi-processor Socket-603/604 Xeon processors which still use only a
133MHz quad-pumped = 533MHz effective FSB.
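
To make the "quad-pumped" arithmetic concrete, here is a minimal Python
sketch.  The function name is mine, and I am assuming the nominal
"133MHz" clock is really 133.33MHz (400/3), which is why the marketing
number comes out as 533 rather than 532.

```python
def effective_fsb_mhz(base_clock_mhz, transfers_per_clock=4):
    """Effective FSB rate = base clock x transfers per clock (4 = quad-pumped)."""
    return base_clock_mhz * transfers_per_clock

# Socket-478 P4: 200MHz base clock, quad-pumped
print(round(effective_fsb_mhz(200)))
# Socket-603/604 Xeon: nominal 133MHz base clock (assumed to be 400/3 MHz)
print(round(effective_fsb_mhz(400 / 3)))
```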

AMD Socket-754 (Athlon64) and Socket-939/940 (AthlonFX/Opteron) as well
as IBM PowerPC970/Apple G5 use _HyperTransport_ and, optionally,
Non-Uniform Memory Architecture (NUMA).  That means all inter-chip and
I/O communication is over the HyperTransport bus.  For NUMA, all memory
is _local_ to the CPU (or GPU, if directly attached to the
HyperTransport interconnect).  Non-NUMA designs connect the memory to
the HyperTransport bus instead.

HyperTransport is a variable clock (up to 800MHz DDR = 1.6GHz effective),
variable width (1-16 bit), _bi-directional_ (meaning bits in
_each_direction_, so 2x width) link.  At 800MHz DDR (1.6GHz effective),
16-bit bi-directional (32-bit effective width), HyperTransport provides
up to 6.4GBps of Data Transfer Rate (DTR).
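
The 6.4GBps figure falls straight out of clock x DDR x width x
directions.  A rough Python sketch of that arithmetic follows; the
function and parameter names are my own, and I am assuming 1GB = 10^9
bytes and that the quoted DTR counts both directions added together.

```python
def ht_dtr_gbps(clock_mhz, width_bits_per_direction=16,
                ddr=True, both_directions=True):
    """Peak HyperTransport DTR in GBps (1GB = 10^9 bytes, assumed)."""
    transfers_per_sec = clock_mhz * 1e6 * (2 if ddr else 1)  # DDR = 2 per clock
    bytes_per_transfer = width_bits_per_direction / 8
    directions = 2 if both_directions else 1                 # one link each way
    return transfers_per_sec * bytes_per_transfer * directions / 1e9

print(ht_dtr_gbps(800))                          # 800MHz DDR, 16-bit each way
print(ht_dtr_gbps(800, both_directions=False))   # one direction only
```

Narrower or slower links scale linearly, so the same function covers
any other clock/width combination.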

In the case of Socket-754 and Socket-939/940 (NOTE:  Socket-940 supports
Socket-939 processors), they use NUMA, meaning the memory and
HyperTransport interconnects are _separate_.  The cool thing about this
is that the CPU and HyperTransport do not "slow down" if slower
memory is used.  The HyperTransport clock does not have to be timed with
the CPU.

In the case of Apple G5, they _are_ timing their HyperTransport with the
CPU.  I believe it is as follows:  
  1.6GHz CPU = 400MHz DDR (800MHz effective) HyperTransport = 3.2GBps
  1.8GHz CPU = 450MHz DDR (900MHz effective) HyperTransport = 3.6GBps
  2.0GHz CPU = 500MHz DDR (1GHz effective) HyperTransport = 4.0GBps
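
The pattern in those three lines is just a 4:1 CPU-to-bus clock ratio,
DDR, and a 32-bit effective width.  Here is a small Python sketch that
reproduces the table under exactly those assumptions; the ratio and
width are inferred from my numbers above, not from an Apple or IBM
spec, and the function name is made up.

```python
def g5_bus(cpu_mhz, cpu_to_bus_ratio=4, effective_width_bits=32):
    """Return (base MHz, effective MHz, GBps) for a CPU-synced DDR bus.

    Assumes a 4:1 CPU:bus ratio and 32-bit effective width (inferred).
    """
    base_mhz = cpu_mhz / cpu_to_bus_ratio
    effective_mhz = base_mhz * 2                      # DDR: 2 transfers per clock
    gbps = effective_mhz * 1e6 * effective_width_bits / 8 / 1e9
    return base_mhz, effective_mhz, gbps

for cpu_mhz in (1600, 1800, 2000):
    base, eff, gbps = g5_bus(cpu_mhz)
    print(f"{cpu_mhz / 1000:.1f}GHz CPU = {base:.0f}MHz DDR "
          f"({eff:.0f}MHz effective) = {gbps:.1f}GBps")
```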

Apple markets FSB on the G5 not because it is necessarily important, but
because the "clock consumer" (consumers who are all about GHz and don't
understand system design) wants to hear it.  I also think Apple is not
choosing NUMA for the G5, and is instead putting the memory controller
in its CPU-to-HyperTransport-to-I/O tunnel.  This is probably why the
HyperTransport is timed in sync with the CPU, which is _not_ normal with
HyperTransport.

In a HyperTransport system, the HyperTransport clock is supposed to be
_fixed_ and _independent_ of the CPU and memory clocks.

On PCs, it varies, because the memory is direct to each CPU, so both
have their dedicated HyperTransport clocks.  No matter what speed
Athlon64/FX/Opteron you drop in, the HyperTransport clock stays at its
_fixed_ speed.

For the Socket-754 (Athlon64), I believe the nVidia nForce3 is using a
600MHz (1.2GHz effective) HyperTransport for 4.8GBps, and ViA's latest
K8T800 (?) is now using an 800MHz (1.6GHz effective) HyperTransport for
6.4GBps.  The AMD8111 used on both Socket-754 (Athlon64) and
Socket-939/940 (AthlonFX/Opteron) uses an 800MHz (1.6GHz effective)
HyperTransport for 6.4GBps.  Newer Socket-939 chipsets are coming out,
and various nVidia/ViA/SiS chipsets are being made to support both
Socket-754 and Socket-939 (but not multi-processor capable Socket-940?)
processors.

Socket-754 (Athlon64) has only *1* HyperTransport interconnect and *1*
DDR Memory channel, variable between PC1600-3200 memory (1.6-3.2GBps
DTR) -- again, _independent_ of the CPU and HyperTransport clocks. 
I.e., Memory on a Socket-754 mainboard always runs at the _most_optimal_
latency and DTR performance, period.

Socket-939/940 (AthlonFX/Opteron) provides up to *3* HyperTransport
interconnects and *2* DDR Memory channels per CPU.  AthlonFX only
supports a single CPU, for a single 6.4GBps I/O interconnect, so it only
takes advantage of the _true_ dual-DDR memory channel (not that "fake"
dual-DDR "interleave" on Athlon32) for up to 6.4GBps of memory DTR.
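
The memory numbers for both sockets work the same way as the
HyperTransport ones.  A minimal Python sketch, assuming the standard
64-bit DDR channel, 1GB = 10^9 bytes, and memory clocks of 100MHz for
PC1600 and 200MHz for PC3200 (the function name is mine):

```python
def ddr_dtr_gbps(mem_clock_mhz, channels=1, channel_width_bits=64):
    """Peak DDR memory DTR in GBps; assumes 64-bit channels, 1GB = 10^9 bytes."""
    transfers_per_sec = mem_clock_mhz * 1e6 * 2        # DDR: 2 transfers per clock
    return transfers_per_sec * channel_width_bits / 8 * channels / 1e9

print(ddr_dtr_gbps(100))              # PC1600, single channel (Socket-754 low end)
print(ddr_dtr_gbps(200))              # PC3200, single channel (Socket-754 high end)
print(ddr_dtr_gbps(200, channels=2))  # PC3200, dual channel (Socket-939/940)
```

Note that the "PCxxxx" module name is simply the single-channel peak
in MBps.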

The number of Opteron HyperTransport channels depends on the
Opteron model -- 1xx = 1 (single processor, so only a single
6.4GBps HT), 8xx = 3 (up to 8 processors, 3 HyperTransports for
connecting to up to 3 other "nodes" be it CPUs, GPUs or HyperTransport
tunnels/bridges).  Not sure about the 2xx (dual-processor), it may be 2
(one to connect to the other CPU, one for I/O), but I have not
confirmed, and it may only be 1 (the typical AMD8111 HyperTransport
"tunnel" IC used on these dual-Opteron mainboards could be on the HT
connection between the two CPUs).

I/O gets interesting with the Opteron, especially for 2+ way.  The
cheapest mainboards just use the AMD8111 (I/O-AGP) and AMD8151
(PCI/LPC).  The more expensive ones use the AMD8131 (PCI-X), adding far
more PCI/PCI-X I/O capability -- far more than what Intel (or even
ServerWorks) offers on anything without going Itanium or with a specific,
"high-end" OEM 4+ way Xeon.


-- 
Bryan J. Smith, E.I. -- Engineer, Technologist, School Teacher
b.j.smith@ieee.org


