Re: Why is SUN falling so far behind IBM?

From: Arthur Corliss (acorliss_at_bifrost.nevaeh-linux.org)
Date: 08/06/04


Date: Fri, 06 Aug 2004 19:20:02 -0000

On 2004-08-05, Benjamin Gawert <bgawert@gmx.de> wrote:
>
> In what regards is MIPS better than x86 or IA64 (I assume You are targeting
> Itanium with "VLIW")?

In that the RISC set isn't just downright strange and awkward. Consider this,
Intel made the following decisions with their VLIW implementation:
You get fourteen bits for opcodes in each 41 bit instruction. That's a hell
of a lot of room for instruction sets, and yet they decided that opcodes
weren't going to be universally unique, instead they're going to mean different
things depending on the target execution unit. And if that isn't bad enough,
instructions are delivered in three-op bundles, and *that* bundle has five
bits reserved to help route the embedded ops -- not directly, but
according to an instruction "template" (which is what the five bits actually
designate). And given that a lot of op combinations across multiple
execution units are not covered by the templates, that makes them illegal
altogether, so you can very well end up with either poorly optimised
instructions grouped in a bundle, or even noops.
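
To make the layout concrete, here is a minimal sketch of how a bundle splits up, assuming the five template bits sit in the low end of a 128-bit word with the three 41-bit slots packed above them (5 + 3*41 = 128). The bit ordering and the function names are my own illustration, not taken from Intel's manuals, and the template-to-execution-unit table is deliberately left out.

```python
# Sketch only: field widths come from the post (5-bit template, 41-bit ops,
# three ops per bundle). The low-to-high bit ordering is an assumption.
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def split_bundle(bundle):
    """Split a 128-bit integer into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots
```

The point of the sketch is that the slots carry no routing information of their own: which execution unit each op goes to is only recoverable by looking the template value up in a table, and any unit combination missing from that table simply cannot be encoded.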

To compound these potential shortfalls, you have to rely specifically on
the compiler to group instructions correctly to get maximum simultaneous
execution. The processor has no support for internal optimisations of
those groups.

I'm sorry, but that's a pretty convoluted setup with too much potential for
wasted bandwidth. I'd rather contend with a large number of shallow
pipelines with decent branch-prediction logic.

> Well, MIPS _is_ RISC. And besides that, Your statement is wrong. POWER,
> PA-RISC and Itanium do more OPS/clock cycle than even the R16k.

I should have been more clear: I meant that even compared to *other* RISC
vendors MIPS comes up well. In any event, you're only partly right. The
Power4 core can do (theoretically) 8 IPCs, but that's only eight fetch/issue,
it can only retire five. You also get hit with a 10 cycle misprediction
penalty every time due to deeper pipelines. The R14k (I don't have the stats
on the R16k, so this isn't entirely a fair comparison) only has six stages,
and can retire as many as it can fetch/issue (which I think is five). In
practice, this means that it can consistently hit closer to its theoretical
IPC than the Power4 can.
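
A back-of-the-envelope model of that argument, for what it's worth. The retire width of five and the Power4's 10 cycle penalty are the figures quoted above; the R14k penalty (taken as its six-stage pipeline depth), the branch fraction, and the misprediction rate are assumptions I've plugged in purely for illustration, not measured values.

```python
# Rough sketch: sustained IPC = 1 / (base CPI + misprediction stall CPI).
# BRANCH_FRACTION and MISPREDICT_RATE are illustrative assumptions.
BRANCH_FRACTION = 0.20   # assumed share of instructions that are branches
MISPREDICT_RATE = 0.05   # assumed share of branches that are mispredicted


def effective_ipc(retire_width, mispredict_penalty,
                  branch_fraction=BRANCH_FRACTION,
                  mispredict_rate=MISPREDICT_RATE):
    """Estimate sustained instructions per clock under a simple stall model."""
    base_cpi = 1.0 / retire_width
    stall_cpi = branch_fraction * mispredict_rate * mispredict_penalty
    return 1.0 / (base_cpi + stall_cpi)


# Power4: retires five per clock, 10 cycle misprediction penalty (from the post).
power4_ipc = effective_ipc(retire_width=5, mispredict_penalty=10)

# R14k: retire width of five as guessed in the post; a penalty of six cycles
# is an assumption based on the six-stage pipeline.
r14k_ipc = effective_ipc(retire_width=5, mispredict_penalty=6)

print(f"Power4 ~{power4_ipc:.2f} IPC, R14k ~{r14k_ipc:.2f} IPC")
```

With equal retire widths, the only thing separating the two in this model is the penalty term, which is exactly the shallow-pipeline advantage being claimed. Change the assumed rates and the gap moves, but not its direction.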

So, now reality sets in: comparing SPEC CPU 2000 for a 1.7GHz Power4 and
a 600MHz R14k I see that the Power4 scores a little more than twice the R14k
for almost three times the clock rate on CINT2000. CFP2000 shows a bit more
than three times, so the processors are about on par there. Let's see, the
only scores for Itanium from the same time frame as SGI's most recent reported
scores show it coming up short. Scores from 2003 - current for Itanium are
better, but I have no doubt that the Origin's system architecture has also
improved since then. PA-8700 comes up short on both as well.
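
The per-clock arithmetic behind that comparison, as a quick sketch. The clock rates are the ones quoted above; the score ratios of 2.0 and 3.0 are rounded-down stand-ins for "a little more than twice" and "a bit more than three times", not actual published SPEC results.

```python
# Sketch: normalise a SPEC score ratio by the clock-rate ratio.
# SCORE_RATIO_* values are rounded assumptions, not published figures.
CLOCK_POWER4_MHZ = 1700
CLOCK_R14K_MHZ = 600

SCORE_RATIO_CINT = 2.0   # Power4 / R14k, "a little more than twice"
SCORE_RATIO_CFP = 3.0    # Power4 / R14k, "a bit more than three times"


def per_clock_ratio(score_ratio, clock_a_mhz, clock_b_mhz):
    """Return A's per-MHz performance relative to B's (1.0 means parity)."""
    clock_ratio = clock_a_mhz / clock_b_mhz
    return score_ratio / clock_ratio


cint_per_clock = per_clock_ratio(SCORE_RATIO_CINT, CLOCK_POWER4_MHZ, CLOCK_R14K_MHZ)
cfp_per_clock = per_clock_ratio(SCORE_RATIO_CFP, CLOCK_POWER4_MHZ, CLOCK_R14K_MHZ)

print(f"clock ratio: {CLOCK_POWER4_MHZ / CLOCK_R14K_MHZ:.2f}x")
print(f"CINT2000 per clock, Power4 vs R14k: {cint_per_clock:.2f}")
print(f"CFP2000 per clock, Power4 vs R14k: {cfp_per_clock:.2f}")
```

Anything below 1.0 means the R14k gets more done per clock, which is the case for integer with these rounded inputs; floating point lands close to parity.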

So, based on that I doubt the R16k would look worse against these processors.
Furthermore, I could compare the Power4's 680 million transistors and 50+ watts
of power consumption to the R14k's 7 million transistors and 17+ watts draw.
Taken holistically, I'd say MIPS has one of the most efficient architectures
out there.

> And not to forget that this is just a theoretical view. The only thing that
> counts is real world performance. It doesn't count if MIPS does more
> instructions per clock cycle if the absolute system performance is behind of
> all the competitors...

That's a whole other ball of wax, then. There's a damned good reason why
SGI is still in business, just like Cray and their vector processors. In the
real world clock rate isn't as important as memory and disk I/O, and NUMAflex
is a damned good architecture when you need to scale to the thousands of
processors. And they still support twice as many processors per SSI
node on Origin as they can on Altix/Linux.

> SGI made a wise decision to look for something different. Something that's
> more up to date. Something SGI can make some money. Something software
> vendors are willing to write software for (which isn't the case for IRIX).
> Looks they are very successful with their Itanium-based Altix...

SGI hasn't abandoned MIPS yet, though they may have to. As good as the tech
is, they're a lot like DEC. Bad marketing can kill the best technology.

-- 
	--Arthur Corliss
	  Bolverk's Lair -- http://arthur.corlissfamily.org/
	  Digital Mages -- http://www.digitalmages.com/
	  "Live Free or Die, the Only Way to Live" -- NH State Motto

