Re: Bad performance on alpha? (make buildworld)

From: Peter Jeremy (peter.jeremy_at_alcatel.com.au)
Date: 02/25/04

    Date: Wed, 25 Feb 2004 13:59:53 +1100
    To: Charles Swiger <cswiger@mac.com>
    
    

    On 2004-Feb-24 20:17:07 -0500, Charles Swiger <cswiger@mac.com> wrote:
    >On Feb 24, 2004, at 3:26 PM, Nikos Ntarmos wrote:
    >>IIRC the 600MHz EV56's performance wrt integer operations (such as
    >>compiling) is somewhere in the vicinity of a 400MHz P-II, so the
    >>difference you see in turn-around times when buildworld'ing isn't
    >>quite that big. If the operations were identical, you should see
    >>better times when building on the alpha. However, also take into
    >>account that compiling (and optimizing) for a RISC CPU, apart from
    >>generating larger binaries, is AFAIK supposedly more difficult than
    >>compiling (and optimizing) for a CISC CPU.
    >
    >I'm afraid you've got this backwards. :-)

    Maybe in theory, but not necessarily in practice.

    >The primary attributes of RISC architectures, namely lots of registers,
    >a relatively simple but orthogonal instruction set, and a relatively
    >fast clock rate / CPI ~= 1.0 / a short pipeline make it far easier for
    >the compiler to generate and optimize code.

    Alpha pipelines are only short in a relative sense - the EV5 pipeline
    is 7 (integer) or 9 (FP) stages and I suspect the EV56 pipeline is the
    same. In theory, it is 4-way superscalar but the different execution
    units aren't equivalent and the compiler has to understand which
    instructions will be allocated to which execution units in order to
    minimise stalls.

    >CISC architectures make the compiler's job much harder because they tend
    >to require lots of register spills, they tend to have very long
    >pipelines which involve hazards and require a lot of instruction
    >reordering to avoid stalling the pipeline too often. The number of CPU
    >clocks it takes per instruction (CPI) often varies on CISC and is
    >generally much larger than ~1.0, and sometimes varies from CPU model to
    >CPU model, making it far more difficult to determine the "fastest"
    >instruction sequence.

    Recent iA32 implementations (basically anything more recent than a
    PII) are RISC cores which directly execute a subset of the iA32
    instruction set with the remainder handled by microcode. You get
    quite respectable results by treating it as a load/store RISC
    architecture and relying on the L1 cache to handle the register spills
    in a timely fashion. The pipelines and superscalar execution
    abilities are all handled in hardware. Register renaming allows
    the implementation to have more physical registers than the
    programmer-visible register set provides, allowing multiple instructions
    to simultaneously see different values in the same visible register.

    The compiler has to expend a lot of effort on instruction scheduling
    to get decent performance out of a typical RISC architecture. Much of
    this is automatically handled by the hardware on an iA32 and you can
    get equivalent results with a much simpler compiler.
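To make the scheduling point concrete, here is a toy sketch: a single-issue, in-order pipeline with hypothetical latencies (3-cycle loads, 1-cycle adds), plus a greedy list scheduler of the kind a compiler might run. `Insn`, `in_order_cycles` and `list_schedule` are made-up names, and the model is far simpler than an EV5; an out-of-order iA32 core performs comparable reordering in hardware at run time.

```python
from typing import NamedTuple, Tuple


class Insn(NamedTuple):
    dest: str
    srcs: Tuple[str, ...]
    latency: int  # cycles before a dependent instruction may issue


def in_order_cycles(program):
    """Issue cycles used by a single-issue, in-order toy pipeline.

    Each instruction issues at least one cycle after its predecessor and not
    before its source operands are ready; registers the program never writes
    are assumed ready at cycle 0. Empty issue slots are stalls.
    """
    ready = {}
    issue = -1
    for insn in program:
        operands_ready = max((ready.get(s, 0) for s in insn.srcs), default=0)
        issue = max(issue + 1, operands_ready)
        ready[insn.dest] = issue + insn.latency
    return issue + 1


def list_schedule(program):
    """Greedy list scheduling for the same toy machine.

    Assumes each register is written at most once and every read of a written
    register follows its write, so only read-after-write dependencies exist
    and any order that respects them is correct.
    """
    written = {insn.dest for insn in program}
    ready = {}  # register -> cycle its value becomes available
    remaining = list(program)
    order = []
    cycle = 0
    while remaining:
        for insn in remaining:
            # Issue the first instruction (in original order) whose sources
            # are live-ins or results already available this cycle.
            if all(s not in written or (s in ready and ready[s] <= cycle)
                   for s in insn.srcs):
                order.append(insn)
                ready[insn.dest] = cycle + insn.latency
                remaining.remove(insn)
                break
        cycle += 1  # one issue slot per cycle; if nothing issued, it's a stall
    return order


if __name__ == "__main__":
    prog = [
        Insn("r1", ("a",), 3),   # load
        Insn("r2", ("r1",), 1),  # add that needs the first load
        Insn("r3", ("b",), 3),   # independent load
        Insn("r4", ("r3",), 1),  # add that needs the second load
    ]
    print(in_order_cycles(prog))                 # 8 (4 stall cycles)
    print(in_order_cycles(list_schedule(prog)))  # 5 (1 stall cycle)
```

Moving the independent second load between the first load and its use hides most of the load latency; on an in-order RISC that reordering has to come from the compiler.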

    Peter
    _______________________________________________
    freebsd-performance@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-performance

