Re: Multicore Is Bad News For Supercomputers



"Main, Kerry" <Kerry.Main@xxxxxx> wrote in message news:9D02E14BC0A2AE43A5D16A4CD8EC5A593EDB45EC3F@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-----Original Message-----
From: Keith Parris [mailto:keithparris_nospam@xxxxxxxxx]
Sent: December 2, 2008 5:23 PM
To: Info-VAX@xxxxxxxxxxxx
Subject: Re: Multicore Is Bad News For Supercomputers

Main, Kerry wrote:
> It's not only multi-cores that are the issue, but rather new buses like
> Intel's new QuickPath (formerly called CSI).
>
> The new X86/Itanium bus architecture is NUMA based and that is a new
> paradigm that, for very high performance, requires accessing local
> memory much more than remote memory. Hence, the OS and apps need to be
> aware of it and be able to maximize perf with this architecture.

It's NUMA, but not NUMA like you remember from the Wildfire (GS-320,
GS-160, GS-80) series, which had such severe performance problems
because of the 3X difference in latency between local (QBB) and remote
(across-QBB) memory accesses.

Intel's QuickPath is NUMA of the EV7 flavor, which had remote access
times only 1.something times as slow as local memory access, and where
the slowest (farthest) memory access was still faster than the fastest
memory access in a QBB-based design.
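To put rough numbers on the difference described above, here is a back-of-the-envelope sketch. It assumes a simple model in which average latency is a weighted mix of local and remote accesses. The 100 ns local latency is a hypothetical figure, 3.0 is the Wildfire ratio quoted above, and 1.5 is only an assumed stand-in for "1.something".

```python
def effective_latency(local_ns, remote_ratio, remote_fraction):
    """Average access latency when a fraction of accesses go to remote memory.

    Simple weighted model (an assumption, ignores caches and contention):
    avg = (1 - f) * local + f * (local * ratio)
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be in [0, 1]")
    remote_ns = local_ns * remote_ratio
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    local = 100.0  # hypothetical local latency in ns, for illustration only
    designs = (("Wildfire-style, 3x", 3.0), ("EV7-style, assumed 1.5x", 1.5))
    for label, ratio in designs:
        for frac in (0.0, 0.25, 0.5):
            avg = effective_latency(local, ratio, frac)
            print(f"{label}: {frac:.0%} remote -> {avg:.1f} ns")
```

Under this model the gap between the two designs widens as the remote fraction grows, which is why placement mattered so much more on the QBB-based machines.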

G'day Keith ..

Re: new NUMA .. yeah, I know it is much different from the original
NUMA designs from the early Wildfire days, but if the OS and app are
not NUMA-aware, it may still make a difference in supercomputer
performance for those who want to take advantage of every cycle.

Since memory/cache is local to each CPU, local cache references will
be faster than going over the interconnect (granted, it is faster
than the older buses) to a remote CPU, not finding it in cache and
then going to main memory (or disk).

If the app and OS are not NUMA-aware (e.g. in how processes are
scheduled and where they run), then all sorts of cache thrashing could
occur. This will be a concern for supercomputing, and potentially for
other high-performance application environments as well.
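As one illustration of "NUMA-aware" scheduling, here is a minimal sketch for Linux (not OpenVMS) that restricts a process to the CPUs of a single node, read from sysfs. Under Linux's default memory policy, pages the process touches afterwards tend to be allocated on that local node. The function names are hypothetical; `os.sched_getaffinity`/`os.sched_setaffinity` and the `/sys/devices/system/node/nodeN/cpulist` file are standard Linux facilities, though real deployments would more likely use numactl or libnuma.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes report an empty list
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node):
    """Restrict the calling process to the CPUs of one NUMA node (Linux only).

    Hypothetical helper: reads the node's CPU list from sysfs and
    intersects it with the CPUs this process is already allowed to use.
    """
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        node_cpus = parse_cpulist(f.read())
    allowed = node_cpus & os.sched_getaffinity(0)
    if not allowed:
        raise RuntimeError(f"no permitted CPUs on node {node}")
    os.sched_setaffinity(0, allowed)  # 0 means "the calling process"
    return allowed


if __name__ == "__main__":
    if hasattr(os, "sched_setaffinity") and os.path.exists(
        "/sys/devices/system/node/node0"
    ):
        print("pinned to CPUs", sorted(pin_to_node(0)))
    else:
        print("NUMA sysfs or sched_setaffinity not available here")
```

Pinning only addresses where the process runs. Memory already allocated on another node stays there unless it is migrated.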

Is this not correct?


Regards

Kerry Main
Senior Consultant
HP Services Canada
Voice: 613-254-8911
Fax: 613-591-4477
kerryDOTmainAThpDOTcom
(remove the DOT's and AT)

OpenVMS - the secure, multi-site OS that just works.

From the article, NUMA won't work for the types of problems they're looking
at. One of the statements in the article was that memory references could come from several processors away, which definitely rules out NUMA.

Mike.