Re: race on multi-processor solaris

From: Casper H.S. Dik (Casper.Dik_at_Sun.COM)
Date: 11/26/03


Date: 26 Nov 2003 09:15:36 GMT

angoyal@siebel.com (Anuj Goyal) writes:

>Here are some thoughts on :

>1. I could implement a mutex situation and just let the OS take care
>of the queuing nastiness. Let's say I am the OS providing the mutex
>capability... if 1000 threads are trying to get access to one little int,
>the OS will have to put 1000 threads on a queue, right? This involves
>little hardware contention, but lots of software contention (i.e. lots
>of cycles burned implementing the queue stuff)

I'm not sure I'd call that "lots of cycles". A sleeping
thread doesn't use CPU, so those CPU cycles are available to do
other things. With NUMA architectures the "lockless" algorithms
are undoubtedly even more interesting: if one CPU owns the cache line,
will it really be fair to CPUs which are 100s of nanoseconds away?
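
To make option 1 concrete, here is a minimal sketch of the mutex approach using POSIX threads. The names (counter_lock, mutex_worker, run_mutex_counter) are hypothetical and not from the original post; the point is only that a thread which cannot get the lock is put to sleep by the implementation rather than burning CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Shared state: one small integer guarded by one mutex. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker increments the shared counter `iters` times. */
static void *mutex_worker(void *p)
{
    long iters = *(long *)p;
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&counter_lock);   /* may block if contended */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical helper: run nthreads workers and return the final count. */
long run_mutex_counter(int nthreads, long iters)
{
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    assert(tids != NULL);
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, mutex_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return counter;
}
```

Calling run_mutex_counter(8, 10000) should return 80000, since every increment happens under the lock. Link with the platform's threads library (for example -lpthread).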

>2. I could implement the atomic_increment and cause hardware
>contention - which is also bad, but for me is a bit "better" than
>software contention because hopefully fewer cycles (overall) will be
>used. Correct me if I am wrong.

It's unclear: contention on atomic instructions jams the
bus because it uses many "RTO" (read-to-own) cycles; a thread which is
just idle and sleeps does not cause any cycles. (A Solaris kernel
mutex, e.g., will not even try to spin if the thread currently
owning the mutex isn't running on another CPU; it will just
sleep instead.)
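
For comparison, option 2 might look like the sketch below. It uses C11 <stdatomic.h> as a stand-in for the poster's atomic_increment, which is an assumption on my part; the names atomic_counter, atomic_worker and run_atomic_counter are hypothetical. Every atomic_fetch_add needs exclusive ownership of the cache line, which is where the bus traffic described above comes from.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Shared counter updated without a lock. */
static atomic_long atomic_counter;

/* Each worker performs `iters` atomic increments. */
static void *atomic_worker(void *p)
{
    long iters = *(long *)p;
    for (long i = 0; i < iters; i++)
        atomic_fetch_add(&atomic_counter, 1);  /* never sleeps, always hits the cache line */
    return NULL;
}

/* Hypothetical helper: run nthreads workers and return the final count. */
long run_atomic_counter(int nthreads, long iters)
{
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    assert(tids != NULL);
    atomic_store(&atomic_counter, 0);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, atomic_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return atomic_load(&atomic_counter);
}
```

The result is the same as with a lock (nthreads * iters); what differs is where the cost lands: no thread ever sleeps, but every increment competes for the same cache line.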

>3. I could implement a try-and-enqueue approach. Try for some number
>of cycles (preferably less than the # of cycles the kernel would take
>to put the thread on the mutex queue) to increment the integer and
>once my cycles are up, place myself on a queue. This is basically a
>spinlock right?

Yes.
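
A rough user-level sketch of option 3, assuming POSIX threads: try the lock a bounded number of times, then fall back to a blocking acquire. SPIN_LIMIT and all function names here are hypothetical, and this is not how the Solaris kernel's own mutexes are implemented; it only illustrates the try-then-enqueue idea.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SPIN_LIMIT 100   /* hypothetical spin budget; tune by measurement */

static pthread_mutex_t spin_mutex = PTHREAD_MUTEX_INITIALIZER;
static long spin_counter;

/* Spin on trylock for a bounded number of attempts, then block. */
static void spin_then_block_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;                 /* got the lock while spinning */
    }
    pthread_mutex_lock(m);          /* budget used up: queue and sleep */
}

static void *spin_worker(void *p)
{
    long iters = *(long *)p;
    for (long i = 0; i < iters; i++) {
        spin_then_block_lock(&spin_mutex);
        spin_counter++;
        pthread_mutex_unlock(&spin_mutex);
    }
    return NULL;
}

/* Hypothetical helper: run nthreads workers and return the final count. */
long run_spin_counter(int nthreads, long iters)
{
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    assert(tids != NULL);
    spin_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, spin_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return spin_counter;
}
```

Spinning only pays off if the lock holder is actually running on another CPU; otherwise the spin budget is wasted and blocking immediately is cheaper.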

Note that example programs with many threads aren't really valid
scalability tests: only as many threads as there are CPUs can run
at a time, so contention is low anyway unless you have
many CPUs.
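
As a small illustration of that limit, a test program could check how many CPUs are online before deciding how many threads to start. This sketch assumes sysconf(_SC_NPROCESSORS_ONLN), which Solaris and Linux provide but which is not required by POSIX; online_cpus and runnable_at_once are hypothetical names.

```c
#include <assert.h>
#include <unistd.h>

/* Number of CPUs currently online, or 1 if the query fails. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Upper bound on how many of `requested` threads can run simultaneously. */
long runnable_at_once(long requested)
{
    assert(requested > 0);
    long cpus = online_cpus();
    return requested < cpus ? requested : cpus;
}
```

With 1000 threads on a 4-CPU machine, runnable_at_once(1000) would be 4, so at most 4 threads can contend for the counter at any instant.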

Casper


