Re: Multi-threaded app behaviour question



On Tue, 8 Sep 2009 01:46:13 -0700 (PDT)
philkime <Philip@xxxxxxxxxxx> wrote:

I currently have an app which seems to be a victim of some sort of
process deadlock or possibly just bad design and wondered what
thoughts others may have about this.

The box has 8 dual-core CPUs running Solaris 10.

That would be 16 processors.

If process A spawns 8 or more child threads, the following seems to
happen

* None of the CPUs is saturated; they are mainly idle.
* The child threads all sit on a CPU at priority 59, sleeping for about
20 seconds before they do anything. Tracing shows that they are
sleeping in reads on a socket which seems to be connected to parent
process A.
* Process A is also at priority 59 but not on a CPU.

This delay of 20 seconds or so doesn't happen if I set Process A to
spawn fewer than 8 threads.

Given that your box has 16 processors, there is no reason to believe
that the number of processors has an influence on the behaviour of the
program. One of the basic tenets of programming is that application
programmers don't grok threads.

I am assuming that the problem is that the
children are waiting for data from the parent which can't get CPU time
because everything is at priority 59.

Nope. On a 16 processor box, 9 threads cannot be starved for CPU time.
The programmers goofed and got the thread synchronisation wrong. It
happens all the time, even with thread-friendly languages such as Java.
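
To illustrate the point, here is a minimal sketch in Python (not the
application's actual code, which nobody here has seen). It shows that a
thread blocked in a socket read uses no CPU and wakes only when the peer
writes or a timeout expires, which matches the idle-CPU symptom described.
The names child_reader and run_demo and the timeout values are invented
for the example.

```python
import socket
import threading
import time


def child_reader(sock, timeout_s, results):
    """Block on the socket until the parent writes or the timeout expires."""
    sock.settimeout(timeout_s)
    start = time.monotonic()
    try:
        data = sock.recv(64)
        outcome = "data" if data else "eof"
    except socket.timeout:
        # Nothing arrived: the thread slept for the whole timeout.
        outcome = "timeout"
    results.append((outcome, time.monotonic() - start))


def run_demo(parent_sends, timeout_s=0.5):
    """Start one reader thread; the parent either writes or stays silent."""
    parent_end, child_end = socket.socketpair()
    results = []
    reader = threading.Thread(
        target=child_reader, args=(child_end, timeout_s, results)
    )
    reader.start()
    if parent_sends:
        parent_end.sendall(b"work")
    reader.join()
    parent_end.close()
    child_end.close()
    return results[0]
```

Calling run_demo(parent_sends=False) makes the reader sit out the full
timeout however many processors are idle; the wait is decided by what the
parent does, not by the scheduler.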

After about 20 seconds,
something kicks one child off the CPU (what timer is this?)

Probably a timer in the process. Solaris doesn't have a magical 20
second timer. The application probably has one as a crutch to avoid
solving a real issue. Such timers are usually implemented with signals,
so running truss on the process and looking for alarm(2) or setitimer(2)
calls and SIGALRM deliveries should show this.

and the parent gets some time to deliver data. I don't understand the
"8 threads" limit to this behaviour as the box naturally reports 16
CPUs because they are dual-core.

As said above, it's unlikely to have anything at all to do with the box
or the OS, and everything with programmers not understanding
multi-threading.

I have done a lot of tracing, dtracing etc. and everything seems to
support this hypothesis so far. I can't work out whether this is a
tuning problem or just badly written (expensive, commercial) software
which doesn't multi-thread very well?

Unlikely to be a mere tuning problem. Solaris does both multi-threading
and multi-tasking very well, thank you.

Of course processor binding/renicing Process A to test this
hypothesis doesn't work as this is inherited by the children. Any
thoughts?

Children? Is this a multi-threaded program or a program that spawns
child processes? If the latter, then it's likely that the programmers
got the parent/child synchronisation wrong and never tested it on a
genuine multi-processor box, so the problem never showed. Co-operating
programs behave very differently when going from one processor to
several. Alternatively, they only have access to an 8-way system and
have produced a kludge that malfunctions when there are more
processors.

If you (or your company) will be sued when you divulge the name of the
piece of crap causing the problem, stay mum. If not, pillory the thing;
you'll probably find someone to commiserate with at least or, with a
bit of luck, someone with a workaround or solution.

--
Stefaan A Eeckels
--
"A ship in the harbor is safe. But that's not what ships are built for."
-- Rear Admiral Dr. Grace Murray Hopper.