Re: Multi-threaded app behaviour question
- From: Stefaan A Eeckels <hoendech@xxxxxx>
- Date: Fri, 11 Sep 2009 23:45:17 +0200
On Tue, 8 Sep 2009 01:46:13 -0700 (PDT)
philkime <Philip@xxxxxxxxxxx> wrote:
> I currently have an app which seems to be a victim of some sort of
> process deadlock, or possibly just bad design, and wondered what
> thoughts others may have about this.
> The box has 8 dual-core CPUs running Solaris 10.
That would be 16 processors.
> If process A spawns 8 or more child threads, the following seems to
> happen:
> * The CPUs are not at all saturated; they are mostly idle.
> * I see the child threads all sat on a CPU at priority 59, sleeping
> for about 20 seconds until they do something. Tracing shows that
> they are sleeping in reads on a socket which seems to be connected
> to parent process A.
> * Process A is also at priority 59 but not on a CPU.
> This 20-second or so delay doesn't happen if I set Process A to spawn
> fewer than 8 threads.
Given that your box has 16 processors, there is no reason to believe
that the number of processors has anything to do with the 8-thread
threshold. One of the basic tenets of programming is that application
programmers don't grok threads.
> I am assuming that the problem is that the children are waiting for
> data from the parent, which can't get CPU time because everything is
> at priority 59.
Nope. On a 16-processor box, 9 threads (the parent plus 8 children)
cannot be starved for CPU time. The programmers goofed and got the
thread synchronisation wrong. It happens all the time, even with
thread-friendly languages such as Java.
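To make "getting the synchronisation right" concrete, here is a minimal Python sketch. It is not the application's code, which none of us has seen. A parent hands work to nine worker threads through a `queue.Queue`, which blocks each worker on an internal condition variable and wakes it as soon as an item is put. Nothing sleeps for a fixed period, so with idle processors there is nothing to wait 20 seconds for. The function `run_workers`, the `STOP` sentinel and the doubling "work" are hypothetical stand-ins. Python threads don't execute bytecode in parallel, but that is irrelevant here: the point is how blocked threads get woken.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel telling a worker to exit


def run_workers(jobs, n_workers=9):
    """Hypothetical parent/worker hand-off: workers block on the queue and
    are woken as soon as the parent puts an item, with no polling or sleeps."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()        # blocks until the parent supplies data
            if item is STOP:
                break
            results.put(item * 2)     # stand-in for the real work

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job in jobs:                  # the parent produces data
        tasks.put(job)
    for _ in threads:                 # one sentinel per worker so all exit
        tasks.put(STOP)
    for t in threads:
        t.join()
    return sorted(results.get() for _ in range(len(jobs)))
```

For example, `run_workers(range(5))` returns `[0, 2, 4, 6, 8]` essentially immediately, whether 8, 9 or 16 workers are used.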
> After about 20 seconds, something kicks one child off the CPU (what
> timer is this?)
Probably a timer in the process. Solaris doesn't have a magical
20-second timer. The application probably has one as a crutch to avoid
solving a real issue. Such timers are usually implemented with signals
(for example alarm(2), which delivers SIGALRM), so trussing the process
for alarm calls would show this.
> and the parent gets some time to deliver data. I don't understand the
> "8 threads" limit to this behaviour, as the box naturally reports 16
> CPUs because they are dual-core.
As said above, it's unlikely to have anything at all to do with the box
or the OS, and everything to do with programmers not understanding
multi-threading.
> I have done a lot of tracing, dtracing etc. and everything seems to
> support this hypothesis so far. I can't work out whether this is a
> tuning problem or just badly written (expensive, commercial) software
> which doesn't multi-thread very well.
Unlikely to be a mere tuning problem. Solaris does both multi-threading
and multi-tasking very well, thank you.
> Of course, processor binding or renicing Process A to test this
> hypothesis doesn't work, as these settings are inherited by the
> children. Any thoughts?
Children? Is this a multi-threaded program or a program that spawns
child processes? If the latter, then it's likely that the programmers
got the parent/child synchronisation wrong and never tested it on a
genuine multi-processor box, so the problem never showed (co-operating
programs can behave very differently when moved from one processor to
several). Alternatively, they only have access to an 8-way system and
have produced a kludge that malfunctions when there are more processors.
If you (or your company) will be sued when you divulge the name of the
piece of crap causing the problem, stay mum. If not, pillory the thing;
you'll probably find someone to commiserate with at least or, with a bit
of luck, someone with a workaround or solution.
--
Stefaan A Eeckels
--
"A ship in the harbor is safe. But that's not what ships are built for."
-- Rear Admiral Dr. Grace Murray Hopper.