Processes in <exiting> state



We have been running into problems with processes getting stuck in the
<exiting> state and consuming a lot of system time.

We seem to trigger this problem by using ^C to terminate one of our
applications. This doesn't always kill the process, so we then do a
'kill -9' on it, but the process still doesn't exit. At this point
the process shows as <exiting> in 'ps -ef'.

UID PID PPID C STIME TTY TIME CMD
- 26398 - - - <exiting>

These processes aren't defunct - they are still running, although you
can no longer see how much CPU time they are accumulating. vmstat shows
no idle time on the processor - it is spending all its time in system
mode. I ran a system trace and it appears that these processes are
stuck trying to write to a socket over and over again. The following
excerpt is typical.

MBUF m_getclustm canwait=M_WAIT type=MT_DATA callfrom=00150DF0 callfrom2=00000000 pid=29016 ()
MBUF return from m_getclustm mbuf=70450900 dataptr=70160720

TCP tcp_usrreq so=704F8800 req=00000009 m=70450900 nam=00000000

MBUF m_free mbuf=70450900 dataptr=70160720 callfrom=00125DD8 callfrom2=00000000 pid=29016 ()
MBUF return from m_free mbuf=70450900

TCP tcp_output tp=704F89F0 so=704F8800
TCP tcp_output tp=704F89F0 so=704F8800

TCP tcp_usrreq_err so=704F88t -

The trace suggests that these processes are repeatedly trying to write
to a socket. I can find the corresponding connection using netstat -Ao,
and it shows the connection in the CLOSE_WAIT state, which means the
server has already closed its end of the connection. We seem to be
stuck trying to respond or to close our end of the connection.

704f89f0 tcp4 4 0 loopback.35575 loopback.10005 CLOSE_WAIT
so_state: (ISCONNECTED|CANTRCVMORE|NBIO)
timeo:0 uid:31017
so_special: (LOCKBALE|MEMCOMPRESS|DISABLE)
so_special2: (PROC)
sndbuf:
hiwat:0 lowat:0 mbcnt:0 mbmax:0
sb_flags: (LOCK)
rcvbuf:
hiwat:67424 lowat:1 mbcnt:256 mbmax:269696
sb_flags: (SEL)
TCP:
mss:16856 flags: (NODELAY)
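
For anyone unfamiliar with CLOSE_WAIT, here is a minimal Python sketch of
how a loopback connection ends up in that state. It is an illustration
only and is not our application's code: the function name demo_close_wait
and the OS-chosen port are assumptions made for the example. The server
side closes first, the client sees end-of-stream, and the client's end
stays in CLOSE_WAIT until the client itself calls close().

```python
import socket


def demo_close_wait():
    """Show the peer-closes-first sequence that leaves a socket in CLOSE_WAIT."""
    # Listener on an ephemeral loopback port (port 0 lets the OS choose one).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # The server closes first and sends a FIN. From here until the client
    # calls close(), the client's end of the connection is in CLOSE_WAIT.
    server_side.close()

    # An empty read is how the client learns that the peer has closed.
    client.settimeout(5.0)
    data = client.recv(1024)

    # Closing our end is what moves the connection out of CLOSE_WAIT.
    client.close()
    listener.close()
    return data


if __name__ == "__main__":
    print(repr(demo_close_wait()))
```

In our case the final close() apparently never completes, which would be
consistent with the connection staying in CLOSE_WAIT.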

Has anyone encountered this problem before, or do you know what causes
it? A search on Google and on IBM's web sites turns up some problems
with TTYs causing similar symptoms on AIX back in 1996 and 1997, but
nothing about sockets and nothing more recent than that.

Or failing an explanation, do you have any idea how to track this
problem down further?

--
Tom Einertson E-mail: tome@xxxxxxxxxxxxxxxx
SIEMENS Power Transmission & Distribution Phone: (952) 607-2244
Energy Management & Automation Division Fax: (952) 607-2018
10900 Wayzata Boulevard, Suite 400
Minnetonka, MN, 55305

