netstat issue on Tru64. Kernel bug?
From: Loic Domaigne (loic-dev_at_gmx.net)
Date: 06/07/04
- Previous message: nikki_wire: "Re: A little UNIX scripting challenge..."
- Next in thread: Brian Haley: "Re: netstat issue on Tru64. Kernel bug?"
- Reply: Brian Haley: "Re: netstat issue on Tru64. Kernel bug?"
Date: 7 Jun 2004 03:10:21 -0700
Hello Everybody,
Usually, (wannabe) programmers blame the compiler or the OS when
their programs don't work properly... But this time, I feel this
might really be a kernel issue rather than a broken application.
I posted a summary some time ago on the Tru64 managers mailing list,
but haven't received any answer so far. The experiment below was
conducted on Tru64 4.0F. I would be interested to know whether this
netstat issue persists on newer Tru64 versions (5.x).
Any comments are appreciated, especially if I am fooling myself with the
"example program".
Cheers,
Loic.
<copy>
Dear Tru64 Managers,
I have noticed that many people have run into the "netstat hangs"
problem, but I didn't find a clear answer regarding that issue.
I ran into that problem too two weeks ago, and here is a summary of my
investigations (on Tru64 4.0F, but this might apply to newer Tru64 as
well, judging by the posts on this list).
If a process writes to a message queue in such a way that it overflows
the queue (for instance, because no receiver is present), then once the
maximum number of outstanding messages is reached (40 on my system),
netstat hangs. BTW, not only netstat, but also programs like lsof. As
soon as the queue is removed, netstat (resp. lsof) works fine again.
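To watch a queue approach its limit from a program rather than with ipcs, something like the following might work. It is only a sketch, assuming the standard System V msgctl(IPC_STAT) interface; queue_usage() is a hypothetical helper name, not part of overflowQ.c, and I have not run it on Tru64.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * queue_usage() is a hypothetical helper (not part of overflowQ.c).
 * It reports the number of messages currently on the queue (QNUM in
 * "ipcs -qa") and the queue's byte limit (QBYTES).
 * Returns 0 on success, -1 on error with errno set by msgctl().
 */
int queue_usage(int msg_id, unsigned long *nmsgs, unsigned long *max_bytes)
{
    struct msqid_ds ds;

    assert(nmsgs != NULL && max_bytes != NULL);

    if (msgctl(msg_id, IPC_STAT, &ds) == -1)
        return -1;

    *nmsgs = (unsigned long) ds.msg_qnum;       /* outstanding messages */
    *max_bytes = (unsigned long) ds.msg_qbytes; /* byte limit of the queue */
    return 0;
}
```

Called with the id returned by msgget(), this should give the same QNUM and QBYTES values that ipcs prints for the queue.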
I believe this is a Tru64 issue, since the only process that should
ever be "punished" is the writer (if IPC_NOWAIT isn't set in
msgflg, the writer should block). Even though a message queue has
kernel persistence, I don't believe that other processes like netstat
should block too...
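The expected writer-side behaviour can be sketched as below. This assumes standard System V msgsnd() semantics: with IPC_NOWAIT a full queue makes the call fail with EAGAIN, and without the flag the same call would simply block the writer. try_send() is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct mymsg {
    long mtype;
    char mtext[1];
};

/*
 * try_send() is a hypothetical helper (not part of overflowQ.c).
 * It attempts a non-blocking send of a one-byte message.
 * Returns 1 if the message was queued,
 *         0 if the queue is full (msgsnd() failed with EAGAIN),
 *        -1 on any other error (errno is left as set by msgsnd()).
 */
int try_send(int msg_id, char payload)
{
    struct mymsg msg;

    assert(msg_id >= 0);

    msg.mtype = 1;
    msg.mtext[0] = payload;

    /* IPC_NOWAIT: fail immediately instead of blocking on a full queue. */
    if (msgsnd(msg_id, &msg, sizeof(msg.mtext), IPC_NOWAIT) == 0)
        return 1;

    return (errno == EAGAIN) ? 0 : -1;
}
```

Either way, only the caller of msgsnd() is affected by the full queue; nothing here involves unrelated processes.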
Below you will find overflowQ.c, a program that does nothing but
overflow a message queue, as well as a description of how to reproduce
the problem (steps.txt).
You might know all of this already... But I felt it would perhaps be
a good idea to post this summary.
Cheers,
Loic.
------------------ overflowQ.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <errno.h>
#define MSGQ_KEY 0x1234
#define PERM 0660
struct mymsg {
    long mtype;
    char mtext[1];
};

int
main(void)
{
    int msg_id;        /* message queue id */
    struct mymsg msg;  /* message to send */
    int nmsg;          /* number of messages sent */
    int status;        /* status returned by msgsnd() */

    /*
     * Create the message queue; fail if it already exists.
     */
    if ((msg_id = msgget(MSGQ_KEY, PERM | IPC_CREAT | IPC_EXCL)) == -1) {
        perror("msgget");
        exit(EXIT_FAILURE);
    }

    /*
     * Now write messages one by one until the queue is full.
     */
    msg.mtype = 1;
    msg.mtext[0] = 'A';
    nmsg = 0;
    do {
        status = msgsnd(msg_id, &msg, sizeof(msg.mtext), IPC_NOWAIT);
        if (status == 0) {  /* count only successful sends */
            nmsg++;
            printf("Sent msg #%d\r", nmsg);
        }
    } while (status == 0);

    if (errno != EAGAIN) { /* eh? we didn't overflow? */
        perror("msgsnd");
        exit(EXIT_FAILURE);
    }

    printf("Queue full, sent %d messages\n", nmsg);
    exit(EXIT_SUCCESS);
}
------------------ steps.txt
===============================
Steps to reproduce the problem:
===============================
bash$ netstat
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  vdphr1.sshd            wks4.33524             ESTABLISHED
tcp        0      0  localhost.3922         localhost.2301         TIME_WAIT
tcp        0      0  localhost.3923         localhost.2301         TIME_WAIT
...
bash$ overflowQ
Queue full, sent 40 messages
bash$ netstat
[ hangs... ]
bash$ ipcs -qa
T   ID  KEY         MODE         OWNER  GROUP   CREATOR  CGROUP  CBYTES  QNUM  QBYTES  LSPID  LRPID  STIME     RTIME     CTIME
q    0  0x41003ec7  --rw-------  root   system  root     system       0     0   16384      3  13647  14:24:29  14:24:29   8:37:07
q  130  0x1234      --rw-rw----  loic   vdp     loic     vdp         40    40   16384  13662      0  14:25:06  no-entry  14:25:06
Removing the queue solves the problem:
bash$ ipcrm -q 130
bash$ netstat
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  vdphr1.sshd            wks4.33524             ESTABLISHED
tcp        0      0  localhost.3939         localhost.2301         TIME_WAIT
tcp        0      0  localhost.3940         localhost.2301         TIME_WAIT
...
Note: there is no need to restart kloadsrv.
</copy>
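The cleanup done with "ipcrm -q" in steps.txt could presumably also be done from a program. The following is a minimal sketch, assuming the standard System V msgget()/msgctl(IPC_RMID) calls; remove_queue_by_key() is a hypothetical helper name, and I have not tried it on Tru64.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * remove_queue_by_key() is a hypothetical helper (not part of overflowQ.c).
 * It looks up an existing queue by its key and removes it, discarding any
 * messages still on it.
 * Returns 0 on success, -1 on error (errno set by msgget() or msgctl()).
 */
int remove_queue_by_key(key_t key)
{
    int msg_id;

    /* With IPC_PRIVATE, msgget() would create a new queue instead. */
    assert(key != IPC_PRIVATE);

    /* No IPC_CREAT: only look up an existing queue, never create one. */
    msg_id = msgget(key, 0);
    if (msg_id == -1)
        return -1;

    return msgctl(msg_id, IPC_RMID, NULL);
}
```

For the queue created by overflowQ.c, the call would be remove_queue_by_key(0x1234), run by the queue's owner or creator (or by root).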