netstat issue on Tru64. Kernel bug?

From: Loic Domaigne (loic-dev_at_gmx.net)
Date: 06/07/04

  • Next message: Syed: "How to created messages file by monthly"
    Date: 7 Jun 2004 03:10:21 -0700
    
    

    Hello Everybody,

    Usually (wannabe) programmers blame the compiler and the OS when
    their programs do not work properly... But this time, I feel this
    might really be a kernel issue, rather than a broken application.

    I posted a summary some time ago on the Tru64 managers mailing list,
    but haven't received any answer so far. The experiment below has been
    conducted on Tru64 4.0F. I would be interested to know if this netstat
    issue persists on newer Tru64 (5.x).

    Any comments are appreciated, especially if I am fooling myself with
    the "example program".

    Cheers,
    Loic.

    <copy>

    Dear Tru64 Managers,

    I have noticed that many people have run into the "netstat hangs"
    problem, but I didn't find a clear answer regarding that issue.

    I ran into that problem too two weeks ago, and here is a summary of
    my investigations (on Tru64 4.0F, but this might apply to newer Tru64
    as well, judging from the posts on this list).

    If a process writes to a message queue in such a way that it overflows
    the queue (for instance, because no receiver is present), then netstat
    hangs once the maximum number of outstanding messages is reached (40
    on my system). BTW, not only netstat hangs, but also programs like
    lsof. As soon as the queue is removed, netstat (resp. lsof) works fine
    again.
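The "no receiver present" case can be illustrated from the other side with a minimal sketch of a receiver that empties a queue without blocking. `drain_queue()` is a hypothetical helper that does not appear in the original post, and whether draining the queue (rather than removing it) also releases netstat on Tru64 was not tested there.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Same layout as the one-byte message sent by overflowQ.c. */
struct mymsg {
    long mtype;
    char mtext[1];
};

/*
 * drain_queue() is a hypothetical helper, not part of the original post.
 * It reads messages of any type (msgtyp == 0) without blocking until the
 * queue is empty. Returns the number of messages removed, or -1 on an
 * unexpected error.
 */
int
drain_queue(int msg_id)
{
    struct mymsg msg;
    int n = 0;

    for (;;) {
        if (msgrcv(msg_id, &msg, sizeof(msg.mtext), 0, IPC_NOWAIT) == -1) {
            if (errno == ENOMSG)
                return n;   /* queue is empty */
            return -1;      /* e.g. EINVAL, EACCES, EIDRM, E2BIG */
        }
        n++;
    }
}
```

Called with the id returned by msgget() for an existing queue, it returns how many messages were waiting there.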

    I believe this is a Tru64 issue, since the only process that should
    eventually be "punished" is the writer (if IPC_NOWAIT isn't passed in
    msgflg, the writer should block). Even though a message queue has
    kernel persistence, I don't believe that other processes like netstat
    should block too...

    Below, you will find overflowQ.c, a program that does nothing but
    overflow a message queue, as well as a description of how to reproduce
    the problem (steps.txt).

    You might know all of this already... But I felt it would perhaps be
    a good idea to post this summary.

    Cheers,
    Loic.

    ------------------ overflowQ.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>
    #include <errno.h>

    #define MSGQ_KEY 0x1234
    #define PERM 0660

    struct mymsg {
     long mtype;
     char mtext[1];
    };

    int
    main()
    {
     int msg_id; /* message Queue id. */
     struct mymsg msg; /* message to send. */
     int nmsg; /* number of messages sent. */
     int status; /* status returned by msgsnd(). */
     /*
      * Create Message Queue
      */
     if ( (msg_id = msgget (MSGQ_KEY, PERM | IPC_CREAT | IPC_EXCL)) == -1 ) {
       perror ("msgget");
       exit (EXIT_FAILURE);
     }
     /*
      * now write message one by one until Queue is full
      */
     msg.mtype = 1;
     msg.mtext[0] = 'A';
     nmsg = 0;
     do {
       status = msgsnd (msg_id, &msg, sizeof(msg.mtext), IPC_NOWAIT);
       /* check errno right away, before printf() can modify it */
       if ( status == -1 && errno != EAGAIN ) { /* eh? we didn't overflow? */
         perror ("msgsnd");
         exit (EXIT_FAILURE);
       }
       nmsg++;
       printf ("Sent msg #%d\r", nmsg);
     }
     while ( status == 0 );
     
     printf ("Queue full, sent %d messages\n", nmsg-1);
     exit (EXIT_SUCCESS);
    }

    ------------------ steps.txt

    ===============================
    Steps to reproduce the problem:
    ===============================

    bash$ netstat
    Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
    tcp        0      0  vdphr1.sshd            wks4.33524             ESTABLISHED
    tcp        0      0  localhost.3922         localhost.2301         TIME_WAIT
    tcp        0      0  localhost.3923         localhost.2301         TIME_WAIT
    ...

    bash$ overflowQ
    Queue full, sent 40 messages

    bash$ netstat
    [ hangs... ]

    bash$ ipcs -qa
    T  ID KEY        MODE        OWNER GROUP  CREATOR CGROUP CBYTES QNUM QBYTES LSPID LRPID STIME    RTIME    CTIME
    q   0 0x41003ec7 --rw------- root  system root    system      0    0  16384     3 13647 14:24:29 14:24:29  8:37:07
    q 130 0x1234     --rw-rw---- loic  vdp    loic    vdp        40   40  16384 13662     0 14:25:06 no-entry 14:25:06
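In the ipcs output above, queue 130 holds 40 messages (QNUM) totalling 40 bytes (CBYTES) out of 16384 (QBYTES), so it appears to be the message count, not the byte capacity, that was exhausted. The same counters can be read from a program with msgctl() and IPC_STAT. The sketch below assumes a standard System V IPC interface; `queue_usage()` is a hypothetical helper name, not something from the original post.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * queue_usage() is a hypothetical helper, not part of the original post.
 * It stores the number of messages currently on the queue (the QNUM
 * column of ipcs) in *qnum and the byte capacity of the queue (the
 * QBYTES column) in *qbytes. Returns 0 on success, -1 on error.
 */
int
queue_usage(int msg_id, unsigned long *qnum, unsigned long *qbytes)
{
    struct msqid_ds ds;

    if (msgctl(msg_id, IPC_STAT, &ds) == -1)
        return -1;

    *qnum = (unsigned long) ds.msg_qnum;
    *qbytes = (unsigned long) ds.msg_qbytes;
    return 0;
}
```

A writer could call this before msgsnd() to see how close the queue is to its limits, although the per-queue message limit itself is a kernel setting that IPC_STAT does not report.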

    Removing the queue solves the problem:
    bash$ ipcrm -q 130
    bash$ netstat
    Active Internet connections
    Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
    tcp        0      0  vdphr1.sshd            wks4.33524             ESTABLISHED
    tcp        0      0  localhost.3939         localhost.2301         TIME_WAIT
    tcp        0      0  localhost.3940         localhost.2301         TIME_WAIT
    ...

    Note: there is no need to restart kloadsrv.

    </copy>

