Re: waiting on sbwait

From: Robert Watson (rwatson_at_freebsd.org)
Date: 06/25/04

    Date: Thu, 24 Jun 2004 23:35:40 -0400 (EDT)
    To: Danny Braniss <danny@cs.huji.ac.il>

    On Wed, 23 Jun 2004, Danny Braniss wrote:

    > sometimes we get
    > load: 0.04 cmd: dmesg 13453 [nfsrcvlk] 0.00u 0.00s 0% 148k
    >
    > and looking through the code, there might be some connection between
    > sbwait and nfsrcvlk, but I doubt that it's sockets that I'm running out
    > of, or mbufs, since:
    >
    > foundation> netstat -m
    > 326/1216/26624 mbufs in use (current/peak/max):
    > 326 mbufs allocated to data
    > 321/428/6656 mbuf clusters in use (current/peak/max)
    > 1160 Kbytes allocated to network (5% of mb_map in use)
    > 0 requests for memory denied
    > 0 requests for memory delayed
    > 0 calls to protocol drain routines
    >
    > also, the process enters sbwait either in sosend or soreceive, which
    > makes me believe that it's some resource, rather than data, that is
    > missing.
    >
    > the fact that this 'unresponsiveness' happens only sometimes is making
    > this rather challenging, but try to tell this to the users :-)

    sbwait() is where a thread sleeps when it is blocked on a socket buffer:
    either waiting for space in the send buffer, or waiting for data to
    arrive in the receive buffer. This can happen because a process is
    directly performing socket I/O -- for example, sending or receiving on a
    TCP or UDP socket -- or because the process is using a kernel facility
    that performs socket I/O on its behalf, such as the NFS client. So the
    sbwait state could be the NFS client waiting on a full send buffer or an
    empty receive buffer. If I had to guess, it might well be NFS. However,
    there are actually ways to tell :-).
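
    To illustrate the "directly performing socket I/O" case, here is a
    minimal userland sketch, not taken from the original message. The
    function names are hypothetical, and it uses a local socketpair with a
    receive timeout so that it returns instead of hanging. While recv() is
    waiting on the empty receive buffer, the thread is in the state that
    FreeBSD reports as sbwait.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: recv() on a socket whose receive buffer is empty.
 * Returns 1 if recv() blocked until the timeout expired, 0 if it
 * returned without timing out, -1 on any other failure.
 */
int recv_times_out_when_empty(int timeout_ms)
{
    int sv[2];
    char buf[16];
    struct timeval tv;
    ssize_t n;
    int result;

    assert(timeout_ms > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Bound the wait so the example terminates. */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Nothing has been sent on sv[1], so this sleeps waiting for data. */
    n = recv(sv[0], buf, sizeof(buf), 0);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        result = 1;
    else if (n >= 0)
        result = 0;
    else
        result = -1;

    close(sv[0]);
    close(sv[1]);
    return result;
}

/*
 * Hypothetical helper: once the peer has sent data, recv() returns
 * immediately. Returns the number of bytes received, or -1 on failure.
 */
ssize_t recv_after_peer_sends(const char *msg)
{
    int sv[2];
    char buf[64];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (send(sv[1], msg, strlen(msg), 0) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    n = recv(sv[0], buf, sizeof(buf), 0);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

    Calling recv_times_out_when_empty(50) from a small main() shows the
    blocking case, and recv_after_peer_sends("ping") shows the same call
    returning as soon as data is available. The send side behaves the
    same way in reverse: a send blocks when the send buffer has no space.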

    The easiest is to compile your kernel with DDB, and when a process hangs
    with those symptoms, break into the debugger and do a trace on its pid.
    You'll get back a stack trace. If the trace shows a send/recv system call
    that ends in the socket code without passing through VFS/NFS, the process
    is blocked on network I/O, perhaps because it's sending or receiving a
    lot of data and hasn't finished. If you see it pass through NFS-related
    functions, then it's waiting for NFS network I/O, which could reflect a
    busy NFS server, a congested network segment, packet loss, etc.
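
    The steps above might look roughly like the following sketch. The
    kernel option is the standard one for building in DDB; the console key
    sequence is an assumption that applies to a syscons console, and <pid>
    is a placeholder for the pid of the hung process. No output is shown
    because it depends entirely on the system.

```text
# kernel configuration file: build in the kernel debugger
options         DDB

# after breaking into the debugger (assumed: Ctrl+Alt+Esc on the console)
db> ps
db> trace <pid>
db> continue
```

    In the resulting trace, look at what sits above sbwait: sosend or
    soreceive reached straight from a send/recv system call points at plain
    socket I/O, while NFS functions in between point at the NFS client.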

    Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
    robert@fledge.watson.org      Senior Research Scientist, McAfee Research

    _______________________________________________
    freebsd-hackers@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
    To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

