Busy BIND + 5.2.1 = UDP Packet Loss

From: Leo Bicknell (bicknell_at_ufp.org)
Date: 10/27/04

    Date: Wed, 27 Oct 2004 10:14:19 -0400
    To: freebsd-hackers@freebsd.org
    
    
    

    I recently upgraded a fairly busy nameserver to FreeBSD 5.2.1, and
    I'm seeing packet loss from time to time on the box. I've done
    some digging, and the box seems to be dropping UDP packets. Netstat
    output:

    udp:
            177604011 datagrams received
            0 with incomplete header
            7 with bad data length field
            2735233 with bad checksum
            83753 with no checksum
            205540 dropped due to no socket
            1917 broadcast/multicast datagrams dropped due to no socket
            10627437 dropped due to full socket buffers
            0 not for hashed pcb
            164033877 delivered
            169793422 datagrams output

    The "dropped due to full socket buffers" seems to be the issue. I
    am also concerned by the number of packets with bad checksums, but
    I have no previous data point.
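
To put the counters above in proportion, here is a minimal Python sketch that parses the quoted `netstat` UDP figures and expresses the two counters of concern as a share of datagrams received. The helper names (`parse_counters`, `share`) are illustrative and not part of any tool, and the text is pasted in rather than read from a live system. The final check only observes that these particular figures add up; it is not a statement about how the kernel derives "delivered".

```python
import re

# Counters copied verbatim from the netstat output quoted in this message.
NETSTAT_UDP = """\
177604011 datagrams received
0 with incomplete header
7 with bad data length field
2735233 with bad checksum
83753 with no checksum
205540 dropped due to no socket
1917 broadcast/multicast datagrams dropped due to no socket
10627437 dropped due to full socket buffers
0 not for hashed pcb
164033877 delivered
169793422 datagrams output
"""


def parse_counters(text):
    """Map each counter label to its integer value (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def share(counters, label):
    """Return a counter as a percentage of datagrams received."""
    return 100.0 * counters[label] / counters["datagrams received"]


counters = parse_counters(NETSTAT_UDP)
full_pct = share(counters, "dropped due to full socket buffers")
bad_pct = share(counters, "with bad checksum")

# Sum of the rejected categories plus the delivered count.
accounted = sum(
    counters[label]
    for label in (
        "with incomplete header",
        "with bad data length field",
        "with bad checksum",
        "dropped due to no socket",
        "broadcast/multicast datagrams dropped due to no socket",
        "dropped due to full socket buffers",
        "delivered",
    )
)

print(f"full socket buffers: {full_pct:.2f}% of received")
print(f"bad checksum:        {bad_pct:.2f}% of received")
print("counters add up:", accounted == counters["datagrams received"])
```

With these figures, roughly 6% of received datagrams were dropped on full socket buffers and about 1.5% failed the checksum, and the rejected categories plus "delivered" account for every datagram received.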

    I am seeing loss with DNS, but also with ping and, judging by the
    occasional pauses in my ssh sessions, with TCP as well. I don't see anything remarkable
    with the TCP or ICMP statistics. I don't think there's anything wrong
    in MBUF land, statistics here for reference:

    % netstat -m
    mbuf usage:
            GEN cache: 0/256 (in use/in pool)
            CPU #0 cache: 335/672 (in use/in pool)
            Total: 335/928 (in use/in pool)
            Mbuf cache high watermark: 512
            Maximum possible: 51200
            Allocated mbuf types:
              291 mbufs allocated to data
              14 mbufs allocated to ancillary data
              16 mbufs allocated to fragment reassembly queue headers
              14 mbufs allocated to socket names and addresses
            1% of mbuf map consumed
    mbuf cluster usage:
            GEN cache: 0/152 (in use/in pool)
            CPU #0 cache: 289/400 (in use/in pool)
            Total: 289/552 (in use/in pool)
            Cluster cache high watermark: 128
            Maximum possible: 25600
            2% of cluster map consumed
    1336 KBytes of wired memory reserved (49% in use)
    0 requests for memory denied
    0 requests for memory delayed
    0 calls to protocol drain routines
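
The mbuf figures above can be reduced to a single headroom number per pool. The sketch below is a rough illustration only: it takes the "Total" and "Maximum possible" lines from the quoted `netstat -m` output (pasted in, with the other lines omitted) and reports how much of each configured maximum is in use. The function name `in_use_percent` is hypothetical.

```python
import re

# "Total" and "Maximum possible" lines copied from the netstat -m output
# quoted in this message; all other lines are omitted for brevity.
NETSTAT_M = """\
mbuf usage:
        Total: 335/928 (in use/in pool)
        Maximum possible: 51200
mbuf cluster usage:
        Total: 289/552 (in use/in pool)
        Maximum possible: 25600
"""


def in_use_percent(text):
    """Return {pool name: percent of the configured maximum in use}."""
    result = {}
    section = None
    in_use = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(" usage:"):
            # Section header such as "mbuf usage:" or "mbuf cluster usage:".
            section = stripped[: -len(" usage:")]
            in_use = None
            continue
        total = re.match(r"Total: (\d+)/(\d+)", stripped)
        if total:
            in_use = int(total.group(1))
            continue
        maximum = re.match(r"Maximum possible: (\d+)", stripped)
        if maximum and section is not None and in_use is not None:
            result[section] = 100.0 * in_use / int(maximum.group(1))
    return result


usage = in_use_percent(NETSTAT_M)
for pool, pct in usage.items():
    print(f"{pool}: {pct:.2f}% of maximum in use")
```

Both pools sit at around 1% of their maximum, which is consistent with the zero "requests for memory denied" count and with the impression that mbuf exhaustion is not the cause.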

    The only other thing I found of interest was some interrupt drops:

    % sysctl -a | grep drops
    net.inet.ip.intr_queue_drops: 13981

    So, given the traffic profile (nameserver, heavy UDP) and the info
    here, can someone point me in the right direction? I'm not sure
    where to go from here.

    -- 
           Leo Bicknell - bicknell@ufp.org - CCIE 3440
            PGP keys at http://www.ufp.org/~bicknell/
    Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
    
    


