Tru64 server can't handle 900 network clients

From: Ole Holm Nielsen (ohnielse_at_fysik.dtu.dk)
Date: 09/17/04

    Date: Fri, 17 Sep 2004 20:49:38 +0200
    To: tru64-unix-managers@ornl.gov
    
    

    I'm stumped by an apparent limit in the Tru64 UNIX kernel (v5.1A pk6)
    on the number of client MAC addresses it can handle: it fails with
    close to 1000 NFS clients. We expanded our Linux cluster to 900+
    nodes, and suddenly the Tru64 UNIX NFS file server randomly loses
    network communication with many (or most) of the new nodes. A "ping"
    fails from either end of the server-client connection.
    Communication between Linux servers and nodes works perfectly,
    however, so we do not believe there is a problem with the network
    setup.

    What I believe is happening is "ARP cache thrashing": the Tru64 kernel
    apparently can't cope with close to 1000 MAC addresses simultaneously
    because a fixed-size ARP cache fills up, and the kernel starts
    deleting MAC addresses from the ARP cache randomly. See "man 7 arp"
    on a Linux box about the cache. On the Linux boxes we solve the
    ARP cache problem by loading a static cache from the /etc/ethers file,
    but on Tru64 UNIX this causes a dead-sure communications failure :-(
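
To illustrate the suspected thrashing, here is a toy model in Python. It is only a sketch under stated assumptions: the function name, the cache sizes (512 and 1024), and the random-eviction policy are hypothetical and are not taken from the Tru64 kernel. It shows how a fixed-size cache that is smaller than the number of active clients keeps missing after warm-up, while a large enough cache only misses once per client.

```python
import random


def simulate_arp_cache(num_clients, cache_size, rounds, seed=0):
    """Toy model: fixed-size cache with random eviction.

    Every client is contacted once per round. Returns (hits, misses).
    This is NOT the Tru64 algorithm, only an illustration of thrashing.
    """
    rng = random.Random(seed)
    entries = []        # cached client ids, in arbitrary order
    present = set()     # same ids, for fast membership tests
    hits = misses = 0
    for _ in range(rounds):
        for client in range(num_clients):
            if client in present:
                hits += 1
                continue
            misses += 1  # entry absent: an ARP request would be needed
            if len(entries) >= cache_size:
                # Cache full: evict a randomly chosen entry.
                idx = rng.randrange(len(entries))
                victim = entries[idx]
                entries[idx] = entries[-1]
                entries.pop()
                present.remove(victim)
            entries.append(client)
            present.add(client)
    return hits, misses


if __name__ == "__main__":
    # Hypothetical sizes: 900 clients against a too-small and a big-enough cache.
    for size in (512, 1024):
        h, m = simulate_arp_cache(num_clients=900, cache_size=size, rounds=5)
        print(f"cache_size={size}: hits={h} misses={m}")
```

In this model the cache of 1024 entries misses exactly once per client (the cold start), whereas the cache of 512 entries keeps evicting entries that are needed again in the next round. In practice each such miss costs an ARP exchange, which would be consistent with the intermittent loss of connectivity described above, although the real kernel behaviour may differ.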

    Browsing the Tru64 UNIX manuals and the "dxkerneltuner" tool, I
    haven't been able to find any kernel parameter which may increase
    the maximum size of the ARP cache. Can anyone help?
    Note: The 900 nodes are divided about equally between two Gigabit
    interfaces on the Tru64 UNIX server.

    Ole Holm Nielsen
    Department of Physics, Technical University of Denmark,
    Building 307, DK-2800 Kongens Lyngby, Denmark

