NFS server hangup

From: Graham Allan (allan_at_physics.umn.edu)
Date: 12/31/03


Date: Wed, 31 Dec 2003 15:30:56 -0600
To: tru64-unix-managers@ornl.gov

New Years Eve probably isn't the optimum time to be looking for
answers, but twice in the last couple of days our main NFS server has
hung up. It's a DS20 (2/500) with 2.5G memory, Tru64 v5.1A PK4. Logging
in to the server, things appear more or less normal but all clients
report "NFS server xxx not responding" - we normally see this
occasionally, but in this case it never recovers.

Running /sbin/init.d/nfs stop followed by start does not restore
service. syslog shows:

Dec 31 15:03:55 spartha nfsd:[111457]: Can't bind UDP addr: Address already in use

probably because if I look at the output of "ps", I see "nfsd" in state
"U" - the old nfsd is failing to exit. Unfortunately I don't know what
state it was in before I tried stopping it...
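
As an illustration of that bind error (not of nfsd itself), here is a
minimal Python sketch: while one socket still holds a UDP port, a second
bind to the same address and port is refused with EADDRINUSE. The
function name is made up for this example, and it uses a loopback
address with an OS-chosen port rather than the NFS port, so it says
nothing about why the old nfsd is stuck.

```python
import errno
import socket


def second_bind_errno():
    """Bind one UDP socket, then try to bind a second to the same port.

    Returns the errno raised by the second bind, or None if it succeeded.
    (Hypothetical helper, for illustration only.)
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS for any free port; the first socket keeps it.
        first.bind(("127.0.0.1", 0))
        port = first.getsockname()[1]
        try:
            # Same address and port while the first socket is still open.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    err = second_bind_errno()
    # Print the symbolic name of the error code, if there was one.
    print(errno.errorcode.get(err, err))
```

In the same way, a new nfsd cannot bind its UDP port for as long as the
old nfsd process exists and still has that port open.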

Finally, halting the system also fails - it hangs (no messages visible
- blue screen after X shuts down).

I probably should also have looked at the output of "ps axml" to see
the state of the kernel threads, but I only looked at this part of the
man page after restarting, so will have to wait for next time...

The server does have a lot of NFS clients. It was running with 32 each
of TCP and UDP nfsd threads, though as most of the clients use UDP, I
may reduce the TCP thread count and raise the UDP count.

Some local software (in /usr/local) was updated over the past few days
- things like perl, openssl, stunnel, and so on - but it's hard for me
to imagine how that could be related.

Any ideas on a possible cause (or solution)?

G.

-- 
-------------------------------------------------------------------------
Graham Allan - I.T. Manager - gta@umn.edu - (612) 624-5040
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------

