Re: Still getting NFS client locking up

From: Soren Schmidt (sos_at_spider.deepcore.dk)
Date: 11/10/03

    To: Robert Watson <rwatson@freebsd.org>
    Date: Mon, 10 Nov 2003 18:44:40 +0100 (CET)
    
    

    It seems Robert Watson wrote:
    > How fast are your systems, speaking of which? I live in the world of
    > 300-500 mhz machines at work, and 300-800 mhz boxes at home. If you're
    > using multi-ghz boxes, that could well be the distinguishing factor
    > between our configurations...

    Server is a 533MHz VIA C3; clients are everything from a 300MHz PII to a 2.6GHz P4.

    > Ok, here's the strategy I was planning to take once I could reproduce it:
    >
    > (1) Attempt to further narrow down responsibility to client/server. In
    > particular, see if an apparent hang on one client affects the other
    > clients.

    For me it's just the server end that fails; I've not seen the client hang.

    > (2) Investigate Soren's report that killing and restarting nfsd on the
    > server would clear the hang.

    Yup, that works. In fact, I now have that in my crontab, run every
    minute, to keep NFS from hosing my setup here.
    NOTE: I also still need to ifconfig down/up my interfaces on some
    boxes or the netstack will freeze (again done every minute in crontab).
    However, when NFS locks up it seems totally unrelated, i.e. all other
    network traffic works...

    > (3) Look at stack traces of involved processes on both the client and
    > server: in particular, look at traces for any client blocked in NFS,
    > any nfsiod processes on the client, and the nfsd processes on the
    > server. Also look at the wait channels on clients and servers for
    > these processes. Particularly interested in whether nfsd processes
    > are blocked trying to grab locks.

    Ok, will do..

    > (4) Look at netstat information for NFS sockets, in particular, if the
    > buffers are full, or not being drained. In particular, on the server,
    > is the input queue not being drained by nfsd worker threads?

    Netstat doesn't seem to give any hints or even useful info here.
    Are there any specific commands you want the output from?
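
As a rough illustration of the check described in (4), the sketch below scans `netstat -an` style output for sockets on the NFS port whose Recv-Q or Send-Q is non-zero. It assumes the usual FreeBSD column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address) and the `host.port` address form; the function name, the sample text, and the addresses in it are hypothetical, not output from the systems in this thread.

```python
def undrained_nfs_sockets(netstat_output, port="2049", threshold=0):
    """Return (proto, local_addr, recv_q, send_q) for NFS sockets with queued data."""
    flagged = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines; only tcp*/udp* rows are of interest.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        local = fields[3]
        # Assumption: addresses are written as host.port, e.g. 10.0.0.1.2049
        if local.rpartition(".")[2] != port:
            continue
        if recv_q > threshold or send_q > threshold:
            flagged.append((fields[0], local, recv_q, send_q))
    return flagged


# Hypothetical sample, made up for illustration only.
SAMPLE_NETSTAT = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  10.0.0.1.2049          10.0.0.7.1021          ESTABLISHED
tcp4   32768      0  10.0.0.1.2049          10.0.0.9.1019          ESTABLISHED
tcp4       0      0  10.0.0.1.22            10.0.0.7.49152         ESTABLISHED
udp4   41600      0  *.2049                 *.*
"""

if __name__ == "__main__":
    for row in undrained_nfs_sockets(SAMPLE_NETSTAT):
        print(row)
```

A Recv-Q on the server that stays large across several samples would suggest the nfsd workers are not draining their input; a single non-zero reading on its own says little.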

    > (5) Try backing out src/sys/nfsserver/nfs_serv.c:1.137, which removed
    > another deadlock problem, but did change locking behavior in the NFS
    > server.

    No change; I already tried that.

    > (6) Look at packet traces between the client and server with ethereal,
    > which has pretty good NFS decoding. Is the client retransmitting an
    > RPC to the server and the server just isn't responding, or is the
    > client failing to transmit? At the point of the hang, what sorts of
    > RPCs are outstanding to the server? In the past, we've seen "apparent
    > hangs" when some or another more obscure unusual error case on the NFS
    > server fails to respond to an RPC, which causes the client to "wait
    > forever".

    I can try that easily, I'll get a trace to you later tonight...
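
One way to answer the questions in (6) once a trace exists is to pair RPC calls with replies by XID and list the calls that never got an answer. The sketch below assumes the capture has already been reduced to `(timestamp, kind, xid, procedure)` tuples by some other means; that record format, the function name, and the sample values are all hypothetical.

```python
from collections import defaultdict


def unanswered_rpcs(records):
    """Summarise RPC calls that have no matching reply.

    records: iterable of (timestamp, kind, xid, procedure), where kind is
    "call" or "reply". A retransmission reuses the XID of the original call.
    """
    calls = defaultdict(list)
    replied = set()
    for timestamp, kind, xid, procedure in records:
        if kind == "call":
            calls[xid].append((timestamp, procedure))
        elif kind == "reply":
            replied.add(xid)

    report = []
    for xid, sends in calls.items():
        if xid in replied:
            continue
        sends.sort()
        report.append({
            "xid": xid,
            "procedure": sends[0][1],
            "first_sent": sends[0][0],
            "retransmits": len(sends) - 1,
        })
    report.sort(key=lambda entry: entry["first_sent"])
    return report


# Hypothetical trace, made up for illustration only.
SAMPLE_TRACE = [
    (0.00, "call", 0x1A01, "GETATTR"),
    (0.01, "reply", 0x1A01, "GETATTR"),
    (0.50, "call", 0x1A02, "WRITE"),
    (0.60, "call", 0x1A03, "COMMIT"),
    (1.50, "call", 0x1A02, "WRITE"),
    (3.50, "call", 0x1A02, "WRITE"),
]

if __name__ == "__main__":
    for entry in unanswered_rpcs(SAMPLE_TRACE):
        print(entry)
```

Entries with a growing retransmit count point at a server that is not responding. If the client stops sending calls altogether during a hang, nothing shows up here, which would instead point at the client side.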

    > Things to look for: normally, idle nfsd and nfsiod processes have a WCHAN
    > of "-" (ps -lax), which indicates they're blocked waiting for some event
    > to kick them off. If you see nfsd processes "hung" in another state, it's
    > a good sign we've identified a server problem. In the nfs client
    > processes, "nfsrcvlk" typically indicates a process has sent out an RPC
    > and is now waiting on a response.

    I see the idle '-' wchan here when things go bad IIRC...
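
The wait-channel check described in the quoted paragraph can be sketched as a small filter over `ps -lax` output: report any nfsd or nfsiod process whose wait channel is something other than "-". This assumes the header names the column WCHAN or MWCHAN and that COMMAND is the last column; the function name and the sample listing are hypothetical.

```python
def find_busy_nfs_processes(ps_output, names=("nfsd", "nfsiod")):
    """Return (pid, wchan, command) for NFS daemons not idle on '-'."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumption: the wait-channel column is labelled WCHAN or MWCHAN.
    wchan_cols = [i for i, h in enumerate(header) if h in ("WCHAN", "MWCHAN")]
    if not wchan_cols or "PID" not in header or "COMMAND" not in header:
        raise ValueError("unexpected ps header: %r" % lines[0])
    wchan_col = wchan_cols[0]
    pid_col = header.index("PID")
    cmd_col = header.index("COMMAND")

    busy = []
    for line in lines[1:]:
        # COMMAND may contain spaces, so split only up to that column.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue
        command = fields[cmd_col].strip()
        # Normalise forms such as "nfsd:" or "[nfsiod" to a bare name.
        program = command.split()[0].strip("[]():")
        if program in names and fields[wchan_col] != "-":
            busy.append((int(fields[pid_col]), fields[wchan_col], command))
    return busy


# Hypothetical listing, made up for illustration only.
SAMPLE_PS = """\
  UID  PID PPID CPU PRI NI   VSZ  RSS MWCHAN STAT  TT       TIME COMMAND
    0  312    1   0  96  0  1216  796 -      Is    ??    0:00.01 nfsd: master (nfsd)
    0  314  312   0   4  0  1152  708 -      S     ??    0:12.40 nfsd: server (nfsd)
    0  315  312   0  -8  0  1152  708 ufs    D     ??    0:09.13 nfsd: server (nfsd)
    0  401    1   0  96  0  1300  900 select Ss    ??    0:00.50 /usr/sbin/sshd
"""

if __name__ == "__main__":
    for pid, wchan, command in find_busy_nfs_processes(SAMPLE_PS):
        print(pid, wchan, command)
```

An empty result during a hang would match the observation above that the nfsd processes still show the idle "-" wait channel when things go bad.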

    -Søren
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

