Re: NFS locking: lockf freezes (rpc.lockd problem?)



On Sun, Aug 27, 2006 at 07:17:34PM +0000, Michael Abbott wrote:
> On Sun, 27 Aug 2006, Kostik Belousov wrote:
>
> > Make sure that rpc.statd is running.
>
> Yep. Took me some while to figure that one out, but the first lockf test
> failed without that.
>
> [...]
>
> As for the other test, let's have a look. Here we are before the test
> (the NFS server, running 4.11, is saturn; the test machine, running 6.1,
> is venus):
>
> saturn$ ps auxww | grep rpc\\.
> root 48917 0.0 0.1 980 640 ?? Is 7:56am 0:00.01 rpc.lockd
> root 115 0.0 0.1 263096 536 ?? Is 18Aug06 0:00.00 rpc.statd
>
> [...]
>
> Well, how odd: as soon as I start the test, process 515 on venus goes
> away. Now to wait for it to fail... (doesn't take too long):
>
> [...]
>
> In conclusion: I agree with Greg Byshenk that the NFS server is bound to
> be the one at fault, BUT, is this "freeze until reboot" behaviour really
> what we want? I remain astonished (and irritated) that `kill -9` doesn't
> work!

The problem here is that the process is blocked in the lock call, waiting
for something, and so it is not acting on signals (including your 'kill').
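One way to confirm this on venus is to look at the stuck process's state
and wait channel with ps(1). A minimal sketch, assuming a POSIX sh and a
ps that accepts the `state` and `wchan` keywords (FreeBSD's ps and Linux
procps both do); `show_wait_state` is a hypothetical helper name, not an
existing tool:

```shell
#!/bin/sh
# Hypothetical helper: print the state and kernel wait channel of a process.
# Usage: show_wait_state PID
show_wait_state() {
    # "state" is the one-letter process state; "wchan" names what the
    # process is sleeping on in the kernel, if anything.
    ps -o pid,state,wchan,command -p "$1"
}
```

On FreeBSD, a `D` in the state column marks an uninterruptible wait; while
the process stays in such a wait, signals (SIGKILL included) are not acted
upon. The wait channel name gives a hint as to what it is waiting for.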

I'm not an expert on this, but my first guess would be that saturn (your
server) is offering something that it can't deliver. That is, the client
asks the server "can you do X?", and the server says "yes I can", so the
client says "do X" and waits -- and the server never does it.
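While chasing this, it helps if the test does not freeze your shell along
with the lock request. A minimal sketch, assuming a POSIX sh;
`run_with_deadline` is a hypothetical helper, and the command you pass it
is whatever lock test you were already running. It only stops the shell
from waiting: a process stuck in the kernel still will not die, even with
`kill -9`.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and wait at most
# SECS seconds for it to finish.
# Usage: run_with_deadline SECS command [args...]
# Returns the command's exit status, or 124 if it is still running.
run_with_deadline() {
    secs=$1
    shift
    "$@" &
    pid=$!
    waited=0
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$secs" ]; then
            echo "pid $pid still running after ${secs}s" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command has exited; collect its exit status.
    wait "$pid"
}
```

The one-second polling is deliberately coarse; a hung lock shows up as the
124 return code, and the PID it prints is the one to inspect with ps.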

Alternatively (based on your rpc.statd dying), rpc.lockd on your client
may be trying to use rpc.statd to communicate with your server. The lock
request starts successfully, but then rpc.statd dies (for some reason),
and your lock ends up waiting forever for an answer.
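To test this second theory, it is worth checking on both saturn and venus
whether rpc.lockd and rpc.statd are still running, before the test and
again after the lock hangs. A minimal sketch, assuming a POSIX sh with ps
and awk; `check_daemons` is a hypothetical helper that does roughly what
the `ps auxww | grep rpc\\.` above does, but compares command names
exactly, so the search cannot match itself:

```shell
#!/bin/sh
# Hypothetical helper: report whether each named daemon is running.
# Usage: check_daemons rpc.lockd rpc.statd
# Returns non-zero if any of them is missing.
check_daemons() {
    missing=0
    for name in "$@"; do
        # Strip any leading path from each process's first word, so that
        # "/usr/sbin/rpc.statd" matches "rpc.statd", then compare exactly.
        if ps ax -o command= | awk -v want="$name" '
            { cmd = $1; sub(".*/", "", cmd); if (cmd == want) found = 1 }
            END { exit !found }'; then
            echo "$name: running"
        else
            echo "$name: NOT running"
            missing=1
        fi
    done
    return "$missing"
}
```

Running it once before the test and once after the freeze shows directly
whether rpc.statd disappeared in between.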


I would recommend starting both rpc.lockd and rpc.statd with the '-d'
flag, to see whether the debugging output tells you anything about what
is going on (for rpc.lockd, '-d' takes a debug level, e.g. '-d 1'; both
daemons write their debugging output to syslog). There may well be a bug
somewhere, but you need to find where it is. I suspect that it is not
actually in rpc.statd, as nothing in its source has changed since
January 2005.

An alternative would be to update to RELENG_6 (or at least RELENG_6_1)
and then try again.


--
greg byshenk - gbyshenk@xxxxxxxxxxx - Leiden, NL
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


