Re: rpc.lockd brokenness (2)
- From: Kris Kennaway <kris@xxxxxxxxxxxxxx>
- Date: Wed, 8 Mar 2006 19:57:22 -0500
On Thu, Mar 09, 2006 at 12:26:44AM +0000, Miguel Lopes Santos Ramos wrote:
> From: Kris Kennaway <kris@xxxxxxxxxxxxxx>
> Subject: Re: rpc.lockd brokenness (2)
>> This is intentional. It's how pidfile_*() tests whether the process
>> is still running. The intention is that if someone tries to open the
>> pidfile again while the first process is still running, the lock
>> acquisition will fail and we'll know the other process is still alive,
>> and therefore avoid starting a second instance.
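The lock-as-liveness-check idea described above can be illustrated with a minimal C sketch. It is not FreeBSD's libutil code: it assumes that a BSD-style flock(2) lock taken without blocking stands in for the lock pidfile_open() acquires, and the helper try_lock_pidfile() and its return codes are hypothetical.

```c
#define _DEFAULT_SOURCE  /* expose flock() on glibc; harmless on BSD */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the libutil API): open `path` and try to take
 * an exclusive, non-blocking flock(2) lock on it.
 *   >= 0 : lock acquired; keep this descriptor open while the daemon runs
 *     -2 : another open file description holds the lock ("still running")
 *     -1 : any other error (errno is set)
 */
int try_lock_pidfile(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        /* EWOULDBLOCK means the lock is held elsewhere right now. */
        return saved == EWOULDBLOCK ? -2 : -1;
    }
    return fd;
}
```

Because flock(2) locks belong to an open file description, a second call fails even from the same process; once the first descriptor is closed (or its owner exits and the kernel closes it), the lock is free again. That release on exit is exactly the step under suspicion in this thread.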
> No, no, you got me wrong. The pidfile is left locked after cron stopped
> running (with /etc/rc.d/cron stop). This behaviour must be wrong.
OK, I misunderstood. The rc.d script will signal cron to exit, which
should close its file descriptors and cause rpc.lockd to release the
lock. Perhaps this part is broken. OK, I tested this with daemon -p,
and it does indeed seem to be broken:
haessal# daemon -p pid_file sleep 100000
haessal# kill -KILL `cat pid_file`
haessal# ps -p `cat pid_file`
PID TT STAT TIME COMMAND
haessal# lockf -t 0 pid_file echo Yay
lockf: pid_file: already locked
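For comparison, the behaviour the transcript above expects can be checked on a local filesystem with a small C sketch: a child takes the lock, is killed with SIGKILL, and the parent then retries a non-blocking lock. Locally, the kernel releases the lock when the dying process's descriptors are closed, so the retry should succeed; the transcript shows the lock surviving on the NFS-mounted pidfile instead. The helper lock_released_after_kill() is hypothetical, and flock(2) is assumed as the locking primitive rather than whatever daemon(8) and lockf(1) use internally.

```c
#define _DEFAULT_SOURCE  /* expose flock() on glibc; harmless on BSD */
#include <fcntl.h>
#include <signal.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical check: a child takes an exclusive flock(2) lock on `path`
 * and waits; the parent kills it with SIGKILL, reaps it, and retries.
 * Returns 1 if the lock was released, 0 if it is still held, -1 on error.
 */
int lock_released_after_kill(const char *path)
{
    int ready[2];
    if (pipe(ready) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0) {                          /* child: plays the daemon */
        close(ready[0]);
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd == -1 || flock(fd, LOCK_EX | LOCK_NB) == -1)
            _exit(1);
        if (write(ready[1], "x", 1) != 1)    /* tell parent the lock is held */
            _exit(1);
        for (;;)
            pause();
    }
    close(ready[1]);
    char c;
    ssize_t n = read(ready[0], &c, 1);       /* wait until the child holds the lock */
    close(ready[0]);
    kill(pid, SIGKILL);                      /* like kill -KILL `cat pid_file` */
    waitpid(pid, NULL, 0);                   /* exit closes the child's descriptors */
    if (n != 1)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    int released = flock(fd, LOCK_EX | LOCK_NB) == 0;  /* similar in spirit to lockf -t 0 */
    close(fd);
    return released ? 1 : 0;
}
```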
>> There is a (known) lockd bug here though, which you isolated:
> So, this really is bin/80389?
No, I don't think so. The ability to cancel pending lock requests
(the symptom being a process that cannot be killed while it is blocked
on a lock) has never been implemented in FreeBSD's rpc.lockd (I'm not
aware of a PR about it, so I filed my own earlier tonight), and the
problem above might be a
> I am a bit disappointed. First, this problem didn't cause me trouble before
> I went to 6-STABLE; now I must either disable cron or disable locking (which
> And I'm still not completely convinced. That problem, if I understand correctly,
> existed before January...
The pidfile_*() functions are new; before them, pidfile handling
was done differently.
> There are two things...
> - cron.pid shouldn't be locked after cron has terminated. (This interaction was
> saved in full as http://mega.ist.utl.pt/~mlsr/nfs-nofile.bin)
Actually the locking isn't traced here; I misunderstood how it works,
and the lock transactions are done on another UDP port. You have to
use rpcinfo to figure out which one it is, since it varies. Anyway,
the above sequence reproduces it.
> - cron shouldn't hang on startup just because the file is locked, since
> pidfile_open opens it with O_NONBLOCK (unlike lockf).
I haven't been able to reproduce this: for example, lockf -t 0 does
O_NONBLOCK locking and works correctly when the file is already locked.
Perhaps another locked file (not the pidfile) was leaked in the same
way and is being opened without O_NONBLOCK.
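The difference between a non-blocking lock attempt and a blocking wait, which the points above and below turn on, can be sketched in C on a local filesystem. A non-blocking attempt (LOCK_NB, analogous to the O_NONBLOCK locking mentioned here) fails at once with EWOULDBLOCK, while a blocking attempt waits and, locally, can be interrupted by a signal (EINTR); the thread indicates that FreeBSD's rpc.lockd lacks the equivalent ability to cancel a pending NFS lock request. The helper name is hypothetical, and flock(2) stands in for whatever lock call cron and lockf actually make.

```c
#define _DEFAULT_SOURCE  /* expose flock() and sigaction() on glibc */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; /* exists only to interrupt the wait */ }

/*
 * Hypothetical check: a child holds an exclusive flock(2) lock on `path`.
 * The parent tries (a) a non-blocking lock, then (b) a blocking lock that
 * SIGALRM interrupts after one second.
 * Returns 1 if (a) failed with EWOULDBLOCK and (b) failed with EINTR,
 * 0 otherwise, -1 on setup error.
 */
int nonblock_fails_and_blocking_is_interruptible(const char *path)
{
    int ready[2];
    if (pipe(ready) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0) {                              /* child: holds the lock */
        close(ready[0]);
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd == -1 || flock(fd, LOCK_EX) == -1)
            _exit(1);
        if (write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }
    close(ready[1]);
    char c;
    ssize_t n = read(ready[0], &c, 1);           /* wait until the child is locked */
    close(ready[0]);

    int result = -1;
    int fd = (n == 1) ? open(path, O_WRONLY | O_CREAT, 0644) : -1;
    if (fd != -1) {
        int nb_err = 0, blk_err = 0;
        if (flock(fd, LOCK_EX | LOCK_NB) == -1)  /* (a) returns immediately */
            nb_err = errno;

        struct sigaction sa, old;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;                /* no SA_RESTART: let flock fail */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, &old);
        alarm(1);
        if (flock(fd, LOCK_EX) == -1)            /* (b) blocks until the signal */
            blk_err = errno;
        alarm(0);
        sigaction(SIGALRM, &old, NULL);
        close(fd);
        result = (nb_err == EWOULDBLOCK && blk_err == EINTR) ? 1 : 0;
    }
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return result;
}
```

Installing the handler without SA_RESTART is the design choice that makes the interrupted wait visible to the caller instead of being silently restarted.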
> - cron shouldn't hang in such a way that it is not killable... (and shouldn't
> the open system call in lockf also be interruptible?)
This is the bug (really: missing feature) that I described in my