Re: Lock files on NFS



"Hubble" <reiner@xxxxxxxxx> writes:
    if link call was successful then
        we have the lock
    else
        stat .lock.<hostname>.<PID>
        if link count == 2 then
            we have the lock /* ??? */
        else
            we failed to get the lock

Note that link(2) can fail for various reasons; see the ERRORS section
of the man page (http://bama.ua.edu/cgi-bin/man-cgi?link+2). In
particular, EINTR or EAGAIN can occur on some systems, since link is a
so-called "slow" system call. There may be other cases in which link
reports that it did not "complete" during the system call, even though
the operation nevertheless completed.

So the scenario is:

1. The link() call:
       link(".lock.foo.42", ".lock");
   returns before the action is completed, and link() therefore
   returns a failure indication.

2. But the action of creating the link (which requires several steps)
   actually continues on the NFS server, and eventually succeeds.

3. Checking the link count on the linked-to file is a reliable way to
   detect this.

4. The program is going to hold the lock for some time (perhaps a few
   seconds); by the time it's done, both ".lock" and ".lock.foo.42"
   will exist, and both can be removed to release the lock.
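
For concreteness, the four steps above might look roughly like this in
C. This is only a sketch under assumptions: the function names
create_unique, acquire_lock, and release_lock are made up for
illustration, and the caller is assumed to pass a per-process name such
as ".lock.foo.42" as `unique` and ".lock" as `lock`.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create the per-process file (for example
 * ".lock.foo.42").  Returns 0 on success, -1 on failure. */
int create_unique(const char *unique)
{
    int fd = open(unique, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Hypothetical helper: try to take the lock by hard-linking the unique
 * file to the shared lock name.
 * Returns 1 if we hold the lock, 0 if we do not, -1 if stat() failed. */
int acquire_lock(const char *unique, const char *lock)
{
    struct stat st;

    if (link(unique, lock) == 0)
        return 1;                  /* link() reported success */

    /* link() reported failure, but the link may still have been made
     * on the server (steps 1 and 2).  Check the link count on our own
     * file (step 3): 2 means the second name exists. */
    if (stat(unique, &st) != 0)
        return -1;
    return st.st_nlink == 2 ? 1 : 0;
}

/* Step 4: release the lock by removing both names. */
void release_lock(const char *unique, const char *lock)
{
    unlink(lock);
    unlink(unique);
}
```

A caller would run create_unique() once, then acquire_lock(), and call
release_lock() when done. If acquire_lock() returns 0, the unique file
is still there and should be removed (or reused for a retry).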

That makes sense, but it's part 3 that I'm still having a little
trouble with. If ".lock.foo.42" already exists, and ".lock" doesn't,
then ideally the link() call will succeed. But because of NFS
semantics, it can appear to fail. In that case, I check the link
count on ".lock.foo.42" -- but isn't there still a delay before
".lock" is created? If another process on another system tries to
execute

    link(".lock.bar.137", ".lock");

during that delay, isn't it possible that it could appear to succeed?

Or is this race condition eliminated by the operations being
serialized on the NFS server?

In any case, my testing shows that the code I'm using now is more
reliable than what I was using before.

--
Keith Thompson (The_Other_Keith) kst-u@xxxxxxx <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.