Re: NFS locking: lockf freezes (rpc.lockd problem?)



Michael Abbott wrote:
> What about the non-interruptible sleep? Is this regarded as par for the
> course with NFS, or as a problem?
>
> I know that "hard" NFS mounts are treated as completely unkillable, though
> why `kill -9` isn't made to work escapes me, but a locking operation which
> (presumably) suffers a protocol error? Or is rpc.lockd simply waiting to
> hear back from the (presumably broken) NFS server? Even so: `kill -9`
> ought to work!

SIGKILL _does_ always work. However, signal processing can
be delayed for various reasons. For example, if a process
is stopped (SIGSTOP), further signals will only take effect
when it continues (SIGCONT).
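
To illustrate the stopped-process case, here is a minimal sketch in
Python. The function name demo_stopped_process is made up for this
example. It uses SIGTERM as the "further signal", because a kernel may
special-case SIGKILL for stopped processes. It assumes a POSIX system
where the child keeps the default SIGTERM disposition.

```python
import os
import signal
import subprocess
import sys
import time


def demo_stopped_process():
    """Send SIGTERM to a stopped child; it only takes effect after SIGCONT."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        os.kill(child.pid, signal.SIGSTOP)
        # Wait until the child has really entered the stopped state,
        # so that SIGTERM cannot overtake SIGSTOP.
        os.waitpid(child.pid, os.WUNTRACED)

        os.kill(child.pid, signal.SIGTERM)
        time.sleep(0.5)
        # The signal is pending, but the child has not acted on it yet.
        alive_while_stopped = child.poll() is None

        os.kill(child.pid, signal.SIGCONT)
        child.wait(timeout=10)
        return alive_while_stopped, child.returncode
    finally:
        # Clean up if anything above went wrong.
        if child.poll() is None:
            child.kill()
            child.wait()
```

The function returns whether the child was still alive while stopped,
plus its final return code (negative signal number, as reported by
the subprocess module).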

Signal processing does not occur if a process is currently
not scheduled, which is the case if the process is blocked
on I/O (indicated by "D" in the STAT column of ps(1), also
called the "disk-wait" state). That can happen if the
hardware is broken (disk, controller, cable), so an I/O
request doesn't return. It can also happen if there are
NFS hiccups, as seems to be the case here.

As soon as the "D" state ends, the process becomes runnable
again (i.e. it's put on the scheduler's "run queue"), which
means that it'll get a CPU share, and the SIGKILL signal
that you sent it before will finally be processed.
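
If you want to check which processes are currently in the "D" state,
something like the following sketch could help. The helper names are
made up for this example, and the invocation "ps axo pid,stat,comm" is
an assumption about your ps(1); adjust it if your system uses
different keywords.

```python
import subprocess


def disk_wait_processes(ps_output):
    """Return (pid, command) pairs whose STAT field starts with 'D'.

    Expects text with three columns: PID, STAT, COMMAND.
    """
    result = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and anything that is not a process row.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        if fields[1].startswith("D"):
            result.append((int(fields[0]), fields[2]))
    return result


def list_disk_wait():
    """Run ps(1) and filter its output (assumed invocation, see above)."""
    out = subprocess.run(
        ["ps", "axo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return disk_wait_processes(out)
```

Calling list_disk_wait() while the lockf process hangs should show it
in the result, if it really is in disk-wait.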

Some background information: Each process has a bit mask
which stores the set of received signals. kill(2) (and
therefore also kill(1)) only sets a bit in that bit mask.
The next time the process is scheduled onto a CPU, the mask
of received signals is processed and acted upon. That's
not FreeBSD-specific; it works like that on almost all UNIX
systems. Why does it work that way? Well, if signals were
processed for processes not on the CPU, then there would be
a "hole": A process would be able to circumvent the
scheduler, because signal processing happens on behalf of
the process, which means that it runs with the credentials,
resource limits, nice value etc. of that process. In
theory, a special case could be made for SIGKILL, but that's
quite difficult if you don't want to break existing semantics
(or create holes).
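
The set of received-but-not-yet-processed signals can be observed from
user space. The sketch below is only an analogy: it delays delivery by
blocking the signal, which is a different reason for the delay than a
process sitting in disk-wait, but the bookkeeping is the same idea.
The function name is made up, and raise_signal() stands in for
kill(2) here. It needs Python 3.8 or later and must run in the main
thread.

```python
import signal


def demo_pending_signal():
    """Block SIGUSR1, send it, and watch it wait in the pending set."""
    delivered = []
    old_handler = signal.signal(
        signal.SIGUSR1, lambda signum, frame: delivered.append(signum)
    )
    # Block the signal; remember the previous mask for restoring later.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    try:
        signal.raise_signal(signal.SIGUSR1)
        # The signal is now recorded as pending, but not acted upon.
        pending_while_blocked = signal.SIGUSR1 in signal.sigpending()
        handled_while_blocked = bool(delivered)
    finally:
        # Restoring the mask lets the pending signal be delivered.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    handled_after_unblock = bool(delivered)

    # Put the previous handler back (None means it was not set from Python).
    signal.signal(
        signal.SIGUSR1,
        old_handler if old_handler is not None else signal.SIG_DFL,
    )
    return pending_while_blocked, handled_while_blocked, handled_after_unblock
```

The three returned flags say: the signal was pending while blocked,
the handler had not run at that point, and the handler did run once
the mask was restored.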

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"UNIX was not designed to stop you from doing stupid things,
because that would also stop you from doing clever things."
-- Doug Gwyn


