Re: '/etc/nfs stop' stopping more than NFS



Roger Cornelius wrote:

> > 5. Mount a filesystem from a different Linux client; test.
>
> No other linux client is available.
>
> > 6. Mount a filesystem from some other sort of system (non-OSR5); test.
>
> I'll have a Mac OS X system next week, but no non-OSR5 systems until
> then.
>
> > 7. Mount a filesystem from another OSR5 system; test.
>
> Don't have one of these either.

These are unfortunate gaps in the testing, but I guess we have to work
with what you have. Besides, it's at best unlikely that they would have
shown any differences...

> > 8. Start two background processes, one of which definitely does not, and
> > one definitely does, have a file open on that filesystem:
> >
> > sleep 1000 < /etc/termcap &
> > sleep 1000 < /u/nfs/importfs/cmilinux/SOME-FILE-THAT-EXISTS &
> >
> > Now run the `fuser` command, saving its output in a file. Search that
> > file for the PIDs of those two processes. `fuser` will normally report
> > a PID twice if it has the target file open twice. Do you see that --
> > i.e. do you see one mention of the /etc/termcap process and two of the
> > other? Also use `lsof` to probe this. Ummm, I'm not sure if `lsof
> > cmilinux:/u/nfs/exportfs/cmilar-001` or `lsof /u/nfs/importfs/cmilinux`
> > is more likely to produce useful results -- check that here, where you
> > have a definite user of the filesystem.
>
> 'fuser cmilinux:/u/nfs/exportfs/cmilar-001' included once, the pid of
> the process with the file open on the nfs filesystem, but did not
> include the process with /etc/termcap open. 'lsof
> /u/nfs/importfs/cmilinux' correctly lists the process with the open
> file on the nfs filesystem. lsof does not accept the host:file syntax.

So it _is_ able to properly distinguish on some level.

BTW, you now have the bare bones of a workaround: fix `numountall` to
use `lsof` on the NFS mount point, instead of `fuser` on the host:dir.
You would still have the mysterious bad behavior in `fuser`, but I think
`numountall` is the only thing that cares if it works with NFS. (Might
be the only script in the system that uses `fuser` at all?)
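
A rough sketch of what that replacement might look like, for illustration
only. I haven't looked at how `numountall` is actually structured, so the
function name `nfs_users` is made up, and it assumes your `lsof` build
supports `-t` (terse output, PIDs only):

```shell
# Hypothetical helper, not taken from numountall itself.
# Prints each PID that has a file open on the filesystem mounted at $1,
# one per line, with duplicates removed.
nfs_users() {
    mountpoint=$1
    # lsof on a mounted-on directory lists all open files on that filesystem;
    # -t reduces the output to bare PIDs.
    lsof -t "$mountpoint" 2>/dev/null | sort -un
}
```

You would call it as `nfs_users /u/nfs/importfs/cmilinux` and feed the
result to whatever does the killing, in place of the PID list that
`fuser` produces today.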

> There is an interesting correlation between the number of times fuser
> reports a pid and the number of streams/network files (lsof's
> terminology) reported for the pid by lsof -p. I can't include a sample
> of the results without google word-wrapping it, so the full list of
> pids and lsof output can be found here:
>
> http://tenzing.org/cusm/fuser-lsof.out
>
> The first line of each group (self-explanatorily) lists the pid and
> number of times fuser reported it. Subsequent lines are the output of
> lsof -p for that pid. One exception to the correlation seems to be the
> data for pid 322, which fuser reported 4 times but for which lsof
> reports 6 open streams files.
>
> The script that produced the list is here:
>
> http://tenzing.org/cusm/fuser-lsof.sh
>
> So can anything be made of the correlation? It looks like fuser is
> erroneously attributing any open streams/network files it finds.

Hmmm. The correlation is very strong (`inetd` 12 files, `snmpd` 9,
etc.). I would readily believe that the _first_ step of fuser's
procedure to determine whether a file is on NFS is to ask whether it is
STREAMS-based. There must be many steps past that, which are apparently
somehow getting short-circuited.
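
If you want to re-check the per-PID counts after any of the changes
below, something like this would tally them from saved `fuser` output.
It is only a sketch: `count_pids` is a made-up name, and it assumes the
saved file holds just the PID list (any letter codes stuck to the PIDs
are stripped, but a file name containing digits would confuse it):

```shell
# Hypothetical helper: read fuser's PID list on stdin and print
# "PID COUNT" for each distinct PID, sorted numerically by PID.
count_pids() {
    tr -cs '0-9' '\n' |   # turn every run of non-digits into one newline
        grep . |          # drop any empty line left at the start
        sort -n |
        uniq -c |
        awk '{ print $2, $1 }'
}
```

Usage would be `count_pids < fuser.out`, where `fuser.out` is whatever
file you saved the output in. The counts can then be compared against
the number of streams/network files `lsof -p` shows for each PID.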

A couple of other things to try:

1. Relink the kernel, allowing it to "rebuild the kernel environment".
Boot from that -- any change? I'm thinking that somewhere along the way
it must be asking "is this device major/minor I dug out of the inode
table one that I believe relates to NFS?". If some device nodes are out
of sync with the kernel build, who knows what could happen.

2. What is the major number of /dev/nfsd? There have historically been
a few bugs associated with major numbers above 127. I thought they had
all been flushed out long ago, but you never know.

2a. If it _is_ 128 or higher, you can surgically change it. This would
be a fiddly procedure, so let's not discuss it until we know whether it
applies.

3. You're on OSR507 + MP4, which is newer than I've used. I don't know
if MP4 replaced the `fuser` binary. You can find out, generically, by
doing (as root):

# cd /opt/K/SCO/Unix
# find . -name fuser -print

If this outputs more than one line, the pathnames will tell you which
patch(es) replaced it. Dredge up the various old binaries (just copy
them elsewhere with different names -- "/tmp/fuser.mp3" or whatever).
The patch installs will have made the old binaries not executable, perms
000. chmod them to 700 and test them. Any differences?
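
For item 2, the major number is the first of the two comma-separated
numbers that `ls -l` prints for a device node. If you'd rather not
eyeball it, here is a small sketch that pulls it out. `major_from_ls` is
a made-up name, and it assumes the usual `major, minor` layout in the
`ls -l` line:

```shell
# Hypothetical helper: read one `ls -l` line for a device node on stdin
# and print the major number (the number just before the comma).
major_from_ls() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+,[0-9]*$/) {   # "73," or "73,0"
                split($i, part, ",")
                print part[1]
                exit
            }
    }'
}
```

Run it as `ls -l /dev/nfsd | major_from_ls`. If the number printed is
128 or higher, item 2a applies.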

>Bela<