Re: '/etc/nfs stop' stopping more than NFS



Roger Cornelius wrote:

> > > OSR507 w/MP4,UP3
> > >
> > > '/etc/nfs stop' erroneously kills boatloads of processes it shouldn't.

> numountall calls fuser to generate the list of processes then invokes
> kill to kill them. fuser does actually report those processes and I
> can duplicate this by running it at the command line.

Ok, then `fuser` is going wrong. No further research inside
`numountall` is needed...

As I said before, I don't have current access to an OSR5 system with NFS
client mounts. In my past experience, I've observed that `fuser` does
work correctly on NFS mount points (also, clearly, it must do, or your
complaint would be universal -- all people shutting down NFS would
experience process overkill).

So, let's try some other stuff.

1. Unmount the problem filesystem, run the same `fuser` command. Same
result? Expected result: some error message about the argument being
bad.

2. Remount the problem filesystem, test `fuser`. Is it any better?

3. Test `fuser` on any other NFS client mount you currently have mounted
(don't mount a new one specifically for this test, just count it as
"untestable" if you have no others).

4. Mount a different filesystem from the same Linux server; test.

5. Mount a filesystem from a different Linux server; test.

6. Mount a filesystem from some other sort of system (non-OSR5); test.

7. Mount a filesystem from another OSR5 system; test.
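
To compare those runs, it may help to reduce each one to a single
number: how many PIDs `fuser` reports for a given target. A rough
sketch, assuming `fuser` writes the bare PIDs to stdout and everything
else (file name, usage letters) to stderr, as it traditionally does on
System V; I have not verified that on OSR5. `pid_count` and the `FUSER`
variable are names invented here, not anything from `numountall`:

```shell
# Command to run; overridable so the helper can be tried with a stand-in.
FUSER=${FUSER:-fuser}

# pid_count TARGET: print the number of PIDs fuser reports for TARGET.
# Assumes PIDs go to stdout and everything else goes to stderr.
pid_count() {
    "$FUSER" "$1" 2>/dev/null | wc -w | tr -d ' '
}

# Example (not run here):
#   pid_count cmilinux:/u/nfs/exportfs/cmilar-001
#   pid_count /u/nfs/importfs/cmilinux
```

An idle mount should presumably give 0; a count close to the total
number of processes on the machine would be the overkill symptom.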

> Here's the relevant entry from /etc/default/filesys:
>
> bdev=cmilinux:/u/nfs/exportfs/cmilar-001 \
> mountdir=/u/nfs/importfs/cmilinux mount=yes fstyp=NFS \
> rcmount=yes mountflags=
>
> And the fuser command I'm using, which is essentially what the usage in
> numountall boils down to:
>
> fuser cmilinux:/u/nfs/exportfs/cmilar-001

Should be fine.

> The kill in numountall is definately the culprit. Running lsof on the
> fs mount point produces no output, and lsof on the pids produced by
> fuser produce no output pointing to the fs.

So `fuser` is erroneously reporting processes to be using that mount,
and `lsof` correctly observes that they are not.

Ah, here's another test to run:

8. Start two background processes, one of which definitely does not, and
one definitely does, have a file open on that filesystem:

sleep 1000 < /etc/termcap &
sleep 1000 < /u/nfs/importfs/cmilinux/SOME-FILE-THAT-EXISTS &

Now run the `fuser` command, saving its output in a file. Search that
file for the PIDs of those two processes. `fuser` will normally report
a PID twice if it has the target file open twice. Do you see that --
i.e. do you see one mention of the /etc/termcap process and two of the
other? Also use `lsof` to probe this. Ummm, I'm not sure if `lsof
cmilinux:/u/nfs/exportfs/cmilar-001` or `lsof /u/nfs/importfs/cmilinux`
is more likely to produce useful results -- check that here, where you
have a definite user of the filesystem.
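
Searching the saved output by eye is error-prone when the list is long,
so here is a small sketch for counting the mentions of one PID. It
assumes the saved file holds whitespace-separated PIDs, possibly with
usage letters such as `o` or `c` stuck on the end if stderr was saved
too, and it needs an awk that accepts `-v` (you may have to use `nawk`).
`count_pid` and the /tmp file name are made up for illustration:

```shell
# count_pid FILE PID: print how many times PID appears in FILE,
# ignoring any trailing lowercase usage letters on each token.
count_pid() {
    awk -v pid="$2" '
        {
            for (i = 1; i <= NF; i++) {
                t = $i
                sub(/[a-z]+$/, "", t)    # "1234o" -> "1234"
                if (t == pid) n++
            }
        }
        END { print n + 0 }              # print 0 when there is no match
    ' "$1"
}

# Example (not run here):
#   sleep 1000 < /etc/termcap & pid1=$!
#   sleep 1000 < /u/nfs/importfs/cmilinux/SOME-FILE-THAT-EXISTS & pid2=$!
#   fuser cmilinux:/u/nfs/exportfs/cmilar-001 > /tmp/fuser.out 2>&1
#   count_pid /tmp/fuser.out "$pid1"
#   count_pid /tmp/fuser.out "$pid2"
```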

> I've invoked fuser with both trace and truss but I don't recognize
> anything suspicious in the output, which may just be my lack of
> understanding. The output of those commands can be seen here:
>
> http://tenzing.org/cusm/trace.out
> http://tenzing.org/cusm/truss.out

I checked the `trace` output; it doesn't look particularly interesting.
I can see that it's very busy doing sysi86(RDUBLK) ("get this process's
information"), which is just what I would expect.

....

None of the tests above will prove anything; I'm just trying to shake
loose some more interesting tidbits that might lead to a resolution.

>Bela<