Random crash and/or reboots
From: Jack L. Stone (jackstone_at_sage-one.net)
Date: 09/07/03
- Previous message: Scott Ballantyne: "linux_base"
- Next in thread: Chuck Swiger: "Re: Random crash and/or reboots"
- Reply: Chuck Swiger: "Re: Random crash and/or reboots"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sun, 07 Sep 2003 10:29:00 -0500
To: freebsd-questions@freebsd.org
Mail server: FreeBSD 4.8-RELEASE-p3
A while back, on a couple of occasions, I posted a query about some bad
behavior on my mail server. For the past several months, it has been either
crashing and rebooting or just rebooting. It's ALWAYS triggered by an SSH
login, but at random and ONLY at the "su" to root. Usually the longest it
goes before a reboot is a little over 2 weeks, but it has also rebooted 2
times in a row right after a reboot, so there is really no pattern. It has
never happened directly at the console.
I have replaced every single piece of hardware (PSU, cables, NICs, and
finally the whole machine) except for the hard disk that holds the system,
which had to go into the new machine. Even so, I have since moved the entire
system and its contents to another new hard disk. So I have concluded that
it is a software problem.
There is no sign of anything in the logs, and there are no core dumps. It
just stops and reboots, at whatever random time it picks. Only a couple of
times has it crashed without a remote login.
One tip was that I might have stale NFS mounttab entries. I cleared them
out, but the problem persisted.
That tip was suggested when I mentioned that on a couple or more of the
occurrences, I managed to get to the console quickly enough to see (in
bright bold) "lockmgr locking against myself", or something close to that.
Googling that error turns up mentions of stale mounts, but mostly esoteric
kernel code discussions. I found no fix anywhere.
Then, on this list, I saw a thread about others having mysterious reboots,
and one suggestion was to run lsof(8) in a continuous loop so that a log of
open files would be captured when a reboot occurred. I have captured 6 of
these logs, and nothing jumps out at me as a common file problem. I have
placed 6 text files at the URLs below, each containing only the last 300
lines of one log, which should be enough info for a comparison. (I let each
log grow to 200MB before restarting the lsof loop; each sample is, of
course, cut off at the moment of the crash/reboot and includes the 300 lines
before that moment.)
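For anyone who wants to set up the same kind of capture, here is a minimal
Python sketch of such a loop. It is an illustration under stated
assumptions, not the exact loop used for these captures: the function name
`capture_open_files` and its parameters are hypothetical. It runs a command
(lsof, in practice) over and over, appends each snapshot under a timestamp
header, starts the log over once it passes a size cap (200MB here), and
calls os.fsync() after each write so a sudden reboot loses as little of the
log as possible.

```python
import os
import subprocess
import time
from pathlib import Path


def capture_open_files(cmd, log_path, max_bytes, iterations=None, interval=1.0):
    """Repeatedly run `cmd` and append its output to `log_path`.

    Hypothetical helper: `iterations=None` loops until killed; a number
    is only there to make the loop easy to try out.
    """
    log = Path(log_path)
    done = 0
    while iterations is None or done < iterations:
        # Start a fresh log once it grows past the size cap, which is the
        # same effect as restarting the loop by hand.
        if log.exists() and log.stat().st_size > max_bytes:
            log.write_text("")
        result = subprocess.run(cmd, capture_output=True, text=True)
        with log.open("a") as fh:
            fh.write(time.strftime("=== %Y-%m-%d %H:%M:%S ===\n"))
            fh.write(result.stdout)
            fh.flush()
            # Push the snapshot to disk so a sudden reboot does not
            # discard what was just written.
            os.fsync(fh.fileno())
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)
```

Started as, for example,
`capture_open_files(["lsof"], "/var/tmp/lsof.log", 200 * 1024 * 1024)`
(the log path is only an example), it keeps running until it is killed.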
I am at a loss, other than rebuilding the system from scratch, and that is
no guarantee of a fix. The one thing unique here is that this is the mail
server, and it runs spamd (spamassassin-2.55), spamass-milter-2.0 (which
uses a lock file) and procmail-3.22 (which does a lot of locking).
I am suspicious of the locking done by the above spam-fighting programs,
which may clash with whatever happens when an SSH login and su occur. I
believe a lock is required for that too...??
I would appreciate anyone's time and effort to look at these files and see
if anything stands out that I have missed. The most recent is 6-lsof.txt,
and the numbering works backwards from there; 6-lsof.txt was captured just
this morning.
http://sageweb/tmp/1-lsof.txt
http://sageweb/tmp/2-lsof.txt
http://sageweb/tmp/3-lsof.txt
http://sageweb/tmp/4-lsof.txt
http://sageweb/tmp/5-lsof.txt
http://sageweb/tmp/6-lsof.txt
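One way to look for a file common to all six captures is to reduce each log
to (COMMAND, NAME) pairs and intersect them. The minimal Python sketch below
does that. The function names are hypothetical, and it assumes lsof's
default column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME).
PIDs are ignored because they change from run to run. Lines with fewer than
nine fields (headers, timestamp separators, or rows where lsof left a column
blank) are simply skipped, so treat the result as a starting point rather
than a complete comparison.

```python
from collections import deque
from pathlib import Path


def tail_keys(path, n=300):
    """(COMMAND, NAME) pairs from the last `n` lines of an lsof log."""
    # deque with maxlen keeps only the tail without loading a huge log.
    with open(path, errors="replace") as fh:
        last_lines = deque(fh, maxlen=n)
    keys = set()
    for line in last_lines:
        fields = line.split()
        # Skip the column header and anything that is not a full lsof row.
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue
        # NAME is the ninth column onward (it may contain spaces).
        keys.add((fields[0], " ".join(fields[8:])))
    return keys


def common_open_files(paths, n=300):
    """Pairs that appear near the end of every log in `paths`."""
    key_sets = [tail_keys(p, n) for p in paths]
    return set.intersection(*key_sets) if key_sets else set()
```

With the six files downloaded into the current directory,
`common_open_files([f"{i}-lsof.txt" for i in range(1, 7)])` would list the
pairs present near the end of every capture.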
Much obliged!
Best regards,
Jack L. Stone,
Administrator
SageOne Net
http://www.sage-one.net
jackstone@sage-one.net
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"