Random crash and/or reboots

From: Jack L. Stone (jackstone_at_sage-one.net)
Date: 09/07/03

    Date: Sun, 07 Sep 2003 10:29:00 -0500
    To: freebsd-questions@freebsd.org
    
    

    Mail server: 4.8-RELEASE-p3

    A while back I posted a couple of queries about bad behavior on my mail
    server. For the past several months it has been either crashing and
    rebooting or just rebooting. It is ALWAYS triggered by an SSH login, though
    at random, and ONLY at the "su" to root. The longest it usually goes
    between reboots is 2+ weeks, yet at other times it has rebooted twice in a
    row right after coming back up, so there is really no pattern. It has
    never happened when logged in directly at the console.

    I have replaced every single piece of hardware (PSU, cables, NICs, and
    finally the whole machine) except for the hard disk that holds the system,
    which had to move into the new machine. Even so, I have since moved the
    entire system and its contents to another new disk. I have therefore
    concluded that it is a software problem.

    There is nothing in the logs and there are no core dumps. It just stops
    and reboots at whatever random time it picks. Only a couple of times has it
    crashed without a remote login.

    One tip was that I might have stale NFS mount tables. I cleared them out,
    but the problem persisted.

    That tip came after I mentioned that on a few of the occurrences I got to
    the console quickly enough to see, in bright bold, "lockmgr locking against
    myself" or something close to that. Googling that error turns up mentions
    of stale mounts, but mostly discussion of obscure kernel code. I have found
    no fix anywhere.

    Then I saw a thread on this list about others having mysterious reboots,
    and one suggestion was to run lsof(8) in a continuous loop so that a log of
    open files would be captured when a reboot occurred. I have now captured 6
    of these logs, but nothing jumps out at me as a common file problem. I have
    placed 6 text files at the URLs below, each containing only the last 300
    lines of its log, which should be enough for a comparison. (I let each log
    grow to 200MB before restarting the lsof loop. Each sample ends at the
    moment of the crash/reboot and includes the 300 lines leading up to it.)

    I am at a loss. The only option left is rebuilding the system from
    scratch, and that is no guarantee of a fix. The one unique thing about this
    machine is that it is the mail server and runs spamd (spamassassin-2.55),
    spamass-milter-2.0 (which uses a lock file), and procmail-3.22 (which does
    a lot of locking).

    I am suspicious of the locking done by these spam-fighting programs, which
    might clash with something when an SSH login and su occur. I believe the
    login and su need a lock too...??

    I would appreciate anyone taking the time to look at these files and see
    whether they spot anything I have missed. The most recent is 6-lsof.txt,
    and the numbering runs backwards from there; 6-lsof.txt was captured just
    this morning.

    http://sageweb/tmp/1-lsof.txt
    http://sageweb/tmp/2-lsof.txt
    http://sageweb/tmp/3-lsof.txt
    http://sageweb/tmp/4-lsof.txt
    http://sageweb/tmp/5-lsof.txt
    http://sageweb/tmp/6-lsof.txt
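
    One way to look for a common file problem mechanically is to collect the
    open files in each extract and keep only those present in all six. The
    sketch below is an illustration under stated assumptions, in Python 3: it
    reduces each lsof line to COMMAND plus NAME (ignoring PIDs, which change
    between captures), and because lsof sometimes leaves columns blank, its
    NAME parsing is only a heuristic. All function names are hypothetical.

```python
def lsof_entries(lines):
    """Collect (COMMAND, NAME) pairs from lsof output lines (heuristic)."""
    entries = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines, lsof's column header, and timestamp separator
        # lines (assumed here to start with "===").
        if len(fields) < 2 or fields[0] in ("COMMAND", "==="):
            continue
        # With all nine default columns present, NAME starts at the ninth
        # and may contain spaces; otherwise fall back to the last field.
        name = " ".join(fields[8:]) if len(fields) > 8 else fields[-1]
        # PIDs differ between captures, so key on COMMAND and NAME only.
        entries.add((fields[0], name))
    return entries


def common_entries(logs):
    """Return the (COMMAND, NAME) pairs present in every log.

    logs is a list of iterables of lines, such as open file objects.
    """
    sets = [lsof_entries(log) for log in logs]
    return set.intersection(*sets) if sets else set()


def common_in_files(paths):
    """Open each saved extract and return the entries common to all."""
    files = []
    try:
        for p in paths:
            files.append(open(p, errors="replace"))
        return common_entries(files)
    finally:
        for f in files:
            f.close()
```

    For example, common_in_files(["%d-lsof.txt" % i for i in range(1, 7)])
    on local copies of the six files. Anything it returns is only a candidate,
    since long-running daemons such as spamd will show up in every capture
    anyway.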

    Much obliged!

    Best regards,
    Jack L. Stone,
    Administrator

    SageOne Net
    http://www.sage-one.net
    jackstone@sage-one.net
    _______________________________________________
    freebsd-questions@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-questions
    To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"

