Re: Random panics with 5.3-REL, SMP

From: Robert Watson (rwatson_at_freebsd.org)
Date: 11/24/04

    Date: Wed, 24 Nov 2004 15:35:11 +0000 (GMT)
    To: Hogan Whittall <hogan@ninthgate.net>
    
    

    On Tue, 23 Nov 2004, Hogan Whittall wrote:

    > I'm still getting random panics, however. Doesn't appear to be related
    > to anything in particular and seems to usually happen after being up for
    > 1-2 days. I've attempted to get a coredump but the last panic wedged
    > while dumping to disk. I'm going to be out of town for a week and won't
    > have access to the box, but if anyone has experienced something like
    > this before and knows of a fix, please let me know. Here are the specs
    > of the machine:

    "Random panics" is a little vague as a starting point, but here are some
    things to look into when you're back from your vacation:

    - Using a serial console to the box, you can reliably gather information
      without the core dump mechanism working.

    - "Random panics" could mean "a lot of seemingly different panics
      happening with relative frequency", or it might mean "a few similar
      panics, happening at random intervals". It would be useful to clarify
      which it is. Recognizing that you may not be familiar with the intimate
      details of kernel failure modes: failures are usually classified as
      "similar" by the nature of the panic and by the stack trace leading to
      it. Panics usually take one of two forms: an explicit call to panic()
      by code that has detected a violated kernel invariant ("this should
      never happen"), or a page fault ("the kernel touched some memory it
      shouldn't have"). Panics typically print a fault description, such as
      the pointer that was dereferenced, or the invariant test that
      triggered. Seeing the same message repeatedly may indicate the same
      problem recurring. A stack trace can be generated using the "trace"
      command in DDB, and is a subset of the information you might get by
      pointing gdb at a core. If the stack traces look similar (especially
      the functions closest to the frame where the panic took place), the
      failures can probably be regarded as similar too. Either way, when
      reporting panics, the panic line or the header of the fault report are
      excellent starting points.

    - In terms of debugging information, it would be very useful if you could
      hook up a serial console and, when a panic occurs, send the output of
      "show pcpu" and "show trace". On an SMP box running an SMP kernel, run
      "show pcpu" for each CPU, and trace the thread running on each. The
      output of "ps" is usually pretty valuable, as it shows what the system
      was doing and, if many threads are waiting for something, what they are
      waiting for. With file system related panics or hangs, the output of
      "show lockedvnods" is often also very useful, as it shows which file
      system objects were being actively used, and by which threads. If
      running with WITNESS (see below), "show locks" can be very helpful in
      understanding and debugging the synchronization state of the kernel.

    - If a bug eventually leads to a panic, the problem will sometimes be
      better described if you have some of the kernel debugging features
      enabled, for example INVARIANTS and/or WITNESS. Depending on how much of
      a performance hit the box can take, you might want to try some features
      first, then others. Features like INVARIANTS may also catch the problem
      earlier, making it easier to diagnose.
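    The "are these panics similar?" heuristic described above (same panic
    line, same innermost frames in the DDB trace) can be made concrete with a
    small comparison routine. This is only a sketch under assumptions: the
    struct panic_report type, the SKETCH_MAX_FRAMES limit, and the function
    name are hypothetical, and it assumes you have already copied the panic
    header line and the function names from "trace" out of your serial
    console log, innermost frame first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit on how many frames of a captured trace we keep. */
#define SKETCH_MAX_FRAMES 32

/* Hypothetical record of one captured panic. */
struct panic_report {
    const char *message;                    /* panic line / fault header */
    const char *frames[SKETCH_MAX_FRAMES];  /* function names, innermost first */
    size_t nframes;                         /* number of valid entries in frames */
};

/*
 * Treat two panics as "similar" when their panic lines match exactly and
 * their innermost `depth` frames name the same functions. If either trace
 * is shorter than `depth`, only the frames both traces have are compared.
 * Returns 1 if similar, 0 otherwise.
 */
int panics_look_similar(const struct panic_report *a,
                        const struct panic_report *b, size_t depth)
{
    size_t n = depth;
    size_t i;

    if (strcmp(a->message, b->message) != 0)
        return 0;

    /* Never read past the end of either trace. */
    if (a->nframes < n)
        n = a->nframes;
    if (b->nframes < n)
        n = b->nframes;

    /* Frames closest to the panic matter most, so compare from index 0. */
    for (i = 0; i < n; i++)
        if (strcmp(a->frames[i], b->frames[i]) != 0)
            return 0;
    return 1;
}
```

    Comparing only a few innermost frames reflects the point above that the
    functions nearest the panic are the most telling; deeper frames often
    differ just because the same buggy code was reached from different
    callers. For page faults, compare the fault header rather than the full
    report, since addresses will usually differ from one crash to the next.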
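    To illustrate why INVARIANTS can catch a problem earlier, here is a
    userspace sketch modeled loosely on the kernel's KASSERT() macro, which
    takes a parenthesized printf-style message and is compiled in only when
    the kernel is built with INVARIANTS. Everything here (sketch_panic,
    SKETCH_KASSERT, struct sketch_obj and its reference count) is
    hypothetical and exists only for illustration; it is not kernel code.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for panic(): print the message and stop.
 * A kernel built with DDB would enter the debugger at this point instead. */
void sketch_panic(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    fputs("panic: ", stderr);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
    va_end(ap);
    abort();
}

/* Loosely modeled on KASSERT(exp, (fmt, args)): the check only exists when
 * INVARIANTS is defined (e.g. cc -DINVARIANTS); otherwise it compiles away. */
#ifdef INVARIANTS
#define SKETCH_KASSERT(exp, msg) do { if (!(exp)) sketch_panic msg; } while (0)
#else
#define SKETCH_KASSERT(exp, msg) do { } while (0)
#endif

/* Hypothetical reference-counted object. */
struct sketch_obj {
    int refs;
};

/* Drop one reference; returns 1 when the last reference goes away. */
static int sketch_obj_release(struct sketch_obj *o)
{
    /* With INVARIANTS, a double release panics right here with a clear
     * message. Without it, refs silently goes negative and the damage may
     * only surface much later, e.g. as a page fault in unrelated code. */
    SKETCH_KASSERT(o->refs > 0,
        ("sketch_obj_release: refs %d <= 0 on %p", o->refs, (void *)o));
    return (--o->refs == 0);
}
```

    The trade-off is the one mentioned above: every such check costs a
    little time on every call, which is why these assertions are optional
    and why you may want to enable debugging features incrementally on a
    production box.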

    I've found the single most useful tool in debugging failure modes is a
    serial console, as it provides ready scroll-back to earlier console
    output, a fairly reliable ability to enter the debugger using a break, as
    well as functionality like remote DDB, logging of DDB output, etc. I've
    heard people report very similar benefits and experiences with firewire
    debugging, but since I don't really live in the world of firewire, I'll
    point at serial ports :-).
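    For the "logging of DDB output" part, the machine on the other end of the
    serial cable just needs to put its port into raw mode and record every
    byte. The following is a minimal POSIX termios sketch of that, not a
    replacement for existing tools such as cu(1). The function names are
    hypothetical, and the device path and speed are assumptions supplied by
    the caller (for example a callout device such as /dev/cuaa0 on a FreeBSD
    5.x logging host, at whatever speed the crashing box's console uses).

```c
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Put a termios structure into raw 8N1 mode at the given speed (roughly
 * what cfmakeraw() does), so console bytes pass through unmodified. */
void sketch_termios_raw_8n1(struct termios *t, speed_t speed)
{
    t->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                    INLCR | IGNCR | ICRNL | IXON);
    t->c_oflag &= ~OPOST;                 /* no output post-processing */
    t->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t->c_cflag &= ~(CSIZE | PARENB);      /* 8 data bits, no parity */
    t->c_cflag |= CS8 | CREAD | CLOCAL;   /* enable receiver, ignore modem lines */
    t->c_cc[VMIN] = 1;                    /* block until at least one byte */
    t->c_cc[VTIME] = 0;
    cfsetispeed(t, speed);
    cfsetospeed(t, speed);
}

/* Open and configure the serial device on the logging machine.
 * Returns a file descriptor, or -1 on error. */
int sketch_open_serial(const char *path, speed_t speed)
{
    struct termios t;
    int fd = open(path, O_RDONLY | O_NOCTTY);

    if (fd < 0)
        return -1;
    if (tcgetattr(fd, &t) != 0) {
        close(fd);
        return -1;
    }
    sketch_termios_raw_8n1(&t, speed);
    if (tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Copy everything read from fd into log, flushing after each read so the
 * captured output survives if the logger itself is interrupted.
 * Returns 0 at end of input, -1 on a read or write error. */
int sketch_copy_to_log(int fd, FILE *log)
{
    char buf[512];
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        if (fwrite(buf, 1, (size_t)n, log) != (size_t)n)
            return -1;
        fflush(log);
    }
    return n < 0 ? -1 : 0;
}
```

    A caller would open the port with sketch_open_serial(), then pass the
    descriptor and an open log file to sketch_copy_to_log(). Flushing after
    every read is deliberate: the interesting output (the panic line, trace,
    ps) arrives in bursts just before the box stops responding, and you want
    it on disk even if the logging session is cut short.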

    Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
    robert@fledge.watson.org Principal Research Scientist, McAfee Research

    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

