Re: Sudden reboot

From: Methusela Oreiley (wilsonag_at_gte.net)
Date: 06/26/03


Date: Thu, 26 Jun 2003 02:01:07 GMT


[[ This message was both posted and mailed: see
   the "To," "Cc," and "Newsgroups" headers for details. ]]

There are two ways to reboot UNIX (in theory):
   1 - Go to run level 6 (commanded by root or power monitor).
   2 - Kernel executed panic() (will eventually hit run level 6).

That is a very short list. The lack of a log entry excludes a root
command and the power monitor, which leaves panic().
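
As a rough illustration of that log check, here is a small Python sketch that scans log lines for hints of a commanded shutdown or a panic. The keyword patterns and the sample lines are my own assumptions for illustration; they are not the exact strings Solaris writes to /var/adm/messages.

```python
import re

# Hypothetical keywords; real message text varies by OS release.
SHUTDOWN_HINTS = re.compile(r"shutdown|reboot|halt|init 6|power", re.IGNORECASE)
PANIC_HINTS = re.compile(r"panic", re.IGNORECASE)


def classify_log(lines):
    """Return which kinds of reboot evidence, if any, the log lines contain."""
    found = set()
    for line in lines:
        if PANIC_HINTS.search(line):
            found.add("panic")
        elif SHUTDOWN_HINTS.search(line):
            found.add("commanded")
    return found or {"none"}


# Made-up sample lines, not real Solaris output.
sample = [
    "Jun 25 03:10:01 host cron[123]: job started",
    "Jun 25 03:12:44 host sshd[456]: connection closed",
]
print(classify_log(sample))
```

A result of "none", as in the original poster's case, argues against a commanded shutdown.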

panic() is executed to "bail" when a "bad thing" happens. Bad things
usually occur in situations where software reaches an invalid state.
This is usually due to the following.
   A - program error (buggy driver or function)
   B - hardware error (broken or incompatible hardware)

This is also a very short list.

Intermittent problems (such as this one) can only be captured using
crash files.

When an application executes panic(), it issues an API call to the
kernel that immediately suspends the process (zombie). The kernel
records the application's memory into a crash file for debugging, and
then the process is harvested. The crash file can be used to debug the
application. When the kernel itself performs panic(), it becomes unable
to handle things like file systems, so its crash files are recorded by
an entirely different means, but the usage is identical
(troubleshooting).
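
For the application case, here is a minimal sketch, assuming a POSIX system. A child process calls abort() (used here as a stand-in for an application-level panic), the kernel terminates it with SIGABRT, and the parent harvests the exit status. Whether a core file is actually written depends on the system's core-size limit, which this sketch does not change.

```python
import signal
import subprocess
import sys


def run_and_harvest():
    """Start a child that aborts itself, then reap it and report the signal."""
    child = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    # On POSIX a negative return code means the child died from that signal.
    if child.returncode < 0:
        return signal.Signals(-child.returncode)
    return None


killed_by = run_and_harvest()
print(killed_by)
```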

The panic() function in the kernel calls a dump routine that writes
kernel RAM to the swap partition. On the next boot (which happens right
away), the crash utility is run. If it finds a kernel dump in the swap
partition, it compresses the dump and saves it to a file (probably in
/var or /adm). This occurs before virtual memory is started.
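
The dump-then-save sequence can be pictured with a toy Python model. This is only a sketch of the flow described above: the "swap" is a byte buffer, and the MAGIC marker and header layout are invented for illustration and bear no relation to any real dump format.

```python
import zlib

MAGIC = b"DUMP"  # hypothetical marker, not a real dump header


def panic_dump(swap, kernel_memory):
    """Simulate panic(): write a marker, a length, and kernel RAM into 'swap'."""
    header = MAGIC + len(kernel_memory).to_bytes(4, "big")
    swap[: len(header) + len(kernel_memory)] = header + kernel_memory


def save_crash_at_boot(swap):
    """Simulate the boot-time utility: return a compressed dump, or None."""
    if bytes(swap[:4]) != MAGIC:
        return None  # nothing was dumped before this boot
    size = int.from_bytes(swap[4:8], "big")
    image = bytes(swap[8 : 8 + size])
    swap[:4] = b"\x00" * 4  # clear the marker so the dump is saved only once
    return zlib.compress(image)


swap = bytearray(64)
panic_dump(swap, b"registers+stacks")
saved = save_crash_at_boot(swap)
print(zlib.decompress(saved))
```

The point of the marker is that the boot-time utility can tell a fresh dump from ordinary swap contents without any help from the file system.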

The kernel may need to be reconfigured or rebuilt to obtain a crash
file, and a local disk needs to be connected (kernel dump will probably
not work over NFS).

The debug utility is used to analyze the crash file (probably kdb). You
will need to specify the location of the kernel file and the location
of the crash file. There are four important functions required for
debugging (use "?" to show a list within the debugger).
   Process list
   Process selection
   Stack walkback
   Register dump

First, dump out the process list. Second, select each process (one at a
time) and display the stack and registers. You are looking for the
register set of a process that shows panic() in its stack. This might
be the "active" process (the last one running).
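
The search itself amounts to something like the following Python sketch. It assumes the stacks have already been copied out of the debugger by hand; the process IDs and frame names are made up for illustration.

```python
def find_panic_processes(stacks):
    """Return the process IDs whose stack trace contains a panic frame."""
    return [
        pid
        for pid, frames in stacks.items()
        if any("panic" in frame for frame in frames)
    ]


# Hypothetical process list with made-up stack frames.
stacks = {
    0: ["sched", "idle"],
    117: ["syscall_trap", "ioctl", "xx_ioctl", "xx_intr", "panic"],
    204: ["syscall_trap", "read", "cv_wait"],
}
print(find_panic_processes(stacks))
```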

The chain of events that ended up at the panic() function may start
with an interrupt. Most kernel events would start with an API call from
an application, but an unimplemented interrupt or error interrupt could
have occurred. Interrupts will stop whatever function was active and
start a new function. Interrupt service functions probably have "intr"
in the function name (may be easy to recognize), and all interrupt
handling functions are in kernel memory. The interrupt may have been
triggered by an API call from a process that became active shortly
before panic(). In other words: the functions later in the stack may
not have been called by those listed earlier in the stack.
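
To make that last point concrete, here is a short Python sketch that splits a stack (oldest frame first) at the first function with "intr" in its name. The frame names are hypothetical, and the "intr" naming convention is only the rule of thumb mentioned above.

```python
def split_at_interrupt(frames):
    """Split a call chain (oldest frame first) at the first 'intr' frame."""
    for index, name in enumerate(frames):
        if "intr" in name:
            return frames[:index], frames[index:]
    return frames, []


# Made-up frame names for illustration.
frames = ["syscall_trap", "read", "cv_wait", "xx_intr", "xx_error", "panic"]
interrupted, handler_chain = split_at_interrupt(frames)
print(interrupted)
print(handler_chain)
```

Everything in the first list was merely interrupted; the second list is the chain that actually led to panic().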

Unfortunately I do not know enough about SPARC to help you further.
Once you get this far, you should be able to get further assistance to
isolate the cause.

You may get more information by typing "kernel+driver+development" into
the search menu at http://docs.sun.com.

Best of luck.

Greg Wilson
nanoatzin99@netscape.net

---------

In article <db4636cf.0306200450.e27c3d9@posting.google.com>, Tanya
<tanya.levitsky@getronics.com> wrote:

> I am monitoring Unix servers for a customer, and yesterday I had one of
> the servers reboot suddenly without any errors or warnings. I checked
> /var/adm/messages, authlog, no users were logged on, and there is no
> evidence of any problems. The server is Sun 3800, split in 2 domains
> (only one went down), running Solaris 8. Has something like that ever
> happened to anyone? Am I missing anything? I would really appreciate any
> input. Thanks.


