Re: 6.0 random freezes



Peter Jeremy said the following on 12/12/05 13:40:

Define "freezing": Does it respond to pings? Can you switch VTYs? Do the num-lock/caps-lock LEDs respond? Do some processes seem to freeze before others?

I used the word "freeze" instead of "crash", because the latter often
gets associated with some errors reported by the kernel in system logs
or on the console. In this case there are absolutely no error messages. I also have remote logging enabled (to another machine over the network), but there's nothing there either.


When it happens, the server appears to respond to pings for the
first few minutes, but then everything goes down until I go to the data center.

When I plug in a keyboard, there's no response at all: no LEDs, no VTY switching, no Ctrl-Alt-Esc, etc. You might suspect "hint.atkbd.0.flags" of not being set
properly, but it's right (i.e. unchanged, it appears to default to that
on i386 5.x+) and other machines with an identical configuration do accept
a keyboard.


I have no information about processes. The only thing I have is a real-time CPU load graph. I have a script tailing the output of "vmstat cpu 15" and drawing a graph of user/system/idle times, and according to that graph there are no load spikes or unusual variations before the crashes. The usual user/system/idle percentages look like 10/7/83.
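For anyone wanting a similar trace, here is a minimal sketch of a filter that turns vmstat output into user/system/idle triplets. It is not the script described above. The function name cpu_triplet is hypothetical, and the sketch assumes that the last three columns of each vmstat sample line are us, sy and id, as in the default FreeBSD layout.

```sh
#!/bin/sh
# Hypothetical helper: read vmstat output on stdin and print
# "user/system/idle" for every sample line.
# Assumption: the last three columns are us, sy, id.
cpu_triplet() {
    awk '
        # Header lines end in a word ("cpu", "id"); sample lines end in a number.
        NF >= 3 && $NF ~ /^[0-9]+$/ {
            printf "%d/%d/%d\n", $(NF-2), $(NF-1), $NF
            fflush()    # emit each sample immediately when used in a pipeline
        }
    '
}

# Possible usage (vmstat arguments as used in the message above):
#   vmstat cpu 15 | cpu_triplet >> /var/log/cpu-load.log
```

Logging the triplets to a file on another machine, with a timestamp per line, would also show roughly when the box stopped producing samples.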

I suggest you add the following to your kernel config:
 options         KDB                     # Enable kernel debugger support.
 options         DDB                     # Support DDB.

I just set these along with the DEBUG option below, and got the new
kernel (from 6.0-RELEASE sources dated Dec 9) running on both machines,
so we'll see.

When it hangs, break into DDB (Ctrl-Alt-Esc on the console or BREAK on
a serial console).

As a start, run 'show lockedvnods' and 'ps'.  My guess is that you'll
see a lock that has a number of waiters - which is probably the
culprit.  Use 'panic' or 'call doadump' to get a crashdump and then
you can use kgdb to rummage around once you reboot - see
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebg-gdb.html
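The post-reboot kgdb step could look roughly like the following. This is a sketch only: it assumes savecore(8) has written numbered vmcore.N files to /var/crash and that a kernel.debug built with debug symbols is available. The helper names latest_vmcore and kgdb_command are hypothetical, and the path to kernel.debug depends on how the kernel was built.

```sh
#!/bin/sh
# Hypothetical helper: print the number of the newest crash dump in a directory.
# Assumption: dumps are named vmcore.0, vmcore.1, ... as savecore(8) does.
latest_vmcore() {
    dir=${1:-/var/crash}
    ls "$dir" 2>/dev/null |
        sed -n 's/^vmcore\.\([0-9][0-9]*\)$/\1/p' |
        sort -n | tail -n 1
}

# Hypothetical helper: print (not run) the kgdb command for the newest dump.
kgdb_command() {
    kernel=$1                 # path to kernel.debug; depends on your build
    dir=${2:-/var/crash}
    n=$(latest_vmcore "$dir")
    [ -n "$n" ] || { echo "no vmcore found in $dir" >&2; return 1; }
    echo "kgdb $kernel $dir/vmcore.$n"
}

# Possible usage (MYKERNEL is a placeholder for the kernel config name):
#   kgdb_command /usr/obj/usr/src/sys/MYKERNEL/kernel.debug
```

Printing the command rather than running it keeps the helper harmless on a production box; the output can be pasted once the paths have been checked.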

I don't have any experience in chasing kernel bugs, so I'm not sure
whether I would be able to get something useful, but I'll try that on the next crash. But if I have no keyboard response I won't be able to save it, right?


I do not know what a serial console is and would need some time to get familiar with it. Would it give me anything beyond what I can get from the standard console?

< makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols

I suggest you add this back in. Without it, you can't debug any crash dumps that you manage to get (and add "dumpdev" to your rc.conf).

My bad. I realized that it's harmless, but only weeks after I had put
the box into production. It's back there now.

The "dumpdev" variable seems to default to AUTO, i.e. it tries to use the first swap device if that is bigger than the RAM (which it is in my case), so I guess I don't need to touch it.
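If there is any doubt about the default, the setting can be made explicit. A minimal sketch of the relevant /etc/rc.conf lines follows (rc.conf uses sh variable syntax). The values shown are assumptions about a typical setup, not taken from these machines.

```sh
# /etc/rc.conf fragment (sketch; values are assumptions, adjust as needed)
dumpdev="AUTO"          # let the system pick a suitable swap device for dumps
dumpdir="/var/crash"    # where savecore(8) stores the dump after reboot
```

The directory named in dumpdir needs enough free space to hold a dump about the size of the machine's RAM.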

Whilst I realise that you can't have production machines freezing on
schedule, your assistance in providing more information about your
problem will help make 6.x more stable.

Yes, I know and I will try. Today I already had a couple of crashes
(got lucky, no nasty data corruption this time), and I cannot afford for this to continue.


I'm already working on the downgrade, but most likely I will have at least one of these 2 machines still running 6.x during the next day or two.

After the downgrade we could set up a test bed and start hammering it with requests. The problem would be how to trigger the crash, and whether we would be able to reproduce it at all.

Thanks for the prompt reply!

Regards,
Atanas
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


