Debugging server hangs in 7.2-RELEASE




I am completely running out of ideas on how to debug this. Maybe someone else has some?

The problem appears to be that, very suddenly, the disk busy figure (according to vmstat) skyrockets from 0 to >100, and then the 'runnable but swapped' column slowly rises ...

One person suggested that they saw something similar when MSI/MSI-X was enabled ... after searching the source code, I found that MSI was used in the bge driver, but I couldn't find MSI-X used anywhere else on that machine, so I disabled MSI ... it's still exhibiting the issue ...

I get no errors on the serial console to indicate any problems, and until a relatively recent upgrade of the kernel (I can't give an exact date), this server was one of my most solid ...

I figure there is a single process that is starting up on the machine that is causing this, but no matter what I try, it is eluding me.
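A minimal sketch of one way to try catching such a process, under several assumptions: it watches `vmstat -w 1` and, whenever the third `procs` column (`w`, runnable but swapped) reaches a threshold, appends a `ps auxww` snapshot to a log file. The function names, the threshold, and the log path are all hypothetical, and it assumes the stock vmstat layout where the first three fields of each data line are `r b w`. If the disk itself is wedged, the write may block too, so pointing the log at a remote filesystem or the serial console may be safer.

```python
import os
import subprocess
import time


def swapped_runnable(line):
    """Return the 'w' count from one vmstat data line, or None for header lines.

    Assumes the first three whitespace-separated fields are r, b and w.
    """
    fields = line.split()
    if len(fields) < 3 or not all(f.isdigit() for f in fields[:3]):
        return None
    return int(fields[2])


def snapshot(path):
    """Append a timestamped process listing to path and force it to disk."""
    ps = subprocess.run(["ps", "auxww"], capture_output=True, text=True).stdout
    with open(path, "a") as f:
        f.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n" + ps + "\n")
        f.flush()
        os.fsync(f.fileno())


def watch(threshold=1, log_path="/var/tmp/hang-ps.log"):
    """Follow vmstat output and snapshot the process table when w >= threshold.

    threshold and log_path are illustrative defaults, not recommendations.
    """
    proc = subprocess.Popen(["vmstat", "-w", "1"],
                            stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        w = swapped_runnable(line)
        if w is not None and w >= threshold:
            snapshot(log_path)
```

Calling `watch()` from a long-running shell before the next hang would leave the last few process listings in the log, which might show what started just before the 'runnable but swapped' count began to climb.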

I have KDB enabled in the kernel, and the serial console set up so that I can break to it ... but when this problem happens, doing 'cr ~ ^b' through the serial console either doesn't do anything, or just prints the message about breaking to the debugger and then hangs there ...

My next option is to start time travelling backwards to see if I can find a 'stable kernel' again, but if it is just one process causing this, then going back to older kernels isn't necessarily going to accomplish anything ...

Is there something else I can do here to debug this? It's hard to believe that, for such an advanced OS, debugging issues like this is so elusive :(



----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@xxxxxxx MSN . scrappy@xxxxxxx
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


