Re: FreeBSD 4.11 P13 Crash
- From: Peter Jeremy <peterjeremy@xxxxxxxxxxxxxxxx>
- Date: Wed, 1 Mar 2006 05:30:49 +1100
On Mon, 2006-Feb-27 20:52:57 -0500, Carroll Kong wrote:
Okay this time my kernel was recompiled so there are no modules to make it
easier to see all of the symbols.
If you cd to your kernel build directory (e.g. /usr/src/sys/compile/DAEMON)
and run 'make gdbinit' and then use kgdb in that directory, there are
a number of functions to let you load KLD symbols.
Sometimes the box cycles through the fatal traps 12. Other times it does
not.
This box was stable before I upgraded from 4.9->4.11.
It's always possible that you've hit a software bug. Would it be practical
to downgrade to your 4.9 configuration and see if the problem goes away?
[Note that this does not totally rule out hardware, as the changed memory
footprint may reveal a hardware problem].
I have since swapped the RAM, motherboard, RAM again (I bought another stick
thinking maybe my new RAM was coincidentally bugged), one of the Intel NICs,
and my 3Ware controller. The problem still occurred and actually more
frequently. The usual frequency was about 14 days or so. It just crashed
in less than 23 hours and then again within 25 minutes.
Assuming a similar system load[*], this does suggest failing hardware.
My suspicions would be system cooling or PSU. Your P4 should just
throttle back if it gets too warm but other parts of your system (RAM,
northbridge, southbridge etc) may start mis-behaving if they get too
warm.
- PowerSupply (I suppose anything is possible, please note it is on an APC
UPS, but the power supply might be delivering bad juice?)
I'd put this as the likely culprit - consumer-grade PSUs are not
conservatively rated and modern systems put quite a strain on the
power supplies (in terms of very high dI/dt loads).
year in the past. As a note, the problem is NOT load related. In fact, one
time the fatal panic said the running process was "idle". :)
[*] A corrupted word in memory can sit around for a relatively long time
before something de-references it. A lot of packet handling code exists
at interrupt level and so will only trigger when a packet arrives - even
if the system is otherwise idle.
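
As a rough illustration of the footnote above, here is a toy model in
Python. It is not FreeBSD code: the slot name rx_ring_head, the event
list, and the CorruptPointer exception (standing in for a fatal trap 12)
are all made up for the example. It only shows how a corrupted word can
sit untouched through idle ticks and blow up when a packet handler
finally reads it.

```python
# Toy model (not FreeBSD code): a corrupted word stays harmless until
# an interrupt-style handler finally dereferences it.

class CorruptPointer(Exception):
    """Stands in for a fatal trap 12 (page fault in kernel mode)."""


def make_memory():
    # Hypothetical "kernel memory": slot name -> value it should hold.
    return {"rx_ring_head": "valid-buffer", "scratch": "valid-buffer"}


def handle_packet(memory):
    # Interrupt-level path: the only code that reads rx_ring_head.
    value = memory["rx_ring_head"]
    if value != "valid-buffer":
        raise CorruptPointer("dereferenced corrupted rx_ring_head")
    return value


def run(events, corrupt_at):
    """Replay events; corrupt one slot at tick `corrupt_at`.

    Returns (tick_of_corruption, tick_of_crash or None).
    """
    memory = make_memory()
    for tick, event in enumerate(events):
        if tick == corrupt_at:
            memory["rx_ring_head"] = "garbage"  # e.g. a flipped bit
        if event == "packet":
            try:
                handle_packet(memory)
            except CorruptPointer:
                return corrupt_at, tick
        # "idle" ticks never touch the corrupted slot
    return corrupt_at, None


# One packet early on, a long idle stretch, then another packet.
events = ["idle"] * 5 + ["packet"] + ["idle"] * 20 + ["packet"]
corrupted, crashed = run(events, corrupt_at=8)
print(f"corrupted at tick {corrupted}, crashed at tick {crashed}")
```

In this sketch the crash is reported at the next packet arrival, long
after the corruption itself, and the box looks idle at the moment it
panics.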
--
Peter Jeremy
_______________________________________________
freebsd-hackers@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@xxxxxxxxxxx"