RE: FreeBSD 4.11 P13 Crash
- From: "Carroll Kong" <me@xxxxxxxxxxxxxxx>
- Date: Tue, 28 Feb 2006 16:48:50 -0500
I've ordered a new CPU and power supply already. After installing those
parts, I hope the problem "goes" away. I would bet the power supply is the
more likely cause, as someone else already mentioned it's a big culprit.
If it still fails after those two changes, then I can consider the
downgrade. I figured my setup can't be that unusual so someone else would
have run into this issue if it was indeed a software bug. Furthermore, I am
biased towards FreeBSD servers. They just aren't buggy beasts by nature!
:)
I don't think it is cooling since the system's temperature is somewhat the
same. I'll take it into consideration though as anything is possible at
this point.
Thanks for the other tips and notes. It's good to have some solid answers!
- Carroll Kong
-----Original Message-----
From: Peter Jeremy [mailto:peterjeremy@xxxxxxxxxxxxxxxx]
Sent: Tuesday, February 28, 2006 1:31 PM
To: Carroll Kong
Cc: hackers@xxxxxxxxxxx
Subject: Re: FreeBSD 4.11 P13 Crash
On Mon, 2006-Feb-27 20:52:57 -0500, Carroll Kong wrote:
Okay, this time my kernel was recompiled so there are no modules, to make
it easier to see all of the symbols.
If you cd to your kernel build directory (eg
/usr/src/sys/compile/DAEMON) and run 'make gdbinit' and then
use kgdb in that directory, there are a number of functions
to let you load KLD symbols.
Sometimes the box cycles through the fatal traps 12. Other times it...
does not.
This box was stable before I upgraded from 4.9->4.11.
It's always possible that you've hit a software bug. Would
it be practical to downgrade to your 4.9 configuration and
see if the problem goes away?
[Note that this does not totally rule out hardware as the
changed memory footprint may reveal a hardware problem].
I have since swapped the RAM, motherboard, RAM again (I bought another
stick thinking maybe my new RAM was coincidentally bugged), one of the
Intel NICs, and my 3Ware controller. The problem still occurred and
actually more frequently. The usual frequency was about 14 days or so.
It just crashed in less than 23 hours and then again within 25 minutes.
Assuming a similar system load[*], this does suggest failing hardware.
My suspicions would be system cooling or PSU. Your P4 should
just throttle back if it gets too warm but other parts of
your system (RAM, northbridge, southbridge etc) may start
mis-behaving if they get too warm.
- Power Supply (I suppose anything is possible, please note it is on an
APC UPS, but the power supply might be delivering bad juice?)
I'd put this as the likely culprit - consumer-grade PSUs are
not conservatively rated and modern systems put quite a
strain on the power supplies (in terms of very high dI/dt loads).
year in the past. As a note, the problem is NOT load related. In
fact, one time the fatal panic said the running process was "idle". :)
[*] A corrupted word in memory can sit around for a
relatively long time before something de-references it. A
lot of packet handling code exists at interrupt level and so
will only trigger when a packet arrives - even if the system
is otherwise idle.
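
To make the footnote concrete, here is a toy sketch in Python (not kernel
code). The dispatch table, the event names and the `run` helper are all
made up for illustration. It only shows that a bad write can go unnoticed
for several idle ticks and surface when the next packet forces the
corrupted slot to be used.

```python
def handle_packet(payload):
    """Stand-in for interrupt-level packet handling code."""
    return len(payload)


def run(events):
    """Replay a list of events against a tiny, hypothetical 'memory'.

    Returns (tick of the corrupting write, tick of the crash); either
    value is None if that thing never happened.
    """
    memory = {"rx_handler": handle_packet}  # assumed layout, illustration only
    corrupted_at = None
    for tick, event in enumerate(events):
        if event == "corrupt":
            # The bad write itself has no visible effect.
            memory["rx_handler"] = None
            corrupted_at = tick
        elif event == "packet":
            try:
                # The de-reference: only now is the corrupted word used.
                memory["rx_handler"](b"data")
            except TypeError:
                return corrupted_at, tick
        # "idle" ticks never touch the slot, so nothing is detected.
    return corrupted_at, None


EVENTS = ["idle", "idle", "corrupt", "idle", "idle", "packet", "idle"]
corrupted_at, crashed_at = run(EVENTS)
print(f"corrupted at tick {corrupted_at}, crashed at tick {crashed_at}")
```

In this model the crash is reported while the system is otherwise idle, and
the gap between the bad write and the crash depends only on when the next
packet arrives, not on system load.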
--
Peter Jeremy