Re[2]: FreeBSD Machines dieing, we've tried so much....

fenix_at_ramb.com.ua
Date: 06/22/05

    Date: Wed, 22 Jun 2005 19:10:46 +0300
    To: Matt Juszczak <matt@atopia.net>
    
    

    Hello, Matt.

    >>The vast majority of panics are hardware-related. It is rare nowadays
    >>for a usermode program to make the system panic. In particular you said
    >>the problem happens more under load. That really points even more to a
    >>hardware problem: bad CPU cache RAM, bad RAM, SCSI termination, that
    >>sort of thing.
    >>
    >>Ted
    >>
    >>

    > This is going to be a blanket reply to all the recent suggestions.
    > I appreciate them :) Ted, sorry: my other posts had the dmesg output,
    > hardware specs, etc., but I couldn't remember the subject line of that
    > thread. I'll be more descriptive here.

    > We have two different servers crashing. Both are SMP, but on different
    > hardware. We have five FreeBSD servers in total, and only two are
    > affected. That is why I do not believe this is a hardware problem.

    > In any case, the machines are in a cold room where the temperature is
    > constantly maintained. Twenty other servers in there are perfectly
    > stable, with no problems.

    > This particular machine, which crashed last night while running
    > portsdb -uU, is a Super Micro box with hyperthreading disabled in the
    > BIOS, dual 3.06 GHz CPUs, and 4 GB of memory. We ran a memory test on
    > orion (the machine that crashed last night) a week or so ago, and it
    > found 70,000 ECC errors. Those were fixed, and the machine had been
    > stable until last night. I've now disabled SMP support; we'll see
    > whether that keeps it stable. portsdb -uU ran without problems after
    > I disabled SMP.

    > As for uranus, the other box (we use a planet naming scheme for a
    > certain set of servers), we ran memtest86 and found no errors at all.
    > That box crashed about two days ago and has been stable since, but it
    > has never lasted more than a week without a kernel trap and freeze.

    > Both of these servers seem to have the same problem, and out of our
    > five FreeBSD servers they are the two with the highest load. Perhaps a
    > higher load on the other three would trigger the same problem. I agree
    > that this looks like a hardware problem, but seeing it on more than one
    > server, on two different hardware platforms, under our highest load
    > makes me reconsider.

    > If this really is a bug in FreeBSD 5.4-RELEASE, maybe it has already
    > been fixed in -STABLE? I will compile a debug kernel today and try to
    > provide a backtrace of the problem from whichever server crashes next.
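
    A minimal sketch of the debug kernel mentioned above, assuming the i386
    source tree in /usr/src and a hypothetical config name ORION-DEBUG (a
    copy of the machine's current kernel config with the lines below added).
    These are the usual 5.x debugging options; WITNESS and INVARIANTS slow
    the kernel down noticeably, which matters on a heavily loaded box.

    ```
    # Hypothetical /usr/src/sys/i386/conf/ORION-DEBUG: a copy of the
    # existing config, with ident changed and these debug lines added.
    ident        ORION-DEBUG
    # Build kernel.debug with symbols so kgdb(1) can read a crash dump.
    makeoptions  DEBUG=-g
    options      KDB                # kernel debugger framework
    options      DDB                # interactive debugger on panic/trap
    options      INVARIANTS         # extra run-time consistency checks
    options      INVARIANT_SUPPORT  # needed by INVARIANTS
    options      WITNESS            # lock-order checking
    # Leaving this out gives a uniprocessor kernel (SMP disabled):
    #options     SMP
    ```

    Build and install it from /usr/src with "make buildkernel
    KERNCONF=ORION-DEBUG" and "make installkernel KERNCONF=ORION-DEBUG".
    With DDB compiled in, a panic stops at the ddb> prompt on the console,
    where "trace" prints a backtrace. To get a vmcore for kgdb(1) instead,
    /etc/rc.conf needs dumpdev pointing at a swap partition (for example
    dumpdev="/dev/da0s1b", a hypothetical device name); savecore(8) then
    writes the dump to /var/crash on the next boot.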

    I had the same situation with two different heavily loaded servers (both
    SMP, with 8 GB of RAM and HT enabled) running 5.4-RELEASE. After disabling
    HT and cvsup'ing the OS to 5.4-STABLE, everything has been working fine
    without any problems; the last reboot was 28 days ago.
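
    A minimal sketch of tracking 5-STABLE with cvsup, assuming the cvsup
    port is installed and a hypothetical supfile at /root/stable-supfile.
    The mirror host below is a placeholder assumption; a nearby mirror from
    the Handbook's list is usually a better choice.

    ```
    # Hypothetical /root/stable-supfile: fetch src for the 5-STABLE branch.
    # Placeholder mirror; replace with a real, nearby cvsup mirror.
    *default host=cvsup.FreeBSD.org
    *default base=/var/db
    *default prefix=/usr
    # RELENG_5 is the 5-STABLE branch tag (RELENG_5_4 would instead be
    # the 5.4-RELEASE security branch).
    *default release=cvs tag=RELENG_5
    *default delete use-rel-suffix
    *default compress
    src-all
    ```

    Run it with "cvsup -g -L 2 /root/stable-supfile", then rebuild following
    /usr/src/UPDATING (roughly: buildworld, buildkernel, installkernel,
    reboot to single-user, installworld, mergemaster). HT can be turned off
    in the BIOS or, on 5.x, with machdep.hlt_logical_cpus=1 in
    /boot/loader.conf.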

    > _______________________________________________
    > freebsd-questions@freebsd.org mailing list
    > http://lists.freebsd.org/mailman/listinfo/freebsd-questions
    > To unsubscribe, send any mail to
    > "freebsd-questions-unsubscribe@freebsd.org"

    -- 
    Best regards,
    Sergey S. Ropchan
