RE: hang with raid, postgresql

From: Don Bowman (don_at_sandvine.com)
Date: 05/31/04

    To: 'Doug White' <dwhite@gumbysoft.com>, Don Bowman <don@sandvine.com>
    Date: Mon, 31 May 2004 14:20:32 -0400
    
    

    From: Doug White [mailto:dwhite@gumbysoft.com]
    > On Sun, 30 May 2004, Don Bowman wrote:
    >
    > > From: Doug White [mailto:dwhite@gumbysoft.com]
    > > > On Sun, 30 May 2004, Don Bowman wrote:
    > > >
    > > > >
    > > > > I have a system with 2x 2.8GHz XEON (P4), intel e7501 chipset,
    > > > > 4GB of ram, aac [adaptec 2200s] raid with 4 scsi
    > > > > disks. I have also tried asr (adaptec 2015).
    > > > > I have tried two different motherboards.
    > > > > The only application the machine runs is postgresql,
    > > > > with about ~30 databases, about ~250GB of data.
    > > > >
    > > > > I'm finding the machine locks up solid once a day
    > > > > or so (sometimes more, sometimes less, no pattern
    > > > > of time of day). I know it's not a hardware issue; it
    > > > > is reliable with FreeBSD 4.7. I've run through memory
    > > > > test, disk test, etc.
    > > > >
    > > > > There appears to be a correlation between
    > > > > disk activity (postgresql vacuum) and the lockup,
    > > > > but i can't be sure.
    > > >
    > > > Temperature?
    > > >
    > > > What motherboard is it exactly?
    > >
    > > lmmon shows the mobo temperature @ 28C. It is in
    > > an AC-controlled environment (~20C ambient). The system
    > > has 6 blower fans, ducted over the CPUs, with the
    > > copper heat sinks designed for the 3.2GHz XEON.
    >
    > alright, so it's a pretty beefy server chassis, although it could also
    > be an underperforming power supply or a scsi terminator.

    It has 3 separate power supplies; all have been verified.
    It's the 3rd piece of hardware i've tried.

    >
    > > It has 3 power supplies, each with separate AC
    > > inlet, fed from a UPS with filtered power.
    > > It should have ~150% airflow redundancy, and
    > > ~200% power redundancy.
    > > This is a supermicro X5DPE motherboard.
    >
    > Do you happen to have the IPMI option board for this system?

    No IPMI.

    >
    > Still seems hardware-related to me, although I've found hard hangs
    > caused by buggy optimization on amd64.

    I don't think so. I extensively tested it with freebsd 4.7, memtest86.
    The scsi bus was checked with a scope, and was checked with an
    'ahd' controller so that we could see iuCRC errors, SCB time outs,
    etc (ahd is excellent @ reporting errors, much better than
    any other driver). Two disk tests were run (iozone as a benchmark,
    iotest as a test) for several days.

    I'm pretty sure this is a garden variety sw problem.
    Currently i am suspicious of the acpi... this machine hangs
    on boot if acpi is not enabled, so it's hard to test that
    theory :) The hang is in setting up and enumerating pnp isa
    devices. I guess i could expend energy to figure that out.

    My next step (which i'm not looking forward to) is to try
    and solder the TAP connector on and hook up my emulator.
    I really really don't want to do that.

    --don
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

