Re: panic in propagate_priority w/ postgresql under heavy load

From: John Baldwin (jhb_at_FreeBSD.org)
Date: 09/20/05

    To: Koen Martens <fbsd@metro.cx>
    Date: Tue, 20 Sep 2005 16:04:43 -0400
    
    

    On Monday 19 September 2005 03:35 pm, Koen Martens wrote:
    > Vinod Kashyap wrote:
    > > You seem to be booting off of a 9000 (twa) controller and not 7000/8000
    > > (twe). It could be because of a 9000 firmware bug that you are not being
    > > able to get the dump. The firmware wrongly interprets physical address
    > > 0x0 as invalid during dumps, and fails the operations. This bug will be
    > > fixed in future firmware releases.
    >
    > Ok, it's been a while, here is an update on this.
    >
    > I ran a heavily instrumented kernel for two weeks on the server, it
    > did not crash in that time. I then took out the witness and kdb/ddb
    > stuff, because the decreased performance was a bit of a nuisance,
    > however i retained the ability to obtain a crash dump. I had to
    > limit physical memory, put it on 1.8GB in loader.conf:hw.physmem
    > because swap and physmem are both 2GB. Tested with 'reboot -d' gave
    > me a core dump.
    >
    > Without the debug stuff in the kernel, it crashed within 2 days,
    > same story: postgresql process, function propagate_priority.
    > However, no dump was written to disk :(
    >
    > Furthermore, i've been seeing the same crash (in propagate_priority)
    > on another box in mysql processes. Both servers seem to panic every
    > 2-3 days. I have another server of the exact same hardware
    > configuration, but it is mainly idling most of the time. Haven't
    > seen that one crash yet.
    >
    > I am thinking now that it is a bug in the twa driver, so i'll have
    > to dig in to that. Furthermore, it seems to have to do with some
    > sort of concurrency issue or otherwise timing-sensitive issue,
    > because slowing the kernel down with debug code seems to avoid the
    > panic. But, as i am completely new to the freebsd kernel and don't
    > even know what turnstiles are, i imagine i will have a hard time. So
    > if anyone can offer some help, please :)
    >
    > Ok, thanks for your attention,

    This panic usually happens because a thread went to sleep while holding
    a mutex (WITNESS will warn you about this when it happens, but as you noted,
    it slows things down). It can perhaps also happen if a thread exits while
    holding a lock, or if a thread is blocked on a mutex that is destroyed after
    it blocks on it.
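
    To illustrate why those three cases upset priority propagation, here is
    a toy userland model in C. It is only a sketch and not the kernel's code:
    the types, the state names and toy_propagate_priority() are all
    hypothetical, and real turnstiles are considerably more involved. The idea
    is that a blocked thread lends its priority down the chain of lock owners,
    and the walk only makes sense while every owner is either runnable or
    itself blocked on another lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userland model; these are NOT the FreeBSD kernel structures. */
enum td_state { TD_RUNNABLE, TD_BLOCKED_ON_LOCK, TD_SLEEPING, TD_EXITED };

struct toy_lock;

struct toy_thread {
	enum td_state	 state;
	int		 priority;	/* lower value = more urgent */
	struct toy_lock	*blocked_on;	/* valid only if TD_BLOCKED_ON_LOCK */
};

struct toy_lock {
	struct toy_thread *owner;	/* NULL when unowned */
	int		   destroyed;	/* nonzero once torn down */
};

enum walk_result {
	WALK_OK,
	WALK_SLEEPING_OWNER,	/* owner went to sleep holding the lock */
	WALK_EXITED_OWNER,	/* owner exited while holding the lock */
	WALK_DESTROYED_LOCK	/* lock destroyed with a waiter still on it */
};

/*
 * Lend td's priority to each lock owner along the chain, starting with
 * the owner of the lock td is blocked on.  Instead of panicking, the toy
 * version reports which broken state it ran into.
 */
enum walk_result
toy_propagate_priority(struct toy_thread *td)
{
	struct toy_lock *lk;
	struct toy_thread *owner;
	int pri;

	assert(td != NULL && td->state == TD_BLOCKED_ON_LOCK);
	pri = td->priority;
	lk = td->blocked_on;

	while (lk != NULL) {
		if (lk->destroyed)
			return (WALK_DESTROYED_LOCK);
		owner = lk->owner;
		if (owner == NULL)
			return (WALK_OK);	/* lock already released */
		if (owner->state == TD_SLEEPING)
			return (WALK_SLEEPING_OWNER);
		if (owner->state == TD_EXITED)
			return (WALK_EXITED_OWNER);
		if (owner->priority <= pri)
			return (WALK_OK);	/* already urgent enough */
		owner->priority = pri;		/* lend the priority */
		if (owner->state != TD_BLOCKED_ON_LOCK)
			return (WALK_OK);	/* runnable: chain ends here */
		lk = owner->blocked_on;		/* keep walking the chain */
	}
	return (WALK_OK);
}
```

    In this model a healthy chain ends at a runnable owner with the priority
    lent all the way down, while a sleeping or exited owner, or a destroyed
    lock, stops the walk with an error where a kernel could only panic.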

    -- 
    John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
    "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
    _______________________________________________
    freebsd-hackers@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
    To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
    
