Re: 5.1, Data Corruption, Intel, Oh my! [patch] - Fatal trap 12

From: Terry Lambert (tlambert2_at_mindspring.com)
Date: 08/13/03

    Date: Tue, 12 Aug 2003 23:38:07 -0700
    To: Peter Edwards <peter.edwards@openet-telecom.com>
    
    

    Peter Edwards wrote:
    > > ... He might also want to look for any function pointer
    > > that takes 5 arguments;
    >
    > Nice tactic, but misleading in this case, methinks.
    >
    > I assume you're basing this on the 5 arguments shown in the backtrace.
    > The 5 arguments passed to the "function" at 0x5949 are probably just
    > defaulted; I doubt it has any significance.
    >
    > Long version:
    >
    > ddb tries to work out the number of arguments passed to a function at a
    > particular stack frame first based on symbolic information for the
    > function itself (obviously not an option here), then based on the
    > instruction at the return address in that frame. This works at best
    > sporadically in the face of -O compiled C code. The fact that there's no
    > function under the "(null)" would strongly suggest that ddb got confused
    > with the frame pointer here and didn't get any useful information with
    > which to work out the argument count.
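
    For readers unfamiliar with the heuristic described above, here is a
    rough user-space sketch of it for i386. The opcode patterns and the
    fallback of 5 are written from memory of how ddb on i386 behaves, not
    checked against this source tree, and guess_numargs() and
    DEFAULT_NUMARGS are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_NUMARGS 5	/* assumed fallback when nothing decodes */

/*
 * Guess how many 32-bit arguments the caller pushed by looking at the
 * bytes at the return address, i.e. right after the call instruction.
 * Pass NULL/0 when the return address does not point into known text.
 */
int
guess_numargs(const uint8_t *code, size_t len)
{
	if (code == NULL || len == 0)
		return (DEFAULT_NUMARGS);
	if (code[0] == 0x59)			/* popl %ecx: one argument */
		return (1);
	if (len >= 3 && code[0] == 0x83 && code[1] == 0xc4)
		return (code[2] / 4);		/* addl $imm8,%esp */
	return (DEFAULT_NUMARGS);		/* nothing recognizable */
}
```

    With optimized code the caller often defers or merges its stack
    cleanup, so neither pattern appears at the return address and the
    default is what gets reported.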

    I don't know how accurate this assumption is. I don't think
    DDB is confused, because the NULL is consistent with the reported
    fault address. Even if we assume that it's confused, the PC is
    enough information to locate the function pointer dereference that
    is occurring. I also have to assume that the function pointer is
    in scope, since it's able to call through it to fault the kernel.
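
    As an illustration of the failure mode being discussed, here is a
    minimal sketch of a call through a function pointer that was never
    filled in. The five-argument signature, struct ops, ops_call() and
    example_op() are all hypothetical; nothing here is taken from the
    kernel in question.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical five-argument operation, standing in for whatever
 * pointer the faulting code calls through. */
typedef int (*op5_fn)(void *, void *, int, int, int);

struct ops {
	op5_fn	op_do;		/* NULL if an initializer forgot it */
};

/* Sample implementation, used only to exercise ops_call(). */
int
example_op(void *a, void *b, int c, int d, int e)
{
	(void)a;
	(void)b;
	return (c + d + e);
}

/*
 * Call through the pointer only if it was set.  Without the check, a
 * NULL op_do makes the CPU jump to a bogus low address, and the fault
 * is typically reported at that address rather than in the caller.
 */
int
ops_call(const struct ops *o, void *a, void *b, int c, int d, int e)
{
	if (o == NULL || o->op_do == NULL)
		return (ENXIO);
	return (o->op_do(a, b, c, d, e));
}
```

    The check is there only to make the sketch safe to run; the point is
    that the call site, not the bogus target, is where to look.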

    > In the face of failure, ddb just wildly prints out the 5 words under the
    > stack pointer.

    I did suggest that the correct thing to do would be to decode
    what those words were pointing at, and thereby what types the
    arguments were...

    > Given that there's no real function at 0x5949, the stack frame won't
    > have been set up at all, the frame pointer is still pointing to the
    > caller's frame, which could be foobar anyway.

    The stack frame is set up, since you don't run at all without
    a stack, period. The stack may be corrupt, in this case, but
    that's an incredibly rare failure mode recently, and mostly
    this still looks like a NULL pointer dereference to me.

    > What can be useful is to print out the values on the stack symbolically.
    > (in gdb, p/a ((void **)$sp)[0]@100. I'm sure ddb can do something
    > similar, but no idea how...). And hope to find the caller's return
    > address lying in the output.
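
    The idea of hunting for a return address in the raw stack words can
    be sketched as a simple range check. This assumes the bounds of
    kernel text are known; scan_stack_for_text() and its parameters are
    hypothetical names, and the addresses in any usage are made up.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scan nwords stack words and record the indexes of those whose value
 * lies inside [text_start, text_end), i.e. plausible return addresses.
 * Returns the number of indexes stored in hits (at most maxhits).
 */
size_t
scan_stack_for_text(const uintptr_t *stack, size_t nwords,
    uintptr_t text_start, uintptr_t text_end,
    size_t *hits, size_t maxhits)
{
	size_t i, n = 0;

	for (i = 0; i < nwords && n < maxhits; i++) {
		if (stack[i] >= text_start && stack[i] < text_end)
			hits[n++] = i;
	}
	return (n);
}
```

    A hit is only a candidate: stale values from earlier calls also land
    in the text range, so each one still has to be checked against the
    instruction that precedes it.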

    The best way would be to take a system dump, and then use GDB.

    It turns out that, for the most part, you can rebuild a kernel
    with symbols even if you didn't keep one, and the names you get
    back will be "nearby"; hopefully, though, there's a kernel.debug
    lying around for this thing.

    In general, we'd be seeing people reporting this all over the
    place, loudly, if it wasn't a custom kernel in the first place,
    so I'd probably start there.

    -- Terry
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

