Re: memory corruption/panic solved ("FAILURE - ATAPI_IDENTIFY no interrupt")

From: Nate Lawson (nate_at_cryptography.com)
Date: 08/02/04

    Date: Sun, 01 Aug 2004 17:05:19 -0700
    To: Brian Fundakowski Feldman <green@freebsd.org>
    
    

    Brian Fundakowski Feldman wrote:
    > On Fri, Jul 30, 2004 at 03:48:52PM -0700, Nate Lawson wrote:
    >>I've tracked down the source of the memory corruption in -current that
    >>results when booting with various CD and DVD drives (especially the ones
    >>that come with Thinkpads including T23, R32, T41, etc.) The panic is
    >>obvious when running with INVARIANTS ("memory modified after free") but
    >>not so obvious in other configurations. For instance, without
    >>INVARIANTS, part of the rt_info structure is corrupted on my wireless
    >>card, resulting in a panic during ifconfig on boot. This is likely the
    >>source of other problems, including phk's ACPI panic (again, only
    >>triggered when booting with the CD drive in the bay).
    >>
    >>The root problem is that ata_timeout() fires and calls ata_pio_read()
    >>which overwrites 512 bytes of random memory. There are actually two bugs
    >>here that overwrite memory. The code path is as follows:
    >
    > Good job identifying it more exactly. I decided it should just fundamentally
    > be using GEOM primitives everywhere to move the solutions to all these
    > side cases into where they're already handled generically... still think
    > that's probably the right solution, but I'm glad to see this specific
    > problem fixed.

    I'm not sure if this is a troll or not, but I'll answer it seriously.
    GEOM and other upper layers are never the right place to handle error
    recovery for transactions initiated at the lower layers (like this
    device scan).

    In every system I've seen, error recovery is the hardest part of storage
    code to get right and is seldom well-tested. It's a very difficult
    problem that requires careful fault injection and testing.
    Divergent fault-handling behavior across hardware only complicates
    things.

    -Nate

    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

