Re: FreeBSD 4.8, ASR2120, SMP, degraded RAID1/mirror => storage failure

rysanek_at_fccps.cz
Date: 09/10/03

  • Next message: Michael A. Alestock: "Upgrade/Backup question"
    Date: Wed, 10 Sep 2003 11:11:21 +0200 (CEST)
    To: Scott Long <scott_long@btc.adaptec.com>
    
    

    Dear Mr. Long,

    Thanks a lot for taking the time to respond,
    especially given that you're on vacation and
    that it's almost 2 a.m. your time.

    I apologize for the vague wording in my previous mail.
    I may also have misinterpreted some of the terms
    that occur in the driver's code.
    Specifically:

    > 2. What is a zero-padded FIB? I concede that the AIF handling in the driver
    > is sub-par and needs to be revisited, so I'd like to know what you are seeing.
    >
    I was referring to this:

    aac_dequeue_fib: called
    aac0: aac_host_command: FIB @ 0xe1984000
    aac0: XferState 0
    aac0: Command 0
    aac0: StructType 0
    aac0: Flags 0x0
    aac0: Size 0
    aac0: SenderSize 0
    aac0: SenderAddress 0x0
    aac0: RcvrAddress 0x0
    aac0: SenderData 0x0
    aac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    aac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    aac0: unknown command from controller

    The size is zero, and the data dump at the end contains
    only zeroes. That's why I called it a "zero-padded FIB".

    This only occurs when an "unhandled array failure" arrives,
    i.e. when the machine is about to hang, either because the array
    degrades at runtime or because it boots from a degraded array.
    That normally only happens with SMP+APIC_IO enabled, not with a UP kernel.

    All the other FIB listings that I've seen contain some
    non-zero data and claim non-zero length...
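
The "zero-padded FIB" pattern described above can be searched for mechanically in a captured log. The following is a minimal Python sketch, assuming the log lines have exactly the format shown in the excerpt above. The function name `find_zero_fibs` and the regular expressions are illustrative assumptions, not part of the aac driver or of any FreeBSD tool.

```python
import re

# Patterns modelled on the log excerpt above (assumed format).
START_RE = re.compile(r"aac\d+: aac_host_command: FIB @ (0x[0-9a-fA-F]+)")
FIELD_RE = re.compile(
    r"aac\d+:\s+(XferState|Command|StructType|Flags|Size|SenderSize|"
    r"SenderAddress|RcvrAddress|SenderData)\s+(\S+)"
)
HEX_ROW_RE = re.compile(r"aac\d+:\s+((?:[0-9a-fA-F]{2} )*[0-9a-fA-F]{2})\s*$")
ZERO_RE = re.compile(r"(0x)?0+")


def find_zero_fibs(lines):
    """Return (line_number, fib_address) for every FIB dump that is all zeroes.

    A dump counts as all-zero when every header field and every byte in the
    hex rows following the "FIB @" line is zero.
    """
    hits = []
    dump = None  # state of the dump currently being read, or None

    def flush():
        # Record the dump only if it had content and none of it was non-zero.
        if dump and dump["all_zero"] and dump["items"] > 0:
            hits.append((dump["line"], dump["addr"]))

    for number, line in enumerate(lines, start=1):
        start = START_RE.search(line)
        if start:
            flush()
            dump = {"line": number, "addr": start.group(1),
                    "all_zero": True, "items": 0}
            continue
        if dump is None:
            continue
        field = FIELD_RE.search(line)
        hex_row = HEX_ROW_RE.search(line)
        if field:
            dump["items"] += 1
            if not ZERO_RE.fullmatch(field.group(2)):
                dump["all_zero"] = False
        elif hex_row:
            dump["items"] += 1
            if set(hex_row.group(1).split()) != {"00"}:
                dump["all_zero"] = False
        else:
            # Any other line ends the current dump.
            flush()
            dump = None
    flush()
    return hits
```

Called as `find_zero_fibs(open(path))` on a log file, it returns the line number of each "FIB @" line that starts an all-zero dump, together with the FIB address printed on that line.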

    I attached a tarball with some logs to my last message.
    To see what I'm talking about, please take a look at this:

    - runtime array degradation - compare the two logs:
     - SMP, unrecoverable failure:
        logs/DEBUG_CAM_AAC_L2/SMP-2_disk_failed (line 23)
     - UP, system keeps going just fine:
        logs/DEBUG_CAM_AAC_L2/NOSMP-2_disk_failed (line 76)

    - boot from a degraded array - compare the two logs:
     - SMP, unrecoverable failure
        logs/DEBUG_AAC_L4/SMP-3_boot_with_degraded_array_failed (line 296)
     - UP, system boots just fine:
        logs/DEBUG_AAC_L4/NOSMP-3_boot_with_degraded_array_OK (line 273)

    > 3. The split and corrupted messages on the console were likely due to
    > kernel printfs happening from different contexts at the same time. The
    > printf facility has no serializing ability, unfortunately.
    >
    OK.

    > 4. I'm unclear on what you mean by there being a problem in the
    > asynchronous handling of device printfs and host command fibs. I'd be
    > very interested in more information on this.
    >
    I didn't mean to say that the cause of my problem lies in that area.

    I meant to say that I have a problem understanding what's going on.
    I'm not a skilled coder, and I have a hard time understanding how the
    driver's code works. I can see where a function is called
    with some arguments and returns with a result. However, when a SCSI
    command is issued to the controller, the request and the response
    don't happen within a single function at the driver level.
    One function queues the command to the controller via the MMIO (?)
    region and the response from the controller eventually comes
    back within an interrupt, invoking an interrupt handler.
    The response may be a valid SCSI response to the SCSI command,
    or a "something went wrong" Monitor event.
    I am vaguely aware that the SCSI controller can reorder commands in the
    queue or process them out of order.
    Combine this with the unserialized logging and I'm lost :)
    Sorry.
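
The split between submitting a command and receiving its response, as described above, can be illustrated with a toy model. This is a hedged sketch only: `ToyController`, `ToyDriver` and all of their methods are hypothetical names invented for illustration and do not correspond to the aac driver's actual structures or functions. It shows one function queueing commands, a separate "interrupt" function collecting responses in whatever order the controller produced them, and an unsolicited event arriving in between.

```python
from collections import deque


class ToyController:
    """Stands in for the controller: accepts commands, answers later, in any order."""

    def __init__(self):
        self.pending = []        # (tag, command) pairs accepted but not yet answered
        self.outbound = deque()  # messages waiting for the host's interrupt handler

    def accept(self, tag, command):
        self.pending.append((tag, command))

    def work(self, order):
        """Complete the pending commands whose tags are listed, in that order."""
        by_tag = dict(self.pending)
        for tag in order:
            self.outbound.append(("response", tag, "done: " + by_tag[tag]))
        self.pending = [(t, c) for t, c in self.pending if t not in order]

    def raise_event(self, text):
        """Queue an unsolicited event, e.g. a "something went wrong" notification."""
        self.outbound.append(("event", None, text))


class ToyDriver:
    """Stands in for the host side: submit() and interrupt() are separate entry points."""

    def __init__(self, controller):
        self.controller = controller
        self.next_tag = 0
        self.in_flight = {}  # tag -> callback to run when the response arrives
        self.log = []

    def submit(self, command, callback):
        # Returns immediately; the result is delivered later via interrupt().
        tag = self.next_tag
        self.next_tag += 1
        self.in_flight[tag] = callback
        self.controller.accept(tag, command)
        self.log.append(f"submit tag={tag} {command}")
        return tag

    def interrupt(self):
        # Drain everything the controller has produced since the last interrupt.
        while self.controller.outbound:
            kind, tag, payload = self.controller.outbound.popleft()
            if kind == "response":
                callback = self.in_flight.pop(tag)
                self.log.append(f"complete tag={tag}")
                callback(payload)
            else:
                self.log.append(f"event {payload}")
```

In this model a response is matched to its request only through the tag, so the order of "submit" and "complete" entries in the log need not agree, and events can appear between them.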

    If there's something specific I should check for, please let me know.
    Thanks for being patient with my hasty descriptions :)

    Frank Rysanek

    _______________________________________________
    freebsd-questions@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-questions
    To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"

