UPDATE(3): Problems with volume on Storageworks MSA 1000

kdea_at_alpine-la.com
Date: 07/14/05

    Date: Wed, 13 Jul 2005 17:37:22 -0700
    To: tru64-unix-managers@ornl.gov
    
    

    Hello Managers,

    The problem I've been having for the last few weeks continues. We had some
    time, so we decided to keep at it. To update, we've tried new SCSI cables,
    this time using "official" Compaq/HP SCSI cables. For some reason I can't
    explain, the new shelf came up, and I was able to create a file domain,
    create a fileset, and mount it properly. The original set of disks showed
    no change (still I/O error). I was able to copy a lot of data to this new
    directory with no problem. Later, without any warning, after a reset, this
    added set of disks would not mount, just like the original set of disks
    (I/O error)!

    Without touching any hardware, I tried to delete the fileset with "rmfset".
    It didn't work; it only complained about an I/O error. So I went to the
    /etc/fdmns directory and deleted the ".advfslock_*" file and the *_domain
    directory. This effectively deleted that partition. Then, still without
    touching the hardware, I ran "mkfdmn" and "mkfset" on the same disks, and
    it worked! Again I could mount it and write data to it. Since we did not
    touch any hardware, we now believe that it's a software or OS issue.
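
    The sequence described above can be sketched roughly as the dry-run
    script below. It is only an illustration, not a tested procedure: the
    disk name (dsk5c), domain name (data_domain), fileset name (data_fset),
    and mount point (/data) are hypothetical, and the lock file is assumed
    to be named after the domain. By default the script only prints the
    commands. Run for real, these steps discard whatever was in the domain.

```shell
#!/bin/sh
# Dry-run sketch of the "remove the domain entry and recreate it" steps.
# All names passed in below are hypothetical placeholders.

run() {
    # Print the command unless DRY_RUN=0 is set explicitly.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_domain() {
    disk=$1
    domain=$2
    fset=$3
    mnt=$4

    # Remove the stale domain entry under /etc/fdmns.
    # Assumption: the lock file is named .advfslock_<domain>.
    run rm -f "/etc/fdmns/.advfslock_$domain"
    run rm -rf "/etc/fdmns/$domain"

    # Recreate the AdvFS file domain and fileset on the same disk,
    # then mount the new fileset.
    run mkfdmn "/dev/disk/$disk" "$domain"
    run mkfset "$domain" "$fset"
    run mount "$domain#$fset" "$mnt"
}

recreate_domain dsk5c data_domain data_fset /data
```

    Printing the commands first makes it easy to check the device and
    domain names before anything under /etc/fdmns is removed.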

    Do any of these symptoms ring a bell for anyone?

    Thank you,

    Kevin Dea

    ---second update---

    This is my last-ditch effort to appeal for help before I decide to wipe out
    my data and start from scratch, as one responder ended up doing when he had
    a similar problem.

    Quick summary: I'm adding a shelf to an MSA 1000 which already has an
    attached shelf of disks with data. At that time I flashed the firmware to
    the latest version, v.4.32A. Upon bootup, I discovered that the partition
    on the existing shelf will not mount; the mount command only gives me
    "I/O error".

    I've given up on the new shelf; I just want to get the old shelf
    working. I can see the entire volume when I look at diskconfig, and the
    size is correct. Trying to manually mount the volume still only gives me
    "I/O error". Trying to use "verify" also gives me "I/O error". I no
    longer get the SCSI event error from my original post. I know that it
    only shows up if I boot the machine while the SRM screen shows
    "pga0.0.0.1.2 Link is down". A simple reset usually clears that up. I can
    see the disks/controllers when I use the "hwmgr" command, and with the
    "wwidmgr" command in SRM.

    I've downgraded the MSA1000 firmware to 3.36, but that didn't help. I did
    not bother changing the ES45 firmware back to 6.7, since the problem was
    already happening before I upgraded it to 6.9. I've checked the MSA1000
    connections, and the profile is set to "Tru64". I've also tried setting it
    to "default", with no change. I've cleaned all the fiber connectors with
    compressed air.

    ES45 Tru64 5.1B-2.

    Please tell me what I can do without wiping out my data!

    Kevin

    ---updated post---

    I've gotten a few responses and suggestions, but I'm still having this
    problem. Dr. Blinn believes that it's a firmware incompatibility between
    the ES45 and the MSA 1000, and suggested I downgrade it. Since I had the
    problem even with the ES45 firmware at 6.7, I think the fault lies more
    with the MSA 1000. If anyone has firmware files for the MSA 1000 older than
    4.32A, please point me to them.

    Joachim Jaeckel asked if the connection type on the MSA 1000 was set to
    'Tru64'. It was not. It was set to 'default'. When I made this change,
    the SCSI errors that I posted in the original message seem to have
    disappeared from my messages. However, the disks still will not mount;
    mount only reports 'I/O error'.

    I was asked about patches. The system does have patch kit
    T64V51BB25AS0004-20040616 OSF540 on it, which is Tru64 5.1B-2, I think. I
    can't seem to get the latest patch kit, 5.1B-3, installed on it. When I try
    to install, it keeps going back to the menu.

    Just to clear up any confusion, I already had a shelf's worth of disks on
    the MSA 1000 that had already been partitioned and had data on it. I've
    added another shelf. After the firmware flash, I can't get the existing
    disks to mount.

    ---Original message---

    I have been tasked to add storage shelves to our StorageWorks MSA 1000
    connected to an ES45 (Tru64 5.1B), and it's been one problem after another.
    When I first turned off the MSA 1000 to connect the shelves, and turned it
    back on, one of the original drives showed up bad. That was replaced.
    Then I upgraded the firmware to v.4.32A (I believe from 4.31). I also
    upgraded the ES45 firmware to v6.9. But now I can't get my existing or new
    volumes that are on the MSA 1000 to show up. I keep getting this message
    upon bootup.

    Jun 21 11:43:21 vega vmunix: cam_logger: SCSI event packet
    Jun 21 11:43:21 vega vmunix: cam_logger: hardware_id=82 bus 5 target 0 lun 2
    Jun 21 11:43:21 vega vmunix: cdisk_check_sense
    Jun 21 11:43:21 vega vmunix: Hard Error
    Jun 21 11:43:22 vega vmunix:
    Jun 21 11:43:22 vega vmunix: Hard Error Detected
    Jun 21 11:43:22 vega vmunix: Hardware ID = 82
    Jun 21 11:43:22 vega vmunix: COMPAQ MSA1000 VOLUME 4.32
    Jun 21 11:43:22 vega vmunix: Active CCB at time of error
    Jun 21 11:43:22 vega vmunix: CCB request completed with an error
    Jun 21 11:43:22 vega vmunix: Error, exception, or abnormal condition
    Jun 21 11:43:22 vega vmunix: ILLEGAL REQUEST - Illegal request or CDB parameter

    My volumes will not mount, and if I try, I only get a vague "I/O error"
    message. There are no errors on the MSA 1000 itself. I can create a
    diskset and delete it. All the disks are fine. I tried removing the new
    volume and disconnecting the new shelves, but even the old volume will not
    show up. I am powering everything on in the right order: shelves, wait,
    MSA1000, wait, ES45.

    If anyone knows what's going on I would appreciate it.

    --
    Kevin Dea
    UNIX System Administrator
    Alpine Electronics Research of America
    
