I/O error when mounting AdvFS filesets




Dear Managers,

We had almost exactly the same problem as he had about a year ago. There
was no solution at the time, and we ended up losing all our data. I would
like to see if anyone else has seen this problem since and

I have an ES45 with an attached MSA1000 SAN with approximately 1.7TB of
disk. The machine was given a graceful restart yesterday, and came back up
without mounting any of the AdvFS partitions. We have one large domain,
broken up into 8 different filesets. The only message I get is "i/o
error". This happens with a restart, a "mount -a" or a manual "mount"
command.

We can see the logical drive using disklabel. I can see all the devices
correctly with wwidmgr on the console or hwmgr on the OS; everything shows
up and is configured correctly. The WWID addresses all match up.
In fact, we broke the spare drive off the set, and successfully made a new
set, new disklabel, new AdvFS domain, new AdvFS fileset, mounted it, and
copied data on it. As far as I can tell, there is no hardware problem.

I've used all the AdvFS tools at my disposal. Running verify gets an i/o
error. Advscan and fixfdmn don't work either. I'm sure that salvage would
work, but unfortunately we don't have the time or extra disk space to
recover the entire domain.

The MSA1000 is at firmware 4.48 build 342. It has two controllers,
connected to two HBAs set up for failover. The disks are configured as 13
disks in one RAID5 set, plus one disk as a spare. The ES45 is at firmware
7.0-3, and the operating system is 5.1B-3 (latest patch kit, about March
2006). This setup had been running stably for a year before this happened.

--
Kevin Dea
UNIX System Administrator
Alpine Electronics Research of America


