hardware RAID domain panics

From: Neil R. Smith (neils_at_ariel.met.tamu.edu)
Date: 03/24/04

    Date: Wed, 24 Mar 2004 14:35:24 -0600
    To: tru64-unix-managers@ornl.gov
    
    

    Hi,

    ES45, Tru64 5.1A
    Western Scientific F4 Tornado RAID IDE-SCSI

    We've had this RAID box connected to our ES45 for about 10 months with
    no problems. The RAID is presenting 2 TB and 1.3 TB on two LUNs.

    The initial setup went fine. Each of the two devices was placed in its
    own domain, and one AdvFS fileset was created on each. These are used
    for data storage, not root, usr, or var.

    Recently, we've been experiencing AdvFS I/O errors followed in short
    order by domain panics.
    fixfdmn showed the following:

    fixfdmn -n d12
    fixfdmn: Checking the RBMT.
    fixfdmn: Can't read page at block -660733904 on '/dev/disk/dsk12c'.
    fixfdmn: Invalid argument
    fixfdmn: Error correcting the RBMT.

             fixfdmn is not able to continue, no changes made to domain,
    exiting.
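
One possible reading of the negative block number above, offered as an assumption and not as a diagnosis: a block address can never legitimately be negative, so the value may be a large block number that passed through a signed 32-bit field and wrapped. The sketch below reinterprets the reported value as unsigned 32-bit and converts it to a byte offset. The 512-byte block size and the idea that fixfdmn prints block numbers through a 32-bit signed type are both assumptions, not something the output confirms.

```python
# Hypothetical check: treat the reported block number as a wrapped
# signed 32-bit value and see where it would land on the device.

BLOCK_SIZE = 512  # assumed block size in bytes; not stated in the output


def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned 32-bit."""
    return value & 0xFFFFFFFF


def describe(reported_block):
    """Return (unsigned block number, byte offset) for a reported block."""
    unsigned_block = as_unsigned32(reported_block)
    return unsigned_block, unsigned_block * BLOCK_SIZE


if __name__ == "__main__":
    block, offset = describe(-660733904)  # value printed by fixfdmn
    print("unsigned block:", block)
    print("byte offset:   ", offset)
    print("offset in TB:  ", round(offset / 1e12, 2))
    # Blocks at or above 2**31 are the ones that go negative as signed 32-bit.
    print("at or above 2**31 blocks:", block >= 2**31)
```

Under these assumptions the block sits at roughly 1.86 TB, which is past the 2**31-block (1 TiB) mark yet would still fall within a 2 TB LUN. That would be consistent with a problem addressing storage beyond 1 TiB on a single device, but it does not by itself show whether the fault lies in the tool, the filesystem, the driver, or the RAID firmware.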

    The hardware RAID was logging some disk errors so the vendor sent
    replacements. The RAID sets were rebuilt normally without error.

    I 'salvage'd the contents prior to disk replacement (lucky I could NFS
    mount another 3TB RAID box served from Red Hat). Then, after disk
    replacement and rebuild, I deleted and remade the domains and filesets
    (using the exact same naming), and restored the data using 'cp -rp'. On
    one domain, some of the content being copied back triggered another
    round of I/O errors and then domain panics. The other began this
    behavior when new data was written from processes on the ES45. The
    hardware RAID box is not logging any errors now. A 'fixfdmn' produced
    the same result as mentioned before, but I CAN complete a 'verify -F'
    (but not -f) and then mount the fileset.

    So now I don't know whether this is hardware or software related.
    Is it the SCSI interface on the ES45? I don't see any errors related to
    that device in the /var/log/messages file. Should I be looking elsewhere
    for errors logged from the SCSI HBA? What would be other sources of
    AdvFS problems with external hardware RAID? Were there related fixes in
    Tru64 5.1B?

    Why would this begin now after 10 months of service? Granted the
    filesets had not reached this capacity before, but we're only talking
    40% of 2TB and 56% of 1.3TB.

    Any suggestions?

    Thanks,
    -Neil

    -- 
    Neil R. Smith, Comp. Sys. Mngr.		neils@tamu.edu
    Dept. Atmospheric Sci., Texas A&M Univ.	979/845-6272 FAX:979/862-4466
    
