errors on an 11Tb filesystem... need suggestions

From: Chris Jones (c.r.jonesNOSPAM_at_larc.nasa.gov)
Date: 01/05/05

    Date: Wed, 05 Jan 2005 11:43:43 -0500
    
    

    Here's my situation:

    An EMC Clariion CX600 unit is connected via two fibre channel (qlogic
    2310) links to an Origin 2000 (running IRIX 6.5.18m). There are 16 luns
    bound on the CX600, assigned 8 each to the two storage processors (SPs),
    which equates to 8 luns on one controller on the host and 8 luns on the
    other.

    Failover is set up so that if lun 'x' on controller 'a' fails somehow,
    the entry in /etc/failover.conf points to that same lun on controller
    'b'. All 16 of these luns are xlv'ed together to make one big 11Tb
    filesystem.

    This 11Tb filesystem has been in existence in its current state (and
    due to how it's used it's always at 100%) since Dec. 2003. Well, a few
    weeks ago we ran into an error, and this is what I believe occurred.
    Something tried to access a piece of data, an error occurred, and the
    SGI tried to retrieve the data via the failover path. The same error
    occurred trying to get the data there (because there is obviously some
    real problem with the access, whether it's bad data or a bad spot in
    the xfs filesystem) and all of a sudden we've got the xlv software
    bouncing the ownership of this lun back and forth between controller
    'a' and controller 'b' every second or so.

    There are similar errors being recorded in the SP logs on the CX600.
    This eventually brings access to the filesystem to a grinding halt and
    the only way to clear things up is a reboot of the system.

    I've determined that the root cronjob 'fsr' is what triggers the
    initial access of the "bad data" (or whatever it is) and kicks off the
    lun bouncing issue. In the interim, I've commented out that crontab
    entry since I couldn't figure out how to have the cronjob run and only
    operate on select filesystems (yeah, I know I could create a wrapper
    script of some kind, I just haven't investigated that route). If anybody
    knows a good way to get fsr to ignore a particular mounted filesystem,
    please let me know.
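
A rough sketch of the wrapper idea, in Python purely for illustration (it assumes a Python interpreter is available on the host, which a stock IRIX install may not have). It only builds the list of XFS devices to hand to fsr, leaving out a named mount point. The function name, the sample mount table and the /bigfs mount point are all made up, and the mtab layout (device, mount point, type as the first three fields) is an assumption to check against the real file.

```python
def xfs_devices_to_reorganize(mtab_text, skip_mount_points):
    """Return devices of mounted XFS filesystems, minus the ones to skip."""
    devices = []
    for line in mtab_text.splitlines():
        fields = line.split()
        # Expect at least: device, mount point, filesystem type.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        device, mount_point, fs_type = fields[0], fields[1], fields[2]
        if fs_type == "xfs" and mount_point not in skip_mount_points:
            devices.append(device)
    return devices


if __name__ == "__main__":
    # Hypothetical mount table; /bigfs stands in for the 11Tb filesystem.
    sample = (
        "/dev/root / xfs rw 0 0\n"
        "/dev/dsk/dks0d2s7 /home xfs rw 0 0\n"
        "/dev/xlv/bigvol /bigfs xfs rw 0 0\n"
    )
    for dev in xfs_devices_to_reorganize(sample, {"/bigfs"}):
        print("fsr", dev)
```

Whether fsr on this release takes device names on its command line, or would rather be pointed at a filtered copy of the mount table, should be confirmed from the fsr man page before putting anything like this in root's crontab.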

    Working with the vendor, we have already swapped out the suspect lun's
    underlying disks one by one (making use of our hot spare disk). But
    that didn't fix the problem. The suggestion from EMC is to
    unbind/rebind the lun, which of course means WHACKING THE ENTIRE 11Tb
    filesystem. Not a solution I like.

    Another thought that was discussed was for me to take away the failover
    path for the "suspect lun" in the failover.conf file. But I don't know
    what would then happen filesystem-wise if something tries to access the
    "bad data" (like if I manually run fsr) and it can no longer fail over
    to the alternate path. Will the application that's accessing things
    (i.e. fsr) just get an error and move on? Or will something worse
    happen? I'd love somebody's opinion on that.

    Lastly, the most recent suggestion from EMC is to run an xfsdump of the
    11Tb filesystem to /dev/null, while somehow keeping a log of the
    activity and telling it to bypass any errors it gets (like hitting the
    "bad data") and continue on. According to EMC, this would allow us to
    see where the "bad data" lives.

    From looking all over the place at xfsdump, I can't find anything in
    the command that will let you run it and get output on the structure
    of the filesystem that you're dumping. The best I can see is doing
    something like this 'xfsdump -v trace -f /dev/null <my filesystem>' and
    capturing the output to a file. The output shows stuff like this:

    xfsdump: dumping regular file ino 133 offset 0 to offset 269 (size 269)

    and I can then traverse the filesystem in question and look for a file
    with the inode number of '133'. I've tested this on a very small scale
    and it seems to work. But doing this on an 11Tb scale is a bit
    overwhelming.
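
To make the scale less overwhelming, the trace log could be parsed for inode numbers and then resolved to paths in a single walk of the filesystem, rather than one search per inode. The following is a minimal sketch under assumptions: the regular expression is based only on the single sample trace line above, the function names are made up for illustration, and Python itself is assumed to be available on the host.

```python
import os
import re

# Matches lines like:
# xfsdump: dumping regular file ino 133 offset 0 to offset 269 (size 269)
TRACE_LINE = re.compile(
    r"xfsdump: dumping regular file ino (\d+) offset (\d+) to offset (\d+)"
)


def inodes_in_trace(lines):
    """Return inode numbers in the order xfsdump reported them, no repeats."""
    seen = set()
    ordered = []
    for line in lines:
        match = TRACE_LINE.search(line)
        if match:
            ino = int(match.group(1))
            if ino not in seen:
                seen.add(ino)
                ordered.append(ino)
    return ordered


def _on_device(path, dev):
    """True if path can be stat'ed and lives on device dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def paths_for_inodes(mount_point, wanted):
    """Walk the filesystem once and map each wanted inode to its paths."""
    wanted = set(wanted)
    root_dev = os.lstat(mount_point).st_dev
    found = {}
    for dirpath, dirnames, filenames in os.walk(mount_point):
        # Do not descend into other filesystems mounted below this one.
        dirnames[:] = [
            d for d in dirnames if _on_device(os.path.join(dirpath, d), root_dev)
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                ino = os.lstat(path).st_ino
            except OSError:
                continue  # unreadable entry; skip it and keep walking
            if ino in wanted:
                found.setdefault(ino, []).append(path)
    return found
```

If the dump stalls or errors out, the last few inodes returned by inodes_in_trace() would be the first candidates to look up. The walk only calls lstat() and never opens file contents, though whether even that is safe near the bad spot is unknown.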

    ***But***, it seems like if I was able to use this method to identify
    the "bad data", I'd be able to copy out the contents of the filesystem
    and *leave out* the "bad data" (assuming I found some spare 11Tb of disk
    space to use).
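
As a sketch of that copy-out idea, assuming the suspect inode numbers are already known: walk the source tree, check each file's inode before opening it, and skip the suspect ones. The function name is hypothetical, and this is illustration only. It copies regular file contents and timestamps but does not preserve ownership or hard links, so a real migration would more likely use a proper dump/restore or archiving tool with an exclusion list.

```python
import os
import shutil


def copy_tree_skipping(src_root, dst_root, bad_inodes):
    """Copy files from src_root to dst_root, leaving out bad_inodes.

    Returns (skipped, failed): paths left out on purpose, and
    (path, error text) pairs for copies that raised an error.
    """
    bad_inodes = set(bad_inodes)
    skipped, failed = [], []
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # Check the inode first so suspect files are never opened.
                if os.lstat(src).st_ino in bad_inodes:
                    skipped.append(src)
                    continue
                shutil.copy2(src, os.path.join(dst_dir, name))
            except OSError as exc:
                # Record the failure and carry on with the next file.
                failed.append((src, str(exc)))
    return skipped, failed
```

The skip-before-open check matters here because, going by the fsr behaviour described above, reading the bad file may set off the lun bouncing rather than return a clean error.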

    Phew.. that was a lot. Any thoughts? :)

    -chris

    -- 
    Chris Jones
    (to email me, just take out the NOSPAM)
    Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B)
    This email address may not be added to any commercial mail list without
    my permission.  Violation of my privacy with advertising or SPAM will
    result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
    
