Re: errors on an 11Tb filesystem... need suggestions

From: Chris Jones (c.r.jonesNOSPAM_at_larc.nasa.gov)
Date: 01/06/05


Date: Thu, 06 Jan 2005 14:42:27 -0500

Emmanuel Florac wrote:
> On Wed, 05 Jan 2005 11:43:43 -0500, Chris Jones wrote:
>
>
>>and I can then traverse the filesystem in question and look for a file
>>with the inode number of '133'. I've tested this on a very small scale
>>and it seems to work. But doing this on an 11 TB scale is a bit
>>overwhelming.
>
>
> If you know the bad data belongs to inode 133, you can simply locate
> the file with "find" and "stat", I guess. Then with xfs_bmap you'll be
> able to see which blocks belong to the faulty file; you can then try to
> move the file around, run xfs_repair, etc...
>
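Following the find suggestion above, here is a minimal POSIX sh sketch (the function name find_by_inode is hypothetical) that looks up one or more inode numbers in a single pass. It builds one find expression that ORs together an -inum test per number, so even an 11 TB tree is walked only once, and -xdev keeps find from descending into other mounted filesystems. find only needs directory entries and inode metadata for this, so it should not read the file contents themselves.

```sh
#!/bin/sh
# find_by_inode MOUNTPOINT INODE [INODE...]
# Hypothetical helper: print the path of every file under MOUNTPOINT whose
# inode number matches one of the given INODE arguments.
find_by_inode() {
    [ "$#" -ge 2 ] || { echo "usage: find_by_inode DIR INODE..." >&2; return 1; }
    dir=$1
    shift
    # Rewrite the remaining arguments into: ( -inum A -o -inum B ... )
    # The for-loop word list is expanded once up front, so appending to and
    # shifting "$@" inside the loop does not change what the loop iterates.
    sep='('
    for ino in "$@"; do
        set -- "$@" "$sep" -inum "$ino"
        sep='-o'
        shift   # drop the original inode argument just consumed
    done
    set -- "$@" ')'
    # -xdev: stay on this filesystem; a single traversal covers all inodes.
    find "$dir" -xdev "$@" -print
}
```

For example, `find_by_inode /archive_cache 133` would report the path for inode 133 (substitute your own mount point). Running `xfs_bmap -v` on the resulting path then lists the file's extents, i.e. which blocks it occupies.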

I have actually found a workaround of sorts. I had decided to do my
testing with 'fsr' (actually, I called 'fsr_xfs' directly, since that is
what 'fsr' appears to run on an XFS filesystem).

I was hoping that running it manually (and redirecting the output to a
log file) would at some point hit the 'bad data', trigger my good ol'
LUN-bouncing issue, and maybe log something helpful.
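To repeat this kind of manual run, the one-entry mtab can be generated rather than edited by hand. A minimal sketch, assuming the usual mtab layout (device, mount point, type, options, ...) with the mount point in the second field; the helper name single_mtab and the file names temp.mtab and fsr.log are illustrative only:

```sh
#!/bin/sh
# single_mtab MOUNTPOINT [MTAB]
# Hypothetical helper: print the line(s) of MTAB (default /etc/mtab) whose
# second field, the mount point, equals MOUNTPOINT. Returns non-zero if none.
single_mtab() {
    mnt=$1
    mtab=${2:-/etc/mtab}
    awk -v m="$mnt" '$2 == m { print; found = 1 } END { exit (found ? 0 : 1) }' "$mtab"
}

# Example (not run here; fsr_xfs options as used in this thread):
#   single_mtab /archive_cache > temp.mtab
#   fsr_xfs -v -m temp.mtab > fsr.log 2>&1
```

Redirecting both stdout and stderr into the log keeps everything fsr_xfs prints in one place, so the last lines written before a hang are preserved.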

Having gotten the OK to run it manually (I knew it would eventually
result in a reboot of the machine), I gave it a shot. Just for the
record, I ran this:

fsr_xfs -v -m temp.mtab

where the file 'temp.mtab' (by default the command reads /etc/mtab)
contains just one entry: my filesystem's line, exactly as it appears in
/etc/mtab. The -v flag turns on verbose output, which I hoped would give
me some useful info. It did, as you can see here:

Found 1 mounted, writable, XFS filesystems
fsr_xfs -m temp.mtab -t 7200 -f /var/tmp/.fsrlast_xfs ...
START: pass=5 ino=1964285611 /dev/xlv/cx600_archive_cache /archive_cache
/archive_cache startino=1964285611
ino=1964285663
extents before:34 after:2 ino=1964285663
ino=1964285661
extents before:34 after:2 ino=1964285661
ino=1964285672
fsr_xfs startpass 5, endpass 5, time 52 seconds
open(/var/tmp/.fsrlast_xfs) failed: File exists
fsr_xfs startpass 5, endpass 5, time 52 seconds

That's where the logfile ends and the LUN bouncing from controller 'a'
to controller 'b' begins. I then hard power-cycled the machine (I
couldn't reboot because the shutdown sequence never finishes... sucks,
huh?) and brought things back up.
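Pulling the inode numbers out of a verbose log like this one can be scripted. A minimal sketch (the helper name inodes_from_log is hypothetical) that assumes every inode appears as ino=NNN, which also catches the startino=NNN form seen in this log:

```sh
#!/bin/sh
# inodes_from_log LOGFILE
# Hypothetical helper: list the distinct inode numbers that appear as
# "ino=NNN" (including "startino=NNN") in an fsr_xfs -v log.
inodes_from_log() {
    # Keep only the digits after "ino=" on matching lines, then de-duplicate.
    sed -n 's/.*ino=\([0-9][0-9]*\).*/\1/p' "$1" | sort -n -u
}
```

Run on the log above, this should yield the four distinct numbers 1964285611, 1964285661, 1964285663 and 1964285672. Each can then be mapped to a path with find's -inum test, e.g. `find /archive_cache -xdev -inum 1964285672 -print`.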

There are four inode numbers mentioned in the logfile. By running
'ls -liR' inside the filesystem and grep'ing for each inode number, I
found that all four were in the same directory. I took the file with the
last inode mentioned and tried copying it out to my home directory...
and immediately the LUN bouncing between controllers began.

So I now know for sure that at least one file in that directory lives
in a bad place, disk-wise. At least for the moment, then, I can disable
access to that directory (and I've already disabled the 'fsr' entry in
root's crontab). Unless more 'bad data' turns up, I'm safe for now.

But the long-term solution is getting this data *off* the filesystem and
redoing the LUN in question.

Hope those notes help somebody out in the future!

-chris

-- 
Chris Jones
(to email me, just take out the NOSPAM)
Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B)
This email address may not be added to any commercial mail list without
my permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.

