Re: errors on an 11Tb filesystem... need suggestions
From: Chris Jones (c.r.jonesNOSPAM_at_larc.nasa.gov)
Date: 01/06/05
- Next message: UNIX Museum: "Re: How do I format XFS on HD (IRIX 6.5)?"
- Previous message: Emmanuel Florac: "Re: errors on an 11Tb filesystem... need suggestions"
- In reply to: Emmanuel Florac: "Re: errors on an 11Tb filesystem... need suggestions"
- Next in thread: Emmanuel Florac: "Re: errors on an 11Tb filesystem... need suggestions"
- Reply: Emmanuel Florac: "Re: errors on an 11Tb filesystem... need suggestions"
Date: Thu, 06 Jan 2005 14:42:27 -0500
Emmanuel Florac wrote:
> On Wed, 05 Jan 2005 11:43:43 -0500, Chris Jones wrote:
>
>
>>and I can then traverse the filesystem in question and look for a file
>>with the inode number of '133'. I've tested this on a very small scale
>>and it seems to work. But doing this on a 11Tb scale is a bit
>>overwhelming.
>
>
> If you know the bad data belongs to inode 133, you may simply locate
> this file with "find" and "stat", I guess. Then with xfs_bmap you'll be
> able to see which blocks belong to the faulty file; you can then try to
> move the file around, run xfs_repair, etc.
>
I have actually found something of a workaround. I decided to
do my testing with 'fsr' (actually I called 'fsr_xfs' directly,
since that is what 'fsr' appears to run when the filesystem is
xfs).
I was hoping that running it manually (and redirecting the output to a
log file) would at some point hit the 'bad data' and cause my good ol'
lun bouncing issue and maybe log something helpful.
Once I got the OK to do this (since I knew it would eventually result in a
reboot of the machine), I gave it a shot. Just for the record, I ran this:
fsr_xfs -v -m temp.mtab
where the file 'temp.mtab' (by default the command works from
/etc/mtab) has just the one entry in it: my filesystem, exactly as it
appears in /etc/mtab. The -v is verbose, and I hoped it would give
me some useful info. It did, as you can see here:
Found 1 mounted, writable, XFS filesystems
fsr_xfs -m temp.mtab -t 7200 -f /var/tmp/.fsrlast_xfs ...
START: pass=5 ino=1964285611 /dev/xlv/cx600_archive_cache /archive_cache
/archive_cache startino=1964285611
ino=1964285663
extents before:34 after:2 ino=1964285663
ino=1964285661
extents before:34 after:2 ino=1964285661
ino=1964285672
fsr_xfs startpass 5, endpass 5, time 52 seconds
open(/var/tmp/.fsrlast_xfs) failed: File exists
fsr_xfs startpass 5, endpass 5, time 52 seconds
That's where the logfile ends and the lun bouncing from controller 'a'
to controller 'b' begins. I then hard power-cycled the machine (couldn't
reboot because the shutdown sequence never finishes... sucks, huh?) and
brought things back up.
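
For anyone repeating this, the inode numbers can be pulled out of a
saved verbose log mechanically rather than by eye. What follows is only
a rough sketch in Python: the function name and the sample text are
made up for illustration, and the only thing it assumes about the log
format is what is visible in the excerpt above, namely that inodes
appear as "ino=<number>" (including inside "startino=").

```python
import re

# Matches "ino=<digits>"; this also catches the "ino=" inside "startino=".
INO_RE = re.compile(r"ino=(\d+)")


def inodes_in_log(log_text):
    """Return the distinct inode numbers in log_text, in first-seen order."""
    seen = []
    for match in INO_RE.finditer(log_text):
        ino = int(match.group(1))
        if ino not in seen:
            seen.append(ino)
    return seen


# Lines copied from the log excerpt in this post.
SAMPLE_LOG = """\
START: pass=5 ino=1964285611 /dev/xlv/cx600_archive_cache /archive_cache
/archive_cache startino=1964285611
ino=1964285663
extents before:34 after:2 ino=1964285663
ino=1964285661
extents before:34 after:2 ino=1964285661
ino=1964285672
"""

if __name__ == "__main__":
    for ino in inodes_in_log(SAMPLE_LOG):
        print(ino)
```

The last number in the list is the inode that was being worked on when
the log stopped, so it is the first one worth chasing.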
There are four inode numbers mentioned in the logfile... and doing some
'ls -liR' and grep'ing for the inode number inside the filesystem I
found that all four of them were in the same directory. I took the file
for the last inode mentioned and tried copying it out to my home
directory... and immediately the lun bouncing between controllers began.
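
The 'ls -liR' plus grep search can also be done in a single pass for
all the inode numbers at once. The sketch below is an assumption-laden
illustration, not what I actually ran: the function names are
hypothetical, and it relies on lstat() returning only inode metadata,
so it should not read any file contents (reading the file data is what
set off the lun bouncing here).

```python
import os


def _same_device(path, dev):
    """True if path can be lstat'ed and lives on device number dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def find_paths_by_inode(root, wanted):
    """Map each inode number in wanted to the paths under root that have it."""
    wanted = set(wanted)
    root_dev = os.lstat(root).st_dev
    hits = {ino: [] for ino in wanted}
    for dirpath, dirnames, filenames in os.walk(root):
        # Stay on one filesystem: inode numbers are only unique within it.
        dirnames[:] = [d for d in dirnames
                       if _same_device(os.path.join(dirpath, d), root_dev)]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # metadata only, file data is not read
            except OSError:
                continue
            if st.st_dev == root_dev and st.st_ino in wanted:
                hits[st.st_ino].append(path)
    return hits


if __name__ == "__main__":
    # Mount point and inode numbers taken from the log excerpt in this post.
    suspects = [1964285611, 1964285663, 1964285661, 1964285672]
    for ino, paths in find_paths_by_inode("/archive_cache", suspects).items():
        print(ino, paths)
```

On a filesystem this size the walk will still take a long time; the
only saving is that every inode of interest is checked in the same
pass.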
So I now know for sure that at least one file in the directory in
question lives in a bad spot on disk. Therefore, at least for the
moment, I can disable the directory in question (and I've already
disabled the 'fsr' root crontab entry). Unless more 'bad data'
gets discovered, I'm safe for now.
But the long term solution is getting this data *off* the filesystem and
re-doing the lun in question.
Hope those notes help somebody out in the future!
-chris
-- Chris Jones (to email me, just take out the NOSPAM) Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.