fsck issue; please help!
- From: "Joe D." <newbie_from_newbie@xxxxxxxxx>
- Date: 12 Jun 2006 06:25:20 -0700
Hello all;
I'm a newb at AIX, but have several years with other flavors of UNIX. We're
running AIX 5.2 on a RS6000 server that is due to go live to production
in several weeks. I have been trying to track the cause of system
crashes.
The errpt facility says there is a file system with
JFS_META_CORRUPTION, gives me a major/minor #. Other messages
indicate there was a dump produced, but I haven't yet found how to
extract and analyze it. I've pasted them to the bottom of this posting.
We have no IBM support on the server, and the hardware support folks
(3rd party) are next to useless for troubleshooting. Here's what I've
done for troubleshooting thus far:
1) fsck on the mounted file systems (I know; it even tells me right up
front that this does not produce dependable results). The mounted file
systems check out OK, except for /usr, which shows an error with an
inode. I can't find the indicated inode ANYWHERE on the running system.
The output of the fsck is also listed at the bottom of the posting.
2) boot to single user from CD, and from the menu options: select
maintenance mode; access the root VG; select the volume group # ; and
then " 2) Access this Volume Group and start a shell before mounting
filesystems". Now, at this point, the only thing mounted is /dev/ram0
and /dev/cd0. I performed an fsck on the file systems (both as /var,
/usr, etc. AND as /dev/hd2, etc.), but all file systems checked clean
with no errors, and no messages indicating it was clearing up any
discrepancies.
3) Boot to running OS; system comes up fine, no messages in startup
indicating any file system problems, aside from normal messages
indicating 'log replay in progress'.
4) fsck STILL shows the inode problem on the /usr file system. !?!
Can anyone tell me if I've done this correctly, or missed a step?
Anybody else run into this issue?
Additionally, if anyone would be so kind as to help out with the
following newb questions:
- I don't see any /var/adm/messages or /var/adm/syslog files; are
these the standard system logging locations? If not, how do I set up
logging to system logs on AIX?
- Are there any other places where I can possibly find the cause of
the crash, or error messages leading up to the crash?
- how can I extract the dump from the indicated dump location, and are
there tools available to SA's for analyzing?
- we don't have an attached tape drive; we're backing up the server
via NetBackup. I'd like to do mksysb's also; will dumping one out to an
nfs (or local) mounted file system be of any use in a recovery
situation? Say, if I have to boot from CD and recover?
Thanks in advance for any assistance; am trying to get up to speed via
Google, usenet, etc. Any additional help is greatly appreciated.
Joe D
Error report messages, and fsck results:
# errpt -a|more
---------------------------------------------------------------------------
LABEL: RMCD_INFO_0_ST
IDENTIFIER: A6DF45AA
Date/Time: Sun Jun 11 00:54:43 EDT
Sequence Number: 967
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: O
Type: INFO
Resource Name: RMCdaemon
Description
The daemon is started.
Probable Causes
The Resource Monitoring and Control daemon has been started.
User Causes
The startsrc -s ctrmc command has been executed or
the rmcctrl -s command has been executed.
Recommended Actions
Confirm that the daemon should be started.
Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.43,211
ERROR ID
6eKora0H6uW2/a5F1BIKd8....................
REFERENCE CODE
---------------------------------------------------------------------------
LABEL: SYSDUMP_STACK
IDENTIFIER: B38E3397
Date/Time: Sun Jun 11 00:53:24 EDT
Sequence Number: 966
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: S
Type: UNKN
Resource Name: SYSDUMP
Description
Previous system dump information
Probable Causes
UNEXPECTED SYSTEM HALT
User Causes
SYSTEM DUMP REQUESTED BY USER
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Failure Causes
UNEXPECTED SYSTEM HALT
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
Crash Code
0000 0700
Crash Stack
000af7c8 v_jfscorruption+68
000af7c4 v_jfscorruption+64
000af7c4 v_jfscorruption+64
000bfb8c v_findiblk+850
000bc7c8 v_fpagein+4d8
000bd938 v_pagein+b0
0007473c pfget+400
0018756c v_pfget+47c
0040ee00 trcconfig_dmy+fffff290
---------------------------------------------------------------------------
LABEL: DUMP_STATS
IDENTIFIER: C0AA5338
Date/Time: Sun Jun 11 00:52:59 EDT
Sequence Number: 964
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: S
Type: UNKN
Resource Name: SYSDUMP
Description
SYSTEM DUMP
Probable Causes
UNEXPECTED SYSTEM HALT
User Causes
SYSTEM DUMP REQUESTED BY USER
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Failure Causes
UNEXPECTED SYSTEM HALT
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
DUMP DEVICE
/dev/hd7
DUMP SIZE
1801283584
TIME
Sun Jun 11 00:44:54 2006
DUMP TYPE (1 = PRIMARY, 2 = SECONDARY)
1
DUMP STATUS
0
ERROR CODE
0
FILE NAME
PROCESSOR ID
1
---------------------------------------------------------------------------
LABEL: JFS_META_CORRUPTION
IDENTIFIER: 684A365B
Date/Time: Sun Jun 11 00:44:54 EDT
Sequence Number: 963
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: U
Type: UNKN
Resource Name: SYSPFS
Resource Class: NONE
Resource Type: NONE
Location:
VPD:
Description
FILE SYSTEM CORRUPTION
Probable Causes
INVALID FILE SYSTEM CONTROL DATA
Recommended Actions
PERFORM FULL FILE SYSTEM RECOVERY USING FSCK UTILITY
OBTAIN DUMP
CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES
Failure Causes
ADAPTER HARDWARE OR MICROCODE
DISK DRIVE HARDWARE OR MICROCODE
SOFTWARE PROGRAM
STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED
Recommended Actions
CHECK CABLES AND THEIR CONNECTIONS
INSTALL LATEST ADAPTER AND DRIVE MICROCODE
INSTALL LATEST STORAGE DEVICE DRIVERS
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
FILE NAME
v_mapsubs.c
LINE NO.
437
MAJOR/MINOR DEVICE NUMBER
0000 0005
ADDITIONAL INFORMATION
4A46 5345 448B 9FC6 0000 0002 0000 0084 0003 1215 0000 0000 0000 0000
0000 000C
F100 00E0 05F0 14D8 0000 162D 0000 0000 000C 4BCD 0000 0000 0000 0000
0000 0000
0003 10C3 0000 007E 0004 02AC 0204 19D0 0000 0000 0000 0000 0000 0000
0000 0000
---------------------------------------------------------------------------
LABEL: ERRLOG_ON
IDENTIFIER: 9DBCFDEE
Date/Time: Sun Jun 11 00:53:23 EDT
Sequence Number: 962
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: O
Type: TEMP
Resource Name: errdemon
Description
ERROR LOGGING TURNED ON
Probable Causes
ERRDEMON STARTED AUTOMATICALLY
User Causes
/USR/LIB/ERRDEMON COMMAND
Recommended Actions
NONE
---------------------------------------------------------------------------
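To keep the order of events straight, here is a minimal Python sketch (not an AIX tool) that pulls the label, timestamp and sequence number out of the entries above and sorts them by sequence number. The text in ERRPT_TEXT is trimmed from the errpt output in this posting; the names parse_entries and timeline are made up for illustration.

```python
# Trimmed copy of the header fields from the errpt -a output in this posting.
ERRPT_TEXT = """\
LABEL: RMCD_INFO_0_ST
Date/Time: Sun Jun 11 00:54:43 EDT
Sequence Number: 967
LABEL: SYSDUMP_STACK
Date/Time: Sun Jun 11 00:53:24 EDT
Sequence Number: 966
LABEL: DUMP_STATS
Date/Time: Sun Jun 11 00:52:59 EDT
Sequence Number: 964
LABEL: JFS_META_CORRUPTION
Date/Time: Sun Jun 11 00:44:54 EDT
Sequence Number: 963
LABEL: ERRLOG_ON
Date/Time: Sun Jun 11 00:53:23 EDT
Sequence Number: 962
"""


def parse_entries(text):
    """Collect label, time and sequence number for each errpt entry."""
    entries = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "LABEL":
            current = {"label": value}
            entries.append(current)
        elif current is not None and key == "Date/Time":
            current["time"] = value
        elif current is not None and key == "Sequence Number":
            current["seq"] = int(value)
    return entries


# Oldest entry first, going by the order in which entries were logged.
timeline = sorted(parse_entries(ERRPT_TEXT), key=lambda e: e["seq"])

for entry in timeline:
    print(entry["seq"], entry["time"], entry["label"])
```

Sorted this way, JFS_META_CORRUPTION (sequence 963) carries a timestamp of 00:44:54, which is earlier than the ERRLOG_ON entry logged just before it (sequence 962, 00:53:23). My guess, and it is only a guess, is that the corruption entry was written to the log after the reboot but kept the time of the crash.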
** Checking /dev/hd2 (/usr) MOUNTED FILE SYSTEM; WRITING SUPPRESSED;
Checking a mounted filesystem does not produce dependable results.
** Phase 1 - Check Blocks and Sizes
Unknown file type I=370880 owner=1986306789 mode=16602570755
size=-1229976709 mtime=Feb 20 00:12 1925 (NOT CLEARED) (TERMINATED)
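As a sanity check on that inode line, here is a minimal Python sketch that parses the fields and flags the ones that cannot belong to a real file. It assumes the mode value is printed in octal (all of its digits are 0-7, but I have not confirmed that this is how fsck prints it); the function names are made up for illustration.

```python
import re
import stat

# The fsck line from this posting, joined onto one line.
FSCK_LINE = ("Unknown file type I=370880 owner=1986306789 mode=16602570755 "
             "size=-1229976709 mtime=Feb 20 00:12 1925")

# File type values a valid inode could carry.
KNOWN_TYPES = {stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK, stat.S_IFCHR,
               stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK}


def parse_fsck_inode(line):
    """Pull the numeric fields out of an 'Unknown file type' fsck line."""
    raw = dict(re.findall(r"\b(I|owner|mode|size)=(-?\d+)", line))
    year = re.search(r"mtime=\D*\d+ \d+:\d+ (\d{4})", line)
    return {
        "inode": int(raw["I"]),
        "owner": int(raw["owner"]),
        "mode": int(raw["mode"], 8),  # ASSUMPTION: mode is printed in octal
        "size": int(raw["size"]),
        "mtime_year": int(year.group(1)),
    }


def implausible_fields(inode):
    """Return the reasons this inode looks like garbage rather than a file."""
    reasons = []
    if inode["size"] < 0:
        reasons.append("negative size")
    if inode["mtime_year"] < 1970:
        reasons.append("mtime before the Unix epoch")
    if stat.S_IFMT(inode["mode"]) not in KNOWN_TYPES:
        reasons.append("file type bits do not match any known type")
    return reasons


inode = parse_fsck_inode(FSCK_LINE)
for reason in implausible_fields(inode):
    print(f"inode {inode['inode']}: {reason}")
```

If the octal assumption holds, every check fires, which would suggest the inode slot holds random data rather than a damaged but real file. That might also explain why I can't find inode 370880 anywhere on the running system.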