fsck issue; please help!



Hello all,

I'm a newb at AIX, but I have several years with other flavors of UNIX. We're
running AIX 5.2 on an RS/6000 server that is due to go into production
in several weeks. I have been trying to track down the cause of system
crashes.

errpt reports a file system with JFS_META_CORRUPTION and gives me a
major/minor device number. Other entries indicate that a dump was produced,
but I haven't yet found how to extract and analyze it. I've pasted them at
the bottom of this posting.
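A minimal sketch for locating the dump, assuming AIX 5.2 defaults and root access. `sysdumpdev -L` reports statistics on the most recent dump, and `snap -D` gathers dump and /unix information (under /tmp/ibmsupt by default); `kdb` is usually the tool for examining the dump afterwards, together with the matching /unix kernel. The DUMP_STATS entry at the bottom shows the dump went to /dev/hd7 and is 1801283584 bytes, so the hypothetical helper `bytes_to_mb` converts that into megabytes to check free space before copying it anywhere. The function names are illustrative, and the AIX commands inside `show_dump_info` are not run by the sketch.

```sh
#!/bin/sh
# Sketch for AIX 5.x; check each command against the local man pages.

# bytes_to_mb BYTES: hypothetical helper, rounds a byte count up to whole
# megabytes (1 MB = 1048576 bytes) to compare against free space (df -m).
bytes_to_mb() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# show_dump_info: defined but not run here; needs AIX and root.
show_dump_info() {
    sysdumpdev -l   # configured primary/secondary dump devices
    sysdumpdev -L   # statistics about the most recent dump
    snap -D         # gathers dump and /unix information (default dir /tmp/ibmsupt)
}

# DUMP SIZE from the DUMP_STATS entry in the errpt output
bytes_to_mb 1801283584
```

For this dump, that works out to roughly 1718 MB of free space needed in the target file system.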


We have no IBM support on the server, and the hardware support folks
(3rd party) are next to useless for troubleshooting. Here's what I've
done so far:

1) Ran fsck on the mounted file systems (I know; it even tells me right up
front that this does not produce dependable results). The mounted file
systems check out OK, except for /usr, which shows an error with an
inode. I can't find the indicated inode ANYWHERE on the running system.
The fsck output is also listed at the bottom of the posting.

2) Booted single user from CD, and from the menu options: selected
maintenance mode, accessed the root VG, selected the volume group number,
and then chose "2) Access this Volume Group and start a shell before
mounting filesystems". At this point, the only things mounted are /dev/ram0
and /dev/cd0. I ran fsck on the file systems (both by mount point, /var,
/usr, etc., AND by device, /dev/hd2, etc.), but all file systems checked
clean, with no errors and no messages indicating that any discrepancies
were being fixed.

3) Booted into the installed OS; the system comes up fine, with no startup
messages indicating file system problems, aside from the normal
'log replay in progress' messages.

4) fsck STILL shows the inode problem on the /usr file system. !?!
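For the inode that can't be found in step 1, a minimal sketch using standard `find`: `-xdev` keeps the search on one file system and `-inum` matches the inode number fsck reported (I=370880). The helper name `find_inode` is hypothetical. An empty result would not be surprising here, since an inode with an unknown file type may not be referenced by any directory entry at all; AIX's `ncheck` and `istat`, shown as comments, read the inode table directly instead of walking directories.

```sh
#!/bin/sh
# find_inode DIR INUM: hypothetical helper; prints paths under DIR, staying
# on DIR's file system (-xdev), whose inode number is INUM.
find_inode() {
    # 2>/dev/null hides "permission denied" noise from unreadable directories
    find "$1" -xdev -inum "$2" -print 2>/dev/null
}

# On the server, for the inode fsck reported on /usr:
#   find_inode /usr 370880
# AIX tools that read the inode table directly (run as root):
#   ncheck -i 370880 /dev/hd2
#   istat 370880 /dev/hd2
```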

Can anyone tell me if I've done this correctly, or if I missed a step?
Has anyone else run into this issue?

Additionally, if anyone would be so kind as to help out with the
following newb questions:
- I don't see any /var/adm/messages or /var/adm/syslog files; are
these the standard system logging locations? If not, how do I set up
logging to system logs on AIX?
- Are there any other places where I can possibly find the cause of
the crash, or error messages leading up to the crash?
- How can I extract the dump from the indicated dump device, and are
there tools available to sysadmins for analyzing it?
- We don't have an attached tape drive; we're backing up the server
via NetBackup. I'd like to make mksysb images as well; would writing one
to an NFS-mounted (or local) file system be of any use in a recovery
situation, say, if I have to boot from CD and recover?
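For the logging question: on AIX, /etc/syslog.conf typically ships with every rule commented out, so syslogd writes nothing until a rule is added, and syslogd may not create a log file that doesn't already exist. A minimal sketch, assuming /var/adm/messages as the target (any path works) and a hypothetical helper `add_syslog_rule` that appends the rule only once and creates the file; afterwards, `refresh -s syslogd` makes syslogd reread its configuration.

```sh
#!/bin/sh
# add_syslog_rule CONF LOGFILE [SELECTOR]
# Hypothetical helper: appends "SELECTOR<TAB>LOGFILE" to CONF once,
# then makes sure LOGFILE exists (AIX syslogd may not create it).
add_syslog_rule() {
    conf=$1
    log=$2
    selector=${3:-'*.debug'}
    line=$(printf '%s\t%s' "$selector" "$log")
    # Append only if the exact line is not already present
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
    # Create an empty log file if it does not exist yet
    [ -e "$log" ] || : > "$log"
}

# On the server, as root:
#   add_syslog_rule /etc/syslog.conf /var/adm/messages && refresh -s syslogd
```

A `*.debug` selector catches everything; a narrower selector can be passed as the third argument once the crash is understood.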

Thanks in advance for any assistance; I'm trying to get up to speed via
Google, Usenet, etc. Any additional help is greatly appreciated.

Joe D


Error report messages and fsck results:

# errpt -a|more
---------------------------------------------------------------------------
LABEL: RMCD_INFO_0_ST
IDENTIFIER: A6DF45AA

Date/Time: Sun Jun 11 00:54:43 EDT
Sequence Number: 967
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: O
Type: INFO
Resource Name: RMCdaemon

Description
The daemon is started.

Probable Causes
The Resource Monitoring and Control daemon has been started.

User Causes
The startsrc -s ctrmc command has been executed or
the rmcctrl -s command has been executed.

Recommended Actions
Confirm that the daemon should be started.

Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.43,211
ERROR ID
6eKora0H6uW2/a5F1BIKd8....................
REFERENCE CODE

---------------------------------------------------------------------------
LABEL: SYSDUMP_STACK
IDENTIFIER: B38E3397

Date/Time: Sun Jun 11 00:53:24 EDT
Sequence Number: 966
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: S
Type: UNKN
Resource Name: SYSDUMP

Description
Previous system dump information

Probable Causes
UNEXPECTED SYSTEM HALT

User Causes
SYSTEM DUMP REQUESTED BY USER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
UNEXPECTED SYSTEM HALT

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
Crash Code
0000 0700
Crash Stack
000af7c8 v_jfscorruption+68
000af7c4 v_jfscorruption+64
000af7c4 v_jfscorruption+64
000bfb8c v_findiblk+850
000bc7c8 v_fpagein+4d8
000bd938 v_pagein+b0
0007473c pfget+400
0018756c v_pfget+47c
0040ee00 trcconfig_dmy+fffff290

---------------------------------------------------------------------------
LABEL: DUMP_STATS
IDENTIFIER: C0AA5338

Date/Time: Sun Jun 11 00:52:59 EDT
Sequence Number: 964
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: S
Type: UNKN
Resource Name: SYSDUMP

Description
SYSTEM DUMP

Probable Causes
UNEXPECTED SYSTEM HALT

User Causes
SYSTEM DUMP REQUESTED BY USER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
UNEXPECTED SYSTEM HALT

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
DUMP DEVICE
/dev/hd7
DUMP SIZE
1801283584
TIME
Sun Jun 11 00:44:54 2006
DUMP TYPE (1 = PRIMARY, 2 = SECONDARY)
1
DUMP STATUS
0
ERROR CODE
0
FILE NAME

PROCESSOR ID
1
---------------------------------------------------------------------------
LABEL: JFS_META_CORRUPTION
IDENTIFIER: 684A365B

Date/Time: Sun Jun 11 00:44:54 EDT
Sequence Number: 963
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: U
Type: UNKN
Resource Name: SYSPFS
Resource Class: NONE
Resource Type: NONE
Location:
VPD:

Description
FILE SYSTEM CORRUPTION

Probable Causes
INVALID FILE SYSTEM CONTROL DATA

Recommended Actions
PERFORM FULL FILE SYSTEM RECOVERY USING FSCK UTILITY
OBTAIN DUMP
CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES

Failure Causes
ADAPTER HARDWARE OR MICROCODE
DISK DRIVE HARDWARE OR MICROCODE
SOFTWARE PROGRAM
STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED

Recommended Actions
CHECK CABLES AND THEIR CONNECTIONS
INSTALL LATEST ADAPTER AND DRIVE MICROCODE
INSTALL LATEST STORAGE DEVICE DRIVERS
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
FILE NAME
v_mapsubs.c
LINE NO.
437
MAJOR/MINOR DEVICE NUMBER
0000 0005
ADDITIONAL INFORMATION
4A46 5345 448B 9FC6 0000 0002 0000 0084 0003 1215 0000 0000 0000 0000
0000 000C
F100 00E0 05F0 14D8 0000 162D 0000 0000 000C 4BCD 0000 0000 0000 0000
0000 0000
0003 10C3 0000 007E 0004 02AC 0204 19D0 0000 0000 0000 0000 0000 0000
0000 0000
---------------------------------------------------------------------------
LABEL: ERRLOG_ON
IDENTIFIER: 9DBCFDEE

Date/Time: Sun Jun 11 00:53:23 EDT
Sequence Number: 962
Machine Id: 00C3BB9E4C00
Node Id: periop-svr01
Class: O
Type: TEMP
Resource Name: errdemon

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

Recommended Actions
NONE

---------------------------------------------------------------------------
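The `errpt -a` listing above is newest-first. A minimal sketch that condenses it to one line per entry, sorted by sequence number (the order in which entries were written to the log); the function name `errpt_summary` is hypothetical, and it relies only on the LABEL:, Date/Time: and Sequence Number: lines and the dashed separators shown in the paste. On this output it lists JFS_META_CORRUPTION (963) before DUMP_STATS (964) and SYSDUMP_STACK (966), and the corruption entry's time, 00:44:54, matches the dump TIME in DUMP_STATS.

```sh
#!/bin/sh
# errpt_summary: read `errpt -a` output on stdin and print one line per
# entry ("SEQ LABEL DATE/TIME"), sorted by sequence number.
# Hypothetical helper; relies only on the LABEL:, Date/Time: and
# Sequence Number: lines and the dashed separators between entries.
errpt_summary() {
    awk '
        /^LABEL:/           { label = $2 }
        /^Sequence Number:/ { seq = $3 }
        /^Date\/Time:/      { sub(/^Date\/Time:[ \t]*/, ""); dt = $0 }
        /^-----/            { if (label != "") print seq, label, dt; label = "" }
        END                 { if (label != "") print seq, label, dt }
    ' | sort -n
}

# On the server:
#   errpt -a | errpt_summary
```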



** Checking /dev/hd2 (/usr) MOUNTED FILE SYSTEM; WRITING SUPPRESSED;
Checking a mounted filesystem does not produce dependable results.
** Phase 1 - Check Blocks and Sizes
Unknown file type I=370880 owner=1986306789 mode=16602570755
size=-1229976709 mtime=Feb 20 00:12 1925 (NOT CLEARED) (TERMINATED)
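fsck prints the inode mode in octal, and the other fields on that line are implausible too (an owner UID of 1986306789, a negative size, an mtime in 1925), which suggests the inode holds garbage rather than a real file. A minimal sketch that extracts the file-type bits (mask 0170000, the standard S_IFMT value) and names them; `decode_mode` is a hypothetical helper. For mode=16602570755 the type bits come out as 0170000, which matches none of the defined file types, consistent with fsck's "Unknown file type".

```sh
#!/bin/sh
# decode_mode OCTAL_MODE
# Hypothetical helper: names the file-type field of an inode mode that
# fsck printed in octal. Bits outside the type field are ignored (AIX
# uses some high-order mode bits for its own flags).
decode_mode() {
    # printf treats a leading 0 as octal, so "0$1" converts octal to decimal
    m=$(printf '%d' "0$1")
    # S_IFMT mask: 0170000 octal = 61440 decimal
    t=$(( m & 61440 ))
    case $t in
        32768) type=regular ;;        # 0100000
        16384) type=directory ;;      # 0040000
        40960) type=symlink ;;        # 0120000
        24576) type=block-special ;;  # 0060000
        8192)  type=char-special ;;   # 0020000
        4096)  type=fifo ;;           # 0010000
        49152) type=socket ;;         # 0140000
        *)     type=unknown ;;
    esac
    printf 'type=%s typebits=%o\n' "$type" "$t"
}

# The mode reported for I=370880 in the fsck output
decode_mode 16602570755
```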
