HELP - stale partitions
- From: Zvi Bar-Deroma <zvika@xxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 15 Dec 2006 12:51:30 +0200
Hi,
[After years without posting, this is my second post, on a different issue, within an hour :-)]
I have a serious problem that I am having difficulty resolving, on a critical machine (our NFS server).
Environment:
IBM pSeries 7028-6C1 (p610), AIX 5.2 ML05. rootvg is mirrored (hdisk0 and hdisk1, 18 GB each). This setup has not changed for the last 5 years. The latest ML was installed around 5/2005. There have been no apparent disk/SCSI/storage problems with this machine before.
Last week (Dec. 6th) "stale partition" messages began to appear for all rootvg LVs, along with error messages about hdisk0. After consulting my dealer's technical support, we tried to unmirror and then re-mirror rootvg. Mirroring failed, as can be seen below:
Before command completion, additional instructions may appear below.
0516-1296 lresynclv: Unable to completely resynchronize volume.
The logical volume has bad-block relocation policy turned off.
This may have caused the command to fail.
0516-934 /etc/syncvg: Unable to synchronize logical volume hd6.
0516-932 /etc/syncvg: Unable to synchronize volume group rootvg.
0516-1124 mirrorvg: Quorum requirement turned off, reboot system for this
to take effect for rootvg.
0516-1126 mirrorvg: rootvg successfully mirrored, user should perform
bosboot of system to initialize boot records. Then, user must modify
bootlist to include: hdisk1 hdisk0.
Then we replaced hdisk0 after breaking the mirror, hoping to remirror after adding the new hdisk0 to rootvg. The system wouldn't even add the new hdisk0 to rootvg.
Next we did an alt_disk_install from hdisk1 onto hdisk0, rebooted from hdisk0, and replaced hdisk1 (so now both disks are new). Note that the alt_disk_install filesets were not installed, so we installed them, but unfortunately we forgot to install the fixes for them and used just the base level from the AIX 5.2 CDs, giving level of 10 for these filesets. We then managed to mirror rootvg, but after about 30 hours the machine rebooted, and I now have a zillion messages of the form:
EAA3D429 1215095906 U S LVDD PHYSICAL PARTITION MARKED STALE
Immediately after the restart the following message appeared:
9D035E4D 1214165306 P S SYSVMM DATA STORAGE INTERRUPT, PROCESSOR
The detailed message for that event was:
LABEL: DSI_PROC
IDENTIFIER: 9D035E4D
Date/Time: Thu Dec 14 16:53:23 IST
Sequence Number: 3019
Machine Id: 0056BC9A4C00
Node Id: aeserv
Class: S
Type: PERM
Resource Name: SYSVMM
Description
DATA STORAGE INTERRUPT, PROCESSOR
Probable Causes
SOFTWARE PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
DATA STORAGE INTERRUPT STATUS REGISTER
0000 0000 0000 0000
SEGMENT REGISTER, SEGREG
4000 0000 0000 0000
DATA STORAGE INTERRUPT ADDRESS REGISTER
2000 71A6 F000 0000
EXVAL
2FF3 8DD0 0000 0000
After that, messages of this type began to appear:
0BA49C99 1214165606 T H scsi0 SCSI BUS ERROR
followed by the physical partition stale and operator notification messages.
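To put the three errpt summary lines quoted above in time order: the second column is a timestamp in MMDDhhmmYY form (1214165306 matches the "Dec 14 16:53" shown in the detailed DSI_PROC entry). Below is a minimal Python sketch, assuming the summary lines are pasted exactly as quoted. The names ErrptEntry and parse_errpt_line are made up for illustration and are not part of any AIX tool.

```python
from datetime import datetime
from typing import NamedTuple


class ErrptEntry(NamedTuple):
    """One row of errpt summary output (hypothetical helper type)."""
    identifier: str
    when: datetime
    err_type: str    # T column, e.g. P = permanent, T = temporary, U = unknown
    err_class: str   # C column, e.g. H = hardware, S = software
    resource: str
    description: str


def parse_errpt_line(line: str) -> ErrptEntry:
    # The first five columns are whitespace-separated; the description
    # is the remainder of the line and may itself contain spaces.
    identifier, stamp, err_type, err_class, resource, description = line.split(None, 5)
    # Summary timestamps are MMDDhhmmYY (two-digit year comes last).
    when = datetime.strptime(stamp, "%m%d%H%M%y")
    return ErrptEntry(identifier, when, err_type, err_class, resource, description.strip())


# The three summary lines quoted in this post.
lines = [
    "EAA3D429 1215095906 U S LVDD PHYSICAL PARTITION MARKED STALE",
    "9D035E4D 1214165306 P S SYSVMM DATA STORAGE INTERRUPT, PROCESSOR",
    "0BA49C99 1214165606 T H scsi0 SCSI BUS ERROR",
]

# Sort oldest first so the sequence of events is easy to read.
entries = sorted((parse_errpt_line(l) for l in lines), key=lambda e: e.when)
for e in entries:
    print(e.when.strftime("%Y-%m-%d %H:%M"), e.err_class, e.resource, e.description)
```

Sorted this way, the data storage interrupt comes first (Dec 14, 16:53), the SCSI bus error follows three minutes later (16:56), and the quoted stale-partition entry is from the next morning (Dec 15, 09:59).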
At the moment, lspv does NOT show hdisk0, and lsvg rootvg shows 1 active and 1 stale partition.
If required I'll send the detailed logs, but I didn't think it right to send them to everyone on the list ...
I'd appreciate any help, in particular concerning what is wrong: is it s/w or h/w? If h/w, then what? SCSI? SCSI backplane?
Regards,
/Zvika
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Zvika Bar-Deroma
Systems and Network manager Phone: (+972)-4-829-2706 ; Fax : (+972)-4-829-2315
Faculty of Aerospace Engineering, Technion, Haifa 32000, Israel
e-mail : zvika@xxxxxxxxxxxxxxxxxxxxx
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~