v880 internal array death



Hello Managers.

I have a 4-node cluster of V880s that refuses to gracefully accept
patching via the 9_Recommended patch cluster. This has been a thorn in
my side for nearly a year.
The 880s run Solaris 9, Sun Cluster 3.5u1, Veritas VxVM 3.5u3 for handling
import/deport of LUNs, and Sybase 12.5.0.3 databases.
The six internal disks on the 880 are used for booting the system only.
Root is encapsulated and mirrored on disks 0 and 1 (rootvol, patched), and
the system can also boot without rootvol from disk 2 (patched) or disk 3
(original unpatched system). Disks 4 and 5 are unused.

There is an open case w/Sun on this but so far no results.
This is a repeatable issue (I've reproduced it 3 times in the last 3 weeks).

What's been done.
Pop a node out of cluster
Drop to single user mode.
run the 9_recommended patch cluster
init 0
boot -r
kaboom (this can take anywhere from 10 minutes to several hours, depending
on how busy the system is)

Alternatively, running smpatch and downloading and patching "everything
known to man" has the same "kaboom" result.

Booting back into cluster produces the following result. All internal
disks on the 880s shut themselves down and the system eventually panics.

Rebooting the system out of cluster or onto the unpatched OS disk brings the
internal disks back to life. Keeping the system patched but out of
cluster seems to be OK, but it's hard to tell: the system is idle, so it may
take a lot longer for the problem to manifest itself.
So far Sun has recommended updating firmware (OBP, internal fibre
backplane and Emulex 9002 HBA firmware are all updated). I've been asked
twice by Sun if I have dual paths to my internal storage, but afaik there's
only a single loop on each backplane and I have only one backplane.
The problem occurred both before and after all firmware was updated.
The problem occurs whether an encapsulated root disk or a standalone disk
is used for boot. When the system does die there is usually too much file
system corruption to use the patched boot disk again, so in the case of
rootvol it needs to be nuked and rebuilt.

Before everything associated with the array fails, luxadm shows this:
root@DT5AE1:/:# luxadm display FCloop

SUNWGS INT FCBPL
DISK STATUS
SLOT DISKS (Node WWN)
0 On (O.K.) 2000000087166a36
1 On (O.K.) 200000008715eab2
2 On (O.K.) 20000000871666a4
3 On (O.K.) 2000000087165966
4 On (O.K.) 20000000871650a2
5 On (O.K.) 2000000087161c22
6 On (Login failed)
7 On (Login failed)
8 On (Login failed)
9 On (Login failed)
10 On (Login failed)
11 On (Login failed)
SUBSYSTEM STATUS
FW Revision:9228 Box ID:0
Node WWN:50800200001d2230 Enclosure Name:FCloop
SSC100's - 0=Base Bkpln, 1=Base LoopB, 2=Exp Bkpln, 3=Exp LoopB
SSC100 #0: O.K.(9228/ 15F1)
SSC100 #1: O.K.(9228/ 15F1)
SSC100 #2: Not Installed
SSC100 #3: Not Installed
Temperature Sensors - 0 Base, 1 Expansion
0:26:C
1Not Installed
Default Language is USA English, ASCII
root@DT5AE1:


but what I get is
root@DT5AE1:/:# luxadm display FCloop
Error: Invalid pathname (FCloop)
(luxadm display /dev/es/ses0 shows the same thing)
format fails
df -k works
who works
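
Since the failure can take anywhere from 10 minutes to several hours to show
up, one way to pin down the moment it happens might be to poll luxadm and
note when the six internal slots stop reporting On (O.K.). The following is
only a rough sketch, not something Sun has suggested. It assumes Python 3.7
or later is available on the node, that luxadm prints slot lines in the
flattened layout pasted above, and that slots 0-5 are the populated ones.
The function names (parse_slots, failed_disks, check_once) are made up for
illustration.

```python
import re
import subprocess

# Matches disk slot lines such as "0 On (O.K.) 2000000087166a36"
# or "6 On (Login failed)"; the trailing node WWN is optional.
SLOT_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\(([^)]*)\)\s*([0-9a-fA-F]{16})?\s*$"
)


def parse_slots(output):
    """Return {slot: (state, detail, wwn_or_None)} from luxadm display text."""
    slots = {}
    for line in output.splitlines():
        m = SLOT_RE.match(line)
        if m:
            slots[int(m.group(1))] = (m.group(2), m.group(3), m.group(4))
    return slots


def failed_disks(output, expected_slots=range(6)):
    """List expected slots that are missing or not reported as On (O.K.)."""
    slots = parse_slots(output)
    bad = []
    for slot in expected_slots:
        state, detail, _wwn = slots.get(slot, ("missing", "", None))
        if state != "On" or detail != "O.K.":
            bad.append(slot)
    return bad


def check_once(enclosure="FCloop"):
    """Run luxadm once and return the list of unhealthy expected slots."""
    try:
        proc = subprocess.run(
            ["luxadm", "display", enclosure],
            capture_output=True,
            text=True,
        )
    except OSError:
        # luxadm itself could not be started; treat every slot as suspect.
        return list(range(6))
    # An error such as "Invalid pathname" yields no slot lines on stdout,
    # so every expected slot is reported as failed.
    return failed_disks(proc.stdout)
```

Calling check_once() from a loop or from cron and logging a timestamp
whenever it returns a non-empty list would at least show how long after
boot -r the loop goes away.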
_______________________________________________
sunmanagers mailing list
sunmanagers@xxxxxxxxxxxxxxx
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


