880 panicking

From: Anshuman Kanwar (anshuman_at_expertcity.com)
Date: 06/18/03

    To: "'sunmanagers@sunmanagers.org'" <sunmanagers@sunmanagers.org>
    Date: Wed, 18 Jun 2003 05:17:37 -0700
    
    

    Hi Managers,

    I have a two-node cluster setup:

    2 x V880, cross-connected to 2 x StorEdge 3310 arrays via LVD SCSI, running
    Solaris 9 + Sun Cluster 3.0 + Solaris Volume Manager

    After a deliberate reboot the machine is panicking. It seems like a SCSI bus
    issue. The SCSI bus probe (probe-scsi-all) also seems to get stuck at one
    point. The outputs are attached below.

    This looks like bad hardware, right? Probably the HBA. Has anyone seen this
    happen with SC 3.0 before?

    I got no log messages while the node was running.

    Thanks,
    -ansh

    --------------------

    {3} ok probe-scsi-all
    /pci@9,700000/pci@2/scsi@5

    /pci@9,700000/pci@2/scsi@4
    Target 0
      Unit 0 Disk SUN StorEdge 3310 0325
      Unit 1 Disk SUN StorEdge 3310 0325

    /pci@8,600000/SUNW,qlc@2
    LiD HA LUN --- Port WWN --- ----- Disk description -----
     0 0 0 500000e01020a481 FUJITSU MAN3735F SUN72G 0604
     1 1 0 500000e0101fc9f1 FUJITSU MAN3735F SUN72G 0604
     2 2 0 500000e0101fd021 FUJITSU MAN3735F SUN72G 0604
     6 6 0 50800200001c4fe9 SUNW SUNWGS INT FCBPL9226
     3 3 0 500000e0102008c1 FUJITSU MAN3735F SUN72G 0604
     4 4 0 500000e010201f51 FUJITSU MAN3735F SUN72G 0604
     5 5 0 21000004cf2b91b7 SEAGATE ST373405FSUN72G 0638

    /pci@8,700000/scsi@1
    Script interrupt: Reserved phase
    Fatal SCSI error at script address 8 Unexpected disconnect
    Arbitration Complete
    Script interrupt: Reserved phase
    Fatal SCSI error at script address 8 Unexpected disconnect
    Arbitration Complete
    Script interrupt: Reserved phase
    Fatal SCSI error at script ad
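
One way to read the probe output above is to group its lines by controller path and flag any controller whose section reports a fatal error. The following is a minimal Python sketch, not part of the original post. The function name `failing_controllers` is hypothetical, the sketch assumes every controller path starts with `/pci@`, and the sample text is abbreviated from the output above.

```python
def failing_controllers(probe_output):
    """Return controller paths whose section contains a 'Fatal SCSI error' line.

    Assumption: a line starting with '/pci@' opens a new controller section.
    """
    failing = []
    current = None
    for raw in probe_output.splitlines():
        line = raw.strip()
        if line.startswith("/pci@"):
            # New controller section begins here.
            current = line
        elif "Fatal SCSI error" in line and current is not None:
            # Record each failing controller once, in order of appearance.
            if current not in failing:
                failing.append(current)
    return failing


# Abbreviated copy of the probe-scsi-all output from the post.
SAMPLE = """\
/pci@9,700000/pci@2/scsi@5

/pci@9,700000/pci@2/scsi@4
Target 0
  Unit 0 Disk SUN StorEdge 3310 0325
  Unit 1 Disk SUN StorEdge 3310 0325

/pci@8,700000/scsi@1
Script interrupt: Reserved phase
Fatal SCSI error at script address 8 Unexpected disconnect
Arbitration Complete
"""

if __name__ == "__main__":
    print(failing_controllers(SAMPLE))
```

On this sample only `/pci@8,700000/scsi@1` is flagged, which is the same device path the kernel reports for glm0 in the boot log.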

    > -----Original Message-----
    > From: Anshuman Kanwar
    > Sent: Tuesday, June 17, 2003 11:51 PM
    > To: 'jh1@sun.com'
    > Subject: case no 63595964
    >
    > << File: cluster1.txt >>

    ---------------------

    Sun Fire 880, No Keyboard
    Copyright 1998-2002 Sun Microsystems, Inc. All rights reserved.
    OpenBoot 4.7.5, 20480 MB memory installed, Serial #53149406.
    Ethernet address 0:3:ba:2a:fe:de, Host ID: 832afede.

    Rebooting with command: boot
    Boot device: /pci@8,600000/SUNW,qlc@2/fp@0,0/disk@w500000e01020a481,0:a
    File and args:
    SunOS Release 5.9 Version Generic_112233-04 64-bit
    Copyright 1983-2002 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    WARNING: forceload of misc/md_trans failed
    WARNING: forceload of misc/md_raid failed
    WARNING: forceload of misc/md_hotspares failed
    WARNING: forceload of misc/md_sp failed
    configuring IPv4 interfaces: ce0.
    Hostname: v880-1
    WARNING: /pci@8,700000/scsi@1 (glm0):
            Resetting scsi bus, got incorrect phase from (0,0)
    WARNING: /pci@8,700000/scsi@1 (glm0):
            timeout on bus reset interrupt
    WARNING: glm0: fault detected in device; service unavailable
    WARNING: glm0: timeout on bus reset interrupt
    Could not open /dev/rdsk/c0t6d0s2 to verify device id.
            No such device or address
    device id for '/dev/rdsk/c1t5d0' does not match physical disk's id.
    The drive may have been replaced
    Booting as part of a cluster
    NOTICE: CMM: Node v880-1 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node v880-2 (nodeid = 2) with votecount = 1 added.
    NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d8s2) added; votecount = 1,
    bitmask of nodes with configured paths = 0x3.
    WARNING: CMM: Initialization for quorum device /dev/did/rdsk/d8s2 failed
    with error EACCES. Will retry later.
    NOTICE: clcomm: Adapter hme0 constructed
    NOTICE: clcomm: Path v880-1:hme0 - v880-2:hme0 being constructed
    NOTICE: clcomm: Adapter ge0 constructed
    NOTICE: clcomm: Path v880-1:ge0 - v880-2:ge0 being constructed
    NOTICE: CMM: Node v880-1: attempting to join cluster.
    SUNW,pci-gem0: Using Gigabit SERDES Interface
    SUNW,pci-gem0: Auto-Negotiated 1000 Mbps Full-Duplex Link Up
    NOTICE: clcomm: Path v880-1:ge0 - v880-2:ge0 being initiated
    NOTICE: clcomm: Path v880-1:ge0 - v880-2:ge0 online
    NOTICE: CMM: Node v880-2 (nodeid: 2, incarnation #: 1049240903) has become
    reachable.
    WARNING: CMM: Reading reservation keys from quorum device /dev/did/rdsk/d8s2
    failed with error 2.
    NOTICE: CMM: Cluster has reached quorum.
    NOTICE: CMM: Node v880-1 (nodeid = 1) is up; new incarnation number =
    1055914729.
    NOTICE: CMM: Node v880-2 (nodeid = 2) is up; new incarnation number =
    1049240903.
    NOTICE: CMM: Cluster members: v880-1 v880-2.
    NOTICE: CMM: node reconfiguration #3 completed.
    NOTICE: CMM: Node v880-1: joined cluster.
    Could not open /dev/rdsk/c0t6d0s2 to verify device id.
            No such device or address
    device id for '/dev/rdsk/c1t5d0' does not match physical disk's id.
    The drive may have been replaced
    The system is coming up. Please wait.
    WARNING: md: d1105: (Unavailable) needs maintenance
    checking ufs filesystems
    /dev/rdsk/c1t3d0s0: is logging.
    /dev/rdsk/c1t2d0s0: is logging.
    /dev/md/rdsk/d1115: is logging.
    NOTICE: clcomm: Path v880-1:hme0 - v880-2:hme0 being initiated
    NOTICE: clcomm: Path v880-1:hme0 - v880-2:hme0 online
    /dev/rdsk/c1t0d0s7: is logging.
    /dev/md/rdsk/d1006: is logging.
    starting rpc services: rpcbind done.
    Setting netmask of lo0:1 to 255.255.255.255
    Setting netmask of ce0 to 255.255.255.0
    Setting netmask of hme0 to 255.255.255.128
    Setting netmask of hme0:2 to 255.255.255.252
    Setting netmask of ge0 to 255.255.255.128
    Setting default IPv4 interface for multicast: add net 224.0/4: gateway
    v880-1
    syslog service starting.
    obtaining access to all attached disks
    System dump time: Tue Jun 17 22:33:37 2003
    savecore: not enough space in /var/crash/v880-1 (362 MB avail, 1711 MB
    needed)
    Jun 17 22:39:29 v880-1 savecore: not enough space in /var/crash/v880-1 (362
    MB avail, 1711 MB needed)
    volume management starting.
    Jun 17 22:39:35 v880-1 metadevadm: Unnamed device detected. Please run
    devfsadm && metadevadm -r to resolve.
    Executing devfsadm

    Executing metadevadm -r
    Unable to resolve unnamed devices for volume management.
    Please refer to the Solaris Volume Manager documentation,
    Troubleshooting section, at http://docs.sun.com or from
    your local copy.

    panic[cpu1]/thread=2a100125d40: BAD TRAP: type=31 rp=2a100125500
    addr=308000cd500 mmu_fsr=0

    sched: trap type = 0x31
    addr=0x308000cd500
    pid=0, pc=0x108cdf8, sp=0x2a100124da1, tstate=0x880001607, context=0x0
    g1-g7: 149e400, 7ffff, 0, 1, 1, 0, 2a100125d40

    000002a100125230 unix:die+a4 (31, 2a100125500, 308000cd500, 0, 0, 0)
      %l0-3: 0000000000000000 00000300000cd508 000002a100125500 000002a1001253f8
      %l4-7: 0000000000000031 000000000000045b 000000000115ccd8 0000030006cb18b0
    000002a100125310 unix:trap+874 (2a100125500, 0, 10000, 10200, 308, 1)
      %l0-3: 0000000000000001 0000000000000000 0000000001437888 0000000000000031
      %l4-7: 0000000000000006 0000000000000001 0000000000000000 0000000000000000
    000002a100125450 unix:ktl0+48 (300000cd508, 0, 20, 7fffffff8, 0,
    300056cd848)
      %l0-3: 0000000000000000 0000000000001400 0000000880001607 000000000102aaf8
      %l4-7: 000000000147d7c0 0000000001437800 0000000000000000 000002a100125500
    000002a1001255a0 unix:kstat_rele+20 (ffffffffffffffff, 3, 4, 30000373f28,
    76, 140e000)
      %l0-3: 0000000001428f48 0000030006cb18b0 0000000000000000 00000300056ae076
      %l4-7: 00000300001dea48 00000300001deb10 0000000000000076 00000300001dee8a
    000002a100125650 md:md_layered_close+d4 (ffffffffffffffff, 3, 4, 0, 0, 10)
      %l0-3: ffffffffffffffff 0000000000000002 0000030000373f28 00000000ffffffff
      %l4-7: 00000000ffffffff 0000000000000076 00000300056ad850 0000030000361438
    000002a100125700 md_stripe:stripe_close_all_devs+dc (30005631be4,
    30005631be4, 20, 76, 30007377000, 0)
      %l0-3: 0000000000000001 0000000000000001 0000000000000002 0000000000000000
      %l4-7: 0000000000000002 0000030005631b88 00000300056ad850 0000030005631bf8
    000002a1001257b0 md_stripe:stripe_close+88 (5500000451, 3, 4, 30000373f28,
    2, 0)
      %l0-3: 0000000000000000 0000000000000002 00000300056a7f30 0000030005631b88
      %l4-7: 00000300056a7f30 0000000000000451 0000000000000002 0000000000000001
    000002a100125860 md_mirror:mirror_probe_close_all_devs+b8 (5500000451,
    300001e3338, 1, 1, 300001e30e8, 300001e3144)
      %l0-3: 00000000012e2b00 0000000000000001 0000000000000002 000000000000ffff
      %l4-7: 00000300001e30e8 00000300001e3144 00000300001e3318 00000300001e319c
    000002a100125910 md_mirror:mirror_probe_dev+208 (108, 45b, 1437888, 1437888,
    0, 0)
      %l0-3: 0000000000000004 000000000000045b 0000000000000001 000000000000ffff
      %l4-7: 00000300056a7a50 000000000000045b 00000300001e30e8 00000300001e3144
    000002a1001259d0 md:md_probe_one+40 (300136a2048, 2a100125d40, 20, 148a0a8,
    2a100125d40, 0)
      %l0-3: 00000000012e99dc 000000000149e570 0000030007bbfe50 ffffffffffffffff
      %l4-7: 0000000001400090 000000000142d5b8 0000000001442400 0000030005633740
    000002a100125a80 md:md_daemon+220 (0, 149e540, 1437888, 1437888, 148a0b2, 0)
      %l0-3: 00000000011c6994 00000300136a2048 0000000000000000 000002a10012bd40
      %l4-7: 000000000149e570 000000000149e568 0000030000010e00 0000030005633840

    syncing file systems... done
    dumping to /dev/md/dsk/d1001, offset 4195221504, content: kernel
      9% done
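
To tie the kernel warnings in the console log above back to a physical controller, it can help to map each driver instance to the device path printed next to it. The following is a minimal Python sketch, not part of the original post. The function name `driver_paths` and the regular expression are assumptions based only on the `WARNING: <path> (<instance>):` lines shown in this log, and the sample is abbreviated from it.

```python
import re

# Matches lines such as "WARNING: /pci@8,700000/scsi@1 (glm0):".
# Assumption: the device path starts with '/' and the instance name is in parentheses.
WARNING_RE = re.compile(r"WARNING:\s+(/\S+)\s+\((\w+)\):")


def driver_paths(console_log):
    """Map driver instance names (e.g. 'glm0') to the device path reported with them."""
    mapping = {}
    for match in WARNING_RE.finditer(console_log):
        path, instance = match.group(1), match.group(2)
        # Keep the first path seen for each instance.
        mapping.setdefault(instance, path)
    return mapping


# Abbreviated copy of the glm0 warnings from the boot log.
BOOT_SAMPLE = """\
WARNING: /pci@8,700000/scsi@1 (glm0):
        Resetting scsi bus, got incorrect phase from (0,0)
WARNING: /pci@8,700000/scsi@1 (glm0):
        timeout on bus reset interrupt
WARNING: glm0: fault detected in device; service unavailable
"""

if __name__ == "__main__":
    for instance, path in driver_paths(BOOT_SAMPLE).items():
        print(instance, "->", path)
```

For this log the only mapping is glm0 to `/pci@8,700000/scsi@1`, the controller on which probe-scsi-all reports the repeated "Fatal SCSI error" lines.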
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

