Summary: Sol8 and EVA hangs

From: Eugene Schmidt (fereug_at_acute.co.za)
Date: 11/03/04

    To: <sunmanagers@sunmanagers.org>
    Date: Wed, 3 Nov 2004 01:33:31 +0200
    
    

    Hi Everybody

    Long overdue summary.

    No applicable answers were received. However, there seemed to be some
    interest in this topic.

    Anyway, the Solaris system was healthy; the failure was well downstream in
    the SAN infrastructure (a fibre cable between switches). Somehow this slipped
    past the SAN supplier and was only found after it started impacting other
    servers. So much for logs...

    After the fibre was replaced, the errors stopped.

    Best regards

    Eugene
    ===============================================

    Hope someone has seen this one and can help please?

    Customer has an E4500 running Solaris 8, with 2 x EVA disk arrays newly
    attached via two QLogic 2200 SBus HBAs. Testing was 100% successful and fast.

    Secure Path 3.0D is loaded for channel failover.

    Started experiencing hangs today. What had changed? The server was rebooted
    this morning, with no changes made prior to the reboot.

    Initially no errors in /var/adm/messages, but after a second reboot, errors
    started appearing:

    Oct 8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,1 (ssd5):
    Oct 8 11:00:41 proddb SCSI transport failed: reason 'aborted': retrying command
    Oct 8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
    Oct 8 11:09:00 proddb SCSI transport failed: reason 'aborted': retrying command
    Oct 8 11:58:52 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
    Oct 8 11:58:52 proddb SCSI transport failed: reason 'aborted': retrying command
    Oct 8 12:11:13 proddb scsi: [ID 243001 kern.warning] WARNING: /swsp@0,2/ssd@0,0 (ssd4):
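To see at a glance which devices are affected, the warnings can be tallied per device. The following is a minimal Python sketch, not part of the original post; the function name and sample text are illustrative, and it assumes the two-line WARNING / "transport failed" layout seen in the excerpt above.

```python
import re
from collections import Counter

# One pattern for both halves of an event: the WARNING header that names the
# device, and the "transport failed" line that follows it.
EVENT_RE = re.compile(
    r"WARNING:\s*(?P<path>\S+)\s+\((?P<inst>ssd\d+)\):"
    r"|SCSI transport failed: reason '(?P<reason>[^']+)'"
)


def count_transport_failures(text):
    """Count transport failures per (device path, ssd instance, reason)."""
    # Collapse all whitespace so entries wrapped by a mail client still match.
    flat = " ".join(text.split())
    counts = Counter()
    current = None  # device named by the most recent WARNING header
    for match in EVENT_RE.finditer(flat):
        if match.group("path"):
            current = (match.group("path"), match.group("inst"))
        elif current is not None:
            counts[current + (match.group("reason"),)] += 1
            current = None
    return counts


# Sample copied from the excerpt above; the last entry is truncated there,
# so it has a header but no "transport failed" line and is not counted.
SAMPLE = """\
Oct 8 11:00:41 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,1 (ssd5):
Oct 8 11:00:41 proddb SCSI transport failed: reason 'aborted':
retrying command
Oct 8 11:09:00 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:09:00 proddb SCSI transport failed: reason 'aborted':
retrying command
Oct 8 11:58:52 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
Oct 8 11:58:52 proddb SCSI transport failed: reason 'aborted':
retrying command
Oct 8 12:11:13 proddb scsi: [ID 243001 kern.warning] WARNING:
/swsp@0,2/ssd@0,0 (ssd4):
"""

if __name__ == "__main__":
    for (path, inst, reason), n in sorted(count_transport_failures(SAMPLE).items()):
        print(f"{inst} {path} reason={reason}: {n}")
```

In practice the text would be read from /var/adm/messages rather than from an embedded sample.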

    Disks c7t0d0 and c7t0d1 are hanging. The c6 disks perform beautifully.

    Switch logs and EVA logs show nothing.

    No other error messages except those shown above.

    Mounting the disk read-only and putting heavy I/O on it reproduces the
    problem.

    Also, iostat shows the disk as 100% busy, with no I/O passing through. The
    hsx device (the current path) shows the same hung state:
    "9 9 17 66
                        extended device statistics
        r/s  w/s  kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
        0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 hsx1
        ....
        0.0  0.0   0.0   0.0  0.0  1.0    0.0    0.0   0 100 hsx813
        .....
        0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
        0.0  0.8   0.0   0.4  0.0  0.0    0.0   13.9   0   1 c0t1d0
        0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 c0t6d0
        0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 c6t0d0
        0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 c6t0d1
        0.0  4.2   0.0  18.6  0.0  0.0    0.0    0.4   0   0 c6t0d2
        0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 c6t0d3
        0.0  0.0   0.0   0.0  0.0  1.0    0.0    0.0   0 100 c7t0d0
        0.0  0.0 ...
    "

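The hung state in the iostat listing has a recognisable signature: a command is outstanding (actv above zero) and %b is 100, yet no reads or writes complete. A minimal Python sketch of that check follows; it is not from the original post, the function name is illustrative, and it assumes the eleven-column extended-statistics layout shown above.

```python
def find_hung_devices(iostat_text):
    """Return devices that are 100% busy while transferring no data."""
    hung = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # title, elided or truncated line
        try:
            values = [float(f) for f in fields[:10]]
        except ValueError:
            continue  # column-heading line (r/s w/s ...)
        r_s, w_s, kr_s, kw_s, _wait, actv, _wsvc, _asvc, _pct_w, pct_b = values
        no_throughput = r_s == 0 and w_s == 0 and kr_s == 0 and kw_s == 0
        if pct_b >= 100 and actv > 0 and no_throughput:
            hung.append(fields[10])
    return hung


# Sample rows copied from the listing above.
SAMPLE = """\
                    extended device statistics
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hsx1
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 hsx813
    0.0 0.8 0.0 0.4 0.0 0.0 0.0 13.9 0 1 c0t1d0
    0.0 4.2 0.0 18.6 0.0 0.0 0.0 0.4 0 0 c6t0d2
    0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c7t0d0
    0.0 0.0 ...
"""

if __name__ == "__main__":
    print(find_hung_devices(SAMPLE))
```

A single sample can be misleading, so a device is better treated as hung only if it shows this pattern over several consecutive intervals.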
    Below are the lengthy config files as installed by the install script.

    Promise a summary.

    Thx

    E Schmidt
    ==========

    "spmgr" display shows the following config:
    # spmgr display
      Server: acproddb10 Report Created: Fri, Oct 08 16:34:46 2004
      Command: spmgr display
      = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
      Storage: 5000-1FE1-5002-81C0
      Load Balance: Off Auto-restore: Off
      Path Verify: On Verify Interval: 30
      HBAs: qla2200-0 qla2200-2
      Controller: P5849D5AAPW01O, Operational
                   P5849D5AAPW038, Operational
      Devices: c6t0d0 c6t0d1 c6t0d2 c6t0d3

      TGT/LUN Device WWLUN_ID #_Paths
        0/ 0 c6t0d0 6005-08B4-0001-3879-0000-D000-0150-0000 4

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPW01O no
                          hsx-1-37-1 qla2200-0 no Active
                          hsx-3655-36-1 qla2200-2 no Available

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPW038 no
                          hsx-204-38-1 qla2200-0 no Standby
                          hsx-3858-39-1 qla2200-2 no Standby

      TGT/LUN Device WWLUN_ID #_Paths
        0/ 1 c6t0d1 6005-08B4-0001-3879-0000-D000-0153-0000 4

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPW01O no
                          hsx-2-37-2 qla2200-0 no Standby
                          hsx-3656-36-2 qla2200-2 no Standby

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPW038 no
                          hsx-205-38-2 qla2200-0 no Active
                          hsx-3859-39-2 qla2200-2 no Available

      TGT/LUN Device WWLUN_ID #_Paths
        0/ 2 c6t0d2 6005-08B4-0001-3879-0000-D000-0156-0000 4

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPW01O no
                          hsx-3-37-3 qla2200-0 no Active
                          hsx-3657-36-3 qla2200-2 no Available

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPW038 no
                          hsx-206-38-3 qla2200-0 no Standby
                          hsx-3860-39-3 qla2200-2 no Standby

      TGT/LUN Device WWLUN_ID #_Paths
        0/ 3 c6t0d3 6005-08B4-0001-3879-0000-D000-0164-0000 4

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPW01O no
                          hsx-4-37-4 qla2200-0 no Standby
                          hsx-3658-36-4 qla2200-2 no Standby

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPW038 no
                          hsx-207-38-4 qla2200-0 no Active
                          hsx-3861-39-4 qla2200-2 no Available

      Storage: 5000-1FE1-5002-2510
      Load Balance: Off Auto-restore: Off
      Path Verify: On Verify Interval: 30
      HBAs: qla2200-0 qla2200-2
      Controller: P5849D5AAPC09X, Operational
                   P5849D5AAPC09E, Operational
      Devices: c7t0d0 c7t0d1 c7t0d2 c7t0d3

      TGT/LUN Device WWLUN_ID #_Paths
        0/ 0 c7t0d0 6005-08B4-0001-24D1-0000-A000-0193-0000 4

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPC09X no
                          hsx-813-33-1 qla2200-0 no Standby
                          hsx-4467-32-1 qla2200-2 no Standby

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPC09E YES
                          hsx-1016-34-1 qla2200-0 no Active
                          hsx-4670-35-1 qla2200-2 no Available

      TGT/LUN Device WWLUN_ID #_Paths
        0/ 1 c7t0d1 6005-08B4-0001-24D1-0000-A000-0196-0000 4

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPC09X no
                          hsx-814-33-2 qla2200-0 no Active
                          hsx-4468-32-2 qla2200-2 no Available

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPC09E no
                          hsx-1017-34-2 qla2200-0 no Standby
                          hsx-4671-35-2 qla2200-2 no Standby

      TGT/LUN Device WWLUN_ID #_Paths
        0/ 2 c7t0d2 6005-08B4-0001-24D1-0000-A000-0199-0000 4

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPC09X no
                          hsx-815-33-3 qla2200-0 no Standby
                          hsx-4469-32-3 qla2200-2 no Standby

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPC09E YES
                          hsx-1018-34-3 qla2200-0 no Active
                          hsx-4672-35-3 qla2200-2 no Available

      TGT/LUN Device WWLUN_ID #_Paths
        0/ 3 c7t0d3 6005-08B4-0001-24D1-0000-A000-01A7-0000 4

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPC09X no
                          hsx-816-33-4 qla2200-0 no Active
                          hsx-4470-32-4 qla2200-2 no Available

              Controller Path_Instance HBA Preferred? Path_Status
              P5849D5AAPC09E no
                          hsx-1019-34-4 qla2200-0 no Standby
                          hsx-4673-35-4 qla2200-2 no Standby
    ======== END OF OUTPUT ============
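The iostat listing names a hung device hsx813, and the report above lists hsx-813-33-1 as one of the paths to c7t0d0. Assuming (this is an inference from those two listings, not something Secure Path documentation is quoted on here) that the first number in a path instance name is the hsx driver instance that iostat reports, the report can be turned into a lookup table. A minimal Python sketch with illustrative names:

```python
import re

# A LUN line: device name followed by its 8-group WWLUN ID.
LUN_RE = r"(?P<dev>c\d+t\d+d\d+)\s+(?P<wwlun>[0-9A-F]{4}(?:-[0-9A-F]{4}){7})"
# A path line: instance name, HBA, "preferred" flag, status.
PATH_RE = (
    r"hsx-(?P<inst>\d+)-(?P<tgt>\d+)-(?P<idx>\d+)"
    r"\s+(?P<hba>\S+)\s+(?P<pref>\S+)\s+(?P<status>\S+)"
)
TOKEN_RE = re.compile(LUN_RE + "|" + PATH_RE)


def map_hsx_instances(spmgr_text):
    """Return {hsx instance number: (device, HBA, path status)}."""
    # Collapse whitespace so a status wrapped onto the next line still matches.
    flat = " ".join(spmgr_text.split())
    mapping = {}
    device = None  # device named by the most recent LUN line
    for m in TOKEN_RE.finditer(flat):
        if m.group("dev"):
            device = m.group("dev")
        elif device is not None:
            mapping[int(m.group("inst"))] = (device, m.group("hba"), m.group("status"))
    return mapping


def device_for_iostat_name(name, mapping):
    """Look up an iostat name such as 'hsx813'; None if it is not an hsx name."""
    m = re.fullmatch(r"hsx(\d+)", name)
    return mapping.get(int(m.group(1))) if m else None


# One LUN block copied from the report above, in its mail-wrapped form.
SAMPLE = """\
  TGT/LUN Device WWLUN_ID
#_Paths
    0/ 0 c7t0d0 6005-08B4-0001-24D1-0000-A000-0193-0000 4
          P5849D5AAPC09X no
                      hsx-813-33-1 qla2200-0 no
Standby
                      hsx-4467-32-1 qla2200-2 no
Standby
          P5849D5AAPC09E YES
                      hsx-1016-34-1 qla2200-0 no Active
                      hsx-4670-35-1 qla2200-2 no
Available
"""

if __name__ == "__main__":
    print(device_for_iostat_name("hsx813", map_hsx_instances(SAMPLE)))
```

The status reflects the moment the report was created (16:34 here), so it need not match the path that was current when the iostat sample was taken.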

    Entries in /etc/system:
    * Start of CPQhsv edits. DO NOT DELETE THIS LINE
    forceload: drv/clone
    set maxphys=8388608
    set sd:sd_max_throttle=32
    set sd:sd_io_time=180
    * End of CPQhsv edits. DO NOT DELETE THIS LINE
    * Start of HPfcraid edits. DO NOT DELETE THIS LINE
    forceload: drv/clone
    forceload: drv/ssd
    set maxphys=8388608
    set sd:sd_max_throttle=32
    set sd:sd_io_time=180
    set ssd:ssd_max_throttle=32
    set ssd:ssd_io_time=180
    * End of HPfcraid edits. DO NOT DELETE THIS LINE

    set shmsys:shminfo_shmmax=4194304000
    ------- EOF ---------------
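Both install scripts (CPQhsv and HPfcraid) wrote the same tunables, so several "set" lines appear twice. Here the values agree, but it is worth checking mechanically that overlapping edits do not disagree. A minimal Python sketch, not from the original post, with illustrative function names:

```python
import re
from collections import defaultdict

# Matches lines such as "set sd:sd_io_time=180" or "set maxphys=8388608".
SET_RE = re.compile(r"^\s*set\s+([\w:]+)\s*=\s*(\S+)")


def collect_settings(text):
    """Return {tunable: [values in file order]} for every 'set' line."""
    settings = defaultdict(list)
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # '*' marks a comment line, as in the listing above
        m = SET_RE.match(line)
        if m:
            settings[m.group(1)].append(m.group(2))
    return settings


def find_repeats(settings):
    """Split repeated tunables into all repeats and those with differing values."""
    repeated = {k: v for k, v in settings.items() if len(v) > 1}
    conflicting = {k: v for k, v in repeated.items() if len(set(v)) > 1}
    return repeated, conflicting


# The entries quoted above.
SAMPLE = """\
* Start of CPQhsv edits. DO NOT DELETE THIS LINE
forceload: drv/clone
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
* End of CPQhsv edits. DO NOT DELETE THIS LINE
* Start of HPfcraid edits. DO NOT DELETE THIS LINE
forceload: drv/clone
forceload: drv/ssd
set maxphys=8388608
set sd:sd_max_throttle=32
set sd:sd_io_time=180
set ssd:ssd_max_throttle=32
set ssd:ssd_io_time=180
* End of HPfcraid edits. DO NOT DELETE THIS LINE

set shmsys:shminfo_shmmax=4194304000
"""

if __name__ == "__main__":
    repeated, conflicting = find_repeats(collect_settings(SAMPLE))
    print("repeated:", sorted(repeated))
    print("conflicting:", sorted(conflicting))
```

For the entries above this reports three repeated tunables and no conflicting ones.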

    Entries in /kernel/drv/ssd.conf:
    #
    # Copyright (c) 1995-1999 by Sun Microsystems, Inc.
    # All rights reserved.
    #
    #ident "@(#)ssd.conf 1.9 99/07/29 SMI"

    name="ssd" parent="SUNW,pln" port=0 target=0;
    ....
    name="ssd" parent="SUNW,pln" port=0 target=15;
    name="ssd" parent="SUNW,pln" port=1 target=0;
    name="ssd" parent="SUNW,pln" port=1 target=1;
    .....
       ditto port=1 to port=5, with target=0 thru target=15
    .....
    name="ssd" parent="SUNW,pln" port=5 target=15;
    name="ssd" parent="sf" target=0;
    name="ssd" parent="fp" target=0;
    name="ssd" parent="ifp" target=127;
    name="ssd" parent="scsi_vhci" target=0;
    ---EOF --------------
    /kernel/drv/hsx.conf:
    #
    # Compaq StorageWorks Secure Path
    # hsx.conf - Hardware Configuration file for hsx, a Disk Array Block
    # SCSI Target driver. Refer to the driver.conf(4) manpage
    # for more information on the syntax of this file.
    #
    # name "hsx" - required
    # class "scsi" - required
    # target SCSI target-ID
    # lun SCSI logical unit number
    # qdepth depth of command queue (1,..,64)
    # parent restrict parent HBA
    # preferred this path is preferred for a controller when load
    # balancing is disabled
    #
    # If no "parent=" qualifier is present, all SCSI-HBA adapters in
    # the system will attempt to attach an HSX instance at the indicated
    # target/lun on the SCSI bus.
    #
    # HSX will only attach device instances for Compaq StorageWorks HSx80
    # disk array targets. The SD device will also want to claim these
    # targets. Explicit use of "parent=" in sd.conf may be required to
    # resolve conflicts.
    #
    # Each HSX instance found will result in a path being provided via
    # the misc/path driver.
    name="hsx" parent="qla2200" target=37 lun=0 qdepth=32;
    name="hsx" parent="qla2200" target=37 lun=1 qdepth=32;
    name="hsx" parent="qla2200" target=37 lun=2 qdepth=32;
    name="hsx" parent="qla2200" target=37 lun=3 qdepth=32;
    name="hsx" parent="qla2200" target=37 lun=4 qdepth=32;
    name="hsx" parent="qla2200" target=37 lun=5 qdepth=32;
    .... etc,
    for targets 32 to 39 (although not in sequence), lun=0 thru 202
    ============= EOF

    Contents of /kernel/drv/qla2300.conf

    # Number of times to retry a SCSI queue full error.
    # Range: 0 - 255
    hba0-queue-full-retry-count=16;

    # Amount of time to delay after a SCSI queue full error before
    # starting any new I/O commands.
    # Range: 0 - 255 seconds
    hba0-queue-full-retry-delay=2;

    # Maximum fibre channel frame size.
    # Range: 512, 1024 or 2048 bytes
    hba0-max-frame-length=1024;

    # Maximum number of commands queued on each logical unit.
    # Range: 1 - 65535
    hba0-execution-throttle=16;

    # Number of port login retry attempts.
    # Range: 0 - 255
    hba0-login-retry-count=8;

    # Enable/disable the use adapter hard loop ID address on the fibre
    # channel bus.
    # 0 = disable, 1 = enabled
    hba0-enable-adapter-hard-loop-ID=0;

    # Adapter hard loop ID address to use on the fibre channel bus.
    # Range: 0 - 125
    hba0-adapter-hard-loop-ID=0;

    # Enable/disable the use LIP reset for loop reset.
    # 0 = disable, 1 = enabled
    hba0-enable-LIP-reset=0;

    # Enable/disable the use LIP full login for loop reset.
    # 0 = disable, 1 = enabled
    hba0-enable-LIP-full-login=1;

    # Enable/disable the use of target reset for loop reset.
    # 0 = disable, 1 = enabled
    hba0-enable-target-reset=0;

    # Amount of time to delay after a loop reset for starting any new
    # I/O commands.
    # Range: 0 - 255 seconds
    hba0-reset-delay=5;

    # Number of times to retry a port that is not responding.
    # Range: 0 - 255
    hba0-port-down-retry-count=90;

    # Maximum number of LUNs to scan for, if a device does not
    # support SCSI Report LUNs command.
    # Range: 1 - 256
    hba0-maximum-luns-per-target=8;

    # Connection options.
    # 0 = loop only
    # 1 = point-to-point only
    # 2 = loop preferred, otherwise point-to-point
    # 3 = point-to-point preferred, otherwise loop
    hba0-connection-options=1;

    # Fibre Channel tape support enable/disable.
    # 0 = disable, 1 = enabled
    hba0-fc-tape=1;

    # PCI latency timer.
    # Range: 0 - 0xF8
    # Default: 0x40
    hba0-pci-latency-timer=0x40;

    # During link down conditions enable/disable the reporting of
    # errors.
    # 0 = disabled, 1 = enable
    hba0-link-down-error=1;

    # Amount of time to wait for loop to come up after it has gone down
    # before reporting I/O errors.
    # Range: 0 - 240 seconds
    hba0-link-down-timeout=10;

    # Persistent binding only option.
    # 0 = Reports to OS discovery of binded and non-binded devices
    # 1 = Reports to OS discovery of persistent binded devices only
    hba0-persistent-binding-configuration=1;

    # Fast error reporting to Solaris, enabled/disabled.
    # 0 = disabled, 1 = enable
    hba0-fast-error-reporting=0;

    # Enable extended logging.
    # 0 = disabled, 1 = enable
    hba0-extended-logging=0;

    #####################################################################
    # WARNING: Beginning of Configuration Data stored by the QLogic #
    # Applications. Consult documentation before editing #
    # any data passed this text. #
    #####################################################################

    # CPQ installation changes made.

    # CPQswsp: start of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.

    hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
    hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
    hba0-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
    hba2-SCSI-target-id-38-fibre-channel-port-name="50001FE1500281CC";
    hba0-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
    hba2-SCSI-target-id-36-fibre-channel-port-name="50001FE1500281C8";
    hba0-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
    hba2-SCSI-target-id-39-fibre-channel-port-name="50001FE1500281CD";
    hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
    hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
    hba0-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
    hba2-SCSI-target-id-34-fibre-channel-port-name="50001FE15002251C";
    hba0-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
    hba2-SCSI-target-id-32-fibre-channel-port-name="50001FE150022518";
    hba0-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";
    hba2-SCSI-target-id-35-fibre-channel-port-name="50001FE15002251D";

    # CPQswsp: end of Secure Path edits. Caution: do not remove! This line is used by pkgadd/pkgrm.
    =========== EOF =====================
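The persistent-binding entries come in pairs: each SCSI target ID is bound to the same controller port name on both hba0 and hba2. A quick consistency check can confirm that no target is bound on only one HBA or to different port names. A minimal Python sketch, not from the original post; the function names and the default HBA numbers (0 and 2, taken from the listing) are illustrative:

```python
import re
from collections import defaultdict

BIND_RE = re.compile(
    r'^\s*hba(\d+)-SCSI-target-id-(\d+)-fibre-channel-port-name="([0-9A-Fa-f]{16})";'
)


def parse_bindings(text):
    """Return {target id: {hba number: port name}}."""
    bindings = defaultdict(dict)
    for line in text.splitlines():
        m = BIND_RE.match(line)
        if m:
            hba, target, wwpn = int(m.group(1)), int(m.group(2)), m.group(3).upper()
            bindings[target][hba] = wwpn
    return bindings


def check_bindings(bindings, hbas=(0, 2)):
    """Return a list of human-readable problems; empty if all is consistent."""
    problems = []
    for target, by_hba in sorted(bindings.items()):
        missing = [h for h in hbas if h not in by_hba]
        if missing:
            problems.append(f"target {target}: no binding on hba {missing}")
        if len(set(by_hba.values())) > 1:
            problems.append(f"target {target}: port names differ between HBAs")
    return problems


# Four of the entries quoted above.
SAMPLE = """\
hba0-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba2-SCSI-target-id-37-fibre-channel-port-name="50001FE1500281C9";
hba0-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
hba2-SCSI-target-id-33-fibre-channel-port-name="50001FE150022519";
"""

if __name__ == "__main__":
    print(check_bindings(parse_bindings(SAMPLE)) or "bindings consistent")
```

A clean result only shows the file is internally consistent; it says nothing about the health of the fabric behind those ports.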
    /kernel/drv/swsp.conf
    # Compaq StorageWorks Secure Path
    # swsp.conf - Configuration file for swsp
    #
    # use swsp.conf to configure which arrays can be controlled by Secure Path
    # add one entry of the following form per array:
    # name="swsp" class="root" portid=0 reg=0x0,0x(instance+1),0x1
    # instance=(instance #) array-name="ARRAY_WWID";
    #
    # configurable parameters can be set globally, or on an array basis by
    # adding one of path-verify, path-verify-period load-balance or auto-restore
    # to the line defining the array instance, or on a line by itself (for global)
    #
    # path-verify=?
    # 1= path-verification enabled
    # 0= path-verification disabled
    # path-verify-period=X
    # X = number of seconds between path verification attempts
    #
    # load-balance=?
    # 1= enabled
    # 0= disabled
    #
    # auto-restore=?
    # 1= enabled
    # 0= disabled
    #
    path-verify=1;
    name="swsp" class="root" portid=0 reg=0x0,0x1,0x1 instance=0
    array-name="5000-1FE1-5002-81C0";
    wwlid-0-0="6005-08B4-0001-3879-0000-D000-0150-0000@0,0";
    wwlid-0-1="6005-08B4-0001-3879-0000-D000-0153-0000@0,1";
    wwlid-0-2="6005-08B4-0001-3879-0000-D000-0156-0000@0,2";
    wwlid-0-3="6005-08B4-0001-3879-0000-D000-0164-0000@0,3";
    name="swsp" class="root" portid=0 reg=0x0,0x2,0x1 instance=1
    array-name="5000-1FE1-5002-2510";
    wwlid-1-0="6005-08B4-0001-24D1-0000-A000-0193-0000@0,0";
    wwlid-1-1="6005-08B4-0001-24D1-0000-A000-0196-0000@0,1";
    wwlid-1-2="6005-08B4-0001-24D1-0000-A000-0199-0000@0,2";
    wwlid-1-3="6005-08B4-0001-24D1-0000-A000-01A7-0000@0,3";
    ======================== EOF ========================================
    =====================================================================
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

