Summary: IO-Wait of 99% - how to diagnose

extern.Tobias.Kronwitter_at_AUDI.DE
Date: 12/23/04

  • Next message: Filipe Litaiff: "New disk: strange behaviour"
    To: sunmanagers@sunmanagers.org
    Date: Thu, 23 Dec 2004 17:08:39 +0100
    
    

    Hello all,

    thank you for the overwhelming help:

    Kanellopoulos Angelos
    Jeremy Ahl
    PEter (ITServ GmbH)
    Imrick Michael
    Beth Dodge
    Alvin Gunkel
    Clive McAdam
    Terry Franklin
    Murugesan K
    Mossey Fahey
    Victor Engle
    Rebstock, Roland

    The broad consensus is that we most likely have a "display problem".
    The I/O is not really high, nor is the server slow, as it would be under
    a high load.

    After a reboot, however, the iowait showed normal values again. Up to
    now (I held back this summary to be sure) we have had no more high iowaits.
    If we experience the problem again, we will install the bug fix:

            --------------------
            Tobias,
            
            Sun introduced a bug in kernel patch 108528-28. I thought the bug was
            fixed in -29 but from your stats it appears not to have been. You may
            try installing 117000-05 which seems to be the latest kernel patch.

            Here is a link to the bug description on sunsolve and a link to the new
            kernel patch,

            http://sunsolve.sun.com/search/document.do?assetkey=urn:cds:docid:1-1-4978228-1

            http://sunsolve.sun.com/pub-cgi/pdownload.pl?target=117000-05&method=h

            Vic
            ------------------

    If so, I will post a second summary.

    Thank you
    Season's Greetings to all of you

    Regards Tobias

    Hello all,

    on a Solaris8 / SUN-Fire V440 (SunOS iuaw740 5.8 Generic_108528-29 sun4u
    sparc SUNW,Sun-Fire-V440) we are experiencing a very high IO-Wait problem.
    This Server is configured with Veritas vxvm 4.0 / mp1 and has SAN-Disks
    connected via an Emulex 9002 FCA.
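
    For anyone comparing their own system against the patch revisions
    discussed in this thread: the kernel patch level is part of the uname
    string quoted above. Below is a minimal Python sketch, assuming the
    `Generic_<patch>-<revision>` form seen here. The function name
    `kernel_patch` is made up for illustration.

```python
import re

def kernel_patch(uname_line):
    """Return (patch_id, revision) from a Solaris uname string, or None.

    Assumes the kernel version field looks like 'Generic_108528-29'.
    """
    m = re.search(r"Generic_(\d+)-(\d+)", uname_line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

uname = "SunOS iuaw740 5.8 Generic_108528-29 sun4u sparc SUNW,Sun-Fire-V440"
print(kernel_patch(uname))  # (108528, 29)
```

    The result can then be compared by hand against the revision named in
    a bug report.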

    top reports the following:

    load averages: 0.02, 0.01, 0.02
    11:02:12
    82 processes: 81 sleeping, 1 on cpu
    CPU states: 0.0% idle, 0.0% user, 0.5% kernel, 99.5% iowait, 0.0% swap
    Memory: 8192M real, 6375M free, 469M swap in use, 22G swap free

       PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
      5818 root 1 58 0 2344K 1440K cpu/0 0:00 0.05% top
       813 root 6 39 0 5240K 4440K sleep 4:56 0.04% picld
     29405 root 6 58 0 4728K 2816K sleep 0:05 0.01% elxdiscoveryd
     28964 root 1 48 0 2544K 2008K sleep 0:00 0.01% bash
      5813 root 1 38 0 6248K 2728K sleep 0:00 0.01% sshd
      1444 root 12 58 0 5368K 5080K sleep 0:31 0.00% mibiisa
     17990 dctm_run 3 58 0 40M 12M sleep 0:07 0.00% documentum
      5816 dctm_run 1 38 0 1392K 1144K sleep 0:00 0.00% sar
      5817 dctm_run 1 48 0 1456K 1128K sleep 0:00 0.00% sadc
      1471 root 1 58 0 0K 0K sleep 0:59 0.00% se.sparcv9.5.8
       980 root 5 58 0 4200K 2440K sleep 0:17 0.00% automountd
     10808 dctm_run 5 58 0 39M 22M sleep 0:09 0.00% documentum
        17 root 1 58 0 12M 10M sleep 0:07 0.00% vxconfigd
      3111 dctm_run 4 58 0 5672K 3728K sleep 0:07 0.00% dmdocbroker
     28215 dctm_run 1 2 0 1896K 1440K sleep 0:06 0.00% ksh

    iostat doesn't indicate high disk I/O:

    bash-2.03# iostat 5 15
       tty            sd0           sd1           sd2           sd3           cpu
     tin tout  kps tps serv  kps tps serv  kps tps serv  kps tps serv  us sy  wt id
       8   35    0   0    0  100   4    9  100   4   10    4   1    9   1  1  25 73
     125  451    0   0    0   14   4    7   14   4    6    0   0    0   0  1  99  0
       0   16    0   0    0    0   0    5    0   0    6    0   0    0   0  1  99  0
       0   16    0   0    0    9  18    3    9  18    3    0   1    4   0  1  99  0
       0   16    0   0    0   67  26   23   61  26   26    2   1    6   0  0 100  0
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  0 100  0
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  0 100  0
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  0 100  0
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  1  99  0
       0   16    0   0    0    4   8    3    4   8    3    0   0    4   0  1  98  0
      10   37    0   0    0    2   1    5    2   1    4    0   1    5   0  0 100  0
       0   16    0   0    0    4   3   23   21   5   19    0   0    0   0  0 100  0
      35  137    0   0    0    5   2    7    5   2    5    0   0    0   0  0 100  0
     126  421    0   0    0   14   4    5   14   4    6    0   0    0   0  1  99  0
       0   16    0   0    0    0   0    0    0   0    0    0   0    0   0  1  99  0
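
    To make the mismatch in these samples easier to see, here is a small
    Python sketch that pairs the total disk throughput of each sample with
    its wt value. It assumes each sample is written on one line of 18
    numbers (tin, tout, then kps/tps/serv for four disks, then us/sy/wt/id)
    as in the output above. The helper name `summarize` is hypothetical,
    and only three of the fifteen samples are copied in.

```python
def summarize(rows):
    """For each iostat sample, return (total kB/s over all disks, wt%)."""
    out = []
    for row in rows:
        f = [int(x) for x in row.split()]
        disks, cpu = f[2:-4], f[-4:]   # drop tin/tout; last four are us sy wt id
        total_kps = sum(disks[0::3])   # every third disk field is kps
        out.append((total_kps, cpu[2]))
    return out

samples = [
    "8 35 0 0 0 100 4 9 100 4 10 4 1 9 1 1 25 73",   # first line: since boot
    "125 451 0 0 0 14 4 7 14 4 6 0 0 0 0 1 99 0",
    "0 16 0 0 0 67 26 23 61 26 26 2 1 6 0 0 100 0",
]
for total_kps, wt in summarize(samples):
    print(f"{total_kps:4d} kB/s  wt={wt}%")
```

    The first line of iostat output is the average since boot, so only the
    later samples describe the current state: wt sits at 99-100% while the
    disks move well under 1 MB/s.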

    The SAN disks are not under load either:

    iostat -xnp
                        extended device statistics
        r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
        1.5 2.7 50.7 50.5 0.0 0.0 0.0 8.8 0 2 c1t0d0
        0.2 0.0 0.1 0.0 0.0 0.0 0.0 0.1 0 0 c1t0d0s0
        0.0 0.1 0.0 0.4 0.0 0.0 0.0 6.2 0 0 c1t0d0s1
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0 0 c1t0d0s2
        1.2 2.7 50.5 50.0 0.0 0.0 0.0 9.7 0 2 c1t0d0s3
        0.2 0.0 0.2 0.0 0.0 0.0 0.0 0.9 0 0 c1t0d0s4
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0s5
        1.5 2.7 51.4 50.0 0.0 0.0 0.0 10.1 0 2 c1t1d0
        0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s0
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s1
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0 0 c1t1d0s2
        1.3 2.7 51.3 50.0 0.0 0.0 0.0 10.6 0 2 c1t1d0s3
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.2 0 0 c1t1d0s4
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s5
        0.6 0.5 2.5 1.9 0.0 0.0 0.0 9.1 0 0 c1t2d0
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0 0 c1t2d0s2
        0.4 0.5 2.3 1.9 0.0 0.0 0.0 12.1 0 0 c1t2d0s3
        0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.2 0 0 c1t2d0s4
        0.4 0.5 1.5 1.9 0.0 0.0 0.0 9.9 0 0 c1t3d0
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t3d0s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.2 0 0 c1t3d0s3
        0.2 0.5 1.4 1.9 0.0 0.0 0.0 12.8 0 0 c1t3d0s4
        0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.9 0 0 c3t30d0
        0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.9 0 0 c3t30d0s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d0s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t30d1
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t30d1s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d1s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0 0 c3t30d2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0 0 c3t30d2s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d2s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t30d3
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t30d3s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d3s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d4
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d4s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d4s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d5
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d5s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d5s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0 0 c3t30d6
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0 0 c3t30d6s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d6s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0 0 c3t30d7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0 0 c3t30d7s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d7s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d8
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t30d8s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t30d8s7
        0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t70d0
        0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t70d0s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d0s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0 0 c3t70d1
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0 0 c3t70d1s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d1s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t70d2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t70d2s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d2s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t70d3
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7 0 0 c3t70d3s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d3s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t70d4
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 c3t70d4s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d4s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d5
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d5s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d5s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d6
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d6s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d6s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d7s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d7s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d8
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0 0 c3t70d8s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t70d8s7
        0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t31d0
        0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t31d0s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d0s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d1
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d1s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d1s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d2s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d2s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d3
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d3s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d3s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d4
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d4s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d4s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d5
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d5s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d5s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d6
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d6s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d6s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d7s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d7s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d8
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d8s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t31d8s7
        0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t71d0
        0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.7 0 0 c4t71d0s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d0s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d1
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d1s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d1s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d2s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d2s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d3
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d3s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d3s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d4
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d4s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d4s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d5
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d5s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d5s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d6
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d6s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d6s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d7s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d7s7
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d8
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d8s2
        0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t71d8s7

            --> c3t30d0, c3t70d0 are the same LUN seen via two HBAs
                => this is one plex (mirror) of a volume
                c4t31d0, c4t71d0 are the same LUN seen via two HBAs
                => this is the other plex of the same volume

    It looks like the disks aren't the problem.
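
    The same conclusion can be reached without reading the long listing by
    eye. The Python sketch below picks the device with the highest %b from
    `iostat -xn` style lines, assuming the 11-column layout shown above
    (r/s ... %w %b device). The function name `busiest` is hypothetical,
    and only four lines of the listing are copied in as sample data.

```python
def busiest(lines):
    """Return (device, %b, asvc_t) for the device with the highest %b."""
    best = None
    for line in lines:
        f = line.split()
        if len(f) != 11 or f[-1] == "device":
            continue                   # skip headers and malformed lines
        dev, busy, asvc = f[10], int(f[9]), float(f[7])
        if best is None or busy > best[1]:
            best = (dev, busy, asvc)
    return best

listing = [
    "extended device statistics",
    "r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device",
    "0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0",
    "1.5 2.7 50.7 50.5 0.0 0.0 0.0 8.8 0 2 c1t0d0",
    "1.5 2.7 51.4 50.0 0.0 0.0 0.0 10.1 0 2 c1t1d0",
    "0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.9 0 0 c3t30d0",
]
print(busiest(listing))  # ('c1t0d0', 2, 8.8)
```

    Note that iostat run without an interval reports averages since boot,
    so a run with an interval (as in the earlier `iostat 5 15`) is the
    better input when checking the current state.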

    Network looks ok also:

    RAWIP
            rawipInDatagrams = 0 rawipInErrors = 0
            rawipInCksumErrs = 0 rawipOutDatagrams = 0
            rawipOutErrors = 0

    UDP
            udpInDatagrams = 33591 udpInErrors = 0
            udpOutDatagrams = 33596 udpOutErrors = 0

    TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
            tcpRtoMax = 60000 tcpMaxConn = -1
            tcpActiveOpens = 13806 tcpPassiveOpens = 14845
            tcpAttemptFails = 9 tcpEstabResets = 749
            tcpCurrEstab = 15 tcpOutSegs =13312404
            tcpOutDataSegs =11237898 tcpOutDataBytes =63660618
            tcpRetransSegs = 422 tcpRetransBytes =336018
            tcpOutAck =2067205 tcpOutAckDelayed =1853406
            tcpOutUrg = 0 tcpOutWinUpdate = 15
            tcpOutWinProbe = 13 tcpOutControl = 58063
            tcpOutRsts = 1520 tcpOutFastRetrans = 85
            tcpInSegs =11835531
            tcpInAckSegs =10236185 tcpInAckBytes =63671992
            tcpInDupAck = 39885 tcpInAckUnsent = 0
            tcpInInorderSegs =9700449 tcpInInorderBytes =1566751826
            tcpInUnorderSegs = 1 tcpInUnorderBytes = 551
            tcpInDupSegs = 64 tcpInDupBytes = 4171
            tcpInPartDupSegs = 0 tcpInPartDupBytes = 0
            tcpInPastWinSegs = 0 tcpInPastWinBytes = 0
            tcpInWinProbe = 0 tcpInWinUpdate = 3
            tcpInClosed = 184 tcpRttNoUpdate = 347
            tcpRttUpdate =10222115 tcpTimRetrans = 1649
            tcpTimRetransDrop = 5 tcpTimKeepalive = 181
            tcpTimKeepaliveProbe= 16 tcpTimKeepaliveDrop = 1
            tcpListenDrop = 0 tcpListenDropQ0 = 0
            tcpHalfOpenDrop = 0 tcpOutSackRetrans = 0

    IPv4 ipForwarding = 2 ipDefaultTTL = 255
            ipInReceives =11593385 ipInHdrErrors = 0
            ipInAddrErrors = 0 ipInCksumErrs = 0
            ipForwDatagrams = 0 ipForwProhibits = 0
            ipInUnknownProtos = 0 ipInDiscards = 0
            ipInDelivers =11849241 ipOutRequests =13132669
            ipOutDiscards = 0 ipOutNoRoutes = 3
            ipReasmTimeout = 60 ipReasmReqds = 0
            ipReasmOKs = 0 ipReasmFails = 0
            ipReasmDuplicates = 0 ipReasmPartDups = 0
            ipFragOKs = 0 ipFragFails = 0
            ipFragCreates = 0 ipRoutingDiscards = 0
            tcpInErrs = 0 udpNoPorts = 4188
            udpInCksumErrs = 0 udpInOverflows = 0
            rawipInOverflows = 0 ipsecInSucceeded = 0
            ipsecInFailed = 0 ipInIPv6 = 0
            ipOutIPv6 = 0 ipOutSwitchIPv6 = 169
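
    One way to put a number on "network looks ok" is the share of
    retransmitted TCP segments. The Python sketch below reads
    `name = value` pairs in the style of the counters above and divides
    tcpRetransSegs by tcpOutDataSegs. The helper name `parse_counters` is
    made up for illustration, and what counts as an acceptable ratio is a
    judgment call, not something stated in this thread.

```python
import re

def parse_counters(text):
    """Collect 'name = value' pairs; spacing around '=' may vary."""
    return {k: int(v) for k, v in re.findall(r"(\w+)\s*=\s*(-?\d+)", text)}

sample = """
        tcpOutDataSegs =11237898 tcpOutDataBytes =63660618
        tcpRetransSegs = 422 tcpRetransBytes =336018
"""
c = parse_counters(sample)
ratio = c["tcpRetransSegs"] / c["tcpOutDataSegs"]
print(f"retransmitted segments: {ratio:.4%}")
```

    With the counters shown here, the ratio is far below one percent.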

    What else could be the reason?
    ==============================

    How could we diagnose this problem?
    ===================================

    Thank you for your help.
    Tobias
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

