DCPID - Sample Script - Cluster hangs for over 300 Seconds

David.Knight_at_clubcorp.com
Date: 05/12/04

    Date: Wed, 12 May 2004 08:49:25 -0500
    To: tru64-unix-managers@ornl.gov
    
    

    Managers et al,

            I have been looking into DCPID to profile my kernel as part of my
    troubleshooting efforts. If anyone has information on the DCPI
    ( http://h30097.www3.hp.com/dcpi/ ) commands, I would greatly appreciate
    any help with running this utility correctly.

    Now to the problem:
            I have a three-member cluster running TruCluster 5.1B with every
    patch kit (3) and early release patch installed. The cluster (or sometimes
    two members, or even just one) will hang for anywhere from 100 to 300
    seconds. During this period everything stops responding; I can't even run
    a `pwd` command. I have plenty of collect data from when the problem
    happens, but during the hang even collect stops collecting information, so
    there is a gap in the logs. I get the messages below from EVM. Some apps
    crash and report timeout errors, but Oracle seems to make it through the
    hard times. If anyone out there has experienced this or something close
    to it, your advice would be greatly appreciated.
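
    One way to put numbers on the gap in the logs is to scan the sample
    timestamps for holes longer than the expected interval. The following is
    only a rough sketch in Python, meant to be run on any machine that holds a
    copy of the data. The function name find_gaps, the 100-second threshold,
    and the 10-second sample interval in the demo are assumptions made for
    illustration; they are not part of collect or of any TruCluster tool, and
    extracting the timestamps from the collect output is left out.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap_seconds=100):
    """Return (start, end, seconds) for every pair of consecutive
    samples that are more than max_gap_seconds apart."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = (cur - prev).total_seconds()
        if delta > max_gap_seconds:
            gaps.append((prev, cur, delta))
    return gaps


if __name__ == "__main__":
    # Hypothetical data: samples every 10 seconds with one 300-second hole.
    base = datetime(2004, 5, 7, 17, 25, 0)
    samples = [base + timedelta(seconds=10 * i) for i in range(6)]
    resume = samples[-1] + timedelta(seconds=300)
    samples += [resume + timedelta(seconds=10 * i) for i in range(3)]

    for start, end, secs in find_gaps(samples):
        print(f"{start} -> {end}: {secs:.0f} s without samples")
```

    Comparing the hole boundaries from each member against the EVM event
    timestamps should show whether the hang hits all members at once.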

    Thanks in advance,
    David Knight

    ============================ Syslog event ============================
    EVM event name: sys.unix.syslog.daemon

        Syslog daemon events are posted by system daemons to alert the
        administrator to an unusual condition. The user name field usually
        indicates which daemon posted the event. The text of the message
        indicates the reason for the event.

    ======================================================================

    Formatted Message:
        CAAD[1049179]: RTD #0: Action Script
        /var/cluster/caa/script/cluster_lockd.scr(check) timed out!
        (timeout=60)

    Event Data Items:
        Event Name : sys.unix.syslog.daemon
        Priority : 600
        PID : 1049080
        PPID : 1048577
        Event Id : 5547
        Member Id : 2
        Timestamp : 05-Jan-2004 08:14:40
        Host IP address : 10.10.5.140
        Cluster IP address: 10.10.5.151
        Host Name : dalunix140.clubcorp.com
        Cluster Name : dalunixcl
        User Name : root
        Format : CAAD[1049179]: RTD #0: Action Script
                            /var/cluster/caa/script/cluster_lockd.scr(check)
                            timed out! (timeout=60)
        Reference : cat:evmexp.cat:200

    Variable Items:
        None

    ======================================================================

    ============================ Syslog event ============================
    EVM event name: sys.unix.syslog.daemon

        Syslog daemon events are posted by system daemons to alert the
        administrator to an unusual condition. The user name field usually
        indicates which daemon posted the event. The text of the message
        indicates the reason for the event.

    ======================================================================

    Formatted Message:
        CAAD[1573780]: RTD #0: Action Script
        /var/cluster/caa/script/swcc.scr(check) timed out! (timeout=60)

    Event Data Items:
        Event Name : sys.unix.syslog.daemon
        Priority : 600
        PID : 1573526
        PPID : 1572865
        Event Id : 14788
        Member Id : 3
        Timestamp : 07-May-2004 17:30:17
        Host IP address : 10.10.5.170
        Cluster IP address: 10.10.5.151
        Host Name : dalunix170.clubcorp.com
        Cluster Name : dalunixcl
        User Name : root
        Format : CAAD[1573780]: RTD #0: Action Script
                            /var/cluster/caa/script/swcc.scr(check) timed out!
                            (timeout=60)
        Reference : cat:evmexp.cat:200

    Variable Items:
        None

    ======================================================================
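
    With many events like the two above, it can help to reduce each one to the
    script that timed out, the member, and the time. The following is a
    minimal sketch in Python that works on event text saved to a file in the
    layout shown above. The names TIMEOUT_RE and parse_timeouts are
    hypothetical, the regular expressions are written only against the two
    sample events, and the timestamp parsing assumes English month
    abbreviations.

```python
import re
from datetime import datetime

# Matches the CAA action-script timeout message, tolerating line wraps.
TIMEOUT_RE = re.compile(
    r"CAAD\[(\d+)\]: RTD #\d+: Action Script\s+"
    r"(\S+)\((\w+)\)\s+timed\s+out!\s+\(timeout=(\d+)\)"
)


def _field(name, chunk):
    """Return the first token after 'name :' in an event block, or None."""
    m = re.search(name + r"\s*:\s*(.+)", chunk)
    return m.group(1).strip() if m else None


def parse_timeouts(text):
    """Extract one dict per 'Syslog event' block that reports a timeout."""
    events = []
    # Each event starts with a '==== Syslog event ====' banner line.
    for chunk in re.split(r"=+ Syslog event =+", text)[1:]:
        m = TIMEOUT_RE.search(chunk)
        if not m:
            continue
        stamp = _field("Timestamp", chunk)
        member = _field("Member Id", chunk)
        events.append({
            "script": m.group(2),
            "action": m.group(3),
            "timeout": int(m.group(4)),
            "member": int(member) if member else None,
            "host": _field("Host Name", chunk),
            "time": datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S")
                    if stamp else None,
        })
    return events


if __name__ == "__main__":
    import sys
    # Usage: python script.py saved_events.txt
    with open(sys.argv[1]) as fh:
        for ev in parse_timeouts(fh.read()):
            print(ev["time"], ev["host"], ev["script"], ev["timeout"])
```

    Sorting the result by time makes it easier to see whether different
    action scripts time out together on the same member.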

