SUMMARY: Removing a cluster alias changed default route?

From: Reed, Judith (jreed_at_navisite.com)
Date: 09/25/03

  • Next message: David Knight: "cluster_lockd.scr(check) timed out - error polling `cluster_lockd` -"
    Date: Thu, 25 Sep 2003 08:13:10 -0400
    To: "Reed, Judith" <jreed@navisite.com>, tru64-unix-managers@ornl.gov
    
    

    I received one response to this - thanks to Martin Ronde Anderson, who
    suggested that we make sure we have the latest patch kit, because there
    have been several corrections in the cluster alias area over time. I
    also found mentions of patches for cluster alias problems in a few
    web references.

    In the end, however, our problem was due to a brief lapse in network
    connectivity, according to HP phone support. They said that if the NIC
    carrying a cluster node's default route goes down for any reason, the
    system may pick up the cluster interconnect as its default route so it
    can maintain connectivity at all costs - this appears to be what this
    system was doing. This bogus default route cannot be removed
    without:
            * stopping gated (/sbin/init.d/gateway stop)
            * stopping aliasd (/sbin/init.d/clu_alias stop)
            * removing default route pointing to cluster interconnect
            * making sure correct default route is established
            * starting aliasd (/sbin/init.d/clu_alias start) which also
                    starts gated
    This worked for us.
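
    For anyone who wants the sequence above in one place, here is a rough
    sketch as a shell script. It is not something HP supplied. The two
    gateway addresses are hypothetical placeholders, and the "route delete
    default" / "route add default" syntax is an assumption that should be
    checked against route(8) on your version. By default it only prints
    what it would do; set DRY_RUN=0 to actually run the commands as root.

```shell
#!/bin/sh
# Sketch of the recovery sequence described above. Dry run by default:
# commands are only printed. Set DRY_RUN=0 to execute them (as root).
# BAD_GW and GOOD_GW are hypothetical placeholders, not values from the post.
DRY_RUN=${DRY_RUN:-1}
BAD_GW=${BAD_GW:-10.0.0.1}      # assumed: gateway on the cluster interconnect
GOOD_GW=${GOOD_GW:-192.0.2.1}   # assumed: the site's real default gateway

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Steps are chained with && so a failure stops the sequence.
fix_default_route() {
    run /sbin/init.d/gateway stop &&        # stop gated
    run /sbin/init.d/clu_alias stop &&      # stop aliasd
    run route delete default "$BAD_GW" &&   # remove the bogus default route
    run route add default "$GOOD_GW" &&     # establish the correct default route
    run netstat -rn &&                      # inspect the routing table
    run /sbin/init.d/clu_alias start        # start aliasd, which also starts gated
}

fix_default_route
```

    Run it once in dry-run mode and compare the printed commands with
    "netstat -rn" output on the affected node before setting DRY_RUN=0.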

    I also learned that while you can remove a cluster alias via sysman
    while the cluster is up, and you can/should tell each node to leave the
    alias with the appropriate cluamgr command, the alias does not actually
    go away (as evidenced by the output of "cluamgr -s all") until the
    nodes reboot.

    Regards,

    Judith Reed
    jreed@navisite.com
    "They that can give up essential liberty to obtain a little temporary
    safety deserve neither liberty nor safety."
    Benjamin Franklin

