TruCluster cluster_lockd errors

From: Paul Reilly (pareilly_at_tcd.ie)
Date: 05/28/03

    Date: Wed, 28 May 2003 21:21:36 +0100 (IST)
    To: tru64-unix-managers@ornl.gov
    
    

    Can anyone shed any light on this:

    Two-node Tru64 UNIX TruCluster running 5.1A (no patches) with a
    Memory Channel interconnect. Since we added some disks to the attached
    SAN and configured them into one home file domain with 26 filesets
    (a-z), the cluster has been very unstable. One or both members
    report CAA "cluster_lockd.scr timed out! (timout=60)" errors on the
    console. The machines become completely unresponsive, and the only way to
    get things back to normal is to shut down each member of the cluster and
    restart it. Things are then OK for somewhere between 30 minutes and 5 days,
    after which the problem recurs. Another message we're getting is this:

    ics_handle_get[low_mem]:th0xffffc003fd7d180

    Compaq/HP support have said the first thing to do is bring it up to
    patch kit 4, and if we still have problems, the cause could be the
    Memory Channel interconnect.

    We have scheduled downtime (non-roll patch) for Thursday night to
    bring it up to 5.1A PK4. But that is still a day away, and the machine
    is barely usable. Does anyone have any insight into what might be
    causing this behaviour? It happens even if we run the cluster with just
    one node up.

    Please reply to pareilly@alf2.tcd.ie.

    Thanks
    Paul

