problems during and after cluster dupatch from 5.1B to 5.1B-2

emanuele.lombardi_at_casaccia.enea.it
Date: 10/22/04

    Date: Fri, 22 Oct 2004 10:11:55 +0200 (CEST)
    To: tru64-unix-managers@ornl.gov
    
    

    Hi alpha gurus!

    I'm asking for your help with my two-member TruCluster 5.1B,
    which I updated to 5.1B-2 using a no-roll patch.

    The update was tricky, and at the end I noticed that the revision numbers
    printed at boot time in /var/adm/messages on both members are still the
    same as before (only the dates changed).

       before dupatch:
    vmunix: Compaq Tru64 UNIX V5.1B (Rev. 2650); Fri Nov 21 18:50:42 CET 2003
    vmunix: TruCluster Server V5.1B (Rev. 1029); 09/29/03 03:10

      after dupatch:
    vmunix: Compaq Tru64 UNIX V5.1B (Rev. 2650); Wed Oct 20 14:29:42 CEST 2004
    vmunix: TruCluster Server V5.1B (Rev. 1029); 06/16/04 03:56

    Do you think I'm running a properly patched 5.1B-2 cluster or not?

    Furthermore, at the end of the update setld behaved differently on
    the two members with regard to the newly installed kits.
    It is best to tell the full story:
     
    I did a no-roll update, starting dupatch on member1.
    Everything went fine up to and including
       the configuration of patches on member0 and member1
       and the kernel build on member1 (which also succeeded).

    Then dupatch wrote:
      Waiting for all cluster members to complete event operation...
      Members member1 member2 did not respond to the cluster-wide event to perform
           the patch operation. Please check these members.

    It then installed some other patches and finally it wrote:

      Press RETURN to proceed...
      Waiting for all cluster members to complete event operation...
      cp: //etc/evmlogger.conf.orig: No such file or directory
      Waiting for other cluster members to reboot...

    But nothing happened on either member (both kept working as usual).
    After a long wait (a couple of hours) I rebooted member2,
    created its new /vmunix using doconfig, and booted it again.
    It started working properly.

    Then, on member1, I copied the new vmunix to /vmunix and rebooted it.

    The cluster was working properly, but setld reported all the new patches
    (Patch: SP04 ...) as "not installed" on member2, while they were
    "installed" on member1.

    Thanks to a previous message on this list, I discovered that
       /usr/cluster/members/member1/.smdb.
    contained the required *.sts files ("_INSTALLED"), while in
       /usr/cluster/members/member2/.smdb.
    they were missing (and thus the links in /usr/.smdb. were broken on member2).
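    A minimal sketch of how the broken links could be listed on member2, assuming (as described above) that the entries in /usr/.smdb. are symbolic links. The function name list_broken_links is hypothetical, and it relies only on POSIX sh tests, not on any Tru64-specific tool:

```shell
#!/bin/sh
# Hypothetical helper: print every symbolic link directly inside
# directory $1 whose target does not exist. Uses only POSIX sh tests.
list_broken_links() {
    dir=$1
    for f in "$dir"/*; do
        # -h: f is a symlink; ! -e: its target cannot be resolved
        if [ -h "$f" ] && [ ! -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Example (run on member2): list_broken_links /usr/.smdb.
```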

    I'm sure this was because dupatch never ran a patch configuration
    phase on member2.

    Anyway, I copied the missing *.sts files
     from /usr/cluster/members/member1/.smdb.
     to /usr/cluster/members/member2/.smdb.
    and now setld reports the patches as installed on both members.
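    The manual copy above can be sketched as a POSIX sh function that copies only the *.sts files missing from the target directory and never overwrites files already there. The function name copy_missing_sts is hypothetical; it simply mirrors the hand fix described above and is not a documented dupatch or setld procedure:

```shell
#!/bin/sh
# Hypothetical helper: copy each *.sts file present in SRC ($1) but
# absent from DST ($2), never overwriting files that already exist.
copy_missing_sts() {
    src=$1
    dst=$2
    for f in "$src"/*.sts; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=$(basename "$f")
        if [ ! -e "$dst/$name" ]; then
            cp -p "$f" "$dst/$name"      # -p keeps mode and timestamps
            echo "copied $name"
        fi
    done
}

# Example (as root):
# copy_missing_sts /usr/cluster/members/member1/.smdb. \
#                  /usr/cluster/members/member2/.smdb.
```

    Printing each copied name leaves a record of exactly which status files were added by hand.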

    So, finally, my question for you gurus is:

    Since the cluster is working well, can I assume everything is fine now?
    Do you think the cluster will have problems related to the above?

    Thank you very much from Italy,
    Emanuele Lombardi

    -- 
    $$$ Emanuele Lombardi
    $$$ mail:  ENEA  CLIM  Casaccia
    $$$        I-00060 S.M. di Galeria (RM)  ITALY
    $$$ mailto:emanuele.lombardi@casaccia.enea.it
    $$$ tel	+39 0630483366 
    $$$ fax	+39 0630484264             |||
    $$$                                \|/  ;_;
    $$$ What does a process need        |   /"\
    $$$ to become a daemon ?            |   \v/
    $$$                                 |    | 
    $$$ - a fork                        o---/!\---
    $$$                                 |   |_|
    $$$                                 |  _/ \_
    $$$* Contrary to popular belief, UNIX is user friendly.
    $$$  It's just very particular about who it makes friends with.
    $$$* Computers are not intelligent, but they think they are. 
    $$$* True programmers never die, they just branch to an odd address
    $$$* THIS TRANSMISSION WAS MADE POSSIBLE BY 100% RECYCLED ELECTRONS
    
