ADDITIONAL SUMMARY: Problems with 5.1B PK4 causing machine to hang

From: Chad W Baker (Chad_W_Baker_at_raytheon.com)
Date: 02/11/05

    Date: Fri, 11 Feb 2005 16:05:51 -0500
    To: tru64-unix-managers@ornl.gov
    
    

    Hello,
    A little more investigation turned up a defect in Clearcase V5.0: a
    problem in kernel virtual memory can hang the system when the Clearcase
    V5.0 MVFS is running. The IBM defect number is RATLC00728925, and the fix
    is included in patch 41. After installing it, I was able to build
    software in Clearcase without the machine hanging.

    Chad

    ----- Forwarded by Chad W Baker/RES/Raytheon/US on 02/11/2005 03:59 PM -----

    From:    Chad W Baker <Chad_W_Baker@raytheon.com>
    Sent by: tru64-unix-managers-owner@ornl.gov
    Date:    02/08/2005 03:25 PM
    To:      tru64-unix-managers@ornl.gov
    cc:
    Subject: SUMMARY: Problems with 5.1B PK4 causing machine to hang

    Hello Managers,

    Here is the response I received from Dr. Thomas Blinn. It looks like the
    fix is scheduled for the next patch kit release. The original question
    follows.

    "There was a change in V5.1B PK4 (aka BL25) that can cause certain
    programs to get an "interrupt" return status with no evidence of
    an abnormal condition otherwise. There is not yet a patch that
    undoes this for PK4 in general, but it may be the root cause of
    your problem. It will be changed back to the pre-PK4 behavior
    in the next patch kit, but that will be a while yet. We have seen
    the change break some of the shell scripts on SierraCluster
    systems, and we've seen
    it break the ladebug debugger, I would not be surprised if it
    also breaks, for instance, Clearcase; it may be causing some of
    the "wait" system calls to return unexpected status for child
    processes that are exiting, and I suspect it's load dependent
    as it involves "race" conditions; it's especially likely that
    it will impact multi-threaded applications more than simple
    classic UNIX applications, and I would not be surprised if at
    least part of the Clearcase tool suite is multi-threaded."
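
A rough illustration of the wait-status point in the quote above, not something from Dr. Blinn or from Clearcase: a build tool typically decides whether a step succeeded by decoding the status that waitpid() returns for the child. The Python sketch below shows that decoding. The function name run_and_classify is hypothetical, and the code only demonstrates how a caller would tell a normal exit apart from an unexpected status; it does not reproduce the PK4 behavior.

```python
import os
import sys


def run_and_classify(argv):
    """Fork/exec argv and report how the child ended, as seen via waitpid()."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the requested command.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; 127 follows the shell convention.
            os._exit(127)

    # Parent: block until the child changes state.
    # (Python 3.5+ retries the call itself if it is interrupted by a signal.)
    _, status = os.waitpid(pid, 0)

    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    # Anything else is a status the caller did not expect.
    return ("unknown", status)


if __name__ == "__main__":
    print(run_and_classify(sys.argv[1:] or ["true"]))
```

A caller that treats anything other than ("exited", 0) as a failure would abort or retry a build step whenever the status comes back in an unexpected form, which is one way a change in wait behavior could surface as a broken build.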

    Chad

    #######################################################################

    Hello Managers,

    I have a collection of Alpha machines - ES40s, AS4100s, and XP1000s - all
    running Tru64 5.1B. Before last Friday, there was a mix of patch kit
    versions from 2 to 4. Last Friday, I upgraded our build and development
    environment from PK3 to PK4. Once I did that, we were no longer able to
    build any software: the servers and workstations got into a loop and never
    came out. From what I can tell, no specific piece of code being built was
    causing the problem (the same code built successfully on PK2 and PK3). I
    also do not think it's the same process that hangs each time, but I can't
    be sure.

    I've installed PK4 over PK3 and also installed PK4 on a fresh OS install,
    with the same result. Our configuration management software is Clearcase,
    so all of the makes and builds are done through it. As a side note, I was
    able to run configure and make to build gcc without a problem, so I'm
    guessing the compiler is not what's causing this.

    Has anyone seen similar problems, or does anyone know what's causing the
    hangs or how they can be fixed?

    Thanks in advance,
    Chad
