SUMMARY: Problems with 5.1B PK4 causing machine to hang

From: Chad W Baker (Chad_W_Baker_at_raytheon.com)
Date: 02/08/05

    Date: Tue, 08 Feb 2005 15:25:40 -0500
    To: tru64-unix-managers@ornl.gov
    
    

    Hello Managers,

    Here is the response I received from Dr. Thomas Blinn. The fix looks like
    it's scheduled for the next patch kit release. My original question follows.

    "There was a change in V5.1B PK4 (aka BL25) that can cause certain
    programs to get an "interrupt" return status with no evidence of
    an abnormal condition otherwise. There is not yet a patch that
    undoes this for PK4 in general, but it may be the root cause of
    your problem. It will be changed back to the pre-PK4 behavior
    in the next patch kit, but that will be a while yet. We have seen
    the change break some of the shell scripts on SierraCluster systems, and we've seen
    it break the ladebug debugger, I would not be surprised if it
    also breaks, for instance, Clearcase; it may be causing some of
    the "wait" system calls to return unexpected status for child
    processes that are exiting, and I suspect it's load dependent
    as it involves "race" conditions; it's especially likely that
    it will impact multi-threaded applications more than simple
    classic UNIX applications, and I would not be surprised if at
    least part of the Clearcase tool suite is multi-threaded."
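    Until the patch lands, one general defensive pattern for programs that
    wait on children is to retry the wait call when it reports an interrupt.
    The C sketch below assumes the spurious "interrupt" status surfaces to the
    caller as waitpid() returning -1 with errno set to EINTR; that mapping is
    an assumption, not something confirmed above. The helper name
    wait_for_child_retry is hypothetical, and this is a workaround sketch, not
    the change HP is making in the next patch kit.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: wait for a child process, retrying whenever
 * waitpid() is interrupted.
 * Assumption: the PK4 "interrupt" symptom appears as -1 with errno == EINTR.
 * Returns the reaped child's pid on success, or -1 on a genuine error
 * (for example ECHILD when there is no such child), with errno set.
 */
pid_t wait_for_child_retry(pid_t pid, int *status)
{
    pid_t rc;

    do {
        rc = waitpid(pid, status, 0);
    } while (rc == -1 && errno == EINTR);  /* interrupted: wait again */

    return rc;
}
```

    A caller would use it in place of a bare waitpid(child, &status, 0) and
    then inspect status with WIFEXITED()/WEXITSTATUS() as usual. Retrying only
    masks the symptom for programs you control; it does not help tools such as
    Clearcase or ladebug that make their own wait calls.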

    Chad

    #######################################################################

    Hello Managers,

    I have a collection of Alpha machines - ES40s, AS4100s, and XP1000s - all
    running Tru64 5.1B. Before last Friday, there was a mix of patch kit
    versions from 2 to 4. Last Friday, I upgraded our build and development
    environment from PK3 to PK4. Once I did that, we were no longer able to
    build any software, the servers and workstations got into a loop and never
    came out. From what I can tell, there was no specific piece of code being
    built that was causing the problem (the same code built successfully on PK2
    and PK3). I also don't think the same process hangs the machine each time,
    but I can't be sure.

    I've installed PK4 over PK3 and also installed PK4 on a fresh OS install --
    same result. Our configuration management software is Clearcase, so all of the makes
    and builds are done through that. As a side note, I was able to run
    configure and make/build gcc without a problem, so I'm guessing it's not
    something in the compiler that's causing this problem.

    Has anyone seen similar problems, or does anyone know what's causing the
    hangs or how to fix them?

    Thanks in advance,
    Chad

