Abnormal process kill.

From: Neil (nlombardospamtrap_at_rosbif.org)
Date: 05/09/03


Date: Fri, 9 May 2003 10:07:11 +0200

Hi there,

I thought I'd post this, as frankly, I'm stumped. I have no idea and no
clue.

One of my customers is running an HPUX system on 11.00 and has a problem
in which processes die for no reason. The machine is also up to date
with all HPUX patches.

There are no core files, no log record and no specific user-id (i.e. the
process termination appears to be random).

Generally speaking, application processes do not simply die without
reason. My customer thinks that this may be performance related, but as
there are no logs or any other info, I'm doubtful.

Questions and answers that I have already checked out are below:

Q. Conditions such as memory depletion, where the process needs more
memory, but memory reservation or swap-space is not available.

A. This doesn't appear to be a problem, as the machine is not working at
full capacity.

Q. Violating per-process resource limit(s) like cpu-time limit.

A. This may be the case, but the problem also occurred on very calm days.
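
One way to rule per-process limits in or out is to have an affected program print the limits it actually inherits. The following is only a sketch in Python, assuming a Python interpreter is available on the machine in question; the helper names are made up for illustration. Exceeding the soft CPU limit delivers SIGXCPU, and exceeding the file-size limit delivers SIGXFSZ, so a finite value here would be worth a closer look.

```python
import resource

# Standard POSIX resource names; not every platform defines all of them,
# so each one is looked up with getattr below.
LIMIT_NAMES = ["RLIMIT_CPU", "RLIMIT_DATA", "RLIMIT_STACK", "RLIMIT_FSIZE"]


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name in LIMIT_NAMES:
        res = getattr(resource, name, None)
        if res is not None:
            soft, hard = resource.getrlimit(res)
            limits[name] = (fmt(soft), fmt(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

The limits are inherited from the starting shell, so this is only meaningful when run from the same menu or script that launches the affected programs.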

Q. Upon such condition, the operating system will signal this to the
process, and the process may act on this thru use of its
signal-handlers. In case that no signal-handler was setup by the
process, default action will be taken.

A. All the processes have signal-handler routines, which handle SIGTERM
type signals. In any case, all the processes do normal termination by
sending SIGTERM signals with kill <process_id>. (Authorized operators
handle the termination of main programs with menu options. Main
programs terminate their subprograms automatically.) In our problem
case, they seem to be terminated as if by kill -9. The kill -9 command
generates a SIGKILL signal, and as this signal is an "operating system
level signal", it cannot be handled in programs.
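
The difference between a plain kill <process_id> (SIGTERM, catchable) and kill -9 (SIGKILL, not catchable) can be shown with a small sketch. It is written in Python purely for illustration; the handler and function names are hypothetical, and a real handler would write to the application's own log file instead of a list.

```python
import os
import signal
import time

received = []


def log_and_record(signum, frame):
    # Stand-in for "write the signal name to the application log".
    received.append(signal.Signals(signum).name)


def install_handlers():
    """Install the handler where allowed; return the signals that refused it."""
    uncatchable = []
    for sig in (signal.SIGTERM, signal.SIGXCPU, signal.SIGKILL):
        try:
            signal.signal(sig, log_and_record)
        except (OSError, ValueError):
            # The kernel does not allow a handler for SIGKILL.
            uncatchable.append(sig.name)
    return uncatchable


if __name__ == "__main__":
    print("cannot be caught:", install_handlers())
    os.kill(os.getpid(), signal.SIGTERM)  # what a plain `kill <pid>` sends
    time.sleep(0.1)
    print("handled:", received)
```

A process that logs every catchable signal and still dies silently has most likely received SIGKILL, or a signal for which no handler was installed.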

Q. But in general, when running program/scripts from command line, the
executing shell will receive a notification of a failed process.

A. All of our application programs run via the shell scripts running
inside the main programs in the background mode with nohup process_name
&.

During startup, our main programs put themselves into the background
with the setpgrp() and fork() calls, just after completing their
initial checks.
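
Whoever is the direct parent of a dying process can still learn how it died from the wait status, even when the process itself logs nothing. The sketch below is in Python and assumes the supervising program forks the subprogram itself and waits for it; the function names are illustrative only and are not taken from the application described here.

```python
import os
import signal


def describe_status(status):
    """Turn a wait status into a log-friendly string."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        return f"killed by signal {sig} ({signal.Signals(sig).name})"
    if os.WIFEXITED(status):
        return f"exited with status {os.WEXITSTATUS(status)}"
    return f"unknown wait status {status}"


def run_child_and_report(child_action):
    """Fork, run child_action in the child, and describe how it ended."""
    pid = os.fork()
    if pid == 0:
        try:
            child_action()
            os._exit(0)
        finally:
            # Reached only if child_action raised; never return to the caller.
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return describe_status(status)


if __name__ == "__main__":
    # Hypothetical child that dies the way `kill -9` would kill it.
    print(run_child_and_report(lambda: os.kill(os.getpid(), signal.SIGKILL)))
    print(run_child_and_report(lambda: os._exit(3)))
```

Logging this string in the parent would at least tell which signal is responsible. When the subprogram is started through nohup and a shell script, the shell is the direct parent, so the status would have to be captured there instead.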

Q. Such "core" files may *not* be created if the process' current
directory cannot be written, or if the application is running with
set-uid/set-gid bits (and the real user is different from the file
owner).

A. set-gid {setpgrp()} is only used in our 4 broadcasting programs and
these broadcasting programs do nothing during broadcast; rather, the
subprograms do all the work. Such a problem hasn't been encountered on
these programs yet. Also, they have the necessary rights on the "current
directory".

But still, the processes could be altered to handle some of the above
signals (SIGBUS, SIGSEGV, SIGXCPU). (It should also be considered that
there are about 60 processes subject to alteration.)
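
The conditions named in the question can be checked from inside a process. This is a rough Python sketch, not a complete list of what a given system requires; the function name is hypothetical, and the core size limit check is an addition that the question does not mention. It only matters for signals whose default action writes a core file (such as SIGSEGV or SIGBUS); a process killed by SIGKILL or SIGTERM never leaves one.

```python
import os
import resource


def core_dump_obstacles():
    """List conditions that commonly prevent a core file from being written."""
    problems = []
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == 0:
        problems.append("core size limit is 0")
    if not os.access(os.getcwd(), os.W_OK):
        problems.append("current directory is not writable")
    if os.getuid() != os.geteuid() or os.getgid() != os.getegid():
        problems.append("running set-uid or set-gid")
    return problems


if __name__ == "__main__":
    found = core_dump_obstacles()
    print("obstacles:", found if found else "none found")
```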

Q. which process(es) is/are using much CPU? And what is the relation of
this with the unexpected termination of processes?

A. If we could find a relation between this and the unexpected
termination, we could tackle the problem with methods such as separating
the functions of the processes or utilizing the function.

Q. under what user-id are (were) the affected processes running?

A. Operators run the broadcasting programs with the aid of built-in
menu.

Q. is there any application log(s) that provides information on process
termination?

A. The majority of our programs record their stop time in their
individual log files. But the programs affected by the process kill
problem could not record their stop time in the log file; they just die
first.

Q. is there any "core" file generated (if no indication of core files:
is there any "core" file anywhere on the system)?

A. No, this has not been seen.

Q. are there any messages anywhere when a process terminates
unexpectedly?

A. They generate no message while dying.

Anyone seen this sort of problem before? Any ideas?

TIA,

Neil


