fork() race in SIGCHLD handler

From: Chris Vine (chris_at_cvine--nospam--.freeserve.co.uk)
Date: 12/30/03


Date: Tue, 30 Dec 2003 14:13:15 +0000

Hi,

In tracking down some unexpected behaviour in a program of mine, I have
noticed that the test code below exhibits a timing race on fork() in Linux
kernel 2.6.0. When the test code is compiled and run, the first line of
output is "child_pid is less than zero"; every subsequent line is the
expected "child_pid is greater than zero". (With kernel 2.4.23 I get
"child_pid is greater than zero" every time.)

What is clearly happening is that by the time the child process has exited
and the SIGCHLD handler is executing in the parent process, the parent has
not yet assigned the return value of the fork() call to its own copy of the
static child_pid variable. In other words, the parent does not resume
execution after fork() until the child process has already terminated.

At first I thought this might be a kernel or glibc bug, but on reflection I
am not so sure: I do not think delivery of the asynchronous SIGCHLD to the
parent process is required to wait until the parent has resumed normal
(synchronous) execution after the fork(). I can get around the difficulty
by setting a flag in the first line of the parent process after the fork(),
on which the SIGCHLD handler can block, but does POSIX say anything about
this? (Apologies for posting this to comp.unix.programmer as well as
comp.os.linux.development.system, but the comp.unix.programmer regulars
seem to know a lot about POSIX.)

It is interesting, though, that the assignment of -1 to child_pid in the last
line of the signal handler does not cause subsequent calls to the handler
to exhibit the "unexpected" behaviour. Probably on all iterations after
the first, the address space of the parent process has already been set up
so that copy-on-write of its copy of child_pid takes place quickly.

This is with glibc 2.3.2, and the compiler is gcc-3.2.3.

Chris.

//////////////////// test code ///////////////////

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
 
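/* volatile: written in main() and read asynchronously in the handler */
volatile pid_t child_pid = -1;

void childexit_signalhandler(int sig) {

  static const char message_less[] = "child_pid is less than zero\n";
  static const char message_greater[] = "child_pid is greater than zero\n";

  (void)sig; /* unused */

  waitpid(-1, 0, WNOHANG); /* eliminate zombies */

  /* write() is async-signal-safe; write to stdout, and use sizeof - 1
     so that the terminating '\0' is not written */
  if (child_pid < 0) write(STDOUT_FILENO, message_less,
                           sizeof(message_less) - 1);
  else if (child_pid > 0) write(STDOUT_FILENO, message_greater,
                                sizeof(message_greater) - 1);

  child_pid = -1;
}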
 
int main(void) {
 
  struct sigaction sig_act_chld;
  int index;
  struct timespec delay;
  struct timespec residual_delay;
 
  sig_act_chld.sa_handler = childexit_signalhandler;
  sigemptyset(&sig_act_chld.sa_mask);
  sig_act_chld.sa_flags = 0;
  sigaction(SIGCHLD, &sig_act_chld, 0);
 
  for (;;) {
    child_pid = fork();
    if (child_pid > 0) { /* parent process */
      /* set up a 1s delay */
      delay.tv_sec = 1;
      delay.tv_nsec = 0;
      while (nanosleep(&delay, &residual_delay) == -1)
        delay = residual_delay;
    }
    if (!child_pid) { /* child process */
      /* do something meaningless */
      for (index = 0; index < 100; index++);
      _exit(0);
    }
  }
  return 0;
}

-- 
To reply by e-mail, remove the "--nospam--" in the address

