Patch for review: resolve a race condition in [sg]etpriority()



Hi,

Here is a patch which resolves a race condition in [sg]etpriority(), but
I think there might be some better idea to resolve the problem, so I
would appreciate if someone would shed me some light :-)

Description of the problem:

In PRIO_USER of both [sg]etpriority(), p_ucred is expected to be a valid
reference to a valid user credential. However, for PRS_NEW processes,
there is chances where it is not properly initialized, causing a fatal
trap 12 [1].

On the other hand, just determining whether a process is in PRS_NEW
state and skip those who are newly born is not enough for semantical
correctness. A process could either have p_{start,end}copy section
copied or not, which needs to be handled with care.

The attached solution:

What I did in the attached patch is basically:

- Before allproc_lock is sx_xunlock'ed in fork1(), lock both the parent
and the child.
- After releasing allproc_lock, do process p_{start,end}copy,
p_{start,end}zero and crhold() immediately, and release parent and
child's p_mtx as soon as possible.
- In getpriority(), simply skip PRS_NEW processes because they does not
affect PRIO_USER path's result. This part is not really necessary for
correctness, though.

The pros for this is that it does not require an extensive API change,
and in theory it does not require that the allproc lock to be held for a
extended period; the cons is that this adds some overhead because it
adds two pairs of mutex lock/unlock.

In a private discussion, there was also some other ideas. One is to
just move the unlock to where the process is completely initialized,
another is to introduce an event handler and msleep() p_state when
necessary, and do a wakeup once we bumped the state, but I think these
solution have their own limitations just like mine.

[1] Reproducing script can be obtained from kern/108071.

Cheers,
--
Xin LI <delphij@xxxxxxxxxxx> http://www.delphij.net/
FreeBSD - The Power to Serve!
Index: kern_fork.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.266
diff -u -p -r1.266 kern_fork.c
--- kern_fork.c 23 Jan 2007 08:46:50 -0000 1.266
+++ kern_fork.c 24 Jan 2007 08:35:31 -0000
@@ -423,8 +423,22 @@ again:
AUDIT_ARG(pid, p2->p_pid);
LIST_INSERT_HEAD(&allproc, p2, p_list);
LIST_INSERT_HEAD(PIDHASH(p2->p_pid), p2, p_hash);
+
+ PROC_LOCK(p2);
+ PROC_LOCK(p1);
+
sx_xunlock(&allproc_lock);

+ bcopy(&p1->p_startcopy, &p2->p_startcopy,
+ __rangeof(struct proc, p_startcopy, p_endcopy));
+ PROC_UNLOCK(p1);
+
+ bzero(&p2->p_startzero,
+ __rangeof(struct proc, p_startzero, p_endzero));
+
+ p2->p_ucred = crhold(td->td_ucred);
+ PROC_UNLOCK(p2);
+
/*
* Malloc things while we don't hold any locks.
*/
@@ -482,13 +496,9 @@ again:
PROC_LOCK(p2);
PROC_LOCK(p1);

- bzero(&p2->p_startzero,
- __rangeof(struct proc, p_startzero, p_endzero));
bzero(&td2->td_startzero,
__rangeof(struct thread, td_startzero, td_endzero));

- bcopy(&p1->p_startcopy, &p2->p_startcopy,
- __rangeof(struct proc, p_startcopy, p_endcopy));
bcopy(&td->td_startcopy, &td2->td_startcopy,
__rangeof(struct thread, td_startcopy, td_endcopy));

@@ -511,7 +521,6 @@ again:
sched_fork(td, td2);

mtx_unlock_spin(&sched_lock);
- p2->p_ucred = crhold(td->td_ucred);
td2->td_ucred = crhold(p2->p_ucred);
#ifdef AUDIT
audit_proc_fork(p1, p2);
Index: kern_resource.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_resource.c,v
retrieving revision 1.165
diff -u -p -r1.165 kern_resource.c
--- kern_resource.c 17 Jan 2007 14:58:53 -0000 1.165
+++ kern_resource.c 24 Jan 2007 02:06:13 -0000
@@ -143,6 +143,9 @@ getpriority(td, uap)
uap->who = td->td_ucred->cr_uid;
sx_slock(&allproc_lock);
FOREACH_PROC_IN_SYSTEM(p) {
+ /* Do not bother to check PRS_NEW processes */
+ if (p->p_state == PRS_NEW)
+ continue;
PROC_LOCK(p);
if (!p_cansee(td, p) &&
p->p_ucred->cr_uid == uap->who) {

Attachment: signature.asc
Description: OpenPGP digital signature



Relevant Pages

  • Re: [BETA7-panic] sodealloc(): so_count 1
    ... is this patch is something I can feel ... > to both the accept lock and the socket lock give up their pcb from ... > parent (listening socket) before never touching it again. ... > retrieving revision 1.133 ...
    (freebsd-current)
  • Re: [PATCH -mm v4 6/9] atmel_serial: Split the interrupt handler
    ... existed of a set of patches suitable ... plus an additional patch for Preempt-RT only. ... the splitup of the interrupt handler was also needed for ... * uart_start, which takes the lock. ...
    (Linux-Kernel)
  • Re: [PATCH 0/5] ppc RT: Realtime preempt support for PPC
    ... checks the value of lock is assembly and treats lock as signed. ... Thus it is also OK that you left this chunk out of your patch. ... Patch #4 went in via the PPC tree. ...
    (Linux-Kernel)
  • Re: Giantless VFS.
    ... On Sat, 2004-11-20 at 00:24 -0500, Jeff Roberson wrote: ... > I have a patch that I would like people to test and review. ... > This patch removes Giant from the read, write, and fstatsyscalls, ... > number of lock operations for the common cases which resulted in a couple ...
    (freebsd-current)
  • Re: [patch 6/8] 2.6.22-rc3 perfmon2 : IBS implementation for AMD64
    ... pls. see 2nd patch version. ... 'count' may only be incremented if the PMD is used. ... It seems this function is only called if the lock is set, ... Operating System Research Center ...
    (Linux-Kernel)