Re: Updated rusage patch
- From: Attilio Rao <attilio@xxxxxxxxxxx>
- Date: Fri, 08 Jun 2007 16:10:05 +0200
Jeff Roberson wrote:
On Wed, 6 Jun 2007, Jeff Roberson wrote:
On Tue, 5 Jun 2007, Bruce Evans wrote:
This can probably be fixed more simply by calling rufetch() to reset the
time state in threads as a side effect. Do this before resetting the
state in the process.
Ok, I agree with bde here, just call rufetch and this will clear each thread, and then you can clear the rux in the proc.
I'd like to make a list of the remaining problems with rusage and potential fixes. Then we can decide which ones myself and attilio will resolve immediately to clean up some of the effect of the sched lock changes.
1) The ruadd() in thread_exit() is not safe since we're accessing another thread's unlocked rusage structure. Potential solution is to allocate p_ru as part of the proc struct and add into there, which will be protected by the PROC_SLOCK, which bde seemed to like better anyway.
2) We may lose information between exit1() and thread_exit() due to the way p_ru is initialized before we're done exiting. There also seems to be a race where wait() operates on a process before it's done in thread_exit() which means wait may return rusage information without the child added in! The solution will be to fix this race, and then access p_ru directly in wait().
The patch at http://people.freebsd.org/~jeff/rusage3.diff fixes points 1 and 2 as well as the p_runtime iniitialization problem. This moves the collection of child rusage back into exit1() and changes the exiting threads to accumulate their rusage into p_ru under protection of the process spinlock. This also removes the gross lock/unlock of proc slock (formerly sched_lock) from wait and implements something more sensible.
Jeff
3) There is no locking around rufetch() and calcru(). calcru() may apply new rux values to an old rusage, giving inaccurate results. The solution is to either require the proc slock around both calls, or provide a new routine which does the fetch and calc while grabbing the lock itself.
And this should fix (3):
http://users.gufi.org/~rookie/works/patches/schedlock/rusage2.diff
(and reorders rucollect() declaration sorted by name).
A thought:
Shouldn't we actually remove in calcru() (and rufetchcalc()) the copy to the rux object?
When sched_lock was there it would be useful since it had a lot of contention, but now that the per-proc spinlock is protecting those fields it is useless. And consider that calcru1() has no locking inside (so you won't expect particulary long execution times).
Thanks,
Attilio
_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"
- References:
- Re: Updated rusage patch
- From: Bruce Evans
- Re: Updated rusage patch
- From: Jeff Roberson
- Re: Updated rusage patch
- From: Bruce Evans
- Re: Updated rusage patch
- From: Jeff Roberson
- Re: Updated rusage patch
- From: Bruce Evans
- Re: Updated rusage patch
- From: Attilio Rao
- Re: Updated rusage patch
- From: Bruce Evans
- Re: Updated rusage patch
- From: Jeff Roberson
- Re: Updated rusage patch
- From: Jeff Roberson
- Re: Updated rusage patch
- Prev by Date: Re: Updated rusage patch
- Next by Date: Re: *at family of syscalls in FreeBSD
- Previous by thread: Re: Updated rusage patch
- Next by thread: Re: Updated rusage patch
- Index(es):
Relevant Pages
|
|