Re: getaffinity/setaffinity and cpu sets.
- From: Brooks Davis <brooks@xxxxxxxxxxx>
- Date: Sat, 23 Feb 2008 13:40:47 -0600
On Fri, Feb 22, 2008 at 01:52:54PM -1000, Jeff Roberson wrote:
On Fri, 22 Feb 2008, Brooks Davis wrote:
On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote:
On Thu, 21 Feb 2008, Robert Watson wrote:
On Wed, 20 Feb 2008, Jeff Roberson wrote:
- It would be nice to be able to use CPU sets in jail as well, suggesting a hierarchical model with some sort of tagging so you know which CPU sets were created in a jail, and therefore whether they can be changed from within a jail. While I recognize this makes things a lot trickier, I think we should be planning more carefully for virtualization when we add new interfaces, since it's a widely used feature, and the current set of "stragglers" unsupported in Jail is growing rather than shrinking.
I have implemented a hierarchical model. Each thread has a pointer to the cpuset that it's in. If it makes a local modification via setaffinity() it gets an anonymous cpuset that is a child of the set assigned to the process. This anonymous set will also be inherited across fork/thread creation.
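The following is a minimal user-space sketch of the per-thread behavior just described, not the kernel code from the patch. All names (`cpumask_t` as a 64-bit mask, `struct cpuset`, `struct thread`, `thread_setaffinity()`, `thread_inherit()`) are hypothetical. It assumes a thread's request must be a non-empty subset of its process's set, and it omits reference counting, so a replaced anonymous set is simply leaked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical CPU mask: one bit per CPU, so at most 64 CPUs. */
typedef uint64_t cpumask_t;

struct cpuset {
    cpumask_t mask;           /* CPUs this set permits */
    struct cpuset *parent;    /* NULL for a root */
    bool anonymous;           /* created implicitly by setaffinity() */
};

struct thread {
    struct cpuset *td_cpuset; /* the set this thread is in */
};

/*
 * Hypothetical per-thread setaffinity(): 'want' must be a non-empty subset
 * of the process's set.  Instead of editing any existing (possibly shared)
 * set, the thread is moved into a fresh anonymous child of 'procset'.
 * Returns 0 on success, -1 on a bad mask or allocation failure.
 * No reference counting: a replaced anonymous set is simply leaked here.
 */
static int thread_setaffinity(struct thread *td, struct cpuset *procset,
    cpumask_t want)
{
    if (want == 0 || (want & ~procset->mask) != 0)
        return -1;
    struct cpuset *anon = malloc(sizeof(*anon));
    if (anon == NULL)
        return -1;
    anon->mask = want;
    anon->parent = procset;
    anon->anonymous = true;
    td->td_cpuset = anon;
    return 0;
}

/* fork/thread creation: the new thread shares its creator's set. */
static void thread_inherit(struct thread *child, const struct thread *creator)
{
    child->td_cpuset = creator->td_cpuset;
}
```

Because no existing set is ever edited, a thread that inherited the first anonymous set keeps it when its creator calls `thread_setaffinity()` again, which mirrors the rule that anonymous sets are immutable and changes apply only to the thread named in the call.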
In the present model some nodes are marked as root. To query the 'system' CPUs available, we walk up from the current node until we find a root; that root is the 'system' set. A thread may not break out of its system set. A process may join the root set, but it may not modify a root that is a parent. Jails would create a new root. A process outside the jail can modify the set of processors in the jail, but a process within the jail/root may not.
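A minimal sketch of the root lookup and the jail rule, again with hypothetical names and a 64-bit mask standing in for the real CPU mask type. The permission check is an assumption: it only lets a caller modify a root lying strictly below the caller's own root, which is slightly stricter than "may not modify a root that is a parent" because it also refuses the roots of sibling jails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask_t;   /* hypothetical: one bit per CPU */

struct cpuset {
    cpumask_t mask;
    struct cpuset *parent;    /* NULL only for the top-level system root */
    bool is_root;             /* system set, e.g. one created per jail */
};

/* Walk up from 'set' until a node marked as root is found. */
static struct cpuset *cpuset_getroot(struct cpuset *set)
{
    while (!set->is_root)
        set = set->parent;
    return set;
}

/* A thread may not be given CPUs outside its root ("system") set. */
static bool cpuset_within_root(struct cpuset *set, cpumask_t want)
{
    return want != 0 && (want & ~cpuset_getroot(set)->mask) == 0;
}

/*
 * May code whose set is 'caller' modify the root 'target'?  Assumed rule:
 * only if 'target' sits strictly below the caller's own root.  A host
 * process can therefore resize a jail's root, while a process inside the
 * jail can touch neither its own root nor any root above it.
 */
static bool cpuset_may_modify_root(struct cpuset *caller,
    const struct cpuset *target)
{
    const struct cpuset *myroot = cpuset_getroot(caller);

    for (const struct cpuset *s = target->parent; s != NULL; s = s->parent)
        if (s == myroot)
            return true;
    return false;
}
```

With a system root of CPUs 0-7 and a jail root of CPUs 0-3 beneath it, a host process may shrink or grow the jail's root, but a thread inside the jail can only choose among CPUs 0-3 and cannot change the jail root itself.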
The next level down from the root is the assigned set. The assigned set may be the root itself or a subset of the root. Processes may create sets which are parented back to their root and may include any processors within their root. The mask of the assigned set is returned as the 'available' processors.
This gives a hierarchy of one to three levels: the root, an assigned set, and an anonymous set. Any of these but the root may be omitted. There is currently no way for userland to create subsets of assigned sets to permit further nesting. I'm not sure I see value in it right now, and it opens the possibility of unbounded tree depth.
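One way to model the three levels and the 'available' query, assuming each set carries a kind tag and that 'available' means the nearest non-anonymous set found by walking upward (which is the root when no assigned set exists). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask_t;   /* hypothetical: one bit per CPU */

enum cpuset_kind { CS_ROOT, CS_ASSIGNED, CS_ANONYMOUS };

struct cpuset {
    cpumask_t mask;
    const struct cpuset *parent;   /* NULL for the top-level root */
    enum cpuset_kind kind;
};

/*
 * The assigned set for 's': the nearest set, starting at 's' itself, that
 * is not anonymous.  If no assigned set exists this is the root.
 */
static const struct cpuset *cpuset_assigned(const struct cpuset *s)
{
    while (s->kind == CS_ANONYMOUS)
        s = s->parent;
    return s;
}

/* What a query for 'available' processors would return. */
static cpumask_t cpuset_available(const struct cpuset *s)
{
    return cpuset_assigned(s)->mask;
}

/* Levels from 's' up to and including its root (1 to 3 if well formed). */
static int cpuset_depth(const struct cpuset *s)
{
    int depth = 1;

    while (s->kind != CS_ROOT) {
        s = s->parent;
        depth++;
    }
    return depth;
}
```

Capping userland at these three kinds is what keeps `cpuset_depth()` bounded; allowing subsets of assigned sets would remove that bound.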
Anonymous sets are immutable because they are shared; changes apply only to the thread/pid in the WHICH argument and not to others which have inherited from it. Anonymous sets have no id and may not be manipulated directly via a setid; you must refer to the process/thread. From the administration point of view they don't exist.
When a set is modified we walk down the children recursively and apply the new mask. This is done with a global set lock under which all modifications and tree operations are performed. The td_cpuset pointer is protected by thread_lock(), and the set it points to may be read without a lock. This leaves room for certain kinds of races, but I believe they are all safe.
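A sketch of the downward propagation under a single global lock, using a POSIX mutex in place of the kernel lock. It assumes that "applying the new mask" means intersecting each descendant with its parent's new mask, and it only counts descendants left empty rather than picking one of the policies for stranded threads. The lockless read of `td_cpuset` under `thread_lock()` is not modeled. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cpumask_t;   /* hypothetical: one bit per CPU */

struct cpuset {
    cpumask_t mask;
    struct cpuset *parent;
    struct cpuset *children;  /* first child */
    struct cpuset *sibling;   /* next child of the same parent */
};

/* One global lock covering every mask change and tree operation. */
static pthread_mutex_t cpuset_lock = PTHREAD_MUTEX_INITIALIZER;

static void cpuset_add_child(struct cpuset *parent, struct cpuset *child)
{
    pthread_mutex_lock(&cpuset_lock);
    child->parent = parent;
    child->sibling = parent->children;
    parent->children = child;
    pthread_mutex_unlock(&cpuset_lock);
}

/*
 * Caller holds cpuset_lock.  Restrict every descendant to the CPUs its
 * parent still has; return how many descendants end up with no CPUs
 * (threads in those sets would be stranded).
 */
static int cpuset_propagate(struct cpuset *set)
{
    int stranded = 0;

    for (struct cpuset *c = set->children; c != NULL; c = c->sibling) {
        c->mask &= set->mask;
        if (c->mask == 0)
            stranded++;
        stranded += cpuset_propagate(c);
    }
    return stranded;
}

/* Change one set's mask and push the change down the tree atomically. */
static int cpuset_modify(struct cpuset *set, cpumask_t newmask)
{
    pthread_mutex_lock(&cpuset_lock);
    set->mask = newmask;
    int stranded = cpuset_propagate(set);
    pthread_mutex_unlock(&cpuset_lock);
    return stranded;
}
```

Intersection means a child does not regain CPUs if its parent later grows again; whether the real code recomputes children from a stored request instead is not stated in the thread.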
Hopefully I explained that well enough for people to follow. I realize it's a lot of text, but it's fairly simple bookkeeping code. This is all implemented and I'm debugging now.
One place I'd like to implement CPU affinity is in the Sun Grid Engine execution daemon. I think an anonymous set would not be sufficient there because the model allows new tasks to be started on a particular node at any time during a parallel job. I'd have to do some more digging in the code to be entirely certain. I think the fewer limits we place on the hierarchy, the better off we'll be, unless there are compelling complexity reasons to avoid them.
With the anonymous set you can bind any thread to any cpu that is visible
to it. How would this not work?
I'm still trying to wrap my head around the anonymous sets. Is the idea that once you are in an anonymous set you can't expand it, or can you expand out as far as the assigned set? I'd like for parallel jobs to be allocated a set of CPUs that they can't change, but still be able to make their own decisions about thread affinity if they desire (for example, OpenMPI has some support for this so processes stay put and in theory benefit from positive cache effects). If that's feasible in this model, I'm happy with it. I think we should keep in mind that these SGE execution daemons might be sitting inside jails. ;-)
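To illustrate the OpenMPI-style use case, here is a hypothetical helper a job could use to spread its workers over whatever CPUs its assigned set exposes, giving each worker a single-CPU mask that, in the model described in this thread, would become that thread's anonymous set. It makes no system call; how the mask is actually applied depends on the final API, and the names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_t;   /* hypothetical: one bit per CPU */

/* Number of CPUs in a mask. */
static int mask_count(cpumask_t m)
{
    int n = 0;

    for (; m != 0; m &= m - 1)   /* clear the lowest set bit each pass */
        n++;
    return n;
}

/*
 * Pick a CPU for worker 'rank' (rank >= 0) by cycling through the CPUs in
 * 'avail', e.g. the job's assigned set.  Returns a single-CPU mask, or 0
 * if 'avail' is empty.
 */
static cpumask_t pick_cpu_for_rank(cpumask_t avail, int rank)
{
    int n = mask_count(avail);

    if (n == 0)
        return 0;
    int k = rank % n;            /* index of the chosen available CPU */
    for (int cpu = 0; cpu < 64; cpu++) {
        cpumask_t bit = (cpumask_t)1 << cpu;
        if (avail & bit) {
            if (k == 0)
                return bit;
            k--;
        }
    }
    return 0;                    /* not reached when n > 0 */
}
```

For an assigned set containing CPUs 2, 3 and 5, ranks 0, 1, 2 land on CPUs 2, 3, 5 and rank 3 wraps back to CPU 2; the job never needs CPUs outside its assigned set.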
- There's still no way to specify an affinity policy rather than explicit affinity, but if our CPU set model is sufficiently general, that might be a vehicle to do that, i.e. cpuset_setpolicy() rather than setting a mask.
Yes, I think this is orthogonal and can be addressed separately. I'm not sure how many userland programs are smart enough, or even capable of, making determinations about their cache behavior, however. We should open another discussion once this one is done.
- In the interests of boring API changes, recent APIs tend to prefix the method with the object name. Have you thought about cpuset_create(), cpuset_foo(), etc.? That reduces the chances of interfering with application namespaces. I think, anyway. :-).
Yes, I prefer that as well. As I mentioned, syscalls have tended to favor brevity, but I'm fine with changing that trend.
I need to ponder the proposal a little more, ideally over a hot beverage
this morning, and will follow up if I have further thoughts. Thanks for
working on this, BTW -- affinity is well-overdue for FreeBSD.
A little more to ponder now! Your feedback is much appreciated.
I believe the present hierarchical model satisfies the jail requirements of restricting CPUs in the jail while still allowing the jail to create sets.
The unanswered questions are:
1) What to do about sets that strand threads (options described above).
2) Are people ok with the transient nature of sets?
3) Does anyone want to help with man pages, administrative tools, etc.? I have a prototype tool called 'cpuset' that fully exercises the API but is probably ugly. Will post details soon.
I could help with some of this as it furthers a funded project at work.
I will provide patches soon. It would be great to have a developer with a user's perspective look at some of the details, especially the administration side of things. I think someone else has offered to help with man pages, but I need to double-check.
Cool. If you can get some basics out by late Sunday afternoon (CST) I
should be able to look at it and think about it on the plane Monday.
-- Brooks