Re: getaffinity/setaffinity and cpu sets.
- From: Brooks Davis <brooks@xxxxxxxxxxx>
- Date: Sat, 23 Feb 2008 15:35:07 -0600
On Sat, Feb 23, 2008 at 11:21:33AM -1000, Jeff Roberson wrote:
On Sat, 23 Feb 2008, Brooks Davis wrote:
On Fri, Feb 22, 2008 at 01:52:54PM -1000, Jeff Roberson wrote:
On Fri, 22 Feb 2008, Brooks Davis wrote:
On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote:
On Thu, 21 Feb 2008, Robert Watson wrote:
On Wed, 20 Feb 2008, Jeff Roberson wrote:
- It would be nice to be able to use CPU sets in jail as well,
suggesting a hierarchical model with some sort of tagging so you know
what CPU sets were created in a jail such that you know whether they
can be changed in a jail. While I recognize this makes things a lot
more tricky, I think we should basically be planning more carefully
with respect to virtualization when we add new interfaces, since it's
a widely used feature, and the current set of "stragglers" unsupported
in Jail is growing rather than shrinking.
I have implemented a hierarchical model. Each thread has a pointer to
the cpuset that it's in. If it makes a local modification via
setaffinity() it gets an anonymous cpuset that is a child of the set
assigned to the process. This anonymous set will also be inherited
across fork/thread creation.
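A minimal sketch of the per-thread pointer and the anonymous set, written
as a toy Python model rather than kernel C. The names CpuSet, Thread and
setaffinity() are illustrative only and are not the interface under
discussion; the rule that the new mask must stay within the assigned set
is an assumption made for the sketch.

```python
class CpuSet:
    """Toy cpuset node: a CPU bitmask plus a link to its parent set."""

    def __init__(self, mask, parent=None, anonymous=False):
        self.mask = mask            # bit i set => CPU i is allowed
        self.parent = parent
        self.anonymous = anonymous  # anonymous sets carry no id


class Thread:
    def __init__(self, cpuset):
        self.cpuset = cpuset        # each thread points at the set it is in

    def setaffinity(self, mask):
        # A local change never edits the set the thread currently shares.
        # The thread gets a new anonymous set parented to the assigned set.
        current = self.cpuset
        assigned = current.parent if current.anonymous else current
        # Assumption: the mask must be a non-empty subset of the assigned set.
        if mask == 0 or mask & ~assigned.mask:
            raise ValueError("mask must be a non-empty subset of the assigned set")
        self.cpuset = CpuSet(mask, parent=assigned, anonymous=True)
```

In this model the assigned set is never touched by a thread-local change;
only the thread's pointer moves.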
In this model presently there are nodes marked as root. To query the
'system' cpus available we walk up from the current node until we find
a root. This is the 'system' set. A thread may not break out of its
system set. A process may join the root set but it may not modify a
root that is a parent. Jails would create a new root. A process outside
of the jail can modify the set of processors in the jail but a process
within the jail/root may not.
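The walk up to the nearest root can be sketched as follows. This is a toy
Python model, not the kernel code; CpuSet, is_root and system_set() are
hypothetical names, and the sketch assumes the topmost set is always
marked as a root.

```python
class CpuSet:
    def __init__(self, mask, parent=None, is_root=False):
        self.mask = mask
        self.parent = parent
        self.is_root = is_root      # a jail would create a new root


def system_set(node):
    """Walk up from node until a set marked as root is found."""
    while not node.is_root:
        if node.parent is None:
            raise ValueError("hierarchy has no root above this set")
        node = node.parent
    return node
```

With a machine-wide root, a jail root below it, and an assigned set inside
the jail, system_set() on the assigned set stops at the jail root, so the
jailed process never sees the machine-wide set.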
The next level down from the root is the assigned set. The root may be
an assigned set or this may be a subset of the root. Processes may
create sets which are parented back to their root and may include any
processors within their root. The mask of the assigned set is returned
as 'available' processors.
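One way to picture the 'available' query, as a toy Python model under the
assumption that the assigned set is the nearest non-anonymous set above
the thread's current set. CpuSet and available_mask() are illustrative
names, not the real interface.

```python
class CpuSet:
    def __init__(self, mask, parent=None, anonymous=False):
        self.mask = mask
        self.parent = parent
        self.anonymous = anonymous


def available_mask(node):
    """Return the mask of the assigned set for the given set."""
    # Skip the anonymous level, if the thread has one.
    while node.anonymous:
        node = node.parent
    return node.mask
```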
This gives a 1 to 3 level hierarchy. The root, an assigned set, and an
anonymous set. Any of these but the root may be omitted. There is no
current way for userland to create subsets of assigned sets to permit
further nesting. I'm not sure I see value in it right now and it gives
the possibility of unbounded tree depth.
Anonymous sets are immutable because they are shared: a change applies
only to the thread/pid named in the WHICH argument, not to others that
have inherited the set. Anonymous sets have no id and may not be
specifically manipulated via a setid. You must refer to the
process/thread. From the administration point of view they don't exist.
When a set is modified we walk down the children recursively and apply
the new mask. This is done with a global set lock under which all
modifications and tree operations are performed. The td_cpuset pointer
is protected by thread_lock(), and the set itself may be read without a
lock. This gives the possibility for certain kinds of races but I
believe they are all safe.
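The recursive update under a single lock might look like the toy Python
model below. The names cpuset_lock, CpuSet and modify() are hypothetical.
How a child's mask is recomputed is an assumption: here each child keeps
only the CPUs its parent still allows, and the sketch does not handle a
child whose mask would become empty.

```python
import threading

cpuset_lock = threading.Lock()      # stand-in for the global set lock


class CpuSet:
    def __init__(self, mask, parent=None):
        self.mask = mask
        self.children = []          # parent tracks every derived set
        if parent is not None:
            parent.children.append(self)


def _apply(node, mask):
    node.mask = mask
    for child in node.children:
        # Assumption: a child keeps only CPUs still present in its parent.
        _apply(child, child.mask & mask)


def modify(node, mask):
    """Change a set and push the change down to all of its descendants."""
    with cpuset_lock:               # all tree operations happen under the lock
        _apply(node, mask)
```

Because every descendant is updated at modification time, a reader only
ever needs to look at the mask of the one set it points to.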
Hopefully I explained that well enough for people to follow. I realize
it's a lot of text but it's fairly simple bookkeeping code. This is all
implemented and I'm debugging now.
One place I'd like to implement CPU affinity is in the Sun Grid Engine
execution daemon. I think anonymous sets would not be sufficient there
because the model allows new tasks to be started on a particular node at
any time during a parallel job. I'd have to do some more digging in the
code to be entirely certain. I think the fewer limits we place on the
hierarchy, the better off we'll be unless there are compelling complexity
reasons to avoid them.
With the anonymous set you can bind any thread to any cpu that is visible
to it. How would this not work?
I'm still trying to wrap my head around the anonymous sets. Is the idea
that once you are in an anonymous set, you can't expand it, or can you
expand out as far as the assigned set? I'd like for parallel jobs to
be allocated a set of cpus that they can't change, but still be able
to make their own decisions about thread affinity if they desire (for
example OpenMPI has some support for this so processes stay put and in
theory benefit from positive cache effects). If that's feasible in
this model, I'm happy with it. I think we should keep in mind that these
SGE execution daemons might be sitting inside jails. ;-)
Ah, when I said the anonymous sets were immutable, that only means that
they are copy-on-write. Because you can't know who shares a copy via fork
or thread creation you must make a new set each time you write.
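A small illustration of the copy-on-write behaviour, again as a toy Python
model with made-up names (AnonSet, Thread). A frozen dataclass stands in
for "immutable", and every write produces a new set object instead of
editing the shared one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AnonSet:
    mask: int                       # cannot be reassigned once created


class Thread:
    def __init__(self, cpuset):
        self.cpuset = cpuset

    def fork(self):
        # The child shares the same set object; nothing is copied here.
        return Thread(self.cpuset)

    def setaffinity(self, mask):
        # Unknown sharers may exist, so a write always makes a new set.
        self.cpuset = replace(self.cpuset, mask=mask)
```

After a fork, a setaffinity() call in the child leaves the parent's set
untouched, which is the property the shared set needs.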
I made the anonymous sets so that the parent would have a list of all
derivative children sets so that modifications to the parent would be
reflected in the child. This also means that the scheduler only has to
look at one bitmap to determine the available cpus for a thread.
I think the anonymous sets seem like a good idea. One solution to my
problem might be to make changing your current set to something that is
not a subset of your parent (or maybe your current set?) a privileged
operation.
-- Brooks
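The subset rule suggested above could be checked with plain bitmask
arithmetic. This is only a sketch of the idea in Python; may_set_mask()
and its privileged flag are hypothetical, and rejecting an empty mask is
an extra assumption.

```python
def may_set_mask(new_mask, parent_mask, privileged=False):
    """Allow unprivileged changes only within the parent's CPUs."""
    if new_mask == 0:
        return False                # assumption: an empty set is never valid
    within_parent = (new_mask & ~parent_mask) == 0
    return within_parent or privileged
```

For example, with a parent mask of 0b0111 an unprivileged request for
0b0011 is allowed, while a request for 0b1000 needs privilege.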