Re: getaffinity/setaffinity and cpu sets.
- From: Jeff Roberson <jroberson@xxxxxxxxxxxxxx>
- Date: Fri, 22 Feb 2008 13:52:54 -1000 (HST)
On Fri, 22 Feb 2008, Brooks Davis wrote:
On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote:
On Thu, 21 Feb 2008, Robert Watson wrote:
On Wed, 20 Feb 2008, Jeff Roberson wrote:
I also have a 'cpuset' command which can run a new program with a given
cpu set, view and modify sets of arbitrary pids. This is all working and
I can supply patches if anyone is interested. I have to implement 4BSD
support before I can commit.
I have a proposal for solaris style processor sets which I think is
simple and sufficient for most cases. It involves the following new
syscalls:
int cpuset(void); int setcpuset(pid_t pid, int setid); int
getcpuset(pid_t pid);
The notion would be that you can create a new numbered cpuset with
cpuset(). You can modify or inspect its affinity with get/setaffinity
above and the CPU_WHICH_SET argument. The cpuset exists as long as there
are members of the set. Sort of like a process group or session. The
{get,set}cpuset calls can inspect or modify the state.
This set would not be modifiable by user processes or by processes in a
jail. It would create the restriction that differs between 'avail' and
'sys' above. Processes would be able to directly bind to any processor
within the set. Changing the set would apply to all processes in the set.
The cpuset would be per-process while the mask is per-thread. Set
membership is inherited on fork().
In solaris sets can be named and have a more complete management api.
I'm not really interested in implementing all of that but I believe what
I have outlined here would be a subset of this and no code/syscalls would
be wasted.
Comments? Objections? I'm fairly pleased with this arrangement now.
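
To make the lifetime rule above concrete (a set exists only as long as it has members, like a process group), here is a minimal userspace sketch. It is not the kernel code: struct toy_set and the toy_set_* functions are hypothetical names invented for illustration, and the mask is limited to 64 CPUs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of a numbered set; not a kernel structure. */
struct toy_set {
    int      id;       /* identifier handed back to the creator */
    uint64_t mask;     /* one bit per CPU, at most 64 CPUs in this toy */
    int      members;  /* number of processes currently in the set */
};

static int toy_next_set_id = 1;

/* Create a set with the creating process as its first member. */
static struct toy_set *toy_set_create(uint64_t mask)
{
    struct toy_set *s = malloc(sizeof(*s));

    if (s == NULL)
        return NULL;
    s->id = toy_next_set_id++;
    s->mask = mask;
    s->members = 1;
    return s;
}

/* A process joins the set, for example by inheriting it on fork(). */
static void toy_set_join(struct toy_set *s)
{
    assert(s != NULL && s->members > 0);
    s->members++;
}

/* A process leaves the set. Returns 1 if that destroyed the set, else 0. */
static int toy_set_leave(struct toy_set *s)
{
    assert(s != NULL && s->members > 0);
    if (--s->members > 0)
        return 0;
    free(s);
    return 1;
}
```

The point of the sketch is only that no explicit destroy call is needed: the set goes away when its last member leaves.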
Just to put a few notes from our conversation on IRC in e-mail:
- I think I'd prefer int cpuset(cpuset_t *set), int getcpuset(pid_t,
cpuset_t *) so that we don't mix up IDs and return values. More recent
interfaces tend to do this, I believe, and it means that the prototype,
even if not the ABI, remains the same if the set identifier changes in
the future.
Ok, this is a good suggestion and I did this. This is actually my
preferred method as well but most syscalls don't follow this pattern and I
was trying to make it look syscallish.
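
A small sketch of the out-parameter style discussed here, in plain userspace C. The names toy_setid_t and toy_cpuset_create are hypothetical, and returning an errno value directly (rather than -1 with errno set, as a real syscall wrapper would) is a simplification made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int toy_setid_t;   /* hypothetical identifier type; could change later */

static toy_setid_t toy_next_id = 1;

/*
 * The return value carries only success or failure. The new identifier is
 * written through the pointer, so IDs and error codes never share a channel.
 */
static int toy_cpuset_create(toy_setid_t *idp)
{
    if (idp == NULL)
        return EINVAL;
    *idp = toy_next_id++;
    return 0;
}
```

If the identifier type ever grows or changes, only the typedef moves; the shape of the prototype stays the same.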
- You don't mention what happens if a process's cpu set changes to
preclude a CPU the process has a thread with affinity for. Online, you
suggested SIGKILL, and I thought maybe a new SIGCPUGONE with a default
SIGKILL action might be a friendlier model. We should see what Solaris and
others do here though. I like the idea that the affinity is a guarantee in
userspace because it means that you can rely on it; I'm OK with the idea
that your thread always runs on the CPUs you have affinity for unless in
the SIGCPUGONE handler :-).
I could also reject changes to the cpuset if they leave a thread with
nothing to run on. It might be confusing for the administrator and hard to
tell them which thread caused the problem. However, it might be nicer than
killing a thread as well.
Another option would be to expel the offending thread from the set that is
in violation and reparent it to the real system root along with a syslog
message or similar. If the administrator addressed the problem with the
set he could then reassign the grouping.
This is what I would most like comments about. Should we have a force
mode? Which of these behaviors sound best to you?
It seems to me that refusing by default and reparenting when forced sounds
right. There might also be some value in adding the ability to signal all
processes/threads bound to a cpu set so you can kill them if that's what you
want to do.
This is where I'm leaning as well: refuse by default, with a force option. The cpuset_signal() would have to walk all processes to determine which ones belong to that set, however, since there are no back pointers between threads and sets. Still, that's not too terrible given that it would be very infrequent.
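
The refuse/force behavior can be sketched in userspace as below. This is an illustration under assumptions, not the implementation: struct toy_thread, toy_set_change and TOY_ROOT_SETID are hypothetical, and a flat array stands in for the walk over all processes that is needed because nothing points from a set back to its members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat view of every thread in the system. */
struct toy_thread {
    int      setid;     /* id of the set the thread currently belongs to */
    uint64_t affinity;  /* CPUs the thread asked to run on */
};

enum toy_result { TOY_OK, TOY_REFUSED };

#define TOY_ROOT_SETID 0   /* stand-in for the real system root set */

/*
 * Change set `setid` to `newmask`. Without `force`, refuse if any member
 * would be left with no CPU. With `force`, stranded threads are expelled to
 * the root set and counted in *moved.
 */
static enum toy_result
toy_set_change(struct toy_thread *threads, size_t n, int setid,
    uint64_t newmask, int force, size_t *moved)
{
    size_t i;

    *moved = 0;
    /* Pass 1: walk everything to look for threads that would be stranded. */
    if (!force) {
        for (i = 0; i < n; i++)
            if (threads[i].setid == setid &&
                (threads[i].affinity & newmask) == 0)
                return TOY_REFUSED;
    }
    /* Pass 2: narrow the survivors and reparent the stranded threads. */
    for (i = 0; i < n; i++) {
        if (threads[i].setid != setid)
            continue;
        if ((threads[i].affinity & newmask) == 0) {
            threads[i].setid = TOY_ROOT_SETID;
            (*moved)++;
        } else {
            threads[i].affinity &= newmask;
        }
    }
    return TOY_OK;
}
```

A cpuset_signal() along the lines suggested above would use the same full walk, delivering a signal to each matching entry instead of editing it.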
- It would be nice to be able to use CPU sets in jail as well, suggesting
a hierarchical model with some sort of tagging so you know what CPU sets
were created in a jail such that you know whether they can be changed in a
jail. While I recognize this makes things a lot more tricky, I think we
should basically be planning more carefully with respect to virtualization
when we add new interfaces, since it's a widely used feature, and the
current set of "stragglers" unsupported in Jail is growing rather than
shrinking.
I have implemented a hierarchical model. Each thread has a pointer to the
cpuset that it's in. If it makes a local modification via setaffinity() it
gets an anonymous cpuset that is a child of the set assigned to the
process. This anonymous set will also be inherited across fork/thread
creation.
In this model presently there are nodes marked as root. To query the
'system' cpus available we walk up from the current node until we find a
root. These are the 'system' set. A thread may not break out of its
system set. A process may join the root set but it may not modify a root
that is a parent. Jails would create a new root. A process outside of the
jail can modify the set of processors in the jail but a process within the
jail/root may not.
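
The root lookup described above can be sketched as a walk up parent pointers. This is a userspace illustration only; struct toy_set, toy_find_root and toy_mask_allowed are hypothetical names, and the real structures surely carry more state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node in the set hierarchy. */
struct toy_set {
    struct toy_set *parent;   /* NULL only for the top-level system root */
    uint64_t        mask;     /* one bit per CPU */
    int             is_root;  /* set on the system root and on each jail root */
};

/* Walk up from a node until a root is found; that root is the 'system' set. */
static const struct toy_set *toy_find_root(const struct toy_set *s)
{
    assert(s != NULL);
    while (!s->is_root) {
        assert(s->parent != NULL);   /* every non-root node has a parent */
        s = s->parent;
    }
    return s;
}

/* A thread may not break out of its system set: the request must fit in it. */
static int toy_mask_allowed(const struct toy_set *s, uint64_t want)
{
    return (want & ~toy_find_root(s)->mask) == 0;
}
```

Because the walk stops at the first root it meets, a process inside a jail sees the jail's root as its system set and never reaches the real one.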
The next level down from the root is the assigned set. The root may be an
assigned set or this may be a subset of the root. Processes may create
sets which are parented back to their root and may include any processors
within their root. The mask of the assigned set is returned as 'available'
processors.
This gives a 1 to 3 level hierarchy. The root, an assigned set, and an
anonymous set. Any of these but the root may be omitted. There is no
current way for userland to create subsets of assigned sets to permit
further nesting. I'm not sure I see value in it right now and it gives the
possibility of unbounded tree depth.
Anonymous sets are immutable as they are shared and changes only apply to
the thread/pid in the WHICH argument and not others which have inherited
from it. Anonymous sets have no id and may not be specifically manipulated
via a setid. You must refer to the process/thread. From the
administration point of view they don't exist.
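
Since anonymous sets are shared and immutable, a local change amounts to copy-on-write. The sketch below models that in userspace. All names are hypothetical, an id of 0 is used to mean "anonymous", and checking the new mask against the assigned set (rather than the old anonymous one) is an assumption on my part, not something stated above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_set {
    struct toy_set *parent;  /* the assigned set, for an anonymous set */
    uint64_t        mask;
    int             id;      /* 0 means anonymous: no id visible to admins */
    int             refs;    /* threads sharing an anonymous set */
};

struct toy_thread {
    struct toy_set *set;
};

/* A new thread or child process shares its parent's set. */
static void toy_inherit(struct toy_thread *child, const struct toy_thread *parent)
{
    child->set = parent->set;
    if (child->set->id == 0)
        child->set->refs++;
}

/*
 * Local change for one thread: never edit the shared anonymous set in place.
 * Allocate a fresh one under the assigned set and drop the old reference.
 * Returns 0 on success, -1 if the mask is empty, out of range, or no memory.
 */
static int toy_setaffinity(struct toy_thread *td, uint64_t mask)
{
    struct toy_set *old = td->set;
    struct toy_set *base = (old->id == 0) ? old->parent : old;
    struct toy_set *anon;

    if (mask == 0 || (mask & ~base->mask) != 0)
        return -1;
    anon = malloc(sizeof(*anon));
    if (anon == NULL)
        return -1;
    anon->parent = base;
    anon->mask = mask;
    anon->id = 0;
    anon->refs = 1;
    td->set = anon;
    if (old->id == 0 && --old->refs == 0)
        free(old);   /* last user of the old anonymous set is gone */
    return 0;
}
```

Other threads that inherited the old anonymous set keep seeing the old mask, which is the behavior described above.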
When a set is modified we walk down the children recursively and apply the
new mask. This is done with a global set lock under which all
modifications and tree operations are performed. The td_cpuset pointer is
protected under the thread_lock() and the set itself may be read without a lock. This
gives the possibility for certain kinds of races but I believe they are all
safe.
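
The recursive walk can be sketched as follows. This is a userspace model with hypothetical names (struct toy_set, toy_apply_mask, toy_set_modify) and a fixed-size child array. It omits locking entirely, where the real code would hold the global set lock for the whole walk, and it does not handle a child whose mask becomes empty, which is the open stranding question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical set node that knows its children. */
struct toy_set {
    uint64_t        mask;
    struct toy_set *children[TOY_MAX_CHILDREN];
    size_t          nchildren;
};

/* Restrict a node to `allowed`, then push its new mask down to its children. */
static void toy_apply_mask(struct toy_set *s, uint64_t allowed)
{
    size_t i;

    assert(s != NULL && s->nchildren <= TOY_MAX_CHILDREN);
    s->mask &= allowed;
    for (i = 0; i < s->nchildren; i++)
        toy_apply_mask(s->children[i], s->mask);
}

/* Modify a set: install the new mask, then walk down all descendants. */
static void toy_set_modify(struct toy_set *s, uint64_t newmask)
{
    size_t i;

    assert(s != NULL && s->nchildren <= TOY_MAX_CHILDREN);
    s->mask = newmask;
    for (i = 0; i < s->nchildren; i++)
        toy_apply_mask(s->children[i], newmask);
}
```

With the 1 to 3 level hierarchy described above the recursion depth is bounded, so a plain recursive walk is enough here.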
Hopefully I explained that well enough for people to follow. I realize
it's a lot of text but it's fairly simple bookkeeping code. This is all
implemented and I'm debugging now.
One place I'd like to implement CPU affinity is in the Sun Grid Engine
execution daemon. I think anonymous sets would not be sufficient there
because the model allows new tasks to be started on a particular node at
any time during a parallel job. I'd have to do some more digging in the
code to be entirely certain. I think the fewer limits we place on the
hierarchy, the better off we'll be unless there are compelling complexity
reasons to avoid them.
With the anonymous set you can bind any thread to any cpu that is visible to it. How would this not work?
- There's still no way to specify an affinity policy rather than explicit
affinity, but if our CPU set model is sufficiently general, that might be
a vehicle to do that. I.e., cpuset_setpolicy() rather than setting a mask.
Yes, I think this is orthogonal and can be addressed separately. I'm not
sure how many userland programs are smart enough or even capable of making
determinations about their cache behavior however. We should open another
discussion once this one is done.
- In the interests of boring API changes, recent APIs tend to prefix the
method on the object name. Have you thought about cpuset_create(),
cpuset_foo(), etc? That reduces the chances of interfering with
application namespaces. I think, anyway. :-).
Yes, I prefer that as well. As I mentioned, syscalls tended to favor
brevity. I'm fine with changing that trend.
I need to ponder the proposal a little more, ideally over a hot beverage
this morning, and will follow up if I have further thoughts. Thanks for
working on this, BTW -- affinity is well-overdue for FreeBSD.
A little more to ponder now! Your feedback is much appreciated.
I believe the present hierarchical model satisfies the jail requirements of
restricting cpus in the jail while still allowing the jail to create sets.
The unanswered questions are:
1) What to do about sets that strand threads, options described above.
2) Are people ok with the transient nature of sets?
3) Does anyone want to help with man pages, administrative tools, etc? I
have a prototype tool called 'cpuset' that fully exercises the api but is
probably ugly. Will post details soon.
I could help with some of this as it furthers a funded project at work.
I will provide patches soon. It would be great to have a developer with a user's perspective to look at some of the details and especially the administration side of things. I think someone else has offered to help with man pages but I need to double check.
Thanks,
Jeff
-- Brooks