final decision about *at syscalls



Dear arch@

Over the summer I worked (among other things) on the *at family of syscalls,
kindly sponsored by Google (in their Summer of Code). The resulting patch is
almost finished, but I need to settle one design question. If you are not interested
in *at/namei, feel free to skip this mail.

The *at syscalls are a thread-oriented extension to the basic file syscalls (think
of open(), fstat(), etc.) that adds the ability to specify where the lookup
of a relative path should start.

Imagine that we have /tmp/foo/bar

and the CWD is set to "/tmp/", and the process has opened "foo" as dirfd. With the ordinary
open() syscall you have to do either

chdir("/tmp/foo");open("./bar");

or

open("/tmp/foo/bar");

The first approach is problematic because it changes the CWD for all threads in the process;
the second is prone to race conditions, as some components of the path can
change in parallel with the open().

So POSIX introduced a new API, called "Extended API set part 2, ISBN: 1-931624-67-4" (at
least this was the latest when I last looked), which solves that by introducing "*at"
syscalls that take an fd of a previously opened directory, which is used instead of the CWD
as the starting point for relative path lookup, i.e. the previous example becomes

dirfd = open("/tmp/foo"); openat(dirfd, "bar");
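
For readers who want to try this from userland, here is a minimal sketch of the same pattern in C. The helper name open_in_dir() is made up for illustration and is not part of the patch; the flags are just one plausible choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open "name" relative to the directory "dir_path"
 * without changing the process-wide CWD.
 * Returns a file descriptor, or -1 on error.
 */
int open_in_dir(const char *dir_path, const char *name)
{
    /* Pin the directory first; later lookups start from this fd. */
    int dirfd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    /* A relative "name" is resolved starting at dirfd, not at the CWD. */
    int fd = openat(dirfd, name, O_RDONLY);

    close(dirfd);
    return fd;
}
```

With the layout above, open_in_dir("/tmp/foo", "bar") would open /tmp/foo/bar while other threads keep whatever CWD they had.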

I implemented the whole API as native FreeBSD syscalls and also in the linuxulator emulation layer.
Here's the problem:

There are two approaches to the translation from the file descriptor to the vnode.

1) we can do it in the kern_fooat() function and pass namei() the resulting vnode
2) we can pass namei() the file descriptor and do the translation there
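
To make the difference in interfaces concrete, here is a toy userland model of the two approaches. Everything in it (the toy_ names, the structures, the trivial lookup) is invented for illustration; it is not the real namei() or any code from the patch, and it only shows where the descriptor-to-vnode translation lives.

```c
#include <stddef.h>

#define TOY_MAXFILES 8

/* Toy stand-ins; these are NOT the real FreeBSD structures. */
struct toy_vnode { const char *name; };
struct toy_thread { struct toy_vnode *fdtable[TOY_MAXFILES]; };

/* Shared lookup core. A real lookup would walk the path components
 * from "start"; this one only checks that its inputs are usable. */
static int toy_lookup(struct toy_vnode *start, const char *path)
{
    return (start != NULL && path != NULL && path[0] != '\0') ? 0 : -1;
}

/* Approach 1: the lookup routine sees only a vnode and knows
 * nothing about threads or descriptor tables. */
int toy_namei_vnode(struct toy_vnode *start, const char *path)
{
    return toy_lookup(start, path);
}

/* Approach 2: the lookup routine receives the thread and the
 * descriptor and does the translation itself. */
int toy_namei_fd(struct toy_thread *td, int dirfd, const char *path)
{
    if (dirfd < 0 || dirfd >= TOY_MAXFILES)
        return -1;
    return toy_lookup(td->fdtable[dirfd], path);
}

/* Approach 1 caller: each kern_fooat()-style function repeats the
 * translation before calling the lookup. */
int toy_kern_openat_v1(struct toy_thread *td, int dirfd, const char *path)
{
    if (dirfd < 0 || dirfd >= TOY_MAXFILES)
        return -1;
    return toy_namei_vnode(td->fdtable[dirfd], path);
}

/* Approach 2 caller: simply forwards the descriptor. */
int toy_kern_openat_v2(struct toy_thread *td, int dirfd, const char *path)
{
    return toy_namei_fd(td, dirfd, path);
}
```

The model is single-threaded, so it does not reproduce any race; it only illustrates which layer needs access to the thread.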

PROs of #1:

o namei() does not need to know about curthread, so the *at
ability can be used for other purposes; it's cleaner (imho)

PROs of #2:

o raceless implementation
o no code duplication

CONs of #1:

o some very small code duplication (the translation is done in every
kern_fooat() function)
o there is a race between the translation and the actual use of its
result that needs to be handled: the "path_to_file" string is copied
into kernel space twice, hence a race

CONs of #2:

o namei() is made thread-dependent

Please tell me which approach you prefer. I personally favour #1 because I don't like namei()
being thread-dependent; Kostik Belousov prefers #2.

I'd like to change the current patch to whatever you decide is best (it currently implements #1)
and finally submit it for committing.

Thank you,

Roman Divacky
_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"


