Re: pax misbehavior




Sorry for top-posting, but I am replying to myself and the context is
rather lengthy.

It seems the issue is that our pax has an internal heuristic that
applies -s transformations not only to file names, but also to hard link
and symlink targets.

On one hand this seems beneficial; on the other hand it can lead to
confusion, because symlink targets can be relative, so they can match
patterns that were written with normal file pathnames in mind. What
makes this behavior even less obvious is that if a link target is
transformed into an empty string, the link is omitted altogether. This,
of course, makes a certain sense: there cannot be a link without any
target at all. On the other hand, POSIX explicitly gives one and only
one reason to omit a file: when its _name_ is transformed to an empty
string. So this looks like a POSIX violation and unexpected behavior.
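
To make the effect concrete, here is a rough model of the behavior
described above. It assumes that sed's s command is a fair stand-in for
pax's -s matching; the function names apply_rules and process_entry are
made up for illustration, and none of this is taken from the pax source.

```shell
#!/bin/sh
# Model only: mimics the observed behavior, not the actual pax source code.

# Apply pax-style -s rules to one string; the first rule that substitutes wins.
# Usage: apply_rules STRING RULE...   (each RULE looks like '#old#new#')
apply_rules() {
    str=$1; shift
    for rule in "$@"; do
        # sed -n ... p prints a line only if the substitution happened; the
        # second sed prefixes '=' so an empty result differs from "no match".
        out=$(printf '%s\n' "$str" | sed -n "s${rule}p" | sed 's/^/=/')
        if [ -n "$out" ]; then
            printf '%s\n' "${out#=}"
            return 0
        fi
    done
    printf '%s\n' "$str"    # no rule matched: the string is left unchanged
}

# Decide what happens to one archive member.
# Usage: process_entry NAME TARGET RULE...   (TARGET is '' for non-links)
process_entry() {
    name=$1 target=$2; shift 2
    new_name=$(apply_rules "$name" "$@")
    if [ -z "$new_name" ]; then
        echo "skip $name (name became empty)"
        return 0
    fi
    if [ -n "$target" ]; then
        # The rules are applied to the link target as well.
        new_target=$(apply_rules "$target" "$@")
        if [ -z "$new_target" ]; then
            echo "skip $name (link target became empty)"
            return 0
        fi
        echo "keep $new_name => $new_target"
    else
        echo "keep $new_name"
    fi
}

process_entry xxxxx/yyyyy   ''    '#xxxxx#zzzzz#' '#.*##'
process_entry xxxxx/yyyyy.0 yyyyy '#xxxxx#zzzzz#' '#.*##'
```

In this model the regular file is kept because its name is caught by the
first rule, while the symlink is dropped because its relative target
"yyyyy" only matches the kill-all rule.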

I have several proposals on fixing this situation:
1. Since modifying link targets is something POSIX is silent about, it
appears to be an extension, and it would be nice to provide extended
options to turn this behavior on/off (and maybe control some aspects of
it). AIX pax, for instance, doesn't do that. Solaris and Linux seem to
have the same behavior.
2. Regardless of whether #1 is implemented, I think the pax man page
should describe this behavior and even warn about it.
3. The symlink target modification heuristic may be updated to exclude
the most trivial and probably most widespread case of a symlink into the
same directory, i.e. one whose target doesn't contain any '/'.
4. The symlink target modification heuristic may be updated to leave the
link target alone if its substitution results in an empty string (rather
than throwing the symlink out, as is done now).
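
Proposals 3 and 4 could look roughly like the sketch below. It is only an
illustration: sed stands in for pax's -s matching, and the names
first_match_subst and transform_target are hypothetical, not functions
from the pax source.

```shell
#!/bin/sh
# Hypothetical sketch of proposals 3 and 4; not taken from the pax source.

# First matching pax-style rule wins (each RULE looks like '#old#new#').
first_match_subst() {
    str=$1; shift
    for rule in "$@"; do
        # '=' prefix distinguishes "matched, result empty" from "no match"
        out=$(printf '%s\n' "$str" | sed -n "s${rule}p" | sed 's/^/=/')
        if [ -n "$out" ]; then
            printf '%s\n' "${out#=}"
            return 0
        fi
    done
    printf '%s\n' "$str"
}

# Usage: transform_target TARGET RULE...
transform_target() {
    target=$1; shift
    case $target in
        */*) ;;                          # has a directory part: apply rules
        *)   printf '%s\n' "$target"     # proposal 3: same-directory link,
             return 0 ;;                 # leave the target untouched
    esac
    new=$(first_match_subst "$target" "$@")
    if [ -z "$new" ]; then
        printf '%s\n' "$target"          # proposal 4: keep the original target
    else
        printf '%s\n' "$new"
    fi
}

transform_target yyyyy          '#xxxxx#zzzzz#' '#.*##'
transform_target ../xxxxx/yyyyy '#xxxxx#zzzzz#' '#.*##'
transform_target ../other/file  '#xxxxx#zzzzz#' '#.*##'
```

With these two rules the first target stays "yyyyy", the second is
rewritten to "../zzzzz/yyyyy", and the third stays "../other/file"
instead of causing the link to be dropped.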

There is, of course, a workaround for my particular case, which is to
never use the kill-all substitution -s '#.*##', but instead to
explicitly list all archive hierarchy roots, like -s '#^root1/.*##' -s
'#^root2/.*##' ...
But even then there might be some unpleasant and hard-to-debug surprises
with other patterns being applied where no one expected them to be
applied.
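
A quick way to see which strings an anchored pattern would touch is to
try it with sed, assuming sed's matching is close enough to what pax
does for -s. The helper name matches is made up for this example.

```shell
#!/bin/sh
# Report whether a sed-style substitution would fire on a string.
# sed stands in for pax's -s matching here; this is an approximation.
# Usage: matches STRING RULE   (RULE looks like '#^root1/.*##')
matches() {
    # sed -n ... p prints only on a successful substitution; the '=' prefix
    # keeps an empty result from looking like "no match".
    [ -n "$(printf '%s\n' "$1" | sed -n "s${2}p" | sed 's/^/=/')" ]
}

for s in root1/file yyyyy ../root1/file; do
    if matches "$s" '#^root1/.*##'; then
        echo "$s: matched"
    else
        echo "$s: untouched"
    fi
done
```

A same-directory target such as "yyyyy" is left alone by the anchored
pattern, but a relative target that happens to start with "root1/" would
still match, which is the kind of surprise mentioned above.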

on 20/09/2007 19:09 Andriy Gapon said the following:
Preparation first:
$ mkdir xxxxx
$ cd xxxxx/
$ touch yyyyy
$ ln -s yyyyy yyyyy.0
$ ln -s yyyyy.0 yyyyy.0.0
$ cd ..

Demonstration of expected behavior:
$ pax -w -f xxxxx.tar -s "#xxxxx#zzzzz#" xxxxx
$ pax -vf xxxxx.tar
drwxr-xr-x 2 ... 0 20 Sep 18:51 zzzzz
-rw-r--r-- 1 ... 0 20 Sep 18:51 zzzzz/yyyyy
lrwxr-xr-x 1 ... 0 20 Sep 18:51 zzzzz/yyyyy.0 => yyyyy
lrwxr-xr-x 1 ... 0 20 Sep 18:51 zzzzz/yyyyy.0.0 => yyyyy.0
pax: ustar vol 1, 4 files, 10240 bytes read, 0 bytes written.

Demonstration of misbehavior:
$ pax -w -f xxxxx.tar -s "#xxxxx#zzzzz#" -s "#.*##" xxxxx
$ pax -vf xxxxx.tar
drwxr-xr-x 2 ... 0 20 Sep 18:51 zzzzz
-rw-r--r-- 1 ... 0 20 Sep 18:51 zzzzz/yyyyy
pax: ustar vol 1, 2 files, 10240 bytes read, 0 bytes written.


The only thing added in the second test is the -s "#.*##" option _after_
the first -s option. Mysteriously, it caused all symlinks to be left out
of the archive. But this should not happen if the behavior in the first
test is correct and pax follows the POSIX specification: if an entry is
handled by the first -s (which it was in the first test), then further
-s options should not be applied to it. Our man page also says so:

Multiple -s expressions can be specified. The expressions are
applied in the order they are specified on the command line,
terminating with the first successful substitution.

Of course, this synthetic test is a simplification of something done for
a real task with a real purpose. -s "#.*##" is meant to exclude from an
archive all "other" files and the side-effect of excluding symlinks as
well is very unfortunate.

Should I file a PR?



--
Andriy Gapon
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


