Re: trying to recursively get the files' owners and permissions as well as an md5sum of the data



2008-06-17, 00:21(-04), Albretch Mueller:
> Also, I have read somewhere that coding like this:
> ~
> sh-3.1# md5sum `find . -type f -print0 | xargs -0`
> ~
> is better than doing it like:
> ~
> sh-3.1# find . -type f -print0 | xargs -0 md5sum
> ~
> I actually read what this guy said. (S)He didn't say "faster" or "less
> memory taxing", which are both measurable, but "better" because md5sum is
> loaded into memory only once
> ~
> I don't really know how the OS handles this, so I am asking
[...]

That's nonsense.

find ... -print0 | xargs -0 cmd

Tells find to output each file name followed by the NUL
character. The NUL character is the one character that cannot
occur in a file path on Unix. xargs -0 tells xargs to split its
input on the NUL character and then pass each element resulting
from the splitting to the command as a separate argument, packing
as many of them into each invocation as the system's limit on
the argument list allows. So cmd gets one argument per file
found by find, which is fine, and it is run only once unless the
list is very long. The only improvement one might suggest is to
also use the -r option (also GNU specific) to xargs so that it
doesn't run cmd at all if its input is empty (if find didn't
find any file).
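A small sketch to check this, assuming GNU find and xargs (for -print0, -0 and -r) and mktemp(1). The scratch directory and file names are made up for the demo, and `sh -c 'echo "$#"' sh` is only a stand-in command that reports how many arguments it received:

```shell
#!/bin/sh
# Sketch: assumes GNU find/xargs (-print0, -0, -r) and mktemp.
set -eu

dir=$(mktemp -d)              # scratch directory for the demo
trap 'rm -rf "$dir"' EXIT     # removed when the script exits
mkdir "$dir/files" "$dir/empty"

# Two legal but awkward file names: one with spaces, one with a newline.
printf 'one\n' > "$dir/files/name with  spaces"
printf 'two\n' > "$dir/files/name with
newline"

# Stand-in command: prints its argument count once per invocation.
counts=$(find "$dir/files" -type f -print0 |
         xargs -0 -r sh -c 'echo "$#"' sh)
echo "arguments per invocation: $counts"   # one run, 2 arguments

# The real job: one md5sum run, one argument per file.
find "$dir/files" -type f -print0 | xargs -0 -r md5sum

# Empty input: with -r the command is not run at all. Without -r, GNU
# xargs would run it once with no file argument, and md5sum would then
# print a checksum of its (empty) standard input.
runs=$(find "$dir/empty" -type f -print0 | xargs -0 -r sh -c 'echo run' sh)
echo "runs on empty input: ${runs:-none}"
```

The argument count shows a single invocation receiving both names intact, which is the "md5sum loaded only once" property the quoted claim was after.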

sh-3.1# md5sum `find . -type f -print0 | xargs -0`

couldn't be more wrong.

Here, as no command is provided, xargs calls the "echo" command
instead. So the files found by find will be passed as arguments
to echo. echo is a command that outputs its arguments separated
by space characters and followed by a newline. Depending on the
implementation, it may also transform some of those arguments;
for instance, a Unix-conformant echo transforms the "\n" string
into a newline character.
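A quick way to see what xargs does when no command is given; this is a sketch, and the backslash line deliberately has no expected output since it depends on which echo xargs ends up running:

```shell
#!/bin/sh
set -eu

# Three NUL-terminated items; the second one contains a space.
# With no command given, xargs runs echo on them.
out=$(printf 'a\0b c\0d\0' | xargs -0)
echo "$out"    # a b c d: no way to tell where "b c" began and ended

# Backslash handling is implementation-dependent: some echo
# implementations print \n literally, others print a newline.
printf 'x\\ny\0' | xargs -0
```

Once echo has joined the items with spaces, the NUL separators that made -print0 safe are gone.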

Then the output of echo (echo may be called several times) is
gathered by the shell (because of `...`) and stored in memory.
When xargs has finished, the *shell* will split all that output.
The splitting in `...` is done on the characters in $IFS, which
by default are space, tab and newline. Then, for every word
resulting from that splitting, the shell performs globbing: every
word that contains wildcard characters such as *, ? or [...] is
expanded to the list of matching file names relative to the
current directory.
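A sketch of that splitting and globbing in a scratch directory, assuming mktemp(1). It uses $(...), the modern spelling of `...`, which is split and globbed the same way; the file name "banana" is made up for the demo:

```shell
#!/bin/sh
# Sketch: word splitting and globbing of an unquoted command substitution.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
touch banana                 # the only file here; its name contains "a"

# The command outputs "one two<TAB>three" and "four *a*" on two lines.
# Unquoted, the output is split on space, tab and newline (default IFS),
# then every word containing a wildcard is globbed.
set -- $(printf 'one two\tthree\nfour *a*\n')
words=$#
fifth=$5
echo "$words words: $*"      # 5 words: one two three four banana

# Quoted, neither splitting nor globbing happens: a single word.
quoted="$(printf 'four *a*')"
echo "$quoted"               # four *a*
```

The *a* word never reaches the command as *a*: it is replaced by whatever files happen to match in the current directory.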

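And then, it will pass that big list as arguments to the md5sum
command. Contrary to xargs, it will not work around the system's
limit on the total size of the argument list, so with enough
files md5sum is not run at all and the shell reports an
"Argument list too long" error.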
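A sketch showing the limit and how xargs splits its input across several runs. The real limit is on bytes, so here -n 3 (a cap on arguments per run) is only a stand-in to make the splitting visible on a tiny input:

```shell
#!/bin/sh
set -eu

# The system limit (in bytes) on the arguments plus environment passed
# to a new program; the value differs from one system to another.
echo "ARG_MAX: $(getconf ARG_MAX)"

# xargs spreads its input over as many runs of the command as needed.
# -n 3 forces a tiny limit here so that the splitting is visible.
runs=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
       xargs -n 3 sh -c 'echo "$#"' sh | paste -s -d ' ' -)
echo "arguments per run: $runs"     # arguments per run: 3 3 3 1
```

With `...` there is no such splitting: the whole list goes into one command line, and it either fits or fails.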

As an example, if you do:

touch 'some
file with *a* newline character in it, \n and plenty of spaces'

find . -type f -print0 will output (find prefixes each path with
"./"):

./some<NL>file with *a* newline character in it, \n and plenty of spaces<NUL>

xargs -0, reading that, will split it into a single argument to
echo:

./some<NL>file with *a* newline character in it, \n and plenty of spaces

echo (one that expands "\n", as described above) will output:

./some<NL>file with *a* newline character in it, <NL> and plenty of spaces<NL>

`...` will split that into these elements:
1 ./some
2 file
3 with
4 *a*
5 newline
6 character
7 in
8 it,
9 and
10 plenty
11 of
12 spaces

The 4th one contains wildcards, so it is subject to globbing. *a*
means any file name that contains "a", and the file happens to
match, so the list becomes:

1 ./some
2 file
3 with
4 some<NL>file with *a* newline character in it, \n and plenty of spaces
5 newline
6 character
7 in
8 it,
9 and
10 plenty
11 of
12 spaces

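And those will be passed as arguments to md5sum: 12 arguments for
a single file. md5sum will checksum the real file (element 4) and
complain that the other 11 don't exist.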
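To reproduce the example, here is a sketch assuming GNU find, xargs and md5sum and mktemp(1). nargs is a made-up helper that just prints its argument count; the exact count for the backquote form depends on whether the echo run by xargs expands "\n", so no single number is claimed for it:

```shell
#!/bin/sh
# Sketch reproducing the example in a scratch directory.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
touch 'some
file with *a* newline character in it, \n and plenty of spaces'

# Hypothetical helper: prints how many arguments it was given.
nargs() { echo "$#"; }

# The backquote form: one file, but many words after splitting and
# globbing (12 or 13 depending on whether echo expands the "\n").
bad=$(nargs `find . -type f -print0 | xargs -0`)
# The find | xargs form: one file, one argument.
good=$(find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "backquote form: $bad arguments; xargs form: $good"

# The real commands: the first one complains about names such as
# "./some" and "file" that don't exist; the second checksums the file.
md5sum `find . -type f -print0 | xargs -0` || true
find . -type f -print0 | xargs -0 md5sum
```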

--
Stéphane