Re: Command piping question



In article <1139436456.277381.176770@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>, BD <bobby_dread@xxxxxxxxxxx> wrote:

Basic ksh question here... there's a principle in command redirection
which would make my life easier if I understood.

Say I want to delete all files in a directory that have properties that
I can see through an 'ls -al'.

For example, if I wanted to show all files with a timestamp of 12:##, I
could run

ls -al |grep 12:

Can I pipe the output somehow to delete files based on the same
criteria, as in

ls -ls |grep 12: >rm

I know that's not correct, but can I accomplish this somehow with
redirection or piping?

Yes, you can. One way -- but not the best way:

$ ls -al | grep '12:' | awk '{print $NF}' | xargs rm

The pipeline whittles the listing down to the matching lines, extracts
the filename from each one, then tells rm to nuke the whole pile of
matching files in one swoop.

A safer method would be:

$ ls -al | awk '$8 ~ /^12:/ {print $NF}' | xargs rm

What does the output of ls -al look like? Lines like:

-rw-r--r-- 1 root system 90 Jan 15 16:22 userfs.list.08

In awk, the default field separator (FS) is a single space, which awk
treats specially: fields are split on any run of spaces or tabs.

So you can see that there are 9 fields here.

The eighth field has the time.
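If you want to check the field numbering yourself, here is a small
sketch. It feeds the sample line from above to awk through printf, so no
real files are involved; the variable names are just ones I picked.

```shell
# Sample line copied from the "ls -al" output shown above (not a real file).
line='-rw-r--r-- 1 root system 90 Jan 15 16:22 userfs.list.08'

# NF is awk's count of fields on the current line.
field_count=$(printf '%s\n' "$line" | awk '{ print NF }')

# $8 is the eighth field, which holds the time in this layout.
time_field=$(printf '%s\n' "$line" | awk '{ print $8 }')

# Runs of spaces and tabs count as a single separator.
padded_count=$(printf 'a    b\tc\n' | awk '{ print NF }')

echo "fields=$field_count time=$time_field padded=$padded_count"
```

Note that the column holding the time can differ between systems and
locales, so it is worth running a check like this against your own ls
output before trusting $8.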

The awk program is the text between the two single quotes (').

$8 ~ /^12:/

means:

"If the 8th field matches something that starts with 12:"

and...

{print $NF}

means:

"...then print out the final field (the filename)"

$NF will always be the final field. Doesn't matter how many fields a
line may have... $NF is *ALWAYS* the last field on the line.

Here the last field is the filename, provided the name contains no
spaces or tabs.
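A quick sketch of $NF in action, using made-up input rather than real
files. The second half assumes a hypothetical file called "my file.txt"
to show the one catch: awk splits on the space, so $NF is only the last
word of the name.

```shell
# Made-up lines with different numbers of fields.
last_fields=$(printf 'one two three\nalpha beta\n' | awk '{ print $NF }')
echo "$last_fields"

# A hypothetical listing line for a file named "my file.txt".
spaced='-rw-r--r-- 1 root system 90 Jan 15 12:22 my file.txt'
last_word=$(printf '%s\n' "$spaced" | awk '{ print $NF }')
echo "$last_word"
```

So the ls | awk approach is fine for ordinary names, but it should not
be trusted in a directory where filenames may contain whitespace.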

The ls -l | awk method is better than ls -l | grep because awk checks
only the time field... grep checks against the entire line, so it's not
as safe.

What if you had a filename like 12:05.txt (a legal filename) created at 9am?

grep would include it.
awk would not.

So awk is safer and more bulletproof in this situation.
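To see the difference without touching any files, here is a sketch that
runs both filters over two made-up listing lines (the filenames and
times are invented for the illustration):

```shell
# Two made-up lines in "ls -l" format: a file named 12:05.txt last
# modified at 09:00, and a file named notes.txt last modified at 12:30.
listing='-rw-r--r-- 1 root system 90 Jan 15 09:00 12:05.txt
-rw-r--r-- 1 root system 90 Jan 15 12:30 notes.txt'

# grep matches "12:" anywhere on the line, so both files are selected.
grep_hits=$(printf '%s\n' "$listing" | grep '12:' | awk '{ print $NF }')

# awk tests only field 8 (the time), so only notes.txt is selected.
awk_hits=$(printf '%s\n' "$listing" | awk '$8 ~ /^12:/ { print $NF }')

echo "grep selects: $(printf '%s' "$grep_hits" | tr '\n' ' ')"
echo "awk selects:  $awk_hits"
```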

Why use xargs? Because rm itself doesn't take filenames to nuke from
the standard input (stdin) -- in other words, it can't use the piped
output.

Example:

$ echo foo.txt bar.txt | rm

will not work for this reason.

But:

$ echo foo.txt bar.txt | xargs rm

will work -- xargs collects the output from the pipe, then runs "rm
foo.txt bar.txt".

In other words, xargs turns its standard input into command-line
arguments for rm.
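A harmless way to try both cases is sketched below. It relies on the
trick of putting echo in front of rm, so xargs prints the command it
builds instead of running it; foo.txt and bar.txt do not need to exist.

```shell
# rm ignores standard input; with no filename operands it exits with an error.
if echo foo.txt bar.txt | rm 2>/dev/null; then
    rm_from_stdin=worked
else
    rm_from_stdin=failed
fi

# Dry run: putting "echo" in front of rm prints the command xargs builds
# instead of running it, so nothing is deleted.
built=$(echo foo.txt bar.txt | xargs echo rm)

echo "rm reading a pipe: $rm_from_stdin"
echo "xargs would run:   $built"
```

The same "xargs echo rm" dry run is a good habit before pointing any
pipeline at real files: check the printed command, then drop the echo.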

xargs is also good because it can break down really long lists of
arguments (filenames in this case) into chunks that will not exceed the
maximum length of arguments for a single command.

What if you matched 2,000 filenames, but only about 500 fit on a single
rm command line? If you tried to hand all 2,000 to rm at once, the
command would fail, typically with an "Argument list too long" error,
and nothing would be deleted.

xargs avoids that problem by figuring out about how many arguments it
can stuff onto one command line... let's say, 500 arguments for rm. If
it has 2,100 arguments... then it calls rm five times (500, 500, 500,
500, 100).

It runs rm far fewer times than an approach that runs rm once for every
single file, like 'find ... -exec rm {} \;' does.

That 'find' approach would run rm 2,100 times for 2,100 files... xargs
might run rm 5 times for 2,100 files. Guess which is much faster to run?
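For the curious, here is a rough sketch that counts invocations in a
scratch directory. It assumes mktemp -d is available (it is on most
systems, but it is not strictly standard), and it uses echo in place of
rm so each output line represents one run of the command. It also shows
the "-exec ... {} +" form of find, which batches arguments much like
xargs does.

```shell
# Scratch directory with five empty files (names are arbitrary).
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c" "$dir/d" "$dir/e"

# "echo" stands in for rm; each line of output is one invocation.
per_file=$(find "$dir" -type f -exec echo {} \; | wc -l | tr -d '[:space:]')
batched=$(find "$dir" -type f -exec echo {} + | wc -l | tr -d '[:space:]')
via_xargs=$(find "$dir" -type f | xargs echo | wc -l | tr -d '[:space:]')

echo "one per file: $per_file, find with +: $batched, xargs: $via_xargs"

# Clean up the scratch directory.
rm -r "$dir"
```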

It's not a big deal with a small directory, but if you use xargs, it'll
automatically be faster the day someone uses your code on a huge
directory without having to change a single thing.

So if you plan ahead and use ls|awk|xargs rm, your code will be more
likely to run correctly in odd situations, and will work fine on small,
medium sized, or huge directories.

This is also a good way to illustrate piping concepts.

-Dan