Re: sort + head = weirdness?

From: Barry Margolin (barry.margolin_at_level3.com)
Date: 10/02/03


Date: Wed, 01 Oct 2003 23:03:14 GMT

In article <pan.2003.10.01.22.14.55.821407@livebridge.com>,
Mark <unixadmin@livebridge.com> wrote:
>This is really weird. I'm catting a text file, piping it through
>'sort' and then through 'head' and I'm getting really strange results.
>
>This only affects files > 4096 bytes.
>It's only a problem when piping through 'head'; 'tail' works fine.
>It's only a problem when piping through a file containing 'sort'; it
>works fine with 'sort' itself.
>
>(The test file is just a text file containing the numbers 1-9 repeated.)
>('format' is a file containing the single word 'sort'. Eventually this
>will contain other formatting commands.)
>
>So, here's what happens:
>
>With a file 4096 bytes or less it works fine.
># ls -al testfile
>-rw-rw-r-- 1 root root 4096 Oct 1 14:46 testfile
>
># cat testfile | format | head -1
>12345
>
>If I add one more character to the file, it fails.
># ls -al testfile
>-rw-rw-r-- 1 root root 4097 Oct 1 14:38 testfile
>
># cat testfile | format | head -1
>123456
>/usr/local/bin/drs/format: line 1: 8688 Broken pipe sort

The "head" command exits as soon as it has printed the specified number of
lines, which closes the reading end of the pipe. When sort then tries to
write more output, it gets this error (the kernel sends it a SIGPIPE signal,
which kills it).

It starts at 4096 bytes because that is stdio's default buffer size. For a
small file, sort's entire output is written in a single buffer, so nothing
is written after "head" has exited. But for a larger file, sort gets this
error when it tries to flush the second buffer.
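
To make the buffering argument concrete, here is a small simulation in Python, chosen only for illustration. It is a sketch under stated assumptions: one process plays both "sort" and "head -1" over a real pipe, the 4096-byte buffer size is taken from the explanation above rather than measured, and the name sort_then_head is hypothetical.

```python
import os

BUF_SIZE = 4096  # stdio buffer size assumed from the explanation above


def sort_then_head(output: bytes, buf_size: int = BUF_SIZE) -> tuple[bytes, bool]:
    """Simulate `sort | head -1` in one process over a real pipe.

    Returns the line "head" prints and whether "sort" hit a broken pipe.
    """
    if not output:
        return b"", False
    read_fd, write_fd = os.pipe()
    first_line = b""
    broken = False
    try:
        # "sort" flushes its first buffer (all of its output, if it fits).
        os.write(write_fd, output[:buf_size])
        # "head -1" reads what is available, keeps the first line and exits,
        # closing the only read end of the pipe.
        first_line = os.read(read_fd, buf_size).split(b"\n", 1)[0]
        os.close(read_fd)
        # "sort" flushes any remaining buffers into a pipe nobody reads.
        for start in range(buf_size, len(output), buf_size):
            os.write(write_fd, output[start:start + buf_size])
    except BrokenPipeError:
        # CPython ignores SIGPIPE, so the failed write() shows up as EPIPE;
        # sort itself would normally be killed by SIGPIPE instead.
        broken = True
    finally:
        os.close(write_fd)
    return first_line, broken


print(sort_then_head(b"12345\n" * 600))   # (b'12345', False): 3600 bytes, one buffer
print(sort_then_head(b"123456\n" * 700))  # (b'123456', True): 4900 bytes, second flush fails
```

Output of exactly 4096 bytes (for example b"1234567\n" * 512) also comes back without an error, matching the 4096-byte file in the original post.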

>If, with the same 4097 byte file I replace 'head' with 'tail', it works
>fine.
># cat testfile | format | tail -1
>123456789

Tail has to read all of its input before it can print anything, because
that's the only way to know when it has gotten to the last line. So it
can't exit until after sort has finished all its writing.
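
The same kind of single-process simulation, again in Python and with a hypothetical function name (sort_then_tail), shows why "tail" is immune: it keeps draining the pipe and only closes its end after end-of-file, so every write from "sort" still has a reader. This is a sketch, not how tail is actually implemented.

```python
import os


def sort_then_tail(output: bytes, buf_size: int = 4096) -> tuple[bytes, bool]:
    """Simulate `sort | tail -1`; return the last line and whether "sort" hit EPIPE."""
    read_fd, write_fd = os.pipe()
    received = bytearray()
    broken = False
    try:
        for start in range(0, len(output), buf_size):
            # "sort" flushes one buffer...
            os.write(write_fd, output[start:start + buf_size])
            # ...and "tail" drains it but keeps its end open, because it
            # cannot know which line is last until it sees end-of-file.
            received += os.read(read_fd, buf_size)
    except BrokenPipeError:
        broken = True  # not expected: the read end is still open
    finally:
        os.close(write_fd)  # "sort" is done, so "tail" will now see EOF
    while chunk := os.read(read_fd, buf_size):
        received += chunk
    os.close(read_fd)
    lines = bytes(received).splitlines()
    return (lines[-1] if lines else b""), broken


print(sort_then_tail(b"1\n" * 2000 + b"9\n" * 2000))  # (b'9', False)
```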

>If I replace 'format' with 'sort' it works fine.
># cat testfile | sort |head -1
>123456
>
>Any idea what's going on here? Is there some buffer somewhere that can
>only take 4k of data?

Interactive shells typically suppress the error message for "broken pipe"
because this is such a common situation. But in your case the error is
happening in the shell being used to run the "format" script, which is
non-interactive.
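
What the script's shell actually sees is the exit status of its child. The Python sketch below (the writer program and the function name writer_exit_status are hypothetical) starts an endless writer standing in for sort, reads one byte and closes the pipe the way "head" does, and then inspects the wait status. In Python's subprocess module a negative return code means the child was killed by that signal number; a shell has the equivalent information when it decides whether to print "Broken pipe".

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for sort: writes forever with SIGPIPE at its default
# action (Python itself ignores SIGPIPE, so the child has to restore it).
WRITER = (
    "import os, signal\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "while True:\n"
    "    os.write(1, b'x' * 4096)\n"
)


def writer_exit_status() -> int:
    """Run the writer, read one byte, close the pipe, and return the child's status."""
    child = subprocess.Popen([sys.executable, "-c", WRITER], stdout=subprocess.PIPE)
    child.stdout.read(1)   # the "head" side takes what it needs...
    child.stdout.close()   # ...and closes the read end of the pipe
    return child.wait()    # negative value = killed by that signal number


status = writer_exit_status()
if status < 0:
    # An interactive shell usually stays quiet about SIGPIPE; a
    # non-interactive one may report it, as in the original post.
    print(f"writer killed by {signal.Signals(-status).name}")
```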

Check the documentation of the shell (you didn't say what shell you're
using for the script) to see if there's any way to suppress this.

-- 
Barry Margolin, barry.margolin@level3.com
Level(3), Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

