Re: Q for Unix guru's

From: Alan Connor (xxxxxx_at_xxxx.xxx)
Date: 07/28/03

    Date: Mon, 28 Jul 2003 06:32:08 GMT
    
    

    On Sun, 27 Jul 2003 09:45:18 +0100, Walter Briscoe <wbriscoe@ponle.demon.co.uk> wrote:
    >
    >
    > In message <kNCUa.848$Bg.427@newsread4.news.pas.earthlink.net> of Sat,
    > 26 Jul 2003 21:55:28 in comp.unix.questions, Alan Connor
    ><xxxxxx@xxxx.xxx> writes
    >>On Sat, 26 Jul 2003 19:42:10 +0000 (UTC), Charles Demas
    >><demas@TheWorld.com> wrote:
    >>>
    >>>
    >>> In article <594f6d6f.0307261059.17f0ceb9@posting.google.com>,
    >>> Spider-X <yedilw@yahoo.com> wrote:
    >>>>what combination of grep sed cat and wc would I have to use to count
    >>>>the number of lines in a file that exceed 50 characters?
    >>>
    >>> grep and wc
    >>>
    >>>>If this is
    >>>>easy enough, then how do I print all the lines that exceed this limit
    >>>>to its own file?
    >>>
    >>> awk 'length > 50' infile > outfile
    >>>
    >>> This sounds so much like homework!
    >>>
    >>>
    >>> Chuck Demas
    >>>
    >>
    >>Doesn't it....
    >>
    >>
    >>sed -n '/................................................../p' infile > outfile
    >>
    >>Looks silly, but works great. Less resources and faster than awk too :-)
    >
    > I would be more inclined to try:
    > sed '/^.\{51\}/!d' < infile > outfile
    >
    > 1) -n /RE/p is more succinctly written /RE/!d
    > 2) An anchored RE is usually faster than an unanchored RE
    > Probably makes no difference here.
    > It might be interesting to compare /.\{51\}$/!d
    > 3) .... is less friendly than .\{count\}.
    > I have never compared resource usage for this.
    > 4) < infile rather than infile as I am inclined to do things in the OS
    > rather than in a program. Error handling is also likely to be more
    > consistent.
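
    A quick way to check the variants so far against each other, as a
    minimal sketch only. The scratch file and the variable names are made
    up for illustration. It builds lines of 49, 50, 51 and 60 characters,
    then counts the ones longer than 50 with awk, sed and grep. The last
    count shows that a pattern asking for only 50 characters also catches
    the line that is exactly 50 long, which is why the count of 51 matters.

```shell
# Scratch file with lines of 49, 50, 51 and 60 characters (hypothetical test data).
tmp=$(mktemp)
for n in 49 50 51 60; do
    printf "%${n}s\n" '' | tr ' ' 'x'
done > "$tmp"

# Count lines longer than 50 characters, three ways.
# tr -d ' ' strips the padding some wc implementations add.
awk_count=$(awk 'length > 50' "$tmp" | wc -l | tr -d ' ')
sed_count=$(sed '/^.\{51\}/!d' < "$tmp" | wc -l | tr -d ' ')
grep_count=$(grep -c '^.\{51\}' "$tmp")

# A pattern that needs only 50 characters also matches the 50-character line.
sed50_count=$(sed -n '/^.\{50\}/p' "$tmp" | wc -l | tr -d ' ')

echo "awk=$awk_count sed=$sed_count grep=$grep_count sed50=$sed50_count"
rm -f "$tmp"
```

    The grep -c form also answers the original counting question without
    needing wc at all.
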
    >
    > Perhaps the OP could compare resource usage of several solutions.
    > Standard statistical techniques for difference of means are needed.
    > e.g. given 10 results with mean of 4.9 and standard deviation of 0.5 and
    > 15 results with mean of 4.7 and standard deviation of 0.8, what is the probability
    > that this happened by chance? The time shell builtin is likely to give
    > data. Also, how do the solutions compare with different sorts of data:
    > Almost 0% long lines; almost 100% long lines; points in between.
    > Big files; small files.
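
    The worked example can be pushed through Welch's t-test, which is one
    common choice for comparing two means with unequal variances. It is
    not necessarily the test the poster had in mind. This sketch takes 0.5
    and 0.8 as the two standard deviations (an assumption) and lets awk do
    the arithmetic.

```shell
# Welch's t statistic and approximate degrees of freedom from summary figures.
# n = sample size, m = mean, s = standard deviation (s2 = 0.8 is an assumed reading).
welch=$(LC_ALL=C awk -v n1=10 -v m1=4.9 -v s1=0.5 \
                     -v n2=15 -v m2=4.7 -v s2=0.8 'BEGIN {
    v1 = s1 * s1 / n1          # squared standard error, first sample
    v2 = s2 * s2 / n2          # squared standard error, second sample
    t  = (m1 - m2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ^ 2 / (v1 ^ 2 / (n1 - 1) + v2 ^ 2 / (n2 - 1))
    printf "%.2f %.1f\n", t, df
}')
echo "t and df: $welch"
```

    This gives t of about 0.77 on roughly 23 degrees of freedom. The usual
    two-sided 5% cutoff at that many degrees of freedom is about 2.07, so
    a gap of this size could easily be chance.
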
    >
    > File caching is going to make comparison difficult.
    > I would be inclined to output to /dev/null and ignore first result in a
    > batch. On AIX, I found the inability to force file access from disk
    > meant that sensible comparison was practically impossible.
    > Comparison was of objects in cache; the real world used objects on disk.
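
    A rough timing harness along those lines, as a sketch under some
    assumptions: it needs bash (it relies on the time keyword and on
    TIMEFORMAT), and the function name bench and the generated sample file
    are hypothetical. It sends output to /dev/null and throws away the
    first run of each batch, so every kept figure is a warm-cache figure.

```shell
# Assumes bash: "time" is a shell keyword there, and TIMEFORMAT=%R
# makes it report elapsed seconds only (on stderr).
# bench LABEL RUNS COMMAND...: run COMMAND RUNS+1 times, discard the first
# (cache-warming) run, print one "LABEL seconds" line per kept run.
bench() {
    label=$1 runs=$2
    shift 2
    i=0
    while [ "$i" -le "$runs" ]; do
        elapsed=$( { TIMEFORMAT=%R; time "$@" > /dev/null 2>&1; } 2>&1 )
        if [ "$i" -gt 0 ]; then echo "$label $elapsed"; fi
        i=$((i + 1))
    done
}

# Hypothetical sample data: 2000 lines of 0 to 99 characters each.
sample=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 2000; i++) {
        line = ""
        for (j = 0; j < i % 100; j++) line = line "x"
        print line
    }
}' > "$sample"

results=$(
    bench awk 3 awk 'length > 50' "$sample"
    bench sed 3 sed '/^.\{51\}/!d' "$sample"
)
echo "$results"
rm -f "$sample"
```

    The per-run figures can then be fed into whatever difference-of-means
    test is preferred. Nothing here forces reads from disk, so the AIX
    caveat above still applies.
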
    >
    > I wait with baited (sic) breath to see if and how the OP responds. ;-)

    Well, I don't know about the OP, who seems to have vanished, but *I* certainly
    enjoyed and learned something from your little tutorial. In fact, I saved it to
    my sed directory....

    Alan

    -- 
           For Linux/Bash users: Eliminate spam from your life
           with the Mailbox-Sentry-Program. See the thread
           MSP on comp.mail.misc. 
         
    
