Re: Q for Unix guru's
From: Alan Connor (xxxxxx_at_xxxx.xxx)
Date: 07/28/03
- Previous message: Edgar Allen: "Re: Tape writing."
- In reply to: Walter Briscoe: "Re: Q for Unix guru's"
Date: Mon, 28 Jul 2003 06:32:08 GMT
On Sun, 27 Jul 2003 09:45:18 +0100, Walter Briscoe <wbriscoe@ponle.demon.co.uk> wrote:
>
>
> In message <kNCUa.848$Bg.427@newsread4.news.pas.earthlink.net> of Sat,
> 26 Jul 2003 21:55:28 in comp.unix.questions, Alan Connor
><xxxxxx@xxxx.xxx> writes
>>On Sat, 26 Jul 2003 19:42:10 +0000 (UTC), Charles Demas
>><demas@TheWorld.com> wrote:
>>>
>>>
>>> In article <594f6d6f.0307261059.17f0ceb9@posting.google.com>,
>>> Spider-X <yedilw@yahoo.com> wrote:
>>>>what combination of grep sed cat and wc would I have to use to count
>>>>the number of lines in a file that exceed 50 characters?
>>>
>>> grep and wc
>>>
>>>>If this is
>>>>easy enough, then how do I print all the lines that exceed this limit
>>>>to its own file?
>>>
>>> awk 'length > 50' infile > outfile
>>>
>>> This sounds so much like homework!
>>>
>>>
>>> Chuck Demas
>>>
>>
>>Doesn't it....
>>
>>
>>sed -n '/................................................../p' infile > outfile
>>
>>Looks silly, but works great. Less resources and faster than awk too :-)
>
> I would be more inclined to try:
> sed '/^.\{51\}/!d' < infile > outfile
>
> 1) -n /RE/p is more succinctly written /RE/!d
> 2) An anchored RE is usually faster than an unanchored RE.
>    Probably makes no difference here.
>    It might be interesting to compare /.\{51\}$/!d
> 3) .... is less friendly than .\{count\}.
>    I have never compared resource usage for this.
> 4) < infile rather than infile as I am inclined to do things in the OS
>    rather than in a program. Error handling is also likely to be more
>    consistent.
>
> Perhaps the OP could compare resource usage of several solutions.
> Standard statistical techniques for difference of means are needed.
> e.g. given 10 results with mean of 4.9 and standard deviation of 0.5 and
> 15 results with mean of 4.7 and standard deviation of 0.8, what is the probability
> that this happened by chance? The time shell builtin is likely to give
> data. Also, how do the solutions compare with different sorts of data:
> Almost 0% long lines; almost 100% long lines; points in between.
> Big files; small files.
>
> File caching is going to make comparison difficult.
> I would be inclined to output to /dev/null and ignore first result in a
> batch. On AIX, I found the inability to force file access from disk
> meant that sensible comparison was practically impossible.
> Comparison was of objects in cache; the real world used objects on disk.
>
> I wait with baited (sic) breath to see if and how the OP responds. ;-)
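For anyone who wants to check the variants quoted above against each other, here is a rough sketch. The sample file, the make_line helper and the variable names are my own assumptions, not anything from the thread. It builds lines of 10, 50, 51 and 60 characters, counts the long ones with grep and wc, and confirms that the -n/p form, the anchored !d form and the awk one-liner write the same output. It also shows that a pattern with only 50 dots matches the 50-character line as well, so "exceeds 50" needs 51.

```shell
#!/bin/sh
# Sketch only: the sample data and all names below are made up for illustration.
tmp=$(mktemp -d)
infile="$tmp/infile"

# make_line N: print one line consisting of N 'x' characters (hypothetical helper).
make_line() {
    awk -v n="$1" 'BEGIN { while (n-- > 0) printf "x"; print "" }'
}

# Sample file with line lengths 10, 50, 51 and 60.
for n in 10 50 51 60; do make_line "$n"; done > "$infile"

# Counting lines longer than 50 characters: grep alone, and grep with wc.
count_grep=$(grep -c '.\{51\}' "$infile")
count_wc=$(grep '.\{51\}' "$infile" | wc -l | tr -d ' ')

# A 50-character pattern matches lines of 50 or more characters.
count_50=$(grep -c '.\{50\}' "$infile")

# Three ways to write the long lines to their own file.
sed -n '/.\{51\}/p' "$infile" > "$tmp/out_p"
sed '/^.\{51\}/!d' < "$infile" > "$tmp/out_d"
awk 'length > 50' "$infile" > "$tmp/out_awk"

if cmp -s "$tmp/out_p" "$tmp/out_d" && cmp -s "$tmp/out_d" "$tmp/out_awk"; then
    same=yes
else
    same=no
fi

echo "longer than 50: $count_grep (grep -c), $count_wc (grep | wc -l)"
echo "50 or more:     $count_50"
echo "outputs equal:  $same"

rm -rf "$tmp"
```

This says nothing about speed or resource usage; it only checks that the commands agree on which lines they select.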
Well, I don't know about the OP, who seems to have vanished, but *I* certainly
enjoyed and learned something from your little tutorial. In fact, I saved it to
my sed directory....
Alan
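The difference-of-means question in the quoted text can be worked through with awk. This is a minimal sketch, assuming Welch's unequal-variance t-test is an acceptable choice (the thread does not name a specific test) and taking 0.8 as the standard deviation of the second batch. The welch_t function name is hypothetical. It prints the t statistic and the Welch-Satterthwaite degrees of freedom; turning that into a probability still needs a t table or a stats package.

```shell
#!/bin/sh
# welch_t N1 MEAN1 SD1 N2 MEAN2 SD2   (hypothetical helper, not from the thread)
# Prints Welch's t statistic and the approximate degrees of freedom.
welch_t() {
    awk -v n1="$1" -v m1="$2" -v s1="$3" -v n2="$4" -v m2="$5" -v s2="$6" 'BEGIN {
        v1 = s1 * s1 / n1          # squared standard error of the first mean
        v2 = s2 * s2 / n2          # squared standard error of the second mean
        t  = (m1 - m2) / sqrt(v1 + v2)
        # Welch-Satterthwaite approximation for the degrees of freedom
        df = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
        printf "t=%.3f df=%.1f\n", t, df
    }'
}

# The figures from the example: 10 runs (4.9, sd 0.5) against 15 runs (4.7, sd 0.8).
result=$(welch_t 10 4.9 0.5 15 4.7 0.8)
echo "$result"
```

With those figures the statistic comes out well under the two-sided 5% critical value for about 23 degrees of freedom (roughly 2.07), so a gap of that size between the two batches could easily be noise.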
--
For Linux/Bash users: Eliminate spam from your life
with the Mailbox-Sentry-Program. See the thread
MSP on comp.mail.misc.