Re: Newbie

From: William Park <opengeometry@yahoo.ca>
Date: 24 Apr 2003 13:46:06 GMT

Stephane CHAZELAS <stephane_chazelas@yahoo.fr> wrote:
> William Park wrote:
> [...]
>> - read a line
>> - break the line into one character per line in a temporary file
>> - grep, sort, uniq -c
>
> So, for each line (and there may be billions), if the line has
> length n:
>
> - Do n+1 read system calls (possibly more, and it behaves badly if
> the line ends with \)
> - fork, exec fold(1) (so do many more read/mmap... system calls,
> so the point above is negligible). Fortunately after the first
> line, fold, sort, grep, uniq code will be in memory, so it
> will only waste CPU time and memory bus bandwidth.
> - create a temporary file (not necessary and source of dozens of
> problems, not to speak of inefficiency).
> - sort the input although that is not necessary (if lines are long,
> it will use memory and possibly create temporary files); all
> you need is 26 integers to store the result.
> - 3 forks, 3 execs, two pipes for grep|sort|uniq.
> - And what will you use to reformat and display the result? awk?
> - How will the temporary file be created? How will it be removed?
> What if CTRL-C is hit, will it leave the file?
>
> Were you really serious?

Yes, Stephane. The solution will be shorter, faster, and cheaper than
the time you devoted to the above analysis.
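
For reference, the three steps quoted above could be sketched roughly as
follows. This is only an illustration, not the code from the original
thread: it assumes the task is counting how often each lowercase letter
occurs in each input line, the function name count_letters is made up,
and mktemp is assumed to be available (it is common but not in POSIX).
The body runs in a subshell so that the traps, which remove the
temporary file on exit or CTRL-C, stay local to the function.

```shell
#!/bin/sh
# Hypothetical sketch: per-line letter counts via fold | grep | sort | uniq -c.
# Assumption: only lowercase a-z are counted; mktemp is available.
count_letters() (
    tmp=$(mktemp) || exit 1
    # Remove the temporary file on normal exit and on CTRL-C/TERM.
    trap 'rm -f "$tmp"' EXIT
    trap 'exit 130' INT TERM

    # read -r keeps backslashes literal; IFS= keeps leading/trailing blanks.
    while IFS= read -r line; do
        # Break the line into one character per line in the temporary file.
        printf '%s\n' "$line" | fold -w 1 > "$tmp"
        # Keep letters only, then group and count them.
        grep '[a-z]' "$tmp" | sort | uniq -c
        # Blank line separates the counts of consecutive input lines.
        echo
    done
)
```

Usage would be something like `count_letters < input.txt`. It still
forks several processes per line, as the quoted analysis points out, so
it trades efficiency for brevity.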

-- 
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
Linux solution for data management and processing. 

