Re: Fork or threads

From: Nick Landsberg (hukolau_at_NOSPAM.att.net)
Date: 03/09/04


Date: Tue, 09 Mar 2004 04:32:56 GMT


Joey Abrams wrote:
> Hello,
>
> The application will have to read through multiple files, perhaps several
> hundred. When this task is done I expect it to exit. ie: This is all it will
> do, when it's done reading in all the files and processing them, the
> application is done.
>
> Does 1 single thread make sense? my thinking was that I'd need multiple
> threads(or process) to quickly read through all of these files in order to
> speed it up.

If you are reading files off the same physical disk, then
the speed (or lack thereof) of the disk dominates the
time you spend processing the files. Reading them
"in parallel" by using either multiple processes or
multiple threads can speed up this process because the
device driver will order the reads to be most efficient,
i.e. minimizing the seek time between blocks. There is
probably a point of diminishing returns, though, when
you are saturating the disk I/O channel anyway. This will
vary from disk drive to disk drive.
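
A minimal sketch of the "in parallel" idea, using Python threads. The
names read_whole_file and read_all, and the default of 4 workers, are
made up for illustration; the right worker count depends on the disk
and would have to be measured.

```python
from concurrent.futures import ThreadPoolExecutor


def read_whole_file(path):
    """Read one file completely and return its size in bytes."""
    with open(path, "rb") as f:
        return len(f.read())


def read_all(paths, workers=4):
    """Read every file in paths using a small pool of threads.

    workers is a hypothetical tuning knob: once the disk is
    saturated, adding more threads is not expected to help.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as paths
        return list(pool.map(read_whole_file, paths))
```

Each thread blocks on its own read, which lets several requests be
outstanding at the disk at once. Whether that beats a plain loop over
the files is something to time on the real data.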

Note the above comment assumes that processing the files is not
too terribly compute-intensive, that is, they are not
coefficients to some fancy algorithm which is going to take
seconds to perform on each line in the file. If the processing
is that compute-intensive, then don't worry about the disk.

If that is not the case, then, for a single disk, the latency is
approximated by the formula

R = Ro * ( 1/ (1-U) )

Where Ro is the latency at "no load"
and U is the utilization of the disk.
(Single-server single-queue, i.e. M/M/1, with
totally random I/O's)
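
A small sketch of the formula above, to show how quickly latency grows
as the disk gets busier. The function name response_time and the sample
value of 7 ms for Ro are chosen for illustration only.

```python
def response_time(r0_ms, utilization):
    """Return R = Ro * (1 / (1 - U)) in milliseconds.

    r0_ms is the no-load latency Ro; utilization is U as a
    fraction, 0 <= U < 1.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return r0_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Ro = 7 ms is an assumed sample value
    for u in (0.0, 0.5, 0.75, 0.9):
        print("U = %.2f  R = %.1f ms" % (u, response_time(7.0, u)))
```

With Ro = 7 ms, the formula gives 14 ms at 50% utilization and 28 ms
at 75%. It blows up as U approaches 1, which is why the function
rejects U = 1.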

A typical disk nowadays has about 6-8 ms. latency
for random I/O's, and is thus able to do
about 120-160 I/O's per second at 100% utilization.
That would equate to about 60-80% disk utilization
and latencies of about 14-28 milliseconds per
disk read. (Unless the files were contiguous,
in which case the device driver should do a nice
job of sucking in a whole lot of data at once.)
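
The arithmetic behind these figures can be sketched as follows. The
function names are made up, and the load of 100 I/O's per second in the
usage note is an assumption for illustration, not a number from the
post.

```python
def max_ios_per_second(service_ms):
    """I/O's per second at 100% utilization, given ms per random I/O."""
    return 1000.0 / service_ms


def utilization(ios_per_second, service_ms):
    """Fraction of the time the disk is busy at a given I/O rate."""
    return ios_per_second * service_ms / 1000.0
```

At 8 ms per I/O the ceiling is 125 I/O's per second, and at 6 ms it is
about 167. An assumed load of 100 I/O's per second would keep such a
disk 60% to 80% busy.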

Now... all of the above is theory.

In my experience, the total running time of all
jobs like this, either executed in series or in parallel,
will be about the same. (Anecdotal evidence, not
a controlled experiment.)

I have personally found it much more satisfying
to watch stuff being processed in sequence because
it gives a sense of "progress". Running several
dozens of jobs in parallel creates a situation where
none of them seem to finish until, at the end,
all of them finish almost simultaneously.

You get "antsy" thinking what might be going wrong.

Just my opinion.

>
>
> "Joey Abrams" <slcjoey@hotmail.com> wrote in message
> news:Y463c.5221$n37.382069@read2.cgocable.net...
>
>>Hello,
>>
>>I have an application where I need to read through a bunch of files, I have
>>to actually read in the whole file, multiple files, this could be anywhere
>>from 1 file to several hundred files.
>>
>>Should I fork off an X amount of processes to do this? or would threads be
>>better to use?
>>
>>Just looking for suggestions,
>>
>>thank
>>
>>Joe Abrams
>>
>>
>>
>
>
>

-- 
Ñ
"It is impossible to make anything foolproof because fools are so 
ingenious" - A. Bloch

