Re: Fork or threads
From: Nick Landsberg (hukolau_at_NOSPAM.att.net)
Date: 03/09/04
- Next message: David Schwartz: "Re: Fork or threads"
- Previous message: Joey Abrams: "Re: Fork or threads"
- In reply to: Joey Abrams: "Re: Fork or threads"
- Next in thread: David Schwartz: "Re: Fork or threads"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Tue, 09 Mar 2004 04:32:56 GMT
Joey Abrams wrote:
> Hello,
>
> The application will have to read through multiple files, perhaps several
> hundred. When this task is done I expect it to exit, i.e. this is all it will
> do: when it's done reading in all the files and processing them, the
> application is done.
>
> Does a single thread make sense? My thinking was that I'd need multiple
> threads (or processes) to quickly read through all of these files in order to
> speed it up.

If you are reading files off the same physical disk, then
the speed (or lack thereof) of the disk dominates the
time you spend processing the files. Reading them
"in parallel" by using either multiple processes or
multiple threads can speed up this process because the
device driver can order the pending reads to be most
efficient, i.e. minimizing the seek time between blocks.
There is probably a point of diminishing returns, though,
when you are saturating the disk I/O channel anyway. This
will vary from disk drive to disk drive.

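One way to try the "read them in parallel" idea is a small pool of threads, each reading whole files. Here is a minimal Python sketch, assuming threads rather than fork(); the helper names and the default of 4 workers are hypothetical. The worker count is the knob to tune against your own drive to find that point of diminishing returns. Threads are enough here because CPython releases its global interpreter lock while a thread is blocked in a file read.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_whole_file(path):
    """Read one file completely; return (path, number of bytes read)."""
    with open(path, "rb") as f:
        data = f.read()
    # Real per-file processing would go here.
    return path, len(data)


def read_files_parallel(paths, max_workers=4):
    """Read every file in `paths` with a small, fixed pool of threads.

    max_workers is a hypothetical default: past some value the disk is
    saturated and extra threads only add contention.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Several reads are outstanding at once, which gives the device
        # driver a queue of requests it can reorder.
        return dict(pool.map(read_whole_file, paths))


if __name__ == "__main__":
    # Throwaway files so the example runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(5):
            path = os.path.join(tmp, f"file{i}.dat")
            with open(path, "wb") as f:
                f.write(b"x" * (1000 * (i + 1)))
            paths.append(path)
        sizes = read_files_parallel(paths, max_workers=3)
        for path in paths:
            print(os.path.basename(path), sizes[path])
```

`pool.map` returns results in input order, so the dictionary lines up with `paths` even though the individual reads may finish in any order.
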
Note that the above comment assumes you are not too
terribly compute-intensive in processing the files, that
is, they are not coefficients to some fancy algorithm that
is going to take seconds to perform on each line in the file.
If that is the case, then don't worry about the disk.

If that is not the case, then, for a single disk, the latency is
approximated by the formula
R = Ro / (1 - U)
where Ro is the latency at "no load"
and U is the utilization of the disk.
(Single server, single queue, i.e. M/M/1, with
totally random I/Os.)

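The formula is easy to evaluate directly. A small Python sketch follows; the function name is hypothetical and the 7 ms no-load latency in the demo is an assumed value, not a measurement.

```python
def mm1_response_time(r0, u):
    """Response time of a single-server queue: R = Ro / (1 - U).

    r0 -- latency at no load (any time unit)
    u  -- utilization of the server, 0 <= u < 1
    The result is in the same unit as r0.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return r0 / (1.0 - u)


if __name__ == "__main__":
    r0_ms = 7.0  # assumed no-load latency of one random disk read
    for u in (0.0, 0.5, 0.8, 0.9):
        print(f"U = {u:.0%}: R = {mm1_response_time(r0_ms, u):.1f} ms")
```

This prints 7.0, 14.0, 35.0 and 70.0 ms: latency doubles at 50% utilization and grows without bound as U approaches 1, which is why piling more parallel readers onto one saturated disk stops paying off.
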
A typical disk nowadays has about 6-8 ms of latency
for random I/Os, so it can do about 120-160 I/Os
per second at 100% utilization.
At about 60-80% disk utilization, the formula above gives
latencies of roughly 15-40 milliseconds per
disk read. (Unless the files are contiguous,
in which case the device driver should do a nice
job of sucking in a whole lot of data at once.)

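The same arithmetic as a sketch: the maximum random-I/O rate is the reciprocal of the per-read latency, and by the utilization law (U = arrival rate times service time) a given request rate turns into a utilization that can be plugged into R = Ro / (1 - U). The load of 100 reads per second below is a hypothetical figure, not one from the post, and the function names are also hypothetical.

```python
def max_iops(service_ms):
    """Random I/Os per second one disk can sustain at 100% utilization."""
    return 1000.0 / service_ms


def utilization(reads_per_sec, service_ms):
    """Utilization law: U = arrival rate * service time."""
    return reads_per_sec * service_ms / 1000.0


def response_ms(service_ms, u):
    """Single-server queue approximation R = Ro / (1 - U)."""
    if not 0.0 <= u < 1.0:
        raise ValueError("disk is saturated (U >= 1)")
    return service_ms / (1.0 - u)


if __name__ == "__main__":
    load = 100.0  # hypothetical offered load, reads per second
    for service_ms in (6.0, 8.0):
        u = utilization(load, service_ms)
        print(f"{service_ms:.0f} ms/read: max {max_iops(service_ms):.0f} IO/s, "
              f"U = {u:.0%}, R = {response_ms(service_ms, u):.1f} ms")
```

Under that assumed load, the 6 ms drive tops out near 167 I/Os per second and runs at 60% utilization with about 15 ms per read, while the 8 ms drive tops out at 125 I/Os per second and runs at 80% utilization with about 40 ms per read.
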
Now... all of the above is theory.
In my experience, the total running time of all
jobs like this, whether executed in series or in parallel,
will be about the same. (Anecdotal evidence, not
a controlled experiment.)

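Since that claim is anecdotal, it is easy to check against your own files. The rough Python sketch below times a serial pass and a threaded pass over the same file list; the helper names and the 4-worker default are hypothetical, and threads stand in for forked processes. Caveat: freshly written files (as in the demo at the bottom) are served from the operating system's page cache, and whichever pass runs second may find files cached by the first, so for a meaningful comparison run it against the real input files while they are not already cached.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_whole_file(path):
    """Read a file completely; return the number of bytes read."""
    with open(path, "rb") as f:
        return len(f.read())


def time_serial(paths):
    """Read the files one after another; return (total bytes, seconds)."""
    start = time.perf_counter()
    total = sum(read_whole_file(p) for p in paths)
    return total, time.perf_counter() - start


def time_parallel(paths, workers=4):
    """Read the files with a thread pool; return (total bytes, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(read_whole_file, paths))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(20):
            path = os.path.join(tmp, f"f{i}.dat")
            with open(path, "wb") as f:
                f.write(os.urandom(64 * 1024))
            paths.append(path)
        s_bytes, s_secs = time_serial(paths)
        p_bytes, p_secs = time_parallel(paths)
        print(f"serial:   {s_bytes} bytes in {s_secs:.4f} s")
        print(f"parallel: {p_bytes} bytes in {p_secs:.4f} s")
```
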
I have personally found it much more satisfying
to watch stuff being processed in sequence because
it gives a sense of "progress". Running several
dozen jobs in parallel creates a situation where
none of them seem to finish until, at the end,
all of them finish almost simultaneously.
You get "antsy" wondering what might be going wrong.
Just my opinion.
>
>
> "Joey Abrams" <slcjoey@hotmail.com> wrote in message
> news:Y463c.5221$n37.382069@read2.cgocable.net...
>
>>Hello,
>>
>>I have an application where I need to read through a bunch of files, I
>>have to actually read in the whole file, multiple files, this could be anywhere
>>from 1 file to several hundred files.
>>
>>Should I fork off an X amount of processes to do this? or would threads be
>>better to use?
>>
>>Just looking for suggestions,
>>
>>thanks
>>
>>Joe Abrams
-- 
"It is impossible to make anything foolproof because fools are so ingenious" - A. Bloch