Re: Efficiency Question: Large Arrays vs. Indexed Files on Alphas
From: Keith A. Lewis (lewis_at_mazda.mitre.org)
Date: 11/21/03
- Previous message: Larry Kilgallen: "Re: password generation"
- In reply to: Cheryl Hoefelmeyer: "Efficiency Question: Large Arrays vs. Indexed Files on Alphas"
- Next in thread: Randy Park: "Re: Efficiency Question: Large Arrays vs. Indexed Files on Alphas"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Fri, 21 Nov 2003 18:31:09 +0000 (UTC)
hoefelmeyer@hotmail.com (Cheryl Hoefelmeyer) writes in article <72a72e76.0311201438.3bdafc33@posting.google.com> dated 20 Nov 2003 14:38:14 -0800:
>We have a GS80 and an ES40, not clustered, each running OpenVMS 7.3. I
>am writing a program in Basic that will run on each box, and I have a
>question regarding which of the following would be most efficient, in
>general.
>
>The program will operate on three very large files, and at one point I
>can either
>(1) decide to read some fields of some records into an array and
>search for information sequentially through the array, or
>(2) read the information into another file and access it each time via
>single read.
>The number of elements or records in this circumstance is not expected
>to exceed 10,000.
>
>Elsewhere in the program, I can either
>(1) maintain records that will eventually be written to an output file
>in an array because they may need to be updated some number of times
>before a final record is written, or
>(2) write the first record to the output file and just update each
>time it is necessary.
>The first option would call for reading sequentially through a very
>large array to find the proper record to update each time a new record
>is added. The second calls for 1-3 file operations per each record
>added. The number of records maintained here is on the order of
>1,000,000.
>
>So, for each of these, which is the best option in general? Years ago,
>I/O on DEC boxes was such a bottleneck that the array option would
>generally be the faster route, but I’m no longer sure that is
>such a big issue now.
In order to give a definitive answer we'd need more info such as how many
records are modified during each run, how many aren't, how big they are,
etc. I won't worry about that, I'll just tell you about some of my related
experiences.
I wrote one program which sorts and formats records. My first stab at it
saved the entire file contents in dynamic memory and the SOR$ utility
routines to sort them, then formatted the records for the output file.
This was fast, but it wouldn't work with input files which were larger than
the user's paging file quota.
So I added an option which read in the records and wrote them to an
intermediate indexed file based on the sort key, then read them out of that
file in order and wrote the output file. This approach was an order of
magnitude slower! It did work, but it took days when it should have taken
hours.
In an interactive program where only one record at a time is modified, I
used a keyed file quite successfully. Record locking allowed multiple users
to modify the same file at the same time.
If you want super-efficient access of a fixed-size file, look at local
sections (or global sections if you need concurrent access for multiple
processes). Sections are pieces of virtual memory which are mapped to a
file of your choosing instead of a pagefile. The I/O is done as-needed, the
raw data from the file is all there. The only drawback is you can't (AFAIK)
extend the file.
--Keith Lewis klewis {at} mitre.org
The above may not (yet) represent the opinions of my employer.
- Previous message: Larry Kilgallen: "Re: password generation"
- In reply to: Cheryl Hoefelmeyer: "Efficiency Question: Large Arrays vs. Indexed Files on Alphas"
- Next in thread: Randy Park: "Re: Efficiency Question: Large Arrays vs. Indexed Files on Alphas"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|