Re: Efficiency Question: Large Arrays vs. Indexed Files on Alphas

From: Keith A. Lewis (lewis_at_mazda.mitre.org)
Date: 11/21/03

  • Next message: John N.: "Re: SYS$LIBRARY:TRACE_V74.EXE on VMS 7.3-1 causing error"
    Date: Fri, 21 Nov 2003 18:31:09 +0000 (UTC)
    
    

    hoefelmeyer@hotmail.com (Cheryl Hoefelmeyer) writes in article <72a72e76.0311201438.3bdafc33@posting.google.com> dated 20 Nov 2003 14:38:14 -0800:
    >We have a GS80 and an ES40, not clustered, each running OpenVMS 7.3. I
    >am writing a program in Basic that will run on each box, and I have a
    >question regarding which of the following would be most efficient, in
    >general.
    >
    >The program will operate on three very large files, and at one point I
    >can either
    >(1) decide to read some fields of some records into an array and
    >search for information sequentially through the array, or
    >(2) read the information into another file and access it each time via
    >single read.
    >The number of elements or records in this circumstance is not expected
    >to exceed 10,000.
    >
    >Elsewhere in the program, I can either
    >(1) maintain records that will eventually be written to an output file
    >in an array because they may need to be updated some number of times
    >before a final record is written, or
    >(2) write the first record to the output file and just update each
    >time it is necessary.
    >The first option would call for reading sequentially through a very
    >large array to find the proper record to update each time a new record
    >is added. The second calls for 1-3 file operations per each record
    >added. The number of records maintained here is on the order of
    >1,000,000.
    >
    >So, for each of these, which is the best option in general? Years ago,
    >I/O on DEC boxes was such a bottleneck that the array option would
    >generally be the faster route, but I&#8217;m no longer sure that is
    >such a big issue now.

    In order to give a definitive answer we'd need more info such as how many
    records are modified during each run, how many aren't, how big they are,
    etc. I won't worry about that, I'll just tell you about some of my related
    experiences.

    I wrote one program which sorts and formats records. My first stab at it
    saved the entire file contents in dynamic memory and the SOR$ utility
    routines to sort them, then formatted the records for the output file.
    This was fast, but it wouldn't work with input files which were larger than
    the user's paging file quota.

    So I added an option which read in the records and wrote them to an
    intermediate indexed file based on the sort key, then read them out of that
    file in order and wrote the output file. This approach was an order of
    magnitude slower! It did work, but it took days when it should have taken
    hours.

    In an interactive program where only one record at a time is modified, I
    used a keyed file quite successfully. Record locking allowed multiple users
    to modify the same file at the same time.

    If you want super-efficient access of a fixed-size file, look at local
    sections (or global sections if you need concurrent access for multiple
    processes). Sections are pieces of virtual memory which are mapped to a
    file of your choosing instead of a pagefile. The I/O is done as-needed, the
    raw data from the file is all there. The only drawback is you can't (AFAIK)
    extend the file.

    --Keith Lewis klewis {at} mitre.org
    The above may not (yet) represent the opinions of my employer.


  • Next message: John N.: "Re: SYS$LIBRARY:TRACE_V74.EXE on VMS 7.3-1 causing error"

    Relevant Pages

    • Re: qsort
      ... >>A simple example is an insertion sort. ... Essentially one element moves along the array until it is in its ... > fail but qsort() cannot. ... without having any kind of dynamic memory allocation ...
      (comp.lang.c)
    • Re: "Sorting" assignment
      ... And many others prefer to call partition exchange because "quicksort" ... bin B depending on whether it is greater than, ... If the array is already sorted, this means that you end up ... attempt to sort them. ...
      (comp.programming)
    • Re: A Fast sorting algorithm for almost sorted data
      ... far my compressor has potential but is nowhere near ready. ... It does however make heavy use of sorting. ... which I am currently calling Run sort. ... entire selected run can be added to the sorted output array. ...
      (comp.compression)
    • Re: Save & Sort
      ... You can copy your array to a scratch ... "Heap" sort. ... Dim lst As Long ... Dim tmp As String ...
      (microsoft.public.excel.programming)
    • Re: fast stable sort
      ... if you have an existing array, you can simply arrange an array of ... pointers to the items, and sort that. ... hence require swapping pages of virtual memory, ... each merge run instead of accessing them in sequential RAM ...
      (comp.programming)