Binary data in multiple adjacent files
- From: JF Mezei <jfmezei.spamnot@xxxxxxxxxxxx>
- Date: Thu, 26 Oct 2006 02:49:30 -0400
I have a program that was set up to provide elevation data for all of
Australia and New Zealand from 2 large datasets that covered that territory.
The application kept the 2 files open and did relative reads into
those binary files to obtain the required data (and cached those records
so that if the next request was for a nearby cell, it would save an IO).
Now however, updated data comes in much smaller files, each covering
only 1° lat by 1° long. Each file has 1201 * 1201 values, and each value
is 2 bytes (basically the elevation in metres for a cell roughly 92
metres on a side). Each file is roughly 2.8 megs.
(This is degraded information due to the US government feeling it could
be used by terrorists, but it is still better than what was available
before. The SRTM data is precise to about 1 square metre, and some
sections have been released to 30 square metre accuracy, but most is at
92 metre).
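For illustration, here is a minimal sketch of how a coordinate could be turned into a byte offset inside one of these 1201 * 1201 files. It assumes the usual SRTM layout (rows stored north to south, columns west to east, each value a big-endian signed 16-bit integer); none of that is stated above, so check it against the actual files. The names tile_offset, decode_sample and floor_deg are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE_SAMPLES 1201L   /* values per row and per column */
#define SAMPLE_BYTES 2L      /* each value is 2 bytes */

/* Largest whole degree <= x (works for negative longitudes too). */
static long floor_deg(double x)
{
    long d = (long)x;
    return (x < (double)d) ? d - 1 : d;
}

/* Byte offset of the sample nearest to (lat, lon) inside its 1-degree file.
   ASSUMPTION: row 0 is the northern edge, column 0 is the western edge. */
long tile_offset(double lat, double lon)
{
    double frac_lat = lat - (double)floor_deg(lat);  /* 0.0 at south edge */
    double frac_lon = lon - (double)floor_deg(lon);  /* 0.0 at west edge  */
    long from_south = (long)(frac_lat * (TILE_SAMPLES - 1) + 0.5);
    long col        = (long)(frac_lon * (TILE_SAMPLES - 1) + 0.5);
    long row        = (TILE_SAMPLES - 1) - from_south;
    return (row * TILE_SAMPLES + col) * SAMPLE_BYTES;
}

/* Decode one 2-byte value. ASSUMPTION: big-endian, signed. */
int decode_sample(const unsigned char *b)
{
    assert(b != NULL);
    return (int16_t)(uint16_t)((b[0] << 8) | b[1]);
}
```

With these assumptions the whole file is 1201 * 1201 * 2 = 2,884,802 bytes, which matches the "roughly 2.8 megs" figure.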
Switching to this would mean that my application would need to open
hundreds of files to cover Australia for instance, so this introduces
scalability issues. And since I need to review the code anyway, I figured
I should perhaps consider other methods.
For my current needs, I need to cover from 45° to 47° north, and from
-73° to -76°, which gives me the following files:
N46W076 N46W075 N46W074
N45W076 N45W075 N45W074
N46W074 covers 46°N 74°W at the lower left corner to 47°N 73°W at the
upper right corner.
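The naming rule just described (the name is the lower left corner of the file) could be sketched as follows. The function name tile_name and the buffer handling are only illustrative, and any file extension is left out since none is given here.

```c
#include <assert.h>
#include <stdio.h>

/* Largest whole degree <= x, so -73.5 gives -74. */
static long floor_degree(double x)
{
    long d = (long)x;
    return (x < (double)d) ? d - 1 : d;
}

/* Build the name of the file covering (lat, lon), e.g. "N46W074".
   The name is taken from the lower left (south-west) corner.
   Returns the value of snprintf (7 for a complete name). */
int tile_name(double lat, double lon, char *buf, size_t len)
{
    long la = floor_degree(lat);
    long lo = floor_degree(lon);

    assert(buf != NULL);
    return snprintf(buf, len, "%c%02ld%c%03ld",
                    la >= 0 ? 'N' : 'S', la >= 0 ? la : -la,
                    lo >= 0 ? 'E' : 'W', lo >= 0 ? lo : -lo);
}
```

For example, a point at 46.5°N 73.5°W falls in N46W074, and a point at 45.2°N 75.9°W falls in N45W076.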
Looking at $CRMPSC, it appears I need to do an RMS $OPEN on each file and
provide the channel associated with each file.
So each mapped file would eat into the FILLM quota.
Now, if I just do the $CRMPSC to map a whole file to some virtual
address in my process space, would this consume 2.8 megs of working set
right away ? Or does VMS just allocate virtual memory that is marked
invalid, and only when I try to access a few bytes at a location would
VMS load a single page from the actual file that contains those 2 bytes I
asked for ?
The nature of my application means that as I follow a route, I read data
progressing in one direction, and the odds of having to go back are low
(but not nil). So, blocks containing elevation data read early on are
unlikely to be needed again once I have moved on to points outside those blocks.
Does VMS provide a means where I can specify that I want at most 2 pages
from a global section to be mapped into my working set at any point in time ?
(this way, when I request a shortword located in a different block, the
system would automatically unmap the oldest block mapped to that file
and use that memory to map the new block to the file).
The other thing I am thinking about is having say 5 files open at any
point in time, and whenever I access a file, I update some counter based
on the progress in my processing. So when I need to open a new file, I
would then close the file which has the lowest value in the counter (the
file that has been idle for the longest).
So, for every point, I would need to build a file name, and check in
the list of currently open files whether that file has already been opened.
But this would allow me to scale to any size.
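A rough sketch of that "5 open files, close the one idle the longest" idea, in portable C using stdio. The names tile_cache, cache_find and cache_get are invented for this example, and the limit of 5 is just the figure mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPEN 5            /* how many files stay open at once */

struct slot {
    char name[32];            /* file name held by this slot */
    FILE *fp;                 /* NULL when the slot is free */
    unsigned long last_used;  /* value of the counter at last access */
};

struct tile_cache {
    struct slot slots[MAX_OPEN];
    unsigned long clock;      /* bumped on every access */
};

/* Index of the slot holding `name`, or -1 if it is not open. */
int cache_find(const struct tile_cache *c, const char *name)
{
    for (int i = 0; i < MAX_OPEN; i++)
        if (c->slots[i].fp != NULL && strcmp(c->slots[i].name, name) == 0)
            return i;
    return -1;
}

/* Return an open FILE* for `name`, closing the file that has been idle
   the longest when all slots are in use. Returns NULL if fopen fails. */
FILE *cache_get(struct tile_cache *c, const char *name)
{
    assert(c != NULL && name != NULL);
    int i = cache_find(c, name);
    if (i < 0) {
        i = 0;
        for (int k = 0; k < MAX_OPEN; k++) {
            if (c->slots[k].fp == NULL) { i = k; break; }   /* free slot */
            if (c->slots[k].last_used < c->slots[i].last_used)
                i = k;                                      /* older one */
        }
        if (c->slots[i].fp != NULL)
            fclose(c->slots[i].fp);
        c->slots[i].fp = fopen(name, "rb");
        if (c->slots[i].fp == NULL)
            return NULL;
        strncpy(c->slots[i].name, name, sizeof c->slots[i].name - 1);
        c->slots[i].name[sizeof c->slots[i].name - 1] = '\0';
    }
    c->slots[i].last_used = ++c->clock;
    return c->slots[i].fp;
}
```

A zero-initialised struct tile_cache is an empty cache. With only 5 slots a linear search is cheap, so the per-point lookup cost stays small compared with the IO it avoids.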
Also, if I run this on VMS 8.3 (Alpha), with the program written in C:
Say I do an fseek to the 824th byte in the file and then read 2 bytes.
Then, I do an fseek to the 1020th byte and also read 2 bytes.
Is the underlying IO system smart enough to know that both are accessing
data from the same physical block and use the cache system ? Or is the C
file IO so screwed up that it would bypass this facility ?
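Whatever the C RTL and RMS do underneath, the application can make the "same block" case explicit by keeping the last 512-byte block it read. The sketch below is one hedged way to do that with plain stdio; it does not answer what the RTL itself caches. The names block_reader and read_sample are invented, the 2-byte values are assumed to be big-endian and signed, and offsets are assumed to be even so that a value never straddles two blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum { BLOCK_SIZE = 512 };          /* one disk block */

struct block_reader {
    FILE *fp;
    long cached_block;              /* block number in buf, -1 if none */
    size_t cached_len;              /* valid bytes in buf */
    unsigned long block_reads;      /* number of fread calls issued */
    unsigned char buf[BLOCK_SIZE];
};

/* Fetch the 2-byte value at byte `offset`. Only calls fseek/fread when
   the offset lies outside the block already held in memory.
   Returns 0 on success, -1 on a seek error or a read past end of file. */
int read_sample(struct block_reader *r, long offset, int *value)
{
    assert(r != NULL && value != NULL);
    assert(offset >= 0 && offset % 2 == 0);

    long block  = offset / BLOCK_SIZE;
    long within = offset % BLOCK_SIZE;

    if (block != r->cached_block) {
        if (fseek(r->fp, block * BLOCK_SIZE, SEEK_SET) != 0)
            return -1;
        r->cached_len = fread(r->buf, 1, BLOCK_SIZE, r->fp);
        r->cached_block = block;
        r->block_reads++;
    }
    if ((size_t)within + 2 > r->cached_len)
        return -1;
    /* ASSUMPTION: big-endian signed 16-bit values */
    *value = (int16_t)(uint16_t)((r->buf[within] << 8) | r->buf[within + 1]);
    return 0;
}
```

Initialise the structure with cached_block set to -1. Reads at byte 824 and then byte 1020 both fall in the block covering bytes 512 to 1023, so the second call is served from memory without touching the file.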