Re: VAX VMS 7.3, ana/disk running out of virtual memory ?



JF Mezei wrote:
Richard Brodie wrote:

were on the same volume, or even all on the primary volume. Do the
corrupt blocks appear at the same LBN numbers (on the other volume)
as the container file?



So far, all the files I have found to be corrupt have been on RVN 1.
Not conclusive, mind you.



I feel your pain. Dave's Xmas present must have arrived early :-)


What I would like to do is build a map of the disk with filenames
sorted by location on the disk (at least for the first header). The idea
is to find a pattern among the files I know are corrupt, which would
give me a hint about which other files to test (e.g. all files lying
between 2 corrupt files in the list).
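A minimal sketch of that map, assuming you have already pulled each file's volume (RVN) and starting LBN out of DUMP/HEADER or DFU output (the parsing is not shown, and `FileExtent`, `build_disk_map` and `suspects_between` are hypothetical names for illustration). It sorts files into physical order and lists everything lying between two known-corrupt files on the same volume:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileExtent:
    """Hypothetical record: where a file's first extent starts on the volume set."""
    name: str
    rvn: int        # relative volume number within the volume set
    start_lbn: int  # first logical block of the file's first extent


def build_disk_map(extents: Iterable[FileExtent]) -> List[FileExtent]:
    """Order files by physical location: volume first, then starting LBN."""
    return sorted(extents, key=lambda e: (e.rvn, e.start_lbn))


def suspects_between(disk_map: List[FileExtent], corrupt_a: str, corrupt_b: str) -> List[str]:
    """Names of files lying physically between two known-corrupt files on one volume."""
    position = {e.name: i for i, e in enumerate(disk_map)}
    lo, hi = sorted((position[corrupt_a], position[corrupt_b]))
    if disk_map[lo].rvn != disk_map[hi].rvn:
        raise ValueError("the two corrupt files are on different volumes")
    return [e.name for e in disk_map[lo + 1:hi]]
```

With made-up data where LOGO.GIF starts at LBN 1200 and INDEX.HTML at LBN 5000 on RVN 1, any files starting between those blocks on RVN 1 come back as the next candidates to test. Only the first extent is considered, so a fragmented file could still overlap the suspect range without being listed.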


Still more pain, but DFU will tell you what file is at a particular
block number. So if you can find the LBNs belonging to the LD container
file (dump/header), you could check the same LBNs on the other 3 drives
to see if they are corrupted.

The implication, though, is that LD is figuring out the logical block
number of its container file (and assuming it is contiguous?) and doing
logical or physical I/O to those blocks, possibly on the wrong volume
of the volume set. I would think LDDRIVER just opens the container
file and does virtual I/O to it (and thus wouldn't care whether the container
is contiguous or split across a volume set or whatever), but maybe that's
hard to do from a driver, and it takes the much easier course of
doing logical I/O directly to the disk using QIOs. If it lost track of
which physical extent each chunk of the file lived on, or worse yet,
assumed they were all on the same disk, or on the primary disk of the
volume set, and they weren't, exactly the symptoms you describe would
arise. Maybe LD is checking for a contiguous file, but the check
erroneously succeeds on a volume set? OUCH! (For example, the 1st extent
lives on volume 1 and the 2nd extent lives on volume 2. LD checks the
retrieval pointers, sees there is only one, assumes the file is
contiguous, doesn't check for extension headers, and doesn't notice that
the allocation in the retrieval pointer doesn't match the total file
allocation.) (This is pure speculation! Maybe no one has ever used
LD on a non-contiguous file on a volume set before?!?)
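To make the speculated failure concrete, here is a simplified model (not LDDRIVER's actual code, and all names are hypothetical). It assumes the ODS-2 arrangement where each file header's retrieval pointers describe extents on the volume holding that header, and a file spills onto another volume through an extension header. The "naive" check looks only at the primary header, as in the scenario above; the correct mapping walks every extent:

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class RetrievalPointer:
    start_lbn: int  # first logical block of this extent
    count: int      # number of blocks in this extent


@dataclass
class FileHeader:
    rvn: int                                  # volume this header (and its extents) lives on
    pointers: List[RetrievalPointer]
    extension: Optional["FileHeader"] = None  # next extension header, possibly on another volume


def all_extents(hdr: Optional[FileHeader]) -> Iterator[Tuple[int, RetrievalPointer]]:
    """Yield (rvn, pointer) for every extent, following extension headers."""
    while hdr is not None:
        for p in hdr.pointers:
            yield hdr.rvn, p
        hdr = hdr.extension


def naive_is_contiguous(hdr: FileHeader) -> bool:
    """The speculated bug: only the primary header's pointers are examined."""
    return len(hdr.pointers) == 1


def is_contiguous(hdr: FileHeader, total_blocks: int) -> bool:
    """Contiguous only if a single extent covers the whole allocation."""
    extents = list(all_extents(hdr))
    return len(extents) == 1 and extents[0][1].count == total_blocks


def vbn_to_lbn(hdr: FileHeader, vbn: int) -> Tuple[int, int]:
    """Map a 1-based virtual block number to (rvn, lbn) by walking every extent."""
    remaining = vbn - 1
    for rvn, p in all_extents(hdr):
        if remaining < p.count:
            return rvn, p.start_lbn + remaining
        remaining -= p.count
    raise ValueError("VBN is beyond the file's allocation")
```

For a 200-block container with 100 blocks at LBN 5000 on RVN 1 and 100 blocks at LBN 800 on RVN 2, `naive_is_contiguous` says yes, while `vbn_to_lbn(hdr, 150)` gives `(2, 849)`. A driver trusting the naive check would instead write VBN 150 to LBN 5149 on RVN 1, trampling whatever file lives there.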

------------------------

I've been running VAX V7.3 on several VAXes (hobbyist and work) for
nearly 5 years, and have had no problems like this. Added an Itanium
V8.3 system to the cluster in August, and recently upgraded a couple of
Alphas from V8.2 to V8.3 in the same cluster. Nothing broke (except
monitor cluster.) But maybe there's a cache or locking problem in a
cluster that's addressed in one of the VAX or Alpha patches? There
are only about four Alpha V8.3 patches so far, and about a dozen
VAX V7.3 patches. If one of them fixes the problem, I would most
expect it to be the VAX VAXDRIV patch, or VAXF11X or VAXSYS patches.


OK, I just ran a bit of a test on the web site directories. (I backed them
up to a spare disk and am trying to make it functional again.)

Basically a loop of f$search; for each file, check whether the file_attribute
RFM = VAR, and if so, TYPE/OUTPUT=NLA0: it and check the $STATUS.
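The TYPE test essentially asks whether RMS can walk the record structure. A rough stand-alone equivalent for a raw copy of a VAR file is sketched below; it is an assumption-laden sketch, not what TYPE does internally. It assumes the data has already been cut at the file's EOF, that each record is a 2-byte little-endian byte count followed by the data and padded to an even offset, and that a count of 0xFFFF means the rest of the 512-byte block is unused. `first_bad_record` is a hypothetical name, and the default `max_len` of 32767 stands in for the file's real MRS attribute:

```python
import struct
from typing import Optional

BLOCK_SIZE = 512


def first_bad_record(data: bytes, max_len: int = 32767) -> Optional[int]:
    """Return the byte offset of the first malformed record, or None if all parse.

    Assumed layout: 2-byte little-endian count, record bytes, then one pad
    byte if needed to reach an even offset. 0xFFFF = rest of block unused.
    """
    off = 0
    while off < len(data):
        if off + 2 > len(data):
            return off                       # count word cut off at end of data
        (count,) = struct.unpack_from("<H", data, off)
        if count == 0xFFFF:
            off = (off // BLOCK_SIZE + 1) * BLOCK_SIZE  # skip to the next block
            continue
        end = off + 2 + count
        if count > max_len or end > len(data):
            return off                       # implausible length or runs past EOF
        off = end + (end & 1)                # skip the pad byte to an even boundary
    return None
```

Returning the offset rather than a plain pass/fail makes it easy to turn the bad spot into a VBN (offset // 512 + 1) and compare it with the LBN map of the disk.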

451 files: 133 were variable-length, and 3 of those were bad.

But testing binary files (images, .zips, etc.) will require the use of a
browser once I have switched the web server to point to that spare disk.

But this is just so I can get the main web site back up. There is a hell
of a lot of testing to be done on a hell of a lot of files on that disk.

I've had to stop all queues and mail processing. Once I move the user
directories to another disk and clean the active ones up, then I can
restart mail processing. And maybe then get some sleep.

One thing that might help sort out the lost files is to ana/disk/repair,
then set file/remove everything from [syslost] except the directories.
Rename the directories to a temp directory, and repeat the ana/disk/repair.

Any files listed in the directories will no longer be lost, and the
second ana/disk/repair won't touch them. The residue, truly lost files,
will still end up in [syslost], but there may be many fewer of them to
sort out. If you know which directories are top level directories, just
save those in the first step, rename them to [000000] (or wherever they
belong, if they are one or more levels down from a known location) and
keep repeating the ana/disk/repair until there are no more lost directories.
This, if done right, will rescue not just the files but the directory
structure.

--
John Santos
Evans Griffiths & Hart, Inc.
781-861-0670 ext 539