Re: Disk corruption and Pathworks

From: Tom Simpson (thomas.simpson1_at_fubar.comcast.net)
Date: 10/09/04


Date: Fri, 08 Oct 2004 23:17:26 GMT


 Paul,

Here is some additional info:
OSIJX2> admin show ver

Advanced Server V7.3A for OpenVMS

OSIJX2> pwver
Information on Advanced Server for OpenVMS images installed on this system:

Image Name                     Image Version    Link date         Linker ID
------------------------------ ---------------- ----------------- -------------
PWRK$MASTER                    V7.3-120A        9-OCT-2003 06:59  A11-20
PWRK$NBDAEMON                  V7.3-120A        9-OCT-2003 07:20  A11-20
PWRK$KNBDAEMON                 V7.3-120A        9-OCT-2003 07:21  A11-20
PWRK$STREAMSOS_V7              V7.3-120A        9-OCT-2003 07:13  A11-20
NETBIOS                        V7.3-120A        9-OCT-2003 07:17  A11-20
NETBIOSSHR                     V7.3-120A        9-OCT-2003 07:18  A11-20
PWRK$LICENSE_SERVER            V7.3-120A        9-OCT-2003 07:33  A11-20
PWRK$ADMIN_LIC                 V7.3-120A        9-OCT-2003 13:02  A11-20
PWRK$LICENSE_LIBSHR            V7.3-120A        9-OCT-2003 07:32  A11-20
PWRK$LICENSE_MGMTSHR           V7.3-120A        9-OCT-2003 07:32  A11-20
PWRK$LICENSE_REGISTRAR         V7.3-120A        9-OCT-2003 07:33  A11-20
PWRK$LMAPISHR                  V7.3-120A        9-OCT-2003 10:53  A11-20
PWRK$LMRPCXNPSHR               V7.3-120A        9-OCT-2003 10:53  A11-20
PWRK$LMSRV                     V7.3-120A        9-OCT-2003 12:48  A11-20
PWRK$LMMCP                     V7.3-120A        9-OCT-2003 11:13  A11-20
PWRK$LMBROWSER                 V7.3-120A        9-OCT-2003 11:01  A11-20
PWRK$MONITOR                   V7.3-120A        9-OCT-2003 07:28  A11-20
PWRK$MANAGER                   V7.3-120A        9-OCT-2003 12:59  A11-20
PWRK$ADMIN_CFG                 V7.3-120A        9-OCT-2003 13:00  A11-20
PWRK$MGTLIBSHR                 V7.3-120A        9-OCT-2003 12:55  A11-20
PWRK$WINLIBSHR                 V7.3-120A        9-OCT-2003 07:49  A11-20

To recover, I created a new directory, copied all the files that were good
to the new directory, deleted the old Pathworks share and created a new
share using the same name, but pointing to the new directory.

During our maintenance window on Saturday, I shut down all processes that
had open files on the corrupted disk and used Analyze/disk/repair. This
cleaned up most of the problems. After the repair was done, Analyze still
called out some file problems. Deleting the remaining files in the problem
directory completed the cleanup and Analyze ran clean.
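
For anyone following along, a minimal sketch of that kind of repair sequence is below. The device name is the one from the logs in this post; the /CONFIRM qualifier and the order of the passes are assumptions for illustration, not necessarily the exact commands that were typed.

```dcl
$! Sketch only: adjust the device name before use.
$ disk = "$1$DGA357:"
$!
$! List the files currently open on the volume, so the processes
$! holding them can be shut down before the repair.
$ SHOW DEVICE/FILES 'disk'
$!
$! Read-only pass first: reports problems without changing anything.
$ ANALYZE/DISK_STRUCTURE 'disk'
$!
$! Repair pass; /CONFIRM prompts before each repair is made.
$ ANALYZE/DISK_STRUCTURE/REPAIR/CONFIRM 'disk'
$!
$! Final read-only pass to check that the volume now analyzes clean.
$ ANALYZE/DISK_STRUCTURE 'disk'
```

Files reported as LOSTHEADER are normally entered in [SYSLOST] on the volume by the repair pass, so that directory is worth checking afterwards.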

ADMIN/ANALYZE OUTPUT:

OSIJX2> admin/analy/since=30-sep

  :::::::::: PATHWORKS Error Log Report ::::::::::
           DATE: 8-OCT-2004 07:33:14.59

   ================= EVENT #2210 ==================

Event Time: 30-SEP-2004 12:11:06.96 Node: OSIJX2
Process Id: 20A21278
Event: File system error
Event Source: ODS2 File System Library
Event Class: Error

      Status: %RMS-E-FNF, file not found
      Function: ODS2_RMS_open
      Operation: Open File
      Username: PWRK$DEFAULT
      UIC: [360,001]
      Path: $1$DGA357:\PWRKS\PCREPORT2\C1BKCL28.DUN

   ================= EVENT #2211 ==================

Event Time: 30-SEP-2004 12:11:07.06 Node: OSIJX2
Process Id: 20A21278
Event: File system error
Event Source: ODS2 File System Library
Event Class: Error

      Status: %RMS-E-FNF, file not found
      Function: ODS2_RMS_open
      Operation: Open File
      Username: PWRK$DEFAULT
      UIC: [360,001]
      Path: $1$DGA357:\PWRKS\PCREPORT2\C1BKCL28.DUN

   ================= EVENT #2212 ==================

Event Time: 30-SEP-2004 12:11:07.16 Node: OSIJX2
Process Id: 20A21278
Event: File system error
Event Source: ODS2 File System Library
Event Class: Error

      Status: %RMS-E-FNF, file not found
      Function: ODS2_RMS_open
      Operation: Open File
      Username: PWRK$DEFAULT
      UIC: [360,001]
      Path: $1$DGA357:\PWRKS\PCREPORT2\C1BKCL28.DUN

   ================= EVENT #2213 ==================

Event Time: 30-SEP-2004 12:11:15.12 Node: OSIJX2
Process Id: 20A21278
Event: File system error
Event Source: ODS2 File System Library
Event Class: Error

      Status: %RMS-E-FNF, file not found
      Function: ODS2_RMS_open
      Operation: Open File
      Username: PWRK$DEFAULT
      UIC: [360,001]
      Path: $1$DGA357:\PWRKS\PCREPORT2\CAP1CL72.DUN

   ================= EVENT #2214 ==================

Event Time: 30-SEP-2004 12:11:15.18 Node: OSIJX2
Process Id: 20A21278
Event: File system error
Event Source: ODS2 File System Library
Event Class: Error

      Status: %RMS-E-FNF, file not found
      Function: ODS2_RMS_open
      Operation: Open File
      Username: PWRK$DEFAULT
      UIC: [360,001]
      Path: $1$DGA357:\PWRKS\PCREPORT2\CAP1CL72.DUN

   ================= EVENT #2215 ==================

Event Time: 30-SEP-2004 12:11:15.24 Node: OSIJX2
Process Id: 20A21278
Event: File system error
Event Source: ODS2 File System Library
Event Class: Error

      Status: %RMS-E-FNF, file not found
      Function: ODS2_RMS_open
      Operation: Open File
      Username: PWRK$DEFAULT
      UIC: [360,001]
      Path: $1$DGA357:\PWRKS\PCREPORT2\CAP1CL72.DUN

ADMIN SHOW EVENTS:

OSIJX2> admin show event/full/since=30-sep

Events in System Event Log on server "OSIJX2":

T Date     Time        Source    Category        Event  User       Computer
- -------- ----------- --------- --------------- ------ ---------- -------------
I 10/02/04 06:44:39 PM Eventlog None 6005 N/A OSIJX2
The Event log service was started.

I 10/02/04 06:44:00 PM Eventlog None 6005 N/A OSIJX1
The Event log service was started.

W 10/02/04 06:27:38 PM BROWSER None 8021 N/A OSIJX2
The browser was unable to retrieve a list of servers from the browser master
\\OSIJX1 on the network netbios/streams/nbes. The data is the error code.
Data:
    0000: 35 00 00 00 00 00 00 00 5.......

W 10/02/04 06:27:27 PM BROWSER None 8021 N/A OSIJX2
The browser was unable to retrieve a list of servers from the browser master
\\OSIJX1 on the network netbios/streams/knbs. The data is the error code.
Data:
    0000: 35 00 00 00 00 00 00 00 5.......

  Total of 4 events

This is a DCL code fragment of a procedure that first encountered the
problem. I thought it was a DCL bug at first, until I saw the corruption.

$! TEST PROGRAM
$!
$ say := write sys$output
$ on errors then goto errors
$ search_file = "PCREPORT:C*.%%"
$ file_cnt = 0
$!
$FILE_LOOP1:
$! loop through all the CITI files, generate an "array" of file names...
$ cur_file = f$search(search_file,1)
$ msg_text1 = "FILE RENAME ERROR ** Current File: ''cur_file'"
$ If cur_file .eqs. "" Then GoTo loop1_exit
$ say " Found CITI File: ''cur_file'"
$!
$! Build an array of file names that need to be renamed
$ search_file'file_cnt' = cur_file
$ file_cnt = file_cnt + 1
$!
$ GoTo file_loop1
$!
$LOOP1_EXIT:
$!
$ loop_cnt = file_cnt
$ say ""
$ say ""
$ Inquire/nopunct yn "Do you wish to process these ''loop_cnt' files now? "
$ say ""
$ exit
$!
$ERRORS:
$ say ""
$ say msg_text1
$ exit

---------------------------------------
From Node OSIJX2
---------------------------------------

OSIJX2> @lbridgetest

Do you wish to process these 0 files now?

OSIJX2> dir PCREPORT:C*.%%

Directory DISK$DATA8:[PWRKS.PCREPORT2]

CA427065.MT;1 2/18 29-SEP-2004 12:55:23.63
CB426886.MT;1 1/18 29-SEP-2004 12:55:23.71

Total of 2 files, 3/36 blocks.

OSIJX2> set verify
OSIJX2> @lbridgetest
$ say := write sys$output
$ on errors then goto errors
$ search_file = "PCREPORT:C*.%%"
$ file_cnt = 0
$!
$FILE_LOOP1:
$! loop through all the CITI files, generate an "array" of file names...
$ cur_file = f$search(search_file,1)
$ msg_text1 = "FILE RENAME ERROR ** Current File: "
$ If cur_file .eqs. "" Then GoTo loop1_exit
$LOOP1_EXIT:
$!
$ loop_cnt = file_cnt
$ file_cnt = 0
$ say ""

$ say ""

$ Inquire/nopunct yn "Do you wish to process these 0 files now? "
Do you wish to process these 0 files now?

$ say ""

$ exit

----------------------------------------------------
From node OSIJX1
----------------------------------------------------

OSIJX1> @lbridgetest
    Found CITI File: DISK$DATA8:[PWRKS.PCREPORT2]CA427065.MT;1
    Found CITI File: DISK$DATA8:[PWRKS.PCREPORT2]CB426886.MT;1

Do you wish to process these 2 files now? n

OSIJX1> dir PCREPORT:C*.%%

Directory DISK$DATA8:[PWRKS.PCREPORT2]

CA427065.MT;1 2/18 29-SEP-2004 12:55:23.63
CB426886.MT;1 1/18 29-SEP-2004 12:55:23.71

Total of 2 files, 3/36 blocks.

Here is the ANALYZE output for the disk in question:

Analyze/Disk_Structure for _$1$DGA357: started on 1-OCT-2004 07:38:07.17
%ANALDISK-I-OPENQUOTA, error opening QUOTA.SYS
-SYSTEM-W-NOSUCHFILE, no such file
%ANALDISK-W-ALLOCCLR, blocks incorrectly marked allocated
 LBN 34345836 to 34345907, RVN 1
%ANALDISK-W-BAD_NAMEORDER, filename ordering incorrect in VBN 4
 of directory PWRKS.PCREPORT (8165,1,1)
 Filenames are CAP1_ABC_04006212004R_06212004_ALLPAY_E.XPTN__2ERPT
           and CAP1CL34.DUN
%ANALDISK-W-BAD_NAMEORDER, filename ordering incorrect in VBN 5
 of directory PWRKS.PCREPORT (8165,1,1)
 Filenames are CB053812.MT
           and CAP1_BZ1017062204214800.DUN
%ANALDISK-W-BAD_NAMEORDER, filename ordering incorrect in VBN 10
 of directory PWRKS.PCREPORT2 (44055,122,1)
 Filenames are CAP1_DP092304ABC.TXT-PROC
           and CAP1_BZ5010092804214129.DUN
%ANALDISK-W-BADDIRENT, invalid file identification in directory entry
 [PWRKS.PCREPORT2]CAP1_BZ5010092804214129.DUN;1
-ANALDISK-I-BAD_DIRHEADER, no valid file header for directory
%ANALDISK-W-BADDIRENT, invalid file identification in directory entry
 [PWRKS.PCREPORT2]CAP1_BZ5017092804214129.DUN;1
-ANALDISK-I-BAD_DIRHEADER, no valid file header for directory
%ANALDISK-W-BADDIRENT, invalid file identification in directory entry
 [PWRKS.PCREPORT2]CAP1_BZ6040092804214129.DUN;1
-ANALDISK-I-BAD_DIRHEADER, no valid file header for directory
%ANALDISK-W-BADDIRENT, invalid file identification in directory entry
 [PWRKS.PCREPORT2]CAP1_BZN300092804214129.DUN;1
-ANALDISK-I-BAD_DIRHEADER, no valid file header for directory
%ANALDISK-W-BADDIRENT, invalid file identification in directory entry
 [PWRKS.PCREPORT2]CCSIUPLS0929.SND;1
-ANALDISK-I-BAD_DIRFIDSEQ, invalid file sequence number in directory file ID
%ANALDISK-W-BAD_NAMEORDER, filename ordering incorrect in VBN 15
 of directory PWRKS.PCREPORT2 (44055,122,1)
 Filenames are CITIBANK_0712_54_MAINT.TXT
           and CITIBANK_0712_51_MAINT.TXT
%ANALDISK-W-BAD_NAMEORDER, filename ordering incorrect in VBN 19
 of directory PWRKS.PCREPORT2 (44055,122,1)
 Filenames are TEL_DP092304ABC.TXT-PROC
           and TEL_DP081004ABC.TXT-PROC
%ANALDISK-W-LOSTHEADER, file (55870,130,0) CAP1_PLA092404.DUN_Z1015;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (62265,171,0) CAP1_PLA092404.DUN_Z1010;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (62286,218,0) CAP1_PLA092404.DUN_Z1011;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (62305,24,0) CAP1_PLA092404.DUN_Z1013;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (62323,198,0) CAP1_PLA092404.DUN_Z0301;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (63496,119,0) GATEWAY999999.TXT;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (64238,47,0) CITINB48R.TXT;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (64270,8,0) QRT.TXT;2
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (64408,55,0) PVEOSO80.SND;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (64479,84,0) PVEOSO79.SND;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (76516,7,0) ICARDS.TXT;1
 not found in a directory
%ANALDISK-W-LOSTHEADER, file (76577,51,0) IPLOANS.TXT;1
 not found in a directory
%ANALDISK-W-FREESPADRIFT, free block count of 47230398 is incorrect (RVN 1);
 the correct value is 47230254

Thanks!!
Tom

"PEN" <paul.nuneznosp@mhp.com> wrote in message
news:ck3fv6$rqm$1@hplms2.hpl.hp.com...
> Hi Tom,
>
> I can't recall any reports in the support center on this. But what are
> some of the VMS errors you receive when this occurs?
>
> Do you believe the .DIR file has become corrupted, or the file headers,
> or both?
>
> If you can do $ DIR <FILE>, does $ DIR/FULL <FILE> then result in an error
> such as "no such file"?
>
> How do you recover from this problem (i.e, $ anal/disk/repair)?
>
> Are there any related Advanced Server errors seen from:
>
> $ admin/anal/since=<date-problem-occurred>
> $ admin sh event/full/since=<date-problem-occurred>
> $ type pwrk$lmlogs:pwrk$lmsrv_<nodename>.log;n ! Where version 'n' is the
> version of the log open at the time the corruption occurred.
>
> You can determine whether you're running Advanced Server v7.3 eco2 or
> v7.3A eco2 by doing:
>
> $ @sys$startup:pwrk$define_commands
> $ pwver
>
> The image ident for v7.3A ECO2 will be V7.3-120A
> The image ident for v7.3 ECO2 will be V7.3-120
>
> If you're not running the 'A' flavor, that would be the place to start
> (you can get ECO3, the current version, from ftp.itrc.hp.com).
>
> HTH,
>
> Paul
> "Tom Simpson" <thomas.simpson1@fubar.comcast.net> wrote in message
> news:4p19d.344875$Fg5.335468@attbi_s53...
> > I've had a second occurrence of disk/directory corruption in 3 months.
> > The affected directory was the same logical disk directory both times.
> > This directory is a Pathworks share and its prime use is to send and
> > receive files from client systems, both Windows and Unix. We are running
> > a 2-node homogeneous cluster configuration on ES40's using OpenVMS Alpha
> > 7.3-1 and Advanced Server (the version that came with VMS 7.3-1) w/ECO 2.
> >
> > No other directories have issues and I see no problem with the disk
> > drive (it's on an EMC SAN).
> >
> > When the problem occurs, we see problems such as files that can be seen
> > from one node but not the other. Or you can see the file with a directory
> > command, but can't do anything with it, such as copy or edit. In one
> > instance, f$search could find the files from one node, but not the other.
> > In the f$search problem, the files could be seen with a directory command
> > from both nodes.
> >
> > The share directory is on a disk that is mounted cluster-wide and process
> > privileges are not the issue. This problem has caused major issues because
> > we depend heavily on automated procedures to manage client file transfers
> > and files are being missed or "lost".
> >
> > Is anyone else seeing problems similar to this?
> >
> > Regards,
> > Tom
> >
> >
>
>