Re: Errors during shadow set merge



On Feb 19, 7:59 am, tadamsmar <tadams...@xxxxxxxxx> wrote:
On Feb 18, 11:17 pm, Michael Austin <maus...@xxxxxxxxxxxxxxxxxx>
wrote:





tadamsmar wrote:
On Feb 18, 5:00 pm, "Richard B. Gilbert" <rgilber...@xxxxxxxxxxx>
wrote:
tadamsmar wrote:
I noticed I was getting errors when adding a member to a shadow
set.
I have been getting errors during shadow set merges since I bought
this refurb DS10.
Got 109 errors today when I remerged after doing an image: 16 errors
on DKA0 and 93 on DKA100.
What do you think is causing this?
Are these soft errors?
Here is the log for one:
**** V3.4  ********************* ENTRY 1667 ********************************
Logging OS                        1. OpenVMS
System Architecture               2. Alpha
OS version                           V7.3-2
Event sequence number         11474.
Timestamp of occurrence              18-FEB-2008 09:52:48
Time since reboot                    77 Day(s) 1:23:46
Host name                            EESD
System Model                         AlphaServer DS10 617 MHz
Entry Type                        1. Device Error
---- Device Profile ----
Unit                                 $1$DKA0
Product Name                         ATLAS10K2-TY184L
Vendor                               QUANTUM
-- Driver Supplied Info -
Device Firmware Revision             DA40
VMSSCSIError Type               5. Extended Sense Data from Device
SCSIID                         x00
SCSILUN                        x00
SCSISUBLUN                     x00
Port Status               x00000001  NORMAL  -  normal successful completion
SCSICommand Opcode             x28  Read (10 byte command)
Command Data
                                x00
                                x02
                                x06
                                x44
                                x8A
                                x00
                                x00
                                x01
                                x00
SCSIStatus                     x02  Check Condition
Remaining Byte Length            18.
--- Device Sense Data ---
Error Code                      xF0  Current Error
                                     Information Bytes are Valid
Segment #                       x00
Information Byte 3              x02
            Byte 2              x06
            Byte 1              x44
            Byte 0              x8A  LBA:  x0206448A
Sense Key                       x03  Medium Error
Additional Sense Length         x0A
CMD Specific Info Byte 3        x21
                  Byte 2        x23
                  Byte 1        x3E
                  Byte 0        xD4
ASC & ASCQ                    x1100  ASC  =   x0011
                                     ASCQ =   x0000
                                     Unrecovered Read Error
FRU Code                        x00
Sense Key Specific Byte 0       x80  Valid Sense Key Data
                   Byte 1       x00
                   Byte 2       xA0
----- Software Info -----
UCB$x_ERTCNT                     16. Retries Remaining
UCB$x_ERTMAX                     16. Retries Allowable
IRP$Q_IOSB                x0000000000000000
UCB$x_STS                 x08021810  Online
                                     Software Valid
                                     Unload At Dismount
                                     Volume is Valid on the local node
                                     Unit supports the Extended Function bit
IRP$L_PID                 x82640450  Requestor "PID"
IRP$x_BOFF                     4416. Byte Page Offset
IRP$x_BCNT                      512. Transfer Size In Byte(s)
UCB$x_ERRCNT                     32. Errors This Unit
UCB$L_OPCNT                22716780. QIO's This Unit
ORB$L_OWNER               x00010004  Owners UIC
UCB$L_DEVCHAR1            x1C4D4008  Directory Structured
                                     File Oriented
                                     Sharable
                                     Available
                                     Mounted
                                     Error Logging
                                     Capable of Input
                                     Capable of Output
                                     Random Access
Is that system under service contract?  If so, ask to have the drive
replaced!

I hope you have a recent backup that's readable.   If you don't, try to
make one!  Right now!!!!

It could be just a single bad block.  It could also be all the warning
you are going to get that the disk is failing!  Once you hear that "loud
scraping sound" it's all over!!

If you don't have a service contract, order a replacement disk and get a
rush on the delivery!

Meanwhile, keep an eye on the disk.  If you get more error messages with
different LBAs it means the situation is deteriorating and you may have
an emergency within a few minutes or hours.
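
One way to watch for "different LBAs" is to tally the distinct failing block addresses per unit from a saved text copy of the error report. The following is only a rough sketch: it assumes the report has the same layout as the entry quoted above (a `Unit` line naming the device, followed later by a line containing `LBA:  x...`), and the function name `distinct_lbas` is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the quoted entry:
#   "Unit                                 $1$DKA0"
#   "            Byte 0              x8A  LBA:  x0206448A"
# The Unit pattern requires a single token after "Unit" so that flag text
# such as "Unit supports the Extended Function bit" is not mistaken for it.
UNIT_RE = re.compile(r"^\s*Unit\s+(\S+)\s*$")
LBA_RE = re.compile(r"LBA:\s*x([0-9A-Fa-f]+)")


def distinct_lbas(report_text):
    """Return {unit name: set of failing LBAs} found in the report text."""
    lbas = defaultdict(set)
    unit = None
    for line in report_text.splitlines():
        m = UNIT_RE.match(line)
        if m:
            unit = m.group(1)
            continue
        m = LBA_RE.search(line)
        if m and unit is not None:
            lbas[unit].add(int(m.group(1), 16))
    return dict(lbas)


if __name__ == "__main__":
    sample = (
        "Unit                                 $1$DKA0\n"
        "            Byte 0              x8A  LBA:  x0206448A\n"
    )
    for unit, blocks in distinct_lbas(sample).items():
        print(unit, len(blocks), "distinct LBA(s)")
```

A count that keeps growing between merges would suggest the damage is spreading; the same few LBAs repeating would suggest a small number of bad blocks.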


Are these hard or soft errors?

These are generally HARD errors - do what he said and order a disk ASAP.


I am skeptical that it's the disks. (In my original message, I indicated
that I get errors on both disks.)

I have had this problem for a while.  I have run:

ANAL/MEDIA/EXER

on the disks and found no errors.

These error bursts only happen when I do a shadow set merge.

I suspect something about the SCSI bus, or its connections, that is
stressed by a merge.



Information Byte 3 x02
Byte 2 x06
Byte 1 x44
Byte 0 x8A LBA: x0206448A
Sense Key x03 Medium Error

This points to the real problem: a media failure, not an electronics
fault. It might be interesting to determine where that logical
block address lives. I suspect you'll find that it belongs to
some file that is rarely or never accessed until a shadow copy/merge
occurs, and the media error is only noted when the merge touches it.
You might find that the integrity of that file is compromised (I
presume the block is located in a file and not in free space) and that
it requires restoration (note that this event was recorded as an
unrecovered read).
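
As a cross-check on the LBA, the Command Data bytes in the logged entry can be decoded by hand. The sketch below assumes the standard READ(10) layout (opcode x28, a flags byte, a 4-byte big-endian LBA, a group byte, a 2-byte transfer length, a control byte) and that the nine Command Data bytes in the log follow the opcode in order; the helper name `decode_read10` is made up for illustration.

```python
def decode_read10(cdb):
    """Return (lba, block_count) from a 10-byte SCSI READ(10) command block."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) command block")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, blocks


# Opcode x28 followed by the nine Command Data bytes from the logged entry.
cdb = bytes([0x28, 0x00, 0x02, 0x06, 0x44, 0x8A, 0x00, 0x00, 0x01, 0x00])
lba, blocks = decode_read10(cdb)

# Information Bytes 3..0 from the sense data, most significant first.
sense_lba = int.from_bytes(bytes([0x02, 0x06, 0x44, 0x8A]), "big")

print(f"requested LBA x{lba:08X} ({lba} decimal), {blocks} block(s)")
print("sense data reports the same LBA:", sense_lba == lba)
```

Under those assumptions the command was a one-block read at LBA x0206448A (33965194 decimal), which agrees with both the 512-byte transfer size in the software info and the LBA the drive reported in its sense data: the drive failed on the very block it was asked to read.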


