Re: Errors during shadow set merge



tadamsmar wrote:
On Feb 21, 7:54 am, tadamsmar <tadams...@xxxxxxxxx> wrote:

On Feb 20, 10:19 pm, Michael Austin <maus...@xxxxxxxxxxxxxxxxxx>
wrote:






tadamsmar wrote:

On Feb 18, 11:17 pm, Michael Austin <maus...@xxxxxxxxxxxxxxxxxx>
wrote:

tadamsmar wrote:

On Feb 18, 5:00 pm, "Richard B. Gilbert" <rgilber...@xxxxxxxxxxx>
wrote:

tadamsmar wrote:

I noticed I was getting errors when adding a member to a shadow
set.
I have been getting errors during shadow set merges since I bought
this refurb DS10.
Got 109 errors today when I remerged after doing an image: 16 errors
on DKA0 and 93 on DKA100.
What do you think is causing this?
Are these soft errors?
Here is the log for one:
**** V3.4 ********************* ENTRY 1667 ********************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V7.3-2
Event sequence number 11474.
Timestamp of occurrence 18-FEB-2008 09:52:48
Time since reboot 77 Day(s) 1:23:46
Host name EESD
System Model AlphaServer DS10 617 MHz
Entry Type 1. Device Error
---- Device Profile ----
Unit $1$DKA0
Product Name ATLAS10K2-TY184L
Vendor QUANTUM
-- Driver Supplied Info -
Device Firmware Revision DA40
VMSSCSIError Type 5. Extended Sense Data from Device
SCSIID x00
SCSILUN x00
SCSISUBLUN x00
Port Status x00000001 NORMAL - normal successful completion
SCSICommand Opcode x28 Read (10 byte command)
Command Data
x00
x02
x06
x44
x8A
x00
x00
x01
x00
SCSIStatus x02 Check Condition
Remaining Byte Length 18.
--- Device Sense Data ---
Error Code xF0 Current Error
Information Bytes are Valid
Segment # x00
Information Byte 3 x02
Byte 2 x06
Byte 1 x44
Byte 0 x8A LBA: x0206448A
Sense Key x03 Medium Error
Additional Sense Length x0A
CMD Specific Info Byte 3 x21
Byte 2 x23
Byte 1 x3E
Byte 0 xD4
ASC & ASCQ x1100 ASC = x0011
ASCQ = x0000
Unrecovered Read Error
FRU Code x00
Sense Key Specific Byte 0 x80 Valid Sense Key Data
Byte 1 x00
Byte 2 xA0
----- Software Info -----
UCB$x_ERTCNT 16. Retries Remaining
UCB$x_ERTMAX 16. Retries Allowable
IRP$Q_IOSB x0000000000000000
UCB$x_STS x08021810 Online
Software Valid
Unload At Dismount
Volume is Valid on the local node
Unit supports the Extended Function bit
IRP$L_PID x82640450 Requestor "PID"
IRP$x_BOFF 4416. Byte Page Offset
IRP$x_BCNT 512. Transfer Size In Byte(s)
UCB$x_ERRCNT 32. Errors This Unit
UCB$L_OPCNT 22716780. QIO's This Unit
ORB$L_OWNER x00010004 Owners UIC
UCB$L_DEVCHAR1 x1C4D4008 Directory Structured
File Oriented
Sharable
Available
Mounted
Error Logging
Capable of Input
Capable of Output
Random Access
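
For anyone reading the entry above by hand: the Command Data bytes look like the rest of a READ(10) CDB after the x28 opcode, which would put the LBA in CDB bytes 2 to 5 and the block count in bytes 7 and 8. Here is a small Python sketch of that decoding, under that assumption. The function name and the short sense-key table are my own, not anything from the VMS tools.

```python
# Bytes copied from ENTRY 1667 above.
OPCODE = 0x28
COMMAND_DATA = bytes([0x00, 0x02, 0x06, 0x44, 0x8A, 0x00, 0x00, 0x01, 0x00])

# A few standard SCSI sense keys (not a complete table).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0xB: "Aborted Command",
}


def decode_read10(opcode, rest):
    """Decode a READ(10) CDB from its opcode and the 9 bytes that follow.

    Assumes `rest` holds CDB bytes 1..9, as the log appears to list them.
    Returns (lba, block_count).
    """
    if opcode != 0x28 or len(rest) != 9:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(rest[1:5], "big")     # CDB bytes 2-5
    blocks = int.from_bytes(rest[6:8], "big")  # CDB bytes 7-8
    return lba, blocks


lba, blocks = decode_read10(OPCODE, COMMAND_DATA)
print(f"LBA x{lba:08X}, {blocks} block(s)")
print("Sense key x03:", SENSE_KEYS[0x03])
```

This gives LBA x0206448A and a count of 1 block, which agrees with the LBA in the sense data and the 512-byte transfer size in the log. On the hard/soft question: in SCSI terms a sense key of x01 (Recovered Error) is the soft case, while x03 (Medium Error) with ASC x11 means the drive did not get the data back.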

Is that system under service contract? If so, ask to have the drive
replaced!
I hope you have a recent backup that's readable. If you don't, try to
make one! Right now!!!!
It could be just a single bad block. It could also be all the warning
you are going to get that the disk is failing! Once you hear that "loud
scraping sound" it's all over!!
If you don't have a service contract, order a replacement disk and get a
rush on the delivery!
Meanwhile, keep an eye on the disk. If you get more error messages with
different LBAs it means the situation is deteriorating and you may have
an emergency within a few minutes or hours.

Are these hard or soft errors?

These are generally HARD errors - do what he said and order a disk ASAP.

I am skeptical that it's the disks. (In my original message, I indicated
that I get errors on both disks.)

I have had this problem for a while. I have run:

ANAL/MEDIA/EXER

on the disks and found no errors.

These error bursts only happen when I do a shadow set merge.

I suspect something about the SCSI, or connections, that is stressed
by a merge.

I still suspect the media - and I can back it up with 24 years of
reading error logs... can you?

No.

Here is a log of my recent findings:

Merged the shadow set, getting 16 errors on DKA0 and 83 errors on
DKA100.

Did an ANALYZE/MEDIA/EXER=FULL of DKA100 and found 1 bad block. Got a
good many errors logged during the ANALYZE.

Merged the shadow set, getting 16 errors on DKA0 and 5 errors on
DKA100.

Did an ANALYZE/MEDIA/EXER=FULL of DKA0 and found 0 bad blocks. Got
0 errors logged during the ANALYZE.

Merged the shadow set, getting 4 errors on DKA0 and 19 errors on
DKA100.

I will swap out one of the disks and give it a try, putting in a disk
that is logging no errors at its current location.


I swapped the dka100 disks between two DS10s (same model disks).

Then I merged the shadowset on the problem machine. I got 4 errors
on dka100 and 16 on dka0. All indicating unrecoverable.

On the other machine I got about 34 errors on dka100 (most indicating
unrecoverable) during the shadowset merge. But I realized that I had
found 1 bad block on it when it was on the problem machine using
ANALYZE. So, I ran ANAL/MEDIA/EXER=FULL and found 13 bad blocks.

I suspect there must be more than bad disks on the problem machine,
since it got 4 unrecoverable errors (at 2 LBAs) on a disk that had none
recently during shadowset merges on the other machine.
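
One way to keep track of this across merges is to pull the unit and LBA out of each detailed entry and count distinct LBAs per unit. A rough Python sketch follows. The sample values are placeholders made up to show the shape of the data (only x0206448A comes from the log earlier in the thread), and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def summarize(errors):
    """Count total errors and distinct LBAs per unit.

    errors: iterable of (unit, lba) pairs copied from the detailed report.
    """
    per_unit = defaultdict(Counter)
    for unit, lba in errors:
        per_unit[unit][lba] += 1
    return {
        unit: {"errors": sum(counts.values()), "distinct_lbas": len(counts)}
        for unit, counts in per_unit.items()
    }


def new_lbas(earlier, later):
    """LBAs seen in the later merge but not in the earlier one."""
    return sorted({lba for _, lba in later} - {lba for _, lba in earlier})


# Placeholder data, NOT real log values (except x0206448A).
merge_1 = [("DKA0", 0x0206448A), ("DKA100", 0x00001000), ("DKA100", 0x00001000)]
merge_2 = [("DKA0", 0x0206448A), ("DKA100", 0x00001000), ("DKA100", 0x00002000)]

print(summarize(merge_2))
print([f"x{lba:08X}" for lba in new_lbas(merge_1, merge_2)])
```

Going by the reasoning earlier in the thread, the same few LBAs coming back on every merge would point at specific bad blocks, while a growing list of new LBAs would suggest the situation is deteriorating. LBAs turning up on a disk that was clean in the other machine is what makes me look beyond the disks.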

BTW, when you do a DIAGNOSE/TRANS/SUMMARY all these errors
are listed as SCSI errors, but when you look at the sense data in
the detailed report, most are identified as medium errors.

I guess I will ask the vendor for a couple of disks under the warranty,
but I have no confidence that it will solve the problem. Maybe I need
the machine replaced.

It is possible that you have a problem with a cable, or a host bus adapter either as a contributing factor or (less likely) as the whole problem.
