Re: Errors during shadow set merge
- From: John Santos <john@xxxxxxx>
- Date: Fri, 22 Feb 2008 01:27:03 GMT
Richard B. Gilbert wrote:
tadamsmar wrote:
On Feb 21, 7:54 am, tadamsmar <tadams...@xxxxxxxxx> wrote:
On Feb 20, 10:19 pm, Michael Austin <maus...@xxxxxxxxxxxxxxxxxx>
wrote:
tadamsmar wrote:
On Feb 18, 11:17 pm, Michael Austin <maus...@xxxxxxxxxxxxxxxxxx>
wrote:
tadamsmar wrote:
On Feb 18, 5:00 pm, "Richard B. Gilbert" <rgilber...@xxxxxxxxxxx>
wrote:
tadamsmar wrote:
I noticed I was getting errors when adding a member to a shadow
set.
I have been getting errors during shadow set merges since I bought
this refurb DS10.
Got 109 errors today when I remerged after doing an image: 16 errors
on DKA0 and 93 on DKA100.
What do you think is causing this?
Are these soft errors?
Here is the log for one:
**** V3.4 ********************* ENTRY 1667 ********************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V7.3-2
Event sequence number 11474.
Timestamp of occurrence 18-FEB-2008 09:52:48
Time since reboot 77 Day(s) 1:23:46
Host name EESD
System Model AlphaServer DS10 617 MHz
Entry Type 1. Device Error
---- Device Profile ----
Unit $1$DKA0
Product Name ATLAS10K2-TY184L
Vendor QUANTUM
-- Driver Supplied Info -
Device Firmware Revision DA40
VMSSCSIError Type 5. Extended Sense Data from Device
SCSIID x00
SCSILUN x00
SCSISUBLUN x00
Port Status x00000001 NORMAL - normal successful completion
SCSICommand Opcode x28 Read (10 byte command)
Command Data x00 x02 x06 x44 x8A x00 x00 x01 x00
SCSIStatus x02 Check Condition
Remaining Byte Length 18.
--- Device Sense Data ---
Error Code xF0 Current Error
Information Bytes are Valid
Segment # x00
Information Byte 3 x02
Byte 2 x06
Byte 1 x44
Byte 0 x8A LBA: x0206448A
Sense Key x03 Medium Error
Additional Sense Length x0A
CMD Specific Info Byte 3 x21
Byte 2 x23
Byte 1 x3E
Byte 0 xD4
ASC & ASCQ x1100 ASC = x0011
ASCQ = x0000
Unrecovered Read Error
FRU Code x00
Sense Key Specific Byte 0 x80 Valid Sense Key Data
Byte 1 x00
Byte 2 xA0
----- Software Info -----
UCB$x_ERTCNT 16. Retries Remaining
UCB$x_ERTMAX 16. Retries Allowable
IRP$Q_IOSB x0000000000000000
UCB$x_STS x08021810 Online
Software Valid
Unload At Dismount
Volume is Valid on the local node
Unit supports the Extended Function bit
IRP$L_PID x82640450 Requestor "PID"
IRP$x_BOFF 4416. Byte Page Offset
IRP$x_BCNT 512. Transfer Size In Byte(s)
UCB$x_ERRCNT 32. Errors This Unit
UCB$L_OPCNT 22716780. QIO's This Unit
ORB$L_OWNER x00010004 Owners UIC
UCB$L_DEVCHAR1 x1C4D4008 Directory Structured
File Oriented
Sharable
Available
Mounted
Error Logging
Capable of Input
Capable of Output
Random Access
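The command and sense bytes in an entry like this can be decoded by hand. Below is a minimal Python sketch, assuming the standard SCSI READ(10) CDB layout and the fixed-format sense data layout. The byte values are transcribed from the log above (opcode x28 followed by the nine Command Data bytes, then the sense fields in order); the function names are made up for illustration.

```python
"""Decode the READ(10) CDB and fixed-format sense data from the log entry.

Byte values are hand-transcribed from the DIAGNOSE fields, assuming the
log lists the CDB bytes after the opcode in order and the sense bytes in
standard fixed-format (response code 0x70/0xF0) order.
"""

# Opcode x28 followed by the nine "Command Data" bytes from the log.
CDB = bytes([0x28, 0x00, 0x02, 0x06, 0x44, 0x8A, 0x00, 0x00, 0x01, 0x00])

# 18 bytes of sense data rebuilt from the "Device Sense Data" fields.
SENSE = bytes([
    0xF0,                    # valid bit set + response code 0x70 (current error)
    0x00,                    # segment number
    0x03,                    # sense key in the low nibble
    0x02, 0x06, 0x44, 0x8A,  # information field, most significant byte first
    0x0A,                    # additional sense length
    0x21, 0x23, 0x3E, 0xD4,  # command-specific information
    0x11, 0x00,              # ASC, ASCQ
    0x00,                    # FRU code
    0x80, 0x00, 0xA0,        # sense-key-specific bytes
])


def decode_read10(cdb: bytes) -> tuple[int, int]:
    """Return (lba, block_count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


def decode_fixed_sense(sense: bytes) -> dict:
    """Pull the main fields out of fixed-format sense data."""
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "current": response_code == 0x70,
        "info_valid": bool(sense[0] & 0x80),
        "sense_key": sense[2] & 0x0F,
        "information": int.from_bytes(sense[3:7], "big"),
        "asc": sense[12],
        "ascq": sense[13],
        "length": 8 + sense[7],  # 8-byte header plus additional sense length
    }


if __name__ == "__main__":
    lba, count = decode_read10(CDB)
    print(f"READ(10) of {count} block(s) at LBA {lba:#010x} ({lba})")
    print(decode_fixed_sense(SENSE))
```

With these bytes the CDB asks for one block at LBA x0206448A (33965194 decimal), consistent with the 512-byte transfer size in IRP$x_BCNT, and the sense data's information field points at the same LBA with ASC/ASCQ x11/x00, an unrecovered read error. In other words, the drive itself reported that it could not read that one block.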
Is that system under service contract? If so, ask to have the drive
replaced!
I hope you have a recent backup that's readable. If you don't, try to
make one! Right now!!!!
It could be just a single bad block. It could also be all the warning
you are going to get that the disk is failing! Once you hear that "loud
scraping sound" it's all over!!
If you don't have a service contract, order a replacement disk and get a
rush on the delivery!
Meanwhile, keep an eye on the disk. If you get more error messages with
different LBAs it means the situation is deteriorating and you may have
an emergency within a few minutes or hours.
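Richard's warning sign (new errors at different LBAs) is easy to check mechanically. A minimal Python sketch, assuming the error log has been saved as a text report laid out like the entry above: one `Unit $...` line per entry, an `LBA: x...` field in the sense data, and `ENTRY` on each entry's header line. The regular expressions and the name `distinct_lbas` are assumptions, not part of any VMS utility.

```python
import re
import sys
from collections import defaultdict

# Device names in the log look like "$1$DKA0"; requiring the "$" keeps
# status text such as "Unit supports the Extended..." from matching.
UNIT_RE = re.compile(r"^\s*Unit\s+(\$\S+)\s*$")
# Sense-data line carrying the decoded block number, e.g. "LBA: x0206448A".
LBA_RE = re.compile(r"LBA:\s*x([0-9A-Fa-f]+)")


def distinct_lbas(report: str) -> dict[str, set[int]]:
    """Map each unit to the set of distinct LBAs reported against it."""
    seen: dict[str, set[int]] = defaultdict(set)
    unit = None
    for line in report.splitlines():
        if "ENTRY" in line:
            unit = None  # new error-log entry: forget the previous unit
            continue
        m = UNIT_RE.match(line)
        if m:
            unit = m.group(1)
            continue
        m = LBA_RE.search(line)
        if m and unit is not None:
            seen[unit].add(int(m.group(1), 16))
    return dict(seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        result = distinct_lbas(fh.read())
    for unit, lbas in sorted(result.items()):
        print(f"{unit}: {len(lbas)} distinct LBA(s)")
```

Run it against a saved report, e.g. `python distinct_lbas.py report.txt` (file names hypothetical). A count that keeps growing from one merge to the next matches the "deteriorating" pattern; the same few LBAs repeating is more consistent with a handful of bad blocks.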
Are these hard or soft errors?
These are generally HARD errors - do what he said and order a disk ASAP.
I am skeptical that it's the disks. (In my original message, I indicated
that I get errors on both disks.)
I have had this problem for a while. I have run:
ANAL/MEDIA/EXER
on the disks and found no errors.
These error bursts only happen when I do a shadow set merge.
I suspect something about the SCSI, or connections, that is stressed
by a merge.
I still suspect the media - and I can back it up with 24 years of
reading error logs... can you?
No.
Here is a log of my recent findings:
Merged the shadow set, getting 16 errors on DKA0 and 83 errors on
DKA100.
Did an ANALYZE/MEDIA/EXER=FULL of DKA100 and found 1 bad block. Got a
good many errors logged during the ANALYZE.
Merged the shadow set, getting 16 errors on DKA0 and 5 errors on
DKA100.
Did an ANALYZE/MEDIA/EXER=FULL of DKA0 and found 0 bad blocks. Got 0
errors logged during the ANALYZE.
Merged the shadow set, getting 4 errors on DKA0 and 19 errors on
DKA100.
I will swap out one of the disks and give it a try. I will put in a disk
that is logging no errors at its current location.
I swapped the dka100 disks between two DS10s (same model disks).
Then I merged the shadowset on the problem machine. I got 4 errors
on dka100 and 16 on dka0, all indicating unrecoverable errors.
On the other machine I got about 34 errors on dka100 (most indicating
unrecoverable) during the shadowset merge. But I realized that I had
found 1 bad block on it with ANALYZE when it was on the problem machine.
So I ran ANAL/MEDIA/EXER=FULL and found 13 bad blocks.
I suspect there must be more than bad disks on the problem machine,
since it got 4 unrecoverable errors (at 2 LBAs) on a disk that had none
recently during shadowset merges on the other machine.
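The reasoning behind the swap test (errors that follow a drive implicate the drive; errors that appear only in one chassis implicate that chassis's bus, cabling, termination or adapter) can be written down as a small heuristic. A Python sketch; the `Run` record and the drive and machine labels are hypothetical, and the counts in the usage part are the ones reported in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One shadow-set merge: which drive, in which machine, how many errors."""
    disk: str      # hypothetical label for a physical drive
    machine: str   # hypothetical label for the chassis it was installed in
    errors: int    # errors logged against that drive during the merge


def swap_suspects(runs: list[Run]) -> set[str]:
    """Heuristic reading of a drive-swap test.

    A drive that logs errors in every machine it was tried in is suspect
    itself; a drive that is clean in one machine but logs errors in another
    points at the machine instead. A drive tried in only one machine tells
    us nothing either way and is skipped.
    """
    suspects: set[str] = set()
    for disk in sorted({r.disk for r in runs}):
        tried = [r for r in runs if r.disk == disk]
        bad = {r.machine for r in tried if r.errors > 0}
        clean = {r.machine for r in tried if r.errors == 0}
        if len(bad) > 1 and not clean:
            suspects.add(f"disk {disk}")
        elif bad and clean:
            suspects.update(f"machine {m}" for m in bad)
    return suspects


if __name__ == "__main__":
    # "A" started in the problem machine; "B" came from the other DS10.
    runs = [
        Run("A", "problem", 19),
        Run("A", "other", 34),
        Run("B", "other", 0),
        Run("B", "problem", 4),
    ]
    print(sorted(swap_suspects(runs)))  # ['disk A', 'machine problem']
```

Both suspects coming back at once is the situation described here: one drive that is genuinely failing, plus a drive that only misbehaves in the problem machine.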
BTW, when you do a DIAGNOSE/TRANS/SUMMARY, all these errors
are listed as SCSI errors, but when you look at the sense data in the
detailed report, most are identified as medium errors.
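The summary lumps everything together, but the sense key in each detailed entry separates the drive's own verdict on the media from other failures. A rough triage sketch in Python; the sense-key names follow the SCSI Primary Commands sense-key table, while the buckets are a heuristic of my own (a marginal bus is more likely to surface as aborted commands or parity errors than as medium errors), not a diagnosis.

```python
# Sense-key names from the SCSI Primary Commands (SPC) sense-key table.
# 0xC (obsolete) and 0xF (reserved) are deliberately left out.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC",
    0xA: "COPY ABORTED",
    0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW",
    0xE: "MISCOMPARE",
}

PARITY_ASC = 0x47  # additional sense code for "SCSI parity error"


def triage(sense_key: int, asc: int) -> str:
    """Rough bucket for one logged error; a heuristic, not a diagnosis."""
    name = SENSE_KEYS.get(sense_key, "RESERVED")
    if sense_key == 0x1:
        bucket = "soft: the drive recovered the data"
    elif sense_key == 0x3:
        bucket = "hard: media, the drive could not read or write the block"
    elif sense_key == 0xB and asc == PARITY_ASC:
        bucket = "transport: parity error on the SCSI bus"
    elif sense_key in (0x4, 0xB):
        bucket = "hard: drive hardware or transport, check the ASC/ASCQ"
    else:
        bucket = "other"
    return f"{name} ({bucket})"


if __name__ == "__main__":
    # Sense key x03, ASC x11 as in the posted log entry.
    print(triage(0x3, 0x11))
```

Run over every entry, a mix dominated by MEDIUM ERROR points at the drives, while a noticeable share of ABORTED COMMAND or parity errors would lend weight to the cabling and termination theory.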
I guess I will ask the vendor for a couple of disks under the warranty,
but I have no confidence that it will solve the problem. Maybe I need
the machine replaced.
It is possible that you have a problem with a cable or a host bus adapter, either as a contributing factor or (less likely) as the whole problem.
Or it could be a problem with SCSI bus termination, either no terminator
or double termination or an extra terminator in the middle or a bad
terminator. We had an old BA350(?) shelf (fast/narrow, gray disks)
that we daisy-chained an external 8mm tape drive to; it worked fine for a
year or two, and then someone inserted a tape upside down or backwards
and busted it. DEC (or maybe Compaq) replaced the drive about 5 times;
the replacements kept failing. They would work fine in a short test, but
full-volume backups would usually fail. After they replaced the drive about 4 or
5 times, someone said the magic word "termination", and so I took the
covers off the BA350 and discovered whoever had added the external
tape drive had neglected to remove the internal terminator. After
removing it, the latest "DOA" tape drive worked fine. For some reason
the original drive had no problem with the double termination, but
all the replacements did.
BTW, I thought analyze/media/exer did nothing with disks newer than
about SDI (RAxx) vintage. Have you tried ANA/DISK/READ_CHECK, or
(if V7.3-2 or later) ANA/DISK/SHADOW? If the shadow copies are
finding and replacing a bunch of existing bad blocks, once the
shadow copy completes, the bad blocks should be safely sequestered,
and /read_check or /shadow should come up clean.
--
John Santos
Evans Griffiths & Hart, Inc.
781-861-0670 ext 539