scsi failure: bus or disk?
From: Paul Douglas (paul_at_ultrabook.douglasfamily.au.com)
Date: 12/27/03
Date: Sat, 27 Dec 2003 07:28:46 GMT
I woke up on Xmas morning to find my Ultra 30 had crashed. I tried to
reboot but it got a little way into loading the O/S and hung. Another try
produced the following messages very early in the boot process:
WARNING:/pci@1f,4000/scsi@3 (glm0):
SCSI bus DATA IN phase parity error
WARNING:/pci@1f,4000/scsi@3 (glm0):
Target 0 reducing sync. transfer rate
and the boot halted soon after. After a couple more tries it did boot.
I'm wondering if the error is in the SCSI system or just the disk. The
actual crash seems to have happened when the system tried to mount another
disk in order to perform a backup. I'm still getting these problems with
that disk removed (and they first occurred, according to the log, at a time
when the 2nd disk wasn't involved). I have tried the 2nd disk with another
machine with no problem.
I ran test-all at the ok prompt and got no errors. However, whatever the error
is it seems to be intermittent, since the machine did boot once.
I'd be grateful if someone could tell me which component is at fault (or at
least most likely to be at fault). I just don't know from looking at the
log entries. These follow for info.
Many thanks,
Paul
/var/adm/messages:
Dec 24 07:53:16 avon glm: [ID 655122 kern.warning] WARNING: ID[SUNWpd.check_intcode.6006]
Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Dec 24 07:53:16 avon Resetting scsi bus, data overrun: got too much data from target from (0,0)
Dec 24 07:53:16 avon genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Dec 24 07:53:16 avon genunix: [ID 611667 kern.info] NOTICE: glm0: Resetting scsi bus, data overrun: got too much data from target from (0,0)
Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Dec 24 07:53:16 avon Target 0 reducing sync. transfer rate
Dec 24 07:53:16 avon glm: [ID 923092 kern.warning] WARNING: ID[SUNWpd.glm.sync_wide_backoff.6014]
Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Dec 24 07:53:16 avon got SCSI bus reset
Dec 24 07:53:16 avon genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Dec 24 07:53:16 avon genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Dec 24 07:53:16 avon SCSI transport failed: reason 'reset': retrying command
-----
At this point everything was still apparently running OK, and I was unaware
of any problem.
-----
Dec 25 01:10:00 avon unix: [ID 836849 kern.notice]
Dec 25 01:10:00 avon ^Mpanic[cpu0]/thread=300024eda80:
Dec 25 01:10:00 avon unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a100664a10 addr=30 mmu_fsr=0 occurred in module "sd" due to a NULL pointer dereference
Dec 25 01:10:00 avon unix: [ID 100000 kern.notice]
Dec 25 01:10:00 avon unix: [ID 839527 kern.notice] mount:
Dec 25 01:10:00 avon unix: [ID 520581 kern.notice] trap type = 0x31
Dec 25 01:10:00 avon unix: [ID 381800 kern.notice] addr=0x30
Dec 25 01:10:00 avon unix: [ID 101969 kern.notice] pid=1926, pc=0x11ad438, sp=0x2a1006642b1, tstate=0x4480001600, context=0xaff
Dec 25 01:10:00 avon unix: [ID 743441 kern.notice] g1-g7: 1487c00, 0, 10000, 30003039508, 30002bdc508, 16, 300024eda80
Dec 25 01:10:00 avon unix: [ID 100000 kern.notice]
Dec 25 01:10:00 avon genunix: [ID 723222 kern.notice] 000002a100664740 unix:die+80 (31, 2a100664a10, 30, 0, 30001400b20, 30001400b38)
Dec 25 01:10:00 avon genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000001413460 000002a100664a10 000002a100664908
Dec 25 01:10:00 avon %l4-7: 0000000000000031 00000300001bcd18 00000300001bcd40 0000030007dddf98
-----
The mount is the attempt to mount the backup disk.
After this there are another 60 lines much like the ones above, then the
system goes down.
Finally, here's one of the boot attempt messages:
-----
Dec 26 11:30:52 avon scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
Dec 26 11:30:52 avon Rev. 3 Symbios 53c875 found.
Dec 26 11:30:52 avon pcipsy: [ID 370704 kern.info] PCI-device: scsi@3, glm0
Dec 26 11:30:52 avon genunix: [ID 936769 kern.info] glm0 is /pci@1f,4000/scsi@3
Dec 26 11:30:52 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Dec 26 11:30:52 avon SCSI bus DATA IN phase parity error
Dec 26 11:30:52 avon glm: [ID 663555 kern.warning] WARNING: ID[SUNWpd.glm.parity_check.6010]
Dec 26 11:30:52 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
Dec 26 11:30:52 avon Target 0 reducing sync. transfer rate
Dec 26 11:30:52 avon glm: [ID 923092 kern.warning] WARNING: ID[SUNWpd.glm.sync_wide_backoff.6014]
Dec 26 11:30:52 avon scsi: [ID 193665 kern.info] sd0 at glm0: target 0 lun 0
Dec 26 11:30:52 avon genunix: [ID 936769 kern.info] sd0 is /pci@1f,4000/scsi@3/sd@0,0
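
To see at a glance whether the warnings cluster on the controller (glm0) or on one disk (sd0), log text like the entries above can be tallied by device path. What follows is only a rough Python sketch, not something from the original report. The names split_records and tally_warnings are made up for illustration, and the timestamp pattern assumes the "Mon DD HH:MM:SS host" prefix seen in these entries. It also copes with text where an archive has re-wrapped several records onto one line. It needs Python 3.7 or later, since it splits on a zero-width match.

```python
import re
from collections import Counter

# Assumption: every record begins with a "Mon DD HH:MM:SS host " stamp,
# as in the /var/adm/messages excerpts in this post.
STAMP = re.compile(r"(?=[A-Z][a-z]{2} \d{1,2} \d\d:\d\d:\d\d \w+ )")

# Assumption: a device warning looks like "WARNING: /device/path (instance)".
DEVICE = re.compile(r"WARNING: (/\S+ \(\w+\))")


def split_records(text):
    """Return one string per syslog record, undoing any re-wrapping."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    return [rec.strip() for rec in STAMP.split(flat) if rec.strip()]


def tally_warnings(records):
    """Count WARNING records per device path and driver instance."""
    counts = Counter()
    for rec in records:
        match = DEVICE.search(rec)
        if match:
            counts[match.group(1)] += 1
    return counts


# Short excerpt of the Dec 24 entries, wrapped the way the archive shows them.
SAMPLE = (
    "Dec 24 07:53:16 avon scsi: [ID 107833\n"
    "kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0): Dec 24 07:53:16 avon\n"
    "Target 0 reducing sync. transfer rate Dec 24 07:53:16 avon scsi: [ID 107833\n"
    "kern.warning] WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0): Dec 24 07:53:16\n"
    "avon SCSI transport failed: reason 'reset': retrying command\n"
)

if __name__ == "__main__":
    for device, count in tally_warnings(split_records(SAMPLE)).most_common():
        print(count, device)
```

A count dominated by one sd instance would point towards that disk, while warnings logged only against glm0 say less, since cabling, termination, or the controller could all produce them. The tally is a way of reading the log, not a diagnosis.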