Re: scsi failure: bus or disk?
From: CJT (abujlehc_at_prodigy.net)
Date: 12/27/03
- Previous message: Paul Douglas: "scsi failure: bus or disk?"
- In reply to: Paul Douglas: "scsi failure: bus or disk?"
- Next in thread: Paul Douglas: "Re: scsi failure: bus or disk?"
- Reply: Paul Douglas: "Re: scsi failure: bus or disk?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sat, 27 Dec 2003 07:33:46 GMT
Paul Douglas wrote:
> I woke up on Xmas morning to find my Ultra 30 had crashed. I tried to
> reboot but it got a little way into loading the O/S and hung. Another try
> produced the following messages very early in the boot process:
>
> WARNING:/pci@1f,4000/scsi@3 (glm0):
> SCSI bus DATA IN phase parity error
> WARNING:/pci@1f,4000/scsi@3 (glm0):
> Target 0 reducing sync. transfer rate
>
> and the boot halted soon after. After a couple more tries it did boot.
>
> I'm wondering if the error is in the scsi system or just the disk. The
> actual crash seems to have happened when the system tried to mount another
> disk in order to perform a backup. I'm still getting these problems with
> that disk removed (and they first occurred, according to the log, at a time
> when the 2nd disk wasn't involved). I have tried the 2nd disk with another
> machine with no problem.
>
> I ran test-all at ok prompt and got no errors. However, whatever the error
> is it seems to be intermittent, since the machine did boot once.
>
> I'd be grateful if someone could tell me which component is at fault (or at
> least most likely to be at fault). I just don't know from looking at the
> log entries. These follow for info.
>
> Many thanks,
>
> Paul
>
>
> /var/adm/messages:
>
> Dec 24 07:53:16 avon glm: [ID 655122 kern.warning] WARNING: ID[SUNWpd.check_intcode.6006]
> Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
> Dec 24 07:53:16 avon Resetting scsi bus, data overrun: got too much data from target from (0,0)
> Dec 24 07:53:16 avon genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
> Dec 24 07:53:16 avon genunix: [ID 611667 kern.info] NOTICE: glm0: Resetting scsi bus, data overrun: got too much data from target from (0,0)
> Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
> Dec 24 07:53:16 avon Target 0 reducing sync. transfer rate
> Dec 24 07:53:16 avon glm: [ID 923092 kern.warning] WARNING: ID[SUNWpd.glm.sync_wide_backoff.6014]
> Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
> Dec 24 07:53:16 avon got SCSI bus reset
> Dec 24 07:53:16 avon genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
> Dec 24 07:53:16 avon genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
> Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
> Dec 24 07:53:16 avon SCSI transport failed: reason 'reset': retrying command
>
> -----
> at this point, everything still apparently running ok and I was unaware of
> any problem
> -----
>
> Dec 25 01:10:00 avon unix: [ID 836849 kern.notice]
> Dec 25 01:10:00 avon ^Mpanic[cpu0]/thread=300024eda80:
> Dec 25 01:10:00 avon unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a100664a10 addr=30 mmu_fsr=0 occurred in module "sd" due to a NULL pointer dereference
> Dec 25 01:10:00 avon unix: [ID 100000 kern.notice]
> Dec 25 01:10:00 avon unix: [ID 839527 kern.notice] mount:
> Dec 25 01:10:00 avon unix: [ID 520581 kern.notice] trap type = 0x31
> Dec 25 01:10:00 avon unix: [ID 381800 kern.notice] addr=0x30
> Dec 25 01:10:00 avon unix: [ID 101969 kern.notice] pid=1926, pc=0x11ad438, sp=0x2a1006642b1, tstate=0x4480001600, context=0xaff
> Dec 25 01:10:00 avon unix: [ID 743441 kern.notice] g1-g7: 1487c00, 0, 10000, 30003039508, 30002bdc508, 16, 300024eda80
> Dec 25 01:10:00 avon unix: [ID 100000 kern.notice]
> Dec 25 01:10:00 avon genunix: [ID 723222 kern.notice] 000002a100664740 unix:die+80 (31, 2a100664a10, 30, 0, 30001400b20, 30001400b38)
> Dec 25 01:10:00 avon genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000001413460 000002a100664a10 000002a100664908
> Dec 25 01:10:00 avon %l4-7: 0000000000000031 00000300001bcd18 00000300001bcd40 0000030007dddf98
>
> -----
> the mount is the attempt to mount the backup disk
>
> after this, there's another 60 lines much like the one above, then the
> system goes down
>
> finally, here's one of the boot attempt messages:
> -----
>
> Dec 26 11:30:52 avon scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
> Dec 26 11:30:52 avon Rev. 3 Symbios 53c875 found.
> Dec 26 11:30:52 avon pcipsy: [ID 370704 kern.info] PCI-device: scsi@3, glm0
> Dec 26 11:30:52 avon genunix: [ID 936769 kern.info] glm0 is /pci@1f,4000/scsi@3
> Dec 26 11:30:52 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
> Dec 26 11:30:52 avon SCSI bus DATA IN phase parity error
> Dec 26 11:30:52 avon glm: [ID 663555 kern.warning] WARNING: ID[SUNWpd.glm.parity_check.6010]
> Dec 26 11:30:52 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
> Dec 26 11:30:52 avon Target 0 reducing sync. transfer rate
> Dec 26 11:30:52 avon glm: [ID 923092 kern.warning] WARNING: ID[SUNWpd.glm.sync_wide_backoff.6014]
> Dec 26 11:30:52 avon scsi: [ID 193665 kern.info] sd0 at glm0: target 0 lun 0
> Dec 26 11:30:52 avon genunix: [ID 936769 kern.info] sd0 is /pci@1f,4000/scsi@3/sd@0,0
>
>
The first thing I would do is check that the SCSI bus is properly
terminated and all the connections are tight.
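
While checking the cabling, it may also help to see whether the warnings in
/var/adm/messages cluster on one device path. Below is a rough Python sketch
that tallies WARNING lines per device path and driver instance. The function
names (count_warnings, report) and the regular expression are assumptions
made for illustration, based only on the log lines quoted above; they are not
part of Solaris or of any standard tool.

```python
import re
from collections import Counter

# Assumed pattern, modelled on the quoted log lines, e.g.
#   "... WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):"
# It captures the device path and the driver instance in parentheses.
WARNING_RE = re.compile(r"WARNING:\s+(/\S+)\s+\((\w+)\):")


def count_warnings(lines):
    """Count WARNING entries per (device path, driver instance)."""
    counts = Counter()
    for line in lines:
        # findall also copes with several entries run together on one line.
        for path, instance in WARNING_RE.findall(line):
            counts[(path, instance)] += 1
    return counts


def report(logfile="/var/adm/messages"):
    """Print the tallies for a log file, most frequent first."""
    with open(logfile, errors="replace") as handle:
        counts = count_warnings(handle)
    for (path, instance), n in counts.most_common():
        print(f"{n:5d}  {instance:8s} {path}")
```

Calling report() on the machine in question prints one line per device. If
nearly everything is logged against glm0 and target 0, that is consistent
with a cabling, termination, or single-disk problem, though a count alone
cannot tell those apart.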
--
After being targeted with gigabytes of trash by the "SWEN" worm, I have
concluded we must conceal our e-mail address. Our true address is the
mirror image of what you see before the "@" symbol. It's a shame such
steps are necessary. ...Charlie