Re: scsi failure: bus or disk?

From: CJT (abujlehc_at_prodigy.net)
Date: 12/27/03

    Date: Sat, 27 Dec 2003 07:33:46 GMT
    
    

    Paul Douglas wrote:

    > I woke up on Xmas morning to find my Ultra 30 had crashed. I tried to
    > reboot but it got a little way into loading the O/S and hung. Another try
    > produced the following messages very early in the boot process:
    >
    > WARNING:/pci@1f,4000/scsi@3 (glm0):
    > SCSI bus DATA IN phase parity error
    > WARNING:/pci@1f,4000/scsi@3 (glm0):
    > Target 0 reducing sync. transfer rate
    >
    > and the boot halted soon after. After a couple more tries it did boot.
    >
    > I'm wondering if the error is in the scsi system or just the disk. The
    > actual crash seems to have happened when the system tried to mount another
    > disk in order to perform a backup. I'm still getting these problems with
    > that disk removed (and they first occurred, according to the log, at a time
    > when the 2nd disk wasn't involved). I have tried the 2nd disk with another
    > machine with no problem.
    >
    > I ran test-all at the ok prompt and got no errors. However, whatever the
    > error is, it seems to be intermittent, since the machine did boot once.
    >
    > I'd be grateful if someone could tell me which component is at fault (or at
    > least most likely to be at fault). I just don't know from looking at the
    > log entries. These follow for info.
    >
    > Many thanks,
    >
    > Paul
    >
    >
    > /var/adm/messages:
    >
    > Dec 24 07:53:16 avon glm: [ID 655122 kern.warning] WARNING: ID[SUNWpd.check_intcode.6006]
    > Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
    > Dec 24 07:53:16 avon Resetting scsi bus, data overrun: got too much data from target from (0,0)
    > Dec 24 07:53:16 avon genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
    > Dec 24 07:53:16 avon genunix: [ID 611667 kern.info] NOTICE: glm0: Resetting scsi bus, data overrun: got too much data from target from (0,0)
    > Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
    > Dec 24 07:53:16 avon Target 0 reducing sync. transfer rate
    > Dec 24 07:53:16 avon glm: [ID 923092 kern.warning] WARNING: ID[SUNWpd.glm.sync_wide_backoff.6014]
    > Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
    > Dec 24 07:53:16 avon got SCSI bus reset
    > Dec 24 07:53:16 avon genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
    > Dec 24 07:53:16 avon genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
    > Dec 24 07:53:16 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
    > Dec 24 07:53:16 avon SCSI transport failed: reason 'reset': retrying command
    >
    > -----
    > at this point, everything still apparently running ok and I was unaware of
    > any problem
    > -----
    >
    > Dec 25 01:10:00 avon unix: [ID 836849 kern.notice]
    > Dec 25 01:10:00 avon ^Mpanic[cpu0]/thread=300024eda80:
    > Dec 25 01:10:00 avon unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a100664a10 addr=30 mmu_fsr=0 occurred in module "sd" due to a NULL pointer dereference
    > Dec 25 01:10:00 avon unix: [ID 100000 kern.notice]
    > Dec 25 01:10:00 avon unix: [ID 839527 kern.notice] mount:
    > Dec 25 01:10:00 avon unix: [ID 520581 kern.notice] trap type = 0x31
    > Dec 25 01:10:00 avon unix: [ID 381800 kern.notice] addr=0x30
    > Dec 25 01:10:00 avon unix: [ID 101969 kern.notice] pid=1926, pc=0x11ad438, sp=0x2a1006642b1, tstate=0x4480001600, context=0xaff
    > Dec 25 01:10:00 avon unix: [ID 743441 kern.notice] g1-g7: 1487c00, 0, 10000, 30003039508, 30002bdc508, 16, 300024eda80
    > Dec 25 01:10:00 avon unix: [ID 100000 kern.notice]
    > Dec 25 01:10:00 avon genunix: [ID 723222 kern.notice] 000002a100664740 unix:die+80 (31, 2a100664a10, 30, 0, 30001400b20, 30001400b38)
    > Dec 25 01:10:00 avon genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000001413460 000002a100664a10 000002a100664908
    > Dec 25 01:10:00 avon %l4-7: 0000000000000031 00000300001bcd18 00000300001bcd40 0000030007dddf98
    >
    > -----
    > the mount is the attempt to mount the backup disk
    >
    > after this, there's another 60 lines much like the one above, then the
    > system goes down
    >
    > finally, here's one of the boot attempt messages:
    > -----
    >
    > Dec 26 11:30:52 avon scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
    > Dec 26 11:30:52 avon Rev. 3 Symbios 53c875 found.
    > Dec 26 11:30:52 avon pcipsy: [ID 370704 kern.info] PCI-device: scsi@3, glm0
    > Dec 26 11:30:52 avon genunix: [ID 936769 kern.info] glm0 is /pci@1f,4000/scsi@3
    > Dec 26 11:30:52 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
    > Dec 26 11:30:52 avon SCSI bus DATA IN phase parity error
    > Dec 26 11:30:52 avon glm: [ID 663555 kern.warning] WARNING: ID[SUNWpd.glm.parity_check.6010]
    > Dec 26 11:30:52 avon scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
    > Dec 26 11:30:52 avon Target 0 reducing sync. transfer rate
    > Dec 26 11:30:52 avon glm: [ID 923092 kern.warning] WARNING: ID[SUNWpd.glm.sync_wide_backoff.6014]
    > Dec 26 11:30:52 avon scsi: [ID 193665 kern.info] sd0 at glm0: target 0 lun 0
    > Dec 26 11:30:52 avon genunix: [ID 936769 kern.info] sd0 is /pci@1f,4000/scsi@3/sd@0,0
    >
    >
    The first thing I would do is check that the SCSI bus is properly
    terminated and all the connections are tight.
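
If reseating the cables and checking termination does not settle it, it may help to tally the glm warnings in /var/adm/messages and see how often each kind occurs and which target they name. Below is a minimal Python sketch, assuming the log format shown in the quoted excerpts. The function name `summarize` and the category labels are made up for illustration, and the script skips genunix lines on the assumption that they only repeat the preceding driver warning.

```python
import re
from collections import Counter

# Message fragments copied from the quoted log excerpts.
# The category names on the left are illustrative labels, not Solaris terms.
PATTERNS = {
    "parity_error": re.compile(r"SCSI bus DATA IN phase parity error"),
    "data_overrun": re.compile(r"data overrun: got too much data from target"),
    "sync_backoff": re.compile(r"Target (\d+) reducing sync\. transfer rate"),
    "bus_reset": re.compile(r"got SCSI bus reset"),
    "transport_failed": re.compile(r"SCSI transport failed"),
}


def summarize(lines):
    """Count SCSI warning types, tagging the target number where the message names one."""
    counts = Counter()
    for line in lines:
        # Assumption: genunix NOTICE lines duplicate the driver warning just before them.
        if "genunix:" in line:
            continue
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                key = name
                if match.groups():
                    key = f"{name}(target {match.group(1)})"
                counts[key] += 1
    return counts


if __name__ == "__main__":
    # Path as given in the original post; adjust if the log has been rotated.
    with open("/var/adm/messages", errors="replace") as log:
        for kind, count in summarize(log).most_common():
            print(f"{count:6d}  {kind}")
```

Counts that always name the same target would tend to point at that disk or its cable, while errors that persist with different devices attached lean more toward termination or the controller. Treat that as a rough guide only.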

    -- 
    After being targeted with gigabytes of trash by the "SWEN" worm, I have
    concluded we must conceal our e-mail address.  Our true address is the
    mirror image of what you see before the "@" symbol.  It's a shame such
    steps are necessary.          ...Charlie
    
