Bad SDLT 320?

From: Chris Cameron (Chris.Cameron_at_NetThruPut.com)
Date: 08/26/04

    To: sunmanagers@sunmanagers.org
    Date: Thu, 26 Aug 2004 10:48:53 -0600
    
    

    Before I go to all the trouble of trying to get Sun to replace my tape
    drive, I wanted to tap some of the experience that's on this list to
    see if what I'm seeing would point to a bad tape drive.

    I have a V240 hooked up to a Sun SDLT 320 drive. Up until this week it was
    backing up ~60 GB of data using AMANDA (and had been doing so for 7
    months). That 60 GB is spread across 3 servers, one of which is local
    and the rest remote.

    Now when AMANDA runs backups, it consistently fails on one remote 25 GB
    partition (and only that partition). The errors AMANDA reports
    (which, as I understand it, are just dump errors passed along) are:

      devl2 /dev/md/dsk/d8 lev 0 FAILED [out of tape]
      devl2 /dev/md/dsk/d8 lev 0 FAILED ["data write: Broken pipe"]
      devl2 /dev/md/dsk/d8 lev 0 FAILED [dump to tape failed]

    After this the tape drive is unresponsive to any commands, and the
    following errors show up in /var/adm/messages:

    Aug 26 09:10:54 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1/st@5,0 (st12):
    Aug 26 09:10:54 prod2 SCSI transport failed: reason 'incomplete': retrying command
    Aug 26 09:10:56 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1/st@5,0 (st12):
    Aug 26 09:10:56 prod2 SCSI transport failed: reason 'incomplete': retrying command
    Aug 26 09:10:57 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1/st@5,0 (st12):
    Aug 26 09:10:57 prod2 SCSI transport failed: reason 'incomplete': giving up
    Aug 26 09:26:39 prod2 scsi: [ID 365881 kern.info] /pci@1c,600000/scsi@2,1 (glm1):
    Aug 26 09:26:39 prod2 Cmd (0x1b37578) dump for Target 5 Lun 0:
    Aug 26 09:26:39 prod2 scsi: [ID 365881 kern.info] /pci@1c,600000/scsi@2,1 (glm1):
    Aug 26 09:26:39 prod2 cdb=[ 0xa 0x0 0x0 0x80 0x0 0x0 ]
    Aug 26 09:26:39 prod2 scsi: [ID 365881 kern.info] /pci@1c,600000/scsi@2,1 (glm1):
    Aug 26 09:26:39 prod2 pkt_flags=0x0 pkt_statistics=0x61 pkt_state=0x7
    Aug 26 09:26:39 prod2 scsi: [ID 365881 kern.info] /pci@1c,600000/scsi@2,1 (glm1):
    Aug 26 09:26:39 prod2 pkt_scbp=0x0 cmd_flags=0x18e1
    Aug 26 09:26:39 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1 (glm1):
    Aug 26 09:26:39 prod2 Disconnected command timeout for Target 5.0
    Aug 26 09:26:39 prod2 genunix: [ID 408822 kern.info] NOTICE: glm1: fault detected in device; service still available
    Aug 26 09:26:39 prod2 genunix: [ID 611667 kern.info] NOTICE: glm1: Disconnected command timeout for Target 5.0
    Aug 26 09:26:39 prod2 glm: [ID 160360 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6016]
    Aug 26 09:26:39 prod2 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@2,1/st@5,0 (st12):
    Aug 26 09:26:39 prod2 SCSI transport failed: reason 'timeout': giving up
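
    For what it's worth, the cdb=[ ... ] line in the glm command dump can be
    decoded by hand. The sketch below assumes it is a standard 6-byte
    WRITE(6) command for a sequential-access device (opcode 0x0A, FIXED bit
    in byte 1, 24-bit transfer length in bytes 2-4). The function name
    decode_write6 is made up for illustration.

```python
# Decode the 6-byte CDB from the glm "Cmd dump" line, assuming it is a
# WRITE(6) command for a sequential-access (tape) device, opcode 0x0A.

def decode_write6(cdb_text):
    """Parse 'cdb=[ 0xa 0x0 0x0 0x80 0x0 0x0 ]' into WRITE(6) fields."""
    # Take the text between the square brackets and parse each hex token.
    inner = cdb_text.split("[", 1)[1].split("]", 1)[0]
    cdb = [int(tok, 16) for tok in inner.split()]
    if len(cdb) != 6 or cdb[0] != 0x0A:
        raise ValueError("not a 6-byte WRITE(6) CDB")
    fixed = bool(cdb[1] & 0x01)  # FIXED bit: set means length counts blocks
    length = (cdb[2] << 16) | (cdb[3] << 8) | cdb[4]  # 24-bit transfer length
    return {
        "opcode": cdb[0],
        "fixed": fixed,
        "transfer_length": length,
        "control": cdb[5],
    }

fields = decode_write6("cdb=[ 0xa 0x0 0x0 0x80 0x0 0x0 ]")
print(fields)
```

    With the FIXED bit clear the transfer length is a byte count, so the
    command that timed out looks like an ordinary 32768-byte data write
    rather than a rewind, filemark, or other positioning command.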

    Power-cycling the tape drive gets it responding to mt again.

    If I try to do a dump manually from the local machine, I'll consistently
    get:

    </> # ufsdump -0f /dev/rmt/1n /dev/md/dsk/d6
      DUMP: Writing 32 Kilobyte records
      DUMP: Date of this level 0 dump: Thu Aug 26 09:05:39 2004
      DUMP: Date of last level 0 dump: the epoch
      DUMP: Dumping /dev/md/rdsk/d6 (prod2:/prod) to /dev/rmt/1n.
      DUMP: Mapping (Pass I) [regular files]
      DUMP: Mapping (Pass II) [directories]
      DUMP: Estimated 35790376 blocks (17475.77MB).
      DUMP: Dumping (Pass III) [directories]
      DUMP: Dumping (Pass IV) [regular files]
      DUMP: Write error 106032 feet into tape 1
      DUMP: NEEDS ATTENTION: Do you want to restart?: ("yes" or "no")

    This happens on a number of tapes, so I doubt it's a tape error.
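
    As a sanity check on the "out of tape" message, here is the arithmetic
    on the ufsdump estimate above. It assumes ufsdump's blocks are 512
    bytes and that an SDLT 320 holds 160 GB native (uncompressed); the
    constant and function names are only for illustration.

```python
# Convert ufsdump's block estimate to megabytes and compare it with the
# drive's native capacity.
# Assumptions: 512-byte blocks; 160 GB native capacity for an SDLT 320.

BLOCK_BYTES = 512
SDLT320_NATIVE_BYTES = 160 * 10**9

def blocks_to_mb(blocks, block_bytes=BLOCK_BYTES):
    """Return the size in MB (1 MB = 1024 * 1024 bytes)."""
    return blocks * block_bytes / (1024 * 1024)

estimated_blocks = 35790376  # from "DUMP: Estimated 35790376 blocks"
est_mb = blocks_to_mb(estimated_blocks)
fraction = estimated_blocks * BLOCK_BYTES / SDLT320_NATIVE_BYTES

print(f"{est_mb:.2f} MB, {fraction:.1%} of native capacity")
```

    The MB figure agrees with the 17475.77MB that ufsdump prints, and it is
    only a small fraction of one tape. Even the full ~60 GB AMANDA run fits
    well within native capacity, which suggests "out of tape" is a
    consequence of the write error and not the tape actually filling up.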

    Is this a clear case of a bad tape drive? Tonight I'll try on a second
    V240 with a different SCSI cable just to be sure. I have run a cleaning
    tape through it for good measure.
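
    To compare the two hosts after tonight's test, a rough tally of the
    "SCSI transport failed" reasons in /var/adm/messages might be enough.
    This is only a sketch: the regular expression is based on the messages
    quoted above, and tally_transport_failures is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the reason in lines such as:
#   "... SCSI transport failed: reason 'incomplete': retrying command"
REASON_RE = re.compile(r"SCSI transport failed: reason '([^']+)'")

def tally_transport_failures(lines):
    """Count 'SCSI transport failed' messages by their quoted reason."""
    counts = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines taken from the log excerpt in this message.
sample = [
    "Aug 26 09:10:54 prod2 SCSI transport failed: reason 'incomplete':",
    "Aug 26 09:10:56 prod2 SCSI transport failed: reason 'incomplete':",
    "Aug 26 09:10:57 prod2 SCSI transport failed: reason 'incomplete':",
    "Aug 26 09:26:39 prod2 Disconnected command timeout for Target 5.0",
    "Aug 26 09:26:39 prod2 SCSI transport failed: reason 'timeout': giving",
]
counts = tally_transport_failures(sample)
print(dict(counts))

# On a real host, pass the open log file instead of the sample list:
#   with open("/var/adm/messages") as log:
#       print(dict(tally_transport_failures(log)))
```

    If the second V240 with the other cable shows the same pattern of
    reasons, that would point more toward the drive than the host or cable.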

    Thanks,
    Chris
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

