SUMMARY

From: mahmod kokaje (mkokaje2000_at_yahoo.com)
Date: 10/18/04

    Date: Mon, 18 Oct 2004 05:28:32 -0700 (PDT)
    To: tru64-unix-managers@ornl.gov
    
    

    Thanks to Dr. Thomas for the fast reply.

    --- "Dr Thomas.Blinn@HP.com" <tpb@doctor.zk3.dec.com> wrote:

    > "drd" is the distributed raw device manager that's part of the
    > TruCluster software. To be absolutely certain what it was doing
    > when it issued that message, I'd have to go read code, but since
    > each device has a device number (presumably 251 in this message)
    > and error 5 is a generic "I/O error" message, I'd guess that you
    > had a raw disk open that was managed by another cluster member,
    > the program that had it open tried to close it, and the remote
    > system got an I/O error during the close processing (which would,
    > I suspect, involve writing out any in-memory data buffers and so
    > on), and since the cluster member that logged the message can't
    > do anything else to resolve this, it just logged the message and
    > kept going.
    >
    > But that's still just a guess. If you need a definitive answer
    > and you have a support contract, open a support call. If you do
    > not have support, you can start looking in your binary error
    > logs with a tool like "dia" or "ca" to see if you can spot the
    > I/O failure on some other member. But the "error 5" is almost
    > certainly a disk I/O problem, unless it was doing remote tape.
    > In fact, if you do use shared tape, that could be the issue; it
    > is much more common to get tape I/O errors than disk I/O errors.
    > And you could use hwmgr on the node where the problem showed up
    > to check what it thinks device 251 is in the hardware database.
    >
    > > Hi Admin,
    > > I received the following message in /var/adm:
    > > drd_close_driver: Failed close on device 251 error 5, considered closed.
    > > What does this message mean? Please advise.
    >
    > Tom
    >
    > Dr. Thomas P. Blinn + Tru64 UNIX Software + Hewlett-Packard Company
    > Internet: tpb@zk3.dec.com, thomas.blinn@compaq.com, thomas.blinn@hp.com
    > 110 Spit Brook Road, MS ZKO3-2/W17 Nashua, New Hampshire 03062-2698
    > Alpha Tru64 UNIX kernel support
    > Phone: (603) 884-0646
    > ACM Member: tpblinn@acm.org PC@Home: tom@felines.mv.net
    >
    > Worry kills more people than work because more people worry than work.
    >
    > Keep your stick on the ice. -- Steve Smith ("Red Green")
    >
    > My favorite palindrome is: Satan, oscillate my metallic sonatas.
    > -- Phil Agre, pagre@alpha.oac.ucla.edu
    >
    > Yesterday it worked / Today it is not working / UNIX is like that
    > -- apologies to Margaret Segall
    >
    > Opinions expressed herein are my own, and do not necessarily represent
    > those of my employer or anyone else, living or dead, real or imagined.
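    As a rough illustration of Tom's point that "error 5" is the generic
    I/O error, here is a minimal Python sketch that pulls the device number
    and error code out of the logged line and maps the code to its symbolic
    name using the local system's errno table. The regular expression and
    the function name decode_drd_close are hypothetical, built only from the
    single sample message above; the real kernel format string may differ.
    The errno mapping reflects whatever machine runs the script, which is
    assumed (not guaranteed) to number EIO as 5, as traditional UNIX systems do.

```python
import errno
import os
import re

# Hypothetical pattern derived from the one drd message quoted above;
# the exact kernel format string is an assumption.
DRD_CLOSE_RE = re.compile(
    r"drd_close_driver: Failed close on device (?P<dev>\d+) error (?P<err>\d+)"
)


def decode_drd_close(line):
    """Return (device_number, errno_name, description), or None if no match."""
    m = DRD_CLOSE_RE.search(line)
    if m is None:
        return None
    dev = int(m.group("dev"))
    err = int(m.group("err"))
    # errno.errorcode maps numbers to symbolic names on the *local* system;
    # EIO is 5 on traditional UNIX systems, but verify on the target host.
    name = errno.errorcode.get(err, "UNKNOWN")
    return dev, name, os.strerror(err)


if __name__ == "__main__":
    sample = ("drd_close_driver: Failed close on device 251 error 5, "
              "considered closed.")
    print(decode_drd_close(sample))
```

    The device number it returns (251 here) is what you would then look up
    with hwmgr on the member that logged the message, as Tom suggests.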

                    

