Re: OpenVMS Management Station - cluster storage ????

From: Keith Parris (keithparris_NOSPAM_at_yahoo.com)
Date: 07/10/04

    Date: 9 Jul 2004 15:30:05 -0700
    
    

    dave.baxter@bannerhealth.com (Dave Baxter) wrote in message news:<a3c44af1.0407071017.5d2612b4@posting.google.com>...
    > As far as I am aware, I have my SAN/CLUSTER configured for high
    > redundancy, and therefore I am trying to figure out why my system had
    > such a bad time when all that happened was that a controller failed.

    Based on your description, it certainly sounds like you have a
    fully-redundant configuration (dual HBAs, dual fabrics, dual
    controller pairs, shadowing across the controller pairs). So things
    should have worked. When a configuration like that fails anyway, it
    usually takes some in-depth analysis with the support folks to
    figure out what went wrong.

    They'll look at things like patch status on VMS, the SAN
    configuration (and the error counters on the switches), firmware
    levels on all the hardware, the settings of the various timeout
    values (such as the SYSGEN parameter SHADOW_MBR_TMO and the
    per-shadowset member timeouts that can be set with DCL commands),
    error logs, console logs, and so forth, and get you an answer if
    it's humanly possible to do so.
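
    For example, you can inspect those timeouts from DCL. A sketch only:
    the commands are standard OpenVMS, but DSA12: and the timeout value
    are just illustrative, and SHOW SHADOW / SET SHADOW need a fairly
    recent version (V7.3-2 or thereabouts):

        $ MCR SYSGEN
        SYSGEN> SHOW SHADOW_MBR_TMO
        SYSGEN> EXIT
        $! Per-shadowset status and member timeout:
        $ SHOW SHADOW DSA12:
        $ SET SHADOW DSA12: /MEMBER_TIMEOUT=120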

    > When a disk controller fails, it is supposed to failover all of the
    > drives it has responsibility for to its partner.

    Yes, it is supposed to. I've run into a few cases over the years
    where it hasn't, but that's the very reason you shadow across
    different controller pairs -- shadowing should have covered that
    unlikely, but possible, event.
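
    To illustrate, shadowing across controller pairs just means the
    shadow set members live behind different controller pairs, so either
    pair can fail outright without losing the virtual unit. A minimal
    sketch (the device names and label are made up, not from Dave's
    configuration):

        $! $1$DGA12: behind one HSG pair, $1$DGA112: behind the other
        $ MOUNT/SYSTEM DSA12: /SHADOW=($1$DGA12:,$1$DGA112:) PRODDATA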

    > The message I received from the Management station was
    > "PROD11: Shadow set DSA12: has no member device on NODE01"
    >
    > This implies to me that the Shadow set has no members !!? Is this
    > what it is really saying ???

    Yes, that's essentially what it's saying: as seen from NODE01, not
    one member of DSA12: was accessible. Things must be very bad if
    NODE01 can't see ANY of the shadowset members. Any chance your dual
    SAN fabrics got accidentally connected into one fabric? If so, a
    single fabric-wide disturbance could take down every path at once,
    defeating all that redundancy.
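
    One quick way to check is from the switches themselves. This assumes
    Brocade-style switches (adjust for your vendor); if both "fabrics"
    report the same full set of switches, they have merged into one:

        fabricshow    (lists the switches in the fabric; run in each one)
        switchshow    (port states and logins on this switch)
        porterrshow   (per-port error counters, worth saving for support)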

    It's clear that getting to the bottom of this is going to take much
    deeper analysis than the Management Station's output can provide.
    Console output from both the VMS systems and the HSG controllers (I
    hope it was saved), plus the error logs on both VMS and the
    controllers, are going to be crucial pieces of the puzzle.
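
    For gathering that evidence, something along these lines (a sketch;
    the exact tool varies with VMS version -- on many Alpha systems of
    this vintage, DECevent's DIAGNOSE command does the error-log
    translation):

        $ ANALYZE/ERROR_LOG SYS$ERRORLOG:ERRLOG.SYS
        $ DIAGNOSE    ! DECevent, if installed

    and on each HSG controller's console:

        SHOW THIS_CONTROLLER FULL
        SHOW OTHER_CONTROLLER FULL
        SHOW FAILEDSET
        SHOW UNITS FULL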

