RE: Clustering: switches reliability/redundancy



"Main, Kerry" <Kerry.Main@xxxxxx> writes in article <FD827B33AB0D9C4E92EACEEFEE2BA2FB7735F5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> dated Wed, 14 Dec 2005 17:35:34 -0500:
>
>> From: JF Mezei [mailto:jfmezei.spamnot@xxxxxxxxxxxx]

>> Are there considerations if a cluster's nodes are all connected via the
>> same switch/hub? If the latter fails, the whole cluster hangs and
>> becomes inaccessible.

Yes. It should un-hang when power is restored. Of course, if the same hub
connects you to your network of clients, the cluster would still be
inaccessible regardless of whether it hung or not.

>> Are hubs/switches considered "fault tolerant"? If not, what possible
>> steps would a good site planner take to ensure a cluster isn't
>> jeopardized by some $50 switch/hub?
>>
>> Are hubs considered more "fault tolerant" than switches?
>>
>> Is it just a simple case of reserving spare ports on a backup switch so
>> that cluster Ethernet connections can be moved one by one before the
>> main switch/hub is powered off for maintenance, etc.?
>>

>Simple solution is to establish a VLAN between 2 trunked switch/routers
>and use separate NIC connections to each from each server. This causes
>the switches to appear as a logical unit or virtual cluster interconnect
>box. An entire switch/rtr fails and the OpenVMS cluster keeps running,
>i.e. not even any application failover issues.
>
>With the right IP failover configs in place, you would not even lose a
>telnet connection.
>
>OpenVMS will load balance SCS across all configured and available
>connections. By configured I mean not disabled with the SCACP utility.

What we do is run 2 independent switches. Can't VLAN them together because
the MAC addresses are the same across all cards in a node due to DECnet
Phase IV. With DECnet performance no longer important, Kerry's solution is
better; I'd change to that if I had the time.

The configuration I describe is tolerant of any single fault, but you have
to stay on top of them because a double fault can hang the cluster.

Example: Nodes A-C each have 2 NICs and are connected to switches 1 and 2.
If NIC A-1 fails, everything still works. But if NIC B-2 fails before A-1
is fixed, nodes A and B can no longer communicate. C can still see both of
them, so it wants to keep the cluster together, and each of {A,B} wants to
kick the other out. The result is a cluster-wide hang.
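
For anyone who wants to check other fault combinations, here is a minimal
Python sketch of the example above. It only models which node pairs still
share a working switch; it assumes the two switches are not connected to each
other, and it does not model quorum or the actual OpenVMS connection manager.
The names (NODES, SWITCHES, reachable_pairs, fully_connected) are made up for
illustration.

```python
from itertools import combinations

NODES = ("A", "B", "C")   # cluster nodes from the example
SWITCHES = (1, 2)         # two independent switches, no link between them


def reachable_pairs(failed_nics):
    """Return the node pairs that still share at least one working switch.

    failed_nics is an iterable of (node, switch) tuples, e.g. ("A", 1)
    means the NIC connecting node A to switch 1 has failed.
    """
    failed = set(failed_nics)
    pairs = set()
    for x, y in combinations(NODES, 2):
        for s in SWITCHES:
            # Both nodes need a working NIC on the same switch.
            if (x, s) not in failed and (y, s) not in failed:
                pairs.add((x, y))
                break
    return pairs


def fully_connected(failed_nics):
    """True if every pair of nodes can still communicate directly."""
    return reachable_pairs(failed_nics) == set(combinations(NODES, 2))


single_fault = [("A", 1)]
double_fault = [("A", 1), ("B", 2)]

print(fully_connected(single_fault))          # True
print(sorted(reachable_pairs(double_fault)))  # [('A', 'C'), ('B', 'C')]
```

With the double fault, A and B no longer share a switch while C still sees
both, which is the situation that leads to the hang.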

--Keith Lewis klewis {at} mitre.org
The above may not (yet) represent the opinions of my employer.