RE: Clustering: switches reliability/redundancy
- From: klewis@xxxxxxxxxxxxxxx (Keith A. Lewis)
- Date: Thu, 15 Dec 2005 23:01:25 +0000 (UTC)
"Main, Kerry" <Kerry.Main@xxxxxx> writes in article <FD827B33AB0D9C4E92EACEEFEE2BA2FB7735F5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> dated Wed, 14 Dec 2005 17:35:34 -0500:
>
>> From: JF Mezei [mailto:jfmezei.spamnot@xxxxxxxxxxxx]
>> Are there considerations if the cluster nodes are all connected via the
>> same switch/hub? If the latter fails, the whole cluster hangs and
>> becomes inaccessible.
Yes. It should un-hang when power is restored. Of course, if the same hub
connects you to your network of clients, the cluster would still be
inaccessible regardless of whether it hung or not.
>> Are hubs/switches considered "fault tolerant"? If not, what possible
>> steps would a good site planner take to ensure a cluster isn't
>> jeopardized by some $50 switch/hub?
>>
>> Are hubs considered more "fault tolerant" than switches?
>>
>> Is it just a simple case of reserving spare ports on a backup switch so
>> that cluster ethernet connections can be moved one by one before the
>> main switch/hub is powered off for maintenance etc.?
>>
>A simple solution is to establish a VLAN between 2 trunked switch/routers
>and use separate NIC connections to each from each server. This causes
>the switches to appear as a logical unit or virtual cluster interconnect
>box. If an entire switch/rtr fails, the OpenVMS cluster keeps running, i.e.
>there are not even any application failover issues.
>
>With the right IP failover configs in place, you would not even lose a
>telnet connection.
>
>OpenVMS will load balance SCS across all configured and available
>connections. By configured I mean not disabled with the SCACP utility.
What we do is run 2 independent switches. We can't VLAN them together because
the MAC addresses are the same across all of a node's cards due to DECnet
Phase IV. With DECnet performance no longer important, Kerry's solution is
better; I'd change to that if I had the time.
The configuration I describe is tolerant of any single fault, but you have
to stay on top of them because a double fault can hang the cluster.
Example: Nodes A-C each have 2 NICs and are connected to switches 1 and 2.
If NIC A-1 fails, everything still works. But if NIC B-2 fails before A-1
is fixed, nodes A and B can no longer communicate. C can still see both of
them, so it wants to keep the cluster together, and each of {A,B} wants to
kick the other out. The result is a cluster-wide hang.
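The double-fault case above can be checked with a small connectivity model. This is only a sketch under stated assumptions: the two switches are fully independent (no link between them), two nodes can talk only if both still have a working NIC on the same switch, and the names NODES, SWITCHES and broken_pairs are made up for illustration. It does not model SCS or the connection manager, just who can still reach whom.

```python
from itertools import combinations

# Hypothetical model of the configuration described above:
# nodes A-C, each with one NIC on switch 1 and one NIC on switch 2.
NODES = ["A", "B", "C"]
SWITCHES = [1, 2]


def broken_pairs(failed_nics):
    """Return node pairs that share no switch on which both NICs still work.

    failed_nics is a set of (node, switch) tuples, e.g. ("A", 1) for NIC A-1.
    """
    # For each node, the set of switches it can still reach.
    working = {
        node: {sw for sw in SWITCHES if (node, sw) not in failed_nics}
        for node in NODES
    }
    # A pair is broken when the two nodes have no working switch in common.
    return [
        (a, b)
        for a, b in combinations(NODES, 2)
        if not (working[a] & working[b])
    ]


print(broken_pairs({("A", 1)}))            # single fault: []
print(broken_pairs({("A", 1), ("B", 2)}))  # double fault: [('A', 'B')]
```

With only A-1 down, every pair still shares a switch. With A-1 and B-2 both down, A is only on switch 2 and B is only on switch 1, so that one pair is cut off while C still reaches both.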
--Keith Lewis klewis {at} mitre.org
The above may not (yet) represent the opinions of my employer.