RE: WVNETcluster uptime reaches 10 years...




> -----Original Message-----
> From: bill@xxxxxxxxxxxxxxxxxxxx
> [mailto:bill@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Bill Gunshannon
> Sent: January 8, 2006 5:46 PM
> To: Info-VAX@xxxxxxxxxxxx
> Subject: Re: WVNETcluster uptime reaches 10 years...
>
> In article
> <FD827B33AB0D9C4E92EACEEFEE2BA2FB773AC3@xxxxxxxxxxxxxxxxxxxxxxorp.net>,
> "Main, Kerry" <Kerry.Main@xxxxxx> writes:
> >
> >> -----Original Message-----
> >> From: bill@xxxxxxxxxxxxxxxxxxxx
> >> [mailto:bill@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Bill Gunshannon
> >> Sent: January 7, 2006 6:39 PM
> >> To: Info-VAX@xxxxxxxxxxxx
> >> Subject: Re: WVNETcluster uptime reaches 10 years...
> >>
> >> In article <eWHVHo9INQti@xxxxxxxxxxxxxxxxxxxxxxxx>,
> >> Kilgallen@xxxxxxxxxxx (Larry Kilgallen) writes:
> >> > In article <43BF5649.51D600A6@xxxxxxxxxxxx>, JF Mezei <jfmezei.spamnot@xxxxxxxxxxxx> writes:
> >> >> Kenneth Farmer wrote:
> >> >>> I'm sure it's correct. I'm wondering if other OS fanatics could argue it
> >> >>> can be rigged.
> >> >>
> >> >> This rigging isn't really the question.
> >> >>
> >> >> Comparing apples to oranges is. Is it fair to compare the uptime of a
> >> >> VMS cluster against that of individual machines?
> >> >
> >> > The purpose of computers is not to keep a particular green light on the
> >> > machine lit, but rather to provide some service to humans. If VMS is
> >> > able to provide that service on a continuous basis that is what counts.
> >> >
> >> > The involvement of multiple CPUs, threads of execution, power supplies
> >> > etc. is just so much geeky technical trivia not germane to the issue
> >> > of whether the service was provided to humans.
> >> >
> >> >> If you have 2 Solaris machines that provide HTTP/web servers and use a
> >> >> router to distribute traffic and stop sending traffic to a node that is
> >> >> down, the uptimes project won't report a "cluster" uptime but the
> >> >> individual nodes' uptimes, even though functionally those Solaris boxes
> >> >> would offer about the same uptime as a VMS cluster.
> >> >
> >> > That service provided would not be the same if the web site involved
> >> > updating. For read-only applications I have an even more reliable
> >> > technology pre-dating VMS called a "book".
> >>
> >> And why could the above-mentioned Solaris system not involve updating?
> >> I have multiple servers with shared file systems so that any update on
> >> any system is universal. I can (and do) do rolling updates so that
> >> system availability is continuous. There are only two things missing:
> >> a "cluster uptime" value and thinking it mattered enough to care.
> >>
> >
> > And could you let us know what happens to the incoming writes when the
> > system hosting the writes for other systems via the network file sharing
> > you are talking about has to be rebooted or just plain halts or is
> > powered off?
>
> As long as we're not talking Linux they just wait til it comes back up.

Obviously, if that is the case, then HA is not a requirement. If you are
running a student registration system on Linux in September, then a bad CPU
board (or any other hardware failure) that needs a replacement part shipped
from somewhere else, which may take a day or two to arrive, will likely be
an issue.

If HA is not a requirement, then clustering is clearly not a requirement either.


> What happens to the incoming writes on my VAX Cluster when the HSJ serving
> all my disks dies? There are failure modes for everything.

Assuming a typical cluster configuration with volume shadowing across disk
controllers, the loss of one controller is transparent to the application.

> I have never said VMS wasn't good at this; I just don't see the purpose of
> obsessing over this uptime thing. I remember when we were being fed the
> line there was a "system" (a VAX no less) that had been up for 15 years.
> Not being possible, it eventually turned out to be a Cluster, which is a
> different animal entirely.

If you don't need HA, then the uptime thing is obviously not an issue
for you.

>
> >
> > Or perhaps you could expand on how each system can do direct I/Os to the
> > storage subsystem without the writes taking the long trek over the
> > network? Most folks think a DLM (distributed lock manager) is required
> > to do direct I/Os from each system.
> >
> > Or perhaps you could expand on how you would shut one entire site down
> > in a multi-site config without telling the end users and without
> > impacting application availability?
> >
> > Here is a pointer to a whitepaper that can refresh readers on the
> > benefits of clustering and different UNIX implementations as compared to
> > OpenVMS:
> >
> > http://www.tru64unix.compaq.com/unix/illuminata_dt_unix_research_note.pdf
> >
> > Thanks,
>
> You just keep on obsessing. Eventually you may figure out that in the real
> world most places just aren't that concerned.
>

In a distributed world, perhaps. The impact of a single failure in a
distributed world is much smaller than in a centralized environment. Of
course, you pay for the additional servers, management, monitoring, etc.

The reality in almost all large companies today is that server and data
center (DC) consolidation is one of the hottest topics on most C-level
executives' radars. They need to cut IT costs big time while at the same
time improving service levels and rolling out new stuff as well.

Also, to comply with SOX (Sarbanes-Oxley) and other regulatory
requirements, many companies must ensure their IT environment meets
specific, predetermined levels.

In a centralized world, HA (at both the server and the DC level) is
critical, as the cost of failure, both in regulatory and legal compliance
and in applications not being available, is extremely high.

So, your view that most places aren't concerned about HA is certainly not
one shared by most medium-to-large companies today.

> Oh, by the way, how many VMS systems in New Orleans have uptimes of more
> than a year?

I'm not aware of any companies that had disaster-tolerant clusters in New
Orleans, but cluster nodes are supported up to 800 km apart, so geographic
separation is all part of HA planning. The New Orleans disaster is a good
reason why long-distance clusters (sites more than 100 km apart) are being
looked at by many large customers today.

> Every system in the world is subject to being taken down by factors out
> of the control of the ones running the system. Keep telling people about
> the advantages of VMS (God knows your employers won't!) but stop thinking
> that the results on some web site that tracks "Uptime" are going to make
> people dance in the streets.

I was talking about the technology, not about any news that would get
people dancing in the streets.

Kerry Main
Senior Consultant
HP Services Canada
Voice: 613-592-4660
Fax: 613-591-4477
kerryDOTmainAThpDOTcom
(remove the DOTs and AT)

OpenVMS - the secure, multi-site OS that just works.