Re: Netware is no VMS

From: Paul Sture (p_sture_at_elias.decus.ch)
Date: 08/09/03


Date: 9 Aug 03 07:51:34 +0200

In article <bgu58o$icb$1@news.mdx.ac.uk>, david20@alpha1.mdx.ac.uk writes:
> In article <3F327E6D.5080402@MMaz.com>, "Barry Treahy, Jr." <Treahy@MMaz.com> writes:
>>jlsue wrote:
>>
>>>On Wed, 06 Aug 2003 10:33:10 -0700, "Barry Treahy, Jr." <Treahy@MMaz.com>
>>>wrote:
>>>
>>>
>>>>A very interesting Linux project, which is underway, is a clustering
>>>>project that creates a unified process space between cluster nodes so
>>>>that true load balancing and failover can occur when a member node
>>>>either slows or dies... There is no question that VMSclusters has led
>>>>the way in clustering, but it was never completed to what would be a
>>>>logical conclusion; Process failover/recovery and true load-balancing...
>>>>
>>>>
>>>>
>>>
>>>This is really cool, until you realize that the process that just crashed
>>>system1 got failed over to system2 (then system3, system4...)
>>>
>>Bah, nonsense! You must be talking about Windows, because this isn't
>>an issue with VMS nor most flavors of Unix...
>>
>>Think about it, when is the last time you had a rogue VMS application
>>take down a VMS system? With the exception of a faulty driver that
>>couldn't handle unexpected device failures, I have not had a
>>software-based VMS failure for over a decade and that was Kernel code! As for
>>user-mode code crashing on VMS, I can't think of much except perhaps EDT
>>on VMS 1.0 but that was all RSX based stuff... In the case of
>>Supervisor or Executive code, if the code faults, it kills the process,
>>not the system...
>>
>>Barry
>>
>
> So it was in Kernel mode when it crashed. Are you saying that only
> processes which NEVER go into Kernel mode can be migrated?
>
> Process runs application in user mode. Application does something which means
> it executes some code in Kernel mode and then crashes the system.
> Process is migrated to second system at its last checkpoint.
> Process continues. Application executes its code in Kernel mode again - crashes
> second system.
>
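
The sequence David describes can be sketched as a toy simulation. This is only an illustration of the argument, not how any real cluster migrates processes: the node names, the `crashes_kernel` flag and the `max_migrations` budget are all made up for the example.

```python
def run_with_failover(nodes, crashes_kernel, max_migrations=None):
    """Toy model: run one job, resuming it on the next node after a crash.

    nodes           -- list of node names, tried in order (hypothetical)
    crashes_kernel  -- True if the job deterministically crashes its host
    max_migrations  -- optional cap on how often the job may be moved
    Returns the list of nodes the job took down.
    """
    crashed = []
    migrations = 0
    for node in nodes:
        if not crashes_kernel:
            break  # job ran to completion on this node, nothing to migrate
        # Replaying from the last checkpoint hits the same kernel-mode path.
        crashed.append(node)
        if max_migrations is not None and migrations >= max_migrations:
            break  # budget spent: leave the job dead instead of moving it
        migrations += 1
    return crashed


if __name__ == "__main__":
    cluster = ["system1", "system2", "system3", "system4"]
    print(run_with_failover(cluster, crashes_kernel=True))
    print(run_with_failover(cluster, crashes_kernel=True, max_migrations=1))
```

Without a budget the job walks through every node in turn; with `max_migrations=1` the damage stops at two systems. Any real scheme would also need some way to tell "the node died under the job" from "the job killed the node", which this sketch does not attempt.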

Well said, David. We also have to consider the case of a DoS here. A few years
ago we had a job which DECScheduler would restart immediately whenever it
failed. Oh, we hit version 32767 on the logfile, and until I raised my own
priority above 16, it was hard to get a look in on the problem.
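
The usual defence against that kind of restart loop is a restart budget with a growing pause between attempts. The following is a minimal sketch of the idea in Python, not anything DECScheduler provides as far as I know; `supervise`, `max_restarts`, `base_delay` and `max_delay` are names invented for the example.

```python
import time


def supervise(job, max_restarts=5, base_delay=1.0, max_delay=60.0,
              sleep=time.sleep):
    """Run job(); on failure retry at most max_restarts times.

    The pause doubles after each failure (capped at max_delay) so a job
    that fails instantly cannot monopolise the machine.
    Returns (succeeded, number_of_runs).
    """
    runs = 0
    while True:
        runs += 1
        try:
            job()
            return True, runs
        except Exception:
            restarts_used = runs - 1
            if restarts_used >= max_restarts:
                return False, runs  # give up and leave it for a human
            sleep(min(base_delay * 2 ** restarts_used, max_delay))


if __name__ == "__main__":
    def always_fails():
        raise RuntimeError("job failed")

    # Pass a no-op sleep so the demonstration finishes immediately.
    print(supervise(always_fails, max_restarts=3, sleep=lambda s: None))
```

With a cap like this the job stops after a handful of logfiles rather than running the version number up to the limit.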

> Just take a look at the ECOs for DEC TCPIP services. There are usually a few
> fixes for inappropriate usage of NFS, FTP or other parts of the TCPIP suite
> which in certain rare circumstances can crash a system.
>
> eg
>
> ECO B 30-APR-2002 Alpha and VAX
>
> Problem:
>
> INVEXCEPTN crash in TCPIP$INTERNET_SERVICES due to inadvertent
> freeing of a BG device for an active TN device kernel client.
>
> Deliverables:
>
> TCPIP$INTERNET_SERVICES.EXE V5.3-18B
>
> Reference:
>
> PTR 70-5-1928 / CFS.88690 / Req Id: STLQC0001
> TCPIP_BUGS Note 2572
> PTR 70-5-1947 / CFS.89075 / Req Id: HPAQC22TZ
>

Exactly. Our tour of Wednesday's crash dump took me down the road of
reading exactly the same text. Then we hit the DSN database and found
tons of examples of similar, but not quite the same, crash situations.

A maze of twisty passages, all alike, but somehow subtly different.


