Re: Interpreting or reading DCPS dumps ?
- From: Jan-Erik Soderholm <jan-erik.soderholm@xxxxxxxxx>
- Date: Wed, 02 Jun 2010 23:21:51 +0200
On 2010-06-02 20:06, Paul Anderson wrote:
Jan-Erik Soderholm <jan-erik.soderholm@xxxxxxxxx> wrote:
DCPS$MAX_STREAMS is not defined at all so each queue has its own
symbiont, right ?
What version of DCPS are you running?
$ sh log dcps$version
"DCPS$VERSION" = "V2.6 ECO 3" (LNM$SYSTEM_TABLE)
I have a DCPS V2.7 kit that isn't installed. Maybe time...
$ tcpip sh ver
HP TCP/IP Services for OpenVMS Alpha Version V5.5 - ECO 1
on a COMPAQ AlphaServer DS20E 666 MHz running OpenVMS V8.2
I know the name of the offending (looping) process ("SYMBIONT_145"),
but I do not know which queue it was serving. I have searched but I
didn't find any way to see what queue was using that symbiont at that
time. The process was simply STOP'ed and the PID was not saved.
Next time, save the PID and do a SHOW LOGICAL DCPS$*PID to find the
name of the queue associated with that process before using STOP.
Yes, I would probably have done that. But I was in my car
trying to help a not-so-technical guy over the phone
who was struggling with the VMS systems, with a couple of hundred
workers waiting for the system to get back up again and
the factory more or less standing still.
He was reaching out for the main power on/off switch of
the DS20 just when we found the looping process... :-)
But I do see your point. :-)
Last time this happened (a DCPS symbiont looping and using up all CPU)
was something like a year ago, and I do not know a way to force it to
happen. But *when* it happens, everything practically halts on the
system. A whole factory more or less halting...
Since you can't reproduce it, you sure don't want to have all symbionts
create trace files, unless you also want to buy a really big disk.
The "solution" I'm looking at right now, is to run all symbionts at a
lower priority to at least not have the system halted...
I have no opinion about that solution as I never tried that.
It's just so that normal logins still work. It took us a couple
of minutes just to log in and raise the prio to be able to
run MONITOR (that was when we found the looping process).
I just checked OPERATOR.LOG and most (or maybe all)
DCPS-F-CONTERMINATED and DCPS-F-BADPARAM messages come from a couple
of Xerox Phaser 3250 printers. Approx. 400 CONTER... and 100 BAD...
messages in 2 months.
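
A tally like the one above can be reproduced with a few lines of script once OPERATOR.LOG has been copied somewhere Python is available. This is only a minimal sketch: it assumes each message identifier appears as plain text on the line that reports it, and the function name `count_dcps_messages` is made up for illustration. It does not try to parse the rest of the OPCOM record, so it cannot attribute a message to a particular queue or printer.

```python
from collections import Counter

# Message identifiers taken from the post above.
MESSAGE_IDS = ("DCPS-F-CONTERMINATED", "DCPS-F-BADPARAM")


def count_dcps_messages(lines, message_ids=MESSAGE_IDS):
    """Count how many lines mention each message identifier.

    `lines` is any iterable of strings, for example an open log file.
    Assumption: a plain substring match is enough to spot a message.
    """
    # Start every identifier at zero so absent messages still show up.
    counts = Counter({msg_id: 0 for msg_id in message_ids})
    for line in lines:
        for msg_id in message_ids:
            if msg_id in line:
                counts[msg_id] += 1
    return counts
```

To use it, open the copied log with `open(path, errors="replace")` and pass the file object straight to `count_dcps_messages`; the result maps each identifier to the number of lines that contain it.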
I would check the network timeout value on those printers and make sure
they're in the range of 4-5 minutes.
We also have some trouble where the queues hang in "Starting" and a
STOP/START of the queues gets them going again. I have a batch job
checking that so it is right now no major problem.
Is your network or queue manager very busy? It sounds like there might
be some timeouts causing some of this bad behavior.
No, not particularly busy. The system isn't very loaded overall.