Re: Response issues on GS1280, VMS 7.3-2
From: Lee (lytmah_at_telusplanet.net)
Date: 07/15/05
- Next message: David D Miller: "SET HOST 0"
- Previous message: norm.raphael_at_metso.com: "Re: Migration checklist (no, not away from VMS!)"
- In reply to: Keith Parris: "Re: Response issues on GS1280, VMS 7.3-2"
- Next in thread: Hein: "Re: Response issues on GS1280, VMS 7.3-2"
- Reply: Hein: "Re: Response issues on GS1280, VMS 7.3-2"
- Reply: prep_at_prep.synonet.com: "Re: Response issues on GS1280, VMS 7.3-2"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Fri, 15 Jul 2005 17:53:34 GMT
> Do you have LAVC$FAILURE_ANALYSIS in place? This would help you
> determine if transient network problems are a contributing factor.
We've ruled out network as the cause of our problem.
> Do you see a lot of disk mount verifications?
No disk mount verifications of any kind in the last 5 days.
> Is this just any arbitrary DCL command, or something dealing with
> specific file(s) or disk(s)? If it's any arbitrary DCL command, then
> maybe a CPU shortage is involved, including things like saturation of
> the primary CPU in interrupt state ($MONITOR MODES/ALL could help check
> for that).
Users run approx. 1,000 commands in the same format when
they log into the cluster:
...
$ DEF/NOLOG/JOB FILE1 FILE1.DAT
$ DEF/NOLOG/JOB FILE2 FILE2.DAT
$ DEF/NOLOG/JOB FILE3 FILE3.DAT
$ DEF/NOLOG/JOB FILE4 FILE4.DAT
…
HP suspects logical name translation to be our problem,
specifically one of our system logicals called XXXXXX.
Here are trace results from SDA covering a few seconds.
Logical Name Trace Information from node L:
 Count  Logical Name
  2150  XXXXXX
   294  SYS$SYSROOT
   166  SYS$SHARE
   128  SYS$COMMON
…
Logical Name Trace Information from node M:
 Count  Logical Name
  3561  XXXXXX
   158  SYS$SYSROOT
   113  SYS$SHARE
    69  SYS$COMMON
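To put the two traces in proportion, here is a rough sketch in Python that works out what fraction of the listed counts belongs to XXXXXX on each node. The numbers are copied from the trace output above; the dictionary layout and the `share_of` helper are just illustrative names, not anything from SDA. The node L listing is truncated in this post, so these shares are relative to the rows shown only, not to the full trace.

```python
# Counts copied from the SDA logical name trace output quoted above.
# Only the rows shown in the post are included; the node L listing is
# truncated there, so the shares below are relative to the listed rows only.
trace = {
    "L": {"XXXXXX": 2150, "SYS$SYSROOT": 294, "SYS$SHARE": 166, "SYS$COMMON": 128},
    "M": {"XXXXXX": 3561, "SYS$SYSROOT": 158, "SYS$SHARE": 113, "SYS$COMMON": 69},
}


def share_of(counts, name):
    """Return the fraction of the listed counts attributed to `name`."""
    total = sum(counts.values())
    return counts[name] / total


for node, counts in trace.items():
    share = share_of(counts, "XXXXXX")
    print(f"node {node}: XXXXXX accounts for {share:.1%} of the listed counts")
```

Among the rows shown, XXXXXX comes out at roughly 79% on node L and 91% on node M, which is consistent with HP singling out that one logical.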
During the interactive degradation, CPU usage is very low,
the disk queue length is less than 1,…
I've traced LNM and the high RECLAIM count is not specific to
any one program or any process.
This logical is used by all the FIO routines in our in-house
application programs.
90% of the applications running on the four fairly homogeneous
cluster nodes are in-house.
The strange thing is, most of the programs have not changed
since the migration to GS1280 in May/2005.
The same application programs and FIO routines were compiled
a few years ago.
Programs running on the ES45's, no problem.
After the first node was migrated to a GS1280 hard partition,
users experience degraded response on it.
When we had four ES45's, we could roll out one node for
SW/HW maintenance with no response problem.
Now, when we take one GS1280 node out, response is embarrassing.
I've specified System Health Check and T4 to run daily.
Keith Parris wrote:
> Lee wrote:
>
>> Five-node Gigabit Ethernet VMScluster across three sites.
>
...