DS10L hanging problem, tracking it down



My DS10L is misbehaving.

DS10L 6/466, 1GB memory (compatible, not Compaq), KZPBA SCSI
controller, Quantum Viking II 4.5GB UWSE drive, attached BA364 tower
with two UWSE disks and one CDROM.

OpenVMS V8.2, patched up to 6/2006.

The system ran completely reliably for months. About 3 weeks ago I
found it not responding, though I could still get into the RMC from the
console. I rebooted; it ran about a week, then hung again. Each run
interval got shorter until now it will sometimes stay up less than 30
minutes after a long powerdown; a simple power cycle will usually not
get all the way through a boot before hanging.

A temp monitor run when VMS is up shows system temp at 26C after a long
powerdown, and peaking at 40C at midday when the room is at its
warmest. The system has plenty of open ventilation space, and has a
small desk fan blowing at its front panel (unchanged since
installation).

There are no error messages, no crash, no log entries. It just stops.
RMC is still accessible, but performing a reset has only worked once to
reboot; usually a power cycle is required to even get to the SRM.
Front panel HALT, console BREAK and console CTRL-P do nothing when it's
in this state (CTRL-P definitely halts the system when it's operating
normally).

I had one test where I was in SRM, not having booted yet, and it
stopped responding, so I'm pretty sure it's system hardware, not a
problem with the SCSI controller or drives. Further testing is needed
there since it was a one-shot.

Work done so far:
- ran without external drives (still hung)
- pulled and reseated SCSI card and memory (both risers and DIMMs) with
static protection
- removed heat sink to inspect (after seeing pics of overheated units);
the CPU and heat sink looked fine but the grafoil pad was wrinkled and
twisted around the studs a bit. I pulled it, cleaned the surfaces and
used a high-quality thermal grease instead.
- verified all fans are operating, and the little flexible air dams are
properly located.

The system is still hanging after a short period of operation. Still
no errors or log messages.

Next test will be to pull the SCSI controller and just play with it at
SRM level to see what happens. The POST does pass, a memory test seems
to pass. No memory errors were logged at VMS level. I'll run as many
tests as possible at the SRM level as soon as I can.
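
Since nothing gets logged when it stops, one thing that might help pin
down when each hang happens (while VMS is up, at least) is a heartbeat
file: append a timestamp every minute, force it to disk, and after the
next power cycle read back the last entry. Below is a minimal sketch in
Python, assuming a Python port is installed on the system; the file
name, the 60-second interval and the helper names are all made up for
illustration, and it does nothing for hangs at SRM level.

```python
import os
import time


def append_heartbeat(path, now=None):
    """Append one timestamp line to the file and force it to disk."""
    stamp = time.time() if now is None else now
    with open(path, "a") as f:
        f.write(f"{stamp:.0f}\n")
        f.flush()
        # fsync so the line survives a hard hang or power cycle
        os.fsync(f.fileno())
    return stamp


def last_heartbeat(path):
    """Return the last recorded timestamp, or None if there is none."""
    try:
        with open(path) as f:
            lines = [ln.strip() for ln in f if ln.strip()]
    except FileNotFoundError:
        return None
    return float(lines[-1]) if lines else None


def run(path="heartbeat.log", interval=60):
    """Write a heartbeat forever; meant to be left running in the background."""
    while True:
        append_heartbeat(path)
        time.sleep(interval)
```

After a hang, calling last_heartbeat("heartbeat.log") before restarting
run() gives the time of the last write, so the hang happened within one
interval after that. Comparing those times against the temp monitor
readings might show whether the hangs track the warm part of the day.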

This is a hobby system, so no support contract. Thanks for any advice.

Rich
