Re: disk errors in 5.4
From: Jason Bourne (j_bourne_treadstone_at_hotmail.com)
Date: 07/01/05
- Next message: Friedrich Volkmann: "Re: USB printer"
- Previous message: Rudolf Polzer: "Re: How harmful is pkg_add -f"
- In reply to: Skeleton Man: "Re: disk errors in 5.4"
- Next in thread: Magnus: "Re: disk errors in 5.4"
Date: Fri, 01 Jul 2005 02:18:37 -0400
Skeleton Man wrote:
[snip]
>
> Checked the power supply under load with a dmm.. all the rails are within
> spec (brand new 350W psu).. and it's not a power hungry setup (Celeron
> 700MHz, 1 hard disk, 1 cd-rom)..
Just a little side note here; it probably doesn't apply, so treat it as an
FYI. Checking DC voltages with a meter does not tell the complete story,
although with a new power supply I highly doubt anything is wrong. A meter
shows only the average level. If the ripple on a rail is excessive, which
you can see only with an oscilloscope, it can cause all manner of problems
that seem to have no particular source. I remember this from a client's
use of an HP Vectra model whose power supply was rated for only 135 watts,
in a machine used as an archive repository. The archive held images of
cheques, written to a WORM drive as they were processed. This drive pulled
so much current every time it did anything that the DC on the 'scope was
anything but. All the "digiboys" had gone over this box repeatedly and
could not figure anything out. They even replaced it with completely new
boxen of the same model, and it made no difference. Analog electronics
knowledge may be receding into the background in the computer era, but it
still produced a result in this case: that HP Vectra model simply could not
be certified for use in this application.
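
To illustrate why a meter reading can look fine on a noisy rail, here is a
minimal Python sketch. The rail is simulated (a nominal 5 V level plus a
sine-wave ripple); the function names and the ripple amplitudes are made up
for the example and are not measurements from any real supply.

```python
import math

def rail_stats(samples):
    """Return (mean, peak_to_peak) for a list of voltage samples."""
    mean = sum(samples) / len(samples)
    return mean, max(samples) - min(samples)

def simulated_rail(nominal, ripple_amplitude, n=1000):
    """Hypothetical rail: nominal DC plus sine ripple over 10 whole cycles."""
    return [
        nominal + ripple_amplitude * math.sin(2 * math.pi * 10 * i / n)
        for i in range(n)
    ]

clean = simulated_rail(5.0, 0.02)   # small ripple (illustrative value)
noisy = simulated_rail(5.0, 0.60)   # large ripple (illustrative value)

for name, samples in (("clean", clean), ("noisy", noisy)):
    mean, p2p = rail_stats(samples)
    # The mean is roughly what a DC meter reports; the peak-to-peak
    # figure is what only shows up on a 'scope.
    print(f"{name}: mean={mean:.2f} V, peak-to-peak={p2p:.2f} V")
```

Both rails average out to the same level, so only the peak-to-peak figure
tells them apart.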
> I checked the cabling by swapping out ide connectors with several
> different cables, some brand new from the packet, no difference..
> I tried disconnecting cd-rom, no difference. I tried both the original and
> replacement hard drives on both primary and secondary channels, no
> difference..
>
> I tried swapping the ram with known good memory.. again no difference...
Sounds quite frustrating! The next thing I would try hardware-wise would
be to substitute a newer drive of a completely different make and model.
If that changes the behaviour, rule out the Seagate. You might also try
experimenting with disabling DMA in /boot/loader.conf. Running in PIO mode
is definitely not the way to go long term, but if the problem disappears
it would point to a DMA/ATA driver issue. Here again, such a condition may
be related to the Seagate/controller/driver combination itself, which is
still a case for trying a different drive.
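
As a sketch of the DMA test, and assuming the stock ata(4) driver of the
5.x series, the loader tunables below are the ones expected to apply. The
original message does not name them, so check ata(4) on your release
before relying on them.

```
# /boot/loader.conf (assumed tunables from ata(4); verify on your release)
hw.ata.ata_dma="0"      # ATA disks: use PIO instead of DMA
hw.ata.atapi_dma="0"    # ATAPI devices (cd-rom): use PIO instead of DMA
```

Remove the lines again once the test is done; they take effect on the next
boot.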
> I am almost certain the issue is software, because a clean install of 5.4
> boots like a charm, but after adding the packages, everything
> segfaults/core dumps (I got core dumps from acpi.ko, ssh, inetd, grep, and
> a list of others)
>
You need to make a diagnostic split between hardware and software
troubles. It seems you have gone over the hardware fairly thoroughly. In
my experience hardware problems can be very flaky and hard to pin down,
whereas software problems can usually be reproduced at will by repeating
the same sequence of steps that produced the problem. But keep in mind
that if the hard drive subsystem is corrupting files, then subsequent uses
of a file (executing ssh, for example) will likely segfault and dump core.
Failing media might let you write out an area successfully, and then a bad
spot corrupts some sector(s) within it. This should be readily detectable
by running the drive manufacturer's diagnostic software between
"incidents".
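
A complementary check for on-disk corruption, sketched in Python: record
checksums of a few binaries right after the clean install, then compare
them again after an "incident". This is only an illustration; the function
names are invented for the example, and it does not replace the
manufacturer's diagnostics.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def snapshot(paths):
    """Map each path to its current checksum."""
    return {str(p): sha256_of(p) for p in paths}

def changed_files(baseline, paths):
    """Return the paths whose checksum differs from the baseline."""
    current = snapshot(paths)
    return sorted(p for p, d in current.items() if baseline.get(p) != d)
```

Usage would be along the lines of `baseline = snapshot(["/usr/bin/ssh",
"/usr/bin/grep"])` after the install and `changed_files(baseline, ...)`
with the same list later. Reboot before the second pass, since a file that
is still in the buffer cache may not be re-read from the disk.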
Software-wise, after a fresh install and reboot I would load just one app
at a time and run it for a while. For attempting to "tickle" an
intermittent hardware glitch, memtest or a buildworld comes to mind. If
the system stays stable for a while, add the next app and repeat. Another
thing to consider would be to skip the packages and instead cvsup your
ports tree and compile the port(s) yourself. If all then seems well, there
might be something wrong with the package(s). The goal here is to study
the "sequence of steps", à la divide and conquer. If it is truly a software
problem, you need to isolate it to the specific package/port.
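
The one-at-a-time procedure can be sketched in Python as follows. The
install and health-check functions here are stand-ins that only simulate a
system, and the package names are hypothetical; on a real machine they
would be replaced by whatever installs a package and by a test such as
running the programs that were dumping core.

```python
def first_failing_step(steps, apply_step, is_healthy):
    """Apply steps in order; return the first one after which the
    health check fails, or None if every step leaves things healthy."""
    for step in steps:
        apply_step(step)
        if not is_healthy():
            return step
    return None

# Simulated system for demonstration only.
installed = []

def fake_install(pkg):
    installed.append(pkg)

def fake_health_check():
    # Pretend the (hypothetical) package "pkg-c" breaks the system.
    return "pkg-c" not in installed

culprit = first_failing_step(
    ["pkg-a", "pkg-b", "pkg-c", "pkg-d"], fake_install, fake_health_check
)
print("first failing step:", culprit)
```

Stopping at the first failure leaves the system in the state that shows
the problem, which is convenient for further poking.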
I know this is no answer to your problem, and I do *know* the frustration
you are feeling (been there, done that). I don't know much about the
problem, but my gut instinct tells me to take a hard look at the "old
Seagate".
-Jason