Re: ad0 READ_DMA TIMEOUT errors on install of 7.0-RELEASE



Jeremy Chadwick wrote:
And after the reboot, the READ_DMA timeouts were back.

You're not the only one seeing this behaviour. There have been many posts
in the past reporting similar problems. Here's the breakdown:

* Some have switched to alternate operating systems (usually Linux) for
a short while and seen no sign of DMA timeouts.

Booting the 6.3-RELEASE CD seems to make the problem go away... possibly 7.0 stresses the HD more?

However: in your case, your disk does look to have problems based on the
SMART output you provided. It does not matter how new/old the disk is,
by the way. I'll point out the problematic stats. You need to replace
the disk ASAP.

Yeah, that's pretty much what I figured; the timing (i.e., the moment I boot 7.0-RELEASE) is the only bit that seems fishy. This HD has been powered on pretty much continuously for around three years. Given that it's a Maxtor, I'm honestly a bit surprised that it's lasted as well as it has.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 4

This shows you've had 4 reallocated sectors, meaning your disk does in
fact have bad blocks. In 90% of the cases out there, bad blocks
continue to "grow" over time, due to whatever reason (I remember reading
an article explaining it, but I can't for the life of me find the URL).

This is unusual now? I've always "known" that a small number of bad blocks is normal. Time to readjust my knowledge again?
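As a rough illustration of how the attribute row quoted above can be read, here is a minimal Python sketch. The column layout is taken from the smartctl header line in this message; the helper name parse_attr_row and the SmartAttr type are made up for this example and are not part of smartmontools. It assumes the usual SMART convention that an attribute counts as failed when its normalized VALUE is at or below THRESH, while RAW_VALUE here is the actual count of reallocated sectors.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    """One row of the smartctl attribute table (hypothetical helper type)."""
    attr_id: int
    name: str
    value: int   # normalized value, higher is better
    worst: int
    thresh: int
    raw: str


def parse_attr_row(row: str) -> SmartAttr:
    # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    # Split into at most 10 fields so a RAW_VALUE containing spaces stays in one piece.
    f = row.split(None, 9)
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[9].strip())


row = "5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 4"
attr = parse_attr_row(row)

# The normalized value is still far above the threshold, so SMART itself
# does not flag the attribute as failed...
below_threshold = attr.value <= attr.thresh
# ...but the raw count shows that sectors have already been remapped.
has_reallocations = int(attr.raw) > 0

print(attr.name, "value", attr.value, "thresh", attr.thresh, "raw", attr.raw)
print("failed by threshold:", below_threshold)
print("sectors reallocated:", has_reallocations)
```

This is why the row shows "-" under WHEN_FAILED even though the raw count is non-zero: the threshold check and the raw count answer different questions.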

194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 48

This is excessive, and may be contributing to problems. A hard disk
running at 48C is not a good sign. This should really be somewhere
between high 20s and mid 30s.

Yeah, this is a known problem with this drive... it's been running hot for years. I always figured it was due to the rotational speed increase in commodity drives.

Error 2 occurred at disk power-on lifetime: 5171 hours (215 days + 11 hours)
When the command that caused the error occurred, the device was in an unknown state.
Error 1 occurred at disk power-on lifetime: 5171 hours (215 days + 11 hours)
When the command that caused the error occurred, the device was in an unknown state.

These are automated SMART log entries confirming the DMA failures. The
fact that SMART saw them means that the disk is also aware of said
issues. These may have been caused by the reallocated sectors. It's
also interesting that the LBAs are different than the ones FreeBSD
reported issues with.

If that power-on lifetime is accurate, that was at least a year ago... but I can't find any documentation as to when the power-on lifetime wraps or what it actually indicates. I'm assuming that it is total power-on time since the drive was manufactured. If it's total hours stored as a 16-bit integer, it shouldn't have wrapped yet (65,535 hours is roughly 7.5 years). Is there a way of getting the "current" power-on lifetime value that you're aware of? The Power_On_Minutes attribute is interesting, but its current value is lower than the value at the time of the error (though higher than the current uptime of the system):
9 Power_On_Minutes 0x0032 219 219 000 Old_age Always - 1061h+40m
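The counter arithmetic behind that question can be checked with a short Python sketch. The figures 5171 hours and 1061h+40m come from the smartctl output quoted in this message. Whether the drive really stores either counter as an unsigned 16-bit value is purely an assumption made for illustration; nothing in this thread confirms it.

```python
# Assumption: the counters are plain unsigned 16-bit values (not confirmed
# anywhere in the thread; used only to see what wrapping would look like).
COUNTER_LIMIT = 2 ** 16  # a 16-bit counter wraps after 65536 ticks

# Power-on lifetime recorded with the logged errors: 5171 hours.
error_days, error_hours = divmod(5171, 24)

# If the lifetime counter ticks in hours, how long until it wraps?
hours_wrap_years = COUNTER_LIMIT / (24 * 365)
three_years_hours = 3 * 365 * 24  # roughly three years of continuous uptime

# Current Power_On_Minutes raw value: 1061h+40m, expressed in minutes.
minutes_now = 1061 * 60 + 40

# If the counter ticks in minutes instead, it would wrap far sooner.
minutes_wrap_h, minutes_wrap_m = divmod(COUNTER_LIMIT, 60)
minutes_wrap_days = COUNTER_LIMIT / (60 * 24)

print(f"5171 hours = {error_days} days + {error_hours} hours")
print(f"16-bit hour counter wraps after about {hours_wrap_years:.1f} years")
print(f"three years of uptime = {three_years_hours} hours")
print(f"current minute count = {minutes_now}")
print(f"16-bit minute counter wraps after {minutes_wrap_h}h+{minutes_wrap_m}m "
      f"(about {minutes_wrap_days:.1f} days)")
```

Under that assumption, an hour counter would not wrap within three years, while a minute counter would wrap roughly every month and a half. The current reading of 63700 minutes sits just under 65536, which would be consistent with a wrapping minute counter, but that is a guess and not something the SMART output proves.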

Also interesting is that after getting more errors from FreeBSD, I did not get more errors in smartctl.

My advice to you is: replace the disk ASAP. This problem will only get
worse. Try another hard disk brand too (I don't have anything "against"
Maxtor, but usually it's recommended to avoid a brand you have problems
with until the next time you have issues, then switch brands, etc.
etc...). I'm very fond of Western Digital's SE16, RE, and RE2 series
currently. But avoid Fujitsu and Samsung (both have a long track record
of having buggy drive firmwares, forcing vendors to make custom
workarounds for issues); stick with Seagate, Western Digital, or Maxtor.

Yeah, that's my plan... but I wanted to stake out some whining rights in advance so I can do the "But you said it was a bad HD or cable! Now I'm out $x00 and my system still doesn't work! Help me or I switch to DragonFly BSD/Desktop BSD/Linux which is perfect and has no problems!" thing. Then go on Slashdot and post long rambling messages about how FreeBSD is dead and it doesn't matter that the manpages on any given Linux box are useless.

_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


