Re: suddenly filesystem becomes read-only?



Troy Piggins wrote:
* Logan Shaw wrote:
Troy Piggins wrote:
* Michael Paoli wrote:
may have options such as errors=remount-ro or may default to such
behavior. Some may have mount options such as errors=panic or may
If this is in fact the reason the filesystem became read-only (which
it seems like it probably is), then you really need to do some
careful investigation, provided this machine is important to you.

Put it this way, my girlfriend thinks I love my linux box more
than her...
I think that's a common (natural?) occurrence between girlfriends and
LINUX ;-) ... but that's probably a topic for some other newsgroup.

At this point, there are only two possible causes:
(1) software problem, 99% chance it's a bug in the filesystem code
(2) hardware problem, which means either defective memory (unlikely)
or dying hard disk.
Yep, I'm thinking (2) also.
Well, I'd also lump power disruptions into (2), or in some cases (1),
depending upon their cause, and they could be completely external
to the system itself that's had the filesystem issue - nevertheless
such could introduce logical data corruption and/or hardware damage
to the disk, or otherwise have negative impacts upon the computer
system.

If I had to bet money, I'd say this was most likely due to a defective
hard disk. So, I would check your disk with whatever tool Linux provides
for checking for bad sectors. You could even just do something like a
"dd if=/dev/hda1 of=/dev/null bs=1024k" just to be sure you can read
all the sectors. Checking that you can read and write them both would
be a better test, though.
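
For anyone who would rather script the read test than use dd, here is a
minimal sketch of the same idea: read the whole device front to back and
note where reads fail. It is only an illustration under assumptions - the
function name read_end_to_end and the 1 MiB chunk size are made up for this
example, a failed chunk is simply skipped rather than retried sector by
sector, and it only reads, so it never alters the disk.

```python
import os

def read_end_to_end(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, failed_offsets).

    Hypothetical helper for illustration. Opens read-only, so the
    device or file is never modified.
    """
    bytes_read = 0
    failed_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for files and Linux block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Record the start of the unreadable chunk and skip past it.
                failed_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of device
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, failed_offsets
```

Something like read_end_to_end("/dev/hda1"), run as root, would then
report how many bytes were readable and the offsets of any chunks that
were not; an empty list of failures is the result you hope for.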

I'll do some checks on the weekend. Thanks for the pointers.

Yes, that's one of the first things I'd also do if I suspected there
might be disk hardware problems - read the entire device end-to-end,
and see if that is successful or not. Note also that more modern
hard drives (e.g. SCSI, most non-ancient IDE/ATA, etc.) are
relatively intelligent about automagically "fixing"
minor hard drive problems. Within my experience, on SCSI this is
typically much more graceful than with IDE/ATA, but others may have
had different experiences (thus far I've not dealt with a particularly
large number of drives that automagically recovered themselves, so
I'm working from a small statistical sample). With SCSI,
there's a "grown defects list" (or whatever its precise name is).
When bad/suspect sectors are found, they're added to this list. If
the drive is still able to read the data (if it's read, before there
is some attempt to write it), it will rewrite it elsewhere, and remap
so the alternate sector is used. If it's simply written to, it
likewise remaps it, and writes and henceforward uses the alternate
sector. Things only go badly with this scheme when
either the operating system still needs to read the data, and the
SCSI drive can't successfully read it, or the "grown defects list"
table overflows, and the SCSI drive can no longer remap bad sectors
(if you or your monitoring software can monitor the "grown defects
list", watching for growth there, particularly if it's growing fast,
or the table is approaching being full, those are strong indicators of
a disk that is quite likely to fail unrecoverably in the near
future). With IDE/ATA I've seen similar, but less graceful behavior.
With such drives, it seems (I've not verified this at all, ... just
my guesstimate from behaviors I've seen) the drives aren't as
"proactive" about remapping. It seems they only get around to
remapping after a sector has gotten to the point where it can't be
successfully read. On the other hand, SCSI seems a bit more
intelligent about this, and is often capable of detecting that
sectors are becoming "difficult" (perhaps close to tolerance limits,
or experiencing some read errors, but succeeding with repeated read
retries) to read, and of successfully remapping them (with no visible
sign that any problems occurred, other than the growth of the defects
list, and perhaps a trace of extra latency in reads on some
occasions). Anyway, even with the "smart", but not *as* "smart"
IDE/ATA drives, overwriting the sector of the device that's having
the problem will often cause the problem to automagically
"disappear", as it gets remapped upon the overwrite (e.g. my personal
laptop has given this precise behavior exactly twice thus far in the
over 3 years that I've had this laptop). Note, however, that for many
filesystem types, overwriting the file that contains the bad
sector may not attempt to overwrite the bad sector - e.g. some
journaling and copy-on-write filesystems may write the data elsewhere
upon "overwrite" (so that an incomplete action - such as one disrupted
by loss of power or system lock-up - can be "rolled back" (or forward)
to a consistent filesystem state).
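
To make the remapping behavior described above a bit more concrete, here
is a toy model of it. This is a sketch only, not how any real firmware
works: the class ToyDrive, its fields, and the "marginal" / "unreadable"
sets are all invented for the illustration. It shows the three cases: a
still-readable bad sector gets copied to a spare on read, a bad sector
gets remapped on write, and things fail when the data can't be read or
the defects table is full.

```python
class ToyDrive:
    """Toy model of a drive with a bounded grown defects list (not real firmware)."""

    def __init__(self, spare_sectors):
        self.spare_sectors = spare_sectors  # capacity of the remap table
        self.grown_defects = {}             # bad LBA -> spare location
        self.data = {}                      # physical location -> contents
        self.marginal = set()               # LBAs readable only with retries
        self.unreadable = set()             # LBAs that can no longer be read

    def _location(self, lba):
        # A remapped LBA lives in its spare; otherwise in its original place.
        return self.grown_defects.get(lba, ("lba", lba))

    def _remap(self, lba):
        if len(self.grown_defects) >= self.spare_sectors:
            raise OSError("grown defects list full; cannot remap LBA %d" % lba)
        spare = ("spare", len(self.grown_defects))
        self.grown_defects[lba] = spare
        self.marginal.discard(lba)
        self.unreadable.discard(lba)
        return spare

    def write(self, lba, value):
        # Writing to a bad sector remaps it first, then uses the spare.
        if lba in self.marginal or lba in self.unreadable:
            self._remap(lba)
        self.data[self._location(lba)] = value

    def read(self, lba):
        if lba in self.unreadable:
            raise OSError("unrecoverable read error at LBA %d" % lba)
        value = self.data.get(self._location(lba))
        if lba in self.marginal:
            # Data was still recoverable: copy it to a spare if one is left.
            try:
                self.data[self._remap(lba)] = value
            except OSError:
                pass  # table full: return the data, but no remap happened
        return value
```

In this model, overwriting an unreadable sector makes the error
"disappear" (as long as a spare is left), and the only outward trace is
that grown_defects has one more entry - which is why watching that list
is worthwhile.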

As was mentioned (or at least hinted at) earlier, if you're able to
do repeated overwrites of various patterns, that's typically best at
testing/exercising a hard drive (particularly also with lots of
random seeks included - I've had drives that read (and wrote)
perfectly fine end-to-end, but failed miserably under random seek
conditions) ... but "most of the time" (at least more often than
not), reading end-to-end (even in purely sequential manner) will
typically pick up problems a drive is having. Also, due to all the
automagic remapping stuff, overwrite tests can quickly and
effectively "hide" a problem (or make it go away, when successfully
remapped), and can make it less clear that there at least *was* a
problem ... hence I generally recommend at least doing full reads,
before trying overwrites (at least if one wants to check/confirm if
the drive is or has been having a problem).
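
A read-only random seek test along the lines mentioned above could be
sketched like this. The function name random_seek_reads and its
parameters are hypothetical, chosen for the example; it only reads, so
it won't trigger (or hide anything behind) a remap-on-write. One caveat
I'd assume applies: repeated reads may be served from the OS cache
rather than the disk, so a test on a device much larger than RAM, or
with caches dropped first, exercises the drive more honestly.

```python
import os
import random

def random_seek_reads(path, reads=1000, block_size=4096, seed=None):
    """Read `reads` blocks at random block-aligned offsets of `path`.

    Hypothetical helper for illustration. Returns the list of offsets
    whose reads raised an error (empty if every read succeeded).
    """
    rng = random.Random(seed)  # a seed makes a failing run repeatable
    failed = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        blocks = size // block_size
        if blocks == 0:
            return failed  # too small to hold even one full block
        for _ in range(reads):
            offset = rng.randrange(blocks) * block_size
            try:
                os.pread(fd, block_size, offset)
            except OSError:
                failed.append(offset)
    finally:
        os.close(fd)
    return failed
```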

Also, not sure about the latest protocols, standards, and tools, but
as far as I'm aware, the "grown defects list" can be inspected (e.g.
via software and SCSI protocols) on SCSI disks, but I don't think
such capability exists for ATA/IDE drives (but perhaps that's
changed?). Precise answers on that may also vary depending on OS
flavor and available software. You mentioned Ubuntu (which is Debian
based). Debian has tools for getting detailed information from SCSI
devices - including the "grown defects" list, so I'd think it
probable Ubuntu includes or makes available the same, or a similar, tool.
I'm not as sure about ATA/IDE, but perhaps you or someone else will
provide us with more information (and any applicable corrections)
regarding such.
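
As one possible way to watch the grown defects list from a script, here
is a small sketch that pulls the count out of a tool's text report. It
assumes the report contains a line of the form "Elements in grown defect
list: N", which is my understanding of what smartmontools' smartctl
prints for SCSI disks (e.g. from "smartctl -a" on the device); that line
format is an assumption you should check against the output on your own
system, and the function name grown_defect_count is made up for the
example.

```python
import re

def grown_defect_count(report_text):
    """Return the grown defect count found in `report_text`, or None.

    Assumes a line like 'Elements in grown defect list: 3'; the exact
    wording is an assumption about the reporting tool's output.
    """
    match = re.search(r"Elements in grown defect list:\s*(\d+)", report_text)
    return int(match.group(1)) if match else None
```

Logging that number periodically and comparing it with the previous
value would give the "is it growing, and how fast" signal described
earlier.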
