Re: RFC: Project geom-events



Lev Serebryakov wrote:
Hello, Miroslav.
You wrote on 5 October 2011, 1:27:03:

I am still missing one thing - a dropped provider is not marked as a failed
RAID provider and is accessible to anything, like a normal disk device. So
in some edge cases, the system can boot from the failed RAID component
instead of the degraded RAID. This can cause data loss or damage.
Which RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid?
Something else?

I am mostly using geom_mirror.

If a GEOM class drops an underlying provider due to errors,
it doesn't have a chance to update the metadata on it.

I understand this, but if there is (stale) metadata on the provider, the system can read this metadata and should disallow normal operations (for example, propagating slices, partitions and labels).

But most classes, if the dropped provider is attached again, will
rebuild themselves, as they track which components are current and which
ones are old.

I have often seen a provider (for example ada1) dropped because of some DMA timeout (bad cables and so on); sometimes the provider (disk ada1) was detached from the ATA channel and reattached after a reboot. In both cases, the provider has stale metadata and is marked as "broken" by geom_mirror, and the automatic rebuild did not start.

In this case, I see gm0 with all of its slices, partitions and labels, and ada1 with the same slices, partitions and labels - this is the problem, because there are two devices providing the same labels and the winner is the first one tasted... even though the system (geom_mirror) knows that ada1 is a "broken disk".

I think that GEOM should be more robust in this case and, if such metadata is found, should not publish slices, partitions, labels and so on...
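To make the idea concrete, here is a toy model of the behaviour I am asking for. It is only a sketch in Python, not GEOM code: the Provider class, the stale_mirror_metadata flag and the publish() function are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical stand-in for a disk or mirror provider."""
    name: str
    labels: tuple = ()
    # Assumption: the system has already read the on-disk metadata and
    # knows this component was dropped from its mirror ("broken").
    stale_mirror_metadata: bool = False


def publish(providers):
    """Decide which provider each label is published from.

    Providers are visited in tasting order. A provider carrying stale
    mirror metadata is hidden entirely; otherwise the first provider
    tasted wins each label.
    """
    published = {}
    hidden = []
    for p in providers:
        if p.stale_mirror_metadata:
            hidden.append(p.name)
            continue
        for label in p.labels:
            published.setdefault(label, p.name)
    return published, hidden


# ada1 is tasted first, but it is a dropped component, so its labels
# must not shadow the ones on the degraded mirror.
providers = [
    Provider("ada1", ("rootfs", "swap"), stale_mirror_metadata=True),
    Provider("mirror/gm0", ("rootfs", "swap")),
]
published, hidden = publish(providers)
print(published)
print(hidden)
```

Without the stale-metadata check, ada1 would win both labels simply because it was tasted first, which is exactly the situation described above.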

Do you want GEOM classes to track dropped components somewhere else
and not even try to attach them automatically when they re-appear?

If a disk is removed and reinserted and synchronisation starts, then everything is OK. But a situation where a component is marked as "broken" and the system and user can still operate on it like a normal "good and clean" drive is wrong.

The drive's content should be inaccessible until the operator takes some action (for example, gmirror clear on the broken disk device).

Miroslav Lachman
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"


