Re: DiskSuite - puzzling issue

From: Juhan Leemet (juhan_at_logicognosis.com)
Date: 07/23/04


Date: Thu, 22 Jul 2004 22:21:45 -0200

On Thu, 22 Jul 2004 12:52:49 +0000, Scott Howard wrote:
> Juhan Leemet <juhan@logicognosis.com> wrote:
>> I believe the reason the system does not load the metadbs is that you do
>> not have more than 1/2 the number of valid metadatabase replicas that SVM
>> prefers. I'll bet your /etc/system does not have the line:
>>
>> set md:mirrored_root_flag=1
>
> Please don't use this flag. Please?

Um, OK... I'll take it "under advisement". I have not read any explanation
for why this should never be used, considering that it is mentioned in
some Sun document(s). I have always been careful to qualify any mention of
this flag by saying "some claim that this is dangerous" (or some such).

I would like to understand how (under what circumstances) SVM/SDS "gets
confused" and trashes the mirror by replicating backwards. I have not read
anywhere any clear description of how/why that would occur, your post
notwithstanding. I would really like to understand this issue.

> To take a step back...
> A _correctly configured_ SDS/SVM, even running on only two disks, will
> _never_ drop to the OK prompt, crash, reboot or anything else when a disk
> fails. If it does, then either your system isn't correctly configured,
> or you've found a bug.

Yes, I agree, and I've said that in past posts. I have corrected some
others who have suggested not mirroring swap (which would cause a crash).

> The only time that the md:mirrored_root_flag comes into play is during
> boot, where it will allow the system to boot with exactly 50% or more of
> metadb replicas available, instead of the normal >50% of replicas
> available.

Yes, I agree. No argument from me.
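
To make the boot-time rule concrete, here is a minimal Python sketch of the quorum check as it is described in the quoted paragraph above. It is not SVM's actual code; the function name can_boot and its parameters are made up for illustration.

```python
def can_boot(available: int, total: int, mirrored_root_flag: bool = False) -> bool:
    """Model of the boot-time replica check as described in this thread.

    available: number of valid metadb replicas found at boot
    total: number of replicas configured
    mirrored_root_flag: models 'set md:mirrored_root_flag=1' in /etc/system
    """
    if total <= 0:
        return False
    if mirrored_root_flag:
        # With the flag: exactly 50% or more is enough.
        return 2 * available >= total
    # Normal rule: strictly more than 50% is required.
    return 2 * available > total


# Two disks with equal replica counts, one disk lost: exactly 50% remain.
print(can_boot(2, 4))                           # normal rule
print(can_boot(2, 4, mirrored_root_flag=True))  # with the flag set
```

The only input where the two rules differ is the exact 50% case, which is why the flag matters mostly on two-disk machines.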

> There are generally only 3 situations where a machine will be rebooting with
> exactly 50% of its replicas available:
> * After a disk replacement in a system where disks are not hot-swap, and
> the replicas on the disk being replaced were not removed before the machine
> was shut down. This is a process issue - the admins need to be educated
> to remove the metadb replicas before shutting the machine down. The fix
> in this situation is trivial (metadb -d and reboot) and relatively quick
> (especially given that machines with non-hotswap disks are generally fairly
> quick to post).

OK. Doesn't apply to me (SCA disks), but I'll grant that.

BTW, what happens if you (make a mistake) and just replace the disks,
without deleting the metadb on them? There should not be any valid metadb
on the replacement disk. I would hope that SVM/SDS can detect that?!? The
metadb man page says "Each copy, referred to as a replica, is subject to
strict consistency checking to ensure correctness." Are you saying that
does not work? Do you have documented cases/incidents where it didn't?

> * When rebooting a machine where the admins have not noticed a failed
> disk. If you've got a failed disk and you haven't noticed, then the
> machine failing to reboot so that you notice is probably a good thing!
> Certainly it's far better than waiting for the 2nd disk to fail to find
> out. Again the fix is trivial, and fairly quick.

Unless you're not there, and the machine won't boot, and other machines
then get stuck for whatever reason (NFS files not accessible?).

> * If a disk fails during a reboot (or power cycle, or relocation, etc).
> Again as the machine is already down the overhead of having to fix this
> is minimal, and being notified of this failure fairly early in the boot
> process is not a bad thing in my mind.

Unlikely, but I would want the system to boot, if possible.

> The only situation where this really becomes a problem in terms of outage
> time is if you've got a failed disk and a machine reboots unexpectedly
> (panic, power outage, failfast). There's a really simple solution to this
> one too - monitor your systems! Running metadb/metastat from cron every
> 10 minutes and looking for errors is simple to set up, not to mention the
> dozens of scripts out there to do it for you - not the least the one in
> the SDS manuals!

OK, this is the situation that I would like to avoid. I have my servers
running off in the corner. I do keep an eye on them, but I don't monitor
them every second. I have other things to do. I would like to set up some
monitoring (every 10 minutes? sounds excessive? you don't expect 2 disk
failures within anything close to a 10 minute interval, do you?). I was
thinking more of a few times a day, and sending e-mail and/or SMS message
to my cell phone. I might not be there, leaving the servers to run,
sort of "lights out" but without the fancy/expensive gear. For my use, I
would expect a replacement response "within 1 day" to be acceptable. No?

I'm not recommending that people be foolish with their crucial gear, which
has contractual or severe availability implications. If you have a
big/important system, definitely use 3 way root mirrors. Problem solved.
If you're controlling a nuclear reactor (Sun specifically exclude that
usage in their license text!), go for full clustering on other gear. No
sense in being stupid, jeopardizing your systems, your job and career.

Now, my situation is probably different from yours. I don't have 1000
high-paying customers and a contractual obligation for 99.99% availability
(does anyone actually reach those numbers?!?). Not everyone does.
Maybe the qualification is: don't use X when Y, where we specify Y. I
haven't seen anything in your post to discourage me from using that flag.
I haven't seen the specific condition for which this is discouraged, just
your blanket preference/recommendation, which has validity. However, it
does not do anything to explain why (can't help it, I'm an EE).

In other discussions, we have agreed that it is better to have more
metadbs scattered across more (than 2) disks, in which case the question
is moot. One could even argue that for a small machine, with only 2 disks
(so I cannot put more metadbs anywhere) there is little likelihood of any
catastrophic contractual failure either. I'm not going to be supporting
1000 users with any Ultra2 with 2 x 9GB, am I? I was initially concerned
about a small Ultra2 with a 711: wondered if I would ever want to boot the
Ultra2 alone, without the 711, and I didn't want to have problems. Now I'm
thinking that I'll put more replicas on the 2 system disks, and fewer
replicas in the 711. Then the Ultra2 will boot without the 711, or the
Ultra2 (without a mirror) will boot with the 711. The only thing that
won't boot is the Ultra2 with a failed mirror and without the 711. I will
use quorum and will not be needing the flag, when I reconfigure like that.
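
The arithmetic behind that layout can be checked with a short Python sketch. The replica counts (3 on each system disk, 2 in the 711) are hypothetical numbers chosen only to satisfy "more on the system disks, fewer in the 711"; has_quorum is an illustrative helper, not an SVM interface, and it applies the normal strictly-more-than-50% rule without the flag.

```python
# Hypothetical replica counts per location; not taken from a real configuration.
REPLICAS = {"disk0": 3, "disk1": 3, "711": 2}


def has_quorum(present, replicas=REPLICAS):
    """True if the locations in 'present' hold strictly more than half of all replicas."""
    total = sum(replicas.values())
    available = sum(replicas[name] for name in present)
    return 2 * available > total


scenarios = {
    "Ultra2 alone, both system disks": {"disk0", "disk1"},
    "Ultra2 with one system disk plus the 711": {"disk0", "711"},
    "Ultra2 with one system disk, no 711": {"disk0"},
}

for label, present in scenarios.items():
    print(f"{label}: quorum={has_quorum(present)}")
```

With these counts the first two cases hold 6 of 8 and 5 of 8 replicas, so both have quorum, and the last holds only 3 of 8, which matches the one configuration that is expected not to boot.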

If I had a small machine tucked away somewhere doing monitoring or process
control, I would probably use the flag with root mirrors too, unless
someone can show how the system might boot "insane and dangerous!"?

If you're never supposed to use this flag, then why did Sun put it into
Solaris? Why did they document it? Where were the retractions? Reasons?

-- 
Juhan Leemet
Logicognosis, Inc.

