Re: Solaris 10 + SVM = What a Piece of Sh#t !!!

From: xstian (cg2131_at_columbia-dot-edu.no-spam.invalid)
Date: 10/11/05


Date: 11 Oct 2005 00:02:32 GMT


> UNIX admin wrote:
>> Roger P. Johnson wrote:
>> Blew a week testing SVM because of Bug ID 57779 -- disable logging on
>> the root file system when mirroring the root file system on a 2 disk
>> system. OK fine and dandy. So much for resilience. I make my changes.
>>
>> Once I have a fully mirrored system, I now shut down, remove either
>> disk, and boot off the other mirror and I get another kernel panic and
>> the endless cycling. WTF? Another bug? So now I can't even make a copy
>> of my system and ditto them into my other AC200's as a cookie cutter
>> approach? KMA!
>
> You don't understand how SVM works, you have not read the
> documentation, and even if you have, you have not comprehended it.
>
> It takes exactly five (5) minutes or less to *correctly* configure
> mirroring on Solaris. Solaris SVM is the easiest ever volume manager to
> use. BUT YOU HAVE TO UNDERSTAND WHAT YOU ARE DOING. And you don't.
> That's the problem.
>
> Had you read the documentation and EXAMPLES at http://docs.sun.com/
> you would have known how to do it properly.
>
> And BTW, the kernel panicked because you failed to maintain a quorum.
>
> There weren't enough meta database replicas left after you took the
> disk out and REBOOTED BEFORE INTENTIONALLY DESTROYING replicas that
> have now failed.
>
> In other words, you didn't know what you were doing. Call us back when
> you learn Solaris.

This is a really lame and completely useless response, and if the
truth be known, it is totally unfair. There is a known bug with
Solaris 10 and mirrored root filesystems that prevents it from
booting off a second submirror in the event that the first submirror
dies or becomes inaccessible (my example assumes you have two
submirrors in your mirrored root filesystem). Sun has issued a
work-around but no patch yet for this issue.

I have just spent a full work day trying to get an E420R with 2x18GB
disks, running Solaris 10 with all latest patches, with mirrored root
and swap filesystems, to boot from the second root submirror once the
first has been physically removed from the system. My mirrors were
set up and sync'ed correctly -- I agree with you that there is more
than adequate documentation available online on how to mirror
filesystems, including the root filesystem. SVM is really great,
and very easy to work with. I agree with you on that -- however, in
our case, we have been bitten by a Solaris 10 bug that may force us
to go back to Solaris 9 until Sun can issue a patch to fix this. As
of this posting (Oct. 10th 2005) no patch exists that addresses this
issue, to my knowledge.

This is a known Bug in Solaris 10:

BugId 6215065

See the following URL for a discussion about this known bug:
http://forum.sun.com/thread.jspa?threadID=24802&tstart=135

Sun has issued a workaround that will let you boot from a submirror in
the event that the primary disk fails, but this involves some
gruntwork (mounting the second disk after booting to a CD or a
network boot, editing vfstab, removing metadbs that refer to the
first, failed or absent submirror, etc.; see the URL above) and is not
a clean solution. According to the sunsolve document which the
posting above references, "A final resolution is pending
completion."
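
For context on the metadb step in that workaround: as I read the Solaris Volume Manager documentation, the system keeps running while at least half of the state database replicas are available, panics when fewer than half are, and will not reboot into multiuser mode without a majority (half + 1). Below is a minimal Python sketch of that rule only. The function name and the returned labels are hypothetical, made up for illustration; this is not an SVM tool or API, and it does not model the Solaris 10 bug discussed here.

```python
def replica_quorum(total, available):
    """Classify replica state under the documented SVM quorum rule.

    Hypothetical helper for illustration; the labels are not SVM output.
    """
    if total <= 0 or not 0 <= available <= total:
        raise ValueError("need total > 0 and 0 <= available <= total")
    if 2 * available < total:
        # Fewer than half of the replicas are available.
        return "panic"
    if 2 * available > total:
        # A majority (half + 1) is available.
        return "ok"
    # Exactly half: the system keeps running, but a reboot into
    # multiuser mode needs a majority.
    return "running-but-cannot-reboot-multiuser"


if __name__ == "__main__":
    # Two-disk box with three replicas per disk, one disk pulled.
    print(replica_quorum(6, 3))
    # Same box after the stale replicas on the missing disk are deleted.
    print(replica_quorum(3, 3))
```

On a two-disk system with replicas split evenly, pulling one disk leaves exactly half, which is why the workaround has you delete the replicas that point at the missing disk before rebooting.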

I sat all day trying to get this to work -- my mirrors were set up
fine but I cannot boot from the second disk once the first has been
physically removed from the system. Instead, I get the same kernel
panic as described above, with identical messages. This is the same
situation that is described in the sunsolve article mentioned in the
URL I included above.

Here are the errors when I try to boot from the second submirror:

WARNING: md: d2: (Unavailable) needs maintenance
WARNING: Error writing ufs log state
WARNING: ufs log for / changed state to Error
WARNING: Please umount(1M) / and run fsck(1M)

panic[cpu0]/thread=fec1be20: mod_hold_stub: Couldn't load stub module
misc/strplumb

This is what is described in the sunsolve article and is clearly a bug
that Sun has yet to resolve. So before you go calling someone stupid,
you should take into account the possibility that there really is a
bug in Solaris 10 that is causing this, despite Sun's stellar record
for producing stable software. I, too, am quite amazed that
something as trivial as this was not tested before Solaris 10 was
pushed out the door.

Our options are to downgrade to Solaris 9 for the time being, or else
wait until a patch is issued that fixes this problem. (You can also
go ahead with 10 now if you are comfortable fixing the submirror by
hand as described in the workaround above -- but this is kludgey at
best -- or else you can hope that your disk lasts until Sun can issue
a patch to fix this...)

I hope this helps -- I may be a jerk, but I'm not stupid. ;)

Christian Gough
EE Systems Analyst
Columbia University
christian.gough AT ee.columbia.edu


