Re: Best way to simulate a failure



On Jan 13, 12:24 am, Cydrome Leader <prese...@xxxxxxxxxxxxxx> wrote:
Mike <n00s...@xxxxxxxxxxx> wrote:
I need to test a client's customized HA setup.  I want to simulate the
failure of a single box running Solaris.  As I understand it, a
poweroff issues a sync command which sends a FIN to all the network
connections.  A real failure wouldn't be so accommodating.  On the
other hand, I don't want to cause a real failure by pulling the power
cord too many times during a test.

Any suggestions?  Does it matter what specific boxes I use?  There are
several setups, each involving different hardware generations.  A
general solution is preferred.

Crashing, corrupting, or unplugging the machine is the most accurate test.

You can pull network cables, but that never screws up the
machine you just removed. Even better, pull drives out. That will generate
some great errors and the machine will still be on the network.

A machine falling off the network doesn't really simulate a half-broken
machine that's still limping along, possibly causing more problems. That's
a real test right there.

Sigh. I agree that the best way to test HA is to actually fail the
hardware. However, I am not allowed to really break the existing
boxes. Management wants a different solution and is pushing for
poweroff. However, as I said, poweroff does a sync which is, to my
mind, way too polite. I am looking for the closest thing to pulling the
power cord that doesn't put the system at real risk of not powering
back up.

--Thank you, Mike Jr.
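
To make the "polite" versus "impolite" distinction concrete, here is a
minimal Python sketch of what the surviving HA peer sees in each case. It
is only an illustration under stated assumptions: classify_peer is a
hypothetical helper, not part of any HA product, and a local socket pair
stands in for the link between the two boxes. A clean shutdown closes the
connection, so the peer reads end-of-stream right away; a pulled power
cord sends nothing, so the peer only finds out when its own timeout
expires.

```python
import socket


def classify_peer(sock, timeout):
    """Wait up to `timeout` seconds and report how the peer looks.

    Hypothetical helper for illustration only. Note that it consumes
    one byte from the stream when the peer is alive.
    """
    sock.settimeout(timeout)
    try:
        data = sock.recv(1)
    except socket.timeout:
        # Nothing arrived at all: what a sudden power loss looks like.
        return "silent"
    except ConnectionResetError:
        # The peer aborted the connection instead of closing it cleanly.
        return "reset"
    # An empty read means the peer closed cleanly (the "polite" case).
    return "alive" if data else "closed"


if __name__ == "__main__":
    # Polite shutdown: one side closes, the other sees end-of-stream.
    a, b = socket.socketpair()
    a.close()
    print(classify_peer(b, 0.5))  # closed
    b.close()

    # Abrupt failure stand-in: the other side simply never speaks.
    c, d = socket.socketpair()
    print(classify_peer(d, 0.5))  # silent
    c.close()
    d.close()
```

The practical point for the test plan: a poweroff only exercises the
"closed" path, while the failover logic that matters is whatever runs on
the "silent" path, so the HA timeouts need to be tested against a peer
that disappears without saying anything.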
