Re: Problem with LPP file / maybe admin style discussion :)

From: Peter Jakobi (lists_at_KEFK.OA.SHUTTLE.DE)
Date: 09/15/05

  • Next message: Hirter Marcel: "TR: Best way to wipe data with AIX ?"
    Date:         Thu, 15 Sep 2005 02:57:51 +0200
    To: aix-l@Princeton.EDU
    
    

    JOSEPH KREMBLAS wrote:

    > [corrected formulation of query by Joseph I believe]
    > BACKGROUND. I corrupted my LPP files by copying over some LPP files on an
    > AIX 5.3 system to an AIX 5.2 system and installing the AIX 5.3 LPP files on
    > the AIX 5.2 system.
    >
    >
    >
    > QUESTION: How do I fix my "screw up"?
    >
    > As for the phrase "without rebuilding,"--I feel you mean to say "rebuilding
    > the ODM that I corrupted, as I don't have a backup."
    >
    > ANSWER: If any of the above is true, I ask: Can you say, "Gosh darn,
    > re-install AIX from scratch?"

    Hmm. My AIX knowledge is still at 4.3.3 and a bit rusty. But the
    important command names and the overall structure shouldn't have
    changed too much.

    Investigating how installed rpms and lpps are handled sounds like
    interesting and useful research to improve one's skills :).

    So a quick, "improper" hack might be:

    If the overwritten files are ordinary installed files (not files of
    the ODM database itself), this shouldn't mess up the whole system that
    badly. Finding the corresponding lpps, removing them (possibly deleting
    the files in question first?), and then reinstalling them should do
    the trick.

    [Sorry, I cannot find any note on this in my files; associating
    files with lpps was trivial, either via lslpp or the ODM. Failing
    that, you need lslpp, and for the lpps you look into the root dir of
    all your installation media containing the corrupted lpps.
    AFAIR, there were some files explicitly listing dependencies
    and possibly also the files contained. Sorry, I never had time
    to look properly into building lpps. And given rpm in 5.x, I probably
    never will.]

    [Hmm. I think I saw some mention of checking the consistency of
    installed lpps and fixes. Have a look at smitty or its successor,
    lslpp, instfix, ...]

    [Oh, also check whether there are patches dealing with the files
    you killed. Back them out and reinstall them as necessary.]
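
    [A rough sketch of the remove-and-reinstall idea above, written as
    a dry run: it only builds the list of commands to review and runs
    nothing. Everything here is an assumption on my part, not a tested
    procedure: the function name, the example fileset name, the media
    device /dev/cd0, and the flags as I recall them (lslpp -w to find
    the fileset owning a file, lppchk -c and -v for checksum and
    consistency checks, installp -u to remove, installp -a -X -d to
    install from media). Check the man pages for your release before
    running any of it.]

```python
from collections import defaultdict


def build_repair_plan(file_to_fileset, damaged_files, media="/dev/cd0"):
    """Return a list of command lines (argv lists) to review by hand.

    file_to_fileset: dict mapping a file path to the fileset owning it,
                     e.g. collected beforehand with `lslpp -w <file>`.
    damaged_files:   paths that were overwritten.
    media:           installation device or directory (hypothetical default).
    Nothing is executed here; this is a dry run only.
    """
    by_fileset = defaultdict(list)
    unknown = []
    for path in damaged_files:
        fileset = file_to_fileset.get(path)
        if fileset is None:
            unknown.append(path)
        else:
            by_fileset[fileset].append(path)

    plan = []
    # Files with no known owner: look up the owning fileset first.
    for path in unknown:
        plan.append(["lslpp", "-w", path])
    for fileset in sorted(by_fileset):
        # Checksum check to confirm the fileset really is damaged.
        plan.append(["lppchk", "-c", fileset])
        # Remove, then reinstall from the original media.
        plan.append(["installp", "-u", fileset])
        plan.append(["installp", "-a", "-X", "-d", media, fileset])
    # Final consistency check of the installed software database.
    plan.append(["lppchk", "-v"])
    return plan
```

    [Print each entry with " ".join(cmd) and read the result before
    typing anything. Base filesets the running system depends on cannot
    simply be removed, so expect to adjust the plan by hand.]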

    Setting up a second, near-identical 5.2 system and copying the
    files from there (and deleting any additional ones...) should also do.
    You already have one, as a test system, don't you?

    Of course the sane way would be to move the user data to a safe place
    (read: another system, *two* backup sets, ...), and restore
    the system itself from the most recent backup. Better yet, if the
    user data on the backup is current, just restore the whole backup.
    Given a short-term emergency maintenance window, that is!

    If the system is already quite dead, it should still be possible to
    use tricks with alternate rootvgs or maintenance mode to modify the
    filesystem directly.

    That may be even slower than rebuilding the system following proper
    procedure. And in spite of the above, you may still need to do that,
    too.

    But the best tip is still the one by Douglas Adams: don't panic. Don't
    allow others to put you under stress and force you to risk enlarging
    the problem (willingly or through stress).

    Note:

    Senior admin high wizardry AKA quick hacks that CAN BE UNDONE
    quickly (if *any*thing worth saving still exists)
    are IMHO valid attempts to deal with the most likely suspects
    to reduce ->un<-scheduled downtime.

    More staid admins are known to burst into flames when confronted
    with such an idea. And have good arguments for doing so. (1)

    For quick hacks you should have a VERY GOOD overview of
    YOUR system, ALL of its history and ALL the consequences
    of your quick hack to the system AND HOW TO UNDO it in case of hack
    failure. And as you thought about undoing your quick hack
    (you DID!?), you can still follow the proper and slow procedures
    when the hack fails without having lost too much additional time.

    Never allow people to rely on you being able to do such tricks.
    They're a (rare) bonus, and beyond the call of duty. If they
    keep expecting it, refuse quick hacks for a while - you know
    the system, they don't. When considering quick hacks, you've
    already lost at least one set of 'backups' (the live one, or
    whatever) - moving things around now increases risk a bit, say,
    when a 2nd failure occurs. Which is a sufficient argument if you need
    to argue for following slow proper procedures. See (1).

    If you're uncertain about your idea and undoing it, do NOT try it.
    Test it later on a test system in your spare time. You do have
    test systems of similar and suitable hardware, don't you?

    And if you willingly compromise the system state/ODM/...
    (or allow it to remain *possibly* compromised),
    you should still rebuild or restore the system to a known
    consistent state. As quickly as possible, in the next
    ->scheduled<- maintenance window :).

    Oh, and check why things happened in the first place. Quick
    hacks SHOULD NOT be necessary at all. You still need to
    improve structures and procedures to avoid repetitions.
    A slew of all-nighters every week shows immaturity of
    organization. And proves an unwillingness to improve.
    Consider switching jobs if you cannot educate people.

    Note to self: store as personal admin style manifesto
    for future use:).

    Well, sometimes there's still a chance to be
    paid for learning, playing with nice hardware
    and interesting, non-repetitive, non-trivial
    user errors :),
    Peter

    PS: concerning unix sysadmin work, have a look at sage.org,
    sysadmin (samag.com) and aix update.

