Re: Problem with LPP file / maybe admin style discussion :)
From: Peter Jakobi (lists_at_KEFK.OA.SHUTTLE.DE)
Date: 09/15/05
- Previous message: Robert Binkley: "Re: Problem with LPP file"
- In reply to: JOSEPH KREMBLAS: "Re: Problem with LPP file"
Date: Thu, 15 Sep 2005 02:57:51 +0200
To: aix-l@Princeton.EDU
JOSEPH KREMBLAS wrote:
> [corrected formulation of query by Joseph I believe]
> BACKGROUND. I corrupted my LPP files by copying over some LPP files on an
> AIX 5.3 system to an AIX 5.2 system and installing the AIX 5.3 LPP files on
> the AIX 5.2 system.
>
> QUESTION: How do I fix my "screw up"?
>
> As for the phrase "without rebuilding,"--I feel you mean to say "rebuilding
> the ODM that I corrupted, as I don't have a backup."
>
> ANSWER: If any of the above is true, I ask: Can you say, "Gosh darn,
> re-install AIX from scratch?"

Hmm. My AIX is still at 4.3.3, and I'm a bit rusty. But the important
command names and overall structure shouldn't have changed too much.
Investigating how installed RPMs and LPPs are handled sounds like
interesting and useful research for improving one's skills :).

So a quick, "improper" hack might be:
If the overwritten files are ordinary installed files (not files of
the ODM database itself), the damage to the whole system shouldn't be
that bad. Find the LPPs those files belong to, remove those LPPs
(possibly deleting the files in question first), and then reinstall
them; that should do the trick.

[Sorry, I cannot find any note on this in my files; mapping files
to LPPs was either trivial via lslpp or via the ODM. Failing that,
use lslpp to list the installed LPPs, and look into the root
directory of every installation medium containing the corrupted LPPs.
AFAIR, there were some files explicitly listing dependencies
and possibly also the files contained. Sorry, I never had time
to look properly into building LPPs. And given RPM in 5.x, I probably
never will.]
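
To illustrate the lslpp route: on AIX, `lslpp -w <path>` reports which fileset owns a given file. The Python sketch below collects the owning filesets for a list of overwritten files, giving you the set of LPPs to remove and reinstall. It is a sketch under assumptions, not a tested tool: the parser assumes the columnar `lslpp -w` output (header lines, then one row per file with path, fileset and type), and the helper names `parse_lslpp_w` and `filesets_for` are hypothetical.

```python
import subprocess
from typing import Dict, Iterable, Set


def parse_lslpp_w(output: str) -> Dict[str, str]:
    """Map file path -> owning fileset from `lslpp -w` text output.

    ASSUMPTION: rows look like "<path> <fileset> <type>"; header and
    separator lines do not start with '/' and are therefore skipped.
    """
    owners: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("/"):
            owners[fields[0]] = fields[1]
    return owners


def filesets_for(paths: Iterable[str]) -> Set[str]:
    """Run `lslpp -w` once per path (AIX only) and collect owning filesets."""
    filesets: Set[str] = set()
    for path in paths:
        # check=False: a path lslpp does not know simply yields no rows.
        result = subprocess.run(
            ["lslpp", "-w", path], capture_output=True, text=True, check=False
        )
        filesets.update(parse_lslpp_w(result.stdout).values())
    return filesets
```

Paths for which the parsed output contains no row are simply left out, which usually hints that they were not installed from an LPP at all. The resulting set is what you would then reinstall (e.g. via smitty) from the matching 5.2 media.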

[Hmm. I think I saw something about checking the consistency of installed
LPPs and fixes. Have a look at smitty (or its successor), lslpp, instfix, ...]

[Oh, and also check whether any patches touch the files
you killed. Back them out or reinstall them as necessary.]

Setting up a second, near-identical 5.2 system and copying the files
over from there (and deleting any extra ones...) should work as well.
You already have one as a test system, don't you?
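
The second-system approach can be sketched as a tree comparison: hash every regular file under a directory on the reference 5.2 machine and on the damaged one, then list what differs. The Python sketch below is an illustration under assumptions, not a tested procedure: it assumes both trees are reachable from one machine (for example via an NFS mount of the reference system), and `digest_tree` and `compare_trees` are hypothetical helper names. Symlinks and special files are skipped.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def digest_tree(root: str) -> Dict[str, str]:
    """Return {path relative to root: SHA-256 hex digest} for regular files."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks, device files, FIFOs and the like.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare_trees(reference: str, target: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (changed, missing_in_target, extra_in_target), each sorted."""
    ref = digest_tree(reference)
    tgt = digest_tree(target)
    changed = sorted(p for p in ref.keys() & tgt.keys() if ref[p] != tgt[p])
    missing = sorted(ref.keys() - tgt.keys())
    extra = sorted(tgt.keys() - ref.keys())
    return changed, missing, extra
```

Treat the result as a review list rather than something to copy blindly: configuration files holding host names, addresses and the like will legitimately differ between two systems.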

Of course, the sane way would be to move user data to a safe place
(read: another system, *two* backup sets, ...) and restore
the system itself from the most recent backup. Better yet, if the
user data on the backup is current, just restore the whole backup.
Provided you get a short-term emergency maintenance window!

If the system is already quite dead, it should still be possible to
use tricks such as an alternate rootvg or maintenance mode to modify
the filesystem directly.
That may even be slower than rebuilding the system by the proper
procedure. And despite all of the above, you may still need to do
that, too.

But the best tip is still Douglas Adams': don't panic. Don't let
others stress you out and push you into risking making the problem
bigger (whether willingly or under pressure).

Note:
Senior admin high wizardry, AKA quick hacks that CAN BE UNDONE
quickly (if *any*thing worth saving still exists), are IMHO valid
attempts to deal with the most likely suspects and reduce
->un<-scheduled downtime.
More staid admins are known to burst into flames when confronted
with such an idea, and they have good arguments for doing so. (1)

For quick hacks you need a VERY GOOD overview of YOUR system,
ALL of its history, ALL the consequences of your quick hack for
the system, AND HOW TO UNDO it if the hack fails. And since you
have thought about undoing your quick hack (you DID!?), you can
still fall back to the proper, slow procedures when the hack fails,
without having lost too much additional time.

Never let people rely on you being able to pull off such tricks.
They're a (rare) bonus, beyond the call of duty. If people keep
expecting them, refuse quick hacks for a while; you know the system,
they don't. By the time you're considering quick hacks, you've
already lost at least one set of 'backups' (the live one, or
whatever), so moving things around now increases the risk a bit,
say, if a second failure occurs. That alone is a sufficient argument
if you need to argue for following the slow, proper procedures.
See (1).

If you're uncertain about your idea or about undoing it, do NOT try
it. Test it later on a test system in your spare time. You do have
test systems with similar, suitable hardware, don't you?

And if you knowingly compromise the system state/ODM/...
(or allow it to remain *possibly* compromised),
you should still rebuild or restore the system to a known,
consistent state, as quickly as possible, in the next
->scheduled<- maintenance window :).

Oh, and find out why things happened in the first place. Quick
hacks SHOULD NOT be necessary at all. You still need to
improve structures and procedures to avoid a repeat.
A slew of all-nighters every week shows an immature
organization, and proves an unwillingness to improve.
Consider switching jobs if you cannot educate people.

Note to self: store as personal admin style manifesto
for future use :).
Well, sometimes there's still a chance to be
paid for learning, playing with nice hardware
and interesting, non-repetitive, non-trivial
user errors :),
Peter

PS: For Unix sysadmin work, have a look at sage.org,
Sys Admin magazine (samag.com) and AIX Update.