Re: How do you manage 1000+ UNIX systems ?
aryzhov_at_spasu.net
Date: 06/27/05
- In reply to: Rodrick Brown: "How do you manage 1000+ UNIX systems ?"
Date: 27 Jun 2005 06:39:38 -0700
IMHO, a large environment cannot survive without
a formal change management system. I.e., if someone wants
to make a change, she has to request it and get it approved
by all affected parties (e.g., if you want to grow
the Oracle filesystem, make sure the DB team signs
off on the change request, even though you could probably
grow the FS on the fly and are 100% sure nothing will
go wrong).
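To make the sign-off rule concrete, here is a minimal Python (3.9+) sketch. The `ChangeRequest` class, the team names `unix` and `dba`, and the host `dbhost01` are all hypothetical; a product like Remedy would keep this state in its own workflow. The idea is just that approval is a set comparison: the change is approved only when no affected party is still missing.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical change request: approved only when every affected party signs off."""
    summary: str
    affected_parties: set[str]
    signoffs: set[str] = field(default_factory=set)

    def sign_off(self, party: str) -> None:
        # Only parties listed as affected can sign; anything else is probably a typo.
        if party not in self.affected_parties:
            raise ValueError(f"{party!r} is not an affected party")
        self.signoffs.add(party)

    def missing_signoffs(self) -> set[str]:
        return self.affected_parties - self.signoffs

    def is_approved(self) -> bool:
        return not self.missing_signoffs()


cr = ChangeRequest("Grow Oracle filesystem on dbhost01", {"unix", "dba"})
cr.sign_off("unix")
print(cr.is_approved(), sorted(cr.missing_signoffs()))  # False ['dba']
cr.sign_off("dba")
print(cr.is_approved())  # True
```

Even the "100% sure" filesystem grow stays blocked until the DBA sign-off arrives.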
Ticketing system. You never make a change
because you feel like it; you only do so when you get a ticket.
Some tickets are generated by monitoring systems, some
by business units, some by your colleagues. Occasionally,
you may open a ticket for yourself.
After all, this shows your boss how busy you are :-)
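The "no ticket, no change" rule can be enforced in whatever wrapper the team uses to push changes. A minimal sketch, assuming a hypothetical `Ticket` record and a hypothetical `perform_change` helper (neither is a real ticketing API); the ticket sources mirror the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TicketSource(Enum):
    # The ticket sources mentioned in the post.
    MONITORING = "monitoring"
    BUSINESS_UNIT = "business unit"
    COLLEAGUE = "colleague"
    SELF = "self"


@dataclass
class Ticket:
    ticket_id: str
    source: TicketSource
    host: str
    is_open: bool = True


def perform_change(ticket: Optional[Ticket], host: str, action: str) -> str:
    """Refuse any change that is not backed by an open ticket for that host."""
    if ticket is None or not ticket.is_open:
        raise PermissionError("no open ticket: change refused")
    if ticket.host != host:
        raise PermissionError(f"ticket {ticket.ticket_id} is for {ticket.host}, not {host}")
    # A real tool would apply the change here; this sketch only describes it.
    return f"{ticket.ticket_id}: {action} on {host}"


t = Ticket("TT-1001", TicketSource.MONITORING, "web07")
print(perform_change(t, "web07", "restart httpd"))  # TT-1001: restart httpd on web07
```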
Some products (Remedy, for instance) combine both
Change Management and Trouble Ticketing.
Knowledge database. As previous posters mentioned,
communication within the admin team must be logged, and the logs
must be searchable. The brightest solution I've seen so far
was a shared, non-personal mail alias that is a member of
the sysadmin mail group. Whenever mail is sent to the group,
a copy is stored in this archived mailbox. Every group member
has read access to the mailbox and can search it by keyword or hostname.
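If the alias delivers into an mbox-format file, searching it by keyword or hostname takes only a few lines with Python's standard `mailbox` module. A sketch, assuming the archive is a single mbox file at a path you supply (the path and the `search_archive` name are hypothetical); it does a case-insensitive substring match on the subject and the plain-text body parts.

```python
import mailbox


def search_archive(mbox_path: str, term: str) -> list[str]:
    """Return the subjects of archived messages whose subject or text mentions term."""
    term = term.lower()
    hits = []
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            subject = str(msg.get("Subject", ""))
            body = ""
            # Walk all parts but only read text/plain ones; attachments are skipped.
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    payload = part.get_payload(decode=True)
                    if payload:
                        charset = part.get_content_charset() or "utf-8"
                        body += payload.decode(charset, "replace")
            if term in subject.lower() or term in body.lower():
                hits.append(subject)
    finally:
        box.close()
    return hits
```

A real archive of years of mail would be better served by a proper index, but a linear scan like this is often good enough for one team's mailbox.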
Sudo logging. Good practice is not only to log all
superuser logins but also to trace all commands run as root
interactively. Policy may forbid direct root logins
and su to root, allowing sudo only, and sudo sessions
can be logged to a central log server. Of course,
there are many ways to intentionally break the policies,
but such violations can also be logged in most cases.
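Once sudo logs reach the central log server, they can be audited with a small script. The sketch below parses lines in the common sudo syslog format (`user : TTY=... ; PWD=... ; USER=... ; COMMAND=...`). The exact layout varies between sudo versions and syslog daemons, so treat the regular expression as an assumption to check against your own logs; `parse_sudo_line` is a hypothetical helper, not part of sudo.

```python
import re
from typing import Optional

# "<Mon> <day> <hh:mm:ss> <host> sudo[pid]: <user> : <fields>"; the [pid] is optional.
SUDO_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\ssudo(?:\[\d+\])?:\s+"
    r"(?P<user>\S+)\s:\s(?P<rest>.*)$"
)


def parse_sudo_line(line: str) -> Optional[dict]:
    """Split one syslog line from sudo into a dict; return None for non-sudo lines."""
    m = SUDO_LINE.match(line)
    if not m:
        return None
    record = {"time": m["ts"], "host": m["host"], "invoked_by": m["user"]}
    # COMMAND= comes last and may itself contain " ; ", so split it off first.
    head, sep, command = m["rest"].partition("COMMAND=")
    record["command"] = command if sep else None
    for field in head.split(" ; "):
        key, eq, value = field.strip().partition("=")
        if eq:
            # USER= is the target account (usually root), not the person running sudo.
            record["runas" if key == "USER" else key.lower()] = value
        elif key:
            # Free text such as "user NOT in sudoers" marks a refused attempt.
            record["note"] = key
    return record
```

Records that carry a `note` are the refused attempts worth alerting on.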
"hostinfo" database (can be part of Remedy) must be
carefully maintained. Good things to keep there are
contact person for all apps running on the host, plus
any exotic specifics.
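A minimal hostinfo schema could look like the following sqlite3 sketch. The table and column names, the host, and the contact addresses are invented for illustration; in practice this data would live in Remedy or whatever inventory system the site already uses. The `notes` column holds the exotic specifics, and each application gets its own contact row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE host (
    hostname TEXT PRIMARY KEY,
    notes    TEXT  -- exotic specifics worth knowing before touching the box
);
CREATE TABLE app_contact (
    hostname TEXT NOT NULL REFERENCES host(hostname),
    app      TEXT NOT NULL,
    contact  TEXT NOT NULL,
    PRIMARY KEY (hostname, app)
);
"""


def host_summary(conn: sqlite3.Connection, hostname: str) -> dict:
    """Everything an admin should read before working on hostname."""
    row = conn.execute("SELECT notes FROM host WHERE hostname = ?", (hostname,)).fetchone()
    if row is None:
        raise KeyError(hostname)
    contacts = dict(conn.execute(
        "SELECT app, contact FROM app_contact WHERE hostname = ? ORDER BY app",
        (hostname,),
    ).fetchall())
    return {"hostname": hostname, "notes": row[0], "contacts": contacts}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO host VALUES (?, ?)",
             ("dbhost01", "Oracle on /u02; no reboots during month-end close"))
conn.executemany("INSERT INTO app_contact VALUES (?, ?, ?)", [
    ("dbhost01", "oracle", "dba-team@example.com"),
    ("dbhost01", "backup", "storage-team@example.com"),
])
print(host_summary(conn, "dbhost01")["contacts"])
```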
Every trouble ticket or change request must
have a checkbox indicating whether the ticket/change requires
a manual update of the host's hostinfo entry.
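One way to keep the checkbox from becoming decoration is to refuse to close a ticket while its hostinfo update is still outstanding. That enforcement is a design choice of this sketch, not something the post prescribes; `TicketRecord`, `close_ticket` and `pending_hostinfo_updates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TicketRecord:
    ticket_id: str
    hostname: str
    needs_hostinfo_update: bool  # the checkbox on the ticket/change form
    hostinfo_updated: bool = False
    closed: bool = False


def close_ticket(ticket: TicketRecord) -> None:
    # Block closing until the hostinfo entry has actually been updated.
    if ticket.needs_hostinfo_update and not ticket.hostinfo_updated:
        raise RuntimeError(f"{ticket.ticket_id}: update hostinfo for {ticket.hostname} first")
    ticket.closed = True


def pending_hostinfo_updates(tickets: list[TicketRecord]) -> list[str]:
    """IDs of tickets whose checkbox is ticked but whose hostinfo entry is still stale."""
    return [t.ticket_id for t in tickets if t.needs_hostinfo_update and not t.hostinfo_updated]
```

Running `pending_hostinfo_updates` as a daily report gives the team a list of hostinfo entries that are known to be out of date.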
Some change requests on live hosts may also require
Jumpstart updates (especially when Jumpstart is used
as an emergency restore mechanism), so the people responsible
for host staging must be on the sign-off list.
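The sign-off list can be derived from the change itself rather than typed in by hand. A hypothetical sketch: the team names and the `needs_jumpstart_update` flag are assumptions, and a real site would look up application owners in its hostinfo database.

```python
from dataclasses import dataclass

# Hypothetical team names; a real site would take app owners from its hostinfo database.
BASE_APPROVERS = frozenset({"unix-admins"})
STAGING_TEAM = "host-staging"


@dataclass
class ChangeScope:
    hostname: str
    # True when the change must also go into the host's Jumpstart profile or
    # finish scripts, so that an emergency rebuild does not silently undo it.
    needs_jumpstart_update: bool
    app_owners: frozenset = frozenset()


def required_signoffs(scope: ChangeScope) -> set[str]:
    """Everyone who must approve this change before it is scheduled."""
    approvers = set(BASE_APPROVERS) | set(scope.app_owners)
    if scope.needs_jumpstart_update:
        approvers.add(STAGING_TEAM)
    return approvers


print(sorted(required_signoffs(ChangeScope("dbhost01", True, frozenset({"dba"})))))
# ['dba', 'host-staging', 'unix-admins']
```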
regards,
Andrei