Re: Patch your damn computers already!!!
From: Bill Vermillion (bv_at_wjv.com)
Date: 11/10/05
- Next message: Philip Paeps: "Re: Help with auth smtp"
- Previous message: Chronos: "Re: Help with auth smtp"
- In reply to: Matthew X. Economou: "Re: Patch your damn computers already!!!"
- Next in thread: Philip Paeps: "Re: Patch your damn computers already!!!"
- Reply: Philip Paeps: "Re: Patch your damn computers already!!!"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 10 Nov 2005 12:45:00 GMT
In article <u3bm5guwv.fsf@irtnog.org>,
Matthew X. Economou <xenophon+usenet@irtnog.org> wrote:
>>>>>> "Bill" == Bill Vermillion <bv@wjv.com> writes:
>
> Bill> Very few have required making a new kernel - and thus having
> Bill> to reboot the system.
>I usually reboot for all core library updates (e.g., libc, libm,
>libssl, etc.), as well as all kernel patches. I suppose for library
>patches, one could merely drop to single user mode briefly, but that
>requires console access; a remote reboot doesn't.
I don't know how you could NOT reboot on a kernel patch :-)
> Bill> Patch yes. Reboot is not necessary for all things. Up
> Bill> time is critical for the clients.
>Uptime is a poor measure of availability, never mind integrity or
>confidentiality. I would be much more interested in unscheduled
>outages, as maintenance-related outages are a cost of doing business
>and can usually be scheduled during the least inconvenient times.
With upwards of 300,000 emails a day going through the system it's
good to stay up most of the time.
I've had about 6 hours of downtime - MAX - maybe 4 - since we
moved into the current facility in February of 2000 - one of the
first clients in the Level 3 facility locally.
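To put that figure in perspective, here is a rough back-of-the-envelope sketch of what 6 hours of downtime works out to as an availability percentage. The start day is an assumption (the post only says February 2000, so Feb 1 is a placeholder), the end point is the date of this message, and the function name is purely illustrative.

```python
from datetime import datetime


def availability(start, end, downtime_hours):
    """Return (elapsed hours, fraction of that time the service was up)."""
    total_hours = (end - start).total_seconds() / 3600
    return total_hours, 1 - downtime_hours / total_hours


# Assumption: exact move-in day is unknown; Feb 1, 2000 is a placeholder.
start = datetime(2000, 2, 1)
# Date of this message.
end = datetime(2005, 11, 10)

# 6 hours is the "MAX" figure quoted above; 4 is the lower estimate.
for hours_down in (6.0, 4.0):
    total_hours, up = availability(start, end, hours_down)
    print(f"{hours_down:.0f} h down over {total_hours:.0f} h -> {up * 100:.3f}% up")
```

Either estimate lands well above 99.98%, which is why the handful of incidents described next stand out.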
There was about 1/2 hour of intermittent connections - dropped lots
of packets - when L3 had a bad card in a Cisco 12000. We were the
only ones who called - at 5AM - but there were only 6 clients on
the problem card.
There was a 45-minute outage because of an administrative error by
the management.
Then there was a Cisco 7200 - that perhaps got hacked. It stopped
about 2AM. I drove to the facility and restarted it and found no
other problems. About 3AM I got another call - it was down again -
and the client had about 1200 domains spread over 2 systems.
I called my partner, who headed there. When he got there I
went into my backup machine and reconfigured it as a router. My
partner would restart the Cisco when it failed. Then I got the
FreeBSD machine in place. We have a Foundry Engines switch/router in
place now.
As to downtime on servers - I'll build up a new one, and then
switch almost on the fly - and the new server is up and accessible
almost immediately - with only about 2 minutes lost accessibility
during the changeover.
The only time I was really worried about unknown holes was when we
were running SGIs with IRIX 5.x - before the move to FreeBSD in
1996.
> Bill> It's all in your approach, but I disagree with 'reboot
> Bill> often'. When the machines are a minimum of 1/2 hour drive
> Bill> time at 2AM and much longer during daylight hours you reboot
> Bill> when you know you have to, but not un-necessarily.
>I have the luxury of local test hardware, where I can try out the
>remote patch process without worrying about locking myself out of
>systems that are physically distant. I'm pretty anal about testing
>patches prior to installing them, as (especially in the Windows and
>Solaris worlds) I've had patches cause more problems than they solved.
>I would be much, much more careful if I had to do everything remotely.
I always run the RELEASE versions on the servers, but always have
the latest STABLE in the 4.x branch locally.
And as to 'patches cause more problems than they solved' that is SO
TRUE in the MS world. I stay away from MS servers. And though
I've not done much with Solaris I've gotten them working when the
people who installed it [who really shouldn't have done it in the
first place] were trying a Linux approach and totally munged it up.
And patches last week totally made my local XP machine unworkable.
So it was a weekend of reinstall. Thank goodness for Knoppix [I
like it for emergency use] as it was able to mount the NTFS
partition and I fired up FTP and moved everything I didn't have
backed up to CD/DVD to the FreeBSD machine. Too bad MS doesn't
have a 'single-user' mode concept so you can go in and fix/repair
things that are repairable. One directory was not readable, and I
wondered what would happen if MS had an 'fsck' type utility -
whether it would be recoverable. The more I work with MS the more
I dislike it. DOS 2.0 drove me to Unix in 1983 and I've never
looked back [ you never know what's behind you :-) ].
On the Solaris machine just reading the error messages and going
through one step at a time I got it running. I guess I need to get
a new release of Solaris to take a peek at it.
These are the same people who had a tunnel for administering their
machines in our racks. We installed the SW/HW on both sides.
Things looked OK - but the far city said things weren't working.
I went in and saw packets come in, go to their machines [two Sun
Pizza box front ends for apache, two Apple G4s for Web Objects,
and a multi-cpu Sun for Oracle] and I'd see the packets leave.
But they said it was broken. So we got a 3-way call with the client,
GTA [hw firewall vendor] and me. I then saw where the packets
were stopping and asked the client who has IP xx.xx.xx.xx - because
that's where it's stopping.
They said "I don't know, it's not ours".
So some more sniffing and using RECORD ROUTE I found that it
WAS their IP. It was the serial port coming into their local
Cisco 2500. The admin - who was new at the job and was primarily a
Web Object programmer and Linux admin - had blocked everything in
sight without understanding what he was doing.
You surely learn a lot troubleshooting things for remote users :-)
The worst case was two days of getting a cross-country link up -
setting up IPs remotely that a local user had so that he could just
put his machines up without reprogramming.
Wound up talking to people in 4 NOCs through 4 different transport
providers - and if I had not had serial ports configured with IPs
on the far routers, I would have never found that PacBell had
missed putting a return route in so packets would go Florida, to
California, to Washington DC, and on the way back they'd stop in
California.
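The missing return route can be illustrated with a toy next-hop table. This is only a sketch: the three site names come from the story above, but the table layout and the `trace` helper are hypothetical and bear no relation to the actual router configurations involved.

```python
def trace(routes, src, dst, max_hops=8):
    """Follow next-hop entries from src toward dst and return the hops visited."""
    path = [src]
    node = src
    while node != dst and len(path) <= max_hops:
        next_hop = routes.get(node, {}).get(dst)
        if next_hop is None:
            break  # no route for this destination: the packet stops here
        path.append(next_hop)
        node = next_hop
    return path


# Hypothetical per-site tables: {site: {destination: next hop}}.
routes = {
    "Florida": {"Washington DC": "California"},
    # California knows how to reach DC, but the entry back to Florida is missing.
    "California": {"Washington DC": "Washington DC"},
    "Washington DC": {"Florida": "California"},
}

outbound = trace(routes, "Florida", "Washington DC")
inbound = trace(routes, "Washington DC", "Florida")

print("outbound:", " -> ".join(outbound))
print("inbound: ", " -> ".join(inbound))
```

The outbound trace reaches Washington DC, while the inbound trace ends in California. From the sending side this just looks like silence, which is why being able to probe addresses on the far routers mattered.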
I surely DO NOT miss the days when there were so few transport
providers. Days like that were pure hell.
Bill
-- Bill Vermillion - bv @ wjv . com