Re: Patch your damn computers already!!!

From: Bill Vermillion (bv_at_wjv.com)
Date: 11/10/05


Date: Thu, 10 Nov 2005 12:45:00 GMT

In article <u3bm5guwv.fsf@irtnog.org>,
Matthew X. Economou <xenophon+usenet@irtnog.org> wrote:
>>>>>> "Bill" == Bill Vermillion <bv@wjv.com> writes:
>
> Bill> Very few have required making a new kernel - and thus having
> Bill> to reboot the system.

>I usually reboot for all core library updates (e.g., libc, libm,
>libssl, etc.), as well as all kernel patches. I suppose for library
>patches, one could merely drop to single user mode briefly, but that
>requires console access; a remote reboot doesn't.

I don't know how you could NOT reboot on a kernel patch :-)

> Bill> Patch yes. Reboot is not necessary for all things. Up
> Bill> time is critical for the clients.

>Uptime is a poor measure of availability, never mind integrity or
>confidentiality. I would be much more interested in unscheduled
>outages, as maintenance-related outages are a cost of doing business
>and can usually be scheduled during the least inconvenient times.

With upwards of 300,000 emails a day going through the system it's
good to stay up most of the time.

I've had about 6 hours of downtime - MAX - maybe 4 - since we
moved into the current facility in February of 2000 - one of the
first clients in the Level 3 facility locally.
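To put a rough number on that, here is a minimal sketch in Python. It
assumes the move happened on 1 February 2000 (only the month is known)
and counts up to the date of this post; the availability() helper is
just an illustration, not a tool in use here.

```python
from datetime import datetime


def availability(start, end, downtime_hours):
    """Percentage of the period from start to end that the service was up."""
    total_hours = (end - start).total_seconds() / 3600
    return 100.0 * (1 - downtime_hours / total_hours)


# Assumption: the exact day of the move is unknown, so the 1st is used.
start = datetime(2000, 2, 1)
end = datetime(2005, 11, 10)  # date of this post

# The two downtime estimates mentioned above: "maybe 4" and "6 hours MAX".
for hours in (4, 6):
    print(f"{hours} h down: {availability(start, end, hours):.4f}% available")
```

Even the 6 hour worst case stays above 99.98% over the whole period.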

There was about 1/2 hour of intermittent connections - dropped lots
of packets - when L3 had a bad card in a Cisco 12000. We were the
only one who called - at 5AM - but there were only 6 clients on
the problem card.

There was a 45 minute outage because of an administration error by
the management.

Then there was a Cisco 7200 - that perhaps got hacked. It stopped
about 2AM. I drove to the facility and restarted it and found no
other problems. About 3AM I get another call - it was down again -
and the client had about 1200 domains spread over 2 systems.

I called my partner, who headed there. When he got there I went
into my backup machine and reconfigured it as a router. My
partner would restart the Cisco when it failed. Then I got the
FreeBSD machine in place. We have a Foundry Engines switch/router
in place now.

As to downtime on servers - I'll build up a new one, and then
switch almost on the fly - and the new server is up and accessible
almost immediately - with only about 2 minutes lost accessibility
during the changeover.

The only time I was really worried about unknown holes was when we
were running SGI's with IRIX 5.x - before the move to FreeBSD in
1996.

> Bill> It's all in your approach, but I disagree with 'reboot
> Bill> often'. When the machines are a minimum of 1/2 hour drive
> Bill> time at 2AM and much longer during daylight hours you reboot
> Bill> when you know you have to, but not un-necessarily.

>I have the luxury of local test hardware, where I can try out the
>remote patch process without worrying about locking myself out of
>systems that are physically distant. I'm pretty anal about testing
>patches prior to installing them, as (especially in the Windows and
>Solaris worlds) I've had patches cause more problems than they solved.
>I would be much, much more careful if I had to do everything remotely.

I always run the RELEASE versions on the servers, but always have
the latest STABLE in the 4x locally.

And as to 'patches cause more problems than they solved' that is SO
TRUE in the MS world. I stay away from MS servers. And though
I've not done much with Solaris I've gotten them working when the
people who installed it [who really shouldn't have done it in the
first place] were trying a Linux approach and totally munged it up.

And patches last week totally made my local XP machine unworkable.
So it was a weekend of reinstall. Thank goodness for Knoppix [I
like it for emergency use] as it was able to mount the NTFS
partition and I fired up FTP and moved everything I didn't have
backed up to CD/DVD to the FreeBSD machine. Too bad MS doesn't
have a 'single-user' mode concept so you can go in and fix/repair
things that are repairable. One directory was not readable, and I
wondered what would happen if MS had an 'fsck' type utility -
whether it would be recoverable. The more I work with MS the more
I dislike it. DOS 2.0 drove me to Unix in 1983 and I've never
looked back [ you never know what's behind you :-) ].

On the Solaris machine just reading the error messages and going
through one step at a time I got it running. I guess I need to get
a new release of Solaris to take a peek at it.

These are the same people who had a tunnel for administering their
machines in our racks. We installed the SW/HW on both sides.
Things looked OK - but the far city said things weren't working.

I went in and saw packets come in, go to their machines [two Sun
Pizza box front ends for apache, two Apple G4s for Web Objects,
and a multi-cpu Sun for Oracle] and I'd see the packets leave.

But they said it was broken. So we got a 3-way call with client,
GTA [hw firewall vendor] and client. I then saw where the packets
were stopping and asked the client who has IP xx.xx.xx.xx - because
that's where it's stopping.

They said "I don't know, it's not ours".

So some more sniffing and using RECORD ROUTE I found that it
WAS their IP. It was the serial port coming into their local
Cisco 2500. The admin - who was new at the job and was primarily a
Web Object programmer and Linux admin - had blocked everything in
sight without understanding what he was doing.
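For anyone who hasn't used it: RECORD ROUTE is the IPv4 header option
(type 7 in RFC 791) that ping -R sets. Each router that forwards the
packet writes one of its addresses into the option, which is how an
interface address like that serial port shows up. A minimal sketch of
the option layout in Python follows; the helper names are hypothetical
and the addresses are documentation examples, not the ones from that
call.

```python
import ipaddress

RR_TYPE = 7  # IPv4 Record Route option type (RFC 791)


def build_record_route(slots=9):
    """Return an empty Record Route option with room for `slots` addresses."""
    if not 1 <= slots <= 9:
        raise ValueError("an IPv4 header has room for at most 9 recorded hops")
    length = 3 + 4 * slots
    # Layout: type, length, pointer (starts at 4), then zeroed address slots.
    return bytes([RR_TYPE, length, 4]) + bytes(4 * slots)


def record_hop(option, address):
    """Simulate a router writing its address into the next free slot."""
    buf = bytearray(option)
    length, pointer = buf[1], buf[2]
    if pointer > length:
        return bytes(buf)  # route area full: forwarded unchanged
    buf[pointer - 1:pointer + 3] = ipaddress.IPv4Address(address).packed
    buf[2] = pointer + 4
    return bytes(buf)


def parse_record_route(option):
    """Return the addresses recorded so far, in the order they were added."""
    opt_type, length, pointer = option[0], option[1], option[2]
    if opt_type != RR_TYPE or length != len(option) or pointer < 4:
        raise ValueError("not a well-formed Record Route option")
    used = min(pointer - 1, length)  # bytes 3..used-1 hold recorded addresses
    return [str(ipaddress.IPv4Address(option[i:i + 4]))
            for i in range(3, used, 4)]


opt = build_record_route(slots=3)
for hop in ("192.0.2.1", "198.51.100.7"):
    opt = record_hop(opt, hop)
print(parse_record_route(opt))
```

The last address in the list is the last hop that handled the packet,
so when it belongs to someone's own router and they don't recognise it,
you know where to look.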

You surely learn a lot troubleshooting things for remote users :-)
The worst day was two days of getting a cross-country link up -
setting up IPs remotely that a local user had so that he could just
put his machines up without reprogramming.

Wound up talking to people in 4 NOCs through 4 different transport
providers - and if I had not had serial ports configured with IPs
on the far routers, I would have never found that PacBell had
missed putting a return route in so packets would go Florida, to
California, to Washington DC, and on the way back they'd stop in
California.
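That failure mode, a forward path that works and a return path that
dies at one hop, can be sketched with a toy next-hop table. The site
names come from the story above; the table layout and the trace()
helper are hypothetical and only illustrate the idea, not how any real
router stores routes.

```python
def trace(tables, src, dst, max_hops=16):
    """Follow next-hop entries from src toward dst.

    Returns (path, delivered). The walk stops early when a node has no
    route for the destination, which is what a missing return route
    looks like from the outside.
    """
    path = [src]
    here = src
    while here != dst and len(path) <= max_hops:
        next_hop = tables.get(here, {}).get(dst)
        if next_hop is None:
            return path, False  # packet is dropped at this node
        path.append(next_hop)
        here = next_hop
    return path, here == dst


# Each node maps a destination to its next hop. California knows how to
# reach DC but has no entry for Florida: the missing return route.
tables = {
    "Florida": {"DC": "California"},
    "California": {"DC": "DC"},
    "DC": {"Florida": "California"},
}

print(trace(tables, "Florida", "DC"))
print(trace(tables, "DC", "Florida"))
```

From either end a plain ping just times out. You only see which hop
is at fault if you can address something at each hop along the way,
which is why having IPs on the serial ports mattered.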

I surely DO NOT miss the days when there were so few transport
providers. Days like that were pure hell.

Bill

-- 
Bill Vermillion - bv @ wjv . com

