Re: Weekly reboots of Solaris servers? Insane or not?

From: Peter Bunclark (psb_at_ast.cam.ac.uk)
Date: 03/01/04


Date: Mon, 01 Mar 2004 15:36:04 +0000


aryzhov wrote:

>wasadmin@optonline.net (WAS Admin) wrote in message news:<2b7019bb.0402260806.215950fd@posting.google.com>...
>
>
>>My company uses a number of Solaris 8 servers. These servers are
>>running Oracle, WebSphere, iPlanet, etc... They are scheduled to
>>reboot every Sunday morning (no, I don't know why. Probably to cover
>>up memory leaks).
>>
>>My question is this: How often does your company reboot its Solaris
>>servers? Or are they only rebooted if a problem occurs or for
>>maintenance reasons? If your servers are rebooted often, can you
>>explain why?
>>
>>
>>
>
>I used to work on large/critical sites that don't have
>a periodic reboot policy, and on equally large and critical ones
>that do.
>
>My impression is that having a policy, in general,
>shows a more serious attitude than not having one.
>I'd even put it like "those who reboot, care more".
>
>Let's face it: people prefer not to reboot because it's
>extra work, and no one likes it when his pager starts beeping at 5 a.m.
>on Saturday morning because one of the servers failed to boot in time.
>And, of course, managers don't dream of allocating the budget
>for those exciting hours.
>
>It's always easy to find an excuse for something you don't like
>to do, isn't it? "Reboots don't matter" is one such excuse,
>I believe.
>
>To me, keeping the servers up until something happens
>sounds like a significant risk, and periodic reboots have
>lots of good reasons besides cleaning up the memory leaks.
>Checking for possible boot sequence screwups was already named here;
>another one to consider is NFS or other nasty cross-mounts, or other
>deadlock-style interdependencies between the servers that sometimes
>not only prevent them from booting, but even prevent them from shutting down.
>Setting up persistent backdoors and other hooks requires quite a bit
>more effort from the hacker in "dynamic" networks where the stuff
>reboots every so often, so security, IMHO, wins this way, too.
>That's only what comes to mind easily.
>Not necessarily the most important.
>
>To summarise, reboots *do* help to discover problems
>at a more convenient time, and to keep machines clean.
>Reboot if you care :-)
>
>Regards,
>Andrei
>
>
There seems to be a certain amount of muddled logic here; considering some
of the points:
1) If you've changed the startup scripts, clearly they need testing and
    debugging by rebooting. If you haven't, then there's no need to do so.

2) If you have OS memory leaks, report them to OS support; if you have
    application memory leaks, fix the app. In the interim, before the
    fixes, reboot; but if, like many of us apparently, the OS doesn't
    leak and neither do the apps, then don't reboot for this reason either.

3) If you find you have ``...interdependencies...'', which apparently
    don't show up for a long time, one could argue that it is good to
    keep going till you hit them so then you can identify and debug the
    problem.

4) Ground truth: those of us that run systems for long periods prove (to
    ourselves at least) that they function well in this regime; those of
    you who reboot frequently can only be theorizing.

As it happens, we have some simulation apps that run for several weeks
at a time; weekly reboots are not an option.

Pete.


