Good case for VMS
From: DeanW (dean.woodward_at_gmail.com)
Date: 02/25/05
- Next message: Rob Young: "Re: Sayonara Tukwilla"
- Previous message: Jean-François Piéronne: "Python Webware application server succesfully run on OpenVMS"
- Next in thread: Dave Froble: "Re: Good case for VMS"
- Reply: Dave Froble: "Re: Good case for VMS"
Date: Fri, 25 Feb 2005 09:39:17 -0800
Anyone else notice that the BOINC version of SETI@home is down? I'm
not going to claim that VMS would have prevented all the issues
they're having right now, but it might have helped. I leave it as an
exercise to the reader to determine the cost:benefit ratio of a 2-node
cluster that would have prevented this whole mess.
http://setiweb.ssl.berkeley.edu/tech_news.php
February 24, 2005 - 23:30 UTC
Update on yesterday's outage: We are still dealing with some database
fallout. Most of the classic SETI@home systems are up - enough that we
can serve workunits to users. However, BOINC is dead in the water
until we get at least one database server up and running.
With the master database corrupted beyond repair, we turned all our
attention to the replica. Its disks finished sync'ing last night, and
after some file system checks the machine booted and mysql started
just fine. A battery of tests revealed no corruption... until we got to
the result table. Of course, that's by far the biggest and most
important table in the database. We are attempting to repair it now.
Assuming we can repair it with little or no data loss, we will then
dump all the data from the replica back onto the master. If we're
lucky, this will be done by tomorrow morning and we can start revving
all the engines back up.
Please note that since it was a slower machine than the master, the
data on the replica database server was about 30 minutes behind real
time. We did try to limp both systems along to sync the replica data
up even further but no dice. So, when we do get back on line it will
be as if there was a half-hour hole in time during which all uploaded
results were lost (and any user profile updates, message board
postings, etc.). We sincerely apologize to all our users for this
loss.
Court brought in a UPS from his personal server collection. So the
master database will be protected while we scramble to purchase
another. The database server was unprotected yesterday because it was
in our lab, not in the data closet where all of our UPS's are. We
were/are just weeks away from a data closet reorganization designed to
make room for the DB server.
February 23, 2005 - 23:30 UTC
A sudden, unexpected power outage due to a blown breaker shut the
whole BOINC project down for several hours (along with all the other
projects in the lab). The cause is still unknown (which is scary), so
there will be a scheduled power outage in the near future to hunt for
electrical problems. We do know this: we just can't seem to catch a
break around here.
We were able to gracefully shut down many servers on battery backup
(UPS) before the batteries drained, but not all of them, including the
new BOINC database server. So the data is scrambled, and mysql refuses
to start. Our last backup to tape is a week old. This week's tape
backup was about 60% finished when the power went out (Murphy's law in
a nutshell).
The good news is we have a replica database which should be up to
date. The bad news is that this had disk errors upon booting up and
its drives are still resync'ing. After that, we'll have to check the
table integrity on the replica - if we're lucky and mysql is able to
start, we can then dump the data from the replica back onto the master
and continue right where we left off.
Earlier this morning the project was off for some routine maintenance
(tweaking the BIOS on the database server to get rid of spurious error
messages and snapshotting for database backups). An hour after we
brought everything back up the power went off.