Re: Required reading for HP executives
From: JF Mezei (jfmezei.spamnot_at_istop.com)
Date: 12/02/03
- Next message: John Brandon: "Directory output by date rather than name"
- Previous message: Barry Treahy, Jr.: "Re: Old license paks/software"
- In reply to: Keith Parris: "Re: Required reading for HP executives"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Mon, 01 Dec 2003 20:08:35 -0500
Keith Parris wrote:
> This is why the Space Shuttle uses a mix of different
> independently-developed implementations on different hardware, which
> then vote on the result.
Actually, not quite. There are 5 computers, one of which is on standby with
different software. The other four all run the same software.
(from what I was told) Much of the voting is done literally by brute force.
If there are 3 actuators to move one surface, each is controlled by a
different computer. If one computer goes nuts, the other 2 will still control
2 of the 3 actuators and will be able to overpower the rogue one.
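To make the brute-force idea concrete, here is a toy sketch in Python. It is only an illustration, not anything from shuttle flight software: the function name net_direction and the +1/-1/0 command encoding are my own assumptions, and it assumes all actuators are equally strong.

```python
def net_direction(commands):
    """Direction a surface moves when several equal-strength actuators
    push on it at once.

    commands: one entry per actuator, each +1 (push one way),
              -1 (push the other way) or 0 (idle).
    Returns +1, -1 or 0 for the net motion.
    """
    total = sum(commands)          # forces simply add up on the surface
    return (total > 0) - (total < 0)


# Two healthy computers command +1, the rogue one commands -1:
print(net_direction([1, 1, -1]))
```

No computer ever compares answers with another here. The "vote" falls out of the mechanics, since two actuators outpush one.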
In areas where there is only one item (for instance a pump), the item has
hardware in it to decide which command has the majority when it gets multiple
feeds. In other words, the "voting" is done at the device level, not at the
computer level.
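A device-level voter of this kind could be sketched as follows. This is a minimal Python illustration under my own assumptions (the name vote, the string commands, and the "no strict majority means no action" rule are all hypothetical); the real thing is done in hardware inside the device.

```python
from collections import Counter


def vote(commands):
    """Return the command sent by a strict majority of feeds, or None
    if no command has a majority (assumed behaviour: do nothing)."""
    if not commands:
        return None
    value, count = Counter(commands).most_common(1)[0]
    return value if count > len(commands) / 2 else None


# Three computers feed one pump; one of them disagrees.
print(vote(["ON", "ON", "OFF"]))
```

The point of putting the voter in the device is that the computers do not need to agree among themselves before a command goes out.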
The communications are done on a 1553 bus. This type of bus has a bus
controller, somewhat similar to token passing: it sends a command asking
device X to send data to device Y. It has a list of such commands to send in a
particular order giving each device its time slot to talk to device Y. It
also ensures that after sending a command, the devices respond properly, so
the bus controller can detect failures. It can be reprogrammed with a new list
of who talks to whom at each time slot, allowing device X to talk to Z instead
of Y.
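The scheduling idea described above can be sketched like this. It is a toy Python model of "a list of who talks to whom", not the actual MIL-STD-1553 protocol; the class BusController, its methods, and the convention that an unresponsive device returns None are all assumptions made for illustration.

```python
class BusController:
    """Toy bus controller that walks a fixed list of transfers."""

    def __init__(self, schedule):
        # schedule: ordered list of (source, destination) pairs,
        # one pair per time slot
        self.schedule = list(schedule)

    def run_frame(self, devices):
        """Run every slot once.

        devices: dict mapping a device name to a callable that returns
                 its data, or None if the device does not respond.
        Returns (delivered, failed): delivered is a list of
        (source, destination, data) tuples, failed is a list of
        sources that did not answer in their slot.
        """
        delivered, failed = [], []
        for source, destination in self.schedule:
            data = devices[source]()
            if data is None:
                failed.append(source)      # controller notices the silence
            else:
                delivered.append((source, destination, data))
        return delivered, failed

    def restring(self, new_schedule):
        """Load a new list of who talks to whom."""
        self.schedule = list(new_schedule)


devices = {"X": lambda: 42, "W": lambda: None}
controller = BusController([("X", "Y"), ("W", "Y")])
print(controller.run_frame(devices))

controller.restring([("X", "Z")])          # X now talks to Z instead of Y
print(controller.run_frame(devices))
```

Because only the controller decides who speaks in each slot, changing the list is all it takes to reroute traffic.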
In the event of failure, brute force does the trick initially, until they can
"restring" that bus controller so that a different computer sends the commands
to the devices originally controlled by the failed computer. But restringing
isn't something done at the touch of a button, as I recall.
The 5th computer is there, listening to all conversations, but not sending
any commands out. As I recall, it is activated manually by the
commander in case of failure of all others. It then takes over all tasks with
no voting involved.
Remember that the shuttle dates back to before VMS, and before clusters. The US
side of the space station uses the same 1553 bus philosophy. It has a 3-tier
hierarchical architecture. The lowest level has many MDMs (small computers)
which collect data and send commands to devices. They are not redundant, but
in some cases, a different MDM can be told by the bus controller to take some
of the functions of a failed one. The mid tier is the "server side" management
of station with automated tasks such as controlling station orientation,
temperature etc, as well as the task of collecting information from all the
lower tier MDMs (think of it as the subconscious side of your brain). Each MDM
in the middle tier has one machine capable of taking over.
The top tier is the command tier. It interfaces with laptops (running Solaris)
that provide user interface to commands, telecom equipment etc. It is the tier
which has a more global view of the station. (C&C is Tier 1).
Here is some text that better explains it:
One of the C&C MDMs is fully operational, while a second is a “warm”
backup (powered on and processing data but not commanding equipment) and the
third is a “cold” backup (powered off). There are five pairs of Tier 2 MDMs;
each MDM in the pair is identical to the other MDM. Typically, one MDM is
operational and the second of the pair is powered off. However, the redundant
GNC MDM is a warm backup.
(GNC = guidance and navigation)
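The operational/warm/cold arrangement in that text can be modelled roughly as below. This is a hedged Python sketch: the State enum, the fail_over function, and the rule "promote a warm backup first, then a cold one" are my assumptions. On the real station the switch is commanded by people, not decided by a function like this.

```python
from enum import Enum


class State(Enum):
    OPERATIONAL = "operational"
    WARM = "warm"      # powered on, processing data, not commanding
    COLD = "cold"      # powered off
    FAILED = "failed"


def fail_over(states):
    """Mark the operational MDM of one redundant group as failed and
    promote a backup: a warm one if available, otherwise a cold one.
    Returns a new list; the input list is left untouched."""
    new = [State.FAILED if s is State.OPERATIONAL else s for s in states]
    for preferred in (State.WARM, State.COLD):
        if preferred in new:
            new[new.index(preferred)] = State.OPERATIONAL
            break
    return new


# The three C&C MDMs: one operational, one warm, one cold.
print(fail_over([State.OPERATIONAL, State.WARM, State.COLD]))
```

In practice a cold backup also has to be powered up and loaded before it can take over, which the sketch ignores.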
If you recall, a year or two ago, during the mission where they installed the
Canadian arm, they had major computer problems with hard drives failing one
after another. It was the tier one machines. They were down to one
functional machine (the backup one needed to have its disk reloaded and that
took a long time), and they were afraid enough to tell the crew to reduce
commanding so that disk I/O would be reduced. (they have since switched to
solid state drives)
The restringing of the machines to redirect bus IO to a different master can
be done remotely by ground control. I do not believe it is an automatic fallback.
Note that a space station can continue to exist for hours even with total
power failure (as one incident with Mir proved). A space shuttle, during the 8
minutes from pad to orbit cannot afford to miss one beat.
On the other hand, there is no FedEx to the station, so if you run out of
spares, getting a replacement part can take months. (in the current
context, it is measured in years for some parts, such as one of the 4
gyro units)
Had VMS been chosen for the station, it would have been a great marketing asset
for VMS, showing off its clustering capabilities. HP did get a piece of the
action by supplying some of the servers on the Russian segment, which gives
them a big sign prominently displayed at the Moscow Mission Control Centre.