Re: 2nd CPU on a Sun Blade 2000 problem

On Sep 30, 10:43 pm, "DoN. Nichols" <dnich...@xxxxxxxxxxx> wrote:
On 2008-09-30, pne.cho...@xxxxxxxxxxx <pne.cho...@xxxxxxxxxxx> wrote:

On Sep 28, 12:47 am, "DoN. Nichols" <dnich...@xxxxxxxxxxx> wrote:
On 2008-09-28, pne.cho...@xxxxxxxxxxx <pne.cho...@xxxxxxxxxxx> wrote:

Hi All,

I have a Sun Blade 2000 with a single 900 MHz CPU. I added another CPU
module and now my system won't boot. I tried to go back to the
original configuration and my system still will not boot.


Any suggestions?

        Have you removed the CPU in the '1' slot (most distant from the
memory) and left the original in the '0' slot (closest to the memory)?

I tried two together, no luck, then
I tried all three one at a time and right now the original is back in
the system alone.

        In slot 0?  Slot 1 alone won't work.

Yep, I knew that.


I have the dayglo green torque wrench, which I used to remove and
install each CPU. I heard the click at the end, indicating that each
was properly seated.

        I specified that because of someone who was selling a SB-2000 on
eBay who said that he could not check whether the CPUs were 900 MHz or
1200 MHz because they needed a special wrench -- and his photos of the
system open showed the wrench plainly visible, so I don't assume that
anyone knows where it is until they say so. :-)

        O.K.  I usually go back after the first click on each side to
get a couple more clicks, just in case there is a little more motion
available.  You can't over-torque it this way.

The first time I did this the power was off but the power cord was
still connected to the box. I am fearing the worst with that fact.

        Hmm ... I'm pretty sure that I have changed a CPU with power on,
even though I should not.  Everything survived.

        But this *may* be a function of which end of the connector lifted
clear first.  You may need a new system board.

Yes, this is what I was afraid of.

                                                 The thing which is
probably most sensitive is the RSC (Remote System Control) card, which you
will only find in a Sun Fire 280R (it uses the same CPUs and system board,
but is a rack mount server).  The RSC card allows "LOM" (Lights Out
Management) -- that is, you can reboot the system remotely over an
ethernet connection to the RSC card.  This gets power full time -- even
when the system is off -- so it *can* control the system through reboot.
(And the 280R has dual hot-swap power supplies, so you have *two* power
cords to unplug at once.)  The RSC card plugs into the second UPA
framebuffer slot -- and there is no way to use a UPA framebuffer in the
Sun Fire 280R -- just PCI-based framebuffers.  The RSC card has a PCMCIA
modem card installed in a socket and connected to a jack in the back.
And it has a rechargeable battery to keep its settings through a certain
amount of power loss.

        [ ... ]

No, I have the dayglo torque wrench as stated.


        [ ... ]

        Also -- poorly seated memory modules can prevent booting.

Checked that.


        And -- I had a system board (motherboard) which had been damaged
by slightly bent pins on one CPU resulting in a board which could boot

        [ ... ]

I checked that based upon your recommendation and found nothing that
looked damaged or bent.


        [ ... ]

I'm certain that I have 900 MHz US-III Cu CPUs. Given that, I am
fearing the worst, namely that I inadvertently damaged something else
that prevents the system from booting.

More about the boot process. When I push the power button on the SB
2000 the front light comes on, the disk drives come on and I hear the
monitor click but no memory beep. The system just hums and nothing
ever comes up on the monitor screen. I am using a KVM switch setup
with a PC. Perhaps I should use each I/O device stand-alone, but I
don't see how that can be a problem. Maybe it is.

Is there any way I can test the system board or check for a blown fuse,
bad battery, something beyond what I am now doing?

        When trying to troubleshoot a system -- bring it down to the
absolute minimum, and if that works, start adding things one at a time.
This means in this case you want:

        One CPU in slot 0

        Memory DIMMs in every other memory slot starting with the one
closest to the CPUs. (This is bank 0 of RAM.)  Before you plug them in,
spread them all out (on an anti-static surface) and check the barcode
numbers.  These are on a label which starts "501", and goes on for some
number of digits.  The first four after the 501 count.  The rest make up
the serial number of the DIMM (or other part).  I've discovered that
when I accidentally mix say the 512 MB and the 256 MB ones in a single
bank, they are all treated as being 256 MB ones -- though that might be
a function of what size was in the first slot checked, and it might have
complained bitterly if the larger one was seen first.  Some mixes may
keep it from booting.
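
        The barcode check above could be scripted once the labels are
typed in.  This is only a rough sketch: it assumes the label layout
described here ("501", then four significant digits, then the serial
number), and the function names and sample labels in it are made up for
illustration.

```python
def split_barcode(label):
    """Split a '501...' barcode into (part_number, serial).

    Assumes the layout described in the post: '501', then four digits
    that identify the part, then the serial number.
    """
    digits = "".join(ch for ch in label if ch.isdigit())
    if not digits.startswith("501") or len(digits) < 7:
        raise ValueError(f"not a 501 part label: {label!r}")
    return f"501-{digits[3:7]}", digits[7:]


def bank_is_uniform(labels):
    """Return True if every label in the bank has the same part number."""
    part_numbers = {split_barcode(label)[0] for label in labels}
    return len(part_numbers) == 1


# Hypothetical labels, not real DIMM part numbers.
bank0 = ["5010001000111", "5010001000222", "5010001000333", "5010002000444"]
print(bank_is_uniform(bank0))  # the last label differs in its four digits
```

If it reports a mixed bank, swap DIMMs until every slot in the bank
shows the same four digits.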

        Anyway -- one CPU, half load of memory, no keyboard, no
framebuffer, no other PCI cards, no disks (internal or external) other
than the DVD-ROM and floppy if you have them.

        Then plug a crossover cable between ttya on the system giving
problems and a tty port on another system -- or a serial ASCII terminal,
set to 9600 baud, 8 data bits, 1 stop bit.  The crossover is pin 2 on
one end to pin 3 on the other and vice versa -- plus a straight through
wire from pin 7 to pin 7.  This should be enough.

Yes I have a null-modem.

Then use the terminal
(or the other system through tip(1)) to watch the serial port.  It may
take a minute or so to go through the POST, and in the absence of both a
keyboard and a framebuffer (actually, a missing keyboard alone is
enough) it will switch to ttya as the console, and will start feeding
POST data to it -- even data which you never see otherwise, because the
monitor and keyboard are not yet alive.

Okay, I have an RS-232 terminal (Televideo 910) with a null modem on
the cable.

        Shortly before you would normally get the keyboard and monitor
enabled, you should see the LED in the power button flashing a few
times.  If necessary, you can hit the button during this time: one quick
push, wait a full second, then another quick push.  That should reset
the environment variables to the default values.  Note that they will
revert to whatever they were on the next boot -- unless you type
"set-defaults" at the "ok" prompt.

This is how I originally installed Solaris 9 on the system.

I assume the following happens if my system board is not dead, or can
it be partially dead and still work with the serial port and RS-232
terminal?

        Once you get to the "ok" prompt, type "printenv" and get the
list of parameters.  The ones which you really are interested in are:



        setenv diag-level max
        setenv diag-switch? true

and you will now be in for an apparently interminable set of
diagnostics.  Best to view them through tip, so you can scroll the
window back to read it all -- or capture it in a file.  Mostly, it will
be puzzling lists of things being checked -- but all you need to do is
to look for "error" or "fail" in that flow of text.
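
        If you capture the output in a file, the hunt for "error" or
"fail" can be automated.  The following is a minimal sketch which
assumes nothing about the real POST output format: it simply flags any
line containing either word, in any mix of upper and lower case.  The
function names and the sample lines are invented for illustration.

```python
import re

# Match "error" or "fail" anywhere in a line, ignoring case.
PATTERN = re.compile(r"error|fail", re.IGNORECASE)


def find_problems(lines):
    """Return (line_number, text) for each line mentioning error or fail."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            hits.append((number, line.rstrip()))
    return hits


def scan_file(path):
    """Scan a captured log file; undecodable bytes are replaced."""
    with open(path, errors="replace") as log:
        return find_problems(log)


# Invented sample lines, not real POST output.
sample = ["Probing memory", "CPU 0 test FAILED", "Error: bad checksum"]
for number, text in find_problems(sample):
    print(f"{number}: {text}")
```

Expect some false hits (a line such as "0 errors" would be flagged too),
so read each flagged line in context before blaming the hardware.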

        Once you have that run, be sure to reset the first two to:


or you will have the interminable test run every time you reboot.  It
takes perhaps a half hour with full memory and both CPUs installed.

        While you are pulling cards out of the system, note the barcode
number on the system board.  It is along the back edge where the PCI
card brackets bolt to the chassis.

        You *may* need to find a replacement system board on eBay.  The
barcode number may be 5016230???? -- that is what mine is.  Anyway, the
various ones are interchangeable.

        A quick search on those finds (at present):

        150297795889    $90.00

        180293473616    $95.00

both vendors are apparently shipping them removed from the steel frame
which they should come with, so you will have to transfer the one you
have from the other system board.

        The other two listed for the SB-2000 are:


which have older versions of the Schizo chip -- for whatever difference
that makes.  The 501-6230 is the latest listed.

        Or -- you could go the way I did to get two SB-2000 systems,
with 2 GB of RAM, and (supposedly) a 900 MHz CPU.  As it turned out,
both systems had 1200 MHz CPUs, which have been combined into a single
system.  The seller is starting the auction at $99.99.  The auction
number is:


The shipping cost will be higher, but you will get lots of spares.

        Oh yes -- if you *do* get the "ok" prompt, you might want to
look at the value of "output-device=".  If it is just "screen", fine.
If it is something like "screen:r1440x900x76" then it is setting the
framebuffer to a specific resolution -- which may or may not be the
right one for your monitor.
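
        To make the two forms of "output-device" concrete, here is a
small sketch which pulls one apart.  It assumes only the two shapes
shown above ("screen" and "screen:r1440x900x76"); any other form will be
rejected, and the function name is made up for illustration.

```python
import re

# Device name, optionally followed by ":r<width>x<height>x<refresh>".
OUTPUT_DEVICE = re.compile(r"([^:]+)(?::r(\d+)x(\d+)x(\d+))?")


def parse_output_device(value):
    """Return (device, width, height, refresh_hz).

    The last three are None when no resolution is pinned.
    """
    match = OUTPUT_DEVICE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized output-device value: {value!r}")
    device, width, height, refresh = match.groups()
    if width is None:
        return device, None, None, None
    return device, int(width), int(height), int(refresh)


print(parse_output_device("screen"))
print(parse_output_device("screen:r1440x900x76"))
```

If the second form comes back with a resolution or refresh rate which
your monitor can't handle, that alone could explain a blank screen.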

Will do.

        Anyway -- if you get the "ok" prompt, once you have passed all
the tests -- and perhaps reset the "output-device" to the default value,
you start adding things one at a time to see if things stop working.
First, I think, would be the memory.  Then the framebuffer.  Power it up
after each of these, so you know whether it stops working.  Do you have
a separate monitor and keyboard which you can use with this system?
Avoid the KVM until you have everything else tested, since it is the
most unknown part of the equation.

Yes, I can hook the Sun system up without the PC and KVM switch. I
might try this before I hook up a terminal, but I sort of want to know
how to troubleshoot the system using a terminal anyway.

        Good Luck,