SUMMARY: Floating point problem



I almost posted a question about this, but one colleague suggested
a test, the other colleague presented a program to do it, and I was
able to use it in the needed way...so here's the summary....

We have two Java-based integration products, eGate and ICAN,
running on Solaris 8 on Sun Fire servers. The application was working on
one of our servers, but on the other server, it gave us a problem
with floating point math. Money gets converted from floating point
to a double precision number and then to a string (we didn't write this
code :-).

This worked on our "production server".
On our "test" server, it worked, then it stopped working following
maintenance, then a restart got it working again.
Six months later, it stopped working, again following maintenance,
and could not be resolved.

The issue was that money (2 decimal digits) was shown with a repeating
decimal, e.g., 7.59 came out as 7.58999999999... Not 100% of the time,
but very often.

We spoke with SeeBeyond (actually Sun, now :-), with Sun, and among
ourselves. System patches, application patches, revision levels:
everything checked, no result.

Then someone said, maybe it's hardware? No errors in the logs....
Can we bind it to a CPU? Interesting question.
The Java test program would run once and return results;
we had a script to do that forever. We changed the program
to loop internally, so that it retained its process ID.
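
For anyone who wants to try the same thing, here is a minimal sketch of
what such a looping test could look like. It is not the program we
actually ran; the class name, the method names and the range of amounts
are made up for illustration. It turns whole cents into a double, renders
that as a string, and counts how often the string is not the exact
2-decimal amount. On a healthy CPU the count should stay at 0.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for the looping test program described above.
class FpLoopCheck {
    // Convert whole cents to an amount using floating-point division,
    // then render it as a string (e.g. 759 -> "7.59").
    static String render(int cents) {
        double amount = cents / 100.0;
        return Double.toString(amount);
    }

    // Count the amounts in [0, maxCents] whose rendered string does not
    // equal the exact 2-decimal value computed with BigDecimal.
    static int countMismatches(int maxCents) {
        int bad = 0;
        for (int cents = 0; cents <= maxCents; cents++) {
            BigDecimal expected = BigDecimal.valueOf(cents, 2);
            BigDecimal actual = new BigDecimal(render(cents));
            if (actual.compareTo(expected) != 0) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) throws InterruptedException {
        // Loop inside one JVM so the process ID never changes and the
        // process can be moved between CPUs while it keeps reporting.
        while (true) {
            int bad = countMismatches(99999);
            System.out.println(System.currentTimeMillis() + " mismatches=" + bad);
            Thread.sleep(1000);
        }
    }
}
```

Start it once, note the process ID, and leave it running while you move
it from CPU to CPU.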

We then used top to monitor the CPU number, and pbind to move that
process to another CPU, one at a time. Through the 4 new, fast CPUs,
the bug was gone; then, on the first of the old CPUs, bingo, 100% failure.

psradm allowed us to shut off the processor, and
the problem went away. A Sun FE came and replaced the
motherboard and cpu 10 (hot swap! nice!), 0 downtime!
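
The CPU walk can be scripted as well. The sketch below is only an
illustration, not what we ran (we typed the commands by hand). It assumes
a Solaris box, enough privilege to run pbind and psradm, and the process
ID and CPU IDs passed on the command line; the class and method names and
the 30 second pause are invented. It binds the looping test's process to
each CPU in turn with "pbind -b", so you can watch the test's output for
failures, and it shows the "psradm -f" form for taking a suspect
processor offline.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: walks a running process across a list of CPUs.
class CpuWalk {
    // Solaris command that binds process `pid` to processor `cpu`.
    static List<String> pbindCommand(int cpu, long pid) {
        return Arrays.asList("pbind", "-b", Integer.toString(cpu), Long.toString(pid));
    }

    // Solaris command that takes processor `cpu` offline.
    static List<String> offlineCommand(int cpu) {
        return Arrays.asList("psradm", "-f", Integer.toString(cpu));
    }

    // Run a command, passing its output through, and return the exit status.
    static int run(List<String> command) throws Exception {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    // Usage: java CpuWalk <pid> <cpu> [<cpu> ...]
    public static void main(String[] args) throws Exception {
        long pid = Long.parseLong(args[0]);
        for (int i = 1; i < args.length; i++) {
            int cpu = Integer.parseInt(args[i]);
            System.out.println("binding " + pid + " to CPU " + cpu);
            run(pbindCommand(cpu, pid));
            // Arbitrary pause: leave time to watch the test's output on this CPU.
            Thread.sleep(30000);
        }
        // Once a CPU is confirmed bad, offlineCommand(cpu) gives the
        // command to run by hand (or via run) to take it out of service.
    }
}
```

The available processor IDs can be listed with psrinfo before starting.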

The problem is gone. About 20 minutes to resolve.
It just took a week to come up with the right question:
could this be an undetected hardware issue?
_______________________________________________
sunmanagers mailing list
sunmanagers@xxxxxxxxxxxxxxx
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


