Re: Netscape 7 issues.
From: Dr. David Kirkby (drkirkby_at_ntlworld.com)
Date: 07/12/03
- Next message: Baby Peanut: "Re: Peter Salus tells the truth about Solaris and Sun"
- Previous message: Josh Mckee: "Re: Sun E4500: Odd Console Message"
- In reply to: Eric Behr: "Re: Netscape 7 issues."
- Next in thread: Alan Coopersmith: "Re: Netscape 7 issues."
- Reply: Alan Coopersmith: "Re: Netscape 7 issues."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Date: Sat, 12 Jul 2003 03:49:46 +0100
Eric Behr wrote:
> Time for a little rant, I guess.
Okay, I'll have one too in a minute, about a similar issue (nothing
specific to Netscape). Sorry it is a little long, but I think it does
expose an important point.
> This tells me that noone
> is using Solaris to test Unix things, nobody believes in KISS
> anymore, noone gives a damn about portability and bloat.
There is certainly a culture amongst Linux users that if it works on
the latest Redhat, or whatever distribution they use, the software is
ready for release to the world. No effort seems to be made to check
portability to any other systems. I've seen people say 'all I care
about is if it runs on Redhat 7.2'.
I think this is VERY foolish, for reasons other than the obvious one,
as the following real example shows.
Before releasing the previous version (4.3.2) of 'atlc'
http://atlc.sourceforge.net/
I had tested atlc on 10 computers, including several Suns running:
Solaris 2.5
Solaris 9 - with both Sun's compiler and gcc.
NetBSD
OpenBSD
Debian Linux
Redhat Linux
in addition to:
A Dec Alpha running Tru64 - using HP's compiler.
An HP box running HP-UX
A PC running Solaris x86
A PC running Redhat Linux.
No problems were found, apart from when compiled to exploit multiple
CPUs on SPARC Linux distributions. I thought this was due to broken
thread support, as I gather thread support was poor in early Linux
distributions. The code works fine on current Linux distributions.
Since releasing it, I got the chance to check it on an Itanium based
machine, again with no problems - even with multi-processor support
enabled.
This week I tried to build that code on an IBM RS/6000 running AIX
using IBM's C compiler. IBM's compiler did not like my C++ comments.
Given the code is C, I decided to remove the C++ style comments, so
aiding portability. (Sure, I could have added a compiler option to
accept C++ comments, but why do that? C++ style comments are
convenient, but it seems a shame to use them in an otherwise C
program).
However, much more seriously, once the comments were fixed, the
program crashed on AIX. On inspection I found a dynamically allocated
array, foo[x-1][y], was being accessed with x=0, so it was trying to
read from element foo[-1][y].
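To illustrate the kind of bug described above, here is a minimal sketch in C. It is not the atlc code: the names grid_alloc, grid_free and left_neighbour are hypothetical, and returning a caller-supplied fallback value at the edge is only one possible fix. Reading foo[-1][y] is undefined behaviour in C, so it may appear to work on some systems and crash on others; the sketch checks the index before subtracting one.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a rows x cols grid with every element set to 0.0.
   Returns NULL if the sizes are not positive or memory runs out. */
double **grid_alloc(int rows, int cols)
{
    double **g;
    int i, j;

    if (rows <= 0 || cols <= 0)
        return NULL;
    g = malloc((size_t)rows * sizeof *g);
    if (g == NULL)
        return NULL;
    for (i = 0; i < rows; i++) {
        g[i] = malloc((size_t)cols * sizeof **g);
        if (g[i] == NULL) {
            /* Release the rows allocated so far before giving up. */
            while (i > 0)
                free(g[--i]);
            free(g);
            return NULL;
        }
        for (j = 0; j < cols; j++)
            g[i][j] = 0.0;
    }
    return g;
}

/* Free a grid obtained from grid_alloc. */
void grid_free(double **g, int rows)
{
    int i;

    if (g == NULL)
        return;
    for (i = 0; i < rows; i++)
        free(g[i]);
    free(g);
}

/* Return g[x-1][y], or 'fallback' when that element does not exist.
   With x == 0 an unguarded g[x-1][y] would read g[-1][y]. */
double left_neighbour(double **g, int rows, int cols,
                      int x, int y, double fallback)
{
    assert(g != NULL);
    if (x < 1 || x > rows || y < 0 || y >= cols)
        return fallback;
    return g[x - 1][y];
}
```

With a 2 by 3 grid, left_neighbour(g, 2, 3, 1, 1, -1.0) returns the value stored in g[0][1], while a call with x=0 returns the fallback instead of reading outside the array. The comments are kept in /* */ style so that a strict C compiler accepts them.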
Finding this bug on AIX, while not on the above 11 systems, was
surprising given:
a) The bug had not been found when compiled using Sun's C compiler
with the compiler's 'memory access checking' enabled.
b) I'd built the code with gcc, with the unofficial bounds checking
patches of gcc
http://www.lrde.epita.fr/~akim/compil/doc/bounds-checking.html
installed. Again, one might have expected the patched gcc to find such
a bug.
That, along with many other instances in the past, has shown me that
by testing for portability, one finds bugs on one system that don't
obviously show up on another. The bugs exist, but don't show up. That
is, until you run on another system, use a different compiler, have
more or fewer users on the system, the wind is blowing the wrong way
etc etc.
The other fatal mistake people make is to compile on other hardware,
but still use gcc. It's clearly better to check portability on a number
of different compilers.
I think if people took care like this, bugs in code would be found
much more easily. With places like Sourceforge's compile farm:
http://sourceforge.net/docman/display_doc.php?docid=762&group_id=1
and HP's testdrive:
http://www.testdrive.compaq.com/
anyone can (if they put the effort in) test code on a range of
platforms for zero cost. Old hardware that is suitable for such
testing is cheap - you don't need the latest fastest box around. Using
automated test procedures (I'll post a script if anyone wants it),
it's possible to test on a number of systems in parallel, without
spending all your life doing it.
I've now fixed the bug, so the latest release of atlc does not make
this fatal memory access. It now compiles and runs on AIX 5.2, but is
giving the wrong answers!! This indicates there is ANOTHER bug, which
is again showing up on AIX and not the other systems. But I can't
currently donate the time to find that bug. Sometime over the weekend,
I might get a chance to fix it.
Later I will check portability to IRIX too, and perhaps install an
older release of AIX on the IBM, just to check it under different
versions.
Perhaps I go a bit OTT in checking for portability issues, but I sure
wish others would do a bit more. Then large programs like the Gimp,
Mozilla, OpenOffice would crash less often. </rant>
--
Dr. David Kirkby,
Senior Research Fellow,
Department of Medical Physics,
University College London,
11-20 Capper St, London, WC1E 6JA.
Tel: 020 7679 6408 Fax: 020 7679 6269
Internal telephone: ext 46408
e-mail davek@medphys.ucl.ac.uk