Re: Netscape 7 issues.

From: Dr. David Kirkby (drkirkby_at_ntlworld.com)
Date: 07/12/03


Date: Sat, 12 Jul 2003 03:49:46 +0100

Eric Behr wrote:

> Time for a little rant, I guess.

Okay, I'll have one too in a minute, about a similar issue (nothing
specific to Netscape). Sorry it is a little long, but I think it does
expose an important point.

> This tells me that noone
> is using Solaris to test Unix things, nobody believes in KISS
> anymore, noone gives a damn about portability and bloat.

There is certainly a culture amongst Linux users that if it works on
the latest Redhat, or whatever distribution they use, the software is
ready for release to the world. No effort seems to be made to check
portability to any other systems. I've seen people say 'all I care
about is if it runs on Redhat 7.2'.

I think this is VERY foolish, for reasons other than the obvious one,
as the following real example shows.

Before releasing the previous version (4.3.2) of 'atlc'
http://atlc.sourceforge.net/
I had tested atlc on 10 computers, including several Suns running:

Solaris 2.5
Solaris 9 - with both Sun's compiler and gcc.
NetBSD
OpenBSD
Debian Linux
Redhat Linux

in addition to:

A Dec Alpha running Tru64 - using HP's compiler.
An HP box running HP-UX
A PC running Solaris x86
A PC running Redhat Linux.

No problems were found, apart from when compiled to exploit multiple
CPUs on SPARC Linux distributions. I thought this was due to broken
thread support, as I gather thread support was poor in early Linux
distributions. The code works fine on current Linux distributions.

Since releasing it, I got the chance to check it on an Itanium-based
machine, again with no problems - even with multi-processor support
enabled.

This week I tried to build that code on an IBM RS/6000 running AIX
using IBM's C compiler. IBM's compiler did not like my C++ comments.
Given the code is C, I decided to remove the C++ style comments, so
aiding portability. (Sure, I could have added a compiler option to
accept C++ comments, but why do that? C++ style comments are
convenient, but it seems a shame to use them in an otherwise C
program).

However, much more seriously, once the comments were fixed, the
program crashed on AIX. On inspection I found a dynamically allocated
array, foo[x-1][y], was being indexed with x=0, so it was trying to
read from element foo[-1][y].
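
As a minimal sketch of this kind of bug (this is not the actual atlc
code; grid_alloc, grid_free and grid_get_prev_row are hypothetical
names made up for illustration): in C, reading foo[-1][y] on a
dynamically allocated array is undefined behaviour, so it may appear to
work on many systems and crash on another. One way to guard against it
is to route the access through a function that checks the index first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols grid of doubles set to 0.0.
   Returns NULL if any allocation fails. */
double **grid_alloc(size_t rows, size_t cols)
{
    double **g;
    size_t i, j;

    assert(rows > 0 && cols > 0);
    g = malloc(rows * sizeof *g);
    if (g == NULL)
        return NULL;
    for (i = 0; i < rows; i++) {
        g[i] = malloc(cols * sizeof **g);
        if (g[i] == NULL) {
            /* Undo the rows allocated so far. */
            while (i > 0)
                free(g[--i]);
            free(g);
            return NULL;
        }
        for (j = 0; j < cols; j++)
            g[i][j] = 0.0;
    }
    return g;
}

/* Hypothetical helper: release a grid made by grid_alloc. */
void grid_free(double **g, size_t rows)
{
    size_t i;

    if (g == NULL)
        return;
    for (i = 0; i < rows; i++)
        free(g[i]);
    free(g);
}

/* Read element [x-1][y]. Returns 0 and stores the value in *out on
   success, or -1 (leaving *out untouched) if the index is out of
   range. In particular x = 0 is rejected instead of reading row -1. */
int grid_get_prev_row(double **g, size_t rows, size_t cols,
                      long x, long y, double *out)
{
    if (x < 1 || (size_t)x > rows || y < 0 || (size_t)y >= cols)
        return -1;
    *out = g[x - 1][y];
    return 0;
}
```

With a 3 x 4 grid, grid_get_prev_row(g, 3, 4, 0, 2, &v) returns -1
rather than silently reading whatever happens to sit before the array.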

Finding this bug on AIX, while not on the above 11 systems, was
surprising given:

a) The bug had not been found when compiled using Sun's C compiler
with the compiler's 'memory access checking' enabled.

b) I'd built the code with a gcc that had the unofficial bounds
checking patches
http://www.lrde.epita.fr/~akim/compil/doc/bounds-checking.html
installed. Again, one might have expected the patched gcc to find such
a bug.

That, along with many other instances in the past, has shown me that
by testing for portability, one finds bugs on one system that don't
obviously show up on another. The bugs exist, but don't show up. That
is, until you run on another system, use a different compiler, have
more or fewer users on the system, the wind is blowing the wrong way
etc etc.

The other fatal mistake people make is to compile on other hardware,
but still using gcc. It's clearly better to check portability on a
number of different compilers.

I think if people took care like this, bugs in code would be found
much more easily. With places like Sourceforge's compile farm:
http://sourceforge.net/docman/display_doc.php?docid=762&group_id=1
and HP's testdrive:
http://www.testdrive.compaq.com/
anyone can (if they put the effort in) test code on a range of
platforms for zero cost. Old hardware that is suitable for such
testing is cheap - you don't need the latest fastest box around. Using
automated test procedures (I'll post a script if anyone wants it),
it's possible to test on a number of systems in parallel, without
spending all your life doing it.

I've now fixed the bug, so the latest release of atlc does not make
this fatal memory access. It now compiles and runs on AIX 5.2, but is
giving the wrong answers!! This indicates there is ANOTHER bug, which
is again showing up on AIX and not the other systems. But I can't
currently spare the time to find that bug. Sometime over the weekend,
I might get a chance to fix it.
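
A wrong answer that only appears on one platform is the sort of thing
a numerical regression check can catch automatically. The following is
only a sketch under assumptions, not atlc's own test code: the names
result_matches and count_mismatches, and the idea of a relative
tolerance with a floor of 1.0 on the scale, are invented here for
illustration. It compares values computed on the machine under test
against reference values recorded on a trusted one:

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without needing <math.h> or linking the maths library. */
static double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* Hypothetical check: return 1 if computed agrees with reference to
   within rel_tol, scaled by the larger of |reference| and 1.0,
   otherwise 0. A NaN result never matches. */
int result_matches(double computed, double reference, double rel_tol)
{
    double scale;

    assert(rel_tol >= 0.0);
    scale = abs_d(reference);
    if (scale < 1.0)
        scale = 1.0;
    return abs_d(computed - reference) <= rel_tol * scale;
}

/* Hypothetical check: count how many of n computed values disagree
   with their reference values. Zero means the run passed. */
size_t count_mismatches(const double *computed, const double *reference,
                        size_t n, double rel_tol)
{
    size_t i, bad = 0;

    for (i = 0; i < n; i++) {
        if (!result_matches(computed[i], reference[i], rel_tol))
            bad++;
    }
    return bad;
}
```

A test program built on this could exit with a non-zero status when
count_mismatches returns anything other than 0, so the same check can
be run unattended on every platform.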

Later I will check portability to IRIX too, and perhaps install an
older release of AIX on the IBM, just to check it under different
versions.

Perhaps I go a bit OTT in checking for portability issues, but I sure
wish others would do a bit more. Then large programs like the Gimp,
Mozilla, OpenOffice would crash less often. </rant>

-- 
Dr. David Kirkby,
Senior Research Fellow,
Department of Medical Physics,
University College London,
11-20 Capper St, London, WC1E 6JA.
Tel: 020 7679 6408 Fax: 020 7679 6269
Internal telephone: ext 46408
e-mail davek@medphys.ucl.ac.uk

