Re: Anyone object to the following change in libc?

From: Harti Brandt (brandt_at_fokus.fraunhofer.de)
Date: 10/30/03

    Date: Thu, 30 Oct 2003 12:32:46 +0100 (CET)
    To: Terry Lambert <tlambert2@mindspring.com>
    
    

    On Thu, 30 Oct 2003, Terry Lambert wrote:

    TL>Harti Brandt wrote:
    TL>> TL>Paragraph 6 of:
    TL>> TL>
    TL>> TL> http://www.opengroup.org/onlinepubs/007904975/functions/sscanf.html
    TL>> TL>
    TL>> TL>Implies that the lack of characters in the string following the
    TL>> TL>conversion, due to failure in assignment, should result in an
    TL>> TL>"Input failure". Note also that stdio.h defines EOF as -1.
    TL>>
    TL>> I fail to locate this paragraph. This interpretation would also imply
    TL>> that scanf() always needs to return -1 whenever it cannot match a format
    TL>> specifier.
    TL>
    TL> The fscanf() functions shall execute each directive of the
    TL> format in turn. If a directive fails, as detailed below, the
    TL> function shall return. Failures are described as input
    TL> failures (due to the unavailability of input bytes) or
    TL> matching failures (due to inappropriate input).
    TL>
    TL>It comes down to how you interpret the NUL byte at the end of the
    TL>sscanf() input string. Is it an EOF? Or is it an unavailability of
    TL>input bytes? The answer to the question picks which return value
    TL>is correct.

    Section 7.19.6.7 of N843 states:

    "Reaching the end of the string is equivalent to encountering end-of-file
    for the fscanf function."

    Unfortunately, this sentence is missing from POSIX, but it is obviously
    implied by POSIX's reference to ISO C.
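A minimal sketch of what the N843 sentence means in practice, assuming a C99-conforming sscanf(). The end of the string behaves like end-of-file, so an empty or all-blank string is an input failure (EOF), while non-numeric text is a matching failure (0). The enum and function names are illustrative only.

```c
#include <stdio.h>

/* Illustrative names, not part of any libc API. */
enum scan_outcome { SCAN_INPUT_FAILURE, SCAN_MATCHING_FAILURE, SCAN_OK };

/* Classify the result of reading one int from a string. */
enum scan_outcome classify_int(const char *input, int *out)
{
    int rc = sscanf(input, "%d", out);

    if (rc == EOF)
        return SCAN_INPUT_FAILURE;    /* string ended before any conversion */
    if (rc == 0)
        return SCAN_MATCHING_FAILURE; /* input present, but not an integer */
    return SCAN_OK;                   /* rc == 1: one item assigned */
}
```

Both classify_int("", &v) and classify_int("   ", &v) report an input failure, because %d skips the blanks and then reaches the end of the string. classify_int("abc", &v) reports a matching failure.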

    The next paragraph states:

    "The sscanf function returns the value of the macro EOF if an input
    failure occurs before any conversion."

    Again: do we have a conversion before the input failure? We do! Should
    we return EOF? No.
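The return rule quoted above can be modelled as a small function. This is a hypothetical sketch, not libc code, and it encodes the reading argued in this message: a suppressed conversion such as %*d still counts as a conversion.

```c
#include <stdbool.h>
#include <stdio.h>  /* EOF */

/* Hypothetical model of the C99 7.19.6.7 return rule; not libc code.
 * conversions_done counts every completed conversion, including
 * suppressed ones such as %*d (the reading argued in this message).
 * items_assigned counts only conversions stored through a pointer. */
int c99_scanf_result(int conversions_done, int items_assigned,
                     bool input_failure)
{
    if (input_failure && conversions_done == 0)
        return EOF;          /* input failure before any conversion */
    return items_assigned;   /* otherwise: number of items assigned */
}
```

For sscanf("123", "%*d%d", &x) the %*d completes one conversion and the %d then reaches the end of the string, so the model gives c99_scanf_result(1, 0, true), which is 0. If suppressed conversions were not counted, the first argument would be 0 and the result would be EOF, the -1 that other UNIX-like systems return.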

    TL>
    TL>
    TL>> TL>I think it can be interpreted either way, still.
    TL>>
    TL>> You missed the section about RETURN VALUE: EOF is returned on a read
    TL>> error. This is not an input error.
    TL>
    TL>How do I distinguish a "return value is -1 as an error result" from
    TL>"return value is -1 as an EOF result"?

    Well, I suppose that is the intention behind POSIX having scanf() set
    errno when it returns -1. Unfortunately, POSIX fails to describe the
    error codes. This is possibly fodder for the aardvark.
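For the stream functions there is a portable way around the -1 ambiguity that does not rely on errno: after fscanf() returns EOF, the stream's error indicator tells the two cases apart. A sketch with illustrative names; sscanf() has no stream, so this does not help there.

```c
#include <stdio.h>

/* Illustrative names, not part of any libc API. */
enum read_status { READ_OK, READ_NO_MATCH, READ_END_OF_INPUT, READ_ERROR };

/* Read one int from fp and separate the cases a bare EOF return merges. */
enum read_status read_int(FILE *fp, int *out)
{
    int rc = fscanf(fp, "%d", out);

    if (rc == 1)
        return READ_OK;
    if (rc == EOF)
        /* EOF means either end-of-file or a read error before any
         * conversion; ferror() reports which one occurred. */
        return ferror(fp) ? READ_ERROR : READ_END_OF_INPUT;
    return READ_NO_MATCH;   /* rc == 0: matching failure */
}
```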

    TL>
    TL>
    TL>> You should also read the very first paragraph. It clearly states that
    TL>> ISO is the primary source of information, and the ISO text is a lot
    TL>> cleaner.
    TL>
    TL>No, that's not what it actually states; here's the paragraph:
    TL>
    TL> The functionality described on this reference page is
    TL> aligned with the ISO C standard. Any conflict between
    TL> the requirements described here and the ISO C standard
    TL> is unintentional. This volume of IEEE Std 1003.1-2001
    TL> defers to the ISO C standard.
    TL>
    TL>It says that any conflicts are unintentional, and their intent was
    TL>to use different language for no good reason, rather than just
    TL>copying it verbatim and removing any doubt. It does *NOT* say
    TL>that no conflicts exist.

    Yes. But I take the last sentence to mean that ISO C takes precedence
    if a conflict exists.

    TL>
    TL>Also: In this context, which is IEEE 1003.1-2001, Issue 6, "the
    TL>ISO C standard" refers to "c89", which is the version of the C
    TL>standard that was in effect at the time that SVID IV was defined.

    Line 107 of Austin TC-1:

    "The c89 utility (which specified a compiler for the C Language specified
    by the ISO/IEC 9899:1990 standard) has been replaced by a c99 utility
    (which specifies a compiler for the C Language specified by the
    ISO/IEC 9899:1999 standard)."

    TL>If you need clarification on this issue, you should download the
    TL>currently available version of the NIST/PCTS, which specifically
    TL>requires you to compile with a c89 compiler, not one more recent.
    TL>The same is true of The Open Group test suites which are available
    TL>on the Internet.
    TL>
    TL>The version of the ISO C standard you are quoting from is *NOT*
    TL>the c89 version.

    Our sscanf() claims conformance to C99. So if we change the behaviour
    we have to remove this claim.

    TL>This makes interpretation ambiguous, since the text you are
    TL>specifically referencing to get the 0 result is text that was
    TL>added to the next version of the standard to clarify it.
    TL>
    TL>
    TL>> I think it makes no sense to classify
    TL>>
    TL>> sscanf("123", "%*d%d", ...
    TL>>
    TL>> as an error, but
    TL>>
    TL>> sscanf("123", "%d%d", ...
    TL>>
    TL>> not, does it? Also, at least Solaris 9 returns -1 but fails to set
    TL>> errno, which is simply a bug.
    TL>
    TL>It makes no sense to do conversions without assignment in the
    TL>first place (IMO).

    [... Stuff about sense removed (I was talking about what return
    code makes sense, not whether calling sscanf makes sense) ...]
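The two calls from the quoted example can be run side by side to see what a given libc does. The function name is hypothetical. Only the "%d%d" result is fixed by the standard (one item is assigned before the string runs out, so 1); the "%*d%d" result is the disputed value, 0 under the C99 reading argued here and -1 on the systems Terry mentions.

```c
#include <stdio.h>

/* Run both calls from the example on the same input and report what the
 * local libc returns. */
void compare_formats(int *rc_assigned, int *rc_suppressed)
{
    int a = 0, b = 0;

    /* First %d stores 123, second %d hits end of string: returns 1. */
    *rc_assigned = sscanf("123", "%d%d", &a, &b);

    /* %*d converts 123 without storing it, then %d hits end of string.
     * Whether this is 0 or EOF is the question under discussion. */
    *rc_suppressed = sscanf("123", "%*d%d", &b);

    printf("\"%%d%%d\"  -> %d\n", *rc_assigned);
    printf("\"%%*d%%d\" -> %d\n", *rc_suppressed);
}
```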

    TL>In any case, we are practically guaranteed that returning -1, as
    TL>all other UNIX-like OS's currently do, would result in less source
    TL>code breaking.

    No coder in his right mind should have written code that depends
    on this behaviour, given the ambiguous wording in the classic books,
    man pages and pre-C99 standards. Also note that the reason for this
    change request was that configuration scripts break, not applications.
    If applications break, they should be fixed.
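Code that wants to be immune to this change can compare the return value against the number of items it expects instead of testing for EOF; both 0 and -1 then count as failure. A minimal sketch; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* True only if both integers were converted and stored. */
bool parse_pair(const char *s, int *x, int *y)
{
    return sscanf(s, "%d%d", x, y) == 2;
}

/* Skip the first integer and store the second. Short input such as "123"
 * is rejected whether the libc returns 0 or EOF for it. */
bool parse_second(const char *s, int *y)
{
    return sscanf(s, "%*d%d", y) == 1;
}
```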

    TL>In other words, conformance level has historically been dictated
    TL>by what code is not broken, not what is technically permitted by
    TL>the standards, if you language-lawyer them to death.
    TL>
    TL>To put it in IETF terms: "Be conservative in what you generate,
    TL>and generous in what you accept".

    This does not apply here because you cannot return -1 and 0 at the same
    time. Adhering to a cleanly written standard and breaking a handful of
    badly written autoconf scripts is clearly better than adhering to
    undocumented historical behaviour. What will we do if Solaris 10
    returns 0 in the above case? Change our code back?

    harti

    -- 
    harti brandt,
    http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
    brandt@fokus.fraunhofer.de, harti@freebsd.org
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
    
