Re: Anyone object to the following change in libc?

From: Terry Lambert (tlambert2_at_mindspring.com)
Date: 10/30/03

    Date: Thu, 30 Oct 2003 02:59:09 -0800
    To: Harti Brandt <brandt@fokus.fraunhofer.de>
    
    

    Harti Brandt wrote:
    > TL>Paragraph 6 of:
    > TL>
    > TL> http://www.opengroup.org/onlinepubs/007904975/functions/sscanf.html
    > TL>
    > TL>Implies that the lack of characters in the string following the
    > TL>conversion, due to failure in assignment, should result in an
    > TL>"Input failure". Note also that stdio.h defines EOF as -1.
    >
    > I fail to locate this paragraph. This interpretation would also imply
    > that scanf() always needs to return -1 whenever it cannot match a format
    > specifier.

            The fscanf() functions shall execute each directive of the
            format in turn. If a directive fails, as detailed below, the
            function shall return. Failures are described as input
            failures (due to the unavailability of input bytes) or
            matching failures (due to inappropriate input).

    It comes down to how you interpret the NUL byte at the end of the
    sscanf() input string. Is it an EOF? Or is it an unavailability of
    input bytes? The answer to the question picks which return value
    is correct.

    > TL>I think it can be interpreted either way, still.
    >
    > You miss the section about RETURN VALUE: EOF is return on a read error.
    > This is not an input error.

    How do I distinguish a "return value is -1 as an error result" from
    "return value is -1 as an EOF result"?

    > You should also read the very 1st paragraph. This clearly states, that
    > ISO is the primary source of information and the ISO text is a lot
    > cleaner.

    No, that's not what it actually states; here's the paragraph:

            The functionality described on this reference page is
            aligned with the ISO C standard. Any conflict between
            the requirements described here and the ISO C standard
            is unintentional. This volume of IEEE Std 1003.1-2001
            defers to the ISO C standard.

    It says that any conflicts are unintentional, and their intent was
    to use different language for no good reason, rather than just
    copying it verbatim and removing any doubt. It does *NOT* say
    that no conflicts exist.

    Also: In this context, which is IEEE 1003.1-2001, Issue 6, "the
    ISO C standard" refers to "c89", which is the version of the C
    standard that was in effect at the time that SVID IV was defined.

    If you need clarification on this issue, you should download the
    currently available version of the NIST/PCTS, which specifically
    requires you to compile with a c89 compiler, not one more recent.
    The same is true of The Open Group test suites which are available
    on the Internet.

    The version of the ISO C standard you are quoting from is *NOT*
    the c89 version.

    This makes interpretation ambiguous, since the text you are
    specifically referencing to get the 0 result is text that was
    added to the next version of the standard to clarify it.

    > I think it makes no sense to classify
    >
    > sscanf("123", "%*d%d", ...
    >
    > as an error, but
    >
    > sscanf("123", "%d%d", ...
    >
    > not, does it? Also at least Solaris 9 return -1 but fails to set
    > errno. Which is simply a bug.

    It makes no sense to do conversions without assignment in the
    first place (IMO).

    Also, it makes no sense to call sscanf() on a string that holds fewer
    fields than the format asks for, considering that you are the one
    providing the string in the first place. You are effectively using
    sscanf() to validate an ambiguous set of data as part of its operation.

    I'm not sure that this is reasonable to do. Specifically, none of
    the referenced standards expects this to happen with sscanf(), since
    they do not define, specifically, how the end of the input string
    should be interpreted: EOF vs. unavailability of input bytes. One
    could argue that an unavailability of matching input bytes results
    only from the separator character(s) between conversion specifiers
    not being matched properly. At that point, "%d%d" (or "%*d%d") is a
    nonsensical format string entirely, since any characters that
    would be valid for input to the second specifier would also be valid
    for input to the first: and the matching is, by definition, greedy.

    Really, this is a problem which has occurred because, instead of
    using fscanf() or scanf() on the input stream, you are doing
    some conversion into an internal buffer, presumably to avoid a
    buffer overflow and/or bitch about the standards being specified
    inadequately in comp.lang.c, or on current@freebsd.org.

    In other words, overly anal buffer overflow checking, rather than
    specifying the buffer length in the format string.

    In terms of standards conformance, I'd like to see the output of a
    conformance test suite for ISO C (any version) complaining about the
    -1 return. I think IEEE 1003.1-2001 conformance is probably more
    important, if we have to pick one or the other on the basis of what
    sscanf() is going to return in this manufactured problem case.

    I'd also like to point out that the compiler we are using permits
    the standards conformance version to be chosen at compile time, but
    routines like sscanf(), unless they are inlined in header files,
    are not conditionally selectable based on the version at compile
    time.

    Further, it's quite possible that version conformance, even if it
    were specifiable at compile time, is not specifiable at link time,
    so moving the function into an inline would be the only viable
    approach to dealing with this issue in multiple libraries, each of
    which expects a different version, but which must be linked into a
    single program at the end of things in order to get an application
    using libraries with different expectations.

    So it's pretty stupid for a language standard to specify anything
    other than language syntax (e.g. things like library behaviour).

    In any case, we are practically guaranteed that returning -1, as
    all other UNIX-like OS's currently do, would result in less source
    code breaking.

    Finally, I will point to the current FreeBSD precedents in this
    matter, which are the TCP/IP RFC conformance for 1644 and 1323,
    which were defaulted to "off" after they broke a lot of existing
    code (and Livingston Portmaster terminal servers), and select(2)
    not modifying the contents of the timeval struct to provide an
    accurate value for the remaining timeout prior to the select
    coming true or a signal being received.

    In other words, conformance level has historically been dictated
    by what code is not broken, not what is technically permitted by
    the standards, if you language-lawyer them to death.

    To put it in IETF terms: "Be conservative in what you generate,
    and generous in what you accept".

    -- Terry
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

