Bulletproof reading of ints (was Re: string concatenation)
From: CBFalconer (cbfalconer_at_yahoo.com)
Date: 03/09/04
- Next message: Eric: "changing a field type in libtiff"
- Previous message: Lorinczy Zsigmond / Domonyik Mariann: "tar: setting the date of tar's output file"
- In reply to: Al Bowers: "Re: string concatenation"
- Next in thread: CBFalconer: "Re: string concatenation"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Tue, 09 Mar 2004 16:58:03 GMT
Al Bowers wrote:
> Martin Stromberg wrote:
> >
> > I always use if( sscanf(argv[i], "%d %c", &my_int, &a_char ) != 1 )
> > { error_handling }.
> >
> > You're welcome to point out holes in that.
>
> Isn't there a flaw with both described routines? What will
> result in integer overflow (underflow)?
>
> Using function strtol, you can catch outside of range by
> using the variable errno.
>
> Example, using the string value of "2147483648" should
> be outside the range of a type int on my implementation. Using
> the sscanf routine, this overflow is not caught. It can be
> caught using strtol and checking the errno value.
>
> The output of the following code is:
>
> value = 2147483648
> Using sscanf routine: my_int = 2147483647
> Error in coversion using a strtol routine
... snip code ...
Here is my "bulletproof" input from file streams routine. I am
not happy with the EOF treatment on systems with non-sticky EOF
because of the action of unget(c) on EOF. They also overdo
casting to UCHAR. flushln should return ch. They should better
return positive int for error, 0 for no error, and EOF for EOF.
Setting errno would also be worthwhile. They should all be
written for longs in the first place, which changes virtually
nothing except some spelling.
Adaptation for string input a la strto* would also be easy.
/* ------------------------------------------------- *
* File txtinput.c *
* ------------------------------------------------- */
#include <limits.h> /* xxxx_MAX, xxxx_MIN */
#include <ctype.h> /* isdigit, isblank, isspace */
#include <stdio.h> /* FILE, getc, ungetc */
#include "stdops.h" /* bool, true, false */
#include "txtinput.h"
#define UCHAR unsigned char
/* These stream input routines are written so that simple
* conditionals can be used:
*
* if (readxint(&myint, stdin)) {
* do_error_recovery; normally_abort_to_somewhere;
* }
* else {
* do_normal_things; usually_much_longer_than_bad_case;
* }
*
* They allow overflow detection, and permit other routines to
* detect the character that terminated a numerical field. No
* string storage is required, thus there is no limitation on
* the length of input fields. For example, a number entered
* with a string of 1000 leading zeroes will not annoy these.
*
* The numerical input routines *NEVER* absorb a terminal '\n'.
* Thus a sequence such as:
*
* err = readxint(&myint, stdin);
* flushln(stdin);
*
* will always consume complete lines.
*
* They are also re-entrant, subject to the limitations of file
* systems. e.g interrupting readxint(v, stdin) operation with
* a call to readxwd(wd, stdin) would not be well defined, if
* the same stdin is being used for both calls. If ungetc is
* interruptible the run-time system is broken.
*/
/*--------------------------------------------------------------
* Skip all blanks on f. At completion getc(f) will return
* a non-blank character, which may be \n or EOF
*
* Skipblks returns the char that getc will next return, or EOF.
*/
int skipblks(FILE *f)
{
int ch;
do {
ch = getc(f);
} while ((' ' == ch) || ('\t' == ch));
/* while (isblank((UCHAR)ch)); */ /* for C99 */
return ungetc(ch, f);
} /* skipblks */
/*--------------------------------------------------------------
* Skip all whitespace on f, including \n, \f, \v, \r. At
* completion getc(f) will return a non-blank character, which
* may be EOF
*
* Skipblks returns the char that getc will next return, or EOF.
*/
int skipwhite(FILE *f)
{
int ch;
do {
ch = getc(f);
} while (isspace((UCHAR)ch));
return ungetc(ch, f);
} /* skipwhite */
/*--------------------------------------------------------------
* Read an unsigned value. Signal error for overflow or no
* valid number found. Returns true for error, false for noerror
*
* Skip all leading whitespace on f. At completion getc(f) will
* return the character terminating the number, which may be \n
* or EOF among others. Barring EOF it will NOT be a digit. The
* combination of error and the following getc returning \n
* indicates that no numerical value was found on the line.
*
* If the user wants to skip all leading white space including
* \n, \f, \v, \r, he should first call "skipwhite(f);"
*
* Peculiarity: This specifically forbids a leading '+' or '-'.
* Peculiarity: This forbids overflow, unlike C unsigned usage.
* on overflow, UINT_MAX is returned.
*/
bool readxwd(unsigned int *wd, FILE *f)
{
unsigned int value, digit;
bool status;
int ch;
#define UWARNLVL (UINT_MAX / 10U)
#define UWARNDIG (UINT_MAX - UWARNLVL * 10U)
value = 0; /* default */
status = true; /* default error */
do {
ch = getc(f);
} while ((' ' == ch) || ('\t' == ch)); /* skipblanks */
/* while (isblank((UCHAR)ch)); */ /* for C99 */
if (!(EOF == ch)) {
if (isdigit((UCHAR)ch)) /* digit, no error */
status = false;
while (isdigit((UCHAR)ch)) {
digit = (unsigned) (ch - '0');
if ((value < UWARNLVL) ||
((UWARNLVL == value) && (UWARNDIG >= digit)))
value = 10 * value + digit;
else { /* overflow */
status = true;
value = UINT_MAX;
}
ch = getc(f);
} /* while (ch is a digit) */
}
*wd = value;
ungetc(ch, f);
return status;
} /* readxwd */
/*--------------------------------------------------------------
* Read a signed value. Signal error for overflow or no valid
* number found. Returns true for error, false for noerror. On
* overflow either INT_MAX or INT_MIN is returned in *val.
*
* Skip all leading whitespace on f. At completion getc(f) will
* return the character terminating the number, which may be \n
* or EOF among others. Barring EOF it will NOT be a digit. The
* combination of error and the following getc returning \n
* indicates that no numerical value was found on the line.
*
* If the user wants to skip all leading white space including
* \n, \f, \v, \r, he should first call "skipwhite(f);"
*
* Peculiarity: an isolated leading '+' or '-' NOT immediately
* followed by a digit will return error and a value of 0, when
* the next getc will return that following non-digit. This is
* caused by the single level ungetc available.
*/
bool readxint(int *val, FILE *f)
{
unsigned int value;
bool status, negative;
int ch;
*val = value = 0; /* default */
status = true; /* default error */
negative = false;
do {
ch = getc(f);
} while ((' ' == ch) || ('\t' == ch)); /* skipwhite */
/* while (isblank((UCHAR)ch)); */ /* for C99 */
if (!(EOF == ch)) {
if (('+' == ch) || ('-' == ch)) {
negative = ('-' == ch);
ch = getc(f); /* absorb any sign */
}
if (isdigit((UCHAR)ch)) { /* digit, no error */
ungetc(ch, f);
status = readxwd(&value, f);
ch = getc(f); /* This terminated readxwd */
}
if (negative && (value < UINT_MAX) &&
((value - 1) <= -(1 + INT_MIN))) *val = -value;
else if (value <= INT_MAX) *val = value;
else { /* overflow */
status = true;
if (value)
if (negative) *val = INT_MIN;
else *val = INT_MAX;
}
}
ungetc(ch, f);
return status;
} /* readxint */
/*--------------------------------------------------------------
* Flush input through an end-of-line marker inclusive.
*/
void flushln(FILE *f)
{
int ch;
do {
ch = getc(f);
} while (('\n' != ch) && (EOF != ch));
} /* flushln */
/* End of txtinput.c */
If you are wondering about stdops.h, here it is:
/* Standard defines of operators, usable on C90 up */
#ifndef stdops_h
#define stdops_h
#if defined(__STDC__) && (__STDC_VERSION__ >= 199901L)
/* The following from C99 - must define for C90 */
#include <stdbool.h> /* define bool, true, false */
#include <iso646.h> /* define not, and, or */
#else
#define false 0
#define true 1
typedef int bool;
#define not !
#define and &&
#define or ||
#define xor ^
#endif
#endif
-- Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net) Available for consulting/temporary embedded and systems. <http://cbfalconer.home.att.net> USE worldnet address!
- Next message: Eric: "changing a field type in libtiff"
- Previous message: Lorinczy Zsigmond / Domonyik Mariann: "tar: setting the date of tar's output file"
- In reply to: Al Bowers: "Re: string concatenation"
- Next in thread: CBFalconer: "Re: string concatenation"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|