Formatted Input

The int fscanf(FILE *stream, char const *fmt, ...) function works like printf in reverse, but with slightly different semantics. Input is read from stream and matched, in order, against the directives listed in the fmt format string. Each directive is one of the following,

  • A sequence of white-space characters; or,

  • An ordinary character (not ‘%’ and not white-space); or,

  • A conversion specification

A whitespace directive consumes any whitespace in the stream, reading up to, but not consuming, the first non-whitespace character. It also succeeds when no whitespace is present. This is sometimes referred to as “whitespace munching”.

An ordinary character directive consumes exactly that character from the input stream. If the next character in the stream is different, it is left unread and matching stops.

A conversion specification directive consumes an input item, which is the longest sequence of input that matches the format expected by the conversion. The input item is converted to the requested type and stored in the object pointed to by the corresponding argument.

For the following example, the input “(3, 5)” will result in a value of 3 stored in x, and a value of 5 stored in y,

int x, y;
fscanf(stdin, "(%d, %d)", &x, &y);
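The three kinds of directive can be seen at work in a small sketch. The helper name parse_point is hypothetical, and sscanf is used in place of fscanf only so that the input can be given as a string; the format string is the same as above.

```c
#include <stdio.h>

/* Hypothetical helper: parses text of the form "(x, y)".
 * Returns 1 when both integers were converted, 0 otherwise.
 * Note that the return value cannot tell whether the final ')'
 * matched, because that directive assigns nothing. */
int parse_point(const char *text, int *x, int *y)
{
    /* '(' and ',' are ordinary character directives.
     * The space after the comma is a whitespace directive: it matches
     * any amount of whitespace, including none.
     * Each %d is a conversion specification, and skips leading
     * whitespace by itself. */
    return sscanf(text, "(%d, %d)", x, y) == 2;
}
```

With this sketch, “(3, 5)”, “(3,5)” and “( 3,   5)” are all accepted, while “(3 , 5)” is rejected: after 3 is converted, the ‘,’ directive meets a space, and there is no whitespace directive in front of it to consume that space.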

The semantics of the conversion characters are slightly different from those used with printf; some commonly used conversions are listed below,

‘d’, ‘i’

The ‘d’ specifier matches a signed integer written in base 10; the ‘i’ specifier matches a signed integer whose base is deduced from its prefix: a leading 0x or 0X selects hexadecimal, a leading 0 selects octal, and anything else is decimal. Both require an argument of type int *.

Leading whitespace is discarded.

In the below example, the input “123 0x7f” will cause the value 123 to be stored in x, and the value 127 to be stored in y,

fscanf(stdin, "%d %i", &x, &y);
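The difference between the two specifiers shows most clearly when the same text is given to both. The following is a minimal sketch; the helper names are hypothetical, and sscanf stands in for fscanf so that the input can be a string.

```c
#include <stdio.h>

/* Hypothetical helpers: each returns the number of items assigned. */

/* %d always reads base 10. */
int scan_decimal(const char *text, int *out)
{
    return sscanf(text, "%d", out);
}

/* %i deduces the base from the prefix: 0x/0X hexadecimal, 0 octal,
 * otherwise decimal. */
int scan_any_base(const char *text, int *out)
{
    return sscanf(text, "%i", out);
}
```

Given “017”, scan_decimal stores 17 while scan_any_base stores 15. Given “0x7f”, scan_any_base stores 127, but scan_decimal stores 0: it converts the leading “0” and stops at the ‘x’, which stays unread.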

‘o’, ‘u’, ‘x’

The ‘o’ specifier matches an optionally signed octal integer (base 8); the ‘u’ specifier matches an optionally signed decimal integer (base 10); and the ‘x’ specifier matches an optionally signed hexadecimal integer (base 16), and is case-insensitive. The argument must be of type unsigned int *. If the converted value is negative (preceded by a ‘-’), then it will be stored using modular unsigned arithmetic, as is the normal behavior when assigning a negative value to an unsigned type.

Leading whitespace is discarded.
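A short sketch of the three unsigned conversions follows. The helper names are hypothetical, sscanf is used so that the input can be a string, and the wrap-around result for “-1” is what common implementations such as glibc produce, in line with the description above.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helpers: each returns the number of items assigned. */

int scan_octal(const char *text, unsigned int *out)
{
    return sscanf(text, "%o", out);   /* base 8 */
}

int scan_unsigned(const char *text, unsigned int *out)
{
    return sscanf(text, "%u", out);   /* base 10 */
}

int scan_hex(const char *text, unsigned int *out)
{
    return sscanf(text, "%x", out);   /* base 16, either letter case */
}
```

Here scan_octal stores 15 for “17”, and scan_hex stores 127 for both “7F” and “0x7f”, since an optional 0x prefix is accepted. Passing “-1” to scan_unsigned stores UINT_MAX, which is the result of converting -1 to unsigned int.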

‘c’

Matches exactly one character, or a sequence of characters equal in length to the field width, if one is specified. The argument must be a char * that points to an array large enough to hold the requested number of characters. A terminating null byte is not appended.

Leading whitespace is not skipped; whitespace characters are matched by this directive like any other character.
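Both properties of ‘c’, the missing terminator and the treatment of whitespace, are easy to trip over, so a small sketch may help. The helper names are hypothetical and sscanf is used so that the input can be a string.

```c
#include <stdio.h>

/* Hypothetical helper: reads exactly four characters into out, which
 * must have room for at least five bytes. Returns 1 on success. */
int read_four_chars(const char *text, char out[5])
{
    if (sscanf(text, "%4c", out) != 1)
        return 0;
    out[4] = '\0';   /* %c adds no terminator, so add one by hand */
    return 1;
}

/* Reads the very next character, whitespace included. */
int first_char(const char *text, char *out)
{
    return sscanf(text, "%c", out) == 1;
}

/* A whitespace directive in front of %c consumes leading whitespace
 * first, so the first non-whitespace character is read instead. */
int first_nonspace_char(const char *text, char *out)
{
    return sscanf(text, " %c", out) == 1;
}
```

For the input “ ab cd”, read_four_chars yields the four characters “ ab ”, spaces included. For “   x”, first_char yields a space, while first_nonspace_char yields ‘x’.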

‘%’

Matches a single ‘%’ character; no conversion or assignment occurs.

The return value is the number of input items successfully assigned, which can be fewer than requested (even 0) in the event of an early matching failure. If an input failure occurs before the first conversion completes, the return value is EOF.
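Because a partial match leaves the remaining arguments untouched, the return value should be compared against the number of conversions requested. The following sketch shows the possible outcomes for a two-item format; the helper names are hypothetical and sscanf is used so that the input can be a string.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read two integers and returns the
 * raw result of the scan (2, 1, 0, or EOF). */
int read_pair(const char *text, int *a, int *b)
{
    return sscanf(text, "%d %d", a, b);
}

/* Hypothetical helper: turns that result into a short description. */
const char *describe_result(int result)
{
    if (result == EOF)
        return "input failure before the first conversion";
    if (result == 2)
        return "both items assigned";
    if (result == 1)
        return "matching failure after the first item";
    return "matching failure, nothing assigned";
}
```

With this sketch, “10 20” gives 2, “10 abc” gives 1 (only the first integer is stored), “abc” gives 0, and an empty string gives EOF.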

Additional Input Functions

As with fprintf, additional versions of fscanf are available, each of which reads its input from a different location,

  • int scanf(char const *fmt, ...) is equivalent to fscanf with stdin as its stream argument.

  • int sscanf(char const *s, char const *fmt, ...) reads its input from the string pointed to by s, rather than from a stream.

Additionally, a v* version of each function is provided, which takes a va_list (from <stdarg.h>) in place of the variable argument list,

  • int vfscanf(FILE *stream, char const *fmt, va_list ap);

  • int vscanf(char const *fmt, va_list ap);

  • int vsscanf(char const *s, char const *fmt, va_list ap);
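The v* versions are mainly useful for writing one's own variadic wrappers around the scanf family. As a minimal sketch, the hypothetical function scan_exactly below forwards its arguments to vsscanf and reports whether the expected number of items was assigned.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: scans text according to fmt and returns 1 only
 * if exactly `expected` items were assigned, 0 otherwise. */
int scan_exactly(int expected, const char *text, const char *fmt, ...)
{
    va_list ap;
    int assigned;

    va_start(ap, fmt);                   /* collect the pointer arguments */
    assigned = vsscanf(text, fmt, ap);   /* hand them to the v* version */
    va_end(ap);

    return assigned == expected;
}
```

A call such as scan_exactly(2, "7 8", "%d %d", &a, &b) returns 1 and stores 7 and 8, while the input “7 x” makes it return 0.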

Caveats

Although it is often presented in introductory programming courses, the scanf family of functions is used very little in practice, because it offers little control over conversions and does not allow intervention to remedy failed conversions. Additionally, several of the conversion directives ("%s" and "%[...]") can produce buffer overflows when used without a field width. Instead, programmers generally implement custom scanners and parsers using other library functions, such as strtoul, which convert strings to integers and other types. More complex parsing often involves special-purpose parser generators and lexical analysis tools, such as POSIX’s yacc and lex.
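As an illustration of the extra control that strtoul gives, here is a minimal sketch of a strict parser for an unsigned decimal number. The function name parse_ulong and its acceptance rules (no minus sign, no trailing characters) are assumptions made for this example, not a standard interface.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to an unsigned long.
 * Returns 1 and stores the value on success, 0 on any error. */
int parse_ulong(const char *text, unsigned long *out)
{
    char *end;
    unsigned long value;

    /* strtoul itself skips whitespace and accepts a sign, so look past
     * the whitespace and reject a minus sign explicitly. */
    while (isspace((unsigned char)*text))
        text++;
    if (*text == '-')
        return 0;

    errno = 0;
    value = strtoul(text, &end, 10);

    if (end == text)        /* no digits were found */
        return 0;
    if (errno == ERANGE)    /* value does not fit in unsigned long */
        return 0;
    if (*end != '\0')       /* unconverted characters remain */
        return 0;

    *out = value;
    return 1;
}
```

Each failure is detected separately, so a caller could report it or recover from it, which is the kind of intervention that fscanf does not allow. In this sketch “42” and “  7” are accepted, while “42abc”, “-1”, an empty string, and a number too large for unsigned long are all rejected.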