Character i/o
Character i/o is used for reading and writing individual bytes, one at a time, and can be used with raw text or raw byte data.
Input
Individual bytes may be read from an arbitrary stream by the int fgetc(FILE *stream) function, which returns the next byte as an unsigned char converted to int. The wider return type is used so that the special EOF (end-of-file) indicator can be returned at end of input or on error without being confused with a valid byte value; EOF is a macro that expands to a negative int value (typically -1).
A common programming bug is to store the result of fgetc in a character type, which is a narrowing conversion. Consider the following code:
T c = fgetc(stdin);
if (c < 0) {
    /* Error */
}
Suppose that:
T is type unsigned char: If fgetc returns EOF, the negative value will be converted to a valid, positive byte value (typically 255). The if statement will never detect errors or end of input.
T is type signed char: If fgetc returns a value greater than SCHAR_MAX, which is the case for half of the possible byte values, then the result of the conversion is implementation-defined; it may raise a signal, or produce an unexpected result. On typical implementations the value wraps to a negative number, so a perfectly valid byte is mistaken for an error.
T is type char: Either of the above situations will occur, depending on whether char is signed or unsigned.
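The unsigned char case can be reproduced without reading any input, by converting EOF by hand. The following is a minimal sketch; the helper names stored_eof_is_detectable and int_eof_is_detectable are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: mimics `unsigned char c = fgetc(stream);` at end of
 * input by converting EOF directly. Returns 1 if a `c < 0` check would still
 * fire, 0 if the EOF indicator was lost in the conversion. */
int stored_eof_is_detectable(void)
{
    /* Wraps modulo UCHAR_MAX + 1; 255 when EOF is -1 and bytes are 8 bits. */
    unsigned char c = (unsigned char)EOF;
    int promoted = c; /* the value the comparison sees after promotion */
    return promoted < 0;
}

/* The same check with an int keeps the negative value intact. */
int int_eof_is_detectable(void)
{
    int c = EOF;
    return c < 0;
}
```

The first function returns 0 and the second returns 1, since an unsigned char can never hold a negative value while EOF is negative by definition.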
Therefore, it is very important to always use an int to store the return value of fgetc. If the return value is not EOF, then it is safe to convert it to an unsigned char if desired:
int ret = fgetc(stdin);
if (ret < 0) {
    /* Error */
} else {
    unsigned char c = ret; /* Ok */
}
Tip
Notice that the comparison used is ret < 0, rather than ret == EOF. The two are equivalent here, because fgetc returns either a non-negative byte value or the negative EOF. On most architectures, comparing a value to zero is at least as cheap as comparing it to any other value, since the latter requires encoding that comparison value into an instruction. On AMD64, for example, a comparison of an int against zero typically compiles to a two-byte test instruction, while a comparison against a small constant such as -1 compiles to a three-byte cmp instruction with an immediate operand. The saving is tiny, but experienced C programmers often take advantage of such free optimizations.
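As a fuller illustration of keeping the result in an int, here is a minimal sketch of a read loop. The helper name count_bytes is hypothetical. Because EOF is returned both at end of input and on failure, the sketch uses the standard ferror function afterwards to tell the two apart.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in `stream`.
 * Returns the count, or -1 if the stream reported a read error. */
long count_bytes(FILE *stream)
{
    long n = 0;
    int ret; /* int, not a character type, so EOF stays distinguishable */

    /* fgetc returns a non-negative byte value or the negative EOF. */
    while ((ret = fgetc(stream)) >= 0) {
        n++;
    }

    /* EOF covers both end of input and failure; ferror separates them. */
    return ferror(stream) ? -1 : n;
}
```

A byte with the value 0xFF is counted like any other, which would not be the case if ret were declared as a char.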
The int getchar(void) function is equivalent to calling fgetc with stdin as its argument. Any performance difference between the two is implementation-specific; the C standard does not make getchar faster.
The int getc(FILE *stream) function is equivalent to fgetc, except that, if it is implemented as a macro, it may evaluate stream more than once, so its stream argument should never be an expression with side effects. In practice, the two behave identically in most modern implementations, leaving the distinction largely a historical artefact.
Peeking
The int ungetc(int c, FILE *stream) function may be used to push a character back onto the specified stream so that it can be read again; the character is pushed back to an internal buffer and does not affect actual external storage. Only one character of pushback is guaranteed; pushing back additional characters may be supported, but the call may also fail and return EOF. Repositioning or writing to the stream discards any pushed-back characters that haven’t been re-read yet. This function is typically used in look-ahead parsers; for example:
int peekc(FILE *stream)
{
    int c = fgetc(stream);
    /* If c is EOF, ungetc fails and leaves the stream unchanged. */
    ungetc(c, stream);
    return c;
}
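The look-ahead pattern is perhaps easiest to see in a small tokenizer. The sketch below uses a hypothetical helper, read_uint, that consumes a run of decimal digits and pushes the first non-digit back so that the next reader sees it. It assumes the value fits in an unsigned long and does no overflow checking.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads a run of decimal digits from `stream` and
 * stores its value in `*out`. Returns the number of digits consumed.
 * Overflow is not handled, to keep the sketch short. */
int read_uint(FILE *stream, unsigned long *out)
{
    unsigned long value = 0;
    int digits = 0;
    int c;

    while ((c = fgetc(stream)) >= 0 && isdigit(c)) {
        value = value * 10 + (unsigned long)(c - '0');
        digits++;
    }
    if (c >= 0) {
        ungetc(c, stream); /* the terminator belongs to the next token */
    }
    *out = value;
    return digits;
}
```

Given the input 123+45, a first call would yield 123 and leave + as the next character on the stream, so the caller can decide how to handle the operator before calling read_uint again.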
Output
The int fputc(int c, FILE *stream), int putchar(int c), and int putc(int c, FILE *stream) functions are the output counterparts of the fgetc, getchar, and getc input functions, respectively. Each writes c, converted to an unsigned char, to the stream; putchar is equivalent to fputc with stdout as its stream argument, and putc may be a macro that evaluates stream multiple times. The return value is the character written, or EOF on error, just as with the input functions.
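Input and output can be combined into a byte-for-byte copy. The following is a minimal sketch with a hypothetical helper name, copy_stream. It checks the return value of fputc as well as that of fgetc, and uses the standard ferror function to distinguish a read error from end of input.

```c
#include <stdio.h>

/* Hypothetical helper: copies the rest of `in` to `out` one byte at a time.
 * Returns 0 on success, -1 if reading or writing failed. */
int copy_stream(FILE *in, FILE *out)
{
    int c; /* int, so that EOF cannot collide with a byte value */

    while ((c = fgetc(in)) >= 0) {
        if (fputc(c, out) < 0) {
            return -1; /* write error */
        }
    }
    return ferror(in) ? -1 : 0;
}
```

Calling copy_stream(stdin, stdout) would behave like a very small cat. Both streams are normally buffered by the C library, so the per-byte calls do not translate into per-byte system calls.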