File Access and Operations

As mentioned previously, files are referred to by their filenames, which are arbitrary strings, with no concept of multi-component paths or any other file system structure. Opening a file allocates and populates a stream object, consisting of additional state information and stream buffers wrapped around the underlying raw file access, and produces a stream handle: a pointer to the opaque FILE data type, i.e. FILE*. Stream handles are passed to the various standard I/O functions to perform operations on that stream.

Removing and Renaming Files

#include <stdio.h>

int remove(char const *filename);
int rename(char const *oldname, char const *newname);

A file may be removed with the remove function, and renamed with the rename function.

For example, a portable mv utility might be implemented as follows:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  if (argc < 2) {
    fprintf(stderr, "%s: Missing file operand\n", argv[0]);
    exit(1);
  } else if (argc < 3) {
    fprintf(stderr, "%s: Missing destination operand\n", argv[0]);
    exit(1);
  } else if (argc > 3) {
    fprintf(stderr, "%s: Too many arguments\n", argv[0]);
    exit(1);
  }
  int result = rename(argv[1], argv[2]);
  if (result != 0) {
    fprintf(stderr, "%s: Operation failed\n", argv[0]);
    exit(1);
  }
  return 0;
}

Similarly, a portable rm (remove) utility might be implemented as follows:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  if (argc < 2) {
    fprintf(stderr, "%s: Missing file operand\n", argv[0]);
    exit(1);
  } else if (argc > 2) {
    fprintf(stderr, "%s: Too many arguments\n", argv[0]);
    exit(1);
  }
  int result = remove(argv[1]);
  if (result != 0) {
    fprintf(stderr, "%s: Operation failed\n", argv[0]);
    exit(1);
  }
  return 0;
}

Temporary Files

#include <stdio.h>

FILE *tmpfile(void);
char *tmpnam(char *s);

The tmpfile function creates a new temporary file, opens it for update in binary mode ("wb+"), and returns a handle to that open stream; the file is automatically removed when it is closed or when the program terminates normally.
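As a minimal sketch of how tmpfile might be used for scratch storage, the following writes a string to a temporary file and reads it back. The helper name roundtrip_via_tmpfile and its parameters are hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `text` to an anonymous temporary file, reads
 * it back into `out` (at most outsz - 1 bytes, always null-terminated), and
 * returns 0 on success or -1 on failure. */
int roundtrip_via_tmpfile(char const *text, char *out, size_t outsz)
{
  if (outsz == 0) return -1;

  FILE *tmp = tmpfile();            /* opened for update; no name to manage */
  if (tmp == NULL) return -1;

  size_t len = strlen(text);
  int status = -1;
  if (fwrite(text, 1, len, tmp) == len) {
    rewind(tmp);                    /* flush output and seek back to the start */
    size_t n = fread(out, 1, outsz - 1, tmp);
    if (!ferror(tmp)) {
      out[n] = '\0';
      status = 0;
    }
  }
  fclose(tmp);                      /* the temporary file is removed here */
  return status;
}
```

Because the file has no name that the program needs to track, there is nothing to clean up beyond closing the stream.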

The tmpnam function, on the other hand, generates a valid file name that is not the same as that of any existing file. This gives the program control over the access mode with which the temporary file is opened, and the file is not deleted automatically when it is closed or when the program exits.

Warning

The tmpnam function is subject to a race condition: between the time that a filename is generated and the time that the program opens that file, another process could create a file with the same name, leading to conflicts. Safer alternatives are often provided as system-specific functions, such as the POSIX mkstemp function, which atomically creates and opens a temporary file with a unique name.

Opening and Closing Files

#include <stdio.h>

FILE *fopen(char const *filename, char const *mode);

The fopen function opens a file with a particular access mode and returns a stream handle for the open file, or a null pointer if the file could not be opened. The access mode is a string with the following meaning:

Access Mode      Mode string  Explanation
Read             "r"          Open a file for reading
Write            "w"          Truncate (or create) a file for writing
Append           "a"          Open (or create) a file for appending
Extended read    "r+"         Open a file for read/write
Extended write   "w+"         Truncate (or create) a file for read/write
Extended append  "a+"         Open (or create) a file for read/appending

When a file is opened in append mode, the position is always set to the end of the file before each write operation, regardless of any intervening repositioning. A stream opened with "a+" may also be read, but its initial read position is not specified by the C standard and varies between implementations.

An additional ‘b’ may be added to any of the above mode strings, such as "rb" or "rb+". This sets the stream to binary mode; otherwise the stream is in text mode by default. A binary mode stream treats its contents as a sequence of bytes, while a text mode stream treats its contents as a series of lines, and may translate line endings, discard non-text bytes, or drop whitespace at the end of a line. Binary mode is usually preferred since it does not alter the underlying data. Under POSIX, the ‘b’ character has no effect, since all files behave as binary files, but including it is important for portability to non-POSIX systems.
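The modes above can be sketched with a short round trip: create a file with "wb", extend it with "ab", and read it back with "rb". The helper name write_append_count is hypothetical, and the file name is whatever the caller supplies.

```c
#include <stdio.h>

/* Hypothetical helper: creates `path` with "wb", appends to it with "ab",
 * then reopens it with "rb" and returns the number of bytes read,
 * or -1 if any step fails. */
long write_append_count(char const *path)
{
  FILE *f = fopen(path, "wb");          /* truncate or create */
  if (f == NULL) return -1;
  int ok = (fputs("first\n", f) != EOF);
  if (fclose(f) != 0 || !ok) return -1;

  f = fopen(path, "ab");                /* writes always go to the end */
  if (f == NULL) return -1;
  ok = (fputs("second\n", f) != EOF);
  if (fclose(f) != 0 || !ok) return -1;

  f = fopen(path, "rb");                /* fails if the file does not exist */
  if (f == NULL) return -1;
  long count = 0;
  while (fgetc(f) != EOF) {
    count++;
  }
  int failed = ferror(f);
  fclose(f);
  return failed ? -1 : count;
}
```

Binary mode is used throughout so that the byte count does not depend on how the platform represents line endings in text mode.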

Standard Streams

#include <stdio.h>

extern FILE *stdin;  /* Standard Input */
extern FILE *stdout; /* Standard Output */
extern FILE *stderr; /* Standard Error */

The standard streams are already open and ready to use when a program starts, and need not be closed before it exits. They are accessed via the macros stdin, stdout, and stderr, which expand to expressions of type FILE*, just like any other stream handle. Importantly, these are macros rather than objects, and they do not necessarily expand to lvalues that can be modified. Attempting to reassign one of the standard streams, e.g. stdin = fopen("input.txt", "r");, is therefore not portable: it may fail to compile, or it may not redirect the stream as intended.

Reopening Streams

#include <stdio.h>

FILE *freopen(char const *pathname, char const *mode, FILE *stream);

As mentioned previously, the standard streams cannot simply be reassigned, but it is often desirable to redirect them. The freopen function is provided specifically to handle this situation; it operates the same as fopen, except that it repurposes the internal FILE object of an existing stream handle rather than creating a new one. For example, stdin could be changed from whatever the program inherited when invoked to instead read from a particular file: freopen("input_file", "r", stdin);.
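A minimal sketch of such a redirection, with its return value checked, might look like the following. The helper name read_line_from_redirected_stdin is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: redirects stdin to read from `path`, then reads the
 * first line into `line`. Returns 0 on success, -1 on failure. After this
 * call, stdin stays attached to `path` for the rest of the program. */
int read_line_from_redirected_stdin(char const *path, char *line, int size)
{
  /* freopen returns a null pointer on failure; the original stream is
   * closed in either case. */
  if (freopen(path, "r", stdin) == NULL) {
    return -1;
  }
  return fgets(line, size, stdin) != NULL ? 0 : -1;
}
```

Any code that reads from stdin afterwards, such as getchar or scanf, then reads from the file without modification.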

Closing a File

#include <stdio.h>

int fclose(FILE *stream);

The fclose function closes the provided stream argument and frees any of its resources. This also has the effect of flushing any pending data to be written–because data may be held in an internal stream buffer, a program that exits without closing its open streams might produce incomplete output. Therefore it is important to close all opened streams before a program terminates in order to avoid resource leaks and data loss.

As part of the normal program exit process, the standard library flushes and closes open streams; this behavior should not, however, be relied on–all explicitly opened streams should be explicitly closed before program termination.
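Because fclose performs the final flush, its return value is worth checking whenever data has been written. The following is a minimal sketch; save_text is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: writes `text` to `path`, returning 0 only if every
 * step, including the final flush performed by fclose, succeeded. */
int save_text(char const *path, char const *text)
{
  FILE *f = fopen(path, "w");
  if (f == NULL) return -1;

  int ok = (fputs(text, f) != EOF);

  /* Buffered data is written out here, so a write failure may only be
   * reported at this point. */
  if (fclose(f) != 0) ok = 0;

  return ok ? 0 : -1;
}
```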

Stream Status Indicators

#include <stdio.h>

int feof(FILE *stream);
int ferror(FILE *stream);
void clearerr(FILE *stream);

Each stream has two boolean status indicators: end-of-file (eof), and error (err). When attempting to read past the end of a file, the end-of-file indicator is set; when an error occurs, the error indicator is set. Each indicator can be queried with the feof and ferror functions, respectively. The clearerr function may be used to clear both status indicators.

These functions never fail and, under POSIX, do not modify errno when given a valid stream, so it is safe to check the error indicator on a stream before checking errno for any additional implementation-defined error information.
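Input functions such as fgetc return EOF both at end-of-file and on error, so the indicators are what distinguish the two cases. A minimal sketch, using the hypothetical helper name count_remaining_bytes:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining on `stream`.
 * Returns the count at end-of-file, or -1 if a read error occurred. */
long count_remaining_bytes(FILE *stream)
{
  long count = 0;
  while (fgetc(stream) != EOF) {
    count++;
  }

  /* EOF alone is ambiguous; the indicators say which condition occurred. */
  if (ferror(stream)) {
    clearerr(stream);
    return -1;
  }
  return count;   /* the end-of-file indicator is set at this point */
}
```

Note that the indicators are checked after a read has returned EOF, not before attempting the read.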

Buffering

As mentioned previously, standard I/O streams use an I/O buffer to move data between program memory and the external environment in large, efficient chunks.

Buffering Modes

#include <stdio.h>

int setvbuf(FILE *stream, char *buf, int mode, size_t size);
void setbuf(FILE *stream, char *buf);

There are three buffering modes that a stream can have: fully buffered, line buffered, and unbuffered. In fully buffered mode, a stream has a fixed-size output buffer which is automatically flushed to external storage whenever it is filled; this is also referred to as block buffering, because data is written in large blocks. Line buffering is the same as full buffering, except that output is also automatically flushed whenever a newline ('\n') character is written to the stream. In unbuffered mode, data is written to external storage as soon as it is available.

When a file is opened, as with fopen, it is buffered by default. If it refers to an interactive device, such as a terminal, it is typically line buffered; otherwise, it is fully buffered. The standard streams stdin and stdout follow the same rule: they are fully buffered only if they do not refer to an interactive device. The stderr stream is never fully buffered by default, and is typically unbuffered.

After a file is opened, but before any other operations have been performed on it, its buffering mode may be modified with the setvbuf function.

The mode argument must be one of _IOFBF, _IOLBF, or _IONBF, which cause I/O to be fully buffered, line buffered, or unbuffered, respectively. If _IOFBF or _IOLBF is specified, the buf argument should point to an array of at least size bytes, which will be used as the buffer. It may also be a null pointer, in which case the library allocates a suitable buffer itself, typically on the next I/O operation. The contents of the buffer while it is in use are indeterminate.

Warning

Recall that one of the effects of closing a stream is that it flushes pending output. If setvbuf is used to assign a buffer to a stream, that buffer must have a lifetime that extends at least until the stream is closed; otherwise the behavior is undefined.
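One simple way to satisfy the lifetime requirement is to give the buffer static storage duration. The sketch below assumes a single stream uses the buffer at a time; open_fully_buffered and stream_buffer are hypothetical names.

```c
#include <stdio.h>

/* Static storage duration: the buffer outlives any stream that uses it.
 * Assumption: only one stream uses this buffer at a time. */
static char stream_buffer[BUFSIZ];

/* Hypothetical helper: opens `path` for writing as a fully buffered stream
 * backed by stream_buffer. Returns a null pointer on failure. */
FILE *open_fully_buffered(char const *path)
{
  FILE *f = fopen(path, "w");
  if (f == NULL) return NULL;

  /* setvbuf must be called before any other operation on the stream. */
  if (setvbuf(f, stream_buffer, _IOFBF, sizeof stream_buffer) != 0) {
    fclose(f);
    return NULL;
  }
  return f;
}
```

A buffer declared as a local array inside a function that returns before the stream is closed would not be safe here, since fclose would then flush from storage that no longer exists.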

An additional setbuf convenience function is provided, which returns no value and is otherwise equivalent to:

void setbuf(FILE *stream, char *buf)
{
  (void) setvbuf(stream, buf, buf ? _IOFBF : _IONBF, BUFSIZ);
}

In other words, if buf is null, the stream is set to unbuffered mode. Otherwise, buf must point to an array of at least BUFSIZ bytes, and the stream is set to fully buffered mode.

Alternating I/O

One important thing to note is that the same buffer is used for both input and output. Stream input methods will cause any buffered output to be flushed, and then the buffer will be filled with input data from the external storage. Similarly, stream output methods will discard any buffered input before placing the output data in the buffer. There is no difference between line and full buffering when the buffer is used for input–it is always completely filled with input data whenever it is depleted.

Because of input prefetching, the real file position can move ahead of the perceived file position. This only becomes a problem if an output method is used after an input method, and data is discarded from the buffer–those discarded bytes represent the difference between the real and perceived file positions.

If the underlying file supports seeking, then the library will update the real file position to the perceived position when switching the buffer from input to output. However, some files, such as terminal devices, do not support seeking, and the failed seek operation results in an error. For this reason, the C standard requires a call to a file positioning function (fseek, fsetpos, or rewind) between an input operation and a following output operation, unless the input operation encountered end-of-file. Likewise, an output operation must be followed by fflush or a file positioning function before the next input operation.
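A minimal sketch of switching safely between input and output on a seekable stream opened for update follows. The helper name capitalize_first_byte is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: upper-cases the first byte of a seekable stream
 * opened for update. Returns 0 on success, -1 on failure. */
int capitalize_first_byte(FILE *stream)
{
  if (fseek(stream, 0L, SEEK_SET) != 0) return -1;

  int c = fgetc(stream);                        /* input operation */
  if (c == EOF) return -1;

  /* Reposition before switching from input to output. */
  if (fseek(stream, 0L, SEEK_SET) != 0) return -1;

  if (fputc(toupper(c), stream) == EOF) return -1;   /* output operation */

  /* Flush so that a later switch back to input is also safe. */
  return fflush(stream) == 0 ? 0 : -1;
}
```

On a stream that cannot seek, the first fseek fails and the function returns before any data is modified.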

Flushing a Buffer

#include <stdio.h>

int fflush(FILE *stream);

The fflush function explicitly flushes (writes) any pending output in a stream’s buffer.
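A common use is making a prompt visible before the program waits for input, since text that does not end in a newline may otherwise remain in the buffer. A minimal sketch, with the hypothetical helper name show_prompt:

```c
#include <stdio.h>

/* Hypothetical helper: writes a prompt that does not end in a newline and
 * flushes it, so that it is visible before the program blocks on input.
 * Returns 0 on success, -1 on failure. */
int show_prompt(FILE *out, char const *prompt)
{
  if (fputs(prompt, out) == EOF) return -1;
  return fflush(out) == 0 ? 0 : -1;
}
```

A typical call would be show_prompt(stdout, "Name: ") immediately before reading a line from stdin.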

Repositioning

#include <stdio.h>

long ftell(FILE *stream);
int fseek(FILE *stream, long offset, int whence);
void rewind(FILE *stream);

Streams support a very basic positioning mechanism, which, as mentioned above, may not be available if the underlying storage does not support seeking.

The ftell function returns the value of the file position indicator, and the fseek function sets the file position indicator relative to whence, which may be SEEK_SET–the beginning of the file; SEEK_CUR–the current position; or SEEK_END–the end of the file.

Text mode streams can only be repositioned with an offset of 0 or with an offset equal to the return value of a previous ftell, with whence set to SEEK_SET.

The rewind function is equivalent to:

void rewind(FILE *stream) {
  fseek(stream, 0L, SEEK_SET);
  clearerr(stream);
}
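As an illustration of ftell and fseek working together, the following sketch measures the size of a stream and then restores the caller's position. It assumes a seekable binary stream on an implementation that supports SEEK_END for such streams, which the C standard does not guarantee; stream_size is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size in bytes of a seekable binary
 * stream and restores the original position, or returns -1 on failure.
 * Assumption: the implementation supports SEEK_END on this stream. */
long stream_size(FILE *stream)
{
  long saved = ftell(stream);
  if (saved < 0) return -1;

  if (fseek(stream, 0L, SEEK_END) != 0) return -1;
  long size = ftell(stream);

  /* Restore the position the caller had before the query. */
  if (fseek(stream, saved, SEEK_SET) != 0) return -1;
  return size;
}
```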

Wide Streams

#include <stdio.h>

int fgetpos(FILE *stream, fpos_t *pos);
int fsetpos(FILE *stream, fpos_t const *pos);

As an extension to the standard I/O library, streams can support multi-byte, or wide, character encodings such as Unicode. The functions for this are exposed through the additional wchar.h header, and most are direct analogues of the byte-oriented functions covered in this module; for example, getwchar is the wide-oriented analogue of the byte-oriented getchar function. Wide streams have an associated parse state, so using the repositioning functions above could move the stream to a location where the current parse state is invalid; with those functions, it is only safe to rewind a wide stream.

A pair of functions, fgetpos and fsetpos, is provided as a means to save and then restore a file position together with its parse state. This is the only safe mechanism for seeking to a location that is not the beginning of a wide stream. The two functions can also be used on narrow streams, where they are essentially equivalent to ftell and fseek.
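The save-and-restore pattern can be sketched on a narrow stream as follows; the same calls apply to wide streams. The helper name peek_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads the next line into `buf` without consuming it,
 * by saving the position first and restoring it afterwards.
 * Returns 0 on success, -1 on failure. */
int peek_line(FILE *stream, char *buf, int size)
{
  fpos_t saved;
  if (fgetpos(stream, &saved) != 0) return -1;

  int ok = (fgets(buf, size, stream) != NULL);

  /* A successful fsetpos also clears the end-of-file indicator. */
  if (fsetpos(stream, &saved) != 0) return -1;
  return ok ? 0 : -1;
}
```

The fpos_t value is opaque: it should only be obtained from fgetpos and passed back to fsetpos on the same stream.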

Wide streams are an advanced topic that will not be discussed further in this course.