Understanding File Descriptors

When a process opens a file, or one of certain other types of “file-like” objects such as sockets, the kernel allocates a structure called a file descriptor, which represents the open file. A file descriptor includes information about the location of the file, the current file offset, the ownership of the file, and additional flags associated with the open file, such as its access mode and synchronization policy. The file descriptor structure also contains a nested file operations structure, which holds function pointers corresponding to all of the fundamental file operations, as we mentioned previously when discussing drivers. Each system call that accesses a file simply looks up the corresponding function pointer in its file descriptor and calls it. This allows for different types of files with different access semantics while still using the same basic interface.

Processes don’t have direct access to file descriptors; instead, the kernel assigns a unique integer to each file descriptor, which is returned to the process as a file descriptor number. This can be thought of as the index into an array of file descriptors maintained by the kernel. System calls that operate on files accept file descriptor numbers as arguments. Each process has its own set of file descriptors and corresponding file descriptor numbers. Whenever a new file descriptor is created, the kernel assigns the lowest available file descriptor number to it.
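The lowest-available rule can be checked with a small experiment. The sketch below is illustrative only: the helper name lowest_fd_is_reused() is hypothetical, and it assumes a single-threaded process on a system where /dev/null exists.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a newly opened file receives the
 * file descriptor number that was just freed, 0 if it does not, and
 * -1 on error. Assumes no other thread opens or closes files meanwhile. */
int
lowest_fd_is_reused(void)
{
  int first = open("/dev/null", O_RDONLY);
  if (first < 0) return -1;
  int second = open("/dev/null", O_RDONLY); /* Gets a higher number */
  if (second < 0) {
    close(first);
    return -1;
  }

  close(first); /* The lower of the two numbers is now free */
  int third = open("/dev/null", O_RDONLY);
  if (third < 0) {
    close(second);
    return -1;
  }
  int reused = (third == first);

  close(third);
  close(second);
  return reused;
}
```

Calling lowest_fd_is_reused() from main() should return 1, because the number released by close(first) is the lowest one available when the third open() runs.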

A process can duplicate an existing file descriptor number with the dup() system call, which simply assigns an additional (lowest available) file descriptor number to an existing file descriptor. The dup2() system call gives processes more control: it allows the caller to specify the new file descriptor number rather than taking the lowest available, and if that number is already in use, it is closed first. Either file descriptor number can then be used to access the same file descriptor, so the two share the file offset, status flags, and other properties. For example, seeking on one file descriptor number also changes the offset as seen from the other. Either file descriptor number can also be closed without affecting the other, so moving a file descriptor to a different number consists of duplicating it and then closing the original.

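#include <unistd.h>

/* Move oldfd to newfd. Returns newfd on success, -1 on error. */
int
movefd(int oldfd, int newfd)
{
  if (oldfd == newfd) return newfd; // Nothing to move
  int r = dup2(oldfd, newfd);       // Duplicate onto the requested number
  if (r == newfd) close(oldfd);     // Then release the original number
  return r;
}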

A child process inherits any open file descriptors from its parent when it is created via fork(). As with duplicated file descriptor numbers, the child and parent share the underlying file descriptors, so a change made through one (such as a seek) is reflected in the other. The table of file descriptor numbers itself is copied, however, so closing or reassigning a file descriptor number in one process does not affect the other. If the child process then calls exec(), its open file descriptors are retained while the new program executes. This allows processes to set up file descriptors on behalf of their children, and it is how the shell implements redirection and pipes. A process can also prevent a file descriptor from being retained across an exec() by setting the file descriptor’s close-on-exec flag, which is typically done for security purposes, to avoid “losing control” over an open file by allowing another program to gain access to it.
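To make the redirection idea concrete, here is a rough sketch of what a shell might do for a command such as echo hello > path, together with a helper that sets the close-on-exec flag. Both function names (set_cloexec() and run_echo_redirected()) are hypothetical, and the sketch assumes an echo program can be found on the PATH; a real shell handles many more cases.

```c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: mark fd to be closed automatically on exec().
 * Returns 0 on success, -1 on error. */
int
set_cloexec(int fd)
{
  int flags = fcntl(fd, F_GETFD);
  if (flags < 0) return -1;
  return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Hypothetical helper: run "echo hello" with standard output
 * redirected to path. Returns the child's exit status, or -1 on error. */
int
run_echo_redirected(const char *path)
{
  pid_t pid = fork();
  if (pid < 0) return -1;

  if (pid == 0) {
    /* Child: set up the descriptors, then replace the program */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) _exit(127);
    if (fd != STDOUT_FILENO) {
      if (dup2(fd, STDOUT_FILENO) < 0) _exit(127);
      close(fd); /* Only the standard output number is needed now */
    }
    execlp("echo", "echo", "hello", (char *)NULL);
    _exit(127); /* Reached only if exec failed */
  }

  /* Parent: its own standard output is untouched */
  int status;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The redirection happens between fork() and exec(), so it affects only the child. The new program simply writes to standard output and never needs to know that it refers to a file.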

Let’s look at a simple example of setting up a pipe between two processes. Here, the parent process creates a new pipe with the pipe() system call, which fills the two-element pipe_fds array with two new file descriptors: the read and write sides of the pipe, in that order[1]. In the child process, the write side is closed, and the read side file descriptor is moved to standard input; tr will use this as its source of input. In the parent, the read side is closed, and the write side is wrapped in a stream to simplify writing the message to it.

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

char const msg[] =
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod "
    "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim "
    "veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea "
    "commodo consequat. Duis aute irure dolor in reprehenderit in voluptate "
    "velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint "
    "occaecat cupidatat non proident, sunt in culpa qui officia deserunt "
    "mollit anim id est laborum.";

int
movefd(int oldfd, int newfd)
{
  return (oldfd == newfd) || (dup2(oldfd, newfd) == newfd && close(oldfd) == 0)
             ? newfd
             : -1;
}

int
main(void)
{
  int pipe_fds[2];
  if (pipe(pipe_fds) < 0) err(1, "creating pipe");

  switch (fork()) {
    case 0:
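      close(pipe_fds[STDOUT_FILENO]); // Close write side
      // Redirect stdin to the pipe
      if (movefd(pipe_fds[STDIN_FILENO], STDIN_FILENO) < 0)
        err(1, "redirecting stdin");
      // The argument list must end with a null pointer of type char *
      execlp("tr", "tr", "[:lower:]", "[:upper:]", (char *)NULL);
      err(1, "exec");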
    case -1:
      err(1, "fork");
    default:
      close(pipe_fds[STDIN_FILENO]); // Close read side
      FILE *fp = fdopen(pipe_fds[STDOUT_FILENO], "w");
      if (!fp) err(1, "fdopen");
      fwrite(msg, 1, sizeof msg - 1, fp); // Omit the terminating NUL byte
      fclose(fp); // Flushes the stream and closes the write side
      wait(NULL);
  }
}

Notice a few important details. The unused ends of the pipe are closed in each process, and the process with the write side of the pipe open (the parent in this case) closes that end when it is done sending data; this also implicitly flushes the stream buffer that we created when wrapping that end of the pipe in a stream. If the write side of the pipe is not closed once all of the data has been sent, the process reading from the pipe will hang while waiting for additional data or end-of-file. Only once every file descriptor number referring to the write side has been closed, including the copy inherited by the child, can the end-of-file condition (read() returns 0) be observed on the read side. We will see later how a similar idea is implemented with respect to shutting down sockets.
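The end-of-file behavior can be seen in isolation within a single process. This is a minimal sketch under stated assumptions: the helper name drain_after_close() is hypothetical, and the data must be small enough to fit in the pipe's buffer, since nothing reads from the pipe while the write is in progress.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes into a fresh pipe, close the
 * write side, then read until end-of-file. Returns the number of bytes
 * read before read() returned 0, or -1 on error. Assumes len is small
 * enough that the write does not block. */
long
drain_after_close(const char *data, size_t len)
{
  int fds[2];
  if (pipe(fds) < 0) return -1;

  if (write(fds[1], data, len) != (ssize_t)len) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  close(fds[1]); /* Last write-side descriptor: EOF is now observable */

  long total = 0;
  char buf[64];
  ssize_t n;
  while ((n = read(fds[0], buf, sizeof buf)) > 0) total += n;

  close(fds[0]);
  return n == 0 ? total : -1; /* n == 0 means end-of-file was reached */
}
```

If the close(fds[1]) call were removed, the final read() would block forever, because the process itself would still hold a descriptor for the write side.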