Multiprogramming

Multiprogramming is the ability of an operating system to hold more than one process running in memory at the same time. When we open a text editor to write a program, and at the same time we open a Web Browser to review a website, while listening to music from our favorite streaming service, all done in the same computer, we are using multiprogramming. Each one of these activities may generate one or more processes that are trying to run on the computer. From the point of view of the user, all these processes may appear to be running at the same time, but all of them require access to a CPU to perform their instructions. In the simplest case with a single CPU in the computer, all processes will take turns accessing it. The operating system is the one deciding which process accesses the CPU at any given time and for how long. Therefore, the decisions that it makes will affect the performance of each one of the applications running in the system. This is one of the main responsibilities for the operating system, as a CPU scheduler.

Forking

In Linux, multiprogramming is achieved by a technique called forking. This requires that a process “clones” itself using the fork system call with no parameters. When a process makes this call, the kernel creates a copy of the current process that will run independently and concurrently with the original process and all other processes already in the system. The newly created process (called the child process) receives copies of all segments from the original process (called the parent process), as well as copies of the file descriptors. After the call to fork, both processes may begin execution of the next sentence in the program. If no allowances are made, both processes may execute the same sentences when they take control of the CPU. However, the call to fork returns different values for parent and child processes. The parent process receives the process ID from the children just created, however the children process receives a value of zero. This fact can be used to request each process to perform different instructions. simple_fork.c shows a process that forks. The value this system call returns is used in a switch statement to assign different instructions to parent and child processes. Notice that if the call to fork returns a -1 if the call fails. The fork system call is declared in the unistd.h header.

simple_fork.c

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
  pid_t pid;
  pid_t fork_result; // status of the fork

  switch (fork_result = fork()) {
    case -1: /* error */
      err(1, "fork");

    /* The child proc sees the return value from fork() as 0 */
    case 0:
      /* Child Zone */
      pid = getpid();
      printf("I am the child process.\n"
             "\tThe fork gave me a %d, but my PID is %d\n",
             fork_result,
             pid);

      /* Typically, the child has its own exit path distinct from the parent,
       * though it could also be allowed to continue executing past the switch
       * block */
      exit(0);

    /* The parent proc sees the return value from fork() as the child's pid */
    default:
      /* Parent zone */
      pid = getpid();
      printf("I am the parent process.\n"
             "\tThe fork gave me my child PID (%d), but my own PID is %d\n",
             fork_result,
             pid);
      break;
  }
  /* Parent waits for child to exit */
  wait(0);
  /* Parent exit */
}

When the program runs, it produces a result like:

$ ./simple_fork
I am the parent process.
	The fork gave me my child PID (97160), but my own PID is 97159
I am the child process.
	The fork gave me a 0, but my PID is 97160

Since the parent and child run in parallel after forking, it is also possible to observe,

$ ./simple_fork
I am the parent process.
	The fork gave me my child PID (97160), but my own PID is 97159
I am the child process.
	The fork gave me a 0, but my PID is 97160

As previously indicated, copies of the file descriptors from a parent process are given to the child process. Because they are copies, if any of the processes modifies the attributes of the files they reference, like offset placement or access mode, all these changes will be visible and relevant to both processes. If the programmer wants to keep different attributes, they should create new file descriptors after fork.

Even though a child process should receive copies of the page table after fork, this is actually done only when the children require a change in any of them. This strategy is called “copy-on-write” (CoW) and it avoids doing copies of resources that may not be needed. Before that, both parent process and children process just keep references to the same segments. If any of the processes requires to update the stack or heap, or maybe load a different set of instructions for the text, this is when the kernel will generate the copy with the appropriate change. This is particularly useful in cases where a child is created to run a different program than the one in the parent process, as we will see later, since it avoids expensive copying of a process image that is about to be replaced.

A typical application of forking is to have a main program that generates tasks, each one to be handled by a different child process. The parent process waits for these processes to complete. The following listing shows the basic structure of such an application in a program called fork_cycle.c.

fork_cycle.c

#include <err.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
  pid_t pid;
  int fork_limit;       /* Number of requested forks */
  pid_t child_exit_pid; /* PID of returning child */
  int child_exit_count; /* Number of children finishing its tasks */
  int time = 0;         /* Sleeping time for a child process */
  int i;                /* Counter */

  if (argc != 2 || ((fork_limit = atoi(argv[1])) < 1)) {
    err(1, "Usage: %s FORK_LIMIT", argv[0]);
    exit(1);
  }

  /* Create fork_count child processes */
  for (i = 0; i < fork_limit; i++) {
    switch (fork()) {
      case -1:
        err(1, "fork");

      case 0: // Child prints, sleeps and exits
        pid = getpid();
        fprintf(stderr,
                "Process %d created. Waiting %d seconds.\n",
                pid,
                fork_limit - i);
        sleep(fork_limit - i);
        exit(0);
    }
  }

  child_exit_count = 0;
  for (;;) {
    child_exit_pid = wait(NULL);

    if (child_exit_pid < 1) {
      if (errno == ECHILD) {
        /* No more children */
        errno = 0;
        break;
      }
      err(1, "wait");
    }
    child_exit_count++;
    printf("Child #%d (PID %d) exited\n",
           child_exit_count,
           child_exit_pid);
  }
  printf("All children have exited!\n");
  exit(0);
}

$ ./fork_cycle 7
Process 98716 created. Waiting 6 seconds.
Process 98715 created. Waiting 7 seconds.
Process 98717 created. Waiting 5 seconds.
Process 98718 created. Waiting 4 seconds.
Process 98720 created. Waiting 2 seconds.
Process 98719 created. Waiting 3 seconds.
Process 98721 created. Waiting 1 seconds.
Child #1 (PID 98721) exited
Child #2 (PID 98720) exited
Child #3 (PID 98719) exited
Child #4 (PID 98718) exited
Child #5 (PID 98717) exited
Child #6 (PID 98716) exited
Child #7 (PID 98715) exited
All children have exited!

This program expects an integer parameter to indicate the number of children to be created. A loop creates these children. Each one could be performing a different task, but in this program, they just pause for a different number of seconds with the sleep system call and then exit. The parent process waits for them with the wait system call in a loop; wait returns the process id of the exiting child process. If no more children remain, it fails and sets errno = ECHILD. The parent process increments the count of returning children after any new one is received. The loop ends when all children have returned.

The exec() system call

Sometimes it is necessary for a child process to forgo the sets of instructions it inherited from its parent process and open a complete new program.This can be done with the exec system call. Once this call is made, the kernel will reload the virtual memory address space of the child process with a new program and update the registers to start the process. However, many other elements inherited from the parent process will remain in place. The process will keep its PID, as well as its parent ID, process group IDs, and session ID. It also keeps all the environment variables inherited from the parent process and the file descriptors with all its settings.The process will execute the new code and when it ends, it will exit and never come back to the parent process. There will be no notification of completion or return to the parent process.

In reality, the exec system call does not exist. This is just the generic name for a set of system calls, all beginning with the exec prefix. The most basic system call is execve. All other variants are convenient ways to prepare the call to execve with different parameters. Basically, all these system calls require three main parameters: 1.The location and name of the program to be executed, 2. The set of all the strings containing the arguments for the program to be executed, just like the argv parameter that would be created for that program, and 3.The environment variables that are passed to the program to be executed.The variants of this system call prepare these parameters in different ways. It is easier to remember all of them if we relate their names with the parameters they request as described below.

The first decision a programmer has to make on selecting an exec system call is to decide if the arguments for the program to be executed are going to be supplied together arranged inside an array (just like argv[]) or if they are going to be supplied separately in an explicit list. In the first case we must select a system call starting with the execv prefix, for a vector (another name for an array). In the second case we must select a system call starting with the execl prefix, for a list. These system calls have the following prototypes:

execv(const *pathname, char *const argv[])
int execl(const char *pathname, const char *arg, ...
               /*, (char *) NULL */);

Both require a pointer to the pathname and filename of the program to be executed as the first parameter. The pathname can be absolute or relative. The arguments are passed in either as an array of char *, or as a variadic list of char *. The first argument should be the name of the program–usually the same as pathname. Next follow the remaining arguments, and finally a null pointer to indicate the end of the array or argument list. Notice that there is no “number of arguments” argument–these functions have no way to know when they’ve reached the end of the list if you forget a null pointer! (this is a very common mistake).

Note

In some cases, the first argument is not the same as the file path. For example, some monolithic programs present themselves as two separate programs. An example is the program vim which will change its behavior when executed with its first argument as vi. This is also used in other cases, such as to have a single utility implement both zip and unzip. Taken to an extreme, the BusyBox project packages an entire set of POSIX utilities into a single monolithic program for use on resource-constrained embedded systems. All utilities in /bin/ are just links to this one program, which basically has a massive lookup table based on the value of argv[0].

Aside from execl and execv, there are two other sets of functions in the exec family; the first are execlp and execvp. These two are the most commonly used in practice; they differ from those above in that they perform command lookup by searching the directories listed in the PATH environment variable–just like the shell does–if the pathname argument doesn’t contain any slashes (/). Otherwise the behavior is identical.

The second set of functions are execle and execve. These functions allow the caller to pass in a custom set of environment variables, rather than have them inherited by the child process. They both have the same function signature as the previous functions, with an additional final argument which is a pointer to a null-terminated list of null-terminated environment strings. These are typically used for security purposes, to empty the environment of a child process before it is executed, so that things such as the user’s name and other configuration settings aren’t leaked to untrusted processes,

int execle(const char *pathname, const char *arg, ... /*, (char *) NULL, char *const envp[] */);
int execve(const char *file, char *const argv[], char *const envp[]);

Note

execle takes a variadic list of arguments, but it requires an array of environment strings, like execve

Finally, there is one more pair of functions, execvpe and execlpe which combine the properties of the e functions with the properties of the p functions, accepting an additional environment variable argument, and performing command lookup if the pathaname argument doesn’t contain any slashes.

Note

execvpe is a nonstandard GNU-extension.

Consider the following program, xargs.c, which is a basic implementation of the xargs utility. It repeatedly executes the command that is passed to it, with any prefix arguments, followed by additional arguments taken as white-space separated tokens on standard input, up to the argument limit ARG_MAX,

xargs.c

#include <assert.h>
#include <ctype.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <wait.h>

/* Maximum arguments to pass to each instance of a child process */
#ifndef ARG_MAX
#define ARG_MAX 1023
#endif

/* Command default when argc < 2 */
#ifndef DFL_CMD
#define DFL_CMD "echo"
#endif

char *args[ARG_MAX + 1] = {DFL_CMD, 0};
int args_start = 1;

/** Read one whitespace-delimited token from strem */
char *read_token(FILE *stream);

int
main(int argc, char *argv[])
{
  if (argc > 1) {
    memcpy(args, &argv[1], sizeof *argv * (argc - 1));
    args_start = argc - 1;
  }

  for (int eof = 0; !eof;) {
    for (int i = args_start; i < ARG_MAX; ++i) {
      /* Clean up if there is something stored here
       * from a previous iteration */
      free(args[i]);

      char *tok = read_token(stdin);
      if (!tok) {
        if (feof(stdin)) {
          /* On end of input, null-terminate the args array
           * and set the eof flag */
          clearerr(stdin);
          args[i] = 0;
          eof = 1;
          break;
        }
        err(1, "read token");
      }

      /* Store a copy of the next token */
      args[i] = strdup(tok);
    }
    switch (fork()) {
      case -1:
        /* Error */
        err(1, "fork");
      case 0:
        /* Child process */
        execvp(args[0], args);
        err(1, "exec"); /* catch exec() failure */
    }

    /* Wait for the child process to finish */
    int status = 0;
    wait(&status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 255) {
      errx(124, "%s: exited with status 255; aborting", args[0]);
    }
  }
}

char *
read_token(FILE *stream)
{
  static char *line = 0, *sp = 0;
  static size_t line_n = 0;

  for (;;) {
    /* Skip whitespace */
    if (sp) {
      for (; isspace(*sp); sp++);
      if (*sp) break; /* Found token */
    }
    /* Reached EOL; read a new line */
    ssize_t line_len = getline(&line, &line_n, stream);
    if (line_len < 0) return 0;
    sp = line;
  }
  assert(sp); /* Sanity check */

  /* Store location of the token */
  char *retval = sp;

  /* Scan to end of token */
  for (; !isspace(*sp); ++sp);
  /* Terminate token */
  if (*sp) *sp++ = '\0';
  return retval;
}

Compiled with a small ARG_MAX value of 3 (program name and two additional arguments), we can see the effect below, when the alphabet is piped into it,

$ CPPFLAGS=-DARG_MAX=3 make xargs
cc  -DARG_MAX=3   xargs.c   -o xargs
$echo {a..z} | ./xargs
a b
c d
e f
g h
i j
k l
m n
o p
q r
s t
u v
w x
y z

In this output, we can surmise that echo is being executed thirteen times, printing two letters of the alphabet each time. The xargs utility is used for performing batch operations. For example, the find utility writes a list of files matching a criteria to standard out. Piping this into xargs allows one to easily perform operations with those files. Suppose we wish to delete object files after compiling a program, the following operation, find . -name '*.o' | xargs rm --, invokes rm with many file names, repeatedly until exhausting the list. In contrast, removing the files one-by-one would be much less efficient due to the overhead of spawning a new process for each file.

Of orphans, zombies and daemons

Parent processes usually wait for their children to end execution, and when they do, the parent process requests their removal from the table of running processes managed by the kernel. Consequently, all resources allocated to the children processes are released, to be used by other processes, and their entry in the table of running processes is removed.

Children processes whose parents terminate before themselves are considered orphans. Because no process may be left without a parent, the init process “grandfathers” all orphaned children. The init process becomes the parent ID of these processes. The orphaned processes are then “waited” by the init process, and when they finish, init removes them from the table of running processes managed by the kernel and deallocates its resources.This is the reason why the init process is also known as the reaper process.

If a child process terminates before the parent process issues the wait system call, the kernel will still release its resources, but a record of the child process will remain in the table of running processes. This allows the parent a chance to issue the wait system call at a later stage and receive a reply from the terminated process. Because in practice the child process no longer exists, but there is still a record of it in the table of running processes, this process becomes what is known as a zombie process. Zombie processes cannot be killed by normal means, and the only way to eliminate them is to issue a call to wait. If zombie processes are not removed from the table of running processes they may remain there indefinitely. Zombie processes in which the parent terminates without issuing the wait system call become orphan processes, and therefore are waited by the init process that removes them from the table of running processes.

Other important processes in the operation of Linux systems are daemons. Daemons are special background processes, unconnected to a terminal, that provide special services for the system. Because these services may be needed at any point of the system operation, they are usually running from the system start-up through shut down. Examples of daemons are the processes that control networked printers, the ones that allow secure remote login to the system, and the ones that handle requests to web page servers.

Process Resource Limits

Resources in a computer are limited. For example, there is just a certain amount of memory in RAM, a certain storage capacity in files, and a certain amount of time to execute programs. All these limits can be set, so that all users can fairly share these resources. To do so, the following two system calls from sys/resource.h can be used:

int getrlimit(int resource, struct rlimit *rlp);
int setrlimit(int resource, const struct rlimit *rlp);

The getrlimit system call retrieves the limits of the resource operand into a rlimit structure, while the setrlimit system call sets the limits for the resource operand from the rlimit structure.

The various system resources are codified with a number that is also identified by a name. Some of the most useful resource names are shown below:

Resource Name	Description
RLIMIT_CPU	CPU time in seconds
RLIMIT_AS	Process virtual memory size in bytes
RLIMIT_STACK	Size of the stack segment in bytes
RLIMIT_DATA	Process data segment in bytes
RLIMIT_NOFILE	Maximum number of file descriptors plus 1
RLIMIT_FSIZE	File size in bytes
RLIMIT_NPROC	Number of processes open for the real user ID

The struct rlimit type has the following members,

struct rlimit {
         rlim_t  rlim_cur;  /* Soft limit */
         rlim_t  rlim_max;  /* Hard limit (ceiling for rlim_cur) */
      };

We use this structure, because every user may have a hard limit for a resource over which it cannot go. That is the rlim_max. However, a user may be using less than that limit, therefore the current limit is specified by rlim_cur. Privileged users, like the superuser or root, may modify these limits as they see fit, as long as the current limit is smaller than or equal to the hard limit. Other users may lower the current limit, or raise it up to the hard limit, but cannot go over it.

The following figure shows the program show_limits.c that displays the soft and hard limits of some resources. This program uses a print_rlimit function taken from “The Linux Programming Interface” by Michael Kerrisk (recommended textbook for this course). This function uses the getrlimit system call to obtain the limits , but before printing them, it checks if they are too large, and if so it prints “infinite” as the value. Similarly, it also checks if the specific Linux implementation has a limit that is beyond the number allowed to be stored in the rlim_t data type. If that is the case, the rlim_cur may hold a predefined RLIM_SAVED_CUR value, and equivalently, the rlim_max may hold a predefined RLIM_SAVED_MAX value. The detection of these values will make the function to print “unrepresentable” instead. The bottom cell shows the program running for a privileged user.

show_limits.c

#include <stdio.h>
#include <sys/resource.h>

// Function print_r_limit taken from Listing 36-2 Page 758 from
// The Linux Programming Interface by Michael Kerrisk
int print_r_limit(char const *, int);

int
main(int argc, char *argv[])
{
  print_r_limit("CPU Time limits (seconds):", RLIMIT_CPU);
  print_r_limit("Process Virtual Memory limits (bytes):", RLIMIT_AS);
  print_r_limit("Stack Segment limits (bytes):", RLIMIT_STACK);
  print_r_limit("Process Data Segment limits (bytes):", RLIMIT_DATA);
  print_r_limit("Number of File Descriptors (+1):", RLIMIT_NOFILE);
  print_r_limit("File Size (bytes):", RLIMIT_FSIZE);
  print_r_limit("Number of Processes allowed to open:", RLIMIT_NPROC);
}

/* Print 'msg' followed by limits for 'resource' */
int
print_r_limit(char const *msg, int resource)
{
  struct rlimit rlim;

  if (getrlimit(resource, &rlim) == -1) return -1;

  printf("%s soft=", msg);
  if (rlim.rlim_cur == RLIM_INFINITY) printf("infinite");
#ifdef RLIM_SAVED_CUR /* Not defined on some implementations */
  else if (rlim.rlim_cur == RLIM_SAVED_CUR) printf("unrepresentable");
#endif
  else printf("%lld", (long long)rlim.rlim_cur);

  printf("; hard=");
  if (rlim.rlim_max == RLIM_INFINITY) printf("infinite\n");
#ifdef RLIM_SAVED_MAX /* Not defined on some implementations */
  else if (rlim.rlim_max == RLIM_SAVED_MAX) printf("unrepresentable");
#endif
  else printf("%lld\n", (long long)rlim.rlim_max);

  return 0;
}

$ ./show_limits
CPU Time limits (seconds): soft=infinite; hard=infinite
Process Virtual Memory limits (bytes): soft=infinite; hard=infinite
Stack Segment limits (bytes): soft=8388608; hard=infinite
Process Data Segment limits (bytes): soft=infinite; hard=infinite
Number of File Descriptors (+1): soft=1024; hard=524288
File Size (bytes): soft=infinite; hard=infinite
Number of Processes allowed to open: soft=62720; hard=62720