Multiprogramming
Multiprogramming is the ability of an operating system to hold more than one process running in memory at the same time. When we open a text editor to write a program, and at the same time we open a Web Browser to review a website, while listening to music from our favorite streaming service, all done in the same computer, we are using multiprogramming. Each one of these activities may generate one or more processes that are trying to run on the computer. From the point of view of the user, all these processes may appear to be running at the same time, but all of them require access to a CPU to perform their instructions. In the simplest case with a single CPU in the computer, all processes will take turns accessing it. The operating system is the one deciding which process accesses the CPU at any given time and for how long. Therefore, the decisions that it makes will affect the performance of each one of the applications running in the system. This is one of the main responsibilities for the operating system, as a CPU scheduler.
Forking
In Linux, multiprogramming is achieved by a technique called
forking. This requires that a process “clones” itself using the
fork
system call with no parameters. When a process makes this
call, the kernel creates a copy of the current process that will run
independently and concurrently with the original process and all other
processes already in the system. The newly created process (called the
child process) receives copies of all segments from the original process
(called the parent process), as well as copies of the file descriptors.
After the call to fork, both processes may begin execution of the next
sentence in the program. If no allowances are made, both processes may
execute the same sentences when they take control of the CPU. However,
the call to fork returns different values for parent and child
processes. The parent process receives the process ID from the children
just created, however the children process receives a value of zero.
This fact can be used to request each process to perform different
instructions. simple_fork.c shows a process that forks. The value this system call returns is
used in a switch statement to assign different instructions to parent
and child processes. Notice that if the call to fork returns a -1 if the
call fails. The fork system call is declared in the unistd.h
header.
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int
main(int argc, char *argv[])
{
pid_t pid;
pid_t fork_result; // status of the fork
switch (fork_result = fork()) {
case -1: /* error */
err(1, "fork");
/* The child proc sees the return value from fork() as 0 */
case 0:
/* Child Zone */
pid = getpid();
printf("I am the child process.\n"
"\tThe fork gave me a %d, but my PID is %d\n",
fork_result,
pid);
/* Typically, the child has its own exit path distinct from the parent,
* though it could also be allowed to continue executing past the switch
* block */
exit(0);
/* The parent proc sees the return value from fork() as the child's pid */
default:
/* Parent zone */
pid = getpid();
printf("I am the parent process.\n"
"\tThe fork gave me my child PID (%d), but my own PID is %d\n",
fork_result,
pid);
break;
}
/* Parent waits for child to exit */
wait(0);
/* Parent exit */
}
When the program runs, it produces a result like:
$ ./simple_fork
I am the parent process.
The fork gave me my child PID (97160), but my own PID is 97159
I am the child process.
The fork gave me a 0, but my PID is 97160
Since the parent and child run in parallel after forking, it is also possible to observe,
$ ./simple_fork
I am the parent process.
The fork gave me my child PID (97160), but my own PID is 97159
I am the child process.
The fork gave me a 0, but my PID is 97160
As previously indicated, copies of the file descriptors from a parent process are given to the child process. Because they are copies, if any of the processes modifies the attributes of the files they reference, like offset placement or access mode, all these changes will be visible and relevant to both processes. If the programmer wants to keep different attributes, they should create new file descriptors after fork.
Even though a child process should receive copies of the page table after fork, this is actually done only when the children require a change in any of them. This strategy is called “copy-on-write” (CoW) and it avoids doing copies of resources that may not be needed. Before that, both parent process and children process just keep references to the same segments. If any of the processes requires to update the stack or heap, or maybe load a different set of instructions for the text, this is when the kernel will generate the copy with the appropriate change. This is particularly useful in cases where a child is created to run a different program than the one in the parent process, as we will see later, since it avoids expensive copying of a process image that is about to be replaced.
A typical application of forking is to have a main program that generates tasks, each one to be handled by a different child process. The parent process waits for these processes to complete. The following listing shows the basic structure of such an application in a program called fork_cycle.c.
#include <err.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int
main(int argc, char *argv[])
{
pid_t pid;
int fork_limit; /* Number of requested forks */
pid_t child_exit_pid; /* PID of returning child */
int child_exit_count; /* Number of children finishing its tasks */
int time = 0; /* Sleeping time for a child process */
int i; /* Counter */
if (argc != 2 || ((fork_limit = atoi(argv[1])) < 1)) {
err(1, "Usage: %s FORK_LIMIT", argv[0]);
exit(1);
}
/* Create fork_count child processes */
for (i = 0; i < fork_limit; i++) {
switch (fork()) {
case -1:
err(1, "fork");
case 0: // Child prints, sleeps and exits
pid = getpid();
fprintf(stderr,
"Process %d created. Waiting %d seconds.\n",
pid,
fork_limit - i);
sleep(fork_limit - i);
exit(0);
}
}
child_exit_count = 0;
for (;;) {
child_exit_pid = wait(NULL);
if (child_exit_pid < 1) {
if (errno == ECHILD) {
/* No more children */
errno = 0;
break;
}
err(1, "wait");
}
child_exit_count++;
printf("Child #%d (PID %d) exited\n",
child_exit_count,
child_exit_pid);
}
printf("All children have exited!\n");
exit(0);
}
$ ./fork_cycle 7
Process 98716 created. Waiting 6 seconds.
Process 98715 created. Waiting 7 seconds.
Process 98717 created. Waiting 5 seconds.
Process 98718 created. Waiting 4 seconds.
Process 98720 created. Waiting 2 seconds.
Process 98719 created. Waiting 3 seconds.
Process 98721 created. Waiting 1 seconds.
Child #1 (PID 98721) exited
Child #2 (PID 98720) exited
Child #3 (PID 98719) exited
Child #4 (PID 98718) exited
Child #5 (PID 98717) exited
Child #6 (PID 98716) exited
Child #7 (PID 98715) exited
All children have exited!
This program expects an integer parameter to indicate the number of
children to be created. A loop creates these children. Each one could be
performing a different task, but in this program, they just pause for a
different number of seconds with the sleep
system call and then exit.
The parent process waits for them with the wait
system call in a loop; wait
returns the process id of the exiting child process. If no more children remain, it fails and sets errno = ECHILD
. The parent process increments the count of returning
children after any new one is received. The loop ends when all children
have returned.
The exec() system call
Sometimes it is necessary for a child process to forgo the sets of
instructions it inherited from its parent process and open a complete
new program.This can be done with the exec
system call. Once this
call is made, the kernel will reload the virtual memory address space of
the child process with a new program and update the registers to start
the process. However, many other elements inherited from the parent
process will remain in place. The process will keep its PID, as well as
its parent ID, process group IDs, and session ID. It also keeps all the
environment variables inherited from the parent process and the file
descriptors with all its settings.The process will execute the new code
and when it ends, it will exit and never come back to the parent
process. There will be no notification of completion or return to the
parent process.
In reality, the exec
system call does not exist. This is just the
generic name for a set of system calls, all beginning with the exec
prefix. The most basic system call is execve
. All other variants
are convenient ways to prepare the call to execve
with different
parameters. Basically, all these system calls require three main
parameters: 1.The location and name of the program to be executed, 2.
The set of all the strings containing the arguments for the program to
be executed, just like the argv
parameter that would be created for
that program, and 3.The environment variables that are passed to the
program to be executed.The variants of this system call prepare these
parameters in different ways. It is easier to remember all of them if we
relate their names with the parameters they request as described below.
The first decision a programmer has to make on selecting an exec
system call is to decide if the arguments for the program to be executed
are going to be supplied together arranged inside an array (just like
argv[]
) or if they are going to be supplied separately in an
explicit list. In the first case we must select a system call starting
with the execv
prefix, for a vector (another name for an array). In
the second case we must select a system call starting with the execl
prefix, for a list. These system calls have the following prototypes:
execv(const *pathname, char *const argv[])
int execl(const char *pathname, const char *arg, ...
/*, (char *) NULL */);
Both require a pointer to the pathname and filename of the program to be executed as the first parameter. The pathname can be absolute or relative. The arguments are passed in either as an array of char *
, or as a variadic list of char *
. The first argument should be the name of the program–usually the same as pathname
. Next follow the remaining arguments, and finally a null pointer to indicate the end of the array or argument list. Notice that there is no “number of arguments” argument–these functions have no way to know when they’ve reached the end of the list if you forget a null pointer! (this is a very common mistake).
Note
In some cases, the first argument is not the same as the file path. For example, some monolithic programs present themselves as two separate programs. An example is the program vim which will change its behavior when executed with its first argument as vi. This is also used in other cases, such as to have a single utility implement both zip and unzip. Taken to an extreme, the BusyBox project packages an entire set of POSIX utilities into a single monolithic program for use on resource-constrained embedded systems. All utilities in /bin/
are just links to this one program, which basically has a massive lookup table based on the value of argv[0]
.
Aside from execl
and execv
, there are two other sets of functions in the exec
family; the first are execlp
and execvp
. These two are the most commonly used in practice; they differ from those above in that they perform command lookup by searching the directories listed in the PATH
environment variable–just like the shell does–if the pathname
argument doesn’t contain any slashes (/
). Otherwise the behavior is identical.
The second set of functions are execle
and execve
. These functions allow the caller to pass in a custom set of environment variables, rather than have them inherited by the child process. They both have the same function signature as the previous functions, with an additional final argument which is a pointer to a null-terminated list of null-terminated environment strings. These are typically used for security purposes, to empty the environment of a child process before it is executed, so that things such as the user’s name and other configuration settings aren’t leaked to untrusted processes,
int execle(const char *pathname, const char *arg, ... /*, (char *) NULL, char *const envp[] */);
int execve(const char *file, char *const argv[], char *const envp[]);
Note
execle
takes a variadic list of arguments, but it requires an array of environment strings, like execve
Finally, there is one more pair of functions, execvpe
and execlpe
which combine the properties of the e
functions with the properties of the p
functions, accepting an additional environment variable argument, and performing command lookup if the pathaname
argument doesn’t contain any slashes.
Note
execvpe
is a nonstandard GNU-extension.
Consider the following program, xargs.c, which is a basic implementation of the xargs utility. It repeatedly executes the command that is passed to it, with any prefix arguments, followed by additional arguments taken as white-space separated tokens on standard input, up to the argument limit ARG_MAX
,
#include <assert.h>
#include <ctype.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <wait.h>
/* Maximum arguments to pass to each instance of a child process */
#ifndef ARG_MAX
#define ARG_MAX 1023
#endif
/* Command default when argc < 2 */
#ifndef DFL_CMD
#define DFL_CMD "echo"
#endif
char *args[ARG_MAX + 1] = {DFL_CMD, 0};
int args_start = 1;
/** Read one whitespace-delimited token from strem */
char *read_token(FILE *stream);
int
main(int argc, char *argv[])
{
if (argc > 1) {
memcpy(args, &argv[1], sizeof *argv * (argc - 1));
args_start = argc - 1;
}
for (int eof = 0; !eof;) {
for (int i = args_start; i < ARG_MAX; ++i) {
/* Clean up if there is something stored here
* from a previous iteration */
free(args[i]);
char *tok = read_token(stdin);
if (!tok) {
if (feof(stdin)) {
/* On end of input, null-terminate the args array
* and set the eof flag */
clearerr(stdin);
args[i] = 0;
eof = 1;
break;
}
err(1, "read token");
}
/* Store a copy of the next token */
args[i] = strdup(tok);
}
switch (fork()) {
case -1:
/* Error */
err(1, "fork");
case 0:
/* Child process */
execvp(args[0], args);
err(1, "exec"); /* catch exec() failure */
}
/* Wait for the child process to finish */
int status = 0;
wait(&status);
if (WIFEXITED(status) && WEXITSTATUS(status) == 255) {
errx(124, "%s: exited with status 255; aborting", args[0]);
}
}
}
char *
read_token(FILE *stream)
{
static char *line = 0, *sp = 0;
static size_t line_n = 0;
for (;;) {
/* Skip whitespace */
if (sp) {
for (; isspace(*sp); sp++);
if (*sp) break; /* Found token */
}
/* Reached EOL; read a new line */
ssize_t line_len = getline(&line, &line_n, stream);
if (line_len < 0) return 0;
sp = line;
}
assert(sp); /* Sanity check */
/* Store location of the token */
char *retval = sp;
/* Scan to end of token */
for (; !isspace(*sp); ++sp);
/* Terminate token */
if (*sp) *sp++ = '\0';
return retval;
}
Compiled with a small ARG_MAX
value of 3 (program name and two additional arguments), we can see the effect below, when the alphabet is piped into it,
$ CPPFLAGS=-DARG_MAX=3 make xargs
cc -DARG_MAX=3 xargs.c -o xargs
$echo {a..z} | ./xargs
a b
c d
e f
g h
i j
k l
m n
o p
q r
s t
u v
w x
y z
In this output, we can surmise that echo is being executed thirteen times, printing two letters of the alphabet each time. The xargs utility is used for performing batch operations. For example, the find utility writes a list of files matching a criteria to standard out. Piping this into xargs
allows one to easily perform operations with those files. Suppose we wish to delete object files after compiling a program, the following operation, find . -name '*.o' | xargs rm --
, invokes rm with many file names, repeatedly until exhausting the list. In contrast, removing the files one-by-one would be much less efficient due to the overhead of spawning a new process for each file.
Of orphans, zombies and daemons
Parent processes usually wait for their children to end execution, and when they do, the parent process requests their removal from the table of running processes managed by the kernel. Consequently, all resources allocated to the children processes are released, to be used by other processes, and their entry in the table of running processes is removed.
Children processes whose parents terminate before themselves are considered orphans. Because no process may be left without a parent, the init process “grandfathers” all orphaned children. The init process becomes the parent ID of these processes. The orphaned processes are then “waited” by the init process, and when they finish, init removes them from the table of running processes managed by the kernel and deallocates its resources.This is the reason why the init process is also known as the reaper process.
If a child process terminates before the parent process issues the
wait
system call, the kernel will still release its resources, but
a record of the child process will remain in the table of running
processes. This allows the parent a chance to issue the wait
system call at a later stage and receive a reply from the terminated
process. Because in practice the child process no longer exists, but
there is still a record of it in the table of running processes, this
process becomes what is known as a zombie process. Zombie processes
cannot be killed by normal means, and the only way to eliminate them is
to issue a call to wait
. If zombie processes are not removed from
the table of running processes they may remain there indefinitely.
Zombie processes in which the parent terminates without issuing the
wait
system call become orphan processes, and therefore are waited
by the init process that removes them from the table of running
processes.
Other important processes in the operation of Linux systems are daemons. Daemons are special background processes, unconnected to a terminal, that provide special services for the system. Because these services may be needed at any point of the system operation, they are usually running from the system start-up through shut down. Examples of daemons are the processes that control networked printers, the ones that allow secure remote login to the system, and the ones that handle requests to web page servers.
Process Resource Limits
Resources in a computer are limited. For example, there is just a
certain amount of memory in RAM, a certain storage capacity in files,
and a certain amount of time to execute programs. All these limits can
be set, so that all users can fairly share these resources. To do so,
the following two system calls from sys/resource.h
can
be used:
int getrlimit(int resource, struct rlimit *rlp);
int setrlimit(int resource, const struct rlimit *rlp);
The getrlimit
system call retrieves the limits of the resource
operand into a rlimit
structure, while the setrlimit
system
call sets the limits for the resource
operand from the rlimit
structure.
The various system resources are codified with a number that is also identified by a name. Some of the most useful resource names are shown below:
Resource Name |
Description |
---|---|
RLIMIT_CPU |
CPU time in seconds |
RLIMIT_AS |
Process virtual memory size in bytes |
RLIMIT_STACK |
Size of the stack segment in bytes |
RLIMIT_DATA |
Process data segment in bytes |
RLIMIT_NOFILE |
Maximum number of file descriptors plus 1 |
RLIMIT_FSIZE |
File size in bytes |
RLIMIT_NPROC |
Number of processes open for the real user ID |
The struct rlimit
type has the following members,
struct rlimit {
rlim_t rlim_cur; /* Soft limit */
rlim_t rlim_max; /* Hard limit (ceiling for rlim_cur) */
};
We use this structure, because every user may have a hard limit for a
resource over which it cannot go. That is the rlim_max
. However, a
user may be using less than that limit, therefore the current limit is
specified by rlim_cur
. Privileged users, like the superuser or root,
may modify these limits as they see fit, as long as the current limit is
smaller than or equal to the hard limit. Other users may lower the
current limit, or raise it up to the hard limit, but cannot go over it.
The following figure shows the program show_limits.c that displays
the soft and hard limits of some resources. This program uses a
print_rlimit
function taken from “The Linux Programming Interface” by
Michael Kerrisk (recommended textbook for this course). This function
uses the getrlimit
system call to obtain the limits , but before
printing them, it checks if they are too large, and if so it prints
“infinite” as the value. Similarly, it also checks if the specific Linux
implementation has a limit that is beyond the number allowed to be
stored in the rlim_t
data type. If that is the case, the rlim_cur
may
hold a predefined RLIM_SAVED_CUR
value, and equivalently, the
rlim_max
may hold a predefined RLIM_SAVED_MAX
value. The detection
of these values will make the function to print “unrepresentable”
instead. The bottom cell shows the program running for a privileged
user.
#include <stdio.h>
#include <sys/resource.h>
// Function print_r_limit taken from Listing 36-2 Page 758 from
// The Linux Programming Interface by Michael Kerrisk
int print_r_limit(char const *, int);
int
main(int argc, char *argv[])
{
print_r_limit("CPU Time limits (seconds):", RLIMIT_CPU);
print_r_limit("Process Virtual Memory limits (bytes):", RLIMIT_AS);
print_r_limit("Stack Segment limits (bytes):", RLIMIT_STACK);
print_r_limit("Process Data Segment limits (bytes):", RLIMIT_DATA);
print_r_limit("Number of File Descriptors (+1):", RLIMIT_NOFILE);
print_r_limit("File Size (bytes):", RLIMIT_FSIZE);
print_r_limit("Number of Processes allowed to open:", RLIMIT_NPROC);
}
/* Print 'msg' followed by limits for 'resource' */
int
print_r_limit(char const *msg, int resource)
{
struct rlimit rlim;
if (getrlimit(resource, &rlim) == -1) return -1;
printf("%s soft=", msg);
if (rlim.rlim_cur == RLIM_INFINITY) printf("infinite");
#ifdef RLIM_SAVED_CUR /* Not defined on some implementations */
else if (rlim.rlim_cur == RLIM_SAVED_CUR) printf("unrepresentable");
#endif
else printf("%lld", (long long)rlim.rlim_cur);
printf("; hard=");
if (rlim.rlim_max == RLIM_INFINITY) printf("infinite\n");
#ifdef RLIM_SAVED_MAX /* Not defined on some implementations */
else if (rlim.rlim_max == RLIM_SAVED_MAX) printf("unrepresentable");
#endif
else printf("%lld\n", (long long)rlim.rlim_max);
return 0;
}
$ ./show_limits
CPU Time limits (seconds): soft=infinite; hard=infinite
Process Virtual Memory limits (bytes): soft=infinite; hard=infinite
Stack Segment limits (bytes): soft=8388608; hard=infinite
Process Data Segment limits (bytes): soft=infinite; hard=infinite
Number of File Descriptors (+1): soft=1024; hard=524288
File Size (bytes): soft=infinite; hard=infinite
Number of Processes allowed to open: soft=62720; hard=62720