Process Hierarchy

Processes are organized in a hierarchy: each process has a unique process ID (PID), one parent process, and zero or more child processes. Together they form a tree, with the specially designated init process at the root.
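The tree can be observed from inside any process by following parent links upward. The sketch below assumes a Linux system with /proc mounted; the helper names parent_of and ancestry are made up for this illustration.

```python
import os


def parent_of(pid):
    """Return the parent PID of `pid` by reading /proc (Linux-specific)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("PPid:"):
                return int(line.split()[1])
    raise ValueError(f"no PPid field for process {pid}")


def ancestry(pid=None):
    """List PIDs from `pid` up to the root of the visible process tree."""
    pid = os.getpid() if pid is None else pid
    chain = [pid]
    while pid > 1:
        try:
            pid = parent_of(pid)
        except OSError:
            break  # the process vanished or /proc is unavailable
        if pid == 0:
            break  # the parent is not visible (e.g. outside this PID namespace)
        chain.append(pid)
    return chain
```

Calling ancestry() on an ordinary system typically yields a list that starts with the caller's own PID and ends with 1, the init process. Inside a container the chain may stop earlier, because the real parent is hidden from the process.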

Init Process

The init process is the first user-space process started by the operating system at boot; the name of the init program is either hardcoded in the kernel or passed to it as a parameter by the bootloader. It has a PID of 1 and is designed to run for as long as the system is running. The kernel treats it specially: on Linux, for example, a signal is delivered to init only if init has installed a handler for it, so other processes cannot terminate or stop it with the uncatchable SIGKILL and SIGSTOP signals. Any program can be designated to run as the init process, and some simple operating systems just spawn a basic shell as their init process. More complex systems usually have a dedicated init process that manages system services and performs various startup tasks.

After system startup, the init process usually goes to sleep and waits for requests from other processes, for example to start or stop a service or to change some system configuration.

Process Creation

The init process is started directly by the kernel at startup. Additional programs are executed through a two-step process called forking and executing.

Forking

A process can request to duplicate itself using the fork() system call; the kernel makes a new process that is a child of the calling process. The child is a near-copy of its parent and continues executing from the same point in the program at which fork() was called. The parent and child can tell which one they are from the return value of fork(): it returns 0 in the child and the child's PID in the parent. They can therefore behave differently after the call.
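A minimal sketch of this behavior, using Python's os.fork(), which wraps the system call directly. The function name fork_demo and the pipe used to report back to the parent are illustrative choices, not part of any standard interface.

```python
import os


def fork_demo():
    """Fork once; return (fork() result in parent, child's PID, child's parent PID)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child branch: fork() returned 0 here.
        try:
            os.close(read_fd)
            os.write(write_fd, f"{os.getpid()} {os.getppid()}".encode())
        finally:
            os._exit(0)  # leave immediately so the child never runs parent code
    # Parent branch: fork() returned the new child's PID.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, child_ppid = map(int, reader.read().split())
    os.waitpid(pid, 0)  # collect the child's exit status
    return pid, child_pid, child_ppid
```

The value that fork() returned in the parent matches the PID the child reports for itself, and the child's parent PID is the PID of the process that called fork_demo().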

Executing

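A process can request to execute a new program using one of the exec() family of calls (execve() is the underlying system call); the kernel replaces the process image with the requested program’s image, and resets the CPU context of the process to point to the entry point of the newly loaded program. Most other aspects of the process, called its environment, are preserved across this call, such as its PID, its open files (except those marked close-on-exec), and its ignored signals. Signals that had handlers installed are reset to their default actions, because the handler code no longer exists in the new image.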
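Combined, the two steps look roughly like the sketch below. The function name spawn is hypothetical, and the exit code 127 for a failed exec is a shell convention adopted here, not something the kernel imposes.

```python
import os
import sys


def spawn(argv):
    """Run the program named by argv[0] in a child process; return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # On success this call never returns: the child's image is replaced.
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed, e.g. the program does not exist
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

For example, spawn([sys.executable, "-c", "import sys; sys.exit(7)"]) returns 7, the exit code chosen by the program that replaced the child.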

Between calling fork() and exec(), the child process’s program performs any necessary setup tasks by modifying its own environment while it is separate from the original parent but still has control over execution. For example, it might change its open files before executing the new program in order to change where the new program’s I/O will be directed. In contrast, earlier systems that implemented fork-and-exec as a single step required each program to set up its own environment. This made programs more complex and made it more tedious to pass data around, since intermediate named files had to be used for everything. Separating process creation from program execution was one of the many important design ideas that UNIX popularized in the world of operating systems.
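As an illustration of setup performed between the two calls, the following sketch redirects the new program's standard output to a file, much as a shell does for `command > file`. The function name run_with_stdout_to is made up for this example.

```python
import os
import sys


def run_with_stdout_to(path, argv):
    """Run argv with its standard output redirected to `path`; return the exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)  # descriptor 1 (stdout) now refers to the file
            os.close(fd)    # the extra descriptor is no longer needed
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)   # reached only if the setup or the exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

The program being run needs no knowledge of the redirection: it writes to descriptor 1 as usual, and the output lands in the file because the descriptor survived the exec.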

Process Termination

When a process terminates, the kernel retains information about the exit conditions of the process, such as its exit status and whether it was killed by a signal. The parent process has a special relationship to its children in that it can query this exit information through the wait() system call, at which point the kernel releases the remaining resources associated with the child. If the parent process does not wait on its exited child process, the child remains in a “zombie” state, occupying those system resources after terminating.
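The exit information returned by wait() can be decoded as in the sketch below. The function name describe_exit and its child_action parameter are hypothetical; the os.WIF* helpers correspond to the C macros of the same names.

```python
import os
import signal


def describe_exit(child_action):
    """Run child_action() in a child process and report how the child ended."""
    pid = os.fork()
    if pid == 0:
        try:
            child_action()
        finally:
            os._exit(0)  # normal exit if the action returned or raised
    _, status = os.waitpid(pid, 0)  # also releases the child's remaining resources
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)
```

A child that calls os._exit(3) is reported as ("exited", 3), while one that sends itself SIGKILL is reported as ("signaled", signal.SIGKILL).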

If the parent process exits before waiting on its child, the child process becomes orphaned. The system recognizes orphaned processes and assigns them to the nearest ancestor process that is designated as a reaper process. The reaper process periodically waits on all of its “adopted” children in order to ensure that their resources are properly released, thus preventing the accumulation of zombie processes and the wasting of valuable system resources.
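The core of a reaper is a loop that collects every child that has already exited. The sketch below shows one way to write that loop; the function name reap_exited_children is hypothetical, and real init programs typically run such a loop in response to the SIGCHLD signal rather than on a timer.

```python
import os


def reap_exited_children():
    """Collect every child that has already exited, without blocking.

    Returns a dict mapping each reaped PID to its exit code.
    """
    reaped = {}
    while True:
        try:
            # -1 means "any child"; WNOHANG means "do not wait for running ones".
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped[pid] = os.waitstatus_to_exitcode(status)
    return reaped
```

Because waitpid() is called with -1, the loop does not need to know which children exist, which is what lets a reaper clean up adopted processes it never started itself.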

One important behavior of the init process is that it is always designated as a reaper process, which guarantees that every process has a reaper among its ancestors. It periodically wakes up and waits on its adopted children to prevent zombies from sticking around for too long. In fact, init is typically the only active reaper process in a system. Many system services are intentionally orphaned through a technique called a double fork: the launching process forks an intermediate process, which forks again and then exits, leaving the grandchild process (which runs the service) orphaned so that it will be adopted by init. This helps isolate system services from the processes that start them.
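The double fork can be sketched as follows. The function name double_fork_demo and the two pipes used for coordination are illustrative only; a real service would go on to do its work in the grandchild instead of reporting back.

```python
import os


def double_fork_demo():
    """Orphan a grandchild; return (intermediate PID, grandchild's new parent PID)."""
    go_r, go_w = os.pipe()          # parent -> grandchild: "intermediate is gone"
    report_r, report_w = os.pipe()  # grandchild -> parent: PID of adopting process

    pid = os.fork()
    if pid == 0:
        # Intermediate process: create the grandchild, then exit immediately.
        try:
            if os.fork() == 0:
                # Grandchild: wait until the intermediate has exited.
                os.close(go_w)
                os.close(report_r)
                os.read(go_r, 1)
                os.write(report_w, str(os.getppid()).encode())
        finally:
            os._exit(0)

    # Original process.
    os.close(go_r)
    os.close(report_w)
    os.waitpid(pid, 0)  # reap the intermediate so it does not linger as a zombie
    os.write(go_w, b"x")
    os.close(go_w)
    with os.fdopen(report_r) as reader:
        adopted_by = int(reader.read())
    return pid, adopted_by
```

The second value returned is the PID of whichever reaper adopted the grandchild. On many systems that is 1, but it can be another designated reaper, such as a per-user service manager or a container's init.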

Figure: Reaper processes, such as init, clean up after their orphaned descendants when they become zombies. System services are typically intentionally orphaned to isolate them from the processes that spawn them.