System Call Interface

The kernel is not, itself, a process, and there is no “kernel” process in a process listing. The kernel has full control over all system hardware and sees beyond all the abstractions that regular processes encounter. As mentioned earlier, when the operating system is started by a bootloader, the kernel actually does run at first as a regular program in a highly privileged mode of operation, but this is long before the abstractions of processes exist. Once system initialization is complete, the kernel sets up an init process, creates a page table for it, points address translation at that page table, drops privileges into user mode, and jumps to the entry point of the init process.

From this point on, the kernel remains loaded in memory in the background, but it can’t really be thought of as a running program anymore. Instead, the kernel has a specially defined entry point called the system call interface. It is mapped into each process’s virtual address space at a specific address, but it is inaccessible from user mode: the process has no read, write, or execute permission for it. This address is stored in a special CPU register that can only be modified with elevated privileges; on AMD64 this is the IA32_LSTAR model-specific register. The special AMD64 assembly instruction syscall switches the processor back to kernel mode and jumps to the stored address, saving the return address in the RCX register so that the kernel can later resume the process. Other architectures may use different mechanisms; on IA32, for example, Linux uses either the software interrupt int 0x80 or the sysenter instruction.
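From user space, the usual way to reach this entry point is through the C library. The sketch below assumes Linux with glibc: the generic syscall() wrapper takes a system call number (here SYS_getpid and SYS_write from sys/syscall.h) plus its arguments, places them in the registers the kernel expects, and executes the architecture's trap instruction, which is syscall on AMD64. The helper names raw_getpid and raw_write are hypothetical, chosen only for illustration.

```c
#define _GNU_SOURCE          /* glibc declares syscall() only with this */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>     /* SYS_getpid, SYS_write */
#include <sys/types.h>
#include <unistd.h>          /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for this process's ID by system
   call number instead of through the getpid() wrapper. On AMD64,
   glibc's syscall() moves SYS_getpid into RAX and executes the
   syscall instruction. */
pid_t raw_getpid(void) {
    return (pid_t)syscall(SYS_getpid);
}

/* Hypothetical helper: write a NUL-terminated string to a file
   descriptor with a raw write system call. Returns the number of
   bytes written, or -1 on error (syscall() sets errno). */
ssize_t raw_write(int fd, const char *msg) {
    assert(msg != NULL);
    return (ssize_t)syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling raw_write(1, "hi\n") should print hi and return 3, and raw_getpid() should agree with getpid(). Ordinary programs simply call libc wrappers such as write(); syscall() is mainly useful for system calls that have no wrapper.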

From here, the kernel performs various privileged operations on behalf of the requesting process, such as reading or writing files, interacting with other processes, allocating memory, and so on. When the kernel is done, it restores the processor state, drops privileges, and jumps back to the requesting process. As you can see, this involves many steps, so system calls are relatively expensive operations, but they are necessary to ensure security and to let processes coexist on a system without trampling over each other.
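One rough way to see this cost is to time a system call that does almost no work. The sketch below assumes Linux with glibc and uses getppid only because it is cheap; it calls it through syscall() so that every iteration really enters the kernel. The function name ns_per_syscall is hypothetical, and the figure it reports depends heavily on the CPU, the kernel version, and mitigations such as kernel page-table isolation.

```c
#define _GNU_SOURCE          /* for syscall() and CLOCK_MONOTONIC */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getppid */
#include <time.h>            /* clock_gettime() */
#include <unistd.h>          /* syscall() */

/* Elapsed nanoseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(const struct timespec *start,
                         const struct timespec *end) {
    return (double)(end->tv_sec - start->tv_sec) * 1e9
         + (double)(end->tv_nsec - start->tv_nsec);
}

/* Hypothetical helper: average wall-clock cost, in nanoseconds, of one
   getppid system call. Each loop iteration makes a full round trip
   into the kernel and back. */
double ns_per_syscall(long iterations) {
    assert(iterations > 0);
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        (void)syscall(SYS_getppid);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(&start, &end) / (double)iterations;
}
```

Comparing the result with the time taken by a loop of ordinary function calls makes the gap between staying in user mode and entering the kernel visible; averaging over many iterations smooths out noise from the timer and the scheduler.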

One other thing to note is that processes cannot stay in user mode indefinitely, even if they don’t make any system calls. The kernel schedules processes to run in short time slices and sets an alarm, in the form of a hardware timer interrupt, that automatically transfers control to the kernel’s scheduler code after a certain amount of time. A process can’t disable this alarm, so it can’t prevent the kernel from regaining control once in a while. The kernel can then jump back to the process, swap it out for another process that is waiting to run, or handle asynchronous tasks such as signal delivery and network data transfers. The kernel is also entered whenever a page fault occurs, and in various other situations that are outside the control of a process.
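The kernel’s own scheduler timer is invisible to user code, but a user-space analogue illustrates the same principle. In this sketch, which assumes a POSIX system such as Linux and uses the hypothetical names spin_until_alarm and alarm_fired, the process asks for a SIGALRM with setitimer and then spins in a loop that makes no system calls at all. The loop still ends, because the kernel regains control asynchronously, notices the expired timer, and delivers the signal on the way back to user mode; the handler then sets the flag the loop is polling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>          /* sigaction(), sig_atomic_t */
#include <string.h>          /* memset() */
#include <sys/time.h>        /* setitimer(), struct itimerval */

/* Hypothetical flag set from the signal handler; volatile so the
   busy loop rereads it on every iteration. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int signo) {
    (void)signo;
    alarm_fired = 1;
}

/* Hypothetical helper: arm a one-shot real-time timer for `usec`
   microseconds, then spin in user mode without making any system
   calls until the kernel delivers SIGALRM. Returns the number of
   loop iterations completed before the signal arrived. */
unsigned long spin_until_alarm(long usec) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    struct itimerval timer;
    memset(&timer, 0, sizeof timer);        /* it_interval = 0: one-shot */
    timer.it_value.tv_sec = usec / 1000000;
    timer.it_value.tv_usec = usec % 1000000;
    alarm_fired = 0;
    rc = setitimer(ITIMER_REAL, &timer, NULL);
    assert(rc == 0);

    unsigned long iterations = 0;
    while (!alarm_fired) {                   /* pure user-mode loop */
        iterations++;
    }
    return iterations;
}
```

Calling spin_until_alarm(50000) returns after roughly 50 milliseconds even though nothing inside the loop ever asks the kernel for anything. The scheduler’s time-slice interrupt works the same way, except that the process never sees it.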