File System Structure
In Unix, the file system is organized into three layers: physical, virtual, and logical. Each layer builds upon the previous one, with an increasing degree of abstraction. The physical layer is responsible for implementing each actual file system, that is, translating the actual data that makes up a file system into a standardized file system interface. The virtual layer is responsible for combining multiple physical file systems into one coherent structure. Finally, the logical layer is responsible for providing the user- and application-facing interface to the file system.
Physical File System
The physical file system consists of all the lower-level aspects of the file system ranging from actual physical hardware and device drivers, up to the file system driver in the kernel which interprets the raw file system data and produces a uniform file system interface consisting of abstract file system objects and operations. These abstract objects mirror traditional Unix file system structure on a physical data level–driver code for local file system implementations like the ext family of file systems often simply transfers this data directly to and from physical storage. In contrast, other file systems such as FAT or NFS, as well as virtualized file systems–which are dynamically generated in memory–may require significant on-the-fly translation into or generation of the appropriate structures that are understood by the upper layers of the file system stack.
Superblock
The superblock, also known as the file system metadata, is an object that describes the basic features and structure of a particular file system. The superblock contains important information such as:
The file system type, volume name, and mount location
The locations and sizes of blocks and inodes
The maximum filename length and file size
The location of the root inode
Additionally, the superblock exposes several file system operations, mostly centered around manipulating inodes and synchronizing the file system with storage.
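Some of this superblock-level information is visible from user space. The following is a minimal sketch, assuming a POSIX system where Python's os.statvfs is available; the function name describe_filesystem and the dictionary keys are illustrative choices, not part of any standard interface.

```python
import os


def describe_filesystem(path="/"):
    """Return a few superblock-style facts about the file system holding `path`."""
    sv = os.statvfs(path)
    return {
        "block_size": sv.f_frsize,            # fundamental block size, in bytes
        "total_blocks": sv.f_blocks,          # file system size, in f_frsize units
        "total_inodes": sv.f_files,           # inode count (some file systems report 0)
        "max_filename_length": sv.f_namemax,  # longest permitted file name
    }


if __name__ == "__main__":
    for key, value in describe_filesystem("/").items():
        print(f"{key}: {value}")
```

The values differ between file systems, which is why no expected output is shown.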
Inode
An inode (index node) abstracts a file on disk. It contains metadata about a particular file, such as its size, access rights, ownership, timestamps, and a list of block pointers that point to the actual data the file contains. Each inode has a numeric ID that is unique within its file system, called the inode number or file serial number. Some file systems reserve the lowest inode numbers for special purposes; in the ext family, for example, inodes 1-10 are reserved, and inode 2 corresponds to the root directory of the file system.
Note
An inode does not contain a filename. File names are an aspect of dentry objects, described below.
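The separation between names and inodes can be observed with hard links. The sketch below assumes a POSIX system whose temporary directory supports hard links; inode_summary and same_inode are hypothetical helper names introduced for illustration.

```python
import os
import tempfile


def inode_summary(path):
    """Collect a few inode fields for `path` as reported by stat()."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "size": st.st_size,
        "permissions": oct(st.st_mode & 0o777),
        "links": st.st_nlink,  # number of names referring to this inode
    }


def same_inode(path_a, path_b):
    """True if both names refer to the same inode on the same file system."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("hello")
        os.link(original, alias)  # a second name for the same inode
        print(inode_summary(original))
        print(same_inode(original, alias))
```

Both names report the same inode number and a link count of 2, because the name lives in the directory entry rather than in the inode.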
Dentry
In Unix, a directory is just a special type of file that contains a list of named references to other files. These references are abstractly represented by dentry (directory entry) objects, each of which associates a file name string with a superblock and inode number. Additionally, dentry objects contain lists of references to both parent and child dentry objects, allowing navigation across the file system as a connected tree of dentry objects.
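The tree of dentry objects can be modeled in a few lines. This is a toy in-memory sketch, not the kernel's data structure: the Dentry class, its fields, and the inode numbers used in the usage example are all hypothetical, and the superblock reference is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Dentry:
    """Toy directory entry: a name bound to an inode number, linked into a tree."""
    name: str
    inode_number: int
    parent: Optional["Dentry"] = field(default=None, repr=False)
    children: Dict[str, "Dentry"] = field(default_factory=dict, repr=False)

    def add_child(self, name, inode_number):
        child = Dentry(name, inode_number, parent=self)
        self.children[name] = child
        return child

    def path(self):
        """Rebuild the full path by walking parent links up to the root."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


def lookup(root, path):
    """Resolve `path` one component at a time, starting from `root`."""
    node = root
    for component in path.strip("/").split("/"):
        if component:
            node = node.children[component]  # KeyError if the name is missing
    return node


if __name__ == "__main__":
    root = Dentry("/", 2)
    bash = root.add_child("bin", 12).add_child("bash", 34)
    print(lookup(root, "/bin/bash").inode_number, bash.path())
```

Child links support walking down during path lookup, while parent links make it possible to reconstruct a path from any entry.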
File
A file object represents an open file, and exposes an abstract interface for various file operations such as seeking, reading, writing, opening and closing. Additional operations are also provided for special files like directories and symbolic links. The file structure contains information such as the cursor position, the access mode of the file, and a reference to the underlying inode.
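The cursor position belongs to the open file, not to the inode. The following sketch, assuming a POSIX system, opens the same file twice and shows that each open has its own cursor while both refer to one inode; the function name is an illustrative choice.

```python
import os
import tempfile


def read_with_two_descriptors(path):
    """Open `path` twice and show that each open file has its own cursor."""
    fd_a = os.open(path, os.O_RDONLY)
    fd_b = os.open(path, os.O_RDONLY)
    try:
        first = os.read(fd_a, 5)                # advances only fd_a's cursor
        pos_a = os.lseek(fd_a, 0, os.SEEK_CUR)  # query the cursor without moving it
        pos_b = os.lseek(fd_b, 0, os.SEEK_CUR)
        shared_inode = os.fstat(fd_a).st_ino == os.fstat(fd_b).st_ino
        return first, pos_a, pos_b, shared_inode
    finally:
        os.close(fd_a)
        os.close(fd_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.txt")
        with open(path, "wb") as f:
            f.write(b"hello world")
        print(read_with_two_descriptors(path))
```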
Virtual File System
The virtual file system (VFS) acts as the intermediary between the physical file system and the logical file system that user-space applications interact with. The main purpose of the VFS is to merge multiple physical file systems into one coherent file system through a mechanism called mounting.
For example, a typical system installation might be spread across separate boot, system, and user data partitions. Additionally, the kernel generates several pseudo file systems to represent things like virtual and physical hardware devices, running processes, and runtime data. Each of these file systems is mounted at a particular path; the following table shows some examples:
File System | Mount Point |
---|---|
system | |
boot | |
devices | |
user data | |
running processes | |
runtime data | |
VFS Cache
It would be impractical to load all of the superblock, inode, and dentry objects for every mounted file system into memory at once. Likewise, querying the physical file system for every file operation would be too slow. An important task of the VFS is therefore to manage a cache of objects that were recently retrieved from the physical file system. As different locations in the VFS are accessed, the VFS seamlessly fills in missing entries in the cache by querying the appropriate physical file system.
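The fill-on-miss behavior described above can be sketched as a small read-through cache. This is a toy model only: the CachingLookup class and its backend callable are hypothetical stand-ins for the VFS cache and the physical file system, and the sketch never evicts entries, which a real cache must do.

```python
class CachingLookup:
    """Toy read-through cache: ask the backend only when an entry is missing."""

    def __init__(self, backend):
        self._backend = backend  # callable standing in for the physical file system
        self._cache = {}
        self.backend_calls = 0   # how many lookups actually reached the backend

    def lookup(self, path):
        if path not in self._cache:
            self.backend_calls += 1
            self._cache[path] = self._backend(path)
        return self._cache[path]


if __name__ == "__main__":
    import os

    cache = CachingLookup(os.lstat)  # use lstat as the slow backend for the demo
    cache.lookup("/")
    cache.lookup("/")                # served from the cache
    print(cache.backend_calls)
```

The second lookup of the same path is answered from memory, so the backend is consulted only once.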
In order to mount a file system, the VFS generates a dentry object that points to the root directory of that file system, and adds it to the VFS cache. The root file system is mounted by creating a dentry for / that points to that file system’s root inode. Subsequent file systems are then mounted by creating dentry objects for their respective locations within the virtual file system. These explicitly generated dentry objects act to link together each file system into a unified tree structure, rooted at /.
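From user space, the boundaries between mounted file systems can be found by walking up the tree until a mount point is reached. This is a minimal sketch that relies on Python's os.path.ismount, assuming a POSIX system; find_mount_point is a hypothetical helper name.

```python
import os


def find_mount_point(path):
    """Walk up from `path` until reaching the root of the file system that holds it."""
    path = os.path.realpath(path)  # resolve symbolic links first
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


if __name__ == "__main__":
    print(find_mount_point(os.getcwd()))
```

The loop always terminates, because / is itself a mount point.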
Logical File System
The logical file system is the familiar user-facing view of the file system. In the logical file system, a file is known by its path, which is a string consisting of path components delimited by the path separator /, such as /bin/bash, which consists of the components /, bin/, and bash. Files also have properties, called file stats, such as permissions, timestamps, and size. When a file is opened within the logical file system, a structure called a file descriptor is allocated, and the open file is assigned a file descriptor number, which is used as a handle to perform further operations on that file.
Each of these components has a direct analogue within the VFS:
Each component of a path corresponds to a dentry object.
A file’s stats correspond to fields of its inode object.
An open file descriptor is a direct analogue of a file object.
The VFS translates logical file system operations into the appropriate operations on these underlying structures, and then returns the results in formats understood by the logical file system.
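The three logical concepts (path components, file stats, and file descriptors) can each be inspected from a program. The sketch below assumes a POSIX system; the helper names are illustrative, and a path such as /bin/bash is used only as a string, so it does not need to exist.

```python
import os
from pathlib import PurePosixPath


def path_components(path):
    """Split a path string into its components without touching the disk."""
    return PurePosixPath(path).parts


def stats_via_descriptor(path):
    """Open `path`, then read its stats through the file descriptor number."""
    fd = os.open(path, os.O_RDONLY)
    try:
        st = os.fstat(fd)  # stats come from the inode behind the descriptor
        return fd, st.st_size, oct(st.st_mode & 0o777)
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(path_components("/bin/bash"))
    print(stats_via_descriptor("/"))
```

The descriptor number is a small integer handle; its exact value depends on which descriptors the process already has open.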
Logical File Types
In the logical file system, there are six standard types of files, each of which provides a unique interface.
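The type of a file is recorded in its mode bits and can be queried with the stat family of calls. The following sketch classifies a path into the types discussed in this section, assuming a POSIX system; file_type is a hypothetical helper, and anything outside these six types is reported as "other".

```python
import os
import stat
import tempfile


def file_type(path):
    """Name the logical file type of `path` without following symbolic links."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "FIFO"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    return "other"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(file_type(d))
    print(file_type("/dev/null"))
```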
Regular Files
A regular file is a persistent data storage unit that remains unchanged unless modified. Regular files support operations such as reading, writing, and seeking. This type of file is commonly used to store various types of data, such as text, images, or program code.
Directories
A directory contains a list of directory entries (dirents) which reference other files. The logical layer exposes a special interface for iterating over this list and does not provide direct access to the data contained within a directory.
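The difference between iterating over a directory and reading it as data can be demonstrated directly. This is a minimal sketch using Python's os.scandir on a POSIX system; the two helper names are illustrative.

```python
import os
import tempfile


def list_directory(path):
    """Iterate over directory entries, returning sorted (name, inode number) pairs."""
    with os.scandir(path) as entries:
        return sorted((entry.name, entry.inode()) for entry in entries)


def can_read_as_bytes(path):
    """Try to read `path` like a regular file; directories refuse this."""
    try:
        with open(path, "rb") as f:
            f.read(1)
        return True
    except IsADirectoryError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "a.txt"), "w").close()
        print(list_directory(d))
        print(can_read_as_bytes(d))
```

Each entry pairs a name with an inode number, which matches the description of a directory as a list of named references.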
FIFOs
FIFOs, also called named pipes, represent transient streams of data and are used for inter-process communication. Data written to a FIFO is made available on a first-in, first-out basis when reading. As data is read from a FIFO, it is discarded. As a result, FIFOs do not support seeking, only reading and writing.
Unlike other types of files, the data contained in a FIFO is held in memory by the kernel: data written to a FIFO is placed in a fixed-size kernel buffer, and later read requests are serviced from this buffer.
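The first-in, first-out behavior and the lack of seeking can be seen with a short experiment. The sketch below assumes a POSIX system where os.mkfifo is available; it uses a background thread as the writer because opening a FIFO blocks until both ends are open. The function name and the file name demo.fifo are illustrative.

```python
import errno
import os
import tempfile
import threading


def fifo_round_trip(messages):
    """Write byte strings into a FIFO from a thread and read them back in order."""
    with tempfile.TemporaryDirectory() as d:
        fifo_path = os.path.join(d, "demo.fifo")
        os.mkfifo(fifo_path)

        def writer():
            # Opening for writing blocks until a reader has opened the FIFO.
            with open(fifo_path, "wb") as w:
                for message in messages:
                    w.write(message)

        thread = threading.Thread(target=writer)
        thread.start()
        with open(fifo_path, "rb") as r:
            try:
                os.lseek(r.fileno(), 0, os.SEEK_SET)
                seekable = True
            except OSError as exc:
                # ESPIPE is the error reported for seeking on a pipe or FIFO.
                seekable = exc.errno != errno.ESPIPE
            data = r.read()  # returns once the writer has closed its end
        thread.join()
        return data, seekable


if __name__ == "__main__":
    print(fifo_round_trip([b"first ", b"second"]))
```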
Symbolic Links
A symbolic link is a type of file that contains a pathname and acts as a logical reference to another file. Symbolic links transparently forward normal file operations to the pointed-at file. Special additional operations are provided for accessing the pathname stored in the symbolic link itself.
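The distinction between operating on the link and operating on its target can be illustrated as follows. This is a minimal sketch for a POSIX system; inspect_symlink is a hypothetical helper, and the file names in the usage example are arbitrary.

```python
import os
import stat
import tempfile


def inspect_symlink(link_path):
    """Compare what the link itself is with what it transparently forwards to."""
    return {
        "stored_pathname": os.readlink(link_path),  # special operation on the link
        "link_is_symlink": stat.S_ISLNK(os.lstat(link_path).st_mode),
        "target_is_regular": stat.S_ISREG(os.stat(link_path).st_mode),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link.txt")
        with open(target, "w") as f:
            f.write("payload")
        os.symlink(target, link)
        print(inspect_symlink(link))
        with open(link) as f:  # a normal open is forwarded to the target
            print(f.read())
```

Here os.stat and open follow the link, while os.lstat and os.readlink act on the link itself.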
Special Files
Special files, also called devices, are similar to regular files, except that they override normal file operations with custom driver code. These are used to represent hardware devices and pseudo devices using the standard file I/O interfaces. There are two types of special files: block and character.
Block devices are typically used to represent raw I/O on hardware storage, such as a hard disk drive, and they behave a lot like regular files: data is stable, and seeking is supported.
Character devices, on the other hand, represent a duplexed data stream, similar to a FIFO. These might be tied to I/O ports on a device, or represent dynamically generated data from pseudo devices like /dev/random, /dev/zero, /dev/null, and so on.
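The pseudo devices mentioned above can be exercised through the ordinary file interface. The following sketch assumes a Unix-like system that provides /dev/zero and /dev/null; the helper names are illustrative.

```python
import os
import stat


def is_character_device(path):
    """True if `path` is a character special file."""
    return stat.S_ISCHR(os.stat(path).st_mode)


def sample_devices():
    """Read a few bytes from /dev/zero and /dev/null using normal file I/O."""
    with open("/dev/zero", "rb", buffering=0) as zero:
        zeros = zero.read(4)     # the driver generates zero bytes on demand
    with open("/dev/null", "rb", buffering=0) as null:
        nothing = null.read(4)   # reads from /dev/null hit end-of-file at once
    with open("/dev/null", "wb", buffering=0) as null:
        null.write(b"discarded")  # writes are accepted and thrown away
    return zeros, nothing


if __name__ == "__main__":
    print(is_character_device("/dev/null"))
    print(sample_devices())
```

No data is stored anywhere for these files; every read and write is answered by driver code.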