Expansion

Before a command is executed, each of the command words undergoes a process called expansion, in which recognized special sequences of text within the word are replaced with an evaluated value. During this process, each word is split into fields, and all quoting characters are removed; these fields ultimately become the command name and arguments.

Expansion is fairly intuitive in simple cases; for example, the following expansion substitutes the value of the shell variable HOME into a command before the command is executed:

$ echo My home directory is: $HOME
My home directory is: /home/bennyb

The above example demonstrates parameter expansion; there are four other types of expansions which can occur, namely tilde expansion, command substitution, arithmetic expansion, and finally pathname expansion. Expansion is carried out in four distinct phases:

  1. Tilde expansion, parameter expansion, command substitution, and arithmetic expansion are performed in a left-to-right order on each word.

  2. Field splitting is performed on the portions of each word that resulted from the expansions in the preceding step, producing zero or more fields.

  3. Pathname expansion is performed on each resulting field, potentially generating additional fields.

  4. Quote removal is performed on each field.

If any fields result at the end of this process, the first will be interpreted as a command name, and the remaining fields as its arguments.
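As a minimal sketch of this last point, the fields produced by expansion really do become the command name and its arguments. The variable names cmd, fmt, and result are hypothetical, chosen only for illustration:

```shell
# Hypothetical variables: both the command name and its first
# argument come from parameter expansions.
cmd=printf
fmt='%s-%s\n'

# After expansion the fields are: printf, %s-%s\n, one, two.
# The first field is run as the command; the rest are its arguments.
result=$($cmd "$fmt" one two)
echo "$result"
```

Here the shell runs printf even though the word printf never appears literally in the command line.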

Tilde Expansion

Whenever an unquoted tilde appears at the beginning of a word, the tilde and any characters following it up to the next slash (/), if there is one, form a tilde prefix. For example, ~bennyb/documents/final_exam.pdf contains the tilde prefix ~bennyb. The tilde prefix is replaced with the path to the named user’s home directory, which is queried from the system’s user database (typically the file /etc/passwd):

$ echo ~bennyb/documents/final_exam.pdf
/home/bennyb/documents/final_exam.pdf

If a name is omitted (e.g. ~/documents/final_exam.pdf), then the tilde prefix is replaced with the value of the HOME environment variable, which conventionally contains the pathname of the user’s own home directory:

$ echo ~/documents/final_exam.pdf
/home/bennyb/documents/final_exam.pdf

Note

Implicit tilde expansion is subtly different from explicitly naming oneself, in that the value of the HOME variable can be set arbitrarily, while the explicit version of tilde expansion always queries the user database. This difference is sometimes exploited to implement switching between multiple configuration environments by storing each in a separate directory and modifying the value of HOME to point to the appropriate one for a particular environment.
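The effect of HOME on the implicit form can be sketched as follows. The directory /tmp/alt_profile is a made-up name used only for illustration; it does not need to exist, because tilde expansion only substitutes text:

```shell
# Remember the real value so it can be restored afterwards.
saved_home=$HOME

# Assumption: /tmp/alt_profile is a hypothetical alternate profile directory.
HOME=/tmp/alt_profile
implicit=$(echo ~/settings.conf)   # the tilde prefix is replaced with $HOME
echo "$implicit"

HOME=$saved_home
```

The explicit form (for example ~bennyb) would be unaffected by this assignment.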

If a tilde appears anywhere else in a word, or is quoted by any of the mechanisms described above, it retains its literal value:

$ echo Hello~World!
Hello~World!

The implicit form of tilde expansion is used frequently to refer to one’s own files, while the explicit form is generally used by administrators to quickly access particular users’ home directories.
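The cases in which a tilde is or is not expanded can be compared side by side. This is only a sketch; HOME is assigned explicitly so that the results are predictable, and the variable names are hypothetical:

```shell
HOME=/home/bennyb   # set explicitly for a predictable result

at_start=$(echo ~/notes)          # unquoted tilde at the start of a word: expanded
in_quotes=$(echo "~/notes")       # quoted tilde: literal
mid_word=$(echo backup~/notes)    # tilde in the middle of a word: literal

t='~'
from_var=$(echo $t/notes)         # tilde produced by parameter expansion: literal

printf '%s\n' "$at_start" "$in_quotes" "$mid_word" "$from_var"
```

The last case follows from the ordering of the expansion phases: text that results from parameter expansion is not scanned again for tilde prefixes.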

Parameter Expansion

Parameter expansion takes the form ${expression}; in the simplest case, expression is the name of a shell parameter, and the sequence is replaced with the value of that parameter:

$ echo My home directory is: \"${HOME}\"
My home directory is: "/home/bennyb"

All parameters have values that are represented as strings, which may be zero length; parameters may also be unset, which often has the same effect as, but is subtly different from, having a zero-length value. There are three classes of shell parameters, namely the positional parameters, shell special parameters, and shell variables.

Positional and Special Parameters

The positional parameters are numbered starting from 1, and represent each of the parameters that the shell, itself, was invoked with; in general, no positional parameters are set in interactive shell sessions, so these variables are largely reserved for shell scripting, and their uses will be discussed in the next module.

There are eight shell special parameters, each of which is used to query information about the shell, and these are represented by single-character parameter names: @, *, #, ?, -, $, !, and 0. As with the positional parameters, these are rarely used outside of shell scripts, so we will discuss these in the next module as well.

The curly braces are optional for positional parameters 1 through 9, and for all of the special parameters, but required for positional parameters with more than one digit.
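A short sketch shows why the braces matter for multi-digit positional parameters. The set -- builtin is used here only to give the positional parameters some arbitrary sample values:

```shell
# Replace the positional parameters with ten sample values.
set -- a b c d e f g h i j

without_braces=$10    # parsed as ${1} followed by a literal 0
with_braces=${10}     # the tenth positional parameter
count=$#              # special parameter #: number of positional parameters

echo "$without_braces $with_braces $count"
```

This prints a0 j 10, since $10 is read as the first parameter followed by the character 0.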

Shell Variables

Finally, shell variables are the set of user-defined, named[1] parameters. The curly braces are optional for variable names, in which case the shell matches the longest valid name following the $. We will see in a moment how Variable Assignment and parameter expansion facilitate storing and manipulating the results of expressions in order to be used later on.
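The longest-name rule can be illustrated with a small sketch; fruit and fruits are hypothetical variable names:

```shell
fruit=apple
unset fruits   # make sure the longer name is not set

greedy="I like $fruits"      # the shell reads the name as fruits, which is unset
braced="I like ${fruit}s"    # braces end the name after fruit

printf '[%s]\n' "$greedy" "$braced"
```

Without braces the expansion yields nothing, because the shell looks up fruits rather than fruit followed by a literal s.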

Modified Parameter Expansion

As mentioned at the beginning of this subsection, the simplest case of parameter expansion is when expression is the name of a parameter; there are several expansion modifiers that can be added to a parameter expansion, which influence its behavior. These are used more often in shell scripting, but sometimes come in handy when working interactively; in all cases, word undergoes recursive expansion before being used:

Expansion           parameter Set and Not Null  parameter Set But Null  parameter Unset
${parameter:-word}  substitute parameter        substitute word         substitute word
${parameter-word}   substitute parameter        substitute null         substitute word
${parameter:=word}  substitute parameter        assign word             assign word
${parameter=word}   substitute parameter        substitute null         assign word
${parameter:?word}  substitute parameter        error, exit             error, exit
${parameter?word}   substitute parameter        substitute null         error, exit
${parameter:+word}  substitute word             substitute null         substitute null
${parameter+word}   substitute word             substitute word         substitute null

${#parameter}       substitute string length of the value of parameter; substitute 0 if parameter is null or unset
${parameter%word}   substitute parameter with shortest suffix matching word removed
${parameter%%word}  substitute parameter with longest suffix matching word removed
${parameter#word}   substitute parameter with shortest prefix matching word removed
${parameter##word}  substitute parameter with longest prefix matching word removed
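A few of these modifiers can be exercised in a short sketch. All variable names and the sample path are hypothetical, chosen only to show the difference between the unset and null cases and between the shortest and longest pattern matches:

```shell
unset color                                          # unset
empty=                                               # set but null
file_path=/home/bennyb/documents/final_exam.tar.gz   # sample value

default_unset=${color:-blue}    # blue
default_null=${empty:-blue}     # blue (the colon form treats null like unset)
nocolon_null=${empty-blue}      # null (without the colon, null counts as set)

: "${color:=green}"             # color was unset, so green is assigned to it
alt=${color:+has_color}         # has_color, because color is now set and not null
length=${#color}                # 5

base=${file_path##*/}           # longest prefix matching */ removed
dir=${file_path%/*}             # shortest suffix matching /* removed
stem=${base%%.*}                # longest suffix matching .* removed
no_ext=${base%.*}               # shortest suffix matching .* removed

printf '%s\n' "$base" "$dir" "$stem" "$no_ext"
```

The four pattern-removal results are final_exam.tar.gz, /home/bennyb/documents, final_exam, and final_exam.tar, respectively.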

Command Substitution

Command substitution takes two forms, $(command) and `command`. The major difference is that the latter (backtick) form is awkward to nest, because each inner backtick must be escaped with a backslash, whereas the $(command) form nests without any escaping. The contents of command are executed in a subshell[2] environment, and the resulting output to stdout is substituted, with any trailing newline characters removed.

Example:

$ echo My name is: \"$(whoami)\"
My name is: "bennyb"
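Two further properties can be sketched briefly: the $(command) form nests naturally, and only trailing newlines are stripped from the output. The path /usr/local/bin/tool is a made-up example and does not need to exist, since dirname and basename operate on the string alone:

```shell
# Nested substitution: the inner command runs first, and its output
# becomes an argument to the outer command.
nested=$(basename "$(dirname /usr/local/bin/tool)")

# All trailing newlines are removed; the interior newline is kept.
trimmed=$(printf 'first\nsecond\n\n\n')

printf '[%s]\n' "$nested" "$trimmed"
```

Here nested holds bin, and trimmed holds two lines with no trailing newline.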

Arithmetic Expansion

Arithmetic expansion takes the form $((expression)), and evaluates to a string representation of the result of expression. Arithmetic expansion supports a subset of the C language’s operators, mostly covering integer and bitwise arithmetic. The tokens within expression are recursively expanded prior to evaluation.

If a shell variable contains a value that forms a valid integer constant, then it may be used in the expression by its ordinary name:

$ echo $x
123
$ echo $((x + 5))
128

The values of shell variables may be modified by the use of assignment operators:

$ echo $x
123
$ echo $((x = x * 2))
246
$ echo $x
246
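The following sketch exercises a handful of the C-style operators. The variable names are hypothetical, and the selection of operators is illustrative rather than exhaustive:

```shell
x=123

sum=$((x + 5))                # 128
grouped=$(( (7 + 3) * 2 ))    # 20; parentheses group as in C
quotient=$((7 / 2))           # 3; integer division truncates
remainder=$((17 % 5))         # 2
shifted=$((1 << 4))           # 16; bitwise left shift
masked=$((6 & 3))             # 2; bitwise AND

# The : builtin discards its arguments, so this line is evaluated
# only for the assignment's side effect on x.
: $((x = x * 2))

echo "$sum $grouped $quotient $remainder $shifted $masked $x"
```

This prints 128 20 3 2 16 2 246.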

Field Splitting

Any characters that resulted from expansion and which are not enclosed in double quotes can be recognized as field delimiters or separators. The special shell variable IFS (Internal Field Separators) contains a list of characters which are recognized as such. If unset, IFS defaults to <space><tab><newline>.

The IFS whitespace characters, <space>, <tab>, and <newline>, if present in IFS, are treated as field separators: any sequence of these characters at the beginning or end of the word is removed, and any sequence found within a word separates two fields.

Any non-whitespace characters present in IFS are treated as field delimiters: each individual occurrence of one of these characters, along with any adjacent IFS whitespace characters, delimits a field.
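The difference between whitespace separators and other delimiters can be sketched by counting the fields that result from an unquoted expansion. The variable names are hypothetical, and set -- is used only to capture the resulting fields as positional parameters:

```shell
spaced='  a   b  '
coloned='a::b'
mixed='a : b'

IFS=' '
set -- $spaced
space_count=$#     # 2: runs of whitespace collapse, leading/trailing are dropped

IFS=:
set -- $coloned
colon_count=$#     # 3: each colon delimits a field, so the middle one is empty
middle=$2

IFS=' :'
set -- $mixed
mixed_count=$#     # 2: the colon and its adjacent spaces form one delimiter

unset IFS          # an unset IFS restores the default splitting behavior
echo "$space_count $colon_count $mixed_count"
```

This prints 2 3 2, and middle is an empty string.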

For example, the special shell variable PATH contains a colon-delimited list of paths to directories containing system utilities; typically something like /bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin. We can see how changing the contents of IFS can be used to split the results of the expansion of the PATH variable:

$ echo $PATH
/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
$ IFS=:
$ echo $PATH
/bin /sbin /usr/bin /usr/sbin /usr/local/bin /usr/local/sbin

We can closely inspect the effects of field splitting with the printf utility. This utility takes a format string as its first argument and then consumes subsequent arguments according to the format specifiers present in the format string. It repeatedly processes the format string until its arguments are exhausted, as shown in the example below:

$ printf '[%s]\n' a b c d
[a]
[b]
[c]
[d]
$ printf '%s %s\n' a b c d
a b
c d

Returning to the PATH example above, we can use the %s\n format string to print each field on its own line, rather than separated by spaces as echo does:

$ printf '%s\n' $PATH
/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
$ IFS=:
$ printf '%s\n' $PATH
/bin
/sbin
/usr/bin
/usr/sbin
/usr/local/bin
/usr/local/sbin

In most cases, field splitting is undesirable and can lead to confusing or unexpected results, such as when substituting in variables that contain whitespace. However, once mastered, it can be very useful for certain tasks. Generally, portions of words that contain expansions should be enclosed in double quotes to suppress field splitting, or the IFS variable should be set to a null string.
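Both ways of suppressing field splitting can be compared in a small sketch. The variable title is hypothetical, and set -- is used only to count the resulting fields:

```shell
unset IFS                   # start from the default splitting behavior
title='final exam notes'

set -- $title
unquoted_count=$#           # 3: split at each space

set -- "$title"
quoted_count=$#             # 1: double quotes suppress field splitting

IFS=
set -- $title
null_ifs_count=$#           # 1: a null IFS disables field splitting

unset IFS
echo "$unquoted_count $quoted_count $null_ifs_count"
```

This prints 3 1 1. Quoting is usually the safer choice, since a null IFS affects every later unquoted expansion until it is changed back.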

Pathname Expansion

Pathname expansion, also known as globbing or wildcard expansion, is a final expansion step where the shell treats each field that resulted from field splitting as a pattern. The presence of an unquoted *, ?, or [ in a field causes the shell to attempt to match it against file paths. An asterisk, *, matches any sequence of zero or more characters; a question mark, ?, matches exactly one character; and a bracket expression enclosed in [ and ] matches one of the enclosed characters. Additionally, a bracket expression may be negated in the form [!...].

Matches occur between slashes, /, and none of the above pattern elements can match a slash; if a slash appears after a [, but before the terminating ], then the [ is treated as an ordinary character. If any matching file paths are found, the field is replaced with one field for each matching file. Otherwise, it is left unchanged.

For example, * matches any file in the current directory, while */* matches any file in any subdirectory, and so on. Something like ??? matches any file with a three-character name, while ???* matches any file with at least three characters in its name. Something like file[[:digit:]].txt matches file1.txt, file2.txt, and so on, but not file10.txt, because [[:digit:]] can only match exactly one character.

One caveat is that files beginning with a dot, ‘.’, character are only matched by patterns that also begin with a ‘.’ character. This is because UNIX’s convention is that filenames beginning with a dot are hidden files, so they are ignored unless explicitly matched.
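These rules can be tried out safely in a scratch directory. The sketch below assumes that the mktemp utility with its -d option is available (it is common, but not required by POSIX); the file names are made up to mirror the examples above:

```shell
# Assumption: mktemp -d exists on this system.
start_dir=$PWD
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch file1.txt file2.txt file10.txt abc .hidden

single_digit=$(echo file[[:digit:]].txt)   # file1.txt file2.txt
three_chars=$(echo ???)                    # abc
no_match=$(echo *.pdf)                     # *.pdf: nothing matches, so it is left unchanged

set -- *
visible_count=$#                           # 4: .hidden is not matched by *

set -- .h*
dot_match=$1                               # .hidden: the pattern begins with a dot

# Return to the starting directory and clean up.
cd "$start_dir" && rm -rf "$scratch"
echo "$single_digit | $three_chars | $no_match | $visible_count | $dot_match"
```

Counting the matches of * avoids depending on the order in which the shell sorts them, which can vary with the locale.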

Quote Removal

Finally, any quote characters (backslash, single quote, and double quote) that were present in the original word are removed, unless they are themselves quoted.
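A brief sketch of quote removal, using printf with brackets to make the resulting fields visible. The variable names are hypothetical:

```shell
# Quote characters typed in the command are removed after they have done their job.
typed=$(printf '[%s]' "a b" 'c' \d)

# Quote characters that come out of an expansion are ordinary data and are kept.
q='"x"'
from_expansion=$(printf '[%s]' $q)

# A quote character that is itself quoted is kept as well.
escaped=$(printf '[%s]' \"y\")

printf '%s\n' "$typed" "$from_expansion" "$escaped"
```

The three results are [a b][c][d], ["x"], and ["y"].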