Pipelines

Recall the core Unix philosophy, “do one thing, and do it well” (DOTADIW), which holds that each utility should perform a single task. This allows programmers to build more complex programs by combining many small utilities, typically in shell scripts, which we will discuss in the next module. Even in an interactive session, however, a user often wants one utility to process the output of another without storing intermediate results in a file via redirection. The shell provides a mechanism for exactly this, called a pipeline.
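As a minimal sketch of the difference, the following bash snippet sorts the same input twice. The first pass stores the input in a temporary file (created with mktemp) and redirects that file into sort. The second pass connects the two commands directly with the | operator described below. The sample words and variable names are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Approach 1: store the intermediate result in a temporary file.
tmp=$(mktemp)
printf 'banana\napple\ncherry\n' > "$tmp"   # stdout redirected to a file
via_file=$(sort < "$tmp")                    # file redirected to stdin
rm -f "$tmp"

# Approach 2: pipe the stdout of printf straight into the stdin of sort.
via_pipe=$(printf 'banana\napple\ncherry\n' | sort)

echo "$via_pipe"
```

Both approaches produce the same sorted list. The pipeline simply skips the intermediate file.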

The pipeline operator, |, between two commands causes the stdout of the command on the left to be “piped in” as the stdin of the command on the right. Several commands can be chained together in this fashion to form long pipelines. Here is a real-world example:

$ cut -f 2 gradebook | sed 's/$/,/' | sort | sed 's/,$//' | uniq -c
  96 A
  28 A-
  18 B+
  22 B
   8 B-
  11 C+
   8 C
  10 C-
   7 D+
   6 D
   5 D-
  10 F

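Here we use the cut utility to extract the second field of a file called “gradebook”, which contains letter grades. By default, cut treats the tab character as the field separator. The output of cut is fed into the sed utility, which appends a comma (‘,’) to the end of each line. That output is sorted by sort, and the sorted lines are fed to sed again, this time to remove the added comma. Finally, the output of the second sed is fed into the uniq utility. This utility collapses each run of identical adjacent lines into a single line, and with the -c option it prefixes each line with the number of times it occurred. Because uniq only compares adjacent lines, its input must be sorted first.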
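To see each stage in action, here is a minimal sketch that builds a small, made-up gradebook in a temporary file and runs the same pipeline over it. Each line holds a name and a grade, separated by a tab. The file contents and variable names are hypothetical. Setting LC_ALL=C is an assumption made so that sort orders characters by their ASCII codes, which the comma trick depends on.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: name<TAB>grade, one student per line.
gradebook=$(mktemp)
printf 'ann\tB\nbob\tA-\ncat\tB+\ndan\tA\neve\tB\nfay\tB-\n' > "$gradebook"

# Force byte-value (ASCII) collation so that '+' < ',' < '-' when sorting.
export LC_ALL=C

# Show the output after successive stages of the pipeline.
echo '--- after cut:'
cut -f 2 "$gradebook"
echo '--- after cut | sed | sort:'
cut -f 2 "$gradebook" | sed 's/$/,/' | sort
echo '--- full pipeline:'
counts=$(cut -f 2 "$gradebook" | sed 's/$/,/' | sort | sed 's/,$//' | uniq -c)
echo "$counts"

rm -f "$gradebook"
```

The final stage should list A, A-, B+, B (counted twice), and B-, in that order. The width of the count column printed by uniq -c may vary between implementations.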

Why use sed to add a comma to each line? If I don’t, sort places the plain letter grades before, rather than between, the plus/minus variants: “A”, “A-”, “B”, “B+”, “B-”, “C”, “C+”, “C-”, etc. Conveniently, the ‘,’ character is character code 44 in ASCII, which falls right between ‘+’ (43) and ‘-’ (45). Appending a ‘,’ to the end of each line therefore causes the plain letter grades to sort between the + and - grades, giving “A,”, “A-,”, “B+,”, “B,”, “B-,”, etc. Then we use sed once more to remove the added ‘,’ from each line. Nifty, eh? Note that this trick relies on sort comparing characters by their byte values, as it does in the C (POSIX) locale. In other locales, sort may collate punctuation differently, so you may need to run it as LC_ALL=C sort to get this ordering.
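The effect of the comma can be checked in isolation. This minimal sketch sorts a few hypothetical grades twice: once as-is, and once with the comma added and removed around sort. Both runs use LC_ALL=C, on the assumption that byte-value ordering is what we want to demonstrate.

```shell
#!/usr/bin/env bash
# Hypothetical sample grades, deliberately out of order.
grades=$(printf 'B-\nB+\nB\nA-\nA\n')

# Plain byte-value sort: "B" < "B+" < "B-", because a string sorts before
# any longer string it is a prefix of, and '+' (43) < '-' (45).
plain=$(printf '%s\n' "$grades" | LC_ALL=C sort)

# Comma trick: append ',' (44), sort, then strip the ',' again.
tricked=$(printf '%s\n' "$grades" | sed 's/$/,/' | LC_ALL=C sort | sed 's/,$//')

echo 'Without the comma:'
printf '%s\n' "$plain"
echo 'With the comma:'
printf '%s\n' "$tricked"
```

Without the comma, the output is A, A-, B, B+, B-. With the comma, it is A, A-, B+, B, B-.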

Pipeline Subshells

One very important thing to understand is that, in bash with its default settings, each command in a multi-command pipeline runs in its own subshell environment. As a result, variable assignments and redirections made in any of the pipelined commands affect neither the current shell environment nor the environments of the other commands. This is a very common source of confusion, even among proficient users. It arises most often with the builtin read command, which reads a line of input and assigns it to a variable. If read is used in a pipeline, it does consume input and store it in a variable. However, it does so in a subshell environment, so the assignment has no effect on any other command in the pipeline or on the parent shell process:

$ read input
hello!
$ echo Input was: $input
Input was: hello!
$ echo 'test' | read input
$ echo Input was: $input
Input was: hello!

In the above example, the line echo 'test' | read input does assign “test” to the shell variable named input, but only in the subshell environment of the read command itself. We can see that the value of input remains “hello!” from the perspective of the parent shell. We will discuss ways to work around this when we cover shell scripting.
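The same pitfall applies to any variable assignment, not just read. Here is a minimal sketch, assuming bash with its default settings. The counter variable and the sample input are made up for illustration. A while loop that counts lines as the last stage of a pipeline only updates its own subshell's copy of count:

```shell
#!/usr/bin/env bash
# Hypothetical example: try to count lines by incrementing a variable
# inside a while loop that is the last stage of a pipeline.
count=0
printf 'one\ntwo\nthree\n' | while read -r line; do
  count=$((count + 1))                  # modifies the subshell's copy only
  echo "inside the loop: count=$count"
done
echo "after the pipeline: count=$count" # the parent shell still sees 0
```

Inside the loop, the counter climbs to 3. After the pipeline finishes, the parent shell still reports 0.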