Right back in the very first instalment we described the Unix philosophy as being Lego-like, that is, having lots of simple commands that each do one thing well, and then assembling them together to do something really powerful. So far we’ve only been working with a single command at a time, but that changes with this instalment. We’ll be introducing the concept of streams, which can be used to connect commands and files together.

Listen Along: Taming the Terminal Podcast Episode 15

Streams

Before we can get into the nitty-gritty of chaining commands together, we need to introduce a new concept, that of a stream of data. Quite simply, a stream is a sequential flow of data – everything that goes in one end comes out the other, and it always comes out in the same order it went in.

In a Unix/Linux environment there are three standard streams:

  • STDOUT (Standard Out): when working on the command line this stream is usually connected to the terminal – anything written to that stream is printed in the terminal window. Within applications this stream is usually connected to a log file.
  • STDERR (Standard Error): this is another output stream, but one reserved for error messages. When working on the command line this stream is usually also connected to the terminal in the same way that STDOUT is. Within applications this stream is usually connected to a log file, though often a different log file to the one STDOUT is connected to.
  • STDIN (Standard In): this stream is used for input rather than output. When working at the command line it is usually connected to the keyboard. Within applications this stream could be attached to just about anything, e.g. within a web server it is connected to the HTTP request data sent to the server by the browser.

Many Unix/Linux commands can take their input from STDIN, and just about every command will write its output to STDOUT and/or STDERR. This allows commands to be strung together by the simple act of redirecting these streams – if you redirect the output of one command to the input of another, you have chained them together!

Remember that every process has its own environment, and therefore its own copies of the three standard streams. The redirection operators alter those copies within individual processes, so, from a command’s point of view, it always reads from STDIN and always writes to STDOUT and/or STDERR; where those streams actually flow to is determined by its environment.

Stream Redirection

BASH provides a number of stream redirection operators:

  • | (the ‘pipe’ operator) – this operator connects STDOUT in the environment of the command to its left to STDIN in the environment of the command to its right.
  • > and >> – these operators connect STDOUT in the environment of the command to their left to a file at the path specified to their right.
  • < – this operator connects the contents of the file at the path specified to its right to STDIN in the environment of the command to its left.

The | operator is probably the most used of the three, as it allows straightforward command chaining. It’s also the simplest. For that reason we’re going to focus solely on the | operator in this instalment, leaving the file-related operators until the next instalment.

The | Operator in Action

To facilitate an example, let’s introduce a simple command for counting words, characters, or lines – wc (word count).

A quick look at the man page for wc shows that it counts lines when used with the -l flag. Something else you might notice is that the command OPTIONALLY takes an argument of one or more file paths as input. This means we can count the number of lines in the standard Unix/Linux hosts file.
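The hosts file usually lives at /etc/hosts, so the command would be something like:

  wc -l /etc/hosts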

But why is the list of file paths optional? What could the command possibly count if you don’t point it at a file? The answer can be found further down in the man page:

The wc utility displays the number of lines, words, and bytes contained in each input file, or standard input (if no file is specified) to the standard output.

In other words, wc will read its input from STDIN if no file is specified, and, no matter what the input source, it will write its results to STDOUT.

Now that you know about the three standard streams, you’ll start to see them in man pages all over the place. E.g., you’ll find the following in the man page for ls:

By default, ls lists one entry per line to standard output; the exceptions are to terminals or when the -C or -x options are specified.

Let’s combine our knowledge of ls, wc, streams, and stream redirection to build a command to determine how many files or folders there are on our Desktop.
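Assuming the Desktop folder is at its usual location of ~/Desktop, something like this should do the trick:

  ls ~/Desktop | wc -l

Because the output of ls is not going to a terminal, it lists one entry per line (as the man page quoted above says), so the number printed by wc is the number of items on the Desktop.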

NOTE – the | operator ONLY redirects STDOUT to STDIN; it has no effect on STDERR, so if the ls command generates an error, that error message will go to the screen and not to the wc command. To illustrate this point, let’s try to count the number of files in a folder that does not exist.
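For example, using a made-up path (any path that doesn’t exist on your machine will do; the exact wording of the error varies between systems):

  ls /this/does/not/exist | wc -l
  ls: /this/does/not/exist: No such file or directory
         0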

We see the error message generated by ls on the screen, and it is one line long, but wc never saw that line because it was written to STDERR, so instead of saying there is 1 file in this fictitious folder, it tells us, correctly, that there are no files in that folder.

Special Stream Files

You’ll often hear Unix nerds tell you that in Unix, everything is a stream. This is because, deep down, Unix (and Linux) treats files as streams. This is especially true of some special files, which really are streams of data rather than pointers to data on a hard disk. Special files of this type have a c (for character special file, i.e. a character stream) as the first letter in their permission mask in the output of ls -l.
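For example, /dev/null is one such file; the owner, device numbers, and date shown will vary from system to system, but the first character of the permission mask will always be a c:

  ls -l /dev/null
  crw-rw-rw-  1 root  wheel    3,   2 Apr 14 10:25 /dev/null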

There are many such files in /dev/ on a Unix/Linux machine, but it is a VERY bad idea to write to or read from any you don’t understand. These files are generally connected directly to some piece of hardware in your system, including low-level access to your hard drives, so you could destroy important data very easily (thankfully you need to be root to write to the ‘files’ connected to the hard drives).

There are a few safe special files that I want to mention though:

  • /dev/null: this is effectively a black hole – you use this file to redirect output into oblivion. (more on this in the next instalment)
  • /dev/random or /dev/urandom: both of these streams output random binary data. The difference between the two is that /dev/random will pause its output when the OS’s pool of entropy (i.e. the quality of the randomness) gets too low, and only resume when the entropy pool has built up again, while /dev/urandom does not care how much entropy the OS has built up – it will output what it has. In other words, /dev/random promises quality, but not speed, while /dev/urandom promises speed, but not quality.
  • /dev/zero: this stream outputs a constant flow of zeros.

As an example, let’s use /dev/urandom to generate 10 random characters.

Before we can begin, there are two complications we need to understand. Firstly, these special streams have no beginning or end, so we have to be sure to always read from them in a controlled way – if you ask a command like cat to print out the contents of such a file, it will never stop, because cat continues until it reaches the end-of-file marker, and these special ‘files’ have no end! Secondly, /dev/urandom does not output text characters, it outputs binary data, and while some combinations of binary data map to characters on our keyboards, most don’t, so we will need to convert this stream of binary data into a stream of text characters.

We can overcome the first of these limitations by using the head command we met in part 11 of this series. Previously we’ve used head to show us the first n lines of a file, but we can use the -c flag to request a specific number of bytes (characters) rather than lines.
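For example, the following would print just the first 10 bytes of the hosts file (again assuming it lives at /etc/hosts):

  head -c 10 /etc/hosts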

The second problem can be overcome with the base64 command, which converts binary data to text characters using the Base64 encoding algorithm. A quick look at the man page for base64 shows that it can use streams as well as files:

With no options, base64 reads raw data from stdin and writes encoded data as a continuous block to stdout.

Putting all this together, we can assemble a single command that does the job.
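It could look something like this (your output will differ each time you run it, since the data is random):

  head -c 10 /dev/urandom | base64

Ten bytes of random binary data encode to sixteen Base64 characters, the last two of which are the == padding discussed below.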

This is nearly perfect, but you’ll notice that the output always ends with == – this is padding added by the Base64 encoding to mark the end of the data. We can chop that off by piping our output through head one more time to return only the first 10 characters.
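Extending the command above:

  head -c 10 /dev/urandom | base64 | head -c 10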

This will print only the 10 random characters, and nothing more. Since this command does not print a newline character, it leaves the text stuck to the front of your prompt, which is messy. To get around this you can run echo with no arguments straight after the above command.
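On one line, that might look like this:

  head -c 10 /dev/urandom | base64 | head -c 10; echo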

Note that we are NOT piping the output to echo – the symbol used is ;, the command separator. It denotes the end of the previous command and the start of the next one, allowing multiple separate commands to be written on one line. The commands will be executed in order, one after the other.

Finally, because we need to use the same number of characters in both head commands, we could use a shell variable to make this command more generic and to make it easier to customise the number of characters.
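For example, using a variable arbitrarily named n (any valid variable name would do):

  n=10; head -c $n /dev/urandom | base64 | head -c $n; echo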

Conclusions

In this instalment we have introduced the concept of streams, particularly the three standard streams provided by the environment, STDOUT, STDERR, and STDIN. We’ve seen that these streams can be redirected using a set of operators, and that this redirection provides a mechanism for chaining commands together to form more complex and powerful commands. We’ve been introduced to the concept of using files as input and output, but have not looked at that in detail yet. We’ve also not yet looked at merging streams together, or independently redirecting STDOUT and STDERR to separate destinations – this is what’s on the agenda for the next instalment.