Pipes: How They Work and How to Use Them

In this post I'm going to use a very small program, which I wrote recently as a bit of a joke for a friend, to explain and demonstrate how the famous Unix pipes work.

Prerequisite knowledge: A little command line fu, a little Ruby knowledge and an understanding of the terms STDIN and STDOUT.

# Pipes

So, first of all: pipes. What are they? What can you do with them? Pipes are, in a word, awesome. They allow you to use the output of one program as the input to another. For example:

$ history | grep ssh

This will list every command in your history that contains "ssh". The history command prints a list of your previous commands (it may not exist on your system depending on which shell you're using, so have a Google around if your shell complains that it can't find history). grep is the quintessential searching tool. It reads through text and prints only the lines that match the given search term. It has many options and many hidden treasures, and if you get the time, I strongly recommend learning what you can do with grep. It is very powerful.

What the pipe is doing, basically, is linking the STDOUT of history to the STDIN of grep. Everything history writes to its STDOUT, grep reads from its STDIN. That's all that pipes do! But this simple, genius idea opens up a phenomenal number of possibilities. There are some great examples in Gary Bernhardt's fantastic talk, The Unix Chainsaw.
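To make the idea of wiring one STDOUT to another STDIN a bit more concrete, here is a minimal sketch using Ruby's IO.pipe, which asks the operating system for the same kind of pipe the shell sets up for |. The method name filter_through_pipe and the sample history lines are made up for illustration, and a real shell connects two separate processes rather than two ends inside one script.

```ruby
# Hypothetical helper: push lines into one end of a pipe, then read
# them back from the other end, keeping only lines that contain `term`.
def filter_through_pipe(lines, term)
  reader, writer = IO.pipe

  # This half plays the part of `history` writing to its STDOUT.
  lines.each { |line| writer.puts(line) }
  writer.close # closing the write end signals end-of-input to the reader

  # This half plays the part of `grep` reading from its STDIN.
  matches = reader.each_line.select { |line| line.include?(term) }
  reader.close

  matches.map(&:chomp)
end

# Made-up history entries, purely for demonstration.
fake_history = [
  "ssh example.com",
  "ls -la",
  "git push",
  "ssh -p 2222 other.example.com"
]

puts filter_through_pipe(fake_history, "ssh")
```

Writing everything before reading anything only works here because the input is tiny. A pipe has a limited buffer, which is why the shell runs both programs at the same time and lets one read while the other writes.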

# Command line input

Writing Ruby programs that accept command line input is very simple. I'm going to use a program I wrote as a joke called "Gommoize" as an example. All Gommoize does is replace all vowels in a given input with the letter "O". The description of its origin can be found on its GitHub page.

Here's a simple first iteration of the program:

#!/usr/bin/env ruby

input = ARGV.shift
puts input.gsub(/[aeiou]/, 'o').gsub(/[AEIOU]/, 'O')

The first line is what's called a "shebang". It tells the operating system which program to use to run the script. This is cool because it lets us run the script just by its name instead of needing to specify which program to run it with. For example, once the script is executable we can just do this:

$ chmod +x script
$ ./script

Instead of having to do this:

$ ruby script

All because of the shebang.

The input = ARGV.shift line simply takes the first command line argument from the list of command line arguments. ARGV is an array of arguments that were supplied to our program and shift is a method of the Array class in Ruby that removes the first element of an array and returns it.
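If shift is new to you, this small sketch shows what it does, using a plain array as a stand-in for ARGV so it can be tried without passing any arguments. The helper name take_first_argument is hypothetical and only there to mirror the line in the script.

```ruby
# Hypothetical helper that mirrors `input = ARGV.shift`, but takes the
# argument list as a parameter instead of reading the real ARGV.
def take_first_argument(args)
  args.shift
end

# Pretend the script was run as: ./script Gemma extra
args = ["Gemma", "extra"]
first = take_first_argument(args)

puts first.inspect # "Gemma"
puts args.inspect  # ["extra"]
```

Note that shift changes the array it is called on, so the first argument is gone from the list afterwards.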

The last line in the script prints the input with every vowel replaced by the letter "O". I've written a naive way of preserving case by using two global substitutions (the gsub method), one for lowercase vowels and one for uppercase. There's probably a more elegant way of doing this but I feel this method will suffice for now.
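One possibly tidier option is String#tr, which translates characters one for one and pads a shorter replacement list with its last character. This is only a sketch of an alternative, not what the program actually uses, and the gommoize method name is just for illustration.

```ruby
# Replace lowercase vowels with "o" and uppercase vowels with "O".
# tr pads the shorter second argument with its last character, so
# 'o' stands in for every character listed in 'aeiou'.
def gommoize(text)
  text.tr('aeiou', 'o').tr('AEIOU', 'O')
end

puts gommoize("Gemma") # Gommo
```

Whether this reads better than two gsub calls is a matter of taste. It does avoid regular expressions entirely.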

That's it. We can now use this program like so:

$ ./gommoize "Gemma"

And the output would be "Gommo" (assuming our file is called "gommoize" and has been made executable via chmod). However, we can't pipe to the program at the moment. If we try this:

$ touch somefile.txt
$ echo "Gemma" >> somefile.txt
$ cat somefile.txt | ./gommoize

It will fail with a NoMethodError complaining that gsub was called on nil. Bummer. The program tried to get an argument from the command line, didn't find one, and never looked at the text being piped in. Fortunately, this is a really easy problem to solve.
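The failure can be reproduced without a shell by handing the same logic an empty argument list. This is a sketch under the assumption that an empty array behaves like ARGV does when nothing is passed. The method name first_iteration is made up for the example.

```ruby
# Hypothetical wrapper around the first iteration of the script, with
# the argument list passed in instead of read from the real ARGV.
def first_iteration(args)
  input = args.shift # nil when the list is empty
  input.gsub(/[aeiou]/, 'o').gsub(/[AEIOU]/, 'O')
end

begin
  first_iteration([])
rescue NoMethodError => e
  # The exact wording of the message depends on the Ruby version.
  puts "#{e.class}: #{e.message}"
end
```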

# Modifying our program for piping

The modification we need to apply to allow our program to be piped to is both simple and elegant:

#!/usr/bin/env ruby

input = ARGV.shift || $stdin.read
puts input.gsub(/[aeiou]/, 'o').gsub(/[AEIOU]/, 'O')

Only the line that reads the input needs to change. We're using a little trick that Ruby lets us do with the logical or operator, ||. If the first expression in ARGV.shift || $stdin.read evaluates to something truthy (which is anything apart from nil or false in Ruby) then the second expression is not evaluated at all. For example:

true || puts("Hello")

The puts("Hello") call will never run in the code above, because the first part of the or expression is already true. In our program it works the other way around: if no command line arguments were passed, ARGV.shift returns nil, so Ruby moves on to $stdin.read and takes the input from STDIN instead. If we've piped to the program, STDIN will contain the output of the program on the left-hand side of the pipe. So now:

$ cat somefile.txt | ./gommoize

Should work :)
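Both paths through ARGV.shift || $stdin.read can be tried without a shell by passing the argument list and the input stream in as parameters. In this sketch a StringIO object stands in for $stdin, and the method names read_input and gommoize are hypothetical.

```ruby
require 'stringio'

# Hypothetical helper mirroring `ARGV.shift || $stdin.read`, with the
# argument list and input stream passed in so both paths can be tried.
def read_input(args, stdin)
  args.shift || stdin.read
end

# Same substitutions as the script itself.
def gommoize(text)
  text.gsub(/[aeiou]/, 'o').gsub(/[AEIOU]/, 'O')
end

# An argument is present, so the stream is never read.
puts gommoize(read_input(["Gemma"], StringIO.new("ignored")))

# No arguments, so the input comes from the stream, as it would from a pipe.
puts gommoize(read_input([], StringIO.new("Gemma\n")))
```

Both calls print "Gommo". One thing to be aware of: if the real script is run with no arguments and nothing piped in, $stdin.read will sit and wait for you to type input and finish with Ctrl-D.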

# Summing up

Hopefully this post has gone some way to demystifying pipes for you. It's easy to write your command line programs so that they accept input either as command line arguments or from a pipe. Doing so will make your program more usable and a better member of the Unix ecosystem.

Also, don't be fooled. Reading STDIN for pipe input will work regardless of the programming language that you use, not just Ruby. Give it a try!

If you have any questions, feel free to email me! samwho@lbak.co.uk