Pipes: How They Work and How to Use Them
In this post I'm going to use a very small program, which I wrote recently as a bit of a joke for a friend, to explain and demonstrate how the famous Unix pipes work.
Prerequisite knowledge: A little command line fu, a little Ruby knowledge and an understanding of the terms STDIN and STDOUT.
# Pipes
So, first of all: pipes. What are they? What can you do with them? Pipes are, in a word, awesome. They allow you to use the output of one program as the input to another. For example:
$ history | grep ssh
This will list every previous command that contains the text "ssh". The `history` command shows a list of your command history (this command may not exist on your system depending on what you're working with; have a Google around if your shell complains about the `history` command not existing). `grep` is the quintessential searching tool. It reads through text and prints only the lines that match the given search pattern. It has many options and many hidden treasures, and if you get the time, I strongly recommend learning what you can do with `grep`. It is very powerful.
What the pipe is doing, basically, is linking the STDOUT from `history` to the STDIN of `grep`. `grep` is reading `history`'s STDOUT as if it were its own STDIN. That's all that pipes do! But through this simple, genius idea there is a phenomenal number of possibilities. There are some great examples of them in Gary Bernhardt's fantastic talk, "The Unix Chainsaw".
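To make the idea concrete, here is a minimal Ruby sketch of the same plumbing using `IO.pipe` from the standard library. The history lines are made up for illustration, and a real shell connects two separate processes rather than two ends inside one script, so treat this as an analogy and not as how the shell is implemented.

```ruby
# IO.pipe returns two connected ends: bytes written to `writer`
# can be read back from `reader`.
reader, writer = IO.pipe

# Stand-in for `history`: write some made-up lines into the pipe.
fake_history = ["ssh example.com", "ls -la", "ssh -p 2222 example.org"]
fake_history.each { |line| writer.puts(line) }
writer.close # signals end of input to the reading side

# Stand-in for `grep ssh`: keep only the lines that contain "ssh".
matches = reader.each_line.select { |line| line.include?("ssh") }
reader.close

puts matches
```

The reading side never needs to know who wrote the data. It just reads lines until the input ends, which is exactly the position `grep` is in.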
# Command line input
Writing Ruby programs that accept command line input is very simple. I'm going to use a program I wrote as a joke called "Gommoize" as an example. All Gommoize does is replace all vowels in a given input with the letter "O". The description of its origin can be found on its GitHub page.
Here's a simple first iteration of the program:
#!/usr/bin/env ruby
input = ARGV.shift
puts input.gsub(/[aeiou]/, 'o').gsub(/[AEIOU]/, 'O')
The first line is what's called a "shebang". It tells the operating system which program to use to run the script. This is cool because it lets us run the script just by its name instead of needing to specify what program to run it with. For example, once the file has been made executable, we could just do this:
$ chmod +x script
$ ./script
Instead of having to do this:
$ ruby script
All because of the shebang.
The `input = ARGV.shift` line simply takes the first command line argument from the list of command line arguments. `ARGV` is an array of the arguments that were supplied to our program, and `shift` is a method of the `Array` class in Ruby that removes the first element of an array and returns it.
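Here's a quick sketch of `shift` on an ordinary array. The `args` array is just a stand-in for `ARGV`, and its contents are made up.

```ruby
# A stand-in for ARGV, as if the script had been run with two arguments.
args = ["Gemma", "Sam"]

first = args.shift
p first # "Gemma"
p args  # ["Sam"]

# On an empty array, shift has nothing to remove and returns nil.
nothing = [].shift
p nothing # nil
```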
The last line in the script outputs the input but with all vowels replaced with the letter "O". I've written a naive implementation of case sensitivity by using two global substitutions (the `gsub` method). There's probably a more elegant way of doing this, but I feel this method will suffice for now.
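One possibly tidier option, offered as a sketch and not as what Gommoize actually does, is `String#tr`, which translates characters without needing a regular expression. The method name `gommoize` below is made up for this example.

```ruby
# Hypothetical helper; the name `gommoize` is only for illustration.
def gommoize(text)
  # tr replaces each character in the first set with the matching one in
  # the second; a shorter second set is padded with its last character,
  # so every lowercase vowel maps to "o" and every uppercase one to "O".
  text.tr("aeiou", "o").tr("AEIOU", "O")
end

puts gommoize("Gemma")
```

Whether this is more elegant is a matter of taste. It still needs two passes to preserve case, just like the `gsub` version.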
That's it. We can now use this program like so:
$ ./gommoize "Gemma"
And the output would be "Gommo" (assuming our file is called "gommoize" and has been made executable via `chmod`). However, we can't pipe to the program at the moment. If we try this:
$ touch somefile.txt
$ echo "Gemma" >> somefile.txt
$ cat somefile.txt | ./gommoize
It will result in a `NoMethodError`, because `gsub` was called on nil. Bummer. That means the program tried to get an argument from the command line and didn't find one, so `ARGV.shift` returned nil. Fortunately, this is a really easy problem to solve.
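The failure can be reproduced without a pipe at all. In this small sketch an empty array stands in for `ARGV` when no arguments are given, and the error is rescued only so that we can look at it.

```ruby
# With no arguments, ARGV is empty, so shift returns nil.
input = [].shift # stand-in for ARGV.shift with no arguments

begin
  input.gsub(/[aeiou]/, "o")
rescue NoMethodError => e
  # nil has no gsub method, so Ruby raises NoMethodError.
  error_class = e.class
  puts "#{error_class}: #{e.message}"
end
```

The exact wording of the message differs between Ruby versions, but the error class is the same.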
# Modifying our program for piping
The modification we need to apply to allow our program to be piped to is both simple and elegant:
#!/usr/bin/env ruby
input = ARGV.shift || $stdin.read
puts input.gsub(/[aeiou]/, 'o').gsub(/[AEIOU]/, 'O')
Only the line that reads the input needs to change. We're using a little trick that Ruby lets us do with the logical or operator, `||`. If the first expression in the or statement `ARGV.shift || $stdin.read` evaluates to a truthy value (which is anything apart from nil or false in Ruby), then the second part is not evaluated. For example:
true || puts("Hello")
The `puts("Hello")` will never be run in the code above, so nothing is printed, because the first part of the or statement evaluates to true. Conversely, if our program gets no command line arguments passed to it, `ARGV.shift` returns nil and the program will look to STDIN for its input. If we've piped to the program, STDIN will contain the output from the program that has been piped to our program. So now:
$ cat somefile.txt | ./gommoize
Should work :)
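If you want to convince yourself of both paths without touching the shell, here is a small sketch that feeds the same expression a fake argument list and a fake STDIN. The `read_input` helper is hypothetical, and `StringIO` is standing in for the real `$stdin`.

```ruby
require "stringio"

# Hypothetical helper wrapping the one line that changed.
def read_input(args, stdin)
  args.shift || stdin.read
end

# An argument is present, so stdin is never read.
from_args = read_input(["Gemma"], StringIO.new("ignored"))

# No arguments, so we fall back to reading everything from stdin.
from_pipe = read_input([], StringIO.new("Gemma\n"))

puts from_args.gsub(/[aeiou]/, "o")
puts from_pipe.gsub(/[aeiou]/, "o")
```

Note that input read from STDIN usually ends with a newline, whereas a command line argument does not. `puts` copes with both, since it only adds a newline when the string doesn't already end with one.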
# Summing up
Hopefully this post has gone some way to demystifying pipes for you. It's easy to write your command line programs in a way that allows them to take input either directly from the command line as arguments or from pipes, and doing so will make your program more usable and a better member of the Unix ecosystem.
Also, don't be fooled into thinking this is specific to Ruby. Reading STDIN for pipe input will work regardless of the programming language that you use. Give it a try!
If you have any questions, feel free to email me! samwho@lbak.co.uk