Git-Style Automatic Paging in Ruby

Posted March 18, 2008

I was using Chris Wanstrath’s cheat the other day, seeing if there were any cool git features I was missing out on (did you know you can color the output?). If you haven’t come across it yet, cheat is a nice little Ruby utility that displays “cheat sheets”: user-generated pages of text that serve as miniature reference manuals.

Unfortunately, some of these cheat sheets can get pretty long. The git one is 228 lines. Some of the text went off the top of my terminal. I sighed and typed in cheat git | less, thinking once again how nice it would be if more programs followed git’s example and automatically paged their output.

Although git’s not usually held up as a paragon of usability[1], there are a few places where it shines. My favorite is how it’ll run less on its output when the output is too big to fit on my terminal screen. Then I can easily scroll and search through the text.

Since I had a bit of time with nothing urgent to do, I decided to take a crack at making cheat page like git. Two and a half hours later, after digging through git’s source, getting help from the good folks in #git and #ruby-lang on Freenode, and receiving tons of bug fixes from Kevin Ballard, I got it to work.

I was actually surprised at how short the code was. It’s not simple, but it is small. The version below is nicer than the bare minimum, but I think you could make do with about eight lines of code.

def run_pager
  return if RUBY_PLATFORM =~ /win32|mingw/
  return unless STDOUT.tty?

  read, write = IO.pipe

  unless Kernel.fork # Child process
    STDOUT.reopen(write)
    STDERR.reopen(write) if STDERR.tty?
    read.close
    write.close
    return
  end

  # Parent process, become pager
  STDIN.reopen(read)
  read.close
  write.close

  ENV['LESS'] = 'FSRX' # Don't page if the input is short enough

  Kernel.select [STDIN] # Wait until we have input before we start the pager
  pager = ENV['PAGER'] || 'less'
  exec pager rescue exec "/bin/sh", "-c", pager
end

Upon getting this to work, I promptly forked cheat and added it there. The really cool thing about this method is that it can be dropped into any Ruby app. Just call run_pager, and everything you print to standard output will be paginated.

How this works is a bit tricky. It does some dark magic with Unix processes. At a high level, it:

  1. Creates a child process that’s a copy of the current process.
  2. Hooks the child’s standard output to the original process’s standard input.
  3. Replaces the original process with the pager program.

Then the child process continues on, unaware that anything has changed. The only difference between it and the original program is that its output is being sent to the pager, which is now the program the user is directly interacting with. This clever trick is pretty much the same thing git does.
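To make those three steps concrete, here is a minimal, self-contained sketch. It is not the code from cheat: the method name demo_pipeline is made up, tr stands in for the pager so the result is visible without a terminal, and the whole thing runs inside an extra fork purely so that the calling script survives the exec.

```ruby
# Hypothetical demo (not from cheat): fork a "program", hook its output
# to our input, then become a stand-in "pager" (tr, which upcases text).
def demo_pipeline
  result_read, result_write = IO.pipe

  sandbox = Kernel.fork do          # extra fork, only so the caller survives exec
    result_read.close
    STDOUT.reopen(result_write)     # lets the caller see what the "pager" prints
    result_write.close

    read, write = IO.pipe
    unless Kernel.fork              # step 1: the child is a copy of this process
      STDOUT.reopen(write)          # step 2: the child's output goes into the pipe
      read.close
      write.close
      puts "hello from the program"
      STDOUT.flush
      exit!(0)
    end

    STDIN.reopen(read)              # step 2, other side: the pipe feeds our input
    read.close
    write.close
    exec "tr", "a-z", "A-Z"         # step 3: this process becomes the "pager"
  end

  result_write.close
  output = result_read.read
  result_read.close
  Process.wait(sandbox)
  output
end

puts demo_pipeline   # prints HELLO FROM THE PROGRAM
```

The child only ever calls puts; it has no idea its text is being rewritten by another program on the way to the screen.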

The Code

Now, let’s see how this works.

return if RUBY_PLATFORM =~ /win32|mingw/
return unless STDOUT.tty?

The stream- and process-munging we do is only really possible on Unix, so we give up if we’re running under Windows. We also don’t want to bother invoking the pager if we aren’t actually talking to a terminal (a.k.a. tty).
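If you want to check these two guards on their own, a small helper along these lines may be handy. The name should_page? is hypothetical. It uses RUBY_PLATFORM, which is the constant current Rubies provide (as far as I know the bare PLATFORM constant went away in Ruby 1.9), and the extra mingw match is an assumption meant to cover common Windows builds.

```ruby
# Hypothetical helper: decide whether paging makes sense at all.
def should_page?(io = STDOUT, platform = RUBY_PLATFORM)
  return false if platform =~ /win32|mingw/  # no fork on Windows builds of Ruby
  io.tty?                                    # false when piped or redirected
end

puts should_page?   # true in a terminal, false when run as `ruby demo.rb | cat`
```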

read, write = IO.pipe

This sets up an input-output pipe that we can use to send our output to the pager. Anything written to write, the pipe’s input end, can be read back from read, its output end.
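A tiny sketch of a pipe on its own, within a single process, shows which end is which. The method name pipe_roundtrip is just for illustration.

```ruby
# Hypothetical demo: push one line through a pipe and read it back.
def pipe_roundtrip(message)
  read, write = IO.pipe
  write.puts message   # data goes in at the write end
  write.close          # closing it lets the reader see end-of-file
  result = read.read   # data comes out at the read end
  read.close
  result
end

puts pipe_roundtrip("hello")   # prints hello
```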

unless Kernel.fork

This is where it starts to get really tricky. Kernel.fork splits off the child process. This child process is almost identical to the original parent process. It even begins at the same spot: right after fork returns. The only difference is the return value of fork. In the child process, it’s nil; in the parent process, it’s the child’s process ID, which is never nil.

The upshot of this is that the stuff in the unless statement is only run in the child process. Since we return in there, the stuff afterwards is only run in the parent process.
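Here is a small sketch that reports what fork returned on each side. The method name fork_return_values is hypothetical, and the pipe is only there so the child can report back to the parent.

```ruby
# Hypothetical demo: see what fork returns in the child and in the parent.
def fork_return_values
  read, write = IO.pipe
  pid = Kernel.fork

  if pid.nil?                      # only true in the child
    read.close
    write.puts "child got #{pid.inspect}"
    write.close
    exit!(0)                       # leave without running the parent's exit hooks
  end

  write.close                      # parent: pid is the child's process ID
  child_report = read.read
  read.close
  Process.wait(pid)
  [pid.is_a?(Integer), child_report]
end

p fork_return_values   # prints [true, "child got nil\n"]
```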

STDOUT.reopen(write)

This hooks up the child’s output to the input end of our pipe. Because our pipe is also hooked up to the parent process, this means that now anything printed by the child is read in by the parent.
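The effect of reopen is easy to see in isolation. In this sketch (the method name capture_child_output is made up) the child uses an ordinary puts, and the text turns up at the parent’s end of the pipe instead of on the screen.

```ruby
# Hypothetical demo: redirect a child's STDOUT into a pipe with reopen.
def capture_child_output
  read, write = IO.pipe

  child = Kernel.fork do
    read.close
    STDOUT.reopen(write)    # from here on, puts writes into the pipe
    puts "printed with plain puts"
    STDOUT.flush
    exit!(0)
  end

  write.close               # the parent keeps only the read end
  output = read.read        # returns once the child has exited
  read.close
  Process.wait(child)
  output
end

puts capture_child_output   # prints printed with plain puts
```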

STDERR.reopen(write) if STDERR.tty?

This hooks up the child’s error output to the input end of our pipe as well, as long as the error is actually going to the terminal. If our child process has a problem, we want to tell the user about it.

read.close
write.close

Now we close the child’s copies of both ends of the pipe. This seems a little strange at first (this is the part that took me the longest to figure out): don’t we still want to send text through the pipe?

The way to think about this is that read and write are just handles on the pipe, not the pipe itself. STDOUT.reopen(write) made the child’s standard output a second handle on the pipe’s input end, so the child no longer needs the original write handle, and it never needed read, since only the parent reads. Closing a handle doesn’t destroy the pipe; the operating system keeps it alive as long as some process still has an end open. Tidying up also matters later: the pager only sees end-of-file once every handle on the input end has been closed.
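If it helps, the same behavior can be seen in a single process. In this sketch (the method name is hypothetical) write.dup plays the role that STDOUT plays after reopen: a second handle on the same end of the pipe. Closing the original handle leaves the pipe perfectly usable, and the reader only reaches end-of-file after the last handle is closed.

```ruby
# Hypothetical demo: closing one handle does not close the pipe itself.
def write_after_closing_original
  read, write = IO.pipe
  copy = write.dup          # second handle on the write end
  write.close               # the pipe stays open because copy still exists

  copy.puts "still works"
  copy.close                # last writer gone: the reader now gets end-of-file

  data = read.read
  read.close
  data
end

puts write_after_closing_original   # prints still works
```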

return

Remember that the child process is identical to the parent. Now that we’re done with run_pager, it’ll just continue on its merry way, as if the method hadn’t done anything. In cheat’s case, this means printing out a cheat sheet. Then whatever it prints will be sent back to the parent.

STDIN.reopen(read)

We’re back in the parent process. First we hook up the output end of the pipe to our process’s input. Remember that the current process will eventually become the pager, so this is how it’ll get input from the child process.

read.close
write.close

Now we close up the pipe in the parent process, as well. Although we’re referring to the same pipe in both processes, when we forked, the ends of the pipe (our read and write variables) got copied. We closed the child’s ends above; now we need to close the parent’s.

ENV['LESS'] = 'FSRX'

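This is pretty much a magic incantation. The LESS environment variable passes options to less, the most common pager. F tells it to exit right away if the output fits on one screen, S chops long lines instead of wrapping them, R lets color escape sequences through, and X skips the terminal initialization that would otherwise clear the screen.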
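One thing to be aware of: assigning to ENV['LESS'] overwrites any options the user has already set. A gentler variant only supplies the default when LESS is unset. This is a suggestion rather than what the code above does, and the helper name less_options is hypothetical.

```ruby
# Hypothetical variant: keep the user's own LESS options if there are any.
def less_options(env = ENV)
  env['LESS'] || 'FSRX'
end

puts less_options({})                    # prints FSRX
puts less_options({ 'LESS' => '-i' })    # prints -i
```

In run_pager this would become ENV['LESS'] = less_options.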

Kernel.select [STDIN]

This oddly-named function tells our current process, the parent, not to continue doing anything until there’s input ready to be read. This isn’t strictly necessary, but according to the git source code, it works around a bug in less.
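To see the waiting in action, here is a small sketch with a pipe standing in for STDIN and a thread standing in for the slow child process. The method name wait_for_input is made up, and it calls IO.select, which is the same call that Kernel.select invokes.

```ruby
# Hypothetical demo: select blocks until there is something to read.
def wait_for_input
  read, write = IO.pipe

  writer = Thread.new do
    sleep 0.2                  # pretend the program takes a moment to print
    write.puts "ready"
    write.close
  end

  ready, = IO.select([read])   # blocks here until read has data
  line = ready.first.gets
  writer.join
  read.close
  line
end

puts wait_for_input   # prints ready (after a short pause)
```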

pager = ENV['PAGER'] || 'less'

Here we choose what pager program to use. If the user has a preferred pager defined, we’ll use that; otherwise, we’ll use less, the typical pager.
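The lookup can be pulled out into a helper, which also makes it easy to handle one edge case the one-liner misses: PAGER set to an empty string. Treating an empty value as unset is an assumption about what is wanted here, and the name choose_pager is hypothetical.

```ruby
# Hypothetical helper: pick the user's pager, falling back to less.
def choose_pager(env = ENV)
  pager = env['PAGER'].to_s.strip
  pager.empty? ? 'less' : pager    # nil or blank counts as "not set"
end

puts choose_pager({})                       # prints less
puts choose_pager({ 'PAGER' => 'more' })    # prints more
```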

exec pager

We finally replace the parent process with the pager. exec just means “replace the current process with this program.” It makes the parent process get rid of all its state and literally become the new program. Fortunately, our input stream stays hooked up to the child process, so anything it prints will be read in by our new pager and paged.
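A quick way to watch exec take over a process is to run it in a throwaway child, as in this sketch. The method name exec_in_child is made up, and echo stands in for the pager. Note that the line after exec never runs.

```ruby
# Hypothetical demo: exec replaces the Ruby process with another program.
def exec_in_child
  read, write = IO.pipe

  child = Kernel.fork do
    read.close
    STDOUT.reopen(write)        # the new program inherits this output stream
    write.close
    exec "echo", "now running echo"
    puts "never printed"        # exec succeeded, so the Ruby code is gone
  end

  write.close
  output = read.read
  read.close
  Process.wait(child)
  output
end

puts exec_in_child   # prints now running echo
```

The redirected output survives the exec in the same way that the redirected STDIN does in run_pager.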

rescue exec "/bin/sh", "-c", pager

Unfortunately, the previous call to exec sometimes fails for mysterious reasons on OS X. We want to catch that exception and try running the pager through sh, the standard shell. This usually helps.

Update: Added in a bunch of bug fixes found by Kevin Ballard. Thanks, Kevin!

[1] Not that it’s unusable; it just lacks polish in some ways and takes longer to get accustomed to than, say, darcs.

cyclotron said March 18, 2008:

I think it’s misleading to say that when you

write.close

you are closing the pipe because you “don’t need it anymore”. What you don’t need is the write end in the parent process, because it is going to do the reading (you could also close the read end in the child process).

Nice article anyway, thanks!

Nathan said March 18, 2008:

Although I don’t fully understand all the pipe munging that’s going on here, I don’t think that’s quite accurate, cyclotron. I think the pipe itself is managed by the OS, and thus is independent of either process. Closing it in the child process actually has the same effect.

Florian said March 18, 2008:

Nathan seems to be right, there’s some deeper process to it. Actually closing both ends in parent and child doesn’t affect the pager at all:

write.close
read.close

It keeps on working..

Don said March 19, 2008:

Nice functionality, and that’s a fancy trick that may come in handy later. Also, I never knew about Kernel#exec and it’s going to solve a little problem for me. Thanks for the write-up!

vic said April 16, 2008:

Nathan, thanks for such a cool snippet. I’ve included it on

http://github.com/vic/buildr/tree/master/doc/scripts/buildr-git.rb

We provide some git-workflow paged tips to buildr developers when they get a buildr fork.

You can try it at http://balloon.hobix.com/buildr-git

PotatoSalad said April 26, 2008:

Thanks for the tip, I wrote about using this with ack and rak over at http://potatosaladx.blogspot.com/2008/04/automatic-paging-with-ack-and-rak-git.html

The results are awesome.

Bryan said July 22, 2009:

Hey, thanks for some cool code. :)

Any ideas how to make this work in Readline? I have a command-line application using Readline and I’d like to use something like this to paginate long output from commands. I tried using it as is but as one would expect it continues to do paging after the output text, negatively affecting the command-line of the application. I tried messing around with the code to redirect STDOUT back to the original STDOUT and kill the extra process, but that didn’t work either.

Any suggestions?!

— Thanks! Bryan

Nathan said July 22, 2009:

The problem is that the pager process replaces the top-level process. Not only that, but once the pager’s the top process, control never passes back to the process that spawned it. The only way you might possibly be able to get this to work is wrapping the pager in another program that re-raises the Ruby process. That would be very complicated, and I’m not really sure that it would work at all. Good luck, though!
