Speed Compiling with Distcc

By V. L. Simpson

Introduction

When was the last time you compiled a Linux kernel?

Yesterday? Last week? Five minutes ago?

On a 486?

I don't remember either.

Remember how long it took?

I remember that. Too long. Too damn long.

Now why would I want to compile the latest kernel on a 486?

Ordinarily, I wouldn't. But with the tragic death of my main computer I was forced to move my computing needs to an old 486 someone had given me. I had been using this one as an NTP time server for my home network. Suffice it to say, what was on the NTP server wasn't the latest and the greatest. The other computer on the network wasn't much of an improvement over the 486. (A foundling laptop with a minuscule hard drive.)

Well I was screwed because I needed my Emacs. So I pulled the drive from the dead computer and hooked it up to the 486.

It worked flawlessly, which is a testament to the quality and efficiency of the Linux kernel and GNU software. I didn't really know what to expect regarding response and the general feel of the environment, but in console mode I noticed no real difference. The X window system even worked fine, albeit slowly at start-up. Now, there was no way the GIMP or Mozilla was going to run with any kind of usability, but I could use Emacs and lynx or dillo without too many problems.

But did I really want to sit through a compile that was going to take a few hours at least? Not really. I guess I could have washed dishes, mowed the lawn or watched TV but, hey, TV sucks. I'd rather watch a kernel compile.

Enter the award-winning distcc, a distributed compiler front end for gcc, written by Martin Pool.

Distcc

Distcc consists of two binary programs: distccd and distcc.

distccd runs as a daemon and handles network traffic. By passing pre-processed source code files across a network to other computers with an installed compiler, you effectively have two or more compilations going at once.

distcc is a front end to gcc and g++. You specify distcc as the compiler in place of gcc and it transparently handles all the magic that is going on. distcc can be used for all compile jobs whether you need the networking capabilities or not, i.e., you can compile one file or thousands, it's up to you.
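
As a rough illustration of "distcc in place of gcc", the sketch below picks a compiler command for a single file. It only prints the command it would run. The helper name pick_compiler and the file hello.c are made up for this example; the "distcc gcc ..." form is distcc's explicit way of naming the real compiler.

```shell
#!/bin/sh
# Hypothetical helper, not part of distcc: use distcc as a wrapper
# around gcc when it is installed, and plain gcc otherwise.
pick_compiler() {
    if command -v distcc >/dev/null 2>&1; then
        echo "distcc gcc"
    else
        echo "gcc"
    fi
}

CC=$(pick_compiler)
# Only print the command; hello.c is a placeholder file name.
echo "would run: $CC -c hello.c -o hello.o"
```

The compile line is the same either way, which is why swapping distcc in and out is painless.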

The easiest way to demonstrate distcc's abilities is to use it to compile itself as an example of distributed compilation.

I'll show how to compile distcc and give my time for the initial compilation, then recompile using distcc in place of gcc.

Minimum Requirements:

Two compatible networked computers, designated here as a server (the machine where you run the build) and a client (a helper machine that only compiles).

The server:
This machine should have a complete C/C++ development environment installed. You'll also need any other ancillary development packages (readline, ncurses, gtk+, whatever) that your particular bit of software needs for compiling.

distcc itself requires nothing special.

Note: There are a couple of other programs produced by distcc: distccmon-text and distccmon-gnome.

These are monitor programs to show you what's happening during a distcc compile session. The *-gnome version needs GTK at a minimum but if you don't have it installed, don't worry.

The client:
This machine only needs the compilers installed. You do not need libc, ncurses, kernel headers or the infinite array of libraries things seem to need nowadays to compile.
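
The client can get by without headers because the preprocessor runs on the server, and only the already expanded source is shipped out. The sketch below imitates those stages on a single machine with a plain C compiler and no distcc at all. It assumes a compiler is installed as cc, and the file names are made up for this example.

```shell
#!/bin/sh
# Imitate, on one machine, the stages that distcc splits across two.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { puts("hello from a distributed build"); return 0; }
EOF

if command -v cc >/dev/null 2>&1; then
    cc -E hello.c -o hello.i   # server: expand headers and macros
    cc -c hello.i -o hello.o   # client: compile; no headers are read here
    cc hello.o -o hello        # server: link against the local libraries
    ./hello
else
    echo "no C compiler found; skipping the demonstration"
fi
```

The hello.i file is self-contained, which is why it can be compiled on a machine that has no header files or libraries installed.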

The distcc source code is available from the distcc web site.

Building distcc, the first run:

Standard Operating Procedure:

$ tar -jxvf distcc*
(use j flag not z with tar, distcc is bzip2ed).
$ cd distcc*
$ ./configure
$ time make 
(don't forget the time command).

distcc is small and doesn't require much time to build. Here's the time from that aforementioned 486DX:

     Without distcc 
     real    13m45.185s
     user    12m4.320s
     sys     1m7.120s

It took longer to run the configure script than it did to compile.

Install the binaries:

$ make install
(you will usually need root privileges for this step).
distcc and distccd should now be in /usr/local/bin.

For the client machine: Transfer a copy of distccd to /usr/local/bin or your binary repository of choice.

Now to use distcc to recompile distcc.

Make sure you are in the distcc source directory:

$ make clean

This will clean out all the crud left over from the first compile. You won't need to run configure again.

We need to spend a couple of minutes setting up for distcc.

1. Run the distccd daemon on both computers.

$ distccd --daemon

It'll bitch about no distcc user. Ignore the warning.
You can check to see that it's actually running via "$ ps -ax | grep distccd" to assuage your concerns.

2. Set the DISTCC_HOSTS environment variable:

You can use IP addresses or, if your /etc/hosts file is set up properly, the hostnames of the computers.

I have two computers at the moment:
mothra on 192.168.1.2
ghidra on 192.168.1.3 (This one's a rescued 120MHz laptop. It would be my main computer but it doesn't have the drive space I need.)

Set the variable (sh syntax, adjust for your shell):

$ export DISTCC_HOSTS="mothra ghidra"

or

$ export DISTCC_HOSTS="192.168.1.2 192.168.1.3"

Either way it doesn't matter.
NOTE: Names or addresses are space delimited.
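
If you are not sure whether a hostname is known to your machine, a quick check of /etc/hosts tells you whether to fall back to the IP address. This is a minimal sketch: host_known is a hypothetical helper written for this example, and it only looks at a hosts file, not at DNS.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME is listed as a hostname in a
# hosts(5)-style file (default /etc/hosts).
host_known() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$file"
}

for h in mothra ghidra; do
    if host_known "$h"; then
        echo "$h: found in /etc/hosts"
    else
        echo "$h: not in /etc/hosts, use its IP address instead"
    fi
done

# Names (or addresses) are separated by spaces.
export DISTCC_HOSTS="mothra ghidra"
```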

Recompile the code:

$ time make -j4 CC=distcc

Explaining the command line:

time: should be obvious.

make -j4:
the -j flag tells make how many jobs (commands) it may run at the same time; here, up to four. Read the info manual for more specific information. Trust me, just use -j4 for now.

CC=distcc:
Overrides the compiler that configure wrote into the makefile. This way you can do a regular configure with gcc and only switch to distcc on the make command line. distcc is nice about not forcing complicated procedures to use it.
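
Where does the 4 come from? The distcc documentation suggests, as a rule of thumb, setting -j to roughly twice the total number of CPUs on the build machines. Two single-CPU machines give 4. The sketch below assumes one CPU per host, which is true of my machines but may not be true of yours; suggest_jobs is a hypothetical helper, not part of distcc or make.

```shell
#!/bin/sh
# Hypothetical helper: suggest a -j value as twice the number of hosts,
# ASSUMING every host in the list has exactly one CPU.
suggest_jobs() {
    # Deliberately unquoted so the shell splits the list on whitespace.
    set -- $1
    echo $(( $# * 2 ))
}

DISTCC_HOSTS="mothra ghidra"
echo "make -j$(suggest_jobs "$DISTCC_HOSTS") CC=distcc"
```

Treat the result as a starting point and adjust it if one of the machines ends up idle or overloaded.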

distcc compiled with distcc
     real    6m38.089s
     user    2m42.200s
     sys     0m29.520s

Cut the time in half! You can't complain about that.

The following shows times for some of my favorite programs compiled with and without distcc, utilizing the two-node setup described above.

Remember, the times without distcc come from the 486 working alone.

                     		  
     Dillo Web Browser
                Without Distcc    With Distcc
     real       52m14.120s        22m31.975s
     user       47m24.820s        5m12.630s
     sys        3m29.220s         1m23.930s

     The BASH Shell
                Without Distcc    With Distcc
     real       75m25.306s        18m22.613s
     user       69m2.110s         3m27.950s
     sys        5m8.030s          0m58.980s

This result amazed me the most: just under 1/4 of the non-distcc compilation time!
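
The ratios quoted above can be checked with a little arithmetic on the "real" times. The sketch below is only an illustration; the helper names to_seconds and speedup are made up for this example and are not part of distcc.

```shell
#!/bin/sh
# Hypothetical helpers, not part of distcc.

# Convert a time(1)-style value such as 13m45.185s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times faster the second run was than the first.
speedup() {
    awk -v before="$(to_seconds "$1")" -v after="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", before / after }'
}

speedup 13m45.185s 6m38.089s    # distcc itself: prints 2.07
speedup 52m14.120s 22m31.975s   # Dillo:         prints 2.32
speedup 75m25.306s 18m22.613s   # BASH:          prints 4.10
```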

Conclusion:

distcc is flexible. You can use it as a one-shot compiler or set up your build environment to use it for all compiles.

You can define the available compiler hosts in a $HOME/.distcc/hosts file.

You can make distcc prefer one machine over another through the order of the hosts in .distcc/hosts or the DISTCC_HOSTS environment variable: hosts listed earlier are preferred.

For example, rather than having my poor little 486 desktop grind down to an almost unusable state as gcc takes over the system, I set DISTCC_HOSTS='ghidra' and all the compilation is shipped to the faster laptop.
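
A hosts file can be written with a couple of lines of shell. The sketch below is a minimal illustration that works in a scratch directory so it cannot clobber a real configuration; write_hosts_file is a hypothetical helper, and the host order shown (faster laptop first) is just my preference.

```shell
#!/bin/sh
# Hypothetical helper: write a distcc hosts file into the directory
# given as the first argument; the remaining arguments are the hosts.
write_hosts_file() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    # All hosts on one line, separated by spaces, most preferred first.
    echo "$*" > "$dir/hosts"
}

# Demonstrate in a scratch directory so nothing in $HOME is overwritten.
scratch=$(mktemp -d)
write_hosts_file "$scratch/.distcc" ghidra mothra
cat "$scratch/.distcc/hosts"
```

To use it for real, pass "$HOME/.distcc" as the directory instead of the scratch path.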

More documentation is at the distcc web site.

Oh, yeah - that kernel compile. How long did it take? I don't know. I said screw it, I'll just stick with the stock kernel from my Slackware install. Even with distcc it would take forever. Maybe I'll bite the bullet at some point - but I think I'll just save up for that dual processor Athlon system I've been coveting.

 


[BIO] V. L. Simpson, after being unceremoniously (and rather rudely) informed that GNU Emacs is not an operating system, has been re-adjusted to a happy, regular life after many protracted sessions with 'the doctor'.

A webpage is available here.

Copyright © 2004, V. L. Simpson. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 107 of Linux Gazette, October 2004