@reid

Debugging Travis builds

Important Update December 19, 2013 — Travis infrastructure has changed since this post was written. This post remains unchanged for posterity; however, be aware that this setup no longer represents the current state of Travis.

Yeti uses Travis CI to the max by spawning lots of PhantomJS processes that are tested asynchronously. When tests start failing in Travis, but not anywhere else, debugging can be infuriating.

You can set up a Travis box locally to debug the problem. Trung Lê wrote about debugging with a local Travis VM in August 2012, but that post is outdated and its instructions no longer work. You can expect sweeping changes to the Travis build infrastructure in the near future, so this blog post will also be outdated soon.

For now, we have a bug that needs fixing. Let’s set up a local Travis VM for Node.js debugging.

My assumptions about your computer

  1. Plenty of disk space
  2. OS X (but this should work with other systems, too)

Install Vagrant & VirtualBox

Visit the Vagrant website and the VirtualBox website for installers.

Download the jvm box

The jvm box is where Node.js builds run. Download travis-jvm.box. (3.2 GB)

Prepare the jvm box

Import the VM box:

vagrant box add travis-jvm travis-jvm.box

Verify it’s there:

vagrant box list
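If you would rather script that check than read the list by eye, here is a minimal sketch. The helper name has_box is my own invention, and it assumes the box name is the first whitespace-separated field on each line of the vagrant box list output.

```sh
# has_box NAME: read `vagrant box list` output on stdin and succeed
# only if NAME is the first field of some line.
# (has_box is a hypothetical helper, not part of Vagrant.)
has_box() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

Usage: vagrant box list | has_box travis-jvm && echo "travis-jvm is imported"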

Start the jvm box

Create a working directory to hold the Vagrantfile for the travis-jvm box.

mkdir boxes
cd boxes

Create the Vagrantfile and get the box ready to use:

vagrant init travis-jvm

The username for the VM is travis, so add this line to your Vagrantfile’s do-block:

config.ssh.username = "travis"
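For orientation, this is roughly what the whole file looks like once the comments generated by vagrant init are stripped away. It is a sketch that assumes the Vagrant 1.0-style Vagrant::Config.run block; the write_vagrantfile helper is hypothetical and overwrites any Vagrantfile already in the target directory.

```sh
# write_vagrantfile [DIR]: write a minimal Vagrantfile into DIR
# (default: current directory). Hypothetical helper; it overwrites
# an existing Vagrantfile. Assumes Vagrant 1.0-style configuration.
write_vagrantfile() {
    dir=${1:-.}
    cat > "$dir/Vagrantfile" <<'EOF'
Vagrant::Config.run do |config|
  # The box imported earlier with `vagrant box add`.
  config.vm.box = "travis-jvm"

  # Log in as the box's travis user instead of the default vagrant user.
  config.ssh.username = "travis"
end
EOF
}
```

Usage: write_vagrantfile boxes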

Start the VM:

vagrant up

Things should work nicely. (If not, read on.)

You can access the VM with this command:

vagrant ssh

You can now follow the steps of your Travis build and debug your problem.
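As a rough sketch of what following the steps means for a Node.js project: Travis clones the repository, then by default runs npm install followed by npm test. The replay_build and run helpers below are hypothetical names, the repository URL in the usage line is a placeholder, and the sketch assumes git and npm are already on the PATH inside the VM.

```sh
# run CMD...: execute CMD, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# replay_build REPO_URL [WORKDIR]: clone the project and run the
# default Travis install and test steps for a Node.js build.
# Run this inside the VM. Stops at the first failing step.
replay_build() {
    repo_url=$1
    workdir=${2:-$HOME/build}
    run git clone "$repo_url" "$workdir" || return 1
    run cd "$workdir" || return 1
    run npm install || return 1
    run npm test
}
```

Usage: replay_build https://example.invalid/you/project.git

Set DRY_RUN=1 first to see the commands without running them. If your .travis.yml overrides the install or script steps, substitute those commands.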

If you have problems running vagrant ssh, you can try logging in with a username and password instead:

vagrant ssh -p -- -l travis

The password is travis.

More fancy debugging

The VM uses a NAT-only network, so you may not be able to access it from your computer. For the problems I debug, I need to be able to access the Node web server inside the VM from a browser on OS X.

This means you should set up a host-only network interface in addition to the NAT-only interface.

Add this line to your Vagrantfile’s do-block:

config.vm.network :hostonly, "192.168.89.10"

This will set up a host-only network and give the IP address 192.168.89.10 to the VM. (If you are on a network that uses 192.168.89.xxx already, change the IP subnet to something else, because it cannot conflict with another subnet your host computer is using.)
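To check for a conflict before reloading, you can compare the first three octets of the VM address against the addresses your host already has. This is a minimal sketch that assumes a 255.255.255.0 netmask on every interface; subnet24 and conflicts_with_host are hypothetical helper names.

```sh
# subnet24 IP: print the first three octets of a dotted IPv4 address.
subnet24() {
    echo "${1%.*}"
}

# conflicts_with_host VM_IP HOST_IP...: succeed if any HOST_IP shares
# the VM's /24 subnet. Assumes every network uses a 255.255.255.0 mask.
conflicts_with_host() {
    vm_subnet=$(subnet24 "$1")
    shift
    for addr in "$@"; do
        if [ "$(subnet24 "$addr")" = "$vm_subnet" ]; then
            return 0
        fi
    done
    return 1
}
```

Usage on OS X, where ifconfig prints lines like "inet 10.0.1.5 netmask ...":

conflicts_with_host 192.168.89.10 $(ifconfig | awk '/inet /{print $2}') && echo "pick another subnet"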

Reload the VM:

vagrant reload

You should be able to access the VM from the host at 192.168.89.10.

Sometimes running vagrant up with a host-only network fails at the message “Waiting for VM to boot. This can take a few minutes.” because of a bug in net-ssh and Vagrant. This bug is quite annoying, but a detailed workaround procedure that clears DHCP leases is available, and it will allow you to start the VM.

Teardown

You can pause the VM with vagrant suspend and bring it back to life with vagrant resume. If you’d like to power off the VM, do so over SSH. If you can’t, vagrant halt will power it off.
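The power-off-then-fallback order can be sketched as a small function. The name stop_vm is hypothetical, and the sketch assumes the travis user can run sudo poweroff inside the VM without a password prompt.

```sh
# stop_vm: ask the guest to power itself off over SSH; if that command
# reports failure, fall back to `vagrant halt`.
# Assumes passwordless sudo for the login user inside the VM.
stop_vm() {
    vagrant ssh -c "sudo poweroff" || vagrant halt
}
```

The SSH session may report an error simply because the connection drops as the guest shuts down. In that case vagrant halt still runs, which should be harmless on a VM that is already off.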

If you want to start from a clean slate, you can destroy it with vagrant destroy. Re-creating a fresh VM is as easy as vagrant up.

You can learn more about these commands by reading the Vagrant teardown documentation.

Do not debug on travis-ci.org

When you have a Travis-only problem, try creating a local environment as described above first. Don’t try to use the hosted Travis service to debug complicated failures. I have made this mistake many times:

  1. Notice my build is failing in Travis.
  2. Make sure my own environments and clean installs do not fail. (The problem is sometimes here!)
  3. Create a branch named fix-travis-foo or something.
  4. Start making commits and pushes to this temporary branch, attempting to fix the problem.
  5. Wait for the build to finish. If the build fails, repeat step 4.

First of all, this takes a long time. Travis isn’t terribly slow for CI purposes, but when you’re trying to actively debug something, the last thing you need is for your app tests to go from 15 seconds to 5 minutes.

Most of all, my problems are usually with PhantomJS. When I can fully control the VM, I can remotely debug the headless browser from OS X. I can also capture packets creatively with tcpdump(8). Using those tools, I found some pretty heinous race conditions in the Yeti test suite that would have been nearly impossible to find by trial and error.
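As one illustration of the packet capture side, here is a minimal sketch of a capture helper to run inside the VM. The name capture_port, the TCPDUMP override variable, and port 9000 in the usage line are all assumptions for illustration, as is the Linux-only "any" pseudo-interface.

```sh
# capture_port PORT [FILE]: write full packets for TCP traffic on PORT
# to FILE (default: capture.pcap) until interrupted with Ctrl-C.
# Set TCPDUMP to override the command, e.g. when sudo is not needed.
# "-i any" is a Linux pseudo-interface, so this is meant for the guest.
capture_port() {
    port=$1
    outfile=${2:-capture.pcap}
    ${TCPDUMP:-sudo tcpdump} -i any -s 0 -w "$outfile" tcp port "$port"
}
```

Usage: capture_port 9000 yeti.pcap

The resulting file can be copied to the host and opened in any tool that reads pcap files.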