In this post I show how to create a Docker image for linear programming in R. If you just want to use the image, visit the repository on Docker Hub.

Docker provides a means of wrapping up an application in a complete environment that contains everything it needs to run: code, dependencies, operating system, and files. This guarantees that it will always run the same, regardless of what platform it’s run on. In this post, I’ll give an introduction to building your own Docker images.

As an example, I’ll build a Docker image for solving linear programming optimization problems with R. Linear programming solvers can be challenging to install, often requiring compiling source code and extensive configuration, and the steps required are totally platform dependent. I’m particularly interested in using linear programming to build tools for systematic conservation prioritization, and these installation and configuration challenges would likely be a huge hurdle for many conservation planners. Docker provides a solution to this issue by providing a means of wrapping all the R packages for linear programming into a nice neat, platform-independent package.

Thanks to Bill Mills for his awesome Docker introduction at the UBC Mozilla Study Group, which provided the inspiration for this post.

Docker Basics

Install Docker

Follow the instructions on the Docker website for Linux, Mac, or Windows.

If you’re on Mac or Windows, this will install the Docker Quickstart Terminal, which is a Linux virtual machine that Docker runs in. Open this application or, if you’re on Linux, just open a normal Terminal. Then run the following command to test your installation. Note: on Linux you may need to prefix this and every other command in this tutorial with sudo.

docker run hello-world

If this runs without errors you’ve installed everything correctly and are good to go.

Images and Containers

A Docker image is a file encapsulating an application and all its dependencies in a single complete environment. An image is loaded into a container, which is essentially a lightweight Linux operating system that can run Docker images. These containers can run on any host machine with Docker installed. So, whether you’re on Windows, Mac, or Linux, and regardless of what software and OS versions you have on your machine, you can run an application built as a Docker image. No more dependency issues!

Docker images are built up in layers, which avoids having to start from nothing each time you want to create a new application. Docker Hub is a repository for Docker images and these images can be used directly or to build your own image on top of. It serves a similar purpose for Docker images that GitHub serves for code.

Commands

The following is a list of the most common commands you’re likely to need when working with Docker.

  • docker run [-d] [-p a:b] <image name>: loads a docker image into a new container. The optional -d flag runs the container in the background. And -p a:b maps port a of the container to port b on your host machine. Without this mapping a container may expose a port (e.g. for HTTP access at port 80) into the Docker daemon, but the Docker daemon won’t make that port accessible to the outside world. Hence you must explicitly map Docker daemon ports to host machine ports to make them accessible.
  • docker ps [-a]: list all containers, including those that aren’t currently running if the -a flag is included.
  • docker images: list all images currently on your machine.
  • docker stop <container name|id>: stop a running container.
  • docker start <container name|id>: start a container that already exists but isn’t running.
  • docker kill <container name|id>: kill a running container.
  • docker rm <container name|id>: remove a stopped container.
  • docker rmi <image name|id>: remove an image and delete it from the file system.

Most of these commands require referencing images or containers by their unique name or id, which can be found by listing images or containers with docker images or docker ps, respectively.

Rocker

Our eventual goal is to build an image for solving linear programming problems in R, hence we will need R and RStudio in our image at the very least. Fortunately, the Rocker project provides a variety of R base images, including the hadleyverse image, which contains R, RStudio Server, and the Hadleyverse suite of packages. Download this image from Docker Hub with:

docker pull rocker/hadleyverse

The image file is ~3GB, so it may take some time to download. Running this image will load RStudio Server, which is a version of RStudio accessible via the browser. Before we proceed, we need to determine the IP address for this RStudio Server instance. On Linux this will just be localhost, but on Mac or Windows this will be the IP address of the Linux virtual machine within which Docker runs. To find this IP address run:

docker-machine ip default

In my case, this returns 192.168.99.100. Now, load the Rocker image with:

docker run -dp 8787:8787 rocker/hadleyverse

The run command loads a docker image into a new container. The -d flag keeps the container running in the background after docker run finishes. And, -p 8787:8787 makes port 8787 in the container application (i.e. the port RStudio runs on), accessible from port 8787 on your local machine.

Now, point your browser to 192.168.99.100:8787 (or whichever IP you found above) to access RStudio. Use rstudio for both the username and the password.

Finally, you may need to access the command line on the container, for example to install certain software. To do this run:

docker exec -it <container id> bash

You now have access to a normal Debian bash prompt and can, for example, install software with apt-get.

Building an Image

Now that we have a good base image to work with, it’s time to build upon it. There are essentially two ways to do this. First, you can make changes to a running container directly, for example installing software or adding file, then use docker commit to create a new image based on this container. However, the preferred method, and the one I’ll describe here is to create a Dockerfile.

Dockerfiles

A Dockerfile is a text file with a series of instructions that build up the layers of an image. The beauty of using this method for creating new images is that it lays out exactly how the image was created in a clear and reproducible manner. Start by creating a new directory and a plain text file within that directory with the name Dockerfile. In the first line of the Dockerfile specify that we want to start with the Hadleyverse image with:

FROM rocker/hadleyverse:latest

Subsequent lines should conform to the following format:

# Comment
INSTRUCTION arguments

Where INSTRUCTION corresponds to one of several docker specific instructions that how the image should be modified in the next layer. For a full list of possible instructions visit the Dockerfile documentation. However, some of the more useful commands are:

  • FROM <image>: specify the image upon which to build your new image.
  • MAINTAINER <name>: specify the name of the Dockerfile maintainer.
  • RUN <command>: run a shell command, for example apt-get to install new software.
  • COPY <source> <destination>: copy files from your local filesystem, at location <source>, to the filesystem of the Docker container, at <destination>. Typically the file would be put in the same directory as the Dockerfile.
  • ENV <name> <value>: set environmental variable <name> to <value>. For example, to append to the PATH use ENV PATH $PATH:/path/to/add/.

Build the image

To build a Docker image from a Dockerfile, run the following command:

docker build -t <username>/<imagename> <path>

where <username> is your Docker Hub username, <imagename> is the name you want to give the image, and <path> is the directory containing the Dockerfile. So, for me running this command in the same directory as the Dockerfile, I’d use:

docker build -t mstrimas/optimizr .

You should now see Docker building up all the layers of the image one-by-one. Once this is completed you should have a working Docker image, which you can create containers from.

Push to Docker Hub

Just like pushing code to GitHub, you can push a Docker image to Docker Hub. This is a great way to share an image with others! First you’ll need to get a Docker Hub account, then provide your login credentials with

docker login

Then, to push an image use

docker push <username>/<imagename>

Docker image for linear programming

Goal

As an example, I want to create an image that includes all the open source linear programming solvers, and their R package interfaces, listed in CRAN Task View for Optimization:

Dockerfile

The Dockerfile for the linear programming image I’ve created is:

FROM rocker/hadleyverse:latest
MAINTAINER Matt Strimas-Mackey

# Install linear programming solvers
# GEOS and GDAL GIS libraries are required for some R packages
RUN apt-get update \
  && apt-get install -y apt-utils libgdal-dev libproj-dev libgeos-dev \
  && apt-get install -y coinor-libsymphony-dev coinor-libcgl-dev libglpk-dev
  
# Install R packages for LP solvers
RUN install2.r --error \
  lpSolve \
  lpSolveAPI \
  Rsymphony \
  clpAPI \
  Rglpk \
  glpkAPI \
  && installGithub.r Bioconductor-mirror/lpsymphony
  
# Copy Gurobi files and install Gurobi and R package
# Can't directly install because no direct link to install files
COPY gurobi6.5.1_linux64.tar.gz /
RUN tar -xzf /gurobi6.5.1_linux64.tar.gz -C /opt/ \
  && install2.r --error \
    /opt/gurobi651/linux64/R/gurobi_6.5-1_R_x86_64-unknown-linux-gnu.tar.gz \
  && ln -s /opt/gurobi651/linux64/lib/libgurobi65.so /lib/libgurobi65.so
ENV PATH $PATH:/opt/gurobi651/linux64/bin"

Let’s break this down. First, note that the back slash (\) is used to split commands up over multiple lines. The double ampersand (&&) is used within RUN statements to issue multiple shell commands within the same statement. The first two lines just specify that this image will be based on the Rocker Hadleyverse image and that I’m the maintainer.

The first RUN command uses apt-get to install the libraries for three open source linear programming solvers: Symphony, Cgl, and GLPK. In addition, some GIS libraries are installed, which are required for some of the R packages.

The second RUN command installs all the R packages to interact with the open source solvers. It uses littler (a command line interface to R) to do this.

The next two commands install the commercial optimization software Gurobi. There is no Debian package for this, nor is there a direct link to the install files on the Gurobi website. So I’ve downloaded the install file from Gurobi and placed it in the same directory as the Dockerfile, then used the COPY command to copy the install file to the Docker image. Finally, in the last RUN command, I’ve uncompressed the Gurobi install files to the appropriate directory, then installed the corresponding R package, which comes with Gurobi. I create a symbolic link (ln -s) to allow R to access the Gurobi shared object file libgurobi65.so when loading the gurobi package.

In the final line, I add Gurobi to the PATH environmental variable so the binaries are accessible. Note that to actually use Gurobi, you’ll need a license, consult the instructions in the Docker Hub repository for further details.

Using the image

I’ve pushed the resulting image to Docker Hub so you can create containers based on it using:

docker run -dp 8787:8787 mstrimas/optimizr