In this post I show how to create a Docker image for linear programming in R. If you just want to use the image, visit the repository on Docker Hub.
Docker provides a means of wrapping up an application in a complete environment that contains everything it needs to run: code, dependencies, operating system, and files. This guarantees that it will always run the same, regardless of what platform it’s run on. In this post, I’ll give an introduction to building your own Docker images.
As an example, I’ll build a Docker image for solving linear programming optimization problems with R. Linear programming solvers can be challenging to install, often requiring compiling source code and extensive configuration, and the steps required are highly platform-dependent. I’m particularly interested in using linear programming to build tools for systematic conservation prioritization, and these installation and configuration challenges would likely be a huge hurdle for many conservation planners. Docker solves this issue by wrapping all the R packages for linear programming, and the solvers they depend on, into a neat, platform-independent package.
Start by installing Docker. If you’re on Mac or Windows, this will also install the Docker Quickstart Terminal, which runs Docker inside a Linux virtual machine. Open this application or, if you’re on Linux, just open a normal terminal. Then run the following command to test your installation. Note: on Linux you may need to prefix this and every other command in this tutorial with sudo.
docker run hello-world
If this runs without errors you’ve installed everything correctly and are good to go.
Images and Containers
A Docker image is a file encapsulating an application and all its dependencies in a single complete environment. Running an image creates a container, an isolated, lightweight environment that behaves much like a minimal Linux system. These containers can run on any host machine with Docker installed. So, whether you’re on Windows, Mac, or Linux, and regardless of what software and OS versions you have on your machine, you can run an application built as a Docker image. No more dependency issues!
Docker images are built up in layers, which avoids having to start from nothing each time you want to create a new application. Docker Hub is a repository for Docker images, which can be used directly or as a base for building your own images. It serves a similar purpose for Docker images that GitHub serves for code.
The following is a list of the most common commands you’re likely to need when working with Docker.
docker run [-d] [-p a:b] <image name>: loads a Docker image into a new container. The optional -d flag runs the container in the background, and -p a:b maps port b of the container to port a on your host machine. Without this mapping, a port that the container listens on (e.g. port 80 for HTTP access) isn’t published on the host machine, so it isn’t reachable at the host’s address. Hence you must explicitly map container ports to host machine ports to make them accessible.
docker ps [-a]: list running containers, or all containers, including those that aren’t currently running, if the -a flag is included.
docker images: list all images currently on your machine.
docker stop <container name|id>: stop a running container.
docker start <container name|id>: start a container that already exists but isn’t running.
docker kill <container name|id>: kill a running container.
docker rm <container name|id>: remove a stopped container.
docker rmi <image name|id>: remove an image and delete it from the file system.
Most of these commands require referencing images or containers by their unique name or id, which can be found by listing images or containers with docker images or docker ps, respectively.
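When you’re done with a container, you’ll often use stop and remove together. The following is a minimal sketch of a small shell function that does both. The name stop_and_remove is my own and is not part of Docker. It removes the container only if stopping it succeeded:

```shell
# Hypothetical helper (not part of Docker): stop a running container and,
# only if that succeeds, remove it. Takes a container name or id as listed
# by `docker ps`.
stop_and_remove() {
  docker stop "$1" && docker rm "$1"
}
```

For example, stop_and_remove <container id> stops and then deletes that container.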
Our eventual goal is to build an image for solving linear programming problems in R, hence we will need R and RStudio in our image at the very least. Fortunately, the Rocker project provides a variety of R base images, including the hadleyverse image, which contains R, RStudio Server, and the Hadleyverse suite of packages. Download this image from Docker Hub with:
docker pull rocker/hadleyverse
The image file is ~3GB, so it may take some time to download. Running this image will load RStudio Server, which is a version of RStudio accessible via the browser. Before we proceed, we need to determine the IP address for this RStudio Server instance. On Linux this will just be localhost, but on Mac or Windows this will be the IP address of the Linux virtual machine within which Docker runs. To find this IP address run:
docker-machine ip default
In my case, this returns 192.168.99.100. Now, load the Rocker image with:
docker run -dp 8787:8787 rocker/hadleyverse
The run command loads a Docker image into a new container. The -d flag runs the container in the background (detached), so docker run returns immediately while the container keeps running. And -p 8787:8787 makes port 8787 in the container (i.e. the port RStudio Server runs on) accessible from port 8787 on the Docker host: your machine on Linux, or the virtual machine on Mac and Windows.
Now, point your browser to 192.168.99.100:8787 (or whichever IP you found above) to access RStudio. Use rstudio for both the username and the password.
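If you switch between machines, a small shell function can work out the right address for you. This is a sketch, and the function name rstudio_url is hypothetical. It assumes that when docker-machine is installed, Docker runs in the VM named default (as above, on Mac or Windows). Otherwise it falls back to localhost, as on a typical Linux setup:

```shell
# Sketch (hypothetical helper): print the URL of RStudio Server in a
# container started with -p 8787:8787.
# Assumption: if docker-machine is installed, Docker runs in the
# docker-machine VM named "default" (Mac/Windows setup); otherwise Docker
# runs natively and the container is reachable at localhost (Linux).
rstudio_url() {
  local host
  if command -v docker-machine >/dev/null 2>&1; then
    host=$(docker-machine ip default)
  else
    host=localhost
  fi
  echo "http://${host}:8787"
}
```

With a VM at 192.168.99.100, as in my case, this would print http://192.168.99.100:8787.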
Finally, you may need to access the command line on the container, for example to install certain software. To do this run:
docker exec -it <container id> bash
You now have access to a normal Debian bash prompt and can, for example, install software with apt-get.
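Rather than copying the container id from docker ps by hand, you can look it up by the image the container was started from. The following is a sketch, assuming such a container is already running. The name shell_into is hypothetical. The -q (print ids only) and --filter ancestor=<image> options are standard docker ps options:

```shell
# Hypothetical helper: open a bash prompt in the first running container
# started from the given image, e.g. rocker/hadleyverse.
shell_into() {
  local cid
  # -q prints only container ids; the ancestor filter matches the image.
  cid=$(docker ps -q --filter "ancestor=$1" | head -n 1)
  if [ -z "$cid" ]; then
    echo "no running container from image $1" >&2
    return 1
  fi
  docker exec -it "$cid" bash
}
```

For example, shell_into rocker/hadleyverse drops you into the RStudio container started above.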
Building an Image
Now that we have a good base image to work with, it’s time to build upon it. There are essentially two ways to do this. First, you can make changes to a running container directly, for example installing software or adding files, then use docker commit to create a new image based on this container. However, the preferred method, and the one I’ll describe here, is to create a Dockerfile.
A Dockerfile is a text file with a series of instructions that build up the layers of an image. The beauty of using this method for creating new images is that it lays out exactly how the image was created in a clear and reproducible manner. Start by creating a new directory and a plain text file within that directory with the name Dockerfile. In the first line of the Dockerfile, specify that we want to start with the Hadleyverse image with:
FROM rocker/hadleyverse
Subsequent lines should conform to the following format:
# Comment
INSTRUCTION arguments
Lines beginning with # are comments. INSTRUCTION is one of several Docker-specific instructions that specify how the image should be modified in the next layer, and arguments are the values passed to it. For a full list of possible instructions, visit the Dockerfile documentation. Some of the more useful instructions are:
FROM <image>: specify the image upon which to build your new image.
MAINTAINER <name>: specify the name of the image’s maintainer.
RUN <command>: run a shell command, for example apt-get to install new software.
COPY <source> <destination>: copy files from your local filesystem, at location <source>, to the filesystem of the Docker image, at <destination>. Typically the source file would be put in the same directory as the Dockerfile.
ENV <name> <value>: set environment variable <name> to <value>. For example, to append to the PATH variable use ENV PATH $PATH:/path/to/add/.
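To see how these instructions fit together, here is a minimal sketch that uses a shell here-document to write a small Dockerfile into a new build directory. The directory name my-image, the script analysis.R, the /opt/tools/bin path, and the maintainer name are placeholders for illustration only. libglpk-dev is the GLPK library that also appears later in this post:

```shell
# Sketch: create a build directory with a minimal Dockerfile that uses the
# instructions listed above. All names here are illustrative placeholders.
mkdir -p my-image

# A placeholder R script for the COPY instruction to copy into the image.
echo 'print("hello from the container")' > my-image/analysis.R

# The quoted 'EOF' stops the shell from expanding $PATH here, so Docker
# expands it when building the image instead.
cat > my-image/Dockerfile <<'EOF'
# Start from the Rocker Hadleyverse image
FROM rocker/hadleyverse
MAINTAINER Your Name
# Install the GLPK library with apt-get
RUN apt-get update && apt-get install -y libglpk-dev
# Copy the R script from the build directory into the image
COPY analysis.R /home/rstudio/
# Append a directory to PATH
ENV PATH $PATH:/opt/tools/bin
EOF
```

You could then build it with docker build -t <username>/my-image my-image.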
Build the image
To build a Docker image from a Dockerfile, run the following command:
docker build -t <username>/<imagename> <path>
where <username> is your Docker Hub username, <imagename> is the name you want to give the image, and <path> is the directory containing the Dockerfile. So, running this command from the same directory as the Dockerfile, I’d use:
docker build -t mstrimas/optimizr .
You should now see Docker building up the layers of the image one by one. Once this is completed you should have a working Docker image, from which you can create containers.
Push to Docker Hub
Just like pushing code to GitHub, you can push a Docker image to Docker Hub. This is a great way to share an image with others! First you’ll need to create a Docker Hub account, then provide your login credentials with:
docker login
Then, to push an image use:
docker push <username>/<imagename>
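Building and pushing are often done back to back. As a sketch, a hypothetical shell function can chain the two so that an image is only pushed if it built successfully. It assumes you’ve already run docker login, and the name build_and_push is my own:

```shell
# Hypothetical helper (not part of Docker): build an image from a directory
# containing a Dockerfile, then push it to Docker Hub only if the build
# succeeded. Assumes you've already run `docker login`.
build_and_push() {
  local tag="$1"        # image name, e.g. <username>/<imagename>
  local dir="${2:-.}"   # build directory; defaults to the current directory
  docker build -t "$tag" "$dir" && docker push "$tag"
}
```

For example, running build_and_push mstrimas/optimizr in the directory containing the Dockerfile would build and then push my image.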
Docker image for linear programming
As an example, I want to create an image that includes all the open source linear programming solvers, and their R package interfaces, listed in the CRAN Task View for Optimization:
- lp_solve with R packages lpSolve and lpSolveAPI.
- COIN-OR Symphony with R packages Rsymphony, from CRAN, and lpsymphony, from Bioconductor.
- COIN-OR Clp with R package clpAPI.
- The GNU Linear Programming Kit with R packages glpkAPI and Rglpk.
- Gurobi and its corresponding R package. This is the only commercial solver in the list and, if you intend to use it, you’ll need to provide a license file. Further details on how to do this are in the Docker Hub repository.
The Dockerfile for the linear programming image I’ve created is:
FROM rocker/hadleyverse:latest
MAINTAINER Matt Strimas-Mackey

# Install linear programming solvers
# GEOS and GDAL GIS libraries are required for some R packages
RUN apt-get update \
  && apt-get install -y apt-utils libgdal-dev libproj-dev libgeos-dev \
  && apt-get install -y coinor-libsymphony-dev coinor-libcgl-dev libglpk-dev

# Install R packages for LP solvers
RUN install2.r --error \
    lpSolve \
    lpSolveAPI \
    Rsymphony \
    clpAPI \
    Rglpk \
    glpkAPI \
  && installGithub.r Bioconductor-mirror/lpsymphony

# Copy Gurobi files and install Gurobi and R package
# Can't directly install because no direct link to install files
COPY gurobi6.5.1_linux64.tar.gz /
RUN tar -xzf /gurobi6.5.1_linux64.tar.gz -C /opt/ \
  && install2.r --error \
    /opt/gurobi651/linux64/R/gurobi_6.5-1_R_x86_64-unknown-linux-gnu.tar.gz \
  && ln -s /opt/gurobi651/linux64/lib/libgurobi65.so /lib/libgurobi65.so
ENV PATH $PATH:/opt/gurobi651/linux64/bin
Let’s break this down. First, note that the backslash (\) is used to split commands over multiple lines. The double ampersand (&&) is used within RUN statements to issue multiple shell commands within the same statement. The first two lines just specify that this image will be based on the Rocker Hadleyverse image and that I’m the maintainer.
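To see why && matters inside a RUN statement, here is a plain shell sketch. Here step is a hypothetical stand-in for real commands like apt-get update. Once one command in the chain fails, the remaining ones are skipped and the whole chain returns a non-zero status. A non-zero status from a RUN step is what makes docker build stop with an error:

```shell
# Hypothetical stand-in for a real command: record that it ran, then
# succeed unless its name is "fail".
ran=""
step() {
  ran="$ran $1"
  [ "$1" != fail ]
}

# The backslashes continue the command onto the next line; && runs each
# step only if the previous one succeeded, so "install" never runs.
step update \
  && step fail \
  && step install
echo "ran:$ran"
```

This prints ran: update fail, showing that the step after the failure was skipped.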
The first RUN command uses apt-get to install the libraries for the open source solvers Symphony and GLPK, along with COIN-OR’s Cgl cut generation library. In addition, some GIS libraries (GDAL, PROJ, and GEOS) are installed, which are required for some of the R packages.
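The second RUN command installs all the R packages that interface with the open source solvers. It uses littler (a command line interface to R) to do this: install2.r installs the CRAN packages, and installGithub.r installs lpsymphony from the Bioconductor mirror on GitHub.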
The next two commands install the commercial optimization software Gurobi. There is no Debian package for this, nor is there a direct link to the install files on the Gurobi website. So I’ve downloaded the install file from Gurobi and placed it in the same directory as the Dockerfile, then used the COPY command to copy the install file to the Docker image. Finally, in the last RUN command, I’ve uncompressed the Gurobi install files to the appropriate directory, then installed the corresponding R package, which comes with Gurobi. I create a symbolic link (ln -s) to allow R to access the Gurobi shared object file libgurobi65.so when loading the gurobi R package.
In the final line, I add the Gurobi binary directory to the PATH environment variable so the binaries are accessible. Note that to actually use Gurobi you’ll need a license; consult the instructions in the Docker Hub repository for further details.
Using the image
I’ve pushed the resulting image to Docker Hub so you can create containers based on it using:
docker run -dp 8787:8787 mstrimas/optimizr