Working with Docker

In this section, we will learn to create and use our own docker container.

Note

Prerequisites: 1. Have Docker installed on your laptop 2. Create a Docker Hub Account

Building Images From a Dockerfile

  • Dockerfiles are a reproducible and well documented method of developing your own container.
  • They store the whole procedure of how an image is built.

Dockerfile Format

A Dockerfile has two type of fields:

  • Instructions followed by arguments and comments
  • A basic Dockerfile looks like
# Comment
INSTRUCTION arguments

General Steps

  1. Choose a base operating system
  2. Install dependencies and other useful packages
  3. Install scientific application
  4. Set any environment variable that might be useful

Install a tool in a Docker container

Let’s build a container that will run fastqc a popular bioinformatics tool for checking the quality of DNA sequencing reads.

To make it easier, follow along with the sample Dockerfile.

$ cat Dockerfile
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# Choose a base operating system
FROM ubuntu:18.04

# Update and install necessary packages
RUN apt-get update && apt-get upgrade -y \
  && apt-get install -y wget unzip default-jdk libfindbin-libs-perl

# Install the application
RUN wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip \
    && unzip fastqc_v0.11.8.zip \
    && rm fastqc_v0.11.8.zip \
    && chmod 755 /FastQC/fastqc

# Use environment variable to add executable to PATH
ENV PATH "/FastQC:$PATH"

From your current working directory, preferably a clean one, copy the contents of this file into a new file called Dockerfile and save it.

Build

$ docker build -t username/fastqc:0.11.8 .
Sending build context to Docker daemon  251.4MB
Step 1/4 : FROM ubuntu:18.04
---> 775349758637
. . .
Successfully built b5d705fbdfa1
Successfully tagged reshg/fastqc:0.11.8

Check images

$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
reshg/fastqc                   0.11.8              b5d705fbdfa1        3 hours ago         708MB

Run

We use the docker run command to run containers from an image. We pass a command to run in the container. Similar to running other programs on Unix systems, we can run containers in the foreground (attached) or in the background.

$ docker run username/fastqc:0.11.8 which fastqc
/FastQC/fastqc

Unpacking the ‘docker run’ command

docker run Run something
–rm Remove the container when the process completes
username/fastqc:0.11.8 The name of the container
which fastqc The command to run

Push Image to Docker hub

$ docker push username/fastqc:0.11.8

Alternatively, you could also do this interactively

Note

Preferred way to build a docker image is by using Dockerfile. For the purpose of testing, working inside the container is sometimes helpful.

Open a base Docker Image

$ docker run --rm -it ubuntu /bin/bash

Unpacking the interactive ‘docker run’ command

docker run Run something
–rm -it Remove the container when the process completes and connect your terminal to the container runtime
ubuntu The name of the container
/bin/bash The type of shell to start

Install your tool in the image

root@ded8d40f1a1e:/#
# install dependencies
$ apt-get update && apt-get upgrade -y
$ apt-get install -y wget unzip default-jdk libfindbin-libs-perl

# install FastQC
$ wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip
$ unzip fastqc_v0.11.8.zip
$ rm fastqc_v0.11.8.zip

# make fastqc executable
$ chmod 755 /FastQC/fastqc

# add fastqc to the system path by linking to /bin
$ ln -s /FastQC/fastqc /bin
$ exit

Commit your image

$ docker ps -a
CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS                      PORTS                    NAMES
9f0d7afff313        ubuntu                         "/bin/bash"              9 minutes ago       Exited (0) 18 seconds ago                            affectionate_einstein

# Grab the CONTAINER ID of the ubuntu image created just few minutes ago.
$ docker commit CONTAINER ID username/fastqc:0.11.8
sha256:738f35b39c5711f722cc6d9b550215454f2a7ea765c73667355d383a8a9285bf

$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
reshg/fastqc                  0.11.8              738f35b39c57        12 seconds ago      718MB

Push your image to Docker Hub

$ docker push username/fastqc:0.11.8
The push refers to repository [docker.io/reshg/fastqc]
6750c6c8d397: Pushing [=========>                                         ]  124.3MB/654.2MB

Running a Container in Daemon mode

We can also run a container in the background. We do so using the -d flag:

$ docker run -d ubuntu sleep infinity
f406f6b0c34d4bba552a7106e951a5d667dcbfddcb429e2d42b0ac7a10a919fc

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
f406f6b0c34d        ubuntu              "sleep infinity"    6 seconds ago       Up 5 seconds                            romantic_wilson

$ docker ps -a
CONTAINER ID        IMAGE                          COMMAND             CREATED             STATUS                    PORTS               NAMES
f406f6b0c34d        ubuntu                         "sleep infinity"    15 seconds ago      Up 14 seconds                                 romantic_wilson
b7c50065ea75        ubuntu                         "/bin/bash"         21 hours ago        Up 21 hours                                   charming_robinson
a197e85bee14        reshg/fastqc:latest            "/bin/bash"         21 hours ago        Exited (0) 21 hours ago                       reverent_williamson
4eb4cf433d32        reshg/fastqc:latest            "which fastqc"      21 hours ago        Exited (1) 21 hours ago                       stoic_dhawan
1eb1de6ac64c        reshg/fastqc                   "which fastqc"      21 hours ago        Exited (1) 21 hours ago                       upbeat_mcnulty

Note: The docker ps command only shows you running containers - it does not show you containers that have exited. In order to see all containers on the system use docker ps -a.

Summary

A Dockerfile allows you to transparently document all the dependancies and steps needed to describe a software tool. You can then run this tool as a Docker container for full reproducibility.