Introduction to Containers
==========================
Containers are an important common currency for app development, web services,
scientific computing, and more. Containers allow you to package an application
along with all of its dependencies, isolate it from other applications and
services, and deploy it consistently and reproducibly and *platform-agnostically*.
In this introductory module, we will learn about containers and their uses, in
particular the containerization platform **Docker**.
What is a Container?
--------------------
* A container is a standard unit of software that packages up code and all its
dependencies so the application runs quickly and reliably from one computing
environment to another.
* Containers allow a developer to package up an application with all of the
parts it needs, such as libraries and other dependencies, and ship it all out
as one package.
* Multiple containers can run on the same machine and share the OS kernel with
other containers, each running as isolated processes in user space, hence are
*lightweight* and have *low overhead*.
* Containers ensure *portability* and *reproducibility* by isolating the
application from environment.
How is a Container Different from a VM?
---------------------------------------
Virtual machines enable application and resource isolation, run on top of a
hypervisor (high overhead). Multiple VMs can run on the same physical
infrastructure - from a few to dozens depending on resources. VMs take up more
disk space and have long start up times (~minutes).
.. figure:: images/arch_vm.png
:width: 400
:align: center
Applications isolated by VMs.
Containers enable application and resource isolation, run on top of the host
operating system. Many containers can run on the same physical infrastructure -
up to 1,000s depending on resources. Containers take up less disk space than VMs
and have very short start up times (~100s of ms).
.. figure:: images/arch_container.png
:width: 400
:align: center
Applications isolated by containers.
**Benefits of using containers include:**
* Platform independence: Build it once, run it anywhere
* Resource efficiency and density
* Enables reproducible science
* Effective isolation and resource sharing
Container Technologies
----------------------
Docker
~~~~~~
.. figure:: images/docker_logo.jpg
:height: 180
:width: 200
:align: right
:alt: Docker Logo
:figclass: left
Docker is a containerization platform that uses OS-level virtualization to
package software and dependencies in deliverable units called containers. It is
by far the most common containerization platform today, and most other container
platforms are compatible with Docker. (E.g. Apptainer, Singularity, and Shifter
are other containerization platforms you may find in HPC environments).
Apptainer
~~~~~~~~~
Apptainer (a recent offshoot from Singularity) is a container solution designed to
execute applications at bare-metal performance while being secure, portable, and
100% reproducible. Apptainer's permissions model makes it a popular choice for
shared HPC environments where Docker cannot be supported. It has its own syntax
for building containers but also support pulling and running Docker containers.
In general we use **Docker** to develop new containers and run them on our laptops.
We use **Apptainer** as a runtime on our HPC systems.
We can find existing containers that are compatible with both Docker and Apptainer
platforms (among others) at:
1. `Docker Hub `_
2. `NVIDIA GPU Cloud (NGC) `_
3. `Quay.io `_
4. `BioContainers `_
Some Quick Definitions
----------------------
Dockerfile
~~~~~~~~~~
A Dockerfile is a recipe for creating a Docker image. It is a human-readable,
plain text file that contains a sequential set of commands (*a recipe*) for
installing and configuring an application and all of its dependencies. The Docker
command line interface is used to interpret a Dockerfile and "build" an image
based on those instructions. Other container build environments, such as Apptainer,
have different syntax for container recipes, but the function is the same.
Image
~~~~~
An image is a read-only template that contains all the code, dependencies,
libraries, and supporting files that are required to launch a container. Docker
stores images as layers, and any changes made to an image are captured by adding
new layers. The "base image" is the bottom-most layer that does not depend on
any other layer and typically defines the operating system for the container.
Container
~~~~~~~~~
A container is an instance of an image that can execute a software enviornment.
Running a container requires a container runtime environment (e.g. Docker,
Apptainer) and an instruction set architecture (e.g. x86) compatible with the
image from which the container is instantiated.
Image Registry
~~~~~~~~~~~~~~
Docker images can be stored in online image registries, such as `Docker Hub
`_. (It is analogous to the way Git repositories are
stored on GitHub.) Image registries are an excellent way to publish research
software and to discover tools built by others. Image registries support the
notion of tags to identify specific versions of images.
Image Tags
~~~~~~~~~~
Docker supports image tags, similar to tags in a git repository. Tags identify
a specific version of an image. The full name of an image on Docker Hub is
comprised of components separated by slashes. The components include an
"owner" (which could be an individual or organization), the "name",
and the "tag". For example, an image with the full name
.. code-block:: text
tacc/gateways19:0.1
would reference the "gateways19" image owned by the "tacc" organization with a
tag of "0.1".
Summing Up
----------
If you are developing an app or web service, you will almost certainly want to
work with containers. First you must either *build* an image from a
Dockerfile, or *pull* an image from a public registry. Then, you can *run*
(or deploy) an instance of your image as a container.
.. figure:: images/docker_workflow.png
:width: 600
:align: center
Simple Docker workflow.