Containers

Introduction

Containers have become the industry standard method for managing complex software environments, especially ones with bespoke configuration options. In brief, a container is a self-contained environment and software stack that runs on the host operating system (OS). Containers allow users to work with a variety of base operating systems (e.g., Ubuntu) and their software packages, independent of the host OS that the cluster nodes run (Rocky Linux). One can even impersonate root inside the container, allowing highly customized builds and a high level of control over the environment.

While containers allow for the management of sophisticated software stacks, they are not a panacea. As light as containers are, they still carry a performance penalty: the more layers you put between the code and the hardware, the more inefficiencies pile up. In addition, host filesystem access and various other permission issues can be tricky. Other incompatibilities can arise between the OS of the container and the OS of the compute node.

Still, with these provisos in mind, containers are an excellent tool for software management. Containers exist for many software packages, making software installation faster and more trouble-free. Containers also make it easy to record and share the exact stack of software required for a workflow, allowing other researchers to more easily reproduce research results in order to validate and extend them.

Types of Containers

There are two main types of containers. The first is the industry standard OCI (Open Container Initiative) container, popularized by Docker. Docker uses a client-server architecture, with one (usually) privileged background server process (or “daemon” process, called “dockerd”) per host. If run on a multi-tenant system (e.g., an HPC cluster such as Cannon), this creates a security issue: users who interact with the privileged daemon process could access files owned by other users. Additionally, on an HPC cluster, the Docker daemon process does not integrate with Slurm resource allocation facilities.

Podman, a daemonless OCI container toolchain developed by RedHat to address these issues, is installed on the FASRC cluster. The Podman CLI (command-line interface) was designed to be largely compatible with the Docker CLI, and on the FASRC cluster, the docker command runs podman under the hood. Many docker commands will just work with podman, though there are some differences.
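
For example, the two pulls below are equivalent on the cluster (the image name is just an illustration):

# On the FASRC cluster the docker command invokes podman, so these do the same thing:
docker pull docker.io/library/ubuntu:22.04
podman pull docker.io/library/ubuntu:22.04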

The second is Singularity. Singularity grew out of the need for additional security in shared user contexts (such as those found on a cluster). Since Docker normally requires the user to run as root, Singularity was created to alleviate this requirement and bring the advantages of containerization to a broader context. There are a couple of implementations of Singularity, and on the cluster, we use SingularityCE (the other implementation is Apptainer). Singularity can convert OCI (Docker) images into Singularity Image Format (SIF) files. Singularity images have the advantage of being distributable as a single read-only file; on an HPC cluster, this file can be located on a shared filesystem and easily launched by processes on different nodes. Additionally, Singularity containers run as the user who launched them, without elevated privileges.

Rootless

Normally, building a container requires root permissions, and in the case of Podman/Docker, the containers themselves would ordinarily be launched by the root user. While this may be fine in a cloud context, it is not in a shared resource context like an HPC cluster. Rootless is the solution to this problem.

Rootless essentially allows the user to spoof being root inside the container. It does this via a Linux feature called subuid (short for Subordinate User ID) and subgid (Subordinate Group ID). This feature allows a range of uid’s (a unique integer assigned to each user name used for permissions identification) and gid’s (unique integer for groups) to be subordinated to another uid. An example is illustrative. Let’s say you are userA with a uid of 20000. You are assigned the subuid range of 1020001-1021000. When you run your container, the following mapping happens:

In the Container [username(uid)]    Outside the Container [username(uid)]
root(0)                             userA(20000)
apache(48)                          1020048
ubuntu(1000)                        1021000

Thus, you can see that while you are inside the container, you pretend to be another user and have all the privileges of that user in the container. Outside the container, though, you are acting as your own user and the uid's subordinated to your user.

A few notes are important here:

  1. The subuid/subgid range assigned to each user does not overlap the uid/gid or subuid/subgid range assigned to any other user or group.
  2. While you may be spoofing a specific user inside of the container, the process outside the container sees you as your normal uid or subuid. Thus, if you use normal Linux tools like top or ps outside the container, you will notice that the id’s that show up are your uid and subuid.
  3. Filesystems, since they are external, also see you as your normal uid/gid and subuid/subgid. So files created as root in the container will show up on the storage as owned by your uid/gid, while files created by other users in the container will show up as their mapped subuid/subgid (see the sketch after this list).
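
As a quick sketch of the last point (the bind-mounted path below is hypothetical; substitute a directory you can write to):

# Create a file as "root" inside a rootless container, writing to a bind-mounted host directory.
podman run --rm -v /n/netscratch/jharvard_lab/Lab/jharvard:/data docker://ubuntu:22.04 touch /data/rootless-test
# On the host, the file is owned by your own uid/gid rather than by root.
ls -l /n/netscratch/jharvard_lab/Lab/jharvard/rootless-test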

Rootless is very powerful: it allows you both to build containers on the cluster and to run Podman/Docker containers right out of the box. If you want to see what your subuid mapping is, you can find the mappings at /etc/subuid and /etc/subgid. You can find your uid by running the id command, which you can then use to look up your map (e.g., with the command: grep "^$(id -u):" /etc/subuid).
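
For example, looking up your own mapping might go like this (the uid and range shown are made up):

# Print your uid, then find your subordinate uid range in /etc/subuid.
id -u
# 20000
grep "^$(id -u):" /etc/subuid
# 20000:1020001:1000   (format: uid:start_of_range:count)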

Rootless and Filesystems

Two more crucial notes about filesystems. The first is that since subuid's are not part of our normal authentication system, filesystems that cannot resolve subuid's will not permit them access. In particular, Lustre (e.g., /n/holylabs) does not recognize subuid's and, since it cannot resolve them, will deny them access. NFS filesystems (e.g., /n/netscratch) do not have this problem.

The second is that even if you can get into the filesystem, you may not be able to traverse into locations that do not have world access (o+rx) enabled. This is because the filesystem cannot resolve your user group or user name, does not see you as a valid member of the group, and thus rejects you. As such, it is imperative to test and validate access for any filesystem you intend to map into the container. A simple way to ensure access is to utilize the Everyone directory, which exists for most filesystems on the cluster. Note that your home directory is not world accessible for security reasons and thus cannot be used.
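
A quick way to check this before launching a container is to inspect the permissions along the path you plan to bind mount (the path below is hypothetical):

# namei -l (from util-linux) prints the permissions of every directory component along a path;
# each component needs world execute (o+x) for the container to traverse it.
namei -l /n/netscratch/jharvard_lab/Everyone/jharvard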

Getting Started

The first step in utilizing a container on the cluster is to submit a job, as login nodes are not appropriate places for development. If you are just beginning, the easiest method is either to get a command-line interactive session via salloc or to launch an Open OnDemand (OOD) session.
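
For example, a small interactive CPU session via salloc might look like the following (the partition name and resource values are placeholders; adjust them to your allocation):

# Request an interactive session with 1 core, 4 GB of memory, and 1 hour on the "test" partition.
salloc -p test -c 1 --mem=4G -t 01:00:00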

Once you have a session, you can then launch your container:

Singularity

[jharvard@holy8a26602 ~]$ singularity run docker://godlovedc/lolcow
INFO: Downloading library image to tmp cache: /scratch/sbuild-tmp-cache-701047440
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
INFO: Fetching OCI image...
45.3MiB / 45.3MiB [============================================================================================================================] 100 % 21.5 MiB/s 0s
53.7MiB / 53.7MiB [============================================================================================================================] 100 % 21.5 MiB/s 0s
INFO: Extracting OCI image...
2025/01/09 10:49:52 warn rootless{dev/agpgart} creating empty file in place of device 10:175
2025/01/09 10:49:52 warn rootless{dev/audio} creating empty file in place of device 14:4
2025/01/09 10:49:52 warn rootless{dev/audio1} creating empty file in place of device 14:20
INFO: Inserting Singularity configuration...
INFO: Creating SIF file...
 _________________________________________
/ Q: What do you call a principal female  \
| opera singer whose high C               |
|                                          |
| is lower than those of other principal  |
\ female opera singers? A: A deep C diva. /
 -----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
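
Instead of running straight from the Docker Hub URI each time, you can also pull the image once into a SIF file and reuse it, which works well on a shared filesystem:

# Convert the OCI image into a single SIF file in the current directory...
singularity pull lolcow.sif docker://godlovedc/lolcow
# ...then run the container directly from that file.
singularity run lolcow.sif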

Podman

[jharvard@holy8a24601 ~]$ podman run docker://godlovedc/lolcow
Trying to pull docker.io/godlovedc/lolcow:latest...
Getting image source signatures
Copying blob 8e860504ff1e done |
Copying blob 9fb6c798fa41 done |
Copying blob 3b61febd4aef done |
Copying blob 9d99b9777eb0 done |
Copying blob d010c8cf75d7 done |
Copying blob 7fac07fb303e done |
Copying config 577c1fe8e6 done |
Writing manifest to image destination
 _____________________________
< Give him an evasive answer. >
 -----------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
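
With Podman you can likewise pull the image first and inspect your local copies before running it:

# Pull the image into local storage, then list the images you have cached.
podman pull docker://godlovedc/lolcow
podman images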

Shell

If you want to get a shell prompt in a container, do the following:

Singularity

[jharvard@holy8a26602 ~]$ singularity shell docker://godlovedc/lolcow
Singularity>
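
If you need host directories inside the shell, you can bind mount them with --bind (the source path below is only an example):

# Bind mount a host directory into the container at /data and open a shell.
singularity shell --bind /n/netscratch/jharvard_lab:/data docker://godlovedc/lolcow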

Podman

[jharvard@holy8a26601 ~]$ podman run --rm -it --entrypoint bash docker://godlovedc/lolcow
root@holy8a26601:/#

GPU

If you want to use a GPU in a container, first start a job reserving a GPU on a GPU node.
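
For example, an interactive GPU session might be requested like this (the partition name and resource values are placeholders):

# Request 1 GPU, 4 cores, and 8 GB of memory for 1 hour on a GPU partition.
salloc -p gpu_test --gres=gpu:1 -c 4 --mem=8G -t 01:00:00

Then do the following: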

Singularity

You will want to add the --nv flag for Singularity:

[jharvard@holygpu7c26306 ~]$ singularity exec --nv docker://godlovedc/lolcow /bin/bash
Singularity> nvidia-smi
Fri Jan 10 15:50:20 2025 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:4B:00.0 Off | On |
| N/A 24C P0 43W / 400W | 74MiB / 40960MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 1 0 0 | 37MiB / 19968MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Singularity>
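
Once the GPU is visible inside the container, GPU codes run the same way. As a quick sketch (assuming the pytorch/pytorch image from Docker Hub, which is large on first pull):

# Run a one-line CUDA availability check inside a PyTorch container with GPU access.
singularity exec --nv docker://pytorch/pytorch python -c "import torch; print(torch.cuda.is_available())"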

Podman

For Podman, you need to add --device nvidia.com/gpu=all:

[jharvard@holygpu7c26305 ~]$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Fri Jan 10 20:26:57 2025 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:31:00.0 Off | On |
| N/A 25C P0 47W / 400W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 2 0 0 | 37MiB / 19968MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
WARN[0005] Failed to add pause process to systemd sandbox cgroup: dbus: couldn't determine address of session bus

Docker Rate Limiting

Docker Hub limits the number of pulls anonymous accounts can make. If you hit either of the following errors:

ERROR: toomanyrequests: Too Many Requests.

or

You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limits.

you will need to create a Docker account to increase your limit. See the Docker documentation for more details.

Once you have a Docker account, you can authenticate with Docker Hub with your Docker Hub account (not FASRC account) and then run a Docker container.

Singularity

singularity remote login --username <dockerhub_username> docker://docker.io

Podman

podman login docker.io

Advanced Usage

For advanced usage tips, such as how to build your own containers, see our specific container software pages.

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.