Search Docs by Keyword

Table of Contents

PyTorch

Description

PyTorch, developed by Facebook’s AI Research lab, is an open-source machine learning library that offers a flexible platform for building deep learning models. It features a Python front end and integrates seamlessly with Python libraries like NumPy, SciPy, and Cython to extend its functionality. Unique for its use of dynamic computational graphs, unlike TensorFlow’s static graphs, PyTorch allows for greater flexibility in model design. This is particularly advantageous for research applications involving novel architectures.

The library supports GPU acceleration, enhancing performance significantly, which is vital for tackling high-level research tasks in areas such as climate change modeling, DNA sequence analysis, and AI research that involve large datasets and complex architectures. Automatic differentiation in PyTorch is handled through a tape-based system at both the functional and neural network layers, offering both speed and flexibility as a deep learning framework.

Installing PyTorch

These instructions are intended to help you install PyTorch on the FASRC cluster.

GPU Support:  For general information on running GPU jobs refer to our user documentation. To set up PyTorch with GPU support in your user environment, please follow the below steps:

PyTorch with CUDA 12.1 in a conda environment

These instructions set up a conda environment with PyTorch version 2.2.1 and CUDA version 12.1, where the cuda-toolkit is installed directly in the conda environment.

Start an interactive job requesting GPUs, e.g., (Note: you will want to start a session on the same type of hardware as what you will run on)

salloc -p gpu -t 0-06:00 --mem=8000 --gres=gpu:1

Load required software modules, e.g.,

module load python/3.10.13-fasrc01

Create a conda environment, e.g.,

mamba create -n pt2.3.0_cuda12.1 python=3.10 pip wheel

Activate the new conda environment:

source activate pt2.3.0_cuda12.1

Install cuda-toolkit version 12.1.0 with mamba

mamba install -c  "nvidia/label/cuda-12.1.0" cuda-toolkit=12.1.0

Install PyTorch with mamba

mamba install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

Install additional Python packages, if needed, e.g.,

mamba install -c conda-forge numpy scipy pandas matplotlib seaborn h5py jupyterlab jupyterlab-spellchecker scikit-learn

PyTorch with CUDA 11.8 from a software module

These instructions set up a conda environment with PyTorch version 2.2.0 and CUDA version 11.8, where CUDA is loaded as a software module, cuda/11.8.0-fasrc01

# Start an interactive job on a GPU node (target the architecture where you plan to run), e.g.,
salloc -p gpu -t 0-06:00 --mem=8000 --gres=gpu:1

# Load the required modules, e.g.,
module load python 
module load cuda/11.8.0-fasrc01 # CUDA version 11.8.0

# Create a conda environment and activate it, e.g.,
mamba create -n pt2.2.0_cuda11.8 python=3.10 pip wheel -y
source activate pt2.2.0_cuda11.8

# Install PyTorch
mamba install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# Install additional packages, e.g.,
mamba install pandas scikit-learn matplotlib seaborn jupyterlab -y

Installing PyG (torch geometry)

After you create the conda environment pt2.3.0_cuda12.1 and activated it, you can install PyG in your environment with the command:

(pt2.3.0_cuda12.1) [username@holygpu7c26103 ~]$ mamba install pyg -c pyg

Running PyTorch:

If you are running PyTorch on GPU with multi-instance GPU (MIG) mode on (e.g. gpu_test partition), see PyTorch on MIG mode

PyTorch checks

You can run the following tests to ensure that PyTorch was installed properly and can find the GPU card. Example output of PyTorch checks:

(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.__version__)'
2.3.0
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.is_available())'
True
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.device_count())'
1
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.current_device())'
0
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.device(0))'
<torch.cuda.device object at 0x14942e6579d0>
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.get_device_name(0))'
NVIDIA A100-SXM4-40GB MIG 3g.20gb

Run PyTorch Interactively

For an interactive session to work with the GPUs you can use following:

salloc -p gpu -t 0-06:00 --mem=8000 --gres=gpu:1

Load required software modules and source your PyTorch conda environment.

[username@holygpu7c26103 ~]$ module load python/3.10.12-fasrc01
[username@holygpu7c26103 ~]$ source activate pt2.3.0_cuda12.1
(pt2.3.0_cuda12.1) [username@holygpu7c26103 ~]$

Test PyTorch interactively:

(pt2.3.0_cuda12.1) [username@holygpu7c26103 ~]$ python check_gpu.py
Using device: cuda

NVIDIA A100-SXM4-40GB
Memory Usage:
Allocated: 0.0 GB
Reserved:  0.0 GB

tensor([[-2.3792, -1.2330, -0.5143,  0.5844]], device='cuda:0')

check_gpu.py: checks if GPUs are available and if available sets up the device to use them.

Run PyTorch with Batch Jobs

An example batch-job submission script is included below:

#!/bin/bash
#SBATCH -c 1
#SBATCH -N 1
#SBATCH -t 0-00:30
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=4G
#SBATCH -o pytorch_%j.out 
#SBATCH -e pytorch_%j.err 

# Load software modules and source conda environment
module load python/3.10.12-fasrc01
source activate pt2.3.0_cuda12.1

# Run program
srun -c 1 --gres=gpu:1 python check_gpu.py

If you name the above batch-job submission script run.sbatch, for instance, the job is submitted with:

sbatch run.sbatch

PyTorch and Jupyter Notebook on Open OnDemand

If you would like to use the PyTorch environment on Open OnDemand/VDI, you will also need to install packages ipykernel and ipywidgets with the following commands:

(pt2.3.0_cuda12.1) [username@holygpu7c26103 ~]$ mamba install ipykernel ipywidgets

PyTorch on MIG Mode

Note: Currently only the gpu_test partition has MIG mode enabled.

# Get GPU card name
nvidia-smi -L

# Set CUDA_VISIBLE_DEVICES with the MIG instance
export CUDA_VISIBLE_DEVICES=MIG-5b36b802-0ab0-5f37-af2d-ac23f40ef62d

Or automate the process with:

export CUDA_VISIBLE_DEVICES=$(nvidia-smi -L | awk '/MIG/ {gsub(/[()]/,"");print $NF}')

Best Practices

PyTorch and Jupyter Notebook on Open OnDemand

To use PyTorch in Jupyter Notebook on Open OnDemand/VDI, install ipykernel and ipywidgets:

mamba install ipykernel ipywidgets

Pull a PyTorch Singularity Container

Alternatively, you can pull and use a PyTorch singularity container:

singularity pull docker://pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

Other PyTorch/cuda versions

To install other versions, refer to the PyTorch compatibility chart:

Examples

For example scripts covering installation, and use cases, see our User Codes > AI > PyTorch repo.

External Resources:

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.