PyTorch
Description
PyTorch, originally developed by Facebook’s AI Research lab, is an open-source machine learning library for building deep learning models. It has a Python front end and integrates with Python tools such as NumPy, SciPy, and Cython. PyTorch builds its computational graph dynamically as the code runs, in contrast to the static graphs of TensorFlow 1.x, which makes it easier to express models whose structure depends on the input. This is particularly useful for research involving novel architectures.
The library supports GPU acceleration, which substantially speeds up training and inference for research workloads with large datasets and complex architectures, such as climate change modeling, DNA sequence analysis, and AI research. Automatic differentiation in PyTorch is handled by a tape-based system at both the functional and neural network layers: operations are recorded as they execute and then replayed in reverse to compute gradients.
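As a small illustration of the dynamic graph and tape-based differentiation described above, the sketch below builds a power of x with an ordinary Python loop and asks autograd for its gradient. The function name `power_gradient` is hypothetical and used only for illustration; the import guard is an addition so the snippet degrades gracefully where PyTorch is not yet installed.

```python
import importlib.util


def power_gradient(x_value: float, steps: int) -> float:
    """Return d/dx of x**(steps + 1) at x_value, computed by autograd."""
    import torch  # imported here so the file loads even without PyTorch

    x = torch.tensor(x_value, requires_grad=True)
    y = x
    # Ordinary Python control flow: the graph is recorded as the loop runs,
    # so its depth depends on the run-time value of `steps`.
    for _ in range(steps):
        y = y * x
    y.backward()  # replay the recorded operations in reverse
    return x.grad.item()


if importlib.util.find_spec("torch") is None:
    print("PyTorch is not installed in this environment")
else:
    # x**3 has derivative 3 * x**2, which is 12 at x = 2
    print(power_gradient(2.0, 2))
```

Because the loop count is a run-time value, the same function yields a different graph for each value of `steps` without any recompilation step.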
Installing PyTorch
These instructions are intended to help you install PyTorch on the FASRC cluster.
GPU Support: For general information on running GPU jobs, refer to our user documentation. To set up PyTorch with GPU support in your user environment, follow the steps below:
PyTorch with CUDA 12.1 in a conda environment
These instructions set up a conda environment with PyTorch version 2.3.0 and CUDA version 12.1, where the cuda-toolkit is installed directly in the conda environment.
Start an interactive job requesting GPUs (start the session on the same type of hardware you plan to run on), e.g.,
salloc -p gpu -t 0-06:00 --mem=8000 --gres=gpu:1
Load required software modules, e.g.,
module load python/3.10.13-fasrc01
Create a conda environment, e.g.,
mamba create -n pt2.3.0_cuda12.1 python=3.10 pip wheel
Activate the new conda environment:
source activate pt2.3.0_cuda12.1
Install cuda-toolkit version 12.1.0 with mamba:
mamba install -c "nvidia/label/cuda-12.1.0" cuda-toolkit=12.1.0
Install PyTorch with mamba
mamba install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
Install additional Python packages, if needed, e.g.,
mamba install -c conda-forge numpy scipy pandas matplotlib seaborn h5py jupyterlab jupyterlab-spellchecker scikit-learn
PyTorch with CUDA 11.8 from a software module
These instructions set up a conda environment with PyTorch version 2.2.0 and CUDA version 11.8, where CUDA is loaded as a software module, cuda/11.8.0-fasrc01.
# Start an interactive job on a GPU node (target the architecture where you plan to run), e.g.,
salloc -p gpu -t 0-06:00 --mem=8000 --gres=gpu:1
# Load the required modules, e.g.,
module load python
module load cuda/11.8.0-fasrc01 # CUDA version 11.8.0
# Create a conda environment and activate it, e.g.,
mamba create -n pt2.2.0_cuda11.8 python=3.10 pip wheel -y
source activate pt2.2.0_cuda11.8
# Install PyTorch
mamba install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# Install additional packages, e.g.,
mamba install pandas scikit-learn matplotlib seaborn jupyterlab -y
Installing PyG (PyTorch Geometric)
After you have created and activated the conda environment pt2.3.0_cuda12.1, you can install PyG in it with the command:
(pt2.3.0_cuda12.1) [username@holygpu7c26103 ~]$ mamba install pyg -c pyg
Running PyTorch:
If you are running PyTorch on a GPU with multi-instance GPU (MIG) mode enabled (e.g., the gpu_test partition), see PyTorch on MIG Mode.
PyTorch checks
You can run the following tests to ensure that PyTorch was installed properly and can find the GPU card. Example output of PyTorch checks:
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.__version__)'
2.3.0
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.is_available())'
True
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.device_count())'
1
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.current_device())'
0
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.device(0))'
<torch.cuda.device object at 0x14942e6579d0>
(pt2.3.0_cuda12.1_v0) [jharvard@holygpu7c26106 ~]$ python -c 'import torch;print(torch.cuda.get_device_name(0))'
NVIDIA A100-SXM4-40GB MIG 3g.20gb
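In addition to the checks above, it can be useful to confirm that the installed PyTorch build targets the CUDA version you intended, since a CPU-only build reports `torch.version.cuda` as `None`. The following is a minimal sketch; the helper name `cuda_build_matches` and the expected value "12.1" are assumptions chosen to match the conda environment on this page.

```python
import importlib.util
from typing import Optional


def cuda_build_matches(expected: str, build: Optional[str]) -> bool:
    """True if `build` has the same major.minor CUDA version as `expected`."""
    if build is None:  # CPU-only builds report no CUDA version
        return False
    return build.split(".")[:2] == expected.split(".")[:2]


if importlib.util.find_spec("torch") is None:
    print("PyTorch is not installed in this environment")
else:
    import torch

    print("PyTorch:", torch.__version__)
    print("Built with CUDA:", torch.version.cuda)
    print("Matches 12.1:", cuda_build_matches("12.1", torch.version.cuda))
```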
Run PyTorch Interactively
To start an interactive session with access to a GPU, you can use the following:
salloc -p gpu -t 0-06:00 --mem=8000 --gres=gpu:1
Load required software modules and source your PyTorch conda environment.
[username@holygpu7c26103 ~]$ module load python/3.10.12-fasrc01
[username@holygpu7c26103 ~]$ source activate pt2.3.0_cuda12.1
(pt2.3.0_cuda12.1) [username@holygpu7c26103 ~]$
Test PyTorch interactively:
(pt2.3.0_cuda12.1) [username@holygpu7c26103 ~]$ python check_gpu.py
Using device: cuda
NVIDIA A100-SXM4-40GB
Memory Usage:
Allocated: 0.0 GB
Reserved: 0.0 GB
tensor([[-2.3792, -1.2330, -0.5143, 0.5844]], device='cuda:0')
check_gpu.py checks whether GPUs are available and, if so, sets the device to use them.
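The contents of check_gpu.py are not reproduced on this page. A minimal sketch that would produce output of the shape shown above, assuming the script simply queries `torch.cuda` and creates a small random tensor, might look like this. The function name `describe_device` is hypothetical, and the import guard is an addition so the script also reports a missing PyTorch install.

```python
import importlib.util


def describe_device() -> list[str]:
    """Return report lines about the device PyTorch would use."""
    if importlib.util.find_spec("torch") is None:
        return ["PyTorch is not installed in the active environment"]

    import torch

    # Fall back to the CPU when no GPU is visible to PyTorch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    lines = [f"Using device: {device}"]

    if device.type == "cuda":
        gib = 1024 ** 3
        lines.append(torch.cuda.get_device_name(0))
        lines.append("Memory Usage:")
        lines.append(f"Allocated: {round(torch.cuda.memory_allocated(0) / gib, 1)} GB")
        lines.append(f"Reserved: {round(torch.cuda.memory_reserved(0) / gib, 1)} GB")

    # Create a small random tensor directly on the selected device
    lines.append(str(torch.randn(1, 4, device=device)))
    return lines


if __name__ == "__main__":
    print("\n".join(describe_device()))
```

On a node without a visible GPU, the same script reports `Using device: cpu` and prints a CPU tensor, which makes it a quick way to tell whether the job actually received a GPU.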
Run PyTorch with Batch Jobs
An example batch-job submission script is included below:
#!/bin/bash
#SBATCH -c 1
#SBATCH -N 1
#SBATCH -t 0-00:30
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=4G
#SBATCH -o pytorch_%j.out
#SBATCH -e pytorch_%j.err
# Load software modules and source conda environment
module load python/3.10.12-fasrc01
source activate pt2.3.0_cuda12.1
# Run program
srun -c 1 --gres=gpu:1 python check_gpu.py
If you name the above batch-job submission script run.sbatch, for instance, the job is submitted with:
sbatch run.sbatch
PyTorch and Jupyter Notebook on Open OnDemand
If you would like to use the PyTorch environment on Open OnDemand/VDI, you will also need to install the packages ipykernel and ipywidgets with the following command:
(pt2.3.0_cuda12.1) [username@holygpu7c26103 ~]$ mamba install ipykernel ipywidgets
PyTorch on MIG Mode
Note: Currently only the gpu_test partition has MIG mode enabled.
# Get GPU card name
nvidia-smi -L
# Set CUDA_VISIBLE_DEVICES with the MIG instance
export CUDA_VISIBLE_DEVICES=MIG-5b36b802-0ab0-5f37-af2d-ac23f40ef62d
Or automate the process with:
export CUDA_VISIBLE_DEVICES=$(nvidia-smi -L | awk '/MIG/ {gsub(/[()]/,"");print $NF}')
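The same selection can be done from inside a Python program, as long as it happens before CUDA is initialized (that is, before the first `torch.cuda` call). The sketch below parses `nvidia-smi -L` output with a regular expression. The helper names are hypothetical, the UUID pattern assumes the `MIG-<uuid>` form shown above (older drivers may print a different format), and the sample listing is illustrative rather than real output.

```python
import os
import re
import shutil
import subprocess
from typing import Optional

# Assumed UUID shape: "MIG-" followed by a standard 8-4-4-4-12 hex UUID
MIG_UUID = re.compile(
    r"MIG-[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def find_mig_uuids(listing: str) -> list[str]:
    """Extract every MIG instance UUID from `nvidia-smi -L` output."""
    return MIG_UUID.findall(listing)


def select_first_mig_device() -> Optional[str]:
    """Point CUDA_VISIBLE_DEVICES at the first MIG instance, if any."""
    if shutil.which("nvidia-smi") is None:
        return None  # not on a GPU node, or the driver tools are missing
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    uuids = find_mig_uuids(result.stdout)
    if not uuids:
        return None
    os.environ["CUDA_VISIBLE_DEVICES"] = uuids[0]
    return uuids[0]


# Illustrative listing; the GPU UUID is a placeholder, not real output.
sample = (
    "GPU 0: NVIDIA A100-SXM4-40GB "
    "(UUID: GPU-00000000-0000-0000-0000-000000000000)\n"
    "  MIG 3g.20gb     Device  0: "
    "(UUID: MIG-5b36b802-0ab0-5f37-af2d-ac23f40ef62d)\n"
)
print(find_mig_uuids(sample))
```

Unlike the shell one-liner, `select_first_mig_device` picks a single instance when several are listed, which is one possible design choice rather than a requirement.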
Best Practices
Pull a PyTorch Singularity Container
Alternatively, you can pull and use a PyTorch Singularity container:
singularity pull docker://pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
Other PyTorch/CUDA versions
To install other versions, refer to the PyTorch/CUDA version compatibility chart listed under External Resources.
Examples
For example scripts covering installation and use cases, see our User Codes > AI > PyTorch repo.
External Resources:
- PyTorch/CUDA version compatibility chart.
- PyTorch Official Documentation – Comprehensive resource for all functionalities including tutorials and API reference.
- PyTorch Tutorials – Practical tutorials covering basic to advanced topics, specifically tailored for deep learning and high-performance computing tasks.
- PyTorch Discussion Forums – A community forum for discussing specific issues, sharing solutions, and collaborating on projects.
- Introduction to Distributed Deep Learning – A detailed guide on implementing distributed deep learning models in PyTorch.
- Efficient PyTorch – An article by PyTorch developers on best practices for optimizing deep learning models for production.