TensorFlow
Description
TensorFlow (TF) is an open-source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code.
TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google’s Machine Intelligence research organization for the purpose of conducting machine learning and deep neural network research. The system is general enough to be applicable in a wide variety of other domains as well.
Installation:
The below instructions are intended to help you set up TF on the FASRC cluster.
GPU Version
The example below illustrates the installation of TF version 2.16.1 with Python 3.10, CUDA 12.1.0, and cuDNN 9.0.0.312. Please refer to our documentation on running GPU jobs on the FASRC cluster.
The two recommended methods for setting up TF in your user environment are installing TF in a conda environment in your user space or using a TF singularity container.
Installing TF in a Conda Environment
You can install your own TF instance following these simple steps:
# Load required software modules, e.g.,
module load python/3.10.13-fasrc01

# Create a new conda environment with Python:
mamba create -n tf2.16.1_cuda12.1 python=3.10 pip wheel

# Activate the new conda environment, e.g.,
source activate tf2.16.1_cuda12.1

# Install CUDA and cuDNN with conda/mamba and pip:
mamba install -c "nvidia/label/cuda-12.1.0" cuda-toolkit=12.1.0
pip install nvidia-cudnn-cu12==9.0.0.312

# Configure the system paths. You can do this with the following commands every time
# you start a new terminal, after activating your conda environment:
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib

# For convenience, it is recommended that you automate this with the following commands.
# The system paths will then be configured automatically whenever you activate this conda environment:
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

# Install extra packages required for data analytics, e.g.,
mamba install -c conda-forge numpy scipy pandas matplotlib seaborn h5py jupyterlab jupyterlab-spellchecker scikit-learn

# Install TF plus the required GPU libraries with pip, e.g.
# (quoted so the shell does not try to expand the [ ] and * characters):
pip install --upgrade "tensorflow[and-cuda]==2.16.*"

# Set the Keras backend to TF (used by Keras 3.0 and above):
export KERAS_BACKEND="tensorflow"
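If TF later reports that it cannot load the CUDA or cuDNN libraries, one thing worth checking is whether the directories added by the export line above actually contain them. The short Python sketch below lists the files matching a pattern in each LD_LIBRARY_PATH entry. The helper name find_shared_libs and the library file patterns (libcudnn.so*, libcudart.so*) are assumptions for illustration, not part of the FASRC recipe.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_shared_libs(pattern: str, ld_path: str | None = None) -> list[Path]:
    """Return files matching `pattern` in the directories listed in LD_LIBRARY_PATH.

    `ld_path` defaults to the current LD_LIBRARY_PATH; passing it explicitly
    makes the function easy to test.
    """
    if ld_path is None:
        ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits: list[Path] = []
    for entry in ld_path.split(os.pathsep):  # os.pathsep is ":" on Linux
        if not entry:
            # Empty entries appear e.g. as a leading ":" when LD_LIBRARY_PATH
            # was empty before the export line ran; they are skipped here.
            continue
        directory = Path(entry)
        if directory.is_dir():
            hits.extend(sorted(directory.glob(pattern)))
    return hits


if __name__ == "__main__":
    # Assumed library name patterns for cuDNN 9 and the CUDA 12 runtime.
    for pattern in ("libcudnn.so*", "libcudart.so*"):
        found = find_shared_libs(pattern)
        print(f"{pattern}: {[str(p) for p in found] or 'NOT FOUND'}")
```

Run it with python inside the activated environment, after the paths have been configured. A "NOT FOUND" for a pattern suggests the corresponding directory is missing from LD_LIBRARY_PATH.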
NOTE: Starting with version 2.16.1, TF includes Keras 3.0. Please refer to the TensorFlow 2.16.1 release notes for important changes.
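The Keras backend can also be selected from inside a Python script instead of the shell. This is a minimal sketch, assuming that Keras 3 reads KERAS_BACKEND when keras is first imported, so the variable has to be set before that import. The helper name select_keras_backend is hypothetical.

```python
import os


def select_keras_backend(default: str = "tensorflow") -> str:
    """Choose the Keras backend; call this before the first `import keras`.

    os.environ.setdefault keeps a value already exported in the shell
    (e.g. `export KERAS_BACKEND="tensorflow"`), so both methods agree.
    """
    return os.environ.setdefault("KERAS_BACKEND", default)


backend = select_keras_backend()

try:
    import keras  # imported only after the backend has been chosen
except ImportError:  # lets this sketch run where Keras is not installed
    keras = None

if keras is not None:
    # keras.backend.backend() reports the backend Keras actually loaded
    print("Keras", keras.__version__, "using backend:", keras.backend.backend())
else:
    print("Keras not installed; KERAS_BACKEND =", backend)
```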
Pull a TF Singularity Container
Alternatively, one can pull and use a TensorFlow singularity container:
singularity pull --name tf2.16.1_gpu.simg docker://tensorflow/tensorflow:2.16.1-gpu
This creates the image tf2.16.1_gpu.simg. The image can then be used, e.g., as follows:
$ KERAS_BACKEND="tensorflow" singularity exec --nv tf2.16.1_gpu.simg python3
Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os; os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
>>> import tensorflow as tf
>>> print(tf.__version__)
2.16.1
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(1365.5554, shape=(), dtype=float32)
>>> print(tf.config.list_physical_devices('GPU'))
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
Note: Please note the use of the --nv option, which is required to make the NVIDIA GPUs on the host system available inside the container. Please also note the KERAS_BACKEND="tensorflow" environment variable, which sets the Keras backend to TF.
Alternatively, you can pull a container from the NVIDIA NGC Catalog, e.g.,
singularity pull docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3
This will result in the image tensorflow_24.03-tf2-py3.sif, which has TF version 2.15.0.
The NGC catalog provides access to optimized containers of many popular apps.
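If you launch containerized runs from a Python driver script rather than typing the command, the pieces described above (the --nv option and the KERAS_BACKEND variable) can be assembled as follows. This is a sketch: the helper names are hypothetical, and it assumes Singularity's default behavior of passing host environment variables into the container (i.e. no --cleanenv).

```python
from __future__ import annotations

import os
import subprocess


def singularity_python_cmd(image: str, script: str | None = None, gpu: bool = True) -> list[str]:
    """Build the argv for `singularity exec [--nv] IMAGE python3 [SCRIPT]`."""
    cmd = ["singularity", "exec"]
    if gpu:
        cmd.append("--nv")  # expose the host's NVIDIA driver and GPUs in the container
    cmd += [image, "python3"]
    if script is not None:
        cmd.append(script)
    return cmd


def container_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy of the environment (default: os.environ) with the Keras backend set to TF."""
    env = dict(os.environ if base is None else base)
    env["KERAS_BACKEND"] = "tensorflow"
    return env


def run_in_container(image: str, script: str, gpu: bool = True) -> int:
    """Run SCRIPT with the container's python3 and return its exit code."""
    result = subprocess.run(singularity_python_cmd(image, script, gpu), env=container_env())
    return result.returncode
```

For example, run_in_container("tf2.16.1_gpu.simg", "tf_test.py") on a GPU node, or gpu=False for a CPU-only image. Building the argument list in a separate function keeps it easy to print or log before anything runs.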
CPU Version
Similarly to the GPU installation, you can either install TF in a conda environment or use a TF singularity container.
Installing TF in a Conda Environment
# (1) Load required software modules
module load python/3.10.13-fasrc01

# (2) Create a conda environment
mamba create -n tf2.16.1_cpu python=3.10 pip wheel

# (3) Activate the conda environment
source activate tf2.16.1_cpu

# (4) Install required packages for data analytics, e.g.,
mamba install -c conda-forge numpy scipy pandas matplotlib seaborn h5py jupyterlab jupyterlab-spellchecker scikit-learn

# (5) Install the CPU version of TF with pip (quoted so the shell does not expand *)
pip install --upgrade "tensorflow-cpu==2.16.*"

# (6) Set the Keras backend to TF
export KERAS_BACKEND="tensorflow"
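To confirm which TF wheel an environment actually contains (for example, that the CPU environment did not also pick up the GPU tensorflow package), you could query the installed distributions with the standard library's importlib.metadata. A minimal sketch; the helper names are hypothetical.

```python
from __future__ import annotations

from importlib import metadata

# pip distribution names to look for: "tensorflow-cpu" is installed in the
# CPU recipe above, "tensorflow" in the GPU recipe.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-cpu")


def installed_version(dist: str) -> str | None:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def tf_wheels() -> dict[str, str]:
    """Map each TF distribution found in the active environment to its version."""
    found = {}
    for dist in TF_DISTRIBUTIONS:
        version = installed_version(dist)
        if version is not None:
            found[dist] = version
    return found


if __name__ == "__main__":
    wheels = tf_wheels()
    print(wheels or "no TensorFlow wheel found")
    if len(wheels) > 1:
        # Both wheels install the same `tensorflow` import package, so having
        # both in one environment is likely to cause conflicts.
        print("warning: more than one TensorFlow wheel is installed")
```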
Pull a TF Singularity Container
singularity pull --name tf2.16.1_cpu.simg docker://tensorflow/tensorflow:2.16.1
This will result in the image tf2.16.1_cpu.simg. The image can then be used, e.g., as follows:
KERAS_BACKEND="tensorflow" singularity exec tf2.16.1_cpu.simg python3 -c "import os; os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'; import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
tf.Tensor(2878.413, shape=(), dtype=float32)
Running TensorFlow:
Run TensorFlow Interactively
For an interactive session to work with the GPUs, you can use the following:
salloc -p gpu_test -t 0-06:00 --mem=8000 --gres=gpu:1
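Once the allocation starts, you can check from Python which GPUs the job may use. This is a sketch, assuming (as on many Slurm clusters) that Slurm exports CUDA_VISIBLE_DEVICES for jobs that request GPUs with --gres; the helper name visible_gpus is hypothetical. With MIG-partitioned GPUs the entries may be device UUIDs rather than indices.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def visible_gpus(env: Mapping[str, str] | None = None) -> list[str] | None:
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (CUDA then sees every GPU on the
    node) and [] when it is set but empty (CUDA sees no GPU).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    print("Slurm job:", os.environ.get("SLURM_JOB_ID"), "GPUs:", visible_gpus())
```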
While on the GPU node, you can run nvidia-smi to get information about the assigned GPUs.
[username@holygpu7c26306 ~]$ nvidia-smi
Fri Apr  5 16:00:55 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:E3:00.0 Off |                   On |
| N/A   25C    P0              46W / 400W |    259MiB / 40960MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  0    2   0   0  |              37MiB / 19968MiB  | 42      0 |  3   0    2    0    0 |
|                  |               0MiB / 32767MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
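The table above is easy to read but awkward to process in a script. nvidia-smi also has a query mode (--query-gpu together with --format=csv) that prints one CSV line per GPU. Below is a sketch of a parser for that output; the helper names are hypothetical, and the test values reuse the numbers from the table above (259MiB used of 40960MiB) purely for illustration. Note that in the example the A100 is in MIG mode and the MIG devices table shows a slice with 19968MiB of memory; --query-gpu likely reports the whole physical GPU, so it may not reflect the size of the slice you were given.

```python
from __future__ import annotations

import csv
import io
import subprocess

# Field names accepted by --query-gpu are listed by `nvidia-smi --help-query-gpu`.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(value: str) -> int | None:
    # Some fields can read "[N/A]" depending on the GPU and driver; keep those as None.
    return int(value) if value.isdigit() else None


def parse_gpu_query(text: str) -> list[dict]:
    """Parse the CSV printed by QUERY_CMD into one dict per GPU (memory in MiB)."""
    gpus = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        index, name, used, total = (field.strip() for field in row)
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi on the current node and parse its output."""
    out = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True).stdout
    return parse_gpu_query(out)
```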
Load the required modules and source your TF environment:
[username@holygpu7c26306 ~]$ module load python/3.10.13-fasrc01 && source activate tf2.16.1_cuda12.1
(tf2.16.1_cuda12.1) [username@holygpu7c26306 ~]$
Test TF:
(Example adapted from here.)
(tf2.16.1_cuda12.1) [username@holygpu7c26306 ~]$ python tf_test.py
2.16.1
Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 839us/step - accuracy: 0.7867 - loss: 0.6247
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 829us/step - accuracy: 0.8600 - loss: 0.3855
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 827us/step - accuracy: 0.8788 - loss: 0.3373
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 831us/step - accuracy: 0.8852 - loss: 0.3124
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 828us/step - accuracy: 0.8912 - loss: 0.2915
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 830us/step - accuracy: 0.8961 - loss: 0.2773
Epoch 7/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 828us/step - accuracy: 0.9025 - loss: 0.2625
Epoch 8/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 830us/step - accuracy: 0.9044 - loss: 0.2606
Epoch 9/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 828us/step - accuracy: 0.9081 - loss: 0.2489
Epoch 10/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 829us/step - accuracy: 0.9109 - loss: 0.2405
313/313 - 2s - 6ms/step - accuracy: 0.8804 - loss: 0.3411

Test accuracy: 0.8804000020027161
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step
[1.0222636e-07 7.9844620e-09 4.7857565e-11 5.2755653e-09 2.7131367e-10
 2.1757800e-04 5.9717085e-09 6.6847289e-03 4.5007189e-07 9.9309713e-01]
In the above example, we used the test code tf_test.py from User Codes.
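The last line of the output appears to be a vector of ten class probabilities for a single test image (the values sum to about 1), as in the image-classification tutorial the test was adapted from; this reading is an assumption, since tf_test.py itself is not shown here. The predicted label is the index of the largest value, which the plain-Python sketch below computes from the printed numbers. In TF code this would typically be np.argmax applied to the prediction row. Which class name a given index corresponds to depends on the dataset used in tf_test.py.

```python
# Probabilities copied from the last line of the tf_test.py output above.
predictions = [
    1.0222636e-07, 7.9844620e-09, 4.7857565e-11, 5.2755653e-09, 2.7131367e-10,
    2.1757800e-04, 5.9717085e-09, 6.6847289e-03, 4.5007189e-07, 9.9309713e-01,
]

# Index of the largest probability = predicted class label (0-9).
predicted_class = max(range(len(predictions)), key=predictions.__getitem__)
confidence = predictions[predicted_class]

print(f"predicted class {predicted_class} with probability {confidence:.4f}")
print(f"sum of probabilities: {sum(predictions):.6f}")
```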
TensorFlow Singularity Image from Definition File
You may pull a singularity TensorFlow version 2.12.0 image with the below command:
# Pull a singularity container with version 2.12.0
singularity pull --name tf2.12_gpu.simg docker://tensorflow/tensorflow:2.12.0-gpu
This image comes with a number of basic Python packages. If you need additional packages, you could use the example singularity definition file tf-2.12.def to build the singularity image:
Bootstrap: docker
From: tensorflow/tensorflow:2.12.0-gpu

%post
    pip install --upgrade pip
    pip install matplotlib
    pip install seaborn
    pip install scipy
    pip install scikit-learn
    pip install jupyterlab
    pip install notebook
You could install additional packages directly in the image with pip by adding them to the %post section of the definition file, as illustrated above. Please refer to our documentation on how to build singularity images from definition files.
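After building the image, a quick check that the extra packages are importable can save a failed job later. The sketch below uses importlib.util.find_spec, which locates a top-level module without importing it. It could be run inside the container, e.g., singularity exec tf-2.12.simg python3 check_imports.py, where both the image name tf-2.12.simg and the script name check_imports.py are hypothetical. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names for TF plus the packages installed in the %post section of
# tf-2.12.def (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = [
    "tensorflow", "matplotlib", "seaborn", "scipy", "sklearn", "jupyterlab", "notebook",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("all packages found" if not missing else f"missing: {', '.join(missing)}")
```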
Examples:
- Example 1: Simple 2D CNN with the MNIST dataset
- Example 2: TensorBoard application
- Example 3: Multi-gpu example from TensorFlow documentation
- Example 4: Multi-gpu example — modified tf_test.py