Python Package Installation

Description

Python packages on the cluster are primarily managed with Mamba.  Direct use of pip, outside of a virtual environment, is discouraged on the FASRC clusters.

Mamba is a package manager that is a drop-in replacement for Conda, and is generally faster and better at resolving dependencies:

  • Speed: Mamba is written in C++, which makes it faster than Conda. Mamba uses parallel processing and efficient code to install packages faster.
  • Compatibility: Mamba is fully compatible with Conda, so it can use the same commands, packages, and environment configurations.
  • Cross-platform support: Mamba works on Mac, Linux and Windows.
  • Dependency resolution: Mamba is better at resolving dependencies than Conda.
  • Environment creation: Mamba is faster at creating environments, especially large ones.
  • Package repository: Mamba, as installed here, uses conda-forge as its default channel, which generally carries up-to-date, community-maintained packages.

Important:
Anaconda is currently reviewing its Terms of Service for Academia and Research and is expected to conclude the update by the end of 2024. Conda may no longer be free for non-profit academic research use at institutions with more than 200 employees, and downloading packages through Anaconda’s Main channel may incur costs. We therefore recommend that users switch to the open-source conda-forge channel for package distribution when possible. Our python module is built with the Miniforge3 distribution, which has conda-forge set as its default channel.

Mamba is a drop-in replacement for Conda and uses the same commands and configuration options as conda. You can swap almost all commands between conda and mamba. By default, mamba uses the free conda-forge package repository. (In this doc, we will generally refer only to mamba.)

Usage

mamba is available on the FASRC cluster as a software module, either as Mambaforge or as python/3*, which is aliased to mamba. You can access it by loading a module, for example:

$ module load python/{PYTHON_VERS}-fasrc01
$ python -V
Python {PYTHON_VERS}

Environments

You can create virtual environments with mamba in the same way as with conda. However, it is important to start an interactive session before creating an environment and installing packages, as follows:

$ salloc --partition test --nodes=1 --cpus-per-task=2 --mem=4GB --time=0-02:00:00

$ module load python/{PYTHON_VERS}-fasrc01

Create an environment using mamba: $ mamba create -n <ENV_NAME>

You can also install packages as part of the create command, which can speed up your setup significantly. For example:

$ mamba create -n <ENV_NAME> <PACKAGES> 
$ mamba create -n python_env1 python={PYTHON_VERS} pip wheel

You must activate an environment in order to use it or install any packages in it. To activate and use an environment: $ mamba activate python_env1

To deactivate an active environment: $ mamba deactivate

You can list the packages currently installed in the active mamba or conda environment with: $ mamba list

To install new packages in the environment with mamba using the default channel:

$ mamba install -y <PACKAGES>

For example: $ mamba install -y numpy 

To install a package from a specific channel, instead:

$ mamba install --channel <CHANNEL-NAME> -y <PACKAGE>

For example: $ mamba install --channel conda-forge boltons

Note: Do not run pip install outside a mamba environment on any FASRC cluster. If you do, pip places the packages in your $HOME/.local directory, which can create package conflicts that prevent some packages from being installed or loaded via mamba.
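
To check whether earlier pip installs have already left packages under $HOME/.local, a small shell function along these lines may help. This is a minimal sketch, not an FASRC tool: the function name list_user_site_dirs is hypothetical, and it assumes the default Linux user-site layout $HOME/.local/lib/pythonX.Y/site-packages.

```shell
# Hypothetical helper: print each user-site directory under $HOME/.local
# that contains at least one entry. Assumes the default Linux layout
# $HOME/.local/lib/pythonX.Y/site-packages.
list_user_site_dirs() {
    local dir
    for dir in "${HOME}"/.local/lib/python*/site-packages; do
        # Skip the unexpanded glob pattern and anything that is not a directory.
        [ -d "${dir}" ] || continue
        # Report only directories that actually hold something.
        if [ -n "$(ls -A "${dir}")" ]; then
            echo "${dir}"
        fi
    done
}
```

Calling list_user_site_dirs prints nothing when no user-site directory holds packages; any path it prints is a candidate source of the conflicts described in the note.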

To uninstall packages: $ mamba uninstall <PACKAGE>

For additional features, please refer to the Mamba documentation.

Best Practices

Use mamba environment in Jupyter Notebooks

If you would like to use a mamba environment as a kernel in a Jupyter Notebook on Open OnDemand (Cannon OOD or FASSE OOD), you have to install the packages ipykernel and nb_conda_kernels in that environment. These packages allow Jupyter to detect mamba environments that you created from the command line.

For example, if your environment name is python_env1:

$ module load python
$ source activate python_env1
$ mamba install ipykernel nb_conda_kernels

After these packages are installed, launch a new Jupyter Notebook job (existing Jupyter Notebook jobs will fail to “see” this environment). Then:
  1. Open a Jupyter Notebook (a .ipynb file)
  2. On the top menu, click Kernel -> Change kernel -> select the conda environment
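
If the environment does not appear in the kernel list, one quick check is whether both packages are importable from the environment's Python. The following is a minimal sketch: the function name check_jupyter_kernel_pkgs is hypothetical, and it assumes the environment is already activated so that python on your PATH is the environment's interpreter.

```shell
# Hypothetical helper: report whether ipykernel and nb_conda_kernels can be
# imported by the python found on PATH (assumed to be the active environment's).
check_jupyter_kernel_pkgs() {
    local mod
    local missing=0
    for mod in ipykernel nb_conda_kernels; do
        if python -c "import ${mod}" 2>/dev/null; then
            echo "ok: ${mod}"
        else
            echo "missing: ${mod}"
            missing=1
        fi
    done
    # Non-zero exit status if at least one package could not be imported.
    return "${missing}"
}
```

Run check_jupyter_kernel_pkgs after activating python_env1; a "missing" line suggests repeating the mamba install step inside that environment.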

Mamba environments in holylabs space

With mamba, use the -p or --prefix option to write environment files to a holylabs share location. Do not use your home directory, as it has very low performance due to filesystem latency. Using a lab share location also lets you share your conda environment with other people on the cluster. Keep in mind that you will need to create the destination directory and specify the Python version to use. For example:

$ mamba create --prefix /n/holylabs/LABS/{YOUR_LAB}/Lab/envs python={PYTHON_VERS}

$ mamba activate /n/holylabs/LABS/{YOUR_LAB}/Lab/envs
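
The directory creation and the environment creation can be combined into one step. The following is a sketch under stated assumptions: the function name create_prefix_env is hypothetical, and it creates only the parent of the prefix with mkdir -p, leaving mamba to create the environment directory itself.

```shell
# Hypothetical helper: make sure the parent directory of the prefix exists,
# then create the environment at that prefix with the requested Python version.
create_prefix_env() {
    local prefix="$1"
    local python_version="$2"
    if [ -z "${prefix}" ] || [ -z "${python_version}" ]; then
        echo "usage: create_prefix_env <PREFIX> <PYTHON_VERS>" >&2
        return 2
    fi
    # Assumption: only the parent is pre-created; mamba creates the prefix itself.
    mkdir -p "$(dirname "${prefix}")" || return 1
    mamba create -y --prefix "${prefix}" "python=${python_version}"
}
```

For example, create_prefix_env /n/holylabs/LABS/{YOUR_LAB}/Lab/envs {PYTHON_VERS} mirrors the commands above, with your lab name and Python version substituted.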

Pip Installs

Avoid using pip outside of a mamba environment on any FASRC cluster. If you run pip install outside of a mamba environment, the installed packages will be placed in your $HOME/.local directory, which can lead to package conflicts and may cause some packages to fail to install or load correctly via mamba.

Instead, activate the environment first and run pip inside it. For example, if your environment name is python_env1:

$ module load python
$ mamba activate python_env1
$ pip install <package_name>
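
Before running pip install, it can be worth confirming that the pip on your PATH belongs to the activated environment, since an environment created without pip may fall back to a pip from elsewhere. The following is a minimal sketch: the function name pip_is_from_active_env is hypothetical, and it assumes activation sets the CONDA_PREFIX variable. If a base environment is active, CONDA_PREFIX points at it, so this check cannot tell base apart from your own environment.

```shell
# Hypothetical helper: succeed only if the pip found on PATH lives inside
# the active environment. Assumes activation sets CONDA_PREFIX.
pip_is_from_active_env() {
    local pip_path
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no environment is active" >&2
        return 1
    fi
    pip_path="$(command -v pip)" || {
        echo "pip not found on PATH" >&2
        return 1
    }
    case "${pip_path}" in
        # pip is located under the environment prefix: print its path.
        "${CONDA_PREFIX}"/*) echo "${pip_path}" ;;
        *)
            echo "pip resolves to ${pip_path}, outside ${CONDA_PREFIX}" >&2
            return 1
            ;;
    esac
}
```

A guarded install would then look like: pip_is_from_active_env && pip install <package_name>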

Troubleshooting

Interactive vs. batch jobs

If your code works in an interactive job but fails in a Slurm batch job, check the following possible causes:

  1. You are submitting your jobs from within a mamba environment.
    Solution 1: Deactivate your environment with the command mamba deactivate, then submit the job, or
    Solution 2: Open another terminal and submit the job from outside the environment.

  2. Your ~/.bashrc or ~/.bash_profile file contains a conda initialize section or a source activate command. The conda initialize section is known to create issues on the FASRC clusters.
    Solution: Delete the section between the two conda initialize statements. If you have source activate in those files, delete it or comment it out.
    For more information on ~/.bashrc files, see https://docs.rc.fas.harvard.edu/kb/editing-your-bashrc/
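
To find the lines in question before editing, a search like the following can help. This is a minimal sketch: the function name find_conda_startup_lines is hypothetical, and it only matches the literal phrases "conda initialize" and "source activate", so it also reports lines that are already commented out.

```shell
# Hypothetical helper: print, with line numbers, the lines of a startup file
# that mention "conda initialize" or "source activate".
find_conda_startup_lines() {
    local file="$1"
    # A missing file simply yields no output.
    [ -f "${file}" ] || return 0
    # grep exits non-zero when nothing matches; treat that as success here.
    grep -n -E 'conda initialize|source activate' "${file}" || true
}
```

Run it as find_conda_startup_lines ~/.bashrc and again for ~/.bash_profile. No output means neither phrase appears in that file.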

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.