Python Package Installation

Description

Python packages on the cluster are primarily managed with Mamba.  Direct use of pip, outside of a virtual environment, is discouraged on the FASRC clusters.

Mamba is a package manager that is a drop-in replacement for Conda, and is generally faster and better at resolving dependencies:

  • Speed: Mamba is written in C++, which makes it faster than Conda. Mamba uses parallel processing and efficient code to install packages faster.
  • Compatibility: Mamba is fully compatible with Conda, so it can use the same commands, packages, and environment configurations.
  • Cross-platform support: Mamba works on Mac, Linux and Windows.
  • Dependency resolution: Mamba is better at resolving dependencies than Conda.
  • Environment creation: Mamba is faster at creating environments, especially large ones.
  • Package repository: The Miniforge/Mambaforge distributions of Mamba use the community-maintained conda-forge channel by default, which provides up-to-date packages.

Important:
Anaconda is currently reviewing its Terms of Service for Academia and Research and is expected to conclude the update by the end of 2024. There is a possibility that Conda may no longer be free for non-profit academic research use at institutions with more than 200 employees, and that downloading packages through Anaconda’s Main channel may incur costs. Hence, we recommend that users switch to the open-source conda-forge channel for package distribution when possible. Our python module is built with the Miniforge3 distribution, which has conda-forge set as its default channel.

Because Mamba uses the same commands and configuration options as Conda, you can swap almost all commands between conda and mamba. By default, the mamba provided by our module uses conda-forge, a free, community-maintained package channel. (In this doc, we will generally refer only to mamba.)

Usage

mamba is available on the FASRC cluster as a software module, either as Mambaforge/Miniforge or as python/3*, which is aliased to mamba. For example, to load the python module:

$ module load python

To see Python’s version:

$ python --version

Environments

You can create a virtual environment with mamba in the same way as with conda. However, it is important to start an interactive session prior to creating an environment and installing desired packages in the following manner:

$ salloc --partition test --nodes=1 --cpus-per-task=2 --mem=4GB --time=0-02:00:00
$ module load python

By default, mamba environments and the Python packages in them are installed in your home directory, in ~/.conda/envs. To place them elsewhere, such as a lab shared directory, set the following variables to absolute file paths before creating the environment:

$ export CONDA_PKGS_DIRS=/<FILEPATH>/conda/pkgs
$ export CONDA_ENVS_PATH=/<FILEPATH>/conda/envs
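
As a minimal sketch, the two exports above could be wrapped in a small helper that also creates the directories first. The function name use_conda_dirs is hypothetical and not part of the FASRC setup; the sketch assumes you have write access to the base directory you pass in.

```shell
# Hypothetical helper: point mamba/conda at a base directory of your choice.
# Usage: use_conda_dirs /absolute/path/to/base
use_conda_dirs() {
    base="$1"
    # Require an absolute path, as recommended in the text.
    case "$base" in
        /*) ;;
        *) echo "use_conda_dirs: need an absolute path" >&2; return 1 ;;
    esac
    # Create the package cache and environment directories if missing.
    mkdir -p "$base/conda/pkgs" "$base/conda/envs" || return 1
    export CONDA_PKGS_DIRS="$base/conda/pkgs"
    export CONDA_ENVS_PATH="$base/conda/envs"
}
```

Call the function in the same shell session in which you later run mamba create, because exported variables only affect the current session and the processes started from it.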

Create an environment using mamba:

$ mamba create -n <ENV_NAME>

Alternatively, you can create an environment and install packages at the same time. This ensures package dependencies are met and could also speed up your setup time significantly. The general syntax to create an environment and install packages is:

$ mamba create -n <ENV_NAME> <PACKAGES>

For example,

$ mamba create -n python_env1 python={PYTHON_VERS} pip wheel

You must activate an environment to use it or install packages within it. To activate and use an environment:

$ source activate python_env1

To deactivate an active environment:

$ source deactivate

To list packages inside the environment:

$ mamba list

To install new packages in the environment (the optional -y flag skips the confirmation prompt and proceeds with the installation):

$ mamba install -y <PACKAGE>

For example, to install numpy:

$ mamba install -y numpy

To install a package from a specific channel, add --channel (or -c) argument:

$ mamba install --channel <CHANNEL-NAME> -y <PACKAGE>

For example, to install the package boltons from the conda-forge channel:

$ mamba install --channel conda-forge boltons

To uninstall packages:

$ mamba uninstall <PACKAGE>

To delete an environment:

$ conda remove -n <ENV_NAME> --all -y

For additional features, please refer to the Mamba documentation.

Pip Installs

Avoid using pip outside of a mamba environment on any FASRC cluster. If you run pip install outside of a mamba environment, the installed packages will be placed in your $HOME/.local directory, which can lead to package conflicts and may cause some packages to fail to install or load correctly via mamba.

Instead, activate your environment first and then run pip inside it. For example, if your environment name is python_env1:

$ module load python
$ source activate python_env1
$ pip install <package_name>
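
One way to guard against an accidental pip install outside an environment is to check the CONDA_DEFAULT_ENV variable, which conda and mamba set when an environment is activated. The sketch below is illustrative only: the function name env_is_active is hypothetical, and treating base as "no environment" is an assumption about how you want to work.

```shell
# Hypothetical guard: succeed only if a non-base environment is active.
env_is_active() {
    # CONDA_DEFAULT_ENV is empty or unset when no environment is active.
    if [ -z "${CONDA_DEFAULT_ENV:-}" ] || [ "$CONDA_DEFAULT_ENV" = "base" ]; then
        echo "No mamba environment is active; not running pip." >&2
        return 1
    fi
    return 0
}
```

With this function defined, a command such as env_is_active && pip install <package_name> runs pip only when an environment is active.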

Best Practices

Use mamba environment in Jupyter Notebooks

If you would like to use a mamba environment as a kernel in a Jupyter Notebook on Open OnDemand (Cannon OOD or FASSE OOD), you have to install the packages ipykernel and nb_conda_kernels in that environment. These packages allow Jupyter to detect mamba environments that you created from the command line.

For example, if your environment name is python_env1:

$ module load python
$ source activate python_env1
$ mamba install ipykernel nb_conda_kernels

After these packages are installed, launch a new Jupyter Notebook job (existing Jupyter Notebook jobs will fail to “see” this environment). Then:
  1. Open a Jupyter Notebook (a .ipynb file)
  2. On the top menu, click Kernel -> Change kernel -> select the conda environment

Mamba environments in a desired location

With mamba, use the -p or --prefix option to write environment files to a location of your choice, such as a holylabs directory. Don’t use your home directory, as it has very low performance due to filesystem latency. With a lab share location, you can also share your conda environment with other people on the cluster. Keep in mind that you will need to make the destination directory and specify the Python version to use. For example:

$ mamba create --prefix /n/holylabs/{YOUR_LAB}/Lab/envs python={PYTHON_VERS}

$ source activate /n/holylabs/{YOUR_LAB}/Lab/envs
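
The two steps named above (make the destination directory, then create the environment with an explicit Python version) can be sketched as a small function. This is an illustration under assumptions: the function name create_env_at and the MAMBA_CMD override variable are hypothetical, and the sketch creates only the parent directory, leaving the environment directory itself for mamba to create.

```shell
# Hypothetical helper: create an environment at an explicit prefix.
# Usage: create_env_at /absolute/path/to/env <python-version>
# MAMBA_CMD is an illustrative override (e.g. MAMBA_CMD=echo for a dry run).
create_env_at() {
    prefix="$1"
    pyver="$2"
    if [ -z "$prefix" ] || [ -z "$pyver" ]; then
        echo "usage: create_env_at <prefix> <python-version>" >&2
        return 1
    fi
    # Make sure the parent directory exists before calling mamba.
    mkdir -p "$(dirname "$prefix")" || return 1
    "${MAMBA_CMD:-mamba}" create --prefix "$prefix" "python=$pyver"
}
```

For example, create_env_at /n/holylabs/{YOUR_LAB}/Lab/envs {PYTHON_VERS} runs the same mamba create command as shown above. Setting MAMBA_CMD=echo in front of the call prints the arguments that would be passed to mamba without creating the environment.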

To delete an environment at that desired location:

$ conda remove -p /n/holylabs/{YOUR_LAB}/Lab/envs --all -y

Troubleshooting

Interactive vs. batch jobs

If your code works in an interactive job but fails in a Slurm batch job, check the following possible causes:

  1. You are submitting your jobs from within a mamba environment.
    Solution 1: Deactivate your environment with the command mamba deactivate, then submit the job; or
    Solution 2: Open another terminal and submit the job from outside the environment.

  2. Your ~/.bashrc or ~/.bash_profile file contains a conda initialize section or a source activate command. The conda initialize section is known to create issues on the FASRC clusters.
    Solution: Delete the section between the two conda initialize statements. If you have source activate in those files, delete it or comment it out.
    For more information on ~/.bashrc files, see https://docs.rc.fas.harvard.edu/kb/editing-your-bashrc/
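
To check whether your startup files contain these lines, you can search for the marker comments that conda init writes (# >>> conda initialize >>> and # <<< conda initialize <<<) and for any source activate line. The following is a minimal sketch; the function name find_conda_startup_lines is hypothetical, and it only reports matching lines without editing your files.

```shell
# Hypothetical helper: report conda-related lines in shell startup files.
# Prints "file:line:text" for each match; returns 1 if nothing is found.
find_conda_startup_lines() {
    found=1
    for f in "$@"; do
        # Skip files that do not exist.
        [ -f "$f" ] || continue
        if grep -n -H -E 'conda initialize|source activate' "$f"; then
            found=0
        fi
    done
    return "$found"
}
```

A typical call would be find_conda_startup_lines ~/.bashrc ~/.bash_profile. If it prints nothing, neither file contains a conda initialize section or a source activate command.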

Jupyter Notebook or JupyterLab on Open OnDemand/VDI problems

See the Jupyter Notebook or JupyterLab on Open OnDemand/VDI troubleshooting section.

Unable to install packages

If you are unable to install packages, or package installation is taking a very long time, check your ~/.condarc file. As stated in the Conda docs, this is an optional runtime configuration file. It can be used to configure conda/mamba to search specific channels for package installation.

We recommend that users either not have this file or keep it empty. This allows you to install packages in your conda/mamba environments using the defaults provided by the open-source distribution, Miniforge, that we have made available via our newer Python modules.

If, for any reason, ~/.condarc exists in your cluster profile, check its contents. If the default channel is showing up as conda, edit it to conda-forge so that your ~/.condarc uses this open-source channel for package installation.
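
A quick way to run this check without opening an editor is sketched below. It is an illustration only: the function name condarc_ok is hypothetical, and the check is a plain text search for conda-forge, not a full parse of the YAML file, so a file that lists conda-forge alongside other channels also passes.

```shell
# Hypothetical helper: succeed if the condarc is absent, empty, or
# mentions conda-forge; fail otherwise so you know to inspect it.
condarc_ok() {
    rc="${1:-$HOME/.condarc}"
    # No file, or an empty file, means the distribution defaults apply.
    [ -s "$rc" ] || return 0
    if grep -q 'conda-forge' "$rc"; then
        return 0
    fi
    echo "$rc does not mention conda-forge; contents:" >&2
    cat "$rc" >&2
    return 1
}
```

Called without arguments, condarc_ok checks ~/.condarc; a non-zero exit status means the file exists, is not empty, and does not mention conda-forge.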

Similarly, if you created an environment a long time ago using the Anaconda distribution and it is no longer working, it is best to create a new environment using the open-source distribution as described above, while ensuring that ~/.condarc, if it exists, points to conda-forge as its default channel.

For example, if you created a conda environment using one of our older Python modules, say Anaconda2/2019.10-fasrc01, you can see that conda is configured to use repo.anaconda.com for package installation.

$ module load Anaconda2 
$ conda info 
... 
channel URLs : 
https://repo.anaconda.com/pkgs/main/linux-64 
https://repo.anaconda.com/pkgs/main/noarch 
https://repo.anaconda.com/pkgs/r/linux-64 
https://repo.anaconda.com/pkgs/r/noarch 
...

To change this configuration, you can run the conda config command so that conda points to conda-forge. This also creates a .condarc file in your $HOME if it does not already exist:

$ conda config --add default_channels https://conda.anaconda.org/conda-forge/ 

$ cat ~/.condarc 
default_channels:
  - https://conda.anaconda.org/conda-forge/

$ conda info 
... 
channel URLs : 
https://conda.anaconda.org/conda-forge/linux-64 
https://conda.anaconda.org/conda-forge/noarch
...
© The President and Fellows of Harvard College.
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.