Python Package Installation
Description
Python packages on the cluster are primarily managed with Mamba. Direct use of pip outside of a virtual environment is discouraged on the FASRC clusters.
Mamba is a package manager that is a drop-in replacement for Conda, and is generally faster and better at resolving dependencies:
- Speed: Mamba is written in C++, which makes it faster than Conda. Mamba uses parallel processing and efficient code to install packages faster.
- Compatibility: Mamba is fully compatible with Conda, so it can use the same commands, packages, and environment configurations.
- Cross-platform support: Mamba works on Mac, Linux and Windows.
- Dependency resolution: Mamba is better at resolving dependencies than Conda.
- Environment creation: Mamba is faster at creating environments, especially large ones.
- Package repository: Mamba, as shipped in Mambaforge and Miniforge, uses the conda-forge channel by default, which provides up-to-date packages.
Important:
Anaconda is currently reviewing its Terms of Service for Academia and Research and is expected to conclude the update by the end of 2024. There is a possibility that Conda may no longer be free for non-profit academic research use at institutions with more than 200 employees, and downloading packages through Anaconda’s Main channel may incur costs. Hence, we recommend that users switch to the open-source conda-forge channel for package distribution when possible. Our python module is built with the Miniforge3 distribution, which has conda-forge set as its default channel.
Mamba is a drop-in replacement for Conda and uses the same commands and configuration options as conda. You can swap almost all commands between conda and mamba. By default, mamba uses conda-forge, a free package repository. (In this doc, we will generally refer only to mamba.)
Usage
mamba is available on the FASRC cluster as a software module, either as Mambaforge or as python/3*, which is aliased to mamba. You can access it by loading the python module:
$ module load python/{PYTHON_VERS}-fasrc01
$ python -V
Python {PYTHON_VERS}
Environments
You can create a virtual environment with mamba in the same way as with conda. However, it is important to start an interactive session before creating an environment and installing the desired packages, as follows:
$ salloc --partition test --nodes=1 --cpus-per-task=2 --mem=4GB --time=0-02:00:00
$ module load python/{PYTHON_VERS}-fasrc01
Create an environment using mamba:
$ mamba create -n <ENV_NAME>
You can also install packages as part of the create command, which can speed up your setup significantly. For example:
$ mamba create -n <ENV_NAME> <PACKAGES>
$ mamba create -n python_env1 python={PYTHON_VERS} pip wheel
You must activate an environment in order to use it or install any packages in it. To activate and use an environment:
$ mamba activate python_env1
To deactivate an active environment:
$ mamba deactivate
You can list the packages currently installed in the mamba or conda environment with:
$ mamba list
To install new packages in the environment with mamba using the default channel:
$ mamba install -y <PACKAGES>
For example:
$ mamba install -y numpy
To install a package from a specific channel instead:
$ mamba install --channel <CHANNEL-NAME> -y <PACKAGE>
For example:
$ mamba install --channel conda-forge boltons
To uninstall packages:
$ mamba uninstall <PACKAGE>
For additional features, please refer to the Mamba documentation.
Pip Installs
Avoid using pip outside of a mamba environment on any FASRC cluster. If you run pip install outside of a mamba environment, the installed packages will be placed in your $HOME/.local directory, which can lead to package conflicts and may cause some packages to fail to install or load correctly via mamba.
Instead, activate your environment first and then run pip. For example, if your environment name is python_env1:
$ module load python
$ mamba activate python_env1
$ pip install <package_name>
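One way to follow the pip guidance above is to confirm that an environment is active before installing anything. The sketch below assumes that activating an environment exports the CONDA_PREFIX variable, which conda and mamba normally do. The function name env_is_active is hypothetical and not part of mamba or the FASRC modules.

```shell
# Hypothetical helper (an illustration, not a FASRC-provided command).
# Succeeds only when an environment appears to be active, judged by
# whether CONDA_PREFIX is set to a non-empty value.
env_is_active() {
    [ -n "${CONDA_PREFIX:-}" ]
}

# Example use (left as a comment so that sourcing this file installs nothing):
#   env_is_active && python -m pip install <package_name>
```

Calling pip as python -m pip ties the install to the interpreter of the active environment. Note that an active base environment also sets CONDA_PREFIX, so this check only guards against running pip with no environment at all.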
Best Practices
Use mamba environment in Jupyter Notebooks
If you would like to use a mamba environment as a kernel in a Jupyter Notebook on Open OnDemand (Cannon OOD or FASSE OOD), you have to install the packages ipykernel and nb_conda_kernels. These packages allow Jupyter to detect mamba environments that you created from the command line.
For example, if your environment name is python_env1:
$ module load python
$ mamba activate python_env1
$ mamba install ipykernel nb_conda_kernels
- Open a Jupyter Notebook (a .ipynb file)
- On the top menu, click Kernel -> Change kernel -> select the conda environment
Mamba environments in holylabs space
With mamba, use the -p or --prefix option to write environment files to a holylabs share location. Don’t use your home directory, as it has very low performance due to filesystem latency. With a lab share location, you can also share your environment with other people on the cluster. Keep in mind that you will need to create the destination directory and specify the Python version to use. For example:
$ mamba create --prefix /n/holylabs/LABS/{YOUR_LAB}/Lab/envs python={PYTHON_VERS}
$ mamba activate /n/holylabs/LABS/{YOUR_LAB}/Lab/envs
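As a rough sketch of the "create the destination directory" step, a small shell helper could create the parent of the prefix and confirm it is writable before mamba create is called. The function name prepare_env_parent is hypothetical. Creating only the parent directory, rather than the prefix itself, is an assumption made here so that mamba create starts from a path that does not yet exist.

```shell
# Hypothetical helper (not part of mamba or the FASRC modules).
# Creates the parent directory of an environment prefix if needed and
# succeeds only if that parent is writable by the current user.
prepare_env_parent() {
    parent=$(dirname "$1")
    mkdir -p "$parent" && [ -w "$parent" ]
}

# Example use (left as a comment; substitute your own lab path):
#   prepare_env_parent /n/holylabs/LABS/{YOUR_LAB}/Lab/envs
#   mamba create --prefix /n/holylabs/LABS/{YOUR_LAB}/Lab/envs python={PYTHON_VERS}
```

If the helper fails, the lab share is probably not writable from your account, and the mamba create command would be expected to fail as well.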
Troubleshooting
Interactive vs. batch jobs
If your code works in an interactive job but fails in a slurm batch job, check the following possible causes:
- You are submitting your jobs from within a mamba environment.
Solution 1: Deactivate your environment with the command mamba deactivate and submit the job, or
Solution 2: Open another terminal and submit the job from outside the environment.
- Check if your ~/.bashrc or ~/.bash_profile files have a conda initialize section or a source activate command. The conda initialize section is known to create issues on the FASRC clusters.
Solution: Delete the section between the two conda initialize statements. If you have source activate in those files, delete it or comment it out.
For more information on ~/.bashrc files, see https://docs.rc.fas.harvard.edu/kb/editing-your-bashrc/
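The startup-file check described above can be sketched with grep. This is only an illustration: the function name find_conda_init is hypothetical, and the search patterns assume the section is delimited by comment lines containing the text "conda initialize", as conda init normally writes them.

```shell
# Hypothetical helper (an illustration, not a FASRC-provided command).
# Prints, with line numbers, every line in the given file that mentions
# "conda initialize" or "source activate". Returns 0 if at least one
# line matches, and non-zero if nothing matches or the file is missing.
find_conda_init() {
    [ -f "$1" ] || return 1
    grep -n -E 'conda initialize|source activate' "$1"
}

# Example use (left as comments):
#   find_conda_init ~/.bashrc
#   find_conda_init ~/.bash_profile
```

The helper only reports matching lines and does not edit anything, so you can review each hit before deleting or commenting it out.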
Jupyter Notebook or JupyterLab on Open OnDemand/VDI problems
See the Jupyter Notebook or JupyterLab on Open OnDemand/VDI troubleshooting section.