Python (Anaconda)

Python on the FASRC cluster

We like the Anaconda Python distribution from Continuum Analytics. It includes hundreds of the most popular packages for large-scale data processing, predictive analytics, and scientific computing (numpy, scipy, ipython, pandas, scikit-learn, mpi4py, etc.). It generally does a good job of shipping recent versions of its packages while keeping them compatible with one another.

You can use Python and Anaconda on the cluster by loading one of the following modules:

For Python 2.x, load:
module load Anaconda/5.0.1-fasrc02

For Python 3.x, load:
module load Anaconda3/2020.11
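If you switch between the two Python versions in scripts, a small helper can keep the module names in one place. This is a minimal sketch: the function name anaconda_module is hypothetical, and it only knows the two module versions listed above, so check the Portal Module Lookup for current names.

```shell
# Hypothetical helper: print the Anaconda module name for a Python major
# version. Only the two modules listed in this guide are known; names may change.
anaconda_module() {
    case "$1" in
        2) echo "Anaconda/5.0.1-fasrc02" ;;
        3) echo "Anaconda3/2020.11" ;;
        *) echo "unsupported Python major version: $1" >&2; return 1 ;;
    esac
}

# On the cluster:
#   module load "$(anaconda_module 3)"
#   python --version    # confirm which interpreter is now on your PATH
```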

Customizing the environment

Anaconda has a concept of environments, which are used to manage alternative package sets and versions. For example, if you want newer versions of some packages than the default environment provides, you can create a new environment with your customizations.

First, load the Anaconda module for the version of Python you need; this gives you the base environment. To see the available versions, visit our Portal Module Lookup.

Create a new environment:
conda create -n <name> <packages>

After loading the Anaconda module, you can use the following command to list the existing environments.

conda env list
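conda env list prints one environment per line with its name in the first column, after a few comment lines starting with #. Assuming that layout, the sketch below (the helper env_in_list and the environment name myenv are hypothetical) only runs conda create when the environment does not exist yet, which makes a setup script safe to rerun.

```shell
# Hypothetical helper: succeed if NAME is in the first column of
# `conda env list` output read on stdin. Comment lines (starting with #)
# are skipped. Assumes the default plain-text output layout.
env_in_list() {
    awk -v name="$1" '$1 !~ /^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Usage after loading the Anaconda module (myenv and numpy are placeholders):
#   if ! conda env list | env_in_list myenv; then
#       conda create -y -n myenv numpy
#   fi
```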

See the conda documentation for more details on the available options. Use the following command to activate the environment:
$ source activate <name>

If you want to use this environment all the time, add the above line to your ~/.bashrc (or other appropriate shell config file) after the line that loads the module.
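The ~/.bashrc change described above can be scripted. In this sketch, add_env_to_rc is a hypothetical helper that appends the module load line and then the source activate line, in that order, skipping any line the file already contains so that running it twice does not duplicate them. The module and environment names are placeholders.

```shell
# Hypothetical helper: append "module load MOD" and then
# "source activate ENV_NAME" to an rc file, skipping lines already present.
add_env_to_rc() {
    rc="$1" mod="$2" env_name="$3"
    touch "$rc"
    for line in "module load $mod" "source activate $env_name"; do
        # -x: whole-line match, -F: fixed string (no regex)
        grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
    done
}

# Example: add_env_to_rc ~/.bashrc Anaconda3/2020.11 myenv
```

If your .bashrc already loads the module further down, check the order by hand, since this sketch only appends.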

WARNING: We advise against using conda activate; use source activate instead. conda activate will ask you to run conda init, which modifies your .bashrc. Doing so permanently activates the conda environment, which will create problems later on. If you have run conda init and want to undo it, see this doc for how. If you want to keep conda init, see this example.
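If you are not sure whether conda init has already modified your .bashrc, you can look for the block it adds. This sketch assumes the block is still wrapped in conda init's standard marker lines, # >>> conda initialize >>> and # <<< conda initialize <<<, and that you have not edited them. The helper names are hypothetical, and strip_conda_init only prints a cleaned copy; the doc linked in the warning describes the supported way to undo conda init.

```shell
# Marker lines that conda init places around the block it adds (assumed unedited).
CONDA_INIT_START='# >>> conda initialize >>>'
CONDA_INIT_END='# <<< conda initialize <<<'

# Hypothetical helper: succeed if the rc file contains a conda init block.
has_conda_init() {
    grep -qxF "$CONDA_INIT_START" "$1"
}

# Hypothetical helper: print the rc file with the conda init block removed.
# Output goes to stdout so you can review it before replacing anything.
strip_conda_init() {
    awk -v start="$CONDA_INIT_START" -v end="$CONDA_INIT_END" '
        $0 == start { skip = 1; next }
        $0 == end   { skip = 0; next }
        !skip
    ' "$1"
}
```

For example, has_conda_init ~/.bashrc && strip_conda_init ~/.bashrc > ~/.bashrc.clean lets you inspect the result before touching your real .bashrc (keep a backup).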

At this point you can upgrade or install a package named PACKAGE with the following command (it’s the same whether installing or upgrading):
$ conda install PACKAGE

The commands conda list and conda search list installed and available packages, respectively. See the conda documentation for all the details. If the package is not available from conda, you can install it into your environment using pip:
$ pip install PACKAGE
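The choice between conda and pip can be scripted by asking conda first. This is a sketch under two assumptions: that conda search exits with a non-zero status when no package matches the name, and that the environment you want is already active. install_pkg is a hypothetical name.

```shell
# Hypothetical helper: install PACKAGE with conda if conda can find it,
# otherwise fall back to pip. Assumes the target environment is active.
install_pkg() {
    if conda search "$1" >/dev/null 2>&1; then
        conda install -y "$1"
    else
        pip install "$1"
    fi
}

# Example: install_pkg PACKAGE
```

pip typically installs into the active environment only if that environment has its own pip, so including pip in the package list when you create the environment is a reasonable precaution.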

Note: Anaconda generally has the latest versions of all packages that are compatible with each other. When you install an extra package, it will often update core packages like numpy and scipy; other packages may then downgrade them again. This is why we recommend sticking to the default environment if possible.

If you have problems updating a package that is already installed in the Anaconda environment, you might need to remove the package first:
$ conda remove PACKAGE
$ conda install PACKAGE

This will often bypass update errors, especially with certain versions of matplotlib.
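The remove-and-reinstall workaround can be wrapped so it only runs when a normal install fails. In this sketch, upgrade_pkg is a hypothetical helper; -y (skip the confirmation prompt) is a standard conda flag. Keep in mind that conda remove also removes packages that depend on the one you name, so consider running the steps by hand without -y the first time to review conda's plan.

```shell
# Hypothetical helper: try a normal install/upgrade; if conda reports an
# error, remove the package and install it again.
upgrade_pkg() {
    conda install -y "$1" && return 0
    echo "install of $1 failed; removing and reinstalling" >&2
    conda remove -y "$1" && conda install -y "$1"
}

# Example: upgrade_pkg matplotlib
```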

To stop using the custom environment, run:
$ source deactivate

To remove an old environment, run:

conda env remove -n <name>
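conda will not remove the environment you are currently in, so deactivate it first. This sketch checks the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment, before calling conda env remove; remove_env is a hypothetical helper name.

```shell
# Hypothetical helper: refuse to remove the currently active environment,
# otherwise remove it with conda env remove.
remove_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
        echo "Run 'source deactivate' before removing $1" >&2
        return 1
    fi
    conda env remove -n "$1"
}

# Example: remove_env myenv
```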

Choosing the Right Cluster Node to Install From

If you are installing packages in your home directory, it would be best to install from a Boston login node, not a Holyoke login node, because the cluster home directories are located in the Boston datacenter.  To ensure you are routed to a Boston login node when you ssh to the cluster, do the following:

ssh <your_username>@boslogin.rc.fas.harvard.edu
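If you connect often, a small shell function on your own machine saves typing the Boston hostname. This is a sketch: boslogin and the FASRC_USER variable are hypothetical names you would define yourself, and any extra arguments are passed through to ssh.

```shell
# Hypothetical wrapper for your local machine; set FASRC_USER to your
# cluster username (for example in your local ~/.bashrc).
boslogin() {
    ssh "${FASRC_USER:?set FASRC_USER to your cluster username}@boslogin.rc.fas.harvard.edu" "$@"
}

# Example:
#   FASRC_USER=your_username
#   boslogin
```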

Useful Commands

The Conda Cheatsheet has other helpful commands.