2023 Rocky 8 Transition Information
See Python information and examples at: https://github.com/fasrc/User_Codes/tree/master/Languages/Python
Pre-Rocky CentOS 7 Information
Python on the FASRC cluster
We recommend the Anaconda Python distribution from Anaconda, Inc. (formerly Continuum Analytics). It includes hundreds of the most popular packages for large-scale data processing, predictive analytics, and scientific computing (numpy, scipy, ipython, pandas, scikit-learn, mpi4py, etc.). It generally includes recent versions of all of these packages while keeping them mutually compatible.
You can use Python and Anaconda on the cluster by running:
For Python 3.x, load:
module load Anaconda3/2020.11
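After loading the module, you can confirm which Python you are using (the exact output depends on the module version):
$ which python
$ python --version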
Customizing the environment
Anaconda has a concept of environments; these are used to manage alternative package sets and versions. For example, if you want newer versions of some packages in the default environment, you can make a new environment with your customizations.
Load the Anaconda module for the Python version you want to base your environment on. To see the available versions, visit our Portal Module Lookup.
Create a new environment:
conda create -n <name> <packages>
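For example, to create an environment with a specific Python version and a few packages (the environment name myenv and the package list here are placeholders; substitute your own):
conda create -n myenv python=3.9 numpy scipy pandas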
After loading the Anaconda module, you can use the following command to list the existing environments.
conda env list
The Conda documentation has more details on the different options. Use the following command to activate the environment:
$ source activate <name>
If you want to use this environment all the time, add the above line to your ~/.bashrc (or other appropriate shell config file) after the line that loads the module.
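For example, the relevant lines in your ~/.bashrc might look like this (myenv is a placeholder for your environment's name):
module load Anaconda3/2020.11
source activate myenv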
WARNING: It is advised not to use conda activate, but instead to use source activate. See the section Source Activate instead of Conda Activate below for details.
At this point you can upgrade or install a package named PACKAGE with the following command (it's the same whether installing or upgrading):
$ conda install PACKAGE
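For example, to install (or upgrade to) a specific version of a package (numpy and the version number here are only illustrative):
$ conda install numpy=1.21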
The commands conda list and conda search list installed and available packages, respectively. See the conda documentation for all the details. If the package is not available from conda, you can install it into your environment using pip:
$ pip install PACKAGE
Note: Anaconda generally ships the latest versions of all packages that are mutually compatible. When you install an extra package, it will often update core packages like numpy and scipy; other packages may then downgrade them again. This is why we recommend sticking to the default environment if possible.
If you have problems updating a package that is already installed in the Anaconda environment, you might need to remove the package first:
$ conda remove PACKAGE
$ conda install PACKAGE
This will often bypass update errors, especially with certain versions of matplotlib.
To stop using the custom environment, run:
$ source deactivate
To remove old environments, you can run:
conda env remove -n <name>
Source Activate instead of Conda Activate
It is advised not to use conda activate, but instead to use source activate. conda activate will ask you to run conda init to modify your .bashrc. However, doing this will permanently activate the conda environment and will create problems later on. Instead, use source activate. If you have run conda init and you want to undo it, see this doc for how. If you want to maintain conda init, see this example.
Submitting Jobs from within Conda Environments
Since Slurm clones the current environment into a submitted job, jobs submitted from within an active conda environment inherit an odd environment, which can lead to problems. It is recommended that users add the --export=NONE option to their submissions; this prevents the cloning, and the submitted job will then run in a pristine environment.
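As a minimal sketch, a submission script using this option might look like the following (the partition, time limit, script name, and environment name are placeholders; adjust them for your job):
#!/bin/bash
#SBATCH --partition=test        # placeholder partition name
#SBATCH --time=00:10:00
#SBATCH --export=NONE           # do not clone the submitting shell's environment

# Because the environment is not cloned, load the module and activate
# the conda environment inside the job itself.
module load Anaconda3/2020.11
source activate myenv

python my_script.py
You would then submit the job as usual, e.g. sbatch myjob.sbatch, or pass --export=NONE on the sbatch command line instead of in the script header.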
Choosing the Right Cluster Node to Install From
If you are working on the Cannon cluster
If you are installing packages in your home directory, it would be best to install from a Boston login node, not a Holyoke login node, because the cluster home directories are located in the Boston datacenter. To ensure you are routed to a Boston login node when you ssh to the cluster, do the following:
ssh <your_username>@boslogin.rc.fas.harvard.edu
If you are working on FASSE
All FASSE login nodes are in Boston, so simply login to FASSE as normal:
ssh <your_username>@fasselogin.rc.fas.harvard.edu
Useful Commands
The Conda Cheatsheet has other helpful commands.
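A few other commands that come up frequently (all standard conda commands; myenv is a placeholder environment name):
conda env list                 # list your environments
conda env export -n myenv      # print an environment's packages, e.g. to recreate it elsewhere
conda clean --all              # remove cached package files to free up disk space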