Rocky 8 Transition Guide

Overview

As part of our June 5-8, 2023 MGHPCC Downtime, FASRC will be upgrading the cluster operating system from CentOS 7 to Rocky 8. Details as to why the transition is taking place are provided on the downtime page. This page describes the test environment available to users before the downtime and serves as a guide to the different aspects of the upgrade.

  • Town Hall presentation slides
  • Town Hall presentation video

 

OOD/VDI (Open OnDemand) changes: The Open OnDemand instances for Cannon (and later for FASSE) will involve a change to each user’s settings and the location of the folder containing those settings and other OOD-specific files. The new location in your home directory will be ~/.fasrcood. This also means that values you had filled out in the OOD start form will not be retained; form values will revert to the defaults.

Any “development mode” (AKA “sandbox”) apps that you have installed to ~/fasrc/dev (approximately 78 users have done so) will no longer be visible through your OOD Dashboard and will need to be moved to ~/.fasrcood/dev before they can be used on the new OOD. The old folder at ~/fasrc will no longer be used after June 8th and, assuming you have nothing in ~/fasrc/dev to move, can be removed from your home directory. This directory may have grown large, so keeping it around unnecessarily is not advised.
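As a minimal sketch, the move of sandbox apps described above could be scripted as follows. The two paths are the ones named in the text; the function name move_ood_dev_apps is hypothetical, and the choice to skip apps that already exist in the new location is an assumption made to avoid overwriting anything.

```shell
# Sketch: relocate OOD sandbox apps from the old folder to the new one.
# Paths come from the text above; the function name is hypothetical.
move_ood_dev_apps() {
    local old_dir="$HOME/fasrc/dev"
    local new_dir="$HOME/.fasrcood/dev"

    # Nothing to do if the old sandbox folder is missing or empty.
    if [ ! -d "$old_dir" ] || [ -z "$(ls -A "$old_dir")" ]; then
        echo "No sandbox apps found in $old_dir"
        return 0
    fi

    mkdir -p "$new_dir"
    local app name
    for app in "$old_dir"/*; do
        [ -e "$app" ] || [ -L "$app" ] || continue
        name=$(basename "$app")
        # Assumption: never overwrite an app already present in the new folder.
        if [ -e "$new_dir/$name" ]; then
            echo "Skipping $name: already present in $new_dir"
        else
            mv "$app" "$new_dir/" && echo "Moved $name"
        fi
    done
}
# Usage: move_ood_dev_apps
```

Once you have confirmed that the apps appear in the new OOD Dashboard, the old ~/fasrc folder can be removed as described above.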

 

Software

Warning

The change in operating system means that most user software built on CentOS 7 will not work and will need to be rebuilt. Even if the code does work without being rebuilt, the change in underlying libraries may impact code execution and results.

Users should test and verify that their codes are producing the expected results on the new operating system. This is a significant change with a host of bug fixes, performance upgrades, and numerical methods changes. These can change results, so users need to be sure to test and validate their codes.

Overview

Below you will find discussion and links to various software documentation. In general, if you are using a package manager you should work within that manager. If you use Spack, stick to installing packages via Spack, even if you are using Python, Julia, or R. If you are using R, then stick to R and do not use Spack. If you are using conda or pip, do not use Spack. If you are using Julia, do not use Spack. Mixing package managers can cause problems and unexpected dependency conflicts and should be avoided unless absolutely necessary. Modules can be combined with any of these, so there should be no concern with using them.

Software Overview

Building

While software can be built on the login nodes, we recommend users start an interactive session via salloc. This is especially true if you want to build code optimized for specific chipsets. We run a variety of hardware and the login nodes are of a different architecture than the compute nodes.
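As a minimal sketch, an interactive session for building could be requested as shown below. The flags are standard Slurm salloc options, but the partition name, core count, memory, and time limit are placeholder assumptions, not recommended values; substitute ones that apply to your account.

```shell
# Placeholder values (assumptions): adjust to a partition and limits you can use.
partition="test"
cores=4
mem="8G"
walltime="02:00:00"

# Assemble the request and print it so it can be reviewed before running.
build_session_cmd="salloc --partition=$partition --cpus-per-task=$cores --mem=$mem --time=$walltime"
echo "$build_session_cmd"
# To start the session, run the printed command on a login node.
```

Building inside the session means the compiler detects the architecture of the compute node you were given, which matters if you use architecture-specific optimization flags.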

Modules

FASRC has traditionally provided modules of different software packages for users. For Rocky 8 we are significantly reducing the number of modules we will support. Only modules considered core operational codes (like compilers, MPI, or software environments) or licensed software (e.g. MATLAB, Mathematica) will be built. You can find the list of modules we provide by running: module avail. Note that module avail only shows the modules that can currently be loaded; it does not show modules built against specific compilers and versions of MPI. To see those modules you must first load the compiler and MPI versions you want to use. To search the modules, run: module spider <name>. CentOS 7 modules will not be available in Rocky 8. For modules formerly provided in CentOS 7, we recommend that users use Spack instead.
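The following sketch shows one way to list the modules that only become visible after a compiler and MPI stack are loaded. The function name show_dependent_modules is hypothetical, and the module names in the usage line are assumed examples; check module avail for the names actually installed.

```shell
# Sketch: load a compiler and an MPI module, then list what is now available.
# The function name is hypothetical; module names are passed in by the caller.
show_dependent_modules() {
    local compiler="$1" mpi="$2"
    if ! type module >/dev/null 2>&1; then
        echo "The module command is not available in this shell" >&2
        return 1
    fi
    module load "$compiler" "$mpi" && module avail
}
# Usage (assumed module names): show_dependent_modules gcc openmpi
# To search all modules regardless of what is loaded: module spider <name>
```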

Available modules list

Spack

FASRC Spack Guide

Singularity

FASRC Singularity Guide

CentOS 7 Singularity Image

For those who cannot upgrade to Rocky 8, we are providing a CentOS 7 Singularity image that contains our full CentOS 7 environment as well as access to our old CentOS 7 modules. A guide for using that environment can be found here: Running Singularity image with CentOS 7 on Rocky 8

Julia

FASRC Julia Guide

Python

FASRC Python Guide

Python 2

Python 2 reached end of life in 2020. Users are encouraged to migrate to Python 3. For those who need Python 2 for legacy codes, we recommend using either a Singularity container or the python/2.7.16-fasrc01 module (which uses Anaconda2).

PyTorch

Note that the rocky_gpu partition on the Rocky 8 test cluster is set up with the Multi-Instance GPU (MIG) feature of NVIDIA A100s. Due to MIG, PyTorch may not work. If you would like to test PyTorch on rocky_gpu, please send us a ticket.

R

FASRC R Guide

Other

FASRC User Codes

FAQ

Partitions

Based on a thorough analysis of usage patterns, many partitions’ time limits have changed from 7 days to 3 days. Further explanation can be found here:

Legacy CentOS 7 Support

FASRC provides a container of our full CentOS 7 environment for those who require it for their code. Beyond that, support for CentOS 7 will not be maintained for the compute environment. Slurm support for CentOS 7 will be dropped with the next major Slurm upgrade. If you have a CentOS 7 host that connects to Slurm, contact FASRC to discuss migration. Virtual machines and other systems running CentOS 7 and older will need to migrate to other hosting options or coordinate upgrades with FASRC.

I don’t see the module/software I use in modules, Spack, Singularity, or User Codes

Users are always welcome to build their own codes on the cluster or download compatible binaries. We provide guides for how to accomplish this in our documentation. If you have trouble doing this or you think a module is missing, contact rchelp@rc.fas.harvard.edu.

SSH key or ‘DNS spoofing’ errors when connecting to login or other nodes

You may see WARNING: POSSIBLE DNS SPOOFING DETECTED! and/or The RSA host key for login.rc.fas.harvard.edu has changed error messages. After an update of nodes, the SSH key fingerprint of a node may change. This will, in turn, cause an error when you next try to log into that node, as your locally stored key will no longer match. SSH uses this check to spot man-in-the-middle attacks or DNS spoofing. However, when a key changes for a valid reason, you need to clear out the old key on your computer in order to be able to reconnect.

See this FAQ page for further instructions: SSH key error, DNS spoofing message
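As a sketch of what clearing a stale key can look like with standard OpenSSH tooling, the function below wraps ssh-keygen -R, which removes all keys for a host from a known_hosts file. The function name forget_host_key is hypothetical, and the linked FAQ page remains the authoritative set of instructions. Run it on your own computer, not on the cluster.

```shell
# Sketch: remove the stored key(s) for a host from a known_hosts file.
# ssh-keygen -R is standard OpenSSH; the function name is hypothetical.
forget_host_key() {
    local host="$1"
    local known_hosts="${2:-$HOME/.ssh/known_hosts}"
    ssh-keygen -R "$host" -f "$known_hosts"
}
# Usage: forget_host_key login.rc.fas.harvard.edu
```

On the next connection SSH will prompt you to accept the new host key.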

Modules in your .bashrc no longer work or give errors on login

If you have edited your .bashrc file to include module loads at login, you may find that some CentOS 7 modules will not be found or may not work on Rocky 8. You will need to edit your .bashrc and comment out or remove any such lines going forward. If you can no longer log in because of something in your .bashrc, contact us and we can rename your .bashrc and copy in a default version for you.
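If you can still log in, a sketch like the following could be used to disable module load lines in bulk rather than editing each one by hand. The function name and the .pre-rocky8 backup suffix are hypothetical, and sed -i is used in its GNU form, as found on Linux.

```shell
# Sketch: back up a bashrc file, then comment out its "module load" lines.
# The function name and the backup suffix are hypothetical.
comment_out_module_loads() {
    local rc="${1:-$HOME/.bashrc}"
    cp -p "$rc" "$rc.pre-rocky8" || return 1
    # Prefix uncommented "module load ..." lines with "# ", keeping indentation.
    sed -i -E 's/^([[:space:]]*)(module[[:space:]]+load[[:space:]])/\1# \2/' "$rc"
    echo "Backup saved to $rc.pre-rocky8"
}
# Usage: comment_out_module_loads
```

Lines that are already commented out are left unchanged, and you can re-enable individual modules later once you have confirmed they exist on Rocky 8.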

If you’d like to start from scratch, a default .bashrc contains the following:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions below

My alternate shell (csh, tcsh, etc.) doesn’t work right

Having a non-standard default shell will cause problems and does not allow us to set global environment defaults for everyone. We will no longer change the default shell on any account or support the use of alternate shells as the default login shell. Users who do not have bash as their default login shell will need to change back to bash. Users can, of course, still launch an alternate shell once logged in.

My module won’t load in a batch job

If you are getting an error similar to this when loading a module in a batch job:

"/usr/bin/lua: /n/helmod/apps/lmod/7.7.32/libexec/lmod:61: module 'posix' not found:
no field package.preload['posix']
no file '/usr/share/lua/5.1/posix.lua'
no file '/usr/share/lua/5.1/posix/init.lua'
no file '/usr/lib64/lua/5.1/posix.lua'
no file '/usr/lib64/lua/5.1/posix/init.lua'
no file '/usr/lib64/lua/5.1/posix.so'
no file '/usr/lib64/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'require'
/n/helmod/apps/lmod/7.7.32/libexec/lmod:61: in main chunk
[C]: in ?
/var/slurmd/spool/slurmd/job53240333/slurm_script: line 27: python: command not found
"

Then you are submitting your job from a CentOS 7 host such as boslogin, holylogin, or a CentOS 7 compute node. You must submit the job from a Rocky 8 host (e.g. rockylogin).
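If you are unsure which operating system the current host runs, a check along these lines can be run before submitting. It reads the standard /etc/os-release file; the function name is_rocky8 is hypothetical.

```shell
# Sketch: succeed only if an os-release file describes Rocky Linux 8.
# /etc/os-release is the standard location; the function name is hypothetical.
is_rocky8() {
    local os_release="${1:-/etc/os-release}"
    (
        # Source in a subshell so ID and VERSION_ID do not leak into the caller.
        . "$os_release" 2>/dev/null || exit 1
        [ "$ID" = "rocky" ] && [ "${VERSION_ID%%.*}" = "8" ]
    )
}
# Usage:
#   if is_rocky8; then echo "Rocky 8 host"; else echo "Not Rocky 8"; fi
```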

Processes run on a login node are restricted

We have moved away from the old pkill process on login nodes, which killed processes using too much CPU or memory (RAM) in order to maintain a fair balance for all (example: running Mathematica on a login node instead of a compute node). The Rocky 8 login nodes use a system-level mechanism called cgroups, which limits each logged-in account to 1 core and 4GB of memory. This should eliminate memory or CPU hogging on login nodes. Please use an interactive session if you need to run more involved processes that require more memory or CPU. Login nodes are meant as gateways for launching and monitoring jobs and for running light processes to prepare your jobs or environment.

Should you run a process on a login node that runs afoul of cgroup policies, the process will be killed without warning, so please err on the side of caution when choosing whether to run something on a login node or in an interactive session.
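Because there is no warning before a process is killed, it can help to check roughly how much memory your processes are already using. The sketch below sums resident memory reported by ps; the function name my_rss_mb is hypothetical, and the result is only an estimate, since cgroup accounting may count memory differently (for example, shared pages are counted once per process here).

```shell
# Sketch: rough total resident memory (MB) of all processes you own on this node.
# The function name is hypothetical; the number is an estimate, not the cgroup's view.
my_rss_mb() {
    ps -u "$(id -u)" -o rss= | awk '{ sum += $1 } END { printf "%d\n", sum / 1024 }'
}
# Usage: echo "Using about $(my_rss_mb) MB of the 4GB login-node limit"
```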

Passwordless ssh not working

The permissions on your ~/.ssh directory may be incorrect. Your ~/.ssh directory must now be accessible only by you. You can make sure of this by running chmod -R g-rwx,o-rwx ~/.ssh in a terminal. It is also worth double-checking the permissions on your home directory. In general it should be accessible only to you, or, if it is accessible to others, they should not have write access.
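The sketch below applies the chmod command from the text and then lists anything that is still accessible to group or others, so an empty result means the directory is locked down. The function name fix_ssh_perms is hypothetical; find -perm /077 is GNU find syntax.

```shell
# Sketch: tighten permissions on an SSH directory, then report leftovers.
# The chmod command is from the text above; the function name is hypothetical.
fix_ssh_perms() {
    local dir="${1:-$HOME/.ssh}"
    chmod -R g-rwx,o-rwx "$dir" || return 1
    # List entries (other than symlinks) that still have any group/other bit set.
    find "$dir" ! -type l -perm /077
}
# Usage: fix_ssh_perms
```

To inspect your home directory as well, ls -ld ~ shows its current permissions.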

 

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.