The Harvard University Center for the Environment (HUCE) has its own purchased partitions on the FASRC cluster. These partitions are open to select groups in HUCE, and their allocation is governed by the relative fairshare of the groups. A dashboard showing the relative allocations can be found here (RC VPN or Harvard network required). The partitions are broken down as follows:
- huce_amd_bigmem: 1024 cores of AMD Abu Dhabi, each node of which has 64 cores and 499 GB of RAM.
- huce_bigmem: 128 cores of Intel Skylake; each node has 64 cores and 1.47 TB of RAM.
- huce_intel: 9488 cores of Intel Broadwell; each node has around 121 GB of RAM (3.8 GB per core). Subject to requeue by huce_intel_priority.
- huce_intel_priority: Overlaps huce_intel and is accessible only to groups with a fairshare score higher than 0.75. The fairshare threshold may change in the future based on decisions by the HUCE Cluster Allocation Committee. This partition will requeue jobs in huce_intel to make room for jobs submitted to huce_intel_priority. Jobs on huce_intel are requeued only if needed, starting with the jobs that have the lowest fairshare and have run for the least amount of time.
The above nodes are connected by FDR InfiniBand and have no time limit.
The following nodes are on the HDR InfiniBand fabric and also have no time limit.
- huce_cascade: 2880 cores of water-cooled Intel Cascade Lake; each node has 48 cores and 184 GB of RAM. Subject to requeue by huce_cascade_priority.
- huce_cascade_priority: Overlaps huce_cascade and is accessible only to groups with a fairshare score higher than 0.75. The fairshare threshold may change in the future based on decisions by the HUCE Cluster Allocation Committee. This partition will requeue jobs in huce_cascade to make room for jobs submitted to huce_cascade_priority. Jobs on huce_cascade are requeued only if needed, starting with the jobs that have the lowest fairshare and have run for the least amount of time.
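The requeue order described for the priority partitions (lowest fairshare first, then least time run, and only as many jobs as needed) can be illustrated with a small Python sketch. This is not how Slurm implements preemption internally; the `Job` class, its fields, and `pick_jobs_to_requeue` are hypothetical names used only to show the ordering rule.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record of a running job; not a Slurm data structure.
    job_id: int
    fairshare: float   # fairshare score of the owning group
    runtime_min: int   # minutes the job has been running
    cores: int         # cores the job currently holds


def pick_jobs_to_requeue(running, cores_needed):
    """Select jobs to requeue until enough cores are freed.

    Jobs are considered in order of lowest fairshare, then shortest
    runtime, and selection stops as soon as the request is covered.
    """
    selected = []
    freed = 0
    for job in sorted(running, key=lambda j: (j.fairshare, j.runtime_min)):
        if freed >= cores_needed:
            break
        selected.append(job)
        freed += job.cores
    return selected


# Illustrative jobs on a non-priority partition (made-up values).
running = [
    Job(101, fairshare=0.40, runtime_min=600, cores=48),
    Job(102, fairshare=0.10, runtime_min=300, cores=48),
    Job(103, fairshare=0.10, runtime_min=30, cores=48),
    Job(104, fairshare=0.80, runtime_min=5, cores=48),
]

victims = pick_jobs_to_requeue(running, cores_needed=96)
print([j.job_id for j in victims])  # [103, 102]
```

In this example the two jobs from the lowest-fairshare group are requeued, with the one that has run for less time chosen first, and the remaining jobs are left alone because 96 cores have already been freed.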
For more information about Slurm partitions on the FASRC cluster, please refer to the Running Jobs document.
Usage and Reservations
In general, usage of the HUCE partitions is governed by fairshare. Because the resource is finite, users should be aware of how much their runs will affect their lab's usage and hence its priority in the queue. The scalc utility can be used to project how much usage a specific job will incur. seff can be used to find out how much memory a job actually used, which helps in tuning job requests to the right size. If you want to learn more about optimizing usage of the cluster and how fairshare works, feel free to contact FASRC; their staff will be more than happy to work with you and your lab.
Labs may request access to a reservation for a limited amount of time. Reservations set aside compute for the group's immediate use. When you request a reservation, please include the following information:
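As a rough illustration of right-sizing a memory request, the sketch below takes the requested memory and the peak memory a finished job actually used (the kind of figure seff reports) and suggests a smaller request. The function names, the sample numbers, and the 20% headroom are assumptions made for this example, not FASRC recommendations.

```python
import math


def memory_efficiency(used_gb, requested_gb):
    """Fraction of the requested memory that the job actually used."""
    return used_gb / requested_gb


def suggested_request_gb(used_gb, headroom=0.2):
    """Peak usage plus a safety margin, rounded up to a whole GB.

    The 20% default headroom is an arbitrary illustrative choice.
    """
    return math.ceil(used_gb * (1 + headroom))


# Made-up values for a job that asked for far more memory than it needed.
requested_gb = 100.0
peak_used_gb = 18.5

eff = memory_efficiency(peak_used_gb, requested_gb)
new_request = suggested_request_gb(peak_used_gb)
print(f"efficiency: {eff:.1%}, suggested request: {new_request} GB")
```

A job with low memory efficiency is charged for memory it never touches, so lowering the request toward the suggested value reduces the usage counted against the lab.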
- Which users need access.
- The characteristics of the jobs that will be run (i.e. how many cores, how much memory, how long, how many jobs).
- How long you will need the reservation for.
- Why you need the reservation.
This information will help to decide what resources to give the reservation. Reservations can be set up on a recurring basis to aid in development work or can be used if there is a deadline approaching that needs to be met.
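When preparing a reservation request, it can help to total up the job characteristics listed above. The following Python sketch is one way to do that; the `JobBatch` class and `summarize` function are hypothetical, the sample numbers are invented, and the peak figures assume every job runs at the same time.

```python
from dataclasses import dataclass


@dataclass
class JobBatch:
    # Hypothetical description of a group of identical jobs.
    n_jobs: int
    cores_per_job: int
    mem_gb_per_job: float
    hours_per_job: float


def summarize(batches):
    """Total core-hours plus peak cores and memory if all jobs run at once."""
    return {
        "core_hours": sum(b.n_jobs * b.cores_per_job * b.hours_per_job for b in batches),
        "peak_cores": sum(b.n_jobs * b.cores_per_job for b in batches),
        "peak_mem_gb": sum(b.n_jobs * b.mem_gb_per_job for b in batches),
    }


# Invented workload: ten large runs and four small post-processing jobs.
plan = [
    JobBatch(n_jobs=10, cores_per_job=48, mem_gb_per_job=180, hours_per_job=12),
    JobBatch(n_jobs=4, cores_per_job=16, mem_gb_per_job=64, hours_per_job=2),
]

summary = summarize(plan)
print(summary)
```

Including figures like these in the request gives a concrete starting point for deciding how many nodes to set aside and for how long.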
Harvard Climate Modeling Wiki
The climate modeling groups also maintain a wiki that contains information about common software and workflows used in climate modeling. The wiki administrators are Andrew Conahan, Lei Wang, and Packard Chan. Please contact them for more information on how to contribute.
HUCE maintains a Google Group for cluster discussion at firstname.lastname@example.org. This list is appropriate for talking about cluster usage, code compilation, and other topics related to high performance computing. Thanks to Melissa Sulprizio (Jacob Lab) for working with HUIT to get this set up. Current owners (listed below) can add new users to the list: