Cluster Customs and Responsibilities

Please also see the FASRC Acceptable Use guidelines.

The FASRC cluster is a large, shared resource performing massive computations on terabytes of data. These compute jobs are isolated as much as possible by the SLURM system. However, there are a number of things to keep in mind while using this shared resource so that the system can work as well as possible for everyone.

Don’t run anything on the login nodes

The cluster login nodes (holylogin## and boslogin##) must be kept free of significant computation. Login nodes are gateways for launching and monitoring jobs and for running light processes that prepare your jobs or environment. Running even modestly memory- or CPU-intensive programs on them, if widespread, would eventually lead to users being shut out of the cluster. The Rocky 8 login nodes use a system-level mechanism called cgroups, which limits each logged-in account to 1 core and 8GB of memory. Any process run directly on a login node that exceeds these limits will be killed without warning. If you need more memory or CPU, please use an interactive session instead.

Be as accurate as possible when specifying memory for jobs

The specification of memory for job submission (--mem or --mem-per-cpu) is important to the basic functioning of SLURM. If you specify too little, your job will likely crash. If you specify too much, however, 1) your job may end up stuck in the PENDING state while SLURM looks for a node with enough available memory, and 2) you will prevent other users from running their jobs by consuming large portions of a node. Please see our Note on Requesting Memory for more info.
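
One way to arrive at an accurate request is to start from the peak memory of a completed, representative run. The following is a minimal sketch, assuming you have already read the MaxRSS value that sacct reports for such a job (for example with `sacct -j <jobid> --format=JobID,MaxRSS`). The helper name `recommend_mem_mb` and the 20% headroom are illustrative assumptions, not FASRC policy.

```python
import math

# Multipliers to convert a suffixed size (as printed by sacct) into MB.
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}

def recommend_mem_mb(max_rss: str, headroom: float = 0.2) -> int:
    """Turn a MaxRSS string such as '3500000K' into a --mem value in MB."""
    text = max_rss.strip()
    value, unit = text[:-1], text[-1].upper()
    if unit not in _UNIT_TO_MB:
        raise ValueError(f"expected a K/M/G/T suffix, got {max_rss!r}")
    peak_mb = float(value) * _UNIT_TO_MB[unit]
    # Add headroom so small run-to-run variation does not crash the job.
    return math.ceil(peak_mb * (1 + headroom))

# --mem takes megabytes by default.
print(f"--mem={recommend_mem_mb('3500000K')}")
```

If your jobs vary a lot in size, base the request on the largest representative run rather than a typical one.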

Keep job counts in a reasonable range

The FASRC cluster processes more than a million jobs every month and submissions of hundreds to thousands at once are not uncommon. However, job submissions in the tens of thousands at once can become problematic for the SLURM controller, even when the jobs are requesting modest resources. As such, there is a limit of 10,000 jobs per user (scheduled and/or running); attempting to schedule more than 10,000 jobs will result in an error (“Job violates accounting/QOS policy (job submit limit, user’s size and/or time limits)”).
When the controller is overloaded, all partitions are affected and all researchers using the cluster will see delays. Try to design your submissions so that you can take advantage of parallel processing, but not overload the system. As a reasonable target, we recommend submitting no more than 1000 jobs at a time.
A common source of excessive jobs is loops in code that generate and submit new jobs. Please consider using job arrays or bundling jobs into smaller submissions. Please see our Submitting Large Numbers of Jobs doc for more info on this and job arrays.
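
As an illustration of bundling, the sketch below plans a single job array for a large number of inputs so that the task count stays at or below 1000. The function names, the cap of 100 simultaneously running tasks, and the script name `process_chunk.sh` are hypothetical. The `--array=0-N%M` syntax and the `SLURM_ARRAY_TASK_ID` environment variable are standard SLURM.

```python
import math

def plan_array(n_inputs: int, max_tasks: int = 1000, max_running: int = 100):
    """Return (inputs_per_task, sbatch --array spec) for n_inputs work items."""
    if n_inputs < 1:
        raise ValueError("need at least one input")
    # Give each array task enough inputs that the task count stays bounded.
    per_task = math.ceil(n_inputs / max_tasks)
    n_tasks = math.ceil(n_inputs / per_task)
    # The %N suffix caps how many array tasks run at the same time.
    spec = f"0-{n_tasks - 1}%{min(max_running, n_tasks)}"
    return per_task, spec

def inputs_for_task(task_id: int, per_task: int, n_inputs: int) -> range:
    """Indices of the inputs handled by one array task (SLURM_ARRAY_TASK_ID)."""
    start = task_id * per_task
    return range(start, min(start + per_task, n_inputs))

per_task, spec = plan_array(25_000)
print(f"sbatch --array={spec} process_chunk.sh  # {per_task} inputs per task")
```

Inside the job script, each task would read `SLURM_ARRAY_TASK_ID` and loop over the indices from `inputs_for_task`, which also makes each task run longer than a single tiny input would.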

Ensure your jobs request at least 10 minutes

SLURM does a great deal of work to schedule, set up, monitor, tear down, and archive each job on our large cluster. A 10 minute minimum runtime request allows SLURM to handle your jobs efficiently while also handling those of the 800+ simultaneous users. Jobs requesting less than 10 minutes will perform poorly and could even end before the job’s work actually starts.

Don’t bother the scheduler too frequently

If you are using code to submit jobs, please pause at least 0.5 to 1 second between sbatch commands. If you are using code to submit and/or monitor job progress (what we call a meta-scheduler), please use sacct for job state monitoring. The squeue, showq, and scontrol commands are blocking commands (i.e. they cause the scheduler to stop and answer the query) and thus can drag down scheduler performance. sacct talks to the SLURM database and is nonblocking. Regardless of method, do not poll more frequently than once a minute.
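
A meta-scheduler that follows these rules might look roughly like the sketch below. It is only an outline under stated assumptions: the function names are hypothetical, it handles plain (non-array) jobs, and the set of "active" states is a simplification. The `sbatch --parsable` option and the sacct options `-X`, `-n`, `-P`, `-o`, and `-j` are standard SLURM.

```python
import subprocess
import time

def submit_all(scripts, pause=1.0, run=subprocess.run, sleep=time.sleep):
    """Submit each script with sbatch, pausing between submissions."""
    job_ids = []
    for i, script in enumerate(scripts):
        if i:
            sleep(pause)  # at least 0.5 to 1 second between sbatch calls
        done = run(["sbatch", "--parsable", script],
                   capture_output=True, text=True, check=True)
        # --parsable prints "jobid" or "jobid;cluster".
        job_ids.append(done.stdout.strip().split(";")[0])
    return job_ids

def job_states(job_ids, run=subprocess.run):
    """Ask sacct (not squeue) for the state of each job allocation."""
    done = run(["sacct", "-X", "-n", "-P", "-o", "JobID,State",
                "-j", ",".join(job_ids)],
               capture_output=True, text=True, check=True)
    return dict(line.split("|", 1) for line in done.stdout.splitlines() if line)

def wait_for(job_ids, interval=60, run=subprocess.run, sleep=time.sleep):
    """Poll at most once per `interval` seconds until no job is active."""
    active = {"PENDING", "RUNNING", "REQUEUED", "SUSPENDED", "RESIZING"}
    while True:
        states = job_states(job_ids, run=run)
        # Keep waiting while any job is active or not yet known to sacct.
        if len(states) == len(job_ids) and not any(
                s.split()[0] in active for s in states.values()):
            return states
        sleep(interval)
```

A typical call would be `wait_for(submit_all(["a.sh", "b.sh"]))`. The `run` and `sleep` parameters exist only so the functions can be exercised without a real scheduler.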

Use the appropriate partition for your work

Nodes in partitions are grouped according to their technical characteristics and likely job profiles. Submitting jobs to a partition for work for which it was not designed can cause slowdowns on your jobs as well as the other jobs running on the same node (e.g. submitting non-GPU work to the GPU partition). On the other hand, using an inappropriate partition in order to skip ahead in the queue violates our appropriate use policy: jobs may be killed without warning, and repeat offenders may have their priority lowered to zero, effectively suspending their ability to run future jobs.

Use `serial_requeue` when possible

The serial_requeue partition is the most efficient job allocator on the cluster. In particular, serial_requeue is able to use idle resources in partitions that are owned by individual labs and so it has a much larger pool than the shared partition. However, because these resources are owned by individual labs, users from those labs have a higher priority and may cause your job to be stopped and restarted elsewhere (hence the “requeue”).
Some jobs do not handle requeue very well and should be run on the shared partition instead. For example, a tool that appends to an existing file (rather than creating it new with each start) might generate incorrect output when requeued. See our Partitions write-up for more information.
Tips for efficient serial_requeue usage:

Zero out your output files at the start of a job if you’re appending output.
For longer jobs, try checkpointing your code to let a requeued job pick up where it left off. If your code does not support checkpointing, leave *.finished files to mark the parts you’ve completed (file breadcrumbs) and use branch points in your code to skip over completed parts.
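
The breadcrumb idea can be sketched as follows. This is a minimal illustration, assuming your work splits into named steps that each produce their own output file; `run_steps` and the `.out` naming are hypothetical.

```python
from pathlib import Path

def run_steps(steps, workdir):
    """Run (name, function) steps in order, skipping those already finished.

    Each function returns the text for that step's own output file.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    ran = []
    for name, func in steps:
        marker = workdir / f"{name}.finished"
        if marker.exists():
            continue  # completed before the requeue, so skip it
        # Mode "w" truncates, so a half-written file from an interrupted
        # attempt is zeroed out rather than appended to.
        with open(workdir / f"{name}.out", "w") as fh:
            fh.write(func())
        marker.touch()  # breadcrumb, written only after the step succeeds
        ran.append(name)
    return ran
```

When a requeued job calls `run_steps` again with the same working directory, only the steps without a `.finished` marker are rerun.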

Do not over constrain your jobs

The scheduler is most efficient when jobs are given the fewest constraints and limits. Less is more. Only set the limits necessary to the efficient execution of your job. Avoid settings like --constraint, --dependency, and --contiguous as they will make it harder for the scheduler to find you space. If you need to include these settings for your job, make sure that they actually improve execution speed and will not cause your job to pend for so long that it wipes out any run speed improvement (i.e. look to reduce time to science). You should also avoid specifying the number of cores/GPUs per node and the node count unless necessary for your job. Finally, for multipartition work, limit the partitions you submit to only those your code can actually run in (e.g. if your code needs 100 cores per node, do not submit to partitions that only have 48 cores per node).

Heavy I/O should be done on `/scratch` or `$SCRATCH` if possible

Many of the cluster’s file systems are networked storage. This is what allows them to be available to all the nodes in the cluster. However, this also means that tools that read and write files rapidly, especially if they are being run in thousands of parallel jobs, can overload the network hardware and protocols. If possible, these sorts of tasks should use /scratch, located on each node, or $SCRATCH (aka /n/netscratch), our networked scratch storage. The former consists of locally attached disks that do not interfere with I/O on other nodes and, additionally, have the best performance for intensive reads and writes. The latter is great for large numbers of parallel computations. Please see our Storage Policies and Description doc for more info.
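
A common pattern is to stage input onto node-local scratch, do the intensive reads and writes there, and copy only the final results back to networked storage. The sketch below illustrates this under stated assumptions: `run_on_scratch` and the `work` callback are hypothetical names, and the default root of `/scratch` is taken from the paragraph above.

```python
import os
import shutil
import tempfile
from pathlib import Path

def run_on_scratch(input_file, results_dir, work, scratch_root=None):
    """Copy input to scratch, run `work` there, then copy results back once."""
    scratch_root = scratch_root or "/scratch"  # node-local scratch
    job_id = os.environ.get("SLURM_JOB_ID", "local")
    job_dir = Path(tempfile.mkdtemp(prefix=f"job_{job_id}_", dir=scratch_root))
    try:
        local_input = Path(shutil.copy(input_file, job_dir))
        # All rapid reads and writes happen under job_dir, off the network.
        outputs = work(local_input, job_dir)
        results_dir = Path(results_dir)
        results_dir.mkdir(parents=True, exist_ok=True)
        return [Path(shutil.copy(p, results_dir)) for p in outputs]
    finally:
        shutil.rmtree(job_dir, ignore_errors=True)  # leave scratch clean
```

Here `work(local_input, job_dir)` is expected to return the list of files worth keeping; everything else in the scratch directory is deleted when the function returns.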

Keep file counts for individual directories to a reasonable number

On any filesystem, including scratch filesystems, keep the number of files in a single directory under 1000. Numbers of files larger than 1000 can cause latency issues, and numbers much larger (an order of magnitude or more) may cause system-wide issues.
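
If a workflow naturally produces many thousands of files, one option is to spread them across nested subdirectories. The following sketch derives the subdirectory from a hash of the filename; `sharded_path` is a hypothetical helper, and the two-level layout is just one reasonable choice.

```python
import hashlib
from pathlib import Path

def sharded_path(root, filename, levels=2):
    """Map a filename to root/ab/cd/filename using a hash of its name."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Two hex characters per level gives 256 subdirectories at each level.
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return Path(root, *parts, filename)

target = sharded_path("results", "sample_000123.csv")
print(target)
```

The caller still needs to create the parent directory, for example with `target.parent.mkdir(parents=True, exist_ok=True)`, before writing the file. Because the mapping is deterministic, the same filename always resolves to the same location.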

Production jobs are prohibited from the test partitions

The test partitions (test and gpu_test) are intended for interactive work and testing applications. Users are not to use the test partitions for production work. Any jobs judged to be production will be terminated. Repeated abuse may result in disabling of the user account.

Poorly behaved jobs will be terminated

Because of the shared nature of the FASRC cluster, problem jobs can inhibit access and processing for other users. Therefore, if a job is running improperly (e.g. excessive I/O, or excessive CPU usage beyond what was reserved), the job may be terminated. Repeated poor behavior may result in disabling of the user account.

Don’t mine digital currency or otherwise misuse Harvard resources

The cluster should be used for Harvard-based research and should not be used for non-research tasks or for work at other universities that is not jointly held with Harvard. If you have external collaborators, they should only use the cluster for work related to your research and projects.

Bookmarkable Links

1 Don’t run anything on the login nodes
2 Be as accurate as possible when specifying memory for jobs
3 Keep job counts in a reasonable range
4 Ensure your jobs request at least 10 minutes
5 Don’t bother the scheduler too frequently
6 Use the appropriate partition for your work
7 Use serial_requeue when possible
8 Do not over constrain your jobs
9 Heavy I/O should be done on /scratch or $SCRATCH if possible
10 Keep file counts for individual directories to a reasonable number
11 Production jobs are prohibited from the test partitions
12 Poorly behaved jobs will be terminated
13 Don’t mine digital currency or otherwise misuse Harvard resources

Last Updated: June 2, 2026