Common Cluster Pitfalls

These are some of the common problems that people run into when using the cluster. We hope that they will not be problems for you as well.

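Asking for multiple cores but forgetting to specify one node: -n 4 -N 1 is very different from -n 4. Without -N 1, SLURM may spread the 4 tasks across several nodes, and only software written for multi-node use (for example with MPI) can take advantage of cores on different nodes.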
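To make the one-node point concrete, here is a minimal sketch of a guard that a single-node program could run at startup. It assumes the job runs under SLURM, which exports SLURM_JOB_NUM_NODES for batch jobs; the function name spans_multiple_nodes is hypothetical and not part of any cluster tooling.

```python
import os


def spans_multiple_nodes(env=None):
    """Return True if the SLURM allocation covers more than one node.

    Assumption: SLURM sets SLURM_JOB_NUM_NODES inside the job. Outside a
    job the variable is missing, so we treat that as a single node.
    """
    env = os.environ if env is None else env
    return int(env.get("SLURM_JOB_NUM_NODES", "1")) > 1


if __name__ == "__main__":
    if spans_multiple_nodes():
        # A threaded or multiprocessing program only sees the cores of the
        # node it starts on; cores on the other nodes would sit idle.
        print("Warning: cores are spread over several nodes; consider -N 1")
    else:
        print("All allocated cores are on one node")
```

A check like this costs nothing and catches the forgotten -N 1 before the job wastes its allocation.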

Problem: Throwing multiple cores at Python and R code
Symptom/Reason: Without special programming, code written in Python and R is single-threaded. That means giving more cores in SLURM to your code will do nothing except waste resources. If you wish to use multiple cores, you must explicitly write your code to use them, using modules such as 'multiprocessing' in Python or packages like 'parallel' or 'Rmpi' in R.

Problem: Jobs PENDing for >48 hrs
Symptom/Reason: Very large resource requests (cores & memory): lower the request and try again. A very low Fairshare score can also cause this.

Problem: Quick run and FAIL... Not including the -t parameter
Symptom/Reason: No -t means the shortest possible time limit in all partitions == 10 min

Problem: Not specifying enough cores
Symptom/Reason: prog1 | prog2 | prog3 > outfile should run with 3 cores! Each program in the pipeline runs at the same time as its own process.

Problem: Causing massive disk I/O on home folders/lab disk shares
Symptom/Reason: Your work and that of others on the same filesystem slows to a crawl: simple commands like ls take forever

Problem: Hundreds/thousands of jobs accessing one common file
Symptom/Reason: Your work and that of others on the same filesystem slows to a crawl. Make copies of the file and have each job access one of the copies.

Problem: Don’t pack more than 5K files in one directory
Symptom/Reason: I/O for your jobs will slow to a crawl

Problem: Bundle your work into ~10 min jobs
Symptom/Reason: Kinder for us, kinder for you, kinder for the cluster

Problem: Please understand your software -- look at the options
Symptom/Reason: Who knows what could happen?? You wouldn't use an instrument without reading the instructions, would you?

Problem: Trying to sudo when installing software
Symptom/Reason: Please don’t -- we admin the boxes for you.
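For the pitfall about throwing multiple cores at Python code, the following is a minimal sketch of what "explicitly write your code to use them" can look like with the standard 'multiprocessing' module. The helper allocated_cores and the toy task simulate are hypothetical names for illustration. The sketch assumes SLURM exports SLURM_CPUS_PER_TASK (when -c is used) or SLURM_NTASKS (when -n is used); which one is set depends on how the job was submitted.

```python
import os
from multiprocessing import Pool


def allocated_cores(default=1):
    """Return the core count granted by SLURM, or `default` outside a job.

    Assumption: SLURM_CPUS_PER_TASK is set for jobs submitted with -c and
    SLURM_NTASKS for jobs submitted with -n (e.g. -n 4 -N 1).
    """
    for name in ("SLURM_CPUS_PER_TASK", "SLURM_NTASKS"):
        value = os.environ.get(name)
        if value:
            return int(value)
    return default


def simulate(seed):
    """Hypothetical CPU-bound task: sum of squares below seed * 1000."""
    return sum(i * i for i in range(seed * 1000))


if __name__ == "__main__":
    workers = allocated_cores()
    # One worker process per allocated core; asking SLURM for more cores
    # only helps because the pool is sized to match them.
    with Pool(processes=workers) as pool:
        results = pool.map(simulate, range(1, 9))
    print(f"{workers} worker(s) produced {len(results)} results")
```

Sizing the pool from the allocation, rather than hard-coding a number, keeps the script from oversubscribing the cores it was given or leaving them idle.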
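For the pitfall about many jobs reading one common file, one possible approach is sketched below: make a handful of copies once, then let each job pick a copy from its task number. The function names make_copies and copy_for_task and the ".copyN" naming scheme are assumptions made for this sketch, not a cluster convention.

```python
import shutil
from pathlib import Path


def make_copies(source, copy_dir, n_copies):
    """Create n_copies replicas of `source` in copy_dir and return their paths.

    Run this once before submitting the jobs, not from inside every job.
    """
    source = Path(source)
    copy_dir = Path(copy_dir)
    copy_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for i in range(n_copies):
        target = copy_dir / f"{source.stem}.copy{i}{source.suffix}"
        if not target.exists():
            shutil.copy2(source, target)
        copies.append(target)
    return copies


def copy_for_task(copies, task_id):
    """Spread tasks evenly over the copies with a simple modulo."""
    return copies[task_id % len(copies)]
```

Inside a job array, task_id could be taken from the SLURM_ARRAY_TASK_ID environment variable, so that neighbouring tasks read different copies.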
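For the advice about keeping directories under 5K files, a common workaround is to spread output files over subdirectories chosen by a hash of the file name. The sketch below shows one way to do that; sharded_path and the default of 256 buckets are assumptions for illustration, and the bucket count should be chosen from the number of files you expect.

```python
import hashlib
from pathlib import Path


def sharded_path(root, filename, n_buckets=256):
    """Map a file name to root/<bucket>/<filename>.

    The bucket is derived from a hash of the name, so the same name always
    lands in the same subdirectory and files spread roughly evenly.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % n_buckets
    return Path(root) / f"{bucket:03d}" / filename


if __name__ == "__main__":
    # Hypothetical output location; the directory is not created here.
    print(sharded_path("results", "sample_0001.txt"))
```

With 256 buckets, roughly 1.28 million files fit before the average directory reaches 5K entries. Call path.parent.mkdir(parents=True, exist_ok=True) before writing to a sharded path.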
© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.