Common Cluster Pitfalls
These are some of the common problems people run into when using the cluster. We hope they will not be a problem for you.
Problem | Symptom/Reason |
---|---|
Asking for multiple cores but forgetting to specify one node | -n 4 -N 1 is very different from -n 4: without -N 1, the 4 cores may be spread across several nodes |
Throwing multiple cores at Python and R code | Without special programming, code written in Python and R is single-threaded. That means giving your code more cores in SLURM will do nothing except waste resources. If you wish to use multiple cores, you must explicitly write your code to use them, using modules such as 'multiprocessing' in Python or packages like 'parallel' or 'Rmpi' in R. |
Jobs PENDing for >48 hrs | Very large resource request (cores & memory): lower the request and try again. Or a very low Fairshare score |
Job runs briefly and FAILs: -t parameter not included | With no -t, the job gets the shortest possible time limit in all partitions, which is 10 min |
Not specifying enough cores | prog1 \| prog2 \| prog3 > outfile should run with 3 cores! |
Causing massive disk I/O on home folders/lab disk shares | Your work and the work of others on the same filesystem slows to a crawl: simple commands like ls take forever |
Hundreds/thousands of jobs accessing one common file | Your work and the work of others on the same filesystem slows to a crawl. Make several copies of the file and have each job read one of the copies |
Don’t pack more than 5K files in one directory | I/O for your jobs will slow to a crawl |
Bundle your work into ~10 min jobs | Kinder for us, kinder for you, kinder for the cluster |
Please understand your software -- look at the options | Who knows what could happen?? You wouldn't use an instrument without reading the instructions, would you? |
Trying to sudo when installing software | Please don’t -- we admin the boxes for you. |
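
For the Python and R row in the table above, here is a minimal Python sketch of what "explicitly write your code to use them" can look like with the standard 'multiprocessing' module. It assumes the job was submitted with something like -n 4 -N 1 or with -c 4, and that SLURM therefore exports SLURM_NTASKS or SLURM_CPUS_PER_TASK; the function names worker_count, simulate and run_all are hypothetical and only for illustration.

```python
import os
from multiprocessing import Pool


def worker_count(env=os.environ):
    """Return how many cores to use, falling back to 1 outside of SLURM.

    Assumption: the core count is visible as SLURM_CPUS_PER_TASK (set when
    -c is given) or SLURM_NTASKS (set when -n is given).
    """
    for name in ("SLURM_CPUS_PER_TASK", "SLURM_NTASKS"):
        value = env.get(name)
        if value:
            return int(value)
    return 1


def simulate(seed):
    """Stand-in for one independent, CPU-bound unit of work (hypothetical)."""
    total = 0
    for i in range(10_000):
        total = (total + seed * i) % 1_000_003
    return total


def run_all(seeds, workers):
    """Spread the work over `workers` processes; results keep input order."""
    with Pool(processes=workers) as pool:
        return pool.map(simulate, seeds)


if __name__ == "__main__":
    # Size the pool from the allocation instead of hard-coding a number,
    # so the script never starts more processes than cores requested.
    print(run_all(range(8), worker_count()))
```

Sizing the pool from the environment keeps the script and the SLURM request in agreement: asking for more cores only helps if the code starts that many worker processes, and all of them must sit on one node for 'multiprocessing' to reach them.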