Slurm memory limits
Slurm imposes a memory limit on each job. By default, the limit is deliberately small: 100 MB per node. If your job uses more than that, it will fail with the error "Exceeded job memory limit". To set a larger limit, add this line to your job submission script:
#SBATCH --mem X
X is the maximum amount of memory your job will use per node, in MB.
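As an illustration, a minimal batch script with the directive in place might look like the sketch below. The job name and the 2000 MB value are placeholders chosen for the example, not recommendations. The final lines assume Slurm exports SLURM_MEM_PER_NODE when --mem is given, and fall back to "unset" if the script is run outside a job.

```shell
#!/bin/bash
#SBATCH --job-name=mem-example   # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=2000               # per-node memory limit in MB (illustrative value)

# Report the limit Slurm applied; prints "unset" when run outside a Slurm job.
mem_report="Memory limit per node (MB): ${SLURM_MEM_PER_NODE:-unset}"
echo "$mem_report"
```

Because the #SBATCH lines are ordinary shell comments, the script also runs unchanged under plain bash, which is handy for a quick syntax check before submitting.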
The larger your working data set, the larger this value needs to be; the smaller the value, the easier it is for the scheduler to find a place to run your job. To determine an appropriate value, start relatively large (job slots have about 4000 MB per core on average, which is much more than most jobs need) and then use seff to see how much memory your job is using or has used:
seff JOBID
JOBID is the ID of the job you’re interested in. Set the memory you request to something a little larger than what seff reports, since you’re defining a hard upper limit. Note that for parallel jobs spanning multiple nodes, this is the maximum memory used on any one node; if you’re not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could have very different values when run at different times. Also note that the memory usage recorded by Slurm will be inaccurate if the job was terminated for running out of memory. To get an accurate measurement, you need a job that completes successfully, because only then does Slurm record the true memory peak.
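One way to turn a measured peak into a request is to add a fixed safety margin and round up. The sketch below does this with a hypothetical helper, padded_mem_mb; the function name, the 20% default margin, and the 1840 MB peak are all assumptions made for the example, and you would read the actual peak from the seff report of a successfully completed job.

```shell
#!/bin/bash
# Hypothetical helper: pad a measured peak (in MB) by a safety margin (in percent),
# rounding up to a whole MB. Neither the name nor the 20% default comes from Slurm.
padded_mem_mb() {
    local peak_mb=$1
    local margin_pct=${2:-20}
    # Integer arithmetic; adding 99 before dividing by 100 rounds the result up.
    echo $(( (peak_mb * (100 + margin_pct) + 99) / 100 ))
}

# Example: a completed job whose reported peak was about 1840 MB (made-up figure).
request_mb=$(padded_mem_mb 1840 20)
echo "#SBATCH --mem=${request_mb}"
```

The printed line can be pasted into the job script. For multi-node jobs, the peak to pad is the largest per-node figure, and fixing the task layout with --ntasks-per-node keeps that figure comparable between runs.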