How do I figure out how efficient my job is?
You can see your job's efficiency by running seff on the job ID. For example:
Job ID: 1234567
State: COMPLETED (exit code 0)
Cores per node: 64
CPU Utilized: 37-06:17:33
CPU Efficiency: 23.94% of 155-16:02:08 core-walltime
Job Wall-clock time: 07:17:49
Memory Utilized: 1.53 TB (estimated maximum)
Memory Efficiency: 100.03% of 1.53 TB (195.31 GB/node)
In this job the user requested 512 cores and the job ran for about 7.3 hours. However, their CPUTime (CPU Utilized) is about 894 hours, which is close to 128*7 hours, or only about 25% of the compute they actually requested (i.e. 512*7 hours). If your code is scaling effectively, then CPUTime (CPU Utilized) = NCPUS * Elapsed (Wall-clock time). If it is not, the two numbers will diverge.
The best way to test this is to run some scaling tests. There are two styles. In strong scaling you leave the problem size the same but increase the number of cores. If your code scales well, the run time should drop in inverse proportion to the number of cores you use. In weak scaling the amount of work per core remains the same as you increase the number of cores, so the size of the job grows in proportion to the number of cores. If your code scales well in this case, the run time should remain the same.
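As a rough check on the arithmetic, the CPU efficiency that seff reports for the example job can be recomputed by hand from the CPU Utilized and Wall-clock values. The sketch below is only illustrative: the helper names slurm_time_to_seconds and cpu_efficiency are made up for this example and are not part of seff or Slurm.

```python
def slurm_time_to_seconds(text):
    """Convert a duration written as [DD-]HH:MM:SS into seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(cpu_utilized, wall_clock, ncpus):
    """Return CPUTime / (NCPUS * Elapsed) as a fraction between 0 and 1."""
    cpu_seconds = slurm_time_to_seconds(cpu_utilized)
    core_walltime = ncpus * slurm_time_to_seconds(wall_clock)
    return cpu_seconds / core_walltime


# Values copied from the seff output above; 512 is the total core count.
eff = cpu_efficiency("37-06:17:33", "07:17:49", 512)
print(f"CPU efficiency: {eff:.2%}")  # prints "CPU efficiency: 23.94%"
```

A value near 100% means the cores were busy for the whole run; a value near 25%, as here, suggests that roughly a quarter of the requested cores would have done the same work in the same time.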
Most codes have a point where the scaling breaks down due to inefficiencies in the code. Beyond that point there is no benefit to increasing the number of cores you throw at the problem. That’s the point you want to look for. It is most easily seen by plotting the log of the number of cores vs. the log of the run time: for a strong scaling test, ideal scaling is a straight line with slope -1, and the breakdown is where the curve flattens out.
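One way to locate that point from a strong scaling test is to compute the speedup and parallel efficiency at each core count and see where the efficiency falls off. The following is a minimal sketch, not a definitive method: the timings are invented for illustration, and the 50% efficiency cutoff is an arbitrary assumption you should adjust for your own work.

```python
# Hypothetical strong scaling measurements: cores -> run time in hours.
# These numbers are made up for illustration only.
timings = {1: 64.0, 2: 33.0, 4: 17.0, 8: 9.0, 16: 5.5, 32: 4.5, 64: 4.3}


def strong_scaling_table(timings):
    """Return (cores, runtime, speedup, efficiency) rows, smallest run first."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, timings[cores], speedup, efficiency))
    return rows


def last_efficient_core_count(timings, threshold=0.5):
    """Largest core count reached before efficiency first drops below threshold."""
    best = None
    for cores, _, _, efficiency in strong_scaling_table(timings):
        if efficiency < threshold:
            break
        best = cores
    return best


for cores, runtime, speedup, efficiency in strong_scaling_table(timings):
    print(f"{cores:4d} cores  {runtime:6.1f} h  speedup {speedup:5.1f}  efficiency {efficiency:.0%}")

print("Scaling holds up to", last_efficient_core_count(timings), "cores")
```

With these made-up timings the efficiency stays above the cutoff through 16 cores and drops below it at 32, so asking for more than 16 cores would buy very little.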
The other important factor in a scheduling environment is that the more cores you ask for, the longer your job will pend, as the scheduler has to find more room for you. Thus you need to find the sweet spot that minimizes the sum of your run time and the time you spend pending in the queue. For example, it may be the case that if you ask for 32 cores your job takes a day to run but pends for 2 hours, while if you ask for 64 cores your job takes half a day to run but pends for 2 days. In that case it would be better to ask for 32 cores even though the job itself runs slower.
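The trade-off in this example comes down to comparing total turnaround time, that is, pending time plus run time, for each request size. The sketch below uses only the numbers from the example above; the names options and turnaround_hours are hypothetical, and real pending times vary with cluster load and cannot be known exactly in advance.

```python
def turnaround_hours(pend_hours, run_hours):
    """Total time from submission to completion."""
    return pend_hours + run_hours


# cores -> (pending time in hours, run time in hours), from the example above.
options = {
    32: (2, 24),   # pends 2 hours, runs for a day
    64: (48, 12),  # pends 2 days, runs for half a day
}

for cores, (pend, run) in options.items():
    print(f"{cores} cores: {turnaround_hours(pend, run)} hours from submission to completion")

best = min(options, key=lambda cores: turnaround_hours(*options[cores]))
print("Shortest turnaround:", best, "cores")
```

Here the 32-core request finishes 26 hours after submission, while the 64-core request finishes after 60 hours, despite running twice as fast once it starts.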
We also now have an array-capable variant of seff called seff-array that makes it easy to do this analysis for array jobs.