There are four regular partitions to choose from when running on Falcon. The default partition is the 'reg' partition. All of the general compute nodes are in all of these partitions.
Name | PriorityJobFactor | MaxTime | MaxSubmitPU (max jobs submitted per user) |
---|---|---|---|
tiny | 72 | 6 hrs | unlimited |
short | 36 | 24 hrs | 1000 |
reg | 18 | 7 days | 500 |
long | 9 | unlimited | 50 |
It is best to pick the partition that most closely matches your job's run time, as your job is then more likely to start promptly (due to the higher PriorityJobFactor). Use the '-p' command line flag to pick the partition, e.g.:
sbatch -p tiny myjob.slurm
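The partition (and any other sbatch option) can also be set inside the job script itself with #SBATCH directives. A minimal sketch, where the job name, time limit, and program are placeholders:
#!/bin/bash
#SBATCH -p tiny                  # partition to submit to
#SBATCH --time=02:00:00          # requested wall time, within the 6 hr limit of 'tiny'
#SBATCH --job-name=myjob         # placeholder job name
./my_program                     # placeholder: the work the job actually does
Submit it with 'sbatch myjob.slurm'; options given on the sbatch command line override the corresponding #SBATCH directives in the script.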
There is now also a partition with GPU nodes available:
Name | PriorityJobFactor | MaxTime | MaxSubmitPU (max jobs submitted per user) |
---|---|---|---|
gpu-volatile | 18 | 128 hrs | 4 |
Each GPU node in this partition has 2 NVIDIA L40 GPUs, 512 GB RAM, and 64 cores. You can request 1 or 2 GPUs for your job like this:
sbatch -p gpu-volatile --gres=gpu:1 my_script.slurm
These GPU nodes were purchased through the EPSCoR I-CREWS project, and jobs in this partition are subject to preemption by jobs from users associated with that project (hence the 'volatile' name).
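The GPU request can likewise go in the job script with #SBATCH directives. A minimal sketch; the time limit and program are placeholders, and the nvidia-smi call assumes the NVIDIA tools are on the node's default path:
#!/bin/bash
#SBATCH -p gpu-volatile
#SBATCH --gres=gpu:2             # request both L40 GPUs on a node
#SBATCH --time=24:00:00          # requested wall time, within the 128 hr limit
nvidia-smi                       # assumption: available on the node; quick sanity check of GPU status
./my_gpu_program                 # placeholder: your GPU workload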
So that Slurm doesn't crash nodes by oversubscribing their available RAM, each job is allocated 3 GB of RAM by default. If your job exceeds this amount, Slurm will forcibly stop it. Use the '--mem-per-cpu=' or '--mem=' arguments to request more. (Note: the --mem-per-cpu argument is in MB, whereas the --mem argument accepts the suffixes M | G | T.)

Most Falcon nodes have 128 GB RAM (with 124 GB available to Slurm jobs), but several have been upgraded to 256 GB (250 GB available). If your job requires more than 124 GB of RAM, request the larger amount and it will run on one of the nodes with more RAM. There is also one node with 3 TB of RAM: if your job requires more than 250 GB of RAM, request that amount with the '--mem' argument and it will run on that himem node. The himem node is not available in the 'long' partition, so if your job will take more than a week to run (and needs more than 256 GB of RAM), reach out to the Falcon system administrators.
sbatch --mem=150G my_job_script.slurm
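To choose a sensible value, you can check how much memory a finished job actually used with sacct; MaxRSS is the peak resident memory of each job step, and 123456 below is a placeholder job ID:
sacct -j 123456 --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State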
If you want your job to use more than one CPU thread (core), you should explicitly request more cores so that nodes do not end up overloaded, e.g.:
sbatch --cpus-per-task=16 --mem-per-cpu=5000 my_job_script.slurm
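Inside the job script, Slurm exports the allocation as SLURM_CPUS_PER_TASK, so you can pin the application's thread count to what was actually granted. A sketch assuming an OpenMP-style program (my_threaded_program is a placeholder):
#!/bin/bash
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=5000                        # 5000 MB per core, ~80 GB total
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}     # use exactly the cores we were given
./my_threaded_program                             # placeholder: your multithreaded program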
Jobs in the shorter partitions are given a higher priority scaling factor than jobs in the longer partitions. The higher the priority your job is assigned, the more likely it is to run sooner. We have also implemented the Slurm Fairshare feature. In short, the more you use Falcon, the lower the priority your jobs have compared to those of a user who has not been using as many compute resources. The algorithm also tracks usage by Account (University): if one University's users have been making heavier use of the compute resources, users from that University will have a lower priority.
You can view the current FairShare weighting:
ondemand ~ * sshare
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 0.000000 817254013 1.000000
root root 1 0.250000 577 0.000001 1.000000
bsu 1 0.250000 74222408 0.090821
isu 1 0.250000 25186 0.000015
ui 1 0.250000 743005840 0.909164
and for all users, add the -a flag (user names redacted here):
ondemand ~ * sshare -a
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 0.000000 817254013 1.000000
root root 1 0.250000 577 0.000001 1.000000
bsu 1 0.250000 74222408 0.090821
bsu user_a 1 0.125000 5350669 0.072090 0.500000
bsu user_b 1 0.125000 146304 0.001971 0.545455
bsu user_c 1 0.125000 68725319 0.925938 0.454545
bsu user_d 1 0.125000 0 0.000000 0.772727
bsu user_e 1 0.125000 114 0.000002 0.590909
bsu user_f 1 0.125000 0 0.000000 0.636364
bsu user_g 1 0.125000 0 0.000000 0.772727
bsu user_h 1 0.125000 0 0.000000 0.772727
isu 1 0.250000 25186 0.000015
isu user_i 1 0.250000 0 0.000000 0.954545
isu user_j 1 0.250000 0 0.000000 0.954545
isu user_k 1 0.250000 0 0.000000 0.954545
isu user_l 1 0.250000 25186 1.000000 0.818182
ui 1 0.250000 743005840 0.909164
ui user_m 1 0.111111 724227447 0.974726 0.045455
ui user_n 1 0.111111 237005 0.000319 0.181818
ui user_o 1 0.111111 0 0.000000 0.409091
ui user_p 1 0.111111 597585 0.000804 0.136364
ui user_q 1 0.111111 0 0.000000 0.272727
ui user_r 1 0.111111 17943801 0.024150 0.090909
ui user_s 1 0.111111 1 0.000000 0.227273
ui user_t 1 0.111111 0 0.000000 0.409091
ui user_u 1 0.111111 0 0.000000 0.409091
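If you want to see how the FairShare factor (alongside the partition's PriorityJobFactor) feeds into the priority of your own pending jobs, the standard Slurm sprio command breaks a pending job's priority down per factor:
sprio -l -u $USER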