There are four regular partitions to choose from when running jobs on Falcon; the default is the 'reg' partition. All of the general compute nodes are members of all four partitions.
Name | PriorityJobFactor | MaxTime | MaxSubmitPU |
---|---|---|---|
tiny | 72 | 6 hrs | unlimited |
short | 36 | 24 hrs | 1000 |
reg | 18 | 7 days | 500 |
long | 9 | unlimited | 50 |
It is best to pick the partition that most closely matches your job's run time: the shorter partitions carry a higher PriorityJobFactor, so your job is more likely to start promptly. Use the '-p' command line flag to pick the partition, e.g.:
sbatch -p tiny myjob.slurm
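The partition can also be set inside the batch script itself. Below is a minimal sketch, assuming a simple single-core job; the job name, time limit, and workload are placeholders, not requirements:

```bash
#!/bin/bash
#SBATCH --job-name=example          # placeholder job name
#SBATCH -p tiny                     # same effect as 'sbatch -p tiny' on the command line
#SBATCH --time=04:00:00             # 4 hours, within the 6 hr limit of 'tiny'

# replace with your actual workload
echo "running on $(hostname)"
```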
The general Falcon nodes each have 36 cores and either 128 or 256 GB of RAM. The MaxSubmitPU number is the maximum number of jobs (running plus pending) a user can have submitted at any given time.
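If you want to check the core and memory layout of the nodes yourself, sinfo can report it; this is just one possible format string (memory is shown in MB):

sinfo -N -p reg -o "%N %c %m"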
There is also a "high-memory" node available with 192 CPUs and 3072 GB of RAM. This node is in its own partition:
Name | PriorityJobFactor | MaxTime | MaxSubmitPU |
---|---|---|---|
high-memory | 72 | 336 hrs |
Please use this partition only if your job requires more than 256 GB of RAM.
There are two partitions with GPU nodes available:
Name | PriorityJobFactor | MaxTime | MaxSubmitPU | Nodes |
---|---|---|---|---|
gpu-volatile | 18 | 128 hrs | 4 | nodes01-05 |
gpu-interactive | 24 | 4 hrs | 1 | node05 |
You can request 1 or 2 GPUs for your job like this:
sbatch -p gpu-volatile --gres=gpu:1 my_script.slurm
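A minimal batch-script sketch of the same GPU request; the CPU, memory, and time values here are illustrative only:

```bash
#!/bin/bash
#SBATCH -p gpu-volatile             # GPU partition (preemptible, see the note below)
#SBATCH --gres=gpu:1                # request one GPU
#SBATCH --cpus-per-task=4           # a few cores to feed the GPU
#SBATCH --mem=32G                   # example memory request
#SBATCH --time=24:00:00             # within the 128 hr partition limit

# replace with your actual GPU workload
nvidia-smi
```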
GPU Node Specs
Name | GPU Qty | GPU Type | GPU RAM | CPU Cores | RAM |
---|---|---|---|---|---|
node01 | 8 | RTX A6000 | 48G | 128 | 1024G |
node02 | 7 | Quadro RTX 8000 | 48G | 40 | 512G |
node03 | 4 | L40 | 48G | 64 | 512G |
node04 | 4 | RTX A6000 | 48G | 64 | 512G |
node05 | 3 | RTX 4500 Ada | 24G | 32 | 512G |
These GPU nodes were purchased through the EPSCoR I-CREWS project and by individual researchers, so jobs in this partition are subject to preemption by users associated with those projects (hence the 'volatile' name). While node05 is in the gpu-volatile partition, it will be the last node to pick up and run jobs from that partition; node05 is intended for interactive session use.
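One way to start such an interactive session is with srun in the gpu-interactive partition; this is a sketch (the resource figures are placeholders), not a site-prescribed command:

srun -p gpu-interactive --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash -i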
So that Slurm does not crash nodes by oversubscribing their available RAM, each job is allocated 3 GB of RAM by default. If your job exceeds this amount, Slurm will forcibly stop it. Use the '--mem-per-cpu=' or '--mem=' arguments to request more. (Note: the --mem-per-cpu argument is in MB, whereas the --mem argument accepts the suffixes M | G | T.)
Most Falcon nodes have 128 GB of RAM (with 124 GB available to Slurm jobs), but several have been upgraded to 256 GB (250 GB available). If your job requires more than 124 GB of RAM, request the larger amount and it will be scheduled onto one of the higher-memory nodes, e.g.:
sbatch --mem=150G my_job_script.slurm
There is also one node with 3 TB of RAM; if your job requires more than 250 GB, request that amount with the '--mem' argument and submit the job to the 'high-memory' partition.
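A minimal sketch of such a high-memory submission; the 500 GB figure and program name are placeholders:

```bash
#!/bin/bash
#SBATCH -p high-memory              # the 3 TB node's partition
#SBATCH --mem=500G                  # example request above the 250 GB general-node limit
#SBATCH --time=48:00:00             # within the 336 hr partition limit

# placeholder workload
./my_large_memory_program
```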
If you want your job to use more than one CPU thread (core), you should explicitly request more cores so that nodes do not end up overloaded, e.g.:
sbatch --cpus-per-task=16 --mem-per-cpu=5000 my_job_script.slurm
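In a batch script this looks roughly like the following sketch; the OpenMP-style program is a placeholder, and the point is to keep the thread count in step with the cores Slurm allocates:

```bash
#!/bin/bash
#SBATCH --cpus-per-task=16          # 16 cores on one node
#SBATCH --mem-per-cpu=5000          # 5000 MB per core, ~80 GB total

# use exactly the cores Slurm allocated to this job
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_threaded_program
```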
Jobs in the shorter partitions are given a higher priority scaling factor than jobs in the longer partitions, and the higher the priority your job is assigned, the more likely it is to start sooner. We have also implemented the Slurm Fairshare feature. In short, the more you use Falcon, the lower the priority of your jobs compared with those of a user who has been consuming fewer compute resources. The algorithm also tracks usage by Account (University): if one University's users have been making heavier use of the compute resources, users from that University will have a lower priority.
You can view the current FairShare weighting:
ondemand ~ * sshare
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 0.000000 817254013 1.000000
root root 1 0.250000 577 0.000001 1.000000
bsu 1 0.250000 74222408 0.090821
isu 1 0.250000 25186 0.000015
ui 1 0.250000 743005840 0.909164
and for all users, add the '-a' flag (user names redacted here):
ondemand ~ * sshare -a
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 0.000000 817254013 1.000000
root root 1 0.250000 577 0.000001 1.000000
bsu 1 0.250000 74222408 0.090821
bsu user_a 1 0.125000 5350669 0.072090 0.500000
bsu user_b 1 0.125000 146304 0.001971 0.545455
bsu user_c 1 0.125000 68725319 0.925938 0.454545
bsu user_d 1 0.125000 0 0.000000 0.772727
bsu user_e 1 0.125000 114 0.000002 0.590909
bsu user_f 1 0.125000 0 0.000000 0.636364
bsu user_g 1 0.125000 0 0.000000 0.772727
bsu user_h 1 0.125000 0 0.000000 0.772727
isu 1 0.250000 25186 0.000015
isu user_i 1 0.250000 0 0.000000 0.954545
isu user_j 1 0.250000 0 0.000000 0.954545
isu user_k 1 0.250000 0 0.000000 0.954545
isu user_l 1 0.250000 25186 1.000000 0.818182
ui 1 0.250000 743005840 0.909164
ui user_m 1 0.111111 724227447 0.974726 0.045455
ui user_n 1 0.111111 237005 0.000319 0.181818
ui user_o 1 0.111111 0 0.000000 0.409091
ui user_p 1 0.111111 597585 0.000804 0.136364
ui user_q 1 0.111111 0 0.000000 0.272727
ui user_r 1 0.111111 17943801 0.024150 0.090909
ui user_s 1 0.111111 1 0.000000 0.227273
ui user_t 1 0.111111 0 0.000000 0.409091
ui user_u 1 0.111111 0 0.000000 0.409091
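To see how fairshare and the other factors feed into the priority of your own pending jobs, the sprio command breaks the priority value into its components; this is a sketch, and the output will vary:

sprio -u $USER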