
SLURM Partitions, Resources & Scheduling

Partitions

Standard

There are four regular partitions to choose from when running on Falcon. The default partition is the 'reg' partition. All of the general compute nodes are in all of these partitions.

Name    PriorityJobFactor   MaxTime     MaxSubmitPU
tiny    72                  6 hrs       unlimited
short   36                  24 hrs      1000
reg     18                  7 days      500
long    9                   unlimited   50

It is best to pick the partition that most closely matches your job's run time, as it is more likely to run promptly (due to the higher PriorityJobFactor). Use the '-p' command-line flag to select the partition, e.g.:

sbatch -p tiny myjob.slurm
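
You can also set the partition (and other options) inside the job script with #SBATCH directives. The script below is only a minimal sketch; the job name, time limit, and program are placeholders:

#!/bin/bash
#SBATCH -p tiny
#SBATCH --job-name=example
#SBATCH --time=02:00:00

# replace with the commands your job actually runs
./my_program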

The general Falcon nodes each have 36 cores and either 128 or 256 GB of RAM. The MaxSubmitPU number is the maximum number of jobs a user can have running and pending at a given time.
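
If you want to check the current limits and node availability of these partitions before submitting, the standard sinfo command can show them (the format string is just one possible selection of fields):

sinfo -p tiny,short,reg,long -o "%12P %10a %12l %6D %t"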

High Memory

There is also a "high-memory" node available with 192 CPUs and 3072G RAM. This node is in its own partition:

Name          PriorityJobFactor   MaxTime   MaxSubmitPU
high-memory   72                  336 hrs

Please use this partition only if your job requires more than 256G RAM.
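
For example, a submission to the high-memory partition might look like the following; the 500G figure and script name are placeholders, so request only what your job actually needs:

sbatch -p high-memory --mem=500G my_big_memory_job.slurm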

GPU

There are two partitions with GPU nodes available:

Name              PriorityJobFactor   MaxTime   MaxSubmitPU   Nodes
gpu-volatile      18                  128 hrs   4             node01-05
gpu-interactive   24                  4 hrs     1             node05

You can request 1 or 2 GPUs for your job like this:

sbatch -p gpu-volatile --gres=gpu:1 my_script.slurm
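
The same request can be written as #SBATCH directives inside the job script. This is only a sketch; the GPU count, core count, memory, and workload are illustrative placeholders:

#!/bin/bash
#SBATCH -p gpu-volatile
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G

# replace with your actual GPU workload
python train.py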

GPU Node Specs

Name     GPU Qty   GPU Type          GPU RAM   CPU Cores   RAM
node01   8         RTX A6000         48G       128         1024G
node02   7         Quadro RTX 8000   48G       40          512G
node03   4         L40               48G       64          512G
node04   4         RTX A6000         48G       64          512G
node05   3         RTX 4500 Ada      24G       32          512G

These GPU nodes were purchased through the EPSCoR I-CREWS project and by individual researchers, so jobs in this partition are subject to preemption by users associated with those projects (hence the 'volatile' name). While node05 is in the gpu-volatile partition, it will be the last node to pick up and run jobs from that partition; node05 is intended for interactive sessions.
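
For interactive work on node05, one common approach is to request a shell through srun in the gpu-interactive partition. The resource amounts below are only an example and must stay within that partition's 4-hour limit:

srun -p gpu-interactive --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash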

Resources

So that Slurm doesn't crash nodes by oversubscribing their available RAM, each job is allocated 3GB of RAM by default. If your job exceeds this amount, Slurm will forcibly stop it. Use the '--mem-per-cpu=' or '--mem=' arguments to request more. (Note: the --mem-per-cpu argument is in MB, whereas the --mem argument will accept the suffixes M | G | T.)

Most Falcon nodes have 128GB RAM (with 124GB available to Slurm jobs), but several have been upgraded to 256GB (250GB available). If your job requires more than 124GB of RAM, request the larger amount and your job will run on the nodes that have more RAM. There is also one node with 3TB RAM; if your job requires more than 250GB of RAM, request that amount with the '--mem' argument and submit the job to the 'high-memory' partition.

sbatch --mem=150G my_job_script.slurm

If you want your job to use more than one CPU thread (core), you should explicitly request more cores so that nodes do not end up overloaded.

sbatch --cpus-per-task=16 --mem-per-cpu=5000 my_job_script.slurm
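
To pick sensible values for these flags, it can help to check how much memory a previous job actually used once it has finished. The standard sacct command can report this (replace 123456 with your job ID):

sacct -j 123456 --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State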

Fairshare

Jobs in the shorter partitions are given a higher priority scaling factor than jobs in the longer partitions. The higher the priority your job is assigned, the more likely it is to run sooner. We have also implemented the Slurm Fairshare feature: the more you use Falcon, the lower the priority of your jobs compared with those of a user who has been consuming fewer compute resources. The algorithm also tracks usage by Account (University), so if one University's users have been making heavier use of the cluster, users from that University will have a lower priority.

You can view the current FairShare weighting:

ondemand ~ * sshare
Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
root                                          0.000000   817254013      1.000000            
 root                      root          1    0.250000         577      0.000001   1.000000 
 bsu                                     1    0.250000    74222408      0.090821            
 isu                                     1    0.250000       25186      0.000015            
 ui                                      1    0.250000   743005840      0.909164            

To see the weighting for all users, add the '-a' flag (user names redacted here):

ondemand ~ * sshare -a
Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
root                                          0.000000   817254013      1.000000            
 root                      root          1    0.250000         577      0.000001   1.000000 
 bsu                                     1    0.250000    74222408      0.090821            
  bsu                    user_a          1    0.125000     5350669      0.072090   0.500000 
  bsu                    user_b          1    0.125000      146304      0.001971   0.545455 
  bsu                    user_c          1    0.125000    68725319      0.925938   0.454545 
  bsu                    user_d          1    0.125000           0      0.000000   0.772727 
  bsu                    user_e          1    0.125000         114      0.000002   0.590909 
  bsu                    user_f          1    0.125000           0      0.000000   0.636364 
  bsu                    user_g          1    0.125000           0      0.000000   0.772727 
  bsu                    user_h          1    0.125000           0      0.000000   0.772727 
 isu                                     1    0.250000       25186      0.000015            
  isu                    user_i          1    0.250000           0      0.000000   0.954545 
  isu                    user_j          1    0.250000           0      0.000000   0.954545 
  isu                    user_k          1    0.250000           0      0.000000   0.954545 
  isu                    user_l          1    0.250000       25186      1.000000   0.818182 
 ui                                      1    0.250000   743005840      0.909164            
  ui                     user_m          1    0.111111   724227447      0.974726   0.045455 
  ui                     user_n          1    0.111111      237005      0.000319   0.181818 
  ui                     user_o          1    0.111111           0      0.000000   0.409091 
  ui                     user_p          1    0.111111      597585      0.000804   0.136364 
  ui                     user_q          1    0.111111           0      0.000000   0.272727 
  ui                     user_r          1    0.111111    17943801      0.024150   0.090909 
  ui                     user_s          1    0.111111           1      0.000000   0.227273 
  ui                     user_t          1    0.111111           0      0.000000   0.409091 
  ui                     user_u          1    0.111111           0      0.000000   0.409091
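
To see how fairshare (along with factors such as partition and job age) feeds into the priority of your pending jobs, you can use the standard sprio command:

sprio -l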