This is a collection of frequently used Slurm commands.
create accounts, add a user, and set the user's default account
sacctmgr create account name=test1
sacctmgr create account name=test2
sacctmgr create user name=bob cluster=tardis account=test1
sacctmgr show assoc format=cluster,account,user | grep bob
sacctmgr show assoc format=Account%15,User,QOS | grep -e QOS -e bob
sacctmgr add user bob DefaultAccount=test2
sacctmgr show user bob
      User   Def Acct     Admin
---------- ---------- ---------
       bob      test2      None
report CPU and GPU usage for a week
sreport cluster AccountUtilizationByUser User=bob start=2024-04-29 end=2024-05-06 -t hour -T cpu,gres/gpu format=Accounts%21,TRESName,Used
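The one-week window can be computed instead of typed by hand. A minimal sketch, assuming GNU date (for the -d option); week_end is a hypothetical helper name, and the sreport line is only printed for review, not run:

```shell
# week_end is a hypothetical helper, not part of Slurm.
# Assumes GNU date (-d); prints the date 7 days after the given YYYY-MM-DD.
week_end() {
    date -u -d "$1 +7 days" +%F
}

start=2024-04-29
end=$(week_end "$start")
# Print the command for review instead of running it.
echo "sreport cluster AccountUtilizationByUser User=bob start=$start end=$end -t hour -T cpu,gres/gpu format=Accounts%21,TRESName,Used"
```

Once the printed line looks right, it can be pasted into a shell on the cluster.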
create reservation for user
scontrol create reservation ReservationName=pavlokh starttime=2024-03-17T20:04:00 endtime=2025-03-19T07:00:00 flags=ignore_jobs nodes=c001-c010 user=bob
create daily reservation for account
scontrol create reservation ReservationName=daily starttime=06:00:00 endtime=09:00:00 flags=ignore_jobs,daily nodes=c001 account=test
change job priority to the maximum
scontrol update priority=4294967293 job=19487792
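The number above is easier to remember as 2^32 - 3. A small sketch of where it comes from, under the assumption that Slurm stores priority as a 32-bit unsigned value and reserves the two highest values; max_priority is a hypothetical helper name, and the scontrol line is only printed:

```shell
# max_priority is a hypothetical helper, not a Slurm command.
# Assumption: the two highest 32-bit values are reserved by Slurm,
# which leaves 2^32 - 3 as the largest usable priority.
max_priority() {
    echo $(( 4294967296 - 3 ))
}

# Print the command for review instead of running it.
echo "scontrol update priority=$(max_priority) job=19487792"
```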
Show detailed sinfo output grouped by resource type
sinfo -o "%10P %5D %34N %5c %7m %37f %23G"
Release a held job; this was helpful for forcing Slurm to re-evaluate the job
scontrol release 19098440
see all collected information about a job
sacct -j 19361471 --format="ALL"
some fields are long; a width suffix such as %150 allows up to 150 characters
sacct -j 19361471 --format="ALL%150"
selected fields
sacct -j 19108751 --format="JobID,JobName%30,Submit,Start,End,Elapsed"
change job time limit
scontrol update jobid=2569329 TimeLimit=8-00:00:00
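The TimeLimit value above is written as days-hours:minutes:seconds. To sanity-check such a value it can be converted to minutes. A minimal sketch, assuming the input always has the full D-HH:MM:SS shape; timelimit_minutes is a hypothetical helper name, and seconds are ignored:

```shell
# timelimit_minutes is a hypothetical helper, not a Slurm command.
# Assumption: the input always has the full D-HH:MM:SS shape used above;
# shorter forms are not handled and the seconds part is ignored.
timelimit_minutes() {
    days=${1%%-*}
    hms=${1#*-}
    hours=${hms%%:*}
    rest=${hms#*:}
    minutes=${rest%%:*}
    # Strip one leading zero so values like "08" are not read as octal.
    hours=${hours#0}
    minutes=${minutes#0}
    echo $(( $days * 1440 + ${hours:-0} * 60 + ${minutes:-0} ))
}

timelimit_minutes 8-00:00:00   # 8 days = 11520 minutes
```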
show share information
sshare -l --format=Account,GrpTRESMins,TRESRunMins%215 -A account1
cluster usage by user bob for a week
sreport cluster AccountUtilizationByUser User=bob start=2024-05-01 end=2024-05-08 -t hour -T ALL
show QOS priority and limits per user
sacctmgr show qos format=name,priority,MaxTRESPerUser%20
change QOS priority and limits per user
sacctmgr modify qos normal set priority=25
sacctmgr modify qos high set priority=50
sacctmgr modify qos normal set maxtresperuser=cpu=800
sacctmgr modify qos high set maxtresperuser=cpu=800
sacctmgr modify qos normal set maxtresperuser=gres/gpu=80
sacctmgr modify qos high set maxtresperuser=gres/gpu=80
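When several QOS entries get the same limits, the repeated commands can be generated in a loop. A minimal sketch; qos_limit_cmds is a hypothetical helper name, it only prints the commands, and it assumes both TRES limits can be given in one comma-separated value (the same syntax as the GrpTRES example in this collection):

```shell
# qos_limit_cmds is a hypothetical helper; it only prints commands.
# Assumption: both TRES limits can be set in one comma-separated value.
qos_limit_cmds() {
    for qos in "$@"; do
        echo "sacctmgr modify qos $qos set maxtresperuser=cpu=800,gres/gpu=80"
    done
}

qos_limit_cmds normal high
```

The printed lines can be reviewed first and then run one by one.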
drain and resume node
scontrol update nodename=c003 state=drain reason=reinstall
scontrol update nodename=c003 state=resume
set shares and grptresmins
sacctmgr -i modify account test1 set share=1
sacctmgr -i modify account test1 set grptresmins=cpu=60
show shares and grptresmins
sacctmgr show assoc account=test1 format=cluster,account,user,share,grptresmins
show detailed overview of pending jobs (one per line)
squeue -p GPUQ -t PENDING --format='%.20i|%.4P|%.5D|%.8c|%.10m|%.11l|%.8u|%.8q|%.17r|%b'
JOBID|PART|NODES|MIN_CPUS|MIN_MEMORY| TIME_LIMIT| USER| QOS| REASON|TRES_PER_NODE
19635781|GPUQ| 2| 4| 4000M|10-00:00:00|username| normal|QOSMaxGRESPerUser|gres/gpu:a100:8
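Because the output is pipe-delimited, it is easy to post-process. A minimal sketch that counts pending jobs per reason, assuming REASON is the 9th field as in the format string above; count_by_reason is a hypothetical helper name, and the sample input is copied from the output above:

```shell
# count_by_reason is a hypothetical helper. It reads the pipe-delimited
# squeue output on stdin and prints "REASON COUNT" for each reason.
# Assumption: REASON is the 9th field, as in the format string above.
count_by_reason() {
    awk -F'|' 'NR > 1 {
        gsub(/^ +| +$/, "", $9)   # trim the padding added by the field width
        count[$9]++
    }
    END { for (r in count) print r, count[r] }'
}

# Sample input copied from the output above; on a cluster, pipe squeue instead.
count_by_reason <<'EOF'
JOBID|PART|NODES|MIN_CPUS|MIN_MEMORY| TIME_LIMIT| USER| QOS| REASON|TRES_PER_NODE
19635781|GPUQ| 2| 4| 4000M|10-00:00:00|username| normal|QOSMaxGRESPerUser|gres/gpu:a100:8
EOF
```

The first input line is skipped as the header, so the helper expects squeue to be run without the option that suppresses it.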
Add, remove, and show GrpTRES limits
sacctmgr modify user bob set GrpTRES=cpu=150,gres/gpu=5
sacctmgr modify user bob set GrpTRES=cpu=-1,mem=-1,gres/gpu=-1
sacctmgr show assoc format=Account,user,GrpTRES%100 | grep bob
Find all jobs run by a user on a given day
sacct -T -S2024-06-19-00:00 -E2024-06-19-23:59 --user bob -X -ojobid,jobname%10,user,start,end,state,node
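The same query can be parameterized by user and day. A minimal sketch; sacct_day_cmd is a hypothetical helper name, and it only prints the sacct command so it can be checked before it is run:

```shell
# sacct_day_cmd is a hypothetical helper; it prints the sacct command for
# one user ($1) and one calendar day ($2, YYYY-MM-DD) instead of running it.
sacct_day_cmd() {
    printf 'sacct -T -S%s-00:00 -E%s-23:59 --user %s -X -ojobid,jobname%%10,user,start,end,state,node\n' "$2" "$2" "$1"
}

sacct_day_cmd bob 2024-06-19
```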