Joker: Getting the most out of the slurm scheduler


1 Joker: Getting the most out of the slurm scheduler
By HPC/CIA team

2 Organization of supercomputers
User Network Head Node Compute Nodes

3 Slurm scheduler – the quirks
Powerful, but stupid

4 Slurm scheduler – the quirks
Powerful, but stupid
Language discrepancies
CPU vs core vs thread

5 Slurm scheduler – the quirks
Powerful, but stupid
Language discrepancies
CPU vs core vs thread
Head node
Compute nodes

6 Slurm scheduler – the quirks
Powerful, but stupid
Language discrepancies
CPU vs core vs thread
Head node
Compute nodes
1 node (old)

7 Slurm scheduler – the quirks
Powerful, but stupid
Language discrepancies
CPU vs core vs thread
Head node
Compute nodes
1 CPU + 1 CPU = 2 CPUs

8 Slurm scheduler – the quirks
Powerful, but stupid
Language discrepancies
CPU vs core vs thread
Head node
Compute nodes
8 cores + 8 cores = 16 cores

9 Slurm scheduler – the quirks
Powerful, but stupid
Language discrepancies
CPU vs core vs thread
Head node
Compute nodes
16 threads + 16 threads = 32 threads
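The CPU/core/thread arithmetic on these slides can be sketched in shell. The function name `threads_per_node` is hypothetical; the numbers (2 CPUs, 8 cores per CPU, 2 threads per core) are the ones shown for the old node type.

```shell
#!/bin/sh
# Hypothetical helper: hardware threads on one node.
# usage: threads_per_node CPUS CORES_PER_CPU THREADS_PER_CORE
threads_per_node() {
    echo $(( $1 * $2 * $3 ))
}

# Old node from the slides: 2 CPUs x 8 cores x 2 threads per core
threads_per_node 2 8 2   # prints 32
```

On a Linux node, `lscpu` reports the socket, core, and thread counts that feed this calculation.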

10 Slurm – basic properties
10 job limit
Quality of Service (QoS)
Fairness
Queues/Partitions with wall times
normal – 7d 1h (32 and 48 thread nodes, 64 and 128GB)
debug – 30 min (2 nodes, 64GB)
gpu – 7d 1h (1 node, 64GB)

11 Basic slurm script
#!/bin/sh
#SBATCH --job-name myJobName ## name that will show up in the queue
#SBATCH --output myJobName.o%j ## filename of the output; default is slurm-[jobID].out
#SBATCH --partition normal ## the partition to run in; default = normal
#SBATCH --nodes 1 ## number of nodes to use; default = 1
#SBATCH --ntasks 3 ## number of tasks (analyses) to run; default = 1
#SBATCH --cpus-per-task 16 ## the number of threads the code will use; default = 1
#SBATCH --time 0-00:05:00 ## time for analysis (day-hour:min:sec)
#SBATCH --mail-user ## your email address
#SBATCH --mail-type BEGIN ## slurm will email you when your job starts
#SBATCH --mail-type END ## slurm will email you when your job ends
#SBATCH --mail-type FAIL ## slurm will email you when your job fails
#SBATCH --get-user-env ## passes along environment settings
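A quick sanity check on the request above: 3 tasks with 16 threads each need 48 threads on the single node asked for. The sketch below assumes, as the script comments suggest, that one Slurm "CPU" here means one hardware thread; `fits_on_node` is a hypothetical helper, not a Slurm command.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when ntasks * cpus_per_task fits on one node.
# usage: fits_on_node NTASKS CPUS_PER_TASK THREADS_ON_NODE
fits_on_node() {
    [ $(( $1 * $2 )) -le "$3" ]
}

# The normal partition has 32-thread and 48-thread nodes.
for node_threads in 32 48; do
    if fits_on_node 3 16 "$node_threads"; then
        echo "3 x 16 threads fit on a ${node_threads}-thread node"
    else
        echo "3 x 16 threads do not fit on a ${node_threads}-thread node"
    fi
done
```

Under that assumption, this job can only start on one of the 48-thread nodes.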

12 srun as an interactive command
Running on the head node
module load R/322
R --vanilla
Running on the compute nodes
module load R/322
srun R
Check squeue
Where are you?

13 srun as an interactive command
Running on the head node
module load R/322
R --vanilla
Running on the compute nodes
module load R/322
srun R --vanilla
Check squeue

14 srun as an interactive command
Running on the head node
module load R/322
R
Running on the compute nodes
module load R/322
srun R
Check squeue
Where are you?

15 Another srun example -- make the file test.cmd
#!/bin/bash
date
echo "I'm sleeping on..."
hostname
sleep 40
echo "Done sleeping at"
date

16 Another srun example -- make the file test.cmd
#!/bin/bash
date
echo "I'm sleeping on..."
hostname
sleep 40
echo "Done sleeping at"
date
Don’t forget to make it executable!
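Making the file executable can be sketched as follows. The scratch directory and the shortened stand-in for test.cmd are assumptions for illustration; on the cluster you would run `chmod +x test.cmd` in the directory holding the real file.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened, hypothetical stand-in for test.cmd (no sleep).
printf '#!/bin/bash\nhostname\n' > test.cmd

[ -x test.cmd ] || echo "test.cmd is not executable yet"
chmod +x test.cmd          # add execute permission
[ -x test.cmd ] && echo "test.cmd is now executable"
```

`ls -l test.cmd` shows an `x` in the permission string once the file is executable.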

17 Another srun example -- run the file test.cmd
Did you forget to make it executable?

18 Another srun example -- run the file test.cmd
Sun Apr 23 20:42:09 MDT 2017
I'm sleeping on...
joker.nmsu.edu
Done sleeping at
Sun Apr 23 20:42:49 MDT 2017
What happened? What does it mean?

19 Using slurm to queue jobs
sbatch – submit a job
squeue – show the queue
scontrol – “scontrol show jobid #####” shows you what resources you requested
sinfo – shows what nodes are idle, mixed, or allocated/full (alloc) in each of the queues/partitions
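A minimal sketch of checking the queue with these commands. The wrapper function `queue_overview` is hypothetical; it only calls `squeue` and `sinfo` when they are installed, so it can be tried safely off the cluster.

```shell
#!/bin/sh
# Hypothetical wrapper: show my jobs and the node states, or print a
# notice when the Slurm commands are not available on this machine.
queue_overview() {
    if ! command -v squeue >/dev/null 2>&1; then
        echo "Slurm commands not found; run this on the cluster"
        return 0
    fi
    squeue -u "${USER:-$(id -un)}"   # only my jobs
    sinfo                            # node states in each partition
}

queue_overview
```

For a single job, `scontrol show jobid #####` with the job's ID in place of `#####` lists the resources that were requested.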

20 sbatch to queue jobs
Why?

21 sbatch to queue jobs
What queue? What time? What notifications?
What name? What code? What set-up?

22 sbatch example – make the file test.sh
#!/bin/sh
#SBATCH --job-name ex_slurm
#SBATCH --output ex_slurm.o%j
#SBATCH --partition normal
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --time 2:00
./test.cmd
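The `--time` value here, `2:00`, is read by Slurm as minutes:seconds, while the earlier script used the longer day-hour:min:sec form. A small sketch that converts both forms to seconds makes the difference concrete; `slurm_time_to_seconds` is a hypothetical helper and handles only these two forms.

```shell
#!/bin/sh
# Hypothetical helper: convert a Slurm --time value to seconds.
# Handles only   min:sec   and   day-hour:min:sec
slurm_time_to_seconds() {
    case $1 in
        *-*) echo "$1" | awk -F'[-:]' '{ print $1*86400 + $2*3600 + $3*60 + $4 }' ;;
        *)   echo "$1" | awk -F: '{ print $1*60 + $2 }' ;;
    esac
}

slurm_time_to_seconds 2:00         # prints 120
slurm_time_to_seconds 0-00:05:00   # prints 300
```

Since test.cmd sleeps for 40 seconds, a 2 minute limit leaves comfortable headroom.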

23 sbatch example – run the file test.sh
sbatch test.sh

24 Watching the system
squeue
Where am I in the queue? How long have I been running?
http://magneto.nmsu.edu/ganglia
Visual picture of the load on the nodes
sinfo
How many nodes are at maximum? What are my chances of getting on soon?
scontrol
What exactly did I ask for?

25 sbatch example – run the file test.sh
sbatch test.sh
Sun Apr 23 20:47:36 MDT 2017
I'm sleeping on...
joker-c3
Done sleeping at
Sun Apr 23 20:48:16 MDT 2017
What happened? What does it mean?

26 sbatch example – 2 nodes
#!/bin/sh
#SBATCH --job-name ex_slurm
#SBATCH --output ex_slurm.o%j
#SBATCH --partition normal
#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 1
#SBATCH --time 2:00
./test.cmd

27 sbatch example – 2 nodes
sbatch test.sh
Sun Apr 23 21:10:35 MDT 2017
I'm sleeping on...
joker-c3
Done sleeping at
Sun Apr 23 21:11:15 MDT 2017
What happened? What does it mean?

28 Using srun inside of sbatch
srun allows you to run things interactively if typed into the command line
Inside an sbatch script, it starts a process/instance on each resource requested
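One way to see the "one process per resource" behavior is to have each copy report who it is. The sketch below relies on `SLURM_PROCID` and `SLURM_NTASKS`, which srun sets for every task it launches; the function `describe_task` is hypothetical, and the fallback values cover a run outside of Slurm.

```shell
#!/bin/sh
# Hypothetical helper: report which task this copy is and where it runs.
# SLURM_PROCID = rank of this task, SLURM_NTASKS = total number of tasks.
describe_task() {
    echo "task ${SLURM_PROCID:-0} of ${SLURM_NTASKS:-1} on $(hostname)"
}

describe_task
```

Saved as an executable file and launched with `srun` inside a 2-task batch job, each of the two copies would print its own rank and host.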

29 sbatch+srun example – 2 nodes (try 3)
#!/bin/sh
#SBATCH --job-name ex_slurm
#SBATCH --output ex_slurm.o%j
#SBATCH --partition normal
#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 1
#SBATCH --time 2:00
srun ./test.cmd

30 sbatch+srun example – 2 nodes (try 3)
sbatch test.sh
Sun Apr 23 21:14:48 MDT 2017
I'm sleeping on...
joker-c3
joker-c4
Done sleeping at
Sun Apr 23 21:15:28 MDT 2017
What happened? What does it mean?

31 Other ways to use sbatch
If you have several steps that need to be done in sequence
#!/bin/sh
#SBATCH --job-name ex_slurm
#SBATCH --output ex_slurm.o%j
#SBATCH --partition normal
#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 1
#SBATCH --time 2:00
Step1
Step2
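Lines in a batch script run one after another, but by default the second step still starts if the first one fails. A small sketch of chaining the steps so that a failure stops the sequence; `step1` and `step2` are hypothetical placeholders for your own programs.

```shell
#!/bin/sh
# Hypothetical placeholders for the real programs.
step1() { echo "step 1 finished"; }
step2() { echo "step 2 finished"; }

# && starts step2 only if step1 exits successfully (exit status 0).
step1 && step2
```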

32 Other ways to use srun
If you have several jobs that are independent

33 Other ways to use srun
If you have several jobs that are independent
Make a file listing what needs to be done/started simultaneously/independently
myfile.txt
0 program1 variables input
1 program2 variables input

34 Other ways to use srun
Make sure the number of tasks matches the number of jobs to do
#!/bin/sh
#SBATCH --job-name ex_slurm
#SBATCH --output ex_slurm.o%j
#SBATCH --partition normal
#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 1
#SBATCH --time 2:00
srun --multi-prog myfile.txt
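Before submitting, the task count can be compared against the number of job lines in the file passed to `srun --multi-prog`. This is a sketch under the assumption that every line names exactly one task rank, as in myfile.txt; `count_multiprog_jobs` is a hypothetical helper and the scratch file stands in for the real one.

```shell
#!/bin/sh
# Hypothetical helper: count the job lines in a multi-prog file.
# Blank lines and comment lines (# in column one) are skipped.
count_multiprog_jobs() {
    grep -v -e '^[[:space:]]*$' -e '^#' "$1" | wc -l | tr -d ' '
}

# Recreate the two-line example in a scratch file.
conf=$(mktemp)
printf '0 program1 variables input\n1 program2 variables input\n' > "$conf"

ntasks=2    # must match "#SBATCH --ntasks" in the batch script
njobs=$(count_multiprog_jobs "$conf")
if [ "$njobs" -eq "$ntasks" ]; then
    echo "ok: $njobs jobs for $ntasks tasks"
else
    echo "mismatch: $njobs jobs but $ntasks tasks"
fi
```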

35 Other ways to use srun
Make sure the resources per job are correct
#!/bin/sh
#SBATCH --job-name ex_slurm
#SBATCH --output ex_slurm.o%j
#SBATCH --partition normal
#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 1
#SBATCH --time 2:00
srun --multi-prog myfile.txt

36 Other ways to use srun
If you have several jobs that are independent
Make a file listing what needs to be done/started simultaneously/independently
All of the programs must use the same number of resources
This means if one program needs 4 threads and the other 8, they need to be submitted as 2 jobs, not 1!

37 Other ways to use srun
If you have several jobs that are independent
Make a file listing what needs to be done/started simultaneously/independently
All of the programs must use the same number of resources
This means if one program needs 4 threads and the other 8, they need to be submitted as 2 jobs, not 1!
Great for small resource-using programs!

38 Thanks!
Thanks to Matt Henderson and Robert Kelly
Thanks to HPC-team
Po-Chou, Mohammed, Tracey, Strahinja, Dusan, Jelena
Please visit us at cia.nmsu.edu or hpc.nmsu.edu
Please contact us at
Remember, we love helping you get the most out of the system!


40 sbatch example – 2 nodes
#!/bin/sh
#SBATCH --job-name ex_slurm
#SBATCH --output ex_slurm.o%j
#SBATCH --partition normal
#SBATCH --nodes 2
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --time 2:00
./test.cmd

41 sbatch example – 2 nodes
sbatch test.sh
sbatch: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1
Submitted batch job
srun: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1
Sun Apr 23 21:03:33 MDT 2017
I'm sleeping on...
joker-c3
Done sleeping at
Sun Apr 23 21:04:13 MDT 2017
What happened? What does it mean?
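The warning appears because each node needs at least one task, so a request for 2 nodes and 1 task cannot be honored as written. A hypothetical pre-submit check along these lines would catch the mismatch before the job is queued; `check_nodes_vs_tasks` is not a Slurm command.

```shell
#!/bin/sh
# Hypothetical pre-submit check: --nodes must not be larger than --ntasks.
# usage: check_nodes_vs_tasks NODES NTASKS
check_nodes_vs_tasks() {
    if [ "$1" -gt "$2" ]; then
        echo "asked for $1 nodes but only $2 task(s); Slurm will reduce the node count"
        return 1
    fi
    echo "ok: $1 node(s) for $2 task(s)"
}

check_nodes_vs_tasks 2 1 || true   # the request that triggered the warning
check_nodes_vs_tasks 2 2           # the earlier 2-node example
```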

