Introduction to HPC Workshop, October 9, 2014. Rob Lane, HPC Support, Research Computing Services, CUIT.

Presentation transcript:

1 Introduction to HPC Workshop October 9 2014

2 Introduction
Rob Lane
HPC Support
Research Computing Services, CUIT

3 Introduction HPC Basics

4 Introduction First HPC Workshop

5 Yeti
2 head nodes
101 execute nodes
200 TB storage

6 Yeti
101 execute nodes:
– 38 x 64 GB
– 8 x 128 GB
– 35 x 256 GB
– 16 x 64 GB + Infiniband
– 4 x 64 GB + nVidia K20 GPU

7 Yeti
CPU:
– Intel E5-2650L
– 1.8 GHz
– 8 cores
– 2 per execute node

8 Yeti Expansion Round
– 66 new systems
– Faster CPU
– More Infiniband
– More GPU (nVidia K40)
– ETA January 2015

9 Yeti

10 HP S6500 Chassis

11 HP SL230 Server

12 Job Scheduler
Manages the cluster
Decides when a job will run
Decides where a job will run
We use Torque/Moab

13 Job Queues
Jobs are submitted to a queue
Jobs sorted in priority order
Not a FIFO

14 Access
Mac Instructions:
1. Run Terminal

15 Access
Windows Instructions:
1. Search for putty on Columbia home page
2. Select first result
3. Follow link to Putty download page
4. Download putty.exe
5. Run putty.exe

16 Access
Mac (Terminal):
$ ssh UNI@yetisubmit.cc.columbia.edu
Windows (Putty):
Host Name: yetisubmit.cc.columbia.edu

17 Work Directory
$ cd /vega/free/users/yourUNI
Replace "yourUNI" with your UNI, e.g.:
$ cd /vega/free/users/hpc2108

18 Copy Workshop Files
Files are in /tmp/workshop
$ cp /tmp/workshop/* .

19 Editing
No single obvious choice for editor:
vi – simple but difficult at first
emacs – powerful but complex
nano – simple but not really standard

20 nano
$ nano hellosubmit
"^" means "hold down control"
^a : go to beginning of line
^e : go to end of line
^k : delete line
^o : save file
^x : exit

21 hellosubmit
#!/bin/sh

# Directives
#PBS -N HelloWorld
#PBS -W group_list=yetifree
#PBS -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M UNI@columbia.edu
#PBS -m abe
#PBS -V

# Set output and error directories
#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI

# Print "Hello World"
echo "Hello World"

# Sleep for 10 seconds
sleep 10

# Print date and time
date

23 hellosubmit
#!/bin/sh

# Directives
#PBS -N HelloWorld
#PBS -W group_list=yetifree
#PBS -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M UNI@columbia.edu
#PBS -m abe
#PBS -V

33 hellosubmit
#!/bin/sh

# Directives
#PBS -N HelloWorld
#PBS -W group_list=yetifree
#PBS -l nodes=1:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M UNI@columbia.edu
#PBS -m n
#PBS -V

35 hellosubmit
# Set output and error directories
#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI

37 hellosubmit
# Print "Hello World"
echo "Hello World"

# Sleep for 10 seconds
sleep 10

# Print date and time
date

38 hellosubmit $ qsub hellosubmit

39 hellosubmit
$ qsub hellosubmit
298151.elk.cc.columbia.edu
$

41 qstat
$ qsub hellosubmit
298151.elk.cc.columbia.edu

$ qstat 298151
Job ID       Name        User      Time Use  S  Queue
-----------  ----------  --------  --------  -  ------
298151.elk   HelloWorld  hpc2108   0         Q  batch1

47 hellosubmit
$ qsub hellosubmit
298151.elk.cc.columbia.edu

$ qstat 298151
Job ID       Name        User      Time Use  S  Queue
-----------  ----------  --------  --------  -  ------
298151.elk   HelloWorld  hpc2108   0         Q  batch1

$ qstat 298151
qstat: Unknown Job Id Error 298151.elk.cc.columbia.edu

48 hellosubmit
$ ls -l
total 4
-rw------- 1 hpc2108 yetifree 398 Oct  8 22:13 hellosubmit
-rw------- 1 hpc2108 yetifree   0 Oct  8 22:44 HelloWorld.e298151
-rw------- 1 hpc2108 yetifree  41 Oct  8 22:44 HelloWorld.o298151

53 hellosubmit
$ cat HelloWorld.o298151
Hello World
Thu Oct 9 12:44:05 EDT 2014

54 hellosubmit
Any questions?

55 Interactive
Most jobs run as "batch"
Can also run interactive jobs
Get a shell on an execute node
Useful for development, testing, troubleshooting

56 Interactive
$ cat interactive
qsub -I -W group_list=yetifree -l walltime=5:00,mem=100mb

62 Interactive
$ qsub -I -W group_list=yetifree -l walltime=5:00,mem=100mb
qsub: waiting for job 298158.elk.cc.columbia.edu to start

63 Interactive
qsub: job 298158.elk.cc.columbia.edu ready.

[ASCII-art yeti banner]

+--------------------------------+
|                                |
| You are in an interactive job. |
|                                |
| Your walltime is 00:05:00      |
|                                |
+--------------------------------+

64 Interactive
$ hostname
charleston.cc.columbia.edu

65 Interactive
$ exit
logout
qsub: job 298158.elk.cc.columbia.edu completed
$

66 GUI
Can run GUIs in interactive jobs
Need an X server on your local system
See user documentation for more information

67 User Documentation
hpc.cc.columbia.edu
Go to "HPC Support"
Click on "Yeti user documentation"

68 Job Queues
Scheduler puts all jobs into a queue
Queue selected automatically
Queues have different settings

69 Job Queues

Queue        Time Limit  Memory Limit  Max. User Run
Batch 1      12 hours    4 GB          512
Batch 2      12 hours    16 GB         128
Batch 3      5 days      16 GB         64
Batch 4      3 days      None          8
Interactive  4 hours     None          4

70 qstat -q
$ qstat -q

server: elk.cc.columbia.edu

Queue            Memory  CPU Time  Walltime  Node  Run  Que  Lm  State
---------------- ------  --------  --------  ----  ---  ---  --  -----
batch1           4gb     --        12:00:00  --     42   15  --  E R
batch2           16gb    --        12:00:00  --    129   73  --  E R
batch3           16gb    --        120:00:0  --    148  261  --  E R
batch4           --      --        72:00:00  --     11   12  --  E R
interactive      --      --        04:00:00  --      0    1  --  E R
interlong        --      --        48:00:00  --      0    0  --  E R
route            --      --        --        --      0    0  --  E R
                                                  ----- -----
                                                    330   362

71 yetifree
Maximum processors limited – currently 4 maximum
Storage quota – 16 GB
No email support

72 yetifree
$ quota -s
Disk quotas for user hpc2108 (uid 242275):
Filesystem    blocks  quota   limit   grace  files  quota  limit  grace
hpc-cuit-storage-2.cc.columbia.edu:/free/
              122M    16384M  16384M         8      4295m  4295m

74 email
from: root
to: hpc2108@columbia.edu
date: Wed, Oct 8, 2014 at 11:41 PM
subject: PBS JOB 298161.elk.cc.columbia.edu

PBS Job Id: 298161.elk.cc.columbia.edu
Job Name: HelloWorld
Exec host: dublin.cc.columbia.edu/4
Execution terminated
Exit_status=0
resources_used.cput=00:00:02
resources_used.mem=8288kb
resources_used.vmem=304780kb
resources_used.walltime=00:02:02
Error_Path: localhost:/vega/free/users/hpc2108/HelloWorld.e298161
Output_Path: localhost:/vega/free/users/hpc2108/HelloWorld.o298161

82 Intern
Research Computing Services (RCS) is looking for an intern
Paid position
~10 hours a week
Will be on LionShare next week

83 MPI
Message Passing Interface
Allows applications to run across multiple computers

84 MPI
Edit MPI submit file
Load MPI environment module
Compile sample program

85 MPI
#!/bin/sh

# Directives
#PBS -N MpiHello
#PBS -W group_list=yetifree
#PBS -l nodes=3:ppn=1,walltime=00:01:00,mem=20mb
#PBS -M UNI@columbia.edu
#PBS -m abe
#PBS -V

# Set output and error directories
#PBS -o localhost:/vega/free/users/UNI
#PBS -e localhost:/vega/free/users/UNI

# Load mpi module.
module load openmpi

# Run mpi program.
mpirun mpihello

86 MPI
$ module load openmpi
$ which mpicc
/usr/local/openmpi/bin/mpicc
$ mpicc -o mpihello mpihello.c
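The slide above compiles mpihello.c, but the deck never shows its source. A minimal sketch of what such a program might contain (standard MPI calls only; the workshop's actual mpihello.c is an assumption, and this requires an MPI toolchain such as the openmpi module to build and run):

```c
/* mpihello.c - minimal MPI "Hello World" sketch (assumed; the
 * workshop's actual sample file was not shown in the slides). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank (id)     */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes    */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down MPI cleanly        */
    return 0;
}
```

With nodes=3:ppn=1 in the submit file, mpirun starts one copy per allocated processor and each copy prints its own rank.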

87 MPI
$ qsub mpisubmit
298501.elk.cc.columbia.edu

88 Questions? Any questions?

