1 NREL is a national laboratory of the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, operated by the Alliance for Sustainable Energy, LLC.
Peregrine New User Training
Ilene Carpenter
6/3/2015

2 Outline
Introduction to Peregrine and Gyrfalcon
Connecting to Peregrine
Transferring files
Filesystem overview
Running jobs on Peregrine
Introduction to modules
Scripting tools on Peregrine

3 Peregrine System Architecture
InfiniBand cluster with multiple node types:
– 4 login nodes
– Service nodes (various types)
– 1440 compute nodes, multiple types
Each node has at least 2 Intel Xeon processors and runs the Linux operating system.
Has an NFS file system and Lustre parallel file systems with 2.25 PB of storage.

5 Peregrine has several types of compute nodes
– dual 8-core Intel Xeon processors (16 cores), 32 GB of memory
– dual 12-core Intel Xeon processors (24 cores), 32 GB of memory
– dual 12-core Intel Xeon processors (24 cores), 64 GB of memory
– dual 8-core Intel Xeon processors (16 cores), 256 GB of memory
– dual 8-core Intel Xeon processors (16 cores), 32 GB of memory + 2 Intel Xeon Phi coprocessors

6 Gyrfalcon: Long-term data storage
Over 3 PB of storage
Sometimes called Mass Storage or the Mass Storage System (MSS)
Files are stored on disk and tape; migration is transparent to users.
Two copies are kept automatically, but the system is not backed up – if you delete a file, both copies are deleted!
Quotas on both the number of files and the total storage used

7 Long-term storage: the /mss file system
Each user has a directory in /mss/users/
Projects may request allocations of space shared by the project members.
– Files will be in /mss/projects/
The /mss file system is mounted on the Peregrine login nodes but not on the compute nodes.

8 Connecting to Peregrine
ssh to peregrine.nrel.gov:
% ssh -Y username@peregrine.nrel.gov
Windows users need to install a program that provides ssh, such as PuTTY.
Mac users can use Terminal or X11.
For more information see http://hpc.nrel.gov/users/connectivity/connecting-to-hpc-systems
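As a convenience (not covered on the slide), an SSH client configuration entry can save retyping the host and user names; this is only a sketch, and the "peregrine" alias and the user name are placeholders:
# Optional ~/.ssh/config entry
Host peregrine
    HostName peregrine.nrel.gov
    User username
    ForwardX11 yes
    ForwardX11Trusted yes
# then connect with:  ssh peregrine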

9 Linux CLI and shells
When you connect, you will be in a “shell” on a login node.
A shell is a command-line interface to the operating system on the node. The default is the bash shell.
If you are new to command-line interfaces and HPC systems, see the instructions online at http://hpc.nrel.gov/users/systems/peregrine/getting-started-for-users-new-to-high-performance-computing

10 Peregrine File Systems
“home” file system: /home/
– Store your scripts, programs and other small files here.
– Backed up
– 40 GB per user
– Use the homequota.check script to check how much space you’ve used.
“nopt” file system: /nopt
– The location of applications, libraries and tools
“projects” file system: /projects/
– A parallel file system for collaboration among project members. Useful for storing large input files and programs.
“scratch” file system: /scratch
– A high-performance parallel file system to be used by jobs doing lots of file I/O
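A minimal sketch of looking around these file systems from a login node; the per-user /scratch/$USER path and the bare homequota.check invocation are assumptions:
% cd /home/$USER       # scripts, programs and other small files (backed up, 40 GB quota)
% homequota.check      # report how much of your home quota is used
% ls /nopt             # centrally installed applications, libraries and tools
% ls /projects         # project directories shared by project members
% cd /scratch/$USER    # high-performance parallel space for job I/O (path is an assumption)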

11 How to Transfer Files
Laptop to/from Peregrine
– Mac OS X and Linux: scp, sftp, rsync in a terminal session
– Windows: WinSCP
– Any system: Globus Online
Peregrine to/from Gyrfalcon
– cp, mv, rsync from a login node (which can access both /mss and all of the Peregrine file systems)
From a computer at another computer center to Peregrine
– Globus Online
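A few hedged command-line examples of these transfers; all file, directory and project names are placeholders, and the per-user /scratch path is an assumption:
% scp results.tar.gz username@peregrine.nrel.gov:/scratch/username/      # laptop -> Peregrine
% rsync -av username@peregrine.nrel.gov:/projects/myproj/data/ ./data/   # Peregrine -> laptop
% cp -r /scratch/username/run1 /mss/users/username/                      # on a login node: Peregrine -> Gyrfalcon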

12 Running jobs on Peregrine
Peregrine uses Moab for job scheduling and workload management and Torque for resource management.
Node scheduling is exclusive (jobs don’t share nodes).
Use the qsub command:
% qsub <batch file>
– The batch file contains options in the PBS/Torque job submission language, preceded by #PBS, to specify resource limits such as the number of nodes and the wall time limit.
qsub -V exports all environment variables to the batch job
qsub -l <resource_list> requests specific resources
qsub -I requests an interactive session
qsub -q short puts a job in the “short” queue
Use man qsub for more options
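Combining the options above on one command line (my_job.sh and the resource numbers are placeholders):
% qsub -V -q short -l nodes=2,walltime=2:00:00 my_job.sh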

13 Allocations
Only jobs associated with a valid project allocation will run.
Use the -A option, either on the qsub command line or within the job script:
% qsub -A CSC000 -l nodes=1,walltime=0:45:00 -q short
This asks for 1 node for 45 minutes from the short queue and tells the system that the CSC000 project allocation should be used.

14 Interactive Jobs
You can use compute nodes for interactive work:
– execute commands and scripts interactively
– run applications with GUIs (such as MATLAB, COMSOL, etc.)
You request an interactive “job” with the -I option to the qsub command:
% qsub -I -q <queue> -A <allocation>
The same resource limits apply to interactive jobs as to non-interactive jobs. These depend on the queue you submit your interactive job to.
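For example, a filled-in version of the command above, using the debug queue and the example allocation handle CSC000 from the previous slide:
% qsub -I -q debug -A CSC000 -l nodes=1,walltime=1:00:00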

15 Asking for particular node types
All nodes assigned to your job will be the same type.
Use the -l feature=X option to request specific node types
– X can be “16core”, “24core”, “64GB”, “256GB” or “phi”
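For example, to request two of the 24-core, 64 GB nodes (my_job.sh and the wall time are placeholders):
% qsub -l feature=64GB -l nodes=2:ppn=24 -l walltime=1:00:00 my_job.sh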

16 Job Queues
debug – For quick access to nodes for debugging
short – For jobs that take less than 4 hours
batch – For jobs that take 2 days or less
large – For jobs that use at least 16 nodes
long – For jobs that take up to 10 days (by request)
bigmem – For jobs that need a lot of memory per node (by request)
phi – For jobs that use the Phi coprocessors

17 Debug queue
No production work allowed
Maximum run time is 1 hour
Max of two jobs per user
Max of two nodes per job
Queue has 2 of each type of node
Submit with qsub -q debug

18 Short queue
For short production jobs
Maximum run time is 4 hours
Up to 8 nodes per user
Up to 8 nodes per job
Has nodes of each type
Submit with qsub -q short

19 Batch queue
This is the default queue.
Max runtime of 2 days.
Max of 296 nodes per user.
Max of 288 nodes per job.
Has 740 nodes of the following types:
– 168 16-core, 32 GB nodes
– 374 24-core, 32 GB nodes
– 198 24-core, 64 GB nodes

20 Large queue
For jobs that use at least 16 nodes
Maximum run time is 1 day
Maximum number of nodes per user is 202
Maximum number of nodes per job is 202
Has 202 nodes with 24 cores and 32 GB of memory
Submit with qsub -q large

21 Long queue
For jobs that take more than 2 days to run.
Maximum run time is 10 days.
Access by request only; must have a justified need.
Maximum number of nodes per user is 120.
Maximum number of nodes per job is 120.
126 nodes with 24 cores and 32 GB of memory
160 nodes with 16 cores and 32 GB of memory
Submit with qsub -q long

22 Bigmem queue
By request only; must have a justified need.
Maximum run time is 10 days.
Maximum number of nodes per user is 46.
Maximum number of nodes per job is 46.
80 nodes with 24 cores and 64 GB of memory.
52 nodes with 16 cores and 256 GB of memory.

23 Phi queue
Intended for jobs that will use the Intel Xeon Phi coprocessors.
– Jobs may use both Phi and Xeon cores simultaneously.
Maximum run time is 2 days.
Maximum number of nodes per user is 32.
Maximum number of nodes per job is 32.
32 nodes with 16 Xeon cores, 32 GB of memory and 2 Xeon Phi coprocessors
These nodes run a slightly different software stack than nodes without Phi coprocessors.

24 Checking job status
qstat will show the state of your job (queued, running, etc.)
checkjob -v will give you information about why your job isn’t running yet
shownodes shows which nodes are potentially available for running your jobs
To get information about your job after it has run, use showhist.moab.pl
– Shows submit time, start time, end time, exit code, node list and other useful information.
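For example (the job ID 123456 is made up, and passing it as an argument to showhist.moab.pl is an assumption):
% qstat -u $USER            # your queued and running jobs
% checkjob -v 123456        # why job 123456 isn't running yet
% shownodes                 # nodes potentially available to your jobs
% showhist.moab.pl 123456   # history of a finished job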

25 Sample serial job script
#!/bin/bash
#PBS -l walltime=4:00:00
#PBS -l nodes=1
#PBS -N test1
#PBS -A CSC001
cd $PBS_O_WORKDIR
./a.out
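If the script above were saved as, say, serial_test.sh (a made-up name), it would be submitted from the directory containing a.out; cd $PBS_O_WORKDIR then returns the job to that submission directory on the compute node. The MPI and multi-task scripts on the next two slides are submitted the same way:
% qsub serial_test.sh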

26 Sample MPI job script
#!/bin/bash
#PBS -l walltime=4:00:00
#PBS -l nodes=4:ppn=16
#PBS -l feature=16core
#PBS -N test1
#PBS -A CSC001
cd $PBS_O_WORKDIR
mpirun -np 64 /path/to/executable

27 Sample script with multiple serial jobs
#!/bin/bash
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=24
#PBS -N wait_test
#PBS -o std.out
#PBS -e std.err
#PBS -A hpc-apps
cd $PBS_O_WORKDIR
JOBNAME=waitTest
# Run 8 jobs
N_JOB=8
for ((i=1; i<=N_JOB; i++))
do
  mkdir $JOBNAME.run$i
  cd $JOBNAME.run$i
  echo 10*10^$i | bc > input
  time ../pi_test > log &   # run each task in the background
  cd ..
done
# Wait for all background tasks to finish
wait
echo
echo "All done. Checking results:"
grep "PI" $JOBNAME.*/log

28 Introduction to modules
modules is a utility that allows users to easily change their software environment. It allows a system to have multiple versions of software and enables easy use of installed applications.
By default, two modules are loaded when you log in to Peregrine. These set up your environment to use the Intel compiler suite and the Intel MPI library.
The module list command shows what is currently loaded:
[icarpent@login2 ~]$ module list
Currently Loaded Modulefiles:
  1) comp-intel/13.1.3   2) impi-intel/4.1.1-13.1.3
The module avail command shows what modules are available for you to use.
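A short sketch of common module operations; whether a gcc module exists on Peregrine is an assumption, so check module avail first:
% module avail                # list all available modules
% module load gcc             # add a module to your environment (module name is hypothetical)
% module unload comp-intel    # remove a loaded module
% module list                 # confirm what is now loaded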

29 Why Scripting?
Productivity
Easily tuned to the domain
Performance where needed

30 Scripting Tools on Peregrine
Shells: bash, csh, dash, ksh, ksh93, sh, tcsh, zsh, tclsh
Perl, Octave, Lua, Lisp (Emacs, Guile), gnuplot, TclX, Java, Ruby
IDL, Python, R, .NET (Mono)
SQL (Postgres, SQLite, MySQL)
MATLAB, GAMS

31 HPC website and help email
http://hpc.nrel.gov/users/
To report a problem, send email to hpc-help@nrel.gov

32 Brief Introduction to Xeon Phi
The Phi chips in Peregrine are “coprocessors”:
– attached to the Xeon processors via PCIe
– run a special Linux OS with limited capabilities
– use a different instruction set from the Xeon
Each Phi coprocessor has a peak performance of ~1 TFLOPS (double precision).

33 Phi/MIC Architecture
~60 cores
– Each core has 4 hardware threads
– Each core is slower than a regular Xeon core
– 512-bit vectors (SIMD)
8 GB of memory
The Phi is designed for highly parallel, well-vectorized applications.

34 Application must scale to a high level of task parallelism!
(Chart taken from Colfax Developer Boot Camp slides)

35 Will your application benefit from the MIC architecture?
(Chart taken from Colfax Developer Boot Camp slides)

36 For more information, see the book by Jim Jeffers and James Reinders.

