
1 Advanced High Performance Computing Workshop HPC 201 Dr. Charles J Antonelli LSAIT RSG October 30, 2013

2 Roadmap
Flux review
More advanced troubleshooting
Array & dependent scheduling
Graphical output
GPUs on Flux
Scientific applications: R, Python, MATLAB
Parallel programming
Debugging & tracing

3 Flux review

4 The Flux cluster …

5 A Flux node
12-40 Intel cores
48 GB – 1 TB RAM
Local disk
8 GPUs (optional)

6 Programming Models
Two basic parallel programming models:
Message-passing
- The application consists of several processes running on different nodes, communicating with each other over the network
- Used when the data are too large to fit on a single node and simple synchronization is adequate
- "Coarse parallelism"
- Implemented using MPI (Message Passing Interface) libraries
Multi-threaded
- The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives
- Used when the data fit into a single process and the communication overhead of the message-passing model is intolerable
- "Fine-grained parallelism" or "shared-memory parallelism"
- Implemented using OpenMP (Open Multi-Processing) compilers and libraries
Both
- The two models can be combined: MPI between nodes, OpenMP threads within a node
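To make the difference in toolchains concrete, here is a minimal build-and-run sketch for each model. The module name and source files (openmpi, mpi_hello.c, omp_hello.c) are illustrative assumptions, not Flux-specific instructions:

# Message-passing: compile against an MPI library and launch one
# process per core with mpirun (module/program names are illustrative)
module load openmpi
mpicc mpi_hello.c -o mpi_hello
mpirun -np 12 ./mpi_hello

# Multi-threaded: compile with OpenMP enabled and run a single process
# whose thread count is controlled by OMP_NUM_THREADS
gcc -fopenmp omp_hello.c -o omp_hello
export OMP_NUM_THREADS=12
./omp_hello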

7 Using Flux
Three basic requirements:
- A Flux login account
- A Flux allocation
- An MToken (or a Software Token)
Logging in to Flux:
ssh flux-login.engin.umich.edu
- Campus wired or MWireless
- VPN
- or ssh login.itd.umich.edu first

8 Cluster batch workflow
- You create a batch script and submit it to PBS
- PBS schedules your job, and it enters the flux queue
- When its turn arrives, your job executes the batch script
- Your script has access to all Flux applications and data
- When your script completes, anything it wrote to standard output and standard error is saved in files in your submission directory
- You can ask that email be sent to you when your job starts, ends, or aborts
- You can check on the status of your job at any time, or delete it if it's not doing what you want
- A short time after your job completes, it disappears from PBS
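In practice the whole cycle is a handful of commands; a minimal sketch (the script name and job id are illustrative):

qsub myjob.pbs       # submit; PBS prints the job id, e.g. 1234567
qstat -u $USER       # check the status of your jobs
qdel 1234567         # delete the job if it's not doing what you want
# after completion, output appears in the submission directory,
# e.g. myjob.o1234567 (stdout and stderr combined when -j oe is set)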

9 Loosely-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe
#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpirun ./c_ex01

10 Tightly-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe
#Your Code Goes Below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script

11 Copying data
Three ways to copy data to/from Flux:
- Use scp from the login server:
  scp flux-login.engin.umich.edu:hpc201/example563.png .
- Use scp from the transfer host:
  scp flux-xfer.engin.umich.edu:hpc201/example563.png .
- Use Globus Connect

12 Globus Online
Features:
- High-speed data transfer, much faster than SCP or SFTP
- Reliable & persistent
- Minimal client software: Mac OS X, Linux, Windows
GridFTP endpoints:
- Gateways through which data flow
- Exist for XSEDE, OSG, …
- UMich: umich#flux, umich#nyx
- Add your own client endpoint!
- Add your own server endpoint: contact flux-support@umich.edu
More information: http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp

13 Job Arrays
Submit copies of identical jobs
Invoked via qsub -t:
qsub -t array-spec pbsbatch.txt
where array-spec can be:
- m-n
- a,b,c
- m-n%slotlimit
e.g. qsub -t 1-50%10 submits fifty jobs, numbered 1 through 50, of which only ten can run simultaneously
$PBS_ARRAYID records the array identifier
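A sketch of an array-aware batch script, assuming hypothetical input files input.1 through input.50 and a hypothetical ./process program; each copy of the job selects its own input via $PBS_ARRAYID:

#PBS -N arrayjob
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=1,walltime=00:05:00
#PBS -j oe
cd $PBS_O_WORKDIR
# each array element works on a different input file
./process input.$PBS_ARRAYID > output.$PBS_ARRAYID

Submitted with qsub -t 1-50%10 arrayjob.pbs, this runs fifty copies, at most ten at a time.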

14 Dependent scheduling
Submit jobs whose scheduling depends on other jobs
Invoked via qsub -W:
qsub -W depend=type:jobid[:jobid]…
where type can be:
- after: schedule this job after the jobids have started
- afterany: schedule this job after the jobids have finished
- afterok: schedule this job after the jobids have finished with no errors
- afternotok: schedule this job after the jobids have finished with errors
- before: jobids are scheduled after this job starts
- beforeany: jobids are scheduled after this job completes
- beforeok: jobids are scheduled after this job completes with no errors
- beforenotok: jobids are scheduled after this job completes with errors
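For example, a three-stage pipeline could be chained as follows. The script names are hypothetical; qsub prints the new job id on standard output, which is captured here:

JOB1=$(qsub preprocess.pbs)                       # stage 1
JOB2=$(qsub -W depend=afterok:$JOB1 compute.pbs)  # runs only if stage 1 succeeded
qsub -W depend=afterany:$JOB2 collect.pbs         # runs after stage 2 finishes, errors or not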

15 Troubleshooting
showq [-r][-i][-b][-w user=uniq]  # running/idle/blocked jobs
qstat -f jobno                    # full info
qstat -n jobno                    # nodes/cores where job running
diagnose -p                       # job priority and components
pbsnodes                          # nodes, states, properties
pbsnodes -l                       # list nodes marked down
checkjob [-v] jobno               # why job jobno is not running
mdiag -a                          # allocations & users (flux)
tracejob jobno                    # info for completed jobs (not flux)
freenodes                         # aggregate node/core busy/free
mdiag -u uniq                     # allocations for user uniq (flux)
mdiag -a alloc_flux               # cores active, allocated (flux)
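A typical diagnostic sequence for a job that won't start (the job id is illustrative):

showq -w user=$USER       # the job shows as idle or blocked
checkjob -v 1234567       # explains why it isn't running (priority, limits, resources)
mdiag -a youralloc_flux   # are the allocation's cores already in use?
qstat -n 1234567          # once it runs, see which nodes/cores it received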

16 GPUs on Flux

17 Scientific applications

18 Parallel programming

19 Debugging & tracing

20 Debugging with GDB
Command-line debugger:
- Start programs or attach to running programs
- Display source program lines
- Display and change variables or memory
- Plant breakpoints, watchpoints
- Examine stack frames
Excellent tutorial documentation: http://www.gnu.org/s/gdb/documentation/

21 GDB symbols
GDB requires program symbols to be generated by the compiler
GDB will work without symbols, but you'd better be fluent in machine instructions and hexadecimal
Add the -g flag to your compilation:
gcc -g hello.c -o chello
gfortran -g hello.f90 -o fhello
Do not use -O with -g
- Most compilers won't optimize code for debugging
- gcc and gfortran will, but you often won't recognize the resulting source code
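A minimal session against a hypothetical hello.c, showing the compile-debug loop:

gcc -g hello.c -o chello   # compile with debugging symbols
gdb ./chello               # start gdb on the executable
(gdb) b main               # breakpoint at entry to main
(gdb) r                    # run; execution stops at the breakpoint
(gdb) n                    # step over one source line
(gdb) p argc               # print a variable
(gdb) bt                   # show the stack backtrace
(gdb) c                    # continue to completion
(gdb) q                    # quit gdb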

22 Useful GDB commands
gdb exec          start gdb on executable exec
gdb exec core     start gdb on executable exec with core file core
l [m,n]           list source
disas             disassemble function enclosing current instruction
disas func        disassemble function func
b func            set breakpoint at entry to func
b line#           set breakpoint at source line#
b *0xaddr         set breakpoint at address addr
i b               show breakpoints
d bp#             delete breakpoint bp#
r [args]          run program with optional args
bt                show stack backtrace
c                 continue execution from breakpoint
step              single-step one source line
next              single-step, don't step into functions
stepi             single-step one instruction
p var             display contents of variable var
p *var            display value pointed to by var
p &var            display address of var
p arr[idx]        display element idx of array arr
x 0xaddr          display hex word at addr
x *0xaddr         display hex word pointed to by addr
x/20x 0xaddr      display 20 words in hex starting at addr
i r               display registers
i r ebp           display register ebp
set var = expression   set variable var to expression
q                 quit gdb

23 Resources
http://cac.engin.umich.edu/started
- Cluster news, RSS feed, and outages listed here
http://cac.engin.umich.edu/
- Getting an account, training, basic tutorials
http://cac.engin.umich.edu/resources/systems/flux/
- Getting an allocation, Flux On-Demand, Flux info
For assistance: flux-support@umich.edu
- Read by a team of people
- Cannot help with programming questions, but can help with scheduler issues

24 References
1. CAC supported Flux software, http://cac.engin.umich.edu/resources/software/index.html (accessed August 2011).
2. Free Software Foundation, Inc., "GDB User Manual," http://www.gnu.org/s/gdb/documentation/ (accessed October 2011).
3. InfiniBand, http://en.wikipedia.org/wiki/InfiniBand (accessed August 2011).
4. Intel C and C++ Compiler 11.1 User and Reference Guide, http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/index.htm (accessed August 2011).
5. Intel Fortran Compiler 11.1 User and Reference Guide, http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm (accessed August 2011).
6. Lustre file system, http://wiki.lustre.org/index.php/Main_Page (accessed August 2011).
7. Torque User's Manual, http://www.clusterresources.com/torquedocs21/usersmanual.shtml (accessed August 2011).

