CSCS-USI Summer School (Lugano, 8-19 July 2013)
Hands-on exercise: NPB-MZ-MPI / BT
VI-HPS Team


Tutorial exercise objectives

Familiarise yourself with the usage of the VI-HPS tools
–complementary tools' capabilities & interoperability
Prepare to apply the tools productively to your application(s)
The exercise is based on a small, portable benchmark code
–unlikely to have significant optimisation opportunities
Optional (recommended) exercise extensions:
–analyse performance of alternative configurations
–investigate effectiveness of system-specific compiler/MPI optimisations and/or placement/binding/affinity capabilities
–investigate scalability and analyse scalability limiters
–compare performance on different HPC platforms
–…

Local Installation (cont.)

Copy the tutorial sources to your working directory
–(when available, it is generally advisable to use a parallel filesystem such as $WORK)

% cp -r /project/csstaff/courses/CSCS_USI_School/Day8/tutorial .
% cd tutorial/NPB3.3-MZ-MPI

NPB-MZ-MPI Suite

The NAS Parallel Benchmark suite (MPI+OpenMP version)
–available from
–3 benchmarks in Fortran77
–configurable for various sizes & classes

Move into the NPB3.3-MZ-MPI root directory
Subdirectories contain source code for each benchmark
–plus additional configuration and common code

The provided distribution has already been configured for the tutorial, such that it is ready to "make" one or more of the benchmarks and install them into a (tool-specific) "bin" subdirectory

% ls
bin/    BT-MZ/     common/  config/          jobscript/  LU-MZ/
Makefile  README   README.install  README.tutorial  SP-MZ/  sys/

Building an NPB-MZ-MPI Benchmark

Type "make" for instructions

% make
===========================================
=  NAS PARALLEL BENCHMARKS 3.3            =
=  MPI+OpenMP Multi-Zone Versions         =
=  F77                                    =
===========================================

To make a NAS multi-zone benchmark type

    make <benchmark-name> CLASS=<class> NPROCS=<nprocs>

where <benchmark-name> is "bt-mz", "lu-mz", or "sp-mz",
<class> is "S", "W", "A" through "F", and
<nprocs> is the number of processes
[...]
***************************************************************
* Custom build configuration is specified in config/make.def  *
* Suggested tutorial exercise configuration for HPC systems:  *
*   make bt-mz CLASS=B NPROCS=4                               *
***************************************************************

Building an NPB-MZ-MPI Benchmark

Specify the benchmark configuration
–benchmark name: bt-mz, lu-mz, or sp-mz
–the number of MPI processes: NPROCS=4
–the benchmark class (S, W, A, B, C, D, E): CLASS=B

% make bt-mz CLASS=B NPROCS=4
cd BT-MZ; make CLASS=B NPROCS=4 VERSION=
make: Entering directory 'BT-MZ'
cd ../sys; cc -o setparams setparams.c
../sys/setparams bt-mz 4 B
ftn -c -O3 -openmp bt.f
[...]
cd ../common; ftn -c -O3 -openmp timers.f
ftn -O3 -openmp -o ../bin/bt-mz_B.4 \
  bt.o initialize.o exact_solution.o exact_rhs.o set_constants.o \
  adi.o rhs.o zone_setup.o x_solve.o y_solve.o exch_qbc.o \
  solve_subs.o z_solve.o add.o error.o verify.o mpi_setup.o \
  ../common/print_results.o ../common/timers.o
Built executable ../bin/bt-mz_B.4
make: Leaving directory 'BT-MZ'

NPB-MZ-MPI / BT (Block Tridiagonal Solver)

What does it do?
–solves a discretized version of the unsteady, compressible Navier-Stokes equations in three spatial dimensions
–performs 200 time-steps on a regular 3-dimensional grid

Implemented in roughly 20 Fortran77 source modules
Uses MPI & OpenMP in combination
–4 processes with 4 threads each should be reasonable
–bt-mz_B.4 should run in around 30 seconds
–bt-mz_C.4 should take around 3-4x longer

NPB-MZ-MPI / BT Reference Execution (interactive)

Acquire a partition and launch as a hybrid MPI+OpenMP application

% cd bin
% salloc -N 1
salloc: granted job allocation
% export OMP_NUM_THREADS=4
% aprun -n 4 -d 4 ./bt-mz_B.4
NAS Parallel Benchmarks (NPB3.3-MZ-MPI) - BT-MZ MPI+OpenMP Benchmark
Number of zones: 8 x 8
Iterations: 200  dt:
Number of active processes: 4
Total number of threads: 16 ( 4.0 threads/process)
Time step 1
Time step 20
[...]
Time step 180
Time step 200
Verification Successful
BT-MZ Benchmark Completed.
Time in seconds =

Hint: save the benchmark output (or note the run time) to be able to refer to it later
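The launch geometry above can be checked arithmetically: 4 MPI ranks (aprun -n 4), each spawning 4 OpenMP threads (aprun -d 4 with OMP_NUM_THREADS=4), gives the 16 threads the benchmark reports. A minimal sketch in generic shell, independent of the Cray launcher:

```shell
# Hybrid launch geometry for the bt-mz_B.4 run above.
NPROCS=4                      # MPI ranks (aprun -n 4)
export OMP_NUM_THREADS=4      # OpenMP threads per rank (aprun -d 4)
TOTAL=$((NPROCS * OMP_NUM_THREADS))
echo "total threads: ${TOTAL}"    # matches "Total number of threads: 16"
```

The same arithmetic explains why the executable name bt-mz_B.4 encodes the process count: changing NPROCS requires a rebuild, while the thread count is chosen at run time.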

NPB-MZ-MPI / BT Reference Execution (batch)

Copy the jobscript and launch as a hybrid MPI+OpenMP application

% cd bin
% cp ../jobscript/todi/run.sbatch .
% less run.sbatch
% sbatch run.sbatch
% cat slurm-.out
NAS Parallel Benchmarks (NPB3.3-MZ-MPI) - BT-MZ MPI+OpenMP Benchmark
Number of zones: 8 x 8
Iterations: 200  dt:
Number of active processes: 4
Total number of threads: 16 ( 4.0 threads/process)
Time step 1
Time step 20
[...]
Time step 180
Time step 200
Verification Successful
BT-MZ Benchmark Completed.
Time in seconds =

Hint: save the benchmark output (or note the run time) to be able to refer to it later
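The distributed run.sbatch is not reproduced in the slides; a jobscript for this configuration would look roughly like the following sketch. The SLURM directives and the launch line are assumptions inferred from the interactive commands on the previous slide, not the actual tutorial file (inspect it with "less run.sbatch"):

```shell
#!/bin/sh
# Hypothetical sketch of a jobscript like jobscript/todi/run.sbatch;
# the real file may differ in directives, limits, and paths.
#SBATCH --nodes=1
#SBATCH --time=00:10:00
export OMP_NUM_THREADS=4      # 4 OpenMP threads per MPI rank
aprun -n 4 -d 4 ./bt-mz_B.4   # 4 ranks, depth 4, as in the interactive run
```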

Local tools installation

Currently installed in a non-default location
–need to extend the module search path manually
NB: use the older PrgEnv-gnu environment

% module use \
    /project/csstaff/courses/CSCS_USI_School/Day9/modulefiles
% module avail
-- /project/csstaff/courses/CSCS_USI_School/Day9/modulefiles --
scalasca/1.4.3    scalasca/2.0-rc1(default)
scorep/1.1.1      scorep/1.1.1-cuda(default)
[...]
% module load scorep scalasca
% module switch PrgEnv-cray PrgEnv-gnu
% module switch cray-mpich cray-mpich2/5.6.5
% module switch gcc gcc/4.7.2
% module switch cray-libsci cray-libsci/

Tutorial Exercise Steps

Edit config/make.def to adjust the build configuration
–modify the specification of the compiler/linker: MPIF77
Make clean and build a new tool-specific executable
Change to the directory containing the new executable before running it with the desired tool configuration

% make clean
% make bt-mz CLASS=B NPROCS=4
Built executable ../bin.$(TOOL)/bt-mz_B.4
% cd bin.$(TOOL)
% export …
% OMP_NUM_THREADS=4 aprun -n 4 -d 4 ./bt-mz_B.4

NPB-MZ-MPI / BT: config/make.def

# SITE- AND/OR PLATFORM-SPECIFIC DEFINITIONS
#
# Items in this file may need to be changed for each platform.
#

# The Fortran compiler used for MPI programs
MPIF77 = ftn                                          <- Default (no instrumentation)

# Alternative variants to perform instrumentation
#MPIF77 = psc_instrument -u user,mpi,omp -s ${PROGRAM}.sir ftn
#MPIF77 = tau_f90.sh
#MPIF77 = scalasca -instrument ftn
#MPIF77 = vtf77 -vt:hyb -vt:f77 ftn
#MPIF77 = scorep --mpi --user ftn

# PREP is a generic preposition macro for instrumentation preparation
#MPIF77 = $(PREP) ftn

# This links MPI Fortran programs; usually the same as ${MPIF77}
FLINK = $(MPIF77)
...

Hint: uncomment one of these alternative compiler wrappers to perform instrumentation
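The PREP line above is the most flexible variant: once MPIF77 = $(PREP) ftn is active, an instrumentation wrapper can be prepended at make time without editing make.def again. A toy demonstration of the mechanism follows; "echo" stands in for actually invoking the ftn compiler, and "scorep" is just an example wrapper name, so only the resulting command lines are printed:

```shell
# Write a miniature Makefile using the same PREP idiom as config/make.def.
# 'echo' replaces the real compiler so the command line is printed, not run.
printf 'PREP =\nMPIF77 = echo $(PREP) ftn\nall:\n\t@$(MPIF77) -c bt.f\n' > prep-demo.mk

make -f prep-demo.mk                 # prints: ftn -c bt.f
make -f prep-demo.mk PREP=scorep     # prints: scorep ftn -c bt.f
```

Command-line variable assignments override definitions inside the Makefile, which is why PREP=scorep takes effect without any file edit; the real build would be invoked analogously, e.g. make bt-mz CLASS=B NPROCS=4 PREP="<wrapper>".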