11th VI-HPS Tuning Workshop, 22-25 April 2013, MdS, Saclay1 Hands-on exercise: NPB-MZ-MPI / BT VI-HPS Team.

Slides:



Advertisements
Similar presentations
05/11/2001 CPT week Natalia Ratnikova, FNAL 1 Software Distribution in CMS Distribution unitFormContent Version of SCRAM managed project.
Advertisements

Profiling your application with Intel VTune at NERSC
SDM Center Coupling Parallel IO with Remote Data Access Ekow Otoo, Arie Shoshani, Doron Rotem, and Alex Sim Lawrence Berkeley National Lab.
Thoughts on Shared Caches Jeff Odom University of Maryland.
Cygwin Linux for Windows Desktop Paul Stuyvesant.
DKRZ Tutorial 2013, Hamburg1 Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Frank Winkler 1),
Eclipse Introduction Dwight Deugo Nesa Matic
Dr. Muhammed Al-Mulhem 1ICS ICS 535 Design and Implementation of Programming Languages Part 1 OpenMP -Example ICS 535 Design and Implementation.
Sharepoint Portal Server Basics. Introduction Sharepoint server belongs to Microsoft family of servers Integrated suite of server capabilities Hosted.
A Mini UNIX Tutorial. What’s UNIX?  An operating system run on many servers/workstations  Invented by AT&T Bell Labs in late 60’s  Currently there.
Using Ant to build J2EE Applications Kumar
Using Macs and Unix Nancy Griffeth January 6, 2014 Funding for this workshop was provided by the program “Computational Modeling and Analysis of Complex.
Introduction to The Linaro Toolchain Embedded Processors Training Multicore Software Applications Literature Number: SPRPXXX 1.
1 Introduction to Tool chains. 2 Tool chain for the Sitara Family (but it is true for other ARM based devices as well) A tool chain is a collection of.
SEC(R) 2008 Intel® Concurrent Collections for C++ - a model for parallel programming Nikolay Kurtov Software and Services.
Task Farming on HPCx David Henty HPCx Applications Support
Automatic trace analysis with Scalasca Alexandru Calotoiu German Research School for Simulation Sciences (with content used with permission from tutorials.
BSC tools hands-on session. 2 Objectives Copy ~nct00001/tools-material into your ${HOME} –cp –r ~nct00001/tools-material ${HOME} Contents of.
FireRMS SQL Audit, Archiving & Purging Presented by Laura Small FireRMS Quality Assurance.
Unix Primer. Unix Shell The shell is a command programming language that provides an interface to the UNIX operating system. The shell is a “regular”
Donald Stark National Center for Atmospheric Research (NCAR) The Developmental Testbed Center (DTC) Wednesday 29 June, 2011 GSI Fundamentals (1): Setup.
DKRZ Tutorial 2013, Hamburg Analysis report examination with CUBE Markus Geimer Jülich Supercomputing Centre.
Analysis report examination with CUBE Alexandru Calotoiu German Research School for Simulation Sciences (with content used with permission from tutorials.
Extending the Discovery Environment: Tool Integration and Customization.
DKRZ Tutorial 2013, Hamburg1 Hands-on: NPB-MZ-MPI / BT VI-HPS Team.
Trilinos 101: Getting Started with Trilinos November 7, :30-9:30 a.m. Mike Heroux Jim Willenbring.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
SC’13: Hands-on Practical Hybrid Parallel Application Performance Engineering1 Score-P Hands-On CUDA: Jacobi example.
SC’13: Hands-on Practical Hybrid Parallel Application Performance Engineering Automatic trace analysis with Scalasca Markus Geimer, Brian Wylie, David.
®® Microsoft Windows 7 for Power Users Tutorial 13 Using the Command-Line Environment.
12th VI-HPS Tuning Workshop, 7-11 October 2013, JSC, Jülich1 Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca,
Compiler BE Panel IDC HPC User Forum April 2009 Don Kretsch Director, Sun Developer Tools Sun Microsystems.
Makefiles CISC/QCSE 810. BeamApp and Tests in C++ 5 source code files After any modification, changed source needs to be recompiled all object files need.
Apache Web Server v. 2.2 Reference Manual Chapter 1 Compiling and Installing.
Makefiles. makefiles Problem: You are working on one part of a large programming project (e. g., MS Word).  It consists of hundreds of individual.cpp.
Introduction Use of makefiles to manage the build process Declarative, imperative and relational rules Environment variables, phony targets, automatic.
Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Alexandru Calotoiu German Research School for.
Parallel Computing Through MPI Technologies Author: Nyameko Lisa Supervisors: Prof. Elena Zemlyanaya, Prof Alexandr P. Sapozhnikov and Tatiana F. Sapozhnikov.
Cygwin Linux for Windows Desktop Paul Stuyvesant.
Tutorial build Main ideas –Reuse as much previously obtained configuration information as possible: from Babel, cca-spec-babel, etc. –Extract all irrelevant.
DDT Debugging Techniques Carlos Rosales Scaling to Petascale 2010 July 7, 2010.
Renesas Technology America Inc. 1 SKP8CMINI Tutorial 2 Creating A New Project Using HEW.
Introduction to Eclipse CSC 216 Lecture 3 Ed Gehringer Using (with permission) slides developed by— Dwight Deugo Nesa Matic
Martin Schulz Center for Applied Scientific Computing Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory, P. O. Box 808, Livermore,
Installing and Running the WPS Michael Duda 2006 WRF-ARW Summer Tutorial.
Hands-on: NPB-MZ-MPI / BT VI-HPS Team. 10th VI-HPS Tuning Workshop, October 2012, Garching Local Installation VI-HPS tools accessible through the.
Semi-Automatic patch upgrade kit
Introduction to OpenMP Eric Aubanel Advanced Computational Research Laboratory Faculty of Computer Science, UNB Fredericton, New Brunswick.
Manage Directories and Files in Linux Part 2. 2 Identify File Types in the Linux System The file types in Linux referred to as normal files and directories.
Threaded Programming Lecture 2: Introduction to OpenMP.
Renesas Technology America Inc. 1 SKP8CMINI Tutorial 2 Creating A New Project Using HEW.
An Overview of ROMS Code Kate Hedstrom, ARSC April 2007.
CSCS-USI Summer School (Lugano, 8-19 July 2013)1 Hands-on exercise: NPB-MZ-MPI / BT VI-HPS Team.
Makefiles1 MAKEFILES Purpose: contain UNIX commands and will run them in a specified sequence. Syntax Definition : { Section-name: {unix command #1} …
Installing and Running the WPS Michael Duda 2006 WRF-ARW Summer Tutorial.
Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Markus Geimer 1), Peter Philippen 1) Andreas.
How to configure, build and install Trilinos November 2, :30-9:30 a.m. Jim Willenbring.
SC‘13: Hands-on Practical Hybrid Parallel Application Performance Engineering Hands-on example code: NPB-MZ-MPI / BT (on Live-ISO/DVD) VI-HPS Team.
Debugging Lab Antonio Gómez-Iglesias Texas Advanced Computing Center.
Introduction to HPC Debugging with Allinea DDT Nick Forrington
1 COMP 3500 Introduction to Operating Systems Project 4 – Processes and System Calls Part 3: Adding System Calls to OS/161 Dr. Xiao Qin Auburn University.
E Copyright © 2006, Oracle. All rights reserved. Using SQL Developer.
A+ Guide to Managing and Maintaining Your PC, 7e Chapter 2 Introducing Operating Systems.
Travel Modelling Group Technical Advisory Committee
Python’s Modules Noah Black.
INTRODUCING Adams/CHASSIS
Advanced TAU Commander
Operation System Program 4
A configurable binary instrumenter
Working in The IITJ HPC System
Presentation transcript:

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay1 Hands-on exercise: NPB-MZ-MPI / BT VI-HPS Team

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay2 Tutorial exercise objectives Familiarise with usage of VI-HPS tools –complementary tools’ capabilities & interoperability Prepare to apply tools productively to your applications(s) Exercise is based on a small portable benchmark code –unlikely to have significant optimisation opportunities Optional (recommended) exercise extensions –analyse performance of alternative configurations –investigate effectiveness of system-specific compiler/MPI optimisations and/or placement/binding/affinity capabilities –investigate scalability and analyse scalability limiters –compare performance on different HPC platforms –…

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay3 Local UNITE installation VI-HPS tools accessible through the UNITE framework –Currently installed in a non-default location Need to extend the module search path manually –Load the UNITE meta-module % module use /gpfslocal/pub/vihps/UNITE/local % module load UNITE % module avail /gpfslocal/pub/vihps/UNITE/modulefiles/tools [...] scalasca/2.0b3-intel13-intelmpi(default) scorep/1.1.1-intel13-intel-papi-cuda-static scorep/1.1.1-intel13-intelmpi-papi-static(default) tau/ intelmpi-intel13 vampir/8.1-dev [...] maqao/2.0.0

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay4 Local installation for workshop VI-HPS-TW meta-module loads default set of VI-HPS tools (and corresponding compiler+MPI) for workshop % module load VI-HPS-TW VI-HPS-TW loaded (load) intel version (load) intelmpi version scorep/1.1.1-intel13-intelmpi-papi-static loaded cube4/4.1.6-gnu loaded scalasca/2.0b3-intel13-intelmpi loaded tau/ intelmpi-intel13 loaded vampir/8.1-dev loaded % module list 1) UNITE/1.1 6) scalasca/2.0b3-intel13-intelmpi 2) intel/ ) tau/ intelmpi-intel13 3) intelmpi/ ) vampir/8.1-dev 4) cube4/4.1.6-gnu 9) VI-HPS-TW/intel13 5) scorep/1.1.1-intel13-intelmpi-papi-static

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay5 Local Installation (cont.) Module help provides location of tutorial sources Copy tutorial sources to your working directory –(When available, generally advisable to use a parallel filesystem such as $WORK) % module help VI-HPS-TW [...] VI-HPS-TW: VI-HPS Tuning Workshop environment for intel/ compilers and intelmpi/4.0.3 MPI VI-HPS tutorial sources: $UNITE_ROOT/tutorial VI-HPS sample experiments: $UNITE_ROOT/samples [...] % cp -r $UNITE_ROOT/tutorial. % cd tutorial/NPB3.3-MZ-MPI

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay6 NPB-MZ-MPI Suite The NAS Parallel Benchmark suite (MPI+OpenMP version) –Available from –3 benchmarks in Fortran77 –Configurable for various sizes & classes Move into the NPB3.3-MZ-MPI root directory Subdirectories contain source code for each benchmark –plus additional configuration and common code The provided distribution has already been configured for the tutorial, such that it's ready to “make” one or more of the benchmarks and install them into a (tool-specific) “bin” subdirectory % ls bin/ common/ jobscript/ Makefile README.install SP-MZ/ BT-MZ/ config/ LU-MZ/ README README.tutorial sys/

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay7 Building an NPB-MZ-MPI Benchmark Type “make” for instructions % make =========================================== = NAS PARALLEL BENCHMARKS 3.3 = = MPI+OpenMP Multi-Zone Versions = = F77 = =========================================== To make a NAS multi-zone benchmark type make CLASS= NPROCS= where is “bt-mz”, “lu-mz”, or “sp-mz” is “S”, “W”, “A” through “F” is number of processes [...] *************************************************************** * Custom build configuration is specified in config/make.def * * Suggested tutorial exercise configuration for HPC systems: * * make bt-mz CLASS=B NPROCS=4 * ***************************************************************

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay8 Building an NPB-MZ-MPI Benchmark Specify the benchmark configuration –benchmark name: bt-mz, lu-mz, sp-mz –the number of MPI processes: NPROCS=4 –the benchmark class (S, W, A, B, C, D, E): CLASS=B % make bt-mz CLASS=B NPROCS=4 cd BT-MZ; make CLASS=B NPROCS=4 VERSION= make: Entering directory 'BT-MZ' cd../sys; cc -o setparams setparams.c../sys/setparams bt-mz 4 B mpiifort -c -O3 -openmp bt.f [...] cd../common; mpiifort -c -O3 -openmp timers.f mpiifort –O3 -openmp -o../bin/bt-mz_B.4 \ bt.o initialize.o exact_solution.o exact_rhs.o set_constants.o \ adi.o rhs.o zone_setup.o x_solve.o y_solve.o exch_qbc.o \ solve_subs.o z_solve.o add.o error.o verify.o mpi_setup.o \../common/print_results.o../common/timers.o Built executable../bin/bt-mz_B.4 make: Leaving directory 'BT-MZ'

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay9 NPB-MZ-MPI / BT (Block Tridiagonal Solver) What does it do? –Solves a discretized version of unsteady, compressible Navier- Stokes equations in three spatial dimensions –Performs 200 time-steps on a regular 3-dimensional grid Implemented in 20 or so Fortran77 source modules Uses MPI & OpenMP in combination –4 processes with 4 threads each should be reasonable –bt-mz_B.4 should run in around 13 seconds –bt-mz_C.4 should take around 3-4x longer

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay10 NPB-MZ-MPI / BT Reference Execution Copy jobscript and launch as a hybrid MPI+OpenMP application % cd bin % cp../jobscript/mds/run.ll. % less run.ll % llsubmit run.ll % cat NPB_B. NAS Parallel Benchmarks (NPB3.3-MZ-MPI) - BT-MZ MPI+OpenMP Benchmark Number of zones: 8 x 8 Iterations: 200 dt: Number of active processes: 4 Total number of threads: 16 ( 4.0 threads/process) Time step 1 Time step 20 [...] Time step 180 Time step 200 Verification Successful BT-MZ Benchmark Completed. Time in seconds = Hint: save the benchmark output (or note the run time) to be able to refer to it later

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay11 Tutorial Exercise Steps Edit config/make.def to adjust build configuration –Modify specification of compiler/linker: MPIF77 Make clean and build new tool-specific executable Change to the directory containing the new executable before running it with the desired tool configuration % make clean % make bt-mz CLASS=B NPROCS=4 Built executable../bin.$(TOOL)/bt-mz_B.4 % cd bin.$(TOOL) % export … % OMP_NUM_THREADS=4 mpiexec –n 4./bt-mz_B.4

11th VI-HPS Tuning Workshop, April 2013, MdS, Saclay12 NPB-MZ-MPI / BT: config/make.def # SITE- AND/OR PLATFORM-SPECIFIC DEFINITIONS # # Items in this file may need to be changed for each platform. # # # The Fortran compiler used for MPI programs # MPIF77 = mpiifort -fpp # Alternative variants to perform instrumentation #MPIF77 = psc_instrument -u user,mpi,omp –s ${PROGRAM}.sir mpiifort #MPIF77 = tau_f90.sh #MPIF77 = scalasca -instrument mpiifort #MPIF77 = vtf77 –vt:hyb -vt:f77 mpiifort #MPIF77 = scorep --user mpiifort # PREP is a generic preposition macro for instrumentation preparation #MPIF77 = $(PREP) mpiifort # This links MPI Fortran programs; usually the same as ${MPIF77} FLINK = $(MPIF77)... Hint: uncomment one of these alternative compiler wrappers to perform instrumentation Default (no instrumentation)