Www.bsc.es Belgrade, 25 September 2014 George S. Markomanolis, Oriol Jorba, Kim Serradell Performance analysis Tools: a case study of NMMB on Marenostrum.

Slides:



Advertisements
Similar presentations
Barcelona Supercomputing Center. The BSC-CNS objectives: R&D in Computer Sciences, Life Sciences and Earth Sciences. Supercomputing support to external.
Advertisements

Performance Analysis Tools for High-Performance Computing Daniel Becker
PERFORMANCE ANALYSIS OF MULTIPLE THREADS/CORES USING THE ULTRASPARC T1 (NIAGARA) Unique Chips and Systems (UCAS-4) Dimitris Kaseridis & Lizy K. John The.
Automated Instrumentation and Monitoring System (AIMS)
Performance Visualizations using XML Representations Presented by Kristof Beyls Yijun Yu Erik H. D’Hollander.
The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.
Tools for Engineering Analysis of High Performance Parallel Programs David Culler, Frederick Wong, Alan Mainwaring Computer Science Division U.C.Berkeley.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment Chapter 11: Monitoring Server Performance.
Chapter 14 Chapter 14: Server Monitoring and Optimization.
Generic Simulator for Users' Movements and Behavior in Collaborative Systems A Application D Design D Document Alex Surguch, Niv Saar, Mattan Margalith,
Benefits of sampling in tracefiles Harald Servat Program Development for Extreme-Scale Computing May 3rd, 2010.
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
MPI Program Performance. Introduction Defining the performance of a parallel program is more complex than simply optimizing its execution time. This is.
VTF Applications Performance and Scalability Sharon Brunett CACR/Caltech ASCI Site Review October 28,
1 Presenter: Chien-Chih Chen Proceedings of the 2002 workshop on Memory system performance.
10th Argo data management 2009 Toulouse User’s manual comment Here is a list of comments received on User’s manual version 2.2 since August 28 th. Jean-Philippe.
1 The VAMPIR and PARAVER performance analysis tools applied to a wet chemical etching parallel algorithm S. Boeriu 1 and J.C. Bruch, Jr. 2 1 Center for.
Authors: Tong Li, Dan Baumberger, David A. Koufaty, and Scott Hahn [Systems Technology Lab, Intel Corporation] Source: 2007 ACM/IEEE conference on Supercomputing.
BSC tools hands-on session. 2 Objectives Copy ~nct00001/tools-material into your ${HOME} –cp –r ~nct00001/tools-material ${HOME} Contents of.
Tool Visualizations, Metrics, and Profiled Entities Overview Adam Leko HCS Research Laboratory University of Florida.
Multi-core Programming Thread Profiler. 2 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Topics Look at Intel® Thread Profiler features.
Judit Giménez, Juan González, Pedro González, Jesús Labarta, Germán Llort, Eloy Martínez, Xavier Pegenaute, Harald Servat Brief introduction.
1 Performance Analysis with Vampir DKRZ Tutorial – 7 August, Hamburg Matthias Weber, Frank Winkler, Andreas Knüpfer ZIH, Technische Universität.
Statistical Performance Analysis for Scientific Applications Presentation at the XSEDE14 Conference Atlanta, GA Fei Xing Haihang You Charng-Da Lu July.
1 Interconnects Shared address space and message passing computers can be constructed by connecting processors and memory unit using a variety of interconnection.
Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,
Performance Concepts Mark A. Magumba. Introduction Research done on 1058 correspondents in 2006 found that 75% OF them would not return to a website that.
TRACEREP: GATEWAY FOR SHARING AND COLLECTING TRACES IN HPC SYSTEMS Iván Pérez Enrique Vallejo José Luis Bosque University of Cantabria TraceRep IWSG'15.
Computer and Automation Research Institute Hungarian Academy of Sciences Presentation and Analysis of Grid Performance Data Norbert Podhorszki and Peter.
Adventures in Mastering the Use of Performance Evaluation Tools Manuel Ríos Morales ICOM 5995 December 4, 2002.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment, Enhanced Chapter 11: Monitoring Server Performance.
Petascale workshop 2013 Judit Gimenez Detailed evolution of performance metrics Folding.
Performance Model & Tools Summary Hung-Hsun Su UPC Group, HCS lab 2/5/2004.
CS 584. Performance Analysis Remember: In measuring, we change what we are measuring. 3 Basic Steps Data Collection Data Transformation Data Visualization.
Application performance and communication profiles of M3DC1_3D on NERSC babbage KNC with 16 MPI Ranks Thanh Phung, Intel TCAR Woo-Sun Yang, NERSC.
© 2008, Renesas Technology America, Inc., All Rights Reserved 1 Introduction Purpose  The course describes the performance analysis and profiling tools.
Overview of CrayPat and Apprentice 2 Adam Leko UPC Group HCS Research Laboratory University of Florida Color encoding key: Blue: Information Red: Negative.
Martin Schulz Center for Applied Scientific Computing Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory, P. O. Box 808, Livermore,
1 Performance Analysis with Vampir ZIH, Technische Universität Dresden.
Performance Monitoring Tools on TCS Roberto Gomez and Raghu Reddy Pittsburgh Supercomputing Center David O’Neal National Center for Supercomputing Applications.
Profiling, Tracing, Debugging and Monitoring Frameworks Sathish Vadhiyar Courtesy: Dr. Shirley Moore (University of Tennessee)
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment, Enhanced Chapter 11: Monitoring Server Performance.
Parameter Range Study of Numerically-Simulated Isolated Multicellular Convection Z. DuFran, B. Baranowski, C. Doswell III, and D. Weber This work is supported.
UAB Dynamic Tuning of Master/Worker Applications Anna Morajko, Paola Caymes Scutari, Tomàs Margalef, Eduardo Cesar, Joan Sorribes and Emilio Luque Universitat.
Workshop BigSim Large Parallel Machine Simulation Presented by Eric Bohm PPL Charm Workshop 2004.
NUG Meeting Performance Profiling Using hpmcount, poe+ & libhpm Richard Gerber NERSC User Services
VICOMTECH VISIT AT CERN CERN 2013, October 3 rd & 4 th O.COUET CERN/PH/SFT DATA VISUALIZATION IN HIGH ENERGY PHYSICS THE ROOT SYSTEM.
Rassul Ayani 1 Performance of parallel and distributed systems  What is the purpose of measurement?  To evaluate a system (or an architecture)  To compare.
Tool Visualizations, Metrics, and Profiled Entities Overview [Brief Version] Adam Leko HCS Research Laboratory University of Florida.
Computer Network Lab. Korea University Computer Networks Labs Se-Hee Whang.
Belgrade, 26 September 2014 George S. Markomanolis, Oriol Jorba, Kim Serradell Overview of on-going work on NMMB HPC performance at BSC.
Overview of AIMS Hans Sherburne UPC Group HCS Research Laboratory University of Florida Color encoding key: Blue: Information Red: Negative note Green:
Belgrade, 24 September 2014 George S. Markomanolis, Oriol Jorba, Kim Serradell Introduction to an HPC environment.
A Simulation Framework to Automatically Analyze the Communication-Computation Overlap in Scientific Applications Vladimir Subotic, Jose Carlos Sancho,
Vertical Profiling : Understanding the Behavior of Object-Oriented Applications Sookmyung Women’s Univ. PsLab Sewon,Moon.
Performane Analyzer Performance Analysis and Visualization of Large-Scale Uintah Simulations Kai Li, Allen D. Malony, Sameer Shende, Robert Bell Performance.
Other Tools HPC Code Development Tools July 29, 2010 Sue Kelly Sandia is a multiprogram laboratory operated by Sandia Corporation, a.
Projections - A Step by Step Tutorial By Chee Wai Lee For the 2004 Charm++ Workshop.
CEPBA-Tools experiences with MRNet and Dyninst Judit Gimenez, German Llort, Harald Servat
3GPP2 TSG-C SWG1.2 Maui, HI C R2/C of 12 TSG-C WG3/SWG1.2 joint meeting 12/07/05 BCMCS FEC evaluation simulation requirements.
Introduction Goal: connecting multiple computers to get higher performance – Multiprocessors – Scalability, availability, power efficiency Job-level (process-level)
LIOProf: Exposing Lustre File System Behavior for I/O Middleware
Tuning Threaded Code with Intel® Parallel Amplifier.
Parallel Performance Wizard: A Generalized Performance Analysis Tool Hung-Hsun Su, Max Billingsley III, Seth Koehler, John Curreri, Alan D. George PPW.
Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,
Germán Llort, Judit Giménez
Architectural Interactions in High Performance Clusters
Projections Overview Ronak Buch & Laxmikant (Sanjay) Kale
BigSim: Simulating PetaFLOPS Supercomputers
BSC TOOLS: Instrumentation & Analysis
Presentation transcript:

Belgrade, 25 September 2014 George S. Markomanolis, Oriol Jorba, Kim Serradell Performance analysis Tools: a case study of NMMB on Marenostrum Supercomputer

Outline 2 Introduction to Paraver Examples with NMMB/BSC-CTM Various Paraver views Configuration of Extrae tool Summary 2

Tools 3 Since 1991 Based on traces Open source: http//: Core tools: Paraver Extrae Dimemas 3

Paraver 4 Every behavioral aspect/metric described as a function of time Those functions of time can be rendered into a 2D image Statistics can be computed for each possible value or range of values of that function of time 4

Extrae 5 BSC instrumentation package When/Where Parallel programming model runtime Selected user functions Periodic samples User events Additional information Counters 5

Timelines 6 Representation Function of time Colour encoding 6

Paraver – Generic View 7 7 Part of the timeline Colours for different events Example for 68 MPI processes 1 hour global domain, 24km, 64 layers, meteo configuration

Paraver – Menu (from BSC Tools presentation) 8 8

Paraver – Load configuration (from BSC Tools presentation) 9 9

Paraver – Menu (from BSC Tools presentation) 10

Paraver – Profiles (from BSC Tools presentation) 11

Paraver – Profiles (from BSC Tools presentation) 12

Paraver – Histograms (from BSC Tools presentation) 13

Paraver – Histograms (from BSC Tools presentation) 14

Paraver –View 15 Running and observing the events Computation

Paraver – Computation View 16 Create a profile view for the following part of the trace

Paraver – Profile View 17 Create a profile view for the following part of the trace

Paraver – Profile View 18 Percentage of MPI calls Average=98.7% is the parallel efficiency Maximum = 99.98% is the communication efficiency Avg/max = 0.99 is perfect load balanced only for this part of the trace

Paraver – Useful Duration 19 Part of the timeline 1 hour global domain, 24km, 64 layers, meteo configuration Green low computation, blue significant computation (useful duration view)

Paraver – Time histogram 20 For better load balancing is needed to have vertical lines

Paraver – Instructions histogram 21 The computation is not uniform

Paraver – Instructions per cycle (IPC) 22 Efficient computation Useful efficient computation

Paraver – Useful computation histogram 23

Paraver – Useful time histogram 24

Paraver – Useful IPC histogram 25

Paraver – Useful L2 cache miss hit ratio 26 Per user function Table

Paraver – MPI calls 27 MPI calls excluding computation MPI calls with partial communication visualization

Paraver – Total bytes sent 28

Paraver – Max bytes sent 29

Paraver – Percentage of MPI time per user function 30

Paraver – Communication matrix 31

MPI – Send a message 32

Paraver – User functions 33 User functions Useful user functions

Paraver – Global – 24km - Meteo Simulation: 02/12/2005

Paraver – Global – 24km – Meteo – between radiations

Paraver – Global – 24km – Meteo – radiation

Communication matrix

Paraver – Global – 24km – Meteo/Dust/Chem Simulation: 21/05/2010

Paraver – Global – 24km – Meteo/Dust/Chem Simulation: 21/09/2010

Paraver – (useful) user functions

Computation load imbalance

Zoom between radiation calls for dust/sea-salt

Extrae 44 How to use: mpirun … wrapper.sh /path/umo.x Contents of wrapper.sh file: export EXTRAE_HOME=/installation_path/ export LD_PRELOAD=/installation_path/lib/libmpitrace.so export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/installation_path/lib source ${EXTRAE_HOME}/etc/extrae.sh export EXTRAE_CONFIG_FILE=/path/extrae_config.xml $*

</trace enabled=“yes” … PAPI_TOT_INS,PAPI_TOT_CYC … <merge enabled=“yes” … > $TRACE_NAME$ Extrae – XML file 45

Summary 46 The performance analysis of an application is a long and sometimes difficult task We used Extrae/Paraver to analyze our model Performance tools are needed more and more! Hardware counters are important to study the computation phases Load imbalance issues are well known to the community but need to be studied We identified some serialization issues Extrae needs to be properly configured

Thank you! Questions? 47