Profile Analysis with ParaProf Sameer Shende Performance Research Lab, University of Oregon

Presentation transcript:

Profile Analysis with ParaProf Sameer Shende Performance Research Lab, University of Oregon

SC’13: Hands-on Practical Hybrid Parallel Application Performance Engineering

TAU Performance System®
Parallel performance framework and toolkit
– Supports all HPC platforms, compilers, and runtime systems
– Provides portable instrumentation, measurement, and analysis

TAU Performance System®
Instrumentation
– Fortran, C++, C, UPC, Java, Python, Chapel
– Automatic instrumentation
Measurement and analysis support
– MPI, OpenSHMEM, ARMCI, PGAS, DMAPP
– pthreads, OpenMP, hybrid, other thread models
– GPU, CUDA, OpenCL, OpenACC
– Parallel profiling and tracing
– Use of Score-P for native OTF2 and CUBEX generation
– Efficient callpath profiles and trace generation using Score-P
Analysis
– Parallel profile analysis (ParaProf), data mining (PerfExplorer)
– Performance database technology (PerfDMF, TAUdb)
– 3D profile browser

TAU Analysis

ParaProf Profile Analysis Framework

Parallel Profile Visualization: ParaProf

ParaProf: 3D Communication Matrix
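
To make the idea behind a communication matrix concrete, here is a minimal sketch in Python. It is not ParaProf code and does not read any TAU or Score-P file format; the function name `communication_matrix` and the sample message records are made up for illustration. It only shows the underlying idea: one cell per (sender, receiver) pair, holding an accumulated metric such as bytes sent.

```python
def communication_matrix(messages, num_ranks):
    """Accumulate bytes sent from each sender rank to each receiver rank."""
    matrix = [[0] * num_ranks for _ in range(num_ranks)]
    for sender, receiver, num_bytes in messages:
        matrix[sender][receiver] += num_bytes
    return matrix


# Made-up (sender, receiver, bytes) records for a 4-rank run; not measured data.
sample_messages = [(0, 1, 1024), (1, 0, 1024), (0, 1, 2048), (2, 3, 512)]

# Row index = sender rank, column index = receiver rank.
for row in communication_matrix(sample_messages, 4):
    print(row)
```

A 3D view of such a matrix would plot sender and receiver on two axes and the cell value as height or color.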

Hands-on: Profile report exploration
The Live-DVD contains Score-P experiments of BT-MZ
– class "B", 4 processes with 4 OpenMP threads each
– collected on a dedicated node of the SuperMUC HPC system at Leibniz Rechenzentrum (LRZ), Munich, Germany
Start TAU's paraprof GUI with the default profile report
% cd
% cd workshop-vihps/supermuc_expts
% ls
periscope-1.5 scorep_bt-mz_B_4x4_sum README scorep_bt-mz_B_4x4_sum+mets run.out scorep_bt-mz_B_4x4_trace scorep _1740_
% paraprof scorep _1740_ /profile.cubex
OR
% paraprof scorep_bt-mz_B_4x4_trace/scout.cubex
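
When an experiments directory holds several Score-P runs, it can help to list every `.cubex` report before choosing one to open. The following is a small sketch in Python, assuming only what the slide shows: that `paraprof` accepts the path to a `.cubex` file as its argument. The helper names `find_cubex_reports` and `paraprof_command` are hypothetical, and the directory location is an assumption about where the experiments were unpacked. The script prints the commands and does not launch the GUI.

```python
from pathlib import Path


def find_cubex_reports(experiments_dir):
    """Return the sorted paths of all .cubex files below experiments_dir."""
    root = Path(experiments_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.cubex"))


def paraprof_command(report):
    """Build the argument list for opening one report in ParaProf."""
    return ["paraprof", str(report)]


# Assumed location; adjust to wherever the experiments actually live.
experiments = Path.home() / "workshop-vihps" / "supermuc_expts"

for report in find_cubex_reports(experiments):
    print(" ".join(paraprof_command(report)))
```

Each printed line can be pasted at the shell prompt, or the argument list can be passed to `subprocess.run` to start the GUI directly.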

ParaProf: Manager Window: scout.cubex
Metrics in the profile

ParaProf: Main window

ParaProf: Options
Unselect this to expand each routine in its own space

ParaProf: Each color represents an event executing on one or more threads

ParaProf: Windows
Right-click on a given node to choose other windows

ParaProf: Thread Statistics Table
Click to sort by a given metric, drag and move to rearrange columns

Example: Score-P with TAU (LU NPB)

ParaProf: Thread Callgraph Window
Click on options to choose a different color or to resize the box based on metrics

ParaProf: Callpath Thread Relations Window
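
A caller/callee relations view can be derived from callpath entries. The sketch below, in Python, illustrates that derivation and is not ParaProf's implementation. It assumes callpath names are written as routine names joined by `=>` (for example `main => solve`); the function name `thread_relations`, the routine names, and the timings are all invented for illustration.

```python
from collections import defaultdict


def thread_relations(callpaths):
    """Group callpath times by the immediate caller and callee of each routine.

    callpaths maps a path such as "main => solve" to a time in seconds.
    Returns (callers, callees): callers[r][p] is the time spent in r when
    called from p, and callees[p][r] is the same value indexed by the parent.
    """
    callers = defaultdict(lambda: defaultdict(float))
    callees = defaultdict(lambda: defaultdict(float))
    for path, seconds in callpaths.items():
        routines = [name.strip() for name in path.split("=>")]
        if len(routines) < 2:
            continue  # a top-level routine has no caller in the profile
        parent, child = routines[-2], routines[-1]
        callers[child][parent] += seconds
        callees[parent][child] += seconds
    # Convert to plain dicts so missing keys are not created on lookup.
    return ({k: dict(v) for k, v in callers.items()},
            {k: dict(v) for k, v in callees.items()})


# Made-up callpath profile for one thread; not measured data.
sample_callpaths = {
    "main": 10.0,
    "main => solve": 7.0,
    "main => init": 1.5,
    "main => solve => exchange": 2.0,
    "main => init => exchange": 1.0,
}

callers, callees = thread_relations(sample_callpaths)
print("callers of exchange:", callers["exchange"])
print("callees of main:", callees["main"])
```

Here `exchange` is reached through two different parents, which is the kind of distinction a flat profile hides and a callpath view exposes.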

ParaProf: Windows -> 3D Visualization -> Bar Plot

ParaProf: 3D Scatter Plot

ParaProf: Scatter Plot

ParaProf: 3D Topology View for a Routine

ParaProf: Topology View 3D Torus (IBM BG/P)

ParaProf: Topology View (6D Torus Coordinates BG/Q)

ParaProf: Node View

ParaProf: Add Thread to Comparison Window

ParaProf: Score-P Profile Files, Database

ParaProf: File -> Preferences

ParaProf: Group Changer Window

ParaProf: Options -> Derived Metric Panel

Sorting Derived Flops Metric by Exclusive Time
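
As a rough illustration of what a derived metric is, the following Python sketch computes a floating-point rate by dividing an operation count by a time, then orders the routines by exclusive time. It is a sketch under stated assumptions, not the expression syntax of the Derived Metric Panel: the field names `fp_ops` and `exclusive_time_s`, the function name `add_derived_rate`, and the sample rows are all hypothetical.

```python
def add_derived_rate(rows, count_metric, time_metric, derived_name):
    """Add derived_name = count_metric / time_metric to every row.

    Rows with zero time get a rate of 0.0 instead of raising an error.
    """
    for row in rows:
        elapsed = row[time_metric]
        row[derived_name] = row[count_metric] / elapsed if elapsed > 0 else 0.0
    return rows


# Made-up per-routine values for one thread; not measured data.
sample_rows = [
    {"routine": "routine_a", "fp_ops": 4.0e9, "exclusive_time_s": 2.0},
    {"routine": "routine_b", "fp_ops": 1.0e9, "exclusive_time_s": 4.0},
    {"routine": "routine_c", "fp_ops": 0.0, "exclusive_time_s": 0.0},
]

add_derived_rate(sample_rows, "fp_ops", "exclusive_time_s", "flops")

# Show the derived metric with routines ordered by exclusive time, largest first.
for row in sorted(sample_rows, key=lambda r: r["exclusive_time_s"], reverse=True):
    print(f'{row["routine"]:<12}{row["exclusive_time_s"]:>8.2f} s{row["flops"]:>14.3e} flop/s')
```

Sorting by exclusive time while displaying the rate puts the most expensive routines first, so a low rate in a top row stands out as a tuning candidate.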

Support Acknowledgments
U.S. Department of Energy (DOE)
– Office of Science
– ASC/NNSA, Tri-labs (LLNL, LANL, SNL)
U.S. Department of Defense (DoD)
– HPC Modernization Office (HPCMO)
NSF Software Development for Cyberinfrastructure (SDCI)
Juelich Supercomputing Center, NIC
Argonne National Laboratory
Technical University Dresden
ParaTools, Inc.
NVIDIA