TAU Performance System

TAU Performance System, IBM Blue Gene Consortium

TAU Parallel Performance System
- Multi-level performance instrumentation
  - Multi-language automatic source instrumentation
- Flexible and configurable performance measurement
- Widely-ported parallel performance profiling system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid

TAU Port to IBM BG/P
- Supports automatic instrumentation at:
  - Source level (PDT, tau_instrumentor; KOJAK, opari)
  - MPI
- Flexible and configurable performance measurement
- Support for profiling and tracing
- Support for PAPI counters on BG/P
- Uses bgxlC_r, bgxlc_r, bgxlf90_r as compilers
- To configure TAU:
    ./installtau -arch=bgp -mpi -pdt= -pdt_c++=xlC -papi=
    ./tau_validate --html --build bgp >& results.html
- Parallel Profile Analysis:
  - ParaProf profile browser
  - PerfDMF profile database
  - PerfExplorer cross-experiment data analysis toolkit

Using TAU on IBM BGP (surveyor.alcf.anl.gov)
- Choose measurement configuration:
    % ls /soft/apps/tau/tau_latest/bgp/lib/Makefile.*
    Makefile.tau-mpi-pdt
    Makefile.tau-mpi-pdt-trace
    Makefile.tau-callpath-mpi-pdt
    Makefile.tau-callpath-mpi-compensate-pdt
    Makefile.tau-depthlimit-mpi-pdt
    Makefile.tau-mpi-compensate-pdt
    Makefile.tau-multiplecounters-mpi-papi-pdt
    Makefile.tau-multiplecounters-mpi-papi-pdt-trace
    Makefile.tau-multiplecounters-papi-pdt
    Makefile.tau-multiplecounters-pthread-papi-pdt
    Makefile.tau-pdt
    Makefile.tau-phase-multiplecounters-mpi-compensate-papi-pdt
    Makefile.tau-phase-multiplecounters-mpi-papi-pdt
    Makefile.tau-pthread-pdt
    …
    % setenv TAU_MAKEFILE /soft/apps/tau/tau-2.17/bgp/lib/Makefile.tau-mpi-pdt
    % set path=(/soft/apps/tau/tau-2.17/ppc64/bin $path)   # Front-end binaries
- Replace mpixlf90_r with tau_f90.sh and compile your application
- Use tau_cxx.sh and tau_cc.sh for C++ and C compilers respectively
- Visualize performance data with paraprof, pprof, vampir, jumpshot
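
The stub makefile names in the listing encode the measurement options each configuration was built with, as hyphen-separated tokens after the `Makefile.tau-` prefix. As a minimal sketch of choosing one programmatically, the Python below picks the stub whose option set exactly matches the options requested. The helper names `stub_options` and `pick_stub` are hypothetical and not part of TAU; the stub names are copied from the slide.

```python
PREFIX = "Makefile.tau-"

# Stub makefile names as listed on the slide (the listing there is truncated).
STUBS = [
    "Makefile.tau-mpi-pdt",
    "Makefile.tau-mpi-pdt-trace",
    "Makefile.tau-callpath-mpi-pdt",
    "Makefile.tau-callpath-mpi-compensate-pdt",
    "Makefile.tau-depthlimit-mpi-pdt",
    "Makefile.tau-mpi-compensate-pdt",
    "Makefile.tau-multiplecounters-mpi-papi-pdt",
    "Makefile.tau-multiplecounters-mpi-papi-pdt-trace",
    "Makefile.tau-multiplecounters-papi-pdt",
    "Makefile.tau-multiplecounters-pthread-papi-pdt",
    "Makefile.tau-pdt",
    "Makefile.tau-phase-multiplecounters-mpi-compensate-papi-pdt",
    "Makefile.tau-phase-multiplecounters-mpi-papi-pdt",
    "Makefile.tau-pthread-pdt",
]


def stub_options(name):
    """Return the set of option tokens encoded in a stub makefile name."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a TAU stub makefile name: {name}")
    return frozenset(name[len(PREFIX):].split("-"))


def pick_stub(names, wanted):
    """Return the single stub whose options equal `wanted` exactly."""
    wanted = frozenset(wanted)
    matches = [n for n in names if stub_options(n) == wanted]
    if len(matches) != 1:
        raise LookupError(f"expected one match for {sorted(wanted)}, got {matches}")
    return matches[0]


# Hypothetical usage: MPI + PDT profiling with callpath support.
chosen = pick_stub(STUBS, {"callpath", "mpi", "pdt"})
```

The chosen name would then be appended to the `lib` directory shown above and assigned to `TAU_MAKEFILE` before compiling with `tau_f90.sh`, `tau_cc.sh`, or `tau_cxx.sh`.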

TAU’s ParaProf 3D Profile Browser: Matmult

Profiling FLASH3 on IBM BG/P

Sedov 2D Auto
- Initial test run did not include a load-balanced problem
- Small problem: too little work for 1024 processors
- Proof of concept to validate porting of tools

PerfExplorer: Cross Experiment Analysis

TAU PerfExplorer: Runtime Breakdown
[Figure: runtime breakdown chart; labeled events: MPI_Barrier, IO_OUTPUT]

Relative Efficiency

Relative Speedup for One Event
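
The efficiency and speedup charts are not reproduced in this transcript. As a rough sketch of how such metrics are conventionally computed in a strong-scaling study, relative speedup compares each run with the smallest run, and relative efficiency divides that speedup by the ideal speedup for the processor ratio. This is the textbook definition, assumed here rather than taken from PerfExplorer itself. The function names are hypothetical and the timings are made-up placeholders, not FLASH3 measurements.

```python
def relative_speedup(times):
    """Map processor count -> speedup relative to the smallest run.

    `times` maps processor count to elapsed time for the whole run or
    for a single event (same units throughout).
    """
    base_procs = min(times)
    base_time = times[base_procs]
    return {p: base_time / t for p, t in sorted(times.items())}


def relative_efficiency(times):
    """Map processor count -> speedup divided by the ideal speedup."""
    base_procs = min(times)
    speedup = relative_speedup(times)
    return {p: s / (p / base_procs) for p, s in speedup.items()}


# Placeholder timings in seconds, keyed by processor count (illustrative only).
placeholder_times = {64: 100.0, 128: 50.0, 256: 40.0}

speedup = relative_speedup(placeholder_times)
efficiency = relative_efficiency(placeholder_times)
```

With these placeholder numbers the 128-processor run scales ideally (efficiency 1.0), while the 256-processor run reaches a speedup of 2.5 against an ideal of 4, giving an efficiency of 0.625.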

TAU’s PerfExplorer: IBM BG/P

TAU Portal
- TAU Portal supports FLASH regression testing
- Allows groups to share profiling data in a secure way
- Allows users to launch TAU performance tools (paraprof, perfexplorer)
- Nightly regression test cases uploaded to the database automatically
- SVN checkout each night
- TAU:
- TAU Portal:
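
The slide does not say how a regression is detected once nightly profiles reach the database. One simple approach, assumed here purely for illustration, is to compare each event's time in the nightly run against a stored baseline and flag events that slowed down by more than a tolerance. The function name `find_regressions`, the 10% tolerance, and all timings are hypothetical; only the event names come from the slides.

```python
def find_regressions(baseline, nightly, tolerance=0.10):
    """Return {event: relative_change} for events slower than baseline.

    `baseline` and `nightly` map event names to times in the same units.
    An event is flagged when (nightly - baseline) / baseline > tolerance.
    Events missing from the baseline, or with a non-positive baseline
    time, are skipped because no relative change can be computed.
    """
    flagged = {}
    for event, new_time in nightly.items():
        old_time = baseline.get(event)
        if old_time is None or old_time <= 0:
            continue
        change = (new_time - old_time) / old_time
        if change > tolerance:
            flagged[event] = change
    return flagged


# Placeholder timings in seconds (illustrative only).
baseline_run = {"MPI_Barrier": 10.0, "IO_OUTPUT": 4.0}
nightly_run = {"MPI_Barrier": 10.5, "IO_OUTPUT": 6.0, "NEW_EVENT": 1.0}

regressions = find_regressions(baseline_run, nightly_run)
```

Here `IO_OUTPUT` is flagged with a 50% slowdown, `MPI_Barrier` stays within tolerance at 5%, and `NEW_EVENT` is ignored because it has no baseline.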

Portal: Nightly Performance Regression Testing

TAU Portal: Launch ParaProf/PerfExplorer

PerfExplorer: Regression Testing

PerfExplorer: Limiting Events (> 3%), Oct 2007
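
The title of the last slide suggests the chart was restricted to events that account for more than 3% of the runtime. Assuming that reading, a minimal sketch of such a filter is shown below. The function name `limit_events`, the event names, and the timings are hypothetical placeholders.

```python
def limit_events(exclusive_times, min_share=0.03):
    """Keep events whose share of total exclusive time exceeds `min_share`.

    `exclusive_times` maps event names to exclusive time; the result maps
    each retained event to its fraction of the total.
    """
    total = sum(exclusive_times.values())
    if total <= 0:
        return {}
    shares = {event: t / total for event, t in exclusive_times.items()}
    return {event: share for event, share in shares.items() if share > min_share}


# Placeholder exclusive times in seconds (illustrative only).
placeholder_profile = {"solver": 90.0, "io": 8.0, "setup": 2.0}

kept = limit_events(placeholder_profile)
```

With these placeholder values, `solver` and `io` pass the 3% threshold and `setup`, at 2% of the total, is dropped from the view.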

PerfExplorer: Exclusive Time for Events (2007)

ParaProf: 3D Visualization

Support Acknowledgements
- Department of Energy (DOE)
- Office of Science
- LLNL, LANL, ASC
- Argonne National Laboratory
- University of Chicago
- Department of Defense
- NSF