Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.

Presentation transcript:

Performance Technology for Tera-Class Parallel Computers: Evolution of the TAU Performance System

Dr. Allen D. Malony, PI; Dr. Sameer Shende, Post-doc; Robert Bell, Research associate; Kai Li, Ph.D. student; Li Li, Ph.D. student
Computational Science Institute, Department of Computer & Information Science, University of Oregon
DOE Grant No. DE FG03-01ER

Project Goals and Recent Accomplishments

Dynamic performance measurement control: dynamic event grouping; multiple configurable counters; selective instrumentation.
Application-level performance access: incremental profile dumping; runtime profile access.
Multi-level performance instrumentation and mapping: optimized instrumentation; tracing library enhancement; performance mapping profile display.
System and hardware performance integration: callback registration for system and hardware counters; trace recording of counts.

Program Database Toolkit (PDT)

[Diagram: application / library source passes through the C / C++ and Fortran 77/90 parsers (producing IL) and the C / C++ and Fortran 77/90 IL analyzers into program database files. DUCTAPE gives tools access to those files: PDBhtml (program documentation), SILOON (application component glue), CHASM (C++ / F90 interoperability), and TAU_instr (automatic source instrumentation).]

TAU Performance System Status

Platforms: IBM SP3, SGI Origin 2K/3K, Compaq Alpha Tru64/Linux Cluster, Cray T3E, Linux x86 / IA-64 clusters, Hewlett-Packard Superdome/V-class, Hitachi SR8000, NEC SX-5
Languages: C, C++, Fortran 77, F90, OpenMP, Java, Python
Thread libraries: pthreads, SGI sproc, OpenMP, Java, Windows
Communications libraries: MPI, PVM, SHMEM
Parallelism paradigms: shared memory multi-threading, distributed memory message passing, mixed-mode
Performance technologies: Dyninst dynamic instrumentation, PAPI and PCL hardware counter libraries, Opari automatic OpenMP instrumentation, EPILOG tracing library, EXPERT trace analyzer, Vampir trace visualization, Paraver trace visualization

TAU Applications

SAMRAI (LLNL, Andy Wissink). TAU is being integrated in the SAMRAI framework as the main performance measurement facility, replacing hand-instrumented counters and timers. TAU is included as part of the SAMRAI distribution.

OVERTURE (LLNL, Brian Miller). TAU is being used in the Overture framework for object-oriented performance measurement. Miller has also integrated TAU in the adaptive-mesh refinement simulator, AMRSim, and is using TAU for another (undisclosed) software development project.

ALPS (LLNL, James Schek). The Adaptive Laser Plasma Simulator uses SAMRAI as a software component. It gains access to TAU's measurement support through SAMRAI and also uses TAU directly in other ALPS code.

C-SAFE (ASCI/ASAP; Utah, Chris Johnson and Steve Parker). TAU is being integrated in the Uintah Computational Framework (UCF) for use in performance measurement and analysis of C-SAFE applications. This work also includes new performance regression analysis and results reporting tools for cross-version and cross-experiment performance evaluation.

PERC (DOE SciDAC; PERC consortium). TAU has been installed at PERC sites and is being applied in performance analysis studies of SciDAC applications, including the EVH1 astrophysics code. Oregon is an affiliate member of PERC.

VTF (ASCI/ASAP; Caltech, Julian Cummings and Michael Aivazis). TAU is being integrated in the Virtual Test Facility (VTF) infrastructure because it alone can portably measure performance across HPC systems with dynamically-loaded libraries and Python-controlled execution.

OpenMP (ASCI Path Forward; Intel/KAI, Bob Kuhn; Pallas, Hans-Christian Hoppe). A performance monitoring interface for OpenMP is being defined and will be submitted to the OpenMP Architecture Review Board for standardization. A prototype has been developed and demonstrated with the TAU performance system and the EXPERT analysis tool.

SAGE (LANL, Jack Horner). TAU has been applied to performance scalability analysis of the SAGE application. TAU is the only performance system that reliably provided a portable parallel profiling capability across all ASCI platforms.

POOMA II (LANL / Code Sourcery LLC, Jeffrey Oldham and Mark Mitchell). TAU has been used as the primary tool for performance analysis throughout the POOMA project, most recently in the POOMA 2 development. TAU is the only tool able to instrument and measure the complex use of C++ templates and expression template parallelization in the POOMA framework.

TAU Performance System Architecture

[Diagram: surviving labels are EPILOG and Paraver.]

Performance Mapping in Uintah

Utah ASCI/ASAP Center for Simulation of Accidental Fires and Explosions
Uintah Computational Framework (UCF)
Taskgraph macro dataflow
Performance mapping based on task semantics
Work packet computation events colored by task type
Distinct phases of computation can be identified based on task

Performance Profiling of EVH1

Enhanced Virginia Hydrodynamics #1 benchmark
IBM SP3, 16 processors
jRacy parallel profile display tools
Automatic instrumentation using PDT, MPI wrapper library

TAU Integration in SAMRAI

Structured Adaptive Mesh Refinement Application Infrastructure
Group-based SAMRAI performance timers and counts
Seamless integration in routine and MPI performance measurement

Mixed-Mode Performance Analysis (OpenMP + MPI)

2-D Stommel model of ocean circulation
Jacobi iteration, 5-point stencil
Integrated OpenMP and MPI events
Thread message pairing
Uses OpenMP performance tools interface
Automatic instrumentation with Opari

TAU Performance Database Framework

XML profile data representation
Multiple experiment performance database
[Diagram: raw performance data is converted by PerfDML translators into the PerfDML data description and stored in an ORDB (PostgreSQL); performance analysis programs reach the data through the performance analysis and query toolkit.]
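
The database framework outlined above (XML profile data translated into a relational store that holds multiple experiments) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the XML layout is hypothetical and is not the real PerfDML schema, the event names and timings are made-up sample values, and SQLite from the Python standard library stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML profile layout for illustration only; NOT the PerfDML schema.
# Event names and timings are made-up sample values, not measured results.
PROFILE_16 = """<experiment name="evh1-16p" processors="16">
  <event name="MPI_Allreduce" group="MPI" calls="1200" exclusive_usec="450000"/>
  <event name="sweepx" group="TAU_USER" calls="300" exclusive_usec="2100000"/>
</experiment>"""

PROFILE_32 = """<experiment name="evh1-32p" processors="32">
  <event name="MPI_Allreduce" group="MPI" calls="1200" exclusive_usec="520000"/>
  <event name="sweepx" group="TAU_USER" calls="300" exclusive_usec="1100000"/>
</experiment>"""


def create_schema(conn):
    """Create one table for experiments and one for per-event profile rows."""
    conn.executescript(
        """
        CREATE TABLE experiment (
            id INTEGER PRIMARY KEY,
            name TEXT,
            processors INTEGER
        );
        CREATE TABLE event (
            experiment_id INTEGER REFERENCES experiment(id),
            name TEXT,
            grp TEXT,
            calls INTEGER,
            exclusive_usec INTEGER
        );
        """
    )


def load_profile(conn, xml_text):
    """Translate one XML profile into database rows; return the experiment id."""
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO experiment (name, processors) VALUES (?, ?)",
        (root.get("name"), int(root.get("processors"))),
    )
    exp_id = cur.lastrowid
    rows = [
        (exp_id, e.get("name"), e.get("group"),
         int(e.get("calls")), int(e.get("exclusive_usec")))
        for e in root.iter("event")
    ]
    conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)", rows)
    return exp_id


def exclusive_time_by_experiment(conn, event_name):
    """Cross-experiment query: one event's exclusive time in every stored run."""
    return conn.execute(
        """
        SELECT x.name, x.processors, e.exclusive_usec
        FROM event AS e JOIN experiment AS x ON x.id = e.experiment_id
        WHERE e.name = ?
        ORDER BY x.processors
        """,
        (event_name,),
    ).fetchall()


def build_demo_db():
    """Load both sample profiles into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_profile(conn, PROFILE_16)
    load_profile(conn, PROFILE_32)
    return conn
```

Calling `exclusive_time_by_experiment(build_demo_db(), "sweepx")` returns one row per stored experiment, which is the kind of multi-experiment comparison the framework is meant to support. Keeping experiments and events in separate tables is one simple way to let a single query span several runs.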