On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks
Bernd Mohr, Allen D. Malony, Rudi Eigenmann

Presentation transcript:

On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks
Bernd Mohr (1), Allen D. Malony (2), Rudi Eigenmann (3)
(1) Forschungszentrum Jülich, John von Neumann-Institut für Computing, Zentralinstitut für Angewandte Mathematik
(2) Department of Computer and Information Science, University of Oregon
(3) Department of Electrical and Computer Engineering, Purdue University

WOMPAT August 5, 2002

Outline
- SPEC OMP2001 benchmark suite
  - Detailed performance characterization study
- Integrated performance tools in benchmarking suites
  - Motivation
  - Approach for OMP2001
    - POMP OpenMP performance monitoring interface
    - Opari automatic instrumentation
    - Profiling and trace measurement
    - EXPERT and TAU performance analysis
- Experiments
- Concluding remarks

SPEC OMP2001 Benchmark Suite
- 11 application programs used in scientific computing
  - CFD: APPLU, APSI, GALGEL, MGRID, SWIM
  - Molecular dynamics: AMMP
  - Crash simulation: FMA3D
  - Neural network: ART
  - Genetic algorithm: GAFORT
  - Earthquake modeling: EQUAKE
  - Quantum chromodynamics: WUPWISE
- Fortran and C source code with OpenMP parallelization
- Medium and large data sets
- Goals of portability and relative ease of use

OMP2001 Performance Measurement Studies
- OMP2001 measures and reports total execution time only
  - Scalability results for different processor numbers
- “Performance Characteristics of the SPEC OMP2001 Benchmarks,” Aslot and Eigenmann, EWOMP 2001
  - Study performance characteristics in detail
  - Timing profiles (scalability) across parallel sections
  - Memory system and cache (hardware counter) profiles
  - Use of high-resolution timers and hardware counters
  - Quantitative and qualitative explanations
- Custom instrumentation and measurement libraries
  - Required hand-instrumentation of OpenMP constructs
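
To make the hand-instrumentation bullet concrete, the following is a minimal sketch of what manually timing one OpenMP parallel loop might look like. It is not taken from the study: the `region_profile` record, the `wall_seconds` helper, and the `scaled_sum` kernel are hypothetical names chosen for illustration, and the timer is the POSIX `clock_gettime` call rather than any hardware-counter interface.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Hypothetical per-region profile record (illustration only). */
typedef struct {
    const char *name;   /* label for the instrumented region */
    long calls;         /* number of times the region was executed */
    double seconds;     /* accumulated wall-clock time */
} region_profile;

/* High-resolution wall-clock time in seconds (POSIX monotonic clock). */
static double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hand-instrumented kernel: the timer calls sit outside the parallel
   loop, so they measure the whole region as seen by the calling thread. */
static double scaled_sum(const double *x, int n, double a, region_profile *p) {
    double sum = 0.0;
    double t0 = wall_seconds();
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a * x[i];
    }
    p->calls += 1;
    p->seconds += wall_seconds() - t0;
    return sum;
}

/* Print one profile line; a real library would aggregate many regions. */
static void report(const region_profile *p) {
    printf("%s: calls=%ld time=%.6f s\n", p->name, p->calls, p->seconds);
}
```

Every parallel region of interest needs its own pair of timer calls and its own record, which is the manual effort that the slides' later emphasis on automatic instrumentation aims to remove.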

Performance Tools and Benchmark Suites
- Detailed performance measurement and analysis reveal interesting runtime characteristics in application codes
  - Important for performance diagnosis and tuning
  - Help to understand effects of new parallel API (OpenMP)
- Benchmark suites typically do not have integrated tools
  - Portability of performance tools is poor
  - Hard to configure tools for benchmarking methodology
  - Tools often require manual application and operation
- Automatic and portable performance tools could allow more in-depth, cross-platform performance analysis
- Goal: integrated performance tools for OMP2001

Approach for OMP2001
- Leverage state-of-the-art performance instrumentation, measurement, and analysis technology
  - POMP OpenMP performance monitoring interface
  - Opari automatic OpenMP source instrumentation
  - Performance profile and trace measurement libraries
  - EXPERT automatic event trace analyzer
  - TAU performance analysis system
- Configure performance tools as integrated and automated components in OMP2001 benchmarking methodology
- Conduct performance experiments on OMP2001 codes
- Evaluate with respect to portability, ease of use, results
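
The idea behind automatic source instrumentation can be sketched as follows: a source-to-source tool rewrites each OpenMP directive so that calls into a monitoring library bracket the construct. The code below only illustrates that pattern. The hook names (`mon_fork`, `mon_begin`, `mon_end`, `mon_join`) and the `region_events` record are hypothetical and are not the actual POMP interface or Opari output.

```c
#include <stdio.h>

/* Hypothetical event counters; a real measurement library would record
   timestamps or write trace records instead of counting. */
typedef struct {
    long forks, joins;   /* updated by the thread that encounters the region */
    long begins, ends;   /* updated once by every thread in the team */
} region_events;

static void mon_fork(region_events *r) { r->forks++; }
static void mon_join(region_events *r) { r->joins++; }

static void mon_begin(region_events *r) {
    #pragma omp atomic
    r->begins++;
}

static void mon_end(region_events *r) {
    #pragma omp atomic
    r->ends++;
}

/* What an instrumenter might emit for the original source
       #pragma omp parallel
       { work(); }
   with the user's work replaced here by a shared counter update. */
static void instrumented_region(region_events *r, long *work_done) {
    mon_fork(r);                 /* before the team is created */
    #pragma omp parallel
    {
        mon_begin(r);            /* each thread enters the region */
        #pragma omp atomic
        *work_done += 1;
        mon_end(r);              /* each thread leaves the region */
    }
    mon_join(r);                 /* after the team has finished */
}
```

Because the hooks are ordinary function calls, the same instrumented source can be linked against different measurement libraries, for example one that profiles and one that traces, without touching the benchmark code again.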

Issues
- Level of measurement detail
  - What is necessary and appropriate?
  - Could use base level and allow user-configured levels
  - Full program execution vs. portion of program execution
- Distribution complexity
  - Tool packages should be added to benchmark distribution
  - Packages need to be easily obtained and configured
  - Must be public domain or licensed through SPEC
- Publishing of detailed performance results
  - Part of official SPEC benchmark report?
- …
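
One way to read the "base level plus user-configured levels" suggestion is as a run-time filter in the measurement library. The sketch below assumes three levels and an environment variable named `PERF_LEVEL`; both are hypothetical and do not correspond to any setting in SPEC OMP2001 or in the tools named in these slides.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical measurement levels, ordered from least to most detail. */
typedef enum { LEVEL_OFF = 0, LEVEL_BASE = 1, LEVEL_DETAILED = 2 } measure_level;

/* Map a user-supplied string to a level; unknown or missing values
   fall back to the base level. */
static measure_level parse_level(const char *s) {
    if (s == NULL) return LEVEL_BASE;
    if (strcmp(s, "off") == 0) return LEVEL_OFF;
    if (strcmp(s, "detailed") == 0) return LEVEL_DETAILED;
    return LEVEL_BASE;
}

/* Read the level from the (assumed) PERF_LEVEL environment variable. */
static measure_level configured_level(void) {
    return parse_level(getenv("PERF_LEVEL"));
}

/* An event is recorded only if measurement is on and the event's level
   does not exceed the configured level. */
static int should_record(measure_level configured, measure_level event_level) {
    return configured != LEVEL_OFF && event_level <= configured;
}
```

With a filter like this, a default run would record only base-level events such as whole parallel regions, while a user who sets the level to "detailed" would also get finer-grained events at the cost of more measurement overhead.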

TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable performance profiling/tracing facility
  - Open software approach

TAU Performance System Architecture
[Figure: architecture diagram; surviving labels: EPILOG, Paraver]