Preparatory Research on Performance Tools for HPC HCS Research Laboratory University of Florida November 21, 2003.



Overview
- Background
- Evaluation Criteria
  - Ideal tool
  - Quantitative categories
- Performance Tools
  - Tool descriptions
  - Potential use and modifications
- Conclusions and Future Plans

Background
- High-performance computing
  - Drastic increase in complexity
  - Performance tools cannot keep pace
- Tools are used for performance analysis: to identify bottlenecks and to enable source-code and compilation optimizations
- Performance tools for HPC
  - Performance tools are often afterthoughts
  - Performance tools do not sell systems, so vendors consider them less important

Background
- UPC
  - Roots = Split-C, AC, Parallel C Preprocessor
    - Takes best aspects of each
  - Parallel extension of ANSI C standard
    - Allows parallel programming in familiar C style
  - Challenge: achieving optimal performance
- Abstracts communication between threads
  - Shared-memory programming model offers many advantages
  - Programs less complex than MPI and others, software more scalable
  - Allows shared data structures between threads
- Targeted for both shared- and distributed-memory architectures
  - Implementations for variety of systems and growing
  - CC-NUMA, SMP, clusters

Background
- SHMEM
  - Single-ended, asynchronous communication library
    - Remote write and read support without involvement or notification of remote CPU
  - Direct memory-to-memory copy
  - Uses explicit function calls (i.e., get and put)
    - "Virtual" shared memory
- Only supported on Silicon Graphics and Cray systems
  - HCS lab evaluating options for clusters (e.g. via SCI, QsNet)
- Low-level functions and subroutines efficiently use hardware circuitry for low-overhead communication

Background
- Importance of performance tools
  - Identify bottlenecks in program
    - Poor mapping of program to architecture
    - Unoptimized code
    - Parallelization areas
    - Compiler inefficiencies
  - Provide insight on how code actually executes on specific architecture
- UPC and SHMEM have limited support
  - TotalView (UPC): Debugging software
  - CrayPat (SHMEM and UPC): Performance profiler
  - Vampirtrace (SHMEM): Performance profiler

Tool Evaluation Criteria – Features
- Desirable features for UPC/SHMEM performance tools:
  - Ability to profile each thread (and its variables) independently
  - Ability to show breakdown of communication with remote threads
    - Frequency of local reads/writes to memory on each remote thread
    - Frequency of each remote thread's reads/writes to data on the local thread
  - Basic functional and block performance profiling (delay per function, total functions, total time in block)
    - Highlight bottlenecks, points of contention
  - Breakdown of computational stalls (blocking for I/O, shared-memory access, data dependencies, etc.)
  - Communication delay on interconnect (SCI, GigE, IBA, QsNet, etc.)
  - Real-time profiling
  - Compiler independent
  - Network independent
  - Platform independent
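The remote-communication breakdown listed among the desirable features can be pictured with a small sketch. The `CommProfile` class, its method names, and the sample accesses below are hypothetical and exist only for illustration; they are not part of any tool surveyed in this presentation. The sketch assumes each remote access is reported as an operation, an initiating thread, and a target thread.

```python
from collections import Counter


class CommProfile:
    """Counts remote reads and writes per (initiating thread, target thread) pair."""

    def __init__(self):
        self.counts = Counter()

    def record(self, op, src, dst):
        # op is "read" or "write"; src initiates the access, dst owns the data.
        if op not in ("read", "write"):
            raise ValueError(f"unknown operation: {op}")
        if src != dst:  # local accesses are not part of the remote breakdown
            self.counts[(op, src, dst)] += 1

    def issued_by(self, thread):
        # Accesses this thread made to memory on each remote thread.
        return {(op, dst): n for (op, src, dst), n in self.counts.items() if src == thread}

    def received_by(self, thread):
        # Accesses each remote thread made to data held by this thread.
        return {(op, src): n for (op, src, dst), n in self.counts.items() if dst == thread}


profile = CommProfile()
samples = [("write", 0, 1), ("write", 0, 1), ("read", 0, 2), ("read", 1, 0), ("write", 0, 0)]
for op, src, dst in samples:
    profile.record(op, src, dst)

print(profile.issued_by(0))
print(profile.received_by(0))
```

A real tool would collect these counts inside the UPC runtime or a SHMEM wrapper library; the sketch only shows the two views (accesses issued and accesses received) that the criteria ask for.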

Tool Evaluation Criteria – e.g. QFD Table

Feature                              | Weight
Independent thread profiling         |
Remote thread communication          |
Basic functional and block profiling |
Breakdown of stalls                  |
Real-time profiling                  |
Communication delay on interconnect  |
Compiler independent                 |
Network independent                  |
Platform independent                 |
Miscellaneous characteristics        |
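One way a weighted feature table of this kind could be turned into a ranking is a weight-normalized sum of per-feature ratings. The sketch below assumes that scheme; the weights, the ratings, and the tool names `tool_a` and `tool_b` are invented for illustration and do not come from the presentation, which gives no weight values.

```python
def weighted_score(weights, ratings):
    """Return the weight-normalized score of one tool."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"no rating for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[f] * ratings[f] for f in weights) / total_weight


# Hypothetical weights (importance) for three of the listed features.
weights = {
    "Independent thread profiling": 5,
    "Remote thread communication": 5,
    "Real-time profiling": 2,
}

# Hypothetical ratings on a 1-5 scale for two made-up tools.
ratings = {
    "tool_a": {
        "Independent thread profiling": 4,
        "Remote thread communication": 1,
        "Real-time profiling": 3,
    },
    "tool_b": {
        "Independent thread profiling": 2,
        "Remote thread communication": 5,
        "Real-time profiling": 5,
    },
}

scores = {tool: weighted_score(weights, r) for tool, r in ratings.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Normalizing by the total weight keeps scores on the same scale as the ratings, so adding or removing a criterion does not change the range of the result.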

Tools Overview
- UPC tools
  - TotalView = shows code correctness, not performance!
- SHMEM tools
  - CrayPat
  - Vampir and VampirTrace
- General performance tools
  - Vampir and VampirTrace
  - PAPI
  - Perfometer
  - Kojak
  - SvPablo
  - Paradyn
  - TAU

UPC Tools: TotalView
Overview
- Developed by Etnus, LLC
- Version 6.3 (supported UPC since version 6.1)
- Debugger only, no performance analysis
- Supports UPC, SHMEM, C/C++, Fortran, MPI, OpenMP, and others
  - Supports UPC on SGI IRIX, HP Tru64
Features
- Commercial product
- Tests code modifications without recompilation
- Supports independent thread debugging
- Shared variable views
  - On each thread
  - Altogether (e.g. arrays)
- Other basic features
  - Breakpoints
  - Memory debugging
  - Reliable handling of complex code
Desired Enhancements
- Performance profiling of UPC and SHMEM
  - Basic statistics gathering a starting point
- Minimize reduction in performance

SHMEM Tools: CrayPat
Overview
- Developed by Cray
- Performance analysis and tracing tool for Cray X1
- Only works for Cray systems, not currently portable
- Provides run-time analysis and profiling of program performance
- Supports Fortran, MPI, MPI2, Pthreads, SHMEM, and UPC
Features
- Provided with Cray systems
- Can provide extreme levels of detail, at the cost of added complexity
- Requires rebuilding the application with CrayPat instrumentation code and libraries
- Replacement for Cray SV1 performance analysis and profiling tools
- Provides direct read access to hardware performance counters
- Allows user to aggregate, display, format, and export collected performance data in various ways
- Provides I/O performance profiling for Fortran, asynchronous, and system call routines
- Command line interface
Desired Enhancements
- Support for architectures other than Cray systems
- User-friendly GUI

Performance Tools: Vampirtrace
Overview
- Developed by Pallas
- Version 4.0
- Supports all platforms that use GNU Compiler Collection to compile C or Fortran code
- Supports Java, C, and Fortran
- Supports MPI, Global Array programming model and SHMEM
Features
- Commercial product
- Vampirtrace = generates program trace
- Vampir = GUI used to analyze trace
- Supports multithreaded MPI programs
- Link Vampirtrace library during compilation
- Can also record arbitrary user-defined events
  - Entry and exits from subroutines
  - Execution of code blocks
- Filtering mechanism to focus on user-defined events and statistics
Desired Enhancements
- Support UPC programming model
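The general idea of recording entry and exit events for user-defined code regions, with a filter that limits which events are kept, can be sketched as follows. This is not the Vampirtrace API; the `Tracer` class and the region names are hypothetical, and the sketch assumes a filter expressed as a set of region names.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Records timestamped enter/exit events for named code regions."""

    def __init__(self, enabled=None):
        # enabled: set of region names to record, or None to record everything.
        self.enabled = enabled
        self.events = []  # list of (timestamp, kind, name)

    def _log(self, kind, name):
        if self.enabled is None or name in self.enabled:
            self.events.append((time.perf_counter(), kind, name))

    @contextmanager
    def region(self, name):
        self._log("enter", name)
        try:
            yield
        finally:
            # The exit event is logged even if the region raises.
            self._log("exit", name)


# Only the "solve" region passes the filter; "setup" is dropped.
tracer = Tracer(enabled={"solve"})

with tracer.region("setup"):
    data = list(range(1000))

with tracer.region("solve"):
    total = sum(x * x for x in data)

print([(kind, name) for _, kind, name in tracer.events])
```

Filtering at record time, as here, keeps the trace small; a tool could equally record everything and filter during analysis, at the cost of larger trace files.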

Performance Tools: PAPI
Overview
- Developed at Innovative Computing Laboratory at U. of Tennessee, Knoxville
- Version released May 2003
- Monitors computation events using hardware counters available on modern processors
- Available for Windows, Linux, UNIX platforms
Features
- Open source, free download of full version
- Consists of two layers of software
  - Portable Platform Independent Layer: API
  - Platform Specific Layer: interface substrate that allows API to communicate with hardware counters via patched kernel, operating system, or directly
- Linux systems must have kernel patched with perfctr tool to allow access to hardware counters
- Provides two interfaces
  - High-level interface for simple measurements and purposes
  - Low-level interface for more complex and sophisticated purposes
- Many tools feature optional support for PAPI
  - Additional features available when tool is configured with PAPI support
  - SvPablo, Perfometer, Visual Profiler, among others
- Example metrics: L1 data cache misses, cache line invalidation, floating-point stalls, instructions per second
Desired Enhancements
- Addition of new PAPI metrics that reflect key issues directly relating to UPC/SHMEM

Performance Tools: Perfometer
Overview
- Developed at U. of Tennessee, Knoxville
- Version 1.1 released September 12, 2002
- Works with any system with PAPI support
- Requires Java for GUI
- Provides run-time visualization of program performance
- Supports C/C++ programs and has MPI support
Features
- Open source, free download of full version
- Monitors both local and remote applications
  - GUI and backend communicate through ports
- Returns information on processor and executables for each application
- Has alarms that pause program when data monitoring thresholds are reached
- Able to pause and continue program execution
- Requires perfometer() call inserted in program to enable monitoring
- Mark_perfometer() call allows user to change color of graph to see trends of different sections of code
Desired Enhancements
- Support UPC and SHMEM programming models

Performance Tools: Kojak
Overview
- Collaborative research project of U. of Tennessee, Knoxville and Research Centre Juelich (Germany)
- Version .99 released Nov (3rd release)
- Available for Linux IA-32, IBM Power3/Power4, SGI MIPS, IA-64, Sun SPARC
- Supports MPI 1.2 and OpenMP, as well as uniprocessor applications
Features
- Open source, free download of full version
- No modifications to source code needed
  - OPARI tool (also part of TAU) provides automatic instrumentation
  - Custom modifications can also be made to permit closer examination of arbitrary function calls
- EPILOG trace file generated at program run-time
  - Open trace file format
  - Support for conversion to VAMPIR format for analysis with VAMPIR tools
- EXPERT module provides automatic analysis of EPILOG trace files
- Tool is geared towards identifying performance problems
  - Range of problems known to EXPERT is flexible and extendable
Desired Enhancements
- Support UPC and SHMEM programming models
[Figure: EXPERT Pre-Defined Monitored Properties]

Performance Tools: SvPablo
Overview
- Developed at U. of Illinois, Urbana-Champaign
- Version 5.2 released March 2003
- Tool to help developers "tune" their software for better performance and help them eliminate bottlenecks
- Available for Sun Solaris, SGI IRIX, IBM SP, Compaq Alpha and Linux
- Supports C, Fortran 77/90, HPF, MPI and OpenMP
Features
- Open source, free download of full version
- Interactive instrumentation of code via GUI
- Link SvPablo library during compilation
- Provides performance data
  - Traces loops and function calls
  - But does not trace all instructions
  - Provides statistical data
    - Counts how many times a function was executed
    - Records execution time of function
- Correlates performance data with source code
- PAPI support
Desired Enhancements
- Support UPC and SHMEM programming models
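The statistical data described here (how many times a function ran and how long it took) can be illustrated with a small instrumentation sketch. It does not use SvPablo's library; the `profiled` decorator, the `stats` table, and the `dot` function are hypothetical names chosen for the example.

```python
import time
from functools import wraps

stats = {}  # function name -> {"calls": int, "seconds": float}


def profiled(func):
    """Wrap func so that each call updates its call count and total time."""
    entry = stats.setdefault(func.__name__, {"calls": 0, "seconds": 0.0})

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Updated even when func raises, so failed calls are still counted.
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start

    return wrapper


@profiled
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


for _ in range(3):
    result = dot([1, 2, 3], [4, 5, 6])

print(result, stats["dot"]["calls"])
```

Because only instrumented functions are wrapped, the overhead stays proportional to the number of instrumented calls, which mirrors the choice of tracing loops and function calls rather than every instruction.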

Performance Tools: Paradyn
Overview
- Developed by U. of Wisconsin, Madison
- Version 4.0 released May 31, 2003
- Visuals include time-plots, bar graphs, and tables
- Available for Solaris (SPARC), Linux (x86), Windows NT and 2000 (x86), and AIX (RS6000)
- Supports Fortran, C/C++, Java, and MPI
Features
- Open source, free download of full version
- No modifications to source or binary
- Dynamic instrumentation with real-time reporting
- Can focus on specific portions of a program and on specific performance parameters
- Records many different performance statistics such as CPU time, send/receive message count and sizes, sync time, and I/O time
- Performance Consultant executes automated performance bottleneck search
  - Hypothesizes main bottleneck of program or chunks of program
  - Bottlenecks classified as CPUbound, ExcessiveSyncWaitingTime, ExcessiveIOBlockingTime, TooManySmallIOOps
Desired Enhancements
- Support UPC and SHMEM programming models
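A heavily simplified picture of hypothesis testing against measured metrics is sketched below. Only the four hypothesis names come from the slide; the metric names, the threshold values, and the `diagnose` function are assumptions made for illustration, and the sketch does not reproduce how Paradyn's Performance Consultant actually searches or refines its hypotheses.

```python
# Hypothetical thresholds: hypothesis name -> (metric name, limit as fraction of run time).
THRESHOLDS = {
    "CPUbound": ("cpu_fraction", 0.8),
    "ExcessiveSyncWaitingTime": ("sync_fraction", 0.2),
    "ExcessiveIOBlockingTime": ("io_fraction", 0.2),
}
SMALL_IO_BYTES = 4096  # hypothetical cutoff for a "small" average I/O operation


def diagnose(metrics):
    """Return the hypotheses that the measured metrics support."""
    confirmed = [
        name
        for name, (metric, limit) in THRESHOLDS.items()
        if metrics[metric] > limit
    ]
    # Many operations with a small average size suggest TooManySmallIOOps.
    if metrics["io_ops"] > 0 and metrics["io_bytes"] / metrics["io_ops"] < SMALL_IO_BYTES:
        confirmed.append("TooManySmallIOOps")
    return confirmed


# Invented measurements for one program run.
sample = {
    "cpu_fraction": 0.35,
    "sync_fraction": 0.45,
    "io_fraction": 0.10,
    "io_ops": 5000,
    "io_bytes": 2_000_000,
}

print(diagnose(sample))
```

In this invented run the program spends 45% of its time in synchronization and averages 400 bytes per I/O operation, so two hypotheses are reported.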

Performance Tools: TAU
Overview
- Developed at U. of Oregon
- TAU = Tuning and Analysis Utilities
- Portable profiling package
- Available for SGI Origin 2K, IBM SP2, Cray T3E, Sun, Windows 95/98/NT, Linux (x86)
- Supports C/C++, Java, Fortran 77/90, HPF, HPC++, and MPI
Features
- Open source, free download of full version
- Maintains performance data for each thread, context, and node used in parallel, multi-threaded programs
- Captures data for functions, basic blocks
- Three methods of instrumentation
  1. Automatic via TAU Program Database Toolkit
  2. Manually via TAU instrumentation API
  3. Automatic at run-time via tau_run instrumentor
     - DyninstAPI dynamic instrumentation package
- Racy = GUI analyzer used to find bottlenecks
- PAPI support
- Fast, reliable support
Desired Enhancements
- Support UPC and SHMEM programming models

Conclusions and Future Plans
- Few tools support UPC or SHMEM programming models
  - TotalView does not analyze performance
  - CrayPat is not a portable tool
- Many performance analysis tools for message-passing programs
- We must bridge this gap
  - Bring performance analysis to UPC and SHMEM tools
  - Bring UPC and SHMEM support to performance tools
  - Determine most feasible approach and pursue
  - Focus on key issues at multiple levels: language, mapping, architecture
- Projected milestones/deliverables in proposed two-year project
  - Year 1
    - Comprehensive survey and evaluation of HPC performance tools
    - Investigation of key performance attributes in UPC and SHMEM
    - Investigation of key performance attributes in existing/emerging system architectures
    - Refinement of evaluation criteria and QFD table to identify primary approach
  - Year 2
    - Development of prototype performance tools for HPC
    - Performance benchmarking and optimization on selected system architectures
    - Investigation of usability and productivity achieved with these tools