Presentation is loading. Please wait.

Presentation is loading. Please wait.

TAU: A Framework for Parallel Performance Analysis

Similar presentations


Presentation on theme: "TAU: A Framework for Parallel Performance Analysis"— Presentation transcript:

1 TAU: A Framework for Parallel Performance Analysis
Allen D. Malony ParaDucks Research Group Computer & Information Science Department Computational Science Institute University of Oregon

2 Outline Goals and challenges Targeted research areas
TAU (Tuning and Analysis Utilities) computation model, architecture, toolkit framework performance system technology examples of TAU use Tools associated with TAU PDT (Program Database Toolkit) distributed runtime monitoring Future plans Conclusions November 21, 2018 Ptools Annual Meeting

3 Goal and Challenges Create robust (performance) technology for the analysis and tuning of parallel software and systems Challenges different scalable computing platforms different programming languages and systems common, portable framework for analysis extensibe, retargetable tool technology complex set of requirements November 21, 2018 Ptools Annual Meeting

4 Targeted Research Areas
Performance analysis for scalable parallel systems targeting multiple programming and system levels and the mapping between levels Program code analysis for multiple languages enabling development of new source-based tools Integration and interoperation support for building analysis tool frameworks and environments Runtime tool interaction for dynamic applications November 21, 2018 Ptools Annual Meeting

5 TAU (Tuning and Analysis Utilities)
Performance analysis framework for scalable parallel and distributed high-performance computing Target a general parallel computation model computer nodes shared address space contexts threads of execution multi-level parallelism Integrated toolkit for performance instrumentation, measurement, analysis, and visualization portable performance profiling/tracing facility open software approach Network November 21, 2018 Ptools Annual Meeting

6 TAU Architecture November 21, 2018 Ptools Annual Meeting

7 TAU Instrumentation Flexible, multiple instrumentation mechanisms
source code manual automatic using PDT (tau_instrumentor) object code pre-instrumented libraries statically linked: MPI wrapper library using the MPI Profiling Interface (libTauMpi.a) dynamically linked: Java instrumentation using JVMPI and TAU shared object dynamically loaded in VM executable code dynamic instrumentation using DyninstAPI (tau_run) November 21, 2018 Ptools Annual Meeting

8 TAU Instrumentation (continued)
Common target measurement interface (TAU API) C++ (object-based) instrumentation macro-based, using constructor/destructor techniques function, classes, and templates uniquely identify functions and templates name and type signature (name registration) static object creates performance entry dynamic object receives static object pointer runtime type identification for template instantiations with C and Fortran instrumentation variants Instrumentation optimization November 21, 2018 Ptools Annual Meeting

9 TAU Measurement Performance information Organization
high resolution timer library (real-time clock) generalized software counter library hardware performance counters PCL (Performance Counter Library) (ZAM, Germany) PAPI (Performance API) (UTK, Ptools) consistent, portable API Organization node, context, thread levels profile groups for collective events (runtime selective) mapping between software levels November 21, 2018 Ptools Annual Meeting

10 TAU Measurement (continued)
Profiling function-level, block-level, statement-level supports user-defined events TAU profile (function) database (PD) function callstack hardware counts instead of time Tracing profile-level events interprocess communication events timestamp synchronization User-controlled configuration (configure) November 21, 2018 Ptools Annual Meeting

11 Timing of Multi-threaded Applications
Capture timing information on per thread basis Two alternative wall clock time works on all systems user-level measurement OS-maintained CPU time (e.g., Solaris, Linux) thread virtual time measurement TAU supports both alternatives CPUTIME module profiles user+system time % configure -pthread -CPUTIME November 21, 2018 Ptools Annual Meeting

12 TAU Analysis Profile analysis Trace analysis pprof racy
parallel profiler with text-based display racy graphical interface to pprof Trace analysis trace merging and clock adjustment (if necessary) trace format conversion (ALOG, SDDF, PV, Vampir) Vampir trace analysis and visualization tool (Pallas) November 21, 2018 Ptools Annual Meeting

13 TAU Status Usage platforms languages communication libraries
IBM SP, SGI Origin 2K, Intel Teraflop, Cray T3E, HP, Sun, Windows 95/98/NT, Alpha/Pentium Linux cluster languages C, C++, Fortran 77/90, HPF, pC++, HPC++, Java communication libraries MPI, PVM, Nexus, Tulip, ACLMPL thread libraries pthreads, Tulip, SMARTS, Java,Windows compilers KAI, PGI, GNU, Fujitsu, Sun, Microsoft, SGI, Cray November 21, 2018 Ptools Annual Meeting

14 TAU Status (continued)
application libraries Blitz++, A++/P++, ACLVIS, PAWS application frameworks POOMA, POOMA-2, MC++, Conejo, PaRP other projects ACPC, University of Vienna: Aurora UC Berkeley (Culler): Millenium, sensitivity analysis KAI and Pallas TAU profiling and tracing toolkit (Version 2.7) LANL ACL Fall 1999 CD-ROM distributed at SC'99 Extensive 70-page TAU User’s Guide November 21, 2018 Ptools Annual Meeting

15 TAU Examples Instrumentation Measurement Analysis
C++ template profiling (PETE, Blitz++) Java and MPI PAPI Measurement mapping of asynchronous execution (SMARTS) hybrid execution (Opus/HPF) Analysis SMARTS scheduling November 21, 2018 Ptools Annual Meeting

16 C++ Template Instrumentation (Blitz++, PETE)
High-level objects array classes templates Optimizations array processing expressions (PETE) Relate performance data to high-level statement Complexity of template evaluation Array expressions November 21, 2018 Ptools Annual Meeting

17 Standard Template Instrumentation Difficulties
Instantiated templates result in mangled identifiers Standard profiling techniques and tools are deficient integrated with proprietary compilers specific systems platforms and programming models Uninterpretable routine names November 21, 2018 Ptools Annual Meeting

18 TAU Template Instrumentation and Profiling
Profile of expression types Performance data presented with respect to high-level array expression types Graphical pprof November 21, 2018 Ptools Annual Meeting

19 Parallel Java Performance Instrumentation
Multi-language applications (Java, C, C++, Fortran) Hybrid execution models (Java threads, MPI) Java Virtual Machine Profiler Interface (JVMPI) event instrumentation in JVM profiler agent (libTAU.so) fields events Java Native Interface (JNI) invoke JVMPI control routines to control Java threads and access thread information MPI profiling interface “Performance Tools for Parallel Java Environments,” Java Workshop, ICS 2000, May 2000. November 21, 2018 Ptools Annual Meeting

20 TAU Java Instrumentation Architecture
Java program TAU package mpiJava package Thread API JNI MPI profiling interface Event notification TAU TAU wrapper Native MPI library JVMPI Profile DB November 21, 2018 Ptools Annual Meeting

21 Parallel Java Game of Life
mpiJava testcase 4 nodes, 28 threads Node process grouping Thread message pairing Vampir display Multi-level event grouping November 21, 2018 Ptools Annual Meeting

22 TAU and PAPI: NAS Parallel LU Benchmark
SGI Power Onyx (4 processors, R10K), MPI Floating point operations Cross-node full / routine profiles Full FP profile for each node Percentage profile November 21, 2018 Ptools Annual Meeting

23 TAU and PAPI: Matrix Multiply
Data cache miss comparison, “regular” vs. “strip-mining” execution 512x KB (P) 2 MB (S) Regular causes 4.5 times more misses November 21, 2018 Ptools Annual Meeting

24 Asynchronous Performance Analysis (SMARTS)
Scalable Multithreaded Asynchronuous Runtime System user-level threads, light-weight virtual processors macro-dataflow, asynchronous execution interleaving iterates from data-parallel statements integrated with POOMA II TAU measurement of asynchronous parallel execution utilized the TAU mapping API associate iterate performance with data parallel statement evaluate different scheduling policies “SMARTS: Exploting Temporal Locality & Parallelism through Vertical Execution,” ICS '99, August 1999. November 21, 2018 Ptools Annual Meeting

25 TAU Mapping of Asynchronous Execution
Without mapping Two threads executing With mapping POOMA / SMARTS November 21, 2018 Ptools Annual Meeting

26 With and without mapping (Thread 0)
Thread 0 blocks waiting for iterates Iterates get lumped together With mapping Iterates distinguished November 21, 2018 Ptools Annual Meeting

27 With and without mapping (Thread 1)
Array initialization performance lumped Without mapping Performance associated with ExpressionKernel object With mapping Iterate performance mapped to array statement Array initialization performance correctly separated November 21, 2018 Ptools Annual Meeting

28 TAU and Hybrid Execution in Opus/HPF
Fortran 77, Fortran 90, HPF Vienna Fortran Compiling System Opus / HPF combined data (HPF) and task (Opus) parallelism HPF compiler produces Fortran 90 modules processes interoperate using Opus runtime system producer / consumer model MPI and pthreads performance influence at multiple software levels November 21, 2018 Ptools Annual Meeting

29 TAU Profiling of Opus/HPF Application
Multiple producers Multiple consumers Parallelism View November 21, 2018 Ptools Annual Meeting

30 TAU Profiling of SMARTS
Iteration scheduling for two array expressions November 21, 2018 Ptools Annual Meeting

31 SMARTS Tracing (SOR) – Vampir Visualization
SCVE scheduler used in Red/Black SOR running on 32 processors of SGI Origin 2000 Asynchronous, overlapped parallelism November 21, 2018 Ptools Annual Meeting

32 Program Database Toolkit (PDT)
Program code analysis framework for developing source-based tools High-level interface to source code information Integrated toolkit for source code parsing, database creation, and database query commercial grade front end parsers portable IL analyzer, database format, and access API open software approach for tool development Target and integrate multiple source languages November 21, 2018 Ptools Annual Meeting

33 PDT Architecture and Tools
November 21, 2018 Ptools Annual Meeting

34 PDT Summary Program Database Toolkit (Version 1.1)
LANL ACL Fall 1999 CD-ROM distributed at SC'99 EDG C++ Front End (Version ) C++ IL Analyzer and DUCTAPE library tools: pdbmerge, pdbconv, pdbtree, pdbhtml standard C++ system header files (KAI KCC 3.4c) Fortran 90 IL Analyzer in progress Automated TAU performance instrumentation Program analysis support for SILOON (ACL CD) “A Tool Framework for Static and Dynamic Analysis of Object-Oriented Software,” submitted to SC ’00. November 21, 2018 Ptools Annual Meeting

35 Distributed Monitoring Framework
Extend usability of TAU performance analysis Access TAU performance data during execution Framework model each application context is a performance data server monitor agent thread is created within each context client processes attach to agents and request data server thread synchronization for data consistency pull mode of interaction Distributed TAU performance data space “A Runtime Monitoring Framework for the TAU Profiling System,” ISCOPE ’99, Nov November 21, 2018 Ptools Annual Meeting

36 TAU Distributed Monitor Architecture
Each context has a monitor agent Client in separate thread directs agent Pull model of interaction Initial HPC++ implementation TAU profile database November 21, 2018 Ptools Annual Meeting

37 Java Implementation of TAU Monitor
Motivations more portable monitor middleware system (RMI) more flexible and programmable server interface (JNI) more robust client development (EJB, JDBC, Swing) November 21, 2018 Ptools Annual Meeting

38 Future Plans TAU platforms: SGI Itanium, Sun Starfire, IBM Linux, ... languages: Java (Java Grande) , OpenMP instrument: automatic (F90, Java), Dyninst measurement: hardware counter, support PAPI displays: “beyond bargraphs” performance views performance database and technology support for multiple runs open API for analysis tool development PDT complete F90 and Java IL Analyzer source browsers: function, class, template tools for aiding in data marshalling and translation November 21, 2018 Ptools Annual Meeting

39 Future Plans (continued)
Distributed monitoring framework application and system monitoring ACL Supermon and SGI Performance Co-Pilot scalable SMP clusters and distributed systems performance monitoring clients Performance evaluation numerical libraries and frameworks scalable runtime systems ASCI application developers (benchmark codes) Investigate performance issues in Linux kernel Investigate integration with CCA November 21, 2018 Ptools Annual Meeting

40 Conclusions Complex parallel computing environments require robust program analysis tools portable, cross-platform, multi-level, integrated able to bridge and reuse existing technology technology savvy TAU offers a robust performance technology framework for complex parallel computing systems flexible instrumentation and instrumentation extendable profile and trace performance analysis integration with other performance technology Opportunities exist for open performance technology November 21, 2018 Ptools Annual Meeting

41 Open Performance Technology (OPT)
Performance problem is complex diverse platforms, software development, applications things evolve History of incompatible and competing tools instrumentation / measurement technology reinvention lack of common, reusable software foundations Need “value added” (open) approach technology for high-level performance tool development layered performance tool architecture portable, flexible, programmable, integrative technology Opportunity for Industry/National Labs/PACI sites November 21, 2018 Ptools Annual Meeting


Download ppt "TAU: A Framework for Parallel Performance Analysis"

Similar presentations


Ads by Google