Download presentation
Presentation is loading. Please wait.
1
TAU: Tuning and Analysis Utilities
2
TAU Performance System Framework Tuning and Analysis Utilities Performance system framework for scalable parallel and distributed high- performance computing Targets a general complex system computation model nodes / contexts / threads Multi-level: system / software / parallelism Measurement and analysis abstraction Integrated toolkit for performance instrumentation, measurement, analysis, and visualization Portable, configurable performance profiling/tracing facility Open software approach University of Oregon, LANL, FZJ Germany http://www.cs.uoregon.edu/research/paracomp/tau http://www.cs.uoregon.edu/research/paracomp/tau
3
TAU Performance System Architecture
4
TAU Instrumentation Flexible instrumentation mechanisms at multiple levels Source code manual automatic using Program Database Toolkit (PDT), OPARI Object code pre-instrumented libraries (e.g., MPI using PMPI) statically linked dynamically linked (e.g., Virtual machine instrumentation) fast breakpoints (compiler generated) Executable code dynamic instrumentation (pre-execution) using DynInstAPI
5
TAU Instrumentation (continued) Targets common measurement interface (TAU API) Object-based design and implementation Macro-based, using constructor/destructor techniques Program units: function, classes, templates, blocks Uniquely identify functions and templates name and type signature (name registration) runtime type identification for template instantiations C and Fortran instrumentation variants Instrumentation and measurement optimization
6
Multi-Level Instrumentation Uses multiple instrumentation interfaces Shares information: cooperation between interfaces Taps information at multiple levels Provides selective instrumentation at each level Targets a common performance model Presents a unified view of execution
7
TAU Measurement Performance information High-resolution timer library (real-time / virtual clocks) General software counter library (user-defined events) Hardware performance counters PAPI (Performance API) (UTK, Ptools Consortium) consistent, portable API Organization Node, context, thread levels Profile groups for collective events (runtime selective) Performance data mapping between software levels
8
TAU Measurement (continued) Parallel profiling Function-level, block-level, statement-level Supports user-defined events TAU parallel profile database Function callstack Hardware counts values (in replace of time) Tracing All profile-level events Inter-process communication events Timestamp synchronization User-configurable measurement library (user controlled)
9
TAU Measurement System Configuration configure [OPTIONS] {-c++=, -cc= } Specify C++ and C compilers {-pthread, -sproc}Use pthread or SGI sproc threads -openmpUse OpenMP threads -opari= Specify location of Opari OpenMP tool -papi= Specify location of PAPI -pdt= Specify location of PDT -dyninst= Specify location of DynInst Package {-mpiinc=, mpilib= }Specify MPI library instrumentation - TRACE Generate TAU event traces -PROFILE Generate TAU profiles -MULTIPLECOUNTERSUse more than one hardware counter -CPUTIMEUse usertime+system time -PAPIWALLCLOCKUse PAPI to access wallclock time -PAPIVIRTUALUse PAPI for virtual (user) time …
10
TAU Measurement Configuration – Examples ./configure -c++=xlC -cc=xlc –pdt=/usr/packages/pdtoolkit-2.1 -pthread Use TAU with IBM’s xlC compiler, PDT and the pthread library Enable TAU profiling (default) ./configure -TRACE –PROFILE Enable both TAU profiling and tracing ./configure -c++=guidec++ -cc=guidec -papi=/usr/local/packages/papi –openmp -mpiinc=/usr/packages/mpich/include -mpilib=/usr/packages/mpich/lib Use OpenMP+MPI using KAI's Guide compiler suite and use PAPI for accessing hardware performance counters for measurements Typically configure multiple measurement libraries
11
Program Database Toolkit (PDT) Program code analysis framework for developing source- based tools High-level interface to source code information Integrated toolkit for source code parsing, database creation, and database query commercial grade front end parsers portable IL analyzer, database format, and access API open software approach for tool development Target and integrate multiple source languages Use in TAU to build automated performance instrumentation tools
12
PDT Architecture and Tools C/C++ Fortran 77/90
13
PDT Components Language front end Edison Design Group (EDG): C (C99), C++ Mutek Solutions Ltd.: F77, F90 creates an intermediate-language (IL) tree IL Analyzer processes the intermediate language (IL) tree creates “program database” (PDB) formatted file DUCTAPE C++ program Database Utilities and Conversion Tools APplication Environment processes and merges PDB files C++ library to access the PDB for PDT applications
14
Including TAU Makefile - Example include /usr/tau/sgi64/lib/Makefile.tau-pthread-kcc CXX = $(TAU_CXX) CC = $(TAU_CC) CFLAGS = $(TAU_DEFS) LIBS = $(TAU_LIBS) OBJS =... TARGET= a.out TARGET: $(OBJS) $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS).cpp.o: $(CC) $(CFLAGS) -c $< -o $@
15
TAU Makefile for PDT include /usr/tau/include/Makefile CXX = $(TAU_CXX) CC = $(TAU_CC) PDTPARSE = $(PDTDIR)/$(CONFIG_ARCH)/bin/cxxparse TAUINSTR = $(TAUROOT)/$(CONFIG_ARCH)/bin/tau_instrumentor CFLAGS = $(TAU_DEFS) LIBS = $(TAU_LIBS) OBJS =... TARGET= a.out TARGET: $(OBJS) $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS).cpp.o: $(PDTPARSE) $< $(TAUINSTR) $*.pdb $< -o $*.inst.cpp $(CC) $(CFLAGS) -c $*.inst.cpp -o $@
16
Setup: Running Applications % setenv PROFILEDIR /home/data/experiments/profile/01 % setenv TRACEDIR/home/data/experiments/trace/01(optional) % set path=($path / /bin) % setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH\: / /lib For PAPI (1 counter): % setenv PAPI_EVENT PAPI_FP_INS For PAPI (multiplecounters): % setenv COUNTER1 PAPI_FP_INS % setenv COUNTER2 PAPI_L1_DCM % setenv COUNTER3 P_WALL_CLOCK_TIME (PAPI’s wallclock time) % mpirun –np For DyninstAPI: % a.out % tau_run a.out (instruments using default TAU library) % tau_run -XrunTAUsh-papi a.out (uses libTAUsh-papi.so)
17
TAU Analysis Profile analysis pprof parallel profiler with text-based display racy graphical interface to pprof (Tcl/Tk) jracy Java implementation of Racy Trace analysis and visualization Trace merging and clock adjustment (if necessary) Trace format conversion (ALOG, SDDF, Vampir) Vampir (Pallas) trace visualization
18
Pprof Command pprof [-c|-b|-m|-t|-e|-i] [-r] [-s] [-n num] [-f file] [-l] [nodes] -cSort according to number of calls -bSort according to number of subroutines called -mSort according to msecs (exclusive time total) -tSort according to total msecs (inclusive time total) -eSort according to exclusive time per call -iSort according to inclusive time per call -vSort according to standard deviation (exclusive usec) -rReverse sorting order -sPrint only summary profile information -n numPrint only first number of functions -f fileSpecify full path and filename without node ids -l List all functions and exit
19
Pprof Output (NAS Parallel Benchmark – LU) Intel Quad PIII Xeon, RedHat, PGI F90 F90 + MPICH Profile for: Node Context Thread Application events and MPI events
20
jRacy (NAS Parallel Benchmark – LU) n: node c: context t: thread Global profiles Individual profile Routine profile across all nodes
21
Vampir Trace Visualization Tool Visualization and Analysis of MPI Programs Originally developed by Forschungszentrum Jülich Current development by Technical University Dresden Distributed by PALLAS, Germany http://www.pallas.de/pages/vampir.htm
22
Vampir (NAS Parallel Benchmark – LU) Timeline display Callgraph display Communications display Parallelism display
23
Applications: EVH1
24
Applications: VTF (ASCI ASAP Caltech) C++, C, F90, Python PDT, MPI
25
Applications: SAMRAI (LLNL) C++ PDT, MPI SAMRAI timers (groups)
26
Applications: Uintah (U. Utah) (500 cpus) TAU uses SCIRun [U. Utah] for visualization of performance data (online/offline)
27
Applications: Uintah (contd.) Scalability analysis
28
TAU Performance System Status Computing platforms IBM SP, SGI Origin, ASCI Red, Cray T3E, Compaq SC, HP, Sun, Apple, Windows, IA-32, IA-64 (Linux), Hitachi, NEC Programming languages C, C++, Fortran 77/90, HPF, Java Communication libraries MPI, PVM, Nexus, Tulip, ACLMPL, MPIJava Thread libraries pthread, Java,Windows, SGI sproc, Tulip, SMARTS, OpenMP Compilers KAI (KCC, KAP/Pro), PGI, GNU, Fujitsu, HP, Sun, Microsoft, SGI, Cray, IBM, HP, Compaq, Hitachi, NEC, Intel
29
Support Acknowledgement TAU and PDT support: Department of Energy (DOE) DOE 2000 ACTS contract DOE MICS contract DOE ASCI Level 3 (LANL, LLNL) U. of Utah DOE ASCI Level 1 subcontract DARPA NSF National Young Investigator (NYI) award
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.