Presentation is loading. Please wait.

Presentation is loading. Please wait.

TAU: Tuning and Analysis Utilities. TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel.

Similar presentations


Presentation on theme: "TAU: Tuning and Analysis Utilities. TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel."— Presentation transcript:

1 TAU: Tuning and Analysis Utilities

2 TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel and distributed high- performance computing  Targets a general complex system computation model  nodes / contexts / threads  Multi-level: system / software / parallelism  Measurement and analysis abstraction  Integrated toolkit for performance instrumentation, measurement, analysis, and visualization  Portable, configurable performance profiling/tracing facility  Open software approach  University of Oregon, LANL, FZJ Germany  http://www.cs.uoregon.edu/research/paracomp/tau http://www.cs.uoregon.edu/research/paracomp/tau

3 TAU Performance System Architecture

4 TAU Instrumentation  Flexible instrumentation mechanisms at multiple levels  Source code  manual  automatic using Program Database Toolkit (PDT), OPARI  Object code  pre-instrumented libraries (e.g., MPI using PMPI)  statically linked  dynamically linked (e.g., Virtual machine instrumentation)  fast breakpoints (compiler generated)  Executable code  dynamic instrumentation (pre-execution) using DynInstAPI

5 TAU Instrumentation (continued)  Targets common measurement interface (TAU API)  Object-based design and implementation  Macro-based, using constructor/destructor techniques  Program units: function, classes, templates, blocks  Uniquely identify functions and templates  name and type signature (name registration)  runtime type identification for template instantiations  C and Fortran instrumentation variants  Instrumentation and measurement optimization

6 Multi-Level Instrumentation  Uses multiple instrumentation interfaces  Shares information: cooperation between interfaces  Taps information at multiple levels  Provides selective instrumentation at each level  Targets a common performance model  Presents a unified view of execution

7 TAU Measurement  Performance information  High-resolution timer library (real-time / virtual clocks)  General software counter library (user-defined events)  Hardware performance counters  PAPI (Performance API) (UTK, Ptools Consortium)  consistent, portable API  Organization  Node, context, thread levels  Profile groups for collective events (runtime selective)  Performance data mapping between software levels

8 TAU Measurement (continued)  Parallel profiling  Function-level, block-level, statement-level  Supports user-defined events  TAU parallel profile database  Function callstack  Hardware counts values (in replace of time)  Tracing  All profile-level events  Inter-process communication events  Timestamp synchronization  User-configurable measurement library (user controlled)

9 TAU Measurement System Configuration  configure [OPTIONS]  {-c++=, -cc= } Specify C++ and C compilers  {-pthread, -sproc}Use pthread or SGI sproc threads  -openmpUse OpenMP threads  -opari= Specify location of Opari OpenMP tool  -papi= Specify location of PAPI  -pdt= Specify location of PDT  -dyninst= Specify location of DynInst Package  {-mpiinc=, mpilib= }Specify MPI library instrumentation  - TRACE Generate TAU event traces  -PROFILE Generate TAU profiles  -MULTIPLECOUNTERSUse more than one hardware counter  -CPUTIMEUse usertime+system time  -PAPIWALLCLOCKUse PAPI to access wallclock time  -PAPIVIRTUALUse PAPI for virtual (user) time …

10 TAU Measurement Configuration – Examples ./configure -c++=xlC -cc=xlc –pdt=/usr/packages/pdtoolkit-2.1 -pthread  Use TAU with IBM’s xlC compiler, PDT and the pthread library  Enable TAU profiling (default) ./configure -TRACE –PROFILE  Enable both TAU profiling and tracing ./configure -c++=guidec++ -cc=guidec -papi=/usr/local/packages/papi –openmp -mpiinc=/usr/packages/mpich/include -mpilib=/usr/packages/mpich/lib  Use OpenMP+MPI using KAI's Guide compiler suite and use PAPI for accessing hardware performance counters for measurements  Typically configure multiple measurement libraries

11 Program Database Toolkit (PDT)  Program code analysis framework for developing source- based tools  High-level interface to source code information  Integrated toolkit for source code parsing, database creation, and database query  commercial grade front end parsers  portable IL analyzer, database format, and access API  open software approach for tool development  Target and integrate multiple source languages  Use in TAU to build automated performance instrumentation tools

12 PDT Architecture and Tools C/C++ Fortran 77/90

13 PDT Components  Language front end  Edison Design Group (EDG): C (C99), C++  Mutek Solutions Ltd.: F77, F90  creates an intermediate-language (IL) tree  IL Analyzer  processes the intermediate language (IL) tree  creates “program database” (PDB) formatted file  DUCTAPE  C++ program Database Utilities and Conversion Tools APplication Environment  processes and merges PDB files  C++ library to access the PDB for PDT applications

14 Including TAU Makefile - Example include /usr/tau/sgi64/lib/Makefile.tau-pthread-kcc CXX = $(TAU_CXX) CC = $(TAU_CC) CFLAGS = $(TAU_DEFS) LIBS = $(TAU_LIBS) OBJS =... TARGET= a.out TARGET: $(OBJS) $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS).cpp.o: $(CC) $(CFLAGS) -c $< -o $@

15 TAU Makefile for PDT include /usr/tau/include/Makefile CXX = $(TAU_CXX) CC = $(TAU_CC) PDTPARSE = $(PDTDIR)/$(CONFIG_ARCH)/bin/cxxparse TAUINSTR = $(TAUROOT)/$(CONFIG_ARCH)/bin/tau_instrumentor CFLAGS = $(TAU_DEFS) LIBS = $(TAU_LIBS) OBJS =... TARGET= a.out TARGET: $(OBJS) $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS).cpp.o: $(PDTPARSE) $< $(TAUINSTR) $*.pdb $< -o $*.inst.cpp $(CC) $(CFLAGS) -c $*.inst.cpp -o $@

16 Setup: Running Applications % setenv PROFILEDIR /home/data/experiments/profile/01 % setenv TRACEDIR/home/data/experiments/trace/01(optional) % set path=($path / /bin) % setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH\: / /lib For PAPI (1 counter): % setenv PAPI_EVENT PAPI_FP_INS For PAPI (multiplecounters): % setenv COUNTER1 PAPI_FP_INS % setenv COUNTER2 PAPI_L1_DCM % setenv COUNTER3 P_WALL_CLOCK_TIME (PAPI’s wallclock time) % mpirun –np For DyninstAPI: % a.out % tau_run a.out (instruments using default TAU library) % tau_run -XrunTAUsh-papi a.out (uses libTAUsh-papi.so)

17 TAU Analysis  Profile analysis  pprof  parallel profiler with text-based display  racy  graphical interface to pprof (Tcl/Tk)  jracy  Java implementation of Racy  Trace analysis and visualization  Trace merging and clock adjustment (if necessary)  Trace format conversion (ALOG, SDDF, Vampir)  Vampir (Pallas) trace visualization

18 Pprof Command  pprof [-c|-b|-m|-t|-e|-i] [-r] [-s] [-n num] [-f file] [-l] [nodes]  -cSort according to number of calls  -bSort according to number of subroutines called  -mSort according to msecs (exclusive time total)  -tSort according to total msecs (inclusive time total)  -eSort according to exclusive time per call  -iSort according to inclusive time per call  -vSort according to standard deviation (exclusive usec)  -rReverse sorting order  -sPrint only summary profile information  -n numPrint only first number of functions  -f fileSpecify full path and filename without node ids  -l List all functions and exit

19 Pprof Output (NAS Parallel Benchmark – LU)  Intel Quad PIII Xeon, RedHat, PGI F90  F90 + MPICH  Profile for: Node Context Thread  Application events and MPI events

20 jRacy (NAS Parallel Benchmark – LU) n: node c: context t: thread Global profiles Individual profile Routine profile across all nodes

21 Vampir Trace Visualization Tool  Visualization and Analysis of MPI Programs  Originally developed by Forschungszentrum Jülich  Current development by Technical University Dresden  Distributed by PALLAS, Germany  http://www.pallas.de/pages/vampir.htm

22 Vampir (NAS Parallel Benchmark – LU) Timeline display Callgraph display Communications display Parallelism display

23 Applications: EVH1

24 Applications: VTF (ASCI ASAP Caltech)  C++, C, F90, Python  PDT, MPI

25 Applications: SAMRAI (LLNL)  C++  PDT, MPI  SAMRAI timers (groups)

26 Applications: Uintah (U. Utah) (500 cpus) TAU uses SCIRun [U. Utah] for visualization of performance data (online/offline)

27 Applications: Uintah (contd.) Scalability analysis

28 TAU Performance System Status  Computing platforms  IBM SP, SGI Origin, ASCI Red, Cray T3E, Compaq SC, HP, Sun, Apple, Windows, IA-32, IA-64 (Linux), Hitachi, NEC  Programming languages  C, C++, Fortran 77/90, HPF, Java  Communication libraries  MPI, PVM, Nexus, Tulip, ACLMPL, MPIJava  Thread libraries  pthread, Java,Windows, SGI sproc, Tulip, SMARTS, OpenMP  Compilers  KAI (KCC, KAP/Pro), PGI, GNU, Fujitsu, HP, Sun, Microsoft, SGI, Cray, IBM, HP, Compaq, Hitachi, NEC, Intel

29 Support Acknowledgement  TAU and PDT support:  Department of Energy (DOE)  DOE 2000 ACTS contract  DOE MICS contract  DOE ASCI Level 3 (LANL, LLNL)  U. of Utah DOE ASCI Level 1 subcontract  DARPA  NSF National Young Investigator (NYI) award


Download ppt "TAU: Tuning and Analysis Utilities. TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel."

Similar presentations


Ads by Google