Simplifying the Usage of Performance Evaluation Tools: Experiences with TAU and DyninstAPI Paradyn/Condor Week 2010, Rm 221, Fluno Center, U. of Wisconsin,


1 Simplifying the Usage of Performance Evaluation Tools: Experiences with TAU and DyninstAPI
Paradyn/Condor Week 2010, Rm 221, Fluno Center, U. of Wisconsin, Madison
10:45 am – 11:30 am, Tuesday, 14th April, 2010
Sameer Shende, Allen D. Malony, Alan Morris
Performance Research Laboratory, University of Oregon, Eugene, OR
{sameer, malony, amorris}@cs.uoregon.edu
http://tau.uoregon.edu

2 Acknowledgements: University of Oregon
- Dr. Allen D. Malony, Professor, CIS Dept, and Director, NeuroInformatics Center
- Alan Morris, Senior software engineer
- Dr. Chee Wai Lee, Research faculty
- Wyatt Spear, Software engineer
- Scott Biersdorff, Software engineer
- Dr. Robert Yelle, Research faculty
- Suzanne Millstein, Ph.D. student
And
- Matt Legendre and Dan McNulty, University of Wisconsin at Madison

3 Motivation
- We have made great advances in instrumentation, measurement and analysis techniques
- Tools are rich in features and have complex dependencies on other tools
- Tools are getting more complex to use and to install
- We need to simplify the usage of our performance evaluation tools!

4 TAU Performance System ®
- Integrated toolkit for performance problem solving
- Instrumentation, measurement, analysis, visualization
- Portable performance profiling and tracing facility
- Performance data management and data mining
- Based on direct performance measurement approach
- Open source
- Available on all HPC platforms
- Partners
  - LLNL, ANL, ORNL, LANL, PNNL, LBL
  - Research Centre Jülich, TU Dresden
[Figure: TAU Architecture]

5 TAU Parallel Performance System Goals
- Portable (open source) parallel performance system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Multi-level, multi-language performance instrumentation
- Flexible and configurable performance measurement
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid, object oriented (generic), component-based
- Support for performance mapping
- Integration of leading performance technology
- Scalable (very large) parallel performance analysis

6 TAU Performance System Components
[Figure labels: TAU Architecture; Program Analysis; Parallel Profile Analysis; PDT; PerfDMF; ParaProf; Performance Data Mining; Performance Monitoring; TAUoverMRNet (ToM); PerfExplorer]

7 TAU Performance System Architecture

8 TAU Performance System Architecture

9 Parallel Profile Visualization: ParaProf

10 Scalable Visualization: ParaProf (128k cores)

11 Scatter Plot: ParaProf (128k cores)

12 ParaProf: Communication Matrix Display

13 Comparing Effects of Multi-Core Processors
AORSA2D
- magnetized plasma simulation
- Automatic loop level instrumentation
- Blue is single node
- Red is dual core
- Cray XT3 (4K cores)

14 ParaProf: Mflops Sorted by Exclusive Time
[Figure annotation: low mflops?]

15 Performance Regression Testing

16 Usage Scenarios: Evaluate Scalability

17 Scaling NAMD with CUDA (Jumpshot with TAU)
[Figure annotation: Data transfer]

18 Measuring Performance of PGI Accelerated Code

19 TAU and Eclipse
- Provide an interface for configuring TAU’s automatic instrumentation within Eclipse’s build system
- Manage runtime configuration settings and environment variables for execution of TAU instrumented programs
[Figure labels: C/C++/Fortran Project in Eclipse; Add or modify an Eclipse build configuration w/ TAU; Temporary copy of instrumented code; Compilation/linking with TAU libraries; TAU instrumented libraries; Program execution; Performance data; Program output]

20 TAU and Eclipse
[Figure label: PerfDMF]

21 Choosing PAPI Counters with TAU in Eclipse

22 TAU Performance System Architecture

23 TAU Instrumentation Approach
- Support for standard program events
  - Routines, classes and templates
  - Statement-level blocks
  - Begin/End events (Interval events)
- Support for user-defined events
  - Begin/End events specified by user
  - Atomic events (e.g., size of memory allocated/freed)
  - Selection of event statistics
- Support definition of “semantic” entities for mapping
- Support for event groups (aggregation, selection)
- Instrumentation optimization
  - Eliminate instrumentation in lightweight routines

24 TAU Instrumentation Mechanisms
- Source code
  - Manual (TAU API, TAU component API)
  - Automatic (robust)
    - C, C++, F77/90/95 (Program Database Toolkit (PDT))
    - OpenMP (directive rewriting (Opari), POMP2 spec)
- Object code
  - Compiler-based instrumentation (-optCompInst)
  - Pre-instrumented libraries (e.g., MPI using PMPI)
  - Statically-linked and dynamically-linked (tau_wrap)
- Executable code
  - Binary re-writing and dynamic instrumentation (DyninstAPI, U. Wisconsin, U. Maryland)
  - Virtual machine instrumentation (e.g., Java using JVMPI)
  - Interpreter based instrumentation (Python)
  - Kernel based instrumentation (KTAU)

25 Program Database Toolkit (PDT)
[Figure: Application / Library -> C / C++ parser and Fortran parser (F77/90/95) -> IL -> C / C++ IL analyzer and Fortran IL analyzer -> Program Database Files -> DUCTAPE -> tools]
- PDBhtml: Program documentation
- SILOON: Application component glue
- CHASM: C++ / F90/95 interoperability
- TAU_instr: Automatic source instrumentation

26 Automatic Source-Level Instrumentation in TAU
TAU v2.19.1+: If source based instrumentation fails, compiler-based instrumentation is used automatically

27 Using TAU with Source Code Instrumentation
- TAU supports several measurement options (profiling, tracing, profiling with hardware counters, etc.)
- Each measurement configuration of TAU corresponds to a unique stub makefile that is generated when you configure it
- To instrument source code using PDT
  - Choose an appropriate TAU stub makefile in /lib:
    % export TAU_MAKEFILE=/usr/local/packages/tau/x86_64/lib/Makefile.tau-mpi-pdt
    % export TAU_OPTIONS='-optVerbose …' (see tau_compiler.sh -help)
  - And use tau_f90.sh, tau_cxx.sh or tau_cc.sh as Fortran, C++ or C compilers:
    % mpif90 foo.f90
    changes to
    % tau_f90.sh foo.f90
- Execute application and analyze performance data:
    % pprof (for text based profile display)
    % paraprof (for GUI)
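
The steps on this slide can be sketched as a small shell script. This is only an illustration: `TAU_ROOT` is a hypothetical helper variable (the slide hard-codes the path), `foo.f90` is the slide's placeholder source file, and the build is skipped when the TAU wrapper is not installed.

```shell
#!/bin/sh
# Sketch of the source-instrumentation workflow described on the slide.
# TAU_ROOT is an assumed helper variable; the default is the slide's example path.
TAU_ROOT="${TAU_ROOT:-/usr/local/packages/tau/x86_64}"

# Select the measurement configuration (MPI + PDT source instrumentation).
export TAU_MAKEFILE="$TAU_ROOT/lib/Makefile.tau-mpi-pdt"

# Compile-time options; `tau_compiler.sh -help` lists the rest.
export TAU_OPTIONS='-optVerbose'

# Replace the regular compiler (mpif90) with the TAU wrapper, but only when
# the wrapper is on PATH and the example source file is present.
if command -v tau_f90.sh >/dev/null 2>&1 && [ -f foo.f90 ]; then
    if tau_f90.sh foo.f90; then
        build_status="built"
    else
        build_status="failed"
    fi
else
    build_status="skipped"
    echo "tau_f90.sh or foo.f90 not found; skipping build" >&2
fi
echo "build: $build_status"
```

After a successful build, running the application produces profile data that `pprof` (text) or `paraprof` (GUI) can display, as the slide notes.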

28 TAU Measurement Configuration – Examples
% cd /usr/local/packages/tau/x86_64/lib; ls Makefile.*
Makefile.tau-pdt
Makefile.tau-mpi-pdt
Makefile.tau-papi-mpi-pdt
Makefile.tau-pthread-pdt
Makefile.tau-pthread-mpi-pdt
Makefile.tau-openmp-opari-pdt
Makefile.tau-openmp-opari-mpi-pdt
Makefile.tau-papi-openmp-opari-mpi-pdt
…
- For an MPI+F90 application, you may want to start with: Makefile.tau-mpi-pdt
  - Supports MPI instrumentation & PDT for automatic source instrumentation
  - % setenv TAU_MAKEFILE /usr/local/packages/tau/x86_64/lib/Makefile.tau-mpi-pdt
  - % tau_f90.sh application.f90; mpirun -np 256 ./a.out
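
Because each stub makefile name encodes its measurement configuration as dash-separated components, a short helper can narrow the list to the ones offering the features you need. The function below, `list_tau_makefiles`, is a hypothetical convenience written for this example and is not part of TAU.

```shell
# list_tau_makefiles DIR FEATURE...
# Print the stub makefiles in DIR whose names contain every FEATURE as a
# whole dash-separated component (so "mpi" does not match "openmp").
# Hypothetical helper for illustration; not shipped with TAU.
list_tau_makefiles() {
    dir=$1
    shift
    for mk in "$dir"/Makefile.tau-*; do
        [ -e "$mk" ] || continue
        name=${mk##*/}
        # Wrap the feature list in dashes: "mpi-pdt" becomes "-mpi-pdt-".
        tags="-${name#Makefile.tau-}-"
        ok=1
        for feat in "$@"; do
            case "$tags" in
                *"-$feat-"*) ;;
                *) ok=0 ;;
            esac
        done
        [ "$ok" -eq 1 ] && echo "$mk"
    done
    return 0
}
```

For example, `list_tau_makefiles /usr/local/packages/tau/x86_64/lib mpi pdt` would print every configuration on the slide that combines MPI with PDT.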

29 Compile-Time Environment Variables
Optional parameters for TAU_OPTIONS: [tau_compiler.sh -help]
Option | Description
-optVerbose | Turn on verbose debugging messages
-optCompInst | Use compiler based instrumentation
-optNoCompInst | Do not revert to compiler instrumentation if source instrumentation fails
-optDetectMemoryLeaks | Turn on debugging memory allocations/de-allocations to track leaks
-optKeepFiles | Does not remove intermediate .pdb and .inst.* files
-optPreProcess | Preprocess Fortran sources before instrumentation
-optTauSelectFile="" | Specify selective instrumentation file for tau_instrumentor
-optLinking="" | Options passed to the linker. Typically $(TAU_MPI_FLIBS) $(TAU_LIBS) $(TAU_CXXLIBS)
-optCompile="" | Options passed to the compiler. Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
-optPdtF95Opts="" | Add options for Fortran parser in PDT (f95parse/gfparse)
-optPdtF95Reset="" | Reset options for Fortran parser in PDT (f95parse/gfparse)
-optPdtCOpts="" | Options for C parser in PDT (cparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
-optPdtCxxOpts="" | Options for C++ parser in PDT (cxxparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
...
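
Several of these options are usually combined into one `TAU_OPTIONS` string. The sketch below builds the string one flag at a time; `select.tau` is a hypothetical file name for a selective instrumentation file, and the particular combination of flags is only an example.

```shell
# Compose TAU_OPTIONS from individual flags listed on the slide.
# select.tau is an assumed file name used for illustration.
TAU_OPTIONS='-optVerbose'                                   # verbose debugging messages
TAU_OPTIONS="$TAU_OPTIONS -optKeepFiles"                    # keep .pdb and .inst.* files
TAU_OPTIONS="$TAU_OPTIONS -optPreProcess"                   # preprocess Fortran sources
TAU_OPTIONS="$TAU_OPTIONS -optTauSelectFile=select.tau"     # selective instrumentation
export TAU_OPTIONS
echo "TAU_OPTIONS=$TAU_OPTIONS"
```

Building the string incrementally makes it easy to comment out one flag while debugging a build.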

30 Runtime Environment Variables in TAU
Environment Variable | Default | Description
TAU_TRACE | 0 | Setting to 1 turns on tracing
TAU_CALLPATH | 0 | Setting to 1 turns on callpath profiling
TAU_TRACK_HEAP or TAU_TRACK_HEADROOM | 0 | Setting to 1 turns on tracking heap memory/headroom at routine entry & exit using context events (e.g., Heap at Entry: main=>foo=>bar)
TAU_CALLPATH_DEPTH | 2 | Specifies depth of callpath. Setting to 0 generates no callpath or routine information, setting to 1 generates flat profile and context events have just parent information (e.g., Heap Entry: foo)
TAU_SYNCHRONIZE_CLOCKS | 1 | Synchronize clocks across nodes to correct timestamps in traces
TAU_COMM_MATRIX | 0 | Setting to 1 generates communication matrix display using context events
TAU_THROTTLE | 1 | Setting to 0 turns off throttling. Enabled by default to remove instrumentation in lightweight routines that are called frequently
TAU_THROTTLE_NUMCALLS | 100000 | Specifies the number of calls before testing for throttling
TAU_THROTTLE_PERCALL | 10 | Specifies value in microseconds. Throttle a routine if it is called over 100000 times and takes less than 10 usec of inclusive time per call
TAU_COMPENSATE | 0 | Setting to 1 enables runtime compensation of instrumentation overhead
TAU_PROFILE_FORMAT | Profile | Setting to “merged” generates a single file. “snapshot” generates xml format
TAU_METRICS | TIME | Setting to a comma separated list generates other metrics. (e.g., TIME:linuxtimers:PAPI_FP_OPS:PAPI_NATIVE_ )
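
The throttling rule in the table combines two thresholds, which is easier to see as arithmetic. The sketch below sets a few of the runtime variables and then restates the rule as a shell function. `would_throttle` is a hypothetical name, the function only illustrates the rule as worded on the slide and is not TAU's implementation, and the callpath depth of 10 is an arbitrary example value.

```shell
# Example runtime settings taken from the table above.
export TAU_CALLPATH=1             # turn on callpath profiling
export TAU_CALLPATH_DEPTH=10      # example value; the default is 2
export TAU_THROTTLE=1             # default: throttling enabled
export TAU_THROTTLE_NUMCALLS=100000
export TAU_THROTTLE_PERCALL=10    # microseconds

# would_throttle CALLS INCLUSIVE_USEC
# Succeeds when a routine called CALLS times, with INCLUSIVE_USEC total
# inclusive time, meets the rule quoted in the table. Illustration only.
would_throttle() {
    calls=$1
    inclusive_usec=$2
    [ "$TAU_THROTTLE" -eq 1 ] || return 1
    [ "$calls" -gt "$TAU_THROTTLE_NUMCALLS" ] || return 1
    # inclusive/calls < PERCALL is the same as inclusive < PERCALL * calls,
    # which avoids integer division.
    [ "$inclusive_usec" -lt $((TAU_THROTTLE_PERCALL * calls)) ]
}
```

With the defaults, a routine called 200000 times for a total of 1000000 usec (5 usec per call) would be throttled, while the same call count at 4000000 usec (20 usec per call) would not.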

31 Simplifying Instrumentation using DyninstAPI
- TAU uses DyninstAPI to create a binary re-writer (tau_run)
- TAU’s measurement library (DSO) is loaded by tau_run
- Both runtime instrumentation and binary re-writing are supported
- Selection of files and routines based on exclude/include lists
- Simplifies tool usage greatly!
- Available on POINT LiveDVD [http://tau.uoregon.edu/point.iso]
- Usage:
  - % tau_run a.out -o a.inst.out
  - % mpirun -np 4 a.inst.out
  - % paraprof
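
The three usage commands can be wrapped in a guarded script so that nothing runs unless the tools and the binary are present. This is a sketch under the slide's own assumptions: `a.out` is the slide's placeholder binary, and the `./` prefix on the executable is added here so the shell can find it in the current directory.

```shell
# Sketch of the binary re-writing workflow from the slide.
app=a.out
inst="${app%.out}.inst.out"       # a.out -> a.inst.out

if command -v tau_run >/dev/null 2>&1 && [ -x "./$app" ]; then
    tau_run "$app" -o "$inst"     # rewrite the binary with TAU instrumentation
    mpirun -np 4 "./$inst"        # run the instrumented binary on 4 processes
    echo "run finished; view the profiles with paraprof"
else
    echo "tau_run or ./$app not available; nothing rewritten" >&2
fi
```

No recompilation is involved: the original executable is the only input, which is what makes this route simpler than source instrumentation.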

32 Issues
- Re-writing static executables limited to gcc, limited platforms in beta
- Currently, we support dynamic executables (v6.1)
- We are working on supporting both static and dynamic executables
- We hope to support more platforms, compilers and runtime systems in the future
- Rewriting shared libraries used by the application
  - LD_PRELOAD’able wrapper libraries can be created using tau_wrap
  - requires interface information in header file

33 Binary Rewriting in TAU using DyninstAPI

34 Wish List for tau_run
- Support for more platforms
  - Apple Mac OS X, Windows, IBM BG/P, AIX, …
- Support for more compilers
- Support for rewriting shared objects
- Support for static binary rewriting with validation for compilers other than gcc
  - XLC, PathScale, Cray CCE, Intel, PGI, …

35 Other Tools…
- Other TAU tools that use technologies from the Paradyn/DyninstAPI group
  - TAU over MRNet (ToM) for runtime
  - StackWalkerAPI for accessing callstack

36 StackWalkerAPI in TAU
- Requirements overview:
  - Minimal information required (PC is enough)
  - Threaded support necessary
  - Low overhead (for high sample rates)
  - Stack unwinding from a signal handler
    - Malloc could be interrupted
    - Need to walk through signal handler frame

37 Issues encountered with StackWalkerAPI
- StackWalkerAPI:
  - Isn’t thread safe (and locking to use it can cause significant overhead)
  - Uses malloc/new (and so do dependent libraries such as libdwarf)
  - C++ (we would prefer C)
  - Issues walking certain kinds of stack frames
  - Matt Legendre was able to help us out a lot though!
- Alternatives:
  - TAU is currently using stack walking constructs from HPCToolkit

38 Online Monitoring using TAU over MRNet (ToM)
- Back-End (BE) TAU adapter offloads performance data
- Filters
  - reduction
  - distributed analysis
  - upstream / downstream
- Front-End (FE) unpacks, interprets, stores
- Paths
  - reverse data reduction path
  - multicast control path
- Push-Pull model
  - source pushes, sink pulls

39 Conclusions
- TAU and DyninstAPI represent mature technologies for performance instrumentation, measurement and analysis
- Using DyninstAPI’s binary re-writing capabilities, we have produced a tool that simplifies code instrumentation
- We hope to collaborate on other projects and include support for an enhanced stack walker API
Questions?

40 Support Acknowledgements
- Department of Energy (DOE)
  - Office of Science
  - MICS, Argonne National Lab
  - ASC/NNSA
  - University of Utah ASC/NNSA Level 1
  - ASC/NNSA, LLNL
- Department of Defense (DoD)
- NSF SDCI
- Partners:
  - Research Centre Juelich
  - LBL, ORNL, ANL, LANL, PNNL, LLNL
  - TU Dresden
  - ParaTools, Inc.

