A configurable binary instrumenter


A configurable binary instrumenter making use of heuristics to select relevant instrumentation points

Presentation outline: Introduction; Instrumentation; Configurable instrumenter; Heuristics to select relevant points; Architecture; Example.

Introduction: Student at RWTH Aachen, Germany. Helmholtz-University Young Investigators Group “Performance Analysis of Parallel Programs”: led by Professor F. Wolf; located at the “Jülich Supercomputing Center”.

Scalasca: Objective: development of a scalable performance analysis toolset, specifically targeting large-scale applications. Integrated measurement & analysis toolset: runtime summarization (aka profiling); automatic event trace analysis. Supports various languages & parallel programming paradigms: Fortran, C, C++; MPI, OpenMP & hybrid MPI/OpenMP. More information at: www.scalasca.org

Instrumentation Two ways to gather information By direct instrumentation By sampling, periodic measurement Link between program and measurement system Trace events during program execution Profile to evaluate where time is spent

Possibilities of instrumentation Source code transformation Manually added by user Automatically, e.g. TAU, OPARI Compiler supported Wrapper functions Adding function calls Library interposition MPI <-> PMPI Binary instrumentation Static, e.g. TAU Dynamic, e.g. Paradyn

Static binary instrumentation. Advantages: language independent; instrumentation of optimized code; possible if no source available, e.g. libraries; templates are instantiated; no need to recompile. Disadvantages: limited information available; not all platforms are supported.

Information provided by Dyninst. What information is available? Method identification, e.g. Namespace::Class::Method in C++; list of called subroutines in a given function; control flow graph and loop tree; possibility to access basic blocks. Depends in part on the available symbol table. Improves when debug information is present: source file and source line become available.

Configurable binary instrumenter: Configurable by both the tool provider and the user. The tool provider focuses on the adapter specification: defines code for initialization; defines code for instrumentation; includes a filter for the measurement system. The user starts with the provided filter and refines it to his or her needs.

Possible instrumentation points: Functions: function enter and exit. Loops: before and after the loop; loop body enter and exit. Callsites: before the function call; after the return.

Filter requirements Selective binary instrumentation Provide a usable default filter Allow the user to refine which parts to instrument Configurable set of instrumentation points Filter by function, class and module names Filter by properties Ability to combine filters

Filter: What to instrument? Start with a set of functions: all or none. Filter the set further using: string patterns for file name (module), namespace, class name; properties, e.g. call graph, depth.

Filter specification: A single XML document. Pattern lists as plain text for elements taking lists. Filter: include or exclude elements containing "and", "or", "not" and "true" or "false"; functions, classes, namespaces, modules; property. Callsite filter for restricting instrumentation.

Filter specification example

    <filter name="pathtest" instrument="functions=handletest" start="all">
      <exclude>
        <or>
          <not>
            <property name="path">
              <functionnames match="simple"> MPI* </functionnames>
            </property>
          </not>
          <functionnames>main</functionnames>
        </or>
      </exclude>
    </filter>
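One plausible reading of the example filter: start from all functions, then exclude every function that either has no call path to a function matching MPI* or is named main. The Python sketch below evaluates that rule on a small call graph. The call graph, the helper names (`reaches`, `excluded`, `apply_filter`) and the interpretation of the "path" property are assumptions for illustration, not the instrumenter's actual code.

```python
from fnmatch import fnmatchcase

# Hypothetical static call graph: caller -> callees (illustrative data only).
CALL_GRAPH = {
    "main": ["solve", "print_banner"],
    "solve": ["exchange", "get_size"],
    "exchange": ["MPI_Send", "MPI_Recv"],
    "print_banner": [],
    "get_size": [],
    "MPI_Send": [],
    "MPI_Recv": [],
}


def reaches(func, pattern, graph):
    """True if a function whose name matches `pattern` is reachable from `func`."""
    seen, stack = set(), list(graph.get(func, []))
    while stack:
        callee = stack.pop()
        if callee in seen:
            continue
        seen.add(callee)
        if fnmatchcase(callee, pattern):
            return True
        stack.extend(graph.get(callee, []))
    return False


def excluded(func, graph):
    # Mirrors <or> <not> path to MPI* </not> <functionnames>main</functionnames> </or>
    return (not reaches(func, "MPI*", graph)) or func == "main"


def apply_filter(graph):
    # start="all": begin with every function, then drop the excluded ones.
    return sorted(f for f in graph if not excluded(f, graph))


selected = apply_filter(CALL_GRAPH)
print(selected)
```

Under this reading only functions that lie on a path towards MPI calls survive, and main is dropped explicitly even though it also reaches MPI.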

Inserted code: The instrumenter has to support: additional dependencies (measurement system); variable declarations (e.g. region handles); code for initialization (run once at startup); code to be executed at points (enter / exit, before / after). Provide access to context information: @linenumber@, @functionname@,…

Instrumentation specification: Independent XML document. Include adapter filter. Dependencies: add dynamic libraries. Variable element: type information; memory to allocate. Code in plain text: C-like syntax. Element structure: Code contains Variables (with Var entries), Init, Before, Enter, Exit, After.

Example specification

    <code name="handletest">
      <variables>
        <var name="handle" type="void*" size="4" />
      </variables>
      <init>
        initNotify(@functionname@,@linenumber@,@filename@);
        handle = createHandle(@functionname@);
      </init>
      <enter>enterHandle(handle);</enter>
      <exit>exitHandle(handle); </exit>
    </code>
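To illustrate what the @...@ context variables stand for, the following Python sketch expands them as plain text for one function. This is only a model: the architecture described later generates Dyninst snippets rather than substituting strings, and the `expand` helper, the sample values and the choice to quote names as C string literals are assumptions.

```python
import re


def expand(template, context):
    """Replace @name@ placeholders with values from `context`.

    Unknown placeholders are left untouched so they remain visible.
    """
    def repl(match):
        key = match.group(1)
        return str(context[key]) if key in context else match.group(0)

    return re.sub(r"@(\w+)@", repl, template)


# The <init> body from the example specification, on one line.
init_code = (
    "initNotify(@functionname@,@linenumber@,@filename@); "
    "handle = createHandle(@functionname@);"
)

# Hypothetical context for one instrumented function.
context = {
    "functionname": '"solve"',
    "linenumber": 42,
    "filename": '"solver.f90"',
}

expanded = expand(init_code, context)
print(expanded)
```

Each instrumented function would get its own context, so the same adapter code yields a distinct handle per function.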

Goal Automatic selection of relevant instrumentation points

How to select instrumentation points What makes a point relevant? Granularity of trace to locate possible problems Ability to profile where time is spent Communication I/O Is decision possible with available information?

Heuristics using binary code: Aim here: do not instrument short functions, where instrumentation costs exceed function costs. Complexity of function: contains "if" and "loop" statements; number of instructions; subroutine calls. Cyclomatic complexity: M = E(dges) - N(odes) + 2 for a single connected control flow graph.
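The complexity heuristic can be sketched on a control flow graph given as an adjacency mapping. The sketch uses McCabe's general form M = E - N + 2P with P = 1; variants of the formula differ only by a constant offset, which shifts the threshold but not the ranking of functions. The graph encoding, the function names and the threshold value are assumptions for illustration.

```python
def cyclomatic_complexity(cfg, components=1):
    """McCabe's M = E - N + 2P for a CFG given as {block: [successor blocks]}."""
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2 * components


def worth_instrumenting(cfg, threshold=2):
    # Hypothetical cut-off: skip functions without any branching.
    return cyclomatic_complexity(cfg) >= threshold


# Straight-line function: a single edge from entry to exit.
straight = {"entry": ["exit"]}

# Function with one if/else followed by one loop.
branchy = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["loop"],
    "else": ["loop"],
    "loop": ["loop_body", "exit"],
    "loop_body": ["loop"],
}

print(cyclomatic_complexity(straight), cyclomatic_complexity(branchy))
```

In practice the basic blocks and edges would come from the control flow graph that Dyninst exposes for each function.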

Heuristics using debug information: Lines of code: may be obscured by comments and code style. Method name hints: exclude e.g. helper functions "get*", "set*"; include "do*", "process*", "calculate*", or "solve*". Class name and namespace.
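A minimal Python sketch of the method name hints, using the patterns listed above with shell-style matching. The function name `name_hint`, the precedence of include over exclude, and the stripping of C++ qualifiers are assumptions.

```python
from fnmatch import fnmatchcase

EXCLUDE_HINTS = ["get*", "set*"]
INCLUDE_HINTS = ["do*", "process*", "calculate*", "solve*"]


def name_hint(method_name):
    """Return 'include', 'exclude', or None when the name gives no hint."""
    # Look only at the unqualified name, e.g. Grid::getSize -> getSize.
    short = method_name.rsplit("::", 1)[-1]
    if any(fnmatchcase(short, pattern) for pattern in INCLUDE_HINTS):
        return "include"
    if any(fnmatchcase(short, pattern) for pattern in EXCLUDE_HINTS):
        return "exclude"
    return None


for name in ["Grid::getSize", "Solver::solveStep", "main"]:
    print(name, name_hint(name))
```

Prefix patterns are crude: a name such as "settle" would match "set*", which is one reason to treat these hints as one input among several rather than a final decision.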

Heuristics using callpath Callpath of functions Leads to I/O functions? Leads to MPI functions? Leads to functions using OpenMP? Depth of function in call graph Instrument only to specified depth Problem for static callpath construction Virtual functions, function pointers
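Both callpath heuristics can be sketched on a static call graph: a reverse breadth-first search from the target functions finds everything that leads to them, and a forward search from the entry point gives the depth of each function. The graph, the helper names and the depth limit below are assumptions for illustration.

```python
from collections import deque


def callers_of(graph):
    """Invert a caller -> callees mapping into callee -> callers."""
    reverse = {func: [] for func in graph}
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    return reverse


def leads_to(graph, is_target):
    """Functions from which some target is reachable (targets themselves excluded)."""
    reverse = callers_of(graph)
    found = set()
    queue = deque(func for func in reverse if is_target(func))
    while queue:
        func = queue.popleft()
        for caller in reverse.get(func, []):
            if caller not in found and not is_target(caller):
                found.add(caller)
                queue.append(caller)
    return found


def depths(graph, root):
    """Shortest call depth of every function reachable from `root` (root is 0)."""
    depth, queue = {root: 0}, deque([root])
    while queue:
        func = queue.popleft()
        for callee in graph.get(func, []):
            if callee not in depth:
                depth[callee] = depth[func] + 1
                queue.append(callee)
    return depth


# Hypothetical call graph (illustrative data only).
graph = {
    "main": ["init", "solve"],
    "init": ["read_input"],
    "read_input": ["fopen"],
    "solve": ["exchange", "compute"],
    "exchange": ["MPI_Sendrecv"],
    "compute": [],
}

on_mpi_path = leads_to(graph, lambda name: name.startswith("MPI_"))
depth = depths(graph, "main")
max_depth = 1  # hypothetical limit
shallow = {func for func in on_mpi_path if depth.get(func, max_depth + 1) <= max_depth}
print(sorted(on_mpi_path), sorted(shallow))
```

Calls through virtual functions or function pointers do not appear as edges in a statically built graph, so both searches can miss functions that lead to MPI at run time.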

Unevaluated results: CP2K: Fortran code with Intel 10 compiler; 12652 functions (50 MB binary); using the MPI path, reduced to 5194; using adapter filter and MPI path, 767 remain. GENE: Fortran code; 7095 functions (13 MB binary); using adapter filter and MPI path, reduced to 3144; removing nodes on the direct path leaves 2510 functions. BT (NAS Parallel Benchmark): reduced to 27 functions with MPI callpath filter; more in the example later.
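To put the counts above in proportion, the short Python sketch below computes the share of functions that remain after each filtering step, using only the numbers from the slide. BT is omitted because its total function count is not given. With the most restrictive filters listed, about 6% of CP2K's functions and about 35% of GENE's functions remain.

```python
# Function counts taken from the results slide.
counts = {
    "CP2K": {
        "all": 12652,
        "MPI path": 5194,
        "adapter filter + MPI path": 767,
    },
    "GENE": {
        "all": 7095,
        "adapter filter + MPI path": 3144,
        "direct-path nodes removed": 2510,
    },
}


def remaining_share(app):
    """Fraction of all functions still instrumented after each filter step."""
    total = counts[app]["all"]
    return {step: n / total for step, n in counts[app].items() if step != "all"}


for app in counts:
    for step, share in remaining_share(app).items():
        print(f"{app}: {step}: {share:.1%} of functions remain")
```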

Architecture: Mutatee: layer between Dyninst and the filter component. Filter: responsible for reading and evaluating the filter. CodeGenerator: parses the code specification; generates Dyninst snippets. Instrumenter: instruments the filtered set with the generated code.

Dependencies: Dyninst. Boost: Spirit (parser for adapter code); Regex (regular expressions in filter); Tokenizer. Apache Xerces: XML DOM parser for the adapter and filter files.

Open issues: Binaries contain a lot of functions: compiler-specific functions are added. Scalasca does not provide a dynamic library: need to preinstrument with “skin -comp=none -user”.

Future work: Adding more properties: sourceLines, hasControlStructure, calledInLoop. Evaluate reduction in instrumented functions: instrument benchmarks; instrument sample application. Evaluate advantage over filtering at runtime. Evaluate advantage of instrumenting optimized code.

Example: Instrumenting NAS Parallel Benchmark BT. Steps: preinstrument; binary instrumenter; analyze. Preinstrument: skin = scalasca -instrument -comp=none -user. Analyze: scan = scalasca -analyze mpirun -n 4 ./mutated