San Diego Supercomputer Center, Performance Modeling and Characterization Lab (PMaC)
Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation

C.K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, K. Hazelwood
Presented by: Michael Laurenzano

What is Program Instrumentation?
- Inserting extra code into an application to observe its behavior
  – Example: cache simulation. Each CacheSim call is inserted instrumentation that reports the address about to be written:

    for (int i = 0; i < LENGTH; i++) {
        CacheSim(&A[i]); A[i] = (double)i;
        CacheSim(&B[i]); B[i] = (double)i;
        CacheSim(&C[i]); C[i] = (double)i;
    }

Uses of Program Instrumentation
- Code profiles
  – Basic block/instruction counts
  – Operation results
- Microarchitectural studies
  – Branch outcomes
  – Memory addresses
- Bug checking
  – Memory leaks
  – Uninitialized data

Pin System Layout
- Application: the code being analyzed
- Pintool: tells Pin where and how to perform analysis
- JIT compiler: combines application and pintool code to create instrumented code
- Code cache: stores the instrumented code created by the JIT
- Virtual machine: controls execution, maintains data structures, and tracks program state

Simplified Instrumentation
- Transfer control to the VM at each application control transfer
- Look for an instrumented version of the branch target in the code cache
  – If found: execute the instrumented code
  – If not: compile the code, insert it into the code cache, and execute the new code
- Repeat

Trace Linking
- Transfer control directly between traces
  – The branch target must be known statically
  – The target trace must already be present in the code cache
[Figure: control flow through Trace 1 and Trace 2 (Sequence 1, Sequence 2) under regular execution, Pin without trace linking (transfers pass through the virtual machine), and Pin with trace linking]

Trace Linking (Indirect)
- "Unknown" targets are usually somewhat predictable
  – A function typically returns to only a few locations (few call sites)
  – An indirect jump usually goes to only a few locations
- Try several predicted targets to see whether VM intervention can be avoided
  – A short target list is maintained for each indirect branch
  – If the list is exhausted, fall back to the VM

Function Cloning
- The most common indirect control transfer is a function return
- Create a function instance for each call site
  – The return address is then unique and known for each function instance
  – This turns an indirect control transfer into a direct one
  – Drawback: code bloat
- Implemented by keeping a call stack for each instrumented instruction sequence
  – Only the last 4 call sites are kept
  – The call stack is represented as a 64-bit integer

Register Bindings
- Pin reallocates registers so that it can use some registers itself
  – The register bindings can differ from one trace to the next
- When compiling, keep the register bindings from the previous trace if possible
- When linking traces, adjust the register bindings before entering the next trace
  – In practice, usually only a few registers are mismatched

Optimization – Inlined Analysis Routines
[Figure: without inlining, the application calls a bridge routine, which calls the analysis routine; with inlining, application code, bridge code, analysis code, bridge code, and application code run as one straight-line sequence]
- Inlining saves 2 calls and 2 returns per analysis call
- Other optimizations: constant folding, code relocation

Optimization – eflags Register Liveness
- The x86 eflags register is treated as a bit vector containing state information (e.g., condition flags)
  – This register can be modified as a side effect of some instructions, so an analysis routine may clobber it and Pin would normally have to save and restore it around the call
- eflags might not be live when we reach the analysis routine
  – If it is not live, we do not need to save and restore it

Optimization – Call Scheduling
- The user can specify that an analysis call may be placed anywhere within a particular scope
  – e.g., anywhere in an instruction, basic block, function, or program
- Pin can then schedule the call where it performs best
  – e.g., at a point where few registers need to be saved
  – Open question: how well will this actually work?

Basic Pin Overhead
[Figure: basic Pin overhead; chart not preserved]

Effectiveness of Optimizations
[Figure: effectiveness of optimizations; chart not preserved]

Questions?