Inlining and Devirtualization Hal Perkins Autumn 2011

Slides:



Advertisements
Similar presentations
Objects and Classes David Walker CS 320. Advanced Languages advanced programming features –ML data types, exceptions, modules, objects, concurrency,...
Advertisements

Code optimization: –A transformation to a program to make it run faster and/or take up less space –Optimization should be safe, preserve the meaning of.
Lecture 10: Part 1: OO Issues CS 540 George Mason University.
Overview Motivations Basic static and dynamic optimization methods ADAPT Dynamo.
Prof. Necula CS 164 Lecture 141 Run-time Environments Lecture 8.
6/9/2015© Hal Perkins & UW CSEU-1 CSE P 501 – Compilers SSA Hal Perkins Winter 2008.
Intermediate code generation. Code Generation Create linear representation of program Result can be machine code, assembly code, code for an abstract.
Objects and Classes David Walker CS 320. Advanced Languages advanced programming features –ML data types, exceptions, modules, objects, concurrency,...
The Use of Traces for Inlining in Java Programs Borys J. Bradel Tarek S. Abdelrahman Edward S. Rogers Sr.Department of Electrical and Computer Engineering.
1 Storage Registers vs. memory Access to registers is much faster than access to memory Goal: store as much data as possible in registers Limitations/considerations:
Recap from last time We were trying to do Common Subexpression Elimination Compute expressions that are available at each program point.
Aarhus University, 2005Esmertec AG1 Implementing Object-Oriented Virtual Machines Lars Bak & Kasper Lund Esmertec AG
Feedback: Keep, Quit, Start
Recap from last time Saw several examples of optimizations –Constant folding –Constant Prop –Copy Prop –Common Sub-expression Elim –Partial Redundancy.
CS 536 Spring Run-time organization Lecture 19.
CS 536 Spring Intermediate Code. Local Optimizations. Lecture 22.
3/17/2008Prof. Hilfinger CS 164 Lecture 231 Run-time organization Lecture 23.
4/23/09Prof. Hilfinger CS 164 Lecture 261 IL for Arrays & Local Optimizations Lecture 26 (Adapted from notes by R. Bodik and G. Necula)
Previous finals up on the web page use them as practice problems look at them early.
Intermediate Code. Local Optimizations
Run-time Environment and Program Organization
Schedule Midterm out tomorrow, due by next Monday Final during finals week Project updates next week.
Objects and Classes David Walker CS 320. Advanced Languages advanced programming features –ML data types, exceptions, modules, objects, concurrency,...
Composing Dataflow Analyses and Transformations Sorin Lerner (University of Washington) David Grove (IBM T.J. Watson) Craig Chambers (University of Washington)
Procedure Optimizations and Interprocedural Analysis Chapter 15, 19 Mooly Sagiv.
Adaptive Optimization in the Jalapeño JVM M. Arnold, S. Fink, D. Grove, M. Hind, P. Sweeney Presented by Andrew Cove Spring 2006.
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Dynamic Compilation II John Cavazos University.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
P ARALLEL P ROCESSING I NSTITUTE · F UDAN U NIVERSITY 1.
Adaptive Optimization in the Jalapeño JVM Matthew Arnold Stephen Fink David Grove Michael Hind Peter F. Sweeney Source: UIUC.
Runtime Environments Compiler Construction Chapter 7.
O VERVIEW OF THE IBM J AVA J UST - IN -T IME C OMPILER Presenters: Zhenhua Liu, Sanjeev Singh 1.
Adaptive Optimization with On-Stack Replacement Stephen J. Fink IBM T.J. Watson Research Center Feng Qian (presenter) Sable Research Group, McGill University.
The Procedure Abstraction, Part VI: Inheritance in OOLs Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.
Spring 2014Jim Hogg - UW - CSE - P501X1-1 CSE P501 – Compiler Construction Inlining Devirtualization.
Buffered dynamic run-time profiling of arbitrary data for Virtual Machines which employ interpreter and Just-In-Time (JIT) compiler Compiler workshop ’08.
Virtual Support for Dynamic Join Points C. Bockisch, M. Haupt, M. Mezini, K. Ostermann Presented by Itai Sharon
Targeted Path Profiling : Lower Overhead Path Profiling for Staged Dynamic Optimization Systems Rahul Joshi, UIUC Michael Bond*, UT Austin Craig Zilles,
Targeted Path Profiling : Lower Overhead Path Profiling for Staged Dynamic Optimization Systems Rahul Joshi, UIUC Michael Bond*, UT Austin Craig Zilles,
CSE 598c – Virtual Machines Survey Proposal: Improving Performance for the JVM Sandra Rueda.
CS 343 presentation Concrete Type Inference Department of Computer Science Stanford University.
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Method Profiling John Cavazos University.
CS412/413 Introduction to Compilers and Translators April 2, 1999 Lecture 24: Introduction to Optimization.
A Single Intermediate Language That Supports Multiple Implemtntation of Exceptions Delvin Defoe Washington University in Saint Louis Department of Computer.
Code Optimization Overview and Examples
Code Optimization.
Data Flow Analysis Suman Jana
Compositional Pointer and Escape Analysis for Java Programs
Run-time organization
Multi-Dispatch in the Java™ Virtual Machine
Threads and Memory Models Hal Perkins Autumn 2011
Optimizing Transformations Hal Perkins Autumn 2011
CSc 453 Interpreters & Interpretation
Adaptive Code Unloading for Resource-Constrained JVMs
Towards JIT compiler for IO language Dynamic mixin optimization
Inlining and Devirtualization Hal Perkins Autumn 2009
Threads and Memory Models Hal Perkins Autumn 2009
Optimizing Transformations Hal Perkins Winter 2008
Adaptive Optimization in the Jalapeño JVM
Code Optimization Overview and Examples Control Flow Graph
Register Allocation Hal Perkins Autumn 2005
Exam Topics Hal Perkins Autumn 2009
Instruction Level Parallelism (ILP)
Lecture 9 Dynamic Compilation
자바 언어를 위한 정적 분석 (Static Analyses for Java) ‘99 한국정보과학회 가을학술발표회 튜토리얼
CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019
Procedure Linkages Standard procedure linkage Procedure has
CSc 453 Interpreters & Interpretation
Dynamic Binary Translators and Instrumenters
Exam Topics Hal Perkins Winter 2008
Presentation transcript:

Inlining and Devirtualization Hal Perkins Autumn 2011 CSE P 501 – Compilers Inlining and Devirtualization Hal Perkins Autumn 2011 12/5/2018 © 2002-11 Hal Perkins & UW CSE

References Adaptive Online Context-Sensitive Inlining Hazelwood and Grove, ICG 2003 A Study of Devirtualization Techniques for a Java JIT Compiler Ishizaki, et al, OOPSLA 2000 Slides by Vijay Menon, CSE 501, Sp09 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Inlining long res; void foo(long x) { res = 2 * x; } void bar() { 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Benefits Reduction on function invocation overhead No marshalling / unmarshalling parameters and return values Better instruction cache locality Expanded optimization opportunities CSE, constant propagation, unreachable code elimination, ... Poor man’s interprocedural optimization 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Costs Code size Compilation time Typically expands overall program size Can hurt instruction cache Compilation time Larger methods can lead to more expensive compilation, more complex control flow 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Language / runtime aspects What is the cost of a function call? C: cheap, Java: moderate, Python: expensive Are targets resolved at compile time or run time? C: compile time; Java, Python: run time Is the whole program available for analysis? Is profile information available? 12/5/2018 © 2002-11 Hal Perkins & UW CSE

When to inline? Jikes RVM (with Hazelwood/Grove adaptations): Call Instruction Sequence (CIS) = # of instructions to make call Tiny (function size < 2x call size): Always inline Small (2-5x): Inline subject to space constraints Medium (5-25x): Inline if hot (subject to space constraints) Large : Never inline 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Gathering profile info Counter-based: Instrument edges in CFG Entry + loop back edges Enough edges (enough to get good results without excessive overhead) Expensive - typically removed in optimized code Call stack sampling Periodically walk stack Interrupt-based or instrumentation-based 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Object-oriented languages OO encourages lots of small methods getters, setters, ... Inlining is a requirement for performance High call overhead wrt total execution Limited scope for compiler optimizations without it For Java, if you’re going to anything, do this! But ... virtual methods are a challenge 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Virtual methods class A { int foo() { return 0; } int bar() { return 1; } } class B extends A { int foo() { return 2; } void baz(A x) { y = x.foo(); z = x.bar(); In general, we cannot determine the target until runtime Some languages (e.g., Java) allow dynamic class loading: all subclasses of A may not be visible until runtime 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Virtual tables Object layout in a JVM: 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Virtual method dispatch Source: y = x.foo(); z = x.bar(); x is the receiver object For a receiver object with a runtime type of B, t2 will refer to B::foo. t1 = ldvtable x t2 = ldvirtfunaddr t1, A::foo t3 = call [t2] (x) t4 = ldvtable x t5 = ldvirtfunaddr t4, A::bar t6 = call [t4] (x) 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Devirtualization Goal: virtual calls to static calls in compiler Benefits: enables inlining, lowers call overhead, better branch prediction on calls Often optimistic: Make guess at compile time Test guess at run time Fall back to virtual call if necessary 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Guarded devirtualization Guess receiver type is B (based on profile or other information) Call to B::foo is statically known - can be inlined But guard inhibits optimization t1 = ldvtable x t7 = getvtable B if t1 == t7 t3 = call B::foo(x) else t2 = ldvirtfunaddr t1, A::foo t3 = call [t2] (x) ... 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Guarded by method test Guess that method is B:foo outside guard t1 = ldvtable x t2 = ldvirtfunaddr t1 t7 = getfunaddr B::foo if t2 == t7 t3 = call B::foo(x) else t2 = ldvirtfunaddr t1, A::foo t3 = call [t2] (x) ... Guess that method is B:foo outside guard More robust, but more overhead Harder to optimize redundant guards 12/5/2018 © 2002-11 Hal Perkins & UW CSE

How to guess receiver? Profile information Class hierarchy analysis Record call site targets and / or frequently executed methods at run time Class hierarchy analysis Walk class hierarchy at compile time Type analysis Intra / interprocedural data flow analysis 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Class hierarchy analysis Walk class hierarchy at compilation time If only one implementation of a method (i.e., in the base class), devirtualize to that target Not guaranteed in the presence of class loading Still need runtime test / fallback 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Flow sensitive type analysis Perform a forward dataflow analysis propagating type information. At each use site, compute the possible set of types. At call sites, use type information of receiver to narrow targets. A a1 = new B(); a1.foo(); if (a2 instanceof C) a2.bar(); 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Alternatives to guarding Guarding impose overheads run-time test on every call, merge points impede optimization Often “know” only one target is invoked call site is monomorphic Alternative: compile without guards recover as assumption is violated (e.g, class load) cheaper runtime test vs more costly recovery 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Recompilation approach Optimistically assume current class hierarchy will never change wrt a call Devirtualize and/or inline call sites without guard On violating class load, recompile caller method Recompiled code installed before new class New invocations will call de-optimized code What about current invocations? 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Preexistence analysis Idea: if the receiver object pre-existed the caller method invocation, then the call site is only affected by a class load in future invocations. If new class C is loaded during execution of baz, x cannot have type C: void baz(A x) { ... // C loaded here x.bar(); } 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Code-patching Pre-generate fallback virtual call out of line On invalidating class load, overwrite direct call / inlined code with a jump to the fallback code Must be thread-safe! On x86, single write within a cache line is atomic No recompilation necessary 12/5/2018 © 2002-11 Hal Perkins & UW CSE

Patching goto fallback next: ... fallback: t3 = 2 // B::foo next: ... fallback: t2 = ldvirtfunaddr t1, A::foo t3 = call [t2] (x) goto next goto fallback 12/5/2018 © 2002-11 Hal Perkins & UW CSE