Intraprocedural Optimizations Jonathan Bachrach MIT AI Lab.

Outline
Goal: eliminate abstraction overhead using static analysis and program transformation
Topics:
–Intraprocedural type inference
–Static method selection
–Specialization and Inlining
–Static class prediction
–Splitting
–Box/unboxing
–Common Subexpression Elimination
–Overflow and range checks
–Partial evaluation revisited
Partially based on: Chambers' "Efficient Implementation of Object-Oriented Programming Languages" OOPSLA Tutorial

Running Example
(dg + ((x ) (y ) => ))
(dm + ((x ) (y ) => ) (%ib (%i+ (%iu x) (%iu y))))
(dm + ((x ) (y ) => ) (%fb (%f+ (%fu x) (%fu y))))
(dm x2 ((x ) => ) (+ x x))
(dm x2 ((x ) => ) (+ x x))
Anatomy of Pure Proto Arithmetic
–Dispatch
–Boxing
–Overflow checks
–Actual instruction
C Arithmetic
–Actual instruction

Biggest Inefficiencies
Method dispatch
Method calls
Boxing
Type checks
Overflow and range checks
Slot access
Object creation

Intraprocedural Type Inference
Goal: determine concrete class(es) of each variable and expression
Standard data flow analysis through control graph
–Propagate bindings b -> { class … }
–Sources are literals, isa expressions, results of some primitives, and type declarations
–Form unions of bindings at merge points
–Narrow sets after typecases
–Assumes closed world (or at least final classes)
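
A minimal Python sketch of this style of class-set propagation over a straight-line block; the instruction format, class names, and helper below are illustrative, not Proto's actual IR:

# Class-set inference over a tiny three-address block.  "lit" introduces a
# literal of a known class, "phi" merges values at a join point, and "call"
# looks up declared result classes for a generic.
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, str, tuple]   # (dest, op, args)

def infer(block: List[Instr], result_classes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    env: Dict[str, Set[str]] = {}
    for dest, op, args in block:
        if op == "lit":                          # literals are exact sources
            env[dest] = {args[0]}
        elif op == "phi":                        # merge point: union the incoming sets
            env[dest] = set().union(*(env[a] for a in args))
        elif op == "call":                       # use declared/known result classes
            env[dest] = set(result_classes.get(args[0], {"<any>"}))
        else:
            env[dest] = {"<any>"}
    return env

block = [("x", "lit", ("<int>",)),
         ("y", "lit", ("<flo>",)),
         ("z", "phi", ("x", "y"))]               # z in {<int>, <flo>}
print(infer(block, {}))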

Type Inference Example
(set x (isa …))                   ;; x in { }
(set y (table-growth-factor x))   ;; y in { }
(set z (if t x y))                ;; z in { }

Narrowing Type Precision
(if (isa? x ) (+ x 1) (+ x 37.0))
(if (isa? x )
    (let (([x ] x)) (+ x 1))
    (let (([x ! ] x)) (+ x 37.0)))

Static Method Selection
(set x (isa …))                   ;; x in { }
(set y (table-growth-factor x))   ;; y in { }
(print out y)
If only one class is statically possible then can perform dispatch statically:
(set y ( :table-growth-factor x))
If a couple classes are statically possible then can insert typecase:
(sel (class-of y)
  (( ) ( :print y))
  (( ) ( :print y)))
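
The decision itself can be phrased as a small compile-time policy. A hedged Python sketch, where the method-table shape, class names, and typecase size limit are all assumptions for illustration:

def select_statically(generic: str, receiver_classes: set,
                      methods: dict, typecase_limit: int = 3):
    # Methods applicable to the statically inferred receiver classes.
    applicable = {c: methods[(generic, c)] for c in receiver_classes
                  if (generic, c) in methods}
    if len(applicable) == 1:
        (_cls, m), = applicable.items()
        return ("direct-call", m)                 # dispatch resolved at compile time
    if 1 < len(applicable) <= typecase_limit:
        return ("typecase", sorted(applicable.items()))   # small inline class test
    return ("dynamic-dispatch", generic)          # give up, keep generic dispatch

methods = {("print", "<int>"): "print-int", ("print", "<flo>"): "print-flo"}
print(select_statically("print", {"<int>"}, methods))
print(select_statically("print", {"<int>", "<flo>"}, methods))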

Type Check Removal
Type inference can clearly be used to remove type checks and casts
(set x (isa …))       ;; x in { }
(if (isa? x ) (go) (stop))
==>
(set x (isa …))       ;; x in { }
(go)

Intraprocedural Type Inference Critique
Pros:
–Simple
–Fast
–Fewer dependents
Cons:
–Limited type precision: no result types, no incoming arg types, no slot types, etc.

Specialization
Q: How can we improve intraprocedural type inference precision?
A: Specialization, which is the cloning of methods with narrowed argument types
Improves type precision of callee by contextualizing body:
(dm sqr ((x ) (y )) (* x y))
==>
(dm sqr ((x ) (y )) (* x y))
Must make sure super calls still mean same thing
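
A sketch of the cloning mechanism in Python: clones are keyed by the narrowed argument classes and cached so each signature is optimized once. The optimize hook and all names here are hypothetical stand-ins:

# Specialization by cloning: a method body is re-optimized for a specific
# tuple of argument classes and cached, so later analysis of the clone can
# assume those classes.
_specializations = {}

def specialize(method_name, body, arg_classes, optimize):
    key = (method_name, arg_classes)
    if key not in _specializations:
        # optimize() stands in for re-running inference/inlining on the body
        # under the assumption that the arguments have exactly arg_classes.
        _specializations[key] = optimize(body, arg_classes)
    return _specializations[key]

sqr_body = ("call", "*", ("x", "y"))                       # body of (dm sqr (x y) (* x y))
optimize = lambda body, classes: ("specialized", body, classes)
print(specialize("sqr", sqr_body, ("<int>",), optimize))
print(specialize("sqr", sqr_body, ("<flo>",), optimize))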

Specialization of Constructors
Crucial to get object creation to be fast
Specialization can be used to build custom constructors
(def (isa ))
(slot thingy-x 0)
(slot (t ) thingy-tracker (+ (thingy-x t) 1))
(slot thingy-cache (fab ))
(df thingy-isa (x tracker cache)
  (let ((thingy (clone )))
    (unless (== x nul)
      (set (%slot-value thingy thingy-x) x))
    (set (%slot-value thingy thingy-tracker)
         (if (== tracker nul) (+ (thingy-x thingy) 1) tracker))
    (set (%slot-value thingy thingy-cache)
         (if (== cache nul) (fab ) cache))))

Inlining
Q: Can we do better?
A: Inlining can improve specialization by inserting specialized body
Improves type precision at call-site by contextualizing body (includes result types):
(dm f ((x ) (y )) (+ (g x y) 1))
(dm g (x y) (+ x y))
==>
(dm f ((x ) (y )) (+ (+ x y) 1))
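
A small Python sketch of the substitution step over an expression-tree IR invented for illustration: the callee body replaces the call so later passes see it in the caller's context.

# Inlining over a tiny expression-tree IR.  Expressions are literals,
# variable names (strings), or ("call"/"prim", name, args) tuples; FUNCS
# plays the role of the callee table.  There is no guard against
# recursive callees in this sketch.
FUNCS = {"g": (("x", "y"), ("prim", "+", ("x", "y")))}     # (dm g (x y) (+ x y))

def substitute(expr, env):
    if isinstance(expr, str):                              # variable reference
        return env.get(expr, expr)
    if isinstance(expr, tuple):
        kind, name, args = expr
        return (kind, name, tuple(substitute(a, env) for a in args))
    return expr                                            # literal

def inline(expr, funcs=FUNCS):
    if not isinstance(expr, tuple):
        return expr
    kind, name, args = expr
    args = tuple(inline(a, funcs) for a in args)
    if kind == "call" and name in funcs:
        formals, body = funcs[name]
        return inline(substitute(body, dict(zip(formals, args))), funcs)
    return (kind, name, args)

# (dm f (x y) (+ (g x y) 1))  ==>  (+ (+ x y) 1)
f_body = ("prim", "+", (("call", "g", ("x", "y")), 1))
print(inline(f_body))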

Synergy: Method Selection + Inlining
(df f ((x ) (y )) (+ x y))
;; method selection
(df f ((x ) (y )) ( :+ x y))
;; inlining
(df f ((x ) (y )) (%ib (%i+ (%iu x) (%iu y))))

Pitfalls of Inlining and Specialization
Must control inlining and specialization carefully to avoid code bloat
Inlining can work merely by using syntactic size, trying never to increase size over the original call
Class-centric specialization usually works by copying down inherited methods, tightening up self references (harder for multimethods)
Can run inlining/specialization trials based on
–Final static size
–Performance feedback

Class Centric Specialization
(def (isa ))
(slot (point-x ) 0)
(dm point-move ((p ) (offset ))
  (set (point-x p) (+ (point-x p) offset)))
(def (isa ))
==>
(dm point-move ((p ) (offset ))
  (set (point-x p) (+ (point-x p) offset)))

Static Class Prediction
Can improve type precision in cases where, for a given generic, one particular method is selected much more frequently than the others
Insert a type check testing the prediction
–Can narrow type precision along then and else branches
Especially useful in combination with inlining
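
As a rewrite, prediction just wraps the call in a guard. A minimal Python sketch over the same illustrative IR; the predicted class and the fast-path body would come from heuristics or profile data, and both are assumptions here:

def predict_class(call_expr, predicted_class, fast_body):
    # Rewrite a call into an if/isa? guard: fast specialized path when the
    # prediction holds, original dynamic dispatch otherwise.
    _kind, _name, args = call_expr
    return ("if", ("isa?", args[0], predicted_class),
            fast_body,          # specialized / inlinable path
            call_expr)          # original call: full dynamic dispatch

call = ("call", "+", ("x", 1))
fast = ("prim", "%i+", ("x", 1))                # pretend unboxed integer add
print(predict_class(call, "<int>", fast))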

Static Class Prediction Example
(df f (x)
  (let ((y (+ x 1)))
    (+ y 2)))
(df f (x)
  (let ((y (if (isa? x ) (+ x 1) (+ x 1))))
    (if (isa? y ) (+ y 2) (+ y 2))))
(df f (x)
  (let ((y (if (isa? x ) ( :+ x 1) (+ x 1))))
    (if (isa? y ) ( :+ y 2) (+ y 2))))

Synergy: Class Prediction + Method Selection + Inlining
(df f (x)
  (let ((y (if (isa? x ) (+ x 1) (+ x 1))))
    (if (isa? y ) (+ y 2) (+ y 2))))
;; method selection
(df f (x)
  (let ((y (if (isa? x ) ( :+ x 1) (+ x 1))))
    (if (isa? y ) ( :+ y 2) (+ y 2))))
;; inlining
(df f (x)
  (let ((y (if (isa? x ) (%ib (%i+ (%iu x) %1)) (+ x 1))))
    (if (isa? y ) (%ib (%i+ (%iu y) (%iu 2))) (+ y 2))))

Splitting
Problem: Class prediction often leads to a bunch of redundant type tests
Solution: Split off whole sections of the graph specialized to a particular class of a variable
–Can split off entire loops
–Can specialize on other dataflow information
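
The payoff is that, inside the split-off copy, tests already decided by the hoisted guard can be folded away. A toy Python sketch over the same illustrative IR, where simplify stands in for the real optimizer run on the specialized copy:

def simplify(expr, known):                      # known: {var: class} from the split's guard
    if not isinstance(expr, tuple):
        return expr
    op, *rest = expr
    if op == "isa?" and rest[0] in known:       # test decided by the hoisted guard
        return rest[1] == known[rest[0]]
    if op == "if":
        test = simplify(rest[0], known)
        if test is True:
            return simplify(rest[1], known)
        if test is False:
            return simplify(rest[2], known)
        return ("if", test, simplify(rest[1], known), simplify(rest[2], known))
    return (op, *[simplify(e, known) for e in rest])

body = ("if", ("isa?", "x", "<int>"), ("prim", "%i+", ("x", 1)), ("call", "+", ("x", 1)))
print(simplify(body, {"x": "<int>"}))           # guard removed: ('prim', '%i+', ('x', 1))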

Splitting Example
(df f (x)
  (let ((y (+ x 1)))
    (+ y 2)))
(df f (x)
  (if (isa? x )
      (let ((y (+ x 1)))
        (+ y 2))
      (let ((y (+ x 1)))
        (+ y 2))))
(df f (x)
  (if (isa? x )
      (let ((y ( :+ x 1)))
        ( :+ y 2))
      (let ((y (+ x 1)))
        (+ y 2))))

Splitting Downside
Splitting can also lead to code bloat
Must be intelligent about what to split
–A priori knowledge (e.g., integers most frequent)
–Actual performance

Box / Unboxing
(df + ((x ) (y ) => )
  (%ib (%i+ (%iu x) (%iu y))))
(df f ((a ) (b ) => )
  (+ (+ a b) a))
;; inlining +
(df f ((a ) (b ) => )
  (%ib (%i+ (%iu (%ib (%i+ (%iu a) (%iu b)))) (%iu a))))
;; remove box/unbox pair
(df f ((a ) (b ) => )
  (%ib (%i+ (%i+ (%iu a) (%iu b)) (%iu a))))
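
The pair removal itself is a simple peephole once inlining has exposed the pattern. A Python sketch over an illustrative expression tree, using the slide's primitive names:

def remove_box_unbox(expr):
    # Collapse (%iu (%ib e)) to e, bottom-up, over nested tuples.
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [remove_box_unbox(a) for a in args]
    if op == "%iu" and isinstance(args[0], tuple) and args[0][0] == "%ib":
        return args[0][1]                       # unbox(box(e)) == e
    return (op, *args)

# (%ib (%i+ (%iu (%ib (%i+ (%iu a) (%iu b)))) (%iu a)))
e = ("%ib", ("%i+", ("%iu", ("%ib", ("%i+", ("%iu", "a"), ("%iu", "b")))), ("%iu", "a")))
print(remove_box_unbox(e))
# ('%ib', ('%i+', ('%i+', ('%iu', 'a'), ('%iu', 'b')), ('%iu', 'a')))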

Synergy: Splitting + Method Selection + Inlining + Box/Unboxing
(df f (x)
  (if (isa? x )
      (let ((y (+ x 1)))
        (+ y 2))
      (let ((y (+ x 1)))
        (+ y 2))))
;; method selection
(df f (x)
  (if (isa? x )
      (let ((y ( :+ x 1)))
        ( :+ y 2))
      (let ((y (+ x 1)))
        (+ y 2))))
(df f (x)
  (if (isa? x )
      ( :+ ( :+ x 1) 2)
      (let ((y (+ x 1)))
        (+ y 2))))
;; inlining
(df f (x)
  (if (isa? x )
      (%ib (%i+ (%iu (%ib (%i+ (%iu x) %1))) %2))
      (let ((y (+ x 1)))
        (+ y 2))))
;; box/unbox
(df f (x)
  (if (isa? x )
      (%ib (%i+ (%i+ (%iu x) %1) %2))
      (let ((y (+ x 1)))
        (+ y 2))))

Common Subexpression Elimination (CSE)
Removes redundant computations
–Constant slot or binding access
–Stateless/side-effect-free function calls
Examples
(or (elt (cache x) 'a) (elt (cache x) 'b))
==>
(let ((t (cache x)))
  (or (elt t 'a) (elt t 'b)))
(if (< i 0) (if (< i 0) (go) (putz)) (dance))
==>
(if (< i 0) (go) (dance))
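
A Python sketch of the same idea as local value numbering over straight-line three-address code; the instruction format is illustrative, and every operator is assumed side-effect free, which is exactly the condition the slide states:

def local_cse(instrs):
    # Pure expressions with identical operator and (value-numbered) operands
    # reuse the earlier result instead of being recomputed.
    value_of, out = {}, []
    for dest, op, args in instrs:
        key = (op, tuple(value_of.get(a, a) for a in args))
        if key in value_of:
            value_of[dest] = value_of[key]          # reuse earlier computation
        else:
            value_of[key] = value_of[dest] = dest
            out.append((dest, op, tuple(value_of.get(a, a) for a in args)))
    return out

# t1 := cache(x); a := elt(t1,'a); t2 := cache(x); b := elt(t2,'b)
code = [("t1", "cache", ("x",)), ("a", "elt", ("t1", "'a")),
        ("t2", "cache", ("x",)), ("b", "elt", ("t2", "'b"))]
print(local_cse(code))      # t2 is eliminated; the second elt reads t1 instead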

Overflow and Bounds Checks aka “Moon Challenge”
Goal:
–Support mathematical integers and bounds-checked collection access
–Eliminate bounds and overflow checks
Strategy:
–Assume most integer arithmetic and collection accesses occur in a restricted loop context where the range can be readily inferred
–Perform range analysis to remove checks: bound variables from above by the size of the collection, bound them from below by zero, induction step is 1+
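
A Python sketch of the narrow pattern match this strategy reduces to; the CountedLoop record and its fields are invented, and a real pass would derive them from the loop's IR:

from dataclasses import dataclass

@dataclass
class CountedLoop:
    init: int          # initial value of the index
    step: int          # induction step
    bound: str         # expression the guard compares against, e.g. "len(v)"
    guard_op: str      # comparison in the loop guard

def bounds_check_needed(loop: CountedLoop, indexed_collection_len: str) -> bool:
    # 0 <= i < len(v) holds at the access when the loop starts at a
    # non-negative value, counts up, and is guarded by (< i (len v)).
    lower_ok = loop.init >= 0 and loop.step > 0
    upper_ok = loop.guard_op == "<" and loop.bound == indexed_collection_len
    return not (lower_ok and upper_ok)

loop = CountedLoop(init=0, step=1, bound="len(v)", guard_op="<")
print(bounds_check_needed(loop, "len(v)"))   # False: the check can be removed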

Range Check Example
(rep (((sum ) 0) ((i ) 0))
  (if (< i (len v))
      (let ((e (elt v i)))
        (rep (+ sum e) (+ i 1)))
      sum))
;; inlining bounds checks
(rep (((sum ) 0) ((i ) 0))
  (if (< i (len v))
      (let ((e (if (or (< i 0) (>= i (len v))) (sig...) (vref v i))))
        (rep (+ sum e) (+ i 1)))
      sum))
;; CSE
(rep (((sum ) 0) ((i ) 0))
  (if (< i (len v))
      (let ((e (if (< i 0) (sig...) (vref v i))))
        (rep (+ sum e) (+ i 1)))
      sum))
;; range analysis
(rep (((sum ) 0) ((i ) 0))
  (if (< i (len v))
      (let ((e (vref v i)))
        (rep (+ sum e) (+ i 1)))
      sum))

Overflow Check Removal aka “Moon Challenge” Critique
Pros:
–simple analysis
Cons:
–could miss a number of cases, but then previous approaches (e.g., box/unbox) could be applied

Advanced Topic: Representation Selection
Embed objects in others to remove indirections
Change object representation over time
Use minimum number of bits to represent enums
Pack fields in objects
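
One of these decisions is simple enough to sketch in Python: choosing the minimum enum width and packing several small fields into one word. The field widths and values below are made up for illustration:

def bits_for(n_values: int) -> int:
    # Minimum number of bits needed to represent n_values distinct enum values.
    return max(1, (n_values - 1).bit_length())

def pack(fields, widths):
    # Pack small unsigned fields into a single int, lowest field first.
    word, shift = 0, 0
    for value, width in zip(fields, widths):
        assert 0 <= value < (1 << width)
        word |= value << shift
        shift += width
    return word

widths = [bits_for(3), bits_for(5), bits_for(2)]   # e.g. color, state, flag enums
print(widths)                                      # [2, 3, 1]: 6 bits instead of 3 words
print(pack([2, 4, 1], widths))                     # 2 | 4<<2 | 1<<5 == 50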

Advanced Topic: Algorithm Selection
Goal: compiler determines that one algorithm is more appropriate for given data
–Sorted data
–Biased data
Solution:
–Embed statistics gathering in runtime
–Add guards to code and split
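
A hedged Python sketch of the guarded form: a per-call-site statistic drives the choice. The site object, warmup count, threshold, and algorithm names are all invented for illustration:

class AlgorithmSite:
    # One call site's statistics; the compiler would embed one per guarded site.
    def __init__(self, threshold=0.9, warmup=20):
        self.calls = self.sorted_inputs = 0
        self.threshold, self.warmup = threshold, warmup

    def observe(self, data):
        self.calls += 1
        self.sorted_inputs += all(a <= b for a, b in zip(data, data[1:]))

    def choose(self):
        if self.calls >= self.warmup and self.sorted_inputs / self.calls >= self.threshold:
            return "insertion-sort"     # cheap on (nearly) sorted data
        return "merge-sort"             # safe general-purpose choice

site = AlgorithmSite()
for _ in range(30):
    site.observe(list(range(8)))        # traffic at this site is always sorted
print(site.choose())                    # insertion-sort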

Rule-based Compilation
First millennium compilers were based on special rules for
–Method selection
–Pattern matching
–Oft-used system functions like format
Problems
–Error prone
–Don't generalize to user code
Challenge
–Minimize number of rules
–Competitive compiler speed
–Produce competitive code

Partial Evaluation to the Rescue
Holy grail idea:
–Optimizations are manifest in code
–Do previous optimizations with only p.e.
Simplify compiler based on limited moves
–Static eval and folding
–Inlining
Eliminate
–Custom method selection
–Custom constructor optimization
–Etc.

Partial Eval Example
(dm format (port msg (args …))
  (rep nxt ((i 0) (ai 0))
    (when (< i (len msg))
      (let ((c (elt msg i)))
        (if (= c #\%)
            (seq (print port (elt args ai)) (nxt (+ i 1) (+ ai 1)))
            (seq (write port c) (nxt (+ i 1) ai)))))))
(format out "%> " n)
First millennium solution is to have a custom optimizer for format:
(seq (print port n) (write port "> "))
Second millennium solution with partial evaluation:
(nxt 0 0)
(seq (print port n) (nxt 1 1))
(seq (print port n) (seq (write port #\>) (nxt 2 1)))
(seq (print port n) (seq (write port #\>) (seq (write port #\space))))
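
The same unrolling can be sketched directly in Python: when msg is a compile-time constant, the interpretation loop specializes into straight-line residual calls. The residual-code representation is invented, and "%" is the only directive, as in the slide:

def specialize_format(msg: str, arg_names):
    # Residual code for (format port msg args...) with msg known statically.
    residual, ai = [], 0
    for c in msg:
        if c == "%":
            residual.append(("print", "port", arg_names[ai]))   # consume next arg
            ai += 1
        else:
            residual.append(("write", "port", c))
    return residual

print(specialize_format("%> ", ["n"]))
# [('print', 'port', 'n'), ('write', 'port', '>'), ('write', 'port', ' ')]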

Partial Eval Challenge
Inlining and static eval are slow
–“Running” code through inlining
–Need to compile oft-used optimizations
Residual code is not necessarily efficient
–Sometimes algorithmic change is necessary for optimal efficiency
–Example: method selection uses class numbering and decision tree whereas straightforward code does naïve method sorting
Perhaps there is a middle ground

Open Problems
Automatic inlining, splitting, and specialization
Efficient mathematical integers
Constant determination
Representation selection
Algorithmic selection
Efficient partial evaluation
Super compiler that runs for days

Reading List
Chambers: "Efficient Implementation of Object-Oriented Programming Languages," OOPSLA Tutorial
Chambers and Ungar: SELF papers
Chambers et al.: Vortex papers