Q-1 CSE P 501 – Compilers: Introduction to Optimization. Hal Perkins, Autumn 2009.

Q-2 Agenda. Optimization goals; scope (local, superlocal, regional, global/intraprocedural, interprocedural); control flow graphs; value numbering; dominators. Ref.: Cooper & Torczon, ch. 8.

Q-3 Code Improvement – How? Pick a better algorithm(!); use machine resources effectively: instruction selection & scheduling, register allocation.

Q-4 Code Improvement (2). Local optimizations (within basic blocks): algebraic simplifications, constant folding, common subexpression elimination (i.e., redundancy elimination), dead code elimination, specializing computation based on context, etc., etc., …

Q-5 Code Improvement (3). Global optimizations: code motion (moving invariant computations out of loops), strength reduction (replacing multiplications by repeated additions, for example), global common subexpression elimination, global register allocation, and many others.

Q-6 "Optimization". None of these improvements is truly "optimal": the underlying problems are hard, and proofs of optimality assume artificial restrictions. The best we can do is improve things, most (much? some?) of the time.

Q-7 Example: A[i,j]. Without any surrounding context, we need to generate code to calculate the address
address(A) + (i - low_1(A)) * (high_2(A) - low_2(A) + 1) * size(A) + (j - low_2(A)) * size(A)
where low_k(A) and high_k(A) are the subscript bounds in dimension k, and address(A) is the runtime address of the first element of A. And we really should be checking that i and j are in bounds.
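To make the formula concrete, here is a minimal Python sketch (not from the slides; the function name, parameter names, and the example bounds are illustrative) that evaluates the address expression for a row-major array:

```python
def element_address(base, i, j, low1, low2, high2, elem_size):
    """Address of A[i, j] for a row-major array, per the formula above."""
    row_width = (high2 - low2 + 1) * elem_size  # bytes per row
    return base + (i - low1) * row_width + (j - low2) * elem_size

# A[1..10, 1..20] of 4-byte elements based at address 1000:
# one full row (80 bytes) to reach i = 2, plus two elements to reach j = 3.
assert element_address(1000, 2, 3, 1, 1, 20, 4) == 1000 + 80 + 8
```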

Q-8 Some Optimizations for A[i,j]. With more context, we can do better. Examples: if A is local, with known bounds, much of the computation can be done at compile time; if A[i,j] is in a loop where i and j change systematically, we can probably replace the multiplications with additions on each trip around the loop to reference successive rows/columns (see the sketch below); even if not, we can move "loop-invariant" parts of the calculation outside the loop.
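A hedged sketch of that middle idea: when j steps by 1 each iteration, the address advances by a fixed amount, so the per-iteration multiply becomes an add. The helper and its parameter names are illustrative, not from the slides:

```python
def row_addresses(base, i, low1, low2, row_width, elem_size, j_first, j_last):
    """Strength-reduced addresses of A[i, j] for j = j_first..j_last.

    The row part of the address formula is loop-invariant, so it is
    hoisted out of the loop; inside, one add replaces the multiply.
    """
    addr = base + (i - low1) * row_width + (j_first - low2) * elem_size
    addrs = []
    for _ in range(j_first, j_last + 1):
        addrs.append(addr)   # stands in for the actual A[i, j] access
        addr += elem_size    # induction-variable update: add, not multiply
    return addrs
```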

Q-9 Optimization Phase Goal. Discover, at compile time, information about the runtime behavior of the program, and use that information to improve the generated code.

Q-10 A First Running Example: Redundancy Elimination. An expression x + y is redundant at a program point iff, along every path from the procedure's entry, it has been evaluated and its constituent subexpressions (x and y) have not been redefined. If the compiler can prove the expression is redundant, it can store the result of the earlier evaluation and replace the redundant computation with a reference to the earlier (stored) result.

Q-11 Common Problems in Code Improvement. This strategy is typical of most compiler optimizations: first, discover opportunities through program analysis; then, modify the IR to take advantage of them. Historically, the goal was usually to decrease execution time; other possibilities include reducing space, power, …

Q-12 Issues (1). Safety: a transformation must not change the program's meaning. It must generate correct results and can't generate spurious errors, so optimizations must be conservative; a large part of analysis goes toward proving safety. It can pay off to speculate (be optimistic), but then we need a way to recover if reality turns out differently.

Q-13 Issues (2). Profitability: if a transformation is possible, is it profitable? Example: loop unrolling can increase the amount of work done on each iteration (i.e., reduce loop overhead) and can eliminate duplicate operations done on separate iterations.

Q-14 Issues (3). Downside risks: even if a transformation is generally worthwhile, we need to factor in potential problems. For example, a transformation might need more temporaries, putting additional pressure on registers, and increased code size could cause cache misses or, in bad cases, grow the page working set.

Q-15 Example: Value Numbering. A technique for eliminating redundant expressions: assign an identifying number VN(n) to each expression, so that VN(x+y) = VN(j) if x+y and j have the same value. Use hashing over value numbers for efficiency. An old idea (Balke 1968, Ershov 1954), invented for low-level, linear IRs; equivalent methods exist for tree IRs, e.g., building a DAG.

Q-16 Uses of Value Numbers. Improve the code by replacing redundant expressions, simplifying algebraic identities, and discovering, folding, and propagating constant-valued expressions.

Q-17 Local Value Numbering Algorithm. For each operation o = op(o1, o2) in a block: 1. Get value numbers for the operands from a hash lookup. 2. Hash <op, VN(o1), VN(o2)> to get a value number for o (if op is commutative, sort VN(o1), VN(o2) first). 3. If o already has a value number, replace o with a reference to that value. 4. If o1 and o2 are constant, evaluate o at compile time and replace it with an immediate load. If hashing behaves well, this runs in linear time.
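A minimal Python sketch of steps 1–3 (the tuple IR shape is assumed for illustration; constant folding from step 4 is omitted here and shown a few slides below):

```python
from itertools import count

def local_value_numbering(block):
    """Value-number a basic block given as (dest, op, arg1, arg2) tuples.

    Returns a rewritten block in which a redundant expression becomes a
    copy from the name that holds the earlier result.
    """
    vn = {}       # operand name or (op, vn1, vn2) key -> value number
    home = {}     # value number -> name currently holding that value
    fresh = count()
    out = []

    for dest, op, a1, a2 in block:
        v1 = vn.setdefault(a1, next(fresh))
        v2 = vn.setdefault(a2, next(fresh))
        if op in ("+", "*"):                  # commutative: canonical order
            v1, v2 = min(v1, v2), max(v1, v2)
        key = (op, v1, v2)
        if key in vn:                         # step 3: redundant, reuse result
            vn[dest] = vn[key]
            out.append((dest, "copy", home[vn[key]], None))
        else:                                 # new value: record and emit as-is
            vn[key] = vn[dest] = next(fresh)
            home[vn[dest]] = dest
            out.append((dest, op, a1, a2))
        # NOTE: if `dest` is later reassigned, `home` can point at a stale
        # name -- exactly the bug discussed two slides below.
    return out
```

Run on the example on the next slide, this rewrites b = x + y into b = a, and then naively rewrites c = x + y into c = a, even though a no longer holds x + y after a = 17; that is the bug the following slide discusses.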

Q-18 Example.
Code:      a = x + y;  b = x + y;  a = 17;  c = x + y
Rewritten: a = x + y;  b = a;      a = 17;  c = a

Q-19 Bug in Simple Example. If we use the original names, we get in trouble when a name is reused: c = a above is wrong because a was redefined to 17 between the two evaluations of x + y. Solutions: be clever about which copy of the value to use (e.g., use c = b in the last statement); create an extra temporary; or rename around it (best!).

Q-20 Renaming. Idea: give each value a unique name. a_i^j means the i-th definition of a, with VN = j. Somewhat complex notation, but the meaning is clear. This is the idea behind SSA (Static Single Assignment) form, a popular modern IR that exposes many opportunities for optimization.

Q-21 Example Revisited (with renamed values; the same table reappears on the SSA name space slide below).
Code:      a_0^3 = x_0^1 + y_0^2;  b_0^3 = x_0^1 + y_0^2;  a_1^4 = 17;  c_0^3 = x_0^1 + y_0^2
Rewritten: a_0^3 = x_0^1 + y_0^2;  b_0^3 = a_0^3;          a_1^4 = 17;  c_0^3 = a_0^3

Q-22 Simple Extensions to Value Numbering. Constant folding: add a bit that records when a value is constant; evaluate constant values at compile time; replace the op with a load immediate. Algebraic identities (x+0, x*1, x-x, …): many special cases; switch on the op to narrow down the checks needed; replace the result with the input VN.
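A sketch of how these extensions drop into the algorithm as a per-operation pre-pass (the encoding and function name are illustrative, and only a few identities are shown):

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def simplify(op, a1, a2):
    """Constant folding and a few algebraic identities.

    Operands are ints (constants) or strings (names). Returns
    ("const", value), ("copy", name), or ("expr", op, a1, a2).
    """
    const1, const2 = isinstance(a1, int), isinstance(a2, int)
    if const1 and const2:                    # fold at compile time
        return ("const", FOLD[op](a1, a2))
    if op == "+" and (a1 == 0 or a2 == 0):   # x + 0 => x
        return ("copy", a1 if a2 == 0 else a2)
    if op == "*" and const2 and a2 == 1:     # x * 1 => x
        return ("copy", a1)
    if op == "-" and a1 == a2:               # x - x => 0
        return ("const", 0)
    return ("expr", op, a1, a2)
```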

Q-23 Larger Scopes. This algorithm works on straight-line blocks of code (basic blocks) and achieves the best possible results for single basic blocks, but it loses all information when control flows to another block. To go further, we need to represent multiple blocks of code and the control flow between them.

Q-24 Basic Blocks. Definition: a basic block is a maximal-length sequence of straight-line code. Properties: statements are executed sequentially, and if any statement executes, they all do (barring exceptions). In a linear IR, the first statement of a basic block is often called the leader.

Q-25 Control Flow Graph (CFG). Nodes: basic blocks (possible representations: linear three-address code, expression-level ASTs, DAGs). Edges: include a directed edge from n1 to n2 if there is any possible way for control to transfer from block n1 to n2 during execution.

Q-26 Constructing Control Flow Graphs from Linear IRs. Algorithm: Pass 1 identifies basic-block leaders with a linear scan of the IR; Pass 2 identifies operations that end a block and adds the appropriate CFG edges to all possible successors (see your favorite compiler book for details; a sketch follows below). For convenience, ensure that every block ends with a conditional or unconditional jump; the code generator can pick the most convenient fall-through case later and eliminate unneeded jumps.
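A Python sketch of the two passes (the instruction encoding, with 'label', 'jump', and 'cond' fields, is assumed purely for illustration):

```python
def build_cfg(ir):
    """Two-pass CFG construction from a linear IR.

    Each instruction is a dict that may carry 'label' (its own label),
    'jump' (a target label), and 'cond' (True for a conditional jump).
    Returns the list of basic blocks and the set of edges between them.
    """
    # Pass 1: find leaders: the first instruction, every jump target,
    # and every instruction that follows a jump.
    targets = {ins["jump"] for ins in ir if "jump" in ins}
    leaders = {0}
    for idx, ins in enumerate(ir):
        if ins.get("label") in targets:
            leaders.add(idx)
        if "jump" in ins and idx + 1 < len(ir):
            leaders.add(idx + 1)
    starts = sorted(leaders)

    # Pass 2: cut blocks at leaders, then add an edge for every way
    # control can leave a block (jump target and/or fall-through).
    blocks = [ir[s:e] for s, e in zip(starts, starts[1:] + [len(ir)])]
    block_of = {b[0]["label"]: i for i, b in enumerate(blocks) if "label" in b[0]}
    edges = set()
    for i, b in enumerate(blocks):
        last = b[-1]
        if "jump" in last:
            edges.add((i, block_of[last["jump"]]))
        if ("jump" not in last or last.get("cond")) and i + 1 < len(blocks):
            edges.add((i, i + 1))   # fall-through successor
    return blocks, edges
```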

Q-27 Scope of Optimizations. Optimization algorithms can work on units as small as a basic block or as large as a whole program. Local information is generally more precise and can lead to locally optimal results; global information is less precise (information is lost at join points in the graph) but exposes opportunities for improvement across basic blocks.

Q-28 Optimization Categories (1). Local methods: usually confined to basic blocks; simplest to analyze and understand; most precise information.

Q-29 Optimization Categories (2). Superlocal methods operate over extended basic blocks (EBBs). An EBB is a set of blocks b_1, b_2, …, b_n where b_1 may have multiple predecessors and each remaining block b_i (2 ≤ i ≤ n) has b_(i-1) as its unique predecessor. The EBB is entered only at b_1 but may have multiple exits, and a single block can be the head of multiple EBBs (these EBBs form a tree rooted at that block). The idea is to use information discovered in earlier blocks to improve code in their successors; a sketch of finding EBBs follows below.
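A hedged sketch of how EBB heads and their trees can be found (the `pred`/`succ` graph encoding is assumed; each maps a block to its neighbors):

```python
def ebb_members(pred, succ, entry):
    """Map each EBB head to the blocks reachable through
    single-predecessor successors (the EBB tree rooted at that head)."""
    heads = [b for b in pred if b == entry or len(pred[b]) > 1]
    trees = {}
    for h in heads:
        members, work = [], [h]
        while work:
            b = work.pop()
            members.append(b)
            # A successor with exactly one predecessor extends this EBB.
            work += [s for s in succ[b] if len(pred[s]) == 1]
        trees[h] = members
    return trees

# On the running example introduced later in these slides, the tree rooted
# at A contains {A, B, C, D, E}; its root-to-leaf paths are the EBB paths
# {A,B}, {A,C,D}, {A,C,E} used by superlocal value numbering.
```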

Q-30 Optimization Categories (3). Regional methods operate over scopes larger than an EBB but smaller than an entire procedure/function/method; a typical example is a loop body. The difference from superlocal methods is that there may be merge points in the graph (i.e., blocks with two or more predecessors).

Q-31 Optimization Categories (4). Global methods operate over entire procedures and are sometimes called intraprocedural methods. The motivation is that local optimizations sometimes have bad consequences in a larger context, and the procedure/method/function is a natural unit for analysis, separate compilation, etc. These methods almost always need global data-flow analysis information.

Q-32 Optimization Categories (5). Whole-program methods operate over more than one procedure and are sometimes called interprocedural methods. Challenges: name scoping and parameter-binding issues at procedure boundaries. Classic examples: inline method substitution, interprocedural constant propagation. Common in aggressive JIT compilers and optimizing compilers for object-oriented languages.

Q-33 Value Numbering Revisited. Local value numbering works one block at a time: strong local results, but no cross-block effects, so opportunities are missed. The running example CFG (A branches to B and C; C branches to D and E; D and E merge at F; B and F merge at G):
A: m = a + b; n = a + b
B: p = c + d; r = c + d
C: q = a + b; r = c + d
D: e = b + 18; s = a + b; u = e + f
E: e = a + 17; t = c + d; u = e + f
F: v = a + b; w = c + d; x = e + f
G: y = a + b; z = c + d

Q-34 Superlocal Value Numbering. Idea: apply the local method to the EBB paths {A,B}, {A,C,D}, {A,C,E}. The final info from A is the initial info for B and C, and the final info from C is initial for D and E, so we get reuse from ancestors while avoiding reanalysis of A and C. This still doesn't help with F or G. (Same running example CFG as above.)

Q-35 SSA Name Space (from before).
Code:      a_0^3 = x_0^1 + y_0^2;  b_0^3 = x_0^1 + y_0^2;  a_1^4 = 17;  c_0^3 = x_0^1 + y_0^2
Rewritten: a_0^3 = x_0^1 + y_0^2;  b_0^3 = a_0^3;          a_1^4 = 17;  c_0^3 = a_0^3
A unique name for each definition, so each name corresponds to a VN; a_0^3 is available to assign to c_0^3.

Q-36 SSA Name Space: Two Principles. Each name is defined by exactly one operation, and each operand refers to exactly one definition. To deal with merge points, add Φ functions there to reconcile names, and use subscripts on variable names for uniqueness.

Q-37 Superlocal Value Numbering with All the Bells & Whistles. Finds more redundancies at little extra cost, but still does nothing for F and G. The running example in SSA form:
A: m_0 = a_0 + b_0; n_0 = a_0 + b_0
B: p_0 = c_0 + d_0; r_0 = c_0 + d_0
C: q_0 = a_0 + b_0; r_1 = c_0 + d_0
D: e_0 = b_0 + 18; s_0 = a_0 + b_0; u_0 = e_0 + f_0
E: e_1 = a_0 + 17; t_0 = c_0 + d_0; u_1 = e_1 + f_0
F: e_2 = Φ(e_0,e_1); u_2 = Φ(u_0,u_1); v_0 = a_0 + b_0; w_0 = c_0 + d_0; x_0 = e_2 + f_0
G: r_2 = Φ(r_0,r_1); y_0 = a_0 + b_0; z_0 = c_0 + d_0

Q-38 Larger Scopes. We still have not helped F and G. The problem is multiple predecessors: we must decide what facts hold in F and in G. For G, combine B and F? Merging states is expensive, so instead we fall back on what we already know. (Same SSA example as above.)

Q-39 Dominators. Definition: x dominates y iff every path from the entry of the control-flow graph to y includes x. By definition, x dominates x, so we can associate a Dom set with each node, with |Dom(x)| ≥ 1. Dominators have many uses in analysis and transformation: finding loops, building SSA form, code motion.
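Dom sets can be computed by a straightforward fixed-point iteration. A hedged Python sketch (the `succ` encoding is assumed, and the example edges are those implied by the slides' figure):

```python
def dominators(succ, entry):
    """Dom(n) for every node: iterate Dom(n) = {n} U (intersection of
    Dom(p) over predecessors p), from 'all nodes', to a fixed point.
    Assumes every non-entry node has at least one predecessor."""
    nodes = set(succ)
    pred = {n: [] for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].append(n)

    dom = {n: (set(nodes) if n != entry else {entry}) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# The running example's CFG (edges as implied by the slides' figure):
cfg = {"A": ["B", "C"], "B": ["G"], "C": ["D", "E"],
       "D": ["F"], "E": ["F"], "F": ["G"], "G": []}
doms = dominators(cfg, "A")
assert doms["F"] == {"A", "C", "F"} and doms["G"] == {"A", "G"}
```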

Q-40 Immediate Dominators. For any node x (other than the entry), there is a y in Dom(x) - {x} closest to x; this is the immediate dominator of x, written IDom(x).
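Given the Dom sets, IDom can be read off directly; a small illustrative sketch (quadratic, but fine for examples this size):

```python
def immediate_dominators(dom, entry):
    """IDom(x): the element of Dom(x) - {x} that every other strict
    dominator of x also dominates (i.e., the closest one)."""
    idom = {}
    for x, d in dom.items():
        if x == entry:
            continue
        strict = d - {x}
        idom[x] = next(c for c in strict
                       if all(o in dom[c] for o in strict))
    return idom

# Continuing the example above: IDom(F) = "C" and IDom(G) = "A",
# which is exactly what dominator value numbering exploits below.
```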

Q-41 Dominator Sets (for the running example CFG; the values follow directly from the edges):
Block A: Dom = {A}, IDom = (none; entry)
Block B: Dom = {A,B}, IDom = A
Block C: Dom = {A,C}, IDom = A
Block D: Dom = {A,C,D}, IDom = C
Block E: Dom = {A,C,E}, IDom = C
Block F: Dom = {A,C,F}, IDom = C
Block G: Dom = {A,G}, IDom = A

Q-42 Dominator Value Numbering. Still looking for a way to handle F and G. Idea: use the info from IDom(x) to start the analysis of x, i.e., use C's table for F and A's table for G. This is the Dominator VN Technique (DVNT). (Same SSA example as above.)

Q-43 DVNT Algorithm. Use the superlocal algorithm on extended basic blocks, with scoped hash tables and the SSA name space as before, but start each node with the table from its IDom. No values flow along back edges (i.e., around loops). Constant folding and algebraic identities apply as before. A sketch follows below.
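A compact sketch of the core of DVNT, minus Φ-function handling (the encodings are assumed: `code` maps each block to its (dest, op, a1, a2) tuples in SSA form, `children` is the dominator tree). Each block begins with a copy of its IDom's table, so facts flow down the dominator tree but never around back edges:

```python
from itertools import count

def dvnt(code, children, root):
    """Dominator-tree value numbering, sketched.

    Returns the set of (block, dest) pairs found redundant. A hedged
    sketch of the technique, not the full algorithm from Cooper & Torczon.
    """
    fresh = count()
    redundant = set()

    def visit(block, table):
        scope = dict(table)                  # scoped table: IDom's facts
        for dest, op, a1, a2 in code[block]:
            v1 = scope.setdefault(a1, next(fresh))
            v2 = scope.setdefault(a2, next(fresh))
            key = (op,) + (tuple(sorted((v1, v2))) if op in ("+", "*")
                           else (v1, v2))
            if key in scope:                 # redundant, even across blocks
                scope[dest] = scope[key]
                redundant.add((block, dest))
            else:
                scope[key] = scope[dest] = next(fresh)
        for child in children.get(block, []):
            visit(child, scope)              # child starts from our table

    visit(root, {})
    return redundant
```

On the running example, visiting F with C's table marks v_0 and w_0 redundant, and visiting G with A's table marks y_0 redundant, matching the slide above.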

Q-44 Dominator Value Numbering. Advantages: finds more redundancy at little extra cost. Shortcomings: misses some opportunities (common calculations in ancestors that are not IDoms) and doesn't handle loops or other back edges. (Same SSA example as above.)

Q-45 The Story So Far… A local algorithm; its superlocal extension (some local methods extend cleanly to superlocal scopes); and the Dominator VN Technique (DVNT). All of these propagate information along forward edges only; none are global.

Q-46 Coming Attractions. Data-flow analysis: provides a global solution to redundant-expression analysis; catches some things missed by DVNT but misses some others; generalizes to many other analysis problems, both forward and backward. Transformations: a catalog of some of the things a compiler can do with the analysis information.