1 Classical Optimization

Types of classical optimizations:
- Operation level: one operation in isolation
- Local: optimize pairs of operations in the same basic block (with or without dataflow analysis), e.g. peephole optimization
- Global: optimize pairs of operations spanning multiple basic blocks; dataflow analysis is required in this case, e.g. reaching definitions, UD/DU chains, or SSA form
- Loop: optimize loop bodies and nested loops

2 Local Constant Folding

Goal: eliminate unnecessary operations

Rules:
1. X is an arithmetic operation
2. If src1(X) and src2(X) are constants, change X by applying the operation

Example, where X is the first operation, with src1(X) = 4 and src2(X) = 1:
    r7 = 4 + 1            becomes   r7 = 5
    r5 = 2 * r4
    r6 = r5 * 2
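
A minimal Python sketch of the folding rule above (not part of the original slides), assuming a toy three-address IR in which each instruction is a (dest, op, src1, src2) tuple and sources are either int literals or register-name strings:

    # Fold X when both sources are constants: evaluate the operation at
    # compile time and turn X into a move of the folded literal.
    OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

    def fold_constants(block):
        out = []
        for dest, op, s1, s2 in block:
            if op in OPS and isinstance(s1, int) and isinstance(s2, int):
                out.append((dest, "mov", OPS[op](s1, s2), None))
            else:
                out.append((dest, op, s1, s2))
        return out

    block = [
        ("r7", "+", 4, 1),       # src1(X) = 4, src2(X) = 1  ->  r7 = 5
        ("r5", "*", 2, "r4"),
        ("r6", "*", "r5", 2),
    ]
    print(fold_constants(block))
    # [('r7', 'mov', 5, None), ('r5', '*', 2, 'r4'), ('r6', '*', 'r5', 2)]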

3 Local Constant Combining

Goal: eliminate unnecessary operations
- First operation often becomes dead

Rules:
1. Operations X and Y are in the same basic block
2. X and Y each have at least one literal src
3. Y uses dest(X)
4. None of the srcs of X have defs between X and Y (excluding Y)

Example, with X = "r5 = 2 * r4" and Y = "r6 = r5 * 2":
    r7 = 5
    r5 = 2 * r4
    r6 = r5 * 2           becomes   r6 = r4 * 4
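
A sketch of the combining rule for the multiply-by-constant case only (not part of the original slides; the function name and IR encoding are illustrative):

    # X: rX = c1 * rS followed, with no intervening redefinition of rS or rX,
    # by Y: rY = rX * c2. Y is rewritten to rY = rS * (c1 * c2), after which
    # X often becomes dead. Instructions are (dest, op, src1, src2) tuples.

    def combine_constants(block):
        out = list(block)
        for i, (dx, opx, x1, x2) in enumerate(out):
            if opx != "*":
                continue
            cx, rs = (x1, x2) if isinstance(x1, int) else (x2, x1)
            if not (isinstance(cx, int) and isinstance(rs, str)):
                continue
            for j in range(i + 1, len(out)):
                dy, opy, y1, y2 = out[j]
                if opy == "*" and dx in (y1, y2):
                    cy = y1 if isinstance(y1, int) else y2
                    if isinstance(cy, int):          # Y has a literal src too
                        out[j] = (dy, "*", rs, cx * cy)
                    break
                if dy in (rs, dx):                   # a src of X (or dest(X)) is redefined: stop
                    break
        return out

    block = [("r7", "mov", 5, None), ("r5", "*", 2, "r4"), ("r6", "*", "r5", 2)]
    print(combine_constants(block))
    # [('r7', 'mov', 5, None), ('r5', '*', 2, 'r4'), ('r6', '*', 'r4', 4)]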

4 Local Strength Reduction

Goal: replace expensive operations with cheaper ones

Rules (example):
1. X is a multiplication operation where src1(X) or src2(X) is a constant 2^k integer literal
2. Change X to use a shift operation
3. For k = 1, an add can be used instead

Example:
    r7 = 5
    r5 = 2 * r4           becomes   r5 = r4 + r4
    r6 = r4 * 4           becomes   r6 = r4 << 2
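
A minimal sketch (not part of the original slides) of the power-of-two multiply rewrite on the same toy (dest, op, src1, src2) IR; the helper name reduce_strength is illustrative:

    def reduce_strength(block):
        out = []
        for dest, op, s1, s2 in block:
            c, r = (s1, s2) if isinstance(s1, int) else (s2, s1)
            # rewrite rD = rS * 2**k as a shift, or an add when k == 1
            if op == "*" and isinstance(c, int) and isinstance(r, str) \
                    and c > 1 and (c & (c - 1)) == 0:
                k = c.bit_length() - 1
                out.append((dest, "+", r, r) if k == 1 else (dest, "<<", r, k))
            else:
                out.append((dest, op, s1, s2))
        return out

    block = [("r5", "*", 2, "r4"), ("r6", "*", "r4", 4)]
    print(reduce_strength(block))
    # [('r5', '+', 'r4', 'r4'), ('r6', '<<', 'r4', 2)]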

5 Local Constant Propagation

Goal: replace register uses with literals (constants) within a single basic block

Rules:
1. Operation X is a move to a register with src1(X) a literal
2. Operation Y uses dest(X)
3. There is no def of dest(X) between X and Y (excluding defs at X and Y)
4. Replace dest(X) in Y with src1(X)

Example:
    r1 = 5
    r2 = _x
    r3 = 7
    r4 = r4 + r1
    r1 = r1 + r2
    r1 = r1 + 1
    r3 = 12
    r8 = r1 - r2
    r9 = r3 + r5
    r3 = r2 + 1
    r7 = r3 - r1
    M[r7] = 0
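
A minimal Python sketch of this pass (not part of the original slides): walk the block keeping a map from registers to known literal values, replace uses, and kill a mapping when its register is redefined. The IR encoding is the same illustrative (dest, op, src1, src2) form used above.

    def propagate_constants(block):
        known, out = {}, []
        for dest, op, s1, s2 in block:
            s1 = known.get(s1, s1) if isinstance(s1, str) else s1
            s2 = known.get(s2, s2) if isinstance(s2, str) else s2
            if op == "mov" and isinstance(s1, int):
                known[dest] = s1            # dest now holds a known literal
            else:
                known.pop(dest, None)       # dest redefined with an unknown value
            out.append((dest, op, s1, s2))
        return out

    block = [
        ("r1", "mov", 5, None),
        ("r4", "+", "r4", "r1"),
        ("r1", "+", "r1", "r2"),   # r1 redefined: later uses of r1 are not replaced
        ("r7", "-", "r3", "r1"),
    ]
    for ins in propagate_constants(block):
        print(ins)
    # ('r1', 'mov', 5, None)
    # ('r4', '+', 'r4', 5)
    # ('r1', '+', 5, 'r2')
    # ('r7', '-', 'r3', 'r1')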

6 Local Common Subexpression Elimination (CSE)

Goal: eliminate recomputations of an expression
- More efficient code
- Resulting moves can get copy propagated (see later)

Rules:
1. Operations X and Y have the same opcode and Y follows X
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src between X and Y (excluding Y)
4. No def of dest(X) between X and Y (excluding X and Y)
5. Replace Y with the move dest(Y) = dest(X)

Example:
    r1 = r2 + r3
    r4 = r4 + 1
    r1 = 6
    r6 = r2 + r3
    r2 = r1 - 1
    r5 = r4 + 1
    r7 = r2 + r3
    r5 = r1 - 1
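
A minimal sketch of local CSE (not part of the original slides): remember which register holds each available (op, src1, src2) expression and replace a recomputation with a move, killing entries whose sources or holding register are redefined. Commutativity is ignored to keep the sketch short.

    def local_cse(block):
        avail, out = {}, []            # (op, src1, src2) -> register holding that value
        for dest, op, s1, s2 in block:
            key = (op, s1, s2)
            if op != "mov" and key in avail:
                out.append((dest, "mov", avail[key], None))   # replace Y with a move
            else:
                out.append((dest, op, s1, s2))
            # kill expressions whose sources, or whose holding register, dest redefines
            avail = {k: r for k, r in avail.items() if dest not in k[1:] and r != dest}
            if op != "mov" and dest not in (s1, s2):
                avail[key] = dest       # the value of (op s1 s2) now lives in dest
        return out

    block = [
        ("r1", "+", "r2", "r3"),
        ("r4", "+", "r4", 1),
        ("r1", "mov", 6, None),
        ("r6", "+", "r2", "r3"),    # r1 was clobbered, so this must be recomputed
        ("r2", "-", "r1", 1),
        ("r5", "+", "r4", 1),
        ("r7", "+", "r2", "r3"),    # r2 redefined above: must also be recomputed
        ("r5", "-", "r1", 1),       # r1 - 1 is still available in r2
    ]
    for ins in local_cse(block):
        print(ins)
    # last instruction becomes: ('r5', 'mov', 'r2', None)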

7 Dead Code Elimination

Goal: eliminate any operation whose result is never used

Rules (dataflow required):
1. X is an operation with no use in its DU chain, i.e. dest(X) is not live
2. Delete X if it is removable (not a memory store or branch)

These rules are too simple!
- They miss the deletion of r4 = r4 + 1, even after r7 = r1 * r4 is deleted, since r4 remains live around the loop
- Better: trace UD chains backwards from "critical" operations

Example operations (from a CFG containing a loop; block layout not shown):
    r4 = r4 + 1
    r7 = r1 * r4
    r3 = r3 + 1
    r2 = 0
    r3 = r2 + r1
    M[r1] = r3
    r1 = 3
    r2 = 10
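
A minimal sketch (not part of the original slides) of the "trace backwards from critical operations" approach: mark stores and branches, follow use-to-def edges backward, and delete everything left unmarked. For simplicity it assumes straight-line code where every register has a single def, so the loop-carried case from the slide is only approximated.

    def eliminate_dead_code(insns):
        # instructions are (dest, op, sources...) with dest None for stores/branches
        def_of = {ins[0]: i for i, ins in enumerate(insns) if ins[0] is not None}
        critical = [i for i, ins in enumerate(insns) if ins[1] in ("store", "branch")]
        marked, work = set(critical), list(critical)
        while work:
            i = work.pop()
            for src in insns[i][2:]:
                d = def_of.get(src)
                if d is not None and d not in marked:
                    marked.add(d)
                    work.append(d)
        return [ins for i, ins in enumerate(insns) if i in marked]

    insns = [
        ("r1", "mov", 3),
        ("r2", "mov", 10),
        ("r4", "+", "r4", 1),      # only feeds r7, which feeds nothing critical
        ("r7", "*", "r1", "r4"),
        ("r3", "+", "r2", "r1"),
        (None, "store", "r1", "r3"),
    ]
    for ins in eliminate_dead_code(insns):
        print(ins)
    # r4 and r7 are removed; r1, r2, r3 and the store remain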

8 Local Backward Copy Propagation

Goal: propagate the LHS of moves backward
- Eliminates useless moves

Rules (dataflow required):
1. X and Y are in the same block
2. Y is a move to a register
3. dest(X) is a register that is not live out of the block
4. Y uses dest(X)
5. dest(Y) is not used or defined between X and Y (excluding X and Y)
6. No uses of dest(X) after the first redef of dest(Y)
7. Replace dest(X) on the path from X to Y with dest(Y) and remove Y

Example:
    r1 = r8 + r9
    r2 = r9 + r1
    r4 = r2
    r6 = r2 + 1
    r9 = r1
    r7 = r6
    r5 = r6 + 1
    r4 = 0
    r8 = r2 + r7
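
A simplified sketch (not part of the original slides) that handles only the easiest case of this rule: Y is "rY = rX", X is the nearest earlier def of rX, rX is not used between X and Y or after Y (and is assumed not live out of the block), and rY is neither used nor defined between X and Y. The example block is illustrative, not the one on the slide.

    def uses(ins):
        return [s for s in ins[2:] if isinstance(s, str)]

    def backward_copy_prop(block):
        block = list(block)
        for j in range(len(block)):
            y = block[j]
            if y is None or y[1] != "mov" or not isinstance(y[2], str):
                continue
            dy, rx = y[0], y[2]
            defs = [k for k in range(j) if block[k] is not None and block[k][0] == rx]
            if not defs:
                continue
            i = defs[-1]
            between = [b for b in block[i + 1:j] if b is not None]
            after = [b for b in block[j + 1:] if b is not None]
            if (all(dy not in uses(b) and b[0] != dy and rx not in uses(b) for b in between)
                    and all(rx not in uses(b) and b[0] != rx for b in after)):
                block[i] = (dy,) + block[i][1:]   # X now defines dest(Y) directly
                block[j] = None                    # remove the copy Y
        return [b for b in block if b is not None]

    block = [
        ("r1", "+", "r8", "r9"),
        ("r2", "+", "r9", "r1"),
        ("r4", "mov", "r2"),      # r2 is dead afterwards: fold into the def above
        ("r6", "+", "r4", 1),
        ("r7", "mov", "r6"),      # r6 is still used below, so this copy is kept
        ("r5", "+", "r6", 1),
    ]
    for ins in backward_copy_prop(block):
        print(ins)
    # ('r1', '+', 'r8', 'r9'), ('r4', '+', 'r9', 'r1'), ('r6', '+', 'r4', 1),
    # ('r7', 'mov', 'r6'), ('r5', '+', 'r6', 1)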

9 Global Constant Propagation

Goal: globally replace register uses with literals

Rules (dataflow required):
1. X is a move to a register with src1(X) a literal
2. Y uses dest(X)
3. dest(X) has only one def reaching Y along its UD chains, namely X
4. Replace dest(X) in Y with src1(X)

Example operations (from a CFG; block layout not shown):
    r5 = 2
    r7 = r1 * r5
    r3 = r3 + r5
    r2 = 0
    r3 = r2 + r1
    r6 = r7 * r4
    M[r1] = r3
    r1 = 4
    r2 = 10

10 Global Constant Propagation with SSA

Goal: globally replace register uses with literals

Rules (high level):
1. For operation X with a register src(X)
2. Find the def of src(X) in its chain
3. If the def is a move of a literal, src(X) is constant: done
4. If the RHS of the def is an operation, including a φ node, recurse on all of its srcs
5. Apply the rule for that operation to determine whether src(X) is constant
6. Note: abstract values ⊤ (top) and ⊥ (bottom) are often used to indicate unknown values

Example operations (same CFG as the previous slide):
    r5 = 2
    r7 = r1 * r5
    r3 = r3 + r5
    r2 = 0
    r3 = r2 + r1
    r6 = r7 * r4
    M[r1] = r3
    r1 = 4
    r2 = 10

Exercise: compute the SSA form and propagate constants.

11 Forward Copy Propagation

Goal: globally propagate the RHS of moves forward
- Reduces dependence chains
- May make it possible to eliminate moves

Rules (dataflow required):
1. X is a move with src1(X) a register
2. Y uses dest(X)
3. dest(X) has only one def reaching Y along its UD chains, namely X
4. src1(X) has no def on any path from X to Y
5. Replace dest(X) in Y with src1(X)

Example operations (from a CFG; block layout not shown):
    r1 = r2
    r3 = r4
    r6 = r3 + 1
    r2 = 0
    r5 = r2 + r3
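
A sketch (not part of the original slides) of forward copy propagation restricted to a single basic block; the global version on the slide additionally needs UD chains to check that each use sees exactly one reaching def. The copies map records dest -> src for live copies.

    def forward_copy_prop(block):
        copies, out = {}, []
        for dest, op, *srcs in block:
            srcs = [copies.get(s, s) if isinstance(s, str) else s for s in srcs]
            # any mapping that mentions the redefined register is no longer valid
            copies = {d: s for d, s in copies.items() if dest not in (d, s)}
            if op == "mov" and isinstance(srcs[0], str):
                copies[dest] = srcs[0]
            out.append((dest, op, *srcs))
        return out

    block = [
        ("r1", "mov", "r2"),
        ("r3", "mov", "r4"),
        ("r6", "+", "r3", 1),
        ("r5", "+", "r2", "r3"),
    ]
    for ins in forward_copy_prop(block):
        print(ins)
    # ('r1', 'mov', 'r2')
    # ('r3', 'mov', 'r4')
    # ('r6', '+', 'r4', 1)
    # ('r5', '+', 'r2', 'r4')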

12 Global Common Subexpression Elimination (CSE)

Goal: eliminate recomputations of an expression

Rules:
1. X and Y have the same opcode and X dominates Y
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src on any path between X and Y (excluding Y)
4. Insert rx = dest(X) immediately after X, for a new register rx
5. Replace Y with the move dest(Y) = rx

Example operations (from a CFG; block layout not shown):
    r1 = r2 * r6
    r3 = r4 / r7
    r2 = r2 + 1
    r1 = r3 * 7
    r5 = r2 * r6
    r8 = r4 / r7
    r9 = r3 * 7

13 Loop Optimizations

Loops are the most important target for optimization
- Programs spend much of their time in loops

Loop optimizations:
- Invariant code removal (aka code motion)
- Global variable migration
- Induction variable strength reduction
- Induction variable elimination

14 Code Motion

Goal: move loop-invariant computations to the preheader

Rules:
1. Operation X is in a block that dominates all exit blocks
2. X is the only operation to modify dest(X) in the loop body
3. No src of X has a def in any of the basic blocks in the loop body
4. Move X to the end of the preheader

Notes:
- Note 1: if one src of X is a memory load, stores in the loop body must also be checked
- Note 2: X must be movable and must not cause exceptions

Example operations (from a CFG with a preheader and loop header; block layout not shown):
    r4 = M[r5]
    r7 = r4 * 3
    r8 = r2 + 1
    r7 = r8 * r4
    r3 = r2 + 1
    r1 = r1 + r7
    M[r1] = r3
    r1 = 0
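
A minimal sketch (not part of the original slides) of invariant code motion on a single-block loop body: an operation is hoisted if all of its register sources are defined outside the loop (or by operations already hoisted) and its destination has exactly one def in the loop. Dominance of exits, memory stores, and exception safety (notes 1 and 2 above) are deliberately ignored in this toy version.

    def hoist_invariants(preheader, body):
        body = list(body)
        def_count = {}
        for dest, *_ in body:
            def_count[dest] = def_count.get(dest, 0) + 1
        changed = True
        while changed:                      # iterate so hoists enable further hoists
            changed = False
            defined_in_loop = {ins[0] for ins in body}
            for ins in list(body):
                dest, op, *srcs = ins
                invariant = all(not isinstance(s, str) or s not in defined_in_loop
                                for s in srcs)
                if invariant and def_count[dest] == 1:
                    preheader.append(ins)
                    body.remove(ins)
                    changed = True
        return preheader, body

    preheader = [("r1", "mov", 0)]
    body = [
        ("r4", "load", "r5"),       # r5 never defined in the loop: invariant
        ("r7", "*", "r4", 3),       # becomes invariant once r4 is hoisted
        ("r3", "+", "r2", 1),       # r2 defined outside the loop: invariant
        ("r1", "+", "r1", "r7"),    # depends on itself: stays in the loop
    ]
    pre, body = hoist_invariants(preheader, body)
    print("preheader:", pre)
    print("body:     ", body)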

15 Global Variable Migration

Goal: assign a global variable to a register for the entire duration of a loop

Rules:
1. X is a load or store to M[x]
2. The address x of M[x] is not modified in the loop
3. Replace all M[x] in the loop with a new register rx
4. Add rx = M[x] to the preheader
5. Add M[x] = rx to each loop exit
6. Memory disambiguation is required: all memory ops in the loop whose address can equal x must use the same address x

Example operations (from a loop; block layout not shown):
    r4 = M[r5]
    r4 = r4 + 1
    r8 = M[r5]
    r7 = r8 * r4
    M[r5] = r4
    M[r5] = r7
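
A small before/after equivalence check (not part of the original slides, and not the compiler pass itself): the loads and stores of M[r5] inside the loop are replaced by a register rx that is loaded once in the preheader and stored back at the loop exit. Memory is modeled as a Python dict from addresses to values; the address and trip count are illustrative.

    def original(mem, r5, n):
        for _ in range(n):
            r4 = mem[r5]
            r4 = r4 + 1
            r8 = mem[r5]
            r7 = r8 * r4
            mem[r5] = r4
            mem[r5] = r7
        return mem

    def migrated(mem, r5, n):
        rx = mem[r5]              # preheader: rx = M[r5]
        for _ in range(n):
            r4 = rx
            r4 = r4 + 1
            r8 = rx
            r7 = r8 * r4
            rx = r4
            rx = r7
        mem[r5] = rx              # loop exit: M[r5] = rx
        return mem

    assert original({100: 3}, 100, 4) == migrated({100: 3}, 100, 4)
    print(migrated({100: 3}, 100, 4))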

16 Loop Strength Reduction (1)

Goal: create basic IVs from derived IVs

Rules:
1. X is a *, <<, +, or - operation
2. src1(X) is a basic IV
3. src2(X) is invariant
4. No other ops modify dest(X)
5. dest(X) != src(X) for all srcs
6. dest(X) is a register

Example loop body (preheader and header layout not shown), with X = "r7 = r4 * r9",
so src1(X) = r4, src2(X) = r9, dest(X) = r7; the basic IV r4 has triple (r4, 1, ?):
    r5 = r4 - 3
    r4 = r4 + 1
    r7 = r4 * r9
    r6 = r4 << 2

17 Loop Strength Reduction (2)

Transformation:
1. Insert at the bottom of the preheader: new_reg = RHS(X)
2. If opcode(X) is not + or -, also insert at the bottom of the preheader: new_inc = inc(src1(X)) opcode(X) src2(X)
3. Else: new_inc = inc(src1(X))
4. Insert at each update of src1(X): new_reg += new_inc
5. Change X to: dest(X) = new_reg

Example (continuing the previous slide, with new_reg = r1 and new_inc = r2):
    preheader:
        r1 = r4 * r9
        r2 = 1 * r9
    loop body:
        r5 = r4 - 3
        r4 = r4 + 1
        r1 = r1 + r2
        r7 = r1
        r6 = r4 << 2

Exercise: apply strength reduction to r5 and r6.
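
A minimal check (not part of the original slides, and not the compiler pass itself) that the strength-reduced loop above computes the same values as the original: the derived IV r7 = r4 * r9 is replaced by a register r1 that starts at r4 * r9 in the preheader and is bumped by r2 = 1 * r9 each time the basic IV r4 is bumped. The trip counts and the value of r9 are illustrative.

    def original(n, r9, r4=0):
        results = []
        for _ in range(n):
            r4 = r4 + 1            # basic IV update
            r7 = r4 * r9           # derived IV: one multiply per iteration
            results.append(r7)
        return results

    def strength_reduced(n, r9, r4=0):
        r1 = r4 * r9               # preheader: new_reg = RHS(X)
        r2 = 1 * r9                # preheader: new_inc = inc(r4) * r9
        results = []
        for _ in range(n):
            r4 = r4 + 1
            r1 = r1 + r2           # additive update replaces the multiply
            r7 = r1
            results.append(r7)
        return results

    assert original(10, r9=7) == strength_reduced(10, r9=7)
    print(strength_reduced(5, r9=7))   # [7, 14, 21, 28, 35]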

18 IV Elimination (1)

Goal: remove unnecessary basic IVs from the loop by substituting their uses with another basic IV

Rules for IVs with the same increment and initial value:
1. Find two basic IVs x and y
2. x and y are in the same family and have the same increment and initial values
3. They are incremented at the same place
4. x is not live at loop exit
5. For each basic block where x is defined, there are no uses of x between the first/last def of x and the last/first def of y
6. Replace uses of x with y

Example operations (from a CFG with a loop; block layout not shown):
    r1 = r1 - 1
    r2 = r2 - 1
    r9 = r2 + r4
    r7 = r1 * r9
    r4 = M[r1]
    M[r2] = r7
    r1 = 0
    r2 = 0

Exercise: apply IV elimination.
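
A small before/after equivalence check (not part of the original slides; the loop body and values are made up): two basic IVs r1 and r2 start at the same value and are decremented together, so uses of r1 can be rewritten to use r2 and the r1 updates dropped, assuming r1 is not live at loop exit.

    def with_two_ivs(n):
        r1 = r2 = 0
        total = 0
        for _ in range(n):
            r1 = r1 - 1
            r2 = r2 - 1
            total += r1 * 3 + r2   # one use of each IV
        return total

    def with_one_iv(n):
        r2 = 0
        total = 0
        for _ in range(n):
            r2 = r2 - 1
            total += r2 * 3 + r2   # uses of r1 replaced by r2
        return total

    assert all(with_two_ivs(n) == with_one_iv(n) for n in range(20))
    print(with_two_ivs(10), with_one_iv(10))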

19 IV Elimination (2)

Many variants, from simple to complex:
1. Trivial case: an IV that is never used except by its own increment operations and is not live at loop exit
2. IVs with the same increment and the same initial value
3. IVs with the same increment, and initial values that are a known constant offset from each other
4. IVs with the same increment, but unknown initial values
5. IVs with different increments and no information about initial values

Methods 1 and 2 are virtually free, so they are always applied.
Methods 3 to 5 require preheader operations.

20 IV Elimination (3)

Example for method 4 (same increment, unknown initial values):

Before:
    preheader:
        r1 = ?
        r2 = ?
    loop:
        r3 = M[r1+4]
        r4 = M[r2+8]
        …
        r1 = r1 + 4
        r2 = r2 + 4

After (r2's increment eliminated; its use rewritten via r5):
    preheader:
        r1 = ?
        r2 = ?
        r5 = r2 - r1 + 8
    loop:
        r3 = M[r1+4]
        r4 = M[r1+r5]
        …
        r1 = r1 + 4