Introduction to Optimizations


Introduction to Optimizations Guo, Yao

“Advanced Compiler Techniques” (Fall 2011)

Outline: Optimization Rules, Basic Blocks, Control Flow Graph (CFG), Loops, Local Optimizations, Peephole Optimization

Levels of Optimizations
Local: inside a basic block.
Global (intraprocedural): across basic blocks; whole-procedure analysis.
Interprocedural: across procedures; whole-program analysis.

The Golden Rules of Optimization: Premature Optimization is Evil
Donald Knuth: “Premature optimization is the root of all evil.”
Optimization can introduce new, subtle bugs, and it usually makes code harder to understand and maintain. Get your code right first; then, if really needed, optimize it. Document optimizations carefully, and keep the non-optimized version handy, or even as a comment in your code.

The Golden Rules of Optimization: The 80/20 Rule
In general, 80% of a program’s execution time is spent executing 20% of the code (90%/10% for performance-hungry programs). Spend your time optimizing the important 10–20% of your program: optimize the common case, even at the cost of making the uncommon case slower.

The Golden Rules of Optimization: Good Algorithms Rule
The best and most important way of optimizing a program is using good algorithms, e.g. O(n log n) rather than O(n²). However, we still need lower-level optimization to get more out of our programs. In addition, asymptotic complexity is not always an appropriate metric of efficiency: hidden constants may be misleading. E.g. a linear-time algorithm that runs in 100n+100 time is slower than a cubic-time algorithm that runs in n³+10 time if the problem size is small.

Asymptotic Complexity: Hidden Constants (figure)

General Optimization Techniques
Strength reduction: use the fastest version of an operation, e.g.
    x >> 2 instead of x / 4
    x << 1 instead of x * 2
Common subexpression elimination: eliminate redundant calculations, e.g.
    double x = d * (lim / max) * sx;
    double y = d * (lim / max) * sy;
becomes
    double depth = d * (lim / max);
    double x = depth * sx;
    double y = depth * sy;
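The transformation above can be sketched as a tiny local pass over three-address tuples (dest, op, arg1, arg2). The tuple encoding, and the assumption that eliminated temporaries are not used outside the block, are mine, not the slides’:

```python
def local_cse(block):
    """Local common-subexpression elimination with simple copy propagation."""
    seen = {}    # (op, arg1, arg2) -> name already holding that value
    alias = {}   # name -> earlier name holding the same value
    out = []
    for dest, op, a, b in block:
        a, b = alias.get(a, a), alias.get(b, b)   # propagate copies
        key = (op, a, b)
        if key in seen:
            alias[dest] = seen[key]   # redundant: reuse the earlier value
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out

# the slide's example: d * (lim / max) is computed twice
block = [
    ("t1", "/", "lim", "max"),
    ("t2", "*", "d",  "t1"),
    ("x",  "*", "t2", "sx"),
    ("t3", "/", "lim", "max"),   # same as t1
    ("t4", "*", "d",  "t3"),     # same as t2, once t3 aliases t1
    ("y",  "*", "t4", "sy"),
]
```

After the pass, lim / max and d * (lim / max) are each computed once, exactly as in the rewritten source above.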

General Optimization Techniques
Code motion: invariant expressions should be executed only once. E.g.
    for (int i = 0; i < x.length; i++)
        x[i] *= Math.PI * Math.cos(y);
becomes
    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i++)
        x[i] *= picosy;

General Optimization Techniques
Loop unrolling: the overhead of the loop control code can be reduced by executing more than one iteration in the body of the loop. E.g.
    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i++)
        x[i] *= picosy;
becomes
    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i += 2) {
        x[i] *= picosy;
        x[i+1] *= picosy;
    }
An efficient “+1” in array indexing is required (and x.length must be even, or a cleanup iteration added).
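A runnable sketch of the hoisted-and-unrolled loop in Python (function name is mine), including the cleanup step needed when the array length is odd:

```python
import math

def scale_unrolled(x, y):
    picosy = math.pi * math.cos(y)   # loop-invariant factor, hoisted
    i, n = 0, len(x)
    while i + 1 < n:                  # two elements per trip: less loop overhead
        x[i]   *= picosy
        x[i+1] *= picosy
        i += 2
    if i < n:                         # cleanup iteration when n is odd
        x[i] *= picosy
    return x
```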

Compiler Optimizations
Compilers try to generate good (i.e. fast) code. Code improvement is challenging: many problems are NP-hard, and code improvement may slow down the compilation process. In some domains, such as just-in-time compilation, compilation speed is critical.

Phases of Compilation
The first three phases are language-dependent; the last two are machine-dependent; the middle two depend on neither the language nor the machine.

Phases
Why is loop optimization important? The 80/20 rule.

Outline: Optimization Rules, Basic Blocks, Control Flow Graph (CFG), Loops, Local Optimizations, Peephole Optimization

Basic Blocks
A basic block is a maximal sequence of consecutive three-address instructions with the following properties:
The flow of control can only enter the basic block through the first instruction.
Control will leave the block without halting or branching, except possibly at the last instruction.
Basic blocks become the nodes of a flow graph, with edges indicating the order.
(What happens if an interrupt happens?)

Examples
Source:
    for i from 1 to 10 do
        for j from 1 to 10 do
            a[i,j] = 0.0
    for i from 1 to 10 do
        a[i,i] = 1.0
Three-address code:
    (1)  i = 1
    (2)  j = 1
    (3)  t1 = 10 * i
    (4)  t2 = t1 + j
    (5)  t3 = 8 * t2
    (6)  t4 = t3 - 88
    (7)  a[t4] = 0.0
    (8)  j = j + 1
    (9)  if j <= 10 goto (3)
    (10) i = i + 1
    (11) if i <= 10 goto (2)
    (12) i = 1
    (13) t5 = i - 1
    (14) t6 = 88 * t5
    (15) a[t6] = 1.0
    (16) i = i + 1
    (17) if i <= 10 goto (13)

Identifying Basic Blocks
Input: a sequence of instructions instr(i).
Output: a list of basic blocks.
Method: identify the leaders, the first instruction of each basic block; then iterate, adding subsequent instructions to the block until we reach another leader.

Identifying Leaders
Rules for finding leaders in the code:
1. The first instruction in the code is a leader.
2. Any instruction that is the target of a (conditional or unconditional) jump is a leader.
3. Any instruction that immediately follows a (conditional or unconditional) jump is a leader.

Basic Block Partition Algorithm
    leaders = {1}                          // start of program
    for i = 1 to |n|                       // all instructions
        if instr(i) is a branch
            leaders = leaders ∪ targets of instr(i) ∪ {i + 1}
    worklist = leaders
    while worklist not empty
        x = first instruction in worklist
        worklist = worklist - {x}
        block(x) = {x}
        for (i = x + 1; i <= |n| and i ∉ leaders; i++)
            block(x) = block(x) ∪ {i}
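A sketch of this algorithm in Python. Instructions are modeled as dicts whose optional "targets" entry lists 1-based branch-target indices; this encoding is my assumption, not the slides’:

```python
def find_leaders(instrs):
    n = len(instrs)
    leaders = {1}                               # start of program
    for i, ins in enumerate(instrs, start=1):
        if ins.get("targets"):                  # instr(i) is a branch
            leaders.update(ins["targets"])      # its targets are leaders
            if i + 1 <= n:
                leaders.add(i + 1)              # and so is the next instruction
    return leaders

def partition(instrs):
    leaders = sorted(find_leaders(instrs))
    blocks = []
    for j, lead in enumerate(leaders):
        # a block runs from its leader up to (not including) the next leader
        end = leaders[j + 1] - 1 if j + 1 < len(leaders) else len(instrs)
        blocks.append(list(range(lead, end + 1)))
    return blocks

# a small example shaped like the nested-loop code above:
# 17 instructions, branches at (9)->(3), (11)->(2), (17)->(13)
instrs = [dict() for _ in range(17)]
instrs[8]["targets"], instrs[10]["targets"], instrs[16]["targets"] = [3], [2], [13]
```

Calling partition(instrs) yields six blocks, matching the A–F partition on the next slide.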

Basic Block Example
Leaders: the first instruction, each goto target, and each instruction immediately following a conditional goto. Partitioning the example code this way yields six basic blocks, A through F.

Outline: Optimization Rules, Basic Blocks, Control Flow Graph (CFG), Loops, Local Optimizations, Peephole Optimization

Control-Flow Graphs
A control-flow graph:
Node: an instruction or a sequence of instructions (a basic block). Two instructions i, j are in the same basic block iff execution of i guarantees execution of j.
Directed edge: potential flow of control.
Distinguished Entry and Exit nodes: the first and last instructions in the program.

Control-Flow Edges
Basic blocks = nodes.
Edges: add a directed edge from B1 to B2 if:
there is a branch from the last statement of B1 to the first statement of B2 (B2 is a leader), or
B2 immediately follows B1 in program order and B1 does not end with an unconditional branch (goto).
Definition: B1 is a predecessor of B2, and B2 is a successor of B1.

Control-Flow Edge Algorithm
Input: block(i), a sequence of basic blocks.
Output: the CFG, where nodes are basic blocks.
    for i = 1 to the number of blocks
        x = last instruction of block(i)
        if instr(x) is a branch
            for each target y of instr(x), create edge (i -> y)
        if instr(x) is not an unconditional branch, create edge (i -> i+1)
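A sketch of the edge algorithm in Python. Each block is summarized by its last instruction: a list of branch-target block indices and a flag for an unconditional goto (this encoding is mine):

```python
def cfg_edges(blocks):
    edges = []
    for i, b in enumerate(blocks, start=1):
        for t in b.get("targets", []):            # edge to each branch target
            edges.append((i, t))
        if not b.get("unconditional") and i < len(blocks):
            edges.append((i, i + 1))              # fall-through edge
    return edges

# six blocks shaped like the running example: the third and fourth blocks
# end in conditional branches back to blocks 3 and 2, the last loops to itself
blocks = [{}, {}, {"targets": [3]}, {"targets": [2]}, {}, {"targets": [6]}]
```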

CFG Example (figure)

Loops
Loops come from while, do-while, for, goto, ...
Loop definition: a set of nodes L in a CFG is a loop if
1. there is a node called the loop entry, and no other node in L has a predecessor outside L;
2. every node in L has a nonempty path (within L) to the entry of L.

Loop Examples (figure): {B3}, {B6}, {B2, B3, B4}

Identifying Loops
Motivation: the majority of runtime is spent in loops, so focus optimization on loop bodies! Removing redundant code and replacing expensive operations there speeds up the program.
Finding loops is easy for structured code:
    for i = 1 to 1000
        for j = 1 to 1000
            for k = 1 to 1000
                do something
or harder with GOTOs:
    i = 1; j = 1; k = 1;
    A1: if i > 1000 goto L1;
    A2: if j > 1000 goto L2;
    A3: if k > 1000 goto L3;
        do something
        k = k + 1; goto A3;
    L3: j = j + 1; goto A2;
    L2: i = i + 1; goto A1;
    L1: halt

Outline: Optimization Rules, Basic Blocks, Control Flow Graph (CFG), Loops, Local Optimizations, Peephole Optimization

Local Optimization
Optimization of basic blocks (Dragon §8.5)

Transformations on basic blocks
Common subexpression elimination: recognize redundant computations, replace them with a single temporary.
Dead-code elimination: recognize computations not used subsequently, remove the quadruples.
Interchange statements, for better scheduling.
Renaming of temporaries, for better register usage.
All of the above require symbolic execution of the basic block to obtain def/use information.

Simple symbolic interpretation: next-use information
If x is computed in statement i and is an operand of statement j, j > i, its value must be preserved (in a register or in memory) until j. If x is recomputed at k, k > i, the value computed at i has no further use beyond its last use before k and can be discarded (i.e. its register reused). Next-use information is annotated over the statements and the symbol table, computed in one backward pass over the statements.

Next-Use Information
Definition: suppose statement i assigns a value to x; statement j has x as an operand; and control can flow from i to j along a path with no intervening assignments to x. Then statement j uses the value of x computed at statement i; i.e. x is live at statement i.

Computing next-use
Use the symbol table to annotate the status of variables. Each operand in a statement carries additional information: operand liveness (boolean) and operand next use (a later statement). On exit from the block, all temporaries are dead (no next use).

Algorithm
INPUT: a basic block B.
OUTPUT: at each statement i: x = y op z in B, liveness and next-use information for x, y, z.
METHOD: for each statement in B (backward):
1. Retrieve liveness and next-use info from a table.
2. Set x to “not live” and “no next use”.
3. Set y, z to “live” and the next uses of y, z to i.
Note: steps 2 and 3 cannot be interchanged. E.g., x = x + y.
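The backward pass can be sketched in Python. Statements are (x, y, z) triples standing for "x = y op z" (operands may be None), and the table maps each name to a (live, next-use) pair; this encoding is mine:

```python
def next_use(block, live_on_exit):
    n = len(block)
    # variables live on exit get a pseudo next use just past the block
    table = {v: (True, n + 1) for v in live_on_exit}
    info = [None] * n
    for i in range(n, 0, -1):                     # backward over statements
        x, y, z = block[i - 1]
        # step 1: attach the table's current info to statement i
        info[i - 1] = {v: table.get(v, (False, None)) for v in (x, y, z) if v}
        # step 2: the assigned variable is not live before this statement
        table[x] = (False, None)
        # step 3 (must follow step 2, cf. x = x + y): operands are live here
        for v in (y, z):
            if v:
                table[v] = (True, i)
    return info

block = [("x", None, None),   # (1) x = 1
         ("y", None, None),   # (2) y = 1
         ("x", "x", "y"),     # (3) x = x + y
         ("z", "y", None),    # (4) z = y
         ("x", "y", "z")]     # (5) x = y + z
info = next_use(block, live_on_exit={"x"})
```

On this block (the example on the next slide), the x assigned at (3) turns out to be dead: it is recomputed at (5) without being used.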

Example
    (1) x = 1
    (2) y = 1
    (3) x = x + y
    (4) z = y
    (5) x = y + z
On exit: x live (next use 6), y not live, z not live.

Computing dependencies in a basic block: the DAG
Use a directed acyclic graph (DAG) to recognize common subexpressions and remove redundant quadruples.
Intermediate code optimization: basic block => DAG => improved block => assembly.
Leaves are labeled with identifiers and constants; internal nodes are labeled with operators and identifiers.

DAG construction
Forward pass over the basic block.
For x = y op z:
1. Find a node labeled y, or create one.
2. Find a node labeled z, or create one.
3. Create a new node for op, or find an existing one with descendants y, z (needs a hash scheme).
4. Add x to the list of labels for that node; remove label x from the node on which it previously appeared.
For x = y: add x to the list of labels of the node which currently holds y.
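The construction can be sketched as hash-based value numbering in Python (node ids and the tuple encoding are mine). Looking up an existing (op, left, right) key is exactly the "find an existing node with descendants y, z" step:

```python
def build_dag(block):
    nodes = {}     # (op, left_id, right_id) or ("leaf", name) -> node id
    labels = {}    # node id -> names currently labeling it
    current = {}   # name -> node id holding its current value

    def node_for(name):
        if name not in current:                  # create a leaf for b0, c0, ...
            key = ("leaf", name)
            nodes.setdefault(key, len(nodes))
            current[name] = nodes[key]
        return current[name]

    for dest, op, a, b in block:
        key = (op, node_for(a), node_for(b))
        if key not in nodes:                     # create node for op ...
            nodes[key] = len(nodes)
        n = nodes[key]                           # ... or reuse the existing one
        for ns in labels.values():               # move label dest onto n
            if dest in ns:
                ns.remove(dest)
        labels.setdefault(n, []).append(dest)
        current[dest] = n
    return labels

block = [("a", "+", "b", "c"),
         ("b", "-", "a", "d"),
         ("c", "+", "b", "c"),
         ("d", "-", "a", "d")]
```

On the example block (the one on the next slide), b and d end up labeling the same a - d0 node, exposing the redundant computation.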

DAG Example
Transform a basic block into a DAG:
    a = b + c
    b = a - d
    c = b + c
    d = a - d
Resulting DAG: a labels the node b0 + c0; b and d label the same node a - d0, since the same expression is computed twice; c labels the node b + c0.

Local Common Subexpr. (LCS)
Suppose b is not live on exit. Then
    a = b + c
    b = a - d
    c = b + c
    d = a - d
becomes
    a = b + c
    d = a - d
    c = d + c

LCS: another example
    a = b + c
    b = b - d
    c = c + d
    e = b + c
Here b + c appears twice, but b and c are both redefined in between, so the two occurrences denote different values: no common subexpression can be eliminated.

Common subexpressions
Programmers don’t produce common subexpressions; code generators do!

Dead Code Elimination
Delete any root that has no live variables attached. E.g. for
    a = b + c
    b = b - d
    c = c + d
    e = b + c
with a, b live and c, e not live on exit, the block becomes
    a = b + c
    b = b - d

Outline: Optimization Rules, Basic Blocks, Control Flow Graph (CFG), Loops, Local Optimizations, Peephole Optimization

Peephole Optimization (Dragon §8.7)
Introduction to peephole optimization; common techniques; algebraic identities; an example.

Peephole Optimization
Simple compilers do not perform machine-independent code improvement; they generate naive code. Peephole optimization works by sliding a window of several instructions (a peephole) over the target code: sub-optimal sequences of instructions that match an optimization pattern are transformed into optimal sequences.

Peephole Optimization
Goals: improve performance, reduce memory footprint, reduce code size.
Method:
1. Examine short sequences of target instructions.
2. Replace the sequence by a more efficient one.
Techniques: redundant-instruction elimination, algebraic simplifications, flow-of-control optimizations, use of machine idioms.

Peephole Optimization Common Techniques
Constant folding (also called constant merging or constant combining).


Algebraic identities
Worth recognizing single instructions with a constant operand.
Eliminate computations: A * 1 = A, A * 0 = 0, A / 1 = A.
Strength reduction: A * 2 = A + A, A / 2 = A * 0.5.
Constant folding: 2 * 3.14 = 6.28.
More delicate with floating point.
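These identities are easy to express as a rewriting function on (op, a, b) triples; a minimal sketch (the encoding is mine), with constant folding tried first:

```python
def simplify(op, a, b):
    num = (int, float)
    if isinstance(a, num) and isinstance(b, num):
        # constant folding; real compilers must match the target's
        # floating-point semantics here, as the slide warns
        return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]
    if op == "*" and b == 1: return a                 # A * 1 = A
    if op == "*" and b == 0: return 0                 # A * 0 = 0
    if op == "/" and b == 1: return a                 # A / 1 = A
    if op == "*" and b == 2: return ("+", a, a)       # A * 2 = A + A
    if op == "/" and b == 2: return ("*", a, 0.5)     # A / 2 = A * 0.5
    return (op, a, b)                                 # nothing applies
```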

Is this ever helpful?
Why would anyone write X * 1? Why bother to correct such obvious junk code? In fact one might write
    #define MAX_TASKS 1
    ...
    a = b * MAX_TASKS;
Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

Replace Multiply by Shift
A := A * 4 can be replaced by a 2-bit left shift (signed/unsigned), but we must worry about overflow if the language defines it.
A := A / 4: if unsigned, this can be replaced with a shift right. But arithmetic shift right is a well-known problem; the language may allow it anyway (traditional C).

The Right Shift problem
Arithmetic right shift: shift right and use the sign bit to fill the most significant bits.
    -5  = 111111...1111111011
    SAR = 111111...1111111101
which is -3, not -2; in most languages -5/2 = -2.
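The pitfall can be checked directly in Python, where >> on ints is an arithmetic (flooring) shift, and int(a / b) mimics the truncating integer division of most languages:

```python
# arithmetic right shift floors toward negative infinity
assert -5 >> 1 == -3
# but most languages truncate integer division toward zero
assert int(-5 / 2) == -2
# so rewriting A / 2 as A >> 1 is wrong for negative A under truncation
```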

Addition chains for multiplication
If multiply is very slow (or on a machine with no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:
    x * 125 = x * 128 - x * 4 + x
Two shifts, one subtract and one add, which may be faster than one multiply. Note the similarity with the efficient exponentiation method.
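The decomposition, checked in Python (shift amounts follow the powers of two above):

```python
def mul125(x):
    # x * 125 = x * 128 - x * 4 + x: two shifts, one subtract, one add
    return (x << 7) - (x << 2) + x

assert all(mul125(x) == 125 * x for x in range(-1000, 1001))
```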

Flow-of-control optimizations
A jump to an unconditional jump:
    goto L1                     goto L2
    ...                  =>     ...
    L1: goto L2                 L1: goto L2
A conditional jump to an unconditional jump:
    if a < b goto L1            if a < b goto L2
    ...                  =>     ...
    L1: goto L2                 L1: goto L2
A jump to a conditional jump:
    goto L1                     if a < b goto L2
    ...                  =>     goto L3
    L1: if a < b goto L2        ...
    L3:                         L3:
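The first rewrite can be sketched as a pass over a label/goto instruction list (the encoding is mine): a goto whose target label is immediately followed by an unconditional goto is redirected to the final destination. Chains like L1 -> L2 -> L3 would need the pass to be repeated:

```python
def thread_jumps(prog):
    # map each label that begins with "goto L" to that final destination
    trampoline = {}
    for i, ins in enumerate(prog[:-1]):
        if ins[0] == "label" and prog[i + 1][0] == "goto":
            trampoline[ins[1]] = prog[i + 1][1]
    # redirect every goto through the trampoline map
    return [("goto", trampoline[ins[1]])
            if ins[0] == "goto" and ins[1] in trampoline else ins
            for ins in prog]

prog = [("goto", "L1"),
        ("label", "L1"), ("goto", "L2"),
        ("label", "L2"), ("print", "here")]
```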

Peephole Opt: an Example
Source code:
    debug = 0
    ...
    if (debug) {
        print debugging information
    }
Intermediate code:
    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
    L1: print debugging information
    L2:

Eliminate Jump after Jump
Before:
    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
    L1: print debugging information
    L2:
After:
    debug = 0
    ...
    if debug ≠ 1 goto L2
    print debugging information
    L2:

Constant Propagation
Before:
    debug = 0
    ...
    if debug ≠ 1 goto L2
    print debugging information
    L2:
After:
    debug = 0
    ...
    if 0 ≠ 1 goto L2
    print debugging information
    L2:

Unreachable Code (dead code elimination)
Before:
    debug = 0
    ...
    if 0 ≠ 1 goto L2
    print debugging information
    L2:
After:
    debug = 0
    ...
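Constant propagation followed by folding the constant branch and deleting the unreachable code can be sketched as two small passes over an instruction list (the tuple encoding and the "!=" test are my modeling of the example above):

```python
def propagate_constants(prog):
    consts, out = {}, []
    for ins in prog:
        if ins[0] == "assign" and isinstance(ins[2], int):
            consts[ins[1]] = ins[2]              # remember debug = 0
        if ins[0] == "ifgoto" and ins[1] in consts:
            ins = ("ifgoto", consts[ins[1]], ins[2], ins[3], ins[4])
        out.append(ins)
    return out

def fold_branches(prog):
    out, skip_until = [], None
    for ins in prog:
        if skip_until is not None:               # inside unreachable code
            if ins[0] == "label" and ins[1] == skip_until:
                skip_until = None                # reachable again at the target
                out.append(ins)
            continue
        if ins[0] == "ifgoto" and isinstance(ins[1], int):
            taken = (ins[1] != ins[3]) if ins[2] == "!=" else (ins[1] == ins[3])
            if taken:                            # always jumps: code up to the
                skip_until = ins[4]              # target label is unreachable
            continue                             # either way, the test is gone
        out.append(ins)
    return out

prog = [("assign", "debug", 0),
        ("ifgoto", "debug", "!=", 1, "L2"),      # after jump-after-jump elim.
        ("print", "debugging information"),
        ("label", "L2")]
```

Running both passes on prog leaves only the assignment and the label, matching the final slide of the example.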

Peephole Optimization Summary
Peephole optimization is very fast: the overhead per instruction is small, since it uses a small, fixed-size window. It is often easier to generate naive code and run peephole optimization than to generate good code directly!

Summary
Introduction to optimization. Basic knowledge: basic blocks, control-flow graphs. Local optimizations. Peephole optimizations.

Next Time
Dataflow analysis (Dragon §9.2)