Lazy Code Motion Comp 512 Spring 2011


Lazy Code Motion
Comp 512, Spring 2011

“Lazy Code Motion,” J. Knoop, O. Rüthing, & B. Steffen, in Proceedings of the ACM SIGPLAN 92 Conference on Programming Language Design and Implementation, June 1992.
“A Variation of Knoop, Rüthing, and Steffen’s Lazy Code Motion,” K. Drechsler & M. Stadel, SIGPLAN Notices, 28(5), May 1993.
§ 10.3.1 of EaC2e

Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Drechsler & Stadel give a much more complete and satisfying example.

COMP 512, Rice University

Redundant Expression

An expression is redundant at point p if, on every path to p:
1. It is evaluated before reaching p, and
2. None of its constituent values is redefined before p.

Example: if a ← b + c appears on every path leading to a later a ← b + c, the later evaluation is redundant. An occurrence of b + c that follows b ← b + 1 is not redundant, because b has been redefined.

Partially Redundant Expression

An expression is partially redundant at p if it is redundant along some, but not all, paths reaching p.

Example: if b ← b + 1 occurs on one path into a join and a ← b + c is evaluated after the join, then b + c is redundant along the other path but not along the path through the redefinition of b. Inserting a copy of “a ← b + c” after the definition of b makes the evaluation after the join fully redundant.

Loop Invariant Expression

Another example: loop invariant expressions are partially redundant.
- An evaluation such as a ← b * c inside a loop, with b and c unchanged in the loop, is redundant along the back edge but not along the loop's entry edge, so it is partially redundant at the loop header.
- Partial redundancy elimination performs code motion.
- The major part of the work is figuring out where to insert the operations.

Lazy Code Motion

The concept
- Solve data-flow problems that show the opportunities & limits: availability & anticipability.
- Compute INSERT & DELETE sets from the solutions.
- Make a linear pass over the code to rewrite it (using INSERT & DELETE).

The history
- Partial redundancy elimination (Morel & Renvoise, CACM, 1979).
- Improvements by Drechsler & Stadel, Joshi & Dhamdhere, Chow, Knoop, Rüthing & Steffen, Dhamdhere, Sorkin, …
- All versions of PRE optimize placement and guarantee that no path is lengthened.
- LCM was invented by Knoop et al. (PLDI, 1992); Drechsler & Stadel simplified the equations.
- PRE and its descendants are conservative.

Citations for the critical papers are on the title page.

Lazy Code Motion

The intuitions
- Compute available expressions.
- Compute anticipable expressions.
- From AVAIL & ANT, we can compute an earliest placement for each expression.
- Push expressions down the CFG until pushing them further would change behavior.

Assumptions
- A lexical notion of identity (not value identity).
- ILOC-style code with an unlimited name space.
- Consistent, disciplined use of names:
  - Identical expressions define the same name, so the result serves as a proxy for the expression.
  - No other expression defines that name.

LCM operates on expressions: it moves expression evaluations, not assignments. This avoids inserting copies.

Lazy Code Motion

The Name Space (a digression in Chapter 5 of EaC: “The Impact of Naming”)
- ri + rj ⇒ rk, always, with both i < k and j < k (hash to find k).
- We can refer to ri + rj by rk (enables bit-vector sets).
- Variables must be set by copies:
  - There is no consistent definition for a variable.
  - Break the rule for this case, but require rsource < rdestination.
  - To achieve this, assign register names to variables first.

Without this name space:
- LCM must insert copies to preserve redundant values.
- LCM must compute its own map of expressions to unique ids.

LCM operates on expressions: it moves expression evaluations, not assignments.
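The naming rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's ILOC infrastructure; the class name, fields, and register numbering are hypothetical:

```python
# An illustrative sketch of the naming rule: each distinct textual
# expression (op, ri, rj) maps to a single result register rk with i < k
# and j < k. The class and its fields are hypothetical, not the lecture's.

class NameSpace:
    def __init__(self, first_free):
        self.table = {}                # (op, i, j) -> k
        self.next = first_free         # next unused register number

    def name_of(self, op, i, j):
        key = (op, i, j)
        if key not in self.table:
            self.table[key] = self.next   # fresh rk; k exceeds i and j
            self.next += 1                # because variables received the
        return self.table[key]            # low register names first

ns = NameSpace(first_free=10)          # r0..r9 already name the variables
k1 = ns.name_of("+", 1, 2)             # r1 + r2 gets a fresh name
k2 = ns.name_of("+", 1, 2)             # same expression -> same name
```

Because identical expressions always hash to the same register number, set membership of an expression reduces to one bit position, which is what makes the bit-vector formulation of the later data-flow problems possible.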

Lazy Code Motion

Local Predicates (we have seen all three of these previously)
- DEEXPR(b) contains expressions defined in b that survive to the end of b (downward exposed expressions).
  e ∈ DEEXPR(b) ⇒ evaluating e at the end of b produces the same value for e.
- UEEXPR(b) contains expressions defined in b that have upward exposed arguments, both of them (upward exposed expressions).
  e ∈ UEEXPR(b) ⇒ evaluating e at the start of b produces the same value for e.
- EXPRKILL(b) contains those expressions that have one or more arguments defined (killed) in b (killed expressions).
  e ∉ EXPRKILL(b) ⇒ evaluating e produces the same result at the start and end of b.
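The three local sets can be computed in one forward and one backward scan of a block. This is a simplified Python sketch, not the lecture's ILOC implementation; the quadruple representation and names are invented for illustration:

```python
# Computing DEEXPR, UEEXPR, and EXPRKILL for one block. A block is a list
# of (dest, op, arg1, arg2) quadruples; an expression is keyed (op, arg1, arg2).

def local_sets(block, all_exprs):
    de, ue, kill = set(), set(), set()
    defined = set()                       # names defined so far in the block
    for dest, op, a1, a2 in block:
        if a1 not in defined and a2 not in defined:
            ue.add((op, a1, a2))          # both args are upward exposed
        defined.add(dest)
    for op, a1, a2 in all_exprs:
        if a1 in defined or a2 in defined:
            kill.add((op, a1, a2))        # an argument is (re)defined in b
    defined_after = set()                 # names defined after this point
    for dest, op, a1, a2 in reversed(block):
        if a1 not in defined_after and a2 not in defined_after:
            de.add((op, a1, a2))          # value survives to the end of b
        defined_after.add(dest)
    return de, ue, kill
```

For the block a ← b + c; b ← b + 1, both expressions are upward exposed and both are killed (b is redefined), but only b + 1 is downward exposed, since b + c's argument changes before the end of the block.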

Lazy Code Motion

Availability

AVAILIN(n) = ∩ m ∈ preds(n) AVAILOUT(m), n ≠ n0
AVAILOUT(m) = DEEXPR(m) ∪ (AVAILIN(m) ∩ ¬EXPRKILL(m))

Initialize AVAILIN(n) to the set of all names, except at n0; set AVAILIN(n0) to Ø.

Interpreting AVAIL
- e ∈ AVAILOUT(b) ⇒ evaluating e at the end of b produces the same value for e.
- AVAILOUT tells the compiler how far forward e can move.
- This differs from the way we talk about AVAIL in global redundancy elimination; the equations, however, are unchanged.
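The AVAIL equations can be solved with a straightforward round-robin fixed point. The sketch below uses Python sets where the lecture assumes bit vectors; the CFG encoding (dicts keyed by block name) is a hypothetical stand-in:

```python
# A toy solver for the AVAIL equations. avail_in meets (intersects) over
# predecessors; avail_out applies the local transfer function.

def avail(blocks, preds, entry, de, kill, universe):
    avail_in = {b: (set() if b == entry else set(universe)) for b in blocks}
    avail_out = {b: set() for b in blocks}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for b in blocks:
            if b != entry and preds[b]:
                new_in = set(universe)    # meet over all predecessors
                for p in preds[b]:
                    new_in &= avail_out[p]
                avail_in[b] = new_in
            new_out = de[b] | (avail_in[b] - kill[b])
            if new_out != avail_out[b]:
                avail_out[b] = new_out
                changed = True
    return avail_in, avail_out
```

On a diamond CFG where the entry block evaluates an expression and one arm of the diamond kills it, the expression reaches the arm's entry but not the join, which is exactly the behavior the meet over predecessors encodes.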

Lazy Code Motion

Anticipability (identical to VeryBusy expressions)

ANTOUT(n) = ∩ m ∈ succs(n) ANTIN(m), n not an exit block
ANTIN(m) = UEEXPR(m) ∪ (ANTOUT(m) ∩ ¬EXPRKILL(m))

Initialize ANTOUT(n) to the set of all names, except at exit blocks; set ANTOUT(n) to Ø for each exit block n.

Interpreting ANTOUT
- e ∈ ANTIN(b) ⇒ evaluating e at the start of b produces the same value for e.
- ANTIN tells the compiler how far backward e can move.
- This view shows that anticipability is, in some sense, the inverse of availability (& explains the new interpretation of AVAIL).
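Anticipability is the backward mirror of the AVAIL solver: the meet runs over successors and the transfer function uses UEEXPR instead of DEEXPR. A toy Python sketch, with the same hypothetical set-based CFG encoding as before:

```python
# A toy backward solver for the ANT equations. ant_out meets (intersects)
# over successors; ant_in applies the local transfer function.

def anticipable(blocks, succs, exits, ue, kill, universe):
    ant_out = {b: (set() if b in exits else set(universe)) for b in blocks}
    ant_in = {b: set() for b in blocks}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for b in blocks:
            if b not in exits:
                new_out = set(universe)   # meet over all successors
                for s in succs[b]:
                    new_out &= ant_in[s]
                ant_out[b] = new_out
            new_in = ue[b] | (ant_out[b] - kill[b])
            if new_in != ant_in[b]:
                ant_in[b] = new_in
                changed = True
    return ant_in, ant_out
```

If both arms of a diamond evaluate the expression with upward exposed arguments, the meet at the branch block puts the expression in ANTOUT there: evaluating it at the branch's exit would be used on every path.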

Lazy Code Motion

The intuitions

Available expressions
- e ∈ AVAILOUT(b) ⇒ evaluating e at the exit of b gives the same result.
- e ∈ AVAILIN(b) ⇒ e is available from every predecessor of b, so an evaluation at the entry of b is redundant.

Anticipable expressions
- e ∈ ANTIN(b) ⇒ evaluating e at the entry of b gives the same result.
- e ∈ ANTOUT(b) ⇒ e is anticipable from every successor of b, so an evaluation at the exit of b would make a later evaluation redundant on every path; thus the exit of b is a profitable place to insert e.

Lazy Code Motion

Earliest placement on an edge

EARLIEST(i,j) = ANTIN(j) ∩ ¬AVAILOUT(i) ∩ (EXPRKILL(i) ∪ ¬ANTOUT(i))
EARLIEST(n0,j) = ANTIN(j) ∩ ¬AVAILOUT(n0)

EARLIEST is a predicate, computed for edges rather than nodes (placement). e ∈ EARLIEST(i,j) if:
- e can move to the head of j (ANTIN(j)),
- e is not available at the end of i (¬AVAILOUT(i)), and
- either e is killed in i (EXPRKILL(i)), or e cannot move to the head of i because another edge leaving i prevents that placement (¬ANTOUT(i)).

Together these mean: e can move to the head of j and is not redundant coming from i, so (i,j) is a place where inserting e might create the “full” redundancy. The ¬ANTOUT(i) term says e cannot move back through i, presumably because e ∉ ANTIN(x) for some other edge (i,x).
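The EARLIEST equations transcribe directly into set operations. A small Python sketch, taking the per-block AVAIL and ANT solutions as inputs; the parameter names are invented, not the lecture's:

```python
# EARLIEST(i,j) = ANTIN(j) - AVAILOUT(i), intersected (on non-entry edges)
# with EXPRKILL(i) | (universe - ANTOUT(i)).

def earliest(i, j, ant_in, avail_out, expr_kill, ant_out, universe, n0):
    if i == n0:                           # entry edge: no third term
        return ant_in[j] - avail_out[i]
    return (ant_in[j] - avail_out[i]) & \
           (expr_kill[i] | (universe - ant_out[i]))
```

For example, if e is anticipable at the head of j and at the exit of i, not available from i, and not killed in i, the third term is empty and e is not earliest on (i,j): it can still move back through i.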

Lazy Code Motion

Later (than earliest) placement

LATERIN(j) = ∩ i ∈ preds(j) LATER(i,j), j ≠ n0
LATER(i,j) = EARLIEST(i,j) ∪ (LATERIN(i) ∩ ¬UEEXPR(i))

Initialize LATERIN(n0) to Ø.

- x ∈ LATERIN(k) ⇒ every path that reaches k has x ∈ EARLIEST(i,j) for some edge (i,j) leading to k, and the path from the entry of j to k is x-clear & does not evaluate x. Thus the compiler can move x through k without losing any benefit.
- x ∈ LATER(i,j) ⇒ (i,j) is x's earliest placement, or x can be moved forward from i (LATERIN(i)) and placement at the entry of i does not anticipate a use in i (moving x across the edge would expose that use).

Placement propagates forward until a block uses the expression (UEEXPR) and stops the motion.

Lazy Code Motion

Rewriting the code

INSERT(i,j) = LATER(i,j) ∩ ¬LATERIN(j)
DELETE(k) = UEEXPR(k) ∩ ¬LATERIN(k), k ≠ n0

INSERT & DELETE are predicates that the compiler uses to guide the rewrite step.
- x ∈ INSERT(i,j) ⇒ insert x at the start of j, the end of i, or in a new block between them (x can go on the edge but has no later placement in j).
- x ∈ DELETE(k) ⇒ delete the first evaluation of x in k (x is upward exposed, so an inserted copy will cover it, and it is not an evaluation whose value might be needed later).

If local redundancy elimination has already been performed, only one copy of x exists. Otherwise, remove all upward exposed copies of x.
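The final two predicates are simple differences over the LATER solution. Continuing the same hypothetical set-based sketch:

```python
# INSERT picks the edges where delay stops; DELETE removes the now-covered
# upward exposed evaluations. Parameter names are mine, not the lecture's.

def insert_set(later, later_in, i, j):
    # INSERT(i,j) = LATER(i,j) minus LATERIN(j)
    return later[(i, j)] - later_in[j]

def delete_set(ue, later_in, k, n0):
    # DELETE(k) = UEEXPR(k) minus LATERIN(k); DELETE(n0) is empty
    return set() if k == n0 else ue[k] - later_in[k]
```

With the diamond values from the LATER discussion (e delayed onto edge (C,D), used in D), the rewrite inserts e on (C,D) and deletes the upward exposed evaluation in D.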

Lazy Code Motion

Edge placement: x ∈ INSERT(i,j) has three cases
- |succs(i)| = 1 ⇒ insert x at the end of i.
- |succs(i)| > 1, but |preds(j)| = 1 ⇒ insert x at the start of j.
- |succs(i)| > 1 & |preds(j)| > 1 ⇒ (i,j) is a “critical” edge; create a new block Bk on (i,j) to hold x.
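The case analysis is mechanical, so it can be written as a small decision function. A minimal sketch; the return-value tags and map shapes are invented for illustration:

```python
# Deciding where an INSERT(i,j) actually lands. succs and preds map each
# block name to its successor / predecessor lists.

def placement(i, j, succs, preds):
    if len(succs[i]) == 1:
        return ("end_of_block", i)        # i has a unique successor
    if len(preds[j]) == 1:
        return ("start_of_block", j)      # j has a unique predecessor
    return ("split_critical_edge", i, j)  # both ends are shared: new block
```

Splitting the critical edge is what keeps the insertion off the other paths through i and j, so no path that avoids the redundancy pays for the new evaluation.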

Lazy Code Motion

Example

B1: r1 ← 1
    r2 ← r0 + @m
    if r1 < r2 → B2, B3
B2: …
    r20 ← r17 * r18
    …
    r4 ← r1 + 1
    r1 ← r4
    if r1 < r2 → B2, B3
B3: …

Insert(1,2) = { r20 }
Delete(2) = { r20 }

The critical edge rule will create a landing pad when needed, as on edge (B1,B2). The example is too small to show off LATER.