CMPUT 680 - Compiler Design and Optimization, Winter 2006. Topic G: Static Single-Assignment Form. José Nelson Amaral

Presentation transcript:


Reading Material
Chapter 19 of the “Tiger book” (with a grain of salt!).
Bilardi, G., Pingali, K., “The Static Single Assignment Form and its Computation,” unpublished? (CiteSeer).
Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., Zadeck, F. K., “An Efficient Method of Computing Static Single Assignment Form,” ACM Symposium on Principles of Programming Languages (PoPL), Austin, TX, Jan.
Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., Zadeck, F. K., “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph,” ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 13, No. 4, October 1991.
Sreedhar, V. C., Gao, G. R., “A Linear Time Algorithm for Placing φ-Nodes,” ACM Symposium on Principles of Programming Languages (PoPL), 1995.

Static Single-Assignment Form
Each variable has only one definition in the program text.
This single static definition can be in a loop and may be executed many times. Thus, even in a program expressed in SSA form, a variable can be dynamically defined many times.

Advantages of SSA
- Simpler dataflow analysis.
- No need for use-def/def-use chains, which require N × M space for N uses and M definitions.
- SSA form relates in a useful way to dominance structures.
- SSA simplifies algorithms that construct interference graphs.

SSA Form in Control-Flow Path Merges
  B1: b ← M[x]
      a ← 0
  B2: if b < 4
  B3: a ← b
  B4: c ← a + b
Is this code in SSA form? No: two definitions of a appear in the code (in B1 and B3).
How can we transform this code into SSA form? We can create two versions of a, one for B1 and another for B3:
  B1: b ← M[x]
      a1 ← 0
  B2: if b < 4
  B3: a2 ← b
  B4: c ← a? + b
But which version should we use in B4 now? We define a fictional φ function that “knows” which control path was taken to reach the basic block B4:
  B1: b ← M[x]
      a1 ← 0
  B2: if b < 4
  B3: a2 ← b
  B4: a3 ← φ(a2, a1)
      c ← a3 + b

A Loop Example
Original code (the middle four statements form the loop body):
  a ← 0
  b ← a + 1
  c ← c + b
  a ← b * 2
  if a < N
  return
SSA form (the φ functions sit at the top of the loop body):
  a1 ← 0
  b0 ← undef
  c0 ← undef
  a3 ← φ(a1, a2)
  b1 ← φ(b0, b2)
  c2 ← φ(c0, c1)
  b2 ← a3 + 1
  c1 ← c2 + b2
  a2 ← b2 * 2
  if a2 < N
  return
φ(b0, b2) is not necessary because b1 is never used. But the phase that generates φ functions does not know that. Unnecessary φ functions are later eliminated by dead-code elimination.

The φ Function
How can we implement a φ function that “knows” which control path was taken?
Answer 1: We don't! The φ function is used only to connect uses to definitions during optimization, and is never implemented.
Answer 2: If we must execute the φ function, we can implement it by inserting MOVE instructions in all control paths.
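Answer 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple-based instruction encoding, the function name `lower_phis`, and the extra block `B2_B4` (an empty block placed on the edge from B2 to B4 so that the move has somewhere to live) are all hypothetical and do not come from the slides. The sketch also ignores the ordering problems that arise when several φ functions in one block depend on each other.

```python
def lower_phis(blocks, preds):
    """Replace every phi by one MOVE at the end of each predecessor block.

    blocks: dict block name -> list of instruction tuples
    preds:  dict block name -> ordered predecessor names; the i-th phi
            argument is the value arriving from the i-th predecessor.
    """
    lowered = {name: [] for name in blocks}
    pending = {name: [] for name in blocks}  # moves to add, per predecessor
    for name, instrs in blocks.items():
        for ins in instrs:
            if ins[0] == "phi":
                _, dest, args = ins
                for pred, arg in zip(preds[name], args):
                    pending[pred].append(("move", dest, arg))
            else:
                lowered[name].append(ins)
    for name, moves in pending.items():
        for move in moves:
            body = lowered[name]
            # Keep a trailing branch as the last instruction of its block.
            if body and body[-1][0] == "branch":
                body.insert(len(body) - 1, move)
            else:
                body.append(move)
    return lowered


# The merge example from the earlier slides, in a hypothetical encoding.
blocks = {
    "B1": [("load", "b", "x"), ("const", "a1", 0)],
    "B2": [("branch", "b<4")],
    "B3": [("move", "a2", "b")],
    "B2_B4": [],  # hypothetical empty block on the edge B2 -> B4
    "B4": [("phi", "a3", ["a2", "a1"]), ("add", "c", "a3", "b")],
}
preds = {"B4": ["B3", "B2_B4"]}
lowered = lower_phis(blocks, preds)
```

After lowering, B3 ends with a move of a2 into a3, the edge block carries a move of a1 into a3, and B4 no longer contains a φ function.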

Criteria for Inserting φ Functions
We could insert one φ function for each variable at every join point (a point in the CFG with more than one predecessor). But that would be wasteful.
What criteria should we use to insert a φ function for a variable a at node z of the CFG?
Intuitively, we should add a φ function if there are two definitions of a that can reach the point z through distinct paths.

Path Convergence Criterion (Cytron-Ferrante/89)
Insert a φ function for a variable a at a node z if all the following conditions are true:
1. There is a block x that defines a.
2. There is a block y ≠ x that defines a.
3. There are non-empty paths x → z and y → z.
4. Paths x → z and y → z don't have any nodes in common other than z.
5. The node z does not appear within both x → z and y → z prior to the end, but it may appear in one or the other.
Note: The start node contains an implicit definition of every variable.

φ-Candidates are Join Nodes
Notice that, according to the path convergence criterion, the node z that will receive the φ function must be a join node: z is the first node that joins the paths P_xz and P_yz.

Iterated Path-Convergence Criterion
The φ function itself is a definition of a. Therefore the path-convergence criterion is a set of equations that must be satisfied.
  while there are nodes x, y, z satisfying conditions 1-5
        and z does not contain a φ function for a
  do insert a ← φ(a0, a1, …, an) at node z
This algorithm is extremely costly, because it requires the examination of every triple of nodes x, y, z and every path from x to z and from y to z. Can we do better?

The SSA Conversion Problem
For each variable x defined in a CFG G = (V, E), given the set of nodes S ⊆ V that contain a definition of x, find the minimal set J(S) of nodes that require a φ(xi, xj) function.
By definition, the START node defines all the variables; therefore, ∀ S ⊆ V, START ∈ S.
If we need to compute φ-nodes for several variables, it may be efficient to precompute data structures based on the CFG.

Processing Time for SSA Conversion
The performance of an SSA conversion algorithm should be measured by the preprocessing time Tp, the preprocessing space Sp, and the query time Tq.
(Shapiro and Saint, 1970): outline an algorithm.
(Reif and Tarjan, 1981): extend the Lengauer-Tarjan dominator algorithm to compute φ-nodes.
(Cytron et al., 1991): show that SSA conversion can use the idea of dominance frontiers, resulting in an O(|V|^2) algorithm.
(Sreedhar and Gao, 1995): an O(|E|) algorithm, but in a private communication with Pingali in 1996 they admit that in practice it is 5 times slower than Cytron et al.

Processing Time for SSA Conversion
(Bilardi and Pingali, 1999): present a generalized framework and a parameterized Augmented Dominator Tree (ADT) algorithm that allows for a space-time tradeoff. They show that the Cytron et al. and Sreedhar-Gao algorithms are special cases of the ADT algorithm.
Bilardi and Pingali describe three strategies to compute φ-placement:
- Two-Phase Algorithms
- Lock-Step Algorithms
- Lazy Algorithms

Two-Phase Algorithms
First build the entire Dominance Frontier (DF) Graph, then find the nodes reachable from S.
- Simple.
- The DF Graph may be quite large.
[Diagram: CFG → DF computation → DF Graph; DF Graph and S → reachability → J(S)]

Lock-Step Algorithms
Perform the reachability computation incrementally while the DF relation is computed.
- Avoid storing the DF Graph.
- Perform computations at all nodes of the graph, even though most are irrelevant.
- Inefficient when computing the φ-nodes for many variables.
[Diagram: CFG and S → combined DF computation and reachability → J(S)]

Lazy Algorithms
Lazily compute only the portion of the DF Graph that is needed.
Carefully select a portion of the DF Graph to compute eagerly (before it is needed).
A Two-Phase Algorithm is an extreme case of a lazy algorithm.
[Diagram: CFG → DF computation → DF Graph subgraph; subgraph and S → reachability → J(S)]

Computing a Dominator Tree
(Lowry and Medlock, 1969): introduce the problem and give an O(n^4) algorithm.
(Lengauer and Tarjan, 1979): give a complicated O(m α(m,n)) algorithm [α(m,n) is the inverse Ackermann function].
(Harel, 1985): gives a linear-time algorithm.
(Alstrup, Harel and Thorup, 1997): give a simpler version of Harel's algorithm.
(n: number of nodes; m: number of edges)

Dominance Property of the SSA Form
In SSA form, definitions dominate uses, i.e.:
1. If x is the i-th argument of a φ function in block n, then the definition of x dominates the i-th predecessor of n.
2. If x is used in a non-φ statement in block n, then the definition of x dominates n.

The Dominance Frontier
A node x dominates a node w if every path from the start node to w must go through x.
A node x strictly dominates a node w if x dominates w and x ≠ w.
The dominance frontier of a node x is the set of all nodes w such that x dominates a predecessor of w, but x does not strictly dominate w.
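The three definitions above translate almost word for word into code. The following Python sketch computes dominator sets by simple iteration and then applies the dominance-frontier definition directly; it is meant for checking small examples, not as an efficient algorithm. The 13-node graph `CFG` is an assumption: the figure used in the lecture is not available here, so the edges were chosen to agree with the frontier sets quoted in the examples. The sketch also assumes every node is reachable from the start node.

```python
# Assumed CFG as successor lists (a reconstruction, not the original figure).
CFG = {
    1: [2, 5, 9], 2: [3], 3: [3, 4], 4: [13], 5: [6, 7], 6: [4, 8],
    7: [8, 12], 8: [5, 13], 9: [10, 11], 10: [12], 11: [12], 12: [13], 13: [],
}


def dominators(succ, start):
    """Return (dom, pred): dom[n] is the set of nodes that dominate n."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, pred


def dominance_frontier(succ, start):
    """DF[x] = all w where x dominates a predecessor of w
    but x does not strictly dominate w."""
    dom, pred = dominators(succ, start)
    df = {x: set() for x in succ}
    for w in succ:
        for x in succ:
            dominates_a_pred = any(x in dom[p] for p in pred[w])
            strictly_dominates_w = x != w and x in dom[w]
            if dominates_a_pred and not strictly_dominates_w:
                df[x].add(w)
    return df


DF = dominance_frontier(CFG, 1)
```

On the assumed graph, `DF[5]` comes out as the set containing 4, 5, 12, and 13, which is the answer developed in the example that follows.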

Example
[Figure: control-flow graph used in the example, not reproduced]
What is the dominance frontier of node 5?
First we must find all nodes that node 5 dominates.
A node w is in the dominance frontier of node 5 if 5 dominates a predecessor of w, but 5 does not strictly dominate w itself.
DF(5) = {4, 5, 12, 13}

Dominance Frontier Criterion
If a node x contains a definition of variable a, then any node z in the dominance frontier of x needs a φ function for a.
Can you think of an intuitive explanation for why a node in the dominance frontier of another node must be a join node?

Example
If a node (12) is in the dominance frontier of another node (5), then there must be at least two paths converging on (12).
These paths must be non-intersecting, and one of them (5, 7, 12) must contain a node strictly dominated by (5).

Dominator Tree
To compute the dominance frontiers, we first compute the dominator tree of the CFG.
There is an edge from node x to node y in the dominator tree if node x immediately dominates node y, i.e., x dominates y, x ≠ y, and x does not dominate any other dominator of y.
Dominator trees can be computed using the Lengauer-Tarjan algorithm (1979). See Appel.

Example: Dominator Tree
[Figure: control-flow graph and its dominator tree, not reproduced]

Local Dominance Frontier
Cytron-Ferrante define the local dominance frontier of a node n as:
  DF_local[n] = successors of n in the CFG that are not strictly dominated by n

Example: Local Dominance Frontier
In the example, what are the local dominance frontiers of nodes 5, 6 and 7?
  DF_local[5] = ∅
  DF_local[6] = {4, 8}
  DF_local[7] = {8, 12}

Dominance Frontier Inherited From Its Children
The dominance frontier of a node n is formed by its local dominance frontier plus the nodes that are passed up by the children of n in the dominator tree.
The contribution of a node c to its parent's dominance frontier is defined as [Cytron-Ferrante, 1991]:
  DF_up[c] = nodes in the dominance frontier of c that are not strictly dominated by the immediate dominator of c

Example: Local Dominance Frontier
In the example, what are the contributions of nodes 6, 7, and 8 to their parent's dominance frontier?
First we compute the DF and the immediate dominator of each node:
  DF[6] = {4, 8},  idom(6) = 5
  DF[7] = {8, 12}, idom(7) = 5
  DF[8] = {5, 13}, idom(8) = 5
Now we check the DF_up condition:
  DF_up[6] = {4}
  DF_up[7] = {12}
  DF_up[8] = {5, 13}

A note on implementation
We want to represent these sets efficiently:
  DF[6] = {4, 8}   DF[7] = {8, 12}   DF[8] = {5, 13}
We can use bitvectors, with one bit per CFG node, to represent these sets.
[Bit-vector values for DF[6], DF[7], and DF[8] not reproduced]

Strictly Dominated Sets
We can also represent the strictly dominated sets as bitvectors.
[Figure: dominator tree, with bit-vector values for SD[1], SD[2], SD[5], and SD[9], not reproduced]

A note on implementation
DF_up[c] = nodes in the dominance frontier of c that are not strictly dominated by the immediate dominator of c.
With bitvectors this is a bitwise AND with a complement. For example, for c = 6, whose immediate dominator is 5:
  DF_up[6] = DF[6] ∧ ¬SD[5]
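As a rough illustration of the bitvector idea, Python integers can stand in for bitvectors, with bit n representing node n. The helper names `to_bits` and `from_bits` are made up for this sketch. The strictly dominated set of node 5 is taken to be {6, 7, 8}, which follows from SubTree(5) = {5, 6, 7, 8} in the example.

```python
def to_bits(nodes):
    """Pack a set of node numbers into an int; bit n stands for node n."""
    bits = 0
    for n in nodes:
        bits |= 1 << n
    return bits


def from_bits(bits):
    """Unpack an int bitvector back into a set of node numbers."""
    return {n for n in range(bits.bit_length()) if (bits >> n) & 1}


DF_BITS = {6: to_bits({4, 8}), 7: to_bits({8, 12}), 8: to_bits({5, 13})}
SD_5 = to_bits({6, 7, 8})  # nodes strictly dominated by node 5

# DF_up[c] = DF[c] AND NOT SD[idom(c)]; here idom(c) = 5 for c in {6, 7, 8}.
DF_UP = {c: from_bits(bits & ~SD_5) for c, bits in DF_BITS.items()}
```

Note that the operation is a bitwise AND (`&` in Python and C), not an exclusive OR.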

Dominance Frontier Inherited From Its Children
The dominance frontier of a node n is formed by its local dominance frontier plus the nodes that are passed up by the children of n in the dominator tree. Thus the dominance frontier of a node n is defined as [Cytron-Ferrante, 1991]:
$$DF[n] = DF_{local}[n] \cup \bigcup_{c \in DTchildren[n]} DF_{up}[c]$$

Example: Local Dominance Frontier
What is DF[5]? Remember that:
  DF_local[5] = ∅
  DF_up[6] = {4}
  DF_up[7] = {12}
  DF_up[8] = {5, 13}
  DTchildren[5] = {6, 7, 8}
Thus, DF[5] = {4, 5, 12, 13}
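The bottom-up computation used in this example (local frontier plus the contributions passed up by the dominator-tree children) can be sketched as a recursive walk over the dominator tree. Both tables below are assumptions: `CFG` is a 13-node graph reconstructed to match the sets quoted in the example, and `IDOM` is its immediate-dominator table, hard-coded rather than computed. The test "immediate dominator is not n" is used in place of "not strictly dominated by n", in the style of Cytron et al.; the two agree for the nodes tested here.

```python
# Assumed CFG (successor lists) and immediate dominators; not from the slides.
CFG = {
    1: [2, 5, 9], 2: [3], 3: [3, 4], 4: [13], 5: [6, 7], 6: [4, 8],
    7: [8, 12], 8: [5, 13], 9: [10, 11], 10: [12], 11: [12], 12: [13], 13: [],
}
IDOM = {1: None, 2: 1, 3: 2, 4: 1, 5: 1, 6: 5, 7: 5, 8: 5,
        9: 1, 10: 9, 11: 9, 12: 1, 13: 1}


def dominance_frontiers(succ, idom):
    """Compute DF[n] = DF_local[n] plus DF_up[c] for each child c of n."""
    children = {n: [] for n in succ}
    for n, parent in idom.items():
        if parent is not None:
            children[parent].append(n)
    df = {}

    def visit(n):
        # DF_local[n]: CFG successors of n that n does not immediately dominate.
        frontier = {s for s in succ[n] if idom[s] != n}
        for c in children[n]:
            visit(c)
            # DF_up[c]: members of DF[c] that n does not immediately dominate.
            frontier |= {w for w in df[c] if idom[w] != n}
        df[n] = frontier

    root = next(n for n, parent in idom.items() if parent is None)
    visit(root)
    return df


DF = dominance_frontiers(CFG, IDOM)
```

Children are visited before their parent, so every `df[c]` is complete by the time its parent filters it.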

Join Sets
In order to insert φ-nodes for a variable x that is defined in a set of nodes S = {n1, n2, …, nk}, we need to compute the iterated set of join nodes of S.
Given a set of nodes S of a control flow graph G, the set of join nodes of S, J(S), is defined as follows:
  J(S) = {z ∈ G | ∃ two paths P_xz and P_yz in G that have z as their first common node, x ∈ S and y ∈ S}

Iterated Join Sets
Because a φ-node is itself a definition of a variable, once we insert φ-nodes in the join set of S, we need to find the join set of S ∪ J(S).
Thus, Cytron-Ferrante define the iterated join set of a set of nodes S, J+(S), as the limit of the sequence:
$$J_1 = J(S), \qquad J_{i+1} = J(S \cup J_i)$$

Iterated Dominance Frontier
We can extend the concept of dominance frontier to define the dominance frontier of a set of nodes as:
$$DF(S) = \bigcup_{x \in S} DF(x)$$
Now we can define the iterated dominance frontier, DF+(S), of a set of nodes S as the limit of the sequence:
$$DF_1 = DF(S), \qquad DF_{i+1} = DF(S \cup DF_i)$$
Exercise: Find an example in which the IDF of a set S is different from the DF of the set!

Location of φ-Nodes
Given a variable x that is defined in a set of nodes S = {n1, n2, …, nk}, the set of nodes that must receive φ-nodes for x is J+(S).
An important result proved by Cytron-Ferrante is that:
$$J^+(S) = DF^+(S)$$
Thus we are mostly interested in computing the iterated dominance frontier of a set of nodes.
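Once the dominance frontier of every node is known, the iterated dominance frontier of a set S can be computed with a small work list: every node added to the result is treated as a new definition site. The sketch below assumes the frontiers are already available; the table `DF` belongs to an assumed 13-node graph chosen to match the sets quoted in the examples (DF[5], DF[6], DF[7], DF[8]), and the remaining entries are part of that assumption.

```python
# Dominance frontiers of an assumed 13-node CFG (not taken from the slides).
DF = {
    1: set(), 2: {4}, 3: {3, 4}, 4: {13}, 5: {4, 5, 12, 13}, 6: {4, 8},
    7: {8, 12}, 8: {5, 13}, 9: {12}, 10: {12}, 11: {12}, 12: {13}, 13: set(),
}


def iterated_dominance_frontier(df, defs):
    """Return DF+(defs): the nodes that need a phi for a variable
    defined in the nodes `defs`."""
    result = set()
    worklist = list(defs)
    while worklist:
        n = worklist.pop()
        for w in df[n]:
            if w not in result:
                result.add(w)
                # A phi at w is a new definition, so w is processed too.
                worklist.append(w)
    return result


phi_sites = iterated_dominance_frontier(DF, {6})
```

For a variable defined only in node 6 of the assumed graph, the plain frontier is {4, 8}, while the iterated frontier also picks up 5, 12, and 13 through the φ-nodes placed at 8 and 5.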

Algorithms to Compute Dominance Frontier
The algorithm to insert φ-nodes, due to Cytron and Ferrante (1991), computes the dominance frontier of each node in the set S before computing the iterated dominance frontier of the set.
In the worst case, the combined size of the dominance frontiers can be quadratic in the number of nodes in the CFG. Thus, Cytron-Ferrante's algorithm has complexity O(N^2).
In 1994, Sreedhar and Gao proposed a simple, linear-time algorithm for the insertion of φ-nodes.

Sreedhar and Gao's DJ Graph
[Figure: control-flow graph, its dominator tree, and the resulting DJ graph, not reproduced]
The DJ graph contains the dominator-tree edges (D-edges) together with the join edges (J-edges), which are the CFG edges y → z in which y does not strictly dominate z.

Sreedhar-Gao's Dominance Frontier Algorithm
DominanceFrontier(x)
0: DF[x] = ∅
1: foreach y ∈ SubTree(x) do
2:   if ((y → z == J-edge) and
3:       (z.level ≤ x.level))
4:   then DF[x] = DF[x] ∪ {z}
What is DF[5]?
SubTree(5) = {5, 6, 7, 8}
Initialization: DF[5] = ∅
Visiting 5: there are three edges originating in 5, {5 → 6, 5 → 7, 5 → 8}, but they are all D-edges.
Visiting 6: there are two edges originating in 6, {6 → 4, 6 → 8}, but 8.level > 5.level. After visiting 6: DF = {4}
Visiting 7: there are two edges originating in 7, {7 → 8, 7 → 12}; again 8.level > 5.level. After visiting 7: DF = {4, 12}
Visiting 8: there are two edges originating in 8, {8 → 5, 8 → 13}; both satisfy the condition in steps 2-3. After visiting 8: DF = {4, 12, 5, 13}
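The walk-through above can be reproduced with a short Python sketch. It does not build an explicit DJ graph: a CFG edge y → z is treated as a J-edge when y is not the immediate dominator of z, and a node's level is its depth in the dominator tree. The graph `CFG` and the table `IDOM` are assumptions reconstructed to match the edges and levels mentioned in the walk-through; the function names are invented for this sketch.

```python
# Assumed CFG (successor lists) and immediate dominators; not from the slides.
CFG = {
    1: [2, 5, 9], 2: [3], 3: [3, 4], 4: [13], 5: [6, 7], 6: [4, 8],
    7: [8, 12], 8: [5, 13], 9: [10, 11], 10: [12], 11: [12], 12: [13], 13: [],
}
IDOM = {1: None, 2: 1, 3: 2, 4: 1, 5: 1, 6: 5, 7: 5, 8: 5,
        9: 1, 10: 9, 11: 9, 12: 1, 13: 1}


def level(n, idom):
    """Depth of n in the dominator tree (the root has level 0)."""
    depth = 0
    while idom[n] is not None:
        n = idom[n]
        depth += 1
    return depth


def dj_dominance_frontier(x, succ, idom):
    """DF[x]: targets z of J-edges y -> z, for y in the dominator subtree
    of x, whose level is not deeper than the level of x."""
    children = {n: [] for n in succ}
    for n, parent in idom.items():
        if parent is not None:
            children[parent].append(n)
    frontier = set()
    stack = [x]
    while stack:  # visit every y in SubTree(x)
        y = stack.pop()
        stack.extend(children[y])
        for z in succ[y]:
            is_j_edge = idom[z] != y
            if is_j_edge and level(z, idom) <= level(x, idom):
                frontier.add(z)
    return frontier


df5 = dj_dominance_frontier(5, CFG, IDOM)
```

Recomputing levels on every call keeps the sketch short; a real implementation would store the level of each node once.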

Sreedhar-Gao's φ-Node Insertion Algorithm
Using the DJ graph, Sreedhar and Gao propose a linear-time algorithm to compute the iterated dominance frontier of a set of nodes.
An important intuition in Sreedhar-Gao's algorithm is: if two nodes x and y are in S, and y is an ancestor of x in the dominator tree, then if we compute DF[x] first, we do not need to recompute DF[x] when computing DF[y].

Sreedhar-Gao's φ-Node Insertion Algorithm
Sreedhar-Gao's algorithm also uses a work list of nodes hashed by their level in the dominator tree and a visited flag to avoid visiting the same node more than once.
The basic operation of the algorithm is similar to their dominance-frontier algorithm, but it requires a careful implementation to deliver the linear-time complexity.

Dead-Code Elimination in SSA Form
Because there is only one definition for each variable, if the list of uses of the variable is empty, the definition is dead.
When a statement v ← x ⊕ y is eliminated because v is dead, this statement must be removed from the lists of uses of x and y, which might cause those definitions to become dead as well.
Thus we need to iterate the dead-code elimination algorithm.
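The iteration described above is usually driven by a work list and use counts. The following Python sketch shows one way to do it under simplifying assumptions that are not from the slides: every definition is side-effect free, a program is a dictionary from each SSA variable to the variables its defining statement reads (constants are left out), and `roots` lists the variables read by statements that must stay, such as branches and returns.

```python
def eliminate_dead_code(defs, roots):
    """Return the definitions that survive dead-code elimination.

    defs:  dict var -> list of variables read by the statement defining var
    roots: variables read by statements that are always kept
    """
    defs = dict(defs)
    uses = {v: 0 for v in defs}
    for operands in defs.values():
        for o in operands:
            if o in uses:
                uses[o] += 1
    for r in roots:
        if r in uses:
            uses[r] += 1
    worklist = [v for v, count in uses.items() if count == 0]
    while worklist:
        v = worklist.pop()
        # Deleting v's statement removes one use of each of its operands.
        for o in defs.pop(v):
            if o in uses:
                uses[o] -= 1
                if uses[o] == 0:
                    worklist.append(o)
    return defs


# A chain z <- y, y <- x, x <- constant, where nothing reads z.
chain = {"x": [], "y": ["x"], "z": ["y"]}
after_chain = eliminate_dead_code(chain, roots=[])

# Same chain, but a kept statement (say, a return) reads y.
after_return = eliminate_dead_code(chain, roots=["y"])
```

In the first call the deletion cascades from z to y to x. Variables that only feed each other in a cycle keep non-zero use counts, so this simple scheme leaves them in place.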

Simple Constant Propagation in SSA
If there is a statement v ← c, where c is a constant, then all uses of v can be replaced by c.
A φ function of the form v ← φ(c1, c2, …, cn) where all ci are identical can be replaced by v ← c.
Using a work-list algorithm on a program in SSA form, we can perform constant propagation in linear time.
In the next slide we assume that x, y, z are variables and a, b, c are constants.

Linear Time Optimizations in SSA form
Copy propagation: the statement x ← φ(y) or the statement x ← y can be deleted, and y can be substituted for every use of x.
Constant folding: if we have the statement x ← a ⊕ b, we can evaluate c ← a ⊕ b at compile time and replace the statement by x ← c.
Constant conditions: the conditional if a < b goto L1 else L2 can be replaced by goto L1 or goto L2, according to the compile-time evaluation of a < b; the CFG and the use lists are adjusted accordingly.
Unreachable code: eliminate unreachable blocks.
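A work-list pass that combines simple constant propagation, φ simplification, and constant folding might look like the sketch below. The statement encoding is an assumption made for this example: each SSA variable maps to a tuple such as `("const", 1)`, `("copy", "i1")`, `("add", "k2", 1)`, or `("phi", "j4", "j1")`, and only addition is folded. The users of a variable are found by scanning all statements, which is not linear time; a real implementation would keep use lists.

```python
def propagate_constants(defs):
    """Return a copy of defs with constants propagated and folded."""
    defs = dict(defs)
    worklist = list(defs)
    while worklist:
        v = worklist.pop()
        kind, *operands = defs[v]
        # Replace operands whose definition is already a known constant.
        operands = [
            defs[o][1] if isinstance(o, str) and defs[o][0] == "const" else o
            for o in operands
        ]
        all_constant = all(isinstance(o, int) for o in operands)
        if kind in ("phi", "copy") and all_constant and len(set(operands)) == 1:
            new = ("const", operands[0])
        elif kind == "add" and all_constant:
            new = ("const", operands[0] + operands[1])
        else:
            new = (kind, *operands)
        if new != defs[v]:
            defs[v] = new
            if new[0] == "const":
                # v just became a constant: revisit every statement reading v.
                worklist.extend(u for u, d in defs.items() if v in d[1:])
    return defs


# The SSA program from the lecture example, in the assumed encoding.
program = {
    "i1": ("const", 1), "j1": ("const", 1), "k1": ("const", 0),
    "j3": ("copy", "i1"), "k3": ("add", "k2", 1),
    "j5": ("copy", "k2"), "k5": ("add", "k2", 2),
    "j2": ("phi", "j4", "j1"), "k2": ("phi", "k4", "k1"),
    "j4": ("phi", "j3", "j5"), "k4": ("phi", "k3", "k5"),
}
result = propagate_constants(program)
```

This pass cannot discover that one branch of the conditional is never taken; that requires conditional constant propagation.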

Single Assignment Form
Source program:
  i = 1; j = 1; k = 0;
  while (k < 100) {
    if (j < 20) { j = i; k = k + 1; }
    else        { j = k; k = k + 2; }
  }
  return j;
Control-flow graph:
  B1: i ← 1; j ← 1; k ← 0
  B2: if k < 100
  B3: if j < 20
  B4: return j
  B5: j ← i; k ← k + 1
  B6: j ← k; k ← k + 2
  B7: (join of B5 and B6)
Conversion, starting with k:
1. Give each definition of k its own name: k1 ← 0 in B1, k3 ← k + 1 in B5, k5 ← k + 2 in B6.
2. B7 joins B5 and B6, so insert k4 ← φ(k3, k5) in B7.
3. B2 joins B7 and B1, so insert k2 ← φ(k4, k1) in B2.
4. Rename the uses of k: if k2 < 100, k3 ← k2 + 1, k5 ← k2 + 2.
5. Repeat the process for i and j.
Resulting SSA form:
  B1: i1 ← 1; j1 ← 1; k1 ← 0
  B2: j2 ← φ(j4, j1); k2 ← φ(k4, k1); if k2 < 100
  B3: if j2 < 20
  B4: return j2
  B5: j3 ← i1; k3 ← k2 + 1
  B6: j5 ← k2; k5 ← k2 + 2
  B7: j4 ← φ(j3, j5); k4 ← φ(k3, k5)

Example: Constant Propagation
Starting from the SSA form of the previous example, the uses of i1, j1, and k1 are replaced by their constant values:
  B1: i1 ← 1; j1 ← 1; k1 ← 0
  B2: j2 ← φ(j4, 1); k2 ← φ(k4, 0); if k2 < 100
  B3: if j2 < 20
  B4: return j2
  B5: j3 ← 1; k3 ← k2 + 1
  B6: j5 ← k2; k5 ← k2 + 2
  B7: j4 ← φ(j3, j5); k4 ← φ(k3, k5)

Example: Dead-code Elimination
i1, j1, and k1 no longer have any uses, so their definitions are removed and block B1 disappears:
  B2: j2 ← φ(j4, 1); k2 ← φ(k4, 0); if k2 < 100
  B3: if j2 < 20
  B4: return j2
  B5: j3 ← 1; k3 ← k2 + 1
  B6: j5 ← k2; k5 ← k2 + 2
  B7: j4 ← φ(j3, j5); k4 ← φ(k3, k5)

Constant Propagation and Dead Code Elimination
Propagating j3 ← 1 into the φ function in B7 gives j4 ← φ(1, j5); the definition of j3 is then dead and is removed:
  B2: j2 ← φ(j4, 1); k2 ← φ(k4, 0); if k2 < 100
  B3: if j2 < 20
  B4: return j2
  B5: k3 ← k2 + 1
  B6: j5 ← k2; k5 ← k2 + 2
  B7: j4 ← φ(1, j5); k4 ← φ(k3, k5)

Example: Is this the end?
  B2: j2 ← φ(j4, 1); k2 ← φ(k4, 0); if k2 < 100
  B3: if j2 < 20
  B4: return j2
  B5: k3 ← k2 + 1
  B6: j5 ← k2; k5 ← k2 + 2
  B7: j4 ← φ(1, j5); k4 ← φ(k3, k5)
But block 6 is never executed! The value of j can differ from 1 only through j5, which is defined in block 6, and block 6 is reached only when j2 < 20 is false.
How can we find this out, and simplify the program?
SSA conditional constant propagation finds the least fixed point for the program and allows further elimination of dead code. See the algorithm in Appel.

Example: Dead code elimination
Once block B6 is known to be unreachable, B6 and the test in B3 are removed, and the φ functions in B7 lose their B6 arguments:
  B2: j2 ← φ(j4, 1); k2 ← φ(k4, 0); if k2 < 100
  B4: return j2
  B5: k3 ← k2 + 1
  B7: j4 ← φ(1); k4 ← φ(k3)

Example: Single Argument φ-Function Elimination
  B2: j2 ← φ(j4, 1); k2 ← φ(k4, 0); if k2 < 100
  B4: return j2
  B5: k3 ← k2 + 1
  B7: j4 ← 1; k4 ← k3

Example: Constant and Copy Propagation
  B2: j2 ← φ(1, 1); k2 ← φ(k3, 0); if k2 < 100
  B4: return j2
  B5: k3 ← k2 + 1
  B7: j4 ← 1; k4 ← k3

Example: Dead Code Elimination
j4 and k4 have no remaining uses, so B7 disappears:
  B2: j2 ← φ(1, 1); k2 ← φ(k3, 0); if k2 < 100
  B4: return j2
  B5: k3 ← k2 + 1

Example: φ-Function Simplification
  B2: j2 ← 1; k2 ← φ(k3, 0); if k2 < 100
  B4: return j2
  B5: k3 ← k2 + 1

Example: Constant Propagation
  B2: j2 ← 1; k2 ← φ(k3, 0); if k2 < 100
  B4: return 1
  B5: k3 ← k2 + 1

Example: Dead Code Elimination
  B4: return 1