Advanced Compiler Techniques LIU Xianhua School of EECS, Peking University Static Single Assignment.


1 Advanced Compiler Techniques LIU Xianhua School of EECS, Peking University Static Single Assignment

2 “Advanced Compiler Techniques” Content
- SSA Introduction
- Converting to SSA
- SSA Example
- SSAPRE
- Reading: Tiger Book 19.1, 19.3; related papers

3 Prelude
- SSA (Static Single-Assignment): a program is said to be in SSA form iff each variable is statically defined exactly once, and each use of a variable is dominated by that variable's definition.
- So, is straight-line code in SSA form?

4 Example
- In general, how do we transform an arbitrary program into SSA form?
- Does the definition of X2 dominate its use in the example?
[Figure: CFG with blocks 1, 2 and 4; the join block contains X3 ← φ(X1, X2)]

5 SSA: Motivation
- Provides a uniform IR basis for solving a wide range of classical dataflow problems
- Encodes both dataflow and control-flow information
- SSA form can be constructed and maintained efficiently
- Many SSA dataflow analysis algorithms are more efficient (have lower complexity) than their CFG counterparts

6 Static Single-Assignment Form
- Each variable has only one definition in the program text.
- This single static definition can be in a loop and may be executed many times. Thus, even in a program expressed in SSA, a variable can be dynamically defined many times.

7 Advantages of SSA
- Simpler dataflow analysis
- No need for use-def / def-use chains, which require N × M space for N uses and M definitions
- SSA form relates in a useful way to dominance structures
- Differentiates unrelated uses of the same variable, e.g. loop induction variables

8 SSA Is Used In Modern Compilers
[Figure: compiler pipeline with a Front end, a Middle-End built on a good IR (Interprocedural Analysis and Optimization; Loop Nest Optimization and Parallelization; Global (Scalar) Optimization), and Backend Code Generation]


10 Def-Use Chain
- A chain is a tuple that connects two data-flow events
  - Chains express data-flow relationships directly
  - Chains provide a graphical representation
  - Chains jump across unrelated code, simplifying search
- We can build chains efficiently
  - There are four interesting types of chains
  - Def-use chains are the most common

11 Def-Use Chains Example

12 Building Def-Use Chains
- To construct def-use chains, we solve reaching definitions
- A definition d of some variable v reaches an operation i if and only if i reads v and there is a v-clear path from d to i

13 SSA Form – An Example
SSA form
- Each name is defined exactly once
- Each use refers to exactly one name
What's hard
- Straight-line code is trivial
- Splits in the CFG are trivial
- Joins in the CFG are hard
Building SSA form
- Insert φ-functions at birth points
- Rename all values for uniqueness
[Figure: CFG with the statements x ← 17 - 4, x ← a + b, x ← y - z, x ← 13, z ← x * q, s ← w - x]

14 Def-Use Chains: Birth Points
[Figure: CFG with the statements x ← 17 - 4, x ← a + b, x ← y - z, x ← 13, z ← x * q, s ← w - x, annotated with birth points]
- Value is born here: 17 - 4 ∧ y - z
- Value is born here: 17 - 4 ∧ y - z ∧ 13
- Value is born here: 17 - 4 ∧ y - z ∧ 13 ∧ a + b
- One use computes the ∧ of three values, the other the ∧ of four values

15 Def-Use Chains: Birth Points
[Figure: CFG with the statements x ← 17 - 4, x ← a + b, x ← y - z, x ← 13, z ← x * q, s ← w - x]
- We should be able to compute the values that we need with fewer meet operations, if only we can find these birth points
  - Need to identify birth points
  - Need to insert some artifact to force the evaluation to follow the birth points
- Enter Static Single Assignment form

16 Def-Use Chains: Making Birth Points Explicit
[Figure: renamed CFG with x0 ← 17 - 4, x5 ← a + b, x1 ← y - z, x3 ← 13, z ← x4 * q, s ← w - x6]
- These are all birth points for values
- All birth points are join points
- Not all join points are birth points
- Birth points are value-specific …

17 Def-Use Chains: Making Birth Points Explicit
[Figure: renamed CFG with x0 ← 17 - 4, x5 ← a + b, x1 ← y - z, x3 ← 13, z ← x4 * q, s ← w - x6, plus x2 ← φ(x1, x0), x4 ← φ(x3, x2), x6 ← φ(x5, x4)]
- Each birth point needs a definition to reconcile the values of x
- Insert a φ-function at each birth point
- Rename values so each name is defined once
- Now each use refers to one definition: Static Single Assignment form

18 Review
SSA form
- Each name is defined exactly once
- Each use refers to exactly one name
What's hard
- Straight-line code is trivial
- Splits in the CFG are trivial
- Joins in the CFG are hard
Building SSA form
- Insert φ-functions at birth points
- Rename all values for uniqueness
About φ-functions
- A φ-function is a special kind of copy that selects one of its parameters.
- The choice of parameter is governed by the CFG edge along which control reached the current block.
- Real machines do not implement a φ-function directly in hardware (not yet!).
[Figure: y1 ← ... and y2 ← ... in two predecessors, y3 ← φ(y1, y2) at the join]

19 Use-def Dependencies in Non-straight-line Code
[Figure: several definitions "a = ..." each linked to several uses of a]
- Many uses to many defs
- Overhead in representation
- Hard to manage
- Complicated: M × N edges

20 Factoring Operator
- Factoring: when multiple edges cross a join point, create a common node φ that all edges must pass through
- Number of edges reduced from 9 to 6
- A φ is regarded as a def (its parameters are uses)
- Many uses to 1 def
- Each def dominates all its uses
[Figure: the definitions "a = ..." feed a = φ(a, a, a), which feeds every use of a]

21 Rename to represent U-D edges
[Figure: definitions a1 = ..., a2 = ..., a3 = ... feed a4 = φ(a1, a2, a3); every use reads a4]
- It is no longer necessary to represent the use-def edges explicitly

22 SSA Form in Control-Flow Path Merges
B1: b ← M[x]; a ← 0 | B2: if b<4 | B3: a ← b | B4: c ← a + b
- Is this code in SSA form? No: two definitions of a appear in the code (in B1 and B3), and both can reach the use in B4.
- How can we transform this code into SSA form? We can create two versions of a, one for B1 and another for B3.

23 SSA Form in Control-Flow Path Merges
B1: b ← M[x]; a1 ← 0 | B2: if b<4 | B3: a2 ← b | B4: c ← a? + b
- But which version should we use in B4 now?
- We define a fictional function that “knows” which control path was taken to reach the basic block B4:
  φ(a2, a1) = a1 if we arrive at B4 from B2
  φ(a2, a1) = a2 if we arrive at B4 from B3

24 SSA Form in Control-Flow Path Merges
B1: b ← M[x]; a1 ← 0 | B2: if b<4 | B3: a2 ← b | B4: a3 ← φ(a2, a1); c ← a3 + b
- But which version should we use in B4 now?
- We define a fictional function that “knows” which control path was taken to reach the basic block B4:
  φ(a2, a1) = a1 if we arrive at B4 from B2
  φ(a2, a1) = a2 if we arrive at B4 from B3

25 A Loop Example
Before SSA: B1: X ← 1 | B2: X ← X + 1
After SSA: B1: X0 ← 1 | B2: X1 ← φ(X2, X0); X2 ← X1 + 1

26 A Loop Example
Before SSA: a ← 0 | loop: b ← a+1; c ← c+b; a ← b*2; if a < N | return
After SSA: a1 ← 0 | loop: a3 ← φ(a1,a2); b1 ← φ(b0,b2); c2 ← φ(c0,c1); b2 ← a3+1; c1 ← c2+b2; a2 ← b2*2; if a2 < N | return
- b1 ← φ(b0, b2) is not necessary because its result b1 is never used. But the phase that generates φ functions does not know this. Unnecessary φ functions are eliminated by dead code elimination.
- Note: only a and c are used in the loop body before they are redefined. b is redefined right at the beginning of the body.

27 The φ function
- How can we implement a φ function that “knows” which control path was taken?
- Answer 1: We don't! The φ function is used only to connect uses to definitions during optimization, but is never implemented.
- Answer 2: If we must execute the φ function, we can implement it by inserting MOVE instructions in all control paths.

28 A naive method
- Simply introduce a φ-function at each “join” point in the CFG
- But we already pointed out that this is inefficient: too many useless φ-functions may be introduced!
- What is a good algorithm to introduce only the right number of φ-functions?

29 Criteria For Inserting φ Functions
- We could insert one φ function for each variable at every join point (a point in the CFG with more than one predecessor). But that would be wasteful.
- What should our criterion be for inserting a φ function for a variable a at node z of the CFG?
- Intuitively, we should add a φ function if there are two definitions of a that can reach the point z through distinct paths.

30 Path Convergence Criterion
Insert a φ function for a variable a at a node z if all of the following conditions are true:
1. There is a block x that defines a
2. There is a block y ≠ x that defines a
3. There is a non-empty path Pxz from x to z
4. There is a non-empty path Pyz from y to z
5. Paths Pxz and Pyz don't have any nodes in common other than z
6. The node z does not appear within both Pxz and Pyz prior to the end, but it might appear in one or the other
The start node contains an implicit definition of every variable.

31 Iterated Path-Convergence Criterion
- The φ function itself is a definition of a. Therefore the path-convergence criterion is a set of equations that must be satisfied.
  while there are nodes x, y, z satisfying conditions 1-6 and z does not contain a φ function for a
  do insert a ← φ(a, a, …, a) at node z
- This algorithm is extremely costly, because it requires the examination of every triple of nodes x, y, z and every path leading from x and y. Can we do better?

32 Concept of Dominance Frontiers: An Intuitive View
[Figure: blocks bb1 … bbn; the blocks dominated by bb1 form a region, and the border between dominated and not-dominated blocks is the dominance frontier]

33 Dominance Frontier
- The dominance frontier DF(x) of a node x is the set of all nodes z such that x dominates a predecessor of z without strictly dominating z.
- Recall: if x dominates y and x ≠ y, then x strictly dominates y
- Intuitively: the fringe just beyond the region x dominates
- DF(X) = { Y | (∃ P ∈ PRED(Y): X Dom P) and not (X SDom Y) }
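The set definition of DF(X) can be evaluated directly once dominator sets are known. The following is a minimal sketch, assuming the eight-block flow graph B0 to B7 from the dominance-frontier table later in this deck; the predecessor map `PREDS` is reconstructed from that table and is an assumption, as are all names in the code.

```python
# Hypothetical CFG: PREDS[y] lists the predecessors of block y (reconstructed, not from the slides).
PREDS = {0: [], 1: [0, 7], 2: [1], 3: [1], 4: [3], 5: [3], 6: [4, 5], 7: [2, 6]}

# Dominator sets as given in the DOM(n) row of the table.
DOM = {0: {0}, 1: {0, 1}, 2: {0, 1, 2}, 3: {0, 1, 3}, 4: {0, 1, 3, 4},
       5: {0, 1, 3, 5}, 6: {0, 1, 3, 6}, 7: {0, 1, 7}}


def frontier_from_definition(preds, dom):
    """DF(x) = { y | x dominates some predecessor of y and x does not strictly dominate y }."""
    df = {x: set() for x in preds}
    for y, ps in preds.items():
        for p in ps:
            for x in dom[p]:                      # every x that dominates predecessor p
                strictly_dominates_y = x in dom[y] and x != y
                if not strictly_dominates_y:
                    df[x].add(y)
    return df


DF = frontier_from_definition(PREDS, DOM)
```

This version is quadratic in the worst case; it is meant to mirror the definition, not to be the fastest way to build the frontier.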

34 Dominance Relation
- If X appears on every path from ENTRY to Y, then X dominates Y.
- The dominance relation is both reflexive and transitive.
- idom(Y): immediate dominator of Y
- Dominator tree
  - ENTRY is the root
  - Any node Y other than ENTRY has idom(Y) as its parent
  - Notation: parent, child, ancestor, descendant

35 Dominator Tree Example
[Figure: a CFG over the nodes ENTRY, a, b, c, d, EXIT, shown beside its dominator tree]

36 Dominator Tree Example

37 Calculate The Dominance Frontier
[Figure: CFG with nodes 1 to 13]
How do we determine the dominance frontier of node 5? An intuitive way:
1. Determine the dominance region of node 5: {5, 6, 7, 8}
2. Determine the targets of edges crossing out of the dominance region of node 5
These targets are the dominance frontier of node 5: DF(5) = {4, 5, 12, 13}
NOTE: node 5 is in DF(5) in this case. Why?

38 Algorithm: Converting to SSA
Big picture: translation to SSA form is done in 3 steps
1. The dominance frontier mapping is constructed from the control flow graph.
2. Using the dominance frontiers, the locations of the φ-functions for each variable in the original program are determined.
3. The variables are renamed by replacing each mention of an original variable V by an appropriate mention of a new variable Vi.

39 SSA Construction Algorithm
1. Insert φ-functions
 a.) calculate dominance frontiers
 b.) find global names; for each name, build a list of blocks that define it (moderately complex)
 c.) insert φ-functions:
   ∀ global name n
     ∀ block b in which n is assigned
       ∀ block d in b's dominance frontier
         insert a φ-function for n in d
         add d to n's list of defining blocks
- Compute the list of blocks where each name is assigned and use it as a worklist
- Adding d to n's list adds to the worklist; this creates the iterated dominance frontier
- Use a checklist to avoid putting blocks on the worklist twice; keep another checklist to avoid inserting the same φ-function twice
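Step 1c can be sketched as a worklist loop per global name. This is a minimal illustration, assuming the dominance frontiers from the B0 to B7 table in this deck and the defining blocks read off the renaming example (a in B1 and B3, b in B2 and B6, c in B1, B2 and B5, d in B2, B3 and B4, i in B0 and B7); the function and variable names are hypothetical.

```python
# Dominance frontiers for blocks B0..B7, taken from the DF(n) row of the table.
DF = {0: set(), 1: {1}, 2: {7}, 3: {7}, 4: {6}, 5: {6}, 6: {7}, 7: {1}}

# Blocks that assign each global name (read off the example CFG; an assumption).
DEF_BLOCKS = {"a": [1, 3], "b": [2, 6], "c": [1, 2, 5], "d": [2, 3, 4], "i": [0, 7]}


def place_phis(df, def_blocks):
    """Return {name: set of blocks that need a phi-function for that name}."""
    phis = {}
    for name, blocks in def_blocks.items():
        has_phi = set()              # checklist: phi for `name` already inserted here
        ever_queued = set(blocks)    # checklist: block was already put on the worklist
        worklist = list(blocks)
        while worklist:
            b = worklist.pop()
            for d in df[b]:
                if d not in has_phi:
                    has_phi.add(d)   # the phi is itself a new definition of `name`
                    if d not in ever_queued:
                        ever_queued.add(d)
                        worklist.append(d)
        phis[name] = has_phi
    return phis


PHIS = place_phis(DF, DEF_BLOCKS)
```

Under these assumptions the result agrees with the example: B1 receives φ-functions for a, b, c, d and i, B6 for c and d, and B7 for a, b, c and d.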

40 SSA Construction Algorithm
2. Rename variables in a pre-order walk over the dominator tree (use an array of stacks, one stack per global name)
Starting with the root block, b:
 a.) generate unique names for each φ-function and push them on the appropriate stacks
 b.) rewrite each operation in the block
   i. rewrite uses of global names with the current version (from the stack)
   ii. rewrite the definition by inventing and pushing a new name
 c.) fill in φ-function parameters of successor blocks
 d.) recurse on b's children in the dominator tree
 e.) pop the names generated in b from the stacks
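Steps a to e can be sketched on the four-block if-then example from earlier in the deck (B1: b ← M[x], a ← 0; B2: if b<4; B3: a ← b; B4: c ← a + b). This is a minimal sketch under assumptions: instructions are modelled as `(dest, operands)` pairs, subscripts start at 1 as in that example, and every name in the code is hypothetical.

```python
from collections import defaultdict


def rename(blocks, succs, dom_children, phi_vars, entry):
    variables = {d for instrs in blocks.values() for d, _ in instrs if d}
    counter = defaultdict(lambda: 1)     # next subscript per variable
    stack = defaultdict(list)            # current SSA name per variable
    phi_dest = {b: {} for b in blocks}   # block -> {var: phi result name}
    phi_args = {b: {v: {} for v in phi_vars.get(b, [])} for b in blocks}
    out = {b: [] for b in blocks}        # rewritten ordinary instructions

    def fresh(v):
        name = f"{v}{counter[v]}"
        counter[v] += 1
        stack[v].append(name)
        return name

    def walk(b):
        pushed = []
        for v in phi_vars.get(b, []):                 # (a) name the phi results
            phi_dest[b][v] = fresh(v)
            pushed.append(v)
        for dest, operands in blocks[b]:              # (b) rewrite operations
            ops = [stack[o][-1] if o in variables and stack[o] else o
                   for o in operands]
            new_dest = None
            if dest is not None:
                new_dest = fresh(dest)
                pushed.append(dest)
            out[b].append((new_dest, ops))
        for s in succs.get(b, []):                    # (c) fill successor phi arguments
            for v in phi_vars.get(s, []):
                if stack[v]:
                    phi_args[s][v][b] = stack[v][-1]
        for child in dom_children.get(b, []):         # (d) recurse in the dominator tree
            walk(child)
        for v in pushed:                              # (e) pop names created in b
            stack[v].pop()

    walk(entry)
    return out, phi_dest, phi_args


BLOCKS = {"B1": [("b", ["M[x]"]), ("a", ["0"])],
          "B2": [(None, ["b", "4"])],                 # the test b < 4 defines nothing
          "B3": [("a", ["b"])],
          "B4": [("c", ["a", "b"])]}
SUCCS = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B4"], "B4": []}
DOM_CHILDREN = {"B1": ["B2"], "B2": ["B3", "B4"]}
OUT, PHI_DEST, PHI_ARGS = rename(BLOCKS, SUCCS, DOM_CHILDREN, {"B4": ["a"]}, "B1")
```

With these inputs the φ in B4 becomes a3 ← φ(a2, a1), with a2 arriving from B3 and a1 from B2, matching the earlier slide.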

41 SSA Construction Algorithm: Computing Dominance
- The first step in φ-function insertion computes dominance
- Recall that n dominates m iff n is on every path from n0 to m
  - Every node dominates itself
  - n's immediate dominator is its closest strict dominator, IDOM(n)
- DOM(n0) = { n0 }
- DOM(n) = { n } ∪ ( ∩ p ∈ preds(n) DOM(p) )
- Initially, DOM(n) = N, ∀ n ≠ n0
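The two DOM equations can be solved by simple iteration to a fixed point. A minimal sketch, assuming a hypothetical eight-block CFG given as a predecessor map (chosen so that its result matches the DOM(n) row tabulated later in this deck); the names are illustrative only.

```python
# Hypothetical CFG as a predecessor map; block 0 is the entry node n0.
PREDS = {0: [], 1: [0, 7], 2: [1], 3: [1], 4: [3], 5: [3], 6: [4, 5], 7: [2, 6]}


def dominators(preds, entry):
    nodes = set(preds)
    dom = {n: set(nodes) for n in nodes}   # initially DOM(n) = N for n != n0
    dom[entry] = {entry}                   # DOM(n0) = {n0}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes - {entry}):
            # DOM(n) = {n} ∪ (intersection of DOM(p) over all predecessors p)
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


DOM = dominators(PREDS, 0)
```

The sketch assumes every non-entry node has at least one predecessor; unreachable nodes would need to be removed first.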

42 Why Dominance Frontiers
- Dominance frontier criterion: if node x contains a def of a, then any node z in DF(x) needs a φ function for a
- Intuition: at least two non-intersecting paths converge at z, and one path must contain a node strictly dominated by x

43 Dominance Frontiers (visually)

44 Dominance Frontiers
n        B0  B1   B2     B3     B4       B5       B6       B7
DOM(n)   0   0,1  0,1,2  0,1,3  0,1,3,4  0,1,3,5  0,1,3,6  0,1,7
IDOM(n)  –   0    1      1      3        3        3        1
DF(n)    –   1    7      7      6        6        7        1
- For each join point x
  - For each CFG predecessor p of x
    - Walk from p up the dominator tree until reaching IDOM(x), adding x to DF(n) for each node n visited before IDOM(x)
[Figure: flow graph and dominance tree over B0 to B7]
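The join-point walk can be sketched as follows. The IDOM values come from the table; the predecessor map is reconstructed from the table and is an assumption, and the names are hypothetical.

```python
# Hypothetical CFG (predecessor map) and the IDOM(n) row of the table.
PREDS = {0: [], 1: [0, 7], 2: [1], 3: [1], 4: [3], 5: [3], 6: [4, 5], 7: [2, 6]}
IDOM = {0: None, 1: 0, 2: 1, 3: 1, 4: 3, 5: 3, 6: 3, 7: 1}


def dominance_frontiers(preds, idom):
    df = {n: set() for n in preds}
    for x, ps in preds.items():
        if len(ps) < 2:                  # only join points contribute
            continue
        for p in ps:
            runner = p
            while runner != idom[x]:     # climb the dominator tree up to IDOM(x)
                df[runner].add(x)
                runner = idom[runner]
    return df


DF = dominance_frontiers(PREDS, IDOM)
```

Each join point is charged only to the nodes between its predecessors and its immediate dominator, so the total work is proportional to the size of the resulting frontier sets.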

45 Dominance Frontiers & φ-Function Insertion
- A definition at n forces a φ-function at m iff n does not strictly dominate m but n ∈ DOM(p) for some p ∈ preds(m)
- DF(n) is the fringe just beyond the region n dominates
- For each assignment, we insert the φ-functions. For an assignment x ← ... in B4:
  - DF(4) is {6}, so ← in 4 forces a φ-function x ← φ(...) in 6
  - ← in 6 forces a φ-function in DF(6) = {7}
  - ← in 7 forces a φ-function in DF(7) = {1}
  - ← in 1 forces a φ-function in DF(1) = {1} (halt)
[Figure: flow graph over B0 to B7]

46 φ-function and Dominance Frontier
Intuition behind the dominance frontier: Y ∈ DF(X) means
- Y has multiple predecessors
- X dominates one of them, say U; U inherits everything defined in X
- The definitions reaching Y come from U and from the other predecessors
- So Y is exactly the place where a φ-function is needed

47 With all the φ-functions
B0: i ← ...
B1: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); i ← φ(i,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
- Lots of new ops
- Renaming is next
- Assume a, b, c, & d are defined before B0
- Excluding local names avoids φ's for y & z

48 SSA Construction Algorithm (reminder)
2. Rename variables in a pre-order walk over the dominator tree (use an array of stacks, one stack per global name)
Starting with the root block, b:
 a.) generate unique names for each φ-function and push them on the appropriate stacks
 b.) rewrite each operation in the block
   i. rewrite uses of global names with the current version (from the stack)
   ii. rewrite the definition by inventing and pushing a new name
 c.) fill in φ-function parameters of successor blocks
 d.) recurse on b's children in the dominator tree
 e.) pop the names generated in b from the stacks

49 Renaming: before processing B0
B0: i ← ...
B1: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); i ← φ(i,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 1, b = 1, c = 1, d = 1, i = 0
Stacks: a: a0 | b: b0 | c: c0 | d: d0 | i: (empty)
- Assume a, b, c, & d are defined before B0
- i has not been defined

50 Renaming: end of B0
B0: i0 ← ...
B1: a ← φ(a0,a); b ← φ(b0,b); c ← φ(c0,c); d ← φ(d0,d); i ← φ(i0,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 1, b = 1, c = 1, d = 1, i = 1
Stacks: a: a0 | b: b0 | c: c0 | d: d0 | i: i0

51 Renaming: end of B1
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 3, b = 2, c = 3, d = 2, i = 2
Stacks: a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1

52 Renaming: end of B2
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 3, b = 3, c = 4, d = 3, i = 2
Stacks: a: a0 a1 a2 | b: b0 b1 b2 | c: c0 c1 c2 c3 | d: d0 d1 d2 | i: i0 i1

53 Renaming: before starting B3
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100 / i ≤ 100
Counters: a = 3, b = 3, c = 4, d = 3, i = 2
Stacks (names pushed in B2 have been popped): a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1

54 Renaming: end of B3
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 4, b = 3, c = 4, d = 4, i = 2
Stacks: a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 d3 | i: i0 i1

55 Renaming: end of B4
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c ← ...
B6: d ← φ(d4,d); c ← φ(c2,c); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 4, b = 3, c = 4, d = 5, i = 2
Stacks: a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 d3 d4 | i: i0 i1

56 Renaming: end of B5
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d ← φ(d4,d3); c ← φ(c2,c4); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 4, b = 3, c = 5, d = 5, i = 2
Stacks: a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 c4 | d: d0 d1 d3 | i: i0 i1

57 Renaming: end of B6
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← φ(d4,d3); c5 ← φ(c2,c4); b3 ← ...
B7: a ← φ(a2,a3); b ← φ(b2,b3); c ← φ(c3,c5); d ← φ(d2,d5); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 4, b = 4, c = 6, d = 6, i = 2
Stacks: a: a0 a1 a2 a3 | b: b0 b1 b3 | c: c0 c1 c2 c5 | d: d0 d1 d3 d5 | i: i0 i1

58 Renaming: before B7
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← φ(d4,d3); c5 ← φ(c2,c4); b3 ← ...
B7: a ← φ(a2,a3); b ← φ(b2,b3); c ← φ(c3,c5); d ← φ(d2,d5); y ← a+b; z ← c+d; i ← i+1; branch on i > 100
Counters: a = 4, b = 4, c = 6, d = 6, i = 2
Stacks (names pushed in B3 and its subtree have been popped): a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1

59 Renaming: end of B7
B0: i0 ← ...
B1: a1 ← φ(a0,a4); b1 ← φ(b0,b4); c1 ← φ(c0,c6); d1 ← φ(d0,d6); i1 ← φ(i0,i2); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← φ(d4,d3); c5 ← φ(c2,c4); b3 ← ...
B7: a4 ← φ(a2,a3); b4 ← φ(b2,b3); c6 ← φ(c3,c5); d6 ← φ(d2,d5); y ← a4+b4; z ← c6+d6; i2 ← i1+1; branch on i > 100
Counters: a = 5, b = 5, c = 7, d = 7, i = 3
Stacks: a: a0 a1 a2 a4 | b: b0 b1 b4 | c: c0 c1 c2 c6 | d: d0 d1 d6 | i: i0 i1 i2

60 After renaming
B0: i0 ← ...
B1: a1 ← φ(a0,a4); b1 ← φ(b0,b4); c1 ← φ(c0,c6); d1 ← φ(d0,d6); i1 ← φ(i0,i2); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← φ(d4,d3); c5 ← φ(c2,c4); b3 ← ...
B7: a4 ← φ(a2,a3); b4 ← φ(b2,b3); c6 ← φ(c3,c5); d6 ← φ(d2,d5); y ← a4+b4; z ← c6+d6; i2 ← i1+1; branch on i > 100
- This is semi-pruned SSA form. We're done …
- Semi-pruned: only names live in 2 or more blocks are “global names”.

61 SSA Construction
What's this “pruned SSA” stuff?
- Minimal SSA still contains extraneous φ-functions
- It inserts some φ-functions where they are dead
- We would like to avoid inserting them
Two ideas
- Semi-pruned SSA: discard names used in only one block
  - Significant reduction in total number of φ-functions
  - Needs only local liveness information (cheap to compute)
- Pruned SSA: only insert φ-functions where the value is live
  - Inserts even fewer φ-functions, but costs more to do
  - Requires global live-variable analysis (more expensive)
In practice, both are simple modifications to step 1.

62 Converting Out of SSA
- Eventually, a program must be executed.
- φ-functions have precise semantics, but they are generally not represented in existing target machines.
[Figure: a join block with x17 ← φ(x10, x11) followed by uses of x17; after conversion the φ is replaced by x17 ← x10 in one predecessor and x17 ← x11 in the other]

63 Converting Out of SSA
- Naively, a k-input φ-function at the entrance to node X can be replaced by k ordinary assignments, one at the end of each control predecessor of X.
- This can generate inefficient object code.
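The naive replacement can be sketched as below. This is an illustration only, assuming blocks are lists of statement strings with control flow kept separately (so "the end of a predecessor" is simply the end of its list) and assuming the φ-functions of a block do not read each other's results; the block names and statement strings are hypothetical.

```python
def eliminate_phis(blocks, phis):
    """blocks: {name: [statement, ...]}; phis: {block: [(dest, {pred: source}), ...]}."""
    out = {name: list(stmts) for name, stmts in blocks.items()}
    for block, phi_list in phis.items():
        for dest, args in phi_list:
            for pred, source in args.items():
                # one ordinary copy at the end of each control predecessor
                out[pred].append(f"{dest} = {source}")
    return out


BLOCKS = {"left": ["x10 = f()"], "right": ["x11 = g()"], "join": ["use(x17)"]}
PHIS = {"join": [("x17", {"left": "x10", "right": "x11"})]}
LOWERED = eliminate_phis(BLOCKS, PHIS)
```

A real implementation would place the copies before the terminating branch of each predecessor and would usually try to coalesce them away during register allocation.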

64 Dead Code Elimination
- Where does dead code come from? Assignments without any use
- Dead code elimination method
  - Initially all statements are marked dead
  - Some statements need to be marked live because of certain conditions
  - Marking these statements can cause others to be marked live
  - After the worklist is empty, truly dead code can be removed

65 SSA Example: Dead Code Elimination Intuition
- Because there is only one definition for each variable, if the list of uses of the variable is empty, the definition is dead.
- When a statement v ← x ⊕ y is eliminated because v is dead, this statement must be removed from the list of uses of x and y. This might cause those definitions to become dead.
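The use-list intuition can be sketched with use counts and a worklist. This is a minimal sketch, assuming each SSA definition is side-effect free and is described only by the variables it reads; the data below is modelled on the j/k example later in this deck (after constant propagation), and the names `DEFS` and `ROOTS` are hypothetical.

```python
def eliminate_dead_code(defs, live_roots):
    """defs: {var: [variables read by its single definition]}.
    live_roots: variables read by statements that must stay (branches, return)."""
    uses = {v: 0 for v in defs}
    for operands in defs.values():
        for o in operands:
            if o in uses:
                uses[o] += 1
    for r in live_roots:
        if r in uses:
            uses[r] += 1
    remaining = dict(defs)
    worklist = [v for v, n in uses.items() if n == 0]   # empty use list: dead
    while worklist:
        v = worklist.pop()
        for o in remaining.pop(v):          # deleting v removes one use of each operand
            if o in uses:
                uses[o] -= 1
                if uses[o] == 0:
                    worklist.append(o)
    return remaining


DEFS = {"i1": [], "j1": [], "k1": [], "j3": [], "k3": ["k2"], "j5": ["k2"],
        "k5": ["k2"], "j4": ["j3", "j5"], "k4": ["k3", "k5"], "j2": ["j4"], "k2": ["k4"]}
ROOTS = ["k2", "j2", "j2"]   # if k2<100, if j2<20, return j2
KEPT = eliminate_dead_code(DEFS, ROOTS)
```

Counting uses cannot remove definitions that only keep each other alive through a cycle (such as a loop counter nobody reads); the mark-live formulation on the previous slide handles that case.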

66 SSA Example: Simple Constant Propagation Intuition
- If there is a statement v ← c, where c is a constant, then all uses of v can be replaced by c.
- A φ function of the form v ← φ(c1, c2, …, cn) where all ci are identical can be replaced by v ← c.
- Using a worklist algorithm on a program in SSA form, we can perform constant propagation in linear time.

67 Problems with Use-Def Chains
- What happens if we statically know the direction of a branch?
- There is no need to propagate information along the untaken path
- This is easy to do with CFGs
- With U-D chains, it is hard to tell which definitions to ignore

68 Use-Def with SSA
- SSA form shortens u-d chains
- Chains terminate at merge points, rather than crossing them
- Can simply ignore information merged from untaken branches
- Much easier to account for irrelevant information

69 VN Example (Review)
Original code: a ← x + y; b ← x + y; a ← 17; c ← x + y
With VNs: a^3 ← x^1 + y^2; b^3 ← x^1 + y^2; a^4 ← 17^4; c^3 ← x^1 + y^2
Rewritten: a0^3 ← x0^1 + y0^2; b0^3 ← a0^3; a1^4 ← 17^4; c0^3 ← a0^3
- Give each value a unique name
- No value is ever killed
- These are SSA names (static single-assignment)

70 Example
[Figure: one branch sets X = 2, Y = 3 and the other sets X = 3, Y = 2; both reach Z = X + Y]

71 Constant Propagation w/ SSA
- For statements xi := C, for some constant C, replace all uses of xi with C
- For xi := φ(C, C, ..., C), for some constant C, replace the statement with xi := C
- Iterate
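The three rules can be sketched as a repeat-until-stable loop. This is a minimal sketch that also folds arithmetic on constants (as the following example slides do) and ignores branch reachability; the tuple encoding of statements and all names are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(stmts):
    """stmts: {var: ("const", c) | ("phi", arg, ...) | (op, lhs, rhs)}; args are names or ints."""
    stmts = dict(stmts)
    changed = True
    while changed:                                   # "Iterate"
        changed = False
        consts = {v: s[1] for v, s in stmts.items() if s[0] == "const"}
        for v, s in list(stmts.items()):
            if s[0] == "const":
                continue
            kind = s[0]
            args = [consts.get(a, a) for a in s[1:]]  # replace uses of constants
            all_const = all(isinstance(a, int) for a in args)
            if kind == "phi" and all_const and len(set(args)) == 1:
                new = ("const", args[0])              # phi(C, C, ..., C) becomes C
            elif kind in OPS and all_const:
                new = ("const", OPS[kind](*args))     # fold constant arithmetic
            else:
                new = (kind, *args)
            if new != s:
                stmts[v] = new
                changed = True
    return stmts


SSA = {"a1": ("const", 3), "d1": ("const", 2), "d2": ("const", 2), "g1": ("const", 5),
       "d3": ("phi", "d2", "d1"), "a3": ("phi", "a2", "a1"), "f1": ("+", "a3", "d3"),
       "a2": ("-", "g1", "d3"), "f2": ("+", "g1", 1), "f3": ("phi", "f2", "f1")}
RESULT = propagate_constants(SSA)
```

Rescanning every statement on each pass is quadratic in the worst case; the linear-time variant keeps a worklist of only those statements whose operands just changed.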

72 Example: SSA
Original: a := 3; d := 2; f := a + d; g := 5; a := g – d; f <= g; f := g + 1; g < a; d := 2
SSA form: a1 := 3; d1 := 2; d3 = φ(d2,d1); a3 = φ(a2,a1); f1 := a3 + d3; g1 := 5; a2 := g1 – d3; f1 <= g1; f2 := g1 + 1; g1 < a2; f3 := φ(f2,f1); d2 := 2
(f <= g and g < a are branch tests with T and F successors.)

73 Example: SSA
Before: a1 := 3; d1 := 2; d3 = φ(d2,d1); a3 = φ(a2,a1); f1 := a3 + d3; g1 := 5; a2 := g1 – d3; f1 <= g1; f2 := g1 + 1; g1 < a2; f3 := φ(f2,f1); d2 := 2
After: a1 := 3; d1 := 2; d3 = φ(2,2); a3 = φ(a2,3); f1 := a3 + d3; g1 := 5; a2 := 5 – d3; f1 <= 5; f2 := 5 + 1; 5 < a2; f3 := φ(f2,f1); d2 := 2

74 Example: SSA
Before: a1 := 3; d1 := 2; d3 = φ(2,2); a3 = φ(a2,3); f1 := a3 + d3; g1 := 5; a2 := 5 – d3; f1 <= 5; f2 := 5 + 1; 5 < a2; f3 := φ(f2,f1); d2 := 2
After: a1 := 3; d1 := 2; d3 := 2; a3 = φ(a2,3); f1 := a3 + 2; g1 := 5; a2 := 5 – 2; f1 <= 5; f2 := 6; 5 < a2; f3 := φ(6,f1); d2 := 2
This may continue for a few steps...

75 Loop Invariants
Source:
do i = 1, 100
  k = i * (n+2)
  do j = i, 100
    a[i,j] = 100 * n + 10*k + j
  end
Three-address code: i = 1 | outer loop test: i <= 100 | t1 = n + 2; k = i * t1; j = i | inner loop test: j <= 100 | t2 = 100*n; t3 = 10 * k; t4 = t2 + t3; t5 = t4 + j; j = j + 1 | i = i + 1

76 Loop Invariants: SSA
Source:
do i = 1, 100
  k = i * (n+2)
  do j = i, 100
    a[i,j] = 100 * n + 10*k + j
  end
SSA three-address code: i_1 = 1 | outer loop: i_2 = φ(i_1,i_3); i_2 <= 100 | t1_0 = n_1 + 2; k_1 = i_2 * t1_0; j_1 = i_2 | inner loop: j_2 = φ(j_1,j_3); j_2 <= 100 | t2_0 = 100*n_1; t3_0 = 10 * k_1; t4_0 = t2_0 + t3_0; t5_0 = t4_0 + j_2; j_3 = j_2 + 1 | i_3 = i_2 + 1
Annotations in the figure: “all operands are invariant”, “invariant”
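In SSA form, an invariance test only needs to look at where each operand is defined. The sketch below is an assumption-laden illustration: it treats a non-φ definition inside a loop as invariant when every operand is a constant, is defined outside the loop, or is itself invariant, and it assumes the operations are pure. The inner-loop data follows the slide above; the names are hypothetical.

```python
def invariant_defs(loop_defs):
    """loop_defs: {var: (is_phi, [operands])} for definitions inside one loop.
    Operands that are not keys of loop_defs are constants or defined outside the loop."""
    invariant = set()
    changed = True
    while changed:
        changed = False
        for var, (is_phi, operands) in loop_defs.items():
            if is_phi or var in invariant:
                continue                 # a phi at the loop header varies per iteration
            if all(o not in loop_defs or o in invariant for o in operands):
                invariant.add(var)
                changed = True
    return invariant


INNER_LOOP = {"j_2": (True, ["j_1", "j_3"]),
              "t2_0": (False, ["100", "n_1"]),
              "t3_0": (False, ["10", "k_1"]),
              "t4_0": (False, ["t2_0", "t3_0"]),
              "t5_0": (False, ["t4_0", "j_2"]),
              "j_3": (False, ["j_2", "1"])}
HOISTABLE = invariant_defs(INNER_LOOP)
```

For the inner loop this marks t2_0, t3_0 and t4_0, the same three statements the code-motion slides move out of it; whether a marked statement may actually be moved still depends on the additional conditions those slides mention.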

77 Code Motion Example
Before: i_1 = 1; i_2 = φ(i_1,i_3); i_2 <= 100; t1_0 = n_1 + 2; k_1 = i_2 * t1_0; j_1 = i_2; j_2 = φ(j_1,j_3); j_2 <= 100; t2_0 = 100*n_1; t3_0 = 10 * k_1; t4_0 = t2_0 + t3_0; t5_0 = t4_0 + j_2; j_3 = j_2 + 1; i_3 = i_2 + 1
After: i_1 = 1; t1_0 = n_1 + 2; i_2 = φ(i_1,i_3); i_2 <= 100; k_1 = i_2 * t1_0; j_1 = i_2; j_2 = φ(j_1,j_3); j_2 <= 100; t2_0 = 100*n_1; t3_0 = 10 * k_1; t4_0 = t2_0 + t3_0; t5_0 = t4_0 + j_2; j_3 = j_2 + 1; i_3 = i_2 + 1
- Assuming t1 is not live outside the loop nest, t1_0 = n_1 + 2 is invariant and all three conditions are met, so it moves in front of the outer loop.

78 Code Motion Example
Before: i_1 = 1; t1_0 = n_1 + 2; i_2 = φ(i_1,i_3); i_2 <= 100; k_1 = i_2 * t1_0; j_1 = i_2; j_2 = φ(j_1,j_3); j_2 <= 100; t2_0 = 100*n_1; t3_0 = 10 * k_1; t4_0 = t2_0 + t3_0; t5_0 = t4_0 + j_2; j_3 = j_2 + 1; i_3 = i_2 + 1
After: i_1 = 1; t1_0 = n_1 + 2; i_2 = φ(i_1,i_3); i_2 <= 100; k_1 = i_2 * t1_0; j_1 = i_2; t2_0 = 100*n_1; t3_0 = 10 * k_1; t4_0 = t2_0 + t3_0; j_2 = φ(j_1,j_3); j_2 <= 100; t5_0 = t4_0 + j_2; j_3 = j_2 + 1; i_3 = i_2 + 1
- t2_0, t3_0 and t4_0 are invariant in the inner loop and all conditions are met, assuming t2, t3, t4 are not live outside the loop nest, so they move in front of the inner loop.

79 Code Motion Example
Before: i_1 = 1; t1_0 = n_1 + 2; i_2 = φ(i_1,i_3); i_2 <= 100; k_1 = i_2 * t1_0; j_1 = i_2; t2_0 = 100*n_1; t3_0 = 10 * k_1; t4_0 = t2_0 + t3_0; j_2 = φ(j_1,j_3); j_2 <= 100; t5_0 = t4_0 + j_2; j_3 = j_2 + 1; i_3 = i_2 + 1
After: i_1 = 1; t1_0 = n_1 + 2; t2_0 = 100*n_1; i_2 = φ(i_1,i_3); i_2 <= 100; k_1 = i_2 * t1_0; j_1 = i_2; t3_0 = 10 * k_1; t4_0 = t2_0 + t3_0; j_2 = φ(j_1,j_3); j_2 <= 100; t5_0 = t4_0 + j_2; j_3 = j_2 + 1; i_3 = i_2 + 1
- t2_0 = 100*n_1 is invariant in the outer loop as well and all conditions are met, so it moves in front of the outer loop.

80 SSA Example
Source: i=1; j=1; k=0; while(k<100) { if(j<20) { j=i; k=k+1; } else { j=k; k=k+2; } } return j;
CFG: B1: i ← 1; j ← 1; k ← 0 | B2: if k<100 | B3: if j<20 | B4: return j | B5: j ← i; k ← k+1 | B6: j ← k; k ← k+2 | B7: (join)

81 SSA Example
CFG: B1: i ← 1; j ← 1; k1 ← 0 | B2: if k<100 | B3: if j<20 | B4: return j | B5: j ← i; k3 ← k+1 | B6: j ← k; k5 ← k+2 | B7: (join)

82 SSA Example
CFG: B1: i ← 1; j ← 1; k1 ← 0 | B2: if k<100 | B3: if j<20 | B4: return j | B5: j ← i; k3 ← k+1 | B6: j ← k; k5 ← k+2 | B7: k4 ← φ(k3,k5)

83 SSA Example
CFG: B1: i ← 1; j ← 1; k1 ← 0 | B2: k2 ← φ(k4,k1); if k<100 | B3: if j<20 | B4: return j | B5: j ← i; k3 ← k+1 | B6: j ← k; k5 ← k+2 | B7: k4 ← φ(k3,k5)

84 SSA Example
CFG: B1: i ← 1; j ← 1; k1 ← 0 | B2: k2 ← φ(k4,k1); if k2<100 | B3: if j<20 | B4: return j | B5: j ← i; k3 ← k2+1 | B6: j ← k; k5 ← k2+2 | B7: k4 ← φ(k3,k5)

85 SSA Example
CFG: B1: i1 ← 1; j1 ← 1; k1 ← 0 | B2: j2 ← φ(j4,j1); k2 ← φ(k4,k1); if k2<100 | B3: if j2<20 | B4: return j2 | B5: j3 ← i1; k3 ← k2+1 | B6: j5 ← k2; k5 ← k2+2 | B7: j4 ← φ(j3,j5); k4 ← φ(k3,k5)

86 SSA Example: Const. Propagation
Before: B1: i1 ← 1; j1 ← 1; k1 ← 0 | B2: j2 ← φ(j4,j1); k2 ← φ(k4,k1); if k2<100 | B3: if j2<20 | B4: return j2 | B5: j3 ← i1; k3 ← k2+1 | B6: j5 ← k2; k5 ← k2+2 | B7: j4 ← φ(j3,j5); k4 ← φ(k3,k5)
After: B1: i1 ← 1; j1 ← 1; k1 ← 0 | B2: j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100 | B3: if j2<20 | B4: return j2 | B5: j3 ← 1; k3 ← k2+1 | B6: j5 ← k2; k5 ← k2+2 | B7: j4 ← φ(j3,j5); k4 ← φ(k3,k5)

87 SSA Example: Dead Code Elimination
Before: B1: i1 ← 1; j1 ← 1; k1 ← 0 | B2: j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100 | B3: if j2<20 | B4: return j2 | B5: j3 ← 1; k3 ← k2+1 | B6: j5 ← k2; k5 ← k2+2 | B7: j4 ← φ(j3,j5); k4 ← φ(k3,k5)
After: B2: j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100 | B3: if j2<20 | B4: return j2 | B5: j3 ← 1; k3 ← k2+1 | B6: j5 ← k2; k5 ← k2+2 | B7: j4 ← φ(j3,j5); k4 ← φ(k3,k5)

88 SSA Example: Constant Propagation and Dead Code Elimination
Before: B2: j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100 | B3: if j2<20 | B4: return j2 | B5: j3 ← 1; k3 ← k2+1 | B6: j5 ← k2; k5 ← k2+2 | B7: j4 ← φ(j3,j5); k4 ← φ(k3,k5)
After: B2: j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100 | B3: if j2<20 | B4: return j2 | B5: j3 ← 1; k3 ← k2+1 | B6: j5 ← k2; k5 ← k2+2 | B7: j4 ← φ(1,j5); k4 ← φ(k3,k5)

89 “Advanced Compiler Techniques” SSA Example: One Step Further
But block 6 is never executed! How can we find this out, and simplify the program? SSA conditional constant propagation finds the least fixed point for the program and allows further elimination of dead code.
CFG: B2 [j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100] B3 [if j2<20] B4 [return j2] B5 [k3 ← k2+1] B6 [j5 ← k2; k5 ← k2+2] B7 [j4 ← φ(1,j5); k4 ← φ(k3,k5)]
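
A minimal sketch of how conditional constant propagation discovers that block 6 is dead. It is a simplified round-robin variant, not the sparse worklist algorithm: it iterates over all blocks until nothing changes, tracks which CFG edges are executable, and lets a φ look only at operands arriving over executable edges. The `BLOCKS` encoding, the lattice markers `TOP`/`BOT`, the pseudo entry edge `("B1", "B2")` and all function names are assumptions made for this illustration.

```python
import operator

TOP, BOT = "TOP", "BOT"  # TOP: no evidence yet, BOT: known not to be constant

def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT

def evaluate(op, a, b):
    if a == BOT or b == BOT:
        return BOT
    if a == TOP or b == TOP:
        return TOP
    return op(a, b)

# Operands are int constants or SSA names; the copy j5 <- k2 is modeled as k2 + 0.
BLOCKS = {
    "B2": {"phis": {"j2": {"B7": "j4", "B1": 1}, "k2": {"B7": "k4", "B1": 0}},
           "branch": (operator.lt, "k2", 100, "B3", "B4")},
    "B3": {"branch": (operator.lt, "j2", 20, "B5", "B6")},
    "B5": {"body": [("k3", operator.add, "k2", 1)], "goto": "B7"},
    "B6": {"body": [("j5", operator.add, "k2", 0), ("k5", operator.add, "k2", 2)],
           "goto": "B7"},
    "B7": {"phis": {"j4": {"B5": 1, "B6": "j5"}, "k4": {"B5": "k3", "B6": "k5"}},
           "goto": "B2"},
    "B4": {},  # return j2
}

def sccp(blocks, entry_edge):
    values, edges = {}, {entry_edge}

    def val(x):
        return x if isinstance(x, int) else values.get(x, TOP)

    def assign(name, v):
        if values.get(name, TOP) == v:
            return False
        values[name] = v
        return True

    changed = True
    while changed:
        changed = False
        for name, blk in blocks.items():
            if all(dst != name for _, dst in edges):
                continue  # block not known to be reachable
            for target, srcs in blk.get("phis", {}).items():
                v = TOP
                for pred, operand in srcs.items():
                    if (pred, name) in edges:  # ignore non-executable edges
                        v = meet(v, val(operand))
                changed |= assign(target, v)
            for target, op, a, b in blk.get("body", []):
                changed |= assign(target, evaluate(op, val(a), val(b)))
            if "branch" in blk:
                op, a, b, if_true, if_false = blk["branch"]
                cond = evaluate(op, val(a), val(b))
                if cond == TOP:
                    succs = []
                elif cond == BOT:
                    succs = [if_true, if_false]
                else:
                    succs = [if_true if cond else if_false]
            else:
                succs = [blk["goto"]] if "goto" in blk else []
            for s in succs:
                if (name, s) not in edges:
                    edges.add((name, s))
                    changed = True
    return values, edges

VALUES, EDGES = sccp(BLOCKS, ("B1", "B2"))
REACHABLE = {dst for _, dst in EDGES}
```

The analysis starts optimistically: the edge B6 → B7 is assumed non-executable, so `j4 = φ(1, j5)` evaluates to 1, `j2` stays 1, and the test `j2 < 20` never enables the edge into B6. Ordinary constant propagation, which merges `j5` into `j4` unconditionally, cannot reach this conclusion.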

90 “Advanced Compiler Techniques” SSA Example: Dead Code Elimination
Before: B2 [j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100] B3 [if j2<20] B4 [return j2] B5 [k3 ← k2+1] B6 [j5 ← k2; k5 ← k2+2] B7 [j4 ← φ(1,j5); k4 ← φ(k3,k5)]
After (B3 and B6 removed): B2 [j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1] B7 [j4 ← φ(1); k4 ← φ(k3)]

91 “Advanced Compiler Techniques” SSA Example: Single Argument φ Function Elimination
Before: B2 [j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1] B7 [j4 ← φ(1); k4 ← φ(k3)]
After: B2 [j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1] B7 [j4 ← 1; k4 ← k3]

92 “Advanced Compiler Techniques” SSA Example: Constant and Copy Propagation
Before: B2 [j2 ← φ(j4,1); k2 ← φ(k4,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1] B7 [j4 ← 1; k4 ← k3]
After: B2 [j2 ← φ(1,1); k2 ← φ(k3,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1] B7 [j4 ← 1; k4 ← k3]

93 “Advanced Compiler Techniques” SSA Example: More Dead Code
Before: B2 [j2 ← φ(1,1); k2 ← φ(k3,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1] B7 [j4 ← 1; k4 ← k3]
After (B7 removed): B2 [j2 ← φ(1,1); k2 ← φ(k3,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1]

94 “Advanced Compiler Techniques” SSA Example: More φ Function Simplification
Before: B2 [j2 ← φ(1,1); k2 ← φ(k3,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1]
After: B2 [j2 ← 1; k2 ← φ(k3,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1]
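
Both φ simplifications in this example, the single-argument case φ(k3) → k3 and the identical-argument case φ(1,1) → 1, follow one rule: a φ whose operands are all the same value can be replaced by that value. A minimal sketch, with the function name `simplify_phi` and the list encoding of operands chosen for illustration; it also ignores operands that refer to the φ's own target, a common extension that the slides do not use.

```python
def simplify_phi(target, args):
    """Return the single value a trivial φ stands for, or None if it must stay."""
    distinct = set(args) - {target}  # a self-reference adds no new value
    if len(distinct) == 1:
        return distinct.pop()
    return None
```

For the example, `simplify_phi("j4", [1])` and `simplify_phi("j2", [1, 1])` both yield the constant 1, while `simplify_phi("k2", ["k3", 0])` returns `None`, so the φ for `k2` has to stay.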

95 “Advanced Compiler Techniques” SSA Example: More Constant Propagation
Before: B2 [j2 ← 1; k2 ← φ(k3,0); if k2<100] B4 [return j2] B5 [k3 ← k2+1]
After: B2 [j2 ← 1; k2 ← φ(k3,0); if k2<100] B4 [return 1] B5 [k3 ← k2+1]

96 “Advanced Compiler Techniques” SSA Example: Ultimate Dead Code Elimination
Before: B2 [j2 ← 1; k2 ← φ(k3,0); if k2<100] B4 [return 1] B5 [k3 ← k2+1]
After: B4 [return 1]

97 “Advanced Compiler Techniques” Advantages of SSA-based optimizations
• Dependency information is built in
  ◦ No separate phase is required to compute dependency information
• Transformed output preserves SSA form
  ◦ Little overhead to update dependencies
• Efficient algorithms, due to:
  ◦ Sparse occurrence of nodes
  ◦ Complexity dependent only on problem size (independent of program size)
  ◦ Linear data flow propagation along use-def edges
  ◦ Can customize treatment according to candidate
• Can re-apply algorithms as often as needed
• No separation of local optimizations from global optimizations

98 “Advanced Compiler Techniques” SSA in the Real World
• Invented at the end of the 1980s; a lot of research in the 1990s
• Used in many modern compilers
  ◦ LLVM
  ◦ GNU GCC 4
  ◦ ETH Oberon 2
  ◦ IBM Jikes Java VM
  ◦ Java Hotspot VM
  ◦ Mono
  ◦ Many more…

99 SSAPRE
Kennedy et al., “Partial redundancy elimination in SSA form”, ACM TOPLAS, 1999

100 “Advanced Compiler Techniques” SSAPRE: Motivation
• Traditional data flow analyses based on bit vectors do not interface well with the SSA representation

101 “Advanced Compiler Techniques” Traditional Partial Redundancy Elimination
[Figure: a three-block CFG (B1, B2, B3) before and after PRE]
Before PRE (SSA form): a ← x*y; b ← x*y
After PRE (not SSA form!): t ← x*y; a ← t; t ← x*y; b ← t
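
To make the effect of PRE concrete, here is a small sketch under an assumed CFG shape, since the figure's edges are not recoverable from the transcript: B1 and B2 are alternative predecessors of B3, `a ← x*y` sits in B1 and `b ← x*y` in B3. The function names and the `mults` counter are illustrative only.

```python
def before_pre(x, y, take_b1):
    mults = 0
    if take_b1:          # B1
        a = x * y
        mults += 1
    # B2 is empty on the other path
    b = x * y            # B3: redundant when control came through B1
    mults += 1
    return b, mults

def after_pre(x, y, take_b1):
    mults = 0
    if take_b1:          # B1
        t = x * y
        mults += 1
        a = t
    else:                # B2: computation inserted by PRE
        t = x * y
        mults += 1
    b = t                # B3: reuses the temporary
    return b, mults
```

After the transformation every path evaluates `x*y` exactly once, but `t` now has two static definitions, which is why the result is no longer in SSA form. Restoring SSA would mean renaming them to `t1` and `t2` and adding `t3 ← φ(t1, t2)` in B3.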

102 “Advanced Compiler Techniques” SSAPRE: Motivation (Cont.)
• Traditional PRE needs a postpass transform to turn the results into SSA form again.
• SSA is based on variables, not on expressions

103 “Advanced Compiler Techniques” SSAPRE: Motivation (Cont.)
• Representative occurrence
  ◦ Given a partially redundant occurrence E0, we define the representative occurrence for E0 as the nearest to E0 among all occurrences that dominate E0.
  ◦ Unique and well-defined

104 “Advanced Compiler Techniques” FRG (Factored Redundancy Graph)
[Figure: occurrences of E linked by redundancy edges and control flow edges, drawn once without factoring and once factored; in the factored graph the redundancy edges meet at E = φ(E, E, ⊥)]

105 “Advanced Compiler Techniques” FRG (Factored Redundancy Graph)
[Figure: the factored graph with E = φ(E, E, ⊥), next to the corresponding code for the temporary. Assume E = x*y: t0 ← x*y; t1 ← x*y; t2 ← φ(t0, t1, t3); the uses of E become ← t2]
Note: This is in SSA form

106 “Advanced Compiler Techniques” Observation
• Every use-def edge of the temporary corresponds directly to a redundancy edge for the expression.

107 “Advanced Compiler Techniques” Intuition for SSAPRE
• Construct the FRG for each expression E
• Refine the redundancy edges to form the use-def relation for each expression E’s temporary
• Transform the program
  ◦ For a definition, replace with t ← E
  ◦ For a use, replace with ← t
  ◦ Sometimes an expression also needs to be inserted

108 “Advanced Compiler Techniques” Main Steps
• Initialization: scan the whole program, collect all the occurrences, and build a worklist of expressions.
• Then, for each entry in the worklist:
  ◦ Phi placement: find where the Phi nodes of the FRG must be placed.
  ◦ Renaming: assign appropriate "redundancy classes" to all occurrences.
  ◦ Analyze: compute various flags in the Phi nodes. Conceptually this consists of two data flow analysis passes, but in practice they scan only the Phi nodes in the FRG, not the whole code, so they are not that heavy.
  ◦ Finalize: make the FRG exactly match the use-def graph of the introduced temporary.
• Code motion: actually update the code using the FRG.

109 “Advanced Compiler Techniques” An Example

110 Next Time
• Inter-procedural Analysis
  ◦ Reading: Dragon chapter 12

