Compiler Principles Winter 2012-2013 Compiler Principles Loop Optimizations and Register Allocation Mayer Goldberg and Roman Manevich Ben-Gurion University.


1 Compiler Principles Winter 2012-2013 Compiler Principles Loop Optimizations and Register Allocation Mayer Goldberg and Roman Manevich Ben-Gurion University

2 Today
Review (global) dataflow analysis
– Join semilattices
Monotone dataflow frameworks
– Termination
– Distributive transfer functions ⇒ join over all paths
Loop optimizations
– Introduce reaching definitions analysis
– Loop-invariant code motion
– (Strength reduction via induction variables)
Register allocation by graph coloring
– From liveness to register interference graph
– Heuristics for graph coloring

3 Liveness Analysis
A variable is live at a point in a program if later in the program its value will be read before it is written to again

4 Join semilattice definition
A join semilattice is a pair (V, ⊔), where V is a domain of elements and ⊔ is a join operator that is
– commutative: x ⊔ y = y ⊔ x
– associative: (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z)
– idempotent: x ⊔ x = x
If x ⊔ y = z, we say that z is the join (or least upper bound) of x and y
Every join semilattice has a bottom element, denoted ⊥, such that ⊥ ⊔ x = x for all x

5 Partial ordering induced by join
Every join semilattice (V, ⊔) induces an ordering relation ⊑ over its elements
Define x ⊑ y iff x ⊔ y = y
Need to prove
– Reflexivity: x ⊑ x
– Antisymmetry: if x ⊑ y and y ⊑ x, then x = y
– Transitivity: if x ⊑ y and y ⊑ z, then x ⊑ z
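The three proof obligations follow directly from the semilattice laws. A short derivation, writing ⊔ for the join and ⊑ for the induced order (x ⊑ y iff x ⊔ y = y):

```latex
\begin{align*}
\textbf{Reflexivity:}\quad
  & x \sqcup x = x \ \text{(idempotence)}
    \;\Rightarrow\; x \sqsubseteq x \\
\textbf{Antisymmetry:}\quad
  & x \sqsubseteq y,\ y \sqsubseteq x
    \;\Rightarrow\; x = y \sqcup x = x \sqcup y = y
    \ \text{(commutativity)} \\
\textbf{Transitivity:}\quad
  & x \sqsubseteq y,\ y \sqsubseteq z
    \;\Rightarrow\; x \sqcup z = x \sqcup (y \sqcup z)
      = (x \sqcup y) \sqcup z = y \sqcup z = z
    \;\Rightarrow\; x \sqsubseteq z
\end{align*}
```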

6 A join semilattice for liveness
Sets of live variables and the set union operation
Idempotent:
– x ∪ x = x
Commutative:
– x ∪ y = y ∪ x
Associative:
– (x ∪ y) ∪ z = x ∪ (y ∪ z)
Bottom element:
– The empty set: Ø ∪ x = x
Ordering over elements = subset relation
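The laws above can be checked by brute force on a small domain. The sketch below is only an illustration: the helper names `powerset` and `is_join_semilattice` are made up here, and the three variables a, b, c are an assumed example universe.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_join_semilattice(values, join, bottom):
    """Brute-force check of the semilattice laws on a finite domain."""
    for x in values:
        if join(x, x) != x:               # idempotent
            return False
        if join(bottom, x) != x:          # bottom is the identity of join
            return False
        for y in values:
            if join(x, y) != join(y, x):  # commutative
                return False
            for z in values:              # associative
                if join(join(x, y), z) != join(x, join(y, z)):
                    return False
    return True

V = powerset({"a", "b", "c"})
union = lambda x, y: x | y

print(is_join_semilattice(V, union, frozenset()))               # True
# Set difference is not idempotent, so it is not a join operator.
print(is_join_semilattice(V, lambda x, y: x - y, frozenset()))  # False
```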

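7 Join semilattice example for liveness
[Figure: Hasse diagram of the subsets of {a, b, c} ordered by inclusion: {} at the bottom (the bottom element), then {a}, {b}, {c}, then {a, b}, {a, c}, {b, c}, and {a, b, c} at the top]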

8 Dataflow framework
A global analysis is a tuple (D, V, ⊔, F, I), where
– D is a direction (forward or backward)
  The order to visit statements within a basic block, NOT the order in which to visit the basic blocks
– V is a set of values (sometimes called the domain)
– ⊔ is a join operator over those values
– F is a set of transfer functions f_s : V → V (one for every statement s)
– I is an initial value

9 Running global analyses
Assume that (D, V, ⊔, F, I) is a forward analysis
For every statement s maintain values before (IN[s]) and after (OUT[s])
Set OUT[s] = ⊥ for all statements s
Set OUT[entry] = I
Repeat until no values change:
– For each statement s with predecessors PRED[s] = {p_1, p_2, …, p_n}
  Set IN[s] = OUT[p_1] ⊔ OUT[p_2] ⊔ … ⊔ OUT[p_n]
  Set OUT[s] = f_s(IN[s])
The order of this iteration does not matter
– Chaotic iteration
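The iteration scheme above can be sketched as a small generic solver. This is a minimal illustration, not the course's implementation: the function name `solve_forward`, its parameters, the `max_rounds` safety cap, and the toy "variables assigned so far" analysis used to exercise it are all assumptions made for this example.

```python
def solve_forward(nodes, preds, transfer, join, bottom, entry, init,
                  max_rounds=1000):
    """Iterate IN/OUT for a forward analysis until nothing changes."""
    out = {n: bottom for n in nodes}
    inn = {n: bottom for n in nodes}
    out[entry] = init
    for _ in range(max_rounds):
        changed = False
        for n in nodes:
            if n == entry:
                continue
            acc = bottom
            for p in preds[n]:            # IN[s] = join of OUT[p] over predecessors
                acc = join(acc, out[p])
            new_out = transfer[n](acc)    # OUT[s] = f_s(IN[s])
            if acc != inn[n] or new_out != out[n]:
                inn[n], out[n] = acc, new_out
                changed = True
        if not changed:
            return inn, out
    raise RuntimeError("no fixed point within max_rounds")

# Toy analysis (hypothetical): which variables may have been assigned so far.
# CFG: entry -> s1 -> s2 -> s1 (loop), s1 -> exit
nodes = ["entry", "s1", "s2", "exit"]
preds = {"s1": ["entry", "s2"], "s2": ["s1"], "exit": ["s1"]}
transfer = {
    "s1": lambda v: v | {"x"},   # s1 assigns x
    "s2": lambda v: v | {"y"},   # s2 assigns y
    "exit": lambda v: v,
}
inn, out = solve_forward(nodes, preds, transfer,
                         join=lambda a, b: a | b, bottom=frozenset(),
                         entry="entry", init=frozenset())
print(sorted(inn["exit"]))  # ['x', 'y']
```

The cap on rounds is only there so that a non-terminating analysis fails loudly instead of looping forever.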

10 Proving termination
Our algorithm for running these analyses loops until no changes are detected
Problem: how do we know the analyses will eventually terminate?

11 A non-terminating analysis
The following analysis will loop infinitely on any CFG containing a loop:
Direction: Forward
Domain: ℕ
Join operator: max
Transfer function: f(n) = n + 1
Initial value: 0
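A tiny simulation makes the problem visible. It assumes a single statement whose predecessors are the entry and the statement itself (a back edge); the function name and the round cap are made up for this sketch.

```python
def iterate_max_plus_one(max_rounds):
    """Run the (N, max, f(n) = n + 1) analysis on a one-statement loop."""
    out_entry = 0      # initial value I = 0
    out_loop = 0       # bottom of (N, max) is 0
    history = []
    for _ in range(max_rounds):
        in_loop = max(out_entry, out_loop)   # join over entry and the back edge
        new_out = in_loop + 1                # transfer function f(n) = n + 1
        if new_out == out_loop:              # fixed point (never happens here)
            break
        out_loop = new_out
        history.append(out_loop)
    return history

print(iterate_max_plus_one(5))  # [1, 2, 3, 4, 5]
```

Without the cap the value would keep growing, because each round feeds the previous output back in through the join.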

12–25 A non-terminating analysis (animation)
[Figure sequence: a CFG with start, a block x = y that lies on a loop, and end. After initialization every value is 0. In each fixed-point iteration a block is chosen, its predecessors' values are joined with max, and f(n) = n + 1 is applied. Over iterations 1 to 3 the value entering x = y goes 0, 1, 2 and the value leaving it goes 1, 2, 3, and it keeps growing.]

26 Why doesn’t this terminate?
Values can increase without bound
Note that “increase” refers to the lattice ordering, not the ordering on the natural numbers
The height of a semilattice is the length of the longest increasing sequence in that semilattice
The dataflow framework is not guaranteed to terminate for semilattices of infinite height
Note that a semilattice can be infinitely large but have finite height
– e.g. constant propagation
[Figure: the infinite chain 0, 1, 2, 3, 4, ...]

27 Height of a lattice
An increasing chain is a sequence of elements ⊥ ⊏ a_1 ⊏ a_2 ⊏ … ⊏ a_k
– The length of such a chain is k
The height of a lattice is the length of the longest increasing chain
For liveness with n program variables:
– {} ⊂ {v_1} ⊂ {v_1, v_2} ⊂ … ⊂ {v_1, …, v_n}
For available expressions it is the number of expressions of the form a = b op c
– For n program variables and m operator types: m × n³

28 Another non-terminating analysis
This analysis works on a finite-height semilattice, but will not terminate on certain CFGs:
Direction: Forward
Domain: Boolean values true and false
Join operator: Logical OR
Transfer function: Logical NOT
Initial value: false
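The same kind of simulation shows the oscillation. As before, the one-statement loop, the function name, and the round cap are assumptions of this sketch.

```python
def iterate_or_not(max_rounds):
    """Run the ({false, true}, OR, NOT) analysis on a one-statement loop."""
    out_entry = False   # initial value I = false
    out_loop = False    # bottom of the Boolean lattice under OR
    history = []
    for _ in range(max_rounds):
        in_loop = out_entry or out_loop   # join: logical OR
        new_out = not in_loop             # transfer function: logical NOT
        if new_out == out_loop:           # fixed point (never reached here)
            break
        out_loop = new_out
        history.append(out_loop)
    return history

print(iterate_or_not(6))  # [True, False, True, False, True, False]
```

The lattice has only two elements, so the failure here comes from the transfer function, not from the height.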

29–38 A non-terminating analysis (animation)
[Figure sequence: the same CFG (start, a block x = y on a loop, end), initialized to false. Over iterations 1 to 3 the value after x = y flips between true and false from one iteration to the next, so it never stabilizes.]

39–40 Why doesn’t it terminate?
Values can loop indefinitely
Intuitively, the join operator keeps pulling values up
If the transfer function can keep pushing values back down again, then the values might cycle forever
[Figure: the cycle false, true, false, true, false, ...]
How can we fix this?

41 Monotone transfer functions
A transfer function f is monotone iff x ⊑ y implies f(x) ⊑ f(y)
Intuitively, if you know less information about a program point, you can't “gain back” more information about that program point
Many transfer functions are monotone, including those for liveness and constant propagation
Note: Monotonicity does not mean that x ⊑ f(x)
– (This is a different property called extensivity)

42 Liveness and monotonicity
A transfer function f is monotone iff x ⊑ y implies f(x) ⊑ f(y)
Recall our transfer function for a = b + c is
– f_{a = b + c}(V) = (V – {a}) ∪ {b, c}
Recall that our join operator is set union and induces an ordering relationship X ⊑ Y iff X ⊆ Y
Is this monotone?
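One way to convince yourself is an exhaustive check over a small universe. The sketch assumes four variables a, b, c, d; the helper names are hypothetical, and the complement function is included only as a contrasting non-monotone example.

```python
from itertools import combinations

UNIVERSE = frozenset({"a", "b", "c", "d"})

def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def f_assign(live):
    """Liveness transfer function for a = b + c: (V - {a}) | {b, c}."""
    return (live - {"a"}) | {"b", "c"}

def complement(live):
    """A deliberately non-monotone function, for contrast."""
    return UNIVERSE - live

def is_monotone(fn, values):
    """x <= y (subset) must imply fn(x) <= fn(y)."""
    return all(fn(x) <= fn(y)
               for x in values for y in values if x <= y)

V = subsets(UNIVERSE)
print(is_monotone(f_assign, V))    # True
print(is_monotone(complement, V))  # False
```

The general argument is the same for every gen/kill function: removing a fixed set and adding a fixed set both preserve the subset relation.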

43 Is constant propagation monotone?
A transfer function f is monotone iff x ⊑ y implies f(x) ⊑ f(y)
Recall our transfer functions
– f_{x=k}(V) = V[x ↦ k] (update V by mapping x to k)
– f_{x=a+b}(V) = V[x ↦ Not-a-Constant] (assign Not-a-Constant)
Is this monotone?
[Figure: the constant propagation lattice for one variable, with Undefined at the bottom, the individual integer constants in the middle, and Not-a-constant at the top]

44 The grand result
Theorem: A dataflow analysis with a finite-height semilattice and a family of monotone transfer functions always terminates
Proof sketch:
– The join operator can only bring values up
– Transfer functions can never lower values back down below where they were in the past (monotonicity)
– Values cannot increase indefinitely (finite height)

45–46 An “optimality” result
A transfer function f is distributive if f(a ⊔ b) = f(a) ⊔ f(b) for all domain elements a and b
If all transfer functions are distributive then the fixed-point solution is equal to the solution computed by joining results from all (potentially infinitely many) control-flow paths
– Join over all paths
Optimal if we ignore program conditions
– That is, if we pretend all control-flow paths can be executed by the program
Which analyses use distributive functions?
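As a partial answer, gen/kill transfer functions over set union (liveness, reaching definitions) are distributive, while constant propagation is not once the transfer function actually folds constants. The sketch below checks both. Note the assumption: `assign_sum` folds x = a + b when both operands are known constants, which is more precise than the simplified "assign Not-a-Constant" function used earlier in these slides; all names are made up for the example.

```python
from itertools import combinations

UNDEF, NAC = "undef", "nac"   # bottom and top of the per-variable lattice

def join_val(u, v):
    if u == UNDEF:
        return v
    if v == UNDEF:
        return u
    return u if u == v else NAC

def join_env(e1, e2):
    return {k: join_val(e1[k], e2[k]) for k in e1}

def assign_sum(env):
    """Transfer for x = a + b that folds the sum of two known constants."""
    a, b = env["a"], env["b"]
    if NAC in (a, b):
        x = NAC
    elif UNDEF in (a, b):
        x = UNDEF
    else:
        x = a + b
    return {**env, "x": x}

path1 = {"a": 1, "b": 2, "x": UNDEF}   # one path sets a = 1, b = 2
path2 = {"a": 2, "b": 1, "x": UNDEF}   # another sets a = 2, b = 1

join_after = join_env(assign_sum(path1), assign_sum(path2))  # f(p1) join f(p2)
after_join = assign_sum(join_env(path1, path2))              # f(p1 join p2)
print(join_after["x"], after_join["x"])  # 3 nac

def gen_kill(rd):
    """A reaching-definitions style function: kill d1, generate d4."""
    return (rd - {"d1"}) | {"d4"}

defs = ["d1", "d2", "d3", "d4"]
V = [frozenset(c) for r in range(len(defs) + 1) for c in combinations(defs, r)]
print(all(gen_kill(x | y) == gen_kill(x) | gen_kill(y)
          for x in V for y in V))        # True
```

Joining first loses the fact that a + b is 3 on both paths, which is exactly the gap between the fixed-point solution and join over all paths.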

47 Loop optimizations
Most of a program’s computations are done inside loops
– Focus optimization effort on loops
The optimizations we’ve seen so far are independent of the control structure
Some optimizations are specialized to loops
– Loop-invariant code motion
– (Strength reduction via induction variables)
Require another type of analysis to find out where expressions get their values from
– Reaching definitions
(Also useful for improving register allocation)

48–49 Loop invariant computation
[Figure: CFG. start → block (y = …; t = …; z = …) → loop header (y = t * 4; test x < y + z). The loop header branches either to x = x + 1, which jumps back to the header, or to end.]
t*4 and y+z have the same value on each iteration

50 Code hoisting
[Figure: the same CFG after hoisting. The block before the loop is now y = …; t = …; z = …; y = t * 4; w = y + z, and the loop header only tests x < w.]

51 What reasoning did we use?
[Figure: the CFG of slide 48, annotated as follows]
Both t and z are defined only outside of the loop
y is defined inside the loop, but it is loop-invariant since t*4 is loop-invariant
Constants are trivially loop-invariant

52 What about now?
[Figure: the CFG of slide 48 with t = t + 1 added to the loop body next to x = x + 1]
Now t is not loop-invariant, and neither are t*4 and y

53 Loop-invariant code motion
d: t = a_1 op a_2
– d is a program location
a_1 op a_2 is loop-invariant (for a loop L) if it computes the same value in each iteration
– Hard to know in general
Conservative approximation
– Each a_i is a constant, or
– All definitions of a_i that reach d are outside L, or
– Only one definition of a_i reaches d, and it is loop-invariant itself
Transformation: hoist the loop-invariant code outside of the loop

54–55 Reaching definitions analysis
A definition d: t = … reaches a program location if there is a path from the definition to the program location along which the defined variable is never redefined
Direction: Forward
Domain: sets of program locations that are definitions
Join operator: union
Transfer function:
– f_{d: a = b op c}(RD) = (RD – defs(a)) ∪ {d}
– f_{d: not-a-def}(RD) = RD
– Where defs(a) is the set of locations defining a (statements of the form a = ...)
Initial value: {}
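The analysis can be run on the loop example from this lecture with a few lines of code. The CFG encoding below (the dictionaries `stmts` and `preds`, the label `cond` for the test x < y + z, and the function name) is an assumption made for this sketch.

```python
# Each statement maps to the variable it defines (None if it defines nothing).
stmts = {
    "d1": "y",     # d1: y = ...
    "d2": "t",     # d2: t = ...
    "d3": "z",     # d3: z = ...
    "d4": "y",     # d4: y = t * 4   (loop header)
    "cond": None,  # x < y + z
    "d5": "x",     # d5: x = x + 1
    "end": None,
}
preds = {
    "d1": ["start"], "d2": ["d1"], "d3": ["d2"],
    "d4": ["d3", "d5"],            # entry from d3, back edge from d5
    "cond": ["d4"], "d5": ["cond"], "end": ["cond"],
}

def reaching_definitions(stmts, preds):
    # defs[a] = set of locations that define variable a
    defs = {}
    for label, var in stmts.items():
        if var is not None:
            defs.setdefault(var, set()).add(label)
    IN = {s: set() for s in stmts}
    OUT = {s: set() for s in stmts}
    OUT["start"] = set()           # initial value {}
    changed = True
    while changed:
        changed = False
        for s, var in stmts.items():
            new_in = set().union(*(OUT[p] for p in preds[s]))   # join = union
            if var is None:
                new_out = set(new_in)                  # not a definition
            else:
                new_out = (new_in - defs[var]) | {s}   # (RD - defs(a)) | {d}
            if new_in != IN[s] or new_out != OUT[s]:
                IN[s], OUT[s] = new_in, new_out
                changed = True
    return IN, OUT

IN, OUT = reaching_definitions(stmts, preds)
print(sorted(IN["d4"]))   # ['d1', 'd2', 'd3', 'd4', 'd5']
print(sorted(OUT["d4"]))  # ['d2', 'd3', 'd4', 'd5']
```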

56–71 Reaching definitions analysis (worked example)
[Figure sequence: the loop CFG with labeled definitions: start → d1: y = … → d2: t = … → d3: z = … → d4: y = t * 4 → test x < y + z → d5: x = x + 1 → back to d4; the test also exits to end. All sets start as {}, and the slides step through six iterations until no set changes.]
Final result:
– after d1: {d1}
– after d2: {d1, d2}
– after d3: {d1, d2, d3}
– before d4: {d1, d2, d3, d4, d5}
– after d4 and after the test x < y + z: {d2, d3, d4, d5}
– after d5: {d2, d3, d4, d5}
– at end: {d2, d3, d4, d5}

72 Which expressions are loop invariant
[Figure: the CFG annotated with the final reaching-definitions sets]
t is defined only in d2 – outside of the loop
z is defined only in d3 – outside of the loop
The only definition of y that reaches the test x < y + z is d4 – inside of the loop, but it depends on t and 4, both loop-invariant
x is defined only in d5 – inside of the loop, so it is not loop-invariant

73 Inferring loop-invariant expressions
For a statement s of the form t = a_1 op a_2
A variable a_i is immediately loop-invariant if all reaching definitions of a_i in IN[s] = {d_1, …, d_k} are outside of the loop
LOOP-INV = immediately loop-invariant variables and constants
LOOP-INV = LOOP-INV ∪ {x | d: x = a_1 op a_2, d is in the loop, and both a_1 and a_2 are in LOOP-INV}
– Iterate until fixed-point
An expression is loop-invariant if all of its operands are loop-invariant
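The two-step rule can be sketched on the running example. The sketch hardcodes the reaching-definitions sets computed earlier in the lecture; the data layout (`loop`, `def_var`, `IN`) and the function name are assumptions. It also treats the constant 1 in x = x + 1 as a constant, so 1 shows up in the result alongside 4.

```python
# Statements inside the loop: label -> (defined variable or None, operands)
loop = {
    "d4": ("y", ["t", 4]),            # y = t * 4
    "cond": (None, ["x", "y", "z"]),  # x < y + z
    "d5": ("x", ["x", 1]),            # x = x + 1
}
# Variable defined at each definition site
def_var = {"d1": "y", "d2": "t", "d3": "z", "d4": "y", "d5": "x"}
# Reaching definitions on entry to each loop statement (from the analysis)
IN = {
    "d4": {"d1", "d2", "d3", "d4", "d5"},
    "cond": {"d2", "d3", "d4", "d5"},
    "d5": {"d2", "d3", "d4", "d5"},
}

def loop_invariants(loop, def_var, IN):
    inv = set()
    # Step 1: constants, and variables whose reaching definitions at the use
    # are all outside the loop (variables with no reaching definition are
    # conservatively left out).
    for label, (_, operands) in loop.items():
        for a in operands:
            if not isinstance(a, str):
                inv.add(a)                      # constant
                continue
            reaching = {d for d in IN[label] if def_var[d] == a}
            if reaching and all(d not in loop for d in reaching):
                inv.add(a)
    # Step 2: close under "defined in the loop from loop-invariant operands".
    changed = True
    while changed:
        changed = False
        for label, (target, operands) in loop.items():
            if (target is not None and target not in inv
                    and all(a in inv for a in operands)):
                inv.add(target)
                changed = True
    return inv

inv = loop_invariants(loop, def_var, IN)
print(sorted(inv, key=str))  # [1, 4, 't', 'y', 'z']
```

x stays out of the set because its only reaching definition, d5, is inside the loop and uses x itself.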

74–80 Computing LOOP-INV
[Figure sequence: the CFG annotated with the reaching-definitions sets, with LOOP-INV built up step by step]
(immediately) LOOP-INV = {t}
(immediately) LOOP-INV = {t, z}
LOOP-INV = {t, z, 4}
LOOP-INV = {t, z, 4, y}

81 Induction variables
while (i < x) {
  j = a + 4 * i
  a[j] = j
  i = i + 1
}
i is incremented by a loop-invariant expression on each iteration – this is called an induction variable
j is a linear function of the induction variable with multiplier 4

82 Strength reduction
j = a + 4 * i        (prepare initial value)
while (i < x) {
  a[j] = j
  j = j + 4          (increment by multiplier, after j has been used)
  i = i + 1
}
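A quick way to check a strength-reduced loop is to run it next to the original and compare the stores. In this sketch the scalar base address is called `base` and memory is modeled as a dictionary `mem`; both names are assumptions, since the slide uses `a` for both roles. The increment is placed after the use of j so that the first iteration sees the prepared initial value.

```python
def original(base, i, x):
    """The loop with the multiplication recomputed on every iteration."""
    mem = {}
    while i < x:
        j = base + 4 * i
        mem[j] = j
        i = i + 1
    return mem

def strength_reduced(base, i, x):
    """The same loop with the multiplication replaced by an addition."""
    mem = {}
    j = base + 4 * i      # prepare initial value
    while i < x:
        mem[j] = j
        j = j + 4         # increment by the multiplier
        i = i + 1
    return mem

print(original(100, 0, 3) == strength_reduced(100, 0, 3))  # True
print(sorted(strength_reduced(100, 0, 3)))                 # [100, 104, 108]
```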

83 Summary of optimizations
Analysis and the optimizations it enables:
Available Expressions
– Common-subexpression elimination
– Copy propagation
Constant Propagation
– Constant folding
Live Variables
– Dead code elimination
Reaching Definitions
– Loop-invariant code motion

84 Global Register Allocation

87 Registers
Most machines have a set of registers, dedicated memory locations that
– can be accessed quickly,
– can have computations performed on them, and
– exist in small quantity
Using registers intelligently is a critical step in any compiler
– A good register allocator can generate code orders of magnitude better than a bad register allocator

88 Register allocation
In TAC, there is an unlimited number of variables
On a physical machine there is a small number of registers:
– x86 has four general-purpose registers and a number of specialized registers
– MIPS has twenty-four general-purpose registers and eight special-purpose registers
Register allocation is the process of assigning variables to registers and managing data transfer in and out of registers

89 Challenges in register allocation
Registers are scarce
– Often substantially more IR variables than registers
– Need to find a way to reuse registers whenever possible
Registers are complicated
– x86: Each register is made of several smaller registers; can't use a register and its constituent registers at the same time
– x86: Certain instructions must store their results in specific registers; can't store values there if you want to use those instructions
– MIPS: Some registers are reserved for the assembler or operating system
– Most architectures: Some registers must be preserved across function calls

90 Simple approach
Straightforward solution:
– Allocate each variable in the activation record
– At each instruction, bring the values needed into registers, perform the operation, then store the result to memory
x = y + z
  mov 16(%ebp), %eax
  mov 20(%ebp), %ebx
  add %ebx, %eax
  mov %eax, 24(%ebp)
Problem: program execution is very inefficient – moving data back and forth between memory and registers

91 Find a register allocation
b = a + 2
c = b * b
b = c + 1
return b * a
Available registers: eax, ebx
register | variable
? | a
? | b
? | c

92 Is this a valid allocation?
register | variable
eax | a
ebx | b
eax | c
ebx = eax + 2
eax = ebx * ebx
ebx = eax + 1
return ebx * eax
Overwrites the previous value of ‘a’, which is also stored in eax

93 Is this a valid allocation?
register | variable
ebx | a
eax | b, c
eax = ebx + 2
eax = eax * eax
eax = eax + 1
return eax * ebx
The value of ‘c’ stored in eax is not needed anymore, so reuse eax for ‘b’

94 Main idea
For every node n in the CFG, we have out[n]
– Set of temporaries live out of n
Two variables interfere if they appear in the same out[n] of any node n
– Cannot be allocated to the same register
Conversely, if two variables do not interfere with each other, they can be assigned the same register
– We say they have disjoint live ranges
How to assign registers to variables?

95 Interference graph
Nodes of the graph = variables
Edges connect variables that interfere with one another
Nodes will be assigned a color corresponding to the register assigned to the variable
Two nodes connected by an edge can’t have the same color

96–102 Interference graph construction
Live variables are computed backward through the code (each set is the set of variables live at that point):
{a}
b = a + 2
{b, a}
c = b * b
{a, c}
b = c + 1
{b, a}
return b * a
[Figure: the resulting interference graph has nodes a, b, c and edges a–b and a–c; it is colored with two colors, one per register (eax, ebx), with b and c sharing a color]
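The construction can be sketched for straight-line code: one backward pass computes the live-out sets, and any two variables that share a live-out set get an edge. The tuple encoding of the statements and the function names are assumptions for this example.

```python
# Each statement: (defined variable or None, list of used variables)
code = [
    ("b", ["a"]),         # b = a + 2
    ("c", ["b"]),         # c = b * b
    ("b", ["c"]),         # b = c + 1
    (None, ["b", "a"]),   # return b * a
]

def live_out_sets(code):
    """Backward liveness over straight-line code."""
    live = set()
    outs = [None] * len(code)
    for idx in range(len(code) - 1, -1, -1):
        target, uses = code[idx]
        outs[idx] = set(live)                   # live after this statement
        live = (live - {target}) | set(uses)    # live before this statement
    return outs, live                           # live = live-in of the code

def interference_graph(outs):
    """Variables interfere if they appear together in some out[n]."""
    graph = {}
    for out in outs:
        for v in out:
            graph.setdefault(v, set())
        for u in out:
            for v in out:
                if u != v:
                    graph[u].add(v)
    return graph

outs, live_in = live_out_sets(code)
graph = interference_graph(outs)
print(sorted(live_in))                          # ['a']
print({v: sorted(graph[v]) for v in sorted(graph)})
# {'a': ['b', 'c'], 'b': ['a'], 'c': ['a']}
```

b and c never appear in the same set, which is why they can share a register.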

103 Graph coloring
This problem is equivalent to graph coloring, which is NP-hard if there are at least three registers
No good polynomial-time algorithms (or even good approximations!) are known for this problem
We have to be content with a heuristic that is good enough for the register interference graphs (RIGs) that arise in practice

104 Coloring by simplification [Kempe 1879]
How to find a k-coloring of a graph
Intuition:
– Suppose we are trying to k-color a graph and find a node with fewer than k edges
– If we delete this node from the graph and color what remains, we can find a color for this node if we add it back in
– Reason: fewer than k neighbors ⇒ some color must be left over

105 Coloring by simplification [Kempe 1879]
How to find a k-coloring of a graph
Phase 1: Simplification
– Repeatedly simplify the graph by removing a node with fewer than k neighbors
– When a variable (i.e., graph node) is removed, push it on a stack
Phase 2: Coloring
– Unwind the stack and reconstruct the graph as follows:
– Pop a variable from the stack
– Add it back to the graph
– Color the node for that variable with a color that none of its neighbors uses
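The two phases can be sketched in a few lines. The function name, the adjacency-set representation, and the use of integers 0..k-1 as colors are assumptions; the sketch simply gives up (returns None) when every remaining node has at least k neighbors.

```python
def color_by_simplification(graph, k):
    """Return {node: color} with colors 0..k-1, or None if simplification gets stuck."""
    work = {n: set(adj) for n, adj in graph.items()}   # working copy
    stack = []
    # Phase 1: repeatedly remove a node with fewer than k neighbors.
    while work:
        candidate = next((n for n in sorted(work) if len(work[n]) < k), None)
        if candidate is None:
            return None            # every remaining node has >= k neighbors
        for m in work[candidate]:
            work[m].discard(candidate)
        del work[candidate]
        stack.append(candidate)
    # Phase 2: pop nodes and give each a color unused by its colored neighbors.
    coloring = {}
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        coloring[n] = next(c for c in range(k) if c not in used)
    return coloring

# Interference graph of the a/b/c example: a interferes with b and with c.
graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
coloring = color_by_simplification(graph, 2)
print(coloring["b"] == coloring["c"], coloring["a"] != coloring["b"])  # True True

triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(color_by_simplification(triangle, 2))  # None
```

A free color always exists in phase 2 because a node's already-colored neighbors are exactly the fewer-than-k neighbors it had when it was removed.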

106–116 Coloring k=2 (animation)
[Figure sequence: an interference graph with nodes a, b, c, d, e is colored with two registers (eax, ebx). Simplification pushes the nodes on the stack in the order c, e, a, b, d; coloring then pops them in the order d, b, a, e, c and assigns each a register.]

117 Failure of heuristic
If the graph cannot be colored, it will eventually be simplified to a graph in which every node has at least K neighbors
Sometimes, the graph is still K-colorable!
Finding a K-coloring in all situations is an NP-complete problem
– We will have to approximate to make register allocators fast enough

118–122 Coloring k=2 (animation)
Some graphs can’t be colored in K colors:
[Figure sequence: a different interference graph over a, b, c, d, e with two registers (eax, ebx). The stack holds c, b, e, a, d from top to bottom; c and b are popped and colored, but when e is popped there are no colors left for e.]

123 Chaitin’s algorithm
Choose and remove an arbitrary node, marking it “troublesome”
– Use heuristics to choose which one
– When adding the node back in, it may be possible to find a valid color
– Otherwise, we have to spill that node

124 Spilling
Phase 3: spilling
– Once all nodes have K or more neighbors, pick a node for spilling
– There are many heuristics that can be used to pick a node
– Try to pick a node that is not used much and not in an inner loop
– Storage in the activation record
– Remove it from the graph
We can now repeat phases 1-2 without this node
Better approach – rewrite the code to spill the variable, recompute liveness information and try to color again
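A sketch of simplification with spill candidates: when every remaining node has K or more neighbors, one node is removed anyway and marked as troublesome by being pushed like any other; it is only spilled if no color is free when it comes back. The highest-degree choice below is an assumed heuristic for the sketch (real allocators also weigh use counts and loop depth), and the function name is made up.

```python
def color_with_spills(graph, k):
    """Return (coloring, spilled) using simplification plus spill candidates."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        candidate = next((n for n in sorted(work) if len(work[n]) < k), None)
        if candidate is None:
            # Stuck: pick a "troublesome" node (assumed heuristic: max degree).
            candidate = max(sorted(work), key=lambda n: len(work[n]))
        for m in work[candidate]:
            work[m].discard(candidate)
        del work[candidate]
        stack.append(candidate)
    coloring, spilled = {}, []
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[n] = free[0]   # a color was left over after all
        else:
            spilled.append(n)       # no color: the node lives in memory
    return coloring, spilled

# A 4-cycle: every node has 2 neighbors, yet it is 2-colorable.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(color_with_spills(cycle, 2)[1])     # []

# A triangle really needs 3 colors, so one node is spilled with k = 2.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(color_with_spills(triangle, 2)[1])  # ['a']
```

After a real spill the code would be rewritten with loads and stores and the whole process repeated, as the slide describes.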

125–130 Coloring k=2 (animation)
Some graphs can’t be colored in K colors:
[Figure sequence: the previous example again, with two registers (eax, ebx). The frames show the stack contents (top first) as e a d with no colors left for e, then b e a d, e a d, a d, d, and finally the empty stack.]

131 Handling precolored nodes
Some variables are pre-assigned to registers
– E.g., mul on x86/Pentium uses eax; defines eax, edx
– E.g., call on x86/Pentium defines (trashes) the caller-save registers eax, ecx, edx
To properly allocate registers, treat these register uses as special temporary variables and enter them into the interference graph as precolored nodes

132 Handling precolored nodes
Simplify. Never remove a precolored node – it already has a color, i.e., it is a given register
Coloring. Once the simplified graph consists only of colored nodes, add the other nodes back in and color them using the precolored nodes as a starting point

133 Optimizing move instructions
Code generation produces a lot of extra mov instructions
  mov t5, t9
If we can assign t5 and t9 to the same register, we can get rid of the mov
– effectively, copy elimination at the register allocation level
Idea: if t5 and t9 are not connected in the interference graph, coalesce them into a single variable; the move will be redundant
Problem: coalescing nodes can make a graph un-colorable
– Conservative coalescing heuristic
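The slide does not say which conservative heuristic is meant. One commonly used test, usually attributed to Briggs, allows a merge only if the combined node would have fewer than K neighbors of significant degree (degree at least K). The sketch below implements that test under this assumption; the function name and the example graphs are made up.

```python
def can_coalesce(graph, a, b, k):
    """Briggs-style conservative test for merging move-related nodes a and b."""
    if b in graph[a]:
        return False                  # interfering nodes can never be merged
    neighbors = (graph[a] | graph[b]) - {a, b}

    def degree_after_merge(n):
        degree = len(graph[n])
        if a in graph[n] and b in graph[n]:
            degree -= 1               # two edges collapse into one
        return degree

    significant = sum(1 for n in neighbors if degree_after_merge(n) >= k)
    return significant < k

# t5 and t9 each have one low-degree neighbor: merging is safe for k = 2.
safe = {"t5": {"x"}, "t9": {"y"}, "x": {"t5"}, "y": {"t9"}}
print(can_coalesce(safe, "t5", "t9", 2))   # True

# Path t5 - x - y - t9: merging t5 and t9 would create a triangle.
risky = {"t5": {"x"}, "t9": {"y"}, "x": {"t5", "y"}, "y": {"t9", "x"}}
print(can_coalesce(risky, "t5", "t9", 2))  # False
```

The test may reject merges that would have been harmless, which is the price of never turning a colorable graph into an uncolorable one.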

134 Summary of material 1/2
Compiler task and techniques:
Scanning
– Regular expressions
– Finite automata (DFA/NFA)
– Determinization via subset construction
– Maximal munch and precedences
– Automatic scanner generation tools (JFlex)
Parsing
– Context-free grammars
– Leftmost/rightmost derivations, parse trees
– Ambiguity / ambiguity elimination tactics
– LL parsing: building prediction tables (FIRST/FOLLOW), conflicts, left-recursion elimination, recursive descent, automata-based parsing
– Shift-reduce parsing: LR items, transition relation construction, conflicts, SLR, LALR, resolving ambiguity via precedence, automatic parser generation tools (CUP)

135 Summary of material 2/2
Compiler task and techniques:
Lowering to IR
– Three-Address Code and recursive lowering
– Sethi-Ullman translation minimizing the number of temporaries
Optimizations
– Basic blocks, control flow graphs
– Local analysis: transfer functions
– Local analysis vs. global analysis
– Dataflow analysis: join semilattices, partial orderings, monotone transfer functions
– Available expressions, liveness, constant propagation, reaching definitions
– Common-subexpression elimination, copy propagation, constant folding, loop-invariant code motion
Register allocation
– Naïve allocation
– Register interference graph – isomorphism to graph coloring
– Graph coloring by simplification
– Chaitin’s algorithm (spilling)

136 Good luck with final project and exams!
I hope some of this was interesting
Advertisement for next semester's course: Program Analysis and Verification

