U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Static Single Assignment John Cavazos University of Delaware
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 2 What is SSA? Many data-flow problems have been formulated To limit number of analyses, use single analysis to perform multiple transformations SSA Compiler optimization algorithms enhanced or enabled by SSA: Constant propagation Dead code elimination Global value numbering Partial redundancy elimination Register allocation
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 3 SSA Construction Algorithm ( High-level sketch ) 1. Insert -functions 2. Rename values
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 4 SSA Construction Algorithm ( Less high-level sketch ) Insert -functions at every join for every name Solve reaching definitions Rename each use to def that reaches it ( will be unique ) What’s wrong with this approach Too many -functions! To do better, we need a more complex approach Builds maximal SSA
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 5 1.Insert -functions a.) calculate dominance frontiers b.) find global names for each name, build list of blocks that define it c.) insert -functions SSA Construction Algorithm ( Less high-level sketch )
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 6 Insert -functions global name n worklist ← Block(n) // blocks in which n is assigned block b ∈ worklist block d in b’s dominance frontier insert a -function for n in d add d to worklist
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 7 Computing Dominance Frontiers Only join points are in DF(n) for some n Simple algorithm for computing dominance frontiers For each join point x ( i.e., |preds(x)| > 1 ) For each CFG predecessor of x Run up to ID OM (x ) in dominator tree, add x to DF(n) for each n between x and ID OM (x )
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 8 Dominance Frontiers nB0B1B2B3B4B5B6B7 DOM(n)00,10,1,20,1,30,1,3,40,1,3,50,1,3,60,1,7 IDOM(n) DF(n) For each join point x For each CFG pred of x Run to ID OM (x ) in dom tree, add x to DF(n) for each n between x and ID OM (x ) B1B1 B2B2 B3B3 B4B4 B5B5 B6B6 B7B7 B0B0 B1B1 B2B2 B3B3 B4B4 B5B5 B6B6 B7B7 B0B0 Flow Graph Dominance Tree
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Dominance Frontiers & -Function Insertion Dominance Frontiers A definition at n forces a -function at m iff n D OM (m) but n D OM (p) for some p preds(m) DF(n ) is fringe just beyond region n dominates B1B1 B2B2 B3B3 B4B4 B5B5 B6B6 B7B7 B0B0 in 1 forces -function in DF(1) = Ø (halt ) x ... x (...) DF(4) is {6}, so in 4 forces -function in 6 x (...) in 6 forces -function in DF(6) = {7} x (...) in 7 forces -function in DF(7) = {1} For each assignment, we insert the -functions
a (a,a) b (b,b) c (c,c) d (d,d) y a+b z c+d i i+1 B7B7 i > 100 i ... B0B0 b ... c ... d ... B2B2 a (a,a) b (b,b) c ( c,c) d (d,d) i (i,i) a ... c ... B1B1 a ... d ... B3B3 B4B4 c ... B5B5 d (d,d) c (c,c) b ... B6B6 i > 100 With all the -functions Lots of new ops Renaming is next Assume a, b, c, & d defined before B 0 Excluding local names avoids ’s for y & z
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 11 2.Rename variables in a pre-order walk over dominator tree (uses counter and a stack per global name) Staring with the root block, b a.) generate unique names for each -function and push them on the appropriate stacks SSA Construction Algorithm ( Less high-level sketch )
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Rename variables (cont’d) b.) rewrite each operation in the block i. Rewrite uses of global names with the current version (from the stack) ii. Rewrite definition by creating & pushing new name c.) fill in -function parameters of successor blocks d.) recurse on b’s children in the dominator tree e.) pop names generated in b from stacks SSA Construction Algorithm ( Less high-level sketch )
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 13 SSA Construction Algorithm for each global name i counter[i] 0 stack[i] call Rename(n 0 ) NewName(n) i counter[n] counter[n] counter[n] + 1 push n i onto stack[n] return n i Rename(b) for each -function in b, x (…) rename x as NewName(x) for each operation “x y op z” in b rewrite y as top(stack[y]) rewrite z as top(stack[z]) rewrite x as NewName(x) for each successor of b in the CFG rewrite appropriate parameters for each successor s of b in dom. tree Rename(s) for each operation “x y op z” in b pop(stack[x]) Adding all the details...
a (a,a) b (b,b) c (c,c) d (d,d) y a+b z c+d i i+1 B7B7 i > 100 i ... B0B0 b ... c ... d ... B2B2 a (a,a) b (b,b) c (c,c) d (d,d) i (i,i) a ... c ... B1B1 a ... d ... B3B3 B4B4 c ... B5B5 d (d,d) c (c,c) b ... B6B6 i > 100 Counters Stacks a a0a0 b0b0 c0c0 d0d0 Before processing B 0 bcdi Assume a, b, c, & d defined before B 0 i has not been defined 20
a (a,a) b (b,b) c (c,c) d (d,d) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b ... c ... d ... B2B2 a (a 0,a) b (b 0,b) c (c 0,c) d (d 0,d) i (i 0,i) a ... c ... B1B1 a ... d ... B3B3 B4B4 c ... B5B5 d (d,d) c (c,c) b ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 End of B 0 i0i0 21
a (a,a) b (b,b) c (c,c) d (d,d) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b ... c ... d ... B2B2 a 1 (a 0,a) b 1 (b 0,b) c 1 (c 0,c) d 1 (d 0,d) i 1 (i 0,i) a 2 ... c 2 ... B1B1 a ... d ... B3B3 B4B4 c ... B5B5 d (d,d) c (c,c) b ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 End of B 1 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 22
a (a 2,a) b (b 2,b) c (c 3,c) d (d 2,d) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a) b 1 (b 0,b) c 1 (c 0,c) d 1 (d 0,d) i 1 (i 0,i) a 2 ... c 2 ... B1B1 a ... d ... B3B3 B4B4 c ... B5B5 d (d,d) c (c,c) b ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 End of B 2 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 b2b2 d2d2 c3c3 23
a (a 2,a) b (b 2,b) c (c 3,c) d (d 2,d) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a) b 1 (b 0,b) c 1 (c 0,c) d 1 (d 0,d) i 1 (i 0,i) a 2 ... c 2 ... B1B1 a ... d ... B3B3 B4B4 c ... B5B5 d (d,d) c (c,c) b ... B6B6 i > 100 i ≤ 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 Before starting B 3 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 24
a (a 2,a) b (b 2,b) c (c 3,c) d (d 2,d) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a) b 1 (b 0,b) c 1 (c 0,c) d 1 (d 0,d) i 1 (i 0,i) a 2 ... c 2 ... B1B1 a 3 ... d 3 ... B3B3 d ... B4B4 c ... B5B5 d (d,d) c (c,c) b ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 End of B 3 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 a3a3 d3d3 25
a (a 2,a) b (b 2,b) c (c 3,c) d (d 2,d) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a) b 1 (b 0,b) c 1 (c 0,c) d 1 (d 0,d) i 1 (i 0,i) a 2 ... c 2 ... B1B1 a 3 ... d 3 ... B3B3 d 4 ... B4B4 c ... B5B5 d (d 4,d) c (c 2,c) b ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 End of B 4 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 a3a3 d3d3 d4d4 26
a (a 2,a) b (b 2,b) c (c 3,c) d (d 2,d) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a) b 1 (b 0,b) c 1 (c 0,c) d 1 (d 0,d) i 1 (i 0,i) a 2 ... c 2 ... B1B1 a 3 ... d 3 ... B3B3 d 4 ... B4B4 c 4 ... B5B5 d (d 4,d 3 ) c (c 2,c 4 ) b ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 End of B 5 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 a3a3 d3d3 c4c4 27
a (a 2,a 3 ) b (b 2,b 3 ) c (c 3,c 5 ) d (d 2,d 5 ) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a) b 1 (b 0,b) c 1 (c 0,c) d 1 (d 0,d) i 1 (i 0,i) a 2 ... c 2 ... B1B1 a 3 ... d 3 ... B3B3 d 4 ... B4B4 c 4 ... B5B5 d 5 (d 4,d 3 ) c 5 (c 2,c 4 ) b 3 ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 End of B 6 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 a3a3 d3d3 c5c5 d5d5 b3b3 28
a (a 2,a 3 ) b (b 2,b 3 ) c (c 3,c 5 ) d (d 2,d 5 ) y a+b z c+d i i+1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a) b 1 (b 0,b) c 1 (c 0,c) d 1 (d 0,d) i 1 (i 0,i) a 2 ... c 2 ... B1B1 a 3 ... d 3 ... B3B3 d 4 ... B4B4 c 4 ... B5B5 d 5 (d 4,d 3 ) c 5 (c 2,c 4 ) b 3 ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 Before B 7 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 29
a 4 (a 2,a 3 ) b 4 (b 2,b 3 ) c 6 (c 3,c 5 ) d 6 (d 2,d 5 ) y a 4 +b 4 z c 6 +d 6 i 2 i 1 +1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a 4 ) b 1 (b 0,b 4 ) c 1 (c 0,c 6 ) d 1 (d 0,d 6 ) i 1 (i 0,i 2 ) a 2 ... c 2 ... B1B1 a 3 ... d 3 ... B3B3 d 4 ... B4B4 c 4 ... B5B5 d 5 (d 4,d 3 ) c 5 (c 2,c 4 ) b 3 ... B6B6 i > 100 Counters Stacks abcdi a0a0 b0b0 c0c0 d0d0 End of B 7 i0i0 a1a1 b1b1 c1c1 d1d1 i1i1 a2a2 c2c2 a4a4 b4b4 c6c6 d6d6 i2i2 30
a 4 (a 2,a 3 ) b 4 (b 2,b 3 ) c 6 (c 3,c 5 ) d 6 (d 2,d 5 ) y a 4 +b 4 z c 6 +d 6 i 2 i 1 +1 B7B7 i > 100 i 0 ... B0B0 b 2 ... c 3 ... d 2 ... B2B2 a 1 (a 0,a 4 ) b 1 (b 0,b 4 ) c 1 (c 0,c 6 ) d 1 (d 0,d 6 ) i 1 (i 0,i 2 ) a 2 ... c 2 ... B1B1 a 3 ... d 3 ... B3B3 d 4 ... B4B4 c 4 ... B5B5 d 5 (d 4,d 3 ) c 5 (c 2,c 4 ) b 3 ... B6B6 i > 100 After renaming Semi-pruned SSA form We’re done … Semi-pruned only names live in 2 or more blocks are “global names”. 31
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 26 SSA Construction Algorithm ( Pruned SSA ) What’s this “pruned SSA” stuff? Minimal SSA still contains extraneous -functions Inserts some -functions where they are dead Would like to avoid inserting them
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 27 SSA Construction Algorithm ( Two Ideas ) Semi-pruned SSA: discard names used in only one block Significant reduction in total number of -functions Needs only local Live information ( cheap to compute ) Pruned SSA: only insert -functions where their value is live Inserts even fewer -functions, but costs more to do Requires global Live variable analysis ( more expensive ) In practice, both are simple modifications to step 1.
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT 28 SSA Deconstruction At some point, we need executable code Few machines implement operations Need to fix up the flow of values Basic idea Insert copies -function pred’s Simple algorithm Works in most cases Adds lots of copies Many of them coalesce away X 17 (x 10,x 11 )... x x 17 X 17 x 10 X 17 x 11