Building SSA Form
(A mildly abridged account)

For the full story, see the lecture notes for COMP 512, http://www.cs.rice.edu/~keith/512 (lecture 8), and read Chapter 9 of Engineering a Compiler.

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use.

Review

SSA-form
- Each name is defined exactly once
- Each use refers to exactly one name

What’s hard
- Straight-line code is trivial
- Splits in the CFG are trivial
- Joins in the CFG are hard

Building SSA Form
- Insert Ø-functions at birth points
- Rename all values for uniqueness

[Figure: CFG fragment containing the assignments x ← 17 - 4, x ← a + b, x ← y - z, x ← 13, z ← x * q, and s ← w - x, with a "?" annotation]

Birth Points (another notion due to Tarjan)

Consider the flow of values in this example.

[Figure: CFG fragment containing x ← 17 - 4, x ← a + b, x ← y - z, x ← 13, z ← x * q, and s ← w - x]

- The value x appears everywhere
- It takes on several values
  - Here, x can be 13, y - z, or 17 - 4
  - Here, it can also be a + b
- If each value has its own name …
  - Need a way to merge these distinct values
  - Values are “born” at merge points

Birth Points (another notion due to Tarjan)

Consider the flow of values in this example.

[Figure: CFG fragment containing x ← 17 - 4, x ← a + b, x ← y - z, x ← 13, z ← x * q, and s ← w - x, annotated at three join points]

- New value for x here: 17 - 4 or y - z
- New value for x here: 13 or (17 - 4 or y - z)
- New value for x here: a + b or (13 or (17 - 4 or y - z))

Birth Points (another notion due to Tarjan)

Consider the flow of values in this example.

[Figure: CFG fragment containing x ← 17 - 4, x ← a + b, x ← y - z, x ← 13, z ← x * q, and s ← w - x]

- These are all birth points for values
- All birth points are join points
- Not all join points are birth points
- Birth points are value-specific …

Review

SSA-form
- Each name is defined exactly once
- Each use refers to exactly one name

What’s hard
- Straight-line code is trivial
- Splits in the CFG are trivial
- Joins in the CFG are hard

Building SSA Form
- Insert Ø-functions at birth points
- Rename all values for uniqueness

A Ø-function is a special kind of copy that selects one of its parameters. The choice of parameter is governed by the CFG edge along which control reached the current block. Real machines do not implement a Ø-function directly in hardware.

[Figure: two predecessor blocks define y1 ← ... and y2 ← ...; the join block begins with y3 ← Ø(y1, y2)]
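The selection rule can be made concrete with a few lines of Python. This is only an illustrative sketch: the function `eval_phi`, the block labels `left` and `right`, and the dictionary encoding of a Ø-function are assumptions made here, not part of the lecture.

```python
def eval_phi(phi_args, came_from, env):
    """Return the value selected by a Ø-function.

    phi_args maps each predecessor block to the SSA name that flows in
    along the edge from that predecessor; came_from is the predecessor
    that control actually arrived from; env maps SSA names to values.
    """
    return env[phi_args[came_from]]


# y3 <- Ø(y1, y2): y1 arrives from block "left", y2 from block "right".
phi_args = {"left": "y1", "right": "y2"}
env = {"y1": 10, "y2": 20}

y3_from_left = eval_phi(phi_args, "left", env)
y3_from_right = eval_phi(phi_args, "right", env)
```

The same Ø-function yields a different value depending only on the incoming edge, which is why it has no direct hardware counterpart.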

SSA Construction Algorithm (High-level sketch)

1. Insert Ø-functions
2. Rename values

… that’s all ...

… of course, there is some bookkeeping to be done ...

SSA Construction Algorithm (Lower-level sketch)

1. Insert Ø-functions at every join for every name
2. Solve reaching definitions
3. Rename each use to the def that reaches (will be unique)

Reaching Definitions

The equations

$$\mathrm{REACHES}(n_0) = \emptyset$$

$$\mathrm{REACHES}(n) = \bigcup_{p \in preds(n)} \left( \mathrm{DEFOUT}(p) \cup \left( \mathrm{REACHES}(p) \cap \mathrm{SURVIVED}(p) \right) \right)$$

- REACHES(n) is the set of definitions that reach block n
- DEFOUT(n) is the set of definitions in n that reach the end of n
- SURVIVED(n) is the set of defs not obscured by a new def in n

Computing REACHES(n)
- Use any data-flow method (i.e., the iterative method)
- This particular problem has a very-fast solution (Zadeck)
- Domain is |DEFINITIONS|, same as number of operations

F.K. Zadeck, “Incremental data-flow analysis in a structured program editor,” Proceedings of the SIGPLAN 84 Conf. on Compiler Construction, June, 1984, pages 132-143.
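A minimal sketch of the iterative method applied to these equations follows. The function name, the dictionary-of-sets representation, and the four-block diamond CFG with definitions `d1` and `d2` are all assumptions chosen for illustration.

```python
def reaching_definitions(preds, def_out, survived, entry):
    """Solve REACHES by round-robin iteration until nothing changes."""
    reaches = {n: set() for n in preds}
    changed = True
    while changed:
        changed = False
        for n in preds:
            if n == entry:
                continue  # REACHES(n0) is the empty set by definition
            new = set()
            for p in preds[n]:
                new |= def_out[p] | (reaches[p] & survived[p])
            if new != reaches[n]:
                reaches[n] = new
                changed = True
    return reaches


# Hypothetical diamond: n0 -> n1, n0 -> n2, n1 -> n3, n2 -> n3.
# d1 defines x in n0; d2 redefines x in n1; n2 and n3 define nothing.
preds = {"n0": [], "n1": ["n0"], "n2": ["n0"], "n3": ["n1", "n2"]}
def_out = {"n0": {"d1"}, "n1": {"d2"}, "n2": set(), "n3": set()}
survived = {"n0": set(), "n1": set(),
            "n2": {"d1", "d2"}, "n3": {"d1", "d2"}}

reaches = reaching_definitions(preds, def_out, survived, "n0")
```

At the join `n3` both definitions of x reach, which is exactly the situation a Ø-function has to resolve.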

SSA Construction Algorithm (Lower-level sketch)

1. Insert Ø-functions at every join for every name
2. Solve reaching definitions
3. Rename each use to the def that reaches (will be unique)

Builds “maximal” SSA

What’s wrong with this approach?
- Too many Ø-functions (precision)
- Too many Ø-functions (space)
- Too many Ø-functions (time)
- Need to relate edges to Ø-function parameters (bookkeeping)

To do better, we need a more complex approach

SSA Construction Algorithm (Lower-level sketch)

1. Insert Ø-functions at every join for every name
2. Solve reaching definitions
3. Rename each use to the def that reaches (will be unique)

Builds “maximal” SSA

What’s wrong with this approach?
- A Ø-function may be useless
  - Value is never used; Ø-function is extraneous
- A Ø-function may be pointless
  - As in “x17 ← Ø(x12, x12)”

Can we get there by eliminating these? (Someone always suggests this.)
- Probably not
- Can have Ø’s that depend on themselves …

SSA Construction Algorithm (Detailed sketch)

1. Insert Ø-functions
   a.) calculate dominance frontiers
   b.) find global names; for each name, build a list of blocks that define it
   c.) insert Ø-functions:
       for each global name n
         for each block b in which n is assigned
           for each block d in b’s dominance frontier
             insert a Ø-function for n in d
             add d to n’s list of defining blocks

Notes
- Step a.) is moderately complex.
- Step b.): compute the list of blocks where each name is assigned & use it as a worklist.
- Step c.): adding d to n’s list adds to the worklist! This creates the iterated dominance frontier. Use a checklist to avoid putting blocks on the worklist twice; keep another checklist to avoid inserting the same Ø-function twice.
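Step c.) can be sketched as a worklist loop. The code below assumes the dominance frontiers and the per-name lists of defining blocks are already available; the function name `insert_phis`, the block labels, and the DF table are hypothetical values picked for illustration.

```python
def insert_phis(df, def_blocks):
    """Return block -> set of names that need a Ø-function there.

    df maps each block to its dominance frontier; def_blocks maps each
    global name to the set of blocks that assign it.
    """
    phis = {b: set() for b in df}
    for name, blocks in def_blocks.items():
        worklist = list(blocks)
        queued = set(blocks)          # checklist: blocks ever put on the worklist
        while worklist:
            b = worklist.pop()
            for d in df[b]:
                if name not in phis[d]:   # checklist: no duplicate Ø-function
                    phis[d].add(name)
                    if d not in queued:   # the new Ø-function is a new definition
                        queued.add(d)
                        worklist.append(d)
    return phis


# Hypothetical DF table: one assignment to x in B4 cascades through
# B6 and B7 to B1, where the frontier is empty and the process halts.
df = {"B1": set(), "B4": {"B6"}, "B6": {"B7"}, "B7": {"B1"}}
phis = insert_phis(df, {"x": {"B4"}})
```

Because each inserted Ø-function is itself a definition, re-queuing its block is what turns a single dominance frontier into the iterated one.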

SSA Construction Algorithm (Detailed sketch)

2. Rename variables in a pre-order walk over the dominator tree (use an array of stacks, one stack per global name)
   Starting with the root block, b:
   a.) generate unique names for each Ø-function and push them on the appropriate stacks
   b.) rewrite each operation in the block
       i. Rewrite uses of global names with the current version (from the stack)
       ii. Rewrite definition by inventing & pushing new name
   c.) fill in Ø-function parameters of successor blocks
   d.) recurse on b’s children in the dominator tree
   e.) pop names generated in b from stacks

Notes
- Generating names: 1 counter per name for subscripts
- Step c.): need the end-of-block name for this path
- Step e.): reset the state

SSA Construction Algorithm (Details)

Computing Dominance
- First step in Ø-function insertion computes dominance
- Recall that n dominates m iff n is on every path from n0 to m
  - Every node dominates itself
  - n’s immediate dominator is its closest dominator, IDOM(n) †

$$\mathrm{DOM}(n_0) = \{ n_0 \}$$

$$\mathrm{DOM}(n) = \{ n \} \cup \left( \bigcap_{p \in preds(n)} \mathrm{DOM}(p) \right)$$

Initially, DOM(n) = N, ∀ n ≠ n0

Computing DOM
- These equations form a rapid data-flow framework
- Iterative algorithm will solve them in d(G) + 3 passes
  - Each pass does N unions & E intersections
  - E is worst case O(N²) ⇒ O(N²) work

† IDOM(n) ≠ n, unless n is n0, by convention.
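The DOM equations translate almost directly into a set-based iteration. The sketch below is an assumption-laden illustration, not the tuned data structure the lecture alludes to: `compute_dom` and the four-block diamond CFG are hypothetical.

```python
def compute_dom(preds, entry):
    """Iteratively solve DOM(n) = {n} | intersection of DOM(p) over preds."""
    nodes = set(preds)
    dom = {n: set(nodes) for n in nodes}   # initially DOM(n) = N
    dom[entry] = {entry}                   # DOM(n0) = {n0}
    changed = True
    while changed:
        changed = False
        for n in preds:
            if n == entry:
                continue
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical diamond: A -> B, A -> C, B -> D, C -> D.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
dom = compute_dom(preds, "A")
```

Neither B nor C dominates the join D, because each can be bypassed through the other arm; only A and D itself remain in DOM(D).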

Example

[Figure 9.3, EaC: flow graph with blocks B0 through B7, with tables showing the progress and the results of the iterative solution for DOM]

Example

[Figure: dominance tree for blocks B0 through B7, with tables showing the progress and the results of the iterative solution for DOM]

- There are asymptotically faster algorithms.
- With the right data structures, the iterative algorithm can be made faster than them.
  - Data structure in Chapter 9
  - Cooper, Harvey, Kennedy; to appear

Example: Dominance Frontiers

Dominance Frontiers & Ø-Function Insertion
- A definition at n forces a Ø-function at m iff n ∉ DOM(m) but n ∈ DOM(p) for some p ∈ preds(m)
- DF(n) is the fringe just beyond the region n dominates

[Figure: flow graph with blocks B0 through B7]

For each assignment, we insert the Ø-functions. Starting from an assignment x ← ... in block 4:
- DF(4) is {6}, so ← in 4 forces a Ø-function, x ← Ø(...), in 6
- ← in 6 forces a Ø-function in DF(6) = {7}
- ← in 7 forces a Ø-function in DF(7) = {1}
- ← in 1 forces a Ø-function in DF(1) = Ø (halt)

Example: Dominance Frontiers

Computing Dominance Frontiers
- Only join points are in DF(n) for some n
- Leads to a simple, intuitive algorithm for computing dominance frontiers:
    For each join point x (i.e., |preds(x)| > 1)
      For each CFG predecessor of x
        Run up to IDOM(x) in the dominator tree, adding x to DF(n) for each n between x and IDOM(x)

[Figure: flow graph with blocks B0 through B7]

For some applications, we need post-dominance, the post-dominator tree, and reverse dominance frontiers, RDF(n)
> Just dominance on the reverse CFG
> Reverse the edges & add unique exit node
We will use these in dead code elimination
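The join-point walk can be written in a few lines once immediate dominators are known. In this sketch the function name, the seven-block acyclic CFG, and its IDOM table are assumptions supplied by hand for illustration; they are not the graph from the figure.

```python
def dominance_frontiers(preds, idom):
    """Compute DF(n) by walking up the dominator tree from each join's preds."""
    df = {n: set() for n in preds}
    for x in preds:
        if len(preds[x]) > 1:              # only join points appear in any DF
            for p in preds[x]:
                runner = p
                while runner != idom[x]:   # stop at x's immediate dominator
                    df[runner].add(x)
                    runner = idom[runner]
    return df


# Hypothetical CFG: A -> B, A -> C, B -> D, B -> E, D -> F, E -> F,
# F -> G, C -> G.  The IDOM table is written out by hand.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["B"],
         "F": ["D", "E"], "G": ["F", "C"]}
idom = {"A": "A", "B": "A", "C": "A", "D": "B", "E": "B",
        "F": "B", "G": "A"}

df = dominance_frontiers(preds, idom)
```

Every block on the dominator-tree path from a join's predecessor up to, but excluding, the join's immediate dominator gets the join in its frontier; here B picks up G that way.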

SSA Construction Algorithm (Reminder)

1. Insert Ø-functions at every join for every name
   a.) calculate dominance frontiers
   b.) find global names; for each name, build a list of blocks that define it (needs a little more detail)
   c.) insert Ø-functions:
       for each global name n
         for each block b in which n is assigned
           for each block d in b’s dominance frontier
             insert a Ø-function for n in d
             add d to n’s list of defining blocks

SSA Construction Algorithm

Finding global names
- Differs between two forms of SSA
- Minimal uses all names
- Semi-pruned uses names that are live on entry to some block
  - Shrinks name space & number of Ø-functions
  - Pays for itself in compile-time speed
- For each “global name”, need a list of blocks where it is defined
  - Drives Ø-function insertion
  - b defines x implies a Ø-function for x in every c ∈ DF(b)

Pruned SSA adds a test to see if x is live at the insertion point
- Otherwise, cannot need a Ø-function
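One way to find the semi-pruned global names is a single pass over each block that records names used before any local definition. This is a sketch under assumptions: the function name, the `(target, operands)` encoding of an operation, and the two-block fragment (modelled on B0 and B7 of the running example) are illustrative choices.

```python
def find_globals(blocks):
    """Return (global names, name -> set of blocks that define it).

    blocks maps a block label to its operations in execution order,
    each written as (target, [operand names]).
    """
    global_names = set()
    def_blocks = {}
    for label, ops in blocks.items():
        killed = set()                     # names already defined in this block
        for target, operands in ops:
            for v in operands:
                if v not in killed:        # used before a local def: live on entry
                    global_names.add(v)
            killed.add(target)
            def_blocks.setdefault(target, set()).add(label)
    return global_names, def_blocks


blocks = {
    "B0": [("i", [])],
    "B7": [("y", ["a", "b"]), ("z", ["c", "d"]), ("i", ["i"])],
}
global_names, def_blocks = find_globals(blocks)
```

Here y and z are never read before being written in any block, so they stay local and receive no Ø-functions.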

Example: with all the Ø-functions

Assume a, b, c, & d defined before B0

B0: i ← ...
B1: a ← Ø(a,a); b ← Ø(b,b); c ← Ø(c,c); d ← Ø(d,d); i ← Ø(i,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← Ø(d,d); c ← Ø(c,c); b ← ...
B7: a ← Ø(a,a); b ← Ø(b,b); c ← Ø(c,c); d ← Ø(d,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100

- Lots of new ops
- Excluding local names avoids Ø’s for y & z
- Renaming is next

SSA Construction Algorithm (Details)

2. Rename variables in a pre-order walk over the dominator tree (use an array of stacks, one stack per global name)
   Starting with the root block, b:
   a.) generate unique names for each Ø-function and push them on the appropriate stacks
   b.) rewrite each operation in the block
       i. Rewrite uses of global names with the current version (from the stack)
       ii. Rewrite definition by inventing & pushing new name
   c.) fill in Ø-function parameters of successor blocks
   d.) recurse on b’s children in the dominator tree
   e.) pop names generated in b from stacks

Notes
- Generating names: 1 counter per name for subscripts
- Step c.): need the end-of-block name for this path
- Step e.): reset the state

SSA Construction Algorithm (Details)

Adding all the details ...

for each global name i
    counter[i] ← 0
    stack[i] ← Ø
call Rename(n0)

NewName(n)
    i ← counter[n]
    counter[n] ← counter[n] + 1
    push n_i onto stack[n]
    return n_i

Rename(b)
    for each Ø-function in b, x ← Ø(…)
        rename x as NewName(x)
    for each operation “x ← y op z” in b
        rewrite y as top(stack[y])
        rewrite z as top(stack[z])
        rewrite x as NewName(x)
    for each successor of b in the CFG
        rewrite appropriate Ø parameters
    for each successor s of b in dom. tree
        Rename(s)
    for each operation “x ← y op z” in b and each Ø-function “x ← Ø(…)” in b
        pop(stack[x])
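The renaming pass can be sketched in Python as a recursive walk over the dominator tree. Everything about the encoding here is an assumption for illustration: the dictionary layout of blocks, the `var`/`target`/`args` fields of a Ø-function, and the four-block diamond. The sketch pops every name pushed in a block, including names created for its Ø-functions.

```python
from collections import defaultdict


def rename(entry, blocks, succs, dom_children, global_names):
    counter = defaultdict(int)   # one counter per name, for subscripts
    stack = defaultdict(list)    # one stack per name

    def new_name(n):
        name = f"{n}{counter[n]}"
        counter[n] += 1
        stack[n].append(name)
        return name

    def current(n):
        # Local names, and names with no definition yet, are left as-is.
        return stack[n][-1] if stack[n] else n

    def walk(b):
        pushed = []
        for phi in blocks[b]["phis"]:
            pushed.append(phi["var"])
            phi["target"] = new_name(phi["var"])
        for op in blocks[b]["ops"]:
            op["uses"] = [current(v) for v in op["uses"]]
            if op["target"] in global_names:
                pushed.append(op["target"])
                op["target"] = new_name(op["target"])
        for s in succs[b]:                 # end-of-block names feed successors
            for phi in blocks[s]["phis"]:
                phi["args"][b] = current(phi["var"])
        for child in dom_children[b]:
            walk(child)
        for n in pushed:                   # reset the state for siblings
            stack[n].pop()

    walk(entry)
    return blocks


# Hypothetical diamond: B0 -> B1, B0 -> B2, B1 -> B3, B2 -> B3.
blocks = {
    "B0": {"phis": [], "ops": [{"target": "x", "uses": []}]},
    "B1": {"phis": [], "ops": [{"target": "x", "uses": ["x"]}]},
    "B2": {"phis": [], "ops": []},
    "B3": {"phis": [{"var": "x", "target": "x", "args": {}}],
           "ops": [{"target": "y", "uses": ["x"]}]},
}
succs = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
dom_children = {"B0": ["B1", "B2", "B3"], "B1": [], "B2": [], "B3": []}

rename("B0", blocks, succs, dom_children, {"x"})
```

Keying Ø-function arguments by predecessor block is one simple way to handle the bookkeeping that relates CFG edges to Ø-function parameters.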

Example: before processing B0

Assume a, b, c, & d defined before B0; i has not been defined

B0: i ← ...
B1: a ← Ø(a,a); b ← Ø(b,b); c ← Ø(c,c); d ← Ø(d,d); i ← Ø(i,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← Ø(d,d); c ← Ø(c,c); b ← ...
B7: a ← Ø(a,a); b ← Ø(b,b); c ← Ø(c,c); d ← Ø(d,d); y ← a+b; z ← c+d; i ← i+1

Counters: a 1, b 1, c 1, d 1, i 0
Stacks: a: a0 | b: b0 | c: c0 | d: d0 | i: (empty)

Example: end of B0

B0: i0 ← ...
B1: a ← Ø(a0,a); b ← Ø(b0,b); c ← Ø(c0,c); d ← Ø(d0,d); i ← Ø(i0,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← Ø(d,d); c ← Ø(c,c); b ← ...
B7: a ← Ø(a,a); b ← Ø(b,b); c ← Ø(c,c); d ← Ø(d,d); y ← a+b; z ← c+d; i ← i+1

Counters: a 1, b 1, c 1, d 1, i 1
Stacks: a: a0 | b: b0 | c: c0 | d: d0 | i: i0

Example: end of B1

B0: i0 ← ...
B1: a1 ← Ø(a0,a); b1 ← Ø(b0,b); c1 ← Ø(c0,c); d1 ← Ø(d0,d); i1 ← Ø(i0,i); a2 ← ...; c2 ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← Ø(d,d); c ← Ø(c,c); b ← ...
B7: a ← Ø(a,a); b ← Ø(b,b); c ← Ø(c,c); d ← Ø(d,d); y ← a+b; z ← c+d; i ← i+1

Counters: a 3, b 2, c 3, d 2, i 2
Stacks: a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1

Example: end of B2

B0: i0 ← ...
B1: a1 ← Ø(a0,a); b1 ← Ø(b0,b); c1 ← Ø(c0,c); d1 ← Ø(d0,d); i1 ← Ø(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← Ø(d,d); c ← Ø(c,c); b ← ...
B7: a ← Ø(a2,a); b ← Ø(b2,b); c ← Ø(c3,c); d ← Ø(d2,d); y ← a+b; z ← c+d; i ← i+1

Counters: a 3, b 3, c 4, d 3, i 2
Stacks: a: a0 a1 a2 | b: b0 b1 b2 | c: c0 c1 c2 c3 | d: d0 d1 d2 | i: i0 i1

Example: before starting B3

B0: i0 ← ...
B1: a1 ← Ø(a0,a); b1 ← Ø(b0,b); c1 ← Ø(c0,c); d1 ← Ø(d0,d); i1 ← Ø(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← Ø(d,d); c ← Ø(c,c); b ← ...
B7: a ← Ø(a2,a); b ← Ø(b2,b); c ← Ø(c3,c); d ← Ø(d2,d); y ← a+b; z ← c+d; i ← i+1
Edge labels: i > 100, i ≤ 100

Counters: a 3, b 3, c 4, d 3, i 2
Stacks: a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1

Example: end of B3

B0: i0 ← ...
B1: a1 ← Ø(a0,a); b1 ← Ø(b0,b); c1 ← Ø(c0,c); d1 ← Ø(d0,d); i1 ← Ø(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d ← ...
B5: c ← ...
B6: d ← Ø(d,d); c ← Ø(c,c); b ← ...
B7: a ← Ø(a2,a); b ← Ø(b2,b); c ← Ø(c3,c); d ← Ø(d2,d); y ← a+b; z ← c+d; i ← i+1

Counters: a 4, b 3, c 4, d 4, i 2
Stacks: a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 d3 | i: i0 i1

Example: end of B4

B0: i0 ← ...
B1: a1 ← Ø(a0,a); b1 ← Ø(b0,b); c1 ← Ø(c0,c); d1 ← Ø(d0,d); i1 ← Ø(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c ← ...
B6: d ← Ø(d4,d); c ← Ø(c2,c); b ← ...
B7: a ← Ø(a2,a); b ← Ø(b2,b); c ← Ø(c3,c); d ← Ø(d2,d); y ← a+b; z ← c+d; i ← i+1

Counters: a 4, b 3, c 4, d 5, i 2
Stacks: a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 d3 d4 | i: i0 i1

Example: end of B5

B0: i0 ← ...
B1: a1 ← Ø(a0,a); b1 ← Ø(b0,b); c1 ← Ø(c0,c); d1 ← Ø(d0,d); i1 ← Ø(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d ← Ø(d4,d3); c ← Ø(c2,c4); b ← ...
B7: a ← Ø(a2,a); b ← Ø(b2,b); c ← Ø(c3,c); d ← Ø(d2,d); y ← a+b; z ← c+d; i ← i+1

Counters: a 4, b 3, c 5, d 5, i 2
Stacks: a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 c4 | d: d0 d1 d3 | i: i0 i1

Example: end of B6

B0: i0 ← ...
B1: a1 ← Ø(a0,a); b1 ← Ø(b0,b); c1 ← Ø(c0,c); d1 ← Ø(d0,d); i1 ← Ø(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← Ø(d4,d3); c5 ← Ø(c2,c4); b3 ← ...
B7: a ← Ø(a2,a3); b ← Ø(b2,b3); c ← Ø(c3,c5); d ← Ø(d2,d5); y ← a+b; z ← c+d; i ← i+1

Counters: a 4, b 4, c 6, d 6, i 2
Stacks: a: a0 a1 a2 a3 | b: b0 b1 b3 | c: c0 c1 c2 c5 | d: d0 d1 d3 d5 | i: i0 i1

Example: before B7

B0: i0 ← ...
B1: a1 ← Ø(a0,a); b1 ← Ø(b0,b); c1 ← Ø(c0,c); d1 ← Ø(d0,d); i1 ← Ø(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← Ø(d4,d3); c5 ← Ø(c2,c4); b3 ← ...
B7: a ← Ø(a2,a3); b ← Ø(b2,b3); c ← Ø(c3,c5); d ← Ø(d2,d5); y ← a+b; z ← c+d; i ← i+1

Counters: a 4, b 4, c 6, d 6, i 2
Stacks: a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1

Example: end of B7

B0: i0 ← ...
B1: a1 ← Ø(a0,a4); b1 ← Ø(b0,b4); c1 ← Ø(c0,c6); d1 ← Ø(d0,d6); i1 ← Ø(i0,i2); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← Ø(d4,d3); c5 ← Ø(c2,c4); b3 ← ...
B7: a4 ← Ø(a2,a3); b4 ← Ø(b2,b3); c6 ← Ø(c3,c5); d6 ← Ø(d2,d5); y ← a4+b4; z ← c6+d6; i2 ← i1+1

Counters: a 5, b 5, c 7, d 7, i 3
Stacks: a: a0 a1 a2 a4 | b: b0 b1 b4 | c: c0 c1 c2 c6 | d: d0 d1 d6 | i: i0 i1 i2

Example: after renaming

B0: i0 ← ...
B1: a1 ← Ø(a0,a4); b1 ← Ø(b0,b4); c1 ← Ø(c0,c6); d1 ← Ø(d0,d6); i1 ← Ø(i0,i2); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← Ø(d4,d3); c5 ← Ø(c2,c4); b3 ← ...
B7: a4 ← Ø(a2,a3); b4 ← Ø(b2,b3); c6 ← Ø(c3,c5); d6 ← Ø(d2,d5); y ← a4+b4; z ← c6+d6; i2 ← i1+1

- Semi-pruned SSA form
- We’re done …

Semi-pruned ⇒ only names live in 2 or more blocks are “global names”.

SSA Construction Algorithm (Pruned SSA)

What’s this “pruned SSA” stuff?
- Minimal SSA still contains extraneous Ø-functions
- Inserts some Ø-functions where they are dead
- Would like to avoid inserting them

Two ideas
- Semi-pruned SSA: discard names used in only one block
  - Significant reduction in total number of Ø-functions
  - Needs only local Live information (cheap to compute)
- Pruned SSA: only insert Ø-functions where their value is live
  - Inserts even fewer Ø-functions, but costs more to do
  - Requires global Live variable analysis (more expensive)

In practice, both are simple modifications to step 1.
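The pruned variant can be sketched as the same worklist insertion with one extra liveness test. This is an illustration under assumptions: `insert_phis_pruned`, the `live_in` sets (taken here as already computed by a global live-variable analysis), and the three-block DF table are hypothetical.

```python
def insert_phis_pruned(df, def_blocks, live_in):
    """Insert a Ø-function for a name only where that name is live on entry."""
    phis = {b: set() for b in df}
    for name, blocks in def_blocks.items():
        worklist = list(blocks)
        queued = set(blocks)
        while worklist:
            b = worklist.pop()
            for d in df[b]:
                # The added test: a dead value cannot need a Ø-function.
                if name in live_in[d] and name not in phis[d]:
                    phis[d].add(name)
                    if d not in queued:
                        queued.add(d)
                        worklist.append(d)
    return phis


# Hypothetical inputs: x and t are both assigned in B4, but only x is
# live on entry to the blocks in the (iterated) dominance frontier.
df = {"B4": {"B6"}, "B6": {"B7"}, "B7": set()}
def_blocks = {"x": {"B4"}, "t": {"B4"}}
live_in = {"B4": set(), "B6": {"x"}, "B7": {"x"}}

phis = insert_phis_pruned(df, def_blocks, live_in)
```

Without the liveness test, t would also receive Ø-functions in B6 and B7 even though nothing reads it there.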

SSA Construction Algorithm

We can improve the stack management
- Push at most one name per stack per block (save push & pop)
- Thread names together by block
- To pop names for block b, use b’s thread

This is another good use for a scoped hash table
- Significant reductions in pops and pushes
- Makes a minor difference in SSA construction time
- Scoped table is a clean, clear way to handle the problem

SSA Deconstruction

At some point, we need executable code
- Few machines implement Ø operations
- Need to fix up the flow of values

Basic idea
- Insert copies in the Ø-function’s predecessors
- Simple algorithm
  - Works in most cases
- Adds lots of copies
  - Most of them coalesce away

[Figure: a block containing x17 ← Ø(x10, x11) followed by uses ... ← x17; after deconstruction, one predecessor ends with x17 ← x10 and the other with x17 ← x11]
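The basic copy-insertion idea might look like the sketch below. It is deliberately naive and covers only the simple case the slide describes: it appends each copy to the end of the predecessor and ignores the situations where that is not safe (for example, a predecessor with several successors). The function name and block encoding are assumptions for illustration.

```python
def insert_copies(blocks):
    """Replace each Ø-function with one copy per predecessor.

    blocks maps a label to {"phis": [...], "ops": [...]}; a Ø-function is
    {"target": name, "args": {predecessor label: incoming name}} and a
    copy is recorded as the pair (destination, source).
    """
    for block in blocks.values():
        for phi in block["phis"]:
            for pred, arg in phi["args"].items():
                blocks[pred]["ops"].append((phi["target"], arg))
        block["phis"] = []                 # the Ø-functions are now redundant
    return blocks


# x17 <- Ø(x10, x11), with x10 arriving from P1 and x11 from P2.
blocks = {
    "P1": {"phis": [], "ops": []},
    "P2": {"phis": [], "ops": []},
    "J": {"phis": [{"target": "x17",
                    "args": {"P1": "x10", "P2": "x11"}}],
          "ops": []},
}
insert_copies(blocks)
```

After the pass, whichever predecessor executes leaves the right value in x17, so the uses in the join block need no change.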

Extra Slides Start Here
