1 Terminology, Principles, and Concerns, IV
With examples from LIVE and global block positioning

Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Comp 512, Rice University, Spring 2011

2 Last Lecture
Computing dominance information
- Quick introduction to global data-flow analysis
  - That is, compile-time reasoning about the runtime flow of values
- Round-robin iterative algorithm to find the MOP solution to DOM
Using immediate dominators to improve on SVN
- For a node n, start LVN with the hash table from IDOM(n)
  - Includes results from each predecessor in the dominator tree
- Use a scoped hash table and SSA names to simplify the algorithm
- Gets some predecessor information at each node in the CFG

3 This Lecture
Examples of Global Analysis and Transformation
Computing live variables
- Classic backward global data-flow problem
- Used in SSA construction and in register allocation
Using live information to eliminate useless stores
- Simple demonstration of the use of LIVE
Single-procedure block placement algorithm (Pettis & Hansen)
- Arrange the blocks to maximize fall-through branches
- Improves code locality as a natural consequence

4 Computing Live Information
A value v is live at p if there exists a path from p to some use of v along which v is not re-defined.

Data-flow problems are expressed as simultaneous equations. Annotate each block b with sets LIVEIN(b) and LIVEOUT(b):

  LIVEOUT(b) = ∪ { LIVEIN(s) | s ∈ succ(b) }
  LIVEIN(b)  = UEVAR(b) ∪ (LIVEOUT(b) − VARKILL(b))
  LIVEOUT(nf) = ∅, where nf is the CFG's exit node

where
- UEVAR(b) is the set of names used in block b before being defined in b
- VARKILL(b) is the set of names defined in b

The domain of LIVEOUT is the set of variables. Note that LIVE is a backward data-flow problem. (§ 9.2.1 in EaC)

5 Computing Live Information
The compiler can solve these equations with a simple algorithm: the world's quickest introduction to data-flow analysis!

The Worklist Iterative Algorithm:

  WorkList ← { all blocks }
  while (WorkList ≠ Ø)
      remove a block b from WorkList
      compute LIVEOUT(b)
      compute LIVEIN(b)
      if LIVEIN(b) changed
          then add pred(b) to WorkList

Why does this work?
- LIVEOUT, LIVEIN ⊆ 2^Names
- UEVAR & VARKILL are constants for b
- The equations are monotone
- Finite number of additions to the sets ⇒ it will reach a fixed point!

Speed of convergence depends on the order in which blocks are "removed" and their sets recomputed. The worklist should be implemented as a set so that it does not contain duplicate entries.
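The worklist scheme above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture: the CFG encoding (dicts mapping block names to successor lists and to their UEVAR/VARKILL sets) is an invented representation.

```python
# Worklist iterative algorithm for LIVE, following the slide's equations:
#   LIVEOUT(b) = union of LIVEIN(s) for s in succ(b)
#   LIVEIN(b)  = UEVAR(b) | (LIVEOUT(b) - VARKILL(b))
# The dict-based CFG encoding is an illustrative assumption.

def live_analysis(succ, uevar, varkill):
    blocks = list(succ)
    pred = {b: set() for b in blocks}
    for b in blocks:
        for s in succ[b]:
            pred[s].add(b)
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    worklist = set(blocks)                  # a set, so no duplicate entries
    while worklist:
        b = worklist.pop()
        live_out[b] = set().union(*[live_in[s] for s in succ[b]])
        new_in = uevar[b] | (live_out[b] - varkill[b])
        if new_in != live_in[b]:
            live_in[b] = new_in
            worklist |= pred[b]             # backward problem: revisit predecessors
    return live_in, live_out
```

On a two-block CFG where B0 defines a and b with x upward-exposed and B1 uses b, this reports LIVEIN(B0) = {x} and LIVEOUT(B0) = {b}, as the equations predict.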

6 Using LIVE Information: Eliminating Unneeded Stores
Transformation: eliminating unneeded stores.
If the value is already in a register, its last definition has been seen, and it is never again used, then the store is dead (except for debugging) and the compiler can eliminate it.

The Plan:
- Solve for LIVEIN and LIVEOUT
- Walk through each block, bottom to top
  - Compute local LIVE incrementally
  - If the target of a STORE operation is not in LIVE, delete the STORE

If all STOREs to a local variable are eliminated, the compiler can delete the space for it from the activation record. (Note: |LIVEOUT| = |variables|.)

7 Using LIVE Information: Eliminating Unneeded Stores
Safety
- If x ∉ LIVE(s) at some STORE s, its value is not used along any path from s to the exit node of the CFG
  - Its value is not read and is, therefore, dead
- Relies on the correctness of LIVE
Profitability
- Assumes that not executing a STORE costs less than executing it
Opportunity
- Linear search, block-by-block, for STORE operations
  - Could build a list of them while computing the initial UEVAR sets
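The bottom-to-top walk can be sketched as follows. The tuple IR, with each operation written as (kind, target, uses), is invented for illustration; any IR that exposes the defined name and the used names of each operation would do.

```python
# Sketch of the bottom-to-top walk from the slides: maintain LIVE
# incrementally and drop any STORE whose target is not live.
# The (kind, target, uses) tuple IR is an illustrative assumption.

def eliminate_dead_stores(block, live_out):
    live = set(live_out)               # LIVE at the bottom of the block
    kept = []
    for kind, target, uses in reversed(block):
        if kind == 'store' and target not in live:
            continue                   # value never read again: drop the STORE
        if target is not None:
            live.discard(target)       # a definition kills the name
        live.update(uses)              # uses become live above this point
        kept.append((kind, target, uses))
    kept.reverse()
    return kept
```

For example, with LIVEOUT = {x}, a STORE to a temporary t that is never read again is deleted, while the STORE to x survives.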

8 Block Placement
The order of blocks in memory matters:
- Bad placement can increase working-set size (TLB & page misses)
- Fall-through and branch-taken paths differ in cost & locality

The plan:
- Discover which paths execute frequently
- Rearrange blocks to keep those paths in contiguous memory

Finding hot paths requires execution profile information.

9 Block Placement
Targets branches with unequal execution frequencies:
- Make the likely case the "fall-through" case
- Move the unlikely case out-of-line & out-of-sight

Potential benefits:
- Longer branch-free code sequences
- More executed operations per cache line
- Denser instruction stream ⇒ fewer cache misses
- Moving unlikely code ⇒ denser page use & fewer page faults

10 Block Placement
Moving infrequently executed code.

[Figure: a four-block CFG (B1 through B4) in which one edge out of B1 executes 1000 times and the other executes once. In the layout shown, the unlikely path gets the fall-through (cheap) case while the likely path gets an extra branch. The desired layout places the likely path in a straight line: that branch goes away and the instruction stream is denser, while the unlikely block moves a long distance away, perhaps into another page.]

11 Block Placement
Overview:
1. Build chains of frequently executed paths
   - Work from profile data
   - Edge profiles are better than node profiles
   - Combine blocks with a simple greedy algorithm
2. Lay out the code so that chains follow short forward branches

Gathering profile data:
- Instrument the executable
- Statistical sampling
- Infer edge counts from performance-counter data

While precision is desirable, a good approximation will probably work well.

12 Block Placement
The Idea: form chains that should be placed to form straight-line code.

First step: build the chains
1. Make each block a degenerate chain & set its priority to the number of blocks
2. P ← 1
3. For each edge e = <x,y> in the CFG, in order by decreasing frequency:
     if x is the tail of a chain a, and y is the head of a chain b
       then merge a and b
       else set priority(y) to min(priority(y), P++)

The point is to place targets after their sources, to make forward branches. (PA-RISC predicted most forward branches as taken, backward branches as not taken.)

[Author's note: check code against Chapter 8.]
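Step 1 can be sketched as a greedy pass over the edge profile. This is one plausible reading of the slide's pseudocode (which the slide itself flags for checking against Chapter 8); the edge-profile representation as (frequency, source, target) tuples and the tie-breaking order are invented details.

```python
# Sketch of chain building: each block starts as its own chain; edges are
# processed in decreasing frequency order, merging a chain ending in x
# with a chain beginning in y. Data layout is an illustrative assumption.

def build_chains(blocks, edges):
    chain_of = {b: [b] for b in blocks}          # degenerate chains
    priority = {b: len(blocks) for b in blocks}  # priority = # blocks
    p = 1
    for freq, x, y in sorted(edges, reverse=True):   # decreasing frequency
        a, b = chain_of[x], chain_of[y]
        if a is not b and a[-1] == x and b[0] == y:
            a.extend(b)                          # x ends a chain, y begins one
            for blk in b:
                chain_of[blk] = a
        else:
            priority[y] = min(priority[y], p)
            p += 1
    seen, chains = set(), []                     # collect the distinct chains
    for c in chain_of.values():
        if id(c) not in seen:
            seen.add(id(c))
            chains.append(tuple(c))
    return chains, priority
```

On the earlier example (B1 branching to B3 with frequency 1000 and to B2 with frequency 1, both reaching B4), this builds the hot chain B1-B3-B4 and leaves B2 as a chain of its own.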

13 Block Placement
Second step: lay out the code

  WorkList ← chain containing the entry node, n0
  while (WorkList ≠ Ø)
      pick the chain c with lowest priority(c) from WorkList
      place it next in the code
      for each edge <c,z> leaving c, add the chain containing z to WorkList

Intuitions:
- Entry node first
- Tries to make the edge from chain i to chain j a forward branch
  - Predicted as taken on the target machine
  - The edge remains only if it is the lower-probability choice

[Author's note: replace with corrected algorithm from Chapter 8.]
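The layout pass can be sketched as follows. Again, this is one plausible reading of pseudocode the slide itself says should be replaced with the corrected algorithm from Chapter 8; chains are tuples of block names, and taking a chain's priority as the minimum over its blocks is an assumption.

```python
# Sketch of layout: place the entry chain, then repeatedly place the
# lowest-priority chain reachable from already-placed code.
# Inputs match the invented build_chains representation.

def layout(chains, priority, succ, entry):
    chain_of = {b: c for c in chains for b in c}
    placed, order = set(), []
    worklist = {chain_of[entry]}
    while worklist:
        # pick the chain with the lowest priority
        c = min(worklist, key=lambda ch: min(priority[b] for b in ch))
        worklist.remove(c)
        placed.add(c)
        order.extend(c)                     # place it next in the code
        for b in c:                         # edges leaving c feed the worklist
            for s in succ.get(b, []):
                if chain_of[s] not in placed:
                    worklist.add(chain_of[s])
    return order
```

On the running example this yields the layout B1, B3, B4, B2: the hot path is straight-line code and the cold block B2 sits at the end.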

14 Going Further: Procedure Splitting
Any code that has a profile count of zero is "fluff". Move fluff into the distance:
- It rarely executes
- Gets more useful operations into the I-cache
- Increases the effective density of the I-cache
- Slower execution for rarely executed code

Implementation:
- Create a linkage-less procedure with an invented name
- Give it a priority that the linker will sort to the code's end
- Replace the original branch with a 0-profile branch to a 0-profile call
  - Causes linkage code to move to the end of the procedure, to maintain density

The branch to fluff becomes a short branch to a long branch; the block with the long branch gets sorted to the end of the current procedure.

15 COMP 512, Rice University15 Block Placement Safety Changing position of code, not values it computes Barring bugs in implementation, should be safe Profitability More fall-through branches Where possible, more compiler-predicted branches Better code locality Opportunity Profile data shows high-frequency edges Looks at all blocks and edges in transformation – O(N+E ) Many transformations have an O(N+E ) component

16 Transformations We Have Seen

  Scope            Name              Analysis         Effect
  ---------------  ----------------  ---------------  -------------------------------------
  Local            LVN               Incremental      Redundancy, constants, & identities
                   Balancing         LIVE info.       Enhance ILP
  Regional         Superlocal VN     CFG, EBBs        Redundancy, constants, & identities
                   Dominator VN      CFG, DOM info.   Redundancy, constants, & identities
  Global           Dead store elim.  LIVE info.       Eliminate dead stores
                   Block placement   CFG, Profiles    Code locality & branch straightening
  Interprocedural  Inline subs'n     (on Monday)
                   Proc. placement   (on Monday)

17 End of Lecture
Extra slides begin here.

18 Iterative Data-flow Analysis
Any finite semilattice is bounded. Some infinite semilattices are also bounded, e.g., the lattice of real constants:

[Figure: the constant-propagation semilattice over the real constants ... −.002, −.001, 0, .001, .002, ...: infinitely many elements, yet every descending chain is finite.]

Termination:
If every f_n ∈ F is monotone, i.e., x ≤ y ⇒ f(x) ≤ f(y), and
if the lattice is bounded, i.e., every descending chain is finite
- A chain is a sequence x_1, x_2, …, x_n where x_i ∈ L, 1 ≤ i ≤ n
- If x_i > x_{i+1} for 1 ≤ i < n, the chain is descending
then the set at each node can only change a finite number of times, and the iterative algorithm must halt on an instance of the problem.

Finite lattice, bounded descending chains, & monotone functions ⇒ termination.

19 Iterative Data-flow Analysis
Correctness (what does it compute?)
If every f_n ∈ F is monotone, i.e., x ≤ y ⇒ f(x) ≤ f(y), and
if the semilattice is bounded, i.e., every descending chain is finite
- A chain is a sequence x_1, x_2, …, x_n where x_i ∈ L, 1 ≤ i ≤ n
- If x_i > x_{i+1} for 1 ≤ i < n, the chain is descending

Given a bounded semilattice S and a monotone function space F:
- ∃ k such that f^k(⊥) = f^j(⊥) for all j > k; f^k(⊥) is called the least fixed point of f over S
- If L has a ⊤, then ∃ k such that f^k(⊤) = f^j(⊤) for all j > k, and f^k(⊤) is called the maximal fixed point of f over S (optimism)

(f^k(x) is the application of f to x, k times.)

20 Iterative Data-flow Analysis
Correctness
If every f_n ∈ F is monotone, i.e., f(x ∧ y) ≤ f(x) ∧ f(y), and
if the lattice is bounded, i.e., every descending chain is finite (a chain is descending, as defined above),
then the round-robin algorithm computes a least fixed point (LFP).

The uniqueness of the solution depends on other properties of F:
- A unique solution ⇒ it finds the one we want
- Multiple solutions ⇒ we need to know which one it finds

21 Iterative Data-flow Analysis
Correctness: does the iterative algorithm compute the desired answer?

Admissible Function Spaces:
1. ∀ f ∈ F, ∀ x, y ∈ L: f(x ∧ y) = f(x) ∧ f(y)
2. ∃ f_i ∈ F such that ∀ x ∈ L: f_i(x) = x
3. f, g ∈ F ⇒ ∃ h ∈ F such that h(x) = f(g(x))
4. ∀ x ∈ L, ∃ a finite subset H ⊆ F such that x = ∧_{f ∈ H} f(⊥)

If F meets these four conditions, then an instance of the problem will have a unique fixed-point solution (an instance being a graph plus initial values):
- LFP = MFP = MOP
- The order of evaluation does not matter

If F is not distributive, the fixed-point solution may not be unique.

22 Iterative Data-flow Analysis
If a data-flow framework meets those admissibility conditions, then:
- It has a unique fixed-point solution
- The iterative algorithm finds the (best) answer
- The solution does not depend on the order of computation
- The algorithm can choose an order that converges quickly

Intuition: choose an order so that changes propagate as far as possible on each "sweep"
- Process a node's predecessors before the node
- Cycles pose problems, of course
  - Ignore back edges when computing the order?

23 Ordering the Nodes to Maximize Propagation

[Figure: a four-node CFG labeled twice, once with postorder numbers and once with reverse postorder numbers, where rpo(n) = N + 1 − postorder(n).]

Reverse postorder visits predecessors before visiting a node.
Use reverse preorder for backward problems:
- Reverse postorder on the reverse CFG is reverse preorder
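Reverse postorder is easy to compute with a depth-first search. A minimal sketch, assuming the same dict-of-successor-lists CFG encoding as before:

```python
# Reverse postorder on a CFG: DFS, record each node in postorder on the
# way out, then reverse. This realizes rpo(n) = N + 1 - postorder(n).

def reverse_postorder(succ, entry):
    seen, post = set(), []
    def dfs(n):
        seen.add(n)
        for s in succ.get(n, []):
            if s not in seen:
                dfs(s)
        post.append(n)                 # all DFS successors finished first
    dfs(entry)
    post.reverse()                     # predecessors now precede successors
    return post
```

For a backward problem like LIVE, run this on the reverse CFG so that each node is visited before its CFG predecessors.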

24 Iterative Data-flow Analysis
Speed
For a problem with an admissible function space and a bounded semilattice, if the functions all meet the rapid condition, i.e.,

  ∀ f, g ∈ F, ∀ x ∈ L: f(g(⊥)) ≥ g(⊥) ∧ f(x) ∧ x

then a round-robin, reverse-postorder iterative algorithm will halt in at most d(G)+3 passes over a graph G.

d(G) is the loop-connectedness of the graph with respect to a depth-first spanning tree (DFST):
- The maximal number of back edges in an acyclic path
- Several studies suggest that, in practice, d(G) is small (< 3)
- For most CFGs, d(G) is independent of the specific DFST

Sets stabilize in two passes around a loop. Each pass does O(E) meets and O(N) other operations.

25 Iterative Data-flow Analysis
What does this mean?
- Reverse postorder: an easily computed order that increases propagation per pass
- Round-robin iterative algorithm: visit all the nodes in a consistent order (RPO); do it again until the sets stop changing
- Rapid condition: most classic global data-flow problems meet this condition

These conditions are easily met. An admissible framework and a rapid function space, with a round-robin, reverse-postorder, iterative algorithm ⇒ the analysis runs in (effectively) linear time.

26 Some Problems Are Not Admissible
Global constant propagation.
The first condition for admissibility is ∀ f ∈ F, ∀ x, y ∈ L: f(x ∧ y) = f(x) ∧ f(y).

Constant propagation is not admissible:
- The Kam & Ullman time bound does not hold
- There are tight time bounds, however, based on lattice height
- It requires a variable-by-variable formulation

Example: consider a block containing a ← b + c, with function f modeling the block's effects, and two incoming states S1 : {b=3, c=4} and S2 : {b=1, c=6}:

  f(S1) = {a=7, b=3, c=4}
  f(S2) = {a=7, b=1, c=6}
  f(S1 ∧ S2) = Ø
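The failure of distributivity can be checked directly. A small sketch, with an invented representation: a constant state is a dict from names to known constant values, and the pointwise meet keeps only the pairs on which both states agree.

```python
# Constant propagation is not distributive: for the block a <- b + c,
# meeting the incoming states first loses the fact that a = 7 holds
# on both paths. The dict representation is an illustrative choice.

def meet(s1, s2):
    # pointwise meet: keep a name only if both states agree on its value
    return {v: s1[v] for v in s1.keys() & s2.keys() if s1[v] == s2[v]}

def f(s):
    # transfer function for the block: a <- b + c
    out = dict(s)
    if 'b' in s and 'c' in s:
        out['a'] = s['b'] + s['c']
    else:
        out.pop('a', None)             # a is no longer a known constant
    return out
```

With S1 = {b=3, c=4} and S2 = {b=1, c=6}, f(S1) ∧ f(S2) = {a=7}, but f(S1 ∧ S2) = Ø: so f(x ∧ y) ≠ f(x) ∧ f(y), and the MOP solution is strictly better than the fixed point computed after the meet.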

27 Some Admissible Problems Are Not Rapid
Interprocedural May-Modify sets:
- Iterations proportional to the number of parameters
- Not a function of the call graph; nothing to do with d(G)
- Can make the example arbitrarily bad: proportional to the length of the chain of bindings

  shift(a,b,c,d,e,f) {
      local t;
      ...
      call shift(t,a,b,c,d,e);
      f = 1;
      ...
  }

Assume call-by-reference. Compute the set of variables (in shift) that can be modified by a call to shift. How long does it take?

