# Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.

## Presentation on theme: "Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses."— Presentation transcript:

Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses and applications Software Testing Dynamic Program Analysis

Outline Local analysis vs. global analysis Introduction to data-flow analysis The four classical data-flow problems –Reaching definitions –Live variables –Available expressions –Very busy expressions Reading: Compilers: Principles, Techniques and Tools, by Aho, Lam, Sethi and Ullman, Chapter 9.2

Local Analysis vs. Global Analysis Local analysis: analysis on a basic block –Enables optimizations such as local common subexpression elimination, dead code elimination, constant propagation, copy propagation, etc. Global analysis: beyond the basic block –Enables optimizations such as global common subexpression elimination, dead code elimination, constant propagation, loop optimizations, etc.

Local Analysis: Local Common Subexpression Elimination 1.a = y+2 y+2 is available after the execution of statement 1 2.z = x+w y+2,x+w 3.x = y+2 y+2 ( y+2 is available in a, but x+w is no longer available) 4.z = b+c y+2,b+c 5.b = y+2 y+2 ( y+2 is available in a, but b+c is no longer available)

Local Analysis: Dead Code Elimination 1.a = y+2 (a,1) 2.z = x+w (a,1),(z,2) 3.x = a (a,1),(z,2),(x,3) 4.z = b+c (a,1),(x,3),(z,4) z is redefined at 4, and was never used on the way from 2 to 4; thus 2.z=x+w is “dead code” 5.b = a (a,1),(x,3),(z,4), (b,5)

Local Analysis vs. Global Analysis Local analysis is easy – we need to take into account a single path, from basic block entry to basic block exit Global analysis is harder – we need to take into account multiple paths, across basic blocks

Introduction to Data-flow Analysis Collects information about the flow of data along execution paths –Loops (control goes back) –Control splits and control merges Data-flow information Data-flow analysis

Data-flow Analysis 1 2 3 4 5 6 7 8 9 10 Control-flow graph: G = (N, E, ρ) Data-flow equations (or transfer functions): out(i) = gen(i) (in(i) – kill(i)) For simplicity, we assume that each node (i.e., basic block) has a single statement, and define equations over single statements. Entry node ρ:

Four Classical Data-flow Problems Reaching definitions (Reach) Live uses of variables (Live) Available expressions (Avail) Very busy expressions (VeryB) Def-use chains built from Reach, and the dual Use-def chains, built from Live, play role in many optimizations Avail enables global common subexpression elimination VeryB is used for conservative code motion

Reaching Definitions Definition A statement that may change the value of a variable (e.g., x = i+5 ) A definition of a variable x at node k reaches node n if there is a path from k to n, clear of a definition of x. k n x = … … = x x = …

Live Uses of Variables Use Appearance of a variable as an operand of a 3-address statement (e.g., x in y=x+4 ) A use of a variable x at node n is live on exit from k, if there is a path from k to n clear of definition of x. k n x = … … = x x = …

Def-use Relations Use-def chain links an use to a definition that reaches that use Def-use chain links a definition to a use that it reaches k n x = … … = x x = …

Optimizations Enabled Dead code elimination (Def-use) Code motion (Use-def) Constant propagation (Use-def) Strength reduction (Use-def) Test elision (Use-def) Copy propagation (Def-use)

Dead Code Elimination 1. sum = 0 2. i = 1 3. if i > n goto 15 4. t1 = addr(a)–4 … 5. t2 = i * 4 6. i = i + 1 T F After code motion, strength reduction, test elision and constant propagation, the def- use links from i=1 disappear. Becomes dead code.

Constant Propagation 1. i = 1 2. i = 1 3. i = 2 4. p = i*2 5. i = 1 6. q = 5*i+3 = 8

Classical Data-flow Problems How to formulate the analysis using data-flow equations defined on the control flow graph? Forward and backward data-flow problems May and must data-flow problems out(i) = gen(i) (in(i) – kill(i)) in(i) = gen(i) (out(i) – kill(i)) Forward: Backward:

Problem 1: Reaching Definitions For each CFG node n, compute the set of definitions that reach n. i in(i) = { out(j) | j is predecessor of i } j: a=b+c out(i) = gen(i) (in(i) – kill(i)) kill: all definitions of a gen: this definition of a, (a,j)

Example 1. x:=5 2. y:=1 3. if x<2 then 4. y:=x*y 5. x:=x-1 6. goto 3 7. … in RD (1) = Ø in RD (2) = out RD (1) in RD (3) = out RD (2) out RD (6) in RD (4) = out RD (3) in RD (5) = out RD (4) in RD (6) = out RD (5) in RD (7) = out RD (3) out RD (1) = (in RD (1)-D x ) {(x,1)} out RD (2) = (in RD (2)-D y ) {(y,2)} out RD (3) = in RD (3) out RD (4) = (in RD (4)-D y ) {(y,4)} out RD (5) = (in RD (5)-D x ) {(x,5)} out RD (6) = in RD (6)

Example 1. x:=5 2. y:=1 3. if x<2 then 4. y:=x*y 5. x:=x-1 6. goto 3 7. … in RD (1) = Ø in RD (2) = {(x,1)} in RD (3) = {(x,1),(x,5),(y,2),(y,4)} in RD (4) = {(x,1),(x,5),(y,2),(y,4)} in RD (6) = {(x,5),(y,4)} in RD (7) = {(x,1),(x,5),(y,2),(y,4)} out RD (1) = {(x,1)} out RD (2) = {(x,1), (y,2)} out RD (3) = {(x,1),(x,5),(y,2),(y,4)} out RD (4) = {(x,1),(x,5),(y,4)} in RD (5) = {(x,1),(x,5),(y,4)} out RD (5) = {(x,5),(y,4)} in RD (6) = {(x,5),(y,4)}

Reaching Definitions m1 m2 m3 j in(m1) in(m2) in(m3) in(j) Forward, may dataflow problem

Are these equations equivalent? where: pres(m) is the set of definitions preserved through node m dgen(m) is the set of defs generated at node m pred(j) is the set of immediate predecessors of node j

Problem 2: Live Uses of Variables For each node n, compute the set of variables live on exit from n. i: out LV (i) = { in LV (j) | j is a successor of i } in LV (i)= gen(i) (out LV (i) – kill(i)) 1.x=2; 2. y=4; 3. x=1; (if (y>x) then 5. z=y; else 6. z=y*y); 7. x=z; What variables are live on exit from statement 1? Statement 3? x = y+z Q: What is gen(i)? Q: What is kill(i)?

Example 1. x:=2 2. y:=4 3. x:=1 4. if (y>x) 5. z:=y6. z:=y*y 7. x := z

Live Uses of Variables m1 m2 m3 j out(m1) out(m2) out(m3) out(j) Backward, may dataflow problem

Is this set of equations the same? where: pres(m) is the set of uses preserved through node m (these will correspond to variables whose defs are preserved) ugen(m) is the set of uses generated at node m succ(j) is the set of immediate successors of node j

Problem 3: Available Expressions An expression X op Y is available at program point n if every path from entry to n evaluates X op Y, and after every evaluation prior to reaching n, there are NO subsequent assignments to X or Y X op Y X = … Y = … n ρ

Global Common Subexpressions z=a*b r=2*z q=a*b u=a*b z=u/2 w=a*b

Global Common Subexpressions t1=a*b z=t1 r=2*z t1=a*b q=t1 u=t1 z=u/2 w=a*b Can we eliminate w=a*b?

Available Expressions m1 m2 m3 j Forward, must dataflow problem in AE (j) = ? out AE (j) = ? gen(j) = ? kill(j) = ? x=y+z

Example 1.x = a + b 2.y = a * b 3.if y <= a + b then goto 7 4.a = a + 1 5.x = a + b 6.goto 3 7.…

Problem 4: Very Busy Expressions An expression X op Y is very busy at program point n, if along EVERY path from n, we come to a computation of X op Y BEFORE any redefinition of X or Y. X = … Y = … t1=X op Y n

Very Busy Expressions m1 m2 m3 j VeryB(m1) VeryB(m2) VeryB(m3) VeryB(j)

Very Busy Expressions where: epres(m) is the set of expressions preserved through node m vgen(m) is the set of (upwards exposed) expressions generated at node m succ(j) is the set of immediate successors of node j

Terms Data-flow Analysis Reaching Definitions Live Variables Available Expressions Very Busy Expressions (More later) May-problem, Must problem, Forward problem, Backward problem

Download ppt "Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses."

Similar presentations