# Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.

## Presentation on theme: "Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University."— Presentation transcript:

Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University

2 Static Analysis Interesting Questions  Is every statement reachable?  Does every non- void method return a value?  Will local variables definitely be assigned before they are read?  Will the current value of a variable ever be read? ...  How much heap space will the program need?  Does the program always terminate?  Will the output always be correct?

3 Static Analysis Rice’s Theorem Theorem 11.9 (Martin p. 420) If R is a property of languages that is satisfied by some but not all recursively enumerable languages then the decision problem P R : Given a TM T, does L(T) have property R? is unsolvable.

4 Static Analysis Rice’s Theorem Explained Theorem 11.9 (Martin, p. 420) "Every non-trivial question about the behavior of a program is undecidable."

5 Static Analysis  Static analysis provides approximate answers to non-trivial questions about programs  The approximation is conservative, meaning that the answers only err to one side  Compilers spend most of their time performing static analysis so they may: understand the semantics of programs provide safety guarantees generate efficient code

6 Static Analysis Conservative Approximation  A typical scenario for a boolean property: if the analysis says yes, the property definitely holds if it says no, the property may or may not hold only the yes answer will help the compiler a trivial analysis will say no always the engineering challenge is to say yes often enough  For other kinds of properties, the notion of approximation may be more subtle

7 Static Analysis A Range of Static Analyses  Static analysis may take place: at the source code level at some intermediate level at the machine code level  Static analysis may look at: statement blocks only an entire method (intraprocedural) the whole program (interprocedural)  The precision and cost both rise as we include more information

8 Static Analysis The Phases of GCC (1/2) Parsing Tree optimization RTL generation Sibling call optimization Jump optimization Register scan Jump threading Common subexpression elimination Loop optimizations Jump bypassing Data flow analysis Instruction combination If-conversion Register movement Instruction scheduling Register allocation Basic block reordering Delayed branch scheduling Branch shortening Assembly output Debugging output

9 Static Analysis The Phases of GCC (2/2) Parsing Tree optimization RTL generation Sibling call optimization Jump optimization Register scan Jump threading Common subexpression elimination Loop optimizations Jump bypassing Data flow analysis Instruction combination If-conversion Register movement Instruction scheduling Register allocation Basic block reordering Delayed branch scheduling Branch shortening Assembly output Debugging output Static analysis uses 60% of the compilation time

10 Static Analysis Reachability Analysis  Java requires two reachability guarantees: all statements must be reachable (avoid dead code) all non- void methods must return a value  These are non-trivial properties and thus they are undecidable  But a static analysis may provide conservative approximations  To ensure that different compilers accept the same programs, the Java language specification mandates a specific static analysis

11 Static Analysis Constraint-Based Analysis  For every node S that represents a statement in the AST, we define two boolean properties: C[[S]] denotes that S may complete normally R[[S]] denotes that S is possibly reachable  A statement may only complete if it is reachable  For each syntactic kind of statement, we generate constraints that relate C[[...]] and R[[...]]

12 Static Analysis Information Flow  The values of R[[...]] are inherited  The values of C[[...]] are synthesized AST R C

13 Static Analysis Reachability Constraints (1/3) if( E ) S: R[[S]] = R[[ if( E ) S]] C[[ if ( E ) S]] = R[[ if( E ) S]] if( E ) S 1 else S 2 : R[[S i ]] = R[[ if( E ) S 1 else S 2 ]] C[[ if( E ) S 1 else S 2 ]] = C[[S 1 ]]  C[[S 2 ]] while(true) S: R[[S]] = R[[ while(true) S]] C[[ while(true) S]] = false while(false) S: R[[S]] = false C[[ while(false) S]] = R[[ while(false) S]]

14 Static Analysis Reachability Constraints (2/3) while( E ) S: R[[S]] = R[[ while( E ) S]] C[[ while( E ) S]] = R[[ while( E ) S]] return : C[[ return ]] = false return E: C[[ return E]] = false throw E: C[[ throw E]] = false { σ x ; S } : R[[S]] = R[[ { σ x ; S } ]] C[[ { σ x ; S } ]] = C[[S]]

15 Static Analysis Reachability Constraints (3/3) S 1 S 2 : R[[S 1 ]] = R[[S 1 S 2 ]] R[[S 2 ]] = C[[S 1 ]] C[[S 1 S 2 ]] = C[[S 2 ]] for any simple statement S: C[[S]] = R[[S]] for any method or constructor body { S } : R[[S]] = true

16 Static Analysis Exploiting the Information  For any statement S where R[[S]] = false: unreachable statement  For any non- void method with body { S } where C[[S]] = true: missing return statement  These guarantees are sound but conservative

17 Static Analysis Approximations  C[[S]] may be true too often: some unfair missing return errors may occur if (b) return 17; if (!b) return 42;  R[[S]] may be true too often: some dead code is not detected if (b==!b) {... }

18 Static Analysis Definite Assignment Analysis  Java requires that a local variable is assigned before its value is used  This is a non-trivial property and thus it is undecidable  But a static analysis may provide a conservative approximation  To ensure that different compilers accept the same programs, the Java language specification mandates a specific static analysis

19 Static Analysis Constraint-Based Analysis  For every node S that represents a statement in the AST, we define some set-valued properties: B[[S]] denotes the variables that are definitely assigned before S is executed A[[S]] denotes the variables that are definitely assigned after S is executed  For every node E that represents an expression in the AST, we similarly define B[[E]] and A[[E]]

20 Static Analysis Increased Precision  To handle cases such as: { int k; if (a>0 && (k=System.in.read())>0) System.out.print(k); } we also use two refinements of A[[...]]: A t [[E]] which assumes that E evaluates to true A f [[E]] which assumes that E evaluates to false

21 Static Analysis Information Flow  The values of B[[...]] are inherited  The values of A[[...]], A t [[....]] and A f [[...]] are synthesized AST B A, A t, A f

22 Static Analysis Definite Assignment Constraints (1/7) if( E ) S: B[[E]] = B[[ if( E ) S]] B[[S]] = A t [[E]] A[[ if( E ) S]] = A[[S]]  A f [[E]] if( E ) S 1 else S 2 : B[[E]] = B[[ if( E ) S 1 else S 2 ]] B[[S 1 ]] = A t [[E]] B[[S 2 ]] = A f [[E]] A[[ if( E ) S 1 else S 2 ]] = A[[S 1 ]]  A[[S 2 ]]

23 Static Analysis Definite Assignment Constraints (2/7) while( E ) S: B[[E]] = B[[ while( E ) S]] B[[S]] = A t [[E]] A[[ while( E ) S]] = A f [[E]] return : A[[ return ]] =  return E: B[[E]] = B[[ return E]] A[[ return E]] =  throw E: B[[E]] = B[[ throw E]] A[[ throw E]] =  the set of all variables in scope

24 Static Analysis Definite Assignment Constraints (3/7) E ; : B[[E]] = B[[E ; ]] A[[E ; ]] = A[[E]] { σ x = E ; S } : B[[E]] = B[[ { σ x = E ; S } ]] B[[S]] = A[[E]]  {x} A[[ { σ x = E ; S } ]] = A[[S]] { σ x ; S } : B[[S]] = B[[ { σ x ; S } ]] A[[ { σ x ; S } ]] = A[[S]]

25 Static Analysis Definite Assignment Constraints (4/7) S 1 S 2 : B[[S 1 ]] = B[[S 1 S 2 ]] B[[S 2 ]] = A[[S 1 ]] A[[S 1 S 2 ]] = A[[S 2 ]] x = E: B[[E]] = B[[x = E]] A[[x = E]] = A[[E]]  {x} x[E 1 ] = E 2 : B[[E 1 ]] = B[[x[E 1 ] = E 2 ]] B[[E 2 ]] = A[[E 1 ]] A[[x[E 1 ] = E 2 ]] = A[[E 2 ]]

26 Static Analysis Definite Assignment Constraints (5/7) true : A t [[ true ]] = B[[ true ]] A f [[ true ]] =  A[[ true ]] = B[[ true ]] false : A t [[ false ]] =  A f [[ false ]] = B[[ false ]] A[[ false ]] = B[[ false ]] ! E: B[[E]] = B[[ ! E]] A f [[ ! E]] = A t [[E]] A[[ ! E]] = A[[E]] A t [[ ! E]] = A f [[E]]

27 Static Analysis Definite Assignment Constraints (6/7) E 1 && E 2 : B[[E 1 ]] = B[[E 1 && E 2 ]] B[[E 2 ]] = A t [[E 1 ]] A t [[E 1 && E 2 ]] = A t [[E 2 ]] A f [[E 1 && E 2 ]] = A f [[E 1 ]]  A f [[E 2 ]] A[[E 1 && E 2 ]] = A t [[E 1 && E 2 ]]  A f [[E 1 && E 2 ]] E 1 || E 2 : B[[E 1 ]] = B[[E 1 || E 2 ]] B[[E 2 ]] = A f [[E 1 ]] A t [[E 1 || E 2 ]] = A t [[E 1 ]]  A t [[E 2 ]] A f [[E 1 || E 2 ]] = A f [[E 2 ]] A[[E 1 || E 2 ]] = A t [[E 1 || E 2 ]]  A f [[E 1 || E 2 ]]

28 Static Analysis Definite Assignment Constraints (7/7) EXP(E 1,...,E k ): (any other expression with subexpressions) B[[E 1 ]] = B[[EXP(E 1,...,E k )]] B[[E i+1 ]] = A[[E i ]] A[[EXP(E 1,...,E k )]] = A[[E k ]] When not specified otherwise: A t [[E]] = A f [[E]] = A[[E]]

29 Static Analysis Exploiting the Information  For every expression E of the form: x x ++ x -- x[E'] where x  B[[E]]: variable might not have been initialized

30 Static Analysis Approximation  A[[...]] and B[[...]] may be too small: some unfair uninitialized variable errors may occur { int x; if (b) x = 17; if (!b) x = 42 System.out.print(x); }

31 Static Analysis A Simpler Guarantee  In Joos 1, definite assignment is guaranteed by: requiring initializers for all local declarations forbidding a local variable to appear in its own initializer  This is an even coarser approximation: { int x = (x=1)+42; System.out.print(x) }

32 Static Analysis Flow-Sensitive Analysis  The analyses for: reachability definite assignment may simply be computed by traversing the AST  Other analyses are defined on the control flow graph of a program and require more complex techniques

33 Static Analysis Motivation: Register Optimization  For native code, we may want to optimize the use of registers: mov 1,R3 mov 1,R1 mov R3,R1  This optimization is only sound if the value in R3 is not used in the future

34 Static Analysis Motivation: Register Spills  When pushing a new frame, we write back all variables from registers to memory  It would be better to only write back those registers whose values may be used in the future c b a y x R1 R2 R3

35 Static Analysis Liveness  In both examples, we need to know if the value of some register R i might be read in the future  If so, it is called live (and otherwise dead)  Exact liveness is of course undecidable  A static analysis may conservatively approximate liveness at each program point  A trivial analysis thinks everything is live  A superior analysis identifies more dead registers

36 Static Analysis Liveness Analysis  For every program point S i we define the following set-valued properties: B[[S i ]] denotes the set of registers that are possibly live before S i A[[S i ]] denotes the set of registers that are possibly live after S i  For every program point we generate a constraint that relates A[[...]] and B[[...]] for neighboring program points  We no longer just use the AST...

37 Static Analysis Terminology  succ(S i ) denotes the set of program points to which execution may continue (by falling through or jumping)  uses(S i ) denotes the set of registers that S i reads  defs(S i ) denotes the set of registers that S i writes

38 Static Analysis A Tiny Example uses( S i )defs( S i ) succ( S i ) S1: mov 3,R1 {}{R1} {S2} S2:mov 4,R2 {}{R2} {S3} S3:add R1,R2,R3 {R1,R2}{R3} {S4} S4:mov R3,R0 {R3}{R0} {S5} S5:return {R0}{} {}

39 Static Analysis Dataflow Constraints  For every program point S i we have: B[[S i ]] = uses(S i )  (A[[S i ]] \ defs(S i )) A[[S i ]] = B[[x]] x  succ(S i )  A cyclic control flow graph will generate a cyclic collection of constraints 

40 Static Analysis Dataflow Example S7: add R1,R2,R3 S8 S9 S17 B[[ S8 ]]={ R4 } B[[ S9 ]]={ R1 } B[[ S17 ]]={ R3 } B[[ S7 ]]={ R1, R2 }  ({ R4, R1, R3 }\{ R3 })={ R1, R2, R4 }

41 Static Analysis A Small Example { int i, even, odd, sum; i = 1; even = 0; odd = 0; sum = 0; while (i < 10) { if (i%2 == 0) even = even+i; else odd = odd+i; sum = sum+i; i++; }

42 Static Analysis Generated Native Code mov 1,R1 // R1 is i mov 0,R2 // R2 is even mov 0,R3 // R3 is odd mov 0,R4 // R4 is sum loop: andcc R1,1,R5 // R5 = R1 & 1 cmp R5,0 bne else // if R5!=0 goto else add R2,R1,R2 // R2 = R2+R1 b endif else: add R3,R1,R3 // R3 = R3+R1 endif: add R4,R1,R4 // R4 = R4+R1 add R1,1,R1 // R1 = R1+1 cmp R1,9 ble loop // if i<=9 goto loop

43 Static Analysis The Control Flow Graph S5: andcc R1,1,R5 S1: mov 1,R1 S6: cmp R5,0 S8: add R2,R1,R2 S2: mov 0,R2 S7: bne S9 S10: add R4,R1,R4 S9: add R3,R1,R3 S3: mov 0,R3 S11: add R1,1,R1 S4: mov 0,R4 S12: cmp R1,9 S13: ble S5

44 Static Analysis Cyclic Constraints B[[ S1 ]] = f 1 (B[[ S2 ]]) B[[ S2 ]] = f 2 (B[[ S3 ]]) B[[ S3 ]] = f 3 (B[[ S4 ]]) B[[ S4 ]] = f 4 (B[[ S5 ]]) B[[ S5 ]] = f 5 (B[[ S6 ]]) B[[ S6 ]] = f 6 (B[[ S7 ]]) B[[ S7 ]] = f 7 (B[[ S8 ]],B[[ S9 ]]) B[[ S8 ]] = f 8 (B[[ S10 ]]) B[[ S9 ]] = f 9 (B[[ S10 ]]) B[[ S10 ]] = f 10 (B[[ S11 ]]) B[[ S11 ]] = f 11 (B[[ S12 ]]) B[[ S12 ]] = f 12 (B[[ S13 ]]) B[[ S13 ]] = f 13 (B[[ S5 ]]) where the f i (...) functions express the local constraints

45 Static Analysis Fixed-Point Solutions  Reminder: x is a fixed point of a function f iff f(x)=x  Define  = ( , , , , , , , , , , ,  )  Define R = { R0, R1, R2, R3, R4, R5 },  (R) = Powerset of R  Define F:  (R) 13 →  (R) 13 as: F(X 1,X 2,X 3,X 4,X 5,X 6,X 7,X 8,X 9,X 10,X 11,X 12,X 13 ) = (f 1 (X 2 ),f 2 (X 3 ),f 3 (X 4 ),f 4 (X 5 ),f 5 (X 6 ),f 6 (X 7 ),f 7 (X 8,X 9 ), f 8 (X 9 ),f 9 (X 10 ),f 10 (X 11 ),f 11 (X 12 ),f 12 (X 13 ),f 13 (X 5 ))  A solution is now a fixed point X  (R) 13 such that F(X)=X  The least fixed point is computed as a Kleene iteration: F n (  ) for some n  0

46 Static Analysis Computing the Least Fixed Point (1/3)  F(  )F 2 (  ) F 3 (  ) S1{}{}{}{} S2{}{}{}{} S3{}{}{}{R1} S4{}{}{R1}{R1} S5{}{R1}{R1}{R1} S6{}{R5}{R5}{R1,R2,R3,R5} S7{}{}{R1,R2,R3}{R1,R2,R3,R4} S8{}{R1,R2}{R1,R2,R4}{R1,R2,R4} S9{}{R1,R3}{R1,R3,R4}{R1,R3,R4} S10{}{R1,R4}{R1,R4}{R1,R4} S11{}{R1}{R1}{R1} S12{}{R1}{R1}{R1} S13{}{R1}{R1}{R1}

47 Static Analysis Computing the Least Fixed Point (2/3) F 4 (  ) F 5 (  ) F 6 (  ) S1{}{}{} S2{R1}{R1}{R1} S3{R1}{R1}{R1,R2} S4{R1}{R1,R2,R3}{R1,R2,R3} S5{R1,R2,R3}{R1,R2,R3,R4}{R1,R2,R3,R4} S6{R1,R2,R3,R4,R5}{R1,R2,R3,R4,R5}{R1,R2,R3,R4,R5} S7{R1,R2,R3,R4}{R1,R2,R3,R4}{R1,R2,R3,R4} S8{R1,R2,R4}{R1,R2,R4}{R1,R2,R4} S9{R1,R3,R4}{R1,R3,R4}{R1,R3,R4} S10{R1,R4}{R1,R4}{R1,R4} S11{R1}{R1}{R1,R2,R3} S12{R1}{R1,R2,R3}{R1,R2,R3,R4} S13{R1,R2,R3}{R1,R2,R3,R4}{R1,R2,R3,R4}

48 Static Analysis Computing the Least Fixed Point (3/3) F 7 (  ) F 8 (  ) S1{}{} S2{R1}{R1} S3{R1,R2}{R1,R2} S4{R1,R2,R3}{R1,R2,R3} S5{R1,R2,R3,R4}{R1,R2,R3,R4} S6{R1,R2,R3,R4,R5}{R1,R2,R3,R4,R5} S7{R1,R2,R3,R4}{R1,R2,R3,R4} S8{R1,R2,R4}{R1,R2,R3,R4} S9{R1,R3,R4}{R1,R2,R3,R4} S10{R1,R2,R3,R4}{R1,R2,R3,R4} S11{R1,R2,R3,R4}{R1,R2,R3,R4} S12{R1,R2,R3,R4}{R1,R2,R3,R4} S13{R1,R2,R3,R4}{R1,R2,R3,R4} F 8 (  )= F 9 (  )

49 Static Analysis Background: Why does this work?   (R), the powerset of R, forms a lattice:  It is a partial order, i.e., it is reflexive: S  S transitive: if S 1  S 2 and S 2  S 3 then S 1  S 3 anti-symmetric: if S 1  S 2 and S 2  S 1 then S 1 =S 2  It has a least element (  ) and a greatest element (R)  Any two elements have a join S 1  S 2 and a meet S 1  S 2  Fixed point theorem: In a finite lattice L a monotone function F: L  L has a unique least fixed point: F n (  ) computable by Kleene iteration n  0 

50 Static Analysis Application: Register Allocation  Variables that are never live at the same time may share the same register  Create a graph of variables where edges indicate simultaneous liveness:  Register allocation is done by finding a minimal graph coloring and assigning a register to each color: {{ a, d, f }, { b, e }, { c }} a b c d e f

Download ppt "Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University."

Similar presentations