Program Analysis and Verification, Spring 2014
Lecture 8: Static Analysis II
Roman Manevich, Ben-Gurion University
Syllabus Semantics Natural Semantics Structural semantics Axiomatic Verification Static Analysis Automating Hoare Logic Control Flow Graphs Equation Systems Collecting Semantics Abstract Interpretation fundamentals Lattices Galois Connections Fixed-Points Widening/Narrowing Domain constructors Interprocedural Analysis Analysis Techniques Numerical Domains CEGAR Alias analysis Shape Analysis Crafting your own Soot From proofs to abstractions Systematically developing transformers
Previously
Static Analysis by example
– Simple Available Expressions analysis
– Abstract transformer for assignments
– Three-address code
– Processing serial composition
– Processing conditions
– Processing loops
Defining an SAV abstract transformer
Goal: define a function F_SAV[x:=a] from sets of factoids to sets of factoids s.t. if F_SAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule
Rules (α is either a variable v or an addition expression v+w):
{ x=α } x:=a { } [kill-lhs]
{ y=x+w } x:=a { } [kill-rhs-1]
{ y=w+x } x:=a { } [kill-rhs-2]
{ } x:=α { x=α } [gen]
{ y=z+w } x:=a { y=z+w } [preserve]
Defining a semantic reduction
Idea: make as many implicit facts explicit by
– Using symmetry and transitivity of equality
– Commutativity of addition
– Meaning of equality – can substitute equal variables
For an SAV-predicate P=Conj(D) define Explicate(D) = minimal set D* such that:
1. D ⊆ D*
2. x=y ∈ D* implies y=x ∈ D*
3. x=y ∈ D* and y=z ∈ D* implies x=z ∈ D*
4. x=y+z ∈ D* implies x=z+y ∈ D*
5. x=y ∈ D* and x=z+w ∈ D* implies y=z+w ∈ D*
6. x=y ∈ D* and z=x+w ∈ D* implies z=y+w ∈ D*
7. x=z+w ∈ D* and y=z+w ∈ D* implies x=y ∈ D*
Notice that Explicate(D) ⇔ D
Explicate is a special case of a semantic reduction
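The closure rules above can be run as a naive saturation loop. The sketch below is only an illustration, not the lecture's implementation: the tuple encoding (`("eq", x, y)` for x=y, `("add", x, y, z)` for x=y+z) and the name `explicate` are assumptions, and trivial facts of the form x=x are skipped so the result matches sets such as { y=x+a, y=a+x }.

```python
def explicate(facts):
    """Saturate a set of SAV facts under rules 2-7.

    Hypothetical encoding: ("eq", x, y) means x = y and
    ("add", x, y, z) means x = y + z.
    """
    closed = set(facts)                                   # rule 1: D is contained in D*
    while True:
        new = set()

        def add_eq(a, b):
            if a != b:                                    # assumption: drop trivial x = x
                new.add(("eq", a, b))

        for f in closed:
            if f[0] == "eq":
                add_eq(f[2], f[1])                        # rule 2: symmetry
            else:
                new.add(("add", f[1], f[3], f[2]))        # rule 4: commutativity
            for g in closed:
                if f[0] == "eq" and g[0] == "eq" and f[2] == g[1]:
                    add_eq(f[1], g[2])                    # rule 3: transitivity
                if f[0] == "eq" and g[0] == "add":
                    x, y = f[1], f[2]
                    if g[1] == x:
                        new.add(("add", y, g[2], g[3]))   # rule 5
                    if g[2] == x:
                        new.add(("add", g[1], y, g[3]))   # rule 6
                if f[0] == "add" and g[0] == "add" and f[2:] == g[2:]:
                    add_eq(f[1], g[1])                    # rule 7
        if new <= closed:                                 # nothing new: D* reached
            return closed
        closed |= new
```

The loop terminates because only finitely many facts can be built from the variables that occur in the input.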
Annotating assignments
Define: F*[x:=aexpr] = Explicate ∘ F_SAV[x:=aexpr]
Annotate(P, x:=aexpr) = {P} x:=aexpr {F*[x:=aexpr](P)}
Annotating composition
Annotate(P, S1; S2) =
  let Annotate(P, S1) be {P} A1 {Q1}
  let Annotate(Q1, S2) be {Q1} A2 {Q2}
  return {P} A1; {Q1} A2 {Q2}
Simplifying conditions
Extend While with
– Non-determinism (or) and
– An assume statement: ⟨assume b, s⟩ ⇒_sos s if B⟦b⟧ s = tt
Now, the following two statements are equivalent
– if b then S1 else S2
– (assume b; S1) or (assume ¬b; S2)
assume transformer
Define F[assume bexpr](D) = D ∪ {bexpr} if bexpr is a factoid, and D otherwise
Can sharpen: F*[assume bexpr] = Explicate ∘ F_SAV[assume bexpr]
Annotating conditions
Annotate(P, if bexpr then S1 else S2) =
  let Pt = F*[assume bexpr] P
  let Pf = F*[assume ¬bexpr] P
  let Annotate(Pt, S1) be {Pt} A1 {Q1}
  let Annotate(Pf, S2) be {Pf} A2 {Q2}
  return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
k-loop unrolling
[Figure: the annotated loop { P } Inv = { N } while … do x := x + 1; y := x + a; d := x + a, shown next to its unrolling into successive if statements that start from { y=x+a, y=a+x, w=d, d=w } and have postconditions Q1 = { y=x+a, y=a+x }, Q2 = { y=x+a, y=a+x }, …]
The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, …
We can compute the following sequence:
N0 = P
N1 = N0 ⊔ Q1
N2 = N1 ⊔ Q2
…
Nk = Nk-1 ⊔ Qk
Observation 1: No need to explicitly unroll loop – we can reuse postcondition from unrolling k-1 for k
Annotating loops
Annotate(P, while bexpr do S) =
  Initialize N := Nc := P
  repeat
    let Annotate(Nc, if bexpr then S else skip) be {Nc} if bexpr then S else skip {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {F[assume bexpr](N)} Annotate(F[assume bexpr](N), S) {F[assume ¬bexpr](N)}
Annotating programs
Annotate(P, S) =
  case S is x:=aexpr
    return {P} x:=aexpr {F*[x:=aexpr] P}
  case S is S1; S2
    let Annotate(P, S1) be {P} A1 {Q1}
    let Annotate(Q1, S2) be {Q1} A2 {Q2}
    return {P} A1; {Q1} A2 {Q2}
  case S is if bexpr then S1 else S2
    let Pt = F[assume bexpr] P
    let Pf = F[assume ¬bexpr] P
    let Annotate(Pt, S1) be {Pt} A1 {Q1}
    let Annotate(Pf, S2) be {Pf} A2 {Q2}
    return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
  case S is while bexpr do S
    N := Nc := P // Initialize
    repeat
      let Pt = F[assume bexpr] Nc
      let Annotate(Pt, S) be {Nc} Abody {N}
      Nc := Nc ⊔ N
    until N = Nc
    return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
Today
Another static analysis example – constant propagation
Basic concepts in static analysis
– Control flow graphs
– Equation systems
– Collecting semantics
– (Trace semantics)
Constant propagation
Second static analysis example
Optimization: constant folding
– Example: x:=7; y:=x*9 transformed to: x:=7; y:=7*9 and then to: x:=7; y:=63
– Rule: { x = c } y := aexpr is replaced by y := eval(aexpr[c/x]); constant folding simplifies constant expressions
Analysis: constant propagation (CP)
– Infers facts of the form x = c
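As a rough illustration of eval(aexpr[c/x]), the sketch below substitutes known constants into an expression tree and evaluates whatever becomes constant. The tuple encoding of expressions, the `OPS` table and the name `fold` are assumptions made for this example only.

```python
import operator

# Hypothetical operator table for the expression encoding used here.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr, consts):
    """Substitute known constants into expr and evaluate what becomes constant.

    expr is an int, a variable name (str), or a tuple (op, left, right);
    consts maps variable names to the constants inferred by CP.
    """
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):                 # variable: replace it if x = c is known
        return consts.get(expr, expr)
    op, left, right = expr
    left, right = fold(left, consts), fold(right, consts)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)           # both operands constant: fold
    return (op, left, right)                  # otherwise keep a partially simplified tree
```

With the fact x = 7, `fold(("*", "x", 9), {"x": 7})` evaluates to 63, mirroring the x:=7; y:=x*9 example.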
Plan
– Define domain – set of allowed assertions
– Handle assignments
– Handle composition
– Handle conditions
– Handle loops
Constant propagation domain
CP semantic domain
Define the set of CP-factoids: { x = c | x ∈ Var, c ∈ Z }
– How many factoids are there?
Define predicates as sets of CP-factoids (the power set of the set of factoids)
– How many predicates are there?
– Do all predicates make sense? (x=5) ∧ (x=7)
Treat conjunctive formulas as sets of factoids: {x=5, y=7} ~ (x=5) ∧ (y=7)
Handling assignments
CP abstract transformer
Goal: define a function F_CP[x:=aexpr] from predicates to predicates such that if F_CP[x:=aexpr] P = P' then sp(x:=aexpr, P) ⇒ P'
{ x=c } x:=aexpr { } [kill]
{ } x:=c { x=c } [gen-1]
{ y=c1, z=c2 } x:=y op z { x=c } where c = c1 op c2 [gen-2]
{ y=c } x:=aexpr { y=c } [preserve]
Gen-kill formulation of transformers
Suited for analysis propagating sets of factoids
– Available expressions,
– Constant propagation, etc.
For each statement, define a set of killed factoids and a set of generated factoids
F[S] P = (P \ kill(S)) ∪ gen(S)
F_CP[x:=aexpr] P = (P \ {x=c}) when aexpr is not a constant
F_CP[x:=k] P = (P \ {x=c}) ∪ {x=k}
Used in dataflow analysis – a special case of abstract interpretation
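A minimal sketch of the CP assignment transformer in gen-kill style follows. It is not the lecture's code: predicates are assumed to be frozensets of `(variable, constant)` pairs with at most one fact per variable, right-hand sides are assumed to be a constant, a variable, or a tuple `(op, a, b)`, and the names `cp_assign` and `OPS` are made up for the example. Generating a fact for a plain copy x := y is an extension beyond the rules listed on the slides.

```python
import operator

# Hypothetical operator table for right-hand sides of the form (op, a, b).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def cp_assign(P, x, rhs):
    """Sketch of F_CP[x := rhs] applied to P, a frozenset of (var, const) pairs."""
    env = dict(P)                                   # assumes at most one fact per variable
    kept = {(v, c) for (v, c) in P if v != x}       # kill x = c, preserve everything else

    def value(operand):
        # A literal constant is its own value; a variable has a value only if known.
        return operand if isinstance(operand, int) else env.get(operand)

    if isinstance(rhs, tuple):                      # x := a op b   (gen-2)
        op, a, b = rhs
        va, vb = value(a), value(b)
        result = OPS[op](va, vb) if va is not None and vb is not None else None
    else:                                           # x := c (gen-1), or x := y (extension)
        result = value(rhs)
    gen = {(x, result)} if result is not None else set()
    return frozenset(kept | gen)
```

Operands are looked up in the facts that held before the assignment, so x := x + 1 under x = 7 yields x = 8 rather than losing the fact.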
Handling composition
Does this still work?
Annotate(P, S1; S2) =
  let Annotate(P, S1) be {P} A1 {Q1}
  let Annotate(Q1, S2) be {Q1} A2 {Q2}
  return {P} A1; {Q1} A2 {Q2}
Handling conditions
Handling conditional expressions
We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr in the CP domain
Define F[assume bexpr](D) = D ∪ {bexpr} if bexpr is a CP-factoid, and D otherwise
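The assume transformer can be sketched in a few lines. The encoding is an assumption for illustration: conditions are triples `(lhs, rel, rhs)`, a condition counts as a CP-factoid only when it has the shape variable `==` integer, and `cp_assume` is a hypothetical name.

```python
def cp_assume(P, bexpr):
    """Sketch of F[assume bexpr](P) for constant propagation.

    P is a set of (var, const) pairs; bexpr is a triple (lhs, rel, rhs).
    """
    lhs, rel, rhs = bexpr
    if rel == "==" and isinstance(lhs, str) and isinstance(rhs, int):
        return frozenset(P) | {(lhs, rhs)}      # bexpr is a CP-factoid: add it
    return frozenset(P)                         # otherwise the condition adds nothing
```

Conditions that cannot be expressed in the domain are simply ignored, which is sound because dropping a conjunct only weakens the predicate.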
Does this still work?
  let Pt = F[assume bexpr] P
  let Pf = F[assume ¬bexpr] P
  let Annotate(Pt, S1) be {Pt} A1 {Q1}
  let Annotate(Pf, S2) be {Pf} A2 {Q2}
  return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
How do we define join for CP?
Join example
{x=5, y=7} ⊔ {x=3, y=7, z=9} = ?
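One sound choice of join, assuming a predicate is a set of factoids read as a conjunction, is to keep only the facts that hold on both branches, that is, set intersection. The sketch below uses `(variable, constant)` pairs and the hypothetical name `cp_join`.

```python
def cp_join(P1, P2):
    """Join two CP predicates: keep only factoids that hold on both branches."""
    return frozenset(P1) & frozenset(P2)

# The two predicates from the join example, encoded as (var, const) pairs.
left = {("x", 5), ("y", 7)}
right = {("x", 3), ("y", 7), ("z", 9)}
joined = cp_join(left, right)
```

Under this definition the join of the two example predicates keeps only y = 7: x is dropped because the branches disagree on its value, and z because one branch knows nothing about it.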
Handling loops
Does this still work?
What about correctness?
– If loop terminates then is N a loop invariant?
What about termination?
Annotate(P, while bexpr do S) =
  N := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Nc} Abody {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
A termination principle
g : X → X is a function
How can we determine whether the sequence x0, x1 = g(x0), …, xk+1 = g(xk), … stabilizes?
Technique:
1. Find ranking function rank : X → N (that is, show that rank(x) ≥ 0 for all x)
2. Show that if x ≠ g(x) then rank(g(x)) < rank(x)
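The principle can be checked at run time while iterating. The helper below is a hypothetical sketch: it applies g until the value stops changing and raises an error if a step changes the value without strictly decreasing a non-negative rank.

```python
def iterate_to_fixpoint(g, x0, rank):
    """Iterate x := g(x) until it stabilizes, checking the ranking argument.

    Returns the stable value and the number of steps that changed it.
    """
    x, steps = x0, 0
    while True:
        nxt = g(x)
        if nxt == x:                      # sequence has stabilized
            return x, steps
        if not (0 <= rank(nxt) < rank(x)):
            raise ValueError("rank did not strictly decrease on a changing step")
        x, steps = nxt, steps + 1

# Example: g drops every fact except "y=7"; rank is the number of facts.
g = lambda s: s & frozenset({"y=7"})
result = iterate_to_fixpoint(g, frozenset({"x=5", "y=7"}), len)
```

Since the rank is a natural number that drops on every changing step, the number of such steps is bounded by rank(x0).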
Rank function for available expressions
rank(P) = |P| (number of factoids)
Prove that either Nc = Nc ⊔ N or rank(Nc ⊔ N) <? rank(Nc)
Annotate(P, while bexpr do S) =
  N := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Nc} Abody {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
Rank function for constant propagation
rank(P) = |P| (number of factoids)
Prove that either Nc = Nc ⊔ N' or rank(Nc) >? rank(Nc ⊔ N')
Annotate(P, while bexpr do S) =
  N' := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Nc} Abody {N'}
    Nc := Nc ⊔ N'
  until N' = Nc
  return {P} INV= {N'} while bexpr do {Pt} Abody {F[assume ¬bexpr](N')}
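The loop iteration can be sketched for constant propagation as follows. This is an illustration under stated assumptions, not the lecture's procedure: the loop body is given directly as a function on predicates (sets of `(var, const)` pairs), the join is taken to be intersection, the assume step is omitted, and the iteration stops as soon as Nc no longer changes, which is the condition the ranking argument relies on. The names `analyze_loop` and `body` are hypothetical.

```python
def analyze_loop(P, body):
    """Compute a candidate loop invariant by iterating Nc := Nc join body(Nc)."""
    Nc = frozenset(P)
    while True:
        N = body(Nc)                 # facts after one more execution of the body
        joined = Nc & N              # join as intersection of factoid sets
        if joined == Nc:             # rank |Nc| did not drop: stable
            return Nc
        Nc = joined                  # strictly fewer factoids than before

def body(P):
    """Hand-written transformer for the body  x := x + 1; y := 7."""
    env = dict(P)
    if "x" in env:
        env["x"] += 1                # x := x + 1, only if x is a known constant
    env["y"] = 7                     # y := 7
    return frozenset(env.items())

invariant = analyze_loop({("x", 0), ("y", 7)}, body)
```

Starting from { x=0, y=7 }, the first pass loses x (0 before the body, 1 after it) and the second pass changes nothing, so the result is { y=7 } after at most |P| shrinking steps.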
Generalizing
[Figure: generalizing from Available Expressions and Constant Propagation to Abstract Interpretation]
Towards a recipe for static analysis Two static analyses – Available Expressions (extended with equalities) – Constant Propagation Semantic domain – a family of formulas – Join operator approximates pairs of formulas Abstract transformers for basic statements – Assignments – assume statements Initial precondition
Control flow graphs
A technical issue
Unrolling loops is quite inconvenient and inefficient (but we can avoid it as we just saw)
How do we handle more complex control-flow constructs, e.g., goto, break, exceptions…?
– The problem: non-inductive control flow constructs
– Solution: model control-flow by labels and goto statements
Would like a dedicated data structure to explicitly encode control flow in support of the analysis
– Solution: control-flow graphs (CFGs)
Modeling control flow with labels
Structured program:
  while (x z) do
    x := x + 1
    y := x + a
    d := x + a
  a := b
With labels and goto:
  label0: if x z goto label1
  x := x + 1
  y := x + a
  d := x + a
  goto label0
  label1: a := b
Control-flow graph example
[Figure: CFG of the labeled program, with one node per numbered line plus special entry and exit nodes]
Control-flow graph
Nodes are statements or labels
Special nodes for entry/exit
An edge from node v to node w means that after executing the statement of v control passes to w
– Conditions are represented by split and join nodes
– Loops create cycles
Can be generated from abstract syntax tree in linear time
– Automatically taken care of by the front-end
Usage: store analysis results (assertions) in CFG nodes
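To make the definition concrete, here is a small sketch that builds successor edges for a label/goto program in one linear pass. The instruction encoding (`("label", name)`, `("ifgoto", cond, target)`, `("goto", target)`, `("assign", text)`) and the function name `build_cfg` are assumptions for illustration; nodes are instruction indices plus `"entry"` and `"exit"`.

```python
def build_cfg(prog):
    """Return a successor map {node: [successors]} for a label/goto program."""
    labels = {ins[1]: i for i, ins in enumerate(prog) if ins[0] == "label"}
    succ = {"entry": [0] if prog else ["exit"]}
    for i, ins in enumerate(prog):
        fallthrough = [i + 1] if i + 1 < len(prog) else ["exit"]
        if ins[0] == "goto":
            succ[i] = [labels[ins[1]]]                 # unconditional jump
        elif ins[0] == "ifgoto":
            succ[i] = [labels[ins[2]]] + fallthrough   # split node: jump or fall through
        else:
            succ[i] = fallthrough                      # labels and assignments
    return succ

# A small loop: label0: if x <= 0 goto label1; x := x - 1; goto label0; label1:
prog = [
    ("label", "label0"),
    ("ifgoto", "x <= 0", "label1"),
    ("assign", "x := x - 1"),
    ("goto", "label0"),
    ("label", "label1"),
]
cfg = build_cfg(prog)
```

The back edge from the `goto` node to `label0` is what creates the cycle for the loop, and the conditional node is the only one with two successors.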
Control-flow graph example
[Figure: CFG of the labeled program with entry and exit nodes, label nodes still present]
Eliminating labels We can use edges to point to the nodes following labels and remove all label nodes (other than entry/exit)
Control-flow graph example
[Figure: the CFG after eliminating label nodes; the remaining nodes are entry, the loop condition, the three loop-body assignments, a := b, and exit]
Basic blocks
A basic block is a chain of nodes with a single entry point and a single exit point
Entry/exit nodes are separate blocks
[Figure: the label-free CFG partitioned into basic blocks]
Blocked CFG
Stores basic blocks in a single node
Extended blocks – maximal connected loop-free subgraphs
[Figure: the blocked CFG, with the three loop-body assignments merged into one node]
Collecting semantics
Why do we need another semantics?
Operational semantics explains how to compute output from a given input
– Useful for implementing an interpreter/compiler
– Less useful for reasoning about safety properties
– Not suitable for computational purposes – does not explicitly show how assertions in different program points influence each other
Need a more explicit semantics
– Over a control flow graph
Control-flow graph example
  label0: if x <= 0 goto label1
  x := x - 1
  goto label0
  label1:
[Figure: CFG with nodes entry, label0, if x > 0, x := x - 1, goto label0, label1, exit]
Trimmed CFG
[Figure: the same CFG without label and goto nodes: entry, if x > 0, x := x - 1, exit]
Collecting semantics example: input
[Figure: animation on the trimmed CFG. The input states [x↦1], [x↦2], [x↦3], … are placed at entry and propagated step by step through if x > 0 and x := x - 1; the state [x↦0] reaches exit. The last frame, titled "ad infinitum – fixed point", also shows the states [x↦-1], [x↦-2], …]
Predicates at fixed point
[Figure: the trimmed CFG annotated with { true } at entry, { x>0 } on the true branch of if x > 0, { x≥0 } after x := x - 1, and { x≤0 } at exit]
Collecting semantics
Accumulates for each control-flow node the (possibly infinite) set of states that can reach it by executing the program from some given set of input states
Not computable in general
A reference point for static analysis
(An abstraction of the trace semantics)
We will work our way up to defining it formally
Collecting semantics in equational form
Math reference: function lifting
Let f : X → Y be a function
The lifted function f' : 2^X → 2^Y is defined as f'(XS) = { f(x) | x ∈ XS }
We will sometimes use the same symbol for both functions when it is clear from the context which one is used
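Lifting is a one-liner in code. The sketch below uses Python sets for 2^X and 2^Y; the name `lift` is chosen for this example.

```python
def lift(f):
    """Lift f : X -> Y to f' : 2^X -> 2^Y by applying f to every element."""
    return lambda xs: {f(x) for x in xs}

# The semantic function of x := x - 1 on single values, lifted to sets of values.
decrement_all = lift(lambda x: x - 1)
```

Note that the lifted function may return a smaller set than its argument when f is not injective, for instance lifting `abs` and applying it to {-1, 1}.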
Equational definition example
A vector of variables R[0, 1, 2, 3, 4]
R[0] = { x ∈ Z } // established input
R[1] = R[0] ∪ R[4]
R[2] = R[1] ∩ {s | s(x) > 0} // semantic function for assume x>0
R[3] = R[1] ∩ {s | s(x) ≤ 0}
R[4] = ⟦x:=x-1⟧ R[2] // semantic function for x:=x-1 lifted to sets of states
A (recursive) system of equations
[Figure: CFG with nodes entry, if x > 0, x := x-1, exit and edges labeled R[0], R[1], R[2], R[3], R[4]]
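The collecting semantics is not computable in general, but the equation system can be solved by plain iteration when the input set is finite. The sketch below makes that assumption explicit: a state is represented just by the value of x, the caller supplies a finite set of inputs in place of all of Z, and `solve` is a hypothetical name.

```python
def solve(inputs):
    """Iterate the five equations until no R[i] changes (finite inputs assumed)."""
    R = [set(inputs), set(), set(), set(), set()]
    changed = True
    while changed:
        new = [
            set(inputs),                     # R[0]: the given input states
            R[0] | R[4],                     # R[1] = R[0] ∪ R[4]
            {x for x in R[1] if x > 0},      # R[2]: assume x > 0
            {x for x in R[1] if x <= 0},     # R[3]: assume x <= 0
            {x - 1 for x in R[2]},           # R[4]: x := x - 1 lifted to sets
        ]
        changed = new != R
        R = new
    return R

R = solve(range(-2, 4))    # a finite stand-in for "all integers"
```

For these inputs the states reaching the exit edge R[3] are exactly the non-positive ones, which agrees with the predicate x ≤ 0 expected at exit.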
General definition
A vector of variables R[0, …, k], one per input/output of a node
– R[0] is for entry
For node n with multiple predecessors add equation R[n] = ∪ {R[k] | k is a predecessor of n}
For an atomic operation node S with input R[m] and output R[n] add equation R[n] = ⟦S⟧ R[m]
Transform if b then S1 else S2 to (assume b; S1) or (assume ¬b; S2)
[Figure: CFG with nodes entry, if x > 0, x := x-1, exit and edges labeled R[0], R[1], R[2], R[3], R[4]]
Next lecture: abstract interpretation fundamentals