Spring 2016 Program Analysis and Verification

Lecture 7: Static Analysis I
Roman Manevich
Ben-Gurion University

Tentative syllabus
Program Verification: Operational semantics; Hoare Logic; Applying Hoare Logic; Weakest Precondition Calculus; Proving Termination; Data structures; Automated Verification
Program Analysis Basics: From Hoare Logic to Static Analysis; Control Flow Graphs; Equation Systems; Collecting Semantics; Using Soot
Abstract Interpretation fundamentals: Lattices; Fixed-Points; Chaotic Iteration; Galois Connections; Domain constructors; Widening/Narrowing
Analysis Techniques: Numerical Domains; Alias analysis; Interprocedural Analysis; Shape Analysis; CEGAR

Previously
Axiomatic verification
Weakest precondition calculus
Strongest postcondition calculus
Handling data structures
Total correctness

Agenda
Static analysis for compiler optimization
  Common Subexpression Elimination
  Available Expression domain
Develop a static analysis: Simple Available Expressions
Constant Propagation
Basic concepts in static analysis
  Control flow graphs
  Equation systems
  Collecting semantics

Array-max example: Post1
nums : array
N : int // N stands for nums' length
{ N≥0 }
x := 0
{ N≥0 ∧ x=0 }
res := nums[0]
{ x=0 }
Inv = { x≤N }
while x < N
  { x=k ∧ k<N }
  if nums[x] > res then res := nums[x]
  { x=k ∧ k<N }
  x := x + 1
  { x=k+1 ∧ k<N }
{ x≥N ∧ x≤N }
{ x=N }

Can we find this proof automatically?
nums : array
N : int
{ N≥0 }
x := 0
{ N≥0 ∧ x=0 }
res := nums[0]
{ x=0 }
Inv = { x≤N }
while x < N
  { x=k ∧ k<N }
  if nums[x] > res then
    { x=k ∧ k<N }
    res := nums[x]
    { x=k ∧ k<N }
  { x=k ∧ k<N }
  x := x + 1
  { x=k+1 ∧ k<N }
{ x≥N ∧ x≤N }
{ x=N }
Observation: the predicates in the proof are conjunctions of constraints, where each constraint has the form X - Y ≤ c or X ≤ c

Look under the street lamp
… we may move the lamp a bit
[Image by Infopablo00 (own work), CC-BY-SA-3.0, via Wikimedia Commons]

Zone Abstract Domain
Developed by Antoine Miné in his Ph.D. thesis
Uses constraints of the form X - Y ≤ c and X ≤ c

Analysis with Zone abstract domain
Static analysis with Zone abstraction:
nums : array
N : int
{ N≥0 }
x := 0
{ N≥0 ∧ x=0 }
res := nums[0]
{ N≥0 ∧ x=0 }
Inv = { N≥0 ∧ 0≤x≤N }
while x < N
  { N≥0 ∧ 0≤x<N }
  if nums[x] > res then
    { N≥0 ∧ 0≤x<N }
    res := nums[x]
    { N≥0 ∧ 0≤x<N }
  { N≥0 ∧ 0≤x<N }
  x := x + 1
  { N≥0 ∧ 0<x≤N }
{ N≥0 ∧ 0≤x ∧ x=N }
Manual proof:
nums : array
N : int
{ N≥0 }
x := 0
{ N≥0 ∧ x=0 }
res := nums[0]
{ x=0 }
Inv = { x≤N }
while x < N
  { x=k ∧ k<N }
  if nums[x] > res then
    { x=k ∧ k<N }
    res := nums[x]
    { x=k ∧ k<N }
  { x=k ∧ k<N }
  x := x + 1
  { x=k+1 ∧ k<N }
{ x≥N ∧ x≤N }
{ x=N }

Array-max example: Post3
nums : array // N stands for nums' length
{ N≥0 ∧ 0≤m<N }
x := 0
{ x=0 }
res := nums[0]
{ x=0 ∧ res=nums(0) }
Inv = { 0≤m<x ⇒ nums(m)≤res }
while x < N
  { x=k ∧ res=oRes ∧ (0≤m<k ⇒ nums(m)≤oRes) }
  if nums[x] > res then
    { nums(x)>oRes ∧ res=oRes ∧ x=k ∧ (0≤m<k ⇒ nums(m)≤oRes) }
    res := nums[x]
    { res=nums(x) ∧ nums(x)>oRes ∧ x=k ∧ (0≤m<k ⇒ nums(m)≤oRes) }
    { x=k ∧ (0≤m≤k ⇒ nums(m)≤res) }
  { (x=k ∧ (0≤m≤k ⇒ nums(m)≤res)) ∨ (res≥nums(x) ∧ x=k ∧ res=oRes ∧ (0≤m<k ⇒ nums(m)≤oRes)) }
  { x=k ∧ (0≤m≤k ⇒ nums(m)≤res) }
  x := x + 1
  { x=k+1 ∧ (0≤m≤x-1 ⇒ nums(m)≤res) }
  { 0≤m<x ⇒ nums(m)≤res }
{ x=N ∧ (0≤m<x ⇒ nums(m)≤res) }
[univp] { ∀m. 0≤m<N ⇒ nums(m)≤res }
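The annotations above can be exercised (not proved) by running the program with the loop invariant and the final assertion checked dynamically. This is a minimal Python sketch; it assumes a non-empty array, since `res := nums[0]` needs at least one element even though the precondition only states N ≥ 0, and the loop check combines the bounds on x from Post1 with the Post3 invariant.

```python
def array_max(nums):
    """Array-max from the slides, with the Post3 assertions checked at run time."""
    n = len(nums)                      # N stands for the length of nums
    assert n > 0, "res := nums[0] needs a non-empty array"
    x = 0
    res = nums[0]
    while x < n:
        # Invariant: 0 <= x <= N and, for all m with 0 <= m < x, nums[m] <= res
        assert 0 <= x <= n and all(nums[m] <= res for m in range(x))
        if nums[x] > res:
            res = nums[x]
        x = x + 1
    # Postcondition: x = N and, for all m with 0 <= m < N, nums[m] <= res
    assert x == n and all(nums[m] <= res for m in range(n))
    return res
```

A failing assertion would point to a wrong annotation; passing runs only give evidence, which is exactly why the slides ask for an automatically found proof instead.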

Can we find this proof automatically?
Various static analysis techniques can, e.g.:
A framework for numeric analysis of array operations [Gopan et al., POPL 2005]
Discovering properties about arrays in simple programs [Halbwachs & Péron, PLDI 2008]

Static analysis for compiler optimizations

Motivating problem: optimization
A compiler optimization is defined by a program transformation T : Stmt → Stmt
The transformation is semantics-preserving: ∀s. Ssos⟦C⟧s = Ssos⟦T(C)⟧s
The transformation is applied to the program only if an enabling condition is met
We use static analysis to infer enabling conditions

Common Subexpression Elimination
If we have two variable assignments
x := a op b … y := a op b
and the values of x, a, and b have not changed between the assignments, rewrite the code as
x := a op b … y := x
Eliminates useless recalculation
Paves the way for more optimizations (e.g., dead code elimination)
op ∈ {+, -, *, ==, <=}
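To make the transformation concrete, here is a minimal sketch of CSE over straight-line code, assuming a hypothetical tuple encoding `(lhs, op, left, right)` for `lhs := left op right` and `(lhs, "copy", src, None)` for `lhs := src`. It matches expressions purely syntactically (no commutativity) and forgets an expression as soon as one of its operands, or the variable holding it, is reassigned; this is the "values have not changed" side condition.

```python
def cse(stmts):
    """Common subexpression elimination on straight-line three-address code."""
    available = {}   # (op, left, right) -> a variable currently holding that value
    out = []
    for lhs, op, left, right in stmts:
        key = (op, left, right)
        if op != "copy" and key in available:
            # y := a op b becomes y := x
            out.append((lhs, "copy", available[key], None))
        else:
            out.append((lhs, op, left, right))
        # Assigning lhs invalidates expressions that read lhs or are held in lhs.
        available = {k: v for k, v in available.items()
                     if v != lhs and lhs not in (k[1], k[2])}
        # lhs now holds left op right, unless lhs was one of the operands.
        if op != "copy" and lhs not in (left, right):
            available[key] = lhs
    return out
```

For example, `cse([("x", "+", "a", "b"), ("z", "+", "a", "c"), ("y", "+", "a", "b")])` rewrites the last statement to `("y", "copy", "x", None)`, while inserting an assignment to `a` or `x` in between blocks the rewrite.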

What do we need to prove?
{ true } C1 x := a op b C2 { x = a op b } y := a op b C3
  ⇓ CSE
{ true } C1 x := a op b C2 { x = a op b } y := x C3
The assertion { x = a op b } localizes the decision

A simplified problem
{ true } C1 x := a + b C2 { x = a + b } y := a + b C3
  ⇓ CSE
{ true } C1 x := a + b C2 { x = a + b } y := x C3

Available Expressions analysis
A static analysis that infers, for every program point, a set of facts of the form
AV = { x = y | x, y ∈ Var } ∪ { x = op y | x, y ∈ Var, op ∈ {-, !} } ∪ { x = y op z | x, y, z ∈ Var, op ∈ {+, -, *, <=} }
For every program with n = |Var| variables, the number of possible facts is finite: |AV| = O(n³)
Yields a trivial algorithm … Is it efficient?

Simple Available Expressions
Define atomic facts (for SAV) as Γ = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
For n = |Var|, the number of atomic facts is O(n³)
Define SAV-predicates as 2^Γ (sets of atomic facts)

Notation for conjunctive sets of facts
For a set of atomic facts D ⊆ Γ, we define Conj(D) = ∧D
E.g., if D = {a=b, c=b+d, b=c} then Conj(D) = (a=b) ∧ (c=b+d) ∧ (b=c)
Notice that for two sets of facts D1 and D2: Conj(D1 ∪ D2) = Conj(D1) ∧ Conj(D2)
What does Conj({}) stand for…?
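A small Python sketch of Γ and Conj, assuming a hypothetical encoding of atomic facts as tuples, `("eq", x, y)` for x = y and `("add", x, y, z)` for x = y + z, with a concrete state given as a dict from variables to integers. It also illustrates the question above: the empty conjunction evaluates to true, just like Python's `all()` on an empty iterable.

```python
from itertools import product


def holds(fact, state):
    """Evaluate one SAV atomic fact in a concrete state (dict: variable -> int)."""
    if fact[0] == "eq":                 # ("eq", x, y) encodes x = y
        _, x, y = fact
        return state[x] == state[y]
    _, x, y, z = fact                   # ("add", x, y, z) encodes x = y + z
    return state[x] == state[y] + state[z]


def conj(facts, state):
    """Conj(D) evaluated in a state; the empty conjunction is true."""
    return all(holds(f, state) for f in facts)


def atomic_facts(variables):
    """Enumerate Γ over the given variables: n^2 equalities plus n^3 sums, i.e. O(n^3)."""
    eqs = {("eq", x, y) for x, y in product(variables, repeat=2)}
    sums = {("add", x, y, z) for x, y, z in product(variables, repeat=3)}
    return eqs | sums
```

With three variables, `atomic_facts` yields 9 + 27 = 36 facts, which makes the O(n³) bound tangible.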

Towards an automatic proof
Goal: automatically compute an annotated program proving as many facts of the form x = y and x = y + z as possible
Decision 1: develop a forward-going proof
Decision 2: draw predicates from a finite set D
  "looking under the light of the lamp"
  A compromise that simplifies the problem by focusing attention; we may miss some facts that hold
Challenge 1: handle straight-line code
Challenge 2: handle conditions
Challenge 3: handle loops

Challenge 1: handling straight-line code
[Image by Zachary Dylan Tax, GFDL or CC-BY-3.0, via Wikimedia Commons]

Straight line code example
{ } x := a + b { x=a+b } z := a + c { x=a+b, z=a+c } b := a * c { z=a+c } Find a proof that satisfies both conditions

Straight line code example
{ } x := a + b { x=a+b } z := a + c { x=a+b, z=a+c } b := a * c { z=a+c }
Rules used: sp, cons, frame
Can we turn this into an algorithm? What should we ensure for each triple?

Goal
Given a program of the form x1 := a1; … xn := an
Find predicates P0, …, Pn such that {P0} x1 := a1 {P1} … {Pn-1} xn := an {Pn} is a proof
That is: sp(xi := ai, Pi-1) ⇒ Pi
Each Pi has the form Conj(Di) where Di is a set of atomic facts

Algorithm for straight-line code
Goal: find predicates P0, …, Pn such that {P0} x1 := a1 {P1} … {Pn-1} xn := an {Pn} is a proof
  That is: sp(xi := ai, Pi-1) ⇒ Pi
  Each Pi has the form Conj(Di) where Di is a set of atomic facts
Idea: define a function F_SAV[x:=a] : 2^Γ → 2^Γ such that if F_SAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
  We call F the abstract transformer of x:=a
Unless D0 is given, initialize D0 = {} (why?)
For each i: compute Di = F_SAV[xi := ai](Di-1)
Finally Pi = Conj(Di)

Defining an SAV abstract transformer
Goal: define a function F_SAV[x:=a] : 2^Γ → 2^Γ such that if F_SAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule

Defining an SAV abstract transformer
Goal: define a function F_SAV[x:=a] : 2^Γ → 2^Γ such that if F_SAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule
α is either a variable v or an addition expression v+w
{ x=α } x:=a { } [kill-lhs]
{ y=x+w } x:=a { } [kill-rhs-1]
{ y=w+x } x:=a { } [kill-rhs-2]
{ } x:=α { x=α } [gen]
{ y=z+w } x:=a { y=z+w } [preserve]
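One way to realize these rules in Python, assuming the tuple encoding of facts `("eq", x, y)` / `("add", x, y, z)` and a hypothetical encoding of right-hand sides as `("var", v)`, `("add", v, w)`, or `("other",)` for any other expression such as `a * c`. Two readings are assumptions of this sketch: [preserve] is applied to every fact that does not mention x (not only sums), and [gen] fires only when α does not read x itself.

```python
def f_sav(x, rhs, facts):
    """Abstract transformer F_SAV[x := rhs] on a set of SAV facts."""
    # [kill-lhs], [kill-rhs-1], [kill-rhs-2]: any fact mentioning x may be falsified.
    # [preserve]: read here as keeping every fact that does not mention x.
    out = {f for f in facts if x not in f[1:]}
    # [gen]: x = alpha, where alpha is a variable or a sum of two variables.
    # Assumed side condition: alpha must not read x (x := x + y generates nothing).
    if rhs[0] == "var" and rhs[1] != x:
        out.add(("eq", x, rhs[1]))
    elif rhs[0] == "add" and x not in rhs[1:]:
        out.add(("add", x, rhs[1], rhs[2]))
    return frozenset(out)
```

Applied to the straight-line example, starting from {}, the three assignments `x := a + b`, `z := a + c`, `b := a * c` produce {x=a+b}, then {x=a+b, z=a+c}, then {z=a+c}.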

SAV abstract transformer example
{ } x := a + b { x=a+b } z := a + c { x=a+b, z=a+c } b := a * c { z=a+c }
α is either a variable v or an addition expression v+w
{ x=α } x:= aexpr { } [kill-lhs]
{ y=x+w } x:= aexpr { } [kill-rhs-1]
{ y=w+x } x:= aexpr { } [kill-rhs-2]
{ } x:= α { x=α } [gen]
{ y=z+w } x:= aexpr { y=z+w } [preserve]

Problem 1: large expressions
{ } x := a + b + c { } y := a + b + c { } Missed CSE opportunity Large expressions on the right hand sides of assignments are problematic Can miss optimization opportunities Require complex transformers Solution: …?

Problem 1: large expressions
{ } x := a + b + c { } y := a + b + c { } Missed CSE opportunity Large expressions on the right hand sides of assignments are problematic Can miss optimization opportunities Require complex transformers Solution: transform code to normal form where right-hand sides have bounded size Standard compiler transformation – lowering into three address code

Three-address code
{ } x := a + b + c { } y := a + b + c { }
  ⇓
{ } i1 := a + b { i1=a+b } x := i1 + c { i1=a+b, x=i1+c } i2 := a + b { i1=a+b, x=i1+c, i2=a+b } y := i2 + c { i1=a+b, x=i1+c, i2=a+b, y=i2+c }
Main idea: simplify expressions by storing intermediate results in new temporary variables
Number of variables in simplified statements ≤ 3

Three-address code
{ } x := a + b + c { } y := a + b + c { }
  ⇓
{ } i1 := a + b { i1=a+b } x := i1 + c { i1=a+b, x=i1+c } i2 := a + b { i1=a+b, x=i1+c, i2=a+b } y := i2 + c { i1=a+b, x=i1+c, i2=a+b, y=i2+c }
Need to infer i1=i2
Main idea: simplify expressions by storing intermediate results in new temporary variables
Number of variables in simplified statements ≤ 3

Problem 2: transformer precision
{ } i1 := a + b { i1=a+b } x := i1 + c { i1=a+b, x=i1+c } i2 := a + b { i1=a+b, x=i1+c, i2=a+b } y := i2 + c { i1=a+b, x=i1+c, i2=a+b, y=i2+c } Need to infer i1=i2 Our transformer only infers syntactically available expressions – ones that appear in the code explicitly We want a transformer that considers the meaning of the predicates Takes equalities into account

Defining a semantic reduction
Idea: make as many implicit facts explicit as possible by
  using symmetry and transitivity of equality
  commutativity of addition
  the meaning of equality: we can substitute equal variables
For an SAV-predicate P = Conj(D) define reduce(D) = the minimal set D* such that:
  D ⊆ D*
  x=y ∈ D* implies y=x ∈ D*
  x=y ∈ D* and y=z ∈ D* implies x=z ∈ D*
  x=y+z ∈ D* implies x=z+y ∈ D*
  x=y ∈ D* and x=z+w ∈ D* implies y=z+w ∈ D*
  x=y ∈ D* and z=x+w ∈ D* implies z=y+w ∈ D*
  x=z+w ∈ D* and y=z+w ∈ D* implies x=y ∈ D*
Notice that reduce(D) ≡ D: it only adds facts that are already implied
reduce is a special case of a semantic reduction
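A direct fixed-point computation of reduce, sketched in Python under the hypothetical encoding `("eq", x, y)` / `("add", x, y, z)`. It applies the seven closure rules until nothing new appears; since Γ is finite for a fixed set of variables, this terminates. One deviation, chosen for readability: tautologies x = x, which symmetry plus transitivity would otherwise derive, are not added.

```python
def reduce_facts(facts):
    """Smallest superset of `facts` closed under the reduce rules (minus x = x)."""
    d = set(facts)
    while True:
        eqs = [f[1:] for f in d if f[0] == "eq"]     # pairs (x, y) for x = y
        adds = [f[1:] for f in d if f[0] == "add"]   # triples (x, y, z) for x = y + z
        new = set()
        for x, y in eqs:
            new.add(("eq", y, x))                                       # symmetry
            new |= {("eq", x, z) for y2, z in eqs if y2 == y}           # transitivity
            new |= {("add", y, z, w) for x2, z, w in adds if x2 == x}   # x=y, x=z+w => y=z+w
            new |= {("add", u, y, w) for u, z, w in adds if z == x}     # x=y, u=x+w => u=y+w
        for x, y, z in adds:
            new.add(("add", x, z, y))                                   # commutativity
            new |= {("eq", x, u) for u, y2, z2 in adds if (y2, z2) == (y, z)}  # same sum
        new = {f for f in new if not (f[0] == "eq" and f[1] == f[2])}   # skip x = x
        if new <= d:
            return frozenset(d)
        d |= new
```

On {i1=a+b, x=i1+c, i2=a+b} this derives, among others, i1=i2, i2=i1, x=i2+c and x=c+i2, which is the equality information the purely syntactic transformer misses.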

Sharpening the transformer
Define: F*[x:=aexpr] = reduce ∘ F_SAV[x:=aexpr]
{ } i1 := a + b { i1=a+b, i1=b+a } x := i1 + c { i1=a+b, i1=b+a, x=i1+c, x=c+i1 } i2 := a + b { i1=a+b, i1=b+a, x=i1+c, x=c+i1, i2=a+b, i2=b+a, i1=i2, i2=i1, x=i2+c, x=c+i2 } y := i2 + c { ... }
Since sets of facts and their conjunctions are isomorphic, we will use them interchangeably

An algorithm for annotating SLP
Annotate(P, x:=aexpr) = {P} x:=aexpr {F*[x:=aexpr](P)}
Annotate(P, S1; S2) =
  let Annotate(P, S1) be {P} A1 {Q1}
  let Annotate(Q1, S2) be {Q1} A2 {Q2}
  return {P} A1; {Q1} A2 {Q2}

Challenge 2: handling conditions

Goal
{ bexpr ∧ P } S1 { Q },  { ¬bexpr ∧ P } S2 { Q }  /  { P } if bexpr then S1 else S2 { Q }  [ifp]
Annotate a program if bexpr then S1 else S2 with predicates from 2^Γ
Assumption 1: P is given (otherwise use true)
Assumption 2: bexpr is a simple binary expression, e.g., x=y, x≠y, x<y (why?)
{ P } if bexpr then { bexpr ∧ P } S1 { Q1 } else { ¬bexpr ∧ P } S2 { Q2 } { Q }

Joining predicates [ifp]
{ bexpr ∧ P } S1 { Q },  { ¬bexpr ∧ P } S2 { Q }  /  { P } if bexpr then S1 else S2 { Q }  [ifp]
Start with P or { bexpr ∧ P } (bexpr is possibly an SAV-fact) and annotate S1 (yielding Q1)
Start with P or { ¬bexpr ∧ P } (¬bexpr is possibly an SAV-fact) and annotate S2 (yielding Q2)
How do we infer a Q such that Q1 ⇒ Q and Q2 ⇒ Q?
Q1 = Conj(D1), Q2 = Conj(D2)
Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2)
{ P } if bexpr then { bexpr ∧ P } S1 { Q1 } else { ¬bexpr ∧ P } S2 { Q2 } { Q }

Joining predicates [ifp]
{ bexpr ∧ P } S1 { Q },  { ¬bexpr ∧ P } S2 { Q }  /  { P } if bexpr then S1 else S2 { Q }  [ifp]
Start with P or { bexpr ∧ P } and annotate S1 (yielding Q1)
Start with P or { ¬bexpr ∧ P } and annotate S2 (yielding Q2)
How do we infer a Q such that Q1 ⇒ Q and Q2 ⇒ Q?
Q1 = Conj(D1), Q2 = Conj(D2)
Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2), where ⊔ is the join operator for SAV
{ P } if bexpr then { bexpr ∧ P } S1 { Q1 } else { ¬bexpr ∧ P } S2 { Q2 } { Q }

Joining predicates Q1=Conj(D1), Q2=Conj(D2)
We want to soundly approximate Q1 ∨ Q2 in 2^Γ
Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2)
Notice that Q1 ⇒ Q and Q2 ⇒ Q, meaning Q1 ∨ Q2 ⇒ Q

Simplifying handling of conditions
Extend While with
  Non-determinism (or) and
  An assume statement: ⟨assume b, s⟩ ⇒sos s if B⟦b⟧s = tt
Use the fact that the following two statements are equivalent:
  if b then S1 else S2
  (assume b; S1) or (assume ¬b; S2)

Handling conditional expressions
We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr in 2^Γ
Define α(bexpr) = if bexpr is a factoid then {bexpr} else {}
Define F[assume bexpr](D) = D ∪ α(bexpr)
Can sharpen: F*[assume bexpr] = reduce ∘ F_SAV[assume bexpr]

Notice bexpr ⇒ Conj(α(bexpr))
Examples: α(y=z) = {y=z}, α(y<z) = {}
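The condition-handling pieces are small enough to sketch directly. Assuming conditions are encoded as tuples `(op, left, right)` and facts as `("eq", x, y)` / `("add", x, y, z)` (both hypothetical encodings), α keeps only equalities, F[assume] adds α(bexpr), negation flips the comparison operator, and the SAV join intersects fact sets.

```python
NEGATE = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def negate(bexpr):
    """Negate a simple binary condition (op, left, right)."""
    op, left, right = bexpr
    return (NEGATE[op], left, right)


def alpha(bexpr):
    """alpha(bexpr): {bexpr} if it is an SAV factoid (an equality), else {}."""
    op, left, right = bexpr
    return {("eq", left, right)} if op == "==" else set()


def f_assume(bexpr, facts):
    """F[assume bexpr](D) = D ∪ alpha(bexpr)."""
    return frozenset(facts) | alpha(bexpr)


def join(d1, d2):
    """Conj(D1) ⊔ Conj(D2) = Conj(D1 ∩ D2)."""
    return frozenset(d1) & frozenset(d2)
```

Because α drops every non-equality, `assume x < y` and `assume x != y` leave the facts unchanged; this is sound (bexpr ⇒ Conj(α(bexpr))) but forgets what the condition says.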

An algorithm for annotating conditions
let Pt = F*[assume bexpr] P
let Pf = F*[assume ¬bexpr] P
let Annotate(Pt, S1) be {Pt} A1 {Q1}
let Annotate(Pf, S2) be {Pf} A2 {Q2}
return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}

Example { } if (x = y) { x=y, y=x } a := b + c { x=y, y=x, a=b+c, a=c+b } d := b - c { x=y, y=x, a=b+c, a=c+b } else { } a := b + c { a=b+c, a=c+b } d := b + c { a=b+c, a=c+b, d=b+c, d=c+b, a=d, d=a } { a=b+c, a=c+b }

Recap We now have an algorithm for soundly annotating loop-free code
Generates forward-going proofs Algorithm operates on abstract syntax tree of code Handles straight-line code by applying F* Handles conditions by recursively annotating true and false branches and then intersecting their postconditions


Challenge 3: handling loops
[Image by Stefan Scheer (own work, own photo), GFDL, CC-BY-SA-3.0 or CC-BY-SA, via Wikimedia Commons]

{ bexpr ∧ P } S { P }  /  { P } while bexpr do S { ¬bexpr ∧ P }
Goal
{ bexpr ∧ P } S { P }  /  { P } while bexpr do S { ¬bexpr ∧ P }  [whilep]
Annotate a program while bexpr do S with predicates from 2^Γ such that P ⇒ N
Main challenge: find N
Assumption 1: P is given (otherwise use true)
Assumption 2: bexpr is a simple binary expression
{ P } Inv = { N } while bexpr do { bexpr ∧ N } S { Q } { ¬bexpr ∧ N }

Example: annotate this program
{ y=x+a, y=a+x, w=d, d=w } Inv = { y=x+a, y=a+x } while (x ≠ z) do { z=x+a, z=a+x, w=d, d=w } x := x + 1 { w=d, d=w } y := x + a { y=x+a, y=a+x, w=d, d=w } d := x + a { y=x+a, y=a+x, d=x+a, d=a+x, y=d, d=y } { y=x+a, y=a+x, x=z, z=x }

Example: annotate this program
{ y=x+a, y=a+x, w=d, d=w } Inv = { y=x+a, y=a+x } while (x ≠ z) do { y=x+a, y=a+x } x := x + 1 { } y := x + a { y=x+a, y=a+x } d := x + a { y=x+a, y=a+x, d=x+a, d=a+x, y=d, d=y } { y=x+a, y=a+x, x=z, z=x }

{ bexpr ∧ P } S { P }  /  { P } while bexpr do S { ¬bexpr ∧ P }
Goal
{ bexpr ∧ P } S { P }  /  { P } while bexpr do S { ¬bexpr ∧ P }  [whilep]
Idea: try to guess a loop invariant from a small number of loop unrollings
We know how to annotate S (by induction)
{ P } Inv = { N } while bexpr do { bexpr ∧ N } S { Q } { ¬bexpr ∧ N }

k-loop unrolling
{ P } Inv = { N } while (x ≠ z) do x := x + 1; y := x + a; d := x + a
Unrolled, with P = { y=x+a, y=a+x, w=d, d=w }:
{ P } if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q1 = { y=x+a }
if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q2 = { y=x+a }
…

k-loop unrolling
{ P } Inv = { N } while (x ≠ z) do x := x + 1; y := x + a; d := x + a
Unrolled, with P = { y=x+a, y=a+x, w=d, d=w }:
{ P } if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q1 = { y=x+a, y=a+x }
if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q2 = { y=x+a, y=a+x }
…

k-loop unrolling
{ P } Inv = { N } while (x ≠ z) do x := x + 1; y := x + a; d := x + a
Unrolled, with P = { y=x+a, y=a+x, w=d, d=w }:
{ P } if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q1 = { y=x+a, y=a+x }
if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q2 = { y=x+a, y=a+x }
The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, …

k-loop unrolling
{ P } Inv = { N } while (x ≠ z) do x := x + 1; y := x + a; d := x + a
Unrolled, with P = { y=x+a, y=a+x, w=d, d=w }:
{ P } if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q1 = { y=x+a, y=a+x }
if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q2 = { y=x+a, y=a+x }
The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, …
Observation 1: there is no need to explicitly unroll the loop; we can reuse the postcondition from unrolling k-1 for k
We can compute the following sequence: N0 = P, N1 = N0 ⊔ Q1, N2 = N1 ⊔ Q2, …, Nk = Nk-1 ⊔ Qk, …

k-loop unrolling
{ P } Inv = { N } while (x ≠ z) do x := x + 1; y := x + a; d := x + a
Unrolled, with P = { y=x+a, y=a+x, w=d, d=w }:
{ P } if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q1 = { y=x+a, y=a+x }
if (x ≠ z) x := x + 1; y := x + a; d := x + a
Q2 = { y=x+a, y=a+x }
The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, …
Observation 2: the sets of facts Nk decrease monotonically. Question: does the sequence stabilize for some k?
We can compute the following sequence: N0 = P, N1 = N0 ⊔ Q1, N2 = N1 ⊔ Q2, …, Nk = Nk-1 ⊔ Qk, …

Algorithm for annotating a loop
Annotate(P, while bexpr do S) =
  Initialize N := Nc := P
  repeat
    let Annotate(Nc, if bexpr then S else skip) be {Nc} if bexpr then S else skip {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= N while bexpr do {F[assume bexpr](N)} Annotate(F[assume bexpr](N), S) {F[assume ¬bexpr](N)}
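The iteration can be sketched independently of the domain. In this hypothetical Python helper, `body` stands for annotating S and returning its postcondition, the two `assume_*` arguments stand for F[assume bexpr] and F[assume ¬bexpr], and the join is set intersection as in SAV. One liberty relative to the pseudocode: the loop stops when a further join leaves the candidate unchanged, that is, when the body preserves it, so the returned N is inductive.

```python
def annotate_loop(pre, body, assume_true, assume_false):
    """Return (invariant N, loop postcondition) for `while bexpr do S`.

    pre          -- set of facts P holding before the loop
    body         -- maps a precondition of S to the postcondition computed for S
    assume_true  -- F[assume bexpr]
    assume_false -- F[assume not bexpr]
    """
    nc = frozenset(pre)                          # N0 = P
    while True:
        n = frozenset(body(assume_true(nc)))     # postcondition of one more unrolling
        joined = nc & n                          # N_{k+1} = N_k join Q_{k+1}
        if joined == nc:                         # the body preserves N_k: inductive
            return nc, frozenset(assume_false(nc))
        nc = joined
```

Because each non-stable step strictly shrinks a finite set of facts, the loop terminates for any finite P.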

Putting it together

Algorithm for annotating a program
Annotate(P, S) =
  case S is x:=aexpr
    return {P} x:=aexpr {F*[x:=aexpr] P}
  case S is S1; S2
    let Annotate(P, S1) be {P} A1 {Q1}
    let Annotate(Q1, S2) be {Q1} A2 {Q2}
    return {P} A1; {Q1} A2 {Q2}
  case S is if bexpr then S1 else S2
    let Pt = F[assume bexpr] P
    let Pf = F[assume ¬bexpr] P
    let Annotate(Pt, S1) be {Pt} A1 {Q1}
    let Annotate(Pf, S2) be {Pf} A2 {Q2}
    return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
  case S is while bexpr do S
    N := Nc := P // Initialize
    repeat
      let Pt = F[assume bexpr] Nc
      let Annotate(Pt, S) be {Pt} Abody {N}
      Nc := Nc ⊔ N
    until N = Nc
    return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}

Exercise: apply algorithm
{ } y := a+b { } x := y { } while (x≠z) do { } w := a+b { } x := a+b { } a := z { }

Step 1/18 {} y := a+b { y=a+b }*
Not all factoids are shown – apply reduce to get all factoids {} y := a+b { y=a+b }* x := y while (x≠z) do w := a+b x := a+b a := z

Step 2/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }*
while (x≠z) do w := a+b x := a+b a := z

Step 3/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do w := a+b x := a+b a := z

Step 4/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b x := a+b a := z

Step 5/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b a := z

Step 6/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z

Step 7/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*

Step 8/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*

Step 9/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { x=y }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*

Step 10/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*

Step 11/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=y, w=x, x=y, a=z }*

Step 12/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 13/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (x≠z) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 14/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (x≠z) do { } w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 15/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (x≠z) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 16/18 {} y := a+b { y=a+b }*

Step 17/18 {} y := a+b { y=a+b }*

Step 18/18 {} y := a+b { y=a+b }*
x := y { y=a+b, x=y, x=a+b }* Inv = { } while (x≠z) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }* { x=z }*
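The whole algorithm fits in a short Python sketch that reproduces the final annotation of the exercise: Inv = { } and { x=z }* after the loop. Assumptions of this sketch: statements are encoded as hypothetical tuples (`assign`, `seq`, `if`, `while`), facts as `("eq", x, y)` / `("add", x, y, z)`, and conditions as `(op, left, right)`; assignments use F* = reduce ∘ F_SAV and assume statements are reduced as well; [preserve] is read as keeping every fact that does not mention the assigned variable; the trivial facts x = x are left out of reduce; and the loop stops once the candidate invariant no longer changes.

```python
NEG = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def f_sav(x, rhs, facts):
    out = {f for f in facts if x not in f[1:]}                 # kill rules + preserve
    if rhs[0] == "var" and rhs[1] != x:
        out.add(("eq", x, rhs[1]))                              # gen: x = v
    elif rhs[0] == "add" and x not in rhs[1:]:
        out.add(("add", x, rhs[1], rhs[2]))                     # gen: x = v + w
    return out


def reduce_facts(facts):
    d = set(facts)
    while True:
        eqs = [f[1:] for f in d if f[0] == "eq"]
        adds = [f[1:] for f in d if f[0] == "add"]
        new = set()
        for x, y in eqs:
            new.add(("eq", y, x))                                       # symmetry
            new |= {("eq", x, z) for y2, z in eqs if y2 == y}           # transitivity
            new |= {("add", y, z, w) for x2, z, w in adds if x2 == x}   # substitute lhs
            new |= {("add", u, y, w) for u, z, w in adds if z == x}     # substitute operand
        for x, y, z in adds:
            new.add(("add", x, z, y))                                   # commutativity
            new |= {("eq", x, u) for u, y2, z2 in adds if (y2, z2) == (y, z)}
        new = {f for f in new if not (f[0] == "eq" and f[1] == f[2])}   # skip x = x
        if new <= d:
            return frozenset(d)
        d |= new


def assume(cond, facts):
    op, left, right = cond                                     # F*[assume cond]
    alpha = {("eq", left, right)} if op == "==" else set()
    return reduce_facts(set(facts) | alpha)


def negate(cond):
    return (NEG[cond[0]],) + tuple(cond[1:])


def annotate(pre, stmt, invariants):
    """Postcondition of stmt from pre; loop invariants are appended to `invariants`."""
    kind = stmt[0]
    if kind == "assign":                                       # F* = reduce . F_SAV
        return reduce_facts(f_sav(stmt[1], stmt[2], pre))
    if kind == "seq":
        for s in stmt[1:]:
            pre = annotate(pre, s, invariants)
        return pre
    if kind == "if":
        _, cond, s1, s2 = stmt
        q1 = annotate(assume(cond, pre), s1, invariants)
        q2 = annotate(assume(negate(cond), pre), s2, invariants)
        return q1 & q2                                         # join = intersection
    if kind == "while":
        _, cond, body = stmt
        nc = frozenset(pre)
        while True:
            n = annotate(assume(cond, nc), body, [])           # scratch list while iterating
            if (nc & n) == nc:                                 # candidate preserved by body
                break
            nc = nc & n
        annotate(assume(cond, nc), body, invariants)           # record nested invariants once
        invariants.append(nc)
        return assume(negate(cond), nc)
    raise ValueError(f"unknown statement kind: {kind}")


# The exercise: y := a+b; x := y; while (x != z) do { w := a+b; x := a+b; a := z }
EXERCISE = ("seq",
            ("assign", "y", ("add", "a", "b")),
            ("assign", "x", ("var", "y")),
            ("while", ("!=", "x", "z"),
             ("seq", ("assign", "w", ("add", "a", "b")),
                     ("assign", "x", ("add", "a", "b")),
                     ("assign", "a", ("var", "z")))))


def exercise_result():
    invariants = []
    post = annotate(frozenset(), EXERCISE, invariants)
    return invariants, post
```

`exercise_result()` returns `[frozenset()]` as the list of loop invariants and {x=z, z=x} as the final postcondition, i.e. { x=z }* after reduce.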

Constant propagation

Second static analysis example
Optimization: constant folding
Example: x:=7; y:=x*9 is transformed to x:=7; y:=7*9 and then to x:=7; y:=63
Analysis: constant propagation (CP)
  Infers facts of the form x=c
Constant folding simplifies constant expressions: { x=c } y := aexpr ⟶ y := eval(aexpr[c/x])
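The folding step `y := eval(aexpr[c/x])` can be sketched as recursive substitution followed by evaluation, assuming a hypothetical expression encoding (integer literals, variable names as strings, and `(op, left, right)` tuples for +, -, *) and CP facts given as a dict from variables to constants. Sub-expressions that are not fully constant are kept in simplified form.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, consts):
    """Substitute known constants into expr and evaluate constant sub-expressions."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return consts.get(expr, expr)          # aexpr[c/x] when the fact x = c is known
    op, left, right = expr
    left, right = fold(left, consts), fold(right, consts)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)            # both operands constant: evaluate
    return (op, left, right)                   # otherwise keep the simplified expression
```

For the slide's example, `fold(("*", "x", 9), {"x": 7})` gives 63, turning `y := x*9` into `y := 63`.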

Plan
Define domain – set of allowed assertions
Handle assignments
Handle composition
Handle conditions
Handle loops

Constant propagation domain

CP semantic domain ?

CP semantic domain
Define CP-factoids: Γ = { x = c | x ∈ Var, c ∈ Z }
How many factoids are there?
Define predicates as 2^Γ
How many predicates are there?
Do all predicates make sense? (x=5) ∧ (x=7)
Treat conjunctive formulas as sets of factoids: {x=5, y=7} ~ (x=5) ∧ (y=7)

Handling assignments

CP abstract transformer
Goal: define a function F_CP[x:=aexpr] : 2^Γ → 2^Γ such that if F_CP[x:=aexpr] P = P' then sp(x:=aexpr, P) ⇒ P'
?

CP abstract transformer
Goal: define a function F_CP[x:=aexpr] : 2^Γ → 2^Γ such that if F_CP[x:=aexpr] P = P' then sp(x:=aexpr, P) ⇒ P'
{ x=c } x:=aexpr { } [kill]
{ } x:=c { x=c } [gen-1]
{ y=c1, z=c2 } x:=y op z { x=c } and c=c1 op c2 [gen-2]
{ y=c } x:=aexpr { y=c } [preserve]

Gen-kill formulation of transformers
Suited for analyses that propagate sets of factoids
  Available Expressions, Constant Propagation, etc.
For each statement, define a set of killed factoids and a set of generated factoids
F[S] P = (P \ kill(S)) ∪ gen(S)
F_CP[x:=aexpr] P = P \ {x=c | c ∈ Z}   if aexpr is not a constant
F_CP[x:=k] P = (P \ {x=c | c ∈ Z}) ∪ {x=k}
Used in dataflow analysis – a special case of abstract interpretation
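A sketch of F_CP in gen-kill form, assuming CP facts are a set of `(variable, constant)` pairs and expressions are integers, variable names, or `(op, y, z)` tuples (all hypothetical encodings). Beyond the two gen-kill equations it also applies [gen-2]: operand values are looked up before x is killed, so `x := x + 1` under x=3 yields x=4. A plain copy such as `x := y` generates nothing, since none of the listed rules covers it.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def value(operand, facts):
    """Constant value of an operand (int literal or variable) under CP facts, or None."""
    if isinstance(operand, int):
        return operand
    known = {c for (v, c) in facts if v == operand}
    return known.pop() if len(known) == 1 else None


def f_cp(x, aexpr, facts):
    """F_CP[x := aexpr] P = (P minus kill) union gen."""
    if isinstance(aexpr, int):
        c = aexpr                                      # [gen-1]: x := c
    elif isinstance(aexpr, tuple):
        op, y, z = aexpr
        cy, cz = value(y, facts), value(z, facts)      # read operands before killing x
        c = OPS[op](cy, cz) if cy is not None and cz is not None else None  # [gen-2]
    else:
        c = None                                       # no listed rule generates a fact
    kept = {(v, k) for (v, k) in facts if v != x}      # [kill] every x = c; [preserve] the rest
    return frozenset(kept | ({(x, c)} if c is not None else set()))
```

Keeping the predicate as a plain set mirrors the slides: a set such as {x=5, x=7} is representable, and `value` treats such a contradictory variable as unknown rather than picking one constant.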

Handling composition

Does this still work?
Annotate(P, S1; S2) =
  let Annotate(P, S1) be {P} A1 {Q1}
  let Annotate(Q1, S2) be {Q1} A2 {Q2}
  return {P} A1; {Q1} A2 {Q2}

Handling conditions

We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr in 2^Γ
Define α(bexpr) = if bexpr is a CP-factoid then {bexpr} else {}
Define F[assume bexpr](D) = D ∪ α(bexpr)

Does this still work?
let Pt = F[assume bexpr] P
let Pf = F[assume ¬bexpr] P
let Annotate(Pt, S1) be {Pt} A1 {Q1}
let Annotate(Pf, S2) be {Pf} A2 {Q2}
return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
How do we define join for CP?

Join example {x=5, y=7} ⊔ {x=3, y=7, z=9} =

Handling loops

Does this still work?
Annotate(P, while bexpr do S) =
  N := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Pt} Abody {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
What about correctness?
What about termination?

Does this still work?
Annotate(P, while bexpr do S) =
  N := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Pt} Abody {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
What about correctness?
  If the loop terminates, is N a loop invariant?
What about termination?

A termination principle
g : X → X is a function
How can we determine whether the sequence x0, x1 = g(x0), …, xk+1 = g(xk), … stabilizes?
Technique:
  Find a ranking function rank : X → N (that is, show that rank(x) ≥ 0 for all x)
  Show that if x ≠ g(x) then rank(g(x)) < rank(x)
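The termination principle can be turned into a run-time check: iterate g and, on every step that changes the value, assert that the rank is non-negative and strictly decreases. The helper name and signature are illustrative, not part of the lecture.

```python
def iterate_until_stable(g, x0, rank):
    """Compute x0, g(x0), g(g(x0)), ... until a fixed point is reached,
    checking the ranking-function conditions along the way."""
    x = x0
    while True:
        assert rank(x) >= 0, "rank must be non-negative"
        nxt = g(x)
        if nxt == x:
            return x
        assert rank(nxt) < rank(x), "rank must strictly decrease"
        x = nxt
```

For the loop algorithm, g maps Nc to Nc ⊔ N. With intersection as the join and rank(P) = |P|, every non-stable step removes at least one factoid, so the check can never fail and the iteration stops. For instance, for CP facts {x=0, y=7} and a body `x := x + 1`, one step drops x=0 and the next one is stable at {y=7}.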

Rank function for available expressions
rank(P) = ?

Rank function for available expressions
Annotate(P, while bexpr do S) =
  N := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Pt} Abody {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
rank(P) = |P|, the number of factoids
Prove that either Nc = Nc ⊔ N or rank(Nc ⊔ N) <? rank(Nc)

Rank function for constant propagation
Annotate(P, while bexpr do S) =
  N := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Pt} Abody {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
rank(P) = ?
Prove that either Nc = Nc ⊔ N or rank(Nc) >? rank(Nc ⊔ N)

Rank function for constant propagation
Annotate(P, while bexpr do S) =
  N’ := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Pt} Abody {N’}
    Nc := Nc ⊔ N’
  until N’ = Nc
  return {P} INV= {N’} while bexpr do {Pt} Abody {F[assume ¬bexpr](N’)}
rank(P) = |P|, the number of factoids
Prove that either Nc = Nc ⊔ N’ or rank(Nc) >? rank(Nc ⊔ N’)

Generalizing
Available Expressions, Constant Propagation ⟶ Abstract Interpretation
[Image by NMZ (Photoshop), CC0, via Wikimedia Commons]

Towards a recipe for static analysis
Two static analyses
  Available Expressions (extended with equalities)
  Constant Propagation
Semantic domain – a family of formulas
Join operator approximates pairs of formulas
Abstract transformers for basic statements
  Assignments
  assume statements
Initial precondition

Control flow graphs

A technical issue
Unrolling loops is quite inconvenient and inefficient (but we can avoid it, as we just saw)
How do we handle more complex control-flow constructs, e.g., goto, break, exceptions…?
The problem: non-inductive control-flow constructs
Solution: model control flow by labels and goto statements
We would like a dedicated data structure that explicitly encodes control flow in support of the analysis
Solution: control-flow graphs (CFGs)

Modeling control flow with labels
while (x ≠ z) do
  x := x + 1
  y := x + a
  d := x + a
a := b
  ⇓
label0: if x = z goto label1
x := x + 1
y := x + a
d := x + a
goto label0
label1: a := b

Control-flow graph example
Program, with line numbers:
1 label0:
2 if x = z goto label1
3 x := x + 1
4 y := x + a
5 d := x + a
6 goto label0
7 label1:
8 a := b
[Figure: CFG with one node per line: 1 label0:, 2 if x ≠ z, 3 x := x + 1, 4 y := x + a, 5 d := x + a, 6 goto label0, 7 label1:, 8 a := b]

[Figure: the same CFG with an entry node before node 1 and an exit node after node 8]

Control-flow graph
Nodes are statements or labels
Special nodes for entry/exit
An edge from node v to node w means that after executing the statement of v, control passes to w
Conditions are represented by split and join nodes
Loops create cycles
Can be generated from the abstract syntax tree in linear time
  Automatically taken care of by the front-end
Usage: store analysis results (assertions) in CFG nodes

Eliminating labels We can use edges to point to the nodes following labels and remove all label nodes (other than entry/exit)

[Figure: the CFG with label and goto nodes removed; remaining nodes: entry, 2 if x ≠ z, 3 x := x + 1, 4 y := x + a, 5 d := x + a, 8 a := b, exit]

Basic blocks
A basic block is a chain of nodes with a single entry point and a single exit point
Entry/exit nodes are separate blocks
[Figure: the label-free CFG (entry, 2 if x ≠ z, 3 x := x + 1, 4 y := x + a, 5 d := x + a, 8 a := b, exit) with its basic blocks marked]
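Basic blocks can be read off the CFG: a node starts a new block if it is entry or exit, has other than exactly one predecessor, follows a node with several successors, or directly follows entry. The sketch below (a hypothetical helper, with the CFG given as a successor dict) applies this rule to the label-free CFG of the running example, where edge 2→3 is the loop body and 2→8 the loop exit.

```python
def basic_blocks(succ, entry="entry", exit_node="exit"):
    """Split a CFG (dict node -> list of successors) into basic blocks."""
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].append(n)

    def starts_block(n):
        if n in (entry, exit_node):              # entry/exit are separate blocks
            return True
        ps = preds[n]
        # A join point, a branch target, or the first node after entry starts a block.
        return len(ps) != 1 or len(succ[ps[0]]) != 1 or ps[0] == entry

    blocks = []
    for n in succ:
        if not starts_block(n):
            continue
        block = [n]
        while len(succ[block[-1]]) == 1:         # extend the chain while it does not branch
            nxt = succ[block[-1]][0]
            if starts_block(nxt) or nxt in block:
                break
            block.append(nxt)
        blocks.append(block)
    return blocks


# Label-free CFG of the running example.
EXAMPLE_CFG = {"entry": [2], 2: [3, 8], 3: [4], 4: [5], 5: [2], 8: ["exit"], "exit": []}
```

The result groups nodes 3, 4 and 5 into one block while the loop test 2 and the assignment 8 stay on their own.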

Blocked CFG
Stores basic blocks in a single node
Extended blocks – maximal connected loop-free subgraphs
[Figure: blocked CFG with nodes entry; 2 if x ≠ z; 3–5 x := x + 1, y := x + a, d := x + a; 8 a := b; exit]

Collecting semantics

Why do we need another semantics?
Operational semantics explains how to compute output from a given input
  Useful for implementing an interpreter/compiler
  Less useful for reasoning about safety properties
  Not suitable for analysis purposes: it does not explicitly show how assertions at different program points influence each other
Need a more explicit semantics
  Over a control-flow graph

label0: if x <= 0 goto label1
x := x - 1
goto label0
label1:
[Figure: CFG with nodes entry, 1 label0:, 2 if x > 0, 3 x := x - 1, 4 goto label0, 5 label1:, exit]

Trimmed CFG
label0: if x <= 0 goto label1
x := x - 1
goto label0
label1:
[Figure: trimmed CFG with nodes entry, 2 if x > 0, 3 x := x - 1, exit]

Collecting semantics example: input 1
[Figure: the trimmed CFG annotated with sets of states; the entry carries the input states …, [x↦3], [x↦2], [x↦1], and the first states are propagated to the loop head, the loop body, and the exit]

[Figure: one more propagation step; [x↦2] now also reaches the loop head and the loop body]

[Figure: another propagation step; [x↦3] now also reaches the loop head and the loop body]

ad infinitum – fixed point
[Figure: continuing in the same way, the sets of states at each node keep growing until they reach a fixed point]

Predicates at fixed point
[Figure: the trimmed CFG annotated with predicates: entry {true}; loop head (if x > 0) {?}; exit {?}; x := x - 1 {?}]

Predicates at fixed point
[Figure: the trimmed CFG annotated with predicates: entry {true}; loop head (if x > 0) {true}; exit {x≤0}; before x := x - 1 {x>0}; after x := x - 1 {x≥0}]

Collecting semantics
Accumulates, for each control-flow node, the (possibly infinite) set of states that can reach it by executing the program from some given set of input states
Not computable in general
A reference point for static analysis
(An abstraction of the trace semantics)
We will define it formally

Collecting semantics in equational form

Math reference: function lifting
Let f : X → Y be a function
The lifted function f' : 2^X → 2^Y is defined as f'(XS) = { f(x) | x ∈ XS }
We will sometimes use the same symbol for both functions when it is clear from the context which one is used

Equational definition example
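A vector of variables R[0, 1, 2, 3, 4]
R[0] = {x ∈ Z} // established input
R[1] = R[0] ∪ R[4]
R[2] = ⟦assume x>0⟧ R[1]
R[3] = ⟦assume ¬(x>0)⟧ R[1]
R[4] = ⟦x:=x-1⟧ R[2]
A (recursive) system of equations
⟦x:=x-1⟧ is the semantic function for x:=x-1 lifted to sets of states
[Figure: CFG entry → if x > 0 → x := x-1 → back to the test, with exit on the false branch; R[0] labels the entry edge, R[1] the input of the test, R[2] the true branch, R[3] the false branch to exit, and R[4] the back edge]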
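Since R[0] is infinite, this system cannot be solved by enumeration in general; restricting the input to a finite set makes naive iteration possible. A sketch under that assumption (a state is reduced to the value of x, and the inputs are a small range chosen for illustration): start from empty sets and re-evaluate all equations until nothing changes.

```python
def lift(f):
    """Lift f : X -> Y to sets: XS -> { f(x) | x in XS }."""
    return lambda xs: frozenset(f(x) for x in xs)


def assume(pred):
    """Semantics of `assume b` lifted to sets: keep the states that satisfy b."""
    return lambda xs: frozenset(x for x in xs if pred(x))


def solve(inputs):
    """Iterate the equations for R[0..4] from empty sets until they stabilize."""
    dec = lift(lambda x: x - 1)                    # x := x - 1 on a state (the value of x)
    R = [frozenset()] * 5
    while True:
        new = [
            frozenset(inputs),                     # R[0]: established input
            R[0] | R[4],                           # R[1] = R[0] ∪ R[4]
            assume(lambda x: x > 0)(R[1]),         # R[2]: true branch of x > 0
            assume(lambda x: not x > 0)(R[1]),     # R[3]: false branch, reaches exit
            dec(R[2]),                             # R[4]: after x := x - 1
        ]
        if new == R:
            return R
        R = new
```

With inputs -2..3 the exit set R[3] is {-2, -1, 0}, consistent with the predicate {x≤0} at the exit, and R[4] is {0, 1, 2}, consistent with {x≥0} after the decrement.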

General definition
A vector of variables R[0, …, k], one per input/output of a node
  R[0] is for entry
For a node n with multiple predecessors, add the equation R[n] = ∪{R[k] | k is a predecessor of n}
For an atomic operation node R[m] →S→ R[n], add the equation R[n] = ⟦S⟧ R[m]
Transform if b then S1 else S2 to (assume b; S1) or (assume ¬b; S2)

see you next time
