# Program Verification using Probabilistic Techniques

Sumit Gulwani, Microsoft Research. Invited Talk: VSTTE Workshop, August 2006. Joint work with George Necula and Nebojsa Jojic.

## 1. Probabilistic Techniques

Used successfully in several areas of computer science; yields more efficient, more precise, and often simpler algorithms.

Technique 1: Random Interpretation
- Discovers program invariants.
- Monte Carlo algorithm: may generate invalid invariants with a small probability; running time is bounded.
- Random Testing + Abstract Interpretation.

Technique 2: Simulated Annealing
- Discovers proofs of validity/invalidity of a Hoare triple.
- Las Vegas algorithm: generates a correct proof; running time is probabilistic.
- Forward Analysis + Backward Analysis.

## 2. Random Interpretation = Random Testing + Abstract Interpretation

Random Testing:
- Test the program on random inputs.
- Simple and efficient, but unsound (can't prove the absence of bugs).

Abstract Interpretation:
- Class of deterministic program analyses.
- Interpret (analyze) an abstraction (approximation) of the program.
- Sound, but usually complicated and expensive.

Random Interpretation:
- Class of randomized program analyses.
- Almost as simple and efficient as random testing.
- Almost as sound as abstract interpretation.

## 3. Example 1

Control-flow graph, where `*` denotes a non-deterministic conditional:

    if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
    if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
    assert (c + d = 0);
    assert (c = a + i);

## 4. Example 1: Random Testing

(Same program as Example 1.) To falsify the second assertion (c = a+i), testing must exercise the path through {a := i-2; b := 2} followed by {c := b - a; d := i - 2b}. The chance of choosing that one path out of all 4 paths is small; hence random testing is unsound.

## 5. Example 1: Abstract Interpretation

Abstract interpretation computes an invariant at each program point. For Example 1:
- After {a := 0; b := i}: a=0, b=i. After {a := i-2; b := 2}: a=i-2, b=2.
- At the first join: a+b=i.
- After {c := b-a; d := i-2b}: a+b=i, c=b-a, d=i-2b. After {c := 2a+b; d := b-2i}: a+b=i, c=2a+b, d=b-2i.
- At the final join: a+b=i, c=-d.

The abstract operations are usually complicated and expensive.

## 6. Example 1: Random Interpretation

(Same program as Example 1.) The random interpreter:
- Chooses random values for the input variables.
- Executes both branches of each conditional.
- Combines the values of variables at join points.
- Tests the assertions.

## 7. Random Interpretation: Outline

- **Linear arithmetic (POPL 2003)**
- Uninterpreted functions (POPL 2004)
- Inter-procedural analysis (POPL 2005)

## 8. Linear Relationships in Programs with Linear Assignments

Linear relationships (e.g., x = 2y+5) are useful for:
- Program correctness (e.g., detecting buffer overflows).
- Compiler optimizations (e.g., constant and copy propagation, CSE, induction variable elimination).

Restricting attention to programs with linear assignments does not mean inapplicability to real programs: abstract other program statements as non-deterministic assignments (standard practice in program analysis).

## 9. Basic Idea in Random Interpretation

Generic algorithm:
1. Choose random values for the input variables.
2. Execute both branches of each conditional.
3. Combine the values of variables at join points.
4. Test the assertions.

## 10. Idea #1: The Affine Join Operation

The affine join of v1 and v2 w.r.t. weight w:

    φ_w(v1, v2) ≡ w·v1 + (1-w)·v2

Example (w = 7): joining the states (a=2, b=3) and (a=4, b=1) gives a = φ7(2,4) = -10 and b = φ7(3,1) = 15.

The affine join preserves common linear relationships (here a+b=5, since -10+15=5). It does not introduce false relationships w.h.p.

## 11. Idea #1: The Affine Join Operation (continued)

Affine join of v1 and v2 w.r.t. weight w: φ_w(v1, v2) ≡ w·v1 + (1-w)·v2.

With w = 7, joining (a=2, b=3) and (a=4, b=1) gives a = φ7(2,4) = -10, b = φ7(3,1) = 15; with w = 5 it gives a = φ5(2,4) = -6, b = φ5(3,1) = 11.

The affine join preserves common linear relationships (a+b=5) and does not introduce false relationships w.h.p. Unfortunately, non-linear relationships are not preserved (e.g., a·(1+b) = 8 holds in both input states but not after the join).
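The affine join can be sketched in a few lines of Python; the `affine_join` helper and the dictionary-based states are illustrative names for this sketch, not the authors' implementation:

```python
import random

def affine_join(w, v1, v2):
    """phi_w(v1, v2) = w*v1 + (1-w)*v2 -- the affine join from the slides."""
    return w * v1 + (1 - w) * v2

# Two program states that both satisfy a + b = 5 (and a*(1+b) = 8).
s1 = {"a": 2, "b": 3}
s2 = {"a": 4, "b": 1}

w = 7  # the weight used on the slide
joined = {x: affine_join(w, s1[x], s2[x]) for x in s1}
# a = -10, b = 15: the common linear relationship a + b = 5 survives,
print(joined["a"] + joined["b"])          # 5
# but the non-linear relationship a*(1+b) = 8 does not.
print(joined["a"] * (1 + joined["b"]))    # -160

# A relationship not common to both states (e.g. b = 2) is violated
# after the join for almost every random choice of w.
w = random.randrange(1, 2**32)
print(affine_join(w, s1["b"], s2["b"]) == 2)
```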

## 12. Geometric Interpretation of Affine Join

In the (a, b) plane, the states before the join, (a=2, b=3) and (a=4, b=1), both lie on the line a + b = 5. The state after the join satisfies all affine relationships that are satisfied by both input states (e.g., a + b = 5). Given any relationship that is not satisfied by either input state (e.g., b = 2), the joined state also does not satisfy it, with high probability.

## 13. Example 1: A Random Interpretation Run

Choose i = 3 and a random weight for each join independently (here w1 = 5, w2 = 2):
- After {a := 0; b := i}: a=0, b=3. After {a := i-2; b := 2}: a=1, b=2.
- Affine join with w1 = 5: a = -4, b = 7.
- After {c := b-a; d := i-2b}: c=11, d=-11. After {c := 2a+b; d := b-2i}: c=-1, d=1.
- Affine join with w2 = 2: c = 23, d = -23.

All choices of random weights verify the first assertion (c+d = 0, since 23-23 = 0). Almost all choices contradict the second assertion (c = a+i, since 23 ≠ -4+3).
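The run on Example 1 with i = 3, w1 = 5, w2 = 2 can be sketched as follows; the `random_interpret` helper and its argument order are assumptions for illustration:

```python
import random

def random_interpret(i, w1, w2):
    """One run of the random interpreter on Example 1: both branches of
    each conditional are executed and merged with an affine join
    phi_w(v1, v2) = w*v1 + (1-w)*v2."""
    join = lambda w, v1, v2: w * v1 + (1 - w) * v2
    # First conditional: {a := 0; b := i} vs. {a := i-2; b := 2}, then join.
    a = join(w1, 0, i - 2)
    b = join(w1, i, 2)
    # Second conditional: both branches use the joined a, b.
    c = join(w2, b - a, 2 * a + b)
    d = join(w2, i - 2 * b, b - 2 * i)
    return c + d == 0, c == a + i

# The slide's run: i = 3, w1 = 5, w2 = 2 gives c = 23, d = -23.
print(random_interpret(3, 5, 2))  # (True, False)

# Over many random weights, the first assertion always holds, and the
# second is contradicted for almost every choice.
runs = [random_interpret(3, random.randrange(2**32), random.randrange(2**32))
        for _ in range(100)]
print(all(a1 for a1, _ in runs))  # True
```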

## 14. Correctness of Random Interpreter R

Completeness: if e1 = e2, then R ⇒ e1 = e2 (assuming non-deterministic conditionals).

Soundness: if e1 ≠ e2, then R ⇏ e1 = e2, except with error probability bounded in terms of:
- j: number of joins
- d: size of the set from which random values are chosen
- k: number of points in the sample

If j = 10, k = 4, d ≈ 2^32, then the error ≤ …

## 15. Proof Methodology

Proving correctness was the most complicated part of this work. We used the following methodology:
- Design an appropriate deterministic algorithm (need not be efficient).
- Prove (by induction) that the randomized algorithm simulates each step of the deterministic algorithm with high probability.

## 16. Random Interpretation: Outline

- Linear arithmetic (POPL 2003)
- **Uninterpreted functions (POPL 2004)**
- Inter-procedural analysis (POPL 2005)

## 17. Problem: Global Value Numbering

Goal: detect expression equivalence when program operators are abstracted using uninterpreted functions.

Concrete program:

    a := 5; x := a*b; y := 5*b; z := b*a;

Here both x=y and x=z hold, but reasoning about multiplication is undecidable.

Abstraction:

    a := 5; x := F(a,b); y := F(5,b); z := F(b,a);

Here only x=y holds. Reasoning is decidable but tricky in the presence of joins, using the axiom: if x1=y1 and x2=y2, then F(x1,x2) = F(y1,y2).

Applications: compiler optimizations, translation validation.
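The conclusion slide names the POPL 2004 technique for this problem "random linear interpretations": evaluate each occurrence of F as a fixed random linear combination of its arguments. A minimal sketch for the straight-line program above (ignoring joins, which the full algorithm must also handle):

```python
import random

# Interpret the uninterpreted symbol F by a random linear function
# F(x1, x2) -> r1*x1 + r2*x2, with r1, r2 chosen once per symbol.
r1, r2 = (random.randrange(2**32) for _ in range(2))

def F(x1, x2):
    return r1 * x1 + r2 * x2

# A random value for the program input b.
b = random.randrange(2**32)

# a := 5; x := F(a,b); y := F(5,b); z := F(b,a);
a = 5
x = F(a, b)
y = F(5, b)
z = F(b, a)

print(x == y)   # True: x = y holds under every interpretation of F
# x = z would require F(a,b) = F(b,a), i.e. r1 = r2 or b = 5,
# which happens only with negligible probability.
print(x == z)
```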

## 18. Random Interpretation: Outline

- Linear arithmetic (POPL 2003)
- Uninterpreted functions (POPL 2004)
- **Inter-procedural analysis (POPL 2005)**

## 19. Example 1 Revisited

(Same program as Example 1.) The second assertion (c = a+i) is true in the context i = 2. Inter-procedural analysis requires computing procedure summaries that are valid in every calling context.

## 20. Idea: Keep Input Variables Symbolic

Do not choose random values for the input variables; keep them symbolic, to be instantiated later by any calling context. The resulting program state at the end is a random procedure summary.

For Example 1 with w1 = 5, w2 = 2:
- After the first pair of branches: (a=0, b=i) and (a=i-2, b=2); affine join: a = 8-4i, b = 5i-8.
- After the second pair of branches: (c = 9i-16, d = 16-9i) and (c = 8-3i, d = 3i-8); affine join: c = 21i-40, d = 40-21i.

The summary verifies the first assertion for every i (c+d = 0). Instantiating the context i = 2 gives a=0, b=2, c=2, d=-2, which also satisfies the second assertion (c = a+i).
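The symbolic summary computation can be sketched by representing every value as a linear polynomial in the input i; the pair encoding and the helper names below are assumptions for illustration:

```python
from fractions import Fraction as F

# Represent each value as a linear polynomial c1*i + c0 over the
# symbolic input i, stored as the pair (c1, c0).
def const(c):       return (F(0), F(c))
def var_i():        return (F(1), F(0))
def add(p, q):      return (p[0] + q[0], p[1] + q[1])
def scale(k, p):    return (k * p[0], k * p[1])
def sub(p, q):      return add(p, scale(F(-1), q))
def join(w, p, q):  return add(scale(F(w), p), scale(F(1 - w), q))

i = var_i()
# First conditional, then affine join with weight w1 = 5.
a = join(5, const(0), sub(i, const(2)))                      # 8 - 4i
b = join(5, i, const(2))                                     # 5i - 8
# Second conditional, then affine join with weight w2 = 2.
c = join(2, sub(b, a), add(scale(F(2), a), b))               # 21i - 40
d = join(2, sub(i, scale(F(2), b)), sub(b, scale(F(2), i)))  # 40 - 21i

print(tuple(map(int, add(c, d))))          # (0, 0): c + d = 0 for every i
# c = a + i reduces to 24i - 48 = 0, i.e. it holds exactly when i = 2.
print(tuple(map(int, sub(c, add(a, i)))))  # (24, -48)
```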

## 21. Experimental Measure of Error

The percentage of incorrect relationships decreases as S (the size of the set from which random values are chosen, here 2^10, 2^16, 2^31) and N (the number of random summaries used, here 5000 and 6000) increase.

[Table: error percentages for the S and N values above.]

The experimental results are better than what is predicted by the theory.

## 22. Simulated Annealing

Problem: given a program with pre/post conditions, discover a proof of its validity or invalidity. The proof is in the form of an invariant at each program point that can be locally verified.

Key idea:
- Initialize the invariants at all program points to anything.
- Repeatedly pick a random program point whose invariant is not locally consistent, and update it to make it less inconsistent.

## 23. Simulated Annealing: Outline

- **Inconsistency Measure & Penalty Function**
- Algorithm
- Experiments

## 24. Inconsistency Measure for an Abstract Domain

Let A be an abstract domain with ⇒ as the partial order and γ as the concretization function. An inconsistency measure IM: A × A → [0,1] satisfies:
- IM(φ1, φ2) = 0 iff φ1 ⇒ φ2;
- IM is monotonically decreasing in its first argument;
- IM is monotonically increasing in its second argument.

IM is a monotonic (increasing) measure of γ(φ1) - γ(φ2), the set of states that violate φ1 ⇒ φ2. The more strictly monotonic IM is, the smoother it is.

## 25. Example of a Smooth Inconsistency Measure

Let A be the abstract domain of Boolean formulas (with the usual implication as the partial order). Let φ1 ≡ a1 ∨ … ∨ an be in DNF and φ2 ≡ b1 ∧ … ∧ bm be in CNF. Then:

    IM(φ1, φ2) = Σ_{i,j} IM(ai, bj), where IM(ai, bj) = 0 if ai ⇒ bj, and 1 otherwise.
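Assuming DNF terms and CNF clauses are encoded as sets of literals, this measure can be sketched as follows (the encodings and helper names are assumptions for the sketch, not the paper's representation):

```python
# A DNF term a_i is a frozenset of literals (a conjunction); a CNF
# clause b_j is a frozenset of literals (a disjunction). Literals are
# strings, with negation written as a leading "~".

def implies(term, clause):
    """a_i => b_j holds iff the conjunct shares a literal with the
    clause (assuming the conjunct is satisfiable)."""
    return bool(term & clause)

def IM(dnf, cnf):
    """Inconsistency of phi1 (list of DNF terms) w.r.t. phi2 (list of
    CNF clauses): count the (term, clause) pairs where a_i =/=> b_j."""
    return sum(0 if implies(a, b) else 1 for a in dnf for b in cnf)

phi1 = [frozenset({"x", "y"}), frozenset({"~x", "y"})]   # (x & y) | (~x & y)
phi2 = [frozenset({"y", "z"})]                           # (y | z)
print(IM(phi1, phi2))   # 0: phi1 => phi2

phi3 = [frozenset({"z"})]                                # (z)
print(IM(phi1, phi3))   # 2: neither term of phi1 implies z
```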

## 26. Penalty Function

Penalty(I, π) measures how inconsistent the invariant I is with respect to the invariants at the neighbors of program point π:

    Penalty(I, π) = IM(Post(π), I) + IM(I, Pre(π))

where Post(π) is the strongest postcondition, at π, of the invariants at the predecessors of π, and Pre(π) is the weakest precondition, at π, of the invariants at the successors of π.

## 27. Example of Penalty Function

Suppose program point π2 has invariant I, a predecessor with invariant P reaching π2 through statement s, and a conditional on c leading to successors with invariants Q (true branch) and R (false branch). Then:

    Penalty(I, π2) = IM(Post(π2), I) + IM(I, Pre(π2))
    Post(π2) = StrongestPost(P, s)
    Pre(π2) = (c ⇒ Q) ∧ (¬c ⇒ R)

Since Post(π) and Pre(π) may not belong to A, we define:

    IM(Post(π), I) = min { IM(I1, I) | I1 ∈ A, I1 over-approximates Post(π) }
    IM(I, Pre(π)) = min { IM(I, I2) | I2 ∈ A, I2 under-approximates Pre(π) }

## 28. Simulated Annealing: Outline

- Inconsistency Measure & Penalty Function
- **Algorithm**
- Experiments

## 29. Algorithm

Search for proofs of validity and invalidity in parallel: the same algorithm runs with different boundary conditions.

Proof of validity:
- I_entry = Pre
- I_exit = Post

Proof of invalidity:
- I_entry ∧ Pre is satisfiable
- I_exit = ¬Post
- This assumes that the program terminates on all inputs.

## 30. Algorithm (Continued)

Initialize the invariant I_j at each program point j arbitrarily. While the penalty at some program point is non-zero:
- Choose j randomly such that Penalty(I_j, j) ≠ 0.
- Update I_j so that Penalty(I_j, j) is minimized. More precisely, the new I_j is chosen randomly with probability inversely proportional to Penalty(I_j, j).
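The update loop can be sketched generically; the `anneal` signature, the explicit candidate set, and the toy penalty function below are all hypothetical, and the real algorithm additionally pins the boundary invariants to Pre and Post:

```python
import random

def anneal(points, candidates, penalty, max_steps=10_000, seed=0):
    """Sketch of the update loop: `points` are program points,
    `candidates` is a finite set of invariants to try, and
    penalty(inv, point, assignment) is the local penalty function."""
    rng = random.Random(seed)
    inv = {p: rng.choice(candidates) for p in points}    # arbitrary start
    for _ in range(max_steps):
        bad = [p for p in points if penalty(inv[p], p, inv) > 0]
        if not bad:
            return inv                                   # locally consistent proof
        p = rng.choice(bad)                              # random inconsistent point
        # Re-sample I_p, weighting candidates inversely to their penalty;
        # a zero-penalty candidate is taken outright.
        scored = [(c, penalty(c, p, inv)) for c in candidates]
        zero = [c for c, s in scored if s == 0]
        if zero:
            inv[p] = rng.choice(zero)
        else:
            inv[p] = rng.choices([c for c, _ in scored],
                                 weights=[1 / s for _, s in scored])[0]
    return None                                          # gave up

# Toy use: a 3-point chain where local consistency means equality with
# both neighbors; the boundaries are pinned to the invariant 2
# (standing in for Pre and Post).
pts = [1, 2, 3]
def pen(c, p, inv):
    left = 2 if p == 1 else inv[p - 1]
    right = 2 if p == 3 else inv[p + 1]
    return abs(c - left) + abs(c - right)

print(anneal(pts, list(range(6)), pen))  # {1: 2, 2: 2, 3: 2}
```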

## 31. Interesting Aspects of the Algorithm

- Combination of forward and backward analysis, with no distinction between forward and backward information.
- Random choices: which program point to update, and which invariant to choose for it.

## 32. Simulated Annealing: Outline

- Inconsistency Measure & Penalty Function
- Algorithm
- **Experiments**

## 33. Example 2

Program (precondition x = 0, postcondition y = 100):

    y := 50;
    while (x < 100) {
      if (x < 50)  x := x + 1;
      else       { x := x + 1; y := y + 1; }
    }

Proof of validity (invariant at each program point):

| Point | Invariant |
|---|---|
| 1 | x = 0 ∧ y = 50 |
| 2 | (x ≤ 50 ⇒ y = 50) ∧ (50 ≤ x ⇒ x = y) ∧ x ≤ 100 |
| 3 | (x ≤ 50 ⇒ y = 50) ∧ (50 ≤ x ⇒ x = y) ∧ x < 100 |
| 4 | x < 50 ∧ y = 50 |
| 5 | x ≤ 50 ∧ y = 50 |
| 6 | 50 ≤ x < 100 ∧ x = y |
| 7 | 50 < x ≤ 100 ∧ x = y |
| 8 | (x ≤ 50 ⇒ y = 50) ∧ (50 ≤ x ⇒ x = y) ∧ x ≤ 100 |
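The slide's loop-head invariant, (x ≤ 50 ⇒ y = 50) ∧ (50 ≤ x ⇒ x = y) ∧ x ≤ 100, can be checked dynamically on a reconstruction of the Example 2 program (a sanity check of the proof, not a proof itself):

```python
def invariant(x, y):
    """The invariant at the loop head from the slide's table."""
    return ((y == 50 if x <= 50 else True)
            and (x == y if 50 <= x else True)
            and x <= 100)

# Execute Example 2 from the precondition x = 0 and check the
# loop-head invariant at every iteration.
x, y = 0, 50
assert invariant(x, y)
while x < 100:
    if x < 50:
        x = x + 1
    else:
        x = x + 1
        y = y + 1
    assert invariant(x, y)
print(x, y)   # 100 100 -- the postcondition y = 100 holds at exit
```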

## 34. Stats: Proof vs. Incremental Proof of Validity

(Chart: black bars show proof of validity, grey bars incremental proof of validity.) An incremental proof requires fewer updates.

## 35. Stats: Different Sizes of Boolean Formulas

(Chart: grey 5×3, black 4×3, white 3×2, where n×m denotes n conjuncts and m disjuncts.) A larger formula size requires fewer updates.

## 36. Example 3

Program (precondition true; `*` denotes a non-deterministic conditional):

    x := 0; m := 0;
    while (x < n) {
      if (*) m := x;
      x := x + 1;
    }
    assert (n ≤ 0 ∨ 0 ≤ m < n);

Proof of validity (invariants at selected program points):

| Point | Invariant |
|---|---|
| 1 | x = 0 ∧ m = 0 |
| 2 | n ≤ 0 ∨ (0 ≤ x ∧ 0 ≤ m < n) |
| 3 | n ≤ 0 ∨ (0 ≤ x < n ∧ 0 ≤ m < n) |
| 8 | n ≤ 0 ∨ (0 ≤ x ≤ n ∧ 0 ≤ m < n) |

## 37. Stats: Proof of Validity

(Chart comparing the examples.) Example 2 is easier than Example 1, and the easier example requires fewer updates.

## 38. Example 2: Precondition Modified

Same program as Example 2, but with the precondition x ≥ 100 in place of x = 0. Proof of invalidity (invariants at selected program points):

| Point | Invariant |
|---|---|
| 0 | x ≥ 100 |
| 1 | x ≥ 100 ∧ y = 50 |
| 3 | false |

## 39. Stats: Proof of Invalidity

(Chart.)

## Conclusion

Lessons learned:
- Randomization buys efficiency and simplicity.
- Randomization suggests ideas for deterministic algorithms.
- Combining randomized and symbolic techniques is powerful.

Summary:
- Random Interpretation — linear arithmetic: affine joins; uninterpreted functions: random linear interpretations; inter-procedural analysis: symbolic input variables.
- Simulated Annealing — a smooth inconsistency measure for an abstract domain.
