Program Verification using Probabilistic Techniques Sumit Gulwani Microsoft Research Invited Talk: VSTTE Workshop August 2006 Joint work with George Necula.

Presentation transcript:

Program Verification using Probabilistic Techniques Sumit Gulwani Microsoft Research Invited Talk: VSTTE Workshop August 2006 Joint work with George Necula and Nebojsa Jojic

1 Probabilistic Techniques
Used successfully in several areas of computer science. They yield algorithms that are more efficient, more precise, and often simpler.
Technique 1: Random Interpretation
– Discovers program invariants.
– Monte Carlo algorithm: may generate invalid invariants with a small probability. Running time is bounded.
– Random Testing + Abstract Interpretation
Technique 2: Simulated Annealing
– Discovers a proof of validity/invalidity of a Hoare triple.
– Las Vegas algorithm: generates a correct proof. Running time is probabilistic.
– Forward Analysis + Backward Analysis

2 Random Interpretation = Random Testing + Abstract Interpretation
Random Testing: test the program on random inputs.
– Simple and efficient, but unsound (cannot prove the absence of bugs).
Abstract Interpretation: a class of deterministic program analyses.
– Interpret (analyze) an abstraction (approximation) of the program.
– Sound, but usually complicated and expensive.
Random Interpretation: a class of randomized program analyses.
– Almost as simple and efficient as random testing.
– Almost as sound as abstract interpretation.

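3 Example 1
[Flowchart with two non-deterministic conditionals (condition *, branches labelled True/False), each followed by a join.]
First conditional: one branch executes a := 0; b := i; the other executes a := i-2; b := 2;
Second conditional: one branch executes c := b - a; d := i - 2b; the other executes c := 2a + b; d := b - 2i;
After the second join: assert(c+d = 0); assert(c = a+i)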

4 Example 1: Random Testing
[Same flowchart as Example 1 on slide 3, with one path highlighted in blue.]
The blue path (through a := i-2; b := 2; and c := b - a; d := i - 2b;) must be tested to falsify the second assertion.
The chance of choosing the blue path from the set of all 4 paths is small.
Hence, random testing is unsound.

5 Example 1: Abstract Interpretation
Computes an invariant at each program point. Operations are usually complicated and expensive.
Invariants for the program of Example 1:
– after a := 0; b := i: a=0, b=i
– after a := i-2; b := 2: a=i-2, b=2
– after the first join: a+b=i
– after c := b - a; d := i - 2b: a+b=i, c=b-a, d=i-2b
– after c := 2a + b; d := b - 2i: a+b=i, c=2a+b, d=b-2i
– after the second join: a+b=i, c=-d
Assertions checked at the end: assert(c+d = 0); assert(c = a+i)

6 Example 1: Random Interpretation
[Same flowchart as Example 1 on slide 3.]
– Choose random values for input variables.
– Execute both branches of a conditional.
– Combine values of variables at join points.
– Test the assertion.

7 Random Interpretation: Outline Random Interpretation Linear arithmetic (POPL 2003) –Uninterpreted functions (POPL 2004) –Inter-procedural analysis (POPL 2005)

8 Linear relationships in programs with linear assignments
Linear relationships (e.g., x=2y+5) are useful for
– Program correctness (e.g., buffer overflows)
– Compiler optimizations (e.g., constant and copy propagation, CSE, induction variable elimination)
Restricting attention to programs with linear assignments does not make the analysis inapplicable to real programs
– abstract other program statements as non-deterministic assignments (standard practice in program analysis)

9 Basic idea in random interpretation
Generic algorithm:
– Choose random values for input variables.
– Execute both branches of a conditional.
– Combine the values of variables at join points.
– Test the assertion.

10 Idea #1: The Affine Join operation
Affine join of $v_1$ and $v_2$ w.r.t. weight $w$: $\phi_w(v_1, v_2) \equiv w\,v_1 + (1-w)\,v_2$
Affine join preserves common linear relationships (a+b=5).
It does not introduce false relationships w.h.p.
Example: joining the states (a = 2, b = 3) and (a = 4, b = 1) with w = 7 gives a = $\phi_7(2,4)$ = -10 and b = $\phi_7(3,1)$ = 15.

11 Idea #1: The Affine Join operation
Affine join of $v_1$ and $v_2$ w.r.t. weight $w$: $\phi_w(v_1, v_2) \equiv w\,v_1 + (1-w)\,v_2$
Affine join preserves common linear relationships (a+b=5).
It does not introduce false relationships w.h.p.
Unfortunately, non-linear relationships are not preserved (e.g. $a \times (1+b) = 8$).
Example: joining the states (a = 2, b = 3) and (a = 4, b = 1)
– with w = 7: a = $\phi_7(2,4)$ = -10, b = $\phi_7(3,1)$ = 15
– with w = 5: a = $\phi_5(2,4)$ = -6, b = $\phi_5(3,1)$ = 11
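The affine join on these two slides can be checked with a few lines of Python. This is a minimal sketch: the function names `affine_join` and `join_states` are illustrative and do not come from the talk, and states are modelled as plain dictionaries.

```python
def affine_join(w, v1, v2):
    """Affine join of v1 and v2 with weight w: w*v1 + (1 - w)*v2."""
    return w * v1 + (1 - w) * v2


def join_states(w, s1, s2):
    """Join two program states variable by variable, using the same weight."""
    return {var: affine_join(w, s1[var], s2[var]) for var in s1}


# The two states before the join, as on the slide.
left = {"a": 2, "b": 3}
right = {"a": 4, "b": 1}

joined7 = join_states(7, left, right)   # a = -10, b = 15
joined5 = join_states(5, left, right)   # a = -6,  b = 11

for state in (joined7, joined5):
    # The common linear relationship a + b = 5 survives the join.
    print(state, "a+b =", state["a"] + state["b"])
    # The common non-linear relationship a*(1+b) = 8 does not.
    print("a*(1+b) =", state["a"] * (1 + state["b"]))
```

Both input states satisfy a*(1+b) = 8, yet neither joined state does, which is the slide's point about non-linear relationships.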

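12 Geometric Interpretation of Affine Join
[Figure: the (a, b) plane with the lines a + b = 5 and b = 2 and the points (a = 2, b = 3) and (a = 4, b = 1). Legend: states before the join; state after the join.]
The state after the join satisfies all the affine relationships that are satisfied by both states before the join (e.g. a + b = 5).
Given any relationship that is not satisfied by any of the states before the join (e.g. b = 2), the state after the join also does not satisfy it with high probability.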

13 Example 1
Choose a random weight for each join independently. Run shown on the flowchart, with i=3, w1 = 5, w2 = 2:
– start: i=3
– after a := 0; b := i: i=3, a=0, b=3
– after a := i-2; b := 2: i=3, a=1, b=2
– after the first join (w1 = 5): i=3, a=-4, b=7
– after c := b - a; d := i - 2b: i=3, a=-4, b=7, c=11, d=-11
– after c := 2a + b; d := b - 2i: i=3, a=-4, b=7, c=-1, d=1
– after the second join (w2 = 2): i=3, a=-4, b=7, c=23, d=-23
Then: assert (c+d = 0); assert (c = a+i)
All choices of random weights verify the first assertion.
Almost all choices contradict the second assertion.
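The run above can be reproduced with a small interpreter for Example 1. This is only a sketch under assumptions: the function name `interpret`, the dictionary representation of states, and the sampling range in the random trial are illustrative choices, not part of the talk.

```python
import random


def affine_join(w, v1, v2):
    """Affine join: w*v1 + (1 - w)*v2."""
    return w * v1 + (1 - w) * v2


def join_states(w, s1, s2):
    """Join two states variable by variable with one weight."""
    return {var: affine_join(w, s1[var], s2[var]) for var in s1}


def interpret(i, w1, w2):
    """Execute both branches of each conditional of Example 1 and join."""
    # First conditional: a := 0; b := i   versus   a := i-2; b := 2
    s = join_states(w1, {"i": i, "a": 0, "b": i}, {"i": i, "a": i - 2, "b": 2})
    # Second conditional: c := b-a; d := i-2b   versus   c := 2a+b; d := b-2i
    t1 = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    t2 = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return join_states(w2, t1, t2)


# The run from the slide: i = 3, w1 = 5, w2 = 2.
final = interpret(3, 5, 2)
first_ok = final["c"] + final["d"] == 0          # assert(c+d = 0)
second_ok = final["c"] == final["a"] + final["i"]  # assert(c = a+i)
print(final, first_ok, second_ok)

# Random trial. Values start at 3 (an assumption of this sketch) to skip the
# few degenerate choices for which the second assertion happens to hold.
rng = random.Random(0)
trials = [interpret(rng.randrange(3, 2**32),
                    rng.randrange(3, 2**32),
                    rng.randrange(3, 2**32)) for _ in range(100)]
first_always = all(t["c"] + t["d"] == 0 for t in trials)
second_ever = any(t["c"] == t["a"] + t["i"] for t in trials)
print(first_always, second_ever)
```

Expanding the joins by hand gives c - (a + i) = -3·w2·(1 - w1)·(i - 2), so the second assertion passes only when w2 = 0, w1 = 1, or i = 2, which is why almost all random choices reject it.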

14 Correctness of Random Interpreter R
Completeness: If $e_1 = e_2$, then $R \Rightarrow e_1 = e_2$
– assuming non-deterministic conditionals
Soundness: If $e_1 \neq e_2$, then $R \not\Rightarrow e_1 = e_2$
– error prob. $\leq$ [bound missing]
  j: number of joins
  d: size of set from which random values are chosen
  k: number of points in the sample
– If j = 10, k = 4, $d \approx 2^{32}$, then error $\leq$ [value missing]

15 Proof Methodology Proving correctness was the most complicated part in this work. We used the following methodology. Design an appropriate deterministic algorithm (need not be efficient) Prove (by induction) that the randomized algorithm simulates each step of the deterministic algorithm with high probability.

16 Random Interpretation: Outline Random Interpretation –Linear Arithmetic (POPL 2003) Uninterpreted functions (POPL 2004) –Inter-procedural analysis (POPL 2005)

17 Problem: Global value numbering
Goal: Detect expression equivalence when program operators are abstracted using uninterpreted functions.
Original program: a := 5; x := a*b; y := 5*b; z := b*a;
– x=y and x=z
– Reasoning about multiplication is undecidable.
After abstraction: a := 5; x := F(a,b); y := F(5,b); z := F(b,a);
– only x=y
– Reasoning is decidable but tricky in presence of joins.
Axiom: If $x_1 = y_1$ and $x_2 = y_2$, then $F(x_1, x_2) = F(y_1, y_2)$
Application: Compiler optimizations, translation validation
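For the straight-line program on this slide, the effect of the abstraction can be illustrated with classic hash-based value numbering. Note that this is a deterministic textbook technique swapped in for illustration, not the randomized algorithm of the talk, and it does not handle joins. The program encoding and the name `value_numbers` are assumptions of this sketch.

```python
def value_numbers(program):
    """Assign value numbers to the variables of a straight-line program.

    Two variables get the same number exactly when their defining terms are
    structurally equal with F treated as an uninterpreted function.
    """
    table = {}  # structural key -> value number
    env = {}    # variable -> value number

    def number_of(key):
        if key not in table:
            table[key] = len(table)
        return table[key]

    def operand(x):
        if isinstance(x, int):
            return number_of(("const", x))
        if x in env:
            return env[x]
        return number_of(("input", x))  # unassigned variables are inputs

    for target, expr in program:
        if isinstance(expr, tuple):      # application such as ("F", arg1, arg2)
            fn, *args = expr
            env[target] = number_of((fn,) + tuple(operand(a) for a in args))
        else:                            # plain copy or constant
            env[target] = operand(expr)
    return env


# The abstracted program from the slide.
program = [
    ("a", 5),
    ("x", ("F", "a", "b")),
    ("y", ("F", 5, "b")),
    ("z", ("F", "b", "a")),
]
vn = value_numbers(program)
print(vn["x"] == vn["y"])  # x = y is detected
print(vn["x"] == vn["z"])  # x = z is lost: F is not known to be commutative
```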

18 Random Interpretation: Outline Random Interpretation –Linear arithmetic (POPL 2003) –Uninterpreted functions (POPL 2004) Inter-procedural analysis (POPL 2005)

19 Example 1
[Same flowchart as Example 1 on slide 3: two non-deterministic conditionals followed by assert (c + d = 0); assert (c = a + i).]
The second assertion is true in the context i=2.
Interprocedural analysis requires computing procedure summaries.

20 Idea: Keep input variables symbolic
Do not choose random values for input variables (to later instantiate by any context).
The resulting program state at the end is a random procedure summary.
Run on Example 1 with w1 = 5, w2 = 2:
– after a := 0; b := i: a=0, b=i
– after a := i-2; b := 2: a=i-2, b=2
– after the first join (w1 = 5): a=8-4i, b=5i-8
– after c := b - a; d := i - 2b: a=8-4i, b=5i-8, c=9i-16, d=16-9i
– after c := 2a + b; d := b - 2i: a=8-4i, b=5i-8, c=8-3i, d=3i-8
– after the second join (w2 = 2): a=8-4i, b=5i-8, c=21i-40, d=40-21i
Then: assert (c+d = 0); assert (c = a+i)
Instantiating the summary in the context i=2 gives a=0, b=2, c=2, d=-2, for which both assertions are true.
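The symbolic run can be sketched by representing every value as a linear polynomial in the input i. The pair encoding (constant, coefficient of i) and all helper names below are assumptions made for this illustration.

```python
# A value c0 + c1*i is stored as the pair (c0, c1).
I = (0, 1)  # the symbolic input variable i


def const(k):
    return (k, 0)


def add(p, q):
    return (p[0] + q[0], p[1] + q[1])


def scale(k, p):
    return (k * p[0], k * p[1])


def sub(p, q):
    return add(p, scale(-1, q))


def join(w, p, q):
    """Affine join w*p + (1 - w)*q on symbolic values."""
    return add(scale(w, p), scale(1 - w, q))


def evaluate(p, i):
    """Instantiate a symbolic value in the context where the input equals i."""
    return p[0] + p[1] * i


def summary(w1, w2):
    """Random procedure summary of Example 1 for join weights w1 and w2."""
    a = join(w1, const(0), sub(I, const(2)))            # a := 0   | a := i-2
    b = join(w1, I, const(2))                           # b := i   | b := 2
    c = join(w2, sub(b, a), add(scale(2, a), b))        # b-a      | 2a+b
    d = join(w2, sub(I, scale(2, b)), sub(b, scale(2, I)))  # i-2b | b-2i
    return {"a": a, "b": b, "c": c, "d": d}


s = summary(5, 2)
print(s)  # a = 8-4i, b = 5i-8, c = 21i-40, d = 40-21i

# First assertion, c + d = 0, holds symbolically, hence in every context.
first_holds = add(s["c"], s["d"]) == (0, 0)


def second_holds(i):
    """Second assertion, c = a + i, checked in a given context."""
    return evaluate(s["c"], i) == evaluate(s["a"], i) + i


print(first_holds, second_holds(2), second_holds(3))
```

Keeping i symbolic means the same summary can be reused for every calling context: it confirms the second assertion for i = 2 and rejects it for i = 3 without re-running the procedure.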

21 Experimental measure of error
The percentage of incorrect relationships decreases as S and N increase, where
– S = size of the set from which random values are chosen
– N = number of random summaries used
[Results indexed by S and N not preserved.]
The experimental results are better than what is predicted by theory.

22 Simulated Annealing
Problem: Given a program with pre/post conditions, discover a proof of validity/invalidity.
The proof is in the form of an invariant at each program point that can be locally verified.
Key Idea:
– Initialize invariants at all program points to anything.
– Pick a random program point whose invariant is not locally consistent and update it to make it less inconsistent.

23 Simulated Annealing: Outline Simulated Annealing Inconsistency Measure & Penalty Function –Algorithm –Experiments

24 Inconsistency Measure for an Abstract Domain
Let A be an abstract domain with $\Rightarrow$ as the partial order and $\gamma$ as the concretization function.
An inconsistency measure $\mathrm{IM}: A \times A \to [0,1]$ satisfies:
– $\mathrm{IM}(\phi_1, \phi_2) = 0$ iff $\phi_1 \Rightarrow \phi_2$
– IM is monotonically decreasing in its first argument
– IM is monotonically increasing in its second argument
IM is a monotonic (increasing) measure of $\gamma(\phi_1) - \gamma(\phi_2)$ [the set of states that violate $\phi_1 \Rightarrow \phi_2$].
The more strictly monotonic IM is, the smoother it is.

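25 Example of a Smooth Inconsistency Measure
Let A be the abstract domain of Boolean formulas (with the usual implication as the partial order).
Let $\phi_1 \equiv a_1 \vee \dots \vee a_n$ in DNF and $\phi_2 \equiv b_1 \wedge \dots \wedge b_m$ in CNF.
$\mathrm{IM}(\phi_1, \phi_2)$ is computed from the pairwise values $\mathrm{IM}(a_i, b_j)$, where
$\mathrm{IM}(a_i, b_j) = 0$ if $a_i \Rightarrow b_j$, and $= 1$ otherwise.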
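A pairwise measure of this kind can be sketched for propositional formulas as follows. The transcript does not preserve how the pairwise values are combined, so this sketch assumes an average over all (a_i, b_j) pairs, which keeps the result in [0,1]. The literal encoding and the brute-force implication check are also assumptions made for illustration.

```python
from itertools import product

# A literal is (variable, polarity). A DNF disjunct a_i is a list of literals
# read as a conjunction; a CNF conjunct b_j is a list read as a disjunction.


def conj_holds(lits, env):
    return all(env[v] == pol for v, pol in lits)


def disj_holds(lits, env):
    return any(env[v] == pol for v, pol in lits)


def implies(conj, disj, variables):
    """Brute-force check of a_i => b_j over all truth assignments."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if conj_holds(conj, env) and not disj_holds(disj, env):
            return False
    return True


def inconsistency(dnf, cnf, variables):
    """Fraction of pairs (a_i, b_j) with a_i not implying b_j (assumed average)."""
    pairs = [(a, b) for a in dnf for b in cnf]
    violated = sum(0 if implies(a, b, variables) else 1 for a, b in pairs)
    return violated / len(pairs)


variables = ["p", "q"]
phi1 = [[("p", True), ("q", True)], [("p", True), ("q", False)]]  # (p&q) | (p&~q)
phi2_ok = [[("p", True)], [("p", True), ("q", True)]]             # p & (p|q)
phi2_bad = [[("p", True), ("q", True)], [("q", True)]]            # (p|q) & q

print(inconsistency(phi1, phi2_ok, variables))   # 0.0: phi1 implies phi2_ok
print(inconsistency(phi1, phi2_bad, variables))  # 0.25: one of four pairs fails
```

Because a DNF formula implies a CNF formula exactly when every disjunct implies every conjunct, the measure is 0 precisely when the implication holds, whatever combining operator is used.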

26 Penalty Function
$\mathrm{Penalty}(I, \pi)$ is a measure of how inconsistent I is with respect to the invariants at the neighbors of $\pi$.
$\mathrm{Penalty}(I, \pi) = \mathrm{IM}(\mathrm{Post}(\pi), I) + \mathrm{IM}(I, \mathrm{Pre}(\pi))$
$\mathrm{Post}(\pi)$ is the strongest postcondition of the invariants at the predecessors of $\pi$ at $\pi$.
$\mathrm{Pre}(\pi)$ is the weakest precondition of the invariants at the successors of $\pi$ at $\pi$.

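27 Example of Penalty Function
[Diagram: invariant P at the predecessor of $\pi_2$, followed by statement s; invariant I at $\pi_2$; then a conditional c whose true and false successors carry invariants Q and R.]
$\mathrm{Penalty}(I, \pi_2) = \mathrm{IM}(\mathrm{Post}(\pi_2), I) + \mathrm{IM}(I, \mathrm{Pre}(\pi_2))$
$\mathrm{Post}(\pi_2) = \mathrm{StrongestPost}(P, s)$
$\mathrm{Pre}(\pi_2) = (c \Rightarrow Q) \wedge (\neg c \Rightarrow R)$
Since $\mathrm{Post}(\pi)$ and $\mathrm{Pre}(\pi)$ may not belong to A, we define:
$\mathrm{IM}(\mathrm{Post}(\pi), I) = \min \{ \mathrm{IM}(I_1, I) \mid I_1 \in A,\ I_1 \text{ overapproximates } \mathrm{Post}(\pi) \}$
$\mathrm{IM}(I, \mathrm{Pre}(\pi)) = \min \{ \mathrm{IM}(I, I_2) \mid I_2 \in A,\ I_2 \text{ underapproximates } \mathrm{Pre}(\pi) \}$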
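The penalty computation for this diagram can be made concrete on a small finite state space. Everything specific below is an assumption chosen for illustration: invariants are explicit sets of values of a single variable x, the inconsistency measure is the fraction of the universe lying in the first set but not the second, and P, s, c, Q, R are given sample values. With explicit sets, Post and Pre are always representable, so the Min over approximations is not needed here.

```python
UNIVERSE = frozenset(range(10))  # assumed finite domain for the variable x


def im(x, y):
    """Assumed inconsistency measure: share of states in x that are not in y."""
    return len(x - y) / len(UNIVERSE)


def strongest_post(p, stmt):
    """Strongest postcondition of invariant p under the state update stmt."""
    return frozenset(stmt(v) for v in p)


def branch_pre(cond, q, r):
    """(c => Q) and (not c => R), as a set of states."""
    return frozenset(v for v in UNIVERSE if (v in q if cond(v) else v in r))


def penalty(inv, post, pre):
    """Penalty(I, pi) = IM(Post(pi), I) + IM(I, Pre(pi))."""
    return im(post, inv) + im(inv, pre)


# Sample instance of the diagram (all values hypothetical).
P = frozenset({0, 1, 2})           # invariant before s
s = lambda v: v + 1                # statement s: x := x + 1
c = lambda v: v < 3                # branch condition c
Q = frozenset({0, 1, 2})           # invariant on the true branch
R = frozenset({3})                 # invariant on the false branch

post2 = strongest_post(P, s)       # {1, 2, 3}
pre2 = branch_pre(c, Q, R)         # {0, 1, 2, 3}

exact = penalty(frozenset({1, 2, 3}), post2, pre2)           # consistent
too_strong = penalty(frozenset({1, 2}), post2, pre2)         # misses state 3
too_weak = penalty(frozenset({1, 2, 3, 4, 5}), post2, pre2)  # admits 4 and 5
print(exact, too_strong, too_weak)
```

An invariant that is too strong is penalized through the Post term, one that is too weak through the Pre term, so forward and backward information enter the same number.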

28 Simulated Annealing: Outline Simulated Annealing –Inconsistency Measure & Penalty Function Algorithm –Experiments

29 Algorithm
Search for proof of validity and invalidity in parallel. Same algorithm with different boundary conditions.
Proof of Validity
– $I_{entry} = \mathrm{Pre}$
– $I_{exit} = \mathrm{Post}$
Proof of Invalidity
– $I_{entry} \wedge \mathrm{Pre}$ is satisfiable
– $I_{exit} = \neg \mathrm{Post}$
– This assumes that the program terminates on all inputs.

30 Algorithm (Continued)
Initialize invariant $I_j$ at program point $\pi_j$ to anything.
While penalty at some program point is not 0:
– Choose j randomly s.t. $\mathrm{Penalty}(I_j, \pi_j) \neq 0$.
– Update $I_j$ s.t. $\mathrm{Penalty}(I_j, \pi_j)$ is minimized. More precisely, $I_j$ is chosen randomly with probability inversely proportional to $\mathrm{Penalty}(I_j, \pi_j)$.
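The loop above can be sketched on a toy program, under several assumptions that are not in the talk: the program is `x := 0; while (x < 3) x := x + 1; assert x = 3` over the finite domain 0..5, invariants are explicit sets of values, the inconsistency measure is a normalized set difference, and a small constant `EPS` is added to penalties so that a zero penalty still yields a finite sampling weight. Only the loop head and loop body carry searched invariants; the exit invariant is fixed to the postcondition, as in a proof of validity.

```python
import random
from itertools import combinations

STATES = range(6)            # assumed finite domain for x
EXIT_INV = frozenset({3})    # boundary condition: I_exit = Post, i.e. x = 3
EPS = 0.05                   # assumed smoothing constant for the weights
CANDIDATES = [frozenset(c) for r in range(len(STATES) + 1)
              for c in combinations(STATES, r)]


def step(v):
    """Loop body x := x + 1, saturated at 5 to stay inside the domain."""
    return min(v + 1, 5)


def im(x, y):
    """Assumed inconsistency measure: share of states in x but not in y."""
    return len(x - y) / len(STATES)


def post(point, inv):
    if point == "head":      # reached after x := 0 or after the loop body
        return frozenset({0}) | frozenset(step(v) for v in inv["body"])
    return frozenset(v for v in inv["head"] if v < 3)   # "body": test is true


def pre(point, inv):
    if point == "head":      # (x < 3 => I_body) and (x >= 3 => I_exit)
        return frozenset(v for v in STATES
                         if (v in inv["body"] if v < 3 else v in EXIT_INV))
    return frozenset(v for v in STATES if step(v) in inv["head"])


def penalty(point, inv):
    return im(post(point, inv), inv[point]) + im(inv[point], pre(point, inv))


def search(seed, max_steps=10000):
    """Randomized search; returns a locally consistent proof or None."""
    rng = random.Random(seed)
    inv = {"head": rng.choice(CANDIDATES), "body": rng.choice(CANDIDATES)}
    for _ in range(max_steps):
        bad = [p for p in inv if penalty(p, inv) != 0]
        if not bad:
            return inv
        p = rng.choice(bad)
        # Weight each candidate inversely to the penalty it would have at p.
        weights = [1 / (EPS + penalty(p, {**inv, p: c})) for c in CANDIDATES]
        inv[p] = rng.choices(CANDIDATES, weights=weights)[0]
    return None


# The expected proof for this toy program, checked directly.
PROOF = {"head": frozenset({0, 1, 2, 3}), "body": frozenset({0, 1, 2})}
print(penalty("head", PROOF), penalty("body", PROOF))

result = search(seed=0)
print(result)
```

Any result returned by `search` has zero penalty at every point and is therefore a locally verifiable proof. The step cap means this sketch can give up and return `None`, whereas the algorithm as described on the slide simply keeps running.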

31 Interesting Aspects of the Algorithm Combination of Forward & Backward Analysis No distinction between forward & backward information Random Choices –Program point to update –Invariant choice

32 Simulated Annealing: Outline Simulated Annealing –Inconsistency Measure & Penalty Function –Algorithm Experiments

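33 Example 2
[Flowchart with labels: precondition x = 0; y := 50; loop test x < 100 (True/False); branch test x < 50 (True/False); x := x + 1; y := y + 1; postcondition y = 100.]
Proof of Validity
Prog. Point | Invariant
1 | $x = 0 \wedge y = 50$
2 | $(x \leq 50 \Rightarrow y = 50) \wedge (50 \leq x \Rightarrow x = y) \wedge x \leq 100$
3 | $(x \leq 50 \Rightarrow y = 50) \wedge (50 \leq x \Rightarrow x = y) \wedge x < 100$
4 | $x < 50 \wedge y = 50$
5 | $x \leq 50 \wedge y = 50$
6 | $50 \leq x < 100 \wedge x = y$
7 | $50 < x \leq 100 \wedge x = y$
8 | $(x \leq 50 \Rightarrow y = 50) \wedge (50 \leq x \Rightarrow x = y) \wedge x \leq 100$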

34 Stats: Proof vs Incremental Proof of Validity Black: Proof of Validity Grey: Incremental Proof of Validity Incremental proof requires fewer updates

35 Stats: Different Sizes of Boolean Formulas Grey: 5*3, Black: 4*3, White: 3*2 n*m denotes n conjuncts & m disjuncts Larger size requires fewer updates

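36 Example 3
[Flowchart with labels: precondition true; x := 0; m := 0; loop test x < n (True/False); non-deterministic branch *; m := x; x := x + 1; postcondition $n \leq 0 \vee 0 \leq m < n$.]
Proof of Validity
Prog. Point | Invariant
1 | $x = 0 \wedge m = 0$
2 | $n \leq 0 \vee (0 \leq x \wedge 0 \leq m < n)$
3 | $n \leq 0 \vee (0 \leq x < n \wedge 0 \leq m < n)$
[number missing] | $n \leq 0 \vee (0 \leq x \leq n \wedge 0 \leq m < n)$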

37 Stats: Proof of Validity Example 2 is easier than Example 1. Easier example requires fewer updates.

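38 Example 2: Precondition Modified
[Flowchart of Example 2 with the precondition changed to true: y := 50; loop test x < 100 (True/False); branch test x < 50; x := x + 1; y := y + 1; postcondition y = 100.]
Proof of Invalidity
Prog. Point | Invariant
0 | $x \geq$ [value missing]
[number missing] | $x \geq 100 \wedge y =$ [value missing]
[number missing] | false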

39 Stats: Proof of Invalidity

Conclusion
Lessons Learned
– Randomization buys efficiency and simplicity.
– Randomization suggests ideas for deterministic algorithms.
– Combining randomized and symbolic techniques is powerful.
Summary
Random Interpretation:
– Linear Arithmetic: Affine Joins
– Uninterpreted Functions: Random Linear Interpretations
– Interprocedural Analysis: Symbolic Input Variables
Simulated Annealing:
– Smooth Inconsistency Measure for an abstract domain