50.530: Software Engineering

Name: 50.530: Software Engineering
Uploaded: 2017-10-07T17:07:52+00:00
Duration: PTM47S52
Channel: Shayna Held
Description: 50.530: Software Engineering

50.530: Software Engineering
Sun Jun SUTD

Week 10: Symbolic Execution

Example
    int x, y;
    if (x>0) {
        assert(x>=0);
        array[x] = 5;
    }
Will assertion failure occur?

Example
    1. if (x>y) {
    2.   x = x + y;
    3.   y = x - y;
    4.   x = x - y;
    5.   if (x-y>0) {
    6.     assert(false);
    7.   }
    8. }
Will assertion failure occur?

Example
    1. if (x>y) {
    2.   x = x + y;
    3.   y = x - y;
    4.   x = x - y;
    5.   if (x-y>0) {
    6.     assert(false);
    7.   }
    8. }
[Figure: control-flow graph of the program with nodes 1 to 8 and edges labeled x > y, x <= y, x = x+y, y = x-y, x = x-y, x-y>0 and x-y<=0.]

Example: Path Condition
Assertion failure occurs if and only if the following is satisfiable:
    x1 > y1 && x2=x1 && y2=y1 && x3=x2+y2 && y3=y2 && x4=x3 && y4=x3-y3 && x5=x4-y4 && y5=y4 && x5-y5>0 && !(false)
[Figure: the same control-flow graph, nodes 1 to 8.]
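The path condition can be explored mechanically. The following is a minimal sketch, assuming a bounded search over small integers stands in for a real constraint solver; the function name `find_violation` and the search range are illustrative and not part of the lecture. A bounded search is evidence, not a proof: the actual argument is that the three assignments swap x and y, so x5 - y5 equals y1 - x1, which cannot be positive when x1 > y1.

```python
from itertools import product


def find_violation(lo=-8, hi=8):
    """Return (x1, y1) satisfying the path condition, or None if the range has none."""
    for x1, y1 in product(range(lo, hi + 1), repeat=2):
        x2, y2 = x1, y1          # values on entering the then-branch
        x3, y3 = x2 + y2, y2     # x = x + y
        x4, y4 = x3, x3 - y3     # y = x - y
        x5, y5 = x4 - y4, y4     # x = x - y
        if x1 > y1 and x5 - y5 > 0:
            return (x1, y1)
    return None


print(find_violation())
```

No pair in the searched range satisfies the condition, which matches the claim that the assertion cannot fail.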

Symbolic Execution
Rather than executing a program with concrete input values, execute it with symbolic variables representing the inputs.
Proposed in 1976*.
Popularized only in recent years due to advances in constraint solving techniques.
*L. A. Clarke, “A System to Generate Test Data and Symbolically Execute Programs”, IEEE Transactions on Software Engineering

    x1 > y1 && x2=x1 && y2=y1 && x3=x2+y2 && y3=y2 && x4=x3 && y4=x3-y3 && x5=x4-y4 && y5=y4 && x5-y5>0 && !(false)
How do we know systematically whether a constraint like this is satisfiable or not?

Boolean Satisfiability Problem
Boolean Satisfiability (often abbreviated SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula.
Consider the formula (a ∨ b) ∧ (¬a ∨ ¬c).
The assignment b = True and c = False satisfies the formula!
Arguably one of the most important problems in computer science.
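For a formula this small, satisfiability can be decided by trying every assignment. The sketch below assumes plain truth-table enumeration, which is not how real SAT solvers work; the names `formula` and `satisfying_assignments` are illustrative.

```python
from itertools import product


def formula(a, b, c):
    """(a or b) and (not a or not c), the formula from the slide."""
    return (a or b) and ((not a) or (not c))


def satisfying_assignments():
    """Enumerate all 2^3 assignments and keep those that satisfy the formula."""
    return [v for v in product([False, True], repeat=3) if formula(*v)]


models = satisfying_assignments()
print(len(models), "of 8 assignments satisfy the formula")
```

With b = True and c = False the formula holds whatever value a takes, which is why the slide does not need to fix a.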

Exercise 1 Consider the following constraints:
John can only meet on Monday, Wednesday or Thursday;
Catherine cannot meet on Wednesday;
Anne cannot meet on Friday;
Peter can meet neither on Tuesday nor on Thursday.
Question: When can the meeting take place? Answer the question using SAT solving.

SAT: Example
Use 3 Boolean variables to represent the 6 colors.
Use 3 variables to represent each little square.
Define functions T(X, Y), which change the values of the Boolean variables from X to Y, to represent the turns.
Question: can the game be solved? This is answered by checking the satisfiability of the following formula:
    Init(X0) && T(X0, X1) && T(X1, X2) && … && T(X17, X18) && Goal(X18)

History
SAT was shown to be NP-complete in 1971 (Stephen Cook).
The Davis-Putnam procedure was developed in 1960 and refined into the DPLL algorithm in 1962.
Breakthroughs occurred in the 90s.
Advanced SAT solvers handle problem instances with millions of Boolean variables.
Annual competition:

Exponential Complexity Growth: The Challenge of Complex Domains
Note: rough estimates, for propositional reasoning.
[Figure: case complexity plotted against the number of variables and the number of rules (constraints). Case-complexity marks: 10^30, 10^47, 10^3010, 10^6020, 10^150,500, 10^301,020. Reference marks: seconds until heat death of sun; no. of atoms on the earth; protein folding calculation (petaflop-year).]
Domain (variables, rules):
    Car repair diagnosis (100, 200)
    Deep space mission control (10K, 50K)
    Chess, 20 steps deep (20K, 100K)
    Military logistics (100K, 450K)
    VLSI verification (0.5M, 1M)
    War gaming (1M, 5M)
[Credit: Kumar, DARPA; cited in Computer World magazine]

SAT Solver Progress Solvers have continually improved over time
Source: Marques-Silva 2002

SAT Extension: QBF
SAT: is there an assignment to b1, b2, b3 that satisfies a formula with no quantifiers?
QBF: is a formula over Boolean variables with both "for all" (∀) and "there exists" (∃) quantifiers true?
    ∀x ∀y ∃z (x ∨ y ∨ z) ∧ (¬x ∨ ¬y ∨ ¬z)
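The quantified formula can be evaluated directly by nesting `all` and `any`. This is a sketch assuming brute-force enumeration over the two truth values; `matrix`, `forall_forall_exists` and `exists_forall_forall` are illustrative names. The second function moves ∃z to the front to show that quantifier order matters.

```python
BOOLS = (False, True)


def matrix(x, y, z):
    """The quantifier-free part: (x or y or z) and (not x or not y or not z)."""
    return (x or y or z) and ((not x) or (not y) or (not z))


def forall_forall_exists():
    """forall x, forall y, exists z: z may be chosen after seeing x and y."""
    return all(any(matrix(x, y, z) for z in BOOLS) for x in BOOLS for y in BOOLS)


def exists_forall_forall():
    """exists z, forall x, forall y: a single z must work for every x and y."""
    return any(all(matrix(x, y, z) for x in BOOLS for y in BOOLS) for z in BOOLS)


print(forall_forall_exists(), exists_forall_forall())
```

The formula on the slide is true because z can be picked as the opposite of x; with z fixed first, the choice x = y = z defeats it.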

QBF Example
Query: Does there exist a strategy such that, for every move of the opponent, I would win?

SAT Extension: SMT Satisfiability Modulo Theories (SMT) enrich QBF formulas with linear constraints, arrays, all-different constraints, uninterpreted functions, etc. Very efficient SMT solvers are now available that can handle many such kinds of constraints. Annual competition:

SMT Example
(Difference logic) Is there a solution {x,y} satisfying
    x-y < 20 and x-y > 4
(Linear arithmetic) Is there a solution {x,y,z} satisfying
    3x+2y >= 5z and 5z = 2x
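Both questions have a positive answer, which a small search can confirm by producing a witness. The sketch assumes integer variables and a bounded range in place of an SMT solver; the helper `solve` and its parameters are hypothetical.

```python
from itertools import product


def solve(constraint, arity, lo=-10, hi=10):
    """Return the first integer tuple in [lo, hi]^arity that meets the constraint, else None."""
    for values in product(range(lo, hi + 1), repeat=arity):
        if constraint(*values):
            return values
    return None


# Difference logic: x - y < 20 and x - y > 4
diff_model = solve(lambda x, y: 4 < x - y < 20, 2)

# Linear arithmetic: 3x + 2y >= 5z and 5z = 2x
lin_model = solve(lambda x, y, z: 3 * x + 2 * y >= 5 * z and 5 * z == 2 * x, 3)

print(diff_model, lin_model)
```

A real SMT solver reasons symbolically over unbounded integers or reals and can also report unsatisfiability, which this bounded search cannot establish.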

Black Box View
Logic Formula -> SMT Solver -> "Not satisfiable", or an assignment of the variables
    1. if (x>y) {
    2.   x = x + y;
    3.   y = x - y;
    4.   x = x - y;
    5.   if (x-y>0) {
    6.     assert(false);
    7.   }
    8. }
For this program, the solver yields a proof that the assertion failure cannot occur.

Symbolic Execution: Algo1
    Find all paths P which lead to an assertion;
    For each path p in P {
        Construct a path condition Con for p;
        Check whether Con is satisfiable using an SMT solver;
        if (satisfiable) {
            Construct a test case based on the SMT output;
            Report error;
        }
    }
    Report assertion verified;

Exercise 2
    Boolean a = input();
    Boolean b = input();
    Boolean c = input();
    int x = 0, y = 0, z = 0;
    if (a) { x = -2; }
    if (b) {
        if (!a && c) { y = 1; }
        z = 2;
    }
    assert(x+y+z != 3);
Analyze the above program using Algo1 to check assertion violation.

Limitation: Path Explosion
How many paths are there? 2^3
Exponential in branching structure.
    if (input()==true) { x = x+1; }
    if (input()==true) { x = x+2; }
    if (input()==true) { x = x+4; }
    assert(x <= 7);
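The count of 2^3 can be reproduced by enumerating every combination of branch outcomes. This sketch assumes that each of the three assignments is guarded by its own input-dependent branch and that x starts at 0, neither of which is stated on this slide; `explore` is an illustrative name.

```python
from itertools import product

INCREMENTS = (1, 2, 4)  # effect of taking the then-branch at each of the three ifs


def explore(x0=0, bound=7):
    """Run every path; return (branch choices, final x, assertion holds) per path."""
    results = []
    for choices in product((False, True), repeat=len(INCREMENTS)):
        x = x0
        for taken, inc in zip(choices, INCREMENTS):
            if taken:
                x += inc
        results.append((choices, x, x <= bound))
    return results


paths = explore()
violations = [choices for choices, x, ok in paths if not ok]
print(len(paths), "paths, violations:", violations)
```

Each additional branch doubles the number of paths, so this kind of enumeration stops being practical long before real programs are reached.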

Limitation: Path Explosion
How do we handle loops?
    check all paths which reach the assertion in one iteration.
    … in two iterations.
    … in three iterations.
    …
    int x = input();
    while (x > 0) {
        x++;
        assert(…);
    }
The loop invariant problem is still there.

Limitation: Incompleteness
An SMT solver is no magic.
Existing SMT solvers support theories of linear integer arithmetic, bit vectors, strings, etc.
Existing SMT solvers are not particularly scalable.
    int x = input();
    int y = input();
    int z = input();
    if (5x^63 + 7x^12 = 78y^2 + z) {
        assert(false);
    }

An Interpolation Method for CLP Traversal
Jaffar et al., CP 2009

Symbolic Execution Path Explosion How do we solve the problem of path explosion?

Example
    1. if (input()==true) { x = x+1; }
    2. if (input()==true) { x = x+2; }
    3. if (input()==true) { x = x+4; }
    4. assert(x <= 7);
Is it possible to have an assertion failure?
How many path conditions do we have to solve?

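Unfolding Tree
[Figure: unfolding tree of the program. Location 1 branches on x = x+1 or skip (*) to location 2, location 2 on x = x+2 or skip to location 3, and location 3 on x = x+4 or skip.]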

Step 1: Symbolic Execution
[Figure: the unfolding tree extended to the eight leaves at location 4.]
Path Condition: x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3 && x4 > 7

Step 1: Interpolant
Interpolant: a generalization of A which is still disjoint from B.
A (states reached by the path): x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3
B (bad states): x4 > 7

Craig Interpolation
Given a pair of predicates (A, B), if A && B is not satisfiable, an interpolant for (A, B) is a formula P with the following properties:
    A implies P,
    P && B is unsatisfiable, and
    P refers only to the common variables of A and B.

Example
A: x = 0
B: x > 7
Sample interpolants: x = 0, x <= 3, x < 7, x <= 7
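The two semantic conditions of a Craig interpolant can be tested candidate by candidate. The sketch below assumes a finite integer domain as a stand-in for logical reasoning, so it only gives evidence within that range; `is_interpolant` and `DOMAIN` are illustrative names. The common-variable condition holds trivially here because x is the only variable.

```python
DOMAIN = range(-50, 51)  # assumed finite search space


def is_interpolant(p, a, b, domain=DOMAIN):
    """Check that A implies P and that P && B has no model in the domain."""
    a_implies_p = all(p(x) for x in domain if a(x))
    p_excludes_b = not any(p(x) and b(x) for x in domain)
    return a_implies_p and p_excludes_b


def a(x):
    return x == 0


def b(x):
    return x > 7


candidates = {
    "x = 0": lambda x: x == 0,
    "x <= 3": lambda x: x <= 3,
    "x < 7": lambda x: x < 7,
    "x <= 7": lambda x: x <= 7,
    "x <= 8": lambda x: x <= 8,   # too weak: overlaps B at x = 8
    "x >= 1": lambda x: x >= 1,   # not implied by A
}

for name, p in candidates.items():
    print(name, is_interpolant(p, a, b))
```

The four interpolants from the slide pass, while the two extra candidates fail one condition each.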

Exercise 3: Interpolant
A is: (x <= 3 && y <= 1) || (x <= 2 && y <= 2) || (x <= 1 && y <= 3)
B is: (x >= 3 && y >= 2) || (x >= 2 && y >= 3)
Is there any interpolant other than A or !B? Find one if you believe there is. Otherwise, argue why there isn’t any.
Finding interpolants in general is a hard problem.

Interpolation Computation
Many algorithms have been proposed to compute interpolants efficiently for various logics.
Given a pair of A and B, there might be many different interpolants.
The weakest precondition gives the weakest (most general) interpolant, which is expensive to compute.
Existing tools usually propose interpolants in the form of a conjunctive formula.

Let A be x1 = 0 && x2 = x1 && x3 = x2 && x4 = x3. Let B be x4 > 7.
Weakest (most general) interpolant: x4 <= 7.
We learned:
At location 4, x <= 7 implies safety.

Let A be x1 = 0 && x2 = x1 && x3 = x2. Let B be x4 = x3 && x4 > 7.
Weakest (most general) interpolant: x3 <= 7.
We learned:
At location 4, x <= 7 implies safety;
At location 3, x <= 7 implies safety if we take the else-branch.

At location 4, x <= 7 implies safety;
At location 3, x <= 7 implies safety if we take the else-branch.
At location 2, x <= 7 implies safety if we take two else-branches.
At location 1, x <= 7 implies safety if we take three else-branches.
x1=0 && x2=x1 && x3=x2 && x4=x3+4 implies x4<=7, and therefore it is safe.

At location 3, x <= 7 implies safety if we take the else-branch.
At location 2, x <= 7 implies safety if we take two else-branches.
At location 1, x <= 7 implies safety if we take three else-branches.
Since x1=0 && x2=x1 && x3=x2 && x4=x3+4 && x4>7 is unsatisfiable, we learn using interpolants again.

At location 3, x <= 7 implies safety if we take the else-branch.
At location 2, x <= 7 implies safety if we take two else-branches.
At location 1, x <= 7 implies safety if we take three else-branches.
Let A be x1=0 && x2=x1 && x3=x2 and B be x4=x3+4 && x4>7.
We found an interpolant x3 <= 3 at location 3.

At location 3, x <= 7 implies safety if we take the else-branch.
At location 3, x <= 3 implies safety if we take the then-branch.
At location 2, x <= 7 implies safety if we take two else-branches.
At location 1, x <= 7 implies safety if we take three else-branches.
Let A be x1=0 && x2=x1 && x3=x2 and B be x4=x3+4 && x4>7.
We found an interpolant x3 <= 3 at location 3.

At location 2, x <= 3 implies safety if we take the else-branch first.
At location 1, x <= 3 implies safety if we take two else-branches first.

At location 2, x <= 3 implies safety if we take the else-branch first.
At location 1, x <= 3 implies safety if we take two else-branches first.
x1=0 && x2=x1 && x3=x2+2 implies x3<=3, and therefore it is safe.

x1=0 && x2=x1+1 implies x2<=1, and therefore it is safe.

Reduction
[Figure: the reduced unfolding tree.]

Algorithm
Input: a finite tree T with root v representing a program, assuming that each leaf represents an assertion: assert(Q).
Output: a test case leading to an assertion violation, or “no assertion violation”.
    while (there are unvisited nodes) {
        visit the next node N in DFS order;
        if (there is an unconditioned learned result: “if P is satisfied at N, then safe”) {
            let PathCond be the path condition of the current path;
            if (PathCond implies P) {
                update the learned results based on interpolants from PathCond && !P;
                skip the node;
            }
        } else if (N is a leaf) {
            if (PathCond && !Q is satisfiable) {
                report with a test case for assertion violation;
            } else {
                update the learned results based on interpolants from PathCond && !Q;
            }
        }
    }
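The pruning effect of learned results can be seen on the three-branch example. The sketch below is a simplification under stated assumptions, not the algorithm of Jaffar et al.: the state is concrete (x starts at x0), every learned result has the form "x <= t at location L implies safety", and t is computed as the weakest precondition of the assertion rather than by a general interpolation procedure. The names `INCREMENT`, `LEAF`, `BOUND` and `verify` are illustrative.

```python
INCREMENT = {1: 1, 2: 2, 3: 4}  # then-branch effect at locations 1, 2, 3
LEAF = 4                        # location of assert(x <= BOUND)
BOUND = 7


def verify(x0=0):
    """DFS over the unfolding tree, skipping nodes covered by a learned bound.

    Returns (safe, visited, learned), where learned[loc] = t means
    "x <= t at loc implies safety" for the whole subtree below loc.
    """
    learned = {}
    visited = 0

    def dfs(loc, x):
        nonlocal visited
        if loc in learned and x <= learned[loc]:
            return learned[loc]          # subsumed: skip this subtree
        visited += 1
        if loc == LEAF:
            if x > BOUND:
                return None              # assertion violation found
            learned[loc] = BOUND
            return BOUND
        t_else = dfs(loc + 1, x)         # else-branch: x unchanged
        if t_else is None:
            return None
        t_then = dfs(loc + 1, x + INCREMENT[loc])
        if t_then is None:
            return None
        # Both branches must stay safe, so take the tighter bound.
        learned[loc] = min(t_else, t_then - INCREMENT[loc])
        return learned[loc]

    safe = dfs(1, x0) is not None
    return safe, visited, learned


print(verify(0))
```

Starting from x = 0, only 4 of the 15 tree nodes are expanded, and the learned bounds are x <= 7 at location 4, x <= 3 at location 3 and x <= 1 at location 2, in line with the walkthrough. Starting from x = 1, the search reaches the leaf with x = 8 and reports a violation.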

Exercise 4: Show How it Works
    int y = input();
    1. if (input()==true) { x = x+1; }
    2. if (y>=1) { x = x+2; }
    3. if (y<1) { x = x+4; }
    4. assert(x <= 5);

Loops
    1. if (input()==true) { x = x+1; }
    2. if (input()==true) { x = x+2; }
    3. if (input()==true) { x = x+4; }
    4. assert(x <= 7);
How about verifying the program simply using Hoare logic?
A program which contains one or more loops leads to an unbounded tree.
Symbolic execution can help to discover loop invariants.

Example
    function foo(int x, int n) {
        int y = x;
        int i = 0;
        while (i < n) {
            x = x+1;
            i = i+1;
        }
        if (x < y) {
            error();
        }
    }
Is error possible? How do we systematically verify that?

Example
    function foo(int x, int n) {
    1.  int y = x;
    2.  int i = 0;
    3.  while (i < n) {
    4.    x = x+1;
    5.    i = i+1;
        }
    6.  if (x < y) {
    7.    error();
        }
    }
[Figure: control-flow graph with nodes 1 to 7 and edges labeled y=x, i=0, i<n, x=x+1, i=i+1, i>=n and x<y; node 7 is marked "not safe".]

Example
Step 1: Path condition: y=x && i=0 && i>=n && x<y
Unsatisfiable
Interpolant at 6: x >= y
[Figure: the control-flow graph, with "x>=y implies safety" at node 6 and "not safe" at node 7.]

Example
Step 1: Path condition: y=x && i=0 && i>=n && x<y
Unsatisfiable
Interpolant at 3->6: x >= y
In theory, it should be: !(x<y && i>=n). Why?
[Figure: the control-flow graph, with "x>=y" at node 3, "x>=y implies safety" at node 6 and "not safe" at node 7.]

Example
Step 2: Path condition: y=x && i=0 && i<n && x1=x+1 && i1=i+1 && i1>=n && x1<y
Unsatisfiable
Interpolant at 3->6: x >= y
[Figure: the control-flow graph, with "x>=y implies safety" at nodes 3 and 6 and "not safe" at node 7.]

Guessing Loop Invariants
Through symbolic execution with interpolants, we obtain conditions which must be satisfied in order to verify safety.
These interpolants may be related to the loop invariants.
The idea: take (part of) the condition as a candidate loop invariant and check it.

Candidate: x>=y
To check whether x>=y is a sufficiently strong loop invariant, we need to establish:
    {true} y=x; i=0 {x>=y}
    {i<n && x>=y} x=x+1; i=i+1; {x>=y}
    and x>=y implies x>=y at 6, which implies safety.
Do the above Hoare triples hold?

Candidate: x>=y
    {true} y=x; i=0 {x>=y}
    {i<n && x>=y} x=x+1; i=i+1; {x>=y}
    and x>=y implies x>=y at 6, which implies safety.
The above Hoare triples can be discharged using symbolic execution by checking that the following are unsatisfiable:
    y=x && i=0 && x<y
    i<n && x>=y && x1=x+1 && i1=i+1 && x1<y
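Each Hoare triple becomes a search for a counterexample: a state that satisfies the precondition, runs the statements, and violates the postcondition. The sketch assumes a bounded integer range in place of an SMT solver, so an empty result is evidence rather than proof; the function names and the range `R` are illustrative. In the preservation check, the value compared with y is x1, the value of x after the loop body.

```python
from itertools import product

R = range(-6, 7)  # assumed small search range


def initiation_counterexample():
    """Look for a state with y = x, i = 0 and x < y (invariant fails on loop entry)."""
    for x in R:
        y, i = x, 0
        if x < y:
            return (x, y, i)
    return None


def preservation_counterexample():
    """Look for i < n, x >= y such that after x1 = x+1, i1 = i+1 we have x1 < y."""
    for x, y, i, n in product(R, repeat=4):
        if i < n and x >= y:
            x1, i1 = x + 1, i + 1
            if x1 < y:
                return (x, y, i, n)
    return None


print(initiation_counterexample(), preservation_counterexample())
```

Neither search finds a counterexample, consistent with x >= y being an inductive invariant: it holds on entry because y = x, and incrementing x cannot break it.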

Empirical Study
Reported in “Lazy Abstraction with Interpolants” (CAV 2006).
SGP = simple goto programs.
These are all Windows device drivers.

Conclusion
Symbolic execution allows us to check many test cases (which share the same path) at once.
Symbolic execution needs the support of advanced constraint solving like SMT solving, which is not yet very scalable.
Symbolic execution with interpolants eases the path explosion problem by “learning” from failures (in reaching the error state).

Exercise 5
Verify that the following program is free from exceptions using symbolic execution with interpolants.
    public void rec() {
        if (input() == true) {
            rec();
        } else {
            x = x+1;
            return;
        }
    }
    int x;
    int[] array = new int[]{1,2,3,4, …};
    rec();
    array[x] = 2;

Question
Does the traversal order matter in terms of reduction?

DART: Directed Automated Random Testing
Godefroid et al., PLDI 2005

Motivation
Random testing can cover many paths but is hardly ever complete.
Symbolic execution can completely check all paths if there aren’t many.
    if (x == 19973) {
        assert(false);
    }
What is the probability of finding the assertion failure?
How about we randomly test first and use symbolic execution to increase coverage?
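The probability question has a quick numerical answer. This sketch assumes x is a 32-bit integer drawn uniformly at random and that test runs are independent; the slide states neither, and the function names are illustrative.

```python
import math


def hit_probability(bits=32):
    """Chance that one uniformly random input equals the single failing value."""
    return 1.0 / 2 ** bits


def chance_of_hit(trials, bits=32):
    """Chance that at least one of `trials` independent random inputs fails the assertion."""
    p = hit_probability(bits)
    # 1 - (1 - p)^trials, computed via log1p/expm1 to avoid rounding loss for tiny p
    return -math.expm1(trials * math.log1p(-p))


print(hit_probability(), chance_of_hit(10 ** 6))
```

Under these assumptions, even a million random tests find the failure well under 0.1% of the time, whereas solving the branch condition x == 19973 yields the failing input directly.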

Example
    int h(int x, int y) {
        if (x != y) {
            if (2*x == x + 10) {
                abort(); /*error*/
            } else {
                return 2*x+y;
            }
        } else {
            return 2*x;
        }
    }
[Figure: control-flow graph of h with branches x == y / x != y and 2*x == x+10 / else, annotated "random testing" and "symbolic executing".]

DART: Approach
Objective:
    Input: a function written in C.
    Output: a set of test cases which provides 100% code coverage.
Method:
    Generate a test driver that performs random testing to simulate the most general environment the program can operate in.
    Dynamically analyze how the program behaves under random testing and generate new test inputs systematically using symbolic execution.

Input
A function with parameters.
The function is assumed to be always terminating.
It contains the following statements:
    abort()
    if (e) {goto l} else {goto l’} (where e is an expression and l and l’ are statements)
    assignment: m := e (where m is a variable name and e is an expression)
An expression e can be:
    a constant c, e1 * e2, e1 <= e2, !e1, *e1
Expressions are side-effect-free.

Test Driver
Identify all external inputs needed by the program:
    function parameters and user inputs;
    external functions.
Is this justified?

DART: the Algorithm
    complete = true;
    do {
        path = <>; inits = []; directed = true;
        while (directed) { run_instructed(); }
    } while (!complete);
complete is true iff the applied SMT solver is complete in solving the constraints.
path is a sequence of statements and variable valuations;
inits assigns values to some variables (generated by the SMT solver).

DART: the Algorithm
    run_instructed() {
        for each variable x {
            x = random() if it is not in inits; otherwise x = inits(x);
        }
        let s be the initial statement;
        while (s is not abort or halt) {
            execute s;
            add s and the current variable valuations to path;
            s = next statement;
        }
        if (s == abort) { report bug; exit(); }
        else { return solvePathCondition(); }
    }

DART: the Algorithm
    solvePathCondition() {
        from the end of path, find a statement if (B) {} else {} such that only one of its branches (then or else) has been executed;
        if (no such branching statement exists) {
            directed = false; return;
        } else {
            remove from path all statements after that branch statement;
            let C be B if the else-branch was taken, or else !B;
            if (SMT-solve(path, C)) {
                set inits to the variable valuations returned by the SMT solver;
            } else {
                solvePathCondition();
            }
        }
    }

DART: the Algorithm
    SMT-solve(path, C) {
        let SM be a symbolic memory such that SM(x) = x for every variable x;
        evaluate each statement in path one by one on SM by calling evaluate(e, CM, SM), where CM is the concrete variable valuation before the execution of the statement;
        return true iff SM && C is satisfiable by an SMT solver;
    }
    evaluate(e, CM, SM) {
        if (e is a variable name m) {
            return SM(m) if m is of a type supported by the SMT solver, or else CM(m);
        }
        if (e is e1 * e2) {
            let e1’ = evaluate(e1, CM, SM); e2’ = evaluate(e2, CM, SM);
            if (neither e1’ nor e2’ is a constant) {
                complete = false;
                return the evaluation result of e with CM;
            } else { return e1’*e2’; }
        }
        …
    }

DART: Theorem
(A) If DART reports a bug, then there is some input that leads to an abort;
(B) If DART terminates without reporting a bug, there is no input that leads to an abort and all paths in the program have been exercised;
(C) Otherwise, DART runs forever.

Example
    int h(int x, int y) {
        if (x != y) {
            if (2*x == x + 10) {
                abort(); /*error*/
            } else {
                return 2*x+y;
            }
        } else {
            return 2*x;
        }
    }
[Figure: control-flow graph of h with branches x == y / x != y and 2*x == x+10 / else, annotated "random testing" and "symbolic executing".]
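The directed search can be imitated on h in a few lines. This is a sketch under loud assumptions and not the DART implementation: branch conditions are logged by hand-written instrumentation, a bounded brute-force search (`solve`) replaces the SMT solver, and the bookkeeping of covered branch prefixes is a simplification of DART's stack. All names are illustrative.

```python
import random


class Abort(Exception):
    """Raised where the original program calls abort()."""


def h(x, y, trace):
    """The example function, instrumented to log each branch predicate and its outcome."""
    def c1(x, y):
        return x != y

    def c2(x, y):
        return 2 * x == x + 10

    trace.append((c1, c1(x, y)))
    if c1(x, y):
        trace.append((c2, c2(x, y)))
        if c2(x, y):
            raise Abort()
        return 2 * x + y
    return 2 * x


def solve(constraints, lo=-32, hi=32):
    """Solver stand-in: first (x, y) in the box meeting every (predicate, wanted) pair."""
    for x in range(lo, hi + 1):
        for y in range(lo, hi + 1):
            if all(pred(x, y) == wanted for pred, wanted in constraints):
                return (x, y)
    return None


def dart(seed=0, max_runs=10):
    """Start from random inputs, then flip the deepest unexplored branch each run."""
    rng = random.Random(seed)
    inputs = (rng.randint(-32, 32), rng.randint(-32, 32))
    covered = set()  # branch-outcome prefixes already executed or targeted
    for run in range(1, max_runs + 1):
        trace = []
        try:
            h(inputs[0], inputs[1], trace)
        except Abort:
            return inputs, run  # bug found with these inputs
        outcomes = tuple(taken for _, taken in trace)
        for k in range(1, len(outcomes) + 1):
            covered.add(outcomes[:k])
        next_inputs = None
        for k in reversed(range(len(trace))):
            flipped = outcomes[:k] + (not outcomes[k],)
            if flipped in covered:
                continue
            covered.add(flipped)
            pred, taken = trace[k]
            next_inputs = solve(trace[:k] + [(pred, not taken)])
            if next_inputs is not None:
                break
        if next_inputs is None:
            return None, run  # every feasible path explored, no abort reached
        inputs = next_inputs
    return None, max_runs


print(dart())
```

Whatever the random starting point, the search needs at most three runs here: one random run, possibly one run to leave the x == y branch, and one run with x = 10 obtained by solving 2*x == x + 10 together with x != y.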

Question
    foo (int x, int y) { if (x*x*x > 0) { if (x > 0 && y == 10) { abort(); } else { if (x > 0 && y == 20) {
Will DART find the bug? Assume that the SMT solver can’t deal with non-linear expressions.

Case Study
oSIP library: 30K lines of C code, 600+ externally visible functions.
DART was applied to test every function.
There are no assertions; DART is used to look for segmentation faults and non-termination.
DART found ways to crash 65% of the functions, most of them caused by null pointers in function parameters.
Pex, a tool based on the same idea as DART, will be part of Visual Studio 2015.
