
1
**Symbolic Execution**

Amal Khalil & Juergen Dingel

CISC836: Models in Software Development: Methods, Techniques, and Tools, Winter 2015

2
**Outline**

- Overview of classical symbolic execution
  - How it works
  - Applications of symbolic execution
  - Challenges of symbolic execution
- Modern symbolic execution techniques: combining concrete and symbolic executions
  - Concolic testing
  - Execution-Generated Testing (EGT)
- KLEE demo

3
**Motivation**

- Testing is a practical way of verifying programs.
- Manual testing is difficult: it requires knowledge of the code and constant maintenance.
- Random testing is easy to perform but ineffective: it does not guarantee full coverage of all program paths.
- Symbolic execution can systematically explore a large number of program paths. It is commonly used to drive the testing process and thereby achieve higher path coverage.

4
**Symbolic Execution**

- A program analysis technique that executes programs in a parametric way, using symbolic inputs, to derive precise characterizations of their properties and their execution paths.
- First introduced in the 1970s by Lori A. Clarke [1976] and James C. King [1976] for program testing.
- Since 2003, substantial research effort has been devoted to improving the effectiveness, efficiency, and applicability of the traditional technique [Yang et al. 2014].
- Examples of symbolic execution tools: jCUTE and JPF (Java), KLEE (LLVM IR for C/C++), and Pex (.NET Framework).

5
**How does Symbolic Execution work?**

The main idea is to substitute program inputs with symbolic values and then execute the program parametrically such that:

- the values of all program variables are computed as symbolic expressions over the symbolic input values;
- the execution can proceed along any feasible path.
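To illustrate what "values computed as symbolic expressions" means, here is a minimal C sketch of an expression tree over two symbolic inputs, X and Y. It is not taken from any tool, and all names (`Expr`, `mk_bin`, `render`, and so on) are hypothetical. Symbolically executing `x = x - y; x++;` builds the expression ((X - Y) + 1) instead of a number. Choosing concrete values for X and Y later turns that expression back into an ordinary integer.

```c
#include <stdio.h>
#include <stddef.h>

/* Node kinds of a tiny symbolic expression language (illustrative only). */
typedef enum { EXPR_SYM, EXPR_CONST, EXPR_ADD, EXPR_SUB } Kind;

typedef struct Expr {
    Kind kind;
    char name;                    /* EXPR_SYM: symbol name, 'X' or 'Y' */
    int value;                    /* EXPR_CONST: integer value */
    const struct Expr *lhs, *rhs; /* EXPR_ADD/EXPR_SUB: operands (borrowed) */
} Expr;

static Expr mk_sym(char name) { Expr e = {EXPR_SYM, name, 0, NULL, NULL}; return e; }
static Expr mk_const(int v)   { Expr e = {EXPR_CONST, 0, v, NULL, NULL}; return e; }
static Expr mk_bin(Kind k, const Expr *l, const Expr *r) {
    Expr e = {k, 0, 0, l, r};
    return e;
}

/* Evaluate the expression once concrete values are chosen for X and Y. */
static int eval(const Expr *e, int X, int Y) {
    switch (e->kind) {
    case EXPR_SYM:   return e->name == 'X' ? X : Y;
    case EXPR_CONST: return e->value;
    case EXPR_ADD:   return eval(e->lhs, X, Y) + eval(e->rhs, X, Y);
    default:         return eval(e->lhs, X, Y) - eval(e->rhs, X, Y); /* EXPR_SUB */
    }
}

/* Print the expression, fully parenthesized, into buf. */
static void render(const Expr *e, char *buf, size_t size) {
    char l[64], r[64];
    switch (e->kind) {
    case EXPR_SYM:   snprintf(buf, size, "%c", e->name); break;
    case EXPR_CONST: snprintf(buf, size, "%d", e->value); break;
    default:
        render(e->lhs, l, sizeof l);
        render(e->rhs, r, sizeof r);
        snprintf(buf, size, "(%s %c %s)", l, e->kind == EXPR_ADD ? '+' : '-', r);
    }
}
```

For example, with `d = mk_bin(EXPR_SUB, &X, &Y)`, the expression `mk_bin(EXPR_ADD, &d, &one)` renders as `((X - Y) + 1)` and evaluates to 3 for X = 7, Y = 5. Real engines use much richer expression languages; this sketch supports only `+`, `-`, and two symbols.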

6
**How does Symbolic Execution work?**

- The result of symbolically executing a program is a tree-based structure called a symbolic execution tree (SET).
- The nodes of a SET represent symbolic program states, and the edges represent the transitions between these states.
- Each symbolic program state consists of the program variables and their symbolic values, a program location, and a path constraint (PC). The PC is the conjunction of all the logical constraints on the symbolic inputs collected along the path to that location.
- Decision procedures and SMT solvers are used to check the satisfiability of each path constraint.
- The set of path constraints computed by symbolic execution enables various analysis, verification, and testing tasks.
- The paths of a SET characterize all the distinct execution paths of a program.
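The symbolic state just described (location, symbolic valuations, PC) can be sketched as a small C record. The two operations that grow a SET are forking at a branch and updating a variable at an assignment. This is a hypothetical illustration, not any tool's data structure. For brevity the values and the PC are plain strings, and the caller substitutes symbolic values into branch conditions by hand (e.g., `x > 0` becomes `X - Y > 0`).

```c
#include <stdio.h>
#include <string.h>

#define TEXT_LEN 128

/* One node of a symbolic execution tree: a program location, the symbolic
   values of x and y, and the path constraint. Kept as text for brevity;
   a real engine would store expression trees. */
typedef struct {
    int loc;
    char x[TEXT_LEN];
    char y[TEXT_LEN];
    char pc[TEXT_LEN];
} SymState;

/* Child state for one side of a branch: control moves to `loc` and the
   branch condition (already written over the symbolic inputs) is
   conjoined to the parent's PC. */
static SymState take_branch(const SymState *parent, int loc, const char *cond) {
    SymState child = *parent;
    child.loc = loc;
    if (strcmp(parent->pc, "true") == 0)
        snprintf(child.pc, TEXT_LEN, "%s", cond);
    else
        snprintf(child.pc, TEXT_LEN, "%s && %s", parent->pc, cond);
    return child;
}

/* Child state for an assignment to x: the PC is unchanged, x receives a
   new symbolic expression, and control moves to `loc`. */
static SymState assign_x(const SymState *parent, int loc, const char *expr) {
    SymState child = *parent;
    child.loc = loc;
    snprintf(child.x, TEXT_LEN, "%s", expr);
    return child;
}
```

Starting from `{1, "X", "Y", "true"}`, the calls `take_branch(.., 2, "X > Y")`, `assign_x(.., 5, "X - Y")`, and `take_branch(.., 6, "X - Y > 0")` reproduce the Loc 6 node of Example #1. Its PC is `X > Y && X - Y > 0`.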

7
**Constraints, Decision Procedures, and SMT Solvers**

- Constraints: X > Y ∧ Y + X ≤ 10 (X and Y are called free variables).
- A solution of a constraint is a set of assignments, one for each free variable, that makes the constraint true. {X = 3, Y = 2} is a solution, but {X = 6, Y = 5} is not.
- Types of constraints:
  - linear constraints (e.g., X > Y ∧ Y + X ≤ 10);
  - non-linear constraints (e.g., X * Y < 100, X % 3 ∧ Y > 10, and (X >> 3) < Y);
  - constraints that use function symbols (e.g., f(X) > 10 ∧ (∀a. f(a) = a + 10)).
- A decision procedure is a tool that can decide whether a constraint is satisfiable. In general, checking constraint satisfiability is undecidable (e.g., for non-linear integer arithmetic).
- A constraint solver is a tool that finds satisfying assignments for a constraint, if it is satisfiable: “A constraint solver is a program that computes solutions to logic formulas in a given logic.”

Note: This page is taken from Saswat Anand’s slides on Symbolic Execution.
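To make "solution" and "satisfiable" concrete, the sketch below checks the slide's linear constraint and looks for a solution by exhaustive search over a bounded box. The brute-force search is a deliberately naive stand-in, not how decision procedures or SMT solvers work. A failed search only means there is no solution inside the box; it is not a proof of unsatisfiability.

```c
#include <stdbool.h>

/* The linear constraint from the slide: X > Y && Y + X <= 10. */
static bool constraint_holds(int x, int y) {
    return x > y && y + x <= 10;
}

/* Naive stand-in for a constraint solver: search the box
   [lo, hi] x [lo, hi] and store the first solution found in *sx, *sy.
   Returning false means "no solution inside the box", nothing more. */
static bool solve_in_box(int lo, int hi, int *sx, int *sy) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (constraint_holds(x, y)) {
                *sx = x;
                *sy = y;
                return true;
            }
    return false;
}
```

With this enumeration order, `solve_in_box(0, 10, &x, &y)` finds X = 1, Y = 0 first, which is another valid solution besides {X = 3, Y = 2}.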

8
**Example #1**

```c
int foo(int x, int y) {
  if (x > y)        /* 1 */
    x = x - y;      /* 2 */
  else              /* 3 */
    x = y - x;      /* 4 */
  if (x > 0)        /* 5 */
    x++;            /* 6 */
  else              /* 7 */
    x--;            /* 8 */
  return x;         /* 9 */
}
```

Symbolic execution tree (y keeps the symbolic value Y throughout):

| Loc | Reached by | x | PC |
|---|---|---|---|
| 1 | (entry) | X | true |
| 2 | 1: `if (x > y)`, then | X | X > Y |
| 4 | 1: `if (x > y)`, else | X | X ≤ Y |
| 5 | 2: `x = x - y;` | X - Y | X > Y |
| 5 | 4: `x = y - x;` | Y - X | X ≤ Y |
| 6 | 5: `if (x > 0)`, then | X - Y | X > Y ∧ X - Y > 0 |
| 8 | 5: `if (x > 0)`, else | X - Y | X > Y ∧ X - Y ≤ 0: unsatisfiable PC >> infeasible path |
| 6 | 5: `if (x > 0)`, then | Y - X | X ≤ Y ∧ Y - X > 0 |
| 8 | 5: `if (x > 0)`, else | Y - X | X ≤ Y ∧ Y - X ≤ 0 |
| 9 | 6: `x++;` | X - Y + 1 | X > Y ∧ X - Y > 0 |
| 9 | 6: `x++;` | Y - X + 1 | X ≤ Y ∧ Y - X > 0 |
| 9 | 8: `x--;` | Y - X - 1 | X ≤ Y ∧ Y - X ≤ 0 |

Feasible paths and generated test inputs:

- Path 1, 2, 5, 6, 9: x = 7, y = 5
- Path 1, 4, 5, 6, 9: x = 3, y = 9
- Path 1, 4, 5, 8, 9: x = 1, y = 1
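A quick way to sanity-check the generated test inputs is to replay them on a concrete copy of `foo()` that records the numbered locations it visits. The sketch below is our own instrumentation, with the hypothetical name `foo_traced`, not part of any tool. Each concrete result matches the symbolic value of x at Loc 9: X - Y + 1 = 3, Y - X + 1 = 7, and Y - X - 1 = -1.

```c
#include <stddef.h>

/* foo() from Example #1, instrumented to record the numbered locations it
   visits into path[] (at least 5 slots); *len receives the count. */
static int foo_traced(int x, int y, int path[], size_t *len) {
    size_t n = 0;
    path[n++] = 1;
    if (x > y) {
        path[n++] = 2;
        x = x - y;
    } else {
        path[n++] = 4;
        x = y - x;
    }
    path[n++] = 5;
    if (x > 0) {
        path[n++] = 6;
        x++;
    } else {
        path[n++] = 8;
        x--;
    }
    path[n++] = 9;
    *len = n;
    return x;
}
```

Replaying over a small grid of inputs also never produces the path 1, 2, 5, 8, 9, which is consistent with its unsatisfiable PC.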

9
**Example #1 (variant with `x >= 0`)**

```c
int foo(int x, int y) {
  if (x > y)        /* 1 */
    x = x - y;      /* 2 */
  else              /* 3 */
    x = y - x;      /* 4 */
  if (x >= 0)       /* 5 */
    x++;            /* 6 */
  else              /* 7 */
    x--;            /* 8 */
  return x;         /* 9 */
}
```

Symbolic execution tree (y keeps the symbolic value Y throughout):

| Loc | Reached by | x | PC |
|---|---|---|---|
| 1 | (entry) | X | true |
| 2 | 1: `if (x > y)`, then | X | X > Y |
| 4 | 1: `if (x > y)`, else | X | X ≤ Y |
| 5 | 2: `x = x - y;` | X - Y | X > Y |
| 5 | 4: `x = y - x;` | Y - X | X ≤ Y |
| 6 | 5: `if (x >= 0)`, then | X - Y | X > Y ∧ X - Y ≥ 0 |
| 8 | 5: `if (x >= 0)`, else | X - Y | X > Y ∧ X - Y < 0: unsatisfiable PC >> infeasible path |
| 6 | 5: `if (x >= 0)`, then | Y - X | X ≤ Y ∧ Y - X ≥ 0 |
| 8 | 5: `if (x >= 0)`, else | Y - X | X ≤ Y ∧ Y - X < 0: unsatisfiable PC >> infeasible path |
| 9 | 6: `x++;` | X - Y + 1 | X > Y ∧ X - Y ≥ 0 |
| 9 | 6: `x++;` | Y - X + 1 | X ≤ Y ∧ Y - X ≥ 0 |

Both else branches of line 5 have unsatisfiable PCs, so line 8 (`x--;`) is “dead code”.
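The dead-code result can be cross-checked empirically by running the `x >= 0` variant on every input in a small box and counting how often location 8 executes. This is a hedged sketch with hypothetical names. A zero count is consistent with the unsatisfiable PCs above, but only the symbolic argument proves that the code is dead. The box is kept small so that `x - y` and `y - x` cannot overflow, matching the slide's reasoning over mathematical integers.

```c
#include <stdbool.h>

/* The x >= 0 variant of foo(); *reached_8 records whether location 8
   (x--) executed. */
static int foo_ge(int x, int y, bool *reached_8) {
    *reached_8 = false;
    if (x > y)              /* 1 */
        x = x - y;          /* 2 */
    else
        x = y - x;          /* 4 */
    if (x >= 0) {           /* 5 */
        x++;                /* 6 */
    } else {
        *reached_8 = true;
        x--;                /* 8 */
    }
    return x;               /* 9 */
}

/* Run every input pair in [lo, hi] x [lo, hi]; count those reaching 8. */
static long count_reaching_8(int lo, int hi) {
    long count = 0;
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++) {
            bool reached;
            foo_ge(x, y, &reached);
            if (reached)
                count++;
        }
    return count;
}
```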

10
**Applications of Symbolic Execution**

- Test case generation
- Infeasible path detection
- Invariant checking
- Bug finding
- Program equivalence checking
- Regression analysis
- Others

11
**Example #2 - [Cadar & Sen 2013]**

```c
int sym_input(void);    /* returns a fresh symbolic input (N2, N3, ...) */

void testme_inf(int N) {
  int sum = 0;          /* 1 */
  while (N > 0) {       /* 2 */
    sum = sum + N;      /* 3 */
    N = sym_input();    /* 4 */
  }                     /* 5 */
}
```

| State | Loc | Reached by | N | sum | PC |
|---|---|---|---|---|---|
| SS1 | 1 | (entry) | N1 | | true |
| SS2 | 2 | 1: `sum = 0;` | N1 | 0 | true |
| SS3 | 3 | 2: `while (N > 0)`, true | N1 | 0 | N1 > 0 |
| (exit) | 5 | 2: `while (N > 0)`, false | N1 | 0 | N1 ≤ 0 |
| SS4 | 4 | 3: `sum = sum + N;` | N1 | N1 | N1 > 0 |
| SS5 | 2 | 4: `N = sym_input();` | N2 | N1 | N1 > 0 |
| SS6 | 3 | 2: `while (N > 0)`, true | N2 | N1 | N1 > 0 ∧ N2 > 0 |
| (exit) | 5 | 2: `while (N > 0)`, false | N2 | N1 | N1 > 0 ∧ N2 ≤ 0 |

13
**Example #2 - [Cadar & Sen 2013]**

Continuing from SS6 (Loc 3, N: N2, sum: N1, PC: N1 > 0 ∧ N2 > 0):

| State | Loc | Reached by | N | sum | PC |
|---|---|---|---|---|---|
| SS7 | 4 | 3: `sum = sum + N;` | N2 | N1 + N2 | N1 > 0 ∧ N2 > 0 |
| SS8 | 2 | 4: `N = sym_input();` | N3 | N1 + N2 | N1 > 0 ∧ N2 > 0 |
| SS9 | 3 | 2: `while (N > 0)`, true | N3 | N1 + N2 | N1 > 0 ∧ N2 > 0 ∧ N3 > 0 |
| (exit) | 5 | 2: `while (N > 0)`, false | N3 | N1 + N2 | N1 > 0 ∧ N2 > 0 ∧ N3 ≤ 0 |

15
**Example #2 - [Cadar & Sen 2013]**

Continuing from SS9 (Loc 3, N: N3, sum: N1 + N2, PC: N1 > 0 ∧ N2 > 0 ∧ N3 > 0):

| State | Loc | Reached by | N | sum | PC |
|---|---|---|---|---|---|
| SS10 | 4 | 3: `sum = sum + N;` | N3 | N1 + N2 + N3 | N1 > 0 ∧ N2 > 0 ∧ N3 > 0 |
| SS11 | 2 | 4: `N = sym_input();` | N4 | N1 + N2 + N3 | N1 > 0 ∧ N2 > 0 ∧ N3 > 0 |
| SS12 | 3 | 2: `while (N > 0)`, true | N4 | N1 + N2 + N3 | N1 > 0 ∧ N2 > 0 ∧ N3 > 0 ∧ N4 > 0 |
| (exit) | 5 | 2: `while (N > 0)`, false | N4 | N1 + N2 + N3 | N1 > 0 ∧ N2 > 0 ∧ N3 > 0 ∧ N4 ≤ 0 |
| … | | | | | |

The tree keeps growing in the same way: it is infinite.

16
**Challenges of Symbolic Execution**

- Path explosion: the number of feasible paths in a program grows exponentially with the size of the program (e.g., with the number of sequential branches). It can even be infinite for programs with unbounded loops or recursion whose termination conditions are symbolic. This is especially problematic for large programs. Proposed solutions:
  - set an upper bound on the number of iterations;
  - summarize loop effects;
  - use abstraction criteria (e.g., subsumption) to prune redundant paths and reduce the state space;
  - use path-finding heuristics to achieve user-defined coverage criteria;
  - divide a program into independent parts and run symbolic execution on the parts in parallel.
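The exponential growth is easy to see on a hypothetical program made of n independent `if` statements in sequence. In the worst case, where every combination of outcomes is feasible, each `if` doubles the number of paths, giving 2^n. The sketch below enumerates both outcomes of each branch, as a naive symbolic executor would. Its running time therefore grows like the path count itself: `count_paths(20)` already explores 1,048,576 paths.

```c
/* Count the paths through a hypothetical program of n independent,
   sequential if-statements by exploring both outcomes of every branch
   (worst case: every combination of outcomes is feasible). */
static unsigned long count_paths(int n) {
    if (n == 0)
        return 1;                   /* end of program: one complete path */
    return count_paths(n - 1)       /* then-branch of the current if */
         + count_paths(n - 1);      /* else-branch of the current if */
}
```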

17
**Example #2 - [Cadar & Sen 2013]**

Solution #1: set max-depth = 2. Exploration stops after the second evaluation of the loop condition, so the tree is finite:

- SS1 (Loc 1, N: N1, PC: true) → 1: `sum = 0;` → SS2 (Loc 2, N: N1, sum: 0, PC: true)
- SS2 → SS3 (Loc 3, PC: N1 > 0), or exit at Loc 5 (PC: N1 ≤ 0)
- SS3 → 3: `sum = sum + N;` → SS4 (Loc 4, N: N1, sum: N1, PC: N1 > 0) → 4: `N = sym_input();` → SS5 (Loc 2, N: N2, sum: N1, PC: N1 > 0)
- SS5 → SS6 (Loc 3, PC: N1 > 0 ∧ N2 > 0), where exploration stops, or exit at Loc 5 (PC: N1 > 0 ∧ N2 ≤ 0)
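A depth bound on `testme_inf()` can be sketched as follows, assuming that "max-depth" counts evaluations of the loop condition; that reading of the bound is ours. Each evaluation on a fresh symbolic value Ni forks into an exit leaf (`Ni <= 0`) and a continuation (`Ni > 0`). When the bound is reached, the still-running path is reported as cut off. With `max_depth = 2`, the sketch yields the three leaves of the bounded tree: `N1 <= 0`, `N1 > 0 && N2 <= 0`, and the cut-off `N1 > 0 && N2 > 0` (SS6). The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define PC_LEN 256

/* Bounded exploration of testme_inf(): returns the number of leaves and
   writes the PC of each leaf (as text) into out[0..max_out-1]. */
static int explore_bounded(int max_depth, char out[][PC_LEN], int max_out) {
    char prefix[PC_LEN] = "";   /* conjunction of the "Ni > 0" taken so far */
    int count = 0;
    for (int i = 1; i <= max_depth; i++) {
        const char *sep = (i == 1) ? "" : " && ";
        if (count < max_out)    /* leaf: the loop exits at the i-th check */
            snprintf(out[count++], PC_LEN, "%s%sN%d <= 0", prefix, sep, i);
        char next[PC_LEN];      /* stay in the loop: conjoin "Ni > 0" */
        snprintf(next, PC_LEN, "%s%sN%d > 0", prefix, sep, i);
        strcpy(prefix, next);
    }
    if (max_depth > 0 && count < max_out)   /* leaf cut off by the bound */
        snprintf(out[count++], PC_LEN, "%s (cut off)", prefix);
    return count;
}
```

Raising the bound to k yields k + 1 leaves, so the bound trades completeness for termination.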

18
**Example #2 - [Cadar & Sen 2013], Solution #2: Subsumption**

Continuing from SS4 (Loc 4, N: N1, sum: N1, PC: N1 > 0):

- 4: `N = sym_input();` → SS5 (Loc 2, N: N2, sum: N1, PC: N1 > 0)
- SS5 → 2: `while (N > 0)`, true → SS6 (Loc 3, N: N2, sum: N1, PC: N1 > 0 ∧ N2 > 0); false → SS9 (Loc 5, N: N2, sum: N1, PC: N1 > 0 ∧ N2 ≤ 0)
- SS6 → 3: `sum = sum + N;` → SS7 (Loc 4, N: N2, sum: N1 + N2, PC: N1 > 0 ∧ N2 > 0) → 4: `N = sym_input();` → SS8 (Loc 2, N: N3, sum: N1 + N2, PC: N1 > 0 ∧ N2 > 0)

Concretizations of the two states at Loc 2:

- SS5: (N, sum) = {([-∞, +∞], 1), ([-∞, +∞], 2), ([-∞, +∞], 3), …}
- SS8: (N, sum) = {([-∞, +∞], 2), ([-∞, +∞], 3), ([-∞, +∞], 4), …}

The concretization of SS8 is a subset (⊆) of the concretization of SS5, so SS8 is subsumed by SS5. Its successors (Loc 3 with PC N1 > 0 ∧ N2 > 0 ∧ N3 > 0, and Loc 5 with PC N1 > 0 ∧ N2 > 0 ∧ N3 ≤ 0) need not be explored.

“Do not explore a state that is subsumed by a previous one. For example, in a symbolic state a variable x could have any value that satisfies a constraint x > 0. If another execution path leads to an identical symbolic state except that the constraint for x is x > 5, the first symbolic state subsumes the second one (i.e., the first state represents all concrete states of the second symbolic state). To determine if a symbolic state has been visited before, a subsumption check by a constraint solver is needed. This can be computationally expensive especially if the constraints are complex.”

19
**Example #2 - [Cadar & Sen 2013], Solution #2: Subsumption**

With subsumption, the complete symbolic execution tree of `testme_inf()` is finite:

- SS1 (Loc 1, N: N1, PC: true) → 1: `sum = 0;` → SS2 (Loc 2)
- SS2 → SS3 (Loc 3, N: N1, sum: 0), or SS10 (Loc 5, N: N1, sum: 0, PC: N1 ≤ 0)
- SS3 → 3: `sum = sum + N;` → SS4 (Loc 4, N: N1, sum: N1, PC: N1 > 0) → 4: `N = sym_input();` → SS5 (Loc 2)
- SS5 → SS6 (Loc 3, N: N2, sum: N1, PC: N1 > 0 ∧ N2 > 0), or SS9 (Loc 5, PC: N1 > 0 ∧ N2 ≤ 0)
- SS6 → SS7 (Loc 4, N: N2, sum: N1 + N2) → SS8 (Loc 2, N: N3, sum: N1 + N2), which is subsumed by SS5 because the concretization of SS8 ⊆ the concretization of SS5
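A real tool would discharge the subsumption check with a constraint solver. As a hedged stand-in, the sketch below compares the concretizations of SS5 and SS8 by bounded enumeration. N is unconstrained in both states, so only the admissible values of `sum` matter. The symbolic inputs N1 and N2 are limited to (0, bound], and all function names are hypothetical. The check succeeds in one direction only: SS5 admits sum = 1, which SS8 (sum = N1 + N2 with both positive) does not.

```c
#include <stdbool.h>

/* SS5 at Loc 2: sum = N1 with PC N1 > 0 (N1 limited to (0, bound]). */
static bool ss5_admits(int sum, int bound) {
    for (int n1 = 1; n1 <= bound; n1++)
        if (n1 == sum)
            return true;
    return false;
}

/* SS8 at Loc 2: sum = N1 + N2 with PC N1 > 0 && N2 > 0 (both in (0, bound]). */
static bool ss8_admits(int sum, int bound) {
    for (int n1 = 1; n1 <= bound; n1++) {
        int n2 = sum - n1;
        if (n2 >= 1 && n2 <= bound)
            return true;
    }
    return false;
}

/* Bounded subsumption check: does `general` admit every sum in [lo, hi]
   that `specific` admits? */
static bool subsumes(bool (*general)(int, int), bool (*specific)(int, int),
                     int lo, int hi, int bound) {
    for (int s = lo; s <= hi; s++)
        if (specific(s, bound) && !general(s, bound))
            return false;
    return true;
}
```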

20
**Challenges of Symbolic Execution**

- Inability to solve very complex and non-linear constraints. Proposed solutions:
  - use concretization (e.g., concolic symbolic execution);
  - perform constraint simplification.
- Inability to handle external library calls. Proposed solution: provide models that simulate or abstract the behavior of such external modules.

21
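**Example #3 - [Cadar & Sen 2013]**

Two variants of the same function illustrate two obstacles: complex constraints and external system/library calls.

```c
#include <stdlib.h>     /* abort() */

void testme(int x, int y) {
  int z = (y*y)%50;     /* 1 */
  if (z == x) {         /* 2 */
    if (x > y+10) {     /* 3 */
      abort(); //ERROR  /* 4 */
    }                   /* 5 */
  }                     /* 6 */
}
```

```c
#include <stdlib.h>     /* abort() */

int F(int y);           /* external function; its body is not available */

void testme(int x, int y) {
  int z = F(y);         /* 1 */
  if (z == x) {         /* 2 */
    if (x > y+10) {     /* 3 */
      abort(); //ERROR  /* 4 */
    }                   /* 5 */
  }                     /* 6 */
}
```

Symbolic execution of the two variants:

- Both start at Loc 1 with x: X, y: Y, PC: true.
- 1: `int z = (y*y)%50;` → Loc 2 with x: X, y: Y, z: (Y*Y)%50, PC: true
- 1: `int z = F(y);` → Loc 2 with x: X, y: Y, z: F(Y), PC: true

SE cannot handle the symbolic value of z, whether it is the non-linear (Y*Y)%50 or the unknown F(Y), so it gets stuck at the branch on line 2. >> Stuck!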

22
**Concolic Symbolic Execution**

Novelty: simultaneous concrete and symbolic executions.

- DART: Directed Automated Random Testing [Godefroid et al. 2005]
- Execution-Generated Testing (EGT) [Cadar et al. 2005]

“Replace symbolic expression by concrete value when symbolic expression becomes unmanageable (e.g. non-linear).”

23
**Overview of DART: Example #3 - [Cadar & Sen 2013]**

- Random testing alone is ineffective: the probability of reaching `abort()` is extremely low!
- Solution? Combine random testing and symbolic execution, for a twofold benefit:
  - improve the test coverage of random testing;
  - alleviate some of the imprecision in SE.

```c
#include <stdlib.h>     /* abort(); random() is POSIX */

void testme(int x, int y) {
  int z = 2 * y;        /* 1 */
  if (z == x) {         /* 2 */
    if (x > y + 10)     /* 3 */
      abort(); //ERROR  /* 4 */
  }                     /* 5 */
}

/* simple driver exercising testme() */
int main(void) {
  int inp1 = random();
  int inp2 = random();
  testme(inp1, inp2);
  return 0;
}
```

“NOTE: whenever symbolic execution is stuck, static analysis becomes imprecise!”
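To put a number on "extremely low", the harness below counts the inputs that reach the error location over a small box, [-100, 100] x [-100, 100]. The harness and the box are our own choices; `abort()` is replaced by a return flag so the count can run. Only pairs with x = 2y and y > 10 qualify: 40 of the 40,401 pairs in that box, roughly 0.1%. Over the full `int` range the fraction is far smaller.

```c
/* testme() with z = 2 * y; returns 1 where the original calls abort(). */
static int testme_hits_error(int x, int y) {
    int z = 2 * y;              /* 1 */
    if (z == x) {               /* 2 */
        if (x > y + 10)         /* 3 */
            return 1;           /* 4: abort() in the original */
    }
    return 0;
}

/* Count the (x, y) pairs in [lo, hi] x [lo, hi] that reach the error. */
static long count_error_inputs(int lo, int hi) {
    long hits = 0;
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            hits += testme_hits_error(x, y);
    return hits;
}
```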

24
**Example #3 - [Cadar & Sen 2013]**

For the same `testme()` (with `z = 2 * y`), x and y keep the symbolic values X and Y throughout:

| Loc | Reached by | z | PC |
|---|---|---|---|
| 1 | (entry) | | true |
| 2 | 1: `int z = 2*y;` | 2*Y | true |
| 5 | 2: `if (z == x)`, false | 2*Y | 2*Y ≠ X |
| 3 | 2: `if (z == x)`, true | 2*Y | 2*Y = X |
| 5 | 3: `if (x > y+10)`, false | 2*Y | 2*Y = X ∧ X ≤ Y + 10 |
| 4 | 3: `if (x > y+10)`, true | 2*Y | 2*Y = X ∧ X > Y + 10 |

DART runs:

- Test inputs x = 22, y = 7: path 1, 2, 5. Solve 2*Y = X; solution: x = 2, y = 1.
- Test inputs x = 2, y = 1: path 1, 2, 3, 5. Solve 2*Y = X ∧ X > Y + 10; solution: x = 30, y = 15.
- Test inputs x = 30, y = 15: path 1, 2, 3, 4: Abort >> ERROR.
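The run-solve-rerun loop above can be mimicked in a small, self-contained C sketch, under simplifying assumptions that are ours:

- the branch conditions are hard-coded as predicates over the inputs, whereas a real concolic tool collects them symbolically during execution;
- the "solver" is a brute-force search over [-50, 50] x [-50, 50];
- only the deepest branch is flipped, which suffices for this two-branch function, while DART itself explores branch prefixes systematically.

Because the search returns the first solution in its enumeration order, it finds different inputs than the slide's (2, 1) and (30, 15). Starting from (22, 7), it produces (-50, -25) and then (22, 11), which reaches the error on the third run.

```c
#include <stdbool.h>

/* Branch conditions of testme() with z = 2 * y, written over the inputs. */
static bool cond_line2(int x, int y) { return 2 * y == x; }
static bool cond_line3(int x, int y) { return x > y + 10; }

typedef bool (*Cond)(int, int);
static const Cond conds[2] = { cond_line2, cond_line3 };

/* Concrete run: records each evaluated branch outcome in taken[], returns
   how many branches were evaluated, and sets *error if line 4 is reached. */
static int concrete_run(int x, int y, bool taken[2], bool *error) {
    *error = false;
    taken[0] = cond_line2(x, y);
    if (!taken[0]) return 1;
    taken[1] = cond_line3(x, y);
    *error = taken[1];
    return 2;
}

/* Stand-in solver: brute-force search for inputs that repeat the outcomes
   taken[0..k-1] but take the opposite outcome at branch k. */
static bool solve_flipped(const bool taken[2], int k, int *x, int *y) {
    for (int a = -50; a <= 50; a++)
        for (int b = -50; b <= 50; b++) {
            bool ok = conds[k](a, b) != taken[k];
            for (int i = 0; i < k && ok; i++)
                ok = conds[i](a, b) == taken[i];
            if (ok) { *x = a; *y = b; return true; }
        }
    return false;
}

/* Simplified DART-style loop: run, flip the deepest branch, solve, rerun.
   Returns the number of runs until the error (input stored in *ex, *ey),
   or -1 if no error was found within max_runs. */
static int concolic_search(int x, int y, int max_runs, int *ex, int *ey) {
    for (int run = 1; run <= max_runs; run++) {
        bool taken[2], error;
        int depth = concrete_run(x, y, taken, &error);
        if (error) { *ex = x; *ey = y; return run; }
        if (!solve_flipped(taken, depth - 1, &x, &y)) return -1;
    }
    return -1;
}
```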

25
**Example #3 - [Cadar & Sen 2013]**

```c
#include <stdlib.h>     /* abort() */

void testme(int x, int y) {
  int z = (y*y)%50;     /* 1 (or: int z = F(y);) */
  if (z == x) {         /* 2 */
    if (x > y + 10)     /* 3 */
      abort(); //ERROR  /* 4 */
  }                     /* 5 */
}
```

Assume we can reason about linear constraints only, so SE cannot handle the symbolic value of z. Instead of getting stuck, use the concrete value z = 49 (from y = 7) and proceed:

- Test inputs x = 22, y = 7: at Loc 2, x: X, y: 7, z: 49 (PC: true). Path 1, 2, 5: take the else branch with constraint 49 ≠ X.
- Solve 49 = X to take the then branch; solution: x = 49, y = 7.
- Test inputs x = 49, y = 7: Loc 3 (PC: 49 = X), then 3: `if (x > y + 10)`, true → Loc 4 (PC: 49 = X ∧ X > 17). Path 1, 2, 3, 4: Abort >> ERROR.

DART finds the error!
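The concretization step can be sketched in a few lines. Like the slide, it assumes a solver limited to linear constraints. After a run that took the else branch at line 2, y is kept at its concrete value, z is computed concretely, and the remaining constraint z == X is linear, with the single solution X = z. The function names are hypothetical, and `abort()` is replaced by a return value so the error can be observed.

```c
#include <stdbool.h>

/* testme() with the non-linear z = (y*y) % 50; returns true where the
   original calls abort(). */
static bool testme_sq_reaches_error(int x, int y) {
    int z = (y * y) % 50;        /* 1 */
    if (z == x) {                /* 2 */
        if (x > y + 10)          /* 3 */
            return true;         /* 4: abort() in the original */
    }
    return false;
}

/* One concretization step: keep y concrete, evaluate z concretely, and
   solve the now linear constraint z == X (solution: X = z). */
static void next_input(int y_concrete, int *nx, int *ny) {
    int z = (y_concrete * y_concrete) % 50;   /* concrete, not symbolic */
    *nx = z;
    *ny = y_concrete;
}
```

Starting from the slide's first run (x = 22, y = 7), `next_input(7, ...)` yields x = 49, y = 7, and that input reaches the error.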

26
**KLEE Demo**

KLEE LLVM Execution Engine [Cadar et al 2008]: https://klee.github.io/

Compiling C code into bytecode:

```shell
llvm-gcc -I../../include/ --emit-llvm -c -g demo.c
```

Running the bytecode using KLEE:

```shell
klee -write-pcs -allow-external-sym-calls -emit-all-errors demo.o
```

The test cases generated by KLEE are written to binary files with the extension “.ktest”. We can read these test cases using the “ktest-tool” utility, where `<test>` stands for the name of a generated test case:

```shell
ktest-tool --write-ints klee-last/<test>.ktest
```

KLEE has a convenient library called “libkleeRuntest” for replaying a test case. It replaces the call to “klee_make_symbolic” with a call to a function that assigns to the input the value stored in the “.ktest” file. To use this library, link the “.c” file with “libkleeRuntest” and set the “KTEST_FILE” environment variable to the file name of the desired test case:

```shell
gcc -I../../include/ -L $LD_LIBRARY_PATH demo.c -lkleeRuntest
KTEST_FILE=klee-last/<test>.ktest ./a.out
```

27
**References**

- [1] King, James C. "Symbolic execution and program testing." Communications of the ACM 19, 7 (1976).
- [2] Clarke, Lori A. "A system to generate test data and symbolically execute programs." IEEE Transactions on Software Engineering 3 (1976).
- [3] Khurshid, Sarfraz, Corina S. Păsăreanu, and Willem Visser. "Generalized symbolic execution for model checking and testing." Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg.
- [4] Godefroid, Patrice, Nils Klarlund, and Koushik Sen. "DART: directed automated random testing." ACM SIGPLAN Notices 40, 6 (2005).
- [5] Cadar, Cristian, and Dawson Engler. "Execution generated test cases: How to make systems code crash itself." Model Checking Software. Springer Berlin Heidelberg.
- [6] Cadar, Cristian, Daniel Dunbar, and Dawson R. Engler. "KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs." OSDI.
- [7] Cadar, Cristian, and Koushik Sen. "Symbolic execution for software testing: three decades later." Communications of the ACM 56, 2 (2013).
- [8] Yang, Guowei, et al. "Directed incremental symbolic execution." ACM Transactions on Software Engineering and Methodology (TOSEM) 24, 1 (2014): 3.
