Software Model Checking

Xiangyu Zhang

Symbolic Software Model Checking
Symbolic analysis explicitly explores individual paths, encoding and solving path conditions.
Model checking directly encodes both the program and the property to check as constraints.
Pipeline: Program + Claim -> Analysis Engine -> CNF -> SMT Solver
SAT: counterexample exists
UNSAT: no counterexample found

A (very) simple example (1)
Presentation Title 2018/9/5
Program:
    int x;
    int y=8,z=0,w=0;
    if (x) z = y - 1;
    else w = y + 1;
    assert (z == 7 || w == 9)
Constraints:
    y = 8, z = x ? y - 1 : 0, w = x ? 0 : y + 1, z != 7, w != 9
UNSAT: no counterexample, so the assertion always holds!
© 2006 Carnegie Mellon University

A (very) simple example (2)
Program:
    int x;
    int y=8,z=0,w=0;
    if (x) z = y - 1;
    else w = y + 1;
    assert (z == 5 || w == 9)
Constraints:
    y = 8, z = x ? y - 1 : 0, w = x ? 0 : y + 1, z != 5, w != 9
SAT: counterexample found!
    y = 8, x = 1, w = 0, z = 7
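The two examples can be replayed without a solver. The sketch below is only an illustration, not how a model checker works internally: it enumerates the single free input x instead of calling a SAT/SMT solver, and the function name find_counterexample is hypothetical.

```python
def find_counterexample(expected_z, expected_w=9):
    """Search for inputs satisfying the program constraints and the negated claim."""
    # x is the only unconstrained input. The program only tests x for
    # truthiness, so the values 0 and 1 cover both paths.
    for x in (0, 1):
        y = 8
        z = y - 1 if x else 0      # z = x ? y - 1 : 0
        w = 0 if x else y + 1      # w = x ? 0 : y + 1
        # Negated assertion: z != expected_z and w != expected_w
        if z != expected_z and w != expected_w:
            return {"x": x, "y": y, "z": z, "w": w}   # SAT
    return None                                        # UNSAT


print(find_counterexample(7))  # None
print(find_counterexample(5))  # {'x': 1, 'y': 8, 'z': 7, 'w': 0}
```

The first call corresponds to example (1), where the constraints are unsatisfiable; the second corresponds to example (2) and returns the counterexample shown on the slide.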

Procedure
Unroll loops
Translate to SSA form
SSA to SMT constraints
Pipeline: Program + Claim + Bound (n) -> Analysis Engine -> CNF -> SMT Solver
SAT: counterexample exists
UNSAT: no counterexample of bound n

Loop Unwinding
All loops are unwound
    can use different unwinding bounds for different loops
    to check whether unwinding is sufficient, special “unwinding assertion” claims are added
If a program satisfies all of its claims and all unwinding assertions then it is correct!
Same for backward goto jumps and recursive functions
Consider the example: while (i) i=i-1;

Loop Unwinding
while() loops are unwound iteratively
Break / continue replaced by goto
    void f(...) {
      ...
      while(cond) {
        Body;
      }
      Remainder;
    }

Loop Unwinding
while() loops are unwound iteratively
Break / continue replaced by goto
    void f(...) {
      ...
      if(cond) {
        Body;
        while(cond) {
          Body;
        }
      }
      Remainder;
    }

Unwinding assertion
while() loops are unwound iteratively
Break / continue replaced by goto
Assertion inserted after last iteration: violated if program runs longer than bound permits
    void f(...) {
      ...
      if(cond) {
        Body;
        assert(!cond);   /* unwinding assertion */
      }
      Remainder;
    }

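Example: Sufficient Loop Unwinding
Original:
    void f(...) {
      j = 1;
      while (j <= 2)
        j = j + 1;
      Remainder;
    }
Unwound (unwind = 3):
    void f(...) {
      j = 1;
      if(j <= 2) {
        j = j + 1;
        if(j <= 2) {
          j = j + 1;
          if(j <= 2) {
            j = j + 1;
            assert(!(j <= 2));
          }
        }
      }
      Remainder;
    }
The loop exits after two iterations, so the third copy of the body and the unwinding assertion are never reached.

Example: Insufficient Loop Unwinding
Original:
    void f(...) {
      j = 1;
      while (j <= 10)
        j = j + 1;
      Remainder;
    }
Unwound (unwind = 3):
    void f(...) {
      j = 1;
      if(j <= 10) {
        j = j + 1;
        if(j <= 10) {
          j = j + 1;
          if(j <= 10) {
            j = j + 1;
            assert(!(j <= 10));
          }
        }
      }
      Remainder;
    }
After three copies of the body j is 4, so the unwinding assertion !(j <= 10) is violated: the bound is too small.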
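A minimal sketch of the two unwinding examples, assuming the loop is `j = 1; while (j <= limit) j = j + 1;`. The function run_unwound and its parameters are hypothetical names for illustration; a real bounded model checker encodes the unwound program as constraints instead of executing it.

```python
def run_unwound(limit, unwind):
    """Execute the loop unwound `unwind` times.

    Returns (final_j, unwinding_assertion_holds).
    """
    j = 1
    for _ in range(unwind):
        if not (j <= limit):
            # The loop condition failed inside the unwound copies:
            # the assertion in the innermost copy is never reached.
            return j, True
        j = j + 1
    # Innermost copy reached: unwinding assertion assert(!(j <= limit))
    return j, not (j <= limit)


print(run_unwound(limit=2, unwind=3))   # (3, True): bound is sufficient
print(run_unwound(limit=10, unwind=3))  # (4, False): bound is insufficient
```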

Transforming Loop-Free Programs Into Equations (1)
Easy to transform when every variable is only assigned once!
Program:
    x = a;
    y = x + 1;
    z = y - 1;
Constraints:
    x = a && y = x + 1 && z = y - 1
The meaning is that an assignment statement unifies its left-hand side term with its right-hand side term. The “=” symbol in the program is the assignment operation, whereas the “=” in the constraints is the equivalence operation. In the constraints, x = a, y = x + 1, and z = y - 1 have Boolean values: each is either true or false. x, a, x + 1, z, and y - 1 are called terms, or bit vectors, so the logic is called bit-vector logic; more details about bit-vector logic follow later.
How can we connect propositional logic to the constraints we see here? One naive approach is to encode (x = a) as p, (y = x + 1) as q, and (z = y - 1) as r, so that the conditions become p, q, r. However, we are then unable to express the connection between p and q, or between q and r. Later on, we will talk about how to translate such bit-vector formulas into propositional formulas.

Transforming Loop-Free Programs Into Equations (2)
When a variable is assigned multiple times, use a fresh variable (a new version) for each assignment
Program -> SSA Program
ρ(e) means the expression e after renaming
The motivation is that a variable may have different values during program execution, whereas in a logic formula a variable can have only one value, decided by the valuation function. Consider the following example:
    x=10; x=x+x; assert (x!=20)
The assertion is not valid. However, suppose we translate the program and the negated assertion into the following equation:
    x=10 && x=x+x && x=20
This formula is unsatisfiable (x=10 already contradicts x=x+x), so no counterexample would be reported even though the assertion fails. The root cause is that in bit-vector logic a variable has one unique value assignment. The right way of encoding is:
    x1=10 && x2=x1+x1 && x2=20
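The renaming ρ can be sketched for straight-line code as follows. This is a simplified illustration under stated assumptions: assignments are given as (lhs, rhs) string pairs, inputs that are never assigned get version 0, and the helper name to_ssa is hypothetical. It does not handle branches or identifiers that already end in digits.

```python
import re


def to_ssa(assignments):
    """Rename straight-line assignments (lhs, rhs_expr) into SSA form."""
    version = {}  # current version number of each variable

    def rho(expr):
        # Rename every identifier to its current version.
        return re.sub(
            r"[A-Za-z_]\w*",
            lambda m: f"{m.group(0)}{version.get(m.group(0), 0)}",
            expr,
        )

    result = []
    for lhs, rhs in assignments:
        renamed_rhs = rho(rhs)                    # rename RHS with the old versions
        version[lhs] = version.get(lhs, 0) + 1    # fresh version for the LHS
        result.append(f"{lhs}{version[lhs]} = {renamed_rhs}")
    return result


print(to_ssa([("x", "10"), ("x", "x + x")]))
# ['x1 = 10', 'x2 = x1 + x1']
```

Conjoining the renamed assignments with the negated assertion on the latest version (x2 = 20) gives the encoding shown on the slide.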

What about conditionals?
Program:
    if (v) x = y;
    else x = z;
    w = x;
SSA Program:
    if (v0) x0 = y0;
    else x1 = z0;
    w1 = x??;
What should ‘x’ be?

What about conditionals?
Program:
    if (v) x = y;
    else x = z;
    w = x;
SSA Program:
    if (v0) x0 = y0;
    else x1 = z0;
    x2 = v0 ? x0 : x1;
    w1 = x2
For each join point, add new variables with selectors
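As a sanity check of the selector encoding, the sketch below compares the original conditional with its SSA form on a small range of inputs. The function names are hypothetical, and exhaustive testing over a small range stands in for a proof.

```python
from itertools import product


def original(v, y, z):
    # if (v) x = y; else x = z; w = x;
    x = y if v else z
    return x


def ssa_encoded(v0, y0, z0):
    x0 = y0                  # definition from the then-branch
    x1 = z0                  # definition from the else-branch
    x2 = x0 if v0 else x1    # join point: x2 = v0 ? x0 : x1
    w1 = x2
    return w1


agree = all(
    original(v, y, z) == ssa_encoded(v, y, z)
    for v, y, z in product([0, 1], range(-2, 3), range(-2, 3))
)
print(agree)  # True
```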

Encoding
Declare symbolic variables for each (SSA) scalar variable
Assignments to equivalence
Phi functions to ITE expressions
Array accesses to select/store operations
Scalar pointer dereferences to identify operations
    if (v) p = &x; else p = &y; *p = 10; q = p; z = *q
Heap dereferences to select/store operations
    p = (int*) malloc(100); i = 10; q = p + i; *q = 10

CBMC: C Bounded Model Checker
Developed at CMU by Daniel Kroening et al.
Available at:
Supported platforms: Windows (requires Visual Studio’s CL), Linux
Provides command-line and Eclipse-based interfaces
Known to scale to programs with over 30K LOC
Was used to find previously unknown bugs in MS Windows device drivers

Explicit State Model Checking
The program is actually executed
    jpf <your class> <parameters>
Very similar to “java <your class> <parameters>”
Executes in a way that all possible scenarios are explored
    Thread interleavings
    Nondeterministic values (random values)
Concrete input is provided
A state is a concrete state, consisting of concrete values in heap/stack memory
Tools: JPF, SPIN, SLAM

An Example

An Example (cont.)
One execution corresponds to one path.

JPF explores multiple possible executions GIVEN THE SAME CONCRETE INPUT

Two Essential Capabilities
Backtracking
JPF can restore previous execution states to see whether there are unexplored choices left. While this can theoretically be achieved by re-executing the program from the beginning, backtracking is a much more efficient mechanism if state storage is optimized.
State matching
JPF checks whether each new state is equal to one it has already seen, in which case there is no use continuing along the current execution path, and JPF can backtrack to the nearest unexplored nondeterministic choice.
States are stored as heap and thread-stack snapshots.
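The two capabilities can be sketched on a toy system. This is not JPF's implementation; it is a hypothetical miniature in which two threads each run the non-atomic increment `tmp = counter; counter = tmp + 1`, a state is the tuple (program counters, thread-local tmp values, shared counter), the stack provides backtracking, and the `seen` set provides state matching.

```python
def successors(state):
    """Yield every state reachable by letting one thread take one step."""
    pcs, tmps, counter = state
    for t in (0, 1):                                   # scheduler choice
        if pcs[t] == 0:                                # step 0: tmp = counter
            new_tmps = tmps[:t] + (counter,) + tmps[t + 1:]
            new_pcs = pcs[:t] + (1,) + pcs[t + 1:]
            yield (new_pcs, new_tmps, counter)
        elif pcs[t] == 1:                              # step 1: counter = tmp + 1
            new_pcs = pcs[:t] + (2,) + pcs[t + 1:]
            yield (new_pcs, tmps, tmps[t] + 1)


def explore(initial):
    seen = set()           # state matching: states already explored
    stack = [initial]      # backtracking: states with pending choices
    finals = set()
    while stack:
        state = stack.pop()            # backtrack to an unexplored state
        if state in seen:
            continue                   # matched an old state: prune this path
        seen.add(state)
        nexts = list(successors(state))
        if not nexts:
            finals.add(state[2])       # both threads finished
        stack.extend(nexts)
    return finals, len(seen)


finals, visited = explore(((0, 0), (0, 0), 0))
print(sorted(finals))  # [1, 2]
```

The final counter value 1 is the lost-update interleaving that a single concrete run might never exhibit.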

State Abstraction
Eliminate details irrelevant to the property
Obtain simple finite models sufficient to verify the property
Disadvantage: loss of precision (false positives/negatives)

Data Abstraction
Abstraction function h : S -> S’ (from concrete states S to abstract states S’)

Data Abstraction Example
Abstraction proceeds component-wise, where variables are components
x:int
    Even: …, -2, 0, 2, 4, …
    Odd: …, -3, -1, 1, 3, …
y:int
    Neg: …, -3, -2, -1
    Zero: 0
    Pos: 1, 2, 3, …

How do we Abstract Behaviors?
Abstract domain A
Abstract concrete values to those in A
Then compute transitions in the abstract domain

Data Type Abstraction
Code:
    int x = 0;
    if (x == 0)
      x = x + 1;
Abstract data domain (Signs):
    (n<0) : NEG
    (n==0): ZERO
    (n>0) : POS
Abstracted code:
    Signs x = ZERO;
    if (Signs.eq(x,ZERO))
      x = Signs.add(x,POS);
We transform the code so that it operates on the abstract domain. Here the concrete type int was replaced by the abstract type Signs; the concrete constants 0 and 1 were replaced with the abstract constants ZERO and POS; and primitive operations on ints were replaced with calls to methods that implement the abstract operations on abstract values. For example, the equality operator was replaced with a call to the method Signs.eq, and + was replaced by Signs.add.
So, how do we apply this abstraction technique to the DEOS example? We have to decide which variables to abstract and which abstractions to use, and then we have to transform the system to encode the abstractions.
The translation is homomorphic: an operator on the original domain is mapped to an operator on the abstract domain.
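A minimal sketch of the Signs domain. One assumption differs from the slide: the slide's Signs.add and Signs.eq return a single value, whereas here they return the set of possible results so that imprecise cases such as POS + NEG can be represented explicitly.

```python
NEG, ZERO, POS = "NEG", "ZERO", "POS"


def abstract(n):
    """Abstraction function from int to Signs."""
    return NEG if n < 0 else ZERO if n == 0 else POS


class Signs:
    @staticmethod
    def add(a, b):
        """Set of abstract values that a + b may take."""
        if a == ZERO:
            return {b}
        if b == ZERO:
            return {a}
        if a == b:
            return {a}
        return {NEG, ZERO, POS}    # POS + NEG: the sign is unknown

    @staticmethod
    def eq(a, b):
        """Set of truth values that a == b may take."""
        if a == ZERO and b == ZERO:
            return {True}
        if a != b:
            return {False}
        return {True, False}       # e.g. POS == POS depends on the concrete values


# Abstracted program from the slide: x = ZERO; if (x == ZERO) x = x + POS
x = ZERO
if True in Signs.eq(x, ZERO):
    results = Signs.add(x, POS)
print(results)  # {'POS'}

# Soundness check on a small concrete range
sound = all(
    abstract(n + m) in Signs.add(abstract(n), abstract(m))
    and (n == m) in Signs.eq(abstract(n), abstract(m))
    for n in range(-3, 4)
    for m in range(-3, 4)
)
print(sound)  # True
```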

Existential/Universal Abstractions
Existential
Make a transition from an abstract state if at least one corresponding concrete state has the transition.
Abstract model M’ simulates concrete model M
Universal
Make a transition from an abstract state if all the corresponding concrete states have the transition.
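Both definitions can be computed directly for a finite system. The sketch below assumes the concrete system is given as a set of states, a set of (source, target) pairs, and an abstraction function h; the name abstract_transitions and the parity example are hypothetical.

```python
def abstract_transitions(states, trans, h):
    """Return (existential, universal) abstract transition relations."""
    members = {}
    for s in states:
        members.setdefault(h(s), set()).add(s)

    existential, universal = set(), set()
    for a, a_states in members.items():
        for b, b_states in members.items():
            # Concrete states in a that have some successor in b
            movers = {s for s in a_states if any((s, t) in trans for t in b_states)}
            if movers:
                existential.add((a, b))    # at least one concrete state moves
            if movers == a_states:
                universal.add((a, b))      # every concrete state moves
    return existential, universal


states = {0, 1, 2, 3}
trans = {(0, 1), (1, 2), (2, 3)}           # state 3 has no successor


def parity(s):
    return "even" if s % 2 == 0 else "odd"


existential, universal = abstract_transitions(states, trans, parity)
print(sorted(existential))  # [('even', 'odd'), ('odd', 'even')]
print(sorted(universal))    # [('even', 'odd')]
```

The edge from odd to even exists only existentially because state 3 has no successor, which is exactly where over-approximation can introduce behavior that the concrete system lacks.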

Existential Abstraction (Over-approximation)
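[Figure: abstraction function h maps concrete states to abstract states S’; initial states are marked I]
Use the x=x+1 example (abstracted to the Signs domain) to explain the idea. Red is a faulty state. With x=x+1; y=10/x, x=0 becomes a faulty state.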

Universal Abstraction (Under-Approximation)
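[Figure: abstraction function h maps concrete states to abstract states S’; initial states are marked I]
Abstract state 1 does not have a self-loop, and the last state does not have a successor. Abstract state 1 does have a transition to the next state. There is no self-loop on the red state.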

Guarantees from Abstraction
Assume M’ is an abstraction of M
Strong Preservation: P holds in M’ iff P holds in M
Weak Preservation: P holds in M’ implies P holds in M

Guarantees from Exist. Abstraction
Let φ be a hold-for-all-paths property
M’ existentially abstracts M
Preservation Theorem: M’ ⊨ φ ⇒ M ⊨ φ
Converse does not hold: M’ ⊭ φ ⇏ M ⊭ φ
M’ ⊭ φ : counterexample may be spurious

Spurious counterexample in Over-approximation
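[Figure: an abstract counterexample from the initial states I to the failure state f; one abstract state along it contains both Deadend states and Bad states]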

Refinement
Problem: Deadend and Bad States are in the same abstract state.
Solution: Refine abstraction function.
The sets of Deadend and Bad states should be separated into different abstract states.

Refinement
[Figure: refined abstraction function h’]

Automated Abstraction/Refinement
Good abstractions are hard to obtain
Automate both Abstraction and Refinement processes
Counterexample-Guided AR (CEGAR)
    1. Build an abstract model M’
    2. Model check property P: M’ ⊨ P?
    3. If M’ ⊨ P, then M ⊨ P by the Preservation Theorem
    4. Otherwise, check if the counterexample (CE) is spurious
    5. Refine the abstract state space using the CE analysis results
    6. Repeat
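The loop above can be sketched for a small finite system. This is an illustrative miniature under several assumptions, not the algorithm of any particular tool: the property is reachability of a bad state, the abstraction is a partition given as a dict from state to integer block id, the abstract model is the existential abstraction, and refinement splits the states reached along the spurious path away from the rest of their block. All names are hypothetical.

```python
from collections import deque


def abstract_path(abs_init, abs_bad, abs_trans):
    """Breadth-first search for an abstract path from an initial to a bad block."""
    parent = {a: None for a in abs_init}
    queue = deque(abs_init)
    while queue:
        a = queue.popleft()
        if a in abs_bad:
            path = []
            while a is not None:
                path.append(a)
                a = parent[a]
            return path[::-1]
        for x, y in abs_trans:
            if x == a and y not in parent:
                parent[y] = a
                queue.append(y)
    return None


def cegar(init, bad, trans, block_of):
    block_of = dict(block_of)
    while True:
        # Steps 1-2: build the existential abstraction and model check it
        abs_trans = {(block_of[s], block_of[t]) for s, t in trans}
        path = abstract_path({block_of[s] for s in init},
                             {block_of[s] for s in bad}, abs_trans)
        if path is None:
            return "safe", block_of          # step 3: preservation theorem
        # Step 4: replay the abstract counterexample on concrete states
        current = {s for s in init if block_of[s] == path[0]}
        reached = current
        for blk in path[1:]:
            current = {t for s, t in trans if s in current and block_of[t] == blk}
            if not current:
                break                        # spurious: `reached` holds deadend states
            reached = current
        else:
            if current & bad:
                return "bug", path           # real counterexample
        # Step 5: split the last reached states into a new block
        new_block = max(block_of.values()) + 1
        for s in reached:
            block_of[s] = new_block


# States 0-2 form a cycle; 3 -> 4 -> 5 leads to the bad state 5.
trans = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 5)}
coarse = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
print(cegar({0}, {5}, trans, coarse)[0])               # safe
print(cegar({0}, {5}, trans | {(2, 3)}, coarse)[0])    # bug
```

Each refinement strictly increases the number of blocks, so on a finite system the loop terminates with either a proof or a real counterexample.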

Counterexample-Guided Abstraction-Refinement (CEGAR)
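M -> Build New Abstract Model -> M’ -> Model Check
Model Check: Pass -> No Bug
Model Check: Fail -> Check Counterexample
Check Counterexample: Real CE -> Bug
Check Counterexample: Spurious CE -> Obtain Refinement Cue -> Build New Abstract Model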

Predicate Abstraction
Extract a finite state model from an infinite state system
Used to prove assertions or safety properties
Successfully applied for verification of C programs
    SLAM (used in Windows device driver verification)
    MAGIC, BLAST, F-Soft
If the property holds on the abstract model it also holds on the original C program.

Example for Predicate Abstraction
C program:
    int main() { int i; i=0; while(even(i)) i++; }
Predicates:
    p1 ⇔ i=0
    p2 ⇔ even(i)
Boolean program:
    void main() { bool p1, p2; p1=TRUE; p2=TRUE; while(p2) { p1=p1?FALSE:nondet(); p2=!p2; } }
[Graf, Saidi ’97] [Ball, Rajamani ’01]
In predicate abstraction, a given program is abstracted with respect to a given set of predicates. Consider the C program, which has one integer variable i. The Boolean program has the same control structure as the original C program. Each statement in the C program is replaced by its effect on the predicates in the Boolean program. Now the question is: how do we get the predicates, where should we use which predicate, and which predicates are useful for proving the property? We will try to answer these questions in the next few slides.
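The claim that each Boolean statement captures the effect of the C statement on the predicates can be checked for `i++`. The sketch below is a hypothetical test harness: abstract_post models `p1 = p1 ? FALSE : nondet(); p2 = !p2;` as a set of possible successor valuations, and the check confirms that every concrete step over a small range of i is covered.

```python
def predicates(i):
    """Valuation of (p1, p2) where p1 <=> i == 0 and p2 <=> even(i)."""
    return (i == 0, i % 2 == 0)


def abstract_post(p1, p2):
    """Successor valuations allowed by the Boolean program for `i++`."""
    p1_options = {False} if p1 else {False, True}   # p1 = p1 ? FALSE : nondet()
    return {(n1, not p2) for n1 in p1_options}      # p2 = !p2


sound = all(
    predicates(i + 1) in abstract_post(*predicates(i))
    for i in range(-50, 50)
)
print(sound)  # True
```

The nondeterministic choice is needed because from a state with i != 0, the increment may or may not reach i == 0 (it does for i == -1).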

Computing Predicate Abstraction
How to get predicates for checking a given property?
How do we compute the abstraction?
Predicate Abstraction is an over-approximation
How to refine coarse abstractions?

Example
    Example ( ) {
    1: do{
         lock();
         old = new;
         q = q->next;
    2:   if (q != NULL){
    3:     q->data = new;
           unlock();
           new ++;
         }
    4: } while(new != old);
    5: unlock ();
       return;
    }
[Figure: automaton over the lock and unlock events]

What a program really is…
State:
    pc ↦ 3, lock ↦ held, old ↦ 5, new ↦ 5, q ↦ 0x133a
Transition:
    3: unlock(); new++; 4:} …
State:
    pc ↦ 4, lock ↦ not held, old ↦ 5, new ↦ 6, q ↦ 0x133a
(Program: the Example listing shown on the previous slide.)

The Safety Verification Problem
[Figure: state graph with Initial, Safe, and Error regions]
Is there a path from an initial to an error state?
Problem: Infinite state graph
Solution: Set of states ≃ logical formula

Idea 1: Predicate Abstraction
Predicates on program state: lock, old = new
States satisfying same predicates are equivalent
Merged into one abstract state
#abstract states is finite

Abstract States and Transitions
Concrete state: pc ↦ 3, lock ↦ held, old ↦ 5, new ↦ 5, q ↦ 0x133a
Abstract state: lock, old=new
Transition: 3: unlock(); new++; 4:} …
Concrete state: pc ↦ 4, lock ↦ not held, old ↦ 5, new ↦ 6, q ↦ 0x133a
Abstract state: ! lock, ! old=new

Abstraction
Existential Approximation
Abstract transition: [lock, old=new] -- 3: unlock(); new++; --> [! lock, ! old=new]
The abstract transition exists because at least one pair of concrete states (the one above) has the transition.

Analyze Abstraction
Analyze finite graph
Over Approximate: Safe => System Safe
Problem: Spurious counterexamples

Idea 2: Counterex.-Guided Refinement
Solution: Use spurious counterexamples to refine abstraction!
    1. Add predicates to distinguish states across cut
    2. Build refined abstraction

Iterative Abstraction-Refinement
Solution: Use spurious counterexamples to refine abstraction
    1. Add predicates to distinguish states across cut
    2. Build refined abstraction (eliminates counterexample)
    3. Repeat search until real counterexample or system proved safe
[Kurshan et al 93] [Clarke et al 00] [Ball-Rajamani 01]

Build-and-Search
(Program: the Example listing shown earlier; node numbers are its line labels.)
Predicates: LOCK
Reachability tree:
    1: ! LOCK

Build-and-Search
Predicates: LOCK
Reachability tree:
    1: ! LOCK -- lock(); old = new; q=q->next --> 2: LOCK

Build-and-Search
Predicates: LOCK
Reachability tree:
    1: ! LOCK --> 2: LOCK -- [q!=NULL] --> 3: LOCK

Build-and-Search
Predicates: LOCK
Reachability tree:
    1: ! LOCK --> 2: LOCK --> 3: LOCK -- q->data = new; unlock(); new++ --> 4: ! LOCK

Build-and-Search
Predicates: LOCK
Reachability tree:
    1: ! LOCK --> 2: LOCK --> 3: LOCK --> 4: ! LOCK -- [new==old] --> 5: ! LOCK

Build-and-Search
Predicates: LOCK
Reachability tree:
    1: ! LOCK --> 2: LOCK --> 3: LOCK --> 4: ! LOCK --> 5: ! LOCK -- unlock() --> unlock() is called in a ! LOCK state

Analyze Counterexample
Predicates: LOCK
Abstract counterexample:
    1: ! LOCK -- lock(); old = new; q=q->next --> 2: LOCK
    2: LOCK -- [q!=NULL] --> 3: LOCK
    3: LOCK -- q->data = new; unlock(); new++ --> 4: ! LOCK
    4: ! LOCK -- [new==old] --> 5: ! LOCK
    5: ! LOCK -- unlock()

Analyze Counterexample
Predicates: LOCK
Relevant statements along the path: old = new; new++; [new==old]
Inconsistent: after old = new and new++, the branch condition new == old cannot hold
New predicate: new == old

Repeat Build-and-Search
(Program: the Example listing shown earlier; node numbers are its line labels.)
Predicates: LOCK, new==old
Reachability tree:
    1: ! LOCK
Write out the predicate abstraction: P2 = !p2 ? 1 : *

Predicates: LOCK, new==old
Reachability tree:
    1: ! LOCK -- lock(); old = new; q=q->next --> 2: LOCK, new==old

Predicates: LOCK, new==old
Reachability tree:
    1: ! LOCK --> 2: LOCK, new==old --> 3: LOCK, new==old -- q->data = new; unlock(); new++ --> 4: ! LOCK, ! new==old

Predicates: LOCK, new==old
Reachability tree:
    1: ! LOCK --> 2: LOCK, new==old --> 3: LOCK, new==old --> 4: ! LOCK, ! new==old
    The branch [new==old] out of node 4 is infeasible, because node 4 satisfies ! new==old

Predicates: LOCK, new==old
Reachability tree:
    1: ! LOCK --> 2: LOCK, new==old --> 3: LOCK, new==old --> 4: ! LOCK, ! new==old -- [new!=old] --> 1: ! LOCK, ! new==old

Predicates: LOCK, new==old
[Figure: complete reachability tree over nodes 1 to 5 with the labels ! LOCK; LOCK, new==old; ! LOCK, ! new==old; ! LOCK, new==old]
SAFE

Tools for Predicate Abstraction of C
SLAM at Microsoft
    Used for verifying correct sequencing of function calls in Windows device drivers
MAGIC at CMU
    Allows verification of concurrent C programs
    Found bugs in MicroC OS
BLAST at Berkeley
    Lazy abstraction, interpolation
SATABS at CMU
    Computes predicate abstraction using SAT
    Can handle pointer arithmetic, bit-vectors
F-Soft at NEC Labs
    Localization, register sharing

Probabilistic Program Analysis
Xiangyu Zhang

Python Probabilistic Type Inference with Natural Language Support
Hi, everyone. My name is Zhaogui Xu, from Nanjing University, China. Today I am going to talk about my paper "Python Probabilistic Type Inference with Natural Language Support". This work was done when I was a visiting student at Purdue University.

Popularity of Python
IEEE Spectrum 2015 (TOP 4 in all languages)
Nowadays, Python is becoming more and more popular. Statistics from IEEE Spectrum 2015 show that Python ranks in the top 4 among all programming languages.

Existing Type Inference for Python
Most existing Python type inference techniques work by leveraging data flow between untyped variables and variables of known types. [M. S. Master Thesis'04], [M. G. DLS'10], [A. R. OOPSLA'06], [J. A. PyConf'05]
    x = "S"
    y = x
The string "S" flows to variable x and then to variable y.
Existing type inference techniques for other dynamic languages follow a similar idea. [M. F. OOPSLA'09], [S.H. J. SAS'09], [C. A. ECOOP'05]
Some are dynamic analyses, requiring good test coverage. [J.D.A POPL'11]
As we all know, Python is a dynamic language. Variables have no type declarations before use. Therefore, type inference for Python is an important and challenging problem. Most existing Python type inference techniques use a kind of forward data flow analysis that leverages data flow between untyped variables and variables of known types. For example, if we see the string "S" assigned to variable x, then we know x is of string type. If we then see the assignment y = x, we further know that the type of variable y is also string. In fact, existing type inference techniques for other dynamic languages have similar ideas. There are also some dynamic analyses that infer types. However, these methods require very good test coverage to perform well, and achieving good test coverage is usually challenging and time consuming.
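The forward data-flow idea can be sketched with Python's standard ast module. This is a deliberately tiny illustration, not any of the cited tools: it only handles top-level `name = literal` and `name = name` assignments, the function name infer_types is hypothetical, and anything else (such as a call to an unknown function g) is recorded as None, meaning unknown.

```python
import ast


def infer_types(source):
    """Propagate literal types forward through simple assignments."""
    types = {}
    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            continue
        target, value = stmt.targets[0].id, stmt.value
        if isinstance(value, ast.Constant):
            types[target] = type(value.value).__name__   # known literal type
        elif isinstance(value, ast.Name) and value.id in types:
            types[target] = types[value.id]               # data flow: copy the type
        else:
            types[target] = None                          # unknown source
    return types


print(infer_types('x = "S"\ny = x\nr = g()\ns = r'))
# {'x': 'str', 'y': 'str', 'r': None, 's': None}
```

The last two entries show how an unknown source leaves every variable downstream of it untyped.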

A Key Challenge for Python Type Inference
Data flow in Python is often incomplete.
Source Code
    1: def gzip(f, *args, **kwargs):
    2:     resp = f(*args, **kwargs)
    3:     url = resp.url
    4:     mthd = resp.method
    5:     data = compress(resp)
           ...
    6:     result = resp
    7:     return result
Line 2: inference of the variable resp fails because the callee of f(...) is unknown.
Line 5: inference of the variable data fails because compress(...) is an external function call.
However, existing type inference has limited effectiveness for Python programs. The key challenge is that the data flow in Python programs is often incomplete. Let's see a simple example. Here is the definition of a library function "gzip". The first argument is "f". It should be a function because it is called at line 2. Lines 3 and 4 access the attributes "url" and "method" of "resp". Line 5 is another function call. Here, the function "compress" is an external function, which means we do not have its source code. Line 6 assigns resp to result, which is then returned.
Existing inference techniques fail to infer many variables here. First, gzip is a high-level function that is only used by its downstream projects. During development, we do not know which function will be called at line 2. Therefore, we also have no idea about the type of the variable "resp" at line 2. Furthermore, since we do not know the type of resp, we also do not know the type of the variable "result" at line 6. Second, line 5 is an external function call, so we do not know the behavior of this call. Without the source code, we cannot build a type relation between the parameter "resp" and the return target "data". Therefore, existing techniques fail to infer the type of the variable "data". In such situations, existing methods rely heavily on manual mocking of the function "compress".

Our Basic Idea
Leverage the type hints in a program to infer variable types.
(Source code: the gzip function shown earlier.)
※ The object referenced by resp must have attributes {"url", "method"}.
※ The naming convention tells us resp is very likely to be Response typed.
However, these type hints are incomplete and uncertain.
    The observed attribute accesses are always incomplete: what is accessed in compress(...)?
    The developer may sometimes NOT follow naming conventions: the variable result may not have any hints from the naming convention.
To address this challenge, our basic idea is to use type hints in a program to infer variable types. Let's consider the example again. There are many type hints in this program. For example, lines 3 and 4 tell us that the object referenced by resp must contain the attributes {"url", "method"}. From a human point of view, the naming convention tells us that the variable resp is very likely to be of type Response in the type domain. These hints are very helpful for inferring types even when we do not have complete data flow. However, the challenge is that these hints are usually incomplete and uncertain. For example, at line 5, we do not know which attributes will be accessed during the external function call. In addition, not all variables follow the naming convention. For instance, the variable result does not seem to carry any type hint from its name.

Our Basic Idea
We represent these uncertain type hints as probabilistic constraints and then merge them to conduct a probabilistic inference to infer types.

Probabilistic Constraints
(Source code: the gzip function shown earlier.)
We analyze each type in the domain one by one. Now, assume we want to infer whether any variable's type may be Response.
Naming Constraints: the probability from the naming convention.
    C1: N(resp, Response) = 1 (p=0.8)
    C2: N(resp, Response) → P(resp, Response) (η=0.7)
η is a belief in how much you trust the result from the naming convention.
According to line 2, we can use the variable name resp to generate naming constraints for the variable resp. First, constraint C1 means that, according to the naming convention, resp is an instance of Response with probability 0.8. Second, constraint C2 means that if resp is of type Response based on the naming convention, we have 70% confidence that the ultimate type of resp is Response. Note that η is just a threshold representing how much you trust the result from the naming convention.

Naming Convention Learning
Training
    Inputs: a type name T in the domain; statically typed variables
    Extract NL features -> a set of labeled features -> train SVM -> classifier M(T) of T
Predicting
    Inputs: a type name T in the domain; a new variable name x
    Extract NL features -> features for x -> predict by M(T) -> probability p of x being of type T: N(x, T) = 1 (p)
NL Features
    String similarity with the type T, e.g., resp vs. Response
    POS features, e.g., has_connected
    Singular/plural form, e.g., connections
    ...
We get the naming convention through machine learning. Basically, we train an individual model for each type in the domain. For the training part, we have two inputs: one is the type T to be trained, and the other is a set of already typed variables. Note that we can leverage existing tools to infer the types of some variables as our training data. We extract a set of features from each typed variable. If the inferred types of a variable contain type T, we set the label to 1; otherwise we set it to 0. We finally get an SVM classification model M(T) for each type T. For the prediction part, we also have two inputs: one is the type to assert, and the other is the new variable. We extract NL features and feed them into M(T), which gives us the probability of x being of type T.
We consider a set of NL features for each record. The first is string similarity with type T, because we found that many variable names are similar to their type names. For example, the variable name resp is very similar to its type Response. The second is the part-of-speech (POS) features, which also give us some hints. For example, the variable has_connected, which starts with a verb, probably represents a Boolean variable. Next is the singular/plural form, which we also found useful. For instance, the variable connections hints that it very likely represents a collection.
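The three kinds of NL features can be approximated with the standard library. This sketch is an assumption-laden stand-in for the paper's feature extractor: it uses difflib's ratio as the string similarity, a small hypothetical list of verb prefixes instead of a real POS tagger, and a trailing "s" as the plural test. The SVM training step is not shown.

```python
from difflib import SequenceMatcher

# Hypothetical verb prefixes hinting at Boolean variables (assumption of this sketch).
VERB_PREFIXES = {"has", "is", "can", "should"}


def nl_features(var_name, type_name):
    """Extract simple natural-language features of a variable name for a type."""
    words = var_name.lower().split("_")
    return {
        # String similarity between the variable name and the type name
        "similarity": SequenceMatcher(None, var_name.lower(), type_name.lower()).ratio(),
        # Crude part-of-speech hint: does the name start with a verb?
        "starts_with_verb": words[0] in VERB_PREFIXES,
        # Crude singular/plural hint
        "plural": words[-1].endswith("s") and not words[-1].endswith("ss"),
    }


print(nl_features("resp", "Response"))
print(nl_features("has_connected", "bool"))
print(nl_features("connections", "list"))
```

With these features, resp scores higher similarity to Response than result does, which matches the intuition behind the probabilities 0.8 and 0.4 used in the slides.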

Source Code (Class Definitions)
    1: class Response():
    2:     def __init__(self, ...):
    3:         self.url = ...
    4:         self.method = ...
    1: class Request():
           ...
(Source code: the gzip function shown earlier.)
Attribute Constraints:
    C3: {"url", "method"} ⊂ A(Response) = 1 (p0=0.95)
        How many observed attributes are contained by the instance of type Response?
    C4: {"url", "method"} ⊂ A(Response) → P(resp, Response) (p'=0.8)
        How many types are sharing the observed attributes?
    ...
Let's go on with constraint generation. According to lines 3 and 4, we can generate attribute constraints for the variable resp. Constraint C3 means that we assert with probability 0.95 that the attributes "url" and "method" are in the attribute set of instances of Response. We use a probability here because the attribute set of a Python object may change dynamically at runtime. The probability is computed from the definition of Response. Constraint C4 means that if we observe that the two attributes are in the attribute set of instances of Response, we have 80% confidence that the type of resp is Response. We use a probability here because multiple types may have the attributes we observed. For instance, in the project, the type Request also contains the attributes "url" and "method", so we cannot be sure that resp must be of type Response. The more types share the observed attributes, the lower the probability.

(Source code: the gzip function shown earlier.)
Data Flow Constraints:
    C6: P(resp, Response) → P(result, Response) (1.0)
    C7: P(result, Response) → P(resp, Response) (1.0)
Naming Constraints:
    C8: N(result, Response) = 1 (p=0.4)
    C9: N(result, Response) → P(result, Response) (η=0.7)
Let's come to line 6. We can generate two kinds of constraints. One kind is the data flow constraints between the variables resp and result. The other is the naming constraints for the variable result. For the data flow constraints, as you can see, the types of result and resp are equivalent with probability 1.0. The naming constraints are similar to those discussed previously. Here the probability from the naming convention that the type of result is Response is low, only 0.4.

Probabilistic Inference
Basic Notations
We represent each probabilistic constraint C with probability p as a probabilistic function:
    f(x) = p if C is true, 1 - p otherwise
We conjoin all the probabilistic functions, then compute the joint probability through normalization:
    p(x1, ..., xn) = f1 * f2 * ... * fm / Z, where Z is the sum of f1 * f2 * ... * fm over all assignments
Our target is to compute the marginal probability p(xi):
    p(xi) = sum of p(x1, ..., xn) over all variables except xi
Now, with these constraints, we merge them and conduct the probabilistic inference. First of all, we have to give some basic notations. First, we represent each probabilistic constraint as a probabilistic function: if the constraint is true, the function value is p; otherwise it is 1 - p. Second, we conjoin all the probabilistic functions and compute the joint probability function through normalization. Finally, our target is to compute the marginal probability of each Boolean variable. Intuitively, the marginal probability of x_i is the joint probability summed over all variables except x_i.
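The notation can be made concrete by brute-force enumeration over all assignments, which is feasible only for a handful of variables. The sketch below uses just the two naming constraints C1 and C2 for resp; the variable indexing and the function name marginals are assumptions of this example, and the full constraint set of the talk is not reproduced here.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def marginals(num_vars, factors):
    """Marginal probability of each Boolean variable being true.

    `factors` is a list of (p, constraint) pairs; a factor contributes p when
    its constraint holds under the assignment and 1 - p otherwise.
    """
    weights = {}
    for x in product([False, True], repeat=num_vars):
        w = 1.0
        for p, holds in factors:
            w *= p if holds(x) else 1.0 - p
        weights[x] = w
    z = sum(weights.values())                  # normalization constant Z
    return [
        sum(w for x, w in weights.items() if x[i]) / z
        for i in range(num_vars)
    ]


# x[0] = N(resp, Response), x[1] = P(resp, Response)
factors = [
    (0.8, lambda x: x[0]),                     # C1: N(resp, Response) = 1 (p=0.8)
    (0.7, lambda x: implies(x[0], x[1])),      # C2: N -> P (eta=0.7)
]
result = marginals(2, factors)
print([round(m, 3) for m in result])
```

Enumeration is exponential in the number of variables, which is the cost that motivates a message-passing algorithm on a factor graph.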

Probabilistic Inference
Probabilistic Function -> Factor Graph
(Source code: the gzip function shown earlier.)
Predicate -> Boolean variable
    P(result, Response) -> x1
    P(resp, Response) -> x2
    {"url", "method"} ⊂ A(Response) -> x3
    N(resp, Response) -> x4
    N(result, Response) -> x5
Probabilistic constraint -> Factor
    C1: N(resp, Response) = 1 (p=0.8)
    C2: N(resp, Response) → P(resp, Response) (η=0.7)
    C3: {"url", "method"} ⊂ A(Response) = 1 (p0=0.95)
    C4: {"url", "method"} ⊂ A(Response) → P(resp, Response) (p'=0.8)
    C5: P(resp, Response) → {"url", "method"} ⊂ A(Response) (p=0.95)
    C6: P(resp, Response) → P(result, Response) (1.0)
    C7: P(result, Response) → P(resp, Response) (1.0)
    C8: N(result, Response) = 1 (p=0.4)
    C9: N(result, Response) → P(result, Response) (η=0.7)
In fact, computing the marginal probability directly is very expensive. In our implementation, we use a graphical model called a factor graph to represent the probabilistic constraints. The factor graph consists of two kinds of nodes. One kind is the variable node; each variable node represents a predicate involved in this case. The other kind is the factor node; each factor represents a probabilistic constraint.

Probabilistic Inference
Factor Graph
[Figure: part of the factor graph, with factors C1 to C5 connected to the variables x2 = P(resp, Response), x3 = {"url", "method"} ⊂ A(Response), and x4 = N(resp, Response)]
Sum-Product Algorithm
    Message passing from factor to variable
    Message passing from variable to factor
[Figure: incoming and outgoing messages at factor C4 and at variable x2]
P(result, Response) = 0.91
Here, we only show a part of the whole factor graph. As you can see, each factor node is connected to variable nodes. We use the sum-product algorithm to compute the marginal probabilities on the factor graph. This is an iterative message-passing algorithm. It only propagates probabilities between adjacent nodes through message passing. Each node integrates all the messages it receives and propagates the computed probability to its receivers. The algorithm terminates once all the probabilities converge. Eventually, we compute that the probability of the type of result being Response is 0.91.

Probabilistic Forensics
Memory forensics
