Software Model Checking with SLAM

Slides:

Advertisements

Similar presentations

Advanced programming tools at Microsoft

Advertisements

The Static Driver Verifier Research Platform

The SLAM Project: Debugging System Software via Static Analysis

Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?

Semantics Static semantics Dynamic semantics attribute grammars

Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.

A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.

Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.

Bebop: A Symbolic Model Checker for Boolean Programs Thomas Ball Sriram K. Rajamani

1 Lecture 08(c) – Predicate Abstraction and SLAM Eran Yahav.

1 Thorough Static Analysis of Device Drivers Byron Cook – Microsoft Research Joint work with: Tom Ball, Vladimir Levin, Jakob Lichtenberg,

A survey of techniques for precise program slicing Komondoor V. Raghavan Indian Institute of Science, Bangalore.

Chair of Software Engineering Software Verification Stephan van Staden Lecture 10: Model Checking.

Automatic Predicate Abstraction of C-Programs T. Ball, R. Majumdar T. Millstein, S. Rajamani.

Thomas Ball, Rupak Majumdar, Todd Millstein, Sriram K. Rajamani Presented by Yifan Li November 22nd In PLDI 01: Programming Language.

1 Automatic Predicate Abstraction of C Programs Parts of the slides are from

ISBN Chapter 3 Describing Syntax and Semantics.

Verification of parameterised systems

Symmetry-Aware Predicate Abstraction for Shared-Variable Concurrent Programs Alastair Donaldson, Alexander Kaiser, Daniel Kroening, and Thomas Wahl Computer.

The Software Model Checker BLAST by Dirk Beyer, Thomas A. Henzinger, Ranjit Jhala and Rupak Majumdar Presented by Yunho Kim Provable Software Lab, KAIST.

Software Verification via Refinement Checking Sagar Chaki, Edmund Clarke, Alex Groce, CMU Somesh Jha, Wisconsin.

Termination Proofs for Systems Code Andrey Rybalchenko, EPFL/MPI joint work with Byron Cook, MSR and Andreas Podelski, MPI PLDI’2006, Ottawa.

Lazy Abstraction Thomas A. Henzinger Ranjit Jhala Rupak Majumdar Grégoire Sutre UC Berkeley.

Synergy: A New Algorithm for Property Checking

Secrets of Software Model Checking Thomas Ball Sriram K. Rajamani Software Productivity Tools Microsoft Research

1 Predicate Abstraction of ANSI-C Programs using SAT Edmund Clarke Daniel Kroening Natalia Sharygina Karen Yorav (modified by Zaher Andraus for presentation.

Speeding Up Dataflow Analysis Using Flow- Insensitive Pointer Analysis Stephen Adams, Tom Ball, Manuvir Das Sorin Lerner, Mark Seigle Westley Weimer Microsoft.

CS 267: Automated Verification Lectures 14: Predicate Abstraction, Counter- Example Guided Abstraction Refinement, Abstract Interpretation Instructor:

Automatically Validating Temporal Safety Properties of Interfaces Thomas Ball and Sriram K. Rajamani Software Productivity Tools, Microsoft Research Presented.

Predicate Abstraction for Software and Hardware Verification Himanshu Jain Model checking seminar April 22, 2005.

ESC Java. Static Analysis Spectrum Power Cost Type checking Data-flow analysis Model checking Program verification AutomatedManual ESC.

Review: forward E { P } { P && E } TF { P && ! E } { P 1 } { P 2 } { P 1 || P 2 } x = E { P } { \exists … }

Review: forward E { P } { P && E } TF { P && ! E } { P 1 } { P 2 } { P 1 || P 2 } x = E { P } { \exists … }

Describing Syntax and Semantics

Lazy Abstraction Tom Henzinger Ranjit Jhala Rupak Majumdar Grégoire Sutre.

Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.

Automatic Predicate Abstraction of C Programs Thomas BallMicrosoft Rupak MajumdarUC Berkeley Todd MillsteinU Washington Sriram K. RajamaniMicrosoft

Model Checking Lecture 5. Outline 1 Specifications: logic vs. automata, linear vs. branching, safety vs. liveness 2 Graph algorithms for model checking.

Formal Verification of SpecC Programs using Predicate Abstraction Himanshu Jain Daniel Kroening Edmund Clarke Carnegie Mellon University.

272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 4: SMT-based Bounded Model Checking of Concurrent Software.

Software Model Checking with SLAM Thomas Ball Testing, Verification and Measurement Sriram K. Rajamani Software Productivity Tools Microsoft Research

Lecture #11 Software Model Checking: automating the search for abstractions Thomas Ball Testing, Verification and Measurement Microsoft Research.

272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 3: Modular Verification with Magic, Predicate Abstraction.

Rule Checking SLAM Checking Temporal Properties of Software with Boolean Programs Thomas Ball, Sriram K. Rajamani Microsoft Research Presented by Okan.

Aditya V. Nori, Sriram K. Rajamani Microsoft Research India.

SLAM :Software Model Checking From Theory To Practice Sriram K. Rajamani Software Productivity Tools Microsoft Research.

Parameterized Verification of Thread-safe Libraries Thomas Ball Sagar Chaki Sriram K. Rajamani.

Newton: A tool for generating abstract explanations of infeasibility1 The Problem P (C Program) BP (Boolean Program of P) CFG(P) CFG(BP)

Thomas Ball Sriram K. Rajamani

Use of Models in Analysis and Design Sriram K. Rajamani Rigorous Software Engineering Microsoft Research, India.

Automatically Validating Temporal Safety Properties of Interfaces Thomas Ball, Sriram K. MSR Presented by Xin Li.

Convergence of Model Checking & Program Analysis Philippe Giabbanelli CMPT 894 – Spring 2008.

Chapter 3 Part II Describing Syntax and Semantics.

An Undergraduate Course on Software Bug Detection Tools and Techniques Eric Larson Seattle University March 3, 2006.

SLAM internals Sriram K. Rajamani Rigorous Software Engineering Microsoft Research, India.

Verification & Validation By: Amir Masoud Gharehbaghi

Predicate Abstraction. Abstract state space exploration Method: (1) start in the abstract initial state (2) use to compute reachable states (invariants)

The Yogi Project Software property checking via static analysis and testing Aditya V. Nori, Sriram K. Rajamani, Sai Deep Tetali, Aditya V. Thakur Microsoft.

Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.

1 Automatically Validating Temporal Safety Properties of Interfaces - Overview of SLAM Parts of the slides are from

Counter Example Guided Refinement CEGAR Mooly Sagiv.

Boolean Programs: A Model and Process For Software Analysis By Thomas Ball and Sriram K. Rajamani Microsoft technical paper MSR-TR Presented by.

Presentation Title 2/4/2018 Software Verification using Predicate Abstraction and Iterative Refinement: Part Bug Catching: Automated Program Verification.

Formal methods: Lecture

Compilers Principles, Techniques, & Tools Taught by Jing Zhang

Symbolic Implementation of the Best Transformer

Over-Approximating Boolean Programs with Unbounded Thread Creation

Predicate Abstraction

Course: CS60030 Formal Systems

Presentation transcript:

Software Model Checking with SLAM Thomas Ball Testing, Verification and Measurement Sriram K. Rajamani Software Productivity Tools Microsoft Research http://research.microsoft.com/slam/

People behind SLAM Summer interns Visitors Windows Partners Sagar Chaki, Todd Millstein, Rupak Majumdar (2000) Satyaki Das, Wes Weimer, Robby (2001) Jakob Lichtenberg, Mayur Naik (2002) Visitors Giorgio Delzanno, Andreas Podelski, Stefan Schwoon Windows Partners Byron Cook, Vladimir Levin, Abdullah Ustuner

Outline Part I: Overview (30 min) Part II: Basic SLAM (1 hour) overview of SLAM process demonstration (Static Driver Verifier) lessons learned Part II: Basic SLAM (1 hour) foundations basic algorithms (no pointers) Part III: Advanced Topics (30 min) pointers + procedures imprecision in aliasing and mod analysis

Part I: Overview of SLAM

What is SLAM? SLAM is a software model checking project at Microsoft Research Goal: Check C programs (system software) against safety properties using model checking Application domain: device drivers Starting to be used internally inside Windows Working on making this into a product Keep this short!!!

Source Code Rules Static Driver Verifier Development Testing Precise API Usage Rules (SLIC) Software Model Checking Read for understanding New API rules Drive testing tools Rules Static Driver Verifier Defects 100% path coverage Development Testing elapsed time: 2 Source Code

SLAM – Software Model Checking SLAM innovations boolean programs: a new model for software model creation (c2bp) model checking (bebop) model refinement (newton) SLAM toolkit built on MSR program analysis infrastructure Keep this short!!!

SLIC Finite state language for stating rules monitors behavior of C code temporal safety properties (security automata) familiar C syntax Suitable for expressing control-dominated properties e.g. proper sequence of events can encode data values inside state

State Machine for Locking enum {Locked,Unlocked} s = Unlocked; } KeAcquireSpinLock.entry { if (s==Locked) abort; else s = Locked; KeReleaseSpinLock.entry { if (s==Unlocked) abort; else s = Unlocked; Locking Rule in SLIC State Machine for Locking Rel Acq Unlocked Locked Rel Acq Error

The SLAM Process boolean c2bp program prog. P prog. P’ slic bebop SLIC rule predicates path newton

Example do { KeAcquireSpinLock(); nPacketsOld = nPackets; if(request){ Does this code obey the locking rule? do { KeAcquireSpinLock(); nPacketsOld = nPackets; if(request){ request = request->Next; KeReleaseSpinLock(); nPackets++; } } while (nPackets != nPacketsOld);

Example do { KeAcquireSpinLock(); if(*){ KeReleaseSpinLock(); } Model checking boolean program (bebop) do { KeAcquireSpinLock(); if(*){ KeReleaseSpinLock(); } } while (*); U L L L U L U L U U E

Example do { KeAcquireSpinLock(); nPacketsOld = nPackets; if(request){ Is error path feasible in C program? (newton) do { KeAcquireSpinLock(); nPacketsOld = nPackets; if(request){ request = request->Next; KeReleaseSpinLock(); nPackets++; } } while (nPackets != nPacketsOld); U L L L U L U L U U E

b : (nPacketsOld == nPackets) Example Add new predicate to boolean program (c2bp) b : (nPacketsOld == nPackets) do { KeAcquireSpinLock(); nPacketsOld = nPackets; b = true; if(request){ request = request->Next; KeReleaseSpinLock(); nPackets++; b = b ? false : *; } } while (nPackets != nPacketsOld); !b U L L L U L U L U U E

b : (nPacketsOld == nPackets) Example Model checking refined boolean program (bebop) b : (nPacketsOld == nPackets) do { KeAcquireSpinLock(); b = true; if(*){ KeReleaseSpinLock(); b = b ? false : *; } } while ( !b ); U L b L b L b U b !b L U b L U b U E

b : (nPacketsOld == nPackets) Example Model checking refined boolean program (bebop) b : (nPacketsOld == nPackets) do { KeAcquireSpinLock(); b = true; if(*){ KeReleaseSpinLock(); b = b ? false : *; } } while ( !b ); U L b L b L b U b !b L U b L b U

Observations about SLAM Automatic discovery of invariants driven by property and a finite set of (false) execution paths predicates are not invariants, but observations abstraction + model checking computes inductive invariants (boolean combinations of observations) A hybrid dynamic/static analysis newton executes path through C code symbolically c2bp+bebop explore all paths through abstraction A new form of program slicing program code and data not relevant to property are dropped non-determinism allows slices to have more behaviors

Static Driver Verifier results Part I: Demo Static Driver Verifier results

Part I: Lessons Learned

SLAM Boolean program model has proved itself Successful for domain of device drivers control-dominated safety properties few boolean variables needed to do proof or find real counterexamples Counterexample-driven refinement terminates in practice incompleteness of theorem prover not an issue

What is hard? Abstracting from a language with pointers (C) to one without pointers (boolean programs) All side effects need to be modeled by copying (as in dataflow) Open environment problem

What stayed fixed? Boolean program model Basic tool flow Repercussions: newton has to copy between scopes c2bp has to model side-effects by value-result finite depth precision on the heap is all boolean programs can do

What changed? Interface between newton and c2bp We now use predicates for doing more things refine alias precision via aliasing predicates newton helps resolve pointer aliasing imprecision in c2bp

Scaling SLAM Largest driver we have processed has ~60K lines of code Largest abstractions we have analyzed have several hundred boolean variables Routinely get results after 20-30 iterations Out of 672 runs we do daily, 607 terminate within 20 minutes

Scale and SLAM components Out of 67 runs that time out, tools that take longest time: bebop: 50, c2bp: 10, newton: 5, constrain: 2 C2bp: fast predicate abstraction (fastF) and incremental predicate abstraction (constrain) re-use across iterations Newton: biggest problems are due to scope-copying Bebop: biggest issue is no re-use across iterations solution in the works

SLAM Status 2000-2001 March 2002 May 2002 July 2002 September 3, 2002 foundations, algorithms, prototyping papers in CAV, PLDI, POPL, SPIN, TACAS March 2002 Bill Gates review May 2002 Windows committed to hire two people with model checking background to support Static Driver Verifier (SLAM+driver rules) July 2002 running SLAM on 100+ drivers, 20+ properties September 3, 2002 made initial release of SDV to Windows (friends and family) April 1, 2003 made wide release of SDV to Windows (any internal driver developer)

Part II: Basic SLAM

C- Types  ::= void | bool | int | ref  Expressions e ::= c | x | e1 op e2 | &x | *x LExpression l ::= x | *x Declaration d ::=  x1,x2 ,…,xn Statements s ::= skip | goto L1,L2 …Ln | L: s | assume(e) | l = e | l = f (e1 ,e2 ,…,en ) | return x | s1 ; s2 ;… ; sn Procedures p ::=  f (x1: 1,x2 : 2,…,xn : n ) Program g ::= d1 d2 … dn p1 p2 … pn

C-- Types  ::= void | bool | int Expressions e ::= c | x | e1 op e2 LExpression l ::= x Declaration d ::=  x1,x2 ,…,xn Statements s ::= skip | goto L1,L2 …Ln | L: s | assume(e) | l = e | f (e1 ,e2 ,…,en ) | return | s1 ; s2 ;… ; sn Procedures p ::= f (x1: 1,x2 : 2,…,xn : n ) Program g ::= d1 d2 … dn p1 p2 … pn

BP Types  ::= void | bool Expressions e ::= c | x | e1 op e2 LExpression l ::= x Declaration d ::=  x1,x2 ,…,xn Statements s ::= skip | goto L1,L2 …Ln | L: s | assume(e) | l = e | f (e1 ,e2 ,…,en ) | return | s1 ; s2 ;… ; sn Procedures p ::= f (x1: 1,x2 : 2,…,xn : n ) Program g ::= d1 d2 … dn p1 p2 … pn

Syntactic sugar goto L1, L2; if (e) { L1: assume(e); S1; S1; } else {

Example, in C int g; void cmp (int a , int b) { main(int x, int y){ if (a == b) g = 0; else g = 1; } int g; main(int x, int y){ cmp(x, y); if (!g) { if (x != y) assert(0); }

Example, in C-- int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } void cmp(int a , int b) { goto L1, L2; L1: assume(a==b); g = 0; return; L2: assume(a!=b); g = 1; }

The SLAM Process boolean c2bp program prog. P prog. P’ slic bebop SLIC rule predicates path newton

c2bp: Predicate Abstraction for C Programs Given P : a C program F = {e1,...,en} each ei a pure boolean expression each ei represents set of states for which ei is true Produce a boolean program B(P,F) same control-flow structure as P boolean vars {b1,...,bn} to match {e1,...,en} properties true of B(P,F) are true of P Note: May abstraction. Do we want to state more formally what our abstraction is and what we are guaranteeing?

Assumptions Given P : a C program F = {e1,...,en} each ei a pure boolean expression each ei represents set of states for which ei is true Assume: each ei uses either: only globals (global predicate) local variables from some procedure (local predicate for that procedure) Mixed predicates: predicates using both local variables and global variables complicate “return” processing covered in advanced topics

C2bp Algorithm Performs modular abstraction abstracts each procedure in isolation Within each procedure, abstracts each statement in isolation no control-flow analysis no need for loop invariants simple expressions – conditional test, rhs of assn, proc args

Preds: {x==y} {g==0} {a==b} void cmp (int a , int b) { int g; goto L1, L2 L1: assume(a==b); g = 0; return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Preds: {x==y} {g==0} {a==b}

Preds: {x==y} {g==0} {a==b} void cmp (int a , int b) { int g; goto L1, L2 L1: assume(a==b); g = 0; return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } void cmp ( {a==b} ) { } decl {g==0} ; main( {x==y} ) { } Preds: {x==y} {g==0} {a==b}

Preds: {x==y} {g==0} {a==b} void equal (int a , int b) { int g; goto L1, L2 L1: assume(a==b); g = 0; return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } void cmp ( {a==b} ) { goto L1, L2; L1: assume( {a==b} ); {g==0} = T; return; L2: assume( !{a==b} ); {g==0} = F; } decl {g==0} ; main( {x==y} ) { cmp( {x==y} ); assume( {g==0} ); assume( !{x==y} ); assert(0); } Preds: {x==y} {g==0} {a==b}

C-- Types  ::= void | bool | int Expressions e ::= c | x | e1 op e2 LExpression l ::= x Declaration d ::=  x1,x2 ,…,xn Statements s ::= skip | goto L1,L2 …Ln | L: s | assume(e) | l = e | f (e1 ,e2 ,…,en ) | return | s1 ; s2 ;… ; sn Procedures p ::= f (x1: 1,x2 : 2,…,xn : n ) Program g ::= d1 d2 … dn p1 p2 … pn

Abstracting Assigns via WP Statement y=y+1 and F={ y<4, y<5 } {y<4}, {y<5} = ((!{y<5} || !{y<4}) ? F : *), {y<4} ; WP(x=e,Q) = Q[x -> e] WP(y=y+1, y<5) = (y<5) [y -> y+1] = (y+1<5) = (y<4)

WP Problem WP(s, ei) not always expressible via { e1,...,en } Example F = { x==0, x==1, x<5 } WP( x=x+1 , x<5 ) = x<4 Best possible: x==0 || x==1 say why wp is not expressible in terms of E – eis are arbitrary

Abstracting Expressions via F F = { e1,...,en } ImpliesF(e) best boolean function over F that implies e ImpliedByF(e) best boolean function over F implied by e ImpliedByF(e) = !ImpliesF(!e) mention that we have optimizations

ImpliesF(e) and ImpliedByF(e)

Computing ImpliesF(e) minterm m = d1 && ... && dn where di = ei or di = !ei ImpliesF(e) disjunction of all minterms that imply e Naïve approach generate all 2n possible minterms for each minterm m, use decision procedure to check validity of each implication me Many optimizations possible mention that we have optimizations

Abstracting Assignments if ImpliesF(WP(s, ei)) is true before s then ei is true after s if ImpliesF(WP(s, !ei)) is true before s then ei is false after s {ei} = ImpliesF(WP(s, ei)) ? true : ImpliesF(WP(s, !ei)) ? false : *; For each bi

Assignment Example ImpliesF( x==y+1 ) = false Statement in P: Predicates in E: y = y+1; {x==y} Weakest Precondition: WP(y=y+1, x==y) = x==y+1 ImpliesF( x==y+1 ) = false ImpliesF( x!=y+1 ) = x==y explain why the abstraction is interesting; what it says in terms of bebop. Abstraction of assignment in B: {x==y} = {x==y} ? false : *;

Absracting Assumes assume(e) is abstracted to: Example: WP( assume(e) , Q) = eQ assume(e) is abstracted to: assume( ImpliedByF(e) ) Example: F = {x==2, x<5} assume(x < 2) is abstracted to: assume( {x<5} && !{x==2} )

Abstracting Procedures Each predicate in F is annotated as being either global or local to a particular procedure Procedures abstracted in two passes: a signature is produced for each procedure in isolation procedure calls are abstracted given the callees’ signatures

Abstracting a procedure call a sequence of assignments from actuals to formals see assignment abstraction Procedure return NOP for C-- with assumption that all predicates mention either only globals or only locals with pointers and with mixed predicates: Most complicated part of c2bp Covered in the advanced topics section Need to say a story about procedure return

{x==y} {g==0} {a==b} void cmp (int a , int b) { int g; Goto L1, L2 L1: assume(a==b); g = 0; return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } void cmp ( {a==b} ) { Goto L1, L2 L1: assume( {a==b} ); {g==0} = T; return; L2: assume( !{a==b} ); {g==0} = F; } decl {g==0} ; main( {x==y} ) { cmp( {x==y} ); assume( {g==0} ); assume( !{x==y} ); assert(0); } {x==y} {g==0} {a==b} Do we want to complicate the example to illustrate S(e) and W(e)??

Precision For program P and E = {e1,...,en}, there exist two “ideal” abstractions: Boolean(P,E) : most precise abstraction Cartesian(P,E) : less precise abtraction, where each boolean variable is updated independently [See Ball-Podelski-Rajamani, TACAS 00] Theory: with an “ideal” theorem prover, c2bp can compute Cartesian(P,E) Practice: c2bp computes a less precise abstraction than Cartesian(P,E) we use Das/Dill’s technique to incrementally improve precision with an “ideal” theorem prover, the combination of c2bp + Das/Dill can compute Boolean(P,E)

The SLAM Process boolean c2bp program prog. P prog. P’ slic bebop SLIC rule predicates path newton

Bebop Model checker for boolean programs Based on CFL reachability [Sharir-Pnueli 81] [Reps-Sagiv-Horwitz 95] Iterative addition of edges to graph “path edges”: <entry,d1>  <v,d2> “summary edges”: <call,d1>  <ret,d2>

Symbolic CFL reachability Partition path edges by their “target” PE(v) = { <d1,d2> | <entry,d1>  <v,d2> } What is <d1,d2> for boolean programs? A bit-vector! What is PE(v)? A set of bit-vectors Use a BDD (attached to v) to represent PE(v)

BDDs Canonical representation of void cmp ( e2 ) { [5]Goto L1, L2 [6]L1: assume( e2 ); [7] gz = T; goto L3; [8]L2: assume( !e2 ); [9]gz = F; goto L3 [10] L3: return; } Canonical representation of boolean functions set of (fixed-length) bitvectors binary relations over finite domains Efficient algorithms for common dataflow operations transfer function join/meet subsumption test e2=e2’ & gz’=e2’ BDD at line [10] of cmp: Read: “cmp leaves e2 unchanged and sets gz to have the same final value as e2”

gz=gz’& e=e’ e=e’& gz’=e’ e=e’ & gz’=1 & e’=1 e2=e gz=gz’& e2=e2’ decl gz ; main( e ) { [1] equal( e ); [2] assume( gz ); [3] assume( !e ); [4] assert(F); } gz=gz’& e=e’ e=e’& gz’=e’ e=e’ & gz’=1 & e’=1 e2=e void cmp ( e2 ) { [5]Goto L1, L2 [6]L1: assume( e2 ); [7] gz = T; goto L3; [8]L2: assume( !e2 ); [9]gz = F; goto L3 [10] L3: return; } gz=gz’& e2=e2’ gz=gz’& e2=e2’& e2’=T e2=e2’& e2’=T & gz’=T gz=gz’& e2=e2’& e2’=F e2=e2’& e2’=F & gz’=F e2=e2’& gz’=e2’

Bebop: summary Explicit representation of CFG Implicit representation of path edges and summary edges Generation of hierarchical error traces Complexity: O(E * 2O(N)) E is the size of the CFG N is the max. number of variables in scope

The SLAM Process boolean c2bp program prog. P prog. P’ slic bebop SLIC rule predicates path newton

Newton Given an error path p in boolean program B is p a feasible path of the corresponding C program? Yes: found an error No: find predicates that explain the infeasibility uses the same interfaces to the theorem provers as c2bp.

Newton Execute path symbolically Check conditions for inconsistency using theorem prover (satisfiability) After detecting inconsistency: minimize inconsistent conditions traverse dependencies obtain predicates

Symbolic simulation for C-- Domains variables: names in the program values: constants + symbols State of the simulator has 3 components: store: map from variables to values conditions: predicates over symbols history: past valuations of the store

Symbolic simulation Algorithm Input: path p For each statement s in p do match s with Assign(x,e): let val = Eval(e) in if (Store[x]) is defined then History[x] := History[x]  Store[x] Store[x] := val Assume(e): Cond := Cond and val let result = CheckConsistency(Cond) in if (result == “inconsistent”) then GenerateInconsistentPredicates() End Say “Path p is feasible”

Symbolic Simulation: Caveats Procedure calls add a stack to the simulator push and pop stack frames on calls and returns implement mappings to keep values “in scope” at calls and returns Dependencies for each condition or store, keep track of which values where used to generate this value traverse dependency during predicate generation

void cmp (int a , int b) { Goto L1, L2 L1: assume(a==b); g = 0; return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); }

void cmp (int a , int b) { Goto L1, L2 int g; L1: assume(a==b); return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Global: main: x: X y: Y Conditions:

void cmp (int a , int b) { Goto L1, L2 int g; L1: assume(a==b); return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Global: main: x: X y: Y cmp: a: A b: B Conditions: Map: X  A Y  B

void cmp (int a , int b) { Goto L1, L2 int g; L1: assume(a==b); return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Global: (6) g: 0 main: x: X y: Y cmp: a: A b: B Give exact queries to theorem prover (as pop-ups?) Conditions: (A == B) [3, 4] Map: X  A Y  B

void cmp (int a , int b) { Goto L1, L2 int g; L1: assume(a==b); return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Global: (6) g: 0 main: x: X y: Y cmp: a: A b: B Conditions: (A == B) [3, 4] (X == Y) [5] Map: X  A Y  B

void cmp (int a , int b) { Goto L1, L2 int g; L1: assume(a==b); return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Global: (6) g: 0 main: x: X y: Y cmp: a: A b: B Conditions: (A == B) [3, 4] (X == Y) [5] (X != Y) [1, 2]

void cmp (int a , int b) { Goto L1, L2 int g; L1: assume(a==b); return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Global: (6) g: 0 main: x: X y: Y cmp: a: A b: B Conditions: (A == B) [3, 4] (X == Y) [5] (X != Y) [1, 2] Contradictory!

void cmp (int a , int b) { Goto L1, L2 int g; L1: assume(a==b); return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Global: (6) g: 0 main: x: X y: Y cmp: a: A b: B Conditions: (A == B) [3, 4] (X == Y) [5] (X != Y) [1, 2] Contradictory!

Predicates after simplification: { x == y, a == b } void cmp (int a , int b) { Goto L1, L2 L1: assume(a==b); g = 0; return; L2: assume(a!=b); g = 1; } int g; main(int x, int y){ cmp(x, y); assume(!g); assume(x != y) assert(0); } Predicates after simplification: { x == y, a == b }

Part III: Advanced Topics

C- Types  ::= void | bool | int | ref  Expressions e ::= c | x | e1 op e2 | &x | *x LExpression l ::= x | *x Declaration d ::=  x1,x2 ,…,xn Statements s ::= skip | goto L1,L2 …Ln | L: s | assume(e) | l = e | l = f (e1 ,e2 ,…,en ) | return x | s1 ; s2 ;… ; sn Procedures p ::=  f (x1: 1,x2 : 2,…,xn : n ) Program g ::= d1 d2 … dn p1 p2 … pn Why arent we considering assume???

Two Problems Extending SLAM tools for pointers Dealing with imprecision of alias analysis

Pointers and SLAM With pointers, C supports call by reference Strictly speaking, C supports only call by value With pointers and the address-of operator, one can simulate call-by-reference Boolean programs support only call-by-value-result SLAM mimics call-by-reference with call-by-value-result Extra complications: address operator (&) in C multiple levels of pointer dereference in C

What changes with pointers? C2bp abstracting assignments abstracting procedure returns Newton simulation needs to handle pointer accesses need to copy local heap across scopes to match Bebop’s semantics need to refine imprecise alias analysis using predicates Bebop remains unchanged!

Assignments + Pointers Statement in P: Predicates in E: *p = 3 {x==5} Weakest Precondition: WP( *p=3 , x==5 ) = x==5 What if *p and x alias? Correct Weakest Precondition: (p==&x and 3==5) or (p!=&x and x==5) We use Das’s pointer analysis [PLDI 2000] to prune disjuncts representing infeasible alias scenarios.

Abstracting Procedure Return Need to account for lhs of procedure call mixed predicates side-effects of procedure Boolean programs support only call-by-value-result C2bp models all side-effects using return processing

Abstracting Procedure Returns Let a be an actual at call-site P(…) pre(a) = the value of a before transition to P Let f be a formal of a procedure P pre(f) = the value of f upon entry to P

int R (int f) { int r = f+1; Q() { f = 0; int x = 1; x = R(x) predicate call/return relation call/return assign int R (int f) { int r = f+1; f = 0; return r; } Q() { int x = 1; x = R(x) } f = x {f==pre(f)} {r==pre(f)+1} {x==1} {x==2} pre(f) == pre(x) x = r WP(f=x, f==pre(f) ) = x==pre(f) x==pre(f) is true at the call to R WP(x=r, x==2) = r==2 pre(f)==pre(x) and pre(x)==1 and r==pre(f)+1 implies r==2 {x==1} s Q() { {x==1},{x==2} = T,F; bool R ( {f==pre(f)} ) { {r==pre(f)+1} = {f==pre(f)}; {f==pre(f)} = *; return {r==pre(f)+1}; } {x==2} = s & {x==1}; } s = R(T);

int R (int f) { int r = f+1; Q() { f = 0; int x = 1; x = R(x) predicate call/return relation call/return assign int R (int f) { int r = f+1; f = 0; return r; } Q() { int x = 1; x = R(x) } f = x {f==pre(f)} {r==pre(f)+1} {x==1} {x==2} pre(f) == pre(x) x = r WP(f=x, f==pre(f) ) = x==pre(f) x==pre(f) is true at the call to R WP(x=r, x==2) = r==2 pre(f)==pre(x) and pre(x)==1 and r==pre(f)+1 implies r==2 {x==1} s Q() { {x==1},{x==2} = T,F; bool R ( {f==pre(f)} ) { {r==pre(f)+1} = {f==pre(f)}; {f==pre(f)} = *; return {r==pre(f)+1}; } {x==1}, {x==2} = *, s & {x==1}; } s = R(T);

Extending Pre-states Suppose formal parameter is a pointer pre( *f ) eg. P(int *f) pre( *f ) value of *f upon entry to P can’t change during P * pre( f ) value of dereference of pre( f ) can change during P

pre(x)==1 and pre(*a)==pre(x) and *pre(a)==pre(*a)+1 and pre(a)==&x predicate call/return relation call/return assign {a==pre(a)} {*pre(a)==pre(*a)} Q() { int x = 1; R(&x); } int R (int *a) { *a = *a+1; } {x==1} {x==2} a = &x pre(a) = &x {*pre(a)=pre(*a)+1} pre(x)==1 and pre(*a)==pre(x) and *pre(a)==pre(*a)+1 and pre(a)==&x implies x==2 {x==1} s Q() { {x==1},{x==2} = T,F; bool R ( {a==pre(a)}, {*pre(a)==pre(*a)} ) { {*pre(a)==pre(*a)+1} = {a==pre(a)} & {*pre(a)==pre(*a)} ; return {*pre(a)==pre(*a)+1}; } {x==2} = s & {x==1}; } s = R(T,T);

Newton: what changes with pointers? Simulation needs to handle pointer accesses Need to copy local heap across scopes to match Bebop’s semantics

void foo (int *a) { assume(*a > 5); assert(0); } main(int *x){ assume(*x < 5); foo(x); } void foo (int *a) { assume(*a > 5); assert(0); }

Predicates after simplification: *x < 5 , *a < 5, *a > 5 main(int *x){ assume(*x < 5); foo(x); } void foo (int *a) { assume(*a > 5); assert(0); } Predicates after simplification: *x < 5 , *a < 5, *a > 5 main: x: X *X: Y [1] foo: a: A *A: B [3] Conditions: (Y< 5) [1,2] (B < 5) [3,4,5] (B > 5) [3,4] Contradictory!

SLAM Imprecision due to Alias Imprecision (Flow Insensitivity) {x==0} = T; skip; {x==0} = *; assume( !{x==0} ); x = 0; y = 0; p = &y *p = *p + 1; assume(x!=0); p = &x; Pts-to(p) = { &x, &y} {x==0}

Newton: Path is Infeasible x = 0; y = 0; p = &y if (p == &x) x = x + 1; else y = y + 1; assume(x!=0); p = &x; x = 0; y = 0; p = &y *p = *p + 1; assume(x!=0); p = &x; {x==0} = T; skip; {x==0} = *; assume(!{x==0} );

Consider values in abstract trace x = 0; y = 0; p = &y *p = *p + 1; assume(x!=0); p = &x; {x==0} = T; skip; {x==0} = *; assume(!{x==0} ); {x==0} is true INCONSISTENT {x==0} is false

Consider values in abstract trace x = 0; y = 0; p = &y *p = *p + 1; assume(x!=0); p = &x; {x==0} = T; skip; {x==0} = *; assume(!{x==0} ); WP(*p = *p +1, !(x==0) ) = ((p == &x) and !(x==-1)) or ((p != &x) and !(x==0)) (p!=&x) gets added as a predicate

What changes with pointers? C2bp abstracting assignments abstracting procedure returns Newton simulation needs to handle pointer accesses need to copy local heap across scopes to match Bebop’s semantics need to refine imprecise alias analysis using predicates Bebop remains unchanged!

What worked well? Specific domain problem Safety properties Shoulders & synergies Separation of concerns Summer interns & visitors Sagar Chaki, Todd Millstein, Rupak Majumdar (2000) Satyaki Das, Wes Weimer, Robby (2001) Jakob Lichtenberg, Mayur Naik (2002) Giorgio Delzanno, Andreas Podelski, Stefan Schwoon Windows Partners Byron Cook, Vladimir Levin, Abdullah Ustuner

Future Work Concurrency Rules and environment-models SLAM analyzes drivers one thread at a time Work in progress to analyze interleavings between threads Rules and environment-models Large scale development or rules and environment-models is a challenge How can we simplify and manage development of rules? Modeling C semantics faithfully Theory: Prove that SLAM will make progress on any property and any program Identify classes of programs and properties on which SLAM will terminate

Further Reading See papers, slides from: http://research.microsoft.com/slam

Glossary Model checking Checking properties by systematic exploration of the state-space of a model. Properties are usually specified as state machines, or using temporal logics Safety properties Properties whose violation can be witnessed by a finite run of the system. The most common safety properties are invariants Reachability Specialization of model checking to invariant checking. Properties are specified as invariants. Most common use of model checking. Safety properties can be reduced to reachability. Boolean programs “C”-like programs with only boolean variables. Invariant checking and reachability is decidable for boolean programs. Predicate A Boolean expression over the state-space of the program eg. (x < 5) Predicate abstraction A technique to construct a boolean model from a system using a given set of predicates. Each predicate is represented by a boolean variable in the model. Weakest precondition The weakest precondition of a set of states S with respect to a statement T is the largest set of states from which executing T, when terminating, always results in a state in S.