November 2005Scott Stoller, Stony Brook University1 Software Model Checking: Where It Is, and Where It’s Headed Scott D. Stoller.

Slides:



Advertisements
Similar presentations
Advanced programming tools at Microsoft
Advertisements

Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study Sebastian Burckhardt Rajeev Alur Milo M. K. Martin Department of.
Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?
Abstraction of Source Code (from Bandera lectures and talks)
Masahiro Fujita Yoshihisa Kojima University of Tokyo May 2, 2008
CS 267: Automated Verification Lecture 8: Automata Theoretic Model Checking Instructor: Tevfik Bultan.
1 Chao Wang, Yu Yang*, Aarti Gupta, and Ganesh Gopalakrishnan* NEC Laboratories America, Princeton, NJ * University of Utah, Salt Lake City, UT Dynamic.
A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.
Introducing Formal Methods, Module 1, Version 1.1, Oct., Formal Specification and Analytical Verification L 5.
1 Model checking. 2 And now... the system How do we model a reactive system with an automaton ? It is convenient to model systems with Transition systems.
Automatic Verification Book: Chapter 6. What is verification? Traditionally, verification means proof of correctness automatic: model checking deductive:
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.
A Randomized Dynamic Program Analysis for Detecting Real Deadlocks Koushik Sen CS 265.
1 Symbolic Execution for Model Checking and Testing Corina Păsăreanu (Kestrel) Joint work with Sarfraz Khurshid (MIT) and Willem Visser (RIACS)
1/20 Generalized Symbolic Execution for Model Checking and Testing Charngki PSWLAB Generalized Symbolic Execution for Model Checking and Testing.
Bebop: A Symbolic Model Checker for Boolean Programs Thomas Ball Sriram K. Rajamani
Carnegie Mellon University Java PathFinder and Model Checking of Programs Guillaume Brat, Dimitra Giannakopoulou, Klaus Havelund, Mike Lowry, Phil Oh,
Rigorous Software Development CSCI-GA Instructor: Thomas Wies Spring 2012 Lecture 13.
CS 267: Automated Verification Lecture 10: Nested Depth First Search, Counter- Example Generation Revisited, Bit-State Hashing, On-The-Fly Model Checking.
Iterative Context Bounding for Systematic Testing of Multithreaded Programs Madan Musuvathi Shaz Qadeer Microsoft Research.
Thomas Ball, Rupak Majumdar, Todd Millstein, Sriram K. Rajamani Presented by Yifan Li November 22nd In PLDI 01: Programming Language.
Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.
Software Verification via Refinement Checking Sagar Chaki, Edmund Clarke, Alex Groce, CMU Somesh Jha, Wisconsin.
Using Statically Computed Invariants Inside the Predicate Abstraction and Refinement Loop Himanshu Jain Franjo Ivančić Aarti Gupta Ilya Shlyakhter Chao.
1 Carnegie Mellon UniversitySPINFlavio Lerda SPIN An explicit state model checker.
1 Predicate Abstraction of ANSI-C Programs using SAT Edmund Clarke Daniel Kroening Natalia Sharygina Karen Yorav (modified by Zaher Andraus for presentation.
1 Today Another approach to “coverage” Cover “everything” – within a well-defined, feasible limit Bounded Exhaustive Testing.
Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San.
Predicate Abstraction for Software and Hardware Verification Himanshu Jain Model checking seminar April 22, 2005.
Efficient Software Model Checking of Data Structure Properties Paul T. Darga Chandrasekhar Boyapati The University of Michigan.
Describing Syntax and Semantics
Efficient Modular Glass Box Software Model Checking Michael Roberson Chandrasekhar Boyapati The University of Michigan.
CS 267: Automated Verification Lecture 13: Bounded Model Checking Instructor: Tevfik Bultan.
Formal Verification of SpecC Programs using Predicate Abstraction Himanshu Jain Daniel Kroening Edmund Clarke Carnegie Mellon University.
272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 4: SMT-based Bounded Model Checking of Concurrent Software.
Formal Techniques for Verification Using SystemC By Nasir Mahmood.
DART: Directed Automated Random Testing Koushik Sen University of Illinois Urbana-Champaign Joint work with Patrice Godefroid and Nils Klarlund.
CUTE: A Concolic Unit Testing Engine for C Technical Report Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
Software Engineering Prof. Dr. Bertrand Meyer March 2007 – June 2007 Chair of Software Engineering Static program checking and verification Slides: Based.
Using Model-Checking to Debug Device Firmware Sanjeev Kumar Microprocessor Research Labs, Intel Kai Li Princeton University.
Inferring Specifications to Detect Errors in Code Mana Taghdiri Presented by: Robert Seater MIT Computer Science & AI Lab.
CS 363 Comparative Programming Languages Semantics.
Finding Feasible Counter-examples when Model Checking Abstracted Java Programs Corina S. Pasareanu, Matthew B. Dwyer (Kansas State University) and Willem.
Introduction to Problem Solving. Steps in Programming A Very Simplified Picture –Problem Definition & Analysis – High Level Strategy for a solution –Arriving.
Model Checking Java Programs using Structural Heuristics
Convergence of Model Checking & Program Analysis Philippe Giabbanelli CMPT 894 – Spring 2008.
Symbolic Execution with Abstract Subsumption Checking Saswat Anand College of Computing, Georgia Institute of Technology Corina Păsăreanu QSS, NASA Ames.
Semantics In Text: Chapter 3.
1 Model Checking of Robotic Control Systems Presenting: Sebastian Scherer Authors: Sebastian Scherer, Flavio Lerda, and Edmund M. Clarke.
CIS 842: Specification and Verification of Reactive Systems Lecture INTRO-Examples: Simple BIR-Lite Examples Copyright 2004, Matt Dwyer, John Hatcliff,
CSV 889: Concurrent Software Verification Subodh Sharma Indian Institute of Technology Delhi Scalable Symbolic Execution: KLEE.
Verification & Validation By: Amir Masoud Gharehbaghi
CS265: Dynamic Partial Order Reduction Koushik Sen UC Berkeley.
CS412/413 Introduction to Compilers Radu Rugina Lecture 18: Control Flow Graphs 29 Feb 02.
Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.
CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
Concrete Model Checking with Abstract Matching and Refinement Corina Păsăreanu QSS, NASA Ames Research Center Radek Pelánek Masaryk University, Brno, Czech.
CS357 Lecture 13: Symbolic model checking without BDDs Alex Aiken David Dill 1.
( = “unknown yet”) Our novel symbolic execution framework: - extends model checking to programs that have complex inputs with unbounded (very large) data.
CIS 842: Specification and Verification of Reactive Systems Lecture INTRO-Depth-Bounded: Depth-Bounded Depth-first Search Copyright 2004, Matt Dwyer, John.
Agenda  Quick Review  Finish Introduction  Java Threads.
Finding bugs with a constraint solver daniel jackson. mandana vaziri mit laboratory for computer science issta 2000.
Symbolic Model Checking of Software Nishant Sinha with Edmund Clarke, Flavio Lerda, Michael Theobald Carnegie Mellon University.
Presentation Title 2/4/2018 Software Verification using Predicate Abstraction and Iterative Refinement: Part Bug Catching: Automated Program Verification.
Model Checking Java Programs (Java PathFinder)
Over-Approximating Boolean Programs with Unbounded Thread Creation
An explicit state model checker
CUTE: A Concolic Unit Testing Engine for C
Predicate Abstraction
Presentation transcript:

November 2005Scott Stoller, Stony Brook University1 Software Model Checking: Where It Is, and Where It’s Headed Scott D. Stoller

November 2005Scott Stoller, Stony Brook University2 Outline Introduction to Model Checking (MC) Software MC Success Stories Research Directions Partial-order reduction Heuristic search MC + symbolic execution Concrete (little abstraction) MC of Java, C, machine code Heap abstractions Environment modeling

November 2005Scott Stoller, Stony Brook University3 Introduction to Model Checking Model Checking (MC): systematic exploration of the possible behaviors of a system to determine whether the system satisfies a specified property. If property is not satisfied, the model checker provides a counter-example: a path in the state space that violates the property. Abstraction: approximations that reduce the cost of MC. Abstraction may cause false alarms (spurious counterexamples). Unsound abstractions may cause missed errors. MC is path-sensitive. Traditional static analysis isn’t.

November 2005Scott Stoller, Stony Brook University4 Explicit-State MC and Symbolic MC Explicit-State MC: states are manipulated individually. Symbolic MC: sets of states are represented by logical formulas. The set of successors of a set Φ of states is computed by manipulating Φ and the formula representing the system’s transition relation. Symbolic MC using OBDDs (an efficient representation of boolean formulas) is dominant in hardware verification. OBDDs are not as widely used in software verification. Hard to combine with partial-order reduction. Harder to model dynamic memory allocation. Use symbolic execution and constraints instead.

November 2005Scott Stoller, Stony Brook University5 Outline Introduction to Model Checking (MC) Software MC Success Stories Research Directions Partial-order reduction Heuristic search MC + symbolic execution Concrete (little abstraction) MC of Java, C, machine code Heap abstractions Environment modeling

November 2005Scott Stoller, Stony Brook University6 Success Story for MC with Abstraction: Static Driver Verifier (formerly SLAM) [Ball, Rajamani, et al., ] Applied to all drivers developed by Microsoft. Released in Windows Device-Driver Development Kit, Predicate abstraction [Graf and Saidi 1997]: The data state of a C program CP is abstracted by the values of a set of predicates; e.g., start with predicates used in conditionals or in the property. This produces a Boolean program BP, with the usual control structures (loops, procedure calls, etc.), non- deterministic choice (to model "don't know"), and Boolean variables. Use program analysis and theorem prover to compute BP.

November 2005Scott Stoller, Stony Brook University7 SDV: Model Checking of Boolean Programs Compute reachable data states at each point in BP. Explicit+symbolic state representation: explicit for program counter, symbolic (OBDDs) for sets of reachable data states. Exploit scoping of variables. Procedure summaries: The result of analyzing a procedure called in abstract state s0 (captures global vars and args) is a set of resulting states {t0,t1,...} (each captures global vars and return value). Add (s0, {t0,t1,...}) to the procedure summary for later re-use. This optimization avoids redundant computation. It also ensures termination on recursive programs.

November 2005Scott Stoller, Stony Brook University8 SDV: Counter-Example Guided Abstraction Refinement (CEGAR) BP overapproximates the possible behaviors of the C program, because the predicates capture limited info, and the theorem prover may diverge. If MCer says “true”, BP and CP satisfy the property. If MCer produces a counter-example, test its feasibility as follows. Use symbolic execution of CP to construct a predicate Φ that is satisfiable iff the counter-example is feasible. If Φ is satisfiable, CP does not satisfy the property. Otherwise, add predicates in theorem prover's proof that Φ is not satisfiable to the predicate abstraction, and repeat.

November 2005Scott Stoller, Stony Brook University9 SDV: Killer Application Device drivers are a killer app for software MC. Faulty device drivers can easily crash your computer. Most device drivers are a manageable size (< 60 KLOC). It took man-years to write: Formal specification of correct usage of Windows Device Driver API Environment model (test harness) representing the Windows kernel But that effort is amortized over tens of thouands of device drivers.

November 2005Scott Stoller, Stony Brook University10 Success Story for MC Without Abstraction: C Model Checker (CMC) [Musuvathi, Engler, Yang, Twohey, Park, Chou, Dill, 2002-now] Designed for reactive systems: supply an input, wait until system quiesces, record the resulting state. State = everything reachable from specified global vars (no call stack!) Perform heap canonicalization while traversing heap Explicit-state MCer with traditional optimizations: Sub-structure sharing for states on the DFS stack Hash compaction: store 4-8 byte hashes (also called fingerprints) of visited states, instead of the states.

November 2005Scott Stoller, Stony Brook University11 CMC: Heuristic, Properties Checked An unsound abstraction (heuristic): ignore "less interesting" parts of the state when hashing, to avoid exploring similar states. Example: When checking filesys, hash only the state of the filesys, not other parts of the heap or thread stacks. Check for general C programming errors (dangling pointers, array out-of-bounds, memory leaks, etc.) and application-specific properties.

November 2005Scott Stoller, Stony Brook University12 CMC: Applications: Protocols 3 implementations of AODV (ad-hoc on-demand distance vector) routing protocol [OSDI 2002], about 7 KLOC each Found 34 bugs Linux TCP [NSDI 2004]: 50KLOC, plus rest of kernel. Found 4 bugs. State vector: 200 KB. 55% code coverage 92% protocol coverage (code coverage in their reference implementation of TCP: a state machine implemented in 0.5 KLOC).

November 2005Scott Stoller, Stony Brook University13 CMC: Applications: Filesystems Linux Filesystems: ext3, JFS, and ReiserFS [OSDI 2004]. Found critical errors in all three, most patched within a day. Found 32 bugs total. Each entry on stack is 1-3 MB. Non-determinism is used for: Arguments of system calls Branches dependent on environment variables and time (this avoids need for constraints) Success of memory allocation.

November 2005Scott Stoller, Stony Brook University14 Outline Introduction to Model Checking (MC) Software MC Success Stories Research Directions Partial-order reduction Heuristic search MC + symbolic execution Concrete (little abstraction) MC of Java, C, machine code Heap abstractions Environment modeling

November 2005Scott Stoller, Stony Brook University15 Partial-Order (Commutativity) Reductions An interesting state may be reachable by many paths that differ only in the order of operations that commute with each other. Goal of POR: Explore only one of those paths. This avoids exploring and storing intermediate states on the other paths. Note: SDV and CMC mostly ignore concurrency. Traditional PORs are ineffective for shared-variable concurrent programs, because all reads and writes to shared variables are classified as non-commuting.

November 2005Scott Stoller, Stony Brook University16 Lock-Based Partial-Order Reductions In many programs, most shared variables are protected by locks: a thread must hold that lock when accessing the variable. Accesses to a lock-protected variable commute with concurrent transitions of other threads, because those transitions cannot be accesses to that variable. PORs specialized to exploit locking (mutual exclusion): [Stoller 2000], [Flanagan and Qadeer 2003], and [Dwyer, Hatcliff, et al., 2004]. Used in JPF, Bogor, Zing, etc. Very effective in programs that use locks. [Qadeer, Rajamani, and Rehof 2004]: lock-based POR and procedure summarization.

November 2005Scott Stoller, Stony Brook University17 Heuristic Search: Property, Program Structure Search algorithms: A*, beam search, best-first (greedy) search, genetic algorithms Property-based heuristics [Leue, Edelkamp, et al., 2001]: distance in control-flow graph (CFG) to an assertion distance in property automaton to closest error state. Program-based heuristics [Groce and Visser, 2004]: Branch count: prefer uncovered branches, otherwise non-branch instructions, otherwise infrequently taken branches. Choose-free: avoid transitions that perform non- deterministic choice introduced by abstraction.

November 2005Scott Stoller, Stony Brook University18 Heuristic Search: State Change Favor transitions that cause larger state changes (relative to initial state) or cause variables to take on less frequented values Used in DIDUCE [Hangal and Lam 2002] and CMC [Musuvathi+, 2002].

November 2005Scott Stoller, Stony Brook University19 Heuristic Search: Concurrency Most blocked (for deadlocks): favor transitions that cause a thread to block Lock-order: favor execution of threads involved in lock- order conflicts (acquiring locks in different orders) found during runtime monitoring [Havelund 2000]. Races [Havelund 2000]: favor execution of threads involved in races found during runtime monitoring. Synchronization coverage: try to reach each synchronization statement once in a state where it blocks and once in a state where it does not block [Bron, Farchi, Magid, Nir, and Ur 2005].

November 2005Scott Stoller, Stony Brook University20 Heuristic Search: Concurrency Context-bounded MC: favor executions with fewer context switches. Specifically, explore all schedules with 1 context switch, then with 2 context switches, etc. Especially useful with state-less search [Qadeer and Rehof 2005]

November 2005Scott Stoller, Stony Brook University21 MC + Symbolic Execution Symbolic execution (SymEx): a static analysis in which inputs are represented by symbols, and computed values are represented by expressions and constraints. Example: f(x,y) { int z=x; while (z>0) { z = z-y;...}... } Sequence of states: z=X. z=X-Y. z=X-2Y. … And constraints, e.g., after 1 iteration, X-Y>0 in loop body, X-Y≤0 at the point after the loop. Backtrack if the accumulated constraints, called the path condition, are not satisfiable. Unsound abstraction: bound the length of explored executions, to ensure termination.

November 2005Scott Stoller, Stony Brook University22 Testcase Generation Using MC + SymEx in JPF [Visser+ 2004] Goal: Find all non-isomorphic valid inputs up to a given size, by MC + SymEx of Java code for each method’s precondition. Case Study: Red-black trees. Use MC + SymEx to create symbolic states representing these inputs. Non-reference values are handled as on previous slide. When a symbolic reference is used, materialize it to a known value, using non-deterministic choice between existing references (instances) of that type and a new one. Use constraint solver to materialize symbolic values in the resulting symbolic states.

November 2005Scott Stoller, Stony Brook University23 Testcase Generation using MC + SymEx in XRT [Grieskamp, Tillmann, and Schulte, 2005] XRT is an explicit-state MC that supports symbolic execution with constraints. Input language: CIL, the intermediate language of Microsoft's CLR. Intended for testcase generation in unit testing: symbolically explore all feasible paths in the model (specification), and use constraint solver to create high- coverage test suite from the resulting path conditions. XRT is the next generation of Spec#. Developers will be able to write models in any language that compiles to CIL (e.g., C# or VB).

November 2005Scott Stoller, Stony Brook University24 Bounded Verification of JML Specifications Using MC + SymEx [Robby et al., 2005] JML specifications are mainly pre-conditions and post- conditions for methods. Start in a symbolic state with the method pre-condition asserted as a constraint. Symbolically execute the method. Use non-determinism to materialize symbolic references. Check whether the post-condition is satisfied in the symbolic states at method exit points. Instead of materializing the symbolic states into testcases. More efficient with stronger pre-conditions.

November 2005Scott Stoller, Stony Brook University25 Symbolic + Concrete Execution in DART: Directed Automated Random Testing [Godefroid, Klarlund, Sen 2005] [Cadar and Engler 2005] Goal: generate test suites that cover of all feasible execution paths up to a given length in a program. Start with a symbolic and a random concrete input. Run the program, with concrete and symbolic execution, accumulating the path condition φ1 ...  φn. Use constraint solver to find an input that satisfies φ1  …  φ{n-1}  ¬φn, or if we already explored the corresponding path, φ1 and... and ¬φ{n-1} Select random values for unconstrained inputs. Repeat. Case Study: oSIP, a multi-media protocol. 30KLOC.

November 2005Scott Stoller, Stony Brook University26 Concrete MC of Java Concrete MC: little use of sound abstractions. Use (well chosen) unsound abstractions. Motivation: Fewer false alarms. Much easier to apply to complex systems. Very effective for defect detection. Java Path Finder (JPF) [Visser+, 2000+] Explicit-state MC, in the SPIN [Holzmann] tradition, based on a JVM that can perform checkpointing and efficient backtracking. Can handle on the order of 10 KLOC (hence the focus on testcase generation, etc.).

November 2005Scott Stoller, Stony Brook University27 Concrete MC of Java: Bandera, Bogor [Dwyer, Hatcliff, …, 2000+] Bandera: a toolset for software MC. Property specification language, program analyses and transformations (new slicing algorithms, data abstraction,...), path exploration tool, etc. Translates Java to intermediate representation to model checker input language. Last step implemented for multiple model checkers. Bogor: model checker, similar in concept to JPF, with its own extendable input language. Case studies: Java Grande benchmarks, Siena publish- subscribe middleware.

November 2005Scott Stoller, Stony Brook University28 Concrete MC of C: VeriSoft [Godefroid 1997+] Goal: MC single-threaded multi-process systems, implemented in C, up to bounded execution depth. Intercept non-deterministic operations (scheduling decisions, calls to Verisoft.random in test harness). Systematically try all possibilities. VeriSoft stores no states! It stores transitions on a search stack. Backtracking is implemented by restart+replay. Use POR to reduce redundant exploration of states. Applied successfully to large telecom systems.

November 2005Scott Stoller, Stony Brook University29 Concrete MC of C: C Model Checker (CMC) [Musuvathi, Engler, et al., 2002+] Discussed earlier

November 2005Scott Stoller, Stony Brook University30 Concrete MC of C: C Bounded Model Checker (CBMC) [Kroening, Clarke, et al., 2003+] Translate C program into a Boolean formula Φ representing all of the executions up to given length bound, by unwinding the transition relation and introducing fresh variables for each intermediate state. Use SAT solver to try to satisfy Φ  ¬correct. A satisfying assignment is a counterexample to correct. SAT solvers can handle formulas with millions of vars, tens of millions of clauses in CNF. CBMC verified equivalence of Verilog implementations and C specifications of DES and a simple CPU.

November 2005Scott Stoller, Stony Brook University31 Concrete MC of Machine Language: CHESS [Qadeer & Rehof, 2005] Why MC machine language? C statements are not atomic. Interrupt-driven embedded software. Weak memory models Translate machine instructions into calls to functions in an x86 simulator written in Zing. Zing [Rajamani+, 2003+]: explicit-state MC for a small OO language. Supports POR, procedure summaries, software transactions, context-bounded MC, stateful and state-less search. Applications so far are compiled from about 3 KLOC of C.

November 2005Scott Stoller, Stony Brook University32 Concrete MC of Machine Language: Estes [Mercer and Jones, 2005] Use debugger to compute transitions of the program. Load process state from MC into GDB, set desired breakpoint, let GDB execute to the breakpoint, extract the state from GDB. Cycle-accurate simulation Easily allows different granularity for different transitions. Applications: interrupt-driven embedded software. Applications so far are a few hundred lines of C + ASM.

November 2005Scott Stoller, Stony Brook University33 Heap Abstraction Predicate abstraction + CEGAR works well for model checking properties that do not depend on details of heap. Recall: Abstractions are defined by sets of nullary predicates (implicitly parameterized by the current state). Example: px(): x>0, py(): *y.next ≠NULL. Effective automatic abstraction for heap-intensive programs/properties is still a challenge. TVLA [Reps, Sagiv, et al., 2000+] is a framework for verification of heap-intensive properties. Abstractions are defined by sets of predicates with any arity. Example: pn(v1,v2): *v1.next == v2 where v1,v2 range over Node.

November 2005Scott Stoller, Stony Brook University34 Heap Abstraction An abstract heap (shape graph) contains: individual nodes (each representing one object), summary nodes (each representing one or more objects) truth values (true, false, unknown) for the predicates, with the nodes of the abstract heap as arguments Transfer functions describe effect of program statements on abstract heap. TVLA defines core predicates + transfer functions for lists. Application-specific predicates + transfer functions also needed. Good progress on computing them automatically. Deep analysis of small programs. Not yet scalable.

November 2005Scott Stoller, Stony Brook University35 Environment Modeling Model checking a component requires Driver: model of components that call it Stubs: models of components (e.g., libraries) it calls Writing them with appropriate abstraction can be difficult. Static Driver Verifier project invested an enormous amount of time in environment modeling. Verification of TCP using CMC: "One of the surprising results was that it was easier to run the entire Linux kernel in … CMC … than extract out TCP in a stand-alone version." Their stubs led to too many false alarms. MC of Java programs: Modeling Java API is an obstacle.

November 2005Scott Stoller, Stony Brook University36 Outline Introduction to Model Checking (MC) Software MC Success Stories Research Directions Partial-order reduction Heuristic search MC + symbolic execution Concrete (little abstraction) MC of Java, C, machine code Heap abstractions Environment modeling