Mining Specifications
Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)
(lots of) code → specifications


2 Verification: beyond engine-less cars
Recent successes: specification languages, checkers, abstractors.
What’s still missing? Specifications.
Drivers wanted.

3 So who formulates specifications?
Programmers? Probably not.
Why they won’t:
  - too busy
  - yet another language to learn?
  - specifications aren’t cool
Why they shouldn’t:
  - may misunderstand usage rules
  - may not know all usage rules
Mining specifications:
  - Convenience.
  - As in data mining, discover surprising rules.

4 Advantages of mining
Exploits the massive programmer effort reflected in the code.
Programmers have already resolved many problems:
  - incomplete system requirements
  - incomplete API documentation
  - implementation-dependent rules
Want redundancy (without redundant programming)? Ask multiple programmers (and vote).

5 Our output: a specification
[Figure: finite-state machine over the socket API, with edges labeled x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x)]

6 How do we mine?
Underlying premise: Even bad software is debugged enough to show hints of correct behavior.
⇒ Maxim: Common usage is the correct usage.

7 Mining = machine learning
Reduce the problem to the well-known problem of learning regular languages.
Obstacles:
  1. bugs in the source code may be learned into the specification
  2. what is “common” behavior?
Solutions:
  1. learn from dynamic behavior
  2. learn probabilistically
⇒ learn from traces into probabilistic FSMs
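A minimal sketch of what "learn from traces into probabilistic FSMs" can look like. It is not the learner used in the talk: it assumes the simplest possible model, one state per most recent event, and estimates transition probabilities by counting. The function name `learn_pfsm`, the `<start>`/`<end>` markers and the three scenarios (including the buggy one) are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"

def learn_pfsm(scenarios):
    """Estimate P(next event | previous event) by counting over scenarios.

    Assumption: each state is identified with the last event seen.
    """
    counts = defaultdict(Counter)
    for scenario in scenarios:
        prev = START
        for event in list(scenario) + [END]:
            counts[prev][event] += 1
            prev = event
    pfsm = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        pfsm[state] = {nxt: n / total for nxt, n in successors.items()}
    return pfsm

# Hypothetical usage scenarios over the socket API.
scenarios = [
    ["x = socket()", "bind(x)", "listen(x)", "y = accept(x)",
     "read(y)", "write(y)", "close(y)", "close(x)"],
    ["x = socket()", "bind(x)", "listen(x)", "y = accept(x)",
     "read(y)", "write(y)", "read(y)", "write(y)", "close(y)", "close(x)"],
    # Hypothetical buggy run: the accepted connection is never closed.
    ["x = socket()", "bind(x)", "listen(x)", "y = accept(x)", "close(x)"],
]
pfsm = learn_pfsm(scenarios)
for state, successors in pfsm.items():
    print(state, "->", successors)
```

In this toy model the buggy transition from y = accept(x) straight to close(x) gets a lower probability than the common one, which is the property the probabilistic approach relies on.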

8 Input: trace(s)
    7 = socket(2, 1, 0);
    bind(7, 0x400120, 16);
    listen(7, 5);
    8 = accept(7, 0x400200, 0x400240);
    read(8, 0x400320, 255);
    write(8, 0x400320, 12);
    read(8, 0x400320, 255);
    write(8, 0x400320, 7);
    close(8);
    10 = accept(7, 0x400200, 0x400240);
    read(10, 0x400320, 255);
    write(10, 0x400320, 13);
    close(10);
    close(7);
    …
[Figure: the specification FSM from slide 5, with edges labeled x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x)]
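A small sketch of how a trace in the format shown on this slide could be turned into structured events before any abstraction. The regular expression, the function name `parse_trace` and the tuple layout are assumptions made for illustration; the talk does not describe the tool's actual trace format handling.

```python
import re

# Optional "result = " prefix, then a call name and its argument list.
EVENT = re.compile(r"(?:(\w+)\s*=\s*)?(\w+)\((.*?)\)")

def parse_trace(text):
    """Split a ';'-separated trace into (call, args, result) tuples."""
    events = []
    for statement in text.split(";"):
        match = EVENT.search(statement)
        if match:
            result, call, args = match.groups()
            arg_list = [a.strip() for a in args.split(",") if a.strip()]
            events.append((call, arg_list, result))
    return events

trace = "7 = socket(2, 1, 0); bind(7, 0x400120, 16); listen(7, 5); close(7);"
events = parse_trace(trace)
for event in events:
    print(event)
```

Keeping the result value next to the arguments makes it possible to follow value flow later, for example from the 7 returned by socket to the 7 passed to bind.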

9 The mining algorithm
dynamic execution (traces)
  → trace abstraction → usage scenarios (strings)
  → (off-the-shelf) RegExp learner → generalized scenarios (probabilistic NFA)
  → extract heavy core (and approve) → specification (NFA)
  → dynamic checker, applied to a dynamic execution to be checked (trace) → OK/bug
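The last two stages of the pipeline can be sketched as follows. This is an illustration under stated assumptions, not the tool's algorithm: the "heavy core" is approximated by dropping transitions whose probability falls below a threshold, states are identified with the most recent event, and the automaton `pfsm`, the threshold 0.2 and the function names are all hypothetical.

```python
START, END = "<start>", "<end>"

def heavy_core(pfsm, threshold):
    """Keep only transitions whose probability is at least `threshold`."""
    return {
        state: {nxt for nxt, p in successors.items() if p >= threshold}
        for state, successors in pfsm.items()
    }

def check(spec, trace):
    """Replay a trace against the specification and report OK or a bug."""
    state = START
    for i, event in enumerate(list(trace) + [END]):
        if event not in spec.get(state, set()):
            return f"bug at event {i}: {event!r} after {state!r}"
        state = event
    return "OK"

# Hypothetical probabilistic automaton for an open/read/close API.
pfsm = {
    START: {"open": 1.0},
    "open": {"read": 0.9, END: 0.1},   # rarely, a run ends without close
    "read": {"read": 0.5, "close": 0.5},
    "close": {END: 1.0},
}
spec = heavy_core(pfsm, threshold=0.2)

print(check(spec, ["open", "read", "read", "close"]))
print(check(spec, ["open"]))
```

The first trace is accepted; the second ends right after open, a transition that was pruned as uncommon, so it is reported as a bug. In the pipeline on the slide a human approves the extracted core before it is used for checking.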

10 Trace abstraction: 4 challenges
  1. Traces interleave useful and useless events.
     The RegExp learner cannot separate them.
  2. Specifications must include both temporal and value-flow constraints.
     The RegExp learner is only good with temporal constraints.
  3. Only some of the API calls’ arguments impose “true” dependences.
     It is infeasible to learn value-flow constraints on all arguments.
  4. Specifications may impose only a partial order.
     Encoding all legal partial orders would produce a huge FSM.

11 Trace abstraction
Scenario template: h(_, ) a(, ) d(, ) b(_, ) e( )
Raw trace:
    h(3, 5) c(10) a(4, 5) d(4, 7) b(0, 5) f(10) h(8, 11) e(7) f(50) d(15, 1) c(7) a(9, 11) b(6, 7) d(9, 14) f(20) e(7) …
Partially abstracted trace:
    h(_, 5) c(10) a(4, 5) d(4, 7) b(_, 5) f(10) h(_, 11) e(7) f(_) d(_, _) c(7) a(9, 11) b(_, 11) d(9, _) e(_) f(_) …
Scenarios with standardized names:
    h(_, X) a(Y, X) b(_, X) d(Y, Z) e(Z)
    h(_, X) a(Y, X) d(Y, Z) b(_, X) e(Z)
    h(_, X) a(Y, X) b(_, X) d(Y, Z)
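One way to read the name standardization on this slide is sketched below. The renaming rule is an assumption, not taken from the talk: a value that occurs in only one event of the scenario is treated as irrelevant and becomes "_", and every other value gets a name (X, Y, Z, ...) in order of first appearance. The function name `standardize` and the choice of the five input events are hypothetical.

```python
from collections import Counter
from string import ascii_uppercase

def standardize(scenario):
    """Replace concrete argument values with scenario-local names.

    Assumed rule: values used by a single event become "_"; shared values
    are named X, Y, Z, A, B, ... in order of first appearance.
    """
    uses = Counter(v for _, args in scenario for v in set(args))
    fresh = iter("XYZ" + ascii_uppercase[:23])
    names = {}
    abstract = []
    for call, args in scenario:
        renamed = []
        for value in args:
            if uses[value] < 2:
                renamed.append("_")
            else:
                if value not in names:
                    names[value] = next(fresh)
                renamed.append(names[value])
        abstract.append(f"{call}({', '.join(renamed)})")
    return abstract

# Five related events picked by hand from the raw trace on the slide.
scenario = [("h", (3, 5)), ("a", (4, 5)), ("d", (4, 7)), ("b", (0, 5)), ("e", (7,))]
abstract = standardize(scenario)
print(" ".join(abstract))  # h(_, X) a(Y, X) d(Y, Z) b(_, X) e(Z)
```

After this step two scenarios that use different concrete values but the same value flow map to the same string, which is what lets a string learner generalize over them.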

12 Preliminary experiments
Attempted to learn and verify two published X Windows rules. As of Friday:
  1. A timestamp-passing rule
     - learned the rule! (compact: 6 states)
     - bugs in 2 out of 17 programs (ups, e93)
  2. SetOwner(x) must be followed by GetSelection(x)
     - failed to learn the rule (small learning set)
     - but bugs in 2 out of 5 programs (xemacs, ups)

13 Related work
Arithmetic pre/post conditions
  - Daikon, Houdini
  - properties orthogonal to ours
  - eventually, we may need to include and learn some arithmetic relationships
Temporal relationships over calls
  - intrusion detection: [Ghosh et al], [Wagner and Dean]
  - software processes: [Cook and Wolf]
  - error checking: [Engler et al SOSP 2001]
      lexical and syntactic pattern matching
      user must write templates (e.g., always follows )

14 Ongoing work
Mechanize tool.
Find more gold.

15 Future work
Give gold to jewelers: SPIN, Vault, Verisoft, SLAM, ESP, …
[Figure labels: code, Mining, specifications, inputs, bugs, ?]

16 Summary
Semi-automatically creating well-formed, non-trivial specifications is an important part of the verification tool chain.
Contributions:
  - introduced specification mining
  - phrased it as probabilistic learning from dynamic traces
  - decomposed it into a sequence of subproblems (using an off-the-shelf learner)
  - developed a dynamic checker
  - found bugs

17 Discussion
Expressibility
  - what classes of properties can/should we learn?
  - can we learn more than we can check?
  - can a single-threaded specification avoid race conditions?

Backup Slides