02/12/2014 Presenter: Yuanhang Wang Instructor: Christoph Csallner 1 Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael.


02/12/2014 Presenter: Yuanhang Wang Instructor: Christoph Csallner Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst, Jake Cockrell, William G. Griswold, and David Notkin

Relationship between invariants and programs: Invariants play a central role in program development, through refining a specification into a correct program, static verification of invariants such as type declarations, and runtime checking of invariants encoded as assert statements. Invariants play an equally critical role in software evolution: they can protect a programmer from making changes that inadvertently violate assumptions upon which the program's correct behavior depends.

One known conclusion: An alternative to expecting programmers to annotate code with invariants is to automatically infer invariants. In this research, we focus on the dynamic discovery of invariants: we execute a program on a collection of inputs and extract variable values from which we then infer invariants. Next: two related results.

Two related results: (1) A set of techniques, and an implementation, for discovering invariants from execution traces. (2) The application of the inference engine to two sets of target programs.

REDISCOVERY OF INVARIANTS Many of the book’s programs include preconditions, postconditions, and loop invariants that embody important properties of the computation.

REDISCOVERY OF INVARIANTS: The input was 100 randomly generated arrays of length 7 to 13, with elements in the range -100 to 100. B[-1] is shorthand for B[size(B)-1], the last element of array B, and var_orig represents var's value at the start of execution.

INVARIANT DETECTION ENGINE: We detect invariants from program executions by instrumenting the source program to trace the variables of interest. Instrumentation captures the values of variables so that patterns can be detected among those values. The program points selected for inserting instrumentation are entry points, loop heads, and exit points.

INVARIANT DETECTION ENGINE: For every instrumented program point, the trace file contains a list of sets of values, one value per instrumented variable. For instance, if procedure p has two formal parameters, is in the scope of three global variables, and is called twelve times, then when computing a precondition for p the invariant engine would be presented a list of twelve elements, each element being a set of five variable values.
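The trace layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the procedure `p`, the globals `g1` to `g3`, the `record` helper, and the program point names are all hypothetical and are not taken from the authors' instrumenter.

```python
from typing import Any, Dict, List

# Hypothetical global variables in scope of the traced procedure.
g1, g2, g3 = 10, 20, 30

# trace[program_point] is a list of samples; each sample maps a variable name to its value.
trace: Dict[str, List[Dict[str, Any]]] = {}

def record(point: str, values: Dict[str, Any]) -> None:
    """Append one sample (one value per instrumented variable) at a program point."""
    trace.setdefault(point, []).append(dict(values))

def p(a: int, b: int) -> int:
    # Entry instrumentation: two formal parameters plus three globals.
    record("p:ENTER", {"a": a, "b": b, "g1": g1, "g2": g2, "g3": g3})
    result = a + b + g1
    # Exit instrumentation additionally captures the return value.
    record("p:EXIT", {"a": a, "b": b, "g1": g1, "g2": g2, "g3": g3, "return": result})
    return result

# Twelve calls produce twelve entry samples of five values each.
for k in range(12):
    p(k, 2 * k)

entry_samples = trace["p:ENTER"]
```

A precondition for `p` would then be inferred from `entry_samples` alone, and a postcondition from the samples recorded at `p:EXIT`.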

Inferring invariants: The invariant detector, when provided with the output of an instrumented program, lists the invariants detected at each instrumented program point. These invariants may involve a single variable (a constraint that holds over its values) or multiple variables (a relationship among the values of the variables). Our system checks for the following invariants (x, y, and z are variables, and a, b, and c are computed constants).
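As a rough illustration of single-variable and multi-variable invariants, the sketch below reports only the properties that no sample falsifies. The set of checks (constant value, range, pairwise ordering) is a small assumed subset chosen for the example, not the system's actual list, and `infer_invariants` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List

def infer_invariants(samples: List[Dict[str, int]]) -> List[str]:
    """Report simple properties that hold on every sample (likely invariants)."""
    names = sorted(samples[0])
    found: List[str] = []
    # Single-variable invariants: a constant value, otherwise range limits.
    for x in names:
        values = [s[x] for s in samples]
        if min(values) == max(values):
            found.append(f"{x} == {values[0]}")
        else:
            found.append(f"{min(values)} <= {x} <= {max(values)}")
    # Two-variable invariants: the strongest ordering that no sample falsifies.
    for x, y in combinations(names, 2):
        pairs = [(s[x], s[y]) for s in samples]
        if all(a == b for a, b in pairs):
            found.append(f"{x} == {y}")
        elif all(a < b for a, b in pairs):
            found.append(f"{x} < {y}")
        elif all(a <= b for a, b in pairs):
            found.append(f"{x} <= {y}")
        elif all(a > b for a, b in pairs):
            found.append(f"{x} > {y}")
        elif all(a >= b for a, b in pairs):
            found.append(f"{x} >= {y}")
    return found

# Hypothetical samples captured at one program point.
samples = [{"i": 0, "n": 7}, {"i": 3, "n": 7}, {"i": 6, "n": 7}]
invariants = infer_invariants(samples)
```

Because the properties are only observed, not proved, a larger or different test suite may falsify some of them.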

Inferring invariants: The invariants listed above are inexpensive to test and do not require full-fledged theorem proving. For example, the linear relationship x = ay + bz + c with unknown coefficients a, b, and c and variables x, y, and z has three degrees of freedom. Consequently, three tuples of values for x, y, and z are sufficient to infer the possible coefficients.
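The three-tuple argument can be made concrete with a small sketch. It assumes exact rational arithmetic and Cramer's rule, which are choices made for this example; the original engine may solve the system differently. The names `fit_linear` and `det3` are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Sample = Tuple[int, int, int]  # (x, y, z)

def det3(m) -> Fraction:
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_linear(samples: List[Sample]) -> Optional[Tuple[Fraction, Fraction, Fraction]]:
    """Solve x = a*y + b*z + c from the first three samples, then test every sample."""
    if len(samples) < 3:
        return None
    rows = [[Fraction(y), Fraction(z), Fraction(1)] for _, y, z in samples[:3]]
    rhs = [Fraction(x) for x, _, _ in samples[:3]]
    d = det3(rows)
    if d == 0:
        return None  # these three tuples do not determine unique coefficients
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    a, b, c = coeffs
    # Any sample that violates the relationship falsifies the candidate invariant.
    if all(x == a * y + b * z + c for x, y, z in samples):
        return a, b, c
    return None

# Hypothetical trace in which x = 2*y - z + 5 always holds.
data = [(2 * y - z + 5, y, z) for y, z in [(1, 2), (3, 1), (0, 4), (7, 7), (-2, 5)]]
coefficients = fit_linear(data)
```

This sketch gives up when the first three tuples are degenerate; a fuller version would look for another independent triple before rejecting the candidate.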

Negative invariants (Figure 2): The probability limit is 0.01. If the system checks millions of potential invariants, reporting thousands of false positives is likely to be unacceptable.
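One way to read the 0.01 limit is as a bound on the probability that a property holds purely by chance. The sketch below applies that idea to a candidate negative invariant x != 0. It assumes values are uniformly distributed over the observed range; that assumption, the formula, and the function names are illustrative and are not stated on the slide.

```python
from typing import Sequence

def chance_probability(range_size: int, num_samples: int) -> float:
    """Probability that a uniformly distributed variable never takes one given
    value out of range_size possible values across num_samples observations."""
    return (1.0 - 1.0 / range_size) ** num_samples

def report_nonzero(values: Sequence[int], limit: float = 0.01) -> bool:
    """Report x != 0 only if zero lies strictly inside the observed range, never
    occurs, and its absence is unlikely to be a coincidence."""
    lo, hi = min(values), max(values)
    if 0 in values or not (lo < 0 < hi):
        return False
    range_size = hi - lo + 1
    return chance_probability(range_size, len(values)) < limit

# Twenty samples over a range of three values: the absence of 0 is significant.
many_samples = report_nonzero([-1, 1] * 10)
# Three samples over a range of 201 values: the absence of 0 proves nothing.
few_samples = report_nonzero([-100, 100, 5])
```

With few samples over a wide range, almost any single value is absent by chance, so the candidate is suppressed rather than reported as a false positive.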

Negative invariants (Figures 2 and 3): For Figure 3, array lengths and element values were chosen from exponential distributions, rather than from uniform distributions as in Figure 2.

Derived variables: In addition to manifest variables explicitly passed to the engine, we need to compute relations over non-manifest quantities. For instance, given array a and integer lasti, the element a[lasti] may be of interest even if that expression does not appear in the program text.

Derived variables: Therefore, we add certain “derived variables” (actually expressions) to the list of variables given to the engine as input. These derived variables include the following. From any array: first and last elements, length. From numeric array: sum, min, max. From array and scalar: element at that index (a[i]), subarray up to, and subarray beyond, that index. From function invocation: number of calls so far.

Derived variables: Derived variables permit the engine to infer invariants that are not hard-coded into its list. For instance, if len(A) is derived from array A, then the engine can determine that i < len(A) without hardcoding a less-than comparison check for the case of a scalar and the length of an array. In this manner, the implementation can report compound relations that we did not necessarily anticipate.
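The i < len(A) example can be sketched as follows. The helper names `add_derived` and `always_less` are hypothetical, the sample data is invented, and subarray derivations are left out to keep the example short.

```python
from typing import Dict, List, Union

Value = Union[int, List[int]]

def add_derived(sample: Dict[str, Value]) -> Dict[str, Value]:
    """Return a copy of the sample extended with derived variables."""
    out = dict(sample)
    arrays = {k: v for k, v in sample.items() if isinstance(v, list)}
    scalars = {k: v for k, v in sample.items() if isinstance(v, int)}
    for name, arr in arrays.items():
        out[f"len({name})"] = len(arr)
        if arr:
            out[f"{name}[0]"] = arr[0]
            out[f"{name}[-1]"] = arr[-1]
            out[f"sum({name})"] = sum(arr)
            out[f"min({name})"] = min(arr)
            out[f"max({name})"] = max(arr)
        # From array and scalar: the element at that index, when it is in bounds.
        for idx_name, idx in scalars.items():
            if 0 <= idx < len(arr):
                out[f"{name}[{idx_name}]"] = arr[idx]
    return out

def always_less(samples: List[Dict[str, Value]], x: str, y: str) -> bool:
    """Generic scalar comparison; it contains no array-specific logic."""
    return all(x in s and y in s and s[x] < s[y] for s in samples)

raw = [{"A": [4, 8, 15], "i": 0}, {"A": [16, 23], "i": 1}, {"A": [42, 7, 7, 1], "i": 2}]
extended = [add_derived(s) for s in raw]
index_in_bounds = always_less(extended, "i", "len(A)")
```

The comparison routine only ever sees named scalar values, so the relation between `i` and `len(A)` falls out of the ordinary two-variable check.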

USE OF INVARIANTS Performing the Change

USE OF INVARIANTS: The basic code in makepat (Figure 4) determines whether the next character in the input is CLOSURE; if so, it calls the “star closure” function, stclose (Figure 5), under most conditions (and the exceptions should not differ for plus closure). We mimicked the constant CLOSURE of value ’*’ with the constant PCLOSURE of value ’+’.

USE OF INVARIANTS: To determine appropriate modifications for plclose, we studied stclose. Our initial, static study of the program determined that the compiled pattern is stored in a 100-element array named pat. We speculated that the uses of array pat in stclose’s loop manipulate the pattern that is the target of the closure operator, adding characters to the compiled pattern using the function addstr.

USE OF INVARIANTS: We wanted to verify that the loop was indeed entered on every call to stclose. The loop’s exit condition says the loop would not be entered if *j were equal to lastj, so we examined the invariants inferred for them on entry to stclose. The third invariant implies that the loop body may not be executed (if lastj = *j, then jp is initialized to lastj - 1), which was inconsistent with our initial belief.

USE OF INVARIANTS: To find the offending values of lastj and *j, we queried the trace database for calls to stclose in which lastj = *j, since these are the cases when the loop is not entered. The query returned several calls in which the value of *j is 101 or more, exceeding the size of the array pat. We soon determined that, in some instances, the compiled pattern is too long, resulting in an unreported array bounds error.
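A query of this kind can be sketched over an in-memory list of samples. The data below is invented for illustration, the key "j" stands in for *j, and the real trace database and its query interface are not shown in the presentation.

```python
from typing import Dict, List

# Hypothetical entry samples for stclose; the numbers are invented for illustration.
stclose_entry: List[Dict[str, int]] = [
    {"call": 1, "lastj": 3, "j": 5},
    {"call": 2, "lastj": 101, "j": 101},
    {"call": 3, "lastj": 40, "j": 47},
    {"call": 4, "lastj": 103, "j": 103},
]

PAT_SIZE = 100  # size of the pat array, as stated in the text

def loop_not_entered(samples: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Select the calls in which lastj equals *j, so the loop body never runs."""
    return [s for s in samples if s["lastj"] == s["j"]]

offending = loop_not_entered(stclose_entry)
# Calls whose index lies beyond the end of the pat array.
out_of_bounds = [s["call"] for s in offending if s["j"] > PAT_SIZE]
```

Going from the summarizing invariant back to the individual samples is what exposes the specific calls behind the surprise.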

USE OF INVARIANTS: One reported invariant indicates that pat is always updated. To determine what stclose does to pat, we queried the trace database for values of pat at the entry and exit of stclose.

USE OF INVARIANTS: This suggests that the program compiles literals by prefixing them with the character c and puts Kleene-* expressions into prefix form. (A coauthor who was not performing the change independently discovered this fact through careful study of the program text.) The negative indexing and assignment of * into position lastj moves the closed-over pattern rightward in the array to make room for the prefix *. For a call to plclose, the result for the above test case should be cacb*cb, which would match one or more instances of character b rather than zero or more.

USE OF INVARIANTS: Most invariants remained unchanged, while some differing invariants verified our program modifications. Whereas stclose has the invariant *j = *j_orig + 1, plclose has the invariant *j ≥ *j_orig + 2. This difference is expected, since the compilation of Kleene-+ replicates the entire target pattern, which is two or more characters long in its compiled form.
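The difference between the two invariants can be reproduced on a toy model of the compiled pattern. The sketch assumes the pattern is a Python string, that the test case compiles to cacb before the closure is applied, and that lastj marks the start of the closure target; the functions are simplified stand-ins, not the C code of replace.

```python
def star_close(pat: str, lastj: int) -> str:
    """Put the target pattern that starts at lastj into prefix Kleene-* form."""
    return pat[:lastj] + "*" + pat[lastj:]

def plus_close(pat: str, lastj: int) -> str:
    """One or more: keep one copy of the target, then append a starred copy."""
    return pat + "*" + pat[lastj:]

# Assumed input: literals a and b compiled as "ca" and "cb"; the target is "cb".
pat, lastj = "cacb", 2

starred = star_close(pat, lastj)
plussed = plus_close(pat, lastj)

# The end index grows by exactly one for star closure ...
star_growth = len(starred) - len(pat)
# ... and by at least two for plus closure, since the target is replicated.
plus_growth = len(plussed) - len(pat)
```

Here the star closure yields ca*cb and the plus closure yields cacb*cb, so the growth is 1 in the first case and 3 in the second, in line with the two reported invariants.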

Invariants for makepat: Parameter start (tested in Figure 4) is always 0. No invariant between lj and j was reported. number_of_calls(in_set_2) = number_of_calls(stclose).

What did these invariants indicate? These invariants indicated that makepat is used in more specialized contexts than we anticipated, saving us considerable effort in understanding its role in pattern compilation. Rather than helping us modify the program, these invariants instead suggest a need to run more test cases to expose more of replace’s special-case behavior and produce more accurate invariants.

Why are dynamically detected invariants useful? First, inferred invariants are a succinct abstraction of a mass of data contained in the trace database. Second, queries against the trace database can help programmers delve deeper when unexpected invariants appear or when expected invariants do not appear. Third, queries against the database can also be used to build intuition about the source of an invariant. Fourth, inferred invariants provide a suitable basis for the programmer’s own, more complex inferences. Finally, invariants provide a beneficial degree of serendipity.

Why are dynamically detected invariants useful? No technique can make it possible to evolve systems that were previously intractable to change!

SCALABILITY: Our goal is to identify quantitative, observable factors that a user can control to manage the time and space requirements of the invariant engine.

SCALABILITY: We instrumented and ran replace on subsets of the 5542 test cases supplied with the program, including runs over 500, 1000, 1500, 2000, 2500, and 3000 randomly chosen test inputs, where each set is a subset of the next larger one. Figure 7: If one run has v1 variables and a runtime of t1, and the other has v2 variables and a runtime of t2, the x axis measures v2/v1 and the y axis measures t2/t1. Figure 7 suggests that invariant detection time grows quadratically with the number of variables over which invariants are checked. The number of possible binary invariants (relationships over two variables) is also quadratic in the number of variables at a program point.
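The quadratic count of binary invariants follows from counting unordered pairs of variables, v(v - 1)/2. A short sketch, with hypothetical variable names, shows the growth when the number of variables doubles.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def binary_candidates(variables: Sequence[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of variables is a potential two-variable relationship."""
    return list(combinations(variables, 2))

def pair_count(v: int) -> int:
    """Closed form for the number of unordered pairs among v variables."""
    return v * (v - 1) // 2

ten_vars = [f"v{k}" for k in range(10)]
twenty_vars = [f"v{k}" for k in range(20)]

pairs_10 = len(binary_candidates(ten_vars))
pairs_20 = len(binary_candidates(twenty_vars))
# Doubling the variables roughly quadruples the pairs to check.
growth = pairs_20 / pairs_10
```

Derived variables add to the variable count as well, which is one reason the choice of instrumented variables matters for runtime.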

Test suite size: Although runtime is (for the most part) linearly related to test suite size, the slopes of these relationships vary considerably, as can be seen in the divergent lines of Figure 8.

Test suite size Figure 9 plots these slopes against the total number of original variables (the variables in scope at the program point).

Test suite size: The number of samples (number of times a particular program point is executed) is linearly related to test suite size (number of program runs). The number of distinct values is also well correlated with the number of samples. Finally, the number of pairs of values is correlated with the number of values, but apparently not with the ratio of scalar to sequence variables.

Invariant stability: Figures 11 and 12 chart the number of identical, missing, and different invariants detected between two sets of test cases, where the smaller is a subset of the larger. Missing invariants are invariants that were detected in one of the test suites but not in the other.
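The three categories can be sketched as a comparison of two invariant sets. The representation is an assumption made for this example: each invariant is keyed by its form and variables, and "different" is taken to mean the same key with a different reported text (for instance, a changed bound). The invariants themselves are invented.

```python
from typing import Dict, Tuple

Key = Tuple[str, ...]  # invariant form followed by the variables involved

def compare(small: Dict[Key, str], large: Dict[Key, str]) -> Dict[str, int]:
    """Count identical, different, and missing invariants between two suites."""
    shared = set(small) & set(large)
    identical = sum(1 for k in shared if small[k] == large[k])
    different = len(shared) - identical
    # Missing: detected in one of the test suites but not in the other.
    missing = len(set(small) ^ set(large))
    return {"identical": identical, "different": different, "missing": missing}

small_suite = {
    ("lower_bound", "x"): "x >= 3",
    ("eq", "i", "n"): "i == n",
    ("const", "start"): "start == 0",
}
large_suite = {
    ("lower_bound", "x"): "x >= 1",
    ("eq", "i", "n"): "i == n",
}
summary = compare(small_suite, large_suite)
```

In this toy data the bound on `x` differs, `i == n` is identical, and the constant `start` is falsified by the larger suite and therefore missing from it.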

Invariant stability: Comparing a test suite of size 1000 to one of size 3000, a difference in a bound for a variable is more likely to be a peculiarity of the data than a significant difference that will change a programmer’s conception of the program’s operation.

Improvements to the Approach: When the larger test suite has size 3000, more invariants are different or missing, and these numbers stabilize quickly. When only a part of the program is of interest, the whole program need not be instrumented. Determining properties of global variables or other large-scale structures does not require so many instrumentation points. The choice of variables instrumented at each program point also affects inference performance. Supplying fewer test cases results in faster runtimes at the risk of less precise output.

Conclusion: The importance of invariants. Discovered invariants can be inserted into a program as assert statements to further test the invariant or to ensure that detected invariants are not later violated as code evolves; double-check existing documentation or assert statements; and bootstrap or direct a (manual or automatic) correctness proof. Invariants in the resulting program runs can indicate insufficient coverage of program values, even if every line is executed at least once. A nearly-true invariant may indicate a bug or special case that should be brought to the programmer's attention.
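Inserting discovered invariants as assert statements might look like the sketch below. The array-summing routine and the particular invariants are hypothetical examples chosen for illustration; they are not output copied from the tool.

```python
from typing import List

def array_sum(b: List[int]) -> int:
    """Sum an array, with hypothetical discovered invariants encoded as asserts."""
    b_orig = list(b)  # snapshot used to check that b is not modified
    n = len(b)
    i, s = 0, 0
    while i != n:
        # Loop-head invariants: the index is in bounds and s holds the prefix sum.
        assert 0 <= i < n
        assert s == sum(b[:i])
        s += b[i]
        i += 1
    # Exit invariants: the loop ran to completion, b is unchanged, s is the total.
    assert i == n
    assert b == b_orig
    assert s == sum(b)
    return s

total = array_sum([3, -1, 4, 1, -5])
```

If a later change breaks one of these properties, the assertion fails on the first test input that exposes it, rather than the violation going unnoticed.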

Dynamically Discovering Likely Program Invariants to Support Program Evolution. Michael D. Ernst, Jake Cockrell, William G. Griswold, and David Notkin. Dept. of Computer Science & Engineering, University of Washington; Dept. of Computer Science & Engineering, University of California San Diego.