Michael Ernst, page 1 Improving Test Suites via Operational Abstraction Michael Ernst MIT Lab for Computer Science Joint.

Slides:

Advertisements

Similar presentations

Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?

Advertisements

Copyright W. Howden1 Programming by Contract CSE 111 6/4/2014.

An Abstract Interpretation Framework for Refactoring P. Cousot, NYU, ENS, CNRS, INRIA R. Cousot, ENS, CNRS, INRIA F. Logozzo, M. Barnett, Microsoft Research.

Semantics Static semantics Dynamic semantics attribute grammars

50.530: Software Engineering Sun Jun SUTD. Week 10: Invariant Generation.

Mahadevan Subramaniam and Bo Guo University of Nebraska at Omaha An Approach for Selecting Tests with Provable Guarantees.

Program Representations. Representing programs Goals.

1 Program Slicing Purvi Patel. 2 Contents Introduction What is program slicing? Principle of dependences Variants of program slicing Slicing classifications.

Bug Isolation via Remote Program Sampling Ben Liblit, Alex Aiken, Alice X.Zheng, Michael I.Jordan Presented by: Xia Cheng.

Michael Ernst, page 1 Learning and repair tools background Michael Ernst MIT Lab for Computer Science Joint work with Jake.

1 Integrating Influence Mechanisms into Impact Analysis for Increased Precision Ben Breech Lori Pollock Mike Tegtmeyer University of Delaware Army Research.

Efficient Systematic Testing for Dynamically Updatable Software Christopher M. Hayden, Eric A. Hardisty, Michael Hicks, Jeffrey S. Foster University of.

Software Engineering and Design Principles Chapter 1.

Dynamic Invariant Discovery Modified from Tevfik Bultan’s original presentation.

Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin Presented.

272: Software Engineering Fall 2008 Instructor: Tevfik Bultan Lecture 16: Dynamic Invariant Discovery.

Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael Ernst, Jake Cockrell, William Griswold, David Notkin Presented by.

OOP #10: Correctness Fritz Henglein. Wrap-up: Types A type is a collection of objects with common behavior (operations and properties). (Abstract) types.

Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin Presented.

Data Flow Analysis Compiler Design Nov. 8, 2005.

Michael Ernst, page 1 Static and dynamic analysis: synergy and duality Michael Ernst CSE 503, Winter 2010 Lecture 1.

1 Software Testing Techniques CIS 375 Bruce R. Maxim UM-Dearborn.

Ernst, ICSE 99, page 1 Dynamically Detecting Likely Program Invariants Michael Ernst, Jake Cockrell, Bill Griswold (UCSD), and David Notkin University.

State coverage: an empirical analysis based on a user study Dries Vanoverberghe, Emma Eyckmans, and Frank Piessens.

Automated Diagnosis of Software Configuration Errors

Software Testing Sudipto Ghosh CS 406 Fall 99 November 9, 1999.

Software Testing Verification and validation planning Software inspections Software Inspection vs. Testing Automated static analysis Cleanroom software.

Dr. Pedro Mejia Alvarez Software Testing Slide 1 Software Testing: Building Test Cases.

Procedure specifications CSE 331. Outline Satisfying a specification; substitutability Stronger and weaker specifications - Comparing by hand - Comparing.

Reverse Engineering State Machines by Interactive Grammar Inference Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin.

Review for Midterm Chapter 1-9 CSc 212 Data Structures.

Philosophy of IR Evaluation Ellen Voorhees. NIST Evaluation: How well does system meet information need? System evaluation: how good are document rankings?

June 27, 2002 HornstrupCentret1 Using Compile-time Techniques to Generate and Visualize Invariants for Algorithm Explanation Thursday, 27 June :00-13:30.

Foundations of Software Testing Chapter 5: Test Selection, Minimization, and Prioritization for Regression Testing Last update: September 3, 2007 These.

Bug Localization with Machine Learning Techniques Wujie Zheng

Inferring Specifications to Detect Errors in Code Mana Taghdiri Presented by: Robert Seater MIT Computer Science & AI Lab.

The Daikon system for dynamic detection of likely invariants MIT Computer Science and Artificial Intelligence Lab. 16 January 2007 Presented by Chervet.

CS 363 Comparative Programming Languages Semantics.

©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 22 Slide 1 Software Verification, Validation and Testing.

Automatically Repairing Broken Workflows for Evolving GUI Applications Sai Zhang University of Washington Joint work with: Hao Lü, Michael D. Ernst.

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Technology and Science, Osaka University Dependence-Cache.

CS Data Structures I Chapter 2 Principles of Programming & Software Engineering.

1 Test Selection for Result Inspection via Mining Predicate Rules Wujie Zheng

Reasoning about programs March CSE 403, Winter 2011, Brun.

A Review of Software Testing - P David Coward Reprinted: Information and Software Technology; Vol. 30, No. 3 April 1988 Software Engineering: The Development.

Semantics In Text: Chapter 3.

Using Loop Invariants to Detect Transient Faults in the Data Caches Seung Woo Son, Sri Hari Krishna Narayanan and Mahmut T. Kandemir Microsystems Design.

PLC '06 Experience in Testing Compiler Optimizers Using Comparison Checking Masataka Sassa and Daijiro Sudo Dept. of Mathematical and Computing Sciences.

Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.

Error Explanation with Distance Metrics Authors: Alex Groce, Sagar Chaki, Daniel Kroening, and Ofer Strichman International Journal on Software Tools for.

/ PSWLAB Evidence-Based Analysis and Inferring Preconditions for Bug Detection By D. Brand, M. Buss, V. C. Sreedhar published in ICSM 2007.

Random Test Generation of Unit Tests: Randoop Experience

Testing Overview Software Reliability Techniques Testing Concepts CEN 4010 Class 24 – 11/17.

Foundations of Software Testing Chapter 5: Test Selection, Minimization, and Prioritization for Regression Testing Last update: September 3, 2007 These.

The PLA Model: On the Combination of Product-Line Analyses 강태준.

Constraint Framework, page 1 Collaborative learning for security and repair in application communities MIT site visit April 10, 2007 Constraints approach.

Jeremy Nimmer, page 1 Automatic Generation of Program Specifications Jeremy Nimmer MIT Lab for Computer Science Joint work with.

Reasoning About Code.

Reasoning about code CSE 331 University of Washington.

Predicting problems caused by component upgrades

Eclat: Automatic Generation and Classification of Test Inputs

Authors: Khaled Abdelsalam Mohamed Amr Kamel

The Future of Software Engineering: Tools

Automated Support for Program Refactoring Using Invariants

Test Case Purification for Improving Fault Localization

Static and dynamic analysis: synergy and duality

Mock Object Creation for Test Factoring

Semantics In Text: Chapter 3.

Graph Coverage for Specifications CS 4501 / 6501 Software Testing

Test Case Test case Describes an input Description and an expected output Description. Test case ID Section 1: Before execution Section 2: After execution.

Presentation transcript:

Michael Ernst, page 1 Improving Test Suites via Operational Abstraction Michael Ernst MIT Lab for Computer Science Joint work with Michael Harder, Jeff Mellen, and Benjamin Morse

Michael Ernst, page 2 Creating test suites Goal: small test suites that detect faults well Larger test suites are usually more effective Evaluation must account for size Fault detection cannot be predicted Use proxies, such as code coverage

Michael Ernst, page 3 Test case selection Example: creating a regression test suite Assumes a source of test cases Created by a human Generated at random or from a grammar Generated from a specification Extracted from observed usage

Michael Ernst, page 4 Contributions Operational difference technique for selecting test cases, based on observed behavior Outperforms (and complements) other techniques (see paper for details) No oracle, static analysis, or specification Stacking and area techniques for comparing test suites Corrects for size, permitting fair comparison

Michael Ernst, page 5 Outline Operational difference technique for selecting test cases Generating operational abstractions Stacking and area techniques for comparing test suites Evaluation of operational difference technique Conclusion

Michael Ernst, page 6 Operational difference technique Idea: Add a test case c to a test suite S if c exercises behavior that S does not Code coverage does this in the textual domain We extend this to the semantic domain Need to compare run-time program behaviors Operational abstraction: program properties x > y a[] is sorted

Michael Ernst, page 7 Test suite generation or augmentation Idea: Compare operational abstractions induced by different test suites Given: a source of test cases; an initial test suite Loop: Add a candidate test case If operational abstraction changes, retain the case Stopping condition: failure of a few candidates

Michael Ernst, page 8 The operational difference technique is effective Operational difference suites are smaller have better fault detection than branch coverage suites (in our evaluation; see paper for details)

Michael Ernst, page 9 Example of test suite generation Program under test: abs (absolute value) Test cases: 5, 1, 4, -1, 6, -3, 0, 7, -8, 3, … Suppose an operational abstraction contains: var = constant var ≥ constant var ≤ constant var = var property  property

Michael Ernst, page 10 Considering test case 5 Initial test suite: { } Initial operational abstraction for { }: Ø Candidate test case: 5 New operational abstraction for { 5 }: Precondition: arg = 5 Postconditions: arg = return New operational abstraction is different, so retain the test case

Michael Ernst, page 11 Considering test case 1 Operational abstraction for { 5 }: Pre: arg = 5 Post: arg = return Candidate test case: 1 New operational abstraction for { 5, 1 }: Pre: arg ≥ 1 Post: arg = return Retain the test case

Michael Ernst, page 12 Considering test case 4 Operational abstraction for { 5, 1 }: Pre: arg ≥ 1 Post: arg = return Candidate test case: 4 New operational abstraction for { 5, 1, 4 }: Pre: arg ≥ 1 Post: arg = return Discard the test case

Michael Ernst, page 13 Considering test case -1 Operational abstraction for { 5, 1 }: Pre: arg ≥ 1 Post: arg = return Candidate test case: -1 New operational abstraction for { 5, 1, -1 }: Pre: arg ≥ -1 Post: arg ≥ 1  (arg = return) arg = -1  (arg = -return) return ≥ 1 Retain the test case

Michael Ernst, page 14 Considering test case -6 Operational abstraction for { 5, 1, -1 }: Pre: arg ≥ -1 Post: arg ≥ 1  (arg = return) arg = -1  (arg = -return) return ≥ 1 Candidate test case: -6 New operational abstraction for { 5, 1, -1, -6 }: Pre: Ø Post: arg ≥ 1  (arg = return) arg ≤ -1  (arg = -return) return ≥ 1 Retain the test case

Michael Ernst, page 15 Considering test case -3 Operational abstraction for { 5, 1, -1, -6 }: Post: arg ≥ 1  (arg = return) arg ≤ -1  (arg = -return) return ≥ 1 Test case: -3 New operational abstraction for { 5, 1, -1, 6, -3 }: Post: arg ≥ 1  (arg = return) arg ≤ -1  (arg = -return) return ≥ 1 Discard the test case

Michael Ernst, page 16 Considering test case 0 Operational abstraction for { 5, 1, -1, -6 }: Post: arg ≥ 1  (arg = return) arg ≤ -1  (arg = -return) return ≥ 1 Test case: 0 New operational abstraction for {5, 1, -1, -6, 0 }: Post: arg ≥ 0  (arg = return) arg ≤ 0  (arg = -return) return ≥ 0 Retain the test case

Michael Ernst, page 17 Considering test case 7 Operational abstraction for { 5, 1, -1, -6, 0 }: Post: arg ≥ 0  (arg = return) arg ≤ 0  (arg = -return) return ≥ 0 Candidate test case: 7 New operational abstraction for { 5, 1, -1, -6, 0, 7 }: Post: arg ≥ 0  (arg = return) arg ≤ 0  (arg = -return) return ≥ 0 Discard the test case

Michael Ernst, page 18 Considering test case -8 Operational abstraction for { 5, 1, -1, -6, 0 }: Post: arg ≥ 0  (arg = return) arg ≤ 0  (arg = -return) return ≥ 0 Candidate test case: -8 New operational abstraction for { 5, 1, -1, -6, 0, -8 }: Post: arg ≥ 0  (arg = return) arg ≤ 0  (arg = -return) return ≥ 0 Discard the test case

Michael Ernst, page 19 Considering test case 3 Operational abstraction for { 5, 1, -1, -6, 0 }: Post: arg ≥ 0  (arg = return) arg ≤ 0  (arg = -return) return ≥ 0 Candidate test case: 3 New operational abstraction for { 5, 1, -1, -6, 0, 3 }: Post: arg ≥ 0  (arg = return) arg ≤ 0  (arg = -return) return ≥ 0 Discard the test case; third consecutive failure

Michael Ernst, page 20 Minimizing test suites Given: a test suite For each test case in the suite: Remove the test case if doing so does not change the operational abstraction

Michael Ernst, page 21 Outline Operational difference technique for selecting test cases Generating operational abstractions Stacking and area techniques for comparing test suites Evaluation of operational difference technique Conclusion 

Michael Ernst, page 22 Dynamic invariant detection Goal: recover invariants from programs Technique: run the program, examine values Artifact: Daikon Experiments demonstrate accuracy, usefulness

Michael Ernst, page 23 Goal: recover invariants Detect invariants (as in assert s or specifications) x > abs(y) x = 16*y + 4*z + 3 array a contains no duplicates for each node n, n = n.child.parent graph g is acyclic if ptr  null then *ptr > i

Michael Ernst, page 24 Uses for invariants Write better programs [Gries 81, Liskov 86] Document code Check assumptions: convert to assert Maintain invariants to avoid introducing bugs Locate unusual conditions Validate test suite: value coverage Provide hints for higher-level profile-directed compilation [Calder 98] Bootstrap proofs [Wegbreit 74, Bensalem 96]

Michael Ernst, page 25 Ways to obtain invariants Programmer-supplied Static analysis: examine the program text [Cousot 77, Gannod 96] properties are guaranteed to be true pointers are intractable in practice Dynamic analysis: run the program complementary to static techniques

Michael Ernst, page 26 Dynamic invariant detection Look for patterns in values the program computes: Instrument the program to write data trace files Run the program on a test suite Invariant engine reads data traces, generates potential invariants, and checks them

Michael Ernst, page 27 Checking invariants For each potential invariant: instantiate (determine constants like a and b in y = ax + b) check for each set of variable values stop checking when falsified This is inexpensive: many invariants, each cheap

Michael Ernst, page 28 Improving invariant detection Add desired invariants: implicit values, unused polymorphism Eliminate undesired invariants: unjustified properties, redundant invariants, incomparable variables Traverse recursive data structures Conditionals: compute invariants over subsets of data (if x>0 then y  z)

Michael Ernst, page 29 Outline Operational difference technique for selecting test cases Generating operational abstractions Stacking and area techniques for comparing test suites Evaluation of operational difference technique Conclusion 

Michael Ernst, page 30 Comparing test suites Key metric: fault detection percentage of faults detected by a test suite Correlated metric: test suite size number of test cases run time Test suite comparisons must control for size

Michael Ernst, page 31 Test suite efficiency Efficiency = (fault detection)/(test suite size) Which test suite generation technique is better?

Michael Ernst, page 32 Different size suites are incomparable A technique induces a curve: How can we tell which is the true curve?

Michael Ernst, page 33 Comparing test suite generation techiques Each technique induces a curve Compare the curves, not specific points Approach: compare area under the curve Compares the techniques at many sizes Cannot predict the size users will want

Michael Ernst, page 34 Approximating the curves (“stacking”) Given a test budget (in suite execution time), generate a suite that runs for that long To reduce in size: select a random subset To increase in size: combine independent suites

Michael Ernst, page 35 Test suite generation comparison 1.Approximate the curves 2.Report ratios of areas under curves

Michael Ernst, page 36 Outline Operational difference technique for selecting test cases Generating operational abstractions Stacking and area techniques for comparing test suites Evaluation of operational difference technique Conclusion 

Michael Ernst, page 37 Evaluation of operational difference technique It ought to work: Correlating operational abstractions with fault detection It does work: Measurements of fault detection of generated suites

Michael Ernst, page 38 Subject programs 8 C programs seven 300-line programs, one 6000-line program Each program comes with pool of test cases (1052 – 13585) faulty versions (7 – 34) statement, branch, and def-use coverage suites

Michael Ernst, page 39 Improving operational abstractions improves tests Let the ideal operational abstraction be that generated by all available test cases Operational coverage = closeness to the ideal Operational coverage is correlated with fault detection Holding constant cases, calls, statement coverage, branch coverage Same result for 100% statement/branch coverage

Michael Ernst, page 40 Generated suites Relative fault detection (adjusted by using the stacking technique): Def-use coverage:1.73 Branch coverage:1.66 Operational difference:1.64 Statement coverage:1.53 Random:1.00 Similar results for augmentation, minimization

Michael Ernst, page 41 Augmentation Relative fault detection (via area technique): Random:1.00 Branch coverage:1.70 Operational difference:1.72 Branch + operational diff.:2.16

Michael Ernst, page 42 Operational difference complements structural Best technique Total Op. Diff.equalBranch CFG changes Non-CFG changes Total

Michael Ernst, page 43 Outline Operational difference technique for selecting test cases Generating operational abstractions Stacking and area techniques for comparing test suites Evaluation of operational difference technique Conclusion 

Michael Ernst, page 44 Future work How good is the stacking approximation? How do bugs in the programs affect the operational difference technique?

Michael Ernst, page 45 Contributions Stacking and area techniques for comparing test suites Control for test suite size Operational difference technique for automatic test case selection Based on observed program behavior Outperforms statement and branch coverage Complementary to structural techniques Works even at 100% code coverage No oracle, static analysis, or specification required