“Isolating Failure Causes through Test Case Generation” by Jeremias Rößler, Gordon Fraser, Andreas Zeller, and Alessandro Orso. Presented by John-Paul Ore.



Motivation: Debugging & Maintenance is Super Expensive. Cost to develop software worldwide: $1,500,000,000,000 (USD). Debugging and maintenance cost: $350,000,000,000 (USD) (assumes 23% of developer time spent debugging). Sources: Judge Business School of the University of Cambridge, UK (2013); Evans Data Corporation (2012); Payscale (2012); RTI (2002); CVP Surveys (2012).

What is Debugging? Finding the fault responsible for the failure, and applying a change to program P such that P is correct with regard to the specification S concerning the failure. Debugging includes a search problem. We can automate search.

Talk Outline: problems BugEx seeks to address; background concepts; inner workings of the BugEx algorithm; empirical evaluation; relation of this work to the 990 class project.

Automated Debugging: still a hard problem. Parnin, Chris, and Alessandro Orso. "Are automated debugging techniques actually helping programmers?" Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM, 2011.

BugEx: Overview of problems addressed. Problem 1: Automated debugging techniques reveal too many possible code locations. Solution 1: Increase precision through guided test generation. Problem 2: Even if the location is known, the developer might not fully understand the bug. Solution 2: Present ‘facts’ rather than code locations. Problem 3: Other experimental techniques are unsound (Delta Debugging, predicate switching). Solution 3: Generate real program executions.

BugEx: Underlying Concepts. 1. Expands on statistical debugging: correlate program facts with failures.

1. BugEx extends Statistical Debugging (Benjamin Liblit et al.). Liblit, B., Aiken, A., Zheng, A. X., & Jordan, M. I. (2003). Bug isolation via remote program sampling. ACM SIGPLAN Notices, 38(5), (and more, identified in the paper). “Statistical debugging works off of the contrast between good and bad runs, so you need to feed it both.” – B. Liblit. [Figure labels: passing test case, failing test case]

BugEx: Underlying Concepts. 1. Expands on statistical debugging: correlate program facts with failures. 2. Uses automatic test generation (genetic algorithms) to create a statistically significant number of tests.

2. Test Case Generation: Genetic Algorithms. An individual is a test encoded in Java bytecode. Mutation might change branching or variable values. Fitness: branch distance or predicate distance (closer is better). [Figure labels: TEST_a, TEST_b, TEST_b’, TEST_a’]

Test Case Generation: Genetic Algorithms. The fitness function directs the search across the landscape (it plays the role of a gradient). Global optimality is not guaranteed. Image © Mathworks, 2010
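To make the branch-distance idea on these two slides concrete, here is a minimal sketch in Python. It is not BugEx's test generator (which evolves tests encoded in Java bytecode); it is a toy elitist search over a single integer input. The guarded branch `x == TARGET`, the constant 42, and every function name are assumptions made for illustration.

```python
import random

TARGET = 42  # hypothetical constant guarding the branch we want to reach


def branch_distance(x: int) -> int:
    """Distance to satisfying `x == TARGET`; 0 means the branch is taken."""
    return abs(x - TARGET)


def evolve(seed: int = 0, pop_size: int = 20, generations: int = 1000) -> int:
    """Toy genetic search: keep the fitter half, mutate it to refill the population."""
    rng = random.Random(seed)
    population = [rng.randint(-1000, 1000) for _ in range(pop_size)]
    for _ in range(generations):
        # Lower branch distance means fitter ("closer is better").
        population.sort(key=branch_distance)
        if branch_distance(population[0]) == 0:
            break  # an input reaching the branch has been found
        parents = population[: pop_size // 2]
        # Mutation: nudge each surviving input by a small random amount.
        children = [p + rng.randint(-10, 10) for p in parents]
        population = parents + children
    return min(population, key=branch_distance)


if __name__ == "__main__":
    best = evolve()
    print(best, branch_distance(best))
```

Because survivors are kept unchanged, the best distance never gets worse from one generation to the next. The search can still stall on less friendly fitness landscapes, which is the sense in which global optimality is not guaranteed.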

Overview of BugEx (hint: it’s a search). Generate tests to explore the search space (genetic algorithm). Find facts that correlate with failure to guide test generation (statistical debugging). Show results.

Counterfactual conditional: if not A, then not B. If the cause is present, the failure is present; if the cause is absent, the failure is absent. Fact_i → Failure

BugEx Algorithm : Initialization (figure 4 p. 312)

BugEx Algorithm: Main Loop (figure 4, p. 312). Slide annotations on the figure: “of the best!”, “Statistical Debugging”, “Genetic Algorithm”, “LOOP”, “branches or state predicates”.

Microreview of BugEx (hint: it’s a search). Generate tests to explore the search space (genetic algorithm). Find facts that correlate with failure to guide test generation (statistical debugging). Show results.

14. F := getFacts(T_fail) ∪ getFacts(T_pass) ∪ F. 1. A fact must be Boolean: either true or false at runtime. 2. A fact must be observable. Branches: reached or not reached; true or false branch taken? State predicates: (attribute | parameters | inspector) (= | = | !=) (attribute | parameters | inspector | constant), over all available variables, objects, and constants at the beginning of a method. How big is this space (in big-O terms)?
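One way to get a feel for the size of the state-predicate space is to enumerate it for a toy method. The sketch below is an illustration only, not BugEx's `getFacts`: the operator set is an assumed subset, and the value names (`this.size`, `index`, `list.isEmpty()`) are hypothetical.

```python
from itertools import product

# Assumed comparison operators; the tool's actual operator set may differ.
OPERATORS = ("==", "!=")


def enumerate_predicates(values, constants):
    """Build every candidate predicate `left op right`.

    `values` stands for the attributes, parameters and inspector results
    visible at the beginning of a method; `constants` for literal constants.
    Comparing a value with itself is skipped because it is trivially decided.
    """
    left = list(values)
    right = list(values) + list(constants)
    return [
        (a, op, b)
        for a, op, b in product(left, OPERATORS, right)
        if a != b
    ]


if __name__ == "__main__":
    preds = enumerate_predicates(
        ["this.size", "index", "list.isEmpty()"],  # hypothetical values
        ["0", "null"],                             # hypothetical constants
    )
    print(len(preds))
```

With n values, c constants and k operators, this enumeration yields n · k · (n + c − 1) candidates per method entry, so under these assumptions the space grows quadratically in the number of observable values, O(n · (n + c)).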

16. F_correlating := correlateToFailure(F, T_fail, T_pass). Bayes’ Theorem; Bayesian inference.
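As a rough illustration of how Bayes' theorem can relate a fact to failure, the sketch below estimates P(failure | fact) = P(fact | failure) · P(failure) / P(fact) from run counts and ranks facts by that value. This is an assumption about the general idea, not the actual implementation of `correlateToFailure`; the function names, fact names and counts are invented for the example.

```python
def posterior_failure(fact_in_fail, n_fail, fact_in_pass, n_pass):
    """Estimate P(failure | fact) from counts of failing and passing runs.

    fact_in_fail / fact_in_pass: number of failing / passing runs in which
    the fact was observed to hold. n_fail must be greater than zero.
    """
    total = n_fail + n_pass
    p_fail = n_fail / total                       # prior P(failure)
    p_fact_given_fail = fact_in_fail / n_fail     # likelihood P(fact | failure)
    p_fact = (fact_in_fail + fact_in_pass) / total  # evidence P(fact)
    if p_fact == 0:
        return 0.0  # the fact was never observed, so it carries no signal
    return p_fact_given_fail * p_fail / p_fact


def rank_facts(counts, n_fail, n_pass):
    """Sort fact names by estimated P(failure | fact), highest first.

    `counts` maps a fact name to (fact_in_fail, fact_in_pass).
    """
    return sorted(
        counts,
        key=lambda f: posterior_failure(counts[f][0], n_fail, counts[f][1], n_pass),
        reverse=True,
    )


if __name__ == "__main__":
    counts = {
        "branch_17_taken": (9, 1),   # hypothetical: mostly seen in failing runs
        "x != null": (10, 30),       # hypothetical: holds in every run
    }
    print(rank_facts(counts, n_fail=10, n_pass=30))
```

A fact that holds in every run ends up with a posterior equal to the prior failure rate, so it ranks below a fact seen almost only in failing runs. That is the contrast between good and bad runs that statistical debugging depends on.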

Slides courtesy of Jeremias Rößler (2012)

Empirical Evaluation

Empirical Research Questions RQ1. Is the number of relevant facts identified by BUGEX small enough for a developer to examine?

[Figure: number of branches vs. time to converge; axes: branches, seconds]

RQ1: BugEx compared to Statistical Debugging. [Figure; label: BugEx]

Empirical Research Questions RQ2. Do the facts identified by BUGEX help the developer understand the failure? The authors answered ‘yes’ by comparing their fix with the ‘official fix’. This is challenging because sometimes the original developers refactored the code at a larger scale.

Subsequent User Studies: nope “This study showed how much effort the design and preparation of a user study requires, and how easy error prone it is. This is probably the reason, why there are still so few user studies in the field of automated debugging.” “So there was little time to prepare BUGEX and the underlying infrastructure.” Rößler, Jeremias. "From software failure to explanation." (2013).

Summary: BugEx combines Statistical Debugging and Automated Test Generation (GA) to improve debugging precision. BugEx treats debugging as a search problem and tries to find information that is useful to developers. Usefulness is difficult to evaluate because the prototype tool is very specific.

Relation of BugEx to Project: guided automatic test generation. Focus on message-passing programs, observed at the component level (ROS, the Robot Operating System). Use program traces to generate test suites for regression testing, based on component properties.