Delta Debugging AAIS 05 Curino, Giusti Delta Debugging Authors: Carlo Curino, Alessandro Giusti Politecnico di Milano An advanced debugging technique

Delta Debugging AAIS 05 Curino, Giusti Motivations Reducing faults: 50%–80% of total cost. Debugging: one of the hardest, yet least systematic, activities of software engineering, and among the most time-consuming. Locating faults: the most difficult part.

Delta Debugging AAIS 05 Curino, Giusti Overview Which problems are solved by Delta Debugging Four solutions: a common approach 1.Simplifying failure-inducing input 2.Isolating failure-inducing thread schedule 3.Identifying failure-inducing changes in the code 4.Isolating Cause-Effect Chains

Delta Debugging AAIS 05 Curino, Giusti Failure-inducing input This HTML input makes Mozilla crash (segmentation fault). Which portion is the failure-inducing one?

Delta Debugging AAIS 05 Curino, Giusti Thread scheduling The result of a multithreaded program appears nondeterministic. Why does this happen?

Delta Debugging AAIS 05 Curino, Giusti Code changes The old version of GDB works with DDD, the new one doesn’t! Lines of code have been modified between the two versions: where’s the bug?

Delta Debugging AAIS 05 Curino, Giusti Cause-effect chain Which part of the program state is involved in the failure?

Delta Debugging AAIS 05 Curino, Giusti Four solutions: a single approach The underlying problem is always the same: find which part of something determines the failure. So a common strategy can be applied: divide et impera (divide and conquer) applied to the deltas between: working and failing inputs, working and failing code versions, working and failing thread schedules, working and failing program states. This allows an efficient and automatic debugging procedure.

Delta Debugging AAIS 05 Curino, Giusti Common terminology A test case can either: fail (the failure shows up), pass (the program runs properly), or be unresolved (a different problem arises). Delta debugging algorithms iteratively: apply changes (to input, code, schedule, or state) and run tests.
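As an illustration of this terminology, here is a minimal sketch (not from the original slides) of the test-outcome interface that all the delta debugging variants build on; the program name `./program-under-test` and the crash-detection rule are assumptions made for the example.

```python
from enum import Enum
import subprocess

class Outcome(Enum):
    PASS = "pass"              # the program runs properly
    FAIL = "fail"              # the original failure shows up
    UNRESOLVED = "unresolved"  # a different problem arises

def test(candidate_input: bytes) -> Outcome:
    """Run the program under test on one candidate input and classify the run."""
    proc = subprocess.run(["./program-under-test"], input=candidate_input,
                          capture_output=True)
    if proc.returncode == 0:
        return Outcome.PASS
    if proc.returncode == -11:     # killed by SIGSEGV (POSIX), e.g. a Mozilla-style crash
        return Outcome.FAIL
    return Outcome.UNRESOLVED      # any other misbehaviour
```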

Delta Debugging AAIS 05 Curino, Giusti Common terminology (2) Concept of difference: a very general delta between something in two test cases. Examples: difference in the input: a different character (or bit) in the input stream; difference in the thread schedule: a difference in the time at which a given thread switch is performed; difference in the code: a different statement in two versions of a program; difference in the program state: different values of the internal variables of the program.

Delta Debugging AAIS 05 Curino, Giusti Simplifying Failure-inducing input

Delta Debugging AAIS 05 Curino, Giusti Minimizing vs Isolating Minimizing (ddmin algorithm): slower, more human-friendly. Isolating (dd algorithm): a generalization of the ddmin algorithm, faster, good for generating the input of the cause-effect chain DD.

Delta Debugging AAIS 05 Curino, Giusti Minimizing: Mozilla bug Minimizing: 57 tests to simplify the 896-line HTML input down to the “ ” tag that causes the crash. Every remaining character is relevant (as shown from line 20 to 26). Only removes deltas from the failing test. Returns an n-minimal input that causes the failure (finding the global minimum is NP-hard).
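The following is a compact sketch of the minimization idea: a simplified ddmin that only tests complements, not the authors' exact implementation. It reuses the `Outcome` values from the earlier sketch; `test` here is any function mapping a candidate (a list of characters or lines) to one of those outcomes.

```python
def ddmin(failing_input, test):
    """Shrink a failing input (given as a list of characters or lines) so that
    removing any single remaining chunk at the finest granularity no longer fails."""
    n = 2                                    # start by splitting into two chunks
    while len(failing_input) >= 2:
        size = len(failing_input) // n
        chunks = [failing_input[i:i + size] for i in range(0, len(failing_input), size)]
        reduced = False
        for i in range(len(chunks)):
            # Try the complement: everything except chunk i.
            complement = [x for j, c in enumerate(chunks) if j != i for x in c]
            if test(complement) == Outcome.FAIL:
                failing_input = complement   # keep only what still fails
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(failing_input):
                break                        # finest granularity reached: done
            n = min(n * 2, len(failing_input))   # otherwise increase granularity
    return failing_input
```

For the Mozilla example one could call something like `ddmin(list(html_source), test)`, with a small wrapper that joins the candidate list back into a string before feeding it to the browser.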

Delta Debugging AAIS 05 Curino, Giusti Minimizing: didactic example

Delta Debugging AAIS 05 Curino, Giusti Isolating: Mozilla bug Isolating: only 7 tests (instead of 26). Removes deltas from the failing test and adds deltas to the passing test. Isolates a single delta “<” that makes the failure go away. Returns the two nearest inputs: one failing, the other passing.

Delta Debugging AAIS 05 Curino, Giusti General DD Algorithm Initial Fail Initial Pass Differences

Delta Debugging AAIS 05 Curino, Giusti General DD Algorithm Initial Fail Initial Pass Differences What if we remove these differences from the current failing test?

Delta Debugging AAIS 05 Curino, Giusti General DD Algorithm Initial Fail Initial Pass Differences Failure disappears: “Move up”

Delta Debugging AAIS 05 Curino, Giusti General DD Algorithm Initial Fail Initial Pass Differences What if we remove these differences?

Delta Debugging AAIS 05 Curino, Giusti General DD Algorithm Initial Fail Initial Pass Differences UNRESOLVED TEST: “Increase Granularity”

Delta Debugging AAIS 05 Curino, Giusti General DD Algorithm Initial Fail Initial Pass Differences What if we remove these differences from the current failing test?

Delta Debugging AAIS 05 Curino, Giusti General DD Algorithm Initial Fail Initial Pass Differences Still Fails: “Move Down”

Delta Debugging AAIS 05 Curino, Giusti Formally: the Algorithm
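Since the formal definition appears only as a figure on this slide, here is a simplified isolation sketch in the spirit of the walkthrough above (pass: "move up", fail: "move down", otherwise increase granularity). It is not the exact published dd algorithm; it reuses the `Outcome` values from the earlier sketch, and `test` maps a configuration of applied deltas to an outcome.

```python
def isolate(deltas, test):
    """Narrow the gap between a passing and a failing configuration of deltas.
    Applying all `deltas` turns the initial passing run into the failing run."""
    passing = []                 # deltas applied on the passing side
    failing = list(deltas)       # deltas applied on the failing side
    n = 2
    while len(failing) - len(passing) > 1:
        gap = [d for d in failing if d not in passing]
        size = max(len(gap) // n, 1)
        subsets = [gap[i:i + size] for i in range(0, len(gap), size)]
        progressed = False
        for s in subsets:
            candidate = passing + s              # add part of the gap
            outcome = test(candidate)
            if outcome == Outcome.FAIL:
                failing = candidate              # still fails: "move down"
                n, progressed = 2, True
                break
            if outcome == Outcome.PASS:
                passing = candidate              # still passes: "move up"
                n, progressed = 2, True
                break
        if not progressed:
            if n >= len(gap):
                break                            # cannot split any further
            n = min(n * 2, len(gap))             # all unresolved: increase granularity
    return passing, failing      # two close configurations, one passing and one failing
```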

Delta Debugging AAIS 05 Curino, Giusti Efficiency considerations The worst case: |k|² + 3|k| tests (k = cardinality of the change set), when all test outcomes are unresolved except the last one (very unlikely). The best case: 2·log|k| tests. Try to avoid unresolved test outcomes by using lexical and syntactical knowledge about the input.
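A quick worked illustration of these bounds for a hypothetical change set of cardinality k = 1000, taking the logarithm to be base 2:

```python
import math

k = 1000
worst_case = k**2 + 3 * k        # every test unresolved except the last one
best_case = 2 * math.log2(k)     # every test resolves: pure binary narrowing

print(worst_case)                # 1003000 tests
print(round(best_case, 1))       # about 19.9 tests
```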

Delta Debugging AAIS 05 Curino, Giusti DEMO Eclipse Plugin Live Demo

Delta Debugging AAIS 05 Curino, Giusti Thread Scheduling The behavior of a multithreaded program may depend on the schedule.

Delta Debugging AAIS 05 Curino, Giusti DD applied to Thread Scheduling Debugging is even harder here: thread switches and schedules are nondeterministic, so it is difficult to reproduce and isolate failures. Goal: relate the failure to a small set of relevant differences between passing and failing schedules. Again a “purely experimental approach”: no need to understand the program.

Delta Debugging AAIS 05 Curino, Giusti Purely experimental: Pros and Cons Pros: the program is treated as a black box: we only need to execute it. A failure is an arbitrary behaviour of the program: we only need to distinguish failure from success. Cons (w.r.t. static analysis): test-based, so it cannot determine properties that hold for all runs of a program (such as the general absence of deadlocks), and it requires an observable failure.

Delta Debugging AAIS 05 Curino, Giusti Dejavu tool Tool: Dejavu (DEterministic JAVa replay Utility) by IBM. Reproduces schedules and the failures they induce. Exploiting Dejavu, the thread schedule becomes an input: we can generate schedules by mixing one passing schedule and one failing schedule.

Delta Debugging AAIS 05 Curino, Giusti Differences in thread scheduling Starting point: a passing run and a failing run. Differences (for thread switch t1): t1 occurs in the passing run at time 254; t1 occurs in the failing run at time 278. ∆1 = |278 − 254| induces a statement interval: the code executed between time 254 and 278.

Delta Debugging AAIS 05 Curino, Giusti Differences in thread scheduling We can build further test cases by mixing the two schedules, to isolate the relevant differences.
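A hypothetical sketch of this mixing, where a schedule is simply a list of thread-switch times and each difference replaces one switch time wholesale (the real DejaVu-based setup is finer-grained, moving switch times by ±1):

```python
def mixed_schedule(passing, failing, applied):
    """Build a candidate schedule: switches whose index is in `applied` take
    their timing from the failing run, all others keep the passing timing."""
    return [failing[i] if i in applied else passing[i]
            for i in range(len(passing))]

# Toy example mirroring the previous slide: switch t1 happens at time 254
# in the passing run and at time 278 in the failing run.
passing_run = [100, 254, 400]
failing_run = [100, 278, 400]
candidate = mixed_schedule(passing_run, failing_run, applied={1})
# candidate == [100, 278, 400]: only the timing of switch #1 is the failing one
```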

Delta Debugging AAIS 05 Curino, Giusti Real life test: setting Test #205 of the SPEC JVM98 Java test suite. Modification of the raytracer program into a multi-threaded version. Introduction of a simple race condition. Implementation of an automated test that checks passing/failing. Generation of random schedules to find a passing schedule and a failing schedule. Differences between the passing and the failing schedule: 3,842,577,240 differences; each diff moves a thread switch time by +1 or −1.

Delta Debugging AAIS 05 Curino, Giusti Real life test: results DD isolates one single difference after 50 tests (about 28 minutes).

Delta Debugging AAIS 05 Curino, Giusti Real life test: pin-pointing the failure The failure occurs if and only if thread switch #33 occurs at yield point (a safe point, such as a function invocation) 59,772,127 instead of 59,772,126. At 59,772,127, line 91 is the first yield point after the initialization of OldScenesLoaded; at 59,772,126, line 82 is the yield point just before the initialization of OldScenesLoaded.

Delta Debugging AAIS 05 Curino, Giusti Real life test: conclusion Delta Debugging is efficient even when applied to very large thread schedules (>3,000,000,000 differences). No analysis is required, as Delta Debugging relies on experiments alone: only the schedule was observed and altered, and the failure-inducing thread switch is easily associated with code. Alternate runs are obtained automatically by generating random schedules: only one initial run (passing or failing) is required.

Delta Debugging AAIS 05 Curino, Giusti Code changes A given revision of a program behaves correctly; the next one does not. Find which of the changes in the code causes the problem. Inconvenient when the difference amounts to thousands of lines of code.

Delta Debugging AAIS 05 Curino, Giusti The manual solution Binary search through the revision history (regression containment). Does not always work: multiple changes may cause the failure only when combined (interference); a single change can amount to many lines of code (granularity); mixing parallel development branches creates inconsistency problems.

Delta Debugging AAIS 05 Curino, Giusti Procedure Developed in 1999: it has some differences from the current general DD algorithm. Consider the differences between the working and the failing revisions. Ignore any knowledge about the temporal ordering of the changes. Goal: find a minimal failure-inducing change set.

Delta Debugging AAIS 05 Curino, Giusti Inconsistencies Mixing code changes regardless of their ordering produces many tests with an “unresolved” outcome: integration failures, construction failures, execution failures. They increase the complexity of the DD algorithm!
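A sketch of how these inconsistencies map onto test outcomes, reusing the `Outcome` enum from the earlier sketch; `apply_changes`, `build` and `run_regression_test` are hypothetical project-specific helpers.

```python
def test_change_set(change_ids):
    """Apply a subset of code changes, rebuild, and run the regression test."""
    if not apply_changes(change_ids):     # patches do not apply cleanly
        return Outcome.UNRESOLVED         # integration failure
    if not build():                       # compilation or link errors
        return Outcome.UNRESOLVED         # construction failure
    result = run_regression_test()
    if result == "original failure":
        return Outcome.FAIL
    if result == "ok":
        return Outcome.PASS
    return Outcome.UNRESOLVED             # execution failure (some other misbehaviour)
```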

Delta Debugging AAIS 05 Curino, Giusti Future work Group related changes (partly done) to obtain fewer inconsistent trials: common change dates/sources, location criteria, lexical criteria, syntactic criteria (common functions/modules), semantic criteria.

Delta Debugging AAIS 05 Curino, Giusti Cause-Effect Background A bit of background: A program state is represented by variable values, and references.

Delta Debugging AAIS 05 Curino, Giusti Background (2) While the program runs, the state evolves. We assume the program is deterministic and not interactive, so identical states at identical times have identical evolutions.

Delta Debugging AAIS 05 Curino, Giusti Idea: apply DD to program states. We need two distinct runs: one failing, one passing. We want the two runs to be (initially) as similar as possible. If we let the two runs evolve in parallel, their initial states will be similar; isolating failure-inducing input can help. Apply DD to different "slices" of the program evolution (a sort of CT scan for computer programs).

Delta Debugging AAIS 05 Curino, Giusti Procedure Iteratively: build a new state mixing the passing and the failing state; let the program evolve and see whether it passes, fails, or does something unrelated and weird (an unresolved outcome); isolate the smallest subset of the state relevant for the failure. Nothing new so far, but: this happens at a specific moment of the program evolution, and it is repeated (e.g. at the entry points of important functions).
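A minimal sketch of one such experiment, reusing the `Outcome` interface from the earlier sketch; states are represented here as plain dictionaries of variable values, and `resume_and_observe` is a hypothetical helper that restores a state in the debugged process and lets it run to completion.

```python
def test_mixed_state(pass_state, fail_state, applied_vars):
    """Take `applied_vars` from the failing state, everything else from the
    passing state, resume execution, and classify the resulting run."""
    mixed = dict(pass_state)
    for var in applied_vars:
        mixed[var] = fail_state[var]      # transplant failing values
    return resume_and_observe(mixed)      # -> Outcome.PASS / FAIL / UNRESOLVED
```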

Delta Debugging AAIS 05 Curino, Giusti The result A cause-effect chain that leads to a failure.

Delta Debugging AAIS 05 Curino, Giusti The cause-effect chain The initial states are absolutely legitimate: for example, the direct consequence of a specific input that the program should handle (intended program states). The final effects are the failure (faulty program states). The error lies somewhere in the middle, where an intended program state evolves into a faulty one.

Delta Debugging AAIS 05 Curino, Giusti Fascinating terminology A defect in the code originates an infection in the state. The infection usually propagates as the program evolves.

Delta Debugging AAIS 05 Curino, Giusti Limits No automatic discrimination of intended and faulty (infected) states! The human user can increase the resolution of the slices and pinpoint the code that evolves an INTENDED state into a FAULTY one, then correct the error (= the defect in the code) and break the cause-effect chain that leads to the failure.

Delta Debugging AAIS 05 Curino, Giusti Cause Transitions Sometimes, when an instruction is executed, a given variable ceases to be failure-inducing and others begin to be: the failure-inducing subset of the state changes (a cause transition). An algorithm can efficiently find cause transitions in cause-effect chains, by means of binary search (again).
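A sketch of that binary search, under the assumption that `relevant_vars(t)` runs state-level DD at execution moment t and returns the set of failure-inducing variables there (both names are hypothetical):

```python
def find_cause_transition(moments, relevant_vars):
    """Locate two adjacent execution moments between which the
    failure-inducing part of the state changes."""
    lo, hi = 0, len(moments) - 1
    first = relevant_vars(moments[lo])
    if relevant_vars(moments[hi]) == first:
        return None                       # no transition in this interval
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if relevant_vars(moments[mid]) == first:
            lo = mid                      # the transition happens later
        else:
            hi = mid                      # the transition is at or before mid
    return moments[lo], moments[hi]       # the cause transition lies between these
```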

Delta Debugging AAIS 05 Curino, Giusti Cause Transitions (2)

Delta Debugging AAIS 05 Curino, Giusti Cause Transitions (3) Why do we bother looking for cause transitions? Where a variable begins to cause a failure is a good location for a fix. More importantly: “cause transitions are significantly better locators of defects than any other methods previously known”. Result: valuable help in the search for the defect: only a handful of cause transitions, and the nearby code locations, need to be analyzed as the source of the infection.

Delta Debugging AAIS 05 Curino, Giusti Other approaches to defect localization Coverage. Slicing. Dynamic invariants: no success with the Siemens test suite. Explicit specification: good results, but needs a specification of the desired internal behavior. Nearest neighbor (using coverage): best results, albeit quite naive.

Delta Debugging AAIS 05 Curino, Giusti Evaluation setup Siemens suite: 7 sample C programs (hundreds of lines of code each), 132 variations with one realistic defect each, and a test suite for each program. Apply the different defect locators and compare their performance (only the comparison to NN is presented).

Delta Debugging AAIS 05 Curino, Giusti Evaluation results

Delta Debugging AAIS 05 Curino, Giusti Clarification Two small improvements: relevance of code locations (automatic); sources of infection (programmer-driven): unfair! Jump to the conclusion.

Delta Debugging AAIS 05 Curino, Giusti Zoom on the representation of the state We said: “A program state is represented by variable values, and references” In general, representing and manipulating the state is not trivial One of the problems: C pointers  copying their value does not make sense  Solution: Memory graphs.

Delta Debugging AAIS 05 Curino, Giusti Memory graphs Systematically unfold all data structures, starting from base variables.

Delta Debugging AAIS 05 Curino, Giusti Memory graphs (2) Nodes: all values and all variables of a program. Edges: operations like variable access, pointer dereferencing, struct member access, array element access. This abstracts from memory addresses and makes it possible to compare and alter pointers.

Delta Debugging AAIS 05 Curino, Giusti Memory graphs (3) What if the sets of variables differ in the two states we are mixing? Just compute the largest common subgraph. The deltas we apply to a state either change variable values or alter data structures.
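A toy illustration (not the authors' data structure) of how such a graph abstracts from raw addresses: nodes hold values, labelled edges stand for variable access, member access and dereferencing, and deltas either change a value or re-point an edge.

```python
from dataclasses import dataclass, field

@dataclass
class MemNode:
    value: object                                  # the concrete value stored here
    edges: dict = field(default_factory=dict)      # access label -> MemNode

# Hypothetical graph for:  list -> n0,  n0.next -> n1
n1 = MemNode(value={"data": 2})
n0 = MemNode(value={"data": 1})
n0.edges["->next"] = n1            # pointer dereference + member access
root = MemNode(value=None)
root.edges["list"] = n0            # base variable access

# A delta can change a value ...
n0.value["data"] = 99
# ... or alter the structure itself, by re-pointing an edge instead of
# copying a raw C pointer value.
root.edges["list"] = n1
```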

Delta Debugging AAIS 05 Curino, Giusti Implementation considerations All we need is a way to access and modify the program state. GDB is the solution for C programs, but it has performance problems (5000% overhead). DD applied to states is still (sort of) a black-box approach. It is easily extended to other languages as soon as something provides GDB-like functionality.
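For instance, one can drive GDB in batch mode to read and alter a variable; this is only a sketch of the idea (`./a.out` and `some_variable` are placeholders), not the tooling used in the paper.

```python
import subprocess

def gdb_batch(program, commands):
    """Run gdb in batch mode on `program`, executing `commands` in order."""
    args = ["gdb", "--batch"]
    for c in commands:
        args += ["-ex", c]
    args.append(program)
    return subprocess.run(args, capture_output=True, text=True).stdout

# Read one variable at a breakpoint, alter it, and let the program continue.
print(gdb_batch("./a.out", [
    "break main",
    "run",
    "print some_variable",               # access part of the program state
    "set variable some_variable = 42",   # modify it, as DD on states requires
    "continue",
]))
```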

Delta Debugging AAIS 05 Curino, Giusti Conclusions Delta Debugging: is an extremely interesting technique; works pretty well, at least in theory; there are no usable tools yet; can be usefully integrated in various IDEs; the algorithm is now patent-free (expired patent). SO: LET’S MAKE SOME MONEY ON IT!

Delta Debugging AAIS 05 Curino, Giusti Acknowledgements Some slides and images adapted from Dr. Andreas Zeller’s presentations and papers.

Delta Debugging AAIS 05 Curino, Giusti References
Andreas Zeller. Yesterday, My Program Worked. Today, It Does Not. Why? FSE 1999.
Holger Cleve, Andreas Zeller. Finding Failure Causes through Automated Testing. 4th International Workshop on Automated Debugging, 2000.
Ralf Hildebrandt, Andreas Zeller. Simplifying Failure-Inducing Input. ISSTA 2000.
Andreas Zeller. Automated Debugging: Are We Close? IEEE Computer, November.
Jong-Deok Choi, Andreas Zeller. Isolating Failure-Inducing Thread Schedules. ISSTA 2002.
Andreas Zeller. Isolating Cause-Effect Chains from Computer Programs. FSE 2002.
Holger Cleve, Andreas Zeller. Locating Causes of Program Failures. ICSE 2005.