Bug Localization with Association Rule Mining Wujie Zheng


Outline
- Introduction
  - Background of Bug Localization
  - From Predicate to Predicate Sets
- Mining Suspicious Predicate Sets as Strong Association Rules
  - Modeling
  - The AllRules Algorithm
- Redundant Rule Pruning
  - Definition
  - Sufficient Condition of Redundant Rules
  - The ClosedRules Algorithm
- Experiments and Case Study
- Conclusions

Introduction

Background of Bug Localization
Motivation:
- Software is far from bug-free.
- Manual debugging is laborious and expensive.
Definition:
- Bug localization finds, through automatic analysis, a set or a ranking of source code locations that are likely to be buggy.
General setting:
- A set of failing executions
- A set of passing executions

Background of Bug Localization
An Example [Jones02]

Background of Bug Localization
xSlice [Agrawal95]: Set Operation

Background of Bug Localization
TARANTULA [Jones02]: Visualization

Background of Bug Localization
LIBLIT05 [Liblit05]

Background of Bug Localization
SOBER [Liu05]
- SOBER considers the probability density function of the evaluation bias of a predicate P on passing runs and on failing runs, respectively.
- The bug relevance score of P is then defined as the difference between these two distributions.

From Predicate to Predicate Sets
Motivation:
- A failure is caused not only by the bug but also by other trigger conditions.
The criterion of an interesting (suspicious) predicate set Ps = {P1, …, Pn}:
- Any Pi in Ps should be related to the bug.
- The whole set Ps should be related to the bug.
Benefits:
- Improved accuracy
- The mined implicit relationships may give programmers more hints.
Potential problems:
- High computational complexity
- High redundancy
Existing work:
- Considers only combinations of two predicates

Mining Suspicious Predicate Sets as Strong Association Rules

Modeling
The criterion of an interesting (suspicious) predicate set Ps = {P1, …, Pn}:
- Any Pi in Ps should be related to the bug.
- The whole set Ps should be related to the bug.
The appearance of such a Ps in the execution traces:
- When Pi exists, the program has a high probability of running into failure; when the program runs into failure, Pi always exists.
- When Ps exists, the program has a high probability of running into failure; when the program runs into failure, Ps always exists.
Strong association rule representation:
- Pi => failure should have high support and confidence.
- Ps => failure should have high support and confidence.
- Support(X => Y) = p(X, Y), Confidence(X => Y) = p(Y | X)
This formulation benefits from advances in data mining techniques.
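To make the two measures concrete, the following is a minimal sketch of how Support(X => failure) and Confidence(X => failure) could be computed from execution traces. The trace representation (a set of predicates observed true plus a pass/fail flag), the function name `support_and_confidence`, and the toy data are all assumptions for illustration; they are not taken from the slides.

```python
from typing import FrozenSet, List, Tuple

# Assumed trace format: (predicates observed true in the run, did the run fail?)
Run = Tuple[FrozenSet[str], bool]


def support_and_confidence(runs: List[Run], predicate_set: FrozenSet[str]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule predicate_set => failure.

    support    = p(X, failure) = fraction of all runs that contain X and fail
    confidence = p(failure | X) = fraction of runs containing X that fail
    """
    n_total = len(runs)
    n_x = sum(1 for preds, _ in runs if predicate_set <= preds)
    n_x_fail = sum(1 for preds, failed in runs if failed and predicate_set <= preds)
    support = n_x_fail / n_total if n_total else 0.0
    confidence = n_x_fail / n_x if n_x else 0.0
    return support, confidence


# Toy data (hypothetical): two failing runs, two passing runs.
runs = [
    (frozenset({"P1", "P2"}), True),
    (frozenset({"P1", "P2"}), True),
    (frozenset({"P1"}), False),
    (frozenset({"P2", "P3"}), False),
]

single = support_and_confidence(runs, frozenset({"P1"}))
pair = support_and_confidence(runs, frozenset({"P1", "P2"}))
```

In this toy data the pair {P1, P2} has the same support as P1 alone but a higher confidence, which is the kind of gain the predicate-set criterion is after.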

The AllRules Algorithm
Given a database of the execution traces, the items are the predicates {P1, …, Pn} and the label failing/passing.
1st phase: select the buggy single predicates
1. Mine all the frequent itemsets {Pi, failure}.
2. Calculate the confidence of every rule Pi => failure.
3. Select the top-20 rules and construct a new database with the corresponding Pi.
2nd phase: select the buggy predicate sets
1. Mine all the frequent itemsets {Ps, failure} from the new database.
2. Calculate the confidence of every rule Ps => failure.
3. Select the top rules as the results.
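The two phases above can be sketched as follows. This is a brute-force illustration under stated assumptions, not the author's implementation: the trace format, the names `rule_stats` and `all_rules`, the `min_support`, `max_size`, and `top_n` parameters, and the tie-breaking order are all hypothetical, and a real system would use a proper frequent-itemset miner rather than enumerating combinations.

```python
from itertools import combinations


def rule_stats(runs, itemset):
    """Support and confidence of itemset => failure over (predicates, failed) runs."""
    n_x = sum(1 for preds, _ in runs if itemset <= preds)
    n_x_fail = sum(1 for preds, failed in runs if failed and itemset <= preds)
    support = n_x_fail / len(runs) if runs else 0.0
    confidence = n_x_fail / n_x if n_x else 0.0
    return support, confidence


def all_rules(runs, min_support=0.1, top_k=20, max_size=3, top_n=5):
    # Phase 1: rank single predicates by confidence of Pi => failure.
    predicates = sorted({p for preds, _ in runs for p in preds})
    singles = []
    for p in predicates:
        s, c = rule_stats(runs, frozenset({p}))
        if s >= min_support:  # keep only frequent {Pi, failure}
            singles.append((c, s, p))
    singles.sort(key=lambda t: (-t[0], -t[1], t[2]))
    kept = [p for _, _, p in singles[:top_k]]

    # Build the new database restricted to the selected predicates.
    kept_set = frozenset(kept)
    projected = [(preds & kept_set, failed) for preds, failed in runs]

    # Phase 2: enumerate predicate sets over the new database (brute force).
    rules = []
    for size in range(1, max_size + 1):
        for combo in combinations(kept, size):
            x = frozenset(combo)
            s, c = rule_stats(projected, x)
            if s >= min_support:
                rules.append((x, s, c))
    rules.sort(key=lambda r: (-r[2], -r[1], sorted(r[0])))
    return rules[:top_n]


# Toy data (hypothetical).
runs = [
    (frozenset({"P1", "P2"}), True),
    (frozenset({"P1", "P2"}), True),
    (frozenset({"P1"}), False),
    (frozenset({"P2", "P3"}), False),
]
top_rules = all_rules(runs)
```

Restricting the second phase to the predicates kept by the first phase is what keeps the enumeration tractable: with at most 20 items, the search space no longer depends on the total number of instrumented predicates.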

Redundant Rule Pruning

Redundant Rules
Definition:
- X => failure is redundant when there exists a superset Y of X such that the support and confidence of Y => failure are not less than those of X => failure.
- Some superset of such a Ps would already have been checked before Ps itself.
Sufficient condition of redundant rules:
- If {X, failure} is not a closed frequent itemset, then X => failure is a redundant rule.
- So we only need to mine the closed frequent itemsets.
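The redundancy definition can be checked directly on a list of already-scored rules. The sketch below is only an illustration: the rule representation `(predicate_set, support, confidence)`, the function name `prune_redundant`, and the sample values are assumptions, and it interprets "superset" as a proper superset.

```python
def prune_redundant(rules):
    """Drop every rule X => failure that is dominated by a proper superset Y.

    rules: list of (frozenset_of_predicates, support, confidence).
    A rule is redundant if some Y with X < Y has support >= and confidence >=.
    """
    kept = []
    for x, s, c in rules:
        dominated = any(x < y and sy >= s and cy >= c for y, sy, cy in rules)
        if not dominated:
            kept.append((x, s, c))
    return kept


# Hypothetical scored rules.
scored = [
    (frozenset({"P1"}), 0.5, 2 / 3),
    (frozenset({"P2"}), 0.5, 2 / 3),
    (frozenset({"P1", "P2"}), 0.5, 1.0),
]
compact = prune_redundant(scored)
```

This pairwise check is quadratic in the number of rules and requires all rules to be generated first, which is the cost the closed-itemset condition is meant to avoid.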

The ClosedRules Algorithm
Given a database of the execution traces, the items are the predicates {P1, …, Pn} and the label failing/passing.
1st phase: select the buggy single predicates
1. Mine all the frequent itemsets {Pi, failure}.
2. Calculate the confidence of every rule Pi => failure.
3. Select the top-20 rules and construct a new database with the corresponding Pi.
2nd phase: select the non-redundant buggy predicate sets
1. Mine all of the closed frequent itemsets {Ps, failure} from the new database.
2. Calculate the confidence of every rule Ps => failure.
3. Select the top rules as the results.
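What distinguishes ClosedRules from AllRules is the closedness test in the second phase. A minimal sketch of that test is given below, using the standard fact that an itemset is closed exactly when it equals the intersection of all transactions that contain it. The transaction encoding (with a literal "failure" item for failing runs) and the helper names `closure` and `is_closed` are assumptions; dedicated closed-itemset miners avoid this brute-force check.

```python
def closure(transactions, itemset):
    """Intersection of all transactions containing itemset (None if none do)."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def is_closed(transactions, itemset):
    """An itemset is closed iff no proper superset has the same support,
    which holds exactly when the itemset equals its closure."""
    return closure(transactions, itemset) == itemset


# Toy database (hypothetical): failing runs carry the extra item "failure".
transactions = [
    frozenset({"P1", "P2", "failure"}),
    frozenset({"P1", "P2", "failure"}),
    frozenset({"P1"}),
    frozenset({"P2", "P3"}),
]

p1_closed = is_closed(transactions, frozenset({"P1", "failure"}))
pair_closed = is_closed(transactions, frozenset({"P1", "P2", "failure"}))
```

Here {P1, failure} is not closed because every transaction containing it also contains P2, so the rule P1 => failure is redundant with respect to {P1, P2} => failure.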

Experiments and Case Study

Subject Programs and Performance Metrics
Subject programs: the Siemens suite
- The Siemens suite was originally prepared by Siemens Corp.
- It contains 130 faulty versions of 7 programs: print_tokens, print_tokens2, replace, schedule, schedule2, tcas, and tot_info.
Performance metric: T-score
- The T-score is based on the program dependence graph, in which each statement is a node and there is an edge between two nodes if the two statements have a data and/or control dependency.
- Given a bug localization report, a programmer is assumed to start from the suspicious statements and perform a breadth-first search along the program dependence graph until reaching the faulty statements.
- The T-score is defined as the percentage of code examined during this process.
- The T-score estimates the programmer effort required to find bugs using a bug localization algorithm: the less code that must be examined, the higher the quality of the algorithm.
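The T-score procedure can be sketched as a layer-by-layer breadth-first search. The sketch below makes several assumptions that the slides do not state: the dependence graph is an undirected adjacency dictionary with every statement as a key, a whole BFS layer counts as examined once it is visited, and an unreachable fault is scored as 100%. The function name `t_score` and the five-statement graph are hypothetical.

```python
def t_score(pdg, reported, faulty):
    """Percentage of statements examined by BFS from the reported statements
    until at least one faulty statement is reached.

    pdg: dict mapping each statement to its dependence neighbours.
    reported, faulty: sets of statements.
    """
    examined = set(reported)
    frontier = set(reported)
    while frontier and not (examined & faulty):
        next_frontier = set()
        for node in frontier:
            next_frontier.update(pdg.get(node, ()))
        frontier = next_frontier - examined  # only statements not yet examined
        examined |= frontier
    if not (examined & faulty):
        return 100.0  # assumption: fault unreachable, whole program examined
    return 100.0 * len(examined) / len(pdg)


# Hypothetical chain of five statements: 1 - 2 - 3 - 4 - 5.
pdg = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

far = t_score(pdg, reported={1}, faulty={3})
exact = t_score(pdg, reported={3}, faulty={3})
```

With the report two dependence steps away from the fault, three of the five statements are examined; when the report points directly at the fault, only one is.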

Predicate Sets vs. Single Predicate
Fig. 1. Predicate Sets vs. Single Predicate

Comparison with Other Algorithms
Fig. 2. Performance of BLARM, LIBLIT05 and SOBER

Case Study

Subject Programs and Performance Metrics
- We tested this buggy program with 1608 test cases, of which 1538 passed and 70 failed.
- LIBLIT05: 12th; SOBER: 10th; BLARM

Conclusions

- A general method to exploit the relationships between predicates
- Compact results
- Better performance

Thank you!