Precise Condition Synthesis for Program Repair


Precise Condition Synthesis for Program Repair
Y. Xiong et al., International Conference on Software Engineering 2017
Presenter: Jee-weon Jung

Index
- Overview
- Introduction
- Motivating example
- Approach
- Evaluation

Overview
- Objective: produce PRECISE patches
- Concentrates on condition synthesis
- Studies three approaches and finally proposes ACS
  - To know which variables should appear in an 'if' condition → sort the variables by the dependency relations between them
  - Document analysis technique applied to API documents → filter variables
  - Mine a set of frequently used predicates
- Successfully repaired 18 defects in 4 projects of Defects4J
- Best performance reported: precision was 78.3% (others are usually around 40%)

Introduction (1)
- Automatic program repair techniques generate patches that satisfy a specification
  - A test suite is most often used as the specification
  - A patch is considered 'plausible' if it passes all tests
- The concept of precision is applied to plausible patches: the share of them that are actually correct
- Precision is VERY important
  - Low-precision patches decrease efficiency
  - The precision of well-known tools such as GenProg is about 4%
  - Reason: precise patches are sparse in the search space of repair systems
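
As a quick arithmetic check on the precision figures quoted in the overview, the sketch below assumes precision means correct patches divided by generated plausible patches. Under that assumption, 18 correct patches at 78.3% precision imply roughly 23 generated patches. The function name `precision` is illustrative only.

```python
def precision(correct: int, generated: int) -> float:
    """Fraction of generated (plausible) patches that are actually correct."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return correct / generated


# 18 correct patches at 78.3% precision imply about 23 generated patches.
implied_generated = round(18 / 0.783)
print(implied_generated, f"{precision(18, implied_generated):.1%}")
```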

Introduction (2)
- Rank patches to achieve high precision
  - Studied widely (Prophet, HistoricalFix, DirectFix, …)
- This paper proposes more accurate ranking criteria for condition synthesis
- Condition synthesis: insert or modify an 'if' condition
  - One of the strongest repair techniques
- The proposed method decomposes condition synthesis into two steps
  - Variable selection: decide which variable to use
  - Predicate selection: decide which predicate to apply to that variable

Introduction (3)
- Example: 'if (a > 10)'
  - First select the variable 'a' → then select the predicate '> 10'
- Three techniques are proposed for ranking variables and predicates
  - Dependency-based ordering: a variable that depends on other variables is more likely to be used in the condition than the variables it depends on
  - Document analysis: analyze the Javadoc comments in the source code
  - Predicate mining: extract predicates from existing projects → sort by frequency
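
The decomposition can be sketched as a two-level enumeration: walk the ranked variables, and for each one walk its ranked predicates. This is only an illustration under simple assumptions. The function name `enumerate_conditions`, the dictionary layout, and the sample rankings are hypothetical, and the paper's exact enumeration order may differ.

```python
def enumerate_conditions(ranked_variables, predicates_for):
    """Combine ranked variables with their ranked predicates into 'if' conditions.

    ranked_variables: variable names, most likely first.
    predicates_for:   maps a variable name to its predicates, most likely first.
    """
    conditions = []
    for var in ranked_variables:
        for pred in predicates_for.get(var, []):
            conditions.append(f"if ({var} {pred})")
    return conditions


# Hypothetical rankings, for illustration only.
candidates = enumerate_conditions(["a", "b"], {"a": ["> 10"], "b": ["== 0", "< 0"]})
print(candidates)
```

Each candidate would then be validated against the test suite in this order, so better rankings mean the first plausible patch is more likely to be the correct one.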

Motivating example (1)
- Math99 defect in the Defects4J dataset
- 2 test cases given: 'a=1, b=50' / 'a=Integer.MIN_VALUE, b=1'
- Many plausible conditions exist → most are not precise
- The proposed ACS adds lines 2~4
  - It synthesizes the condition at line 2

Motivating example (2)
- Dependency-based ordering
  - In the previous example, 'lcm' depends on 'a' and 'b' → 'lcm' is more likely to be used
  - Use dependency in ordering
- Document analysis
  - Many Java methods come with Javadoc comments → use the comments to assist condition synthesis
  - Consider the '@throws' tag in the document
  - Analyze the subject of the sentence
  - Consider only the mentioned variables when synthesizing the guard condition
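
A rough sketch of the document-analysis filter follows. It is a simplification: instead of analyzing the subject of the sentence with an NLP library, it keeps any candidate variable whose name appears as a whole word in an '@throws' description. The function name `variables_in_throws`, the sample Javadoc, and the fallback of keeping every candidate when none is mentioned are assumptions made for illustration.

```python
import re


def variables_in_throws(javadoc: str, candidates):
    """Keep the candidates named in an @throws description.

    Assumption: if no candidate is mentioned, all candidates are kept.
    """
    # Capture the text after '@throws <ExceptionName>' up to the next tag.
    descriptions = re.findall(r"@throws\s+\S+\s+([^@]*)", javadoc)
    words = set(re.findall(r"[A-Za-z_]\w*", " ".join(descriptions)))
    mentioned = [v for v in candidates if v in words]
    return mentioned or list(candidates)


# Hypothetical Javadoc comment, for illustration only.
doc = """
/**
 * Returns n!.
 * @param n argument
 * @return n!
 * @throws IllegalArgumentException if n is negative
 */
"""
print(variables_in_throws(doc, ["n", "result"]))
```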

Motivating example (3)
- Predicate mining
  - Use the variable type and name as context
  - Ex: 'hour' is frequently used with > 24, < 12, …
  - Ex: 'factorial' is frequently used with < 21 (20! is the largest factorial representable in a 64-bit integer)
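
The '< 21' bound can be checked directly. The snippet below assumes a signed 64-bit integer (Java's `long`). The helper name `largest_factorial_arg` is illustrative.

```python
import math


def largest_factorial_arg(limit: int) -> int:
    """Return the largest n such that n! does not exceed limit."""
    n = 0
    while math.factorial(n + 1) <= limit:
        n += 1
    return n


INT64_MAX = 2**63 - 1  # largest signed 64-bit value
print(largest_factorial_arg(INT64_MAX))
```

Since 21! also exceeds 2**64 - 1, the same bound holds for an unsigned 64-bit integer.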

Approach (1) - overview
- Uses 2 types of templates
- Directly returning the oracle
  - Identify the last executed statement s in the failed test → insert either 'if (c) return v' or 'if (c) throw e' before s, where c is the synthesized condition
- Modification of an existing condition
  - Locate a potentially faulty 'if' condition c → apply either widening (c → c || c') or narrowing (c → c && c')
  - c': the synthesized condition
  - Use SBFL and predicate switching to locate the potentially faulty condition
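
The templates can be pictured as plain string rewrites over Java source text. The sketch below follows the forms listed on the slide. The function names and the sample statements and conditions are hypothetical, and the exact shape of the templates in the paper may differ (for example in how narrowing combines the conditions).

```python
def guard_before(statement: str, cond: str, action: str) -> list[str]:
    """Oracle-returning template: place 'if (cond) action;' before the statement."""
    return [f"if ({cond}) {action};", statement]


def widen(existing: str, cond: str) -> str:
    """Widening: the branch is also taken when the synthesized condition holds."""
    return f"if (({existing}) || ({cond}))"


def narrow(existing: str, cond: str) -> str:
    """Narrowing: the branch is taken only if the synthesized condition also holds."""
    return f"if (({existing}) && ({cond}))"


# Hypothetical statements and conditions, for illustration only.
print(guard_before("return Math.sqrt(x);", "x < 0", "throw new IllegalArgumentException()"))
print(widen("x == 0", "x < 0"))
print(narrow("x >= 0", "x != 10"))
```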

Approach (2) – extracting the oracle
- Three types of oracle
  - Constant: copy it directly
  - Specified via an XXXException.class annotation: throw an instance of that exception
  - Function mapping the test input to the output: more complicated
    - First perform backward slicing from the oracle expression
    - Perform backward slicing from the test input arguments

Approach (3) – variable ranking
- 3 steps: prepare candidates → filter by document analysis → sort using dependency-based ordering
- Preparing candidate variables
  - Consider 4 types: local variables / method parameters / the 'this' pointer / expressions used in other 'if' conditions in the current method
- Sorting by dependency
  - Create a dependency graph between variables
  - Node: variable / edge: dependency relation
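
One way to sketch the dependency-based sort is a topological ordering of the dependency graph in which a variable is placed before the variables it depends on. The code below is a minimal illustration, not the paper's exact procedure. The function name `dependency_order`, the alphabetical tie-breaking, and the handling of cycles (leftover variables are appended alphabetically) are assumptions.

```python
def dependency_order(depends_on):
    """Order variables so that each one precedes the variables it depends on.

    depends_on maps a variable to the variables it is computed from.
    """
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes |= set(deps)

    # dependents[v] counts how many other variables are computed from v.
    dependents = {n: 0 for n in nodes}
    for var, deps in depends_on.items():
        for d in set(deps):
            if d != var:
                dependents[d] += 1

    ready = sorted(n for n in nodes if dependents[n] == 0)
    order = []
    while ready:
        var = ready.pop(0)
        order.append(var)
        for d in set(depends_on.get(var, ())):
            if d == var:
                continue
            dependents[d] -= 1
            if dependents[d] == 0:
                ready.append(d)
        ready.sort()  # assumed tie-break: alphabetical

    # Variables caught in dependency cycles never become ready; append them last.
    order += sorted(nodes - set(order))
    return order


# 'lcm' is computed from 'a' and 'b', so it is ranked first.
print(dependency_order({"lcm": ["a", "b"], "a": [], "b": []}))
```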

Approach (4) – predicate ranking
- Mining related conditions
  - Use the variable type, variable name, and method name to decide the context
  - Two names are considered similar when their decompositions into words at capital letters share a common word
  - A variable name is considered meaningful when its length is >= 2
- To synthesize a condition c with variable x in method m: a conditional expression c0 is considered to be in a similar context to c if (1) it contains one variable x0, (2) x0 has the same type as x, and (3) the name of x0 is similar to x when the name of x is meaningful, or the name of the method surrounding c0 is similar to m when the name of x is not meaningful.
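
The similarity rules above can be sketched as follows. The sketch assumes camelCase identifiers and that the caller has already checked rule (1), that the mined condition contains exactly one variable. All function and parameter names are illustrative.

```python
import re


def name_words(name: str) -> set[str]:
    """Split an identifier at capital letters and return its lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)
    return {p.lower() for p in parts}


def similar(name_a: str, name_b: str) -> bool:
    """Two names are similar when they share at least one word."""
    return bool(name_words(name_a) & name_words(name_b))


def meaningful(var_name: str) -> bool:
    """A variable name is treated as meaningful when its length is >= 2."""
    return len(var_name) >= 2


def in_similar_context(x_name, x_type, m_name, x0_name, x0_type, m0_name) -> bool:
    """Rules (2) and (3): same type, then compare variable or method names."""
    if x0_type != x_type:
        return False
    if meaningful(x_name):
        return similar(x0_name, x_name)
    return similar(m0_name, m_name)


print(in_similar_context("startHour", "int", "schedule", "hourOfDay", "int", "parse"))
print(in_similar_context("i", "int", "factorial", "n", "int", "computeFactorial"))
```

In the second call the variable name 'i' is too short to be meaningful, so the comparison falls back to the surrounding method names.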

Approach (5) – counting predicates
- From the collected similar expressions, extract the predicates they use
- Choose only among predefined predicates
  - Expanding the search space often leads to incorrect patches
  - Syntactically different predicates may be semantically the same
- Apply the 'pred' function to extract a multiset → sort by frequency → heuristically select the top 20
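
The counting step can be sketched with a multiset of normalized predicates. The version below assumes the predefined predicates are simple comparisons of one variable against one operand, and it normalizes flipped forms such as '24 < hour' to '> 24' so that syntactically different but equivalent conditions are counted together. The regular expression, the helper names, and the sample conditions are illustrative. Only the top-20 cutoff comes from the slide.

```python
import re
from collections import Counter

# Assumed predicate form: <operand> <comparison> <operand>, without spaces inside operands.
PATTERN = re.compile(r"^\s*(\S+?)\s*(==|!=|<=|>=|<|>)\s*(\S+?)\s*$")
FLIP = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "==": "==", "!=": "!="}


def pred(condition: str, var: str):
    """Extract the predicate applied to var, or None if the form is unsupported."""
    match = PATTERN.match(condition)
    if not match:
        return None
    left, op, right = match.groups()
    if left == var:
        return f"{op} {right}"
    if right == var:
        return f"{FLIP[op]} {left}"  # normalize '24 < hour' to '> 24'
    return None


def rank_predicates(conditions, var, top=20):
    """Count the extracted predicates and return the most frequent ones."""
    counts = Counter(p for p in (pred(c, var) for c in conditions) if p is not None)
    return [p for p, _ in counts.most_common(top)]


# Hypothetical mined conditions, for illustration only.
mined = ["hour > 24", "24 < hour", "hour < 12", "hour > 24 && ok"]
print(rank_predicates(mined, "hour"))
```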

Evaluation (1)
- Implementation
  - Based on the source code of Nopol
  - Uses the fault localization library GZoltar
  - Uses Apache OpenNLP
- Dataset
  - Top five most starred Java projects on GitHub (2016/7/15)
  - Four projects from Defects4J

Evaluation (2)
- Research questions
  - RQ1: How do the three ranking techniques perform on ranking variables and predicates?
  - RQ2: How does our approach perform on real-world defects?
  - RQ3: How does our approach compare with existing approaches?
  - RQ4: To what extent does each component of our approach contribute to the overall performance?

Evaluation (3) RQ1: performance of the three techniques

Evaluation (4) RQ1: performance of the three techniques

Evaluation (5) RQ1: performance of the three techniques

Evaluation (6) RQ2: performance of ACS

Evaluation (7) RQ3: comparison with existing approaches

Evaluation (8) RQ4: detailed analysis of the components

Conclusion
- Studied refined ranking techniques for condition synthesis
- Achieved high precision and reasonable recall on Defects4J