A HUMAN STUDY OF FAULT LOCALIZATION ACCURACY
Zachary P. Fry, Westley Weimer
University of Virginia
September 16, 2010

SOFTWARE MAINTENANCE
Maintenance can account for the majority of the software lifecycle
Locating defects in code is a considerable challenge
What if we knew how easy it was to locate faults in a code base beforehand?
  Engineer systems to make bug finding easier
  Concentrate on problem areas
Could we develop a model that measures this?
  How would we gather a data set?

PROBLEM – FAULT LOCALIZATION
We treat fault localization as the task of determining if a program or code fragment contains a defect and, if so, locating the line where that defect resides
Research question: Which factors contribute to a human’s ability to detect and locate defects?

PROBLEM – FAULT LOCALIZATION
We examine four categories of defect and code characteristics
  Error type
  Surface and syntactical features
  Control flow and contextual features
  Abstraction
Which of these affect humans’ abilities to locate defects in code?

OUTLINE
Motivation
Structure of Model
Human Study
Evaluation of Model
Conclusions

MOTIVATION: AN EXAMPLE
    /** Move a single disk from src to dest. */
    public static void hanoi1(int src, int dest){
        System.out.println(src + " => " + dest);
    }
    /** Move two disks from src to dest, making use of a spare peg. */
    public static void hanoi2(int src, int dest, int spare) {
        hanoi1(src, dest);
        System.out.println(src + " => " + dest);
        hanoi1(spare, dest);
    }
    /** Move three disks from src to dest, making use of a spare peg. */
    public static void hanoi3(int src, int dest, int spare) {
        hanoi2(src, spare, dest);
        System.out.println(src + " => " + dest);
        hanoi2(spare, dest, src);
    }
Slide callout: hanoi1(src, spare);
33% of participants correctly located the defect

TOWERS OF HANOI – VERSION 2
More complex control flow
  if/else statement
  recursion
Rich commenting
Descriptive identifiers
53% of participants correctly located the fault
    /*******************************************
      Performs the initial call to moveTower to solve the puzzle.
      Moves the disks from tower 1 to tower 3 using tower 2.
    ********************************************/
    public void solve ()
    {
        moveTower (totalDisks, 1, 3, 2);
    }
    /*******************************************
      Moves the specified number of disks from one tower to another
      by moving a subtower of n-1 disks out of the way, moving one
      disk, then moving the subtower back. Base case of 1 disk.
    ********************************************/
    private void moveTower (int numDisks, int start, int end, int temp)
    {
        if (numDisks == 1)
            moveTower(numDisks-1, temp, end, start);
        else
        {
            moveTower (numDisks-1, start, temp, end);
            moveOneDisk (start, end);
            moveTower (numDisks-1, temp, end, start);
        }
    }
    /*******************************************
      Prints instructions to move one disk from the specified start
      tower to the specified end tower.
    *******************************************/
    private void moveOneDisk (int start, int end)
    {
        System.out.println ("Move one disk from " + start + " to " + end);
    }
Slide callout: moveOneDisk (start, end);

MODEL – OVERVIEW
We desire a model of human fault localization accuracy that, given source code as input, can predict the likelihood that a human will be able to accurately locate faults within it
We hypothesize that features relevant to such a model will fall into four categories: fault type, syntax, context, and abstraction
  Existing work tends to focus on only one of these areas at a time
Linear regression – trained on human study data
  Ease of analysis
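
The slide names linear regression but does not show how the model was fit. As a minimal sketch only, assuming ordinary least squares on a single feature, the following fits per-file accuracy against one code feature. The class name, the choice of block nesting level as the feature, and the data values are all hypothetical and are not taken from the study; the model described in the talk uses many features.

```java
class AccuracyRegression {
    /** Least-squares fit of y = intercept + slope * x; returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }

    /** Predicted accuracy for a feature value under the fitted model. */
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        // Made-up illustration data, NOT results from the study.
        double[] nestingLevel = { 1.0, 2.0, 3.0, 4.0 };
        double[] accuracy = { 0.70, 0.60, 0.50, 0.40 };
        double[] model = fit(nestingLevel, accuracy);
        System.out.printf("intercept=%.3f slope=%.3f%n", model[0], model[1]);
        System.out.printf("predicted accuracy at nesting 2.5: %.3f%n", predict(model, 2.5));
    }
}
```

A linear form keeps each coefficient directly readable as the direction and size of one feature's effect, which fits the "ease of analysis" point on the slide.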

DEFECT FEATURES
Error type
  Adapted and expanded existing Knight taxonomy
  Sampled from consecutive Mozilla bugs to obtain types and distribution
  We consider 17 total types of single-line defects
Defect types (partial list)
  Missing statement
  Uninitialized variable
  Extra assignment
  Incorrect type
  Incorrect constant
  Incorrect parameter
  Negated conditional
  Incorrect method call
  Incorrect variable
  …

MODEL – CODE FEATURES
Code-based features
  Most measured automatically, some manually
  92 total
Syntax
  Block nesting level
  Number of method calls
  Num of local vars
  Num of var declarations
  Num of var uses
  Avg line length
  …
Context
  Avg/Max CFG in-edges
  Avg/Max CFG out-edges
  Avg CFG path length
  Num of CFG edges
  Num of CFG leaves
  Ratio of “ifs” to “elses”
  …
Abstraction
  Num of array-based structures
  Uses underlying data structure
  Implements a heap
  Implements a tree
  Implements reheap
  …
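
Several of the context features are simple counts over a control flow graph (CFG). The sketch below shows one way such counts could be computed, assuming the CFG is already available as an adjacency list; the class name, the representation, and the reading of "leaf" as a node with no outgoing edges are assumptions for illustration, not the tooling used in the study.

```java
class CfgFeatures {
    /** successors[i] holds the indices of the nodes reachable from node i in one step. */
    static int edgeCount(int[][] successors) {
        int edges = 0;
        for (int[] out : successors) edges += out.length;
        return edges;
    }

    static int maxOutEdges(int[][] successors) {
        int max = 0;
        for (int[] out : successors) max = Math.max(max, out.length);
        return max;
    }

    static int maxInEdges(int[][] successors) {
        int[] in = new int[successors.length];
        for (int[] out : successors) {
            for (int target : out) in[target]++;
        }
        int max = 0;
        for (int count : in) max = Math.max(max, count);
        return max;
    }

    /** Every edge enters exactly one node, so the average is edges / nodes. */
    static double avgInEdges(int[][] successors) {
        return (double) edgeCount(successors) / successors.length;
    }

    /** Assumption: a leaf is a node with no outgoing edges. */
    static int leafCount(int[][] successors) {
        int leaves = 0;
        for (int[] out : successors) {
            if (out.length == 0) leaves++;
        }
        return leaves;
    }

    public static void main(String[] args) {
        // Diamond-shaped CFG of a single if/else: 0 -> {1, 2}, 1 -> 3, 2 -> 3.
        int[][] diamond = { { 1, 2 }, { 3 }, { 3 }, {} };
        System.out.println("edges=" + edgeCount(diamond));
        System.out.println("maxOut=" + maxOutEdges(diamond));
        System.out.println("maxIn=" + maxInEdges(diamond));
        System.out.println("avgIn=" + avgInEdges(diamond));
        System.out.println("leaves=" + leafCount(diamond));
    }
}
```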

HUMAN STUDY – PARTICIPANT SELECTION
215 fourth year students and volunteers from the internet (crowdsourcing)
Monetary reward given for completion to encourage best effort
Subset                 Average Accuracy   Number of Participants
All                    46.3%              65
Accuracy > 40%         55.2%              46
Experience > 4 years   51.5%              34
Experience = 4 years   46.7%              17
Experience < 4 years   33.4%              14
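
As a quick consistency check on the table, the three experience subsets partition the 65 participants (34 + 17 + 14), so their participant-weighted mean accuracy should reproduce the "All" row. The sketch below does that arithmetic; the class and method names are illustrative, and it assumes the "All" figure is a per-participant average.

```java
class ParticipantTableCheck {
    /** Mean of values weighted by group size. */
    static double weightedAverage(double[] values, int[] weights) {
        double sum = 0.0;
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * weights[i];
            total += weights[i];
        }
        return sum / total;
    }

    public static void main(String[] args) {
        // Rows from the slide: experience > 4, = 4, < 4 years.
        double[] accuracyPercent = { 51.5, 46.7, 33.4 };
        int[] participants = { 34, 17, 14 };
        System.out.printf("weighted average accuracy: %.1f%%%n",
                weightedAverage(accuracyPercent, participants));
    }
}
```

The weighted sum is 1751 + 793.9 + 467.6 = 3012.5, and 3012.5 / 65 is about 46.3, which agrees with the reported overall accuracy.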

HUMAN STUDY – CODE SELECTION
Five textbooks
Three sets of code features to vary or control:
  Syntax and Surface
  Control flow and Contextual
  Abstraction
Provides similar concepts but differing presentations and/or implementations
45 Java files total

HUMAN STUDY – FAULT SEEDING
Types and distribution based on Mozilla
  All faults selected are limited to one line for simplicity
Random seeding
  Zero or one bugs per file
  Type chosen based on distribution
  All possible sites enumerated and one is randomly chosen
  Fault seeded manually, based on actual bugs if possible
20 line search-space windows
  To further control for code length and facilitate quick and accurate response
  Randomly chosen around the seeded fault location
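
The random parts of this procedure can be sketched as three small steps: draw a defect type according to a distribution, pick one of the enumerated candidate sites uniformly, and place a 20-line window at random so that it still contains the seeded line. The code below is an illustration under those assumptions; the class, method names, and any weights passed in are hypothetical, and the actual seeding of the fault was done manually according to the slide.

```java
import java.util.Random;

class FaultSeeding {
    /** Picks an index with probability proportional to its weight. */
    static int pickWeighted(double[] weights, Random rng) {
        double total = 0.0;
        for (double w : weights) total += w;
        double r = rng.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) return i;
        }
        return weights.length - 1; // guard against floating-point rounding
    }

    /** Picks one candidate site (a 1-based line number) uniformly at random. */
    static int pickSite(int[] candidateLines, Random rng) {
        return candidateLines[rng.nextInt(candidateLines.length)];
    }

    /**
     * Returns the first line of a window of `size` lines that lies inside
     * [1, fileLength] and contains faultLine. Files no longer than the
     * window are shown from line 1.
     */
    static int windowStart(int faultLine, int fileLength, int size, Random rng) {
        if (fileLength <= size) return 1;
        int lowest = Math.max(1, faultLine - size + 1);
        int highest = Math.min(faultLine, fileLength - size + 1);
        return lowest + rng.nextInt(highest - lowest + 1);
    }

    public static void main(String[] args) {
        Random rng = new Random(2010);
        double[] typeWeights = { 0.5, 0.3, 0.2 }; // hypothetical distribution
        int[] candidateLines = { 12, 40, 57 };    // hypothetical sites
        int type = pickWeighted(typeWeights, rng);
        int faultLine = pickSite(candidateLines, rng);
        int start = windowStart(faultLine, 120, 20, rng);
        System.out.println("type index " + type + ", fault at line " + faultLine
                + ", window " + start + "-" + (start + 19));
    }
}
```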

HUMAN STUDY – PROTOCOL
Each participant sees 30 consecutive files and is asked:
  Is there a bug in this code?
  If so, on what line does the bug occur?
  How difficult do you feel this code is to understand (1-5)?
Participants cannot execute or automatically search the code – only manual inspection is permitted

EVALUATION
Three separate experiments
1. Examines defect type as related to fault localization accuracy
   Are certain bugs harder to find?
2. Examines Syntactical, Contextual, and Abstraction features as related to fault localization accuracy
   Does our model correlate with actual human ability to locate faults better than existing baselines?
3. Analysis of individual features
   What features contribute the most towards humans’ ability to locate defects in source code?

EVALUATION – EXPERIMENT 1
Goal: relate fault type to fault localization accuracy

EVALUATION – EXPERIMENT 2
Goal: measure how accurately our model predicts ease of human fault localization
Two versions of our model
  All features vs. only those that are measured automatically
Baselines
  Code readability (syntactic and surface features)
  Cyclomatic complexity (contextual features)
  “Textbook difficulty” (chapter number in the textbook)
10-fold cross validation to mitigate over-fitting
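
In 10-fold cross validation the data is split into ten disjoint folds; each fold is held out once for testing while the model is trained on the other nine. The sketch below shows only the fold assignment. Splitting over the 45 Java files, the fixed seed, and the class name are assumptions made for illustration; the slides do not state the unit that was split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class KFold {
    /** Shuffles indices 0..n-1 and deals them into k folds of near-equal size. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(indices.get(i));
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(45, 10, 42L);
        for (int held = 0; held < folds.size(); held++) {
            // A real run would train on every fold except `held`
            // and evaluate on folds.get(held).
            System.out.println("fold " + held + " holds out " + folds.get(held));
        }
    }
}
```

Because every item is tested exactly once by a model that never saw it during training, the averaged score is less likely to reward a model that merely memorized its training data.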

EVALUATION – EXPERIMENT 2
Our model greatly outperforms the baselines
Automatic-only model does only slightly worse than the full model

EVALUATION – EXPERIMENT 2
Perceived difficulty is a concrete measure of understandability
Fault localization accuracy is correlated with understandability
While the baselines do comparatively better here, our model correlates in a similar fashion
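
The slides compare models by how well they correlate with human results but do not name the coefficient. As a sketch only, assuming Pearson's r, the strength of a linear relationship between two equal-length series (for example predicted and observed accuracy) could be computed as follows; the class name and sample values are hypothetical.

```java
class Correlation {
    /** Pearson correlation coefficient of two equal-length series. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Made-up values for illustration, NOT data from the study.
        double[] predicted = { 0.30, 0.45, 0.50, 0.70 };
        double[] observed = { 0.25, 0.50, 0.45, 0.75 };
        System.out.printf("r = %.3f%n", pearson(predicted, observed));
    }
}
```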

EVALUATION – FEATURE ANALYSIS
ANOVA of features with respect to human accuracy
(type) - Feature                                  F       Pr(F)     Dir
abs – uses abstraction: array                     130.9   < 0.001   -
abs – provides abstraction: queue                 54.1    < 0.001   +
syn – ratio of constant to variable assignments   40.4    < 0.001   +
syn – avg block nesting level                     38.9    < 0.001   -
abs – provides abstraction: heap                  28.3    < 0.001   +
syn – max global variables                        25.6    < 0.001   +
abs – uses abstraction: linked list               25.6    < 0.001   -
syn – ratio simple to constant conditional        20.6    < 0.001   -
cfg – max CFG out-edges per node                  10.0    0.002     -
cfg – avg CFG in-edges per node                   5.8     0.016     +
…
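
The F column is an ANOVA F statistic: the larger it is, the more of the variation in accuracy is associated with the feature. The slides do not say which ANOVA design was used, so the following is only a sketch of the textbook one-way case, where observations are grouped by a feature value (for example, files that use an array versus files that do not). The class name and the sample numbers are hypothetical.

```java
class OneWayAnova {
    /**
     * One-way ANOVA F statistic:
     * (between-group sum of squares / (k - 1)) / (within-group sum of squares / (n - k)).
     */
    static double fStatistic(double[][] groups) {
        int k = groups.length;
        int n = 0;
        double grandSum = 0.0;
        for (double[] g : groups) {
            for (double v : g) {
                grandSum += v;
                n++;
            }
        }
        double grandMean = grandSum / n;

        double ssBetween = 0.0, ssWithin = 0.0;
        for (double[] g : groups) {
            double mean = 0.0;
            for (double v : g) mean += v;
            mean /= g.length;
            ssBetween += g.length * (mean - grandMean) * (mean - grandMean);
            for (double v : g) ssWithin += (v - mean) * (v - mean);
        }
        return (ssBetween / (k - 1)) / (ssWithin / (n - k));
    }

    public static void main(String[] args) {
        // Made-up accuracies for illustration, NOT data from the study.
        double[][] byFeature = {
            { 0.20, 0.30, 0.40 }, // files where the feature is present
            { 0.50, 0.60, 0.70 }  // files where it is absent
        };
        System.out.printf("F = %.2f%n", fStatistic(byFeature));
    }
}
```

The Dir column on the slide is separate from F: it records whether the feature is associated with higher (+) or lower (-) accuracy, which the F statistic alone does not show.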

CONCLUSION
We present a human study of 65 participants based on concrete fault localization tasks
We analyze the effect that the type of defect has on humans’ ability to locate faults
Based on the source code, we analyze the correlation of surface, control flow, and abstraction features with humans’ ability to locate faults
We present a model of human fault localization accuracy based on these features that correlates with human accuracy at least four times more strongly than corresponding baselines

Questions?
