An Empirical Evaluation of the MuJava Mutation Operators
Ben H. Smith, Laurie Williams

Good morning, I'm Ben Smith, and I will be telling you about empirically evaluating the operators distributed with MuJava. We conducted an empirical study across 15 Java classes to assess several properties of the operators distributed with the MuJava tool and of mutation testing in practice.
Agenda: Objectives, Experimental Procedure, Results, Summary & Questions
Objectives
Determine efficacy of mutation testing
Compare mutation score and line coverage: record scores with each mutant killed
Effectiveness of operators: classify each mutant; assess utility
Examine untested lines: group and characterize
Study continued: ESEM paper in the works

I will give more details on our procedure and classification schemes later, so bear with me. First, we wanted to determine whether a high mutation score yields high line coverage; to figure this out, we recorded the line coverage after attempting to kill each mutant, in random order. Next, we wanted to gather information on which mutation operators are most effective at producing new test cases; to determine this, we classified each mutant based on its behavior and used the results to rank operators (and groups of them) by the number of new test cases they produced. Finally, we wanted to determine why untested lines remain at the end of our procedure, so we characterized each untested line by the section of code it appears in and examined the aggregate results.
Agenda: Objectives, Experimental Procedure, Results, Summary & Questions
MuClipse

In March 2003, Ma, Kwon, and Offutt released the early versions of MuJava. MuJava helps the developer perform mutation testing by 1) generating and compiling mutants for the operators you see here and 2) executing chosen test cases against those mutants. MuClipse is an Eclipse plug-in which ports the functionality of MuJava to the popular IDE, integrating both the generation and the testing of mutants into Eclipse. Additionally, MuClipse enables the developer to test mutants with JUnit test cases and includes some efficiency improvements.
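As a sketch of what a generated mutant looks like, consider an AOIS-style change (short-cut arithmetic operator insertion). The class below is hypothetical, written here only for illustration; MuJava would generate the mutated version as a separate compiled class, but both versions appear side by side for clarity:

```java
// Hypothetical illustration of an AOIS-style mutant (short-cut
// arithmetic operator insertion) on a simple accessor.
public class Account {
    private int balance;

    public Account(int balance) { this.balance = balance; }

    // Original: returns the balance unchanged.
    public int getBalance() {
        return balance;
    }

    // AOIS mutant: an inserted post-decrement makes repeated calls
    // silently drain the stored balance.
    public int getBalanceMutant() {
        return balance--;
    }
}
```

A test that calls the accessor twice and checks both results would kill this mutant; a test that calls it only once would not, since the first call still returns the original value.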
Procedure & Classifications
Mutation testing consists of three phases: mutation, test execution, and inspection. In the mutation phase, operators are used to generate multiple instances of the implementation code with slight alterations (mutants) meant to emulate programmer errors. In the test execution phase, these mutants are executed against a test suite and their outputs are compared: if the output of a test case run against the original code differs from the output of the same test case run against the mutated code, the mutant is said to be killed. If a mutant dies during the first test execution, we call it DOA. All living mutants proceed to the inspection phase, in which we attempt to add test cases to our test suite to kill the remaining living mutants, one at a time, recording line and branch coverage with each newly killed mutant. If we successfully produce different outputs with a new test for a given mutant, we call it "killed." If a mutant dies while we are trying to kill a different mutant, we call it "crossfire." Finally, if a mutant cannot be killed, we call it "stubborn."
Classifications
DOA: dies with the original test set
Killed: produced a new test case (traditionally meant "different outputs")
Crossfire: dies while killing other mutants
Stubborn: cannot be killed (false positive?)
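The difference between a killable mutant and a stubborn one can be sketched in plain Java. The methods below are hypothetical, with COR- and ROR-style changes written out by hand rather than generated by MuJava:

```java
// Hand-written illustrations of the classifications above: a COR-style
// mutant that a test can kill, and a ROR-style mutant that is stubborn
// because it is semantically equivalent to the original.
public class MutantClassification {
    // Original: editing requires both conditions.
    static boolean canEdit(boolean loggedIn, boolean isOwner) {
        return loggedIn && isOwner;
    }

    // COR-style mutant (&& changed to ||): killable, since the input
    // (true, false) yields a different output than the original.
    static boolean canEditMutant(boolean loggedIn, boolean isOwner) {
        return loggedIn || isOwner;
    }

    // Original maximum.
    static int max(int a, int b) {
        return a >= b ? a : b;
    }

    // ROR-style mutant (>= changed to >): stubborn, because when
    // a == b the two branches return equal values, so no test can
    // ever observe a difference in output.
    static int maxMutant(int a, int b) {
        return a > b ? a : b;
    }
}
```

Stubborn mutants of this equivalent kind are why the slide asks "false positive?": the inspection phase spends effort on a mutant that no test could ever kill.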
Size of Study

Project: iTrust (21 classes)            A      B      C      D
edu.ncsu.csc.itrust.Auth               280    299    278
edu.ncsu.csc.itrust.Demographics       628    544    828    540
edu.ncsu.csc.itrust.Transactions       123    120    133    183
Tested                                1031    963   1239   1001
Overall                               2295   2636   3448   2867

Project: HtmlParser                    LoC
org.htmlparser.Attribute               263
org.htmlparser.Parser                  318
org.htmlparser.beans.StringBean        365
Tested                                 946
Overall                              21801

We took 15 Java classes through the mutation testing procedure. Originally we used two instances of iTrust, an open-source web healthcare application; the three classes shown here perform the web logic for the application, drawn from two teams, called A and B. Since the publication of this paper, we have added two more iTrust instances and three major classes from the open-source project HtmlParser. These classes were chosen to match the sizes and uses of the iTrust classes.
Agenda: Objectives, Experimental Procedure, Results (Implementation Code, Operators), Summary & Questions
Scores Across Iterations
Let's talk about line coverage. This graph shows a typical recording of line coverage, branch coverage, and mutation score at each iteration. When all three are flat, the developer is examining stubborn mutants. When there are large jumps in line coverage, the developer has killed a mutant in a method or code segment that had never been tested. When line and branch coverage remain steady but mutation score trends upward, this typically means the mutants being killed all occur on the same line.
Increase in Scores on Average (n=15)
           Line   Branch   Mutation
Start       64%     72%      58%
End         82%     88%      87%
Increase    18%     16%      29%

This chart shows what happened on average across the 15 classes on which we performed mutation testing. Line coverage and branch coverage both went up by about 17%, and mutation score by about 29%. Viewed as a ratio, every two-percentage-point increase in mutation score yields at least a one-point increase in line coverage.
Untested Lines

Team    Class           catch   if/else   return   body
A       Auth              32       9         3
        Demographics      28      50         2       69
        Transactions      11       4
B                          5      39        27
C                         33       1        42       22
D                         24      25        13
HP      Parser            29      16
        Attribute
        StringBean        18      15
Totals                   314     182       116

This table shows that most instances of untested lines of code were in a Java catch block. We left out lines of code in try blocks because nearly everything in the iTrust implementation code was in a try block. Exception operators were not included in the distribution of MuJava from which MuClipse was developed, and the lack of these operators shows in the catch column. The high values in the other columns can be explained by stubborn mutants on those lines: since we cannot write a test to reach these mutants, they are left alone and those lines remain untested.
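Why catch blocks stay untested can be seen in a small sketch (a hypothetical method, not taken from iTrust): the distributed operators mutate the statements inside the try block, but generate nothing inside the catch block, so the mutation-testing loop never forces a new test into it.

```java
// Hypothetical illustration of an untested catch block: the operators
// in this MuJava distribution target the logic in the try block, but
// no exception operator exists to plant a mutant inside the catch
// block, so no mutant ever drives a new test to cover those lines.
public class ParseSafely {
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s); // operators generate mutants here
        } catch (NumberFormatException e) {
            return fallback;            // no operator targets this line
        }
    }
}
```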
Agenda: Objectives, Experimental Procedure, Results (Implementation Code, Operators), Summary & Questions
Classifications (whole study)
Moving on to the classification of mutants: each operator generated a different number of mutants, and each group of mutants had different proportions of Stubborn, Killed, Crossfire, and DOA results. This graph shows how the mutants for each operator fared across the whole of the 15 classes we tested (n = 3350). But what does this mean? How do we make a comparison between the EAM operator, which produced over 500 mutants, and the OAN operator, which produced only 17?
Equations

We specify a Utility function that ranges from negative one to one: negative one means that none of the mutants produced resulted in a test case, and positive one means that all of them did. DOA and crossfire mutants were given a positive (useful) role, but a weaker one, because they depend on the initial test suite and on the order in which the mutants are encountered, respectively. Crossfire mutants especially demonstrate mutation density, which we define as mutants per line of code. Logical operators and control logic are affected by having more than one mutation on an if statement, but most instances of these that we saw were not useful.
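A minimal sketch of the Utility computation, assuming coefficients of -1 for stubborn, +1 for killed, and +0.5 for DOA and crossfire mutants; these coefficient values are inferred from the deck's sample calculation, not quoted from the paper:

```java
// Sketch of the Utility function described above. Coefficients are
// assumptions consistent with this deck's sample calculation: stubborn
// mutants count fully against an operator, killed mutants fully for
// it, and DOA/crossfire mutants for it at half weight. The result
// ranges from -1 (no mutant produced a test case) to +1 (all did).
public class OperatorUtility {
    static double utility(int stubborn, int killed, int doa, int crossfire) {
        double total = stubborn + killed + doa + crossfire;
        return (-1.0 * stubborn + 1.0 * killed
                + 0.5 * doa + 0.5 * crossfire) / total;
    }
}
```

With the AOIS counts shown on the sample-calculation slide (17 stubborn, 24 killed, 50 DOA, 32 crossfire), this sketch gives roughly 0.39, matching the slide's 0.38 up to per-term rounding.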
Sample Calculation

                 Stubborn   Killed   DOA   Crossfire   Total   Utility
AOIS                 17       24      50       32        123
%                   .14      .19     .41      .26
× Coefficient      -.14      .19     .20      .13                0.38
Operator Utility (Overall)
Operators scoring positively: 74%
Operators scoring above 0.5: 26%
No operator scored a positive 1.
Overall utility of all operators: ~0.342

These numbers show that, in general, operators are useful, and therefore mutation testing is useful in terms of producing new test cases.
Operator Utility Rankings
Feature (example)                      A    B    C    D   HtmlParser
Arithmetic (+)                         8    6    5    7
Relational (==)                        4
Conditional (||)                       1    2    3
Shift (<<)                            10
Logical (&)
Assignment Short-cut (+=)
Encapsulation
Inheritance                            9
Polymorphism
Overloading
Java-Specific (e.g. static keyword)
Inter-Object

The end result varied from project to project. This table shows the rankings of each type of operator, as listed on the left; the "call letters" of these operators are unimportant right now. The utility for each was calculated, and the operators were then ranked from highest to lowest; if two or more received the same score, they received the same ranking. As shown, the first, second, and third places differ with each team's project.
This graph shows the relationship between utility and mutation density. The data come from each iTrust class across all teams and then from the classes of HtmlParser. If we discard the first point, the peak utility for mutation testing occurs at about 0.8 mutants per line of code.
Alert Density = False Positives?
                 PMD Density   Mutation Density   Utility
StringBean          0.151           0.688          0.410
Auth*               0.168           0.486          0.289
Demographics*       0.192           0.574          0.298
Parser              0.255                          0.214
Transactions*       0.336           1.188          0.406
Attribute           0.365           1.357          0.396
Agenda: Objectives, Experimental Procedure, Results, Summary & Questions
Summary
Every 2% increase in mutation score ≈ 1% increase in line coverage
Operator type vs. utility rank: no relationship
Operators have an overall positive effect
Lower density => higher utility
Untested lines result from: no operator for the line, or a stubborn mutant on it

In sum, our study shows that there is a 62% ratio of line-coverage to mutation-score increases, that no type of operator is better for any particular implementation code, that lower mutation density results in higher mutant utility, and that our untested lines of code result either from there being no operator for that line or from a stubborn mutant on that line.
Questions? Feedback & Comments Welcome!