Published by Preston Barker. Modified over 2 years ago.

1 of 17 Assessing the Influence of Multiple Test Case Selection on Mutation Experiments
Marcio E. Delamaro and Jeff Offutt
George Mason University & Universidade de São Paulo
USA & Brazil
www.cs.gmu.edu/~offutt/ offutt@gmu.edu
delamaro@icmc.usp.br www.icmc.usp.br/pessoas/delamaro/

2 of 17 A Recent Experimental Procedure
Mutation 2014 © Delamaro & Offutt
[Diagram: program P, mutants M, test set T]
Create mutants
Add tests until MS = 100
Creating a “universe” of tests

3 of 17 Experimental Procedure
[Diagram: program P and test universe T; mutant sets M_op1, M_op2, ..., M_op75; test sets T_op1, T_op2, ..., T_op75; each test set is connected to the full mutant set M]
“Only one test set? Not good enough!”

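4 of 17 Additional Test Sets
[Diagram: program P and test universe T; mutant sets M_op1, M_op2, ..., M_op75; each operator now has N test sets: T_op1-1, T_op1-2, ..., T_op1-i, ..., T_op1-N, and likewise T_op2-1 ... T_op2-N through T_op75-1 ... T_op75-N; each test set is connected to the full mutant set M]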

5 of 17 Multiple Test Sets
Perceived Benefit
Individual test sets may vary in effectiveness because of the specific values
Generating N test sets may overcome that variance
1) How many test sets are needed?
2) Do additional test sets really help?
Does reality match perception?

6 of 17 Answering the Question
We decided to answer this question by measuring the performance of each of 10 sets of tests and studying their variances

7 of 17 Experimental Setup
Subjects: 39 C programs
– One to 20 functions (189 total)
– 7 to 390 LOC (2853 total)
Mutation tool: Proteum
– 104 to 11,100 mutants (66,480 total)
– We used mutation score as a proxy for effectiveness
Tests: Hand-constructed test sets to kill all non-equivalent mutants (the test universe U)
– 5 to 142 tests (814 total)
– Equivalence determined by hand
– 3 to 2062 equivalent mutants (7829 total)
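The slides use mutation score (MS) without defining it. As background, and not taken from the slides themselves, the usual definition divides the mutants killed by a test set by the number of non-equivalent mutants. The symbols K, M, and E below are introduced here for illustration only.

```latex
% K = mutants killed by test set T, M = all mutants of program P,
% E = mutants judged equivalent (by hand in this study)
MS(P, T) = \frac{K}{M - E}

% With the totals reported on this slide, summed over all 39 subjects:
M - E = 66{,}480 - 7{,}829 = 58{,}651 \text{ non-equivalent mutants}
```

Under this definition the test universe U reaches the maximum score by construction, since it kills every non-equivalent mutant.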

8 of 17 Collecting Data
For each program:
1. Generated statement deletion (SSDL) mutants
2. Created 10 sets of tests to kill all SSDL mutants
– All tests taken from the universe U
– Tests picked in random order from U
3. Measured size of each test set
4. Computed MS of each test set on all mutants
5. Collected statistics of distribution and central tendency for each test set: mean, median, min, max, standard deviation
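The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tooling: the kill data is made up, all function and variable names are hypothetical, a test is kept only when it kills a not-yet-killed SSDL mutant (the slides do not say how unhelpful tests were handled), and the sample standard deviation is used.

```python
import random
import statistics


def build_adequate_set(kills, targets, rng):
    """Pick tests from the universe in random order until all targets are killed.

    kills   -- dict: test id -> set of mutant ids that the test kills
    targets -- set of mutant ids the test set must kill (e.g. SSDL mutants)
    Assumption: a test is kept only if it kills a target not killed so far.
    """
    order = list(kills)
    rng.shuffle(order)
    remaining = set(targets)
    chosen = []
    for test in order:
        if not remaining:
            break
        newly_killed = kills[test] & remaining
        if newly_killed:
            chosen.append(test)
            remaining -= newly_killed
    if remaining:
        raise ValueError("the universe cannot kill every target mutant")
    return chosen


def mutation_score(kills, tests, non_equivalent):
    """Fraction of non-equivalent mutants killed by the given tests."""
    killed = set()
    for test in tests:
        killed |= kills[test]
    return len(killed & non_equivalent) / len(non_equivalent)


def spread_statistics(scores):
    """Distribution and central-tendency statistics for a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
        "sd": statistics.stdev(scores),  # sample SD; an assumption
    }


# Made-up kill data for one tiny hypothetical program.
KILLS = {
    "t1": {1, 2, 5},
    "t2": {2, 3},
    "t3": {3, 4, 6},
    "t4": {1, 4},
    "t5": {5, 6, 7},
}
SSDL_MUTANTS = {1, 2, 3, 4}            # step 1: statement deletion mutants
ALL_MUTANTS = {1, 2, 3, 4, 5, 6, 7}    # all non-equivalent mutants

rng = random.Random(2014)              # arbitrary seed for repeatability
test_sets = [build_adequate_set(KILLS, SSDL_MUTANTS, rng) for _ in range(10)]  # step 2
sizes = [len(ts) for ts in test_sets]                                          # step 3
scores = [mutation_score(KILLS, ts, ALL_MUTANTS) for ts in test_sets]          # step 4
stats = spread_statistics(scores)                                              # step 5
print(sizes, stats)
```

The max minus min of `scores` is the "spread" that the later slides report per program.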

9 of 17 Research Questions
RQ1: How different are different SSDL-adequate test sets in terms of mutation score?
RQ2: How different are different SSDL-adequate test sets in terms of cost (number of tests)?

10 of 17 Biggest and Smallest
Largest and smallest spreads in mutation scores of SSDL-adequate tests over all mutants

Program   LOC     SD      MS: Max – Min
P4        9       .0904   .2971
P199*             .0901   .2448
P38       10      .0707   .1892
P22       349     .0042   .0148
P28       390     .0034   .0112
P3156*            .0029   .0097
Average   73.15   .0071   .0245

* Program and LOC are fused in the source: P19 with 9 LOC or P1 with 99 LOC; P31 with 56 LOC or P3 with 156 LOC.

11 of 17 Program Size vs. Spread
Is the spread correlated with the program size?
Spearman rank correlation is used to compare two series of numbers for correlation
– 1 or -1 means they are perfectly correlated
– 0 means no correlation
LOC and SD: -.65
LOC and Max-Min: -.63
Strong correlations
Good news for experimentalists … Creating 10 test sets for a 10 line program is easy. Creating 10 test sets for a 1000 line program is impractical!
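For readers unfamiliar with Spearman rank correlation, the following is a minimal pure-Python sketch: rank both series (ties share their average rank), then compute the Pearson correlation of the ranks. The function names and the sample numbers are hypothetical and are not the study's data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: spread tends to shrink as program size grows.
loc = [10, 20, 30, 40, 50]
sd = [0.09, 0.07, 0.08, 0.01, 0.02]
rho = spearman(loc, sd)
print(round(rho, 2))
```

A negative value, like the -.65 and -.63 reported on this slide, means that larger programs tend to rank lower in spread.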

12 of 17 Average Spread

Stat              Values
Average Minimum   .9093
Average Maximum   .9338
MS: Max-Min       .0245
SD                .0071

One-way ANOVA: No statistical differences among means
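The one-way ANOVA mentioned above compares variation between group means with variation inside the groups. The sketch below computes only the F statistic and its degrees of freedom. It assumes, since the slide does not say, that each group holds the mutation scores of one of the 10 test-set positions across programs; the function name and the numbers are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up mutation scores: three groups, three programs each.
groups = [
    [0.90, 0.92, 0.94],
    [0.91, 0.93, 0.95],
    [0.89, 0.92, 0.95],
]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

Turning F into a p-value needs the F distribution, which this sketch leaves out; a statistics library such as SciPy (`scipy.stats.f_oneway`) can do the whole test in one call.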

13 of 17 Threats to Validity
Representativeness of programs
– Different sources, different domains
Size of programs
– Most studies of this nature are related to unit testing
– Large programs would be impractical
Manual steps
– Constructing the universe of tests
– Identifying equivalent mutants
A single comparison point: SSDL mutation
– Other criteria could be used
We used 10 sets
– Would results be different with 5 or 100?

14 of 17 Conclusions
Previous researchers assumed that selecting only one adequate test set could interfere with results
So they created multiple test sets
But this assumption was made without evidence!

15 of 17 Key Findings
We found significant differences among different test sets
– For some programs, but not all
Differences statistically disappeared when averaged over all 39 programs
Differences were smaller for larger programs

16 of 17 Recommendations
If only a few, small subjects are used, use multiple test sets
If many or larger subjects are used, don’t bother

17 of 17 Contact
Jeff Offutt offutt@gmu.edu http://cs.gmu.edu/~offutt/
Marcio Delamaro delamaro@icmc.usp.br http://www.icmc.usp.br/pessoas/delamaro/
