Adaptive Random Test Case Prioritization

Speaker: Bo Jiang*; co-authors: Zhenyu Zhang*, W. K. Chan†, T. H. Tse* (*The University of Hong Kong; †City University of Hong Kong)

Contents
- Background
- Motivation
- Adaptive Random Test Case Prioritization
- Experiments and Results Analysis
- Related Works
- Conclusion & Future Work

Regression Testing Techniques
Given a program P, its modified version P', and a test suite T, regression testing techniques derive a test suite T' for P':
- Obsolete test case elimination: remove obsolete test cases.
- Test case reduction: permanently remove some test cases.
- Test case selection: select a subset of test cases for execution.
- Test case prioritization: reorder the test cases for execution.
- Test case augmentation: add new test cases to test new features.
Regression testing accounts for 50% of the cost of software maintenance.

Test Case Prioritization
Definition: test case prioritization permutes a test suite T for execution to meet a chosen testing goal.
Typical testing goals: rate of code coverage; rate of fault detection; rate of requirement coverage.
Merit: no impact on the fault detection ability of the test suite.
Note: random testing goes 'deep' rather than 'wide'.

Coverage-based Test Case Prioritization Technique
- Total-statement/function/branch: highest code coverage first; ties are resolved randomly.
- Additional-statement/function/branch: highest additional code coverage first; reset when no more coverage can be achieved.
- Disadvantage: hard to scale to larger programs.
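A minimal Python sketch of the two greedy strategies, assuming each test case's coverage is given as a set of covered unit IDs (statements, functions, or branches). The `Coverage` type and the function names are hypothetical and not taken from the paper's tooling.

```python
import random
from typing import Dict, List, Set

# Hypothetical input format: test case name -> set of covered units
# (statement, function, or branch IDs, depending on the technique variant).
Coverage = Dict[str, Set[int]]


def total_prioritize(cov: Coverage, rng: random.Random) -> List[str]:
    """Total strategy: highest total coverage first, ties resolved randomly."""
    tests = list(cov)
    rng.shuffle(tests)  # random order first, so the stable sort breaks ties randomly
    return sorted(tests, key=lambda t: len(cov[t]), reverse=True)


def additional_prioritize(cov: Coverage, rng: random.Random) -> List[str]:
    """Additional strategy: highest additional (not yet covered) coverage first;
    reset the covered set when no remaining test adds new coverage."""
    remaining = list(cov)
    rng.shuffle(remaining)  # max() keeps the first maximum, so ties follow this random order
    covered: Set[int] = set()
    order: List[str] = []
    while remaining:
        best = max(remaining, key=lambda t: len(cov[t] - covered))
        if covered and not cov[best] - covered:
            covered = set()  # reset: no more coverage can be achieved
            continue
        order.append(best)
        covered |= cov[best]
        remaining.remove(best)
    return order
```

In this sketch, the additional strategy re-scores every remaining test after each pick, so its cost grows roughly quadratically with suite size, which is consistent with the scalability concern above.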

Problem with Total (Greedy) Techniques
[Figure: APFD on grep and flex (Elbaum et al., TSE 2002)]
The total strategy may NOT be effective for real-life programs.

Problems with Additional Techniques
[Figure: time used for prioritization by the Random, Additional, and Total techniques on the Siemens and UNIX programs]
Additional techniques may NOT be efficient for real-life programs.
Can we find a prioritization technique that is both effective and efficient for real-life programs?

Adaptive Random Testing (ART)
A technique for test case generation: it spreads randomly generated test cases evenly across the input domain. In empirical studies, ART can detect failures using up to 50% fewer test cases than random testing.

Fixed-Size-Candidate-Set ART Algorithm
1. Randomly generate a test case and execute it.
2. Randomly generate a set of candidate test cases.
3. For each candidate test case, find its nearest neighbor among the executed test cases.
4. Select the candidate test case with the longest distance to its nearest neighbor and execute it.
5. Repeat steps 2–4 until a failure is encountered.
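A minimal Python sketch of FSCS-ART, assuming a two-dimensional numeric input domain (the unit square), Euclidean distance, and a candidate set size of k = 10. The function `fscs_art`, its parameters, and the failure predicate are illustrative assumptions, not part of the talk.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def fscs_art(is_failure: Callable[[Point], bool], k: int = 10,
             max_tests: int = 10_000, seed: int = 0) -> Tuple[int, Optional[Point]]:
    """FSCS-ART over the unit square [0, 1) x [0, 1).
    Returns (number of tests executed, first failure-causing input or None)."""
    rng = random.Random(seed)

    def rand_point() -> Point:
        return (rng.random(), rng.random())

    executed: List[Point] = []
    test = rand_point()  # step 1: the first test is purely random
    while len(executed) < max_tests:
        executed.append(test)
        if is_failure(test):  # execute; stop at the first failure
            return len(executed), test
        candidates = [rand_point() for _ in range(k)]  # step 2
        # steps 3-4: keep the candidate whose nearest executed neighbor is farthest
        test = max(candidates,
                   key=lambda c: min(math.dist(c, e) for e in executed))
    return len(executed), None
```

For example, `fscs_art(lambda p: p[0] > 0.9 and p[1] > 0.9)` searches for a square failure region covering 1% of the domain.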

Adaptive Random Testing (ART)
ART is based on the observation that failure-causing inputs tend to cluster in the input domain. Intuitively, spreading test cases evenly may increase the probability of exposing the first fault sooner. In test case prioritization, we also want to increase the rate of fault detection.

Use ART directly for test case prioritization?
The variety of black-box input information (video streams, images, XML, …) makes it hard to define a general distance metric. In contrast, the white-box coverage information of previously executed test cases (statement, branch, and function coverage) is readily available.

Distribution of Failures in Profile Space on LilyPond
William Dickinson et al., FSE 2001.

MDS (Multidimensional Scaling) Display of Distribution of Failures in Profile Space on LilyPond
Failures tend to cluster together. William Dickinson et al., FSE 2001.

MDS Display of Distribution of Failures in Profile Space on GCC
William Dickinson et al., FSE 2001.

Distribution of Failures in Profile Space on GCC
Failures tend to cluster together. William Dickinson et al., FSE 2001.

Use ART directly for test case prioritization?
Why NOT use such low-cost white-box information to spread test cases evenly across the code coverage space?

ART prioritization proceeds as follows:
1. Generate a candidate set: randomly select a not-yet-prioritized test case into the candidate set; if code coverage improves, continue; otherwise, stop. Merit: no magic number (the candidate set size is not a parameter).
2. Select the candidate farthest from the already prioritized set. This requires a distance between test cases and a distance between a candidate test case and the already prioritized test cases.
3. Repeat until all test cases are prioritized.
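A minimal Python sketch of this loop, assuming coverage is given as sets of branch IDs and using the Jaccard distance with maxmin selection (the choices discussed in the talk). Two details are assumptions of this sketch rather than statements from the talk: the first test is picked at random, and a test that does not improve coverage is not added to the candidate set (with a fallback of one random test if the set would otherwise be empty). All names are hypothetical.

```python
import random
from typing import Dict, List, Set

Coverage = Dict[str, Set[int]]  # hypothetical input: test name -> covered branch IDs


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard distance 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def candidate_set(remaining: List[str], cov: Coverage, rng: random.Random) -> List[str]:
    """Randomly add tests while each one increases the set's combined coverage."""
    pool = rng.sample(remaining, len(remaining))  # random order, without replacement
    chosen: List[str] = []
    covered: Set[int] = set()
    for t in pool:
        if not cov[t] - covered:
            break  # coverage did not improve: stop
        chosen.append(t)
        covered |= cov[t]
    return chosen or pool[:1]  # assumed safeguard: never return an empty set


def art_prioritize(cov: Coverage, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    remaining = list(cov)
    first = rng.choice(remaining)  # assumption: the first test is chosen at random
    prioritized = [first]
    remaining.remove(first)
    while remaining:
        cands = candidate_set(remaining, cov, rng)
        # maxmin: pick the candidate whose closest prioritized test is farthest away
        best = max(cands, key=lambda c: min(jaccard(cov[c], cov[p]) for p in prioritized))
        prioritized.append(best)
        remaining.remove(best)
    return prioritized
```

Because the candidate set stops growing as soon as coverage saturates, its size adapts to the suite instead of being fixed in advance, which is the "no magic number" merit noted above.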

How to measure the distance between test cases?
- Jaccard distance: a general distance metric for binary data; other distance metrics can be substituted.
How to select the candidate test case that is farthest away from the already prioritized test cases?
- Maximize the minimum distance (maxmin for short): Chen et al., ASIAN '04, LNCS 2004
- Maximize the average distance (maxavg for short): Ciupa et al., ICSE 2008
- Maximize the maximum distance (maxmax for short)
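A minimal Python sketch of the distances, treating each test's coverage as a set A of covered units so that the Jaccard distance is d(A, B) = 1 - |A ∩ B| / |A ∪ B|. The three test set distances then take the minimum, average, or maximum of d over the already prioritized tests. Returning 0 for two empty sets and all function names are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Set

Cov = Set[int]  # covered units of one test case


def jaccard(a: Cov, b: Cov) -> float:
    """Jaccard distance between two coverage sets (binary vectors)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def maxmin(c: Cov, prioritized: List[Cov]) -> float:
    """Distance to the closest prioritized test (maximized by maxmin)."""
    return min(jaccard(c, p) for p in prioritized)


def maxavg(c: Cov, prioritized: List[Cov]) -> float:
    """Average distance to the prioritized tests (maximized by maxavg)."""
    return sum(jaccard(c, p) for p in prioritized) / len(prioritized)


def maxmax(c: Cov, prioritized: List[Cov]) -> float:
    """Distance to the farthest prioritized test (maximized by maxmax)."""
    return max(jaccard(c, p) for p in prioritized)


def farthest(candidates: Dict[str, Cov], prioritized: List[Cov],
             test_set_distance: Callable[[Cov, List[Cov]], float]) -> str:
    """Return the candidate that maximizes the chosen test set distance."""
    return max(candidates, key=lambda name: test_set_distance(candidates[name], prioritized))
```

With `prioritized = [{1, 2, 3, 4}, {5, 6}]` and `candidates = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}}`, maxmin and maxavg select B, while maxmax selects A even though A duplicates an already prioritized test. This toy case shows how maxmax can reward a candidate that adds nothing new.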

Research Questions
- Do different levels of coverage information have a significant impact on ART techniques?
- Do different definitions of test set distance have a significant impact on ART techniques?
- Are ART techniques efficient?

Subject Programs
Subject         No. of faulty versions   LOC           Test pool size
tcas            41                       133–137       1608
schedule        9                        291–294       2650
schedule2       10                       261–263       2710
tot_info        23                       272–274       1052
print_tokens    7                        341–342       4130
print_tokens2                            350–354       4115
replace         32                       508–515       5542
flex            21                       8571–10124    567
grep            17                       8053–9089     809
gzip            55                       4081–5159     217
sed                                      4756–9289     370

Techniques Studied in the Paper
Group        Name            Level of coverage info   Test set distance (f2)
Random       random          (random prioritization)
Total        total-st        statement
             total-fn        function
             total-br        branch
Additional   addtl-st        statement
             addtl-fn        function
             addtl-br        branch
ART          ART-fn-maxmin   function                 maximize minimum distance
             ART-fn-maxavg   function                 maximize average distance
             ART-fn-maxmax   function                 maximize maximum distance
             ART-br-maxmin   branch                   maximize minimum distance
             ART-br-maxavg   branch                   maximize average distance
             ART-br-maxmax   branch                   maximize maximum distance
             ART-st-maxmin   statement                maximize minimum distance
             ART-st-maxavg   statement                maximize average distance
             ART-st-maxmax   statement                maximize maximum distance

Experiment Setup
- Dynamic coverage information collection: the gcov tool.
- Effectiveness metric: APFD, the weighted average of the percentage of faults detected over the life of the suite.
- Process: for each of the 11 subject programs, randomly select 20 test suites, and repeat each ART technique 50 times.
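A minimal Python sketch of the standard APFD formula, APFD = 1 - (TF_1 + ... + TF_m) / (n·m) + 1/(2n), where n is the number of test cases, m the number of faults, and TF_i the position of the first test case that reveals fault i. The `detects` mapping is a hypothetical input format, and the sketch only counts faults that some test in the order reveals.

```python
from typing import Dict, List, Set


def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n).
    order: the prioritized test suite; detects: test -> faults it reveals.
    Assumes a non-empty order that reveals at least one fault."""
    n = len(order)
    faults = set().union(*(detects.get(t, set()) for t in order))
    m = len(faults)
    first_pos: Dict[str, int] = {}
    for pos, t in enumerate(order, start=1):  # positions are 1-based
        for f in detects.get(t, set()):
            first_pos.setdefault(f, pos)  # keep only the first revealing position
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)
```

Reordering the same suite changes APFD but not which faults are eventually revealed, which matches the earlier point that prioritization leaves fault detection ability unchanged.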

Do different levels of coverage information have significant impact on ART techniques?
Fix the other variable (the definition of test set distance), perform multiple comparisons between each pair of coverage levels, and gather the statistics.

As confirmed by previous research: Branch > Statement > Function

Research Questions
- Do different levels of coverage information have a significant impact on ART techniques? Branch > Statement > Function
- Do different definitions of test set distance have a significant impact on ART techniques?
- Are ART techniques efficient?

The Impact of Test Set Distance
Fix the other variable (the level of coverage information), perform multiple comparisons between each pair of test set distances, and gather the statistics. Result: Max-Min > Max-Avg ≈ Max-Max

Best ART Technique
ART-br-maxmin is the best ART prioritization technique.

Research Questions
- Do different levels of coverage information have a significant impact on ART techniques? Branch > Statement > Function
- Do different definitions of test set distance have a significant impact on ART techniques? Max-Min > Max-Avg > Max-Max
- How does ART-br-maxmin compare with the greedy techniques?
- Are ART techniques efficient?

Multiple Comparisons for ART-br-maxmin on Siemens
There is only a marginal difference between ART-br-maxmin and the traditional coverage-based techniques, and it is not statistically significant.

Multiple Comparisons for ART-br-maxmin on UNIX
There is only a marginal difference between ART-br-maxmin and the traditional coverage-based techniques, and it is not statistically significant.

Research Questions
- Do different levels of coverage information have a significant impact on ART techniques? Branch > Statement > Function
- Do different definitions of test set distance have a significant impact on ART techniques? Max-Min > Max-Avg > Max-Max
- How does ART-br-maxmin compare with the greedy techniques? ART-br-maxmin ≈ Additional > Total
- Are ART techniques efficient?

Time Cost Analysis across All Programs
[Figure: time (s) used for prioritization by the Random, Additional, Total, and ART techniques]
ART << Additional; ART ≈ Total

Research Questions
- Do different levels of coverage information have a significant impact on ART techniques? Branch > Statement > Function
- Do different definitions of test set distance have a significant impact on ART techniques? Max-Min > Max-Avg > Max-Max
- Is there a best ART technique? ART-br-maxmin; ART ≈ Additional > Total
- Are ART techniques efficient? YES (<< Additional, ≈ Total)

Related Works
Greedy techniques for test case prioritization
- Rothermel et al., ICSM 1999; S. Elbaum et al., TSE 2002: greedy algorithms.
ART seminal paper
- Chen et al., ASIAN '04, LNCS 2004: ART techniques can improve the effectiveness of random test case selection by 40%–50%.
Theoretical aspects of ART techniques
- Chen et al., ACM TOSEM 17(3), 2008: no technique can improve the effectiveness of random test case selection by more than 50%.

Related Works
ART for object-oriented software
- Ciupa et al., ICSE 2008: defines a metric for measuring object distance; ARTOO finds faults faster and detects faults not found by directed random testing.
Profile-guided test case generation
- Dickinson et al., FSE 2001: studies how failures are distributed in profile space in real software; improves test case generation by pursuing failure regions.

Conclusion
- Adaptive random test case prioritization can be much more effective than random prioritization.
- There is only a marginal difference in effectiveness between ART-br-maxmin and the additional greedy techniques (not statistically significant), yet ART-br-maxmin is much more efficient.
- Compared to the total techniques, ART-br-maxmin is more effective on real-life programs but slightly less efficient.

Future Work
- Are there better metrics to measure test case distance?
- Improve greedy techniques by using ART to resolve ties.
- Extend the ART prioritization techniques to the testing of concurrent programs and other domain-specific techniques.

Comments are welcome!
