Presentation on theme: "Effectively Prioritizing Tests in Development Environment"— Presentation transcript:
1 Effectively Prioritizing Tests in Development Environment
Amitabh Srivastava, Jay Thiagarajan
PPRC, Microsoft Research
Hi, I am Jay Thiagarajan. I work in a group called Programmers Productivity Research Center at Microsoft Research. Today I will be presenting work done by Amitabh Srivastava and myself while developing a test case prioritization system for use in a large-scale development environment. (15 seconds)
2 Outline
Context and Motivation
Test Prioritization Systems
Measurements and Results
Summary
Let me give an outline of the talk first. I will describe the scenarios we tried to address with this system. Then I will describe the existing test prioritization systems and why we diverged from them to use a simple heuristic algorithm. I will also present some results on MS product binaries and conclude my talk with a question-and-answer session. (15 seconds)
The talk is divided into 4 parts. Why did we build this system? What did we do? How well did we do?
3 Motivation: Scenarios
Pre-checkin tests: developers, before check-in
Build Verification Tests (BVT): build system, after each build
Regression tests: testing team, after each build
Hot fixes for critical bugs
For the sake of simplicity, let us assume a simple model of the development process. Developers write the code according to the specification and check it into the source control system. Before checking in the code, they run a few tests to make sure that their changes do not keep the program from being built, and also to catch a moderate number of defects. The build system, which builds the program (daily or after every check-in), runs a few verification tests after the program is built. If the tests pass, the build is handed to the testing team, which takes responsibility for verifying the validity of the build through regression testing. Even during full testing it is advantageous to detect defects as early as possible, e.g. on day 1 rather than day 21. SWAT teams, which respond to customer problems with hot fixes, also do testing. To address these scenarios effectively, developers and testers must be able to run the right tests at the right time. New defects recently introduced into the system are most likely to come from recent code changes, so it is effective to focus testing efforts on the parts of the program affected by changes. (90 seconds)
The most important motto is to catch the bugs as early as possible: day 1 rather than day 21.
4 Using program changes
Source code differencing
  S. Elbaum, A. Malishevsky & G. Rothermel, “Test case prioritization: A family of empirical studies”, Feb. 2002
  S. Elbaum, A. Malishevsky & G. Rothermel, “Prioritizing test cases for regression testing”, Aug. 2000
  F. Vokolos & P. Frankl, “Pythia: a regression test selection tool based on text differencing”, May 1997
Program changes have been used for test selection and for test prioritization. The techniques used were source code differencing, data and control flow analysis, and coarse-grained code entities, to identify which parts of the program might be affected by the changes. (10 seconds)
5 Using program changes
Data and control flow analysis
  T. Ball, “On the limit of control flow analysis for regression test selection”, Mar. 1998
  G. Rothermel and M.J. Harrold, “A Safe, Efficient Regression Test Selection Technique”, Apr. 1997
Code entities
  Y. F. Chen, D.S. Rosenblum and K.P. Vo, “TestTube: A System for Selective Regression Testing”, May 1994
(10 seconds)
6 Analysis of various techniques
Source code differencing
  Simple and fast
  Can be built using commonly available tools like “diff”
  A simple renaming of a variable will trip it up
  Will fail when a macro definition changes
  To avoid these pitfalls, static analysis is needed
Data and control flow analysis
  Flow analysis is difficult in languages like C/C++ with pointers, casts and aliasing
  Interprocedural data flow techniques are extremely expensive and difficult to implement in a complex environment
Problems: speed, different parsers for different languages. Solution: working on a binary. (30 seconds)
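The renaming pitfall can be seen in a minimal Python sketch. It uses the standard library `difflib` as a stand-in for “diff”; the C snippet and the helper name `changed_line_numbers` are made up for illustration and are not part of the system presented in the talk. Renaming one local variable leaves behavior unchanged, yet a text diff flags every line that mentions it.

```python
import difflib

# Hypothetical C function, used only as sample text.
old_source = """int total(int *values, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) sum += values[i];
    return sum;
}
""".splitlines()

# Behavior-preserving edit: rename the local variable `sum` to `acc`.
new_source = [line.replace("sum", "acc") for line in old_source]


def changed_line_numbers(old_lines, new_lines):
    """Return 1-based line numbers of old_lines that a text diff reports as changed."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.extend(range(i1 + 1, i2 + 1))
    return changed


flagged = changed_line_numbers(old_source, new_source)
print(flagged)  # every line mentioning the renamed variable is reported
```

A tool that selects tests from such a diff would treat three of the five lines as changed even though the compiled code may be identical, which is one reason to compare binaries instead.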
7 Our Solution
Focus on change from previous version
Determine change at very fine granularity: basic block/instruction
Operates on binary code
  Easier to integrate in production environment
  Scales well to compute results in minutes
Simple heuristic algorithm to predict which part of code is impacted by the change
(30 seconds)
Real-world application: time and effectiveness are critical. 1.8 million lines in 210 seconds. All new bugs are due to new changes.
8 Test Effectiveness Infrastructure
[Diagram labels: TEST, Coverage Tools, Repository, Coverage, Magellan, Old Build, New Build, Binary Diff, BMAT/VULCAN, Coverage Impact Analysis, Test Prioritization, ECHELON]
This slide explains the infrastructure we leverage to build this tool. We have a system called Magellan, which is a coverage collection and storage repository. As shown here, testers conduct their tests on various systems and the coverage data is collected in a repository. Echelon uses the existing coverage data in the repository, together with the change information between two builds, to produce a prioritized list of tests. (30 seconds)
9 Echelon: Test Prioritization
[Diagram labels: Old Build, New Build, Block Change Analysis, Binary Differences, Magellan Repository (*link with symbol server for symbols), Coverage Impact Analysis, Coverage for new build, Test Prioritization, Prioritized list of test cases]
Leverage what has already been tested.
(15 seconds)
10 Block Change Analysis: Binary Matching
[Diagram labels: Old Build, New Build, New Blocks, Old Blocks (not changed), Old Blocks (changed)]
Accuracy and heuristic based.
BMAT: Binary Matching [Wang, Pierce and McFarling, JILP 2000]
(15 seconds)
11 Coverage Impact Analysis
Terminology
  Trace: collection of one or more test cases
  Impacted Blocks: old modified and new blocks
Compute the coverage of traces for the new build
  Coverage for old (unchanged and modified) blocks is the same as the coverage for the old build
  Coverage for new nodes requires more analysis
(15 seconds)
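The carry-over of old coverage can be sketched as follows. This is only an illustration under assumed data shapes: the `match` dictionary (new block id mapped to an old block id plus a changed flag, or `None` for a new block) and the function names are hypothetical, and do not reflect the actual BMAT or Magellan formats.

```python
def carry_over_coverage(old_coverage, match):
    """Map per-trace coverage from old block ids to new-build block ids.

    old_coverage: trace name -> set of old block ids the trace executed.
    match: new block id -> (old block id, changed flag), or None for a new block.
    """
    old_to_new = {m[0]: new for new, m in match.items() if m is not None}
    return {
        trace: {old_to_new[b] for b in blocks if b in old_to_new}
        for trace, blocks in old_coverage.items()
    }


def impacted_blocks(match):
    """Impacted blocks are new blocks plus old blocks whose contents changed."""
    return {new for new, m in match.items() if m is None or m[1]}


# Toy matching: n1 is unchanged, n2 is an old block that was modified, n3 is new.
match = {"n1": ("o1", False), "n2": ("o2", True), "n3": None}
old_coverage = {"T1": {"o1", "o2"}, "T2": {"o2", "o9"}}  # o9 no longer exists

new_coverage = carry_over_coverage(old_coverage, match)
impacted = impacted_blocks(match)
```

The new block `n3` appears in `impacted` but in no trace's carried-over coverage, which is why new nodes need the additional analysis mentioned on the slide.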
12 Coverage Impact Analysis
[Diagram labels: Predecessor Blocks (P), New Block (N), Successor Blocks (S), Interprocedural edge]
A trace may cover a new block N if it covers at least one predecessor block and at least one successor block.
If P or S is a new block, then its predecessors or successors are used (iterative process).
(30 seconds)
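A minimal Python sketch of this predecessor/successor heuristic is shown here. The graph representation (`preds` and `succs` dictionaries keyed by block id) and the function name `may_cover` are assumptions made for illustration; the slide does not describe Echelon's internal data structures.

```python
def may_cover(trace_blocks, block, preds, succs, new_blocks):
    """Guess whether a trace may execute new block `block`.

    The trace must cover at least one predecessor and at least one successor.
    When a neighbor is itself a new block (no coverage data yet), the search
    continues through that neighbor's own predecessors or successors.
    """

    def reaches_covered(edges):
        seen = set()
        stack = list(edges.get(block, ()))
        while stack:
            b = stack.pop()
            if b in seen:
                continue
            seen.add(b)
            if b in new_blocks:
                stack.extend(edges.get(b, ()))  # look past the new block
            elif b in trace_blocks:
                return True
        return False

    return reaches_covered(preds) and reaches_covered(succs)


# Toy control flow: P1 -> N1, P2 -> N1, N1 -> N2, N2 -> S1 (N1 and N2 are new).
preds = {"N1": ["P1", "P2"], "N2": ["N1"]}
succs = {"N1": ["N2"], "N2": ["S1"]}
new_blocks = {"N1", "N2"}

trace_a = {"P1", "S1"}  # enters through P1 and leaves through S1
trace_b = {"P2"}        # reaches a predecessor but no successor
```

With this toy graph, `trace_a` is predicted to cover both `N1` and `N2`, while `trace_b` is not predicted to cover either.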
13 Coverage Impact Analysis
Limitations: a new node may not be executed
  If there is a path from successor to predecessor
  If there are changes in control path due to data changes
(15 seconds)
14 Echelon: Test Case Prioritization
Detects minimal sets of test cases that are likely to cover the impacted blocks (old changed and new blocks)
Input is traces (test cases) and a set of impacted blocks
Uses a greedy iterative algorithm for test selection
(15 seconds)
15 Echelon: Test Selection
[Figure: impacted block map for traces T1 to T5, a Weights table for the traces, and the resulting sets Set 1, Set 2 and Set 3. A mark in the map denotes that a trace T covers the impacted block.]
Weight is the number of impacted blocks each trace covers.
(30 seconds)
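One way to read the greedy selection is sketched below in Python. The slides give only the outline (weights, greedy iteration, ordered sets), so the details here are assumptions: how a set is closed, how ties are broken, and the final set for traces that touch no impacted block. The function name `prioritize` and the toy data are hypothetical.

```python
def prioritize(coverage, impacted):
    """Group traces into ordered sets using a greedy weight-based pass.

    coverage: trace name -> set of block ids the trace is predicted to cover.
    impacted: set of impacted block ids (old changed and new blocks).
    """
    impacted = set(impacted)
    relevant = {t: set(b) & impacted for t, b in coverage.items()}
    # Traces that touch no impacted block are kept for a final set (assumption).
    leftovers = sorted(t for t, b in relevant.items() if not b)
    candidates = {t: b for t, b in relevant.items() if b}

    sets = []
    while candidates:
        uncovered = set(impacted)
        current = []
        while True:
            # Weight = number of still-uncovered impacted blocks a trace covers.
            best = max(
                sorted(candidates),
                key=lambda t: len(candidates[t] & uncovered),
                default=None,
            )
            if best is None or not candidates[best] & uncovered:
                break  # no remaining trace adds coverage: close this set
            current.append(best)
            uncovered -= candidates.pop(best)
        sets.append(current)
    if leftovers:
        sets.append(leftovers)
    return sets


coverage = {
    "T1": {1, 2, 3},
    "T2": {3, 4},
    "T3": {1, 2},
    "T4": {5},
    "T5": {9},  # covers nothing that changed
}
ordered_sets = prioritize(coverage, impacted={1, 2, 3, 4, 5})
```

In this toy run the first set is `T1`, `T2`, `T4`, which together cover all five impacted blocks. `T3` only repeats coverage already provided and lands in the second set, and `T5` goes last.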
16 Echelon: Test Selection Output
Ordered List of Traces
  SET1: Trace T1, Trace T2, Trace T3
  SET2: Trace T4, Trace T5
  SET3: Trace T7, Trace T8
  ...
  SETn: Trace Tm
Each set contains test cases that will give maximum coverage of impacted nodes.
Gracefully handles the “main” modification case.
If all the tests can be run, they should be run in this order to maximize the chances of detecting failures early.
(15 seconds)
Talk about options.
17 Analysis of results
Three measurements of interest
  How many sequences of tests were formed?
  How effective is the algorithm in practice?
  How accurate is the algorithm in practice?
The first measure gives us a sense of how many tests remain in each sequence, and so on. The second measure tries to figure out the accuracy of the heuristic algorithm in practice. The basic premise behind this is that Echelon has to run fast and be fairly accurate (more than 80% accurate to be useful in practice). (15 seconds)
The user only cares about the 1st and 2nd measures.
18 Details of BinaryE
                 Version 1      Version 2
Date             12/11/2000     01/29/2001
Functions        31,020         31,026
Blocks           668,068        668,274
Arcs             1,097,294      1,097,650
PDB size         22,602,752     22,651,904
File size: 8,880,128
Impacted Blocks: 378 (220 N, 158 OC)
Number of Traces: 3128
# Source Lines: ~1.8 Million
Echelon takes ~210 seconds for this 8MB binary. (10 seconds)
22 Effectiveness of Echelon
An important measure of effectiveness is early defect detection
Measured % of defects vs. % of unique defects in each sequence
Unique defects are defects not detected by the previous sequence
To measure the effectiveness, we took some sample binaries from the development process. We measured the percentage of defects detected by each sequence. We also measured the percentage of unique defects detected by each sequence. Unique defects are defects not detected by the previous sequence. (15 seconds)
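The two percentages can be computed as in the following Python sketch. It assumes that "unique" means not detected by any earlier sequence, which is one reading of the slide. The function name and the defect ids are hypothetical and the numbers are toy data, not results from the talk.

```python
def sequence_effectiveness(detected_by_sequence, all_defects):
    """Return (% of defects, % of unique defects) for each sequence in order.

    detected_by_sequence: list of sets of defect ids, one set per sequence.
    all_defects: set of all known defect ids.
    A defect counts as unique for a sequence if no earlier sequence found it
    (an assumed interpretation).
    """
    total = len(all_defects)
    seen = set()
    rows = []
    for detected in detected_by_sequence:
        unique = detected - seen
        rows.append((100 * len(detected) / total, 100 * len(unique) / total))
        seen |= detected
    return rows


all_defects = set(range(10))
detected_by_sequence = [
    {0, 1, 2, 3, 4, 5, 6, 7},  # sequence 1
    {0, 1, 2, 3, 4, 8},        # sequence 2 finds one defect sequence 1 missed
    {0, 1},                    # sequence 3 finds nothing new
]
rows = sequence_effectiveness(detected_by_sequence, all_defects)
```

Here the first sequence detects 80% of the defects, the second detects 60% of which 10% are unique, and defect 9 is found by no sequence at all.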
23 Effectiveness of Echelon
Binary: 410 KB, 2,308 functions, 22,446 blocks, 157 tests, 148 sequences (sequence 1: 6 tests, sequence 2: 4 tests, sequence 3: 2 tests, sequence 4: 1 test, …)
The first sequence detected 81% of the defects. The second sequence detected 62% of the defects, 6% unique. The rest of the sequences did not detect any unique defects. (15 seconds)
24 Effectiveness of Echelon
Binary: 400 KB, 3,675 functions, 23,153 blocks, 221 tests, 176 sequences (sequence 1: 4 tests, sequence 2: 4 tests, sequence 3: 3 tests, sequence 4: 5 tests, …)
The first sequence detected 85% of the defects. The second sequence detected 71% of the defects, 9% unique. The third sequence detected 57% of the defects, 4% unique. The rest did not detect any unique defects. None of the tests detected the remaining 2% of the defects. (15 seconds)
25 Blocks predicted hit that were not hit
[Chart: blocks predicted hit that were not hit]
Limitations: a new node may not be executed
  If there is a path from successor to predecessor
  If there are changes in control path due to data changes
(10 seconds)
26 Blocks predicted not hit that were actually hit
[Chart: blocks predicted not hit that were actually hit]
(Blocks that are targets of indirect calls are being predicted as not hit.)
(10 seconds)
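Both kinds of misprediction can be counted by comparing the predicted coverage of a trace with what it actually covers when run on the new build. The Python sketch below shows one way to do so; the accuracy formula (correctly classified impacted blocks over all impacted blocks) is an assumption, since the slides do not define it, and the names and data are hypothetical.

```python
def prediction_errors(predicted_hit, actually_hit, impacted):
    """Compare predicted and measured coverage over the impacted blocks."""
    impacted = set(impacted)
    predicted = set(predicted_hit) & impacted
    actual = set(actually_hit) & impacted
    hit_not_predicted = actual - predicted      # predicted not hit, actually hit
    predicted_not_hit = predicted - actual      # predicted hit, not hit
    correct = len(impacted) - len(hit_not_predicted) - len(predicted_not_hit)
    return {
        "predicted_hit_not_hit": predicted_not_hit,
        "predicted_not_hit_but_hit": hit_not_predicted,
        "accuracy_pct": 100 * correct / len(impacted),
    }


report = prediction_errors(
    predicted_hit={1, 2, 3, 4, 5, 6},
    actually_hit={1, 2, 3, 4, 7},
    impacted=set(range(1, 11)),
)
```

In this toy case blocks 5 and 6 were predicted hit but not executed, block 7 was executed without being predicted, and 7 of the 10 impacted blocks were classified correctly.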
29 Summary
A binary-based test prioritization approach can effectively prioritize tests in a large-scale development environment
A simple heuristic with program change at fine granularity works well in practice
Currently integrated into the Microsoft development process
30 Coverage Impact Analysis
Echelon provides a number of options
  Control branch prediction
  Indirect calls: if N is the target of an indirect call, a trace needs to cover at least one of its successor blocks
Future improvements include heuristic branch prediction
  Branch Prediction for Free [Ball, Larus]
(15 seconds)
31 Additional Motivation And of course to attend ISSTA and meet some great people
32 Echelon: Test Selection
Options
  Calculation of weights can be extended, e.g. traces with great historical fault detection can be given additional weight
  Include the time each test takes in the calculation
  Print changed (modified or new) source code that may not be covered by any trace
  Print all source code lines that may not be covered by any trace
(15 seconds)
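The first two options could be combined into a single score, as in this Python sketch. The formula (impacted blocks plus a bonus per historical failure, divided by run time) is purely an assumption chosen for illustration; the slide does not say how Echelon weighs history or time, and all names and numbers are hypothetical.

```python
def weighted_score(blocks_covered, runtime_seconds, past_failures=0, failure_bonus=1.0):
    """Score a trace: coverage of impacted blocks, boosted by history, per second."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return (blocks_covered + failure_bonus * past_failures) / runtime_seconds


# (trace name, impacted blocks covered, run time in seconds, past failures found)
traces = [
    ("T1", 10, 100.0, 0),
    ("T2", 4, 10.0, 1),
    ("T3", 6, 20.0, 0),
]

order = [
    name
    for name, blocks, seconds, failures in sorted(
        traces,
        key=lambda t: weighted_score(t[1], t[2], t[3]),
        reverse=True,
    )
]
```

Under this scoring the short trace `T2` runs first even though `T1` covers the most impacted blocks, because `T1` takes ten times as long to run.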