
1 CS5103 Software Engineering Lecture 16: Test Coverage and Regression Testing

2 Today's class  Test coverage  Input combination coverage  Mutation coverage  Regression testing  Test prioritization  Mocking

3 Input Combination Coverage  Basic idea  Originates from the most straightforward idea of testing: try every possible input  In theory, achieving 100% input combination coverage proves 100% correctness  In practice, this is feasible only for very trivial cases  Main problems  The number of combinations is exponential in the number of inputs  The possible values of an input can be infinite

4 Input Combination Coverage  An example on a simple automatic sales machine  Accepts only one $1 bill at a time, and all beverages cost $1  Coke, Sprite, Juice, Water  Icy or normal temperature  Receipt wanted or not  All combinations = 4 * 2 * 2 = 16  Trying all 16 combinations makes sure the system works correctly

5 Input Combination Coverage  Sales Machine Example  [Figure: the three inputs. Input 1: Coke / Sprite / Juice / Water; Input 2: Normal / Icy; Input 3: Receipt / No-Receipt]
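To make the exhaustive approach concrete, here is a minimal Java sketch that enumerates all 16 combinations. The enum names and the printed check are illustrative assumptions, not from the slides:

// Exhaustive input-combination testing for the sales machine example.
// Drink, Temperature, and Receipt are hypothetical names for the three inputs.
public class SalesMachineCombinations {
    enum Drink { COKE, SPRITE, JUICE, WATER }
    enum Temperature { NORMAL, ICY }
    enum Receipt { YES, NO }

    public static void main(String[] args) {
        int count = 0;
        for (Drink d : Drink.values())
            for (Temperature t : Temperature.values())
                for (Receipt r : Receipt.values())
                    // A real test would insert $1 here and check the vended item.
                    System.out.printf("Test %d: %s, %s, %s%n", ++count, d, t, r);
        // Prints 16 tests in total: 4 * 2 * 2 combinations.
    }
}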

6 Combination Explosion  The number of combinations is exponential in the number of inputs  Consider an annual tax report system with 50 yes/no questions that generate a customized form for you  2^50 combinations ≈ 10^15 test cases  Even running 1,000 test cases per second, the suite would take about 30,000 years

7 Observation  When there are many inputs, a relationship among inputs usually involves only a small number of them  In the previous example: perhaps only the drink and the temperature interact (e.g., only Coke and Sprite come icy), while the receipt option is independent of both

8 Example of Tax Report  Input 1: family combined report or single report  Input 2: home loans or not  Input 3: received a gift or not  Input 4: age over 60 or not  …  Input 1 is related to all the other inputs  The other inputs are independent of each other

9 Studies  A long-term study from NIST (National Institute of Standards and Technology)  A combination width of 4 to 6 is enough to detect almost all errors

10 N-wise coverage  Coverage of the N-wise combinations of the possible values of all inputs  Example: 2-wise combinations  (coke, icy), (sprite, icy), (juice, icy), (water, icy)  (coke, normal), (sprite, normal), …  (coke, receipt), (sprite, receipt), …  (coke, no-receipt), (sprite, no-receipt), …  (icy, receipt), (normal, receipt)  (icy, no-receipt), (normal, no-receipt)  20 combinations in total  We had 16 full (3-wise) combinations, and now we have 20 pairs. Did it get worse?

11 N-wise coverage  Note: one test case may cover multiple N-wise combinations  E.g., (Coke, Icy, Receipt) covers three 2-wise combinations  (Coke, Icy), (Coke, Receipt), (Icy, Receipt)  100% N-wise coverage fully subsumes 100% (N-1)-wise coverage; is this true?  For k Boolean inputs  Full combination coverage = 2^k combinations: exponential  Full n-wise coverage = 2^n * k(k-1)…(k-n+1) / n! combinations: polynomial in k; for 2-wise, 2k(k-1)
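As a concrete comparison (numbers mine, following the formula above): with k = 20 Boolean inputs, full combination coverage requires 2^20 = 1,048,576 combinations, while full 2-wise coverage involves only 2 * 20 * 19 = 760 pairs; and since one test case covers many pairs at once, far fewer than 760 test cases are needed.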

12 N-wise coverage: Example  How many test cases do we need for 100% 2-wise coverage of the sales machine example?  (coke, icy, receipt) covers 3 new 2-wise combinations  (sprite, icy, no-receipt) covers 3 new …  (juice, icy, receipt) covers 2 new …  (water, icy, receipt) covers 2 new …  (coke, normal, no-receipt) covers 3 new …  (sprite, normal, receipt) covers 3 new …  (juice, normal, no-receipt) covers 2 new …  (water, normal, no-receipt) covers 2 new …  8 test cases cover all 20 2-wise combinations
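A small Java check (my sketch; the string labels match the slide, the class name is invented) confirming that the 8 test cases above do cover all 20 2-wise combinations:

import java.util.*;

// Collects every 2-wise combination covered by the 8 test cases above.
public class PairwiseCheck {
    public static void main(String[] args) {
        String[][] tests = {
            {"coke", "icy", "receipt"},        {"sprite", "icy", "no-receipt"},
            {"juice", "icy", "receipt"},       {"water", "icy", "receipt"},
            {"coke", "normal", "no-receipt"},  {"sprite", "normal", "receipt"},
            {"juice", "normal", "no-receipt"}, {"water", "normal", "no-receipt"}
        };
        Set<String> pairs = new HashSet<>();
        for (String[] t : tests)
            for (int i = 0; i < t.length; i++)
                for (int j = i + 1; j < t.length; j++)
                    pairs.add(t[i] + "+" + t[j]);   // one 2-wise combination
        System.out.println(pairs.size());  // prints 20: all pairs are covered
    }
}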

13 Combination Coverage in Practice  2-wise combination coverage is very widely used  Pair-wise testing  All-pairs testing  Mostly used in configuration testing  Example: configurations of gcc  A lot of variables  Several options for each variable  For command-line tools: add or remove an option

14 Input model  What happens if an input has infinitely many possible values?  Integer  Float  Character  String  Note: all of these are actually finite, but the sets of possible values are so large that they are treated as infinite  Idea: map the infinite values to a finite number of value baskets (ranges)

15 Input model  Input partition  Partition the possible value set of an input into several value ranges  This transforms numeric variables (integer, float, double, character) into enumerated variables  Example:  int exam_score => {less than 0}, {0-59}, {60-69}, {70-79}, {80-89}, {90-100}, {more than 100}  char c => {a-z}, {A-Z}, {0-9}, {other}
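A sketch of the exam-score partition as code (the class, method, and label strings are my own; the ranges follow the slide):

public class ScorePartition {
    // Maps a raw score to its partition label; one test per label then
    // covers the whole corresponding value range.
    static String partition(int score) {
        if (score < 0)    return "invalid (negative)";
        if (score <= 59)  return "0-59";
        if (score <= 69)  return "60-69";
        if (score <= 79)  return "70-79";
        if (score <= 89)  return "80-89";
        if (score <= 100) return "90-100";
        return "invalid (over 100)";
    }

    public static void main(String[] args) {
        System.out.println(partition(73));  // prints "70-79"
    }
}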

16 Input model  Feature extraction  For string and structured inputs  Split the possible value set by a certain feature  Example: String passwd => {contains space}, {no space}  It is possible to extract multiple features from one input  Example: String name => {capitalized first letter}, {not} => {contains space}, {not} => {length >10}, {2-10}, {1}, {0}  One test case may cover multiple features
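A minimal sketch of multi-feature extraction for the String name example (the class, method name, and map layout are assumptions):

import java.util.*;

public class NameFeatures {
    // Extracts the three features from one String input; each concrete
    // input covers one value of every feature simultaneously.
    static Map<String, String> features(String name) {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("capitalized first letter",
              !name.isEmpty() && Character.isUpperCase(name.charAt(0)) ? "yes" : "no");
        f.put("contains space", name.contains(" ") ? "yes" : "no");
        int len = name.length();
        f.put("length", len > 10 ? ">10" : len >= 2 ? "2-10" : len == 1 ? "1" : "0");
        return f;
    }

    public static void main(String[] args) {
        System.out.println(features("Ada Lovelace"));
        // {capitalized first letter=yes, contains space=yes, length=>10}
    }
}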

17 Input model  Feature extraction: structured input  A word binary tree (the data at all nodes are strings)  Depth: integer -> partition {0, 1, 1+}  Number of leaves: integer -> partition {0, 1, <10, 10+}  Root: null / not  A node with only a left child / not  A node with only a right child / not  Null data value at some node / not  Root value: string -> further feature extraction  Value of the leftmost leaf: string -> further feature extraction  …

18 Input model  Infeasible feature combinations  Example: String name => {capitalized first letter}, {not} => {contains space}, {not} => {length >10}, {2-10}, {1}, {0}  Some feature combinations cannot occur:  length = 0 ^ contains space  length = 0 ^ capitalized first letter  length = 1 ^ contains space ^ capitalized first letter

19 Input combination coverage  Summary:  Try to cover the combinations of the possible values of the inputs  Exponentially many combinations:  N-wise coverage  2-wise coverage is the most popular (all-pairs testing)  Infinitely many possible values:  Input partition  Input feature extraction  Coverage is usually 100% once the technique is adopted  It is easy to achieve, compared with code coverage  But the input models are not easy to write

20 Test coverage  So far we have covered inputs and code  The final goal of testing is to find all the bugs in the software  So there should be a bug coverage  Such a coverage would represent the adequacy of a test suite  50% bug coverage = half done!  100% bug coverage = done!

21 But it is impossible  Bugs are unknown  Otherwise we would not need testing  So we have the number of bugs found, but we do not know what to divide it by  One possible solution: estimation  1-10 bugs per KLOC  Depends on the type of software and the stage of development, so it is imprecise  When you find many bugs, are almost all the bugs found, or is the code simply of low quality?

22 Mutation coverage  How can we know how many bugs there are in the code?  We can know, if we plant the bugs ourselves!  Mutation coverage checks the adequacy of a test suite by how many human-planted bugs it can expose

23 Concepts  Mutant  A software version with a planted bug  Usually each mutant contains only one planted bug (why?)  Mutant kill  Given a test suite S and a mutant m, if there is a test case t in S such that execute(original, t) != execute(m, t), we say that S kills m  In other words, a test suite kills a mutant when it is able to detect the planted bug the mutant represents
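A toy illustration of the kill condition (the function and the mutant are invented for this sketch):

// The mutant plants one bug: '+' is replaced by '-'.
public class MutantKillDemo {
    static int addOne(int x)       { return x + 1; }  // original
    static int addOneMutant(int x) { return x - 1; }  // mutant

    public static void main(String[] args) {
        int t = 2;  // a single test input
        // execute(original, t) != execute(mutant, t)  =>  the mutant is killed
        System.out.println(addOne(t) != addOneMutant(t) ? "Killed" : "Survived");
    }
}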

24 Illustration  [Figure: the test cases run against the original and against every mutant (Mutant 1, Mutant 2, …, Mutant n); oracles compare the results: if a mutant's result is the same as the original's, the mutant survives; if different, it is killed]

25 Concepts  Mutation coverage (mutation score) = the number of mutants killed by the test suite / the total number of mutants

26 Mutant generation  Traditional mutation operators  Statement deletion  Replace a Boolean expression with true/false  Replace arithmetic operators (+, -, *, /, …)  Replace comparison operators (>=, ==, <=, !=)  Replace variables  …

27 Mutation Example

  Mutant operator                     In original    In mutant
  Statement deletion                  z=x*y+1;       (statement deleted)
  Boolean expression to true/false    if (x<y)       if (true) / if (false)
  Replace arithmetic operators        z=x*y+1;       z=x*y-1; / z=x+y+1;
  Replace comparison operators        if (x<y)       if (x<=y) / if (x==y)
  Replace variables                   z=x*y+1;       z=z*y+1; / z=x*x+1;

28 Mutation testing tools  MILU http://www0.cs.ucl.ac.uk/staff/Y.Jia/#tools  MuJava http://cs.gmu.edu/~offutt/mujava/  Javalanche https://github.com/david-schuler/javalanche/

29 Summary of all coverage measures  Code coverage  Target: code  Adequacy: no -> 100% code coverage != no bugs  Approximations: dataflow, branch, method/statement coverage  Preparation: none (instrumentation can be done automatically)  Overhead: low (instrumentation causes some overhead)

30 Summary of all coverage measures  Input combination coverage  Target: inputs  Adequacy: yes -> 100% input coverage == no bugs (in theory)  Approximations: n-wise coverage, input partition, input feature extraction  Preparation: hard (requires input modeling)  Overhead: none

31 Summary of all coverage measures  Mutation coverage  Target: bugs  Adequacy: no -> 100% mutation coverage != no bugs  Approximation: mutation is already an approximation  Preparation: none (mutation and execution can be done automatically)  Overhead: very high (the test suite is executed on every mutated version)

32 Regression Testing  So far  Unit testing  System testing  Test coverage  All of these are about the first round of testing  But testing is performed from time to time during the software life cycle  Test cases / oracles can be reused in all rounds  Testing during the evolution phase is regression testing

33 Regression Testing  When we try to enhance the software  We may also introduce bugs  The software worked yesterday but not today: this is called a "regression"  Numbers  An empirical study on Eclipse (2005)  11% of commits are bug-inducing  24% of fixing commits are bug-inducing

34 Regression Testing  Run the old test cases on the new version of the software  Running the whole suite each time costs a lot  So we try to save time and cost in new rounds of testing  Test prioritization  Fake objects

35 Test prioritization  Rank all the test cases  Run the test cases in the ranked order  Stop when resources are used up  How to rank test cases?  To discover bugs sooner  Or, as an approximation: to achieve higher coverage sooner

36 APFD: Measurement of Test Prioritization  Average Percentage of Faults Detected (APFD)  Compare two test case sequences  After each test case, count the number of faults (bugs) detected so far  Of the following two sequences, which is better?  S1: t1(2), t2(3), t3(5)  S2: t2(1), t1(3), t3(5)  APFD is the average of these numbers (normalized by the total number of faults), with 0 for the initial state  APFD(S1) = (0/5 + 2/5 + 3/5 + 5/5) / 4 = 0.5  APFD(S2) = (0/5 + 1/5 + 3/5 + 5/5) / 4 = 0.45
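A direct translation of the slide's APFD computation into Java (my sketch; cumulativeFaults[i] is the number of faults detected after the (i+1)-th test):

public class Apfd {
    // Averages the fraction of all faults detected after each test,
    // including the initial state, which contributes 0.
    static double apfd(int[] cumulativeFaults, int totalFaults) {
        double sum = 0.0;  // initial state: no faults detected yet
        for (int f : cumulativeFaults)
            sum += (double) f / totalFaults;
        return sum / (cumulativeFaults.length + 1);
    }

    public static void main(String[] args) {
        System.out.println(apfd(new int[]{2, 3, 5}, 5));  // 0.5    (S1)
        System.out.println(apfd(new int[]{1, 3, 5}, 5));  // ~0.45  (S2)
    }
}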

37 APFD: Illustration  APFD can be viewed as the area under the test-case/fault curve  Consider t1(f1, f2), t2(f3), t3(f3), t4(f1, f2, f3, f4)

38 Coverage-based test case prioritization  Code coverage based  Requires code-coverage information recorded in previous testing  Combination coverage based  Requires an input model  Mutation coverage based  Requires recorded mutant-killing statistics

39 Total Strategy  The simplest strategy  Always select the not-yet-selected test case that has the highest coverage

40 Example  Consider the code coverage of five test cases:  T1: s1, s3  T2: s2, s3, s4, s5  T3: s3, s4, s5  T4: s6, s7  T5: s3, s5, s8, s9, s10  Ranking: T5 (5), T2 (4), T3 (3), T1/T4 (2 each, tie)

41 Additional Strategy  An adaptation of the total strategy  Instead of always choosing the test case with the highest total coverage  Choose the test case that results in the most additional (not-yet-achieved) coverage  Start from the test case with the highest coverage

42 Example  Consider the code coverage of the same five test cases:  T1: s1, s3  T2: s2, s3, s4, s5  T3: s3, s4, s5  T4: s6, s7  T5: s3, s5, s8, s9, s10  Ranking: T5 (5 new), then T2 (2 new: s2, s4) / T4 (2 new: s6, s7), then T1 (1 new: s1), then T3 (0 new)
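A sketch of the additional (greedy) strategy on the slide's data; the map-based representation and class name are my assumptions:

import java.util.*;

// Greedy "additional" prioritization: repeatedly pick the test that adds
// the most not-yet-covered statements (ties broken by insertion order).
public class AdditionalStrategy {
    static List<String> rank(Map<String, Set<String>> coverage) {
        List<String> order = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        List<String> remaining = new ArrayList<>(coverage.keySet());
        while (!remaining.isEmpty()) {
            String best = remaining.get(0);
            int bestGain = -1;
            for (String t : remaining) {
                Set<String> gain = new HashSet<>(coverage.get(t));
                gain.removeAll(covered);               // only new statements count
                if (gain.size() > bestGain) { best = t; bestGain = gain.size(); }
            }
            order.add(best);
            covered.addAll(coverage.get(best));
            remaining.remove(best);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cov = new LinkedHashMap<>();
        cov.put("T1", Set.of("s1", "s3"));
        cov.put("T2", Set.of("s2", "s3", "s4", "s5"));
        cov.put("T3", Set.of("s3", "s4", "s5"));
        cov.put("T4", Set.of("s6", "s7"));
        cov.put("T5", Set.of("s3", "s5", "s8", "s9", "s10"));
        System.out.println(rank(cov));  // [T5, T2, T4, T1, T3] (T2/T4 tie)
    }
}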

43 Fake Objects  A waste of resources in regression testing  We change the code a little bit  Yet we must run all of the unchanged code during test execution  Using fake objects  For all/some of the unchanged modules  Do not run those modules  Use the recorded results of the previous test run instead

44 Fake Objects  Example  Testing an expert system for finance  It has two components: a UI and an interest calculator (driven by inputs from the UI)  In the first round of testing, store the results of the interest calculator as a map: (a, b) -> 5%, (a, c) -> 10%, (d, e) -> 7.7%  In regression testing, if the change is only in the UI, you can rerun the software against the recorded data map instead of the real calculator  Using more fake objects saves more time in regression testing, so should we mock every object?
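A minimal fake for the interest calculator, replaying the recorded map (the interface name, method name, and key encoding are assumptions for this sketch):

import java.util.*;

// InterestCalculator and rateFor are invented names for this sketch.
interface InterestCalculator {
    double rateFor(String input1, String input2);
}

// Replays results recorded from the real calculator in a previous run,
// so regression tests of the UI never execute the calculator itself.
class FakeInterestCalculator implements InterestCalculator {
    private final Map<String, Double> recorded = new HashMap<>();

    FakeInterestCalculator() {
        recorded.put("a|b", 0.05);   // (a, b) -> 5%
        recorded.put("a|c", 0.10);   // (a, c) -> 10%
        recorded.put("d|e", 0.077);  // (d, e) -> 7.7%
    }

    public double rateFor(String input1, String input2) {
        Double rate = recorded.get(input1 + "|" + input2);
        if (rate == null)
            throw new IllegalStateException("no recorded result for this input");
        return rate;
    }
}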

45 Pros & Cons  Pros  Saves time in regression testing  Cons  Be careful when mocking non-deterministic components  E.g., a mocked getSystemTime() may return a recorded value that conflicts with another (real) call  Recording the data maps takes a lot of time  The stored data map can become huge  When the mocked object itself changes, the data map must be updated

46 Selection of modules to fake  Rules  Use fake objects for time-consuming modules  So that you save more time  The faked module should be stable  E.g., libraries  The interface should carry only a small data flow  E.g., numeric inputs and return values

47 Fake objects  Fake objects are not only useful for regression testing  They are also useful for  UI components  Internet components  Components that affect the real world  Sending an email  Transferring money from credit cards
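For example, a fake email sender records the messages instead of actually sending them, so tests can assert on what would have gone out (a sketch; the interface and names are invented):

import java.util.*;

// EmailSender is an invented interface; the fake records the message
// instead of touching the real world.
interface EmailSender {
    void send(String to, String body);
}

class FakeEmailSender implements EmailSender {
    final List<String> sent = new ArrayList<>();

    public void send(String to, String body) {
        sent.add(to + ": " + body);  // recorded for assertions; nothing is sent
    }
}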

48 Next class  Debugging  Test-coverage-based bug localization  Delta debugging

49 Thanks!

