ISSTA, 8/23/00
Comparison of Delivered Reliability of Branch, Data Flow, and Operational Testing: A Case Study
Phyllis G. Frankl, Yuetang Deng
Polytechnic University, Brooklyn, NY
Outline
- Measures of test effectiveness
- Delivered reliability
- Experiment design
- Subject program
- Results
- Threats to validity
- Conclusions
Measures of Test Effectiveness
- Probability of detecting at least one fault [DN84, HT90, FWey93, FWei93, ...]
- Expected number of failures during test [FWey93, CY96]
- Number of faults detected [HFGO94]
- Delivered reliability [FHLS98]
[Flowchart: Select test cases -> Execute test cases -> Check results (failures? debug program) -> Check test data adequacy -> OK? yes: release program; no: select more test cases]
[Flowchart: same testing loop, but with "Estimate reliability" in place of the adequacy check -> OK? yes: release program; no: continue testing]
Delivered Reliability
- Captures the intuition that discovery and removal of "important" faults is more crucial
- Evaluates a testing technique according to the extent to which testing will increase reliability
- Introduced and studied analytically by FHLS (FSE-97, TSE-98)
Failures, Faults, and Failure Regions

    int foo(int x, int y) {
      s1; s2;
      if (c1) { s3; s4; }
      s5; s6;
    }

A fault induces a failure region: the subset of the input domain on which the program fails.
q_i = probability that an input selected according to the operational distribution will hit failure region i
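The q_i on this slide can be estimated by straight Monte Carlo sampling from the operational distribution. A minimal sketch in Python, with made-up failure regions and a uniform distribution over 0..999 standing in for the real operational distribution:

```python
import random

# Hypothetical failure regions for the sketch: each is a predicate over
# inputs drawn from the operational distribution.
failure_regions = [
    lambda x: 0 <= x < 5,      # region 1: true failure rate 5/1000
    lambda x: 100 <= x < 102,  # region 2: true failure rate 2/1000
]

def estimate_q(regions, draw_input, n_samples=100_000, seed=0):
    """Estimate q_i = P(operational input hits failure region i)."""
    rng = random.Random(seed)
    hits = [0] * len(regions)
    for _ in range(n_samples):
        x = draw_input(rng)
        for i, in_region in enumerate(regions):
            if in_region(x):
                hits[i] += 1
    return [h / n_samples for h in hits]

q = estimate_q(failure_regions, lambda rng: rng.randrange(1000))
# q[0] should be near 0.005 and q[1] near 0.002
```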
Failure Rate After Testing/Debugging
- Reliability after testing and debugging is determined by which failure regions are hit by test cases
- Model the failure rate after testing and debugging as a random variable Θ
- Compare testing techniques by comparing statistics of their Θ distributions
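A hedged sketch of how such a random variable could be sampled: given hypothetical region probabilities q_i, and assuming every failure region hit by at least one test is found and fixed, each trial below decides whether each region survives a set of n independent operational tests and sums the surviving q_i.

```python
import random

def simulate_theta(q, n_tests, n_trials=10_000, seed=1):
    """Sample the post-debugging failure rate: region i is missed by all
    n_tests independent operational tests with probability (1-q_i)^n_tests;
    surviving regions contribute their q_i to the delivered failure rate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trials):
        surviving = 0.0
        for qi in q:
            if rng.random() < (1.0 - qi) ** n_tests:
                surviving += qi  # region i was never hit, fault remains
        samples.append(surviving)
    return samples

q = [0.01, 0.002, 0.0005]          # invented failure-region rates
samples = simulate_theta(q, n_tests=50)
expected_theta = sum(samples) / len(samples)
# analytically, E = sum(qi * (1-qi)**50) ~ 0.00835 for these q
```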
Example [figure omitted]
Testing Criteria Considered
- Various coverage levels of:
  - decision coverage (branch testing)
  - def-use coverage (all-uses data flow testing)
  - test sets grouped into quartiles and deciles of coverage achieved
- Random testing with no coverage criterion
Questions Investigated
How do test sets that achieve high coverage levels (of branch testing or data flow testing) compare to those achieving lower coverage, according to:
- Expected improvement in reliability: E[Θ], where Θ is the failure rate after testing and debugging
- Probability of reaching a given reliability target: P(Θ ≤ θ0)
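Given Monte Carlo samples of the post-debugging failure rate, both statistics reduce to an empirical mean and an empirical tail frequency. A tiny illustration with made-up sample values and an arbitrary target:

```python
# Hypothetical Monte Carlo samples of the post-debugging failure rate
# (values invented for illustration only).
theta_samples = [0.012, 0.0, 0.003, 0.0, 0.008, 0.001, 0.0, 0.004]
target = 0.002  # hypothetical reliability target on the failure rate

# Expected failure rate after testing and debugging.
expected_theta = sum(theta_samples) / len(theta_samples)

# Probability of reaching the target: fraction of trials at or below it.
p_reach_target = sum(t <= target for t in theta_samples) / len(theta_samples)
```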
Subject Program: "Space"
- 10,000+ LOC antenna-design program in C, written by professional programmers, containing naturally occurring faults
- Test generator produces tests according to the operational distribution [Pasquini et al.]
- Considered 10 relatively hard-to-detect faults
- Failure rates of the individual faults: [chart omitted]
Experiment Design
- Adapted from a design used to compare the probability of detecting at least one fault [Frankl, Weiss, et al.]
- Simulate execution of a very large number of fixed-size test sets
- For each, note the coverage achieved (branch, data flow) and the faults detected
- Compute the density function of the post-debugging failure rate Θ for various coverage-level groups
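One way the simulation step could look, sketched in Python with toy data: each test case carries the set of coverage elements it exercises and the set of faults it detects, and every simulated test set yields a (coverage level, post-debugging failure rate) pair. All matrices and rates below are invented for illustration.

```python
import random

def simulate(cov_matrix, fault_matrix, q, set_size, n_sets, seed=2):
    """Draw n_sets fixed-size test sets from the universe; for each,
    record the coverage achieved and the failure rate remaining after
    removing every fault the set detects.
    cov_matrix[t]   : set of coverage elements test t exercises
    fault_matrix[t] : set of faults test t detects
    q[f]            : operational failure rate of fault f's region"""
    rng = random.Random(seed)
    universe = range(len(cov_matrix))
    results = []
    for _ in range(n_sets):
        ts = rng.sample(universe, set_size)
        covered = set().union(*(cov_matrix[t] for t in ts))
        detected = set().union(*(fault_matrix[t] for t in ts))
        theta = sum(qf for f, qf in q.items() if f not in detected)
        results.append((len(covered), theta))
    return results

# Toy universe of 4 test cases, 4 coverage elements, 2 faults.
cov = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
faults = [{"f1"}, set(), {"f2"}, set()]
q = {"f1": 0.01, "f2": 0.001}
res = simulate(cov, faults, q, set_size=2, n_sets=100)
```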
[Diagram: experiment data flow. Test cases x features -> coverage matrix -> coverage levels; test cases x faults -> results matrix; fault-sets -> fault-detection matrix and failure rate vector]
Coverage Levels
For test sets of size 50, considered the following groups:
- highest decile of decision coverage
- highest decile of def-use coverage
- four quartiles of decision coverage
- four quartiles of def-use coverage
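Grouping simulated test sets into quartiles of achieved coverage can be done by ranking; a small sketch with made-up coverage values (the real experiment used far more test sets):

```python
# Hypothetical coverage fractions achieved by 8 simulated test sets.
coverages = [0.62, 0.71, 0.55, 0.80, 0.90, 0.66, 0.74, 0.85]

def quartile_groups(values):
    """Return four lists of indices, lowest to highest quartile by rank.
    (Assumes len(values) is divisible by 4; a remainder would be dropped.)"""
    ranked = sorted(range(len(values)), key=lambda i: values[i])
    k = len(values) // 4
    return [ranked[i * k:(i + 1) * k] for i in range(4)]

groups = quartile_groups(coverages)
# groups[3] holds the indices of the highest-coverage test sets
```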
Expected Values [results chart omitted]

Tail Probabilities [results chart omitted]
Idealized Test Generation Strategy
- Select one test case from each subdomain (independently, randomly)
- Widely studied analytically
- Results in very large test sets for this subject:
  - decision coverage: 995 test cases
  - def-use coverage: 4296 test cases
- Compared to large random test sets
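A sketch of the idealized strategy, assuming the subdomains induced by the criterion are given explicitly (the ranges below are hypothetical; for the real subject, decision coverage induced 995 subdomains):

```python
import random

def one_per_subdomain(subdomains, seed=3):
    """Idealized strategy: independently pick one random test case from
    each subdomain induced by the coverage criterion."""
    rng = random.Random(seed)
    return [rng.choice(list(sd)) for sd in subdomains]

# Three invented subdomains, each a range of input identifiers.
subdomains = [range(0, 10), range(10, 25), range(25, 30)]
test_set = one_per_subdomain(subdomains)
# the resulting test set has exactly one test case per subdomain
```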
Expected Values [results chart omitted]

Tail Probabilities [results chart omitted]
Threats to Validity
- Single program
- Dependence on the programmers' characterization of the faults
- Dependence on the test-case universe
- Universe based on the operational distribution
- Single test set size (50)
- Accurate estimates of expected values, but less accuracy in estimates of the density function
Conclusions
Positive:
- higher decision coverage yields lower expected failure rate
- higher def-use coverage yields lower expected failure rate
- higher coverage increases the likelihood of reaching a high reliability target (low failure-rate target)
Conclusions (continued)
Negative:
- reliability gains with increased coverage are modest
  - cost-effectiveness questionable
  - economic significance of the increases depends on context
- no silver bullet for ultra-reliability