
1 CS590Z Statistical Debugging Xiangyu Zhang (part of the slides are from Chao Liu)

2 A Very Important Principle • Traditional debugging techniques deal with a single execution (or very few executions) • Once a large set of executions is available, including both passing and failing runs, statistical debugging is often highly effective. (Figure: executions come from failure reporting in the field and from in-house testing.)

3 Tarantula (ASE 2005, ISSTA 2007)

4 Scalable Remote Bug Isolation (PLDI 2004, 2005) • Look at predicates: branches; function returns (< 0, <= 0, > 0, >= 0, == 0, != 0); scalar pairs • For each assignment x = …, find all variables y_i and constants c_j, and form each pair x {==, <, <=, …} y_i or c_j • Sample the predicate evaluations (Bernoulli sampling) • Investigate how the probability of a predicate being true relates to bug manifestation (a small instrumentation sketch follows).
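A minimal sketch, in C, of how a branch predicate might be counted under Bernoulli sampling. This is illustrative only: the names (observe_branch, sampled), the counter layout, NUM_SITES, and RATE are assumptions, not the actual CBI instrumentation, and the real framework uses a more efficient countdown-based sampling scheme.

    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_SITES 128   /* number of instrumented predicate sites (illustrative) */
    #define RATE      100   /* record roughly 1 out of every RATE evaluations */

    static unsigned long pred_true[NUM_SITES];
    static unsigned long pred_false[NUM_SITES];

    /* Bernoulli sampling: record this evaluation with probability 1/RATE. */
    static int sampled(void)
    {
        return (rand() % RATE) == 0;
    }

    /* Called at an instrumented branch; returns the condition unchanged. */
    static int observe_branch(int site, int cond)
    {
        if (sampled()) {
            if (cond) pred_true[site]++;
            else      pred_false[site]++;
        }
        return cond;
    }

    int main(void)
    {
        /* Original code:        if (m >= 0) { ... }
           Instrumented version: if (observe_branch(7, m >= 0)) { ... } */
        for (int m = -5; m < 5; m++)
            observe_branch(7, m >= 0);
        printf("site 7: true=%lu false=%lu\n", pred_true[7], pred_false[7]);
        return 0;
    }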

5 Bug Isolation

6 How much does P being true increase the probability of failure over simply reaching the line where P is sampled?
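The CBI work captures this, roughly, with the Increase score. In the notation below (a paraphrase of the published definition), F(P) and S(P) count the failing and successful runs in which P was observed to be true at least once, and "P observed" means P's site was reached and sampled at least once:

    Failure(P)  = F(P) / (F(P) + S(P))
    Context(P)  = F(P observed) / (F(P observed) + S(P observed))
    Increase(P) = Failure(P) - Context(P)

Predicates with a significantly positive Increase(P) are the candidate bug predictors.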

7 An Example • Symptoms: 563 lines of C code; 130 out of 5542 test cases fail to give correct outputs; no crashes • The bug-relevant predicate evaluates to both true and false within one execution, so knowing that it was ever true is not enough

Buggy version:

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while (lin[i] != ENDSTR) {
            m = amatch(lin, i, pat, 0);
            if (m >= 0) {                  /* bug: missing check against lastm */
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Fixed version (identical except that the first if also checks lastm):

            if ((m >= 0) && (lastm != m)) {
                putsub(lin, i, m, sub);
                lastm = m;
            }

8 The fixed version of subline:

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while (lin[i] != ENDSTR) {
            m = amatch(lin, i, pat, 0);
            if ((m >= 0) && (lastm != m)) {
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Let A denote the predicate (m >= 0) and B the added condition (lastm != m); the buggy version misbehaves exactly when A is true and B is false. The relevant quantities are the conditional evaluation probabilities

    P_f(A) = P̃(A | A ∧ ¬B)
    P_t(A) = P̃(A | ¬(A ∧ ¬B))

9 Program Predicates • A predicate is a proposition about a program property, e.g., a comparison such as idx < 0 … • Each predicate can be evaluated multiple times during one execution • Every evaluation yields either true or false • Therefore, a predicate is simply a Boolean random variable that encodes program executions from a particular aspect.

10 Evaluation Bias of Predicate P • Evaluation bias, def'n: the probability that P is evaluated as true within one execution; maximum-likelihood estimate: the number of true evaluations divided by the total number of evaluations of P in that run; each run gives one observation of the evaluation bias of P • Suppose we have n correct and m incorrect executions; for any predicate P we end up with an observation sequence for correct runs, S_p = (X'_1, X'_2, …, X'_n), and an observation sequence for incorrect runs, S_f = (X_1, X_2, …, X_m) • Can we infer whether P is suspicious based on S_p and S_f? (A small computation sketch follows.)
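A minimal sketch, in C, of turning per-run true/false counts into evaluation-bias observations. The per-run counts are hypothetical, and it is assumed that every listed run evaluates P at least once (runs that never evaluate P would contribute no observation):

    #include <stdio.h>

    /* Evaluation bias of P in one run: MLE = #true evaluations / #evaluations. */
    static double evaluation_bias(unsigned long n_true, unsigned long n_false)
    {
        return (double)n_true / (double)(n_true + n_false);
    }

    int main(void)
    {
        /* Hypothetical per-run counts for one predicate P
           (3 passing runs, 2 failing runs). */
        unsigned long pass_true[] = { 2, 5, 1 }, pass_false[] = { 8, 5, 9 };
        unsigned long fail_true[] = { 9, 7 },    fail_false[] = { 1, 3 };

        double S_p[3], S_f[2];
        for (int i = 0; i < 3; i++) S_p[i] = evaluation_bias(pass_true[i], pass_false[i]);
        for (int i = 0; i < 2; i++) S_f[i] = evaluation_bias(fail_true[i], fail_false[i]);

        for (int i = 0; i < 3; i++) printf("S_p[%d] = %.2f\n", i, S_p[i]);
        for (int i = 0; i < 2; i++) printf("S_f[%d] = %.2f\n", i, S_f[i]);
        return 0;
    }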

11 Underlying Populations • Imagine the underlying distributions of evaluation bias for correct and incorrect executions are f_p and f_f, respectively • S_p and S_f can be viewed as random samples from these underlying populations • One major heuristic: the larger the divergence between f_p and f_f, the more relevant the predicate P is to the bug. (Figure: two probability distributions of evaluation bias over [0, 1], one for correct runs and one for incorrect runs.)

12 Major Challenges • No knowledge of the closed forms of either distribution • Usually, we do not have enough incorrect executions to estimate f_f reliably. (Figure: the same pair of evaluation-bias distributions over [0, 1].)

13 Our Approach

14 Algorithm Outputs • A ranked list of program predicates w.r.t. the bug-relevance score s(P); higher-ranked predicates are regarded as more relevant to the bug • What's the use? Top-ranked predicates suggest possible buggy regions, and several predicates may point to the same region (a stand-in ranking sketch follows)
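A sketch, in C, of ranking a predicate by comparing its two observation sequences. The score used here, a two-sample t-like statistic on the mean evaluation biases, is a simple stand-in for illustration, not the actual SOBER statistic s(P); all names and the sample values are assumptions:

    #include <math.h>
    #include <stdio.h>

    static double mean(const double *x, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += x[i];
        return s / n;
    }

    static double var(const double *x, int n, double m)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += (x[i] - m) * (x[i] - m);
        return s / (n - 1);
    }

    /* Stand-in bug-relevance score: how far apart are the mean evaluation
       biases of failing vs. passing runs, in standard-error units? */
    static double score(const double *S_p, int n, const double *S_f, int m)
    {
        double mp = mean(S_p, n), mf = mean(S_f, m);
        double se = sqrt(var(S_p, n, mp) / n + var(S_f, m, mf) / m);
        return se > 0.0 ? fabs(mf - mp) / se : 0.0;
    }

    int main(void)
    {
        double S_p[] = { 0.20, 0.50, 0.10 };   /* passing-run evaluation biases */
        double S_f[] = { 0.90, 0.70 };         /* failing-run evaluation biases */
        printf("score(P) = %.2f\n", score(S_p, 3, S_f, 2));
        return 0;
    }

Predicates are then sorted in decreasing order of this score to form the ranked list.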

15 Outline • Program Predicates • Predicate Rankings • Experimental Results • Case Study: bc-1.06 • Future Work • Conclusions

16 Experimental Results • Localization quality metric: software bug benchmark; quantitative metric • Related work: Cause Transition (CT) [CZ05]; Statistical Debugging [LN+05] • Performance comparisons

17 Bug Benchmark • The dream benchmark: a large number of known bugs in large-scale programs with an adequate test suite • Siemens Program Suite: 130 variants of 7 subject programs, each of 100-600 LOC; 130 known bugs in total, mainly logic (or semantic) bugs; advantages: the bugs are known, so judgments are objective, and the large number of bugs makes comparative studies statistically significant; disadvantage: small-scale subject programs • State-of-the-art performance claimed in the literature so far: the cause-transition approach [CZ05]

18 Localization Quality Metric [RR03]

19 1st Example (Figure: a small program dependence graph with nodes 1-10; T-score = 70%)

20 2nd Example (Figure: another program dependence graph with nodes 1-10; T-score = 20%)
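As a rough, illustrative sketch (in C) of how an examination-effort metric of this kind can be computed: model the program as a dependence graph, run a breadth-first search outward from the top-ranked (reported) node, and measure how much of the graph is examined before the faulty node is reached. The graph, the node numbering, and whether the reported T-score is this fraction or its complement are assumptions here, not the exact [RR03] definition:

    #include <stdio.h>

    #define N 10   /* nodes of a small, made-up dependence graph */

    /* Fraction of nodes examined by a breadth-first search that starts at the
       reported node and stops once the faulty node has been reached. */
    static double examined_fraction(int adj[N][N], int report, int fault)
    {
        int visited[N] = {0}, queue[N], head = 0, tail = 0, examined = 0;
        visited[report] = 1;
        queue[tail++] = report;
        while (head < tail) {
            int u = queue[head++];
            examined++;
            if (u == fault)
                break;                     /* fault located */
            for (int v = 0; v < N; v++)
                if (adj[u][v] && !visited[v]) {
                    visited[v] = 1;
                    queue[tail++] = v;
                }
        }
        return (double)examined / N;
    }

    int main(void)
    {
        int adj[N][N] = {{0}};
        /* a made-up graph: a chain 0-1-2-...-9 plus one extra edge 0-5 */
        for (int i = 0; i + 1 < N; i++)
            adj[i][i + 1] = adj[i + 1][i] = 1;
        adj[0][5] = adj[5][0] = 1;

        double f = examined_fraction(adj, 0, 3);
        printf("examined %.0f%% of the graph before reaching the fault\n", 100.0 * f);
        return 0;
    }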

21 Related Work • Cause Transition (CT) approach [CZ05]: a variant of delta debugging [Z02]; previous state-of-the-art performance holder on the Siemens suite; published in ICSE'05, May 15, 2005; cons: it relies on memory abnormality, hence its performance is restricted • Statistical Debugging (Liblit05) [LN+05]: predicate ranking based on discriminant analysis; published in PLDI'05, June 12, 2005; cons: it ignores the evaluation patterns of predicates within each execution

22 Localized bugs w.r.t. Examined Code

23 Cumulative Effects w.r.t. Code Examination

24 Top-k Selection • Regardless of the specific choice of k, both Liblit05 and SOBER outperform CT, the previous state-of-the-art holder • From k = 2 to 10, SOBER consistently outperforms Liblit05

25 Outline • Evaluation Bias of Predicates • Predicate Rankings • Experimental Results • Case Study: bc-1.06 • Future Work • Conclusions

26 Case Study: bc 1.06 • bc 1.06: 14,288 LOC; an arbitrary-precision calculator shipped with most distributions of Unix/Linux • Two bugs were localized: one was reported by Liblit in [LN+05], and one had not been reported previously • Sheds some light on scalability

27 Outline • Evaluation Bias of Predicates • Predicate Rankings • Experimental Results • Case Study: bc-1.06 • Future Work • Conclusions

28 Future Work • Further improve localization quality • Robustness to sampling • Stress-test on large-scale programs to confirm scalability to code size • …

29 Conclusions • We devised a principled statistical method for bug localization • No parameter-tuning hassles • It handles both crashing and non-crashing bugs • Best localization quality reported so far

30 Discussion • Features: easy implementation; difficult experimentation; more advanced statistical techniques may not be necessary; go wide, not deep … • Predicates are treated as independent random variables • Can execution indexing help? • Can statistical principles be combined with slicing or IWIH?

