Published by Frank Harvey. Modified over 5 years ago.
1
Finding Similar Failures Using Callstack Similarity
Kevin Bartz, Harvard University; Jack W. Stokes, John C. Platt, Ryan Kivet, David Grant, Silviu Calinoiu and Gretchen Loihle, Microsoft Corporation, Redmond, WA
Presented by: Sandeep Kumar Dhankar, Dept. of Computer & Information Sciences, University of Delaware
2
OUTLINE Introduction Approach Similarity Classifier Model Results
Conclusion
3
INTRODUCTION REMEMBER THIS?
4
WHAT NEXT? Problem 1 (assigned) Problem 2 (assigned) Problem 3 (assigned)
5
Problems with this approach
Similar problems encountered
Duplication of effort
Wrong prioritization of tasks
6
Solution? Try to find and group similar problems
Treat callstack as string and apply string matching techniques
7
CallStack
8
Edit distance
The number of insertions, deletions, and modifications (substitutions) required to convert one string into another.
E.g. "tried" and "tired": two modifications are required to convert "tried" into "tired", so the edit distance is 2.
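The definition above can be sketched as the standard dynamic-programming (Levenshtein) computation with unit costs. This is only an illustration of plain edit distance, not the tuned callstack distance from the paper; the function name `edit_distance` and the frame names in the usage lines are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between two sequences.

    Works on strings (character by character) and on lists of
    callstack frames (frame by frame), since it only compares items.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # converting a[:i] to the empty prefix needs i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # keep x, or modify x into y
            ))
        prev = curr
    return prev[-1]


print(edit_distance("tried", "tired"))  # 2

# Hypothetical frames, treated as tokens rather than characters.
stack_1 = ["app!main", "app!parse", "lib!read"]
stack_2 = ["app!main", "lib!read"]
print(edit_distance(stack_1, stack_2))  # 1
```

Treating each frame as one token is one way to read "treat the callstack as a string"; comparing raw characters of the frame text would be another.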
9
Data used
1 million failure reports to the Windows Error Reporting system, collected over a 90-day period
Type of failure: crash, hang, deadlock
Name of the causing process
Exception code
Offending callstack
10
Training set
11
Model Parameters
Features: defined over pairs of failures
12
Features cont…
13
Model parameters Dependence on Callstack edit distance
14
Callstack edit distance penalty parameters
15
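Model
\[ P(\mathrm{Sim} \mid \beta, \gamma, X) = g^{-1}\big(\alpha + \beta_{ET}\,\mathbf{1}\{ET_1 = ET_2\} + \beta_{PN}\,\mathbf{1}\{PN_1 = PN_2\} + \beta_{EC}\,\mathbf{1}\{EC_1 = EC_2\} + \beta_{CS}\,\mathrm{EditDistance}(CS_1, CS_2; \gamma)\big) \]
where \( g^{-1}(x) = e^{x} / (1 + e^{x}) \) is the inverse logit function.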
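As a rough illustration of how this model scores a pair of failures, the sketch below combines the three equality indicators with a callstack edit distance. It assumes g^{-1} is the standard inverse logit, 1 / (1 + e^{-x}). The dictionary keys, the function names, and every coefficient value are hypothetical placeholders, not the estimates reported in the paper, and the callstack distance is passed in as a precomputed number.

```python
import math


def inverse_logit(x):
    """Standard inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def similarity_probability(f1, f2, cs_distance, coef):
    """P(Sim) for two failures, given a precomputed callstack distance.

    f1, f2      : dicts with hypothetical keys event_type, process_name,
                  exception_code
    cs_distance : edit distance between the two callstacks
    coef        : dict with alpha and one beta per feature
    """
    linear = (
        coef["alpha"]
        + coef["ET"] * (f1["event_type"] == f2["event_type"])
        + coef["PN"] * (f1["process_name"] == f2["process_name"])
        + coef["EC"] * (f1["exception_code"] == f2["exception_code"])
        + coef["CS"] * cs_distance
    )
    return inverse_logit(linear)


# Placeholder coefficients chosen only to make the arithmetic easy to follow.
coef = {"alpha": 0.0, "ET": 1.0, "PN": 1.0, "EC": 1.0, "CS": -1.0}
crash = {"event_type": "crash", "process_name": "app.exe",
         "exception_code": "0xC0000005"}

# All three indicators match (+3) and the distance is 3 (-3), so the
# linear term is 0 and the probability is exactly 0.5.
print(similarity_probability(crash, crash, cs_distance=3, coef=coef))  # 0.5
```

With a negative callstack coefficient, as assumed here, a larger edit distance lowers the predicted similarity.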
16
Model variations
Full model
Reduced model: γInsSame = γInsNew and γDelSame = γDelLast
Further reduced model: γSubMod = γSubFunc = γSubOffset
Baseline model with untuned edit distance
17
Computation
The edit distance computation dominates the running time.
Consider only those failures whose first three callstack frames match.
This returns under 3000 such failures, and the model is applied to them.
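One way to picture this prefilter is a lookup table keyed on the first three frames, so that the expensive edit distance runs only on failures sharing that prefix. The slides do not say how the search is implemented; the hash index, the function names, and the sample frames below are all assumptions made for illustration.

```python
from collections import defaultdict


def build_prefix_index(failures, k=3):
    """Group failure ids by the first k frames of their callstack."""
    index = defaultdict(list)
    for failure_id, frames in failures.items():
        index[tuple(frames[:k])].append(failure_id)
    return index


def candidate_failures(index, frames, k=3):
    """Return ids of stored failures whose first k frames match `frames`."""
    return index.get(tuple(frames[:k]), [])


# Hypothetical stored failures: id -> list of callstack frames.
failures = {
    "f1": ["a", "b", "c", "d"],
    "f2": ["a", "b", "c", "e"],
    "f3": ["x", "b", "c", "d"],
}
index = build_prefix_index(failures)

# Only f1 and f2 share the first three frames with the new failure,
# so only they would be passed on to the similarity model.
print(candidate_failures(index, ["a", "b", "c", "z"]))  # ['f1', 'f2']
```

The trade-off is the one raised later in the deck: failures that differ within the first three frames are never compared at all.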
18
Estimated coefficients for the model
19
Results
20
Result cont…
Full model: identifies similar failures with 90% precision on recall
Baseline model: 50% precision on recall
21
Conclusion about paper
The model shows a good ability to recover similar failures.
Total computation times for the edit distance comparisons on the data set are not given exactly.
The initial failure classification of the training data is based on developer tags, which is not a standard approach.
Only the first three frames are checked for a match in the fast global search.
22
QUESTIONS?
23
THANKS