
1 WuKong: Automatically Detecting and Localizing Bugs that Manifest at Large System Scales
Bowen Zhou, Jonathan Too, Milind Kulkarni, Saurabh Bagchi (Purdue University)

2 Ever-Changing Behavior of Software
Software has to be adaptive to accommodate different platforms, inputs, and configurations. As a side effect, the manifestation of a bug may depend on a particular platform, input, or configuration.

3 Ever-Changing Behavior of Software

4 Software Development Process
– Develop a new feature and its unit tests
– Test the new feature on a local machine
– Push the feature into production systems
– Break the production systems
– Roll back the feature
The feature was never tested on the production systems!

5 Bugs in Production Runs
Properties
– Remain unnoticed when the application is tested on the developer's workstation
– Break the production system when the application is running on a cluster and/or serving real user requests
Examples
– Configuration errors
– Integer overflows

6 Bugs in Production Runs
Bugs with these properties are scale-dependent bugs.

7 Modeling Program Behavior for Finding Bugs
Dubbed statistical debugging [Bronevetsky DSN ‘10] [Mirgorodskiy SC ’06] [Chilimbi ICSE ‘09] [Liblit PLDI ‘03]
– Represents program behavior as a set of features that can be measured at runtime
– Builds a model to describe and predict the features based on data collected from many runs
– Detects abnormal features that deviate from the model's prediction beyond a certain threshold
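As a generic illustration of this style of detection (this sketch is my addition, not code from any of the cited systems; it treats the training mean as the model's prediction and k standard deviations as the threshold):

    /* Generic statistical-debugging-style check: summarize one feature over
       the training runs, then flag a production value that deviates from the
       training mean by more than k standard deviations. */
    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    bool feature_is_abnormal(const double *train, int n, double prod, double k) {
        double mean = 0.0, var = 0.0;
        for (int i = 0; i < n; ++i)
            mean += train[i];
        mean /= n;
        for (int i = 0; i < n; ++i)
            var += (train[i] - mean) * (train[i] - mean);
        var /= n;
        return fabs(prod - mean) > k * sqrt(var);
    }

    int main(void) {
        double train[] = { 100, 104, 98, 101, 103 };  /* feature counts from training runs */
        printf("abnormal: %d\n", feature_is_abnormal(train, 5, 250.0, 3.0));
        return 0;
    }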

8 Modeling Program Behavior for Finding Bugs
Limitation: these approaches do not account for scale-induced variation in program behavior.

9 Modeling Scale-Dependent Behavior
[Chart: number of times a loop executes vs. run number, for training runs and production runs.]
Is there a bug in one of the production runs?

10 Modeling Scale-Dependent Behavior
[Chart: number of times the loop executes vs. scale, for training runs and production runs.]
Accounting for scale makes trends clear and errors at large scales obvious.

11 Modeling Scale-Dependent Behavior
Our previous research
– Vrisha [HPDC '11]: builds a collective model over all features of a program to detect a bug in any feature
– Abhranta [HotDep '12]: tweaks Vrisha's model to allow per-feature bug detection and localization

12 Modeling Scale-Dependent Behavior
Both approaches have limitations...

13 Modeling Scale-Dependent Behavior
– A big gap in scale, e.g., training runs on up to 128 nodes but production runs on 1,024 nodes
– Noisy features: too many false positives render the model useless

14 Reconstructing Scale-Dependent Behavior: the WuKong Way
– Covers a wide range of program features
– Predicts the expected value in a large-scale run for each feature separately
– Prunes unpredictable features to improve localization quality
– Provides a shortlist of suspicious features in its localization roadmap

15 The Workflow
[Diagram: N training runs of the application are instrumented with Pin; each run yields a (scale, feature) record; the records are used to fit a scale-to-feature model; a production run's (scale, feature) record is then compared against the model's prediction.]

16 Feature Collection

17 Features considered by WuKong

void foo(int a) {
    if (a > 0) {
    } else {
    }
    if (a > 100) {
        int i = 0;
        while (i < a) {
            if (i % 2 == 0) {
            }
            ++i;
        }
    }
}

18 Features considered by WuKong

void foo(int a) {
1:  if (a > 0) {
    } else {
    }
2:  if (a > 100) {
        int i = 0;
3:      while (i < a) {
4:          if (i % 2 == 0) {
            }
            ++i;
        }
    }
}

Each labeled conditional (1-4) is a feature; WuKong records how many times each one executes.
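A minimal, hand-instrumented sketch of what collecting these features could look like at the source level (in WuKong the counting is done by Pin on the binary; the counter array, the choice to count taken branches and loop iterations, and the driver in main are illustrative assumptions):

    /* Illustrative sketch only: count how often each labeled conditional's
       body, or loop iteration, executes during a run. */
    #include <stdio.h>

    static unsigned long feature_count[5];   /* counters for features 1-4 */

    void foo(int a) {
        if (a > 0) {                 /* feature 1 */
            feature_count[1]++;
        } else {
        }
        if (a > 100) {               /* feature 2 */
            feature_count[2]++;
            int i = 0;
            while (i < a) {          /* feature 3: one count per iteration */
                feature_count[3]++;
                if (i % 2 == 0) {    /* feature 4 */
                    feature_count[4]++;
                }
                ++i;
            }
        }
    }

    int main(void) {
        foo(150);
        for (int f = 1; f <= 4; ++f)
            printf("feature %d: %lu\n", f, feature_count[f]);
        return 0;
    }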

19 Modeling

20 Predict Feature from Scale
X: the vector of scale parameters X_1, ..., X_N
Y: the number of times a particular feature occurs
The model to predict Y from X, and the prediction error of each run, are sketched below.

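The model and error formulas themselves appeared as images on the slide and are not captured in the text. A plausible reconstruction, assuming a per-feature regression with linear and logarithmic terms in the scale parameters and a relative error measure (these specifics are my assumptions, not recovered from the slide):

\[ \hat{Y} = \beta_0 + \sum_{i=1}^{N} \left( \beta_i X_i + \gamma_i \log X_i \right) \]

and, for an observed feature value Y, the prediction error

\[ E = \frac{\lvert Y - \hat{Y} \rvert}{\lvert \hat{Y} \rvert} \]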

22 Bug Localization

23 Locate Buggy Features
First, we need to know whether the production run is buggy at all, by running a detection test (formalized below). If there is a bug in this run, we can start looking at the prediction error of each feature:
– Rank all features by their prediction error to produce a localization roadmap containing the top N features
The detection test involves the error of feature i in the production run, a constant parameter, and the maximum error of feature i across all training runs.
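A hedged formalization of that detection test (the symbols are mine; the slide names only the quantities involved): flag the production run as buggy if, for some feature i,

\[ E_i^{\mathrm{prod}} > \delta \cdot \max_{j \in \mathrm{training}} E_i^{(j)} \]

where E_i^prod is the prediction error of feature i in the production run, delta is the constant parameter, and the maximum is taken over all training runs.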

24 Improve Localization Quality by Feature Pruning

25 Noisy Feature Pruning
Some features cannot be effectively predicted by the above model:
– Random
– Not scale-determined
– Discontinuous
The trade-off
– Keeping those features would pollute the diagnosis by pushing real faults down the list
– Removing those features could miss some faults, if a fault happens to lie in such a feature

26 Noisy Feature Pruning
How to remove them? For each feature:
1. Do cross-validation on the training runs
2. Remove the feature if it produces a greater-than-100% prediction error in more than (100 - x)% of the training runs
The parameter x > 0 tolerates outliers in the training runs.
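A minimal sketch of that pruning test (my own illustration, not WuKong's code; it assumes the cross-validation errors are stored as relative errors, so a value above 1.0 means an error above 100%):

    /* Decide whether to prune one feature: given its cross-validated relative
       errors over num_runs training runs, drop it if the error exceeds 100%
       in more than (100 - x_percent)% of those runs. */
    #include <stdbool.h>
    #include <stdio.h>

    bool should_prune_feature(const double *errors, int num_runs, double x_percent) {
        int num_bad = 0;
        for (int j = 0; j < num_runs; ++j) {
            if (errors[j] > 1.0)          /* relative error above 100% */
                ++num_bad;
        }
        return (double)num_bad / num_runs > (100.0 - x_percent) / 100.0;
    }

    int main(void) {
        double errors[] = { 0.05, 2.0, 1.5, 0.1, 3.0, 0.2 };
        printf("prune: %d\n", should_prune_feature(errors, 6, 40.0));
        return 0;
    }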

27 Evaluation
Fault injection in Sequoia AMG2006
– Up to 1,024 processes
– Randomly selected conditionals are flipped
Two case studies
– Integer overflow in an MPI library
– Deadlock in a P2P file-sharing application

29 Fault Injection Study
Fault
– Injected at process 0
– A randomly picked feature (conditional) is flipped
Data
– Training (without fault): 110 runs, 8-128 processes
– Production (with fault): 100 runs, 1,024 processes

30 Fault Injection Study
Results
– Total: 100
– Non-crashing: 57
– Detected: 53
– Localized: 49
Successfully localized: 92.5% (49 of the 53 detected)

33 Case Study: A Deadlock in Transmission’s DHT Implementation

35 Case Study: A Deadlock in Transmission’s DHT Implementation
Features 53 and 66

36 Conclusion
– Debugging scale-dependent program behavior is a difficult and important problem
– WuKong incorporates the scale of a run into a predictive model for each individual program feature to enable accurate bug diagnosis
– We demonstrated WuKong's effectiveness through a large-scale fault injection study and two case studies of real bugs

37 Q&A
bzhou@purdue.edu

38 Backup

39 Runtime Overhead
Geometric mean: 11.4%

