
Predict Failures with Developer Networks and Social Network Analysis


1 Predict Failures with Developer Networks and Social Network Analysis
Andrew Meneely et al.

2 Introduction
Research question: predict failures at the file level
Importance: dramatically decrease fixing cost
Research goal: examine human factors in failure prediction by applying social network analysis to code churn information

3 Introduction (cont.)
Method: introduce file-based metrics based on SNA as additional predictors of software failures
Case study: a mature Nortel product (over 3 million LOC)
Models were built using failure data from two releases and validated against a subsequent release
Inspecting 20% of the files revealed 58% of the failures under one model, versus 61% under an optimal prioritization
A significant correlation exists between file-based developer network metrics and failures

4 Definitions of Network Metrics
Node, connection (edge), path
Geodesic path (social distance): shortest path between two nodes
Diameter: longest geodesic path in the network
Connectivity: measures direct connections
Degree: number of connections on a node
Hub (a "well-known" developer): a node whose degree is above a threshold
Disconnected: a node with no edges
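These definitions can be illustrated on a tiny invented developer network: degree, hubs, disconnected nodes, and the diameter all fall out of a breadth-first search. The graph and the hub threshold of 3 are hypothetical, not from the paper.

```python
from collections import deque

# Hypothetical developer network as an undirected adjacency list (invented data).
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
    "erin": set(),  # disconnected: a node with no edges
}

def geodesic_lengths(graph, start):
    """Breadth-first search: geodesic (shortest-path) length from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

degree = {v: len(graph[v]) for v in graph}          # number of connections per node
hubs = {v for v, d in degree.items() if d >= 3}     # threshold of 3 is an invented choice
disconnected = {v for v, d in degree.items() if d == 0}
# Diameter: the longest geodesic path between any pair of connected nodes.
diameter = max(d for v in graph for d in geodesic_lengths(graph, v).values())

print(degree, hubs, disconnected, diameter)
```

Here alice collaborates with three developers, so she is the lone hub; erin has no edges and is disconnected.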

5 Network Metrics (cont.)
Centrality: quantifies how closely nodes are indirectly connected to the rest of the network
Closeness: the average distance from v to any other node in the network that can be reached from v
Betweenness: the number of geodesic paths that include v divided by the total number of geodesic paths in the network
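A minimal sketch of the two centrality measures as the slide defines them, on an invented four-developer chain; betweenness here counts geodesics that pass through v as an intermediate node, which is one reading of "include v":

```python
from collections import deque
from itertools import combinations

# Hypothetical four-developer network: a simple chain a-b-c-d (invented data).
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

def geodesic_lengths(graph, start):
    """Breadth-first search distances from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closeness(graph, v):
    """Average geodesic distance from v to every other node reachable from v."""
    dist = geodesic_lengths(graph, v)
    others = [d for u, d in dist.items() if u != v]
    return sum(others) / len(others)

def all_geodesics(graph):
    """Brute force: every shortest path between every unordered pair (tiny graphs only)."""
    paths = []
    for s, t in combinations(graph, 2):
        shortest = geodesic_lengths(graph, s).get(t)
        if shortest is None:
            continue  # t is unreachable from s
        stack = [[s]]
        while stack:
            path = stack.pop()
            if path[-1] == t:
                paths.append(path)
            elif len(path) - 1 < shortest:
                stack.extend(path + [n] for n in graph[path[-1]] if n not in path)
    return paths

def betweenness(graph, v, paths):
    """Geodesics passing through v as an intermediate node, over all geodesics."""
    return sum(1 for p in paths if v in p[1:-1]) / len(paths)

paths = all_geodesics(graph)
print(closeness(graph, "a"), betweenness(graph, "b", paths))
```

On the chain, the endpoint a has the worst closeness (average distance 2.0), while the interior node b lies on 2 of the 6 geodesics.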

6 Get Developer Network Metrics
Step 1: collect code churn information
Step 2: construct the developer social network
Step 3: compute developer-based metrics
Step 4: compute file-based metrics
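The four steps can be sketched end-to-end on invented churn records. Linking two developers who changed the same file, using degree as the developer metric, and taking the maximum collaborator degree as the file-level value are assumptions for illustration; the paper's exact construction and aggregation are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Step 1: hypothetical code churn records as (file, developer) pairs (invented data).
churn = [("net.c", "alice"), ("net.c", "bob"),
         ("ui.c", "bob"), ("ui.c", "carol"),
         ("db.c", "carol"), ("db.c", "dave"), ("db.c", "alice"),
         ("log.c", "erin"), ("log.c", "frank")]

# Step 2: developer network -- an edge connects two developers who changed the same file.
contributors = defaultdict(set)
for filename, dev in churn:
    contributors[filename].add(dev)
network = defaultdict(set)
for devs in contributors.values():
    for u, v in combinations(sorted(devs), 2):
        network[u].add(v)
        network[v].add(u)

# Step 3: a developer-based metric -- here simply degree (number of collaborators).
degree = {dev: len(nbrs) for dev, nbrs in network.items()}

# Step 4: a file-based metric -- aggregate over the file's developers; taking the
# maximum collaborator degree is an assumed aggregation for illustration.
file_metric = {f: max(degree[d] for d in devs) for f, devs in contributors.items()}
print(file_metric)
```

Files touched by well-connected developers (db.c) score high; a file maintained by an isolated pair (log.c) scores low.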

7 Step 1: code churn information

8 Step 2: developer social network

9 Step 3: developer-based metrics

10 Step 4: file-based metrics

11 Independent and Dependent Variables
Independent Variables Dependent Variables the number of system test failures for a file the number of post-release failures for a file

12 Model selection and validation
Find the best combination of variables and a regression model
Training set and validation set
Candidate regressions:
for the number of failures in a given file: negative binomial regression and Poisson regression
for the probability that a file had at least one failure: logistic regression
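As a sketch of one candidate count model, here is a Poisson regression (log link) fit by plain gradient ascent on invented failure counts against a single hypothetical predictor. The study itself fit its models in SAS, and its case study ultimately used negative binomial regression; this is not the paper's model.

```python
import math

# Invented data: failure counts y for five files against one predictor x
# (e.g. a network metric). Model: log(E[y]) = b0 + b1 * x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2, 2, 3, 4, 5]

b0, b1 = 0.0, 0.0
lr = 0.001  # small fixed step; the Poisson log-likelihood is concave
for _ in range(50000):
    # Gradient of the log-likelihood sum(y * eta - exp(eta)), eta = b0 + b1 * x.
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)  # expected count under the current fit
        g0 += yi - mu
        g1 += (yi - mu) * xi
    b0 += lr * g0
    b1 += lr * g1

print(round(b0, 3), round(b1, 3))
```

The fitted slope b1 comes out positive, matching the intuition that the predictor increases expected failures in this toy data.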

13 Step One: Initial model selection
Determine:
combinations of candidate variables
transformations of variables
candidate regressions
weights of variables
Evaluated by goodness-of-fit statistics (training error), calculated in SAS v9.1 using proc genmod

14 Step Two: Final model selection
Cross-validation with a training partition and a validation partition, to catch over-fit models
Models compared by Spearman rank correlation coefficient
The two models with the highest average correlation coefficient and the lowest standard deviation become the final models to be validated
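The scoring statistic can be sketched in a few lines: Spearman's rho is the Pearson correlation of the two rank vectors, with tied values given their average rank (the data below is invented).

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Predicted vs. observed failure counts for five files (invented data).
print(spearman([10, 7, 3, 1, 0], [8, 9, 2, 1, 1]))  # approx 0.87
```

A coefficient near 1 means the predicted ordering of files closely tracks the observed ordering, which is exactly what a failure-prioritization model needs.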

15 Step Three: Model validation
Evaluated against the validation set using two criteria:
the Spearman rank correlation coefficient between the estimated values and the observed values
the difference between the predicted prioritization and an optimal prioritization
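The second criterion can be sketched as the share of failures revealed by inspecting the top 20% of files under each ranking, mirroring the 58%-of-failures versus optimal-61% comparison reported earlier (file names and counts are invented):

```python
# Hypothetical validation data: predicted failure scores and actual failure counts.
predicted = {"a.c": 9.1, "b.c": 7.2, "c.c": 1.0, "d.c": 4.4, "e.c": 0.2,
             "f.c": 3.3, "g.c": 0.9, "h.c": 6.0, "i.c": 0.1, "j.c": 2.5}
actual = {"a.c": 8, "b.c": 5, "c.c": 4, "d.c": 6, "e.c": 0,
          "f.c": 1, "g.c": 0, "h.c": 7, "i.c": 0, "j.c": 2}

def failures_found(ranking, actual, fraction=0.2):
    """Share of all failures contained in the top `fraction` of files under `ranking`."""
    top = sorted(ranking, key=ranking.get, reverse=True)[: int(len(ranking) * fraction)]
    return sum(actual[f] for f in top) / sum(actual.values())

pred_share = failures_found(predicted, actual)  # predicted prioritization
opt_share = failures_found(actual, actual)      # optimal: rank by the actual counts
print(pred_share, opt_share)
```

Ranking by actual counts is by construction the best possible prioritization, so the gap between the two shares measures how much the model leaves on the table.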

16 Step Four: Further Analysis
Evaluate how well the model works compared to a SLOC-only model
Compare the full model with a model containing only code churn metrics and no network metrics, and vice versa
Assess the network metrics as an early indicator
Investigate possible latent factors

17 Case study: an industrial product at Nortel Networks
A sample of 2,500 files (out of 11,000 files, 3.17 million LOC)
System testing model (step 1): negative binomial regression
degree was positively correlated with failures
closeness was negatively correlated with failures
the actual beta-weights are not included
Cross-validation (step 2):
the Spearman rank correlation coefficient for the system test model was 0.778
60.5% of the variance was explained

18

19 Model validation (step 3)
Next release

20 Model validation (cont.)
rate of actual discovery of failures by the Nortel system test team

21 Compared with other models
Model selection and validation

22 Model as an Early Indicator
Can the model be used early in development?
Performed the ten-fold cross-validation analysis using data from only the first half of the development time during release Rn+1
Average Spearman rank correlation of … with a standard deviation of 0.02; all correlation coefficients significant (p < 0.01)

23 Conclusion
The model performed significantly well in prioritizing files based on predicted failures.
Developer networks are useful for failure prediction early in the development phase and provide a useful abstraction of the code churn data.

24 Thank you!

