CLUSTERING SUPPORT FOR FAULT PREDICTION IN SOFTWARE Maria La Becca Dipartimento di Matematica e Informatica, University of Basilicata, Potenza, Italy

1 CLUSTERING SUPPORT FOR FAULT PREDICTION IN SOFTWARE
Maria La Becca, Dipartimento di Matematica e Informatica, University of Basilicata, Potenza, Italy (marialabecca@gmail.com)

2 INTRODUCTION
Fault prediction approaches support testing, refactoring, and software quality.
Fault predictors are based on process metrics or product metrics, computed at component or package level.
GOAL: a new software clustering approach based on lexical and structural information, and fault prediction models that use the resulting clusters as their granularity level (cluster-level predictor).

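3 SOFTWARE CLUSTERING: APPROACH STEPS
1 - CORPUS CREATION: the terms in the identifiers and comments of each class of the OO software system form a document Di. The corpus is D = {D1, D2, ..., Dn}.
2 - CORPUS NORMALIZATION: identifier splitting, special token elimination, stop word removal, stemming.
3 - CORPUS INDEXING: Vector Space Model (VSM). Terms and documents Di are arranged in a term-by-document matrix.
4 - COMPUTING SIMILARITIES: lexical similarity between documents.
5 - EXTRACTING DEPENDENCIES: structural dependencies among classes, extracted with JRipples.
6 - CLUSTERING: structural and lexical information are combined in a weighted graph G' = (V, E, ω). The BorderFlow algorithm groups this graph into clusters of classes that are lexically similar and structurally dependent.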
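Steps 2 to 4, and the construction of G', can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation:
- The stop-word list and the crude suffix-stripping `naive_stem` are hypothetical stand-ins for a full stop-word list and a real stemmer such as Porter's.
- Documents are weighted with tf-idf, a common VSM weighting, and compared with cosine similarity.
- The edges of G' are assumed to be the structural dependencies, weighted by the lexical similarity of their endpoints. The approach obtains these dependencies with JRipples; here they are passed in as a plain list.

The BorderFlow clustering step itself is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (English words plus some Java keywords).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for",
              "class", "void", "int", "double", "return", "new"}


def split_identifier(token):
    """Split camelCase/PascalCase/snake_case identifiers into lower-case words,
    e.g. 'parseXmlFile' -> ['parse', 'xml', 'file']."""
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", token)
    return [p.lower() for p in parts]


def naive_stem(word):
    """Crude suffix stripping: a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Step 2: keep identifier-like tokens only (special tokens are dropped),
    split identifiers, remove stop words and very short terms, then stem."""
    terms = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        for word in split_identifier(token):
            if len(word) < 2 or word.isdigit() or word in STOP_WORDS:
                continue
            terms.append(naive_stem(word))
    return terms


def tfidf_vectors(corpus):
    """Step 3: sparse columns of a term-by-document matrix, weighted with tf-idf."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for terms in corpus.values():
        doc_freq.update(set(terms))
    return {
        doc: {t: c * math.log(n_docs / doc_freq[t]) for t, c in Counter(terms).items()}
        for doc, terms in corpus.items()
    }


def cosine(u, v):
    """Step 4: cosine similarity between two sparse document vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(vectors, dependencies):
    """G' = (V, E, w): one undirected edge per structural dependency,
    weighted by lexical similarity (an assumed weighting scheme)."""
    edges = {}
    for a, b in dependencies:
        edges[tuple(sorted((a, b)))] = cosine(vectors[a], vectors[b])
    return set(vectors), edges


if __name__ == "__main__":
    # Hypothetical class sources standing in for step 1 (corpus creation).
    sources = {
        "AccountManager": "class AccountManager { void openAccount(Account acc) {} } // manages bank accounts",
        "Account": "class Account { double balance; void deposit(double amount) {} }",
        "XmlReportWriter": "class XmlReportWriter { void writeReport(Report r) {} } // writes XML reports",
    }
    corpus = {name: normalize(src) for name, src in sources.items()}
    vectors = tfidf_vectors(corpus)
    deps = [("AccountManager", "Account"), ("AccountManager", "XmlReportWriter")]
    nodes, edges = build_graph(vectors, deps)
    for edge, weight in sorted(edges.items()):
        print(edge, round(weight, 3))
```

In this toy example, the dependency between `AccountManager` and `Account` gets a positive weight because both classes share the term "account". The dependency towards `XmlReportWriter` gets weight 0 because the two classes share no terms. A graph-clustering algorithm such as BorderFlow would then operate on these weighted edges.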

4 FAULT PREDICTION MODELS
Classes are grouped into clusters of lexically similar and structurally dependent classes. Product metrics are then used to build fault prediction models based on Multivariate Linear Regression and Logistic Regression.
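A minimal sketch of the two model families follows. It assumes that class-level product metrics are aggregated to the cluster (or package) level by summation. The aggregation rule, the hand-rolled least-squares and gradient-descent fitting, and all function names are illustrative assumptions, not the study's implementation. A statistics package would normally be used, and variable selection is not shown.

```python
import math


def aggregate(class_metrics, groups):
    """Sum each class-level metric over the member classes of every group
    (cluster or package). Summation is an assumed aggregation rule."""
    n_metrics = len(next(iter(class_metrics.values())))
    return [[sum(class_metrics[c][j] for c in members) for j in range(n_metrics)]
            for members in groups]


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting
    (raises ZeroDivisionError if a pivot is exactly zero)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(X, y):
    """Multivariate linear regression (OLS with intercept) via the
    normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)


def predict_linear(b, x):
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Logistic regression by batch gradient ascent on the log-likelihood."""
    p = len(X[0]) + 1
    w = [0.0] * p
    for _ in range(epochs):
        grad = [0.0] * p
        for row, t in zip(X, y):
            xi = [1.0] + list(row)
            prob = 1.0 / (1.0 + math.exp(-sum(wj * v for wj, v in zip(w, xi))))
            for j in range(p):
                grad[j] += (t - prob) * xi[j]
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_proba(w, x):
    z = w[0] + sum(wi * v for wi, v in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))
```

For example, the columns of `class_metrics` could hold WMC, CBO and LOC for each class. `aggregate` turns them into one row per cluster, and the rows are used with the cluster fault counts (linear model) or with the binary faulty/non-faulty labels (logistic model).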

5 CASE STUDY: DEFINITION AND CONTEXT
RQ: Does the cluster-level approach improve fault prediction compared with the baseline (i.e., class- and package-level models)?
Software clustering approach: fault prediction models at cluster granularity level.
Baseline approach (class and package): fault prediction models at class and package granularity level.
Models: SWLR and LGR, built on software metrics.
Software system releases: 3.2, 4.0, 4.1, 4.2, 4.3; 1.4, 1.5, 1.6, 1.7; 2.2, 2.4; 1.5, 2.0, 2.5, 3.0 (15 releases in total).
Data: source code, software metrics, and fault data.

6 CASE STUDY: PLANNING
Fault prediction relies on previous knowledge of the OO software system. In the intra-release setting, the training and test sets come from the same release (X.0). In the inter-release setting, models are trained on release X.0 and tested on the following release X.1.
Selected variables
Dependent variables (Name: Definition)
ClassFault: the number of faults in a class.
BinaryClassFault: whether or not faults are present in a class.
ClusterFault: the number of faults in a cluster.
BinaryClusterFault: whether or not faults are present in a cluster.
PackageFault: the number of faults in a package.
BinaryPackageFault: whether or not faults are present in a package.
Independent variables (Name: Definition)
WMC (Weighted Methods per Class): the number of methods, assuming unity weights for all methods.
DIT (Depth of Inheritance Tree): the number of inheritance levels from the top of the object hierarchy.
NOC (Number Of Children): the number of immediate descendants of the class.
CBO (Coupling Between Object classes): the number of classes coupled to a given class.
RFC (Response For a Class): the number of methods that can be executed when an object of the class receives a message.
LCOM (Lack of Cohesion in Methods): the number of methods in a class that are not related through the sharing of some of the class fields.
NPM (Number of Public Methods): the number of methods in a class that are declared as public.
LOC (Lines Of Code): the number of instructions in the methods of the class.
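The cluster- and package-level dependent variables can be derived from class-level fault data. The sketch below assumes, for illustration only, that ClusterFault (or PackageFault) is the sum of the faults of the member classes. A class that belongs to overlapping clusters therefore contributes to each of them.

It also shows one way to compute DIT and NOC from a hypothetical child-to-parent map. The root of the hierarchy is counted as depth 0; metric tools differ on this convention, for example in how `java.lang.Object` is counted.

```python
def group_faults(class_faults, groups):
    """Return (GroupFault, BinaryGroupFault) for clusters or packages.
    GroupFault is assumed to be the sum of the member classes' faults;
    the binary variable is 1 when the group contains at least one fault."""
    counts = {g: sum(class_faults.get(c, 0) for c in members)
              for g, members in groups.items()}
    binary = {g: int(n > 0) for g, n in counts.items()}
    return counts, binary


def dit_noc(parent):
    """DIT and NOC from a child -> parent map (None marks a hierarchy root).
    Assumes an acyclic hierarchy; the root has DIT 0 in this convention."""
    def depth(c):
        d = 0
        while parent.get(c) is not None:
            c = parent[c]
            d += 1
        return d

    dit = {c: depth(c) for c in parent}
    noc = {c: 0 for c in parent}
    for child, par in parent.items():
        if par is not None:
            noc[par] = noc.get(par, 0) + 1
    return dit, noc
```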

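7 CASE STUDY: VALIDATION AND EVALUATION (INTRA- AND INTER-RELEASE ANALYSIS)
Intra-release: K-fold cross-validation on version X.0. In each of the K rounds, a different fold is the test set and the remaining folds form the training set; results are averaged over the rounds.
Inter-release: release X.0 is the training set and the following release is the test set.
To assess and compare the SWLR and LGR predictors:
SWLR predictors: Sum of Absolute Residuals (SAR); Kendall τ and Spearman ρ correlations, ranging in [-1, +1], between predicted and actual faults.
LGR predictors: AIC and RD (lower values indicate better goodness of fit); Precision, Recall, and F-measure.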
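The evaluation measures named above can be sketched in plain Python. This is an illustrative implementation under these assumptions:
- Folds are contiguous; real studies usually shuffle the data first.
- Spearman's ρ is computed as the Pearson correlation of average ranks.
- Kendall's τ is the tie-corrected τ-b variant.
- RD is taken as -2 times the log-likelihood of a binary logistic model, with AIC = 2k + RD.

A statistics package would normally be used instead.

```python
import math


def k_fold_indices(n, k):
    """Yield (train, test) index lists for K contiguous folds of n items."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if not start <= j < start + size]
        yield train, test
        start += size


def sar(predicted, observed):
    """Sum of Absolute Residuals: lower is better."""
    return sum(abs(p - o) for p, o in zip(predicted, observed))


def _ranks(xs):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks (handles ties)."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either variable."""
    conc = disc = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + ties_x) * (conc + disc + ties_y))
    return (conc - disc) / denom if denom else 0.0


def precision_recall_f(predicted, observed):
    """Binary labels (1 = faulty). Returns (precision, recall, F-measure)."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    fn = sum(1 for p, o in pairs if o and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def residual_deviance(probs, labels):
    """RD of a binary logistic model: -2 * log-likelihood (probs in (0, 1))."""
    ll = sum(math.log(p) if t else math.log(1 - p) for p, t in zip(probs, labels))
    return -2 * ll


def aic(probs, labels, n_params):
    """Akaike Information Criterion: 2k - 2 ln L = 2k + RD."""
    return 2 * n_params + residual_deviance(probs, labels)
```

For example, the SWLR predictions for a test fold would be passed to `sar`, `spearman_rho` and `kendall_tau_b` together with the observed fault counts, and the per-fold values averaged over the K rounds.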

8 RESULTS
Results per model type (cluster-level models / baseline class-level models / baseline package-level models / no prevalence):
SWLR, Sum of Absolute Residuals (SAR, lower is better): < / > / > / -
SWLR, correlation FP-FO (Kendall & Spearman), intra-release: 8 of 15 / 0 of 15 / 2 of 15 / 5 of 15
SWLR, correlation FP-FO (Kendall & Spearman), inter-release: 7 of 11 / 0 of 11 / 1 of 11 / 3 of 11
LGR, goodness of fit (AIC, RD; lower is better): < / > / > / -
LGR, FP-FO (Precision, Recall, F-measure), intra-release: 9 of 15 / 0 of 15 / 1 of 15 / 5 of 15
LGR, FP-FO (Precision, Recall, F-measure), inter-release: 7 of 11 / 0 of 11 / 3 of 11 / 1 of 11
Legend: best values, worst values, no prevalence.
Setup: for each OO software system, 6 predictors (3 SWLR and 3 LGR: cluster level plus the class and package baselines), evaluated intra- and inter-release.

9 CONCLUSION
Thanks!
Acknowledgements: Carmine Gravino, Andrian Marcus, Tim Menzies, Giuseppe Scanniello
