Better Software Defect Prediction Using Equalized Learning With Machine Learners Kim Kaminsky Gary D. Boetticher Department of Computer Science University.


1 Better Software Defect Prediction Using Equalized Learning With Machine Learners
Kim Kaminsky, Gary D. Boetticher
Department of Computer Science, University of Houston - Clear Lake, Houston, Texas, USA

2 Preamble
The maturing of Software Engineering as a discipline requires a better understanding of the complexity of the software process. Empirical-based modeling is one mechanism for improving the understanding, and thus the management, of the software process.

3 Knowledge Sharing Limitations in Software Engineering
Heavily context dependent: Measure A from Project X ≠ Measure B from Project Y
Unreliable data due to poor processes
Organizations do not share data
Projects are large, so project estimation data occurs infrequently

4 Implicitly Data Starved Domains
[Figure: number of modules plotted against defect counts, annotated "Lots of this" and "Little of that"]

5 NASA Data Sets
[Figure: No. of Modules plotted against Defect Counts]

6 Equalized Learning
Balance data by replicating sparse instances [Mizuno99]
Before: 300 instances of 0 defects, 20 instances of 5 defects, 10 instances of 9 defects
After: 300 instances of 0 defects, 300 instances of 5 defects, 300 instances of 9 defects
(3 colors = 3 different instances)
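The replication step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `equalize`, the `label_of` parameter, and the round-robin way of choosing which rows to copy are all assumptions. Only the class sizes (300, 20, and 10 instances) come from the slide.

```python
from collections import Counter

def equalize(rows, label_of):
    """Replicate rows of sparse classes until every class matches the largest one."""
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Cycle through the group's rows so the copies are spread evenly
        # (assumed policy; the slide does not say how replicas are chosen).
        for i in range(target):
            balanced.append(group[i % len(group)])
    return balanced

# Class sizes from the slide: 300 / 20 / 10 instances with 0 / 5 / 9 defects.
# The first field ("m0", ...) is a hypothetical stand-in for a metrics tuple.
rows = [(f"m{i}", 0) for i in range(300)]
rows += [(f"m{i}", 5) for i in range(300, 320)]
rows += [(f"m{i}", 9) for i in range(320, 330)]

balanced = equalize(rows, label_of=lambda row: row[1])
counts = Counter(defects for _, defects in balanced)
print(counts)
```

Every defect class ends up with as many rows as the most common one, which is why equalized datasets grow quickly.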

7 Genetic Programming Process - 1
[Figure: 2 (of many) chromosomes drawn as expression trees, with node labels + A B and * A - 3 D, each applied to the data]
Fitness Value = Model performance on data (888 out of 1000; 913 out of 1000)
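To make the idea of a chromosome's fitness concrete, here is a small Python sketch. It assumes the two trees on the slide read as `A + B` and `A * (3 - D)`, and it assumes fitness is the number of samples whose prediction falls within a tolerance of the target. The slide only says that fitness is model performance on the data, so the scoring rule, the function names, and the three toy samples are illustrative rather than taken from the presentation.

```python
import operator

# Binary operators allowed inside a chromosome (expression tree).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate a tree given as nested tuples: (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]   # variable lookup, e.g. "A"
    return tree            # numeric constant

def fitness(tree, samples, tolerance=0.5):
    """Count samples predicted within `tolerance` of the target (assumed rule)."""
    hits = 0
    for env, target in samples:
        if abs(evaluate(tree, env) - target) <= tolerance:
            hits += 1
    return hits

chromosome_1 = ("+", "A", "B")              # A + B
chromosome_2 = ("*", "A", ("-", 3, "D"))    # A * (3 - D)

# Toy samples (hypothetical): variable bindings and the target value.
samples = [
    ({"A": 1, "B": 2, "D": 0}, 3),
    ({"A": 2, "B": 0, "D": 1}, 4),
    ({"A": 0, "B": 5, "D": 3}, 0),
]

score_1 = fitness(chromosome_1, samples)
score_2 = fitness(chromosome_2, samples)
print(score_1, "out of", len(samples))
print(score_2, "out of", len(samples))
```

A score such as "888 out of 1000" would then be the same count taken over 1000 samples.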

8 Genetic Programming Process - 2
[Figure: the 2 chromosomes (node labels + A B and * A - 3 D) before and after the genetic operators]
Crossover: + B - 3 D and * A A
Mutation: + B - D 3.1
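Crossover and mutation on tree-shaped chromosomes can be sketched as below. This is an illustration under assumptions: trees are nested tuples `(op, left, right)`, the helper names are hypothetical, and the swap and mutation points are fixed by hand so the result is reproducible, whereas a real genetic programming run would pick them at random. The exchange shown (swapping `A` with `3 - D`, then changing the constant 3 to 3.1) is one plausible reading of the slide's drawing.

```python
def get_subtree(tree, path):
    """Follow a path of tuple indices (1 = left child, 2 = right child)."""
    for index in path:
        tree = tree[index]
    return tree

def replace_subtree(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return new_subtree
    index = path[0]
    children = list(tree)
    children[index] = replace_subtree(tree[index], path[1:], new_subtree)
    return tuple(children)

def crossover(parent_a, path_a, parent_b, path_b):
    """Swap the subtrees found at the two paths; parents stay unchanged."""
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))

def mutate(tree, path, new_node):
    """Point mutation: overwrite a single node or subtree."""
    return replace_subtree(tree, path, new_node)

parent_a = ("+", "A", "B")               # A + B
parent_b = ("*", "A", ("-", 3, "D"))     # A * (3 - D)

# Swap the left child of parent_a with the right child of parent_b.
child_a, child_b = crossover(parent_a, (1,), parent_b, (2,))

# Mutate the constant 3 inside child_a to 3.1.
mutant = mutate(child_a, (1, 1), 3.1)

print(child_a)
print(child_b)
print(mutant)
```

Because tuples are immutable, each operator returns new trees, so the parents remain available for further breeding in the same generation.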

9 Equalized Learning Applied to the NASA KC2 Defect Dataset
379 unique tuples
Input: Product Metrics (Size, Complexity, Vocabulary)
Output: Defect Count
Equalized produces 3013 samples

10 Experiment Configuration: Original versus Equalized Data
2000 Characters
1000 Chromosomes
50 Generations Max.
20 Trials

11 Original versus Equalized Data t-test Results
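A comparison of this kind could be computed along the following lines. The slide does not state which t-test was used, so Welch's two-sample t-test is an assumption here, and the function name `welch_t` is hypothetical. The two short lists are placeholders to exercise the code, not results from the presentation. In practice each list would hold one score per trial for the original and the equalized data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    va, vb = variance(sample_a) / n_a, variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Placeholder inputs only (not data from the experiment).
group_a = [2, 4, 6]
group_b = [1, 2, 3]

t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 4), round(dof, 2))
```

The resulting statistic is then compared against the t distribution with `dof` degrees of freedom to obtain a p-value, for example with a statistics package.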

12 Conclusions
Equalized learning spawns large datasets
Equalized learning produces better models

13 Future Directions
Validation Across Different Data Sets
Improve Performance: Distributed GP

