Better Software Defect Prediction Using Equalized Learning With Machine Learners Kim Kaminsky Gary D. Boetticher Department of Computer Science University.

Presentation transcript:

Better Software Defect Prediction Using Equalized Learning With Machine Learners Kim Kaminsky Gary D. Boetticher Department of Computer Science University of Houston - Clear Lake Houston, Texas, USA

Preamble
The maturing of Software Engineering as a discipline requires a better understanding of the complexity of the software process. Empirically based modeling is one mechanism for improving the understanding, and thus the management, of the software process.

Knowledge Sharing Limitations in Software Engineering
Heavily context dependent
– Measure A from Project X ≠ Measure B from Project Y
Unreliable data due to poor processes
Organizations do not share data
Projects are large, so project estimation data occurs infrequently

Implicitly Data Starved Domains
[Figure: number of modules versus defect counts, annotated "Lots of this" and "Little of that"]

NASA Data Sets
[Figure: number of modules versus defect counts]

Equalized Learning
Balance data by replicating sparse instances [Mizuno99]
Before: 300 instances of 0 defects, 20 instances of 5 defects, 10 instances of 9 defects
After: 300 instances of 0 defects, 300 instances of 5 defects, 300 instances of 9 defects
3 colors = 3 different instances
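The balancing step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `equalize`, the `label_of` parameter, and the rule of replicating every class up to the size of the largest class are assumptions chosen to reproduce the 300/300/300 example on the slide.

```python
from collections import defaultdict
import random

def equalize(samples, label_of=lambda s: s[-1], seed=0):
    """Replicate instances of sparse classes until every class matches the largest one.

    Assumption: the class label (defect count) is the last field of each sample.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in samples:
        groups[label_of(s)].append(s)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        # Whole copies first, then a random remainder to reach the target exactly.
        copies, remainder = divmod(target, len(group))
        balanced.extend(group * copies)
        balanced.extend(rng.sample(group, remainder))
    return balanced

# Slide example: 300 / 20 / 10 instances with 0 / 5 / 9 defects.
data = (
    [(i, 0) for i in range(300)]
    + [(i, 5) for i in range(20)]
    + [(i, 9) for i in range(10)]
)
balanced = equalize(data)
```

Replicating whole copies before drawing a random remainder keeps every original instance represented as evenly as possible.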

Genetic Programming Process - 1
2 (of many) chromosomes: + A B and * A - 3 D
Data
Fitness value = model performance on data, e.g. 888 out of 1000

Genetic Programming Process - 2
2 chromosomes: + A B and * A - 3 D
Crossover: + B - 3 D and * A A
Mutation: + B - D 3.1
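The two genetic programming slides can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: chromosomes are modeled as nested tuples in prefix form, fitness counts the rows predicted within a tolerance, crossover swaps right-hand subtrees at a fixed cut point, and mutation perturbs numeric constants. The slides do not state which operators or fitness function the authors used.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate a nested-tuple expression tree such as ("+", "A", "B")."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]  # variable lookup
    return tree  # numeric constant

def fitness(tree, rows, tolerance=0.5):
    """Count the rows whose prediction lands within `tolerance` of the target."""
    return sum(abs(evaluate(tree, env) - target) <= tolerance for env, target in rows)

def crossover(parent_a, parent_b):
    """Swap the right-hand subtrees of two parents (fixed cut point, for clarity)."""
    op_a, left_a, right_a = parent_a
    op_b, left_b, right_b = parent_b
    return (op_a, left_a, right_b), (op_b, left_b, right_a)

def mutate(tree, rng, scale=0.1):
    """Perturb every numeric constant in the tree by a small random amount."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return (op, mutate(left, rng, scale), mutate(right, rng, scale))
    if isinstance(tree, (int, float)):
        return tree + rng.uniform(-scale, scale)
    return tree

# The two chromosomes shown on the slides.
parent_a = ("+", "A", "B")
parent_b = ("*", "A", ("-", 3, "D"))
child_a, child_b = crossover(parent_a, parent_b)
mutant = mutate(child_a, random.Random(0))
```

A real genetic programming run would pick crossover and mutation points at random and repeat selection over many generations; the fixed cut point only keeps the example easy to trace by hand.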

Equalized Learning Applied to the NASA KC2 Defect Dataset
379 unique tuples
Input: product metrics (size, complexity, vocabulary)
Output: defect count
Equalization produces 3013 samples

Original versus Equalized Data: Experiment Configuration
2000 characters
1000 chromosomes
50 generations max.
20 trials

Original versus Equalized Data: t-test Results
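For readers who want to run a comparison of this kind on their own trial results, the sketch below computes a two-sample t statistic in Python. It is an assumption that an unequal-variance (Welch) test is appropriate; the slide does not say which t-test variant was used. The function name `welch_t` is hypothetical, and the numbers are placeholders, not results from the study.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Placeholder numbers only; substitute per-trial model scores.
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 3, 4, 5, 6]
t, df = welch_t(sample_a, sample_b)
```

The statistic would then be compared against a t distribution with `df` degrees of freedom to obtain a p-value.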

Conclusions
Equalized learning spawns large datasets
Equalized learning produces better models

Future Directions
Validation across different data sets
Improve performance: distributed GP