Real-World Challenges in Building Accurate Software Fault Prediction Models DR. ÇAĞATAY ÇATAL TUBITAK (Research Council of TURKEY) Predictive Modelling.

Presentation transcript:

Real-World Challenges in Building Accurate Software Fault Prediction Models DR. ÇAĞATAY ÇATAL TUBITAK (Research Council of TURKEY) Predictive Modelling and Search Based Software Engineering, London, UK, October 2011

2 Outline Introduction Dependable Software Systems Motivation Challenging Issues Fault prediction with no fault data Fault prediction with limited fault data Noise detection on measurement datasets Practical tools (Eclipse plug-in) Cross company vs. Within-company fault prediction Our Models A Systematic Review Study Conclusion

Dependable Systems Are we successful in building dependable software systems? Safety (not being harmful to the environment); Security (ability to protect privacy); Reliability (ability to perform its function for a period of time); Availability (ability to serve whenever needed) 3

4 1. BRITISH ATM PAYS DOUBLE ! 19 March 2008 ATM pays out double the amount withdrawn Dozens of customers lined up in front of ATM This continued until ATM ran out of money at 8 p.m. Hull, England

A Generous British ATM... 5 A Sainsbury's spokesman said: "We do not know how much the machine paid out at the moment, but the matter is under investigation." A customer said: "I joined the queue and when I finally got to the front I drew out 200 pound but it gave me 400 pound. The statement said I only drew out 200 pound. I don't know whether I will have to pay it back." The police said: "Those who benefited could face charges, but only if the company administering the machine complained."

2. ATM Pays Out Double the Cash, 16 January

3. Tesco machine pays double, 18 August

4. Dundee cash machine, 20 January

But what happens if an ATM malfunctions and pays out less than you asked for? We need dependable systems ! 9

10 Motivation Project Managers ask several questions: How can I get the code into production faster? What code should we refactor? How should I best assign my limited resources to different projects? How do I know if code is getting better or worse as time goes on? Baseline Code Analysis Using McCabe IQ Software Metrics Software Fault Prediction

11 Example: gcc project /trunk/gcc/fold-const.c const.c?revision=135517&view=markup The cyclomatic complexity (CC) value of fold_binary is 1159! Security problems or faults can occur

12 Vulnerability Report – Fold_Binary Method

CHAPTER 2: Challenging Issues

14 Software Fault Prediction Modeling Previous Version Training Learnt Hypothesis Predict Faults Software Metrics Software Metrics Current Project Known Fault Data Unknown Fault Data

15 1. No Fault Data Software Metrics Software Metrics Unknown Fault Data Learnt Hypothesis Previous Version Current Project Training Predict Faults Unknown Fault Data * How does the software quality assurance team predict the software quality based only on the recorded software metrics? - A new project type for the organization - No quality measurements have been collected * A supervised learning approach cannot be taken

16 2. Limited Fault Data Software Metrics Software Metrics Unknown Fault Data Learnt Hypothesis Previous Version Current Project Training Predict Faults Known Fault Data Unknown Fault Data * During decentralized software development, some companies may not collect fault data for their components * Execution cost of data collection tools may be expensive * Company may not collect fault data for a version due to the lack of budget - Can we learn both from labeled and unlabeled data?

17 3. Noise Detection Noisy modules degrade the performance of machine learning based fault prediction models. Two kinds of noise: Attribute Noise, Class Noise. Class noise impacts classifiers more severely than attribute noise. We need to identify noisy modules if they exist. Some cases: Developers may not report the faults; data entry and data collection errors

18 4. Practical Tools Earliest Work, Porter and Selby, Logistic Regression (Khoshgoftaar et al., 1999) Decision Trees (Gokhale et al., 1997) Neural Networks (Khoshgoftaar et al., 1995) Fuzzy Logic (Xu, 2001) Genetic Programming (Evett et al., 1998) Case-Based Reasoning (Khoshgoftaar et al., 1997) Pareto Classification (Ebert, 1996) Discriminant Analysis (Ohlsson et al., 1998) Naive Bayes (Menzies et al., 2008)... There are hundreds of research papers but a lack of practical tools…

19 5. Cross-Project vs. Within-Company Fault Prediction Can we use cross-company (CC) data and predict the fault-proneness of program modules in the absence of fault labels?

CHAPTER 3: Models we built...

1. No Fault Data 21

22 1. No Fault Data Problem- Literature Zhong et al., 2004, Clustering and Expert based Approach K-means and Neural Gas algorithms Mean vector and several statistical data such as min., max. Dependent on the capability of the expert Zhong, S., T. M. Khoshgoftaar, and N. Seliya, Unsupervised Learning for Expert-based Software Quality Estimation, Proceedings of the 8th Intl. Symp. on High Assurance Systems Engineering, Tampa, FL, 2004, pp

23 1. No Fault Data Problem 1. Our technique first applies X-means clustering method to cluster modules and identifies the best cluster number. 2. The mean vector of each cluster is checked against the metrics thresholds vector. A cluster is predicted as fault-prone if at least one metric of the mean vector is higher than the threshold value of that metric. [LOC, CC, UOp, UOpnd, TOp, TOpnd] [65, 10, 25, 40, 125, 70] (Integrated Software Metrics (ISM) document)
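The second step above (comparing each cluster's mean vector with the threshold vector) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the clustering step is assumed to have been done already (X-means is not reproduced here), the function names are hypothetical, and the two example clusters are invented data. Only the metric order and threshold values come from the slide.

```python
# Metric order and threshold values as listed on the slide.
METRICS = ["LOC", "CC", "UOp", "UOpnd", "TOp", "TOpnd"]
THRESHOLDS = [65, 10, 25, 40, 125, 70]


def mean_vector(modules):
    """Column-wise mean of the metric rows belonging to one cluster."""
    n = len(modules)
    return [sum(row[i] for row in modules) / n for i in range(len(METRICS))]


def is_fault_prone(cluster_modules, thresholds=THRESHOLDS):
    """A cluster is fault-prone if at least one mean metric exceeds its threshold."""
    means = mean_vector(cluster_modules)
    return any(m > t for m, t in zip(means, thresholds))


# Hypothetical clusters (invented values, one row per module).
small_cluster = [[20, 3, 8, 12, 40, 25], [30, 5, 10, 15, 50, 30]]
large_cluster = [[120, 8, 20, 35, 100, 60], [90, 6, 18, 30, 90, 50]]

print(is_fault_prone(small_cluster))  # no mean exceeds its threshold
print(is_fault_prone(large_cluster))  # mean LOC is 105, above the LOC threshold of 65
```

Every module in a cluster inherits the cluster's label, so no expert has to inspect the clusters by hand.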

24 Datasets from a Turkish white-goods manufacturer. Effective results are achieved. No expert opinion is needed. Identification of the threshold vector is difficult.

2. Limited Fault Data Problem 25

26 2. Limited Fault Data Problem We simulated the small labeled, large unlabeled data problem with 5%, 10%, and 20% rates and evaluated the performance of each classifier under these circumstances. The Naive Bayes algorithm, even though it is a supervised learning approach, works best for small datasets. YATSI (Yet Another Two Stage Idea) improves the performance of the Naive Bayes algorithm for large datasets if the dataset does not contain noisy modules. We suggest Naive Bayes for the limited fault data problem as well.
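One way such a simulation could be set up is to keep the fault labels of a randomly chosen 5%, 10%, or 20% of the modules and hide the rest. The sketch below shows only that split. It is an assumption about the setup, not the procedure used in the study: the function name, the seed, and the dataset size of 200 modules are all hypothetical.

```python
import random


def split_labeled_unlabeled(n_modules, rate, seed=0):
    """Pick round(rate * n_modules) module indices to keep labeled; hide the rest."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    n_labeled = int(round(n_modules * rate))
    labeled = sorted(rng.sample(range(n_modules), n_labeled))
    labeled_set = set(labeled)
    unlabeled = [i for i in range(n_modules) if i not in labeled_set]
    return labeled, unlabeled


# Hypothetical dataset of 200 modules, split at the three rates from the slide.
for rate in (0.05, 0.10, 0.20):
    labeled, unlabeled = split_labeled_unlabeled(200, rate)
    print(rate, len(labeled), len(unlabeled))
```

A supervised learner such as Naive Bayes would then be trained on the labeled indices only, while a semi-supervised method would also see the metrics of the unlabeled modules.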

3. Noise Detection 27

28 3. Noise Detection Our hypothesis: A data object that has a non-faulty class label is considered a noisy instance if the majority of the software metric values exceed their corresponding threshold values. A data object that has a faulty class label is considered a noisy instance if all of the metric values are below their corresponding threshold values. How to calculate software metrics threshold values? R. Shatnawi, W. Li, J. Swain, T. Newman, Finding software metrics threshold values using ROC curves, Journal of Software Maintenance and Evolution: Research and Practice 22 (1) (2010) 1–16.
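The two rules in this hypothesis translate directly into a small check. The following is a sketch under stated assumptions: the function name is hypothetical, the threshold vector is simply reused from the earlier "No Fault Data" slide for illustration (the study derives thresholds from ROC curves instead), and "majority" is read as strictly more than half of the metrics.

```python
# Illustrative thresholds only, borrowed from the earlier slide:
# [LOC, CC, UOp, UOpnd, TOp, TOpnd]
THRESHOLDS = [65, 10, 25, 40, 125, 70]


def is_noisy(metrics, labeled_faulty, thresholds=THRESHOLDS):
    """Flag a module whose class label contradicts its metric values."""
    if labeled_faulty:
        # Faulty label, yet every metric is below its threshold.
        return all(m < t for m, t in zip(metrics, thresholds))
    # Non-faulty label, yet the majority of metrics exceed their thresholds.
    exceeded = sum(1 for m, t in zip(metrics, thresholds) if m > t)
    return exceeded > len(thresholds) / 2


# Invented modules for illustration.
print(is_noisy([100, 15, 30, 50, 90, 60], labeled_faulty=False))  # 4 of 6 exceeded
print(is_noisy([10, 2, 5, 8, 20, 10], labeled_faulty=True))       # all below
print(is_noisy([100, 2, 5, 8, 20, 10], labeled_faulty=True))      # LOC is above
```

Modules flagged this way are candidates for removal or relabeling before a classifier is trained.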

29 How to Calculate Threshold Values The interval for the candidate threshold values is between the minimum and maximum value of that metric in the dataset. Shatnawi et al. (2010) stated that they chose the candidate threshold value that has the maximum value for both sensitivity and specificity, but such a candidate threshold may not always exist. We calculated the AUC of the ROC curve that passes through three points, i.e., (0, 0), (1, 1), and (PD, PF), and we chose the threshold value that maximizes the AUC.
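With PF on the horizontal axis and PD on the vertical axis, the piecewise-linear ROC curve through the three points has area (1 + PD - PF) / 2, so maximizing the AUC amounts to maximizing PD - PF. The sketch below applies this to a single metric. It rests on several assumptions not stated on the slide: a module is predicted faulty when its metric value is strictly greater than the candidate threshold, the candidates are the distinct observed values of the metric, and all names and data are hypothetical.

```python
def pd_pf(values, labels, threshold):
    """Return (PD, PF) when modules with value > threshold are predicted faulty."""
    tp = sum(1 for v, y in zip(values, labels) if y and v > threshold)
    fn = sum(1 for v, y in zip(values, labels) if y and v <= threshold)
    fp = sum(1 for v, y in zip(values, labels) if not y and v > threshold)
    tn = sum(1 for v, y in zip(values, labels) if not y and v <= threshold)
    pd_val = tp / (tp + fn) if (tp + fn) else 0.0  # probability of detection
    pf_val = fp / (fp + tn) if (fp + tn) else 0.0  # probability of false alarm
    return pd_val, pf_val


def three_point_auc(pd_val, pf_val):
    """Area under the polyline (0, 0) -> (PF, PD) -> (1, 1)."""
    return (1 + pd_val - pf_val) / 2


def best_threshold(values, labels):
    """Candidate threshold (an observed metric value) with the largest AUC."""
    candidates = sorted(set(values))
    return max(candidates, key=lambda t: three_point_auc(*pd_pf(values, labels, t)))


# Invented metric values and fault labels for one metric.
values = [2, 4, 6, 8, 12, 15]
labels = [False, False, False, True, True, True]
print(best_threshold(values, labels))
```

Unlike a rule that requires a single candidate to maximize both sensitivity and specificity, this selection always returns some threshold, because the AUC is defined for every candidate.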

4. Practical Tools 32

33 4. Eclipse based Plug-in (RUBY)

Sample User Interfaces - Features 34

Result Views 35

5. Cross-Project Fault Prediction 36

37 5. Cross-Project Fault Prediction We developed models based on software metrics threshold values. If the majority of the software metrics threshold values are exceeded, the module is labeled faulty; otherwise, a non-faulty label is assigned. Threshold values are calculated from the other projects (cross-company).
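The labeling rule can be illustrated as follows. This is a sketch, assuming the thresholds have already been computed from other companies' labeled data; the threshold values, module names, and metric values below are invented, and a tie (exactly half of the thresholds exceeded) is treated as non-faulty, which the slide does not specify.

```python
def predict_faulty(metrics, cc_thresholds):
    """Label a module faulty when most cross-company thresholds are exceeded."""
    exceeded = sum(1 for m, t in zip(metrics, cc_thresholds) if m > t)
    return exceeded > len(cc_thresholds) / 2


# Hypothetical thresholds derived from other projects, and unlabeled local modules.
cc_thresholds = [60, 8, 20, 35]
local_modules = {
    "parser.c": [150, 12, 30, 20],  # 3 of 4 thresholds exceeded
    "utils.c": [40, 3, 25, 10],     # 1 of 4 thresholds exceeded
    "io.c": [70, 9, 10, 10],        # 2 of 4: a tie, treated as non-faulty
}

for name, metrics in local_modules.items():
    label = "faulty" if predict_faulty(metrics, cc_thresholds) else "non-faulty"
    print(name, label)
```

No local fault labels are needed at any point, which is what makes the approach usable for a project without fault history.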

38

AUC, PD, PF 39

Results Case studies showed that cross-company data is useful for building fault predictors in the absence of fault labels, and remarkable results are achieved. Our threshold-based fault prediction technique achieved a larger PD value (but also a larger PF value) than the Naive Bayes based approach. For mission-critical applications, PD values are more important than PF values because all of the faults should be removed before deployment. In summary, we showed that the cross-company dataset is useful. 40

4. Systematic Review 41

42 A Systematic Review Study 74 papers published between 1990 and journal papers 47 conference papers We report distributions before and after 2005, since that was the year that the PROMISE repository was established.

Results The journals that published more than two fault model papers are: IEEE Transactions on Software Engineering (9); Software Quality Journal (4); Journal of Systems and Software (3); Empirical Software Engineering (3). 14% of papers were published before 2000 and 86% after. Types of datasets used by authors were: private (60%), partial (8%), public (31%), unknown (1%). Partial means data from open source projects that have not been circulated. Since 2005 the proportion of private datasets has fallen to 31% and the proportion of public datasets has increased to 52%. There are 14% partial datasets and 3% unknown. 43

Results (contd) Data analysis methods are machine learning (59%), statistics (22%), statistics and machine learning (18%), and statistics and expert opinion (1%). After 2005 the distribution of methods is machine learning (66%), statistics (14%), statistics and machine learning (17%), and statistics and expert opinion (3%). 60% of papers used method-level metrics, 24% used class-level metrics, 10% used file-level metrics, and other categories accounted for less than 5%. After 2005, 53% were method level, 24% were class level, and 17% were file level (others less than 3%). 44

Suggestions More studies should use class-level metrics to support early prediction. Fault studies should use public datasets to ensure results can be repeatable and verifiable. Researchers should increase usage of machine learning techniques. 45

46 Conclusion & Future Work Software fault prediction is still challenging and quite useful. We need practical tools. Prediction models can be used to predict vulnerability-prone modules. Challenges: How to make fault prediction work across projects? How to build models when there is no fault data? How to build models when there is very limited fault data? How to remove noisy modules from datasets?

47 THANK YOU Cagatay CATAL, Ph.D.