
1 Perspectives on fault data quality
Tracy Hall, Reader in Software Engineering, Brunel University
Two short talks on this topic…

2 The 3rd CREST Open Workshop, 27th January 2010
Schedule
- Why are we interested in fault data?
- What are the problems with the quality of fault data?
- How did we investigate the quality of fault data?
- What does all this mean?

3 Why are we interested in fault data?
- The analysis of historical fault data could enable us to predict potential fault hotspots in code
- Lots of previous studies analysed fault data:
  - OSS repositories, NASA data, industrial data
- Only a few directly address the problem of extracting reliable fault data:
  - Zimmermann & Zeller group (Saarland)
  - Ostrand & Weyuker

4 What are the problems with the quality of fault data?
- Very little direct fault data around
- Little use of bug reporting repositories
- Mining faults from change repositories problematic:
  - Identifying all elements of one change
  - Separating fault-fixing changes from other changes
  - Indirect relationship between fault fixes and faults
- Problems exacerbated by:
  - Size of change repository
  - Reliability of data in repository

5 How did we investigate the quality of fault data?
- Performed a small study using Barcode OSS
- Chose Barcode because it was used by Meyers & Binkley to investigate the use of program slicing metrics
- Identified sets of fault data using three methods:
  1. Manual analysis of change diffs
  2. Keyword search
  3. Size of change search

6 1. Manual analysis of change diffs
- Manually analysed 199 change diffs
- Three researchers independently classified each as either:
  - Fault fix
  - Not fault fix
  - Don’t know
- Inter-rater reliability score computed for agreement level
- Planned to use this as the baseline fault data set

7 Pairwise agreement tables (F = fault fix, DK = don't know, NF = not fault fix)

1. Researcher 2 (columns) vs Researcher 1 (rows)
          F   DK   NF
    F    37   13    8
    DK    0    0    0
    NF   33   31   77
   114/199 agreements, kappa 0.28

2. Researcher 2 (columns) vs Researcher 3 (rows)
          F   DK   NF
    F    31   15    8
    DK    7    5   39
    NF   32   24   38
   74/199 agreements, kappa 0.027

3. Researcher 3 (columns) vs Researcher 1 (rows)
          F   DK   NF
    F    29    8   21
    DK    0    0    0
    NF   25   43   73
   102/199 agreements, kappa 0.17
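The kappa scores on slide 7 can be checked against the agreement counts. The sketch below assumes the slides report unweighted Cohen's kappa (the slides only say "kappa") and reads the first table as rows 37, 13, 8 / 0, 0, 0 / 33, 31, 77, which is the only split of the digits that sums to 199 with 114 on the diagonal. The function name `cohen_kappa` is ours, not the authors'.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] counts items that one rater put in category i
    and the other rater put in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: share of items on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two raters' marginal shares.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Slide 7, table 1: Researcher 2 against Researcher 1 (categories F, DK, NF).
r2_vs_r1 = [
    [37, 13, 8],
    [0, 0, 0],
    [33, 31, 77],
]
print(round(cohen_kappa(r2_vs_r1), 2))  # 0.28
```

Raw agreement here is 114/199 (about 57%), yet kappa is only 0.28 because much of that agreement would be expected by chance given how often both raters chose "not fault fix".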

8 2. Keyword search
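The transcript does not preserve the detail of this slide. As a rough illustration of what a keyword search over change log messages can look like, here is a minimal sketch. The keyword list, the word-prefix matching rule, and the function name `classify_log_message` are all assumptions made for illustration, not the keywords or procedure used in the study.

```python
import re

# Hypothetical keyword list: the study's actual keywords are not in the transcript.
FAULT_KEYWORDS = ("bug", "fix", "fault", "defect", "error")

# Match a keyword at the start of a word, so "fixed" and "bugs" count
# but "prefix" does not.
_PATTERN = re.compile(r"\b(" + "|".join(FAULT_KEYWORDS) + r")\w*", re.IGNORECASE)


def classify_log_message(message):
    """Label a change log message 'F' (fault fix) or 'NF' (not fault fix)."""
    return "F" if _PATTERN.search(message) else "NF"


print(classify_log_message("Fixed crash when reading empty input"))
print(classify_log_message("Add prefix option to output names"))
```

A rule like this is cheap to run over a whole repository, but it depends entirely on developers describing fault fixes consistently in their log messages.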

9 Keyword search results

Diff results (columns) vs Change log analysis (rows)
          F   NF
    F    21    6
    NF   26   78
99/131 agreements, kappa 0.4

10 3. Size of change search

11 Size of change search results

Diff results (columns) vs Fixed window (rows)
          F   NF
    F    25   19
    NF   22   65
90/131 agreements, kappa 0.3

Diff results (columns) vs Sliding window (rows)
          F   NF
    F    29   21
    NF   18   63
92/131 agreements, kappa 0.3

12 What does all this mean?

