Week 3 Presentation Istehad Chowdhury CISC 864 Mining Software Engineering Data.

Research Paper Who Should Fix This Bug? John Anvik, Lyndon Hiew and Gail C. Murphy Department of Computer Science University of British Columbia {janvik, lyndonh,

Problem with Open Bug Repository: Overall, the problem is how to cope with the surge of bugs in large open source projects. “Everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle.” Many bug reports are invalid or duplicates of another bug report (Eclipse, 36%). Every bug report should be triaged: to check validity and duplication, and to assign the bug to an appropriate developer.

Problem cont.. The triager may not be sure whom to assign the bug to. A lot of time is wasted in reassigning and regaining. 24% of reports in Eclipse are reassigned.

The research work. Goal: suggest whom to assign a bug to. Technique: data mining and machine learning. Result: 60% precision and 10% recall.

Precision and Recall
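
The slide content for these two measures did not survive in the transcript. As a minimal sketch, assuming the usual information-retrieval definitions (precision is the share of recommended developers who are appropriate, recall is the share of appropriate developers who were recommended), the computation for a single bug report could look like this. The developer names and the function name `precision_recall` are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one bug report.

    recommended: developers suggested by the recommender
    relevant:    developers who could appropriately fix the report
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical report: 3 suggestions, 2 of them appropriate,
# while 10 developers in total could have fixed the bug.
relevant_devs = ["ann", "bob"] + [f"dev{i}" for i in range(8)]
p, r = precision_recall(["ann", "bob", "cy"], relevant_devs)
print(p, r)
```

With only a few suggestions per report and many developers able to fix it, recall stays low even when most suggestions are appropriate, which is one way to read a result with high precision and low recall.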

Life Cycle of a Bug Report

Roles: Reporter/Submitter, Resolver, Contributor, Triager. The roles overlap.

Approach to the problem: semi-automated. 1. Characterizing bug reports. 2. Assigning a label to each report. 3. Choosing reports to train the supervised machine learning algorithm. 4. Applying the algorithm to create the classifier for recommending assignments.
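
The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not name the learning algorithm, so a small multinomial Naive Bayes classifier is swapped in for brevity, reports are characterized as bags of words, and the sample reports, developer names, and function names (`tokenize`, `train`, `recommend`) are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Step 1: characterize a report as a bag of lowercase words.
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_reports):
    """Steps 3 and 4: build per-developer word statistics.

    labeled_reports: iterable of (report_text, developer) pairs,
    i.e. reports that already carry a label from step 2.
    """
    word_counts = defaultdict(Counter)  # developer -> word frequencies
    report_counts = Counter()           # developer -> number of reports
    vocabulary = set()
    for text, developer in labeled_reports:
        tokens = tokenize(text)
        word_counts[developer].update(tokens)
        report_counts[developer] += 1
        vocabulary.update(tokens)
    return word_counts, report_counts, vocabulary


def recommend(model, text, top_n=1):
    """Rank developers for a new report by Naive Bayes log-probability."""
    word_counts, report_counts, vocabulary = model
    total_reports = sum(report_counts.values())
    tokens = [t for t in tokenize(text) if t in vocabulary]
    scores = {}
    for developer, n_reports in report_counts.items():
        total_words = sum(word_counts[developer].values())
        score = math.log(n_reports / total_reports)  # prior
        for token in tokens:
            # Laplace smoothing so unseen words do not zero the score.
            count = word_counts[developer][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[developer] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical labeled reports (step 2 output).
reports = [
    ("crash when rendering css layout", "alice"),
    ("layout broken in css table", "alice"),
    ("compiler error in java editor", "bob"),
    ("java debugger breakpoint ignored", "bob"),
]
model = train(reports)
print(recommend(model, "css layout crash on startup"))
```

The process stays semi-automated in this sketch: the classifier only returns a ranked list, and the triager makes the final assignment.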

Heuristics on labeling bug reports. FIXED (Firefox): whoever provided the last approved patch. FIXED (Eclipse): whoever marked the report as resolved. DUPLICATE (Eclipse and Firefox): whoever resolved the report of which this report is a duplicate. WORKSFORME (Firefox): unclassifiable.
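
A rough sketch of how such labeling heuristics might be encoded follows. The report structure (a dict with `resolution`, `resolved_by`, `last_approved_patch_by`, and `duplicate_of` keys) and the function name `label_report` are assumptions made for illustration, not the authors' data model. The DUPLICATE rule is read here as "whoever resolved the report that this one duplicates", which is an interpretation of the slide.

```python
def label_report(report, project, reports_by_id):
    """Return the developer label for a report, or None if unclassifiable.

    report:        hypothetical dict describing one resolved bug report
    project:       "firefox" or "eclipse"
    reports_by_id: mapping from report id to report dict, used to
                   follow DUPLICATE links
    """
    resolution = report.get("resolution")
    if resolution == "FIXED":
        if project == "firefox":
            # Firefox: whoever provided the last approved patch.
            return report.get("last_approved_patch_by")
        if project == "eclipse":
            # Eclipse: whoever marked the report as resolved.
            return report.get("resolved_by")
    if resolution == "DUPLICATE" and project in ("firefox", "eclipse"):
        # Both projects: whoever resolved the duplicated report.
        original = reports_by_id.get(report.get("duplicate_of"))
        return original.get("resolved_by") if original else None
    # WORKSFORME (Firefox) and anything else: unclassifiable.
    return None


# Hypothetical reports keyed by id.
reports_by_id = {
    1: {"resolution": "FIXED", "resolved_by": "carol",
        "last_approved_patch_by": "dave"},
    2: {"resolution": "DUPLICATE", "duplicate_of": 1},
    3: {"resolution": "WORKSFORME"},
}
for report_id, report in reports_by_id.items():
    print(report_id, label_report(report, "firefox", reports_by_id))
```

Reports that come back as `None` would be left out of the training set, so the share of unclassifiable reports directly limits how much data the classifier sees.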

Experimental Results. Figure: recommender accuracy and recall.

Validating Results with GCC. Why are the results so poor? Why is recall low in all cases, especially for gcc? This shows the need for similarity in the nature of the projects.

Trying Alternatives

Trying Alternatives cont.. Unsupervised machine learning. Incremental machine learning. Incorporating additional sources of data. Component-based classifier.

Points to Ponder

Points to Ponder cont.. Are new developers ever assigned a bug? “Needs further study to context of which it can be applied” (empirical research).

Points to Ponder cont.. Were there enough instances to evaluate using cross-validation? For Firefox 75% and for gcc 86% of developers have fewer than 100 reports. Why was the labeling mechanism more successful for gcc and Eclipse than for Firefox? 1% for Eclipse, 47% for Firefox.
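
To make the first question concrete, here is a small sketch of how the share of developers with fewer than 100 labeled reports could be computed. The label list and the function name `share_below` are hypothetical. The threshold of 100 comes from the slide.

```python
from collections import Counter


def share_below(labels, threshold=100):
    """Fraction of developers with fewer than `threshold` labeled reports."""
    counts = Counter(labels)
    if not counts:
        return 0.0
    below = sum(1 for n in counts.values() if n < threshold)
    return below / len(counts)


# Hypothetical labels: one entry per labeled report.
labels = ["ann"] * 150 + ["bob"] * 40 + ["cy"] * 5 + ["dee"] * 99
print(share_below(labels))
```

When most developers fall below the threshold, each cross-validation fold holds very few examples for them, which is the concern the question raises.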

Points in favor. The research work was very intensive and the problem was thoroughly studied. The authors are honest in identifying the limitations and smart in pointing out future work. It opens up interesting doors for future research.

Points Against. The study may not be suitable for an environment where the active set of developers changes frequently. The findings are too project-specific, and the approach works well on “actual bugs” reports.

Points Against cont.. Any naivety in the heuristics also propagates to the filtering process that uses those heuristics to train the classifier. I liked the way the authors included a lessons-learned section. However, they should have explained in more detail how the mappings were done.

Concluding Remarks. It shows promise for improving the bug assignment problem for OSS. “Coordination bug reports and CVS is challenging”. The effort is worth praising. It identifies the need for further research.

Questions and Comments?