Mining Version Histories to Guide Software Changes. Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, Andreas Zeller.


Mining Version Histories to Guide Software Changes Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, Andreas Zeller

“In this paper, we apply data mining to version histories: 'Programmers who changed these functions also changed....' Just like the Amazon.com feature helps the customer browsing along related items, our ROSE tool guides the programmer along related changes...”

Agenda
- ROSE Overview
- CVS to ROSE
- Data Analysis
- Evaluation
- Paper Critique

ROSE Overview
Aims:
- Suggest and predict likely changes. Suppose a programmer has just made a change: what else does she have to change?
- Prevent errors due to incomplete changes. If a programmer wants to commit changes but has missed a related change, ROSE issues a warning.
- Detect coupling undetectable by program analysis. Because ROSE operates exclusively on the version history, it can detect coupling between items that program analysis cannot.

ROSE Overview (2)

CVS to ROSE
- ROSE works in terms of changes to entities, e.g. changes to directories, files, classes, methods, and variables.
- Every entity is a triple (c, i, p), where c is the syntactic category, i is the identifier, and p is the parent entity, e.g. (method, initDefaults(), (class, Comp, ...)).
- Every change is expressed using one of three predicates: alter(e), add_to(e), del_from(e).
- Each transaction from CVS is converted to a list of such changes.
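The entity triples and change predicates above can be sketched as a small data model. This is a minimal illustration, not code from the ROSE tool; the class names `Entity` and `Change` are my own.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entity:
    category: str               # syntactic category, e.g. "method", "class", "file"
    identifier: str             # e.g. "initDefaults()"
    parent: Optional["Entity"]  # enclosing entity, or None at the top level

@dataclass(frozen=True)
class Change:
    predicate: str  # one of "alter", "add_to", "del_from"
    entity: Entity

# The example from the slide: (method, initDefaults(), (class, Comp, ...))
comp = Entity("class", "Comp", None)
init_defaults = Entity("method", "initDefaults()", comp)

# A CVS transaction becomes a list of such changes:
transaction = [Change("alter", init_defaults)]
```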

Data Analysis
ROSE aims to mine rules from those alterations: alter(field, fKeys[], ...) is possibly followed by alter(method, initDefaults(), ...) and alter(file, plug.properties, ...). The strength of a rule is measured by:
- Support count: the number of transactions the rule has been derived from.
- Confidence: the relative frequency of the given consequence across all transactions containing the antecedent.
Example: suppose fKeys[] was altered in 11 transactions, 10 of which also alter()'ed initDefaults() and plug.properties. Then 10 is the support count, and 10/11 (about 0.909) is the confidence.
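The support-count/confidence computation in the example above can be reproduced in a few lines. This is a hedged sketch of the standard association-rule metrics, not ROSE's actual implementation; transactions are modeled simply as sets of changed-item names.

```python
# The example: fKeys[] altered in 11 transactions, 10 of which also
# altered initDefaults() and plug.properties.
transactions = (
    [{"fKeys[]", "initDefaults()", "plug.properties"}] * 10
    + [{"fKeys[]"}]
)

def support_and_confidence(antecedent, consequent, transactions):
    """Support count = transactions containing antecedent AND consequent;
    confidence = support count / transactions containing the antecedent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    support = sum(1 for t in with_antecedent if consequent <= t)
    return support, support / len(with_antecedent)

support, confidence = support_and_confidence(
    {"fKeys[]"}, {"initDefaults()", "plug.properties"}, transactions)
# support == 10, confidence == 10/11 (about 0.909)
```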

Data Analysis (2)
Other features:
- add_to() and del_from() allow abstracting from the name of an added entity to the name of the surrounding entity.
- The notion of entities allows mining at varying granularities:
- Fine-granular mining: for source code of C-like languages, alter() is used for fields, functions, etc., while add_to() is used for file entities.
- Coarse-granular mining: regardless of file type, only alter() is used for file entities; add_to() and del_from() can capture when a file has been added or deleted.
- Coarse-granular rules have a higher support count and usually return more results. However, they are less precise in location and see limited use for guiding programmers.
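The granularity trade-off can be illustrated concretely: encoding the same commits at file level aggregates unrelated member-level changes, so the coarse rule accumulates a higher support count while pointing only at whole files. The data and names below are illustrative, not from the paper.

```python
# Three hypothetical commits, encoded at two granularities.
fine = [  # member-level: (category, identifier) pairs
    {("method", "initDefaults()"), ("field", "fKeys[]")},
    {("method", "initDefaults()"), ("field", "fKeys[]")},
    {("method", "toString()"), ("field", "fKeys[]")},  # a different member
]
coarse = [  # file-level: every commit touches the same two files
    {("file", "Comp.java"), ("file", "Defaults.java")},
] * 3

def support(antecedent, consequent, transactions):
    """Support count of the rule antecedent -> consequent."""
    return sum(1 for t in transactions if antecedent in t and consequent in t)

# Fine rule: fKeys[] -> initDefaults() holds in only 2 of 3 commits...
fine_support = support(("field", "fKeys[]"), ("method", "initDefaults()"), fine)
# ...while the coarse file-level rule holds in all 3, but only names files.
coarse_support = support(("file", "Comp.java"), ("file", "Defaults.java"), coarse)
```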

Evaluation
Usage scenarios:
- Navigation through source code: given a change, can ROSE point to other entities that should typically be changed too?
- Error prevention: if a programmer has changed many entities but missed one, does ROSE find the missing one?
- Closure: when the transaction is finished, how often does ROSE erroneously suggest that a change is missing in the error-prevention scenario?
Evaluation on eight large open-source projects: ECLIPSE, GCC, GIMP, JBOSS, JEDIT, KOFFICE, POSTGRES, PYTHON.

Evaluation (2)
Summary:
- One can have precise suggestions or many suggestions, but not both.
- When given an initial item, ROSE makes predictions in 66 percent of all queries.
- On average, ROSE's predictions contain 33 percent of all items changed later in the same transaction.
- For those queries for which ROSE makes recommendations, a correct location is within ROSE's topmost three suggestions in 70 percent of the cases.
- In 3 percent of the queries where one item is missing, ROSE issues a correct warning.
- An issued warning predicts on average 75 percent of the items that need to be considered.
- ROSE's warnings about missing items should be taken seriously: only 2 percent of all transactions cause a false alarm. In other words, ROSE does not stand in the way.
- ROSE has its best predictive power for changes to existing entities.
- ROSE learns quickly: a few weeks after a project starts, ROSE already makes useful suggestions.

Critique
Likes:
- The tool was applied to and evaluated on eight projects, and conclusions were drawn that account for their varying natures.
- It's relevant to our assignment, so it was easy to follow.
Dislikes:
- There is research value, but there is reason to be skeptical that the recall of such tools will reach practical levels (for the navigation purpose).
- Intuitively, recommendations might break things if followed blindly, regardless of whether a given recommendation is correct. That is, there is little practical value if the recommendations are incomplete, which is more likely for the complex applications where this really matters.
- I still don't know what ROSE stands for. :p