© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.1 7. Problem Detection Metrics  Software quality  Analyzing trends Duplicated Code  Detection techniques.


© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.1 7. Problem Detection Metrics  Software quality  Analyzing trends Duplicated Code  Detection techniques  Visualizing duplicated code

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.2 Why Metrics in OO Reengineering (ii)?
Assessing Software Quality
  - Which components have poor quality? (Hence could be reengineered)
  - Which components have good quality? (Hence should be reverse engineered)
  - Metrics as a reengineering tool!
Controlling the Reengineering Process
  - Trend analysis: which components changed?
  - Which refactorings have been applied?
  - Metrics as a reverse engineering tool!

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.3 ISO 9126 Quantitative Quality Model Leaves are simple metrics, measuring basic attributes

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.4 Product & Process Attributes

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.5 External & Internal Attributes

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.6 External vs. Internal Product Attributes
External
  Advantage:
  - close relationship with quality factors
  Disadvantages:
  - can be measured only after the product is used or the process took place
  - data collection is difficult; often involves human intervention/interpretation
  - relating external effect to internal cause is difficult
Internal
  Disadvantage:
  - relationship with quality factors is not empirically validated
  Advantages:
  - can be measured at any time
  - data collection is quite easy and can be automated
  - direct relationship between measured attribute and cause

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.7 Metrics and Measurements
[Wey88] defined nine properties that a software metric should hold. Read [Fenton] for critiques. For OO only 6 properties are really interesting [Chid94, Fenton].
1. Noncoarseness: given a class $P$ and a metric $m$, another class $Q$ can always be found such that $m(P) \neq m(Q)$. Not every class has the same value for a metric.
2. Nonuniqueness: there can exist distinct classes $P$ and $Q$ such that $m(P) = m(Q)$. Two classes can have the same metric value.
3. Monotonicity: $m(P) \le m(P+Q)$ and $m(Q) \le m(P+Q)$, where $P+Q$ is the "combination" of the classes $P$ and $Q$.

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.8 Metrics and Measurements (ii)
4. Design Details are Important: the specifics of a class must influence the metric value. Even if two classes perform the same actions, their details should have an impact on the metric value.
5. Nonequivalence of Interaction: $m(P) = m(Q) \not\Rightarrow m(P+R) = m(Q+R)$, where $R$ is an interaction with the class.
6. Interaction Increases Complexity: $m(P) + m(Q) < m(P+Q)$. When two classes are combined, the interaction between the two can increase the metric value.
Conclusion: Not every measurement is a metric.

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.9 Selecting Metrics
Fast
  - Scalable: you can't afford $O(n^2)$ when $n \ge 1$ million LOC
Precise
  - (e.g. #methods: do you count all methods, only public ones, also inherited ones?)
  - Reliable: you want to compare apples with apples
Code-based
  - Scalable: you want to collect metrics several times
  - Reliable: you want to avoid human interpretation
Simple
  - Complex metrics are hard to interpret

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.10 Assessing Maintainability
Size of the system, system entities
  - Class size, method size, inheritance
  - The intuition: large entities impede maintainability
Cohesion of the entities
  - Class internals
  - The intuition: changes should be local
Coupling between entities
  - Within inheritance: coupling between class and subclass
  - Outside of inheritance
  - The intuition: strong coupling impedes locality of changes

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.11 Sample Size and Inheritance Metrics
[Figure: metamodel with the entities Class, Attribute, Method and the relations Access, Invoke, BelongTo, Inherit]
Inheritance Metrics
  - hierarchy nesting level (HNL)
  - # immediate children (NOC)
  - # inherited methods, unmodified (NMI)
  - # overridden methods (NMO)
Class Size Metrics
  - # methods (NOM)
  - # instance attributes (NIA, NCA)
  - sum of method size (WMC)
Method Size Metrics
  - # invocations (NOI)
  - # statements (NOS)
  - # lines of code (LOC)

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.12 Sample Class Size
(NIV) [Lore94] Number of Instance Variables
(NCV) [Lore94] Number of Class Variables (static)
(NOM) [Lore94] Number of Methods (public, private, protected) (E++, S++)
(LOC) Lines of Code
(NSC) Number of Semicolons [Li93]: number of statements
(WMC) [Chid94] Weighted Method Count
  - $\mathrm{WMC} = \sum_i c_i$
  - where $c_i$ is the complexity of method $i$ (number of exits or McCabe Cyclomatic Complexity Metric)

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.13 Hierarchy Layout
(HNL) [Chid94] Hierarchy Nesting Level, (DIT) [Li93] Depth of Inheritance Tree: HNL, DIT = max hierarchy level
(NOC) [Chid94] Number of Children
(WNOC) Total Number of Children
(NMO, NMA, NMI, NME) [Lore94] Number of Methods Overridden, Added, Inherited, Extended (super call)
(SIX) [Lore94]
  - $\mathrm{SIX}(C) = \mathrm{NMO} \times \mathrm{HNL} / \mathrm{NOM}$
  - Weighted percentage of overridden methods
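The SIX formula above is simple enough to compute directly once the three counts are available. The following is a minimal sketch; the function name `six` and the choice to return 0 for a class without methods are assumptions made for illustration, not part of the slide.

```python
def six(nmo, hnl, nom):
    """SIX(C) = NMO * HNL / NOM, following the formula on the slide.

    nmo: number of overridden methods
    hnl: hierarchy nesting level of the class
    nom: number of methods of the class
    """
    if nom == 0:
        # Assumption: a class without methods gets SIX = 0 to avoid
        # dividing by zero; the slide does not cover this case.
        return 0.0
    return nmo * hnl / nom


# A class at nesting level 2 that overrides 3 of its 12 methods.
print(six(nmo=3, hnl=2, nom=12))  # 0.5
```

A class deep in the hierarchy that overrides most of what it inherits gets a high value, which is the situation the metric is meant to flag.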

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.14 Method Size
(MSG) Number of Message Sends
(LOC) Lines of Code
(MCX) Method Complexity
  - Total complexity / total number of methods
  - API calls = 5, assignment = 0.5, arithmetic op = 2, messages with params = 3, ...

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.15 Sample Metrics: Class Cohesion
Lack of Cohesion in Methods (LCOM): [Chid94] for definition, [Hitz95a] for critique
  - $I_i$ = set of instance variables used by method $M_i$
  - let $P = \{ (I_i, I_j) \mid I_i \cap I_j = \emptyset \}$ and $Q = \{ (I_i, I_j) \mid I_i \cap I_j \neq \emptyset \}$
  - if all the sets $I_i$ are empty, $P$ is empty
  - $\mathrm{LCOM} = \begin{cases} |P| - |Q| & \text{if } |P| > |Q| \\ 0 & \text{otherwise} \end{cases}$
Tight Class Cohesion (TCC), Loose Class Cohesion (LCC): [Biem95a] for definition
  - Measure method cohesion across invocations
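The LCOM definition above can be sketched in a few lines. The input format (one set of instance-variable names per method) and the function name `lcom` are assumptions made for this illustration; extracting those sets from real source code is a separate problem.

```python
from itertools import combinations


def lcom(instance_vars_per_method):
    """LCOM = |P| - |Q| if |P| > |Q|, otherwise 0.

    P: pairs of methods that share no instance variable.
    Q: pairs of methods that share at least one instance variable.
    """
    sets = [set(variables) for variables in instance_vars_per_method]
    if not any(sets):
        # Special case from the definition: if all sets are empty, P is empty.
        return 0
    p = q = 0
    for vars_i, vars_j in combinations(sets, 2):
        if vars_i & vars_j:
            q += 1
        else:
            p += 1
    return p - q if p > q else 0


# Four methods; only the first two share a variable ("y").
print(lcom([{"x", "y"}, {"y"}, {"z"}, set()]))  # 4
```

In the usage line there are six method pairs, one of which shares a variable, so |P| = 5 and |Q| = 1.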

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.16 Sample Metrics: Class Coupling (i)
Coupling Between Objects (CBO): [Chid94a] for definition, [Hitz95a] for a discussion
  - Number of other classes to which a class is coupled
Data Abstraction Coupling (DAC): [Li93a] for definition
  - Number of ADTs defined in a class
Change Dependency Between Classes (CDBC): [Hitz96a] for definition
  - Impact of changes from a server class (SC) to a client class (CC)

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.17 Sample Metrics: Class Coupling (ii)
Locality of Data (LD): [Hitz96a] for definition
  - $\mathrm{LD} = \sum_i |L_i| \,/\, \sum_i |T_i|$
  - $L_i$ = non-public instance variables + inherited protected variables of the superclass + static variables of the class
  - $T_i$ = all variables used in $M_i$, except non-static local variables
  - $M_i$ = methods, without accessors

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.18 The Trouble with Coupling and Cohesion
Coupling and Cohesion are intuitive notions
  - Cf. "computability"
  - E.g., is a library of mathematical functions "cohesive"?
  - E.g., is a package of classes that subclass framework classes cohesive? Is it strongly coupled to the framework package?

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.19 Conclusion: Metrics for Quality Assessment
Can internal product metrics reveal which components have good/poor quality?
Yes, but...
  - Not reliable
      false positives: "bad" measurements, yet good quality
      false negatives: "good" measurements, yet poor quality
  - Heavyweight approach
      requires a team to develop (customize?) a quantitative quality model
      requires definition of thresholds (trial and error)
  - Difficult to interpret
      requires complex combinations of simple metrics
However...
  - Cheap once you have the quality model and the thresholds
  - Good focus (± 20% of components are selected for further inspection)
Note: focus on the most complex components first!

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.20 7. Problem Detection Metrics  Software quality  Analyzing trends Duplicated Code  Detection techniques  Visualizing duplicated code

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.21 Code is Copied Small Example from the Mozilla Distribution (Milestone 9) Extract from /dom/src/base/nsLocation.cpp

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.22 How Much Code is Duplicated?
Usual estimates: 8 to 12% in normal industrial code; 15 to 25% is already a lot!

Case Study        LOC        Duplication without comments   Duplication with comments
gcc               460'000    8.7%                           5.6%
Database Server   245'…      …                              23.3%
Payroll           40'…       …                              25.4%
Message Board     6'…        …                              17.4%

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.23 What is Duplicated Code?
Duplicated Code = source code segments that are found in different places of a system:
  - in different files
  - in the same file but in different functions
  - in the same function
The segments must contain some logic or structure that can be abstracted.
Copied artefacts range from expressions, to functions, to data structures, and to entire subsystems.

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.24 Copied Code Problems
General negative effect
  - Code bloat
Negative effects on software maintenance
  - Copied defects
  - Changes take double, triple, quadruple, ... the work
  - Dead code
  - Adds to the cognitive load of future maintainers
Copying as an additional source of defects
  - Errors in the systematic renaming produce unintended aliasing
Metaphorically speaking
  - Software aging, "hardening of the arteries"
  - "Software entropy" increases: even small design changes become very difficult to effect

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.25 Code Duplication Detection Nontrivial problem: No a priori knowledge about which code has been copied How to find all clone pairs among all possible pairs of segments?

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.26 General Schema of Detection Process

Author      Level        Transformed Code        Comparison Technique
[John94a]   Lexical      Substrings              String-Matching
[Duca99a]   Lexical      Normalized Strings      String-Matching
[Bake95a]   Syntactical  Parameterized Strings   String-Matching
[Mayr96a]   Syntactical  Metric Tuples           Discrete comparison
[Kont97a]   Syntactical  Metric Tuples           Euclidean distance
[Baxt98a]   Syntactical  AST                     Tree-Matching

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.27 Recall and Precision

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.28 Simple Detection Approach (i)
Assumption: code segments are just copied and changed at a few places
Noise elimination transformation
  - remove white space, comments
  - remove lines that contain uninteresting code elements (e.g., just 'else' or '}')
Before:
    …
    //assign same fastid as container
    fastid = NULL;
    const char* fidptr = get_fastid();
    if(fidptr != NULL) {
      int l = strlen(fidptr);
      fastid = new char[ l + 1 ];
    …
After:
    fastid=NULL;
    constchar*fidptr=get_fastid();
    if(fidptr!=NULL)
    intl=strlen(fidptr)
    fastid=newchar[l+1]
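The noise elimination step can be sketched as follows. This is a rough illustration, not the authors' script: the function name `eliminate_noise`, the `NOISE` set, and the regex-based comment stripping (which ignores comment markers inside string literals) are all assumptions. Unlike the slide's example, the sketch keeps a trailing `{` on lines that contain other code.

```python
import re

# Assumption: lines consisting only of these tokens count as noise.
NOISE = {"", "else", "{", "}", "};"}


def eliminate_noise(source):
    """Return (line_number, normalized_line) pairs for the interesting lines."""

    def keep_newlines(match):
        # Replace a block comment by its newlines so line numbers stay valid.
        return "\n" * match.group(0).count("\n")

    source = re.sub(r"/\*.*?\*/", keep_newlines, source, flags=re.DOTALL)
    result = []
    for number, line in enumerate(source.split("\n"), start=1):
        line = re.sub(r"//.*", "", line)   # drop line comments
        line = re.sub(r"\s+", "", line)    # drop all white space
        if line not in NOISE:
            result.append((number, line))
    return result


sample = """//assign same fastid as container
fastid = NULL;
const char* fidptr = get_fastid();
if(fidptr != NULL) {
  int l = strlen(fidptr);
  fastid = new char[ l + 1 ];
}"""

for number, line in eliminate_noise(sample):
    print(number, line)
```

Keeping the original line numbers next to the normalized text makes it possible to report match locations in terms of the unmodified file.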

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.29 Simple Detection Approach (ii)
Code Comparison Step
  - Line-based comparison (assumption: layout did not change during copying)
  - Compare each line with each other line
  - Reduce the search space by hashing:
      1. Preprocessing: compute the hash value for each line
      2. Actual comparison: compare all lines in the same hash bucket
Evaluation of the Approach
  - Advantages: simple, language independent
  - Disadvantages: difficult interpretation
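The two-step comparison described above can be sketched like this. The input format (a list of `(location, normalized_text)` pairs) and the function name `matching_line_pairs` are assumptions for illustration; Python's built-in `hash` stands in for whatever hash function a real tool would use.

```python
from collections import defaultdict
from itertools import combinations


def matching_line_pairs(lines):
    """Return sorted (location_a, location_b) pairs of identical lines."""
    # 1. Preprocessing: put every line into the bucket of its hash value.
    buckets = defaultdict(list)
    for location, text in lines:
        buckets[hash(text)].append((location, text))

    # 2. Actual comparison: only lines within the same bucket are compared.
    #    The text is still compared, since different lines may collide.
    pairs = []
    for entries in buckets.values():
        for (loc_a, text_a), (loc_b, text_b) in combinations(entries, 2):
            if text_a == text_b:
                pairs.append((loc_a, loc_b))
    return sorted(pairs)


lines = [(1, "a=b;"), (2, "c=d;"), (3, "a=b;"), (4, "a=b;")]
print(matching_line_pairs(lines))  # [(1, 3), (1, 4), (3, 4)]
```

Comparing only within buckets avoids comparing every line with every other line, which is what makes the line-based approach usable on large systems.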

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.30 A Perl script for C++ (i)

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.31 A Perl script for C++ (ii) Handles multiple files Removes comments and white spaces Controls noise (if, {,) Granularity (number of lines) Possible to remove keywords

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.32 Output Sample
Lines:
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pnMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
    create_property(pd,pnOwnership,stBool,true,*iOwnership);
Locations: 6178/6179/6180/6181/ /6199/6200/6201/6202
Lines:
    create_property(pd,pnSupertype,stReference,true,*iSupertype);
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
Locations: 6177/ /6230
Lines = duplicated lines
Locations = file names and line numbers

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.33 Enhanced Simple Detection Approach Code Comparison Step  As before, but now Collect consecutive matching lines into match sequences Allow holes in the match sequence Evaluation of the Approach  Advantages Identifies more real duplication, language independent  Disadvantages Less simple Misses copies with (small) changes on every line
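One way to collect matching lines into sequences with holes is to group matches by diagonal, that is, by the constant offset between the two line numbers. The sketch below assumes the matches are given as `(line_a, line_b)` pairs; the names `match_sequences`, `max_hole` and `min_length` are hypothetical parameters chosen for this illustration.

```python
from collections import defaultdict


def match_sequences(pairs, max_hole=1, min_length=2):
    """Group matching line pairs into sequences, tolerating small holes.

    Returns sorted (start_a, start_b, matched_lines) tuples.
    """
    # Matches belonging to one copied fragment share the same offset.
    diagonals = defaultdict(set)
    for line_a, line_b in pairs:
        diagonals[line_b - line_a].add(line_a)

    sequences = []
    for offset, rows in diagonals.items():
        run = []
        for row in sorted(rows):
            # Close the current run if the hole before this match is too large.
            if run and row - run[-1] > max_hole + 1:
                if len(run) >= min_length:
                    sequences.append((run[0], run[0] + offset, len(run)))
                run = []
            run.append(row)
        if len(run) >= min_length:
            sequences.append((run[0], run[0] + offset, len(run)))
    return sorted(sequences)


pairs = [(10, 50), (11, 51), (13, 53), (20, 90)]
print(match_sequences(pairs))              # [(10, 50, 3)]
print(match_sequences(pairs, max_hole=0))  # [(10, 50, 2)]
```

With `max_hole=1` the unmatched line 12 is tolerated, so lines 10 to 13 are reported as one sequence; with `max_hole=0` the sequence stops at line 11. The isolated match at line 20 is dropped in both cases because it is shorter than `min_length`.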

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.34 Abstraction  Abstracting selected syntactic elements can increase recall, at the possible cost of precision
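As an illustration of this trade-off, the sketch below replaces every identifier with `id` and every number with `0` before lines are compared. The keyword list is a small illustrative subset and the function name `abstract_line` is hypothetical; a real tool would use the keyword set of the target language.

```python
import re

# Assumption: a small subset of C++ keywords that are kept as-is.
KEYWORDS = {"if", "else", "for", "while", "return", "new",
            "int", "char", "const", "true", "false"}

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+")


def abstract_line(line):
    """Replace identifiers by 'id' and numbers by '0', keeping keywords."""

    def replace(match):
        token = match.group(0)
        if token[0].isdigit():
            return "0"
        return token if token in KEYWORDS else "id"

    return TOKEN.sub(replace, line)


a = abstract_line("create_property(pd,pnMinelt,stInteger,true,*iMinelt);")
b = abstract_line("create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);")
print(a)       # id(id,id,id,true,*id);
print(a == b)  # True
```

The two renamed lines now match (higher recall), but so would unrelated calls such as `foo(a);` and `bar(b);`, which is the possible loss of precision.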

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.35 Visualization of Duplicated Code Visualization provides insights into the duplication situation A simple version can be implemented in three days Scalability issue Dotplots — Technique from DNA Analysis Code is put on vertical as well as horizontal axis A match between two elements is a dot in the matrix
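A text-only dotplot is enough to see the idea. In this sketch the same sequence of (already normalized) lines is put on both axes and every match becomes a `*`; the function names `dotplot` and `render` are assumptions for illustration and are unrelated to how Duploc is implemented.

```python
def dotplot(lines_a, lines_b):
    """Boolean matrix: cell [i][j] is True when lines_a[i] equals lines_b[j]."""
    return [[a == b for b in lines_b] for a in lines_a]


def render(matrix):
    """Draw the matrix with '*' for a match and '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


lines = ["a", "b", "c", "a", "b"]
print(render(dotplot(lines, lines)))
# *..*.
# .*..*
# ..*..
# *..*.
# .*..*
```

The main diagonal is trivial (every line matches itself). The short diagonals next to it show that the sequence `a, b` occurs twice, which is how copied sequences show up in a dotplot. The matrix grows quadratically with the number of lines, which is the scalability issue the slide mentions.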

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.36 Visualization of Copied Code Sequences All examples are made using Duploc from an industrial case study (1 Mio LOC C++ System) Detected Problem File A contains two copies of a piece of code File B contains another copy of this code Possible Solution Extract Method

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.37 Visualization of Repetitive Structures Detected Problem 4 Object factory clones: a switch statement over a type variable is used to call individual construction code Possible Solution Strategy Method

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.38 Visualization of Cloned Classes
[Figure: dotplot of Class A against Class B, showing editing and insertion]
Detected Problem: Class A is an edited copy of Class B.
Possible Solution: Subclassing …

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.39 Visualization of Clone Families 20 Classes implementing lists for different data types Detail Overview

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.40 Duploc Duploc is scalable, integrates detection and visualization

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.41 Conclusion
Duplicated code is a real problem
  - makes a system progressively harder to change
Detecting duplicated code is a hard problem
  - some simple techniques can help
  - tool support is needed
Visualization of code duplication is useful
  - some basic support is easy to build
  - one student built a simple visualization tool in three days
Curing duplicated code is an active research area

© S. Demeyer, S. Ducasse, O. Nierstrasz Duplication.42 License Attribution-ShareAlike 2.5 You are free: to copy, distribute, display, and perform the work to make derivative works to make commercial use of the work Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above.