Presentation is loading. Please wait.

Presentation is loading. Please wait.

Reverse Engineering © SERG Code Cloning: Detection, Classification, and Refactoring.

Similar presentations


Presentation on theme: "Reverse Engineering © SERG Code Cloning: Detection, Classification, and Refactoring."— Presentation transcript:

1 Reverse Engineering © SERG Code Cloning: Detection, Classification, and Refactoring

2 Reverse Engineering © SERG Introduction Maintenance is the most costly phase of the software lifecycle Code Clone: gratuitous copy of source code in a program. Code Clones have the effect of increasing source code size and duplication of errors.

3 Reverse Engineering © SERG Definitions Clone Class/Set: Set of equivalent Clones Precision: Percent of reported clones that are genuine Recall: Percent of genuine clones that are reported

4 Reverse Engineering © SERG Detection String Matching – Represents and evaluates code using string comparisons. Token Parsing – Code transformation into tokens for comparison. Graph Matching – Pattern matching on graph representations of code.

5 Reverse Engineering © SERG Detection Strategies

6 Reverse Engineering © SERG String Matching Techniques Exact String Matching Parameterized Matching Substring Matching

7 Reverse Engineering © SERG Parameterized Matching Employs exact string matching for comparison. 1.Normalization 2.Concatenation 3.Hashing 4.Extract longest matches

8 Reverse Engineering © SERG Matching Algorithm No algorithm can avoid worst case running time of O(n 2 ). Using a suffix tree we can improve running time complexity to O(n+m). Where m is the number of matches. The input size n, is reduced by hashing.

9 Reverse Engineering © SERG Suffix Tree Example Images Copyright (c) 1996-1998, Mark Nelson, All Rights Reserved. Suffix Trie Suffix Tree

10 Reverse Engineering © SERG Substring Matching Substring Matching provides a faster search algorithm. 1.Normalization 2.Substring Generation 3.Matching 4.Consolidation 5.Reporting

11 Reverse Engineering © SERG Caveat Exact string matching does not find clones with trivial alterations that don’t change semantics. Normalization has the risk of false positives. x+y=z; != z+x=y; -> p+p=p for(i=0; i for(p=p; p<p; p++)

12 Reverse Engineering © SERG Token Parsing Techniques Transforms code into tokens by using language specific constructs into a single token string Find similarities within this token string Transform token clones back into code clones for presentation

13 Reverse Engineering © SERG Token Parsing Example int main(){ int i = 0; static int j=5; while(i<20){ i=i+j; } std::cout<<"Hello World"<<i<<std::endl; return 0; } Remove white spaces

14 Reverse Engineering © SERG Token Parsing Example int main(){ int i = 0; static int j=5; while(i<20){ i=i+j; } std::cout<<"Hello World"<<i<<std::endl; return 0; } Shorten Names

15 Reverse Engineering © SERG Token Parsing Example int main (){ int i = 0; int j = 5; while (i < 20){ i = i + j; } cout << "Hello World” << i << endl; return 0; } Tokenize everything, except language constructs

16 Reverse Engineering © SERG Token Parsing Example $p $p(){ $p $p = $p; while($p < $p ){ $p = $p + $p; } $p << $p << $p << $p;; return $p; } Clone relations with all the transformation rules are compared to clone relations with a subset of the transformation rules

17 Reverse Engineering © SERG CCFinder:DEMO

18 Reverse Engineering © SERG Graph Matching Techniques Form machine representation of code Identify clones as identical subgraphs

19 Reverse Engineering © SERG Abstract Syntax Subtree Matching source_file_name = ((SourceFile) attributes[i]).getSourceFileName();

20 Reverse Engineering © SERG Hash subgraphs Identify maximal identical or similar subgraphs Identify sequences of subgraphs Abstract Syntax Subtree Matching

21 Reverse Engineering © SERG Program Dependence Graph Matching

22 Reverse Engineering © SERG Vertices are lines of code Edges are attributed with different types of dependences (control flow, data, etc) NP complete in general, k-cutoff in maximal graph size used to limit runtime –Experiments determine k=20 best O(|V| 2 ) possible graph starting points, reduced via heuristic Program Dependence Graph Matching

23 Reverse Engineering © SERG Two Clones Found by fg-PDG

24 Reverse Engineering © SERG Metrics? Need to evaluate different clone detection techniques Hard to know the real number of clones in a non-trivial application How to compare different types of clones?

25 Reverse Engineering © SERG Basic Metrics LOC: Line number count SLOC: Line number count after the removal of blanks %LOC: Percent of lines with clones in them %FILE: Percent of files with clones in them

26 Reverse Engineering © SERG Interesting Metrics: Radius A BC

27 Reverse Engineering © SERG Interesting Metrics: Radius RAD(B,C)= 1 A BC

28 Reverse Engineering © SERG Interesting Metrics: Radius A BC RAD(A,C)= 2

29 Reverse Engineering © SERG Interesting Metrics: DFL Estimates improvement from clone removal Assume that every clone class can be replaced by a function Assume that every clone is replaceable by a call to that function

30 Reverse Engineering © SERG Comparison of Clone Detectors CCFinder Token (1128) CloneDr AST (84) Cavet Metric (278) Jplag Token (131) Moss Unknown (120) CCFinder1090/381089/27989/871025/101 CloneDr43265/13120/11111/9 Cavel25170120/15109/10 Jplag447327367/50 Moss197626881

31 Reverse Engineering © SERG Comparison of Clone Detectors FrequencyCCFinderCloneDrCavetJPlagMoss 1569664095104 298634108 33321340 4140610 5160500 6190500 720100 In addition Cavet found clones with frequencies: 8,12, and 13

32 Reverse Engineering © SERG Comparison of Clone Detectors CCFinderCloneDrCavetJPlagMoss Recall729191210 Precision72100638273 Different code clone detectors find different clones String based find direct clones Token based find polymorphism issues and may be difficult to fix Graph based find clones that can be automatically refactored

33 Reverse Engineering © SERG Code Clone Refactoring Use standard Refactoring methods –“Extract” - Make a procedure –“Pull Up” - Make an superclass Aspect Oriented Programming –Advanced technique for clones that are too tough for procedural or OO solutions

34 Reverse Engineering © SERG Extract

35 Reverse Engineering © SERG Pull Up

36 Reverse Engineering © SERG Aspect Oriented Programming Observation: Certain types of peripheral code permeate application logic –Logging, security, parameter checking, etc. How to deal with similar but scattered code? –Remove similar code into Aspects –Join Aspects into application logic using “join points”

37 Reverse Engineering © SERG Aspect Oriented Refactoring Candidates

38 Reverse Engineering © SERG Ophir – Aspect Mining Tool

39 Reverse Engineering © SERG Conclusion Alternative techniques detect different clone types. Code clone detection and removal is still in its infancy. Code clone refactoring tools should be incorporated into standard IDEs to achieve widespread adoption.

40 Reverse Engineering © SERG Future Research Detector improvement –Improve precision and refactorability of token based detectors –Performance of graph based detectors –AI-based detectors? Comparison of detection methods

41 Reverse Engineering © SERG Future Research Continue study of code clones –Effects of Agile methodologies –Impact on Web-based code Visualization of clones Automated refactoring systems

42 Reverse Engineering © SERG Questions


Download ppt "Reverse Engineering © SERG Code Cloning: Detection, Classification, and Refactoring."

Similar presentations


Ads by Google