Software Clones

1  • Software clones (definitions from Wikipedia):
     ◦ Duplicate code: a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity.
     ◦ Clones: sequences of duplicate code.
   • "Clones are segments of code that are similar according to some definition of similarity." (Ira Baxter, 2002)

2  • How clones are created:
     ◦ Copy-and-paste programming
     ◦ Similar functionality leading to similar code
     ◦ Plagiarism

3  • Three types of clones:
     ◦ Type 1: an exact copy without modifications (except for whitespace and comments).
     ◦ Type 2: a syntactically identical copy; only variable, type, or function identifiers have been changed.
     ◦ Type 3: a copy with further modifications; statements have been changed, added, or removed.
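
The three clone types can be illustrated with a small token-based check. This is only a sketch under stated assumptions: it uses Python source and the standard `tokenize` module as a stand-in for whatever front end a real detector would use, and the names `normalized_tokens` and `clone_type` are hypothetical. Type 1 is approximated by dropping comments and layout differences; Type 2 by additionally replacing every identifier with a placeholder. Anything else is reported as `None` (a Type 3 clone or unrelated code).

```python
import io
import keyword
import tokenize

# Tokens that never matter for clone comparison (comments, blank-line markers).
_DROPPED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Layout tokens are kept, but their concrete whitespace is discarded.
_LAYOUT = {
    tokenize.NEWLINE: "<NEWLINE>",
    tokenize.INDENT: "<INDENT>",
    tokenize.DEDENT: "<DEDENT>",
}


def normalized_tokens(source, abstract_identifiers=False):
    """Return the token strings of `source`, ignoring comments and whitespace.

    With abstract_identifiers=True every non-keyword name becomes "ID".
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _DROPPED:
            continue
        if tok.type in _LAYOUT:
            result.append(_LAYOUT[tok.type])
        elif (abstract_identifiers and tok.type == tokenize.NAME
              and not keyword.iskeyword(tok.string)):
            result.append("ID")
        else:
            result.append(tok.string)
    return result


def clone_type(a, b):
    """Return 1 or 2 for a Type 1 / Type 2 clone pair, otherwise None."""
    if normalized_tokens(a) == normalized_tokens(b):
        return 1
    if normalized_tokens(a, True) == normalized_tokens(b, True):
        return 2
    return None  # Type 3 or not a clone; this sketch cannot tell them apart


ORIGINAL = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
COMMENTED = ("def total(xs):\n    # sum the list\n    s = 0\n"
             "    for x in xs:\n        s += x   # add\n    return s\n")
RENAMED = ("def add_all(items):\n    acc = 0\n    for i in items:\n"
           "        acc += i\n    return acc\n")
EXTENDED = ("def add_all(items):\n    acc = 0\n    for i in items:\n"
            "        acc += i\n    print(acc)\n    return acc\n")
```

Note that the placeholder approach does not check that identifiers were renamed consistently; it only checks that the token structure matches.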

4  • For our task, finding clones across different programming languages requires first converting the code from each language into a language-independent object model.
   • Some language-independent object models:
     ◦ Dagstuhl Middle Metamodel (DMM)
     ◦ Microsoft CodeDOM
   • Both models provide a language-independent representation of the structure of source code.

5  • Detecting clones across multiple programming languages is on the cutting edge of research.
   • Dr. Kraft and his students built a preliminary cross-language clone detector for C# and VB.
     ◦ They compared the Mono C# parser (written in C#) to the Mono VB parser (written in VB).
     ◦ Publication: Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59

6  • Compares token sequences of CodeDOM graphs using Levenshtein distance
     ◦ The Levenshtein distance between two sequences is the minimum number of single-element edits (insertions, deletions, or substitutions) needed to transform one sequence into the other.
   • Performs comparisons of code files
   • The CodeDOM tree of each file is tokenized
   • Similarity is based on distances
     ◦ Percentage of matching tokens in a sequence
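
The Levenshtein comparison described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the token lists are plain Python lists of strings rather than real CodeDOM tokens, and the `similarity` normalization (one minus distance divided by the longer length) is an assumption about how a "percentage of matching tokens" could be derived from the distance.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence `a` into sequence `b`."""
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j elements of `b`.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i elements into an empty sequence costs i deletions
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical sequences, 0.0 for
    sequences that share nothing position-wise."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical token sequences from two files.
FILE_A = ["x", "=", "a", "+", "b"]
FILE_B = ["x", "=", "a", "-", "b"]
```

Because the functions accept any sequence, the same code works on strings (character-level edits) and on token lists (token-level edits).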

8  • Only performs file-to-file comparisons
     ◦ Does not detect clones within the same source file
   • Can detect only Type 1 and some Type 2 clones
   • Not very efficient (brute force)

9  • Tokens are split into parameter tokens (identifiers and literals) and non-parameter tokens
   • Non-parameter tokens are summarized using a hash function
   • Parameter tokens are encoded using a position index for their occurrence in the sequence
     ◦ This abstracts concrete names and values while maintaining their order
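
One way to realize this encoding is sketched below. The details are assumptions, not taken from the slides: the "position index" is interpreted as the distance back to the previous occurrence of the same parameter (0 for a first occurrence), `zlib.crc32` stands in for the unspecified hash function, and `is_parameter` with its tiny keyword set is a hypothetical classifier.

```python
import zlib

# Hypothetical keyword set; a real tokenizer would supply token kinds directly.
KEYWORDS = {"if", "else", "for", "while", "return"}


def is_parameter(tok):
    """Treat identifiers and numeric literals as parameter tokens."""
    return (tok.isidentifier() and tok not in KEYWORDS) or tok[:1].isdigit()


def encode(tokens):
    """Encode a token sequence so that consistently renamed code
    produces the same encoded sequence."""
    last_seen = {}  # parameter token -> position of its previous occurrence
    encoded = []
    for pos, tok in enumerate(tokens):
        if is_parameter(tok):
            # 0 marks a first occurrence; otherwise store the distance back
            # to the previous occurrence of the same parameter.
            distance = pos - last_seen[tok] if tok in last_seen else 0
            encoded.append(("P", distance))
            last_seen[tok] = pos
        else:
            # Non-parameter tokens are summarized by a hash of their text.
            encoded.append(("T", zlib.crc32(tok.encode("utf-8"))))
    return encoded


SAME_SHAPE_1 = "a = b + a".split()
SAME_SHAPE_2 = "x = y + x".split()
OTHER_SHAPE = "x = y + y".split()
```

With this encoding, `SAME_SHAPE_1` and `SAME_SHAPE_2` encode identically because the names are reused in the same positions, while `OTHER_SHAPE` differs because its last token refers back to a different position.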

10  • Represent all suffixes of the sequence in a suffix tree
    • Suffixes that share the same path of edges from the root have a common prefix
      ◦ That prefix occurs more than once in the sequence (a clone)
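
The idea can be sketched with a simplified structure. Note the substitution: the code below builds an uncompressed suffix trie (quadratic in time and memory) rather than a true compressed suffix tree, which is enough to show how shared paths reveal repeated sequences but would not scale to large inputs. The function names and the `min_len` threshold are hypothetical.

```python
def build_suffix_trie(seq):
    """Insert every suffix of `seq` into a trie.

    Each node counts how many suffixes pass through it, so a count of 2 or
    more means the path from the root to that node occurs at least twice.
    """
    root = {"children": {}, "count": 0}
    for start in range(len(seq)):
        node = root
        for tok in seq[start:]:
            node = node["children"].setdefault(tok, {"children": {}, "count": 0})
            node["count"] += 1
    return root


def repeated_sequences(seq, min_len=2):
    """Return repeated paths that cannot be extended to the right
    without dropping to a single occurrence."""
    found = []

    def walk(node, path):
        extended = False
        for tok, child in node["children"].items():
            if child["count"] >= 2:  # still shared by at least two suffixes
                extended = True
                walk(child, path + [tok])
        if not extended and len(path) >= min_len:
            found.append(tuple(path))

    walk(build_suffix_trie(seq), [])
    return found


# Hypothetical token sequence in which "a b c" appears twice.
TOKENS = ["a", "b", "c", "x", "a", "b", "c"]
LONGEST = max(repeated_sequences(TOKENS), key=len)
```

The result also contains shorter repeats that are suffixes of a longer one (here `("b", "c")`), and overlapping occurrences count as repeats; a real detector would filter both.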

