Cross Language Clone Analysis Team 2 November 22, 2010


1 Cross Language Clone Analysis Team 2 November 22, 2010
10/13/2010 Presentation 7 Cross Language Clone Analysis Team 2 November 22, 2010

2 Agenda: Feasibility Study, Release Plan, Architecture, Parsing, CodeDOM, Clone Analysis, Testing, Demonstration, Team Collaboration, Path Forward

3 Our Team: Allen Tucker, Patricia Bradford, Greg Rodgers, Brian Bentley, Ashley Chafin. (Note: add roles at end; initial allocation of effort.)

4 Feasibility Study: Our evaluation of the project to determine the difficulty in carrying out the task.

5 Task Summary
Our Customers: Dr. Etzkorn and Dr. Kraft
Customer Request: A tool that will abstract programs in C++, C#, Java, and (Python or VB) to the Dagstuhl Middle Metamodel, Microsoft CodeDOM or something similar, and detect cross-language clones.
Areas to Note: the user interface; easy comparisons of clones; visualization of clones; sub-clones; clone detection for large bodies of code.

6 Task Summary (cont.) To find clones across different programming languages, we first have to convert the code from each language to a language-independent object model. Some language-independent object models: Dagstuhl Middle Metamodel (DMM) and Microsoft CodeDOM. Both provide a language-independent object model for representing the structure of source code.

7 Related Research: Detecting clones across multiple programming languages is on the cutting edge of research. A preliminary version of this was done by Dr. Kraft and his students for C# and VB. They compared the Mono C# parser (written in C#) to the Mono VB parser (written in VB). Publication: Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59

8 Task Understanding: Three Step Process
Step 1 Code Translation: Source Files -> Translator -> Common Model
Step 2 Clone Detection: Common Model -> Inspector -> Detected Clones
Step 3 Visualization: Detected Clones -> Clone Visualization UI

9 Task Understanding (cont.)
Step 1: Code Translation. C#, C++, Java, VB (or Python) to CodeDOM.
Step 2: Clone Detection. Leverage current clone detection techniques and research.
Step 3: Clone Visualization. Need for an intuitive user interface.

10 Clone Detection as a Product: Commercial product. What are the benefits of software clone detection? Main goal: decrease coding errors (bugs).

11 Benefits. Fact: Modularity is a key characteristic in today’s software world. Why? It allows us to divide software into a decomposed separation of concerns, and it contributes to maintainability, reusability, testability and reliability. Clone detection allows us to detect common software spread across large bodies of code and to identify code that is a candidate for further modularization.

12 Benefits (cont) But not all code can be cleanly decomposed. Crosscutting concerns are responsible for tangling and scattering (code duplication) in an implementation. Example: logging, scattered across unrelated functions. How do you manage large areas of (usually) duplicated crosscuts when there are errors or changes?

13 Benefits (cont) Aspect Oriented Programming: modularize crosscuts using advice and join points. Example: Spring Framework. Identifying aspects (crosscuts) is a time-consuming task. Use clone detection to identify aspects. Define rule.
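As a rough illustration of advice applied at join points, the sketch below uses a Python decorator to pull a logging crosscut out of two unrelated functions. It is only an analogy: it does not use the Spring Framework, and the names `logged`, `call_log`, `transfer` and `audit` are hypothetical.

```python
import functools

call_log = []  # hypothetical stand-in for a real logging backend


def logged(func):
    """Advice: record entry to and exit from the wrapped function (the join point)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"enter {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            call_log.append(f"exit {func.__name__}")
    return wrapper


@logged
def transfer(amount):
    # Business logic only; no logging code is duplicated here.
    return amount * 2


@logged
def audit(record):
    # A second, unrelated function sharing the same crosscut.
    return record.upper()
```

Without the decorator, each function would carry its own copy of the enter/exit logging lines, which is the kind of scattered duplication a clone detector could flag as a candidate aspect.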

14 Benefits (cont) Summary. What? Detect code that is a candidate for modularity. Identify crosscuts in modules: am I a candidate for AOP? How? Continuous integration: generate reports every time new code is added.

15 Features: Clone Detection Software Suite. Identifies, tracks and manages software clones. Multi-language support: C++, C#, Java.

16 Features (cont) Provides complete code coverage. Multi-application support: stand-alone, plug-in based (Eclipse), backend service (Ant task).

17 Features (cont) Extendible: built on a plug-in framework; add new languages. Easy to navigate between clones. Persists clones for easy retrieval.

18 Human Factors: Designing to meet user needs. User-centered approach. Need for an intuitive user interface. Clone visualization techniques.

19 Intellectual Property
The University of Alabama in Huntsville would own and manage any and all intellectual property associated with the research and developmental artifacts of this project.

20 Project and Development Issues
Fast, Good, and Cheap…choose two. Fast…time required to deliver products Good…quality of product Cheap…cost of designing and building

21 Risk Analysis: Complexity of the problem proves more difficult than initial estimates. Technology to be applied is either not well established or has yet to be developed. Unable to complete the defined project scope within schedule. Volatile user requirements leading to redefinition of project objectives.

22 Project Scale-Down Factors
Our initial approach: maximize use of existing open-source components in order to reduce the project timeline. Factors: instability in harvested projects; lack of support (documentation, forums, etc.); disjoint project code bases; non-existent code bases to harvest from.

23 Release Plan: Release Plan and User Stories

24 User Story Approach User Stories Applied…Mike Cohn suggested formal approach As a (role) I want (something) so that (benefit). Quality Attributes Independent Negotiable Valuable to user or customers Estimatable Small Testable

25 Re-tooled User Stories
Came out with original Release Plan on 9/15/20. Due to customer wants/needs, we had to re-tool our user stories. Dr. Etzkorn’s main concerns: load source code and translate it to a language-independent model; analyze the translated source code for clones. Results from meeting: created two new user stories (see next two slides). These two user stories have been pushed to the front of our card stack.

26 Analysis There are three Agile levels of planning. Release planning is a group of stories selected because they represent a usable set of features that can be released together. These types of plans are made by selecting the stories and deciding how many iterations are needed or by selecting a release date and seeing how much can be done by then. Release plans have no details other than a list of stories to be done by a date. The second level of planning is the iteration or sprint plan. This plan is a subset of the release plan stories that will be done in the very next iteration or sprint. Only one iteration plan exists at a time. With our chosen collection of important features we can now estimate the amount of effort to implement them. The people who will do the work, namely the developers, have authority to set the estimates. The manager will set the total amount of work that the next iteration can have planned. The customer then chooses a subset of the most important features that will fit into the next iteration. The iteration plan will often be verified by breaking the stories into development tasks and estimating them with finer grain units. At this level use cases could also be created. This greater level of detail is permissible because iterations or sprints are kept very short. The third level is the daily plan. A daily plan isn't usually represented by any artifacts. At the daily scrum or stand up meeting everyone will announce their plan for the day and then act on it. Even greater detail is allowed because the plan's duration is one day and no more.

27 CS 666 Studio I User Stories
Phase I

28 Summary ~ 68 remaining development days Focus on top 3 user stories
Focus on Translation and Analysis

29 Source Code Load & Translate
ID 017, Priority 1, Estimate 14 Days. As an analyst I want to load and translate my source code projects so I can analyze the source for clones.

30 Source Code Analyze
ID 018, Priority 1, Estimate 14 Days. As an analyst I want to analyze my source code projects so I can see the clones.

31 Code Clone Highlights
ID 002, Priority 1, Estimate 14 Days. As an analyst I want the capability to have the source code associated with clones highlighted within source files so that they are easy to identify.

32 CS 668 Software Studio II Phase II

33 Summary ~ 80 development days Focus on next 5 user stories
Focus on analysis capabilities

34 Auto-Navigate
ID 013, Priority 2, Estimate 7 Days. As a developer I want the capability to auto-browse to the code segment associated with a clone so I do not have to manually search for it.

35 Visual Reports
ID 003, Priority 1, Estimate 21 Days. As an analyst I want the capability to generate reports on clones within projects in a number of formats (e.g. html, csv, etc.) so that I can include them in presentations.

36 Clone Density Graph
ID 014, Priority 1, Estimate 21 Days. As an analyst I want the capability to have a project's clone density reported in graph form so I can visually see the distribution of detected clones within a project.

37 Project Management
ID 001, Priority 10, Estimate 5 Days. As an analyst I want the capability to load and manage multiple projects within the application so that I can perform analysis on them at various times without having to reload them.

38 Analysis Options
ID 005, Priority 3, Estimate 20 Days. As an analyst I want the capability to view summary analysis data (e.g. clones per file, package, project, etc.) so that I can identify the distribution of clones within a project.

39 Follow-On Work Future Capabilities

40 Project Language Auto-Detection
ID 010, Priority 8, Estimate 14 Days. As an analyst I want the capability to have the language of a source code project auto-detected so I do not have to define it.

41 Clone Categorization
ID 008, Priority 5, Estimate 14 Days. As an analyst I want the capability to have the detected clones categorized by a number of criteria (e.g. type, priority, etc.) so that work prioritization can be established.

42 False Positive Identification
ID 004, Priority 7, Estimate 14 Days. As an analyst I want the capability to label a prospective clone as a false positive so that it will be ignored in analysis and reports.

43 Development Environment Integration
ID 007, Priority 4, Estimate 30 Days. As a developer I want the capability to integrate the clone detection tool directly into my development environment (e.g. Eclipse, NetBeans, Visual Studio, etc.) so that I have a single application with all development tools integrated.

44 Project History
ID 012, Priority 6, Estimate 21 Days. As an analyst I want the capability to see project change history (e.g. initial project, xx clones found, clone id yyy removed, project updated, xx new clones found, etc.) so I can assess the impact of code changes within a project.

45 Detection Updates
ID 011, Priority 9, Estimate 21 Days. As an analyst I want the capability to update a project's associated source code and have the tool detect these changes and offer a detection re-do so I can make corrections to clones and see resolutions in action.

46 Interactive Help
ID 015, Priority 10, Estimate 21 Days. As a general user I want an interactive help system with context-sensitive search so I can learn the system with ease.

47 Build Environment Integration
ID 006, Priority 10, Estimate 30 Days. As a configuration manager I want the capability to integrate clone detection into an automated build environment (e.g. ant, nmake, msbuild, etc.) so that I can view reports on code projects as they are built.

48 Dropped User Stories Cut By Customer

49 Source Code Association
ID 009, Priority 11, Estimate 5 Days. Customer priority of 11 (normal range is 1 – 10) indicated the story would be cut from scope. As an analyst I want the capability to retain or not to retain the associated source code with a project so I can reduce my project size footprint.

50 Current Tasks: Requirements & Models

51 Current Tasks’ Requirements
Requirements modeling for the first user story “Source Code Load & Translate”: load & parse C#, Java, C++ source code; translate the parsed C#, Java, C++ source code to CodeDOM; associate the CodeDOM with the original source code.
Requirements modeling for the second user story “Source Code Analyze”: analyze CodeDOM for clones.

52 UML Model – Load & Parse

53 UML Model – Translate

54 UML Model – Associate

55 UML Model – Analyze

56 Architecture: Design and Architecture

57 Key Architecture Points
Multi-language support. Configurable for different platforms: stand-alone application, plug-in, backend service. Extendable.

58 Architecture
Components shown in the diagram: Application User Interface, Web Interface, Eclipse Plug-in; Core (Clone Detection Algorithms, Code Model, Service API); Language Support (Interface); C# Service, Java Service, C++ Service, etc.

59 Core Unit
Components: Code Model, Clone Detection Algorithms, Core API, Language Service Interface.
Code Model: stores the code in a common format.
Application Programming Interface: used to embed clone detection in applications.
Language Service Interface: communication layer between the core and the specific language services.

60 Visual Studio Solution

61 Core

62 Core - API

63 Language Service

64 Language Service

65 Language Service

66 App Configuration

67 CRC Cards: Class Responsibility Collaboration Cards

68 Java Parser CRC
Class: Java Parser. Responsibilities: parse Java source code; construct Java token tree. Collaborator: LALRParser (Gold Parser).

69 C# Parser CRC
Class: C# Parser. Responsibilities: parse C# source code; construct C# token tree. Collaborator: LALRParser (Gold Parser).

70 Language Service CRC
Class: LanguageService. Responsibility: defines the standard interface for all language providers. Collaborator: ILanguageService.
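To make the idea of one standard interface for all language providers concrete, here is a minimal Python sketch. It is an analogy to the ILanguageService design rather than the project's actual C# code: the method name `parse`, the `extensions` attribute and the `ServiceRegistry` class are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class LanguageService(ABC):
    """Hypothetical Python analogue of an ILanguageService-style interface."""
    extensions = ()

    @abstractmethod
    def parse(self, source: str) -> dict:
        """Return a language-neutral description of the source text."""


class JavaService(LanguageService):
    extensions = (".java",)

    def parse(self, source):
        # Placeholder body; a real service would parse and build a model.
        return {"language": "Java", "length": len(source)}


class CsService(LanguageService):
    extensions = (".cs",)

    def parse(self, source):
        return {"language": "C#", "length": len(source)}


class ServiceRegistry:
    """Loads language services and picks one by file extension."""

    def __init__(self):
        self._by_ext = {}

    def register(self, service):
        for ext in service.extensions:
            self._by_ext[ext] = service

    def service_for(self, filename):
        for ext, service in self._by_ext.items():
            if filename.endswith(ext):
                return service
        raise LookupError(f"no language service for {filename}")
```

Because the core only talks to the abstract interface, adding a language means registering one more service; nothing in the core needs to change.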

71 Java Service CRC
Class: JavaService. Responsibilities: reads Java source code; understands Java grammar production rules; constructs CodeDOM compilation unit. Collaborators: Java Parser, CloneDetection, JavaCodeProvider, ILanguageService.

72 Cs Service CRC
Class: CsService. Responsibilities: reads C# source code; understands C# grammar production rules; constructs CodeDOM compilation unit. Collaborators: C# Parser, CloneDetection, CsCodeProvider, ILanguageService.

73 CloneDetection CRC
Class: CloneDetection. Responsibilities: loads and manages language services; controls parsing; establishes CodeDOM compilation unit to source code file associations; compares code segments; provides bookkeeping for code segments. Collaborators: ILanguageService, CodeDomComparer, CodeDomSummary.

74 FileSetNode CRC
Class: FileSetNode. Responsibility: manages file set tree information for a CloneProject.

75 ProjectNode CRC
Class: ProjectNode. Responsibility: manages project tree information for a CloneProject.

76 SourceFileNode CRC
Class: SourceFileNode. Responsibility: manages source file tree information for a CloneProject.

77 EnabledValueConverter CRC
Responsibility: manages enabled state for visual components bound to an object.

78 VisibilityValueConverter CRC
Responsibility: manages visibility state for visual components bound to an object.

79 CloneProject CRC
Class: CloneProject. Responsibilities: manages project information; knows the file sets associated with a project; knows the files associated with each file set; knows the name of the project; can add a file; can remove a file. Collaborators: PresentationModel, ILanguageService.

80 ProjectIO CRC
Class: ProjectIO. Responsibilities: save a CloneProject; open a CloneProject. Collaborator: CloneProject.

81 RecentProjectList CRC
Responsibility: manages a list of recently viewed projects. Collaborator: CloneProject.

82 ProjectView CRC
Class: ProjectView. Responsibility: visual display of project tree. Collaborators: CloneProject, PresentationModel, ProjectNode, FileSetNode, SourceFileNode, ILanguageService.

83 App CRC
Class: App. Responsibilities: startup class; manage visual theme.

84 MainFrame CRC
Class: MainFrame. Responsibilities: manage application frame; manage user input (Save, Open, Close, Exit, Add File Set, Create New). Collaborators: PresentationModel, CloneProject, ProjectView.

85 PresentationModel CRC
Responsibilities: manage current project state (current project, clone detection, currently selected file). Collaborators: ICloneDetection, CloneProject.

86 Parsing: Our struggles and our successes.

87 Parsing Struggles & Successes
We explored and conducted spikes on CSParser and CS CodeDOM Parser. Both had advantages and disadvantages, and we concluded that neither was going to fit our needs. We then explored and conducted a spike on GOLD Parser. We ultimately chose the GOLD Parser because it best fit our needs: it gave us a way to manage multiple language grammars with one engine.

88 C# Spike

89 C# Spike Review
Spike objectives: CSParser; associated risks/shortfalls; project feasibility; familiarization.
CSParser: a utility which parses C# source code and creates a CodeDOM tree of the code. Open source. Supports most language features. Error handling for features not supported.

90 C# Spike: CSParser Output

91 C# Spike Review (cont)
Spike conclusion: some limitations, but there are workarounds; wrapper code needed.
Moving on from spike: this past iteration, we downloaded CSParser and familiarized ourselves with it more. Because several programs share the same name, we came across CS CodeDOM Parser as well.

92 CS Parser & CS CodeDOM Parser
The good & the bad for both.
CS Parser: good parser (parsed a lot of C# language features); no GUI (it is all command line); came with a large number of test cases; does not use CodeDOM.
CS CodeDOM Parser: general parsing; GUI; uses CodeDOM.

93 C# Plan Since both programs have good and bad features, our plan is to combine them. CSParser + CS CodeDOM Parser Planned combined features: Good parsing GUI CodeDOM Test cases

94 GOLD Parsing System Spike

95 Topics To Discuss: What is it? How does it work? What can we use it for? How can we extend it?

96 10/13/2010 What Is GOLD? GOLD is a free parsing system that you can use to develop your own programming languages, scripting languages and interpreters. It strives to be a development tool that can be used with numerous programming languages and on multiple platforms. –

97 How It Works (Block Structure)
Grammar -> Builder -> Compiled Grammar Table (*.cgt); Source Code + Compiled Grammar Table -> Engine -> Parsed Data

98 How It Works (Components)
Three major components:
Builder: reads a source grammar to construct a Compiled Grammar Table.
Compiled Grammar Table: stores LALR and DFA parse tables.
Engine: performs the actual parsing.

99 How It Works (Process)
Step 1: Write the grammar for the language being implemented (GOLD Meta-Language). Rules: Backus-Naur Form. Terminals: regular expressions. Character sets: set notation.

100 How It Works (Process)
Step 2: Analyze the grammar. Construct LALR and DFA parse tables, which are saved in a Compiled Grammar Table file.

101 How It Works (Process)
Step 3: Analyze the source text with the parser engine and construct a parse tree. The engine can be implemented in any number of programming languages.
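The split between precomputed tables and a generic engine can be sketched with a toy table-driven tokenizer. This is not the GOLD engine or the real .cgt file format; the `DFA` and `ACCEPTING` tables below are hand-written stand-ins for what a builder would generate, and all names are hypothetical.

```python
def char_class(ch):
    """Map a character to the symbol class used as a table key."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Toy "compiled table": state -> {character class -> next state}.
DFA = {
    "start": {"letter": "ident", "digit": "int"},
    "ident": {"letter": "ident", "digit": "ident"},
    "int": {"digit": "int"},
}
# Accepting states and the terminal each one recognizes.
ACCEPTING = {"ident": "Identifier", "int": "Integer"}


def tokenize(text, dfa=DFA, accepting=ACCEPTING):
    """Generic engine: knows nothing about the language except the tables."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, end, last_accept = "start", pos, None
        while end < len(text):
            nxt = dfa.get(state, {}).get(char_class(text[end]))
            if nxt is None:
                break
            state, end = nxt, end + 1
            if state in accepting:
                last_accept = (accepting[state], end)  # remember longest match
        if last_accept is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, stop = last_accept
        tokens.append((kind, text[pos:stop]))
        pos = stop
    return tokens
```

Swapping in a different pair of tables changes the language recognized without touching `tokenize`, which mirrors the appeal of managing several grammars with one engine.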

102 Usage within CloneDigger
Source Code + Compiled Grammar Table (*.cgt) -> Engine -> Parsed Data (AST) -> CodeDOM Conversion
CodeDOM conversion: we need to write a routine to move data from the parsed tree to CodeDOM. Parsed data trees from the parser are stored in a consistent data structure, but are based on rules defined within the grammars.
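One way to picture such a conversion routine is a tree walk that dispatches on the production rule of each node. The sketch below is an assumption-laden illustration, not the project's code: the rule names (`ClassDecl`, `MethodDecl`, `Identifier`), the `ParseNode` type and the dictionary output are all hypothetical, and real grammar rules differ per language.

```python
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    rule: str                       # production rule that produced this node
    text: str = ""                  # token text for leaf nodes
    children: list = field(default_factory=list)


def convert(node, handlers):
    """Convert a parse tree bottom-up using one handler per production rule."""
    converted = [convert(child, handlers) for child in node.children]
    handler = handlers.get(node.rule)
    if handler is None:
        raise NotImplementedError(f"no handler for rule {node.rule!r}")
    return handler(node, converted)


# Hypothetical rule names mapped to functions that build model elements.
HANDLERS = {
    "Identifier": lambda node, kids: node.text,
    "MethodDecl": lambda node, kids: {"kind": "method", "name": kids[0]},
    "ClassDecl": lambda node, kids: {"kind": "class", "name": kids[0], "members": kids[1:]},
}
```

A per-language handler table keeps the walker itself shared, and an unhandled rule fails loudly, which makes it easy to track which production rules still need work.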

103 Task Understanding: Three Step Process
Step 1: Code Translation. Source Files -> Translator -> Common Model. C#, C++, Java, VB (or Python) to CodeDOM.
Step 2: Clone Detection. Common Model -> Inspector -> Detected Clones. Leverage current clone detection techniques and research.
Step 3: Clone Visualization. Detected Clones -> Clone Visualization UI. Need for an intuitive user interface.

104 Extension and Enhancements
Enhance grammars: update Java; update C#; define C++. Share among other classmates with similar interest. Share with the greater community.

105 Grammars
What is a grammar? A set of rules of a specific kind, for forming strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
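A tiny example may help show that a grammar constrains form only. The two-rule grammar in the comment is made up for this sketch, and the recognizer uses recursive descent because it is short to write; it is not LALR parsing and is unrelated to how GOLD processes grammars.

```python
import re

# Hypothetical grammar, written in BNF style:
#   <Expr> ::= <Term> '+' <Expr> | <Term>
#   <Term> ::= Number | '(' <Expr> ')'


def lex(text):
    """Split text into tokens; reject characters outside the alphabet."""
    tokens = re.findall(r"\d+|[()+]", text)
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise ValueError("input contains characters outside the alphabet")
    return tokens


def parse_term(tokens, pos):
    """Return the position after a <Term>, or None if none starts at pos."""
    if pos is None or pos >= len(tokens):
        return None
    if tokens[pos].isdigit():
        return pos + 1
    if tokens[pos] == "(":
        pos = parse_expr(tokens, pos + 1)
        if pos is not None and pos < len(tokens) and tokens[pos] == ")":
            return pos + 1
    return None


def parse_expr(tokens, pos):
    """Return the position after an <Expr>, or None if none starts at pos."""
    pos = parse_term(tokens, pos)
    if pos is not None and pos < len(tokens) and tokens[pos] == "+":
        return parse_expr(tokens, pos + 1)
    return pos


def is_valid(text):
    """True when the whole text is a string of the language."""
    try:
        tokens = lex(text)
    except ValueError:
        return False
    return parse_expr(tokens, 0) == len(tokens)
```

The recognizer accepts `1 + (2 + 3)` and rejects `1 +`, yet it never computes a sum: validity of form is all the grammar describes.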

106 Gold Parser Grammars
Gold Parser uses context-free grammars that can be parsed with Look-Ahead LR (LALR) parsing. LALR-compliant grammars that we already have: C#, Java, Visual Basic .NET.

107 Grammar Example

108 C++ Grammar Issue
Currently no LALR-compliant C++ grammar exists, due to the overall complexity of the language. Other C++ parsers exist, but their output format differs from that of the Gold Parser grammars we already have for the other languages. We are still searching for C++ parsing solutions.

109 GOLD Parser Conclusion
We plan to use the GOLD Parsing System. Tasks we have to complete: update the Java grammar; update the C# grammar; research “Define C++ grammar”; create a CodeDOM conversion to move data from the parsed tree to CodeDOM.

110 GOLD Parsing System: GOLD Parsing, Populating CodeDOM

111 Topics To Discuss: What are we doing? Compiled Grammar Table. Bookkeeping. Testing.

112 How It Works (Block Structure)
Grammar -> Builder -> Compiled Grammar Table (*.cgt); Source Code + Compiled Grammar Table -> Engine -> Parsed Data

113 How It Works (Process)
Typical output from the engine: a long nested tree.

114 Usage within CloneDigger
Source Code + Compiled Grammar Table (*.cgt) -> Engine -> Parsed Data (AST) -> CodeDOM Conversion
CodeDOM conversion: we need to write a routine to move data from the parsed tree to CodeDOM. Parsed data trees from the parser are stored in a consistent data structure, but are based on rules defined within the grammars.

115 Grammar Updates: GOLD Parser Grammar Updates

116 Grammar Updates
Currently the grammars we have for the Gold Parser are outdated. Current Gold grammars: C# version 2.0, Java version 1.4. Current available software versions: C# version 4.0, Java version 6.

117 Grammar Update Issues
Grammars for C# and Java are very complex and require a lot of work to build. ANTLR and Gold Parser grammars use completely different syntax. Positive note: other development is not halted by use of the older grammars.

118 Our Bookkeeping: Bookkeeping for parsing the multiple grammars

119 Compiled Grammar Table
For Java, there are 359 production rules and 249 distinct symbols (terminal & non-terminal). For C#, there are 415 production rules and 279 distinct symbols (terminal & non-terminal).

120 Production Rule Dependencies

121 Our Grammar Bookkeeping
Since there are so many production rules, we came up with the following bookkeeping: a spreadsheet of the compiled grammar table (for each language) with each production rule indexed. This spreadsheet covers: various aspects of the language; what we have/have not handled from the parser; what we have/have not implemented into CodeDOM; percentage complete.

122 Our Grammar Bookkeeping
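The percentage-complete column of such a spreadsheet could be computed along these lines. This is a guess at the bookkeeping, not the team's spreadsheet: the field names and the rule that a production counts as done only when it is both handled from the parser and implemented into CodeDOM are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RuleStatus:
    index: int          # index of the production rule in the compiled grammar table
    rule: str           # production rule text (hypothetical examples in the usage note)
    parsed: bool        # handled from the parser
    in_codedom: bool    # implemented into CodeDOM


def percent_complete(rows):
    """Share of rules that are both parsed and implemented, as a percentage."""
    if not rows:
        return 0.0
    done = sum(1 for row in rows if row.parsed and row.in_codedom)
    return round(100.0 * done / len(rows), 1)
```

For example, three tracked rules of which one is fully done would report 33.3 percent.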

123 Parsing & CodeDOM Status
Parsing handlers’ status: C# = 100% complete; Java = 100% complete.

124 CodeDOM: Language Independent Object Model

125 CodeDOM
Document Object Model for source code. API: [System.CodeDom]. Only supports certain aspects of each language since it is language agnostic; good enough. What does it do? Programmatically constructs code. What doesn’t it do? It does NOT parse.

126 CodeDOM Example
CodeCompileUnit
  CodeNamespace
    Imports
    Types
      Members
        Event
        Field
        Method
          Statements
            Expression
        Property
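The containment hierarchy on this slide can be mimicked with a few plain data classes. The sketch below is a Python analogy for illustration only; it is not the System.CodeDom API, and the class names `CompileUnit`, `Namespace`, `TypeDecl` and `Method` are simplified, hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str
    statements: list = field(default_factory=list)


@dataclass
class TypeDecl:
    name: str
    members: list = field(default_factory=list)


@dataclass
class Namespace:
    name: str
    imports: list = field(default_factory=list)
    types: list = field(default_factory=list)


@dataclass
class CompileUnit:
    namespaces: list = field(default_factory=list)


def outline(unit):
    """Flatten the model into indented lines, independent of source language."""
    lines = []
    for ns in unit.namespaces:
        lines.append(f"namespace {ns.name}")
        for type_decl in ns.types:
            lines.append(f"  type {type_decl.name}")
            for member in type_decl.members:
                lines.append(f"    member {member.name}")
    return lines
```

A Java class and a C# class with the same shape would produce the same outline, which is what makes a shared model a workable basis for cross-language comparison.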

127 Clone Analysis: Clones & Dr. Kraft’s Tool

128 Software Clones (definitions from Wikipedia)
Duplicate code: a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity.
Clones: sequences of duplicate code.
There is no agreement in the research community on the exact notion of redundancy and cloning. Ira Baxter’s definition of clones expresses this vagueness: “Clones are segments of code that are similar according to some definition of similarity.” (Ira Baxter, 2002)

129 Clone Types
3 types of clones (definition of similarity):
Type 1: An exact copy without modifications (except for whitespace and comments).
Type 2: A syntactically identical copy; only variable, type, or function identifiers have been changed.
Type 3: A copy with further modifications; statements have been changed, reordered, added, or removed.
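The distinction between the first two types can be illustrated with a small token-based check. This is a simplified sketch under several assumptions: it handles only C-like syntax, the keyword list is a tiny illustrative subset, comment markers inside string literals are not handled, and every identifier is mapped to the same placeholder, which is looser than checking for a consistent renaming.

```python
import re

# Words kept as-is; everything else that looks like a name is an identifier.
KEYWORDS = {"return", "if", "else", "for", "while"}

# Comments, identifiers, integers, then any other single non-space character.
TOKEN = re.compile(r"//[^\n]*|/\*.*?\*/|[A-Za-z_]\w*|\d+|\S", re.DOTALL)


def tokens(code):
    """Split C-like code into tokens, dropping comments and whitespace."""
    return [t for t in TOKEN.findall(code) if not t.startswith(("//", "/*"))]


def abstract_identifiers(toks):
    """Replace variable, type, and function names with a single placeholder."""
    return [
        "ID" if re.fullmatch(r"[A-Za-z_]\w*", t) and t not in KEYWORDS else t
        for t in toks
    ]


def classify(a, b):
    """Label a pair of fragments by the strictest clone type that matches."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return "Type 1"
    if abstract_identifiers(ta) == abstract_identifiers(tb):
        return "Type 2"
    return "Type 3 or unrelated"
```

Telling a Type 3 clone from unrelated code needs a similarity measure rather than an equality test, so the sketch deliberately stops short of that case.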

130 How Clones are Created
Copy and paste programming (the "Ctrl-C, Ctrl-V virus"): the programmer selects a code fragment and copies it to another location. Sometimes these clones are modified slightly to adapt them to their new environment or purpose.
Multiple developers (similar functionality, similar code): if a programmer sees functionality similar to what they are tasked to develop, they will reuse that code as a template and then customize it in the pasted context.
Plagiarism (code theft): code is simply copied without permission or attribution.

131 Clone Research
Multi-language clone detection is on the cutting edge of research. Preliminary research: Dr. Kraft and students at UAB, for C# and VB, utilizing the Mono parsers for C# and VB. They compared the Mono C# parser (written in C#) to the Mono VB parser (written in VB). Publication: Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59

132 Dr. Kraft Clone Analysis
Performs comparisons of code files. For each file, a CodeDOM tree is tokenized. Uses the Levenshtein distance calculation: the minimum number of edits needed to transform one sequence into the other. Once distances are calculated, the distance determines the probability of a clone.
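The Levenshtein distance mentioned here can be computed over token sequences with the standard dynamic-programming recurrence. The sketch below is a generic implementation, not Dr. Kraft's tool; in particular, the `similarity` normalization is an assumption, since the slide does not say how distance is turned into a clone probability.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale distance to [0, 1]; 1.0 means identical sequences (assumed scoring)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

The function works on any sequences, so it can compare lists of tokens such as `["method", "if", "return"]` and `["method", "while", "return"]`, which are one substitution apart. Each comparison costs time proportional to the product of the two lengths, which is one reason pairwise brute-force comparison scales poorly.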

133 Dr. Kraft Application

134 Limitations
Only does file-to-file comparisons. Does not detect clones in the same source file. Can only detect Type 1 and some Type 2 clones. Not very efficient (brute force).

135 Enhancements
Add support for same-file clone detection. Add support for Type 3 clone detection (requires more research). Provide a more efficient clone analysis algorithm.

136 Testing: White Box & Black Box Testing

137 Testing Our Project
White box testing: unit testing.
Black box testing: production rule testing (allows us to test the robustness of our engine because we can force rule production errors); regression testing; automated functional testing.

138 Unit Testing

139 Production Rule Test Input File Example

140 Functional Tests

141 Metrics: Project Metrics

142 SLOC For Our Project
As of Nov 22, 2010, SLOC:
CS666_Client = 1746 lines
CS666_Core = 2653 lines
CS666_CppParser = 155 lines
CS666_CsParser = 3259 lines
CS666_JavaParser = 3378 lines
CS666_LanguageSupport = 84 lines
CS666_UnitTests = 2162 lines
Total = 13437 lines (sum of the figures above, including unit tests)

143 Demonstration: Demonstration of our progress.

144 Demonstration
These are the things we would like to show you today: GUI work; project setup; save project; load project; loading of source code; parsing of source code; translation of source code.

145 Team Collaboration: Team 2 & Team 3

146 Team Collaboration
Due to Team 3’s team size, we have taken responsibility for gathering & sharing grammars. Team 3 has responsibility for the C++ parsing. Both teams will use the same grammars & engines, so we will both have limitations based on this (e.g. the Java grammar is based on 1.4, so we are limited to using Java 1.4). Both teams will test the same grammars & engines, so we will have two test beds.

147 Team Collaboration
Method of collaboration: Google code project site: Team 4 team members have access to this site. Meetings. What does our Google code project contain? Source control for grammars & engines. Bugs/issues: Team 4 will have the ability to document new bugs. Documents/artifacts.

148 Team Collaboration
Both teams met Monday ( ) after class and performed the required pair programming. Current status: Team 2: all project source code has been made available; we are researching and working to update the Java and C# grammars. Team 3: working on C++ parsing; looking into another parser, ELSA.

149 Path Forward: Current Status & Path Forward for Next Semester

150 Where we stand…
Iteration 1: Parsing -> 85%. Completed parsing for Java & C#. No parsing for C++, but we have a foundation and design to start from.
Iteration 2: Translation to CodeDOM -> 60%. We have the foundation and design completed. Now, it is a matter of turning the crank for the languages.
Iteration 3: Clone Analysis -> 30%. Ported the majority of Dr. Kraft’s student project code. Started focusing on the GUI.

151 Task Understanding: Three Step Process
Step 1: Code Translation. Source Files -> Translator -> Common Model. C#, C++, Java, VB (or Python) to CodeDOM.
Step 2: Clone Detection. Common Model -> Inspector -> Detected Clones. Leverage current clone detection techniques and research.
Step 3: Clone Visualization. Detected Clones -> Clone Visualization UI. Need for an intuitive user interface.

152 Schedule

153 Path Forward
Our next step is to re-evaluate where we currently stand. Revisit the Release Plan: pull in Software Studio I work that was not completed. Revisit User Stories. Start off strong with the unit tests that were not completed.

