Detection of Plagiarism in University Projects Using Metrics-Based Similarity
Dagstuhl Seminar on Duplication, Redundancy, and Similarity in Software, July 2006

Presentation transcript:

Detection of Plagiarism in University Projects Using Metrics-Based Similarity
Dagstuhl Seminar on Duplication, Redundancy, and Similarity in Software, July 2006
Ettore Merlo, Ecole Polytechnique de Montréal

Ettore Merlo (Ecole Polytechnique de Montréal), (C) Copyright, July 2006

Context
Detect plagiarism in first-year programming projects at university
  – Programming skills have to be developed during courses

Plagiarism Detection
Comparison of sets of syntactic blocks
Spectral analysis of similarity
  – Increasing thresholds
  – Spectral shape parameters are computed
Projects are ranked by similarity spectrum
The most similar projects are considered as candidates for plagiarism

Plagiarism Problem
Detect code transformations that require little programming effort yet make the source code look different
  – Identifiers changed through editing operations
  – Source code layout changed (comments, indentation, order of procedures, functions, and methods, file structure)
  – Constants changed (initialization, loops)

Metrics-Based Similarity
Definition
  – Two code fragments are similar if their associated vectors of metrics satisfy some similarity criterion

Similarity Identification Process
Source code → Parsing and Analysis → Abstract Syntax Tree → Metrics Extraction → Metrics → Clones Extraction → Clones
Metrics table: one row of metrics per fragment, $F_1 \mapsto (m_{11}, m_{12}, \ldots, m_{1k})$ through $F_j \mapsto (m_{j1}, m_{j2}, \ldots, m_{jk})$

Metrics Extraction
Metrics for similarity detection
  – Volume
  – Complexity
  – Module/function interface
  – Call graph structure
  – Local memory
  – Global memory
  – Dataflow

Metrics Matching
$\mathrm{similar}(f_I, f_J) \iff |m_k(f_I) - m_k(f_J)| \le th_k$
  – for all $k$ within the size of the metrics vector
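The matching criterion above can be sketched in a few lines of Python. This is only an illustration, not the tool used in the talk: the function name `similar` mirrors the slide, the metric names are borrowed from the Parameters slide, and the metric values for the two fragments are hypothetical.

```python
from typing import Mapping


def similar(m_i: Mapping[str, int], m_j: Mapping[str, int],
            th: Mapping[str, int]) -> bool:
    """Return True when every metric differs by at most its threshold."""
    return all(abs(m_i[k] - m_j[k]) <= th[k] for k in th)


# Hypothetical metric vectors for two functions (values are made up).
f_a = {"CALLS": 3, "STMNT": 12, "NLOOPS": 2}
f_b = {"CALLS": 4, "STMNT": 14, "NLOOPS": 2}

# Per-metric thresholds, as listed on the Parameters slide.
th = {"CALLS": 1, "STMNT": 3, "NLOOPS": 1}

print(similar(f_a, f_b, th))
```

Comparing every pair of fragments with this predicate is the straightforward approach; the quantization slides that follow describe a cheaper alternative.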

Metrics Matching Complexity
n = |fragments_set|
Exact solution algorithms show a worst-case $O(n^2)$ complexity in general
Linear-complexity exact solutions exist for specific sub-problems
Opportunistic strategies and heuristics may reduce the average-case complexity
Approximate solutions may reduce the worst-case complexity

Threshold-Based Quantization

Threshold-Based Quantization (2)
Clusters represent the following hyper-parallelepiped: (formula not preserved in the transcript)
Clusters represent a partition of all fragments
Complexity is $O(M \cdot n)$ where:
  – $M$ is the cardinality of metrics
  – $n$ is the total number of fragments
  – often $M \ll n$
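One way to picture the quantization step is a grid of cells over the metric space, with fragments grouped by the cell they fall into. The sketch below assumes floor-division binning with one cell width per metric; the slide's actual cell formula is not preserved, so this choice, the function name `quantize`, and the sample data are all assumptions.

```python
from collections import defaultdict


def quantize(fragments, widths):
    """Group fragments by the grid cell containing their metric vector.

    fragments: dict mapping a fragment name to a tuple of metric values
    widths:    tuple of positive cell widths, one per metric (assumed binning)
    """
    clusters = defaultdict(list)
    for name, metrics in fragments.items():
        # One integer division per metric: O(M) work per fragment.
        cell = tuple(m // w for m, w in zip(metrics, widths))
        clusters[cell].append(name)
    return dict(clusters)


# Hypothetical fragments described by two metrics.
fragments = {"f1": (3, 12), "f2": (3, 13), "f3": (4, 15)}
widths = (1, 3)

clusters = quantize(fragments, widths)
print(clusters)
```

Each fragment is visited once and each metric is divided once, which matches the $O(M \cdot n)$ bound stated above, given expected constant-time dictionary access.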

Quantization Error
Fragments in neighboring clusters may be closer than $th_i / 2$ and still be in different clusters
Errors for threshold level $th_i$ disappear for threshold levels $k \cdot th_i$, $k > 1$
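A single-metric example makes the quantization error concrete. It again assumes floor-division binning, which is not confirmed by the slides, and the values are chosen only for illustration: two fragments whose metric values differ by 1 land in different cells at width 3, and share a cell once the width is doubled.

```python
def cell_index(value: int, width: int) -> int:
    """Index of the cell containing `value` for a given cell width (assumed binning)."""
    return value // width


th = 3        # hypothetical threshold level for one metric
a, b = 2, 3   # two metric values that differ by less than th / 2

# At level th the values straddle a cell boundary and are separated.
split_at_th = cell_index(a, th) != cell_index(b, th)

# At level 2 * th this particular pair falls into the same cell.
merged_at_2th = cell_index(a, 2 * th) == cell_index(b, 2 * th)

print(split_at_th, merged_at_2th)
```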

Project Comparison
Compute structural similarity spectrum
  – Compute similarity for increasing threshold levels in $s$ steps
      Quantize projects for the current threshold level
      Traverse current clusters to check for commonality in compared project
      Count common structurally-similar fragments under current threshold level

Project Comparison (2)
Complexity: $O(s \cdot M \cdot (n_1 + n_2))$
  – $n_1$, $n_2$: size of projects
  – $M$: cardinality of metrics
  – $s$: threshold steps
Rationale:
  – Plagiarism is hard to hide deeply when little programming effort is invested
  – Surface differences are quickly ignored by thresholds of increasing levels
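The comparison loop can be sketched as follows. Several details are assumptions rather than the method from the talk: cells are computed by floor division, the threshold at step t is taken to be t times the base threshold, and "common fragments" is counted as the number of fragments of the first project whose cell is also occupied by the second project. All names and data are hypothetical.

```python
def cell(metrics, widths):
    """Grid cell of a metric vector under the given cell widths (assumed binning)."""
    return tuple(m // w for m, w in zip(metrics, widths))


def similarity_spectrum(p1, p2, base_th, steps):
    """Count fragments of p1 sharing a cell with p2 at each threshold level."""
    spectrum = []
    for level in range(1, steps + 1):
        widths = [level * t for t in base_th]          # assumed threshold schedule
        cells2 = {cell(m, widths) for m in p2}         # quantize project 2
        common = sum(1 for m in p1 if cell(m, widths) in cells2)
        spectrum.append(common)
    return spectrum


# Two hypothetical projects, each a list of per-function metric vectors.
p1 = [(3, 12), (1, 5), (0, 20)]
p2 = [(3, 13), (2, 7), (5, 40)]
base_th = (1, 3)

spectrum = similarity_spectrum(p1, p2, base_th, steps=3)
print(spectrum)
```

Each level touches every fragment of both projects once and does $M$ divisions per fragment, so the loop stays within the $O(s \cdot M \cdot (n_1 + n_2))$ bound under expected constant-time set lookups.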

Project Comparison (3)
Typical spectrum

Parameters
Granularity: functions and methods
Steps: 5
Metrics and thresholds:
  – CALLS: 1
  – LOCALS: 1
  – NONLCALS: 1
  – PARNUM: 1
  – STMNT: 3
  – NBRANCHES: 1
  – NLOOPS: 1

Plagiarism Problem
Projects are composed of a variable number of fragments
  – Problem similar to class comparison or to software evolution analysis
Identify projects with high spectral similarity
  – $p$ = number of projects
  – Galaxy approach: $O(p)$
  – Pair comparison: $O(p^2)$

Galaxy Algorithm:

Procedural Projects

OO Projects

Clone Visualization
Visual display of differences between source code fragments
DP-matching algorithm on tokens

Matching Algorithms
Compute the sets of lexical changes
  – Dynamic programming
  – Sub-optimal and heuristic ones

Matching Example
int restore_stack ( object info ) {
int restore_list ( int index, object info ) {
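The two signatures in the matching example can be aligned with a small dynamic-programming routine. The sketch below uses a longest-common-subsequence table over whitespace-separated tokens; this is one common DP formulation and is not necessarily the matcher used in the talk, and the whitespace tokenization (which leaves `index,` as a single token) is a simplifying assumption.

```python
def lcs_table(a, b):
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = 1 + t[i + 1][j + 1]
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def token_diff(a, b):
    """Walk the table from the start and emit keep/delete/insert operations."""
    t = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            ops.append(("delete", a[i]))
            i += 1
        else:
            ops.append(("insert", b[j]))
            j += 1
    ops.extend(("delete", x) for x in a[i:])
    ops.extend(("insert", x) for x in b[j:])
    return ops


old = "int restore_stack ( object info ) {".split()
new = "int restore_list ( int index, object info ) {".split()

ops = token_diff(old, new)
changes = [op for op in ops if op[0] != "keep"]
print(changes)
```

The non-`keep` operations are the lexical changes a visual display would highlight: the renamed function and the added parameter.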

Remarks
Similarity contrast is very good for procedural code
Distribution of similarity for OO code is less sharp
  – Reference classes were given as a part of the projects
  – Methods tend to be smaller
  – More methods tend to be similar
  – Class structure could be taken into consideration
  – Inter-class relationship could be taken into account

Administrative Approach
Identify most similar projects
Do not make any hypothesis about the causes of similarity
Shift the burden of explanation onto the authors of a project

Conclusions
A metrics-based plagiarism detection approach in an academic environment has been presented
The presented approach has been successfully used to discourage plagiarism in course projects

Further Contacts
Ettore Merlo
Ecole Polytechnique de Montréal
tel: +1 (514 ) ext
fax: +1 (514)