Automatic Plagiarism detection Charlie Daly Jane Horgan Dublin City University.


1 Automatic Plagiarism detection Charlie Daly Jane Horgan Dublin City University.

2 Overview: Context. How it works (overview). Comparison with other plagiarism detection systems. How it works (details): marks the original with a watermark, invisibly. Results.

3 Context It is not –catching people breaking copyright –detecting plagiarism in essays etc. It only works for programs, specifically when students submit a program for an assignment. Plagiarism is a huge problem on many programming courses.

4 Why are there so many systems? Lecturers who are also programmers get upset when they see their students copying their assignments. –It is seen as an affront –So they write a program. 'Efficiency' in education => large class sizes => manual detection is difficult.

5 So why another system? All previous systems use pair-wise comparison. Individual programs are compared against the other programs. This means –they are programming-language specific –they don't work across years. –they cannot identify the original author.

6 So how does our technique work? When a student submits a program, the program is marked with a watermark indicating the author. If the student subsequently gives an electronic copy of the program to another student, then the watermark will be recognised by the system as soon as it is submitted.

7 But... Need to be able to modify the original student's file The watermark needs to be invisible to the student.

8 The process Here's the student program, stored on a hard disk. The student submits the program. The watermark is added to the program, on the student's own hard disk!

9 Compared to previous systems +Can detect plagiarism as soon as submitted +Identifies the author +Programming-language independent +Works with tiny programs -Only works with an electronic copy -Easy to bypass if students know about it -Plagiarising student must get a copy after it has been submitted by the author

10 RoboProf provides infrastructure RoboProf is a learning environment. It automatically sets and marks simple assignments. The student submits a program, which is compiled and run on the student's machine. –an applet with read-write access is used to manage the compilation and marking. The program output is then sent to the server for marking.

11 RoboProf [Diagram: Browser, RoboProf Server, Assignment Specification] The student logs on. The student writes a program and submits it. An applet compiles and runs the program locally. The program and output are sent to the server for marking. Results are returned to the student.

12 Part 1: Modifying the student program Now that an applet can write to the student's disk, it can modify the student's file (to add the watermark). Only problem remaining... how do we implement the watermark?

13 The Watermark Needs to be invisible to the student. Needs to encode –the student ID –the year

14 The Watermark Use 10 binary digits for the student ID => can distinguish 1024 students. Use 4 binary digits for the year. Also use an ID for the assignment and record which attempt it is (RoboProf allows students to resubmit a program to improve the mark). Checksum (4 digits).

15 The watermark The binary code requires 34 bits (10+4+10+6+4). This code is written directly onto the file. #include main() { } 0000101010 0001 000010110 000111 0000 (0000101010 = Student ID, 0001 = Year)
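The field layout on slides 14 and 15 can be sketched as a fixed-width bit string. This is a minimal illustration, not the RoboProf implementation: the slides give the widths 10+4+10+6+4 but do not say which of the 10-bit and 6-bit fields holds the assignment ID and which holds the attempt number, and they do not describe the checksum. The names `pack`, `unpack` and `checksum`, the field order, and the checksum rule (count of 1 bits modulo 16) are all assumptions.

```python
# Field widths from the slides: 10 + 4 + 10 + 6 + 4 = 34 bits.
# ASSUMPTION: assignment ID is the 10-bit field and attempt is the 6-bit field.
FIELDS = (("student_id", 10), ("year", 4), ("assignment_id", 10), ("attempt", 6))
CHECKSUM_BITS = 4


def checksum(payload: str) -> int:
    """Hypothetical checksum: number of 1 bits in the payload, modulo 16."""
    return payload.count("1") % (1 << CHECKSUM_BITS)


def pack(**values: int) -> str:
    """Return the 34-character bit string for the given field values."""
    parts = []
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        parts.append(format(value, f"0{width}b"))
    payload = "".join(parts)
    return payload + format(checksum(payload), f"0{CHECKSUM_BITS}b")


def unpack(bits: str) -> dict:
    """Split a 34-bit string into fields; raise ValueError if it is invalid."""
    total = sum(width for _, width in FIELDS) + CHECKSUM_BITS
    if len(bits) != total or set(bits) - {"0", "1"}:
        raise ValueError("not a 34-bit watermark")
    payload, check = bits[:-CHECKSUM_BITS], bits[-CHECKSUM_BITS:]
    if int(check, 2) != checksum(payload):
        raise ValueError("checksum mismatch")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = int(payload[pos:pos + width], 2)
        pos += width
    return fields
```

With these assumptions, `pack(student_id=42, year=1, assignment_id=22, attempt=7)` begins with `0000101010` followed by `0001`, the student ID and year groups shown on the slide.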

16 Making it invisible 0000101010 0001 000010110 000111 #include main() { } 0000101010 A space is used to represent the binary digit 0 and a tab is used to represent the binary digit 1.

17 Making it invisible A space is used to represent the binary digit 0 and a tab is used to represent the binary digit 1. 0000101010 becomes space space space space tab space tab space tab space: invisible!
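The space/tab encoding can be sketched in a few lines. The mapping (space for 0, tab for 1) is from the slides. Where the whitespace goes in the file is not stated, so this sketch assumes it is appended as a final whitespace-only line. The function names are hypothetical.

```python
from typing import Optional

SPACE, TAB = " ", "\t"


def bits_to_whitespace(bits: str) -> str:
    """Map each '0' to a space and each '1' to a tab."""
    return "".join(TAB if b == "1" else SPACE for b in bits)


def whitespace_to_bits(ws: str) -> str:
    """Inverse mapping: space -> '0', tab -> '1'."""
    return "".join("1" if ch == TAB else "0" for ch in ws)


def embed(source: str, bits: str) -> str:
    """ASSUMPTION: the watermark is appended after the last line of the file."""
    if not source.endswith("\n"):
        source += "\n"
    return source + bits_to_whitespace(bits)


def extract(source: str, length: int = 34) -> Optional[str]:
    """Return the trailing watermark bits, or None if none is present."""
    tail = source[-length:]
    if len(tail) == length and set(tail) <= {SPACE, TAB}:
        return whitespace_to_bits(tail)
    return None
```

A file that happens to end in 34 spaces would decode as all zeros, which is one reason a checksum in the code is useful.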

18 Results We used the plagiarism detector as part of RoboProf on a group of 283 students. There were two main parts to the course: continuous assessment and a programming exam. The continuous assessment was to be done in the students' own time and was subject to plagiarism, whereas the programming exam was supervised.

19 Results We compared the exam results of those who plagiarised (40%) with those who didn't. The results are unsurprising: plagiarists performed less well in the exam. And the more they plagiarised, the worse they performed. Also, plagiarists submitted their continuous assessment on average a week later than their honest peers.

20 Incidence of plagiarism [Chart: frequency against number copied]

21 Exam Results [Chart: exam mark against number copied]

22 Completion date [Chart: copied vs. original]

23 The end

24 Questions What happens if a program is submitted which already contains a watermark? It can happen legitimately if a student resubmits a program. So the watermark is checked against the submitter's ID, and if they don't match, the lecturer is emailed and investigates further. Then the watermark is overwritten => can detect chains of plagiarism.
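The check-then-overwrite rule is what makes chains visible: each copy carries the ID of whoever submitted it last. A small simulation illustrates this. The `SourceFile` class, the `submit` function and the `alerts` list (standing in for the email to the lecturer) are hypothetical, and the watermark is reduced to a bare student ID.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SourceFile:
    text: str
    watermark_id: Optional[int] = None  # student ID decoded from the watermark, if any


def submit(file: SourceFile, submitter_id: int, alerts: List[Tuple[int, int]]) -> None:
    """Compare any existing watermark with the submitter, then overwrite it."""
    if file.watermark_id is not None and file.watermark_id != submitter_id:
        # Stand-in for emailing the lecturer: (watermarked author, submitter).
        alerts.append((file.watermark_id, submitter_id))
    file.watermark_id = submitter_id


alerts: List[Tuple[int, int]] = []
a = SourceFile("int main() { }")
submit(a, 1, alerts)                      # student 1 submits original work
submit(a, 1, alerts)                      # legitimate resubmission: no alert
b = SourceFile(a.text, a.watermark_id)    # student 2 gets a copy from student 1
submit(b, 2, alerts)
c = SourceFile(b.text, b.watermark_id)    # student 3 gets a copy from student 2
submit(c, 3, alerts)
```

After this run, `alerts` holds one entry per link in the chain: student 1 to student 2, then student 2 to student 3.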

25 Another question Why did you only monitor plagiarism; why not take any action? There are three answers: –Resources: The university has machinery in place to deal with plagiarism. It is very bureaucratic and soaks up time. –Some students committed plagiarism accidentally (they were testing the system). –Need corroborating evidence; can't let the trick be known.

26 Question 3 "But won't the watermark that is sent to the server have been just created by the system? It'll just read the watermark it generated." No. It reads the program and then doctors it. The server gets the unadulterated program; the student is left with the modified program.
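The ordering in this answer (read, send, then doctor) can be sketched as follows. This is an illustration only: `submit_and_mark` and `send_to_server` are hypothetical names, and the sketch assumes the watermark lives on a final whitespace-only line, so an earlier watermark can be dropped by stripping trailing spaces and tabs.

```python
from pathlib import Path
from typing import Callable


def submit_and_mark(path: Path,
                    send_to_server: Callable[[bytes], None],
                    watermark: bytes) -> None:
    """Send the file as found on disk, then rewrite it with a new watermark."""
    original = path.read_bytes()   # bytes, so whitespace is preserved exactly
    send_to_server(original)       # the server sees any watermark already present
    # ASSUMPTION: an earlier watermark is the trailing run of spaces and tabs.
    body = original.rstrip(b" \t")
    if not body.endswith(b"\n"):
        body += b"\n"
    path.write_bytes(body + watermark)
```

Because the file is sent before it is rewritten, the server only ever sees a watermark left by an earlier submission, never the one it is about to add.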

27 Question 4 Any problems in practice? Yes, a modern IDE can detect when the source has been modified and asks if you wish to reload the buffer. This hasn't been fixed yet. You need to correctly set up applet security. A student may save the file after it has been modified (the clean version is still in the editor).

28 How much is original? Inserting a watermark unknown to the user is original (as far as I know). Unseen whitespace has been used before to detect copyright infringement (there it was unknowingly inserted by the author).

