
Plagiarism Monitoring and Detection -- Towards an Open Discussion Edward L. Jones Computer Information Sciences Florida A & M University Tallahassee, Florida.




1 Plagiarism Monitoring and Detection -- Towards an Open Discussion Edward L. Jones Computer Information Sciences Florida A & M University Tallahassee, Florida

2 Outline What is Plagiarism, and Why Address It Plagiarism Detection & Countermeasures A Metrics-Based Detection Approach Extending the Approach Conclusions & Future Work

3 Why Tackle Plagiarism? Plagiarism undermines educational objectives Failure to address sends wrong message A non-contrived ethical issue in computing Plagiarism is hard to define Plagiarism is costly to pursue/prosecute An interesting problem for tinkering

4 What is Plagiarism? “use of another’s ideas, writings or inventions as one’s own” (Oxford American Dictionary, 1980) Shades of Gray – Theft of work – Gift of work – Collusion – Collaboration – Coincidence Intent to Deceive

5 How is it Detected? By chance – Anomalies – Temporal proximity when grading Automation methods – Direct text comparison (Unix diff) – Lexical pattern recognition – Structural pattern recognition – Numeric profiling

6 Plagiarism Concealment Tactics None Change comments Change formatting Rename identifiers Change data types Reorder blocks Reorder statements Reorder expressions Superfluous code Alternative control structures

7 Prosecution -- DA in the House? Course syllabus broaches the subject – Concrete definition generally lacking – Sense of “we’ll know it when we see it” No Tolerance Policy Investigation Stage Prosecution Stage Missed opportunity to teach?

8 An Awareness Approach Monitor closeness of student programs – Objective measures – Automated Post anonymous closeness results in public – Nonconfrontational awareness – “A word to the wise …” Benchmark student behavior – Establishing thresholds – Effects of course, language

9 Closeness Measures -- Physical Program 1 (lines1, words1, characters1) Program 2 (lines2, words2, characters2) Euclidean Distance
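The physical profile and distance on this slide can be sketched as follows. This is a minimal illustration, not the presenter's tool: the function names `physical_profile` and `euclidean_distance` and the two sample programs are assumptions, and the counts only approximate what the Unix `wc` command reports (newline characters, whitespace-separated words, bytes).

```python
import math

def physical_profile(source):
    """Return (lines, words, characters), roughly as `wc` would count them."""
    lines = source.count("\n")            # wc -l counts newline characters
    words = len(source.split())           # whitespace-separated tokens
    chars = len(source.encode("utf-8"))   # wc -c counts bytes
    return (lines, words, chars)

def euclidean_distance(p, q):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical submissions: the second only adds a comment line.
prog1 = "int main() {\n  return 0;\n}\n"
prog2 = "int main() {\n  /* done */\n  return 0;\n}\n"

p1 = physical_profile(prog1)   # (3, 6, 27)
p2 = physical_profile(prog2)   # (4, 9, 40)
distance = euclidean_distance(p1, p2)
```

Adding a single comment moves all three components, which illustrates why the physical profile is sensitive to character-level variations.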

10 Closeness Measures -- Halstead Program 1 (length1, vocabulary1, volume1) Program 2 (length2, vocabulary2, volume2) Euclidean Distance
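A rough sketch of a Halstead profile, using the standard definitions length N = N1 + N2, vocabulary n = n1 + n2 and volume V = N log2 n. The tokenizer below is an assumption for illustration only: it targets a C-like language, uses a deliberately small keyword set, and splits multi-character operators such as `==` into single characters. The names `halstead_profile` and `is_operand` are hypothetical.

```python
import math
import re

# Assumed, deliberately small keyword set for a C-like language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "char", "void"}
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+(?:\.\d+)?|"[^"\n]*"|[^\sA-Za-z_0-9]')

def is_operand(token):
    """Identifiers, numbers and string literals are operands; the rest are operators."""
    return token not in KEYWORDS and (token[0].isalnum() or token[0] in '_"')

def halstead_profile(source):
    """Return (length, vocabulary, volume) for one program."""
    tokens = TOKEN_RE.findall(COMMENT_RE.sub(" ", source))  # comments dropped first
    operands = [t for t in tokens if is_operand(t)]
    operators = [t for t in tokens if not is_operand(t)]
    length = len(operators) + len(operands)                 # N = N1 + N2
    vocabulary = len(set(operators)) + len(set(operands))   # n = n1 + n2
    volume = length * math.log2(vocabulary) if vocabulary else 0.0  # V = N * log2(n)
    return (length, vocabulary, volume)

# Hypothetical pair: same code, different comments and spacing.
original = "int x = 1; /* set x */\nreturn x;"
disguised = "int   x=1;\n\n// different comment\nreturn x;"
same_profile = halstead_profile(original) == halstead_profile(disguised)
```

Because comments and white space are discarded before counting, the two sample programs receive identical profiles, whereas their physical profiles would differ.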

11 Comparison of Measures Physical profile ==> weight test – Simple/cheap to compute (Unix wc command) – Sensitive to character variations Halstead profile ==> content test – More complex/expensive to compute – Ignores comments and white space – Sensitive only to changes in program content Detection effectiveness vs. plagiarism tactic

12 Closeness Computation Normalization – Establish upper bound for comparison (1.414, the square root of 2) – Distance computed on normalized (unit) vectors Normalization I -- Self normalization – p = (a, b, c) ==> (a/L, b/L, c/L), where L is the length of p – Largest component dominates Normalization II -- Global scaling – p = (a, b, c) ==> q = (a/aMAX, b/bMAX, c/cMAX) – Self normalization applied to q
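The two normalizations can be sketched as below. This assumes every profile has positive components (so no length or maximum is zero); the function names and the sample profiles are illustrative, not taken from the presentation.

```python
import math

def self_normalize(p):
    """Normalization I: scale p to unit length."""
    length = math.sqrt(sum(x * x for x in p))
    return tuple(x / length for x in p)

def global_scale(profiles):
    """Normalization II, first step: divide each component by its maximum over all profiles."""
    maxima = [max(column) for column in zip(*profiles)]
    return [tuple(x / m for x, m in zip(p, maxima)) for p in profiles]

def closeness(u, v):
    """Euclidean distance between two normalized profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical (lines, words, characters) profiles.
profiles = [(3, 6, 27), (4, 9, 40), (40, 120, 900)]

norm1 = [self_normalize(p) for p in profiles]                 # Normalization I
norm2 = [self_normalize(q) for q in global_scale(profiles)]   # Normalization II

# Two unit vectors with non-negative components are at most sqrt(2) apart;
# the bound is reached when they are orthogonal.
upper_bound = closeness(self_normalize((5, 0, 0)), self_normalize((0, 2, 0)))
```

Under self normalization alone, the character count dominates each sample profile because it is by far the largest component; global scaling first brings the three components onto a comparable 0 to 1 range.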

13 Distribution Of Closeness Values

14 Comparison of Profiles

15 Closeness Distribution Closeness values vary by assignment Programming language may lead to clustering at the lower end of the spectrum Reuse of modules leads to clustering at the lower end of the spectrum No a priori threshold pin-pointing plagiarism All measures exhibit these behaviors

16 Suspect Identification Collaboration Suspects (5-th Percentile) [closeness values lost in extraction]
Rank  student1  student2
1     alpha     alpha
2     alpha     beta
3     beta      gamma
4     alpha     gamma
5     gamma     epsilon
6     sigma     delta
7     alpha     epsilon
8     beta      epsilon
9     gamma     theta
10    beta      theta
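One way to produce such a suspect list is to rank every pair of students by closeness and keep the closest few percent. The sketch below assumes profiles are already normalized; the function names, the sample profile values and the rule of always keeping at least one pair are illustrative choices, not details from the presentation.

```python
import itertools
import math

def closeness_list(profiles):
    """Return (distance, student1, student2) for every pair, closest first."""
    pairs = []
    for (s1, p1), (s2, p2) in itertools.combinations(sorted(profiles.items()), 2):
        pairs.append((math.dist(p1, p2), s1, s2))
    return sorted(pairs)

def suspects(ranked, percentile=5.0):
    """Keep the closest `percentile` percent of pairs (at least one pair)."""
    cutoff = max(1, math.ceil(len(ranked) * percentile / 100.0))
    return ranked[:cutoff]

# Hypothetical normalized profiles for four students.
profiles = {
    "alpha": (0.10, 0.20, 0.97),
    "beta":  (0.10, 0.21, 0.97),
    "gamma": (0.30, 0.40, 0.87),
    "delta": (0.60, 0.50, 0.62),
}

ranked = closeness_list(profiles)
flagged = suspects(ranked)   # 6 pairs in total, so one pair is kept
```

With these sample values the single flagged pair is alpha and beta, whose profiles differ only in the second component.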

17 Independence Index Student Independence Indices (Index, student): 1 alpha, 2 beta, 3 gamma, 5 epsilon, 6 sigma, 6 delta, 9 theta Index = position at which student debuts on Closeness List
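The index definition on this slide can be sketched in a few lines. The ranked pair list below is an assumption: it is read off the flattened text of slide 16 by splitting the student names into consecutive pairs, and the function name `independence_indices` is hypothetical.

```python
def independence_indices(ranked_pairs):
    """Map each student to the 1-based rank at which they first appear."""
    debut = {}
    for rank, (s1, s2) in enumerate(ranked_pairs, start=1):
        for student in (s1, s2):
            debut.setdefault(student, rank)   # keep only the first appearance
    return debut

# Pairs as they appear, in order, on the slide 16 closeness list (assumed split).
ranked_pairs = [
    ("alpha", "alpha"), ("alpha", "beta"), ("beta", "gamma"),
    ("alpha", "gamma"), ("gamma", "epsilon"), ("sigma", "delta"),
    ("alpha", "epsilon"), ("beta", "epsilon"), ("gamma", "theta"),
    ("beta", "theta"),
]

indices = independence_indices(ranked_pairs)
```

Under that reading of slide 16, the computed indices agree with the values listed on this slide; a higher index means the student first shows up further down the closeness list.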

18 Preponderance of Evidence Historical Record of Student Behavior – Collaboration/partnering – Independence indices Profile and analyze other artifacts – Compilation logs – Execution logs

19 Another Approach Make student demonstrate familiarity with submitted program – Seed errors into program – Time limit for removing error and resubmitting Holistic approach – Intentional, not accidental

20 Conclusions We can do something about plagiarism -- the first step is to develop eyes and ears Simple metrics appear to be adequate Tools are essential Sophistication is not as necessary as automation Students are curious to know how they compare with other students

21 On-Going & Future Work Complete the toolset – Student Independence Index Incorporate other Artifacts – Compilation logs – Execution logs Integrate into Automated Grading Disseminate Results – Package tool as shareware

22 Questions?

23 Thank You

24 Flow Chart Student Programs ==> Profile ==> Compute Closeness ==> Suspicious Programs

