G.O. Wesolowsky Statistical Detection of Cheating on Multiple Choice Exams: Software, Implementation, and Controversy George O. Wesolowsky Professor Emeritus.

Presentation transcript:

Statistical Detection of Cheating on Multiple Choice Exams: Software, Implementation, and Controversy. George O. Wesolowsky, Professor Emeritus of Management Science, DeGroote School of Business, McMaster University, Hamilton, Ontario. Essentials: bubbling in or selecting choices; unauthorized communication, one-way or two-way. Commonly known as copying, which is cheating.

Outline of this Presentation. Introduction: cheating on multiple choice tests. How I got into this. Outline of statistical detection methodology. Practical capabilities of SCheck. Common attitudes to detection and prevention. Recommendations.

Ideal* Writing Conditions. * I have seen more than 30% cheating under such conditions.

Less than Ideal Writing Conditions. http://math.berkeley.edu/~ribet/113/

Prevalence of MC Tests and Exams. Perhaps 30% (?) of marks in undergraduate classes are given through multiple choice. At McGill (20,000 undergraduates) in the Fall Semester of 2002: finals, 83 courses and 15,072 students; midterms, 70+ courses and 14,000 students.

How They Do It: Copying Sampler. Peeking. Passing (papers or whole exams). Signaling: one invigilator told me a student was twitching so much all over his face and hand that he thought at first it was a seizure. Clandestine electronic communication. [Cartoon of a toilet communications base, from the web site of a Scandinavian company selling electronic countermeasures.]

How They Do It: Types of Cheating not Resulting in Similar Responses. [Cartoon caption: “I am an impostor”.] Usually not vulnerable to statistical detection.

A guide to cheating during tests and examinations. From Wikibooks, the open-content textbooks collection: http://en.wikibooks.org/wiki/A_guide_to_cheating_during_tests_and_examinations Contents: 1 Preamble; 1.1 A few definitions to consider; 1.2 The rewards/dangers of cheating; 1.2.1 Rationales of cheating; 1.2.2 Rationales of prosecuting cheaters; 1.2.3 Possible Penalties; 2 General notes; 3 Techniques; 3.1 Copying from a person; 3.1.1 Application of codes; 3.2 Copying from a pre-written source; 3.2.1 Directly from textbook/notes; 3.2.2 Cheat Sheet; 3.3 Precautions; 3.4 Copying from a planted source; 3.5 Locating Cheating Material on the Web; 3.6 Test previewing.

Some Statistics: plagiarized text (CAI). CAI research conducted by Don McCabe (released in June 2005) is typical of many studies: As part of CAI’s Assessment Project, almost 50,000 undergraduates on more than 60 campuses have participated in a nationwide survey of academic integrity since the fall of 2002. The results were disturbing, provocative, and challenging. On most campuses, 70% of students admitted to some cheating. Close to one-quarter of the participating students admitted to serious test cheating in the past year and half admitted to one or more instances of serious cheating on written assignments. Faculty are reluctant to take action against suspected cheaters. In Assessment Project surveys involving almost 10,000 faculty, 44% of those who were aware of student cheating in their course in the last three years have never reported a student for cheating to the appropriate campus authority. Students suggest that cheating is higher in courses where it is well known that faculty members are likely to ignore cheating.

One Method of Cheating Detection. My favorite method.

Questionable Statistical Detection. Instructors confronted by a suspected cheating situation quite often invent their own methodology on the spot. This is usually what I call ‘outlier methodology’. The basis is some way of using the number of wrong answers that two students have in common. It could be simply a count of such 'wrong matches', a proportion, a run length, a ratio with other counts, or a multivariate plot of such variables. The idea is to look for outliers and attribute them to cheating.

Example. Bonnie and Clyde engaged in suspicious behavior. A comparison of responses revealed: “Bonnie and Clyde are surprisingly similar; 23 matches out of 23 wrong.”
.....C......B...........C........BD............D..DB.A..ABD.B..A..CB...B.........ABAC..A..C
Legend: a letter such as C means both chose C, which is wrong; “.” = correct. Source: http://www.astro.washington.edu/fraser/multiple-choice-cheating.html

But then: The instructor wrote a program. “My script just returns any match that has a high percentage of matching errors (and sufficient errors to convince you that some thing's up!)” “Holmes and Watson are surprisingly similar; 8 matches out of 11 wrong.”
..........................D......D....................B.....D.A.......BBA.........BD.B.....
CD..BCC....BB...DC..CA..D.EC.....D......AD...B.DDAB...B..A..A.AA....CDB.AA.C......BDDBE.B..
Intuitive override: “I had found by chance (Bonnie and Clyde), but what about the rest? It's very unlikely that Holmes and Watson were cheating, but I think it's likely that the others were.” This instructor then concluded that statistical detection is not really reliable. Bad statistical detection often discredits the good.

Aside: A Better “quickie” Index. A better, but still not good, simple index is the Harpp-Hogan index, which is the number of wrong matches divided by the number of differences. One is supposed to be suspicious when it is > 1. For Holmes and Watson this works out to 8/32 = 0.25.
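The index is simple enough to sketch in a few lines. The following is a minimal illustration, not the author's software; it assumes the dot notation used on the slides ("." for a correct answer, a letter for the wrong choice selected), and the function name and the two response strings are hypothetical.

```python
def harpp_hogan(resp_a: str, resp_b: str) -> float:
    """Wrong matches divided by differences for two dot-notation strings.

    Assumed convention: '.' marks a correct answer, a letter marks the
    wrong choice that was selected. Spaces between groups are ignored.
    """
    a = resp_a.replace(" ", "")
    b = resp_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("response strings must cover the same questions")
    # A wrong match: both students picked the same wrong answer.
    wrong_matches = sum(1 for x, y in zip(a, b) if x == y and x != ".")
    # A difference: the two students' answers are not the same.
    differences = sum(1 for x, y in zip(a, b) if x != y)
    if differences == 0:
        return float("inf")  # identical papers: the ratio is undefined
    return wrong_matches / differences


# Hypothetical 10-question example (not data from the presentation).
student_1 = ".b.ad..c.."
student_2 = ".b.a...cd."
print(harpp_hogan(student_1, student_2))
```

In this made-up example there are 3 wrong matches and 2 differences, so the index is 1.5, which would count as suspicious under the "> 1" rule of thumb.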

Problems with “Simple” Indices or Combinations Thereof. The value of the indices can depend in an unknown way on class size, number of questions, number of choices, etc. They use very little information. The capability of students and the difficulty of questions are often not incorporated. The risk of “false accusations” is not predictable. Many combinations of indices and plots are possible, and they may point in different directions.

How I Got Into This. Request from an administrator: two students were suspected in another course; how many exactly similar answers did they have in my course? Probability tree diagrams. Checked the literature. Wesolowsky G.O. (2000) "Detecting Excessive Similarity in Answers on Multiple Choice Exams", Journal of Applied Statistics, Vol. 27, 909-921.

[Probability tree for question i. Student j is correct with probability pji, or wrong with probability 1 - pji, in which case a particular wrong answer is chosen with conditional probability w1i, w2i, w3i or w4i. Student k is likewise correct with probability pki, or wrong with probability 1 - pki and then chooses among the wrong answers with the same conditional probabilities. Branches on which both students give the same answer are marked “match”.] A matching answer occurs on a question if both students get the same right answer or the same wrong answer. What is needed are the probabilities of a correct answer for each student, and the conditional probability of each wrong answer. The probability of a match by students j and k on question i is the sum of the match probabilities: sum the product of the probabilities along each of the “match” branches.
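Summing along the “match” branches can be written out directly. The sketch below is one reading of the tree, under the assumption that the two students choose independently and share the same conditional popularities of the wrong answers; the function and argument names are illustrative and are not taken from SCheck.

```python
def match_probability(p_j: float, p_k: float, wrong_popularity: list[float]) -> float:
    """Probability that students j and k give the same answer on one question.

    p_j, p_k: probability that each student answers the question correctly.
    wrong_popularity: conditional probabilities w_1, w_2, ... that a student
    who is wrong picks each particular wrong answer (assumed to sum to 1).
    """
    if abs(sum(wrong_popularity) - 1.0) > 1e-9:
        raise ValueError("conditional wrong-answer probabilities must sum to 1")
    # Branch 1: both students are right.
    both_right = p_j * p_k
    # Remaining match branches: both are wrong and pick the same wrong answer.
    same_wrong = (1.0 - p_j) * (1.0 - p_k) * sum(w * w for w in wrong_popularity)
    return both_right + same_wrong


# Hypothetical question with four wrong choices of unequal popularity.
print(match_probability(0.6, 0.5, [0.5, 0.3, 0.1, 0.1]))
```

A popular wrong answer raises the match probability, which is why the popularity of wrong answers has to enter the model.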

Assumptions. The probability that a student gets an answer right depends on the ability of the student and the difficulty of the question. The probability of a match on wrong questions depends on the ‘popularity’ of wrong answers. Independencies as implicit in the diagram.

But how do we estimate wli and pji?

pji depends on two things: the ability of the student and the difficulty of the question. [Plot with both axes running from 0 to 1: horizontal axis, proportion of the class that answered correctly on question i; one curve for an above average student and one for a below average student.]

Finding aj. cj = proportion of questions answered correctly by student j. 1) aj is the student ability index. 2) The function is borrowed from location theory. 3) For each student, the aj estimate is obtained by making sure that the modeled proportion correct is equal to the proportion correct actually obtained. Find aj by solving [equation not reproduced in the transcript].
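The fitting step can be illustrated with a one-dimensional root search. The function on the slide did not survive in this transcript, so the sketch below assumes, purely for illustration, a curve of the form p^a + (1 - r)^a = 1, where r is the proportion of the class answering the question correctly. That form matches the location-theory remark but is an assumption of this sketch, and all names are hypothetical.

```python
def modeled_p(r: float, a: float) -> float:
    """Assumed illustrative curve: p = (1 - (1 - r)**a) ** (1 / a).

    r is the proportion of the class that answered the question correctly;
    a = 1 gives p = r (an average student), a > 1 gives p > r.
    """
    return (1.0 - (1.0 - r) ** a) ** (1.0 / a)


def modeled_proportion(a: float, r_values: list[float]) -> float:
    """Average modeled probability of a correct answer over all questions."""
    return sum(modeled_p(r, a) for r in r_values) / len(r_values)


def solve_ability(c_j: float, r_values: list[float],
                  lo: float = 0.05, hi: float = 50.0, tol: float = 1e-10) -> float:
    """Bisection for the a_j whose modeled proportion correct equals c_j."""
    if not modeled_proportion(lo, r_values) <= c_j <= modeled_proportion(hi, r_values):
        raise ValueError("c_j is outside the range this sketch can match")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The modeled proportion increases with a, so move the bracket accordingly.
        if modeled_proportion(mid, r_values) < c_j:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical class results for six questions.
r_values = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
a_strong = solve_ability(0.80, r_values)  # student who got 80% right
print(a_strong, modeled_proportion(a_strong, r_values))
```

Bisection is used here only because the assumed curve is monotone in a; any one-dimensional root finder would serve.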

P value for each pair of students = probability of the observed number of matches or more. [Diagram: Question 1, ..., Question j, ..., Question q, with matches marked M.] 1) If the probability of a match on each question were the same, this p-value could be found by the binomial distribution. A Bernoulli process always has the same probability of success. 2) Here, the probability of a success (match on a question) is different for every question. 3) The probability distribution is called the compound binomial distribution, because the probability of a match is different on each question.
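The upper tail of such a distribution can be computed exactly by building up the distribution one question at a time. The sketch below is a generic illustration under the independence assumption; it is not a description of how SCheck arrives at its reported significance, and the function name and example probabilities are hypothetical.

```python
def tail_probability(match_probs: list[float], observed: int) -> float:
    """P(number of matches >= observed) when question i matches with
    probability match_probs[i], independently across questions."""
    dist = [1.0]  # dist[m] = probability of exactly m matches so far
    for p in match_probs:
        nxt = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            nxt[m] += prob * (1.0 - p)   # no match on this question
            nxt[m + 1] += prob * p       # match on this question
        dist = nxt
    return sum(dist[observed:])


# Hypothetical three-question example with unequal match probabilities.
probs = [0.2, 0.5, 0.9]
print(tail_probability(probs, 3))  # all three match: 0.2 * 0.5 * 0.9
```

With equal probabilities the same routine reduces to the ordinary binomial tail, which is a convenient check.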

Example of SCheck Output
** pair = 2 78 ** Harpp-Hogan stat = #wr.mat/#diff = 19.00
##################################################################
Zb = 7.891 'equivalent' z from the BVP model
Significance of Zb on a pre-selected pair = 1.5E-15
Significance bound (Bonferroni) on program selected pairs = 1.3E-11
#matches = 33 | 34 (mu,s)=( 11.410, 2.689)
prop. right for 2 = 0.441 prop. right for 78 = 0.412
Quest. range = [ 1 34 ] #students = 132
----------------------------------------------------------------
.d.abccd.e .e.abedb.. ...da..b.. ea.e
---------------------------------------------------------------
.d.abccdee .e.abedb.. ...da..b.. ea.e
estimated match probabilities:
0.423 0.357 0.360 0.324 0.367 0.377 0.376 0.232 0.285 0.316
0.237 0.236 0.369 0.283 0.249 0.423 0.254 0.321 0.255 0.483
0.696 0.310 0.371 0.238 0.345 0.536 0.258 0.211 0.460 0.290
0.224 0.326 0.388 0.231

Data Dredging. The number of student pairs examined is n(n-1)/2. For 693 students this is 239,778 pairs. This is an oversight in many statistical detection methods. Consider a standardized, normally distributed index of similarity. It might seem that a Z of 4 would be very unusual, and hence suspicious. But a dotplot shows otherwise. With so many pairs, occurrences that are rare in a single draw become common. The level of suspiciousness has to be raised.

“Unusual” Z’s Depend on Class Size
Class size | No. of pairs | P(Zmax > 3) | P(Zmax > 4) | P(Zmax > 5) | P(Zmax > 6)
2 | 1 | 0.00135 | 3.167E-5 | 2.8665E-7 | 9.8659E-10
100 | 4950 | 0.99875 | .145104 | .0014179 | 4.8836E-6
400 | 79800 | 1.00000 | .920134 | .0226151 | 7.87297E-5
1000 | 499500 | | | .1334040 | .0004928
5000 | 12497500 | | | .9721919 | .0122542
Analogy. Suppose the Z’s are independent. The probability of seeing a Z > 5 for a single pair is about 3 in 10 million. But in a class of 5000, the probability that at least one pair shows a Z > 5 is .97.
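Under the independence analogy, the table entries follow from P(Zmax > z) = 1 - (1 - p)^N, where p is the single-pair upper-tail probability of a standard normal and N = n(n-1)/2. A short sketch using only the Python standard library (the function names are illustrative):

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def prob_max_exceeds(class_size: int, z: float) -> float:
    """P(at least one of the n(n-1)/2 pairs has Z > z), assuming the
    pairwise Z values are independent standard normals."""
    pairs = class_size * (class_size - 1) // 2
    p = upper_tail(z)
    # 1 - (1 - p)**pairs, written so that tiny p does not lose precision.
    return -math.expm1(pairs * math.log1p(-p))


for n in (2, 100, 400, 1000, 5000):
    pairs = n * (n - 1) // 2
    row = "  ".join(f"{prob_max_exceeds(n, z):.4g}" for z in (3, 4, 5, 6))
    print(f"{n:>5} {pairs:>9}  {row}")
```

The independence assumption is the same simplification as in the analogy; the pairwise Z values in a real class share students and so are not strictly independent.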

Multiply the P-value by n(n-1)/2. In the SCheck output for pair 2 and 78 (132 students), the significance of 1.5E-15 for a pre-selected pair becomes a Bonferroni significance bound of 1.3E-11 for a program-selected pair. A similarity this unusual will occur at most 1.3 times, on the average, per 100 billion classes.
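The arithmetic of the bound is a single multiplication, capped at 1. A minimal sketch, with a hypothetical function name, applied to the figures printed for pair 2 and 78:

```python
def bonferroni_bound(p_single: float, n_students: int) -> float:
    """Upper bound on the chance that any innocent pair in the class
    reaches the given single-pair significance."""
    pairs = n_students * (n_students - 1) // 2
    return min(1.0, p_single * pairs)


# Pair 2 and 78: a priori significance 1.5E-15 in a class of 132 students.
print(bonferroni_bound(1.5e-15, 132))  # about 1.3e-11
```

The bound makes no independence assumption, which is why it is stated as "at most".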

Important! The significance (probability that a similarity that high will occur for an innocent pair) is different for a pair that is pre-selected by, say, suspicious behavior, from that of a pair that was selected purely by the program. In other words, the former case does not need as high a level of similarity evidence. SCheck, therefore, allows pre-selected pairs to be forced into the analysis.

Features of SCheck. Developed from experience with large scale testing, research into cheating psychology, tribunal cases, different data formats, etc. Since 1998, quite a few features were added. Important ones are: New: two Type I methods for setting cutoffs; adjustment for “speed tests”; interactive or stored option choice; batch processing of multiple files; optional Excel grades output; files with all components necessary for verification of calculations; diagnostic graph; optional fine tuning (T parameter); compact and intuitive question diagnostics; utility programs (format translators); up to 30000 students; up to 200 questions; up to 27 choices, numbers or letters; true or false or multiple choice in any combination; selection of a contiguous block of questions; option for pre-selected student pairs; option for similarity scores for all students; options for removing student identification from input and output files.

You can choose a file, and then just hit the <Enter> key until output appears.

1) Many choices, which could also be placed in a single file to avoid the interactive mode.

This box and the previous one allow selection of a block of questions. Useful if, say, some questions only gather information.

This forces suspect pairs into the output for analysis.

1) The students pre-chosen don’t have to be adjacent.

This means that a false positive should occur in fewer than 1 in 100 classes or runs.

A marginal improvement in the model.

[Diagnostic graph.] The vertical red line indicates the similarity cutoff; its position depends on class size. Straightness = normality. Slope indicates the standard deviation of the Z’s. An innocent class is symmetrical within the lines. This is a typical class with no identified cheaters: one possibly suspicious outlier, but outliers can happen by chance.

Forced pairs in NAM file. This pair was forced in as an illustration; the program would not have selected it otherwise. It has a negative Z and is hence not suspicious. Students are identified.

Forced pairs in OUT file. Students are not identified, but there is more information.

Diagnostics on questions. 1) The program gives all the information necessary to manually check the model. 2) It also has tables to check whether answer keys are correct, questions are not flawed, etc. 3) This is not item analysis, but I think it is more useful.

It’s Cheating Time. Finally, what if there were dark deeds done during the test?

Detected Pairs. Summary of significances of identified pairs:
Pair | Z | A priori signif. | Bonferroni signif.
2, 78 | 7.891 | 1.5E-15 | 1.3E-11
2, 97 | 7.428 | 5.5E-14 | 4.8E-10
36, 69 | 6.253 | 2.0E-10 | 1.7E-6
36, 70 | 4.755 | 9.9E-7 | 8.5E-3
36, 72 | 5.514 | 1.8E-8 | 1.5E-4
60, 119 | 4.931 | 4.1E-7 | 3.5E-3
69, 70 | 6.527 | 3.3E-11 | 2.9E-7
69, 72 | 5.474 | 2.2E-8 | 1.9E-4
70, 72 | 6.527 | 3.3E-11 | 2.9E-7
78, 97 | 7.067 | 7.9E-13 | 6.9E-9
All pairs were found in adjacent seating. But some individuals have similarity links to more than one person!

[Diagram of similarity links among students 36, 69, 70, 72, 2, 78, 97, 60 and 119.] Teamwork: these students were all in adjacent seating and proximity copying was possible. 132 students.
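Groups of linked students can be recovered from the list of detected pairs by treating each pair as an edge and collecting connected components. The sketch below does this for the ten pairs in the Detected Pairs summary; it is an illustration of the idea, not a feature attributed to SCheck, and the function name is hypothetical.

```python
from collections import defaultdict


def linked_groups(pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Group students that are connected through chains of detected pairs."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over everyone reachable from this student.
        stack = [start]
        seen.add(start)
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


# The ten pairs listed in the Detected Pairs summary.
detected = [(2, 78), (2, 97), (36, 69), (36, 70), (36, 72),
            (60, 119), (69, 70), (69, 72), (70, 72), (78, 97)]
print(linked_groups(detected))
# [[2, 78, 97], [36, 69, 70, 72], [60, 119]]
```

The ten pairs collapse into three groups involving nine students, which is what a seating chart can then be checked against.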

Have you ever seen anything like it? (Contact at a testing agency.) When I saw something like this previously, the question was a comment: “There is something wrong with your program”

Pair Z | Pair Z | Pair Z | Pair Z
16, 39 6.453 | 16, 41 5.453 | 16, 42 6.090 | 16, 44 5.051
16, 46 5.089 | 16, 50 5.074 | 16, 65 5.837 | 39, 41 7.385
39, 42 8.178 | 39, 43 6.196 | 39, 44 6.061 | 39, 46 6.916
39, 50 6.878 | 39, 57 5.662 | 39, 64 5.896 | 39, 65 7.887
39, 69 5.515 | 41, 42 6.958 | 41, 43 5.515 | 41, 44 5.043
41, 46 6.196 | 41, 50 5.811 | 41, 57 5.011 | 41, 64 4.907
41, 65 6.745 | 42, 43 5.786 | 42, 44 5.614 | 42, 46 7.236
42, 50 7.190 | 42, 57 5.293 | 42, 64 6.183 | 42, 65 7.458
42, 69 5.786 | 43, 46 5.386 | 43, 64 5.118 | 43, 65 5.660
44, 46 5.176 | 44, 50 5.083 | 44, 64 4.941 | 44, 65 5.590
46, 50 6.013 | 46, 57 4.933 | 46, 64 5.786 | 46, 65 6.337
46, 69 5.055 | 49, 65 4.964 | 50, 57 4.900 | 50, 64 5.737
50, 65 6.314 | 57, 65 5.106 | 64, 65 6.021 | 65, 69 5.660
82, 87 8.456 | 82, 109 7.750 | 82, 110 8.945 | 86, 92 6.381
87, 109 7.375 | 87, 110 8.456 | 91, 93 5.896 | 91, 100 7.385
91, 107 7.385 | 93, 100 6.592 | 93, 107 6.592 | 100, 107 8.716
109, 110 7.750 | 113, 117 5.328 | 113, 119 6.013 | 113, 124 5.386
113, 137 5.515 | 117, 119 7.062 | 118, 138 8.415 | 120, 122 6.319
120, 124 5.257 | 120, 137 7.054 | 120, 141 5.345 | 122, 137 5.664
122, 141 7.419 | 124, 137 5.185
Students seem to have a lot of partners, to whom they are linked with high similarities.

Really enthusiastic teamwork! [Diagram: network of similarity links among the suspected students.] 35 suspects, 79 pairs, 200 students. 1) How did “the firm” manage this? 2) I asked and was told this was sensitive and confidential, so I will never know. 3) The student numbers have adjacency tendencies, so it could be proximity copying, as opposed to electronics, but how?

A NEW OPTION
SCheck Version 8a7. #### 03-05-2009 11:21:34
Bonferroni program selected significance bound is 0.01
_____________________________________________________________
** pair = 48 188 ** Harpp-Hogan stat = #wr.mat/#diff = 2.12
##################################################################
Zb = 5.211 'equivalent' z from the BVP model
Significance of Zb on a pre-selected pair = 9.4E-8
Approximate significance of program selected pair = 3.7E-5
Signif. bound (Bonferroni) on program selected pairs = 5.6E-3
#matches = 42 | 50 (mu,s)=( 24.832, 3.318)
prop. right for 48 = 0.540 prop. right for 188 = 0.580
Quest. range = [ 1 50 ] NRT = 1.00 #students = 348
----------------------------------------------------------------
.b.b...aa. a.ba..baa. ..cb..b.c. .c.e.aad.e .....b.a.b
---------------------------------------------------------------
.b.b...aa. a.ba..baa. ..cbb.b.c. .c...d..ce .....b...a
estimated match probabilities:
0.502 0.542 0.530 0.616 0.558 0.679 0.571 0.499 0.566 0.747
0.568 0.613 0.499 0.536 0.635 0.552 0.652 0.577 0.639 0.609
1.000 0.597 0.253 0.434 0.278 0.555 0.383 0.405 0.524 0.422
0.674 0.297 0.289 0.231 0.429 0.225 0.211 0.661 0.453 0.270
0.825 0.219 0.585 0.705 0.433 0.257 0.562 0.217 0.451 0.295

History of Applications. Early studies for economics department. For individual instructors. Large assessment organizations. National education departments. “This is the only instance where separate examination venues have shown up and I have used your analysis on probably about 30,000 candidates.”

Dallas Morning News Study on Cheating on the TAKS test (June 2007), an Application of SCheck. DMN: “The test scores of more than 50,000 students show evidence of cheating. Some of those students were the innocent victims of others copying their answers. But experts say most were likely either deliberately copying answers or had their answer sheets doctored by school staff.” TEA response: “Officials at the Texas Education Agency have consistently argued that statistical analysis can't prove cheating and that they must rely on other forms of evidence – like getting teachers to confess to misbehavior – in their investigations. TEA decided not to use data drawn from student answer sheets – even with evidence of widespread copying in a classroom.”

Common Objections: we studied together, we are from a similar background, we are twins, etc. Some direct studies have been done. Note that a huge number of pairs is being looked at. If these objections were valid, we would expect a large number of highly similar pairs that couldn’t have cheated. We would also expect the high similarity rate to continue after prevention is implemented.

Both expectations have been proven false in thousands of data sets. Prior to electronic cheating methods, no very high similarities lacking the opportunity to cheat (adjacent seating) were found. Multiple versions of exams cause a drastic decrease in the cheating rate.

Pitfalls in interpretation. High marks make detection difficult*. Non-responses can invalidate the model. Cheating must be substantial for detection. Too much cheating can invalidate the model. Hierarchical questions violate assumptions. Data sets may be too small. *In the extreme, if one student copies from another with a perfect test, there is no way statistical detection can distinguish this case from the case of two brilliant students with perfect papers. SCheck knows this.

Effective Prevention Measures. Multiple versions. Randomized or assigned seating. Seat spacing. Invigilation. Electronic countermeasures. “Education on integrity” and ethics indoctrination.

December 2002 exams, McGill University. Proper prevention measures are in place. Finals: 2. Midterms: 17.

End Part 1

Some Controversial Personal Opinions. Multiple choice tests provide a substantial component of grades in undergraduate courses. Statistical detection of copying or collusion on such tests has proven to be quite successful, and has in turn demonstrated that simple and non-intrusive prevention measures are very effective. Why then is this subject conspicuously absent in most discussions on cheating prevention strategies?

The most common general attitudes of student leaders, instructors, and administrators towards cheating are remarkably similar: See no Evil, Hear no Evil, Speak no Evil. Genuine consensus.

Comments. A “look the other way” strategy is actually the best one if the goal is to protect the integrity reputation of a university. Why? Media assume that the cheating problem is proportional to the number of reported cases or the amount of discussion about cheating. Keeping quiet, therefore, creates the impression that there is no problem. A university that tries to do something about cheating often gets a reputation for being infested with cheating. (Otherwise, why would they talk about it?) No good deed goes unpunished.

Instructor Attitudes. Instructor: Not on my tests you won’t!

#1. Believe in justice? Feel personal betrayal? Not very many of these, and so their effect on the system is relatively minor.

#2

Instructor: I’m not a Policeman or Prison Guard, I’m an Educator and Researcher. Translation: I am not going to charge any students with cheating or take any special prevention measures. This sounds idealistic and has the added benefit of saving a lot of work and avoiding unpleasant confrontations. 1) Find many more of these. 2) Variations, such as “I’m too busy with research”.

G.O. Wesolowsky Comments Implicit assumption: The university’s main function is to provide an education, and grades are merely an unimportant side-issue. I disagree: The main product of a university is grades, degrees, and diplomas. Without these, even if it continued to give a good education, the university would be out of business overnight. On the other hand, degrees without education (diploma mills) are a growing phenomenon. http://chronicle.com/free/v50/i42/42a00901.htm The assumption behind a degree is that a required level of academic achievement has been met. Grades and degrees without this level of achievement are defective. By granting degrees based on defective grades, the university gives the public a defective product.

G.O. Wesolowsky Postscript: Education versus Grades By JANE ARMSTRONG Friday, January 27, 2006 Posted at 5:20 AM EST From Friday's Globe and Mail “Professor David Weale called it a "January clearance" -- and clear out they did. Dismayed by his crowded classroom, the history teacher at the University of Prince Edward Island offered his students a deal some couldn't resist: Drop this Christianity class and you'll get a B minus.” Rule of 20? “The offer, also dubbed the "Weale deal" worked. The next week, about 20 of the 95 students were gone. So too is Prof. Weale after shocked administrators caught wind of the unorthodox academic transaction.”

G.O. Wesolowsky Addendum “Vice-president Gary Bradshaw said the school had no choice but to suspend Prof. Weale while a disciplinary probe begins. Offering students a credit without doing the work "strikes at the very heart of the academic principles," Mr. Bradshaw said.” The next week, he sweetened the offer, saying he'd give students who left a mark of 68. Students mulled it over during the break and negotiated the deal up to 70, which is a B minus. Departing students were required to send Prof. Weale an e-mail and pay the $450 for the course.

G.O. Wesolowsky Student Leaders and Instructors

G.O. Wesolowsky Student Leaders and Instructors: A Matter of Trust* Student Leader: “If you take prevention measures you show you don’t trust us and that cheating is expected. This will only increase cheating. Anyway, cheating is very rare.” Translation: My constituency would feel threatened. Instructor: “Cheating prevention and detection poisons the atmosphere and destroys the vital rapport and atmosphere of trust I have with my students.” Translation: My teaching evaluations would go down. *paraphrased

G.O. Wesolowsky Administration

G.O. Wesolowsky Administration: The Best Solution is ‘Ethics Reengineering’ “E-tegrity board members, … have developed a range of new initiatives to integrate integrity into the college’s culture.” “… suggest a more broadly focused approach that creates an educational community valuing academic integrity and focusing on the moral and ethical development of students” Sounds good and plays well in the media. Often arises out of bad publicity on cheating incidents. 1) Will it work if it is seriously implemented? McCabe versus the economists http://www.arts.ubc.ca/Why_Students_Cheat.116.0.html 2) Will it ever be implemented beyond the talk level? Web pages, articles in school newsletters, “integrity officers”, inviting McCabe. Usual outcomes: a) effort fades away b) majority of students and instructors are not involved. 3) Contrast this with prevention measures on MC tests, which have been shown to virtually eliminate cheating. The re-engineering initiatives usually crop up after some bad publicity, local or national. Economists: Cheating depends on opportunity, payoffs, and risks. As Reagan would say, “there they go again.” Psychologists: linking cheating to psychopathic tendencies. One study uses my software. 2) In the case of business schools it could be because of misbehavior in corporations. 3) The great majority of students are unaffected.

G.O. Wesolowsky Comments on Honor Codes

G.O. Wesolowsky Honor Codes* Traditional Honor Codes: pledge and ceremonies; proctoring of tests not allowed; students in charge of tribunals; students obliged to report infractions (rarely enforced). Modified Honor Codes: proctoring allowed. *McCabe

G.O. Wesolowsky Quote on Honor Codes In this same study that found over 75 percent of students admitting to cheating, McCabe saw that only 57 percent of students cheated at schools with honor codes. Chronic cheating also seems to reduce with honor codes. At schools without codes, 1 in 5 students admits to cheating more than twice. Only 1 in 16 students admits to the same offense at schools with honor codes. "I think it's a question of making your students understand that academic integrity is important to the school," McCabe said. "Just the fact that it's being discussed" can heighten students' awareness and reduce cheating, he concluded. Administrator-friendly comment.

G.O. Wesolowsky Do Honor Codes Work? Comments “Only” 57% Response bias: ‘Honor’ rhetoric leads to fewer admissions of cheating? Fear of Draconian penalties? Control for other variables: types of exams, subject matter, etc. Low response rates → non-response bias “it is clear the response rate is below desired levels, averaging between 10% and 15% and varying from as little as 5% to 10% on some large campuses to over 50% on a limited number of small, residential campuses” http://www.ojs.unisa.edu.au/journals/index.php/IJEI/article/viewFile/14/9
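
The non-response bias point can be illustrated with a small sketch. It is not taken from the slides or from McCabe's data: every number below is hypothetical, and it assumes that everyone who does respond answers truthfully. It shows only that, at response rates near 10% to 15%, two groups of schools with the same true cheating rate can report very different rates if cheaters at honor-code schools are less willing to return the survey.

```python
def observed_admission_rate(true_cheat_rate, respond_if_cheater, respond_if_honest):
    """Fraction of survey respondents who are cheaters.

    All three inputs are hypothetical probabilities in [0, 1].
    Assumes every respondent answers truthfully, so the only
    distortion modelled here is who chooses to respond.
    """
    cheaters = true_cheat_rate * respond_if_cheater
    honest = (1 - true_cheat_rate) * respond_if_honest
    return cheaters / (cheaters + honest)


def overall_response_rate(true_cheat_rate, respond_if_cheater, respond_if_honest):
    """Fraction of all students who return the survey."""
    return (true_cheat_rate * respond_if_cheater
            + (1 - true_cheat_rate) * respond_if_honest)


# Hypothetical: the SAME true cheating rate at both kinds of schools.
TRUE_RATE = 0.75

# No honor code: cheaters and non-cheaters respond equally often (12%).
no_code = observed_admission_rate(TRUE_RATE, 0.12, 0.12)

# Honor code: cheaters respond less often (8%), non-cheaters more (18%).
honor_code = observed_admission_rate(TRUE_RATE, 0.08, 0.18)
honor_code_response = overall_response_rate(TRUE_RATE, 0.08, 0.18)

print(f"no code:    observed rate {no_code:.3f}")
print(f"honor code: observed rate {honor_code:.3f}, "
      f"response rate {honor_code_response:.3f}")
```

With these made-up response probabilities the honor-code survey reports roughly 57% while the true rate is 75% in both groups. This does not show that the reported difference is spurious, only that low response rates leave it unverifiable.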

G.O. Wesolowsky Advantages of Honor Codes Media friendly. Reduce the number of reported cases of dishonesty. Disadvantages of Honor Codes?

G.O. Wesolowsky Recommendations* Make statistical testing (even without identifying the students involved) a non-optional part of the scanning report. Institute mandatory or strongly “encouraged” prevention measures: multiple versions, assigned seating, electronic counter-measures, etc. As a last and least important step, use statistical testing to support charges of academic dishonesty. In other words, use it this way only after the cheating is cleaned up. Observation: Statistical evidence is very unlikely to be sufficient by itself. Other evidence, such as a proctor’s observations, will be required to make charges stick. * Plan is plagiarized from McGill

G.O. Wesolowsky End