Overview of Caveon Data Forensics


1 Overview of Caveon Data Forensics
Steve Addicott and Dennis Maynes, March 2011

2 Outline of Presentation
Overview of Caveon Data Forensics (DF) Process
FDOE DF Goals
DF Tools & Methods
Spring 2011 FCAT DF Program
  Conservative thresholds
  Students & Schools
Summary
Q&A

3 Caveon Test Security… “The 3Ps”
Proven: Best in class, serving the largest test programs in the world
Practical: Identify unusual test taking and respond appropriately
Protection: Integrity of exams, DOE reputation, students' opportunities

4 Caveon Data Forensics™ Process
Analyses of test data
First, build a "model" of typical question responses
Then identify unusual behaviors with the potential for unfair advantage

5 Caveon Data Forensics Process (cont)
Examples of "Unusual" Behavior:
Very high agreement among pairs or groups of test takers (that is, the same answers)
Very unusual number of erasures, particularly wrong-to-right
Very substantial gains or losses from one occasion to another

6 Overview of the Use of Data Forensics
Many high-stakes testing programs now use Data Forensics
Standards for testing, e.g., CCSSO's "Operational Best Practices for State Assessment Programs"
Essential to act on the results
This is the wave of the future

7 FDOE Data Forensics Goals
Uphold fairness and validity of test results
Identify risks and irregularities
Take action based on data and analysis ("Measure and Manage")
Communicate zero tolerance for cheating

8 Testing Examiner’s Role
Ensure (and then certify) that the test administration is fair and proper
Declare scores invalid when fairness and validity are negatively impacted
The decision depends on fairness and validity, not on whether an individual cheated

9 Forensic Tools and Methods
Similarity: answer copying, collusion
Erasures: tampering
Gains: pre-knowledge, coaching
Aberrance: tampering, exposure
Identical tests: collusion
Perfect tests: answer key loss

10 Similarity: Our Most Powerful & "Credible" Statistic
Measures the degree of similarity between 2 or more test instances
Analyzes each test instance against all other test instances in the test
Probable causes of extremely high similarity:
  Answer copying
  Test coaching
  Proxy test taking
  Collusion
In short, non-independent test taking: students are not doing their own work, or some variation of it (a rough sketch of the idea follows).
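
The exact similarity statistic Caveon uses is proprietary, but the general idea can be sketched. The snippet below (Python; every function name is hypothetical) estimates, for each question, how often two independent students would happen to give the same answer, then asks how surprising a pair's observed number of matching answers is under that independence model; pairs whose agreement has a tail probability below roughly 1 in a trillion would be flagged. This is an illustration under stated assumptions, not the operational algorithm.

    from itertools import combinations
    from collections import Counter

    def match_probs(responses):
        # Per-item probability that two independent examinees give the same answer,
        # estimated from the observed option frequencies.
        n = len(responses)
        probs = []
        for i in range(len(responses[0])):
            counts = Counter(sheet[i] for sheet in responses)
            probs.append(sum((c / n) ** 2 for c in counts.values()))
        return probs

    def tail_prob(p_list, observed):
        # Exact P(matches >= observed) for independent items (a Poisson-binomial tail),
        # built up one item at a time by dynamic programming.
        dist = [1.0]                              # dist[k] = P(exactly k matches so far)
        for p in p_list:
            new = [0.0] * (len(dist) + 1)
            for k, prob in enumerate(dist):
                new[k] += prob * (1 - p)
                new[k + 1] += prob * p
            dist = new
        return sum(dist[observed:])

    def flag_similar_pairs(responses, threshold=1e-12):
        # Flag pairs whose agreement is less likely than `threshold` under independence.
        p_list = match_probs(responses)
        flagged = []
        for a, b in combinations(range(len(responses)), 2):
            matches = sum(x == y for x, y in zip(responses[a], responses[b]))
            if tail_prob(p_list, matches) < threshold:
                flagged.append((a, b, matches))
        return flagged

Comparing every test against every other is quadratic in the number of examinees, so a statewide run would need blocking (for example, by school and testing room) or a more scalable statistic.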

11 Erasures: Based on estimated answer-changing rates for:
Wrong-to-Right (WtR)
Anything-to-Wrong
Find answer sheets with unusual numbers of WtR answer changes
Extreme statistical outliers could involve tampering, "panic cheating", etc.
Large numbers of wrong-to-right changes may indicate tampering or copying; changes from anything to wrong are counter-evidence (a rough flagging sketch follows).
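
As a hedged illustration of how a wrong-to-right screen might work, the sketch below (Python; function name hypothetical) compares each sheet's WtR erasure count to a statewide baseline and flags extreme upper-tail outliers. The Poisson baseline and the cutoff are assumptions made for this example, not Caveon's model.

    from statistics import mean
    from scipy.stats import poisson

    def flag_wtr_outliers(wtr_counts, p_cutoff=1e-6):
        # wtr_counts[i] = number of wrong-to-right erasures detected on answer sheet i.
        mu = mean(wtr_counts)                    # baseline WtR erasure rate per sheet
        flagged = []
        for sheet_id, k in enumerate(wtr_counts):
            p = poisson.sf(k - 1, mu)            # P(X >= k) under the baseline rate
            if p < p_cutoff:
                flagged.append((sheet_id, k, p))
        return flagged

Tallying anything-to-wrong changes alongside WtR supplies the counter-evidence the slide mentions: ordinary second-guessing produces both kinds of changes, while tampering produces almost exclusively wrong-to-right.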

12 Unusual Gains/Losses Predict score using prior year info.
Measure large score increases/decreases against the predicted score
Which score truly reflects the student's actual ability or competence?
Extreme gains/losses may result from:
  Pre-knowledge, i.e., "Drill It and Kill It"
  Coaching
  Student development (e.g., improved visual acuity)
Note on the last bullet: it is possible that some unusual statistical results have a reasonable explanation (an illustrative sketch follows).
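
One simple way to operationalize "predict score using prior-year info" is an ordinary regression of this year's score on last year's, flagging students whose residual is extreme. The sketch below is illustrative only; the regression form and the 4-sigma cutoff are assumptions, not FDOE's or Caveon's method.

    import numpy as np

    def flag_unusual_gains(prior_scores, current_scores, z_cutoff=4.0):
        prior = np.asarray(prior_scores, dtype=float)
        current = np.asarray(current_scores, dtype=float)
        slope, intercept = np.polyfit(prior, current, 1)    # predict current score from prior year
        residuals = current - (slope * prior + intercept)
        z = (residuals - residuals.mean()) / residuals.std()
        return np.flatnonzero(np.abs(z) > z_cutoff)          # indices of extreme gains or losses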

13 Spring FCAT Data Forensics
Focus on two groups:
  Student-level
  School-level
Utilize VERY conservative thresholds

14 A quick discussion of conservative thresholds….
Redskins winning the 2011 Super Bowl: 1 in 50
Chance of being hit by lightning: 1 in a million
Chance of winning the lottery: 1 in 10 million
Chance of a DNA false positive: 1 in 30 million
Chance of flagged tests having been taken independently: 1 in a TRILLION
A false positive is an incorrect match: the samples matched, but they came from different people.
"Independent" means the students did their own work.
The flagging threshold is roughly 30,000 times less likely than a DNA false match, far outside anything we would anticipate seeing by chance (a quick numeric check follows).
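
The "30,000 times" comparison in the notes is simple arithmetic:

    dna_false_positive = 1 / 30_000_000         # 1 in 30 million
    flag_threshold = 1 / 1_000_000_000_000      # 1 in a trillion
    print(dna_false_positive / flag_threshold)  # ~33,333, i.e. roughly 30,000 times less likely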

15 Student-level Analysis
Similarity Analysis only (our most credible statistic)
Chance of tests being so similar, yet taken independently: 1 in a trillion
Invalidate test scores beyond the 1-in-10^12 threshold
Fairness and validity of the test instance must be questioned
Appeals process to be implemented

16 Example of Flagged Examinees

17 Example: 9th Grade Math Cluster
Identifies apparent student collusion
Definitions:
  "Dominant" = the same answer selected by the majority of group members
  "Non-dominant" = an answer different from the one selected by the majority of the group
Example: 2 students who passed, but not independently, i.e., they did not do their own work (a toy illustration follows).
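
A toy version of the dominant-answer bookkeeping (Python; hypothetical helper, not Caveon's actual cluster analysis) tallies, for each item, the answer chosen by the majority of the cluster and each student's agreement with it:

    from collections import Counter

    def dominant_answer_profile(cluster_responses):
        # cluster_responses: one answer string per student in the suspect cluster.
        n_items = len(cluster_responses[0])
        dominant = []
        for i in range(n_items):
            counts = Counter(sheet[i] for sheet in cluster_responses)
            dominant.append(counts.most_common(1)[0][0])    # majority (modal) answer
        agreement = [
            sum(sheet[i] == dominant[i] for i in range(n_items)) / n_items
            for sheet in cluster_responses
        ]
        return dominant, agreement

    # Three students, five items; the third diverges from the group on two items.
    dom, agree = dominant_answer_profile(["ABCDA", "ABCDA", "ABCEB"])
    print(dom)     # ['A', 'B', 'C', 'D', 'A']
    print(agree)   # [1.0, 1.0, 0.6]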


19 Impact of “1 in a Trillion” Threshold, Math & Reading 2010
Grade    N          Flagged Students
3rd      408,317    144
4th      394,039    103
5th      390,714    92
6th      387,502    224
7th      393,401    245
8th      387,190    69
9th      401,046    622
10th     360,176    57
Totals   3,122,385  1,556
The 3rd-grade flag rate is about 4/100ths of a percent; 9th grade is about 2/10ths of a percent (see the quick check below).
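
A quick check of the flag rates quoted in the notes, plus the overall rate:

    flag_counts = {"3rd": (144, 408_317), "9th": (622, 401_046), "All": (1_556, 3_122_385)}
    for grade, (flagged, tested) in flag_counts.items():
        print(grade, f"{100 * flagged / tested:.3f}%")   # ~0.035%, ~0.155%, ~0.050%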

20 School-Level Analysis
Similarity, Gains, and Erasures
Flagged schools conduct an internal review
Extreme instances may prompt formal investigations and sanctions (a rough roll-up sketch follows)
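
A minimal sketch of rolling student-level flags up to the school level, assuming a binomial comparison against the statewide flag rate; the test and the cutoff are illustrative assumptions, not the actual school-level procedure.

    from scipy.stats import binom

    def flag_schools(school_counts, state_flag_rate, p_cutoff=1e-6):
        # school_counts: {school_id: (n_flagged_tests, n_tests_administered)}
        flagged = []
        for school, (k, n) in school_counts.items():
            p = binom.sf(k - 1, n, state_flag_rate)   # P(X >= k) at the statewide flag rate
            if p < p_cutoff:
                flagged.append((school, k, n, p))
        return flagged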


22 Benefits of Conservative Thresholds
Focus on the most egregious instances
Provides results that are explainable and defensible
Can move to different thresholds later
Easier to manage: walk before we run
Assessment & Accountability Advisory Committee in February

23 Program Results Monitored behavior improves
Invalidations deter cheating

24 Summary Goal: Fair and valid testing for all students
DOE to conduct Data Forensics on FCAT test data
Focus on:
  Individual students: extremely similar tests
  Schools: Similarity, Gains, and Erasures

25 Follow Up Questions? Victoria.Ash@fldoe.org

