Panel 1: Prevention of Irregularities in Academic Testing
Panelists
Audrey Amrein-Beardsley: Arizona State University
Gregory J. Cizek: University of North Carolina - Chapel Hill
James S. Liebman: Columbia University Law School
Scott Norton: Louisiana Department of Education
Degrees of Cheating and the Prevention of Testing Irregularities Audrey Amrein-Beardsley: Arizona State University
Degrees of Cheating and the Prevention of Testing Irregularities
Background
- Phelps (2005) wrote, “If a teacher’s performance is judged, in whole or in part, on the basis of their students’ test results, certainly they are given an incentive to cheat” (p. 49).
- Nichols and Berliner (2007) wrote, “high-stakes testing almost always corrupts the indicators used and the educators judged by such tests.”
- Campbell’s Law (1976): “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
- Sacks (1999) questioned the prevalence of cheating, arguing that it is “more likely rare than common” (p. 126).
- Shepard (1990) argued a decade earlier that cheating “is generally believed to occur in a very tiny percentage of schools (1–3%)” (p. 20).
- More recently, analysts found that one in five elementary and middle schools across the state of Georgia submitted “highly abnormal” answer sheets, with almost 90% of one school’s scores suspect (Fausset, 2010).
- Newspaper articles continue to be published about teachers and administrators accused of cheating on high-stakes tests, yet the actual number of cheating incidents is still unknown.
Theoretical Assertions
- We can assume with confidence that there are many more incidents of cheating on high-stakes tests than are reported in the media.
- We can assume with confidence that this has become increasingly apparent post-NCLB.
- But what is cheating?
- And might we understand this better to prevent such incidents from:
– Causing dramatic irregularities?
– Distorting valid interpretations?
Empirical Approach
- The research study investigated the types of test-related cheating practices that teachers in Arizona (n = 3,000) engaged in, and the degrees to which they did so, on state-mandated, high-stakes, large-scale tests.
- Because these behaviors create irregularities in academic testing, particularly when high-stakes consequences are attached to test outcomes, the goal was to inform testing policies and procedures.
- The study led to a taxonomy of cheating modeled on the legal definitions of first-, second-, and third-degree offenses.
- The taxonomy is useful for understanding and preventing further irregularities in academic testing.
Taxonomy
Cheating in the First Degree - willful and premeditated; the most serious and most worthy of sanctions
– E.g., erasing and changing students’ test answers, filling in bubbles left blank, overtly and covertly providing correct answers on tests, falsifying student ID numbers, excluding/suspending students with poor academic performance
Cheating in the Second Degree - often more subtle, defined more casually, not necessarily premeditated or done with ill intent
– E.g., encouraging students to redo problems or double-check their work, accommodating for “stupid mistakes,” distributing “cheat sheets,” talking students through processes and definitions, giving extra time on tests or time during lunch, recess, and before or after school
Cheating in the Third Degree - caused by indifference, recklessness, or negligence; also referred to as involuntary cheating
– E.g., “teaching to the test” promulgated by test sensitization over time and access to test blueprints, “narrowing of the curriculum” while excluding or postponing other educationally relevant content, creating and hyper-utilizing “clone items,” inordinately focusing on test-taking strategies
These contribute to testing irregularities.
Figure 1. Respondents’ awareness of other(s)’ and self-admitted cheating practices Cheating Behavior Key: a. Erased and changed test answers; b. Gave answers to students; c. Pointed to correct answers; d. Changed student identification numbers in order to eliminate low-scoring students’ results; e. Not tested academically weak students; f. Asked students to be absent during testing days; g. Encouraged others to cheat; h. Encouraged students to redo problems; i. Coached students during the test; j. Created cheat sheets for students to help prepare them for tests; k. Read questions to students when not allowed to do so; l. Gave students extra time on tests; m. Allowed students to work on tests during lunch, recess, or specials areas; n. Allowed students to figure out answers on scratch paper for them to check prior to having students mark answer sheets; o. Made copies of tests in order to teach to the tests; p. Wrote down questions to prepare for following years’ tests; q. Wrote down vocabulary words to teach to following years’ classes; r. Kept test booklets in order to familiarize students with the tests; s. Left materials on the walls knowing it would help students.
What to do?
First Degree Recommendations:
Keep tests secure before and after administrations.
– E.g., prevent exposure, photocopying, item transformation and cloning, year-to-year use of old forms, etc.
Put those least likely to distort or artificially inflate scores in charge of testing procedures.
– E.g., decrease the likelihood of oral emphasis of correct answers, rewording problems, defining key terms, extra time on tests, test manipulation, “cleaning up the tests,” etc.
Administer tests in artificial environments.
– E.g., administration should occur where access to curricular materials, resources, and visuals is most limited
Put policies in place to ensure that the most “undesirable” or lowest-scoring students cannot be exempted or excluded from participating in tests.
– E.g., SPED or ELL exemptions, absences, suspensions
What to do?
Second and Third Degree Recommendations:
Do not become overly dependent on technical solutions.
– E.g., erasure analyses and other technical approaches can only do so much
A healthy testing culture is most important.
– Many do not consider what they are doing, and while one educator might consider some of these practices smart, others might consider them unprofessional or unethical
– Some irregularities occur as part of district or school test preparation and administration policies
Establish an anonymous whistle-blowing system for parents, teachers, and administrators.
– Many believed that when cheating was reported it was often ignored or the consequences were negligible
Come to collective understandings and local policies about professional and ethical test preparation and testing practices.
– Collectively determine whether things like “teaching to the test,” “narrowing the curriculum,” focusing on “bubble kids,” etc. are appropriate
Keep decisions and policies in the best educational interests of the children.
– Particularly in our highest-needs schools
Conclusions
When pressured to do well on high-stakes tests, educators engage in quite clever practices, beyond simply teaching well, to get higher test scores.
Teachers and administrators who engage in such practices are not all unethical or unprofessional.
Remember Campbell’s Law (1976): “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
In this context, these practices distort test scores, cause irregularities, and undermine the validity of conclusions about what students and teachers are doing in America’s schools.
Prevention Strategies to Ensure Test Score Integrity: Shared Responsibilities Gregory J. Cizek: University of North Carolina - Chapel Hill
Prevention Strategies to Ensure Test Score Integrity: Shared Responsibilities * Students * Educators * Test Developers/Publishers * Professional Associations
Prevention of Irregularities in Academic Testing: Shared Responsibilities Students * embrace personal, community academic integrity * learning vs. performance goals * report concerns
Prevention of Irregularities in Academic Testing: Shared Responsibilities Educators * training in assessment; qualification * conscientious re: assessment purposes, administration conditions, procedures, duties * responsibilities for review/monitoring for progress, irregularities * disseminate procedures and actions * proctoring assignments
Prevention of Irregularities in Academic Testing: Shared Responsibilities Test Providers / Contractors (Broader Initiatives) * clear definition of cheating * clear, educator-referenced materials * web-based qualification utility, database? * less corruptible formats * CBT/CAT delivery
Prevention of Irregularities in Academic Testing: Shared Responsibilities Test Providers / Contractors (Narrower/Technical Initiatives) * seating charts * batch headers * chain of custody * just in time delivery * NDAs, signed statements (with penalties) * purposeful test design * consultation on detection methods
Prevention of Irregularities in Academic Testing: Shared Responsibilities Policy Development * National Council on Measurement in Education (NCME) model policy draft
Prevention of Irregularities in Academic Testing: Shared Responsibilities Concluding thoughts * Should we be surprised? * Should we be shocked? * Wrong reactions * Best approach is prevention * Plenty that can be done * Shared responsibilities
An SEA Perspective: Test Security Policy and Implementation Issues Scott Norton: Louisiana Department of Education
SEA Policy Development
In Louisiana, the state:
– Establishes and defines state test security policy (since 1998)
– Requires districts to submit their district test security policies to the state
– Requires Oath of Security for all students, test administrators, proctors, school test coordinators, and principals
– Establishes procedures for management of secure test materials
– Allows the State Superintendent to cancel suspect test scores
– Requires several types of analyses
  * Erasure analysis procedures
  * Suspected violations in written responses (constructed responses, short answers, and essays)
– Establishes procedures to deal with breaches of test security
  * Violation by student as observed by test administrator
  * Reported violations by school personnel
  * Suspected violations discovered by scoring contractors
Test Monitoring
LDOE conducts on-site monitoring during spring and summer testing. Training is provided for monitors, and follow-up is required when findings are noted. Any school with a record of prior test security problems is usually monitored; the remaining visits are scheduled at random. Of about 1,400 public schools, the LDOE monitors about schools each spring.
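The selection rule on this slide (schools with prior problems first, remaining visits at random) could be sketched roughly as follows. This is an illustration only, not LDOE's actual procedure: the function name `select_schools_to_monitor`, its inputs, and the `n_visits` parameter are hypothetical, and the sketch assumes that schools with prior problems are always visited, where the slide says "usually."

```python
import random

def select_schools_to_monitor(all_schools, prior_problem_schools, n_visits, seed=None):
    """Pick schools for on-site monitoring (hypothetical helper).

    Schools with a record of prior test security problems are included first
    (assumption: always, not just "usually"); any remaining visits are drawn
    at random from the other schools. If there are more priority schools than
    n_visits, all priority schools are still returned.
    """
    rng = random.Random(seed)  # seed only makes the draw reproducible
    priority = [s for s in all_schools if s in prior_problem_schools]
    others = [s for s in all_schools if s not in prior_problem_schools]
    remaining = max(0, n_visits - len(priority))
    return priority + rng.sample(others, min(remaining, len(others)))
```

Keeping the random draw separate from the priority list makes it easy to document afterward which visits were targeted and which were random.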
Procedures for Plagiarism Vendors produce information regarding written responses that indicate – Use of unauthorized materials – Copying another student’s response – Possible teacher interference Independent reviews are conducted by professional assessment and related-content personnel at the department to determine if there is sufficient evidence to void the tests.
Erasure Analysis
Vendors scan every answer document for wrong-to-right erasures, and the state average and standard deviation are computed for each subject at each grade level.
Students whose wrong-to-right erasures exceed the state average by more than four standard deviations are identified for further investigation.
For each student with excessive erasures, the proportion of wrong-to-right erasures to the total number of erasures is considered.
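The flagging rule described on this slide can be sketched in Python. This is a minimal illustration, not the vendor's or the state's actual procedure: the function name `flag_excessive_erasures`, the record layout, and the use of the population standard deviation are assumptions. Only the four-standard-deviation threshold and the wrong-to-right proportion come from the slide.

```python
from statistics import mean, pstdev

def flag_excessive_erasures(records, threshold_sd=4.0):
    """Flag students whose wrong-to-right (WTR) erasure count exceeds the
    group average by more than `threshold_sd` standard deviations.

    `records` holds the students for ONE subject at ONE grade level, as dicts
    with hypothetical keys 'student_id', 'wtr' (wrong-to-right erasures),
    and 'total' (all erasures).
    """
    counts = [r["wtr"] for r in records]
    avg = mean(counts)
    sd = pstdev(counts)  # assumption: population SD, since every document is scanned
    cutoff = avg + threshold_sd * sd
    flagged = []
    for r in records:
        if r["wtr"] > cutoff:
            # Second step from the slide: share of erasures that went wrong-to-right
            share = r["wtr"] / r["total"] if r["total"] else 0.0
            flagged.append({"student_id": r["student_id"],
                            "wtr": r["wtr"],
                            "wtr_share": share})
    return flagged
```

The function would be called once per subject and grade level, because the slide computes the average and standard deviation separately for each. A flag is a prompt for further investigation, not a finding of cheating.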
District Voids Each year, districts train all test administrators in proper test administration procedures, including the state and district test security policies. State policy includes procedures for conducting investigations and reporting testing irregularities. Districts may void scores based on their findings.
Areas for Improvement More state oversight is needed for district-led investigations. Standardization across states may be needed for established procedures such as erasure analysis. Better information is needed about other statistical analysis procedures for detecting suspect patterns of responses, unusual gains or losses, etc.
Effective Test Security Practices: New York City and State James S. Liebman: Columbia University Law School
Fallacies and Antidotes
Two Fallacies
– Cheating neutralizes the value of testing
– Despite test stakes, cheating is not a serious risk
Two Antidotes
– Inform educators and demonstrate through actions that your state or district takes the risk, and reality, of cheating seriously
– Inform the public of the same
Avoiding First-Degree Cheating - 1
Keep test windows short
– Administer test to each grade on same day statewide
Deliver, unwrap and return materials shortly before and after test administration
– Deliver shrink-wrapped materials shortly before test date, with “pre-slugged” class packs also shrink-wrapped
– Store under lock and key accessible only to Principal
– Open school-wide shrink-wrap within 1 hour, and class pack shrink-wrap within 15 minutes, of test administration. Testing begins no later than 9:15 a.m.
– Deploy unannounced central staff, with checklist, to monitor opening of shrink-wrap (~10%)
– Principal (for school) and teacher (for classroom) certify in writing that they received and returned the same number of test booklets
– Scan answer sheets, or courier test booklets, to central scoring locations by end of testing day with letter of explanation for late deliveries
– Distribute answer keys, rating guides after answer documents are out of schools
Avoiding First-Degree Cheating - 2
Score tests off-site using scorers with no vested interest
– Automatic scoring (uniform, scannable answer sheets)
– Distributed scoring of constructed responses (campus, regional)
Assign multiple scorers for constructed responses, essays, etc.
– Manual comparisons to other scorers of same booklets by “table leader”
– “Read behinds” by table leader using exemplars
– Ban on school-based rescoring
NYCDOE Distributed Scoring RFP
– Electronic collection of scanned answer sheets, constructed response booklets
– Electronic distribution to multiple scorers, none with vested interest
– Electronic scoring, with test booklet, scoring rubric, exemplars visible at once
– Automated monitoring for unreliable scorer patterns (comparisons to other scorers of same booklet, items, group of items, for median score, time taken to score, etc.)
– Option to provide automated (AI) scoring of constructed responses, essays, etc.
– Aggregation of scores and reporting for each child
– Answer sheets and scored constructed responses preserved for formative analysis
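One of the checks listed above, comparing each scorer to other scorers of the same booklet against the median score, could look something like the sketch below. It is an assumption-laden illustration, not the system described in the RFP: the function name `flag_unreliable_scorers`, the tuple layout, and the `max_mean_deviation` threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def flag_unreliable_scorers(scores, max_mean_deviation=1.0):
    """`scores` is a list of (scorer_id, booklet_id, score) tuples.

    For each booklet, take the median of all scores it received; then flag
    scorers whose average absolute distance from those medians exceeds
    `max_mean_deviation` (an illustrative threshold, not one from the slides).
    """
    by_booklet = defaultdict(list)
    for _, booklet, score in scores:
        by_booklet[booklet].append(score)
    medians = {b: median(v) for b, v in by_booklet.items()}

    deviations = defaultdict(list)
    for scorer, booklet, score in scores:
        deviations[scorer].append(abs(score - medians[booklet]))

    # Sorted so the result is stable regardless of input order
    return sorted(s for s, d in deviations.items() if mean(d) > max_mean_deviation)
```

A flagged scorer is a candidate for a table leader's "read behind," not proof of unreliable scoring. Other signals the slide mentions, such as time taken to score, would need their own checks.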
Avoiding 1st and 2nd Degree Cheating
Assign test coordinator in each school; train; turnkey
Develop and widely distribute Test Administration Handbook
Principals, proctors certify familiarity with Handbook and awareness of disciplinary implications of violations
FAQs
Keep an open line for reports of infractions
– Teachers as allies
– Chancellor Walcott’s invitation
– Duty to report immediately
– Multiple locations (principal, monitors, local test office, state test office, “special investigations”)
– Allow anonymity; offer confidentiality
Avoiding Second-Degree Cheating
Deliver proctor’s instructions from front of room
– Only answer general questions about test instructions (front of room)
– Give no information about individual Q’s and A’s collectively or individually
– Collectively, not individually, instruct students to check answers for misalignment
Actively proctor, without papers or pen or pencil in hand
Permit students to go to the restroom accompanied by hall monitor; record time absent; provide extra time to finish while rest of class sits with booklets closed
Collect all booklets immediately after last student is finished
Cover or remove classroom learning aids
Unblock windows in testing room doors or keep door open
Deploy unannounced central staff, with checklist, to monitor above (10%)
1st and 2nd Degree Open Questions
Erasure (pencil) and wrong-answer (pen) analysis
– NY State seeking $1 million/year for 10% coverage (500,000 answer sheets – ELA + math)
Statewide inter-rater reliability analysis by test vendor
– NY State seeking $700,000/year
Fully computerized test administration and scoring (RTTT-A)
– Artificial Intelligence scoring of constructed answers
– Computer-enhanced items
Third-party scoring
Proctors other than the teacher in that classroom, esp. when results have direct stakes for teachers
Avoiding Third-Degree Cheating
Rigorous measurement and accountability systems that place rote teaching-to-the-test “below the line”
– Better standards and tests, to which you want teachers to teach (RTTT-A)
– “Balanced Scorecard” accountability that assesses quality of data-driven differentiated instructional strategies as well as outcomes
– Ban on copying past tests or use for “test-prep” purposes
“Koretz” tests: check whether scores predict a better pattern of “anchor” outcomes than actually occurs (http://www.rand.org/pubs/monograph_reports/MR1014/index.html)
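One way to read the “Koretz” test is as a comparison of gains: if scores on the high-stakes test rise much faster than scores on an independent “anchor” test of similar content, the gains may not reflect real learning. The sketch below expresses that comparison in baseline standard deviation units. It is an interpretation for illustration only, not the method of the cited report; the function names and the tuple layout are hypothetical.

```python
def standardized_gain(mean_before, mean_after, sd_before):
    """Gain between two administrations, in baseline standard deviation units."""
    if sd_before <= 0:
        raise ValueError("sd_before must be positive")
    return (mean_after - mean_before) / sd_before

def gain_gap(high_stakes, anchor):
    """Difference between the high-stakes gain and the anchor-test gain.

    Each argument is a (mean_before, mean_after, sd_before) tuple. A large
    positive gap suggests the high-stakes gains do not generalize to the
    anchor measure; what counts as "large" is a judgment call not set here.
    """
    return standardized_gain(*high_stakes) - standardized_gain(*anchor)
```

Standardizing each gain by its own baseline spread lets tests reported on different scales be compared directly.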
“Koretz” Test: Example 1 Changes in mathematics scores on KIRIS and ACT tests
“Koretz” Test: Example 2
Prevention of Irregularities in Academic Testing Panelists’ Addresses: Audrey Amrein-Beardsley: Gregory J. Cizek: James S. Liebman: Scott Norton: