Eastern Evaluation Research Society 32nd Annual Conference – April 19-21, 2009. Evaluation in the Digital Age: Promises and Pitfalls. The A*Star Audit of Group Test Administrations.




1 Eastern Evaluation Research Society 32nd Annual Conference – April 19-21, 2009. Evaluation in the Digital Age: Promises and Pitfalls. The A*Star® Audit of Group Test Administrations. Eliot R. Long & Raghu Govindaraj, A*Star Audits, LLC, Iselin, NJ – Brooklyn, NY. A window onto classroom test administration practices.

2 Problems in Test Administration Undermine the Usefulness of Test Scores. Significant deviations from standardized test administration are found in all assessment environments, yet improbable response patterns are more than twice as frequent among classrooms as compared to, for example, employer testing. Improper proctor influence is found to: 1. Reduce the range of measurement with respect to the true range of test-taker achievement; 2. Create generally lower and widely varying proctor-group-to-proctor-group test score reliability; 3. Create a test score modulator, with changes in achievement and proctor influence moving in opposite, offsetting directions; 4. Allow non-achievement-related effects to drive test scores and misinterpretation of results. See: Encouraged Guessing: Masking Variations in Achievement Gains, Eliot R. Long.

3 The A*Star Method: Measuring the ‘heartbeat’ of assessment. 1. Identify normative group response patterns. 2. Identify the range of variation around the norm. 3. Measure each individual group against the norm in terms of the normal variation. 4. Identify those test-takers whose responses contribute to the group deviation.

4 Class Profile of Test Item Success. Individually, students may make a few lucky guesses, careless mistakes, or otherwise unexpected responses. As a class, these variations balance out and a stable pattern of achievement emerges. The pattern of success for students as a group may be illustrated by a simple plot of the percent correct at each test question. For this “Class Profile” the p-values are plotted in the same order that the questions appear in the test booklet. (Chart annotations: easier questions; difficult questions.)
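The class profile described above amounts to computing one p-value per test item. The sketch below is an illustrative reconstruction, not the A*Star implementation; the scored response matrix is hypothetical.

```python
import numpy as np

# Hypothetical scored responses: rows = students, columns = test items,
# 1 = correct, 0 = incorrect or left blank.
responses = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
])

# Class profile: percent correct (p-value) at each item, in the same
# order the questions appear in the test booklet.
p_values = responses.mean(axis=0)
print(p_values)  # one p-value per test question
```

Plotting `p_values` against item position reproduces the rising-and-falling profile the slide describes.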

5 Class Comparison to its Skill Level Norm. Interpretation of the class profile is supported by a comparison to an appropriate (peer group) norm profile. The Norm: a peer group norm is created by grouping all classes at the same class average score. The norm p-values are represented by the upper margin of the shaded area. P-value Correlation: we may evaluate the comparison of the class to its norm in many ways. One example is a correlation of the class and norm p-values. The correlation in this example is n = 50; r = .921.
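The p-value correlation on the slide is an ordinary Pearson correlation between the class profile and the norm profile. A minimal sketch, with hypothetical profiles (the slide's real test had n = 50 items):

```python
import numpy as np

# Hypothetical p-value profiles for one class and its skill-level norm
# (one value per test item; a real test would have ~50 items).
class_p = np.array([0.90, 0.75, 0.60, 0.80, 0.40, 0.55])
norm_p  = np.array([0.88, 0.78, 0.55, 0.82, 0.45, 0.50])

# Pearson correlation of class and norm p-values, as on the slide.
r = np.corrcoef(class_p, norm_p)[0, 1]
print(round(r, 3))
```

A class whose profile rises and falls with its peers' produces r near 1; improper influence shows up as a depressed correlation.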

6 Norms Set by Peer Groups. A range of norm profiles represents a range of skill levels. Classes at different class average score levels are grouped to provide a range of skill level norms. Each individual class is then evaluated against a norm representative of its peers. Note that the same pattern of rising and falling p-values is repeated at each skill level, only at generally higher or lower p-values. Note also the exception for question #34: this item has confusing elements in the item stem and does not contribute to measurement.
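One plausible way to build such skill-level norms is to bin classes by class average score and average the item profiles within each bin. This is an assumption about the mechanics, not the patented A*Star procedure; the data and bin width are hypothetical.

```python
import numpy as np

# Hypothetical data: each class is a students x items 0/1 matrix.
classes = [np.random.default_rng(seed).integers(0, 2, size=(20, 10))
           for seed in range(12)]

def build_norms(class_matrices, bin_width=0.1):
    """Group classes whose average scores fall in the same bin and
    average their item p-value profiles to form skill-level norms."""
    binned = {}
    for m in class_matrices:
        avg = m.mean()                                  # class average score
        level = round(avg // bin_width * bin_width, 2)  # skill-level bin
        binned.setdefault(level, []).append(m.mean(axis=0))
    # Norm profile = mean p-value per item across the peer classes.
    return {lvl: np.mean(profiles, axis=0) for lvl, profiles in binned.items()}

norm_profiles = build_norms(classes)
```

Each class is then compared to the norm for its own bin, so a low-scoring class is never judged against high-scoring peers.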

7 Consistency of Test Administrations. When all classroom profiles are correlated with their appropriate skill level norms, the distribution of correlation coefficients indicates the consistency of the school district test administrations. A comparison of classroom groups with job applicant groups (tested by employers) indicates lower consistency in classroom test administrations. Schools: median r = .907; Employers: median r = .958.

8 Proctor Effects in a Northeastern School District: Unexpected High Volume of Test Answers. Variation in teacher-encouraged guessing to complete all test answers creates non-skill-related variation in response patterns. Two classrooms have the same class average score and yet substantially different student test work behavior. The percent of students who answer each test question is represented by the line with small stars. Correlations with the norm: r = .910 and r = .827.

9 Added Random Guessing Improves Agreement with the Norm. When random guessing is added to replace answers left blank, the class average score is raised by 11% and the correlation with the new, higher skill level norm is raised from .827 to .906. Random guessing is part of the peer group norm. Panel captions: test results as the test was administered (correlation with the norm: r = .827); results after adding 1/4 correct for each answer left blank (correlation with the norm: r = .906).
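The "1/4 correct for each answer left blank" adjustment is the expected score from random guessing on four-option items: a random guess is right one time in four. A minimal sketch of the adjustment (function name and example scores are hypothetical):

```python
# Expected-score adjustment for random guessing on 4-option items:
# each answer left blank is credited its chance expectation, 1/4 correct.
N_OPTIONS = 4

def adjust_for_guessing(raw_correct, n_blank, n_options=N_OPTIONS):
    """Return the score expected if every blank had been a random guess."""
    return raw_correct + n_blank / n_options

# A student who got 30 right and left 8 blank on a 50-item test:
print(adjust_for_guessing(30, 8))  # 32.0
```

Applying this to every student in a class with many blanks raises the class average and, as the slide shows, moves the class profile toward the norm built from classes where guessing was encouraged.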

10 Proctor Effects in the Midwest: Unexpected, High Volume of Answers. From elementary schools in the Midwest – a grade 3 math test. Two schools with the same school average test score: one school with high test completion, one with substantially lower completion. The test was presented as two 35-item test booklets, administered in two sessions, one in the morning and one in the afternoon. Correlations with the norm: r = .910 and r = .849.

11 Proctor Effects in the Mid-Atlantic: Unexpected, High Volume of Answers. Testing for a special, federally funded, summer program in a Mid-Atlantic state, grades 6 through 10. Test: a basic verbal skills, pre-employment test. Skill level norms: based on job applicants; norms do not include encouraged guessing. Panel captions: Proctor Encouraged Guessing; No Proctor Encouraged Guessing. Correlations with the norm: r = .830 and r = .899.

12 Encouraged Guessing: Effect on Classroom Test Reliability (KR-20). Teacher involvement in their students’ test work behavior to encourage guessing is entrepreneurial, often undermining test score reliability. Panel captions: 50+ Answers Left Blank (42 classrooms at and below average, likely to have little encouragement to guess); No Answers Left Blank (330 classrooms at and below average, likely to have extensive encouragement to guess).
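KR-20 is the standard internal-consistency reliability statistic for dichotomously scored (right/wrong) items, which is why encouraged guessing, by injecting non-skill answers, can depress it. A textbook-style sketch with a hypothetical classroom (not the A*Star computation):

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson Formula 20 reliability for 0/1 scored items.
    responses: students x items array of 0/1 scores."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                   # number of items
    p = responses.mean(axis=0)               # proportion correct per item
    q = 1.0 - p
    total_var = responses.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

# Hypothetical classroom: 5 students x 4 items.
scores = [[1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 0, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
print(round(kr20(scores), 3))  # 0.8
```

When guessing replaces blanks, item responses at the end of the test become nearly random, shrinking the achievement-driven share of the total-score variance and lowering KR-20.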

13 Two Low-Performing Classrooms, Two Different Forms of Encouragement to Guess. Both classes perform roughly as expected over the first 20 test items; the proctors differ in their methods of encouraging guessing over the remaining 30 test items. Panel captions: Encouragement to Guess at Random; Encouragement to Guess by Choosing ‘C’.

14 Improper Proctor Influence. Proctor influence ranges from positive to moderately negative to a serious undermining of the assessment. Significant improper influence leads to measurable deviations in classroom response patterns. Response pattern probability: P < 0.01. Correlations with the norm: r = .713 and r = .579.

15 Subject Group Analysis: Identify Those Most Likely Subject to Improper Influence. Most often, improper teacher influence is unplanned and disorganized. Yet, where the influence is persistent, subsets of students will be identified with matching, unlikely response patterns. Panel captions: Subject Group n = 8 of 17 (response pattern probability P = 3.43e-9; correlation with the norm: r = .498); Subject Group n = 12 of 19 (response pattern probability P = 6.68e-15; correlation with the norm: r = .342).

16 Whole Test Manipulation: Same Teacher, Two Successive Years. The first year begins normally and becomes increasingly irregular. The second year begins irregular and continues so over the entire test, indicating a preplanned intent to control the testing outcome. Panel captions: Grade 5 Reading, First Year (correlation with the norm: r = .760); Grade 5 Reading, Second Year (correlation with the norm: r = .487).

17 Self-Correction. Following the second-year reading test, the teacher was notified that her testing practices were under investigation. Three weeks later, her administration of the math test was remarkably improved. Grade 5 Math, Second Year: pattern correlation with the norm r = .947. Note: the teacher was not given any instruction on how to change her test administration practices. She was only told that irregularities had been found in her students’ test answers. The next test administration resulted in an essentially perfect response pattern.

18 Encouraged Guessing Encourages Improper Influence. Identified patterns of improper influence most often arise in the area where students would otherwise guess or leave answers blank. Class without improper influence: note higher performance than the norm over early test items; seemingly erratic p-values at the end are due to the fewer, higher-achieving students completing the test. Class with likely improper influence: note lower performance than the norm over early test items; erratic p-values at the end involve most or all students, unlikely without proctor involvement.

19 Summary. Teacher-encouraged guessing is widely practiced and contaminates test results with non-skill-related test answers, undermining test score reliability. Teacher-encouraged guessing requires teacher intervention into their students’ test work behavior, opening a wide opportunity for proctor influence – sometimes confused, sometimes well-meaning, sometimes highly improper. An analysis such as the A*Star Audit may provide a well-supported, informative review, identifying the location, character, and severity of proctor influence. Conducted on a routine basis, test administration reviews provide the opportunity to teach good test administration practices and serve to encourage self-improvement. A major opportunity exists to substantially improve the quality of test score information, promising to enrich efforts in program evaluation and accountability.

20 The A*Star Audit Report. The A*Star Audit report provides a comprehensive listing of all proctor groups along with measures of each group’s consistency with its appropriate norm.

21 Multiple Grade Level Report. When reviews are conducted for all grade levels – or for sequential years – institutional patterns may be evaluated.

22 The A*Star® Audit. A comprehensive review of all classroom groups: measures each class by the variation of its peers; provides a detailed analysis of each irregular group. Conducted quickly, independently, without interruption of school activities: the audit is based on routinely collected test results, processed by sophisticated, patented computer software, monitored by application specialists, and reported by psychometricians. Audit results evaluate individual groups and overall assessment quality: individual classrooms are proactively identified where problems exist; consistency in test administrations may be reported by classroom, by class achievement level, and by district and school. Audit results are provided as a printed report and as a searchable electronic file: the printed report lists each classroom/school group with its test results and its standing on five A*Star measures of response pattern consistency; the electronic file provides the same measures in a searchable form, along with the opportunity to further evaluate selected groups and to print reports of selected group results. Data may be exported to facilitate analysis and reporting.

