C R E S S T / Harvard Daniel Koretz Harvard Graduate School of Education National Center for Research on Evaluation, Standards, and Student Testing “Believe.

1 C R E S S T / Harvard
Daniel Koretz, Harvard Graduate School of Education
National Center for Research on Evaluation, Standards, and Student Testing
“Believe me, it’s not cheating, but some strange method” (GRE/TOEFL prep teacher, Shanghai)
Annual CRESST Conference, September 11, 2002, Los Angeles, CA

2 Validating inferences in the age of NCLB
- Validity is a property of inferences, not of measures
- Key inferences are now about gains obtained under high-stakes conditions
- Traditional validation is insufficient
  - Inappropriate framework
  - Insufficient methods
- Risk is false positives: inflated gains

3 Map of talk
- Will not show evidence of severe inflation (old hat by now)
- Will discuss approach to validation of gains
- Will illustrate possible leverage points for coaching and inflation of scores
- Will note possible future directions

4 Why traditional validation is insufficient
- Cross-sectional, insensitive to changes in levels of performance
- Insufficient in high-stakes contexts:
  - Largely ignores behavioral responses to testing
  - Ignores inadvertent emphases in tests
  - Assumes stability in relationships between aspects of performance, both tested and untested

5 Why these limitations matter
- Scores can rise rapidly, and be inflated, without affecting correlations among tests
- Behavioral responses to testing (e.g., coaching) can make sampled content unrepresentative of the domain after initial validation
- Inadvertent emphases in tests can provide leverage for coaching
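A small numerical sketch of the first point: if every score on the high-stakes test rises by the same amount, the mean goes up but the Pearson correlation with an audit test is unchanged, so a cross-sectional correlation check cannot detect the inflation. The scores and the uniform 10-point shift below are made up for illustration; they are not data from the talk.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for six students (assumption, not data from the talk).
high_stakes_before = [48.0, 52.0, 55.0, 60.0, 63.0, 70.0]
audit = [45.0, 50.0, 57.0, 58.0, 66.0, 69.0]

# Assumed inflation: every high-stakes score rises by the same 10 points,
# while performance on the audit test does not change.
high_stakes_after = [score + 10.0 for score in high_stakes_before]

r_before = pearson(high_stakes_before, audit)
r_after = pearson(high_stakes_after, audit)
gain = mean(high_stakes_after) - mean(high_stakes_before)

print(f"mean gain on high-stakes test: {gain:.1f}")
print(f"correlation before: {r_before:.4f}, after: {r_after:.4f}")
```

A uniform shift is the simplest case; the same invariance holds for any positive linear rescaling of the scores.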

6 KY math trends, KIRIS and ACT

7 Correlations Between ACT and KIRIS Mathematics

8 Keys to validating gains
- Assess generalizability of gains to other (audit) measures
- Determine how much generalizability should be expected
  - Based on users’ inferences (example of TAAS vs. NAEP)
- Examine behavioral responses
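One simple way to put gains on two different tests on a common footing is to express each gain in standard-deviation units and compare them. The sketch below does that with invented numbers; the function name, the choice of the baseline standard deviation as the scaling unit, and all values are assumptions for illustration, not the method or results presented in the talk. How large the ratio "should" be still depends on the inference users want to draw.

```python
def standardized_gain(mean_before, mean_after, sd_before):
    """Gain expressed in baseline standard-deviation units."""
    return (mean_after - mean_before) / sd_before


# Hypothetical summary statistics (assumptions, not real test data).
high_stakes_gain = standardized_gain(mean_before=500.0, mean_after=530.0, sd_before=50.0)
audit_gain = standardized_gain(mean_before=250.0, mean_after=253.0, sd_before=30.0)

# Share of the high-stakes gain that shows up on the audit measure.
generalization_ratio = audit_gain / high_stakes_gain

print(f"high-stakes gain: {high_stakes_gain:.2f} SD")
print(f"audit gain:       {audit_gain:.2f} SD")
print(f"ratio:            {generalization_ratio:.2f}")
```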

9 CRESST work on the validation of gains
- Develop framework for validation efforts (Tech Report 551)
- Explore teacher surveys and interviews as a means of obtaining information about behavioral responses to testing (ongoing)
- Develop statistical models for the analysis of gains (new)

10 Framework for validating gains
- Identify substantive and nonsubstantive performance elements (PEs) in the test and in the inferences
- Determine weights given to PEs in test
  - May be unintended
  - May be trivial or zero
- Determine weights given to PEs in key inferences about gains
- Validity hinges on consistency of change in performance on PEs with inference weights
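The weighting idea can be illustrated with a toy calculation: the same per-element gains produce a different overall gain depending on whether they are combined with the test's weights or with the weights implied by the inference. The element names, gains, and weights below are hypothetical, and the weighted average is only one convenient way to make the point; it is not the formal model from Tech Report 551.

```python
def weighted_gain(gains, weights):
    """Weighted average of per-element gains; weights need not sum to 1."""
    total = sum(weights.values())
    return sum(gains[pe] * weights[pe] for pe in weights) / total


# Hypothetical gains per performance element, in SD units (assumption).
pe_gains = {"linear_equations": 0.6, "geometry": 0.1, "data_analysis": 0.0}

# Hypothetical weights: the test stresses one element and omits another,
# while the inference about "math achievement" treats all three equally.
test_weights = {"linear_equations": 0.7, "geometry": 0.3, "data_analysis": 0.0}
inference_weights = {"linear_equations": 1.0, "geometry": 1.0, "data_analysis": 1.0}

gain_on_test = weighted_gain(pe_gains, test_weights)
gain_for_inference = weighted_gain(pe_gains, inference_weights)

print(f"gain as the test weights it:      {gain_on_test:.3f} SD")
print(f"gain as the inference weights it: {gain_for_inference:.3f} SD")
print(f"difference:                       {gain_on_test - gain_for_inference:.3f} SD")
```

In this made-up case the observed gain overstates the gain relevant to the inference because improvement is concentrated on the element the test weights most heavily.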

11 Types of test preparation
- Teaching more
- Working harder
- Working more effectively
- Reallocation
- Alignment
- Coaching
- Cheating

12 Reallocation
- Refers to shifting limited instructional resources among substantive areas
  - Within subject
  - Between subjects
- Results in reallocating achievement
- Can lead to either meaningful change or inflation
- Inflates by undermining representation of the domain

13 Alignment
- Sometimes presented as providing protection against inflation: emphasis on PEs deemed important
- But this is just a form of reallocation
- Whether gains are inflated depends on
  - Importance of emphasized material to inference, and
  - Importance of de-emphasized or omitted material to inference

14 Coaching
- Focuses on details of the test
  - Substantive, including item style
  - Non-substantive, such as item formats and scoring rubrics
- Includes test-taking tricks (e.g., POE, plug-in)
- Can inflate scores or simply waste time

15 Possible levers for coaching
- Possibly inadvertent content overweighting
- Item style
  - Recurrent content detail
  - Recurrent form of presentation
- Inadvertent, recurrent construct underrepresentation
- Recurrent cognitive demand with limited construct relevance

16 Item from G8 MCAS
Eva has four sets of straws. The measurements of the straws are given below. Which set of straws could not be used to form a triangle?
A. Set 1: 4 cm, 4 cm, 7 cm
B. Set 2: 2 cm, 3 cm, 8 cm
C. Set 3: 3 cm, 4 cm, 5 cm
D. Set 4: 5 cm, 12 cm, 13 cm
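The straws item turns on the triangle inequality: three lengths form a triangle only if the two shorter ones together exceed the longest. A short check of the four answer options, using the lengths given in the item (the function name is just an illustrative choice):

```python
def can_form_triangle(a, b, c):
    """True if lengths a, b, c satisfy the strict triangle inequality."""
    shortest, middle, longest = sorted((a, b, c))
    return shortest + middle > longest


# Answer options from the item, lengths in centimetres.
straw_sets = {
    "A": (4, 4, 7),
    "B": (2, 3, 8),
    "C": (3, 4, 5),
    "D": (5, 12, 13),
}

impossible = [label for label, sides in straw_sets.items()
              if not can_form_triangle(*sides)]
print(impossible)  # ['B']
```

Only option B fails, since 2 + 3 is less than 8.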

17 Item from G8 MCAS
Each arrangement in this pattern is made up of tiles. How many tiles will be in the 6th arrangement in the pattern?

18 Prompt from G8 MCAS
Use the balance scales below to answer the question.

19 Prompt from G10 NAEP
Use the unit of length below to estimate the perimeter of the figure shown. Between which two consecutive whole-number units does the perimeter lie?

20 Prompt from G10 MCAS
Use the map below to answer this question.

21 Prompt from a G8 KIRIS item

22 Prompt from G10 MCAS
Use the figure below to answer the next question.

23 Answers for G10 MCAS prompt
If the figure above is folded into a cube, which of the following solids will be formed?

24 Next steps for research
- Develop methods for ascertaining which levers teachers use to inflate scores
- Develop methods for identifying systematically the patterns in tests that facilitate or inhibit coaching and inappropriate reallocation
- Develop methods for ‘unpacking’ lack of generalization and for better distinguishing between meaningful gains and inflation

