Investigations into Comparability for the PARCC Assessments


1 Investigations into Comparability for the PARCC Assessments
2015 National Conference on Student Assessment
Enis Dogan

2 Comparability as a priority
Comparability has been a central priority for PARCC:
- Across states
- Across forms within a year
- Across years
TAC members Ric Luecht and Wayne Camara authored a paper titled "Evidence and Design Implications Required to Support Comparability Claims" in 2011.
We indicate the relevant standard for each condition/outcome from the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999). For the empirical studies, the source of validity evidence is classified into five categories in accordance with the Standards (p. ): (1) evidence based on test content, (2) evidence based on response processes, (3) evidence based on internal structure, (4) evidence based on relations to other variables, and (5) evidence based on consequences of testing.

3 Comparability as a priority
"In order to compare two or more test scores, we need to ask a basic question. Are the constructs underlying those scores the same, similar, or different? The measurement literature often suggests a basic duality as to what is being measured: the same construct versus different constructs. However, there are degrees of sameness."
"Equating leads to what is sometimes referred to as score interchangeability (Brennan & Kolen, 1995; Holland & Dorans, 2006). After equating, it ought to be a matter of indifference to students, teachers, administrators, or policy makers as to which form of the same test or which items each examinee sees."
- 11 states and DC
- Tie to policy considerations
- Include HE involvement

4 Comparability as a priority
Different expectations for different aspects of comparability. Is it the same construct?
- Across devices
- Between modes
- Among EOC assessments: for each pair, and for the two sets of assessments

5 Mode and Device comparability
Test Administration Mode and Devices study with operational data
Item/Task-Level Comparability
- Do the individual items/tasks perform similarly and rank order similarly across different devices?
- For items that appear in both CBT and PPT modes, do the individual items/tasks perform similarly and rank order similarly across different modes?
Test-Level Comparability
- Would students receive similar scale scores and be consistently classified into performance levels across different modes and devices?
- Are the psychometric properties of the test scores (e.g., factor structure, reliability, difficulty) similar across different modes and devices?
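The item-level questions above can be made concrete. As a minimal sketch (not PARCC's operational procedure), one can compute each item's proportion-correct (p-value) separately in each mode and check rank-order agreement with a Spearman correlation. All data and function names below are hypothetical illustrations:

```python
# Sketch: item difficulty (p-values) and rank-order agreement across two
# administration modes (e.g., CBT vs. PPT). Data are hypothetical.

def p_values(score_matrix):
    """Proportion-correct (p-value) per item from a 0/1 examinee-by-item matrix."""
    n = len(score_matrix)
    return [sum(row[j] for row in score_matrix) / n
            for j in range(len(score_matrix[0]))]

def ranks(values):
    """Rank each value (1 = smallest); assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation via the classic 1 - 6*sum(d^2)/(n(n^2-1)) formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical score matrices: 4 examinees x 3 items per mode.
cbt = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0]]
ppt = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
rho = spearman(p_values(cbt), p_values(ppt))  # near 1.0 suggests similar rank order
```

In practice ties are common, so an operational analysis would use tie-corrected ranks (e.g., `scipy.stats.spearmanr`) and mode samples matched or weighted for comparability.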

6 Comparability of HS Mathematics End-of-Course Assessments
- Examine predictive validity?
- Examine factor structure across the three assessments in each pathway.
- "Focus should be on comparability of scores for students who take all three assessments in one pathway to scores for students who take all three assessments in the other pathway" (Kolen, NCME 2015)
- What will this mean for cut scores?

7 About the methodology
- Comparison of p-values
- Comparison of IRT parameter estimates
- DIF
- CFA
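The slide lists DIF without naming a procedure. A common choice for dichotomous items is the Mantel-Haenszel statistic, shown here purely as an assumed illustration (PARCC's operational DIF method may differ): the MH common odds ratio is pooled across ability strata and mapped onto the ETS delta scale.

```python
# Sketch: Mantel-Haenszel DIF for a dichotomous item. Each stratum is a
# 2x2 table (a, b, c, d):
#   a = reference group correct, b = reference group incorrect,
#   c = focal group correct,     d = focal group incorrect.
# Strata are typically levels of a matching variable (e.g., total score).
import math

def mantel_haenszel_odds_ratio(strata):
    """MH common odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def ets_delta(alpha_mh):
    """ETS delta scale; |delta| >= 1.5 is conventionally flagged as
    moderate-to-large DIF (category C)."""
    return -2.35 * math.log(alpha_mh)

# Hypothetical strata where both groups have identical odds of success:
balanced = [(20, 10, 20, 10), (15, 15, 15, 15)]
delta = ets_delta(mantel_haenszel_odds_ratio(balanced))  # near 0: no DIF signal
```

A negative delta indicates the item favors the reference group; the same machinery applied across modes or devices (rather than demographic groups) addresses the item-level comparability questions in slide 5.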

8 Role of Research
Haladyna (2006): "Without research, a testing program will have difficulty generating sufficient evidence to validate its intended test score interpretations and use… The planning, designing, creating, and administration of any testing program are highly dependent on a body of knowledge that comes from research and experience" (p. 739).
These studies are excellent examples of applied research in assessment development and validation.

9 Enis Dogan

