Investigations into Comparability for the PARCC Assessments

Similar presentations
Test Development.
Fairness in Testing: Introduction Suzanne Lane University of Pittsburgh Member, Management Committee for the JC on Revision of the 1999 Testing Standards.
Conceptualization and Measurement
The Research Consumer Evaluates Measurement Reliability and Validity
Advanced Topics in Standard Setting. Methodology Implementation Validity of standard setting.
Effective Communication of Exam Results: What Should (or Shouldn't) be Included in the Candidate's Score Report Elizabeth A. Witt, Ph.D. American Board.
CH. 9 MEASUREMENT: SCALING, RELIABILITY, VALIDITY
Part II Knowing How to Assess Chapter 5 Minimizing Error p115 Review of Appl 644 – Measurement Theory – Reliability – Validity Assessment is broader term.
MEASUREMENT. Measurement “If you can’t measure it, you can’t manage it.” Bob Donath, Consultant.
Examination of Holland’s Predictive Pattern Order Hypothesis for Academic Achievement William D. Beverly and Robert A. Horn Northern Arizona University,
ITEC6310 Research Methods in Information Technology
VALIDITY & RELIABILITY Raja C. Bandaranayake. QUALITIES OF MEASUREMENT DEVICES  Validity Does it measure what it is supposed to measure?  Reliability.
Understanding Validity for Teachers
Questions to check whether or not the test is well designed: 1. How do you know if a test is effective? 2. Can it be given within appropriate administrative.
Technical Issues Two concerns Validity Reliability
EPSY 8223: Test Score Equating
Item Response Theory for Survey Data Analysis EPSY 5245 Michael C. Rodriguez.
Kaizen–What Can I Do To Improve My Program? F. Jay Breyer, Ph.D. Presented at the 2005 CLEAR Annual Conference September Phoenix,
MEASUREMENT OF VARIABLES: OPERATIONAL DEFINITION AND SCALES
Measurement in Exercise and Sport Psychology Research EPHE 348.
Is the Force Concept Inventory Biased? Investigating Differential Item Functioning on a Test of Conceptual Learning in Physics Sharon E. Osborn Popp, David.
Instrumentation.
Psychometric Issues in the Use of Testing Accommodations Chapter 4 David Goh.
Assessing Learning-Centered Leadership Andrew C. Porter University of Pennsylvania Joseph Murphy, Ellen Goldring, & Stephen N. Elliott Vanderbilt University.
Getting More from Your Technical Advisory Committee: Designing and Implementing a Validity Research Agenda 2015 National Conference on Student Assessment.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
Principles in language testing What is a good test?
Including Quality Assurance Within The Theory of Action Presented to: CCSSO 2012 National Conference on Student Assessment June 27, 2012.
Introduction to Validity
Illustration of a Validity Argument for Two Alternate Assessment Approaches Presentation at the OSEP Project Directors’ Conference Steve Ferrara American.
Assessing Learning for Students with Disabilities Tom Haladyna Arizona State University.
Enhancing the Technical Quality of the North Carolina Testing Program: An Overview of Current Research Studies Nadine McBride, NCDPI Melinda Taylor, NCDPI.
CAROLE GALLAGHER, PHD. CCSSO NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 26, 2015 Reporting Assessment Results in Times of Change:
Scaling and Equating Joe Willhoft Assistant Superintendent of Assessment and Student Information Yoonsun Lee Director of Assessment and Psychometrics Office.
Research Methodology and Methods of Social Inquiry Nov 8, 2011 Assessing Measurement Reliability & Validity.
MEASUREMENT. Measurement: The assignment of numbers to observed phenomena according to certain rules. Rules of Correspondence: Defines measurement in a given.
Validity Validity is an overall evaluation that supports the intended interpretations, use, and consequences of the obtained scores. (McMillan 17)
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
Nurhayati, M.Pd Indraprasta University Jakarta.  Validity : Does it measure what it is supposed to measure?  Reliability: How the representative is.
Study of Device Comparability within the PARCC Field Test.
Chapter 6 - Standardized Measurement and Assessment
VALIDITY, RELIABILITY & PRACTICALITY Prof. Rosynella Cardozo Prof. Jonathan Magdalena.
PARCC Field Test Study Comparability of High School Mathematics End-of- Course Assessments National Conference on Student Assessment San Diego June 2015.
Michigan Assessment Consortium Common Assessment Development Series Module 16 – Validity.
Instrument Development and Psychometric Evaluation: Scientific Standards May 2012 Dynamic Tools to Measure Health Outcomes from the Patient Perspective.
ESTABLISHING RELIABILITY AND VALIDITY OF RESEARCH TOOLS Prof. HCL Rawat Principal UCON,BFUHS Faridkot.
Evaluation Requirements for MSP and Characteristics of Designs to Estimate Impacts with Confidence Ellen Bobronnikov March 23, 2011.
Chapter 2 Theoretical statement:
Introduction to the Validation Phase
Leacock, Warrican and Rose (2009)
Test Blueprints for Adaptive Assessments
Concept of Test Validity
Associated with quantitative studies
Test Design & Construction
Introduction to the Validation Phase
assessing scale reliability
Validity and Reliability
Reliability and Validity
Reliability and Validity
Workshop questionnaire.
Reliability and Validity of Measurement
Dr. Chin and Dr. Nettelhorst Winter 2018
Applied Psychometric Strategies Lab Applied Quantitative and Psychometric Series Abbey Love, MS, & Dani Rosenkrantz, MS, EdS Guiding Steps for the Evaluation.
Assessment Information
RESEARCH METHODS Lecture 18
Timeline for STAAR EOC Standard Setting Process
Brian Gong Center for Assessment
Assessment Literacy: Test Purpose and Use
Innovative Approaches for Examining Alignment
Presentation transcript:

Investigations into Comparability for the PARCC Assessments
2015 National Conference on Student Assessment
Enis Dogan

Comparability as a priority Comparability has been a central priority for PARCC: across states, across forms within a year, and across years. TAC members Ric Luecht and Wayne Camara authored a paper titled "Evidence and Design Implications Required to Support Comparability Claims" in 2011. We indicate the relevant standard for each condition/outcome from the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999). For the empirical studies, the source of validity evidence is classified into five categories in accordance with the Standards (pp. 11-17): (1) evidence based on test content, (2) evidence based on response processes, (3) evidence based on internal structure, (4) evidence based on relations to other variables, and (5) evidence based on consequences of testing.

Comparability as a priority "In order to compare two or more test scores, we need to ask a basic question. Are the constructs underlying those scores the same, similar, or different? The measurement literature often suggests a basic duality as to what is being measured: the same construct versus different constructs. However, there are degrees of sameness." "Equating leads to what is sometimes referred to as score interchangeability (Brennan & Kolen, 1995; Holland & Dorans, 2006). After equating, it ought to be a matter of indifference to students, teachers, administrators or policy makers as to which form of the same test or which items each examinee sees." 11 states and DC; tie to policy considerations; include higher education (HE) involvement.
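The quotation above treats equating, and the score interchangeability it yields, as the strongest form of comparability. As a hedged illustration only (this is not the PARCC or Luecht and Camara procedure), the Python sketch below shows a basic linear equating function under a random-groups design; the form names, score distributions, and sample sizes are invented.

```python
# A minimal sketch of linear equating under a random-groups design, using
# made-up data. This only illustrates the "score interchangeability" idea
# quoted above; it is not PARCC's operational equating procedure.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw scores from two randomly equivalent groups, one per form.
form_x = rng.normal(loc=30.0, scale=6.0, size=2000)   # new form
form_y = rng.normal(loc=32.0, scale=5.5, size=2000)   # reference form

def linear_equate(x, ref, new):
    """Map a Form X raw score onto the Form Y scale by matching means and SDs."""
    return ref.mean() + (ref.std(ddof=1) / new.std(ddof=1)) * (x - new.mean())

raw_x = 28.0
equated = linear_equate(raw_x, ref=form_y, new=form_x)
print(f"Form X raw score {raw_x:.1f} -> Form Y equivalent {equated:.1f}")
```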

Comparability as a priority Different expectations apply to different aspects of comparability. Is it the same construct? Across devices, between modes, and among the EOC assessments (for each pair and for the two sets of assessments).

Mode and device comparability Test administration mode and devices study with operational data.
Item/task-level comparability: Do the individual items/tasks perform similarly and rank order similarly across different devices? For items which appear in both CBT and PPT modes, do the individual items/tasks perform similarly and rank order similarly across different modes?
Test-level comparability: Would students receive similar scale scores and be consistently classified into performance levels across different modes and devices? Are the psychometric properties of the test scores (e.g., factor structure, reliability, difficulty) similar across different modes and devices?
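One concrete way to ask the item-level question above is to compare classical item difficulties (p-values) and their rank order across delivery conditions. The Python sketch below is a minimal illustration with simulated scored responses; the group labels, sample sizes, and statistics reported are assumptions for the example, not the operational PARCC analysis.

```python
# A minimal sketch of an item-level comparability check: compare classical
# item difficulties (p-values) and their rank order across two delivery
# conditions (e.g., tablet vs. laptop). Data are simulated for illustration.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
n_students, n_items = 500, 40

# Simulated scored (0/1) responses to the same items under two conditions.
true_p = rng.uniform(0.3, 0.9, size=n_items)
device_a = rng.binomial(1, true_p, size=(n_students, n_items))
device_b = rng.binomial(1, np.clip(true_p + rng.normal(0, 0.02, n_items), 0, 1),
                        size=(n_students, n_items))

p_a = device_a.mean(axis=0)           # item p-values on device A
p_b = device_b.mean(axis=0)           # item p-values on device B

rho, _ = spearmanr(p_a, p_b)          # do items keep the same rank order?
max_diff = np.max(np.abs(p_a - p_b))  # largest item-level difficulty shift

print(f"Spearman rank correlation of p-values: {rho:.3f}")
print(f"Largest absolute p-value difference:   {max_diff:.3f}")
```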

Comparability of HS Mathematics End-of-Course Assessments Examine predictive validity? Examine factor structure over the three assessments in each pathway.
"Focus should be on comparability of scores for students who take all three assessments in one pathway to scores for students who take all three assessments in the other pathway" (Kolen, NCME 2015).
What will this mean for cut scores?
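As a rough illustration of what comparing factor structure across the two pathways can involve, the sketch below computes Tucker's congruence coefficient between factor loadings estimated separately in each pathway. The loading values, the single-factor framing, the pathway labels, and the 0.95 rule of thumb are illustrative assumptions, not results or methods from the PARCC studies.

```python
# A hedged illustration of one way to compare factor structure across two
# groups: Tucker's congruence coefficient between factor loadings estimated
# separately in each pathway. The loadings below are invented; real studies
# would estimate them (e.g., via CFA) from actual response data.
import numpy as np

def tucker_congruence(a, b):
    """Congruence coefficient between two loading vectors (1.0 = identical shape)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical single-factor loadings for the same eight reporting categories,
# estimated once in the traditional pathway and once in the integrated pathway.
loadings_traditional = np.array([0.72, 0.68, 0.75, 0.61, 0.70, 0.66, 0.73, 0.69])
loadings_integrated  = np.array([0.70, 0.71, 0.73, 0.58, 0.72, 0.64, 0.75, 0.67])

phi = tucker_congruence(loadings_traditional, loadings_integrated)
print(f"Tucker congruence: {phi:.3f} (values above ~0.95 are often read as equivalent)")
```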

About the methodology Comparison of p-values, comparison of IRT parameter estimates, DIF, and CFA.
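Of the methods listed, DIF lends itself to a short worked example. The sketch below implements a basic Mantel-Haenszel DIF check on simulated scored responses, stratifying on total score and treating device as the grouping variable; the simulation, group labels, and ETS delta conversion are illustrative assumptions rather than the studies' actual implementation.

```python
# A minimal sketch of a Mantel-Haenszel DIF check, framed here as device-based
# DIF (reference = computer, focal = tablet). Data are simulated; operational
# studies would use their own scored response files and flagging rules.
import numpy as np

rng = np.random.default_rng(7)
n, n_items = 1000, 30

group = rng.integers(0, 2, size=n)                 # 0 = reference, 1 = focal
ability = rng.normal(size=n)
difficulty = rng.normal(size=n_items)
prob = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
responses = rng.binomial(1, prob)                  # 0/1 scored responses

def mantel_haenszel_dif(item_scores, total_scores, group):
    """MH common odds ratio for one item, stratifying on total test score."""
    num, den = 0.0, 0.0
    for stratum in np.unique(total_scores):
        mask = total_scores == stratum
        r, f = group[mask] == 0, group[mask] == 1
        y = item_scores[mask]
        a, b = y[r].sum(), (1 - y[r]).sum()        # reference: right / wrong
        c, d = y[f].sum(), (1 - y[f]).sum()        # focal: right / wrong
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den if den > 0 else np.nan

total = responses.sum(axis=1)
for item in range(3):                              # report the first few items
    alpha = mantel_haenszel_dif(responses[:, item], total, group)
    delta = -2.35 * np.log(alpha)                  # ETS delta metric
    print(f"Item {item + 1}: MH odds ratio = {alpha:.2f}, MH D-DIF = {delta:.2f}")
```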

Role of Research Haladyna (2006): "Without research, a testing program will have difficulty generating sufficient evidence to validate its intended test score interpretations and use… The planning, designing, creating, and administration of any testing program are highly dependent on a body of knowledge that comes from research and experience" (p. 739). These studies are excellent examples of applied research in assessment development and validation.

Enis Dogan edogan@parcconline.org