
Issues of Technical Adequacy in Measuring Student Growth for Educator Effectiveness
Stanley Rabinowitz, Ph.D.
Director, Assessment & Standards Development Services, WestEd
AACC Summit II: Designing Comprehensive Evaluation Systems
February 27, 2012

Presentation Purposes
What are the characteristics of appropriate assessments to measure student growth?
What technical criteria should be applied? How can we define "good enough"?
What challenges do we face (general and growth-related)?

Demonstrating Technical Adequacy
Lack of Bias
Reliability
Validity
Test Administration Procedures
Scoring and Reporting (interpretive guides)

Technical Criteria
Purpose
o Content and context
o General factors relevant to all assessments
o Additional factors specific to measuring growth
Population
o Ensure validity and fairness for all student populations
o Conduct field tests and item reviews
Content
o Articulate the range of skills and concepts to be assessed
o Specify appropriate item types

Sufficiency, Quality, and Adequacy
Evidence Provided
o Assertion? Summary? Detailed description? Data supported?
Type of Data
o Quantitative? Qualitative? Both?
Sufficiency
o Comprehensive? Context and interpretation provided?
Quality
o Statistical assumptions satisfied? Replicable? Accurate? Generalizable?
Adequacy
o Credible information?

Challenges: General
Definition of Rigorous and Comparable
Existing Instrument
o Use as available
o Modify items/tasks (alignment, breadth/depth)
o Collect additional evidence (who, how, when?)
"Paper Test" – Performance Assessments – Modules – Classroom Work – Grades
Which type of evidence takes precedence (reliability vs. consequential validity)?
Sliding scale for weighting assessment data (see the sketch after this slide)
Cost: house analogy
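The "sliding scale" idea above can be illustrated with a small, purely hypothetical sketch: several growth measures are combined into one composite, with each evidence source weighted in proportion to its estimated reliability. The measure names, scores, reliabilities, and weighting rule are illustrative assumptions, not part of the presentation.

```python
# Hypothetical sketch of a sliding-scale weighting of assessment evidence.
# Names, scores, and reliability estimates below are invented for illustration.

measures = {
    # name: (standardized growth score for one educator, estimated reliability)
    "state_test_growth": (0.40, 0.85),
    "performance_tasks": (0.25, 0.70),
    "classroom_work":    (0.10, 0.55),
}

# Weight each evidence source in proportion to its reliability, so less
# dependable evidence contributes less to the composite.
total_rel = sum(rel for _, rel in measures.values())
composite = sum(score * (rel / total_rel) for score, rel in measures.values())

for name, (score, rel) in measures.items():
    print(f"{name}: score={score:+.2f}, weight={rel / total_rel:.2f}")
print(f"Composite growth estimate: {composite:+.2f}")
```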

Challenges: Specific to Growth
Vertical Scale vs. Scale-Free Models (e.g., Vertically Articulated Achievement Levels, SGPs)
Non-contiguous grades and content
True Gains vs. Error
Multiple equated forms
Recommended pre-post time frame
Reliability at various points on the score scale, especially the extremes
Evidence of gain score reliability (see the sketch after this slide)
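The "true gains vs. error" concern can be made concrete with the classical difference-score reliability formula. The sketch below is a minimal illustration with invented numbers; it is not taken from the presentation, and the specific values are assumptions.

```python
# Minimal sketch of classical gain-score (difference-score) reliability.
# All numbers are hypothetical.

def gain_score_reliability(sd_pre, sd_post, rel_pre, rel_post, r_pre_post):
    """Reliability of the gain D = post - pre under classical test theory."""
    num = (sd_post ** 2 * rel_post + sd_pre ** 2 * rel_pre
           - 2 * r_pre_post * sd_pre * sd_post)
    den = sd_pre ** 2 + sd_post ** 2 - 2 * r_pre_post * sd_pre * sd_post
    return num / den

# Even with two highly reliable tests (0.90), a strong pre-post correlation
# (0.80) leaves the gain score itself only moderately reliable (about 0.50).
print(gain_score_reliability(sd_pre=10, sd_post=10,
                             rel_pre=0.90, rel_post=0.90, r_pre_post=0.80))
```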

Criteria (Validity Cluster)
Criteria Cluster: Validity
Criterion: Field Testing
o Field Test Sampling Design: Representativeness and Norming
o Field Test Sampling Design: Currency (at least, dates documented)
o Field Test Sampling Design: Randomization
Criterion: Fidelity (link of test to stated purpose of the test)
Criterion: Design
o Attrition of Persons (for Pre/Post Designs)
o Test Blueprint
o Scoring Rubric for OE Items: Construction and Validation
o Accommodations

Criteria (Validity Cluster) cont.
Criteria Cluster: Validity
Criterion: Content
o Content Alignment Studies
o Expert judgments
o p-values
o Discrimination (Item-Test Correlations) (see the sketch after this slide)
o Bias/DIF analysis
o IRT/Item fit (ICC)
o Distractor Analysis
Criterion: Construct
o Factorial Validity (structural equation modeling)
o Multi-Trait/Multi-Method
o Equivalence/Comparability (construct the same regardless of examinee's ability)
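Two of the content-evidence statistics above, item p-values and item-test correlations, can be computed directly from a scored response matrix. The sketch below uses a small invented 0/1 data set; the data and variable names are assumptions for illustration only.

```python
# Minimal sketch: item p-values and corrected item-test correlations
# from a hypothetical 0/1 (incorrect/correct) response matrix.
import numpy as np

responses = np.array([  # rows = examinees, columns = items (invented data)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

p_values = responses.mean(axis=0)  # proportion correct per item (difficulty)

total = responses.sum(axis=1)
item_test_corr = []
for j in range(responses.shape[1]):
    rest = total - responses[:, j]  # total score with the item removed
    item_test_corr.append(np.corrcoef(responses[:, j], rest)[0, 1])

print("p-values:", np.round(p_values, 2))
print("corrected item-test correlations:", np.round(item_test_corr, 2))
```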

Criteria (Validity Cluster) cont.
Criteria Cluster: Validity
Criterion: Criterion
o Predictive validity: Validation to the Referent (see the sketch after this slide)
o Predictive validity: Individual and group scores
o Concurrent validity: Validation to External Criteria
o Concurrent validity: Validity of External Criteria
o Concurrent validity: Individual and group scores
Criterion: Consequential
o Evaluation of Testing Consequences
o Individual and group scores
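Criterion-related evidence usually reduces to correlating assessment scores with an external referent. The sketch below is a hypothetical illustration; the scores and the criterion measure are assumptions, not findings from the presentation.

```python
# Minimal sketch of a criterion-validity coefficient: correlate test scores
# with an external criterion (e.g., later course grades). Data are invented.
import numpy as np

test_scores = np.array([410, 455, 430, 500, 475, 520, 390, 465])
criterion = np.array([2.1, 2.8, 2.5, 3.4, 3.0, 3.6, 1.9, 2.9])

validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]
print(f"Predictive validity coefficient: {validity_coefficient:.2f}")
```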

Criteria (Validity Cluster) cont.
Criteria Cluster: Validity
Criterion: Growth
o Multiple equated forms
o Recommended pre-post time frame
o Reliability at various points of the score scale
o Gain score reliability

Criteria (Reliability Cluster)
Criteria Cluster: Reliability
Criterion: Reliability: Single Administration
o Scale Internal Consistency
o Split-half (see the sketch after this slide)
o Scorer / Hand-scoring
Criterion: Reliability: Multiple Administrations
o Test-retest
Criterion: Reliability: Either Single or Multiple Administrations
o Alternate form
o Individual and group scores
o Classification consistency
o Generalizability
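Single-administration evidence is typically reported as coefficient alpha or as a Spearman-Brown corrected split-half coefficient. The sketch below computes both from a small invented item-score matrix; the data are assumptions used only to show the calculations.

```python
# Minimal sketch: coefficient alpha and a split-half reliability estimate
# (with Spearman-Brown correction) from hypothetical item scores.
import numpy as np

scores = np.array([  # rows = examinees, columns = items (invented data)
    [3, 4, 2, 5, 4, 3],
    [2, 2, 1, 3, 2, 2],
    [5, 4, 4, 5, 5, 4],
    [1, 2, 2, 2, 1, 2],
    [4, 3, 3, 4, 4, 5],
])

k = scores.shape[1]
item_var = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_var.sum() / total_var)

# Split the test into odd and even items, correlate the halves, then
# project to full length with the Spearman-Brown formula.
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]
split_half = 2 * r_half / (1 + r_half)

print(f"coefficient alpha: {alpha:.2f}")
print(f"split-half (Spearman-Brown): {split_half:.2f}")
```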

Criteria (Freedom from Bias Cluster)
Criteria Cluster: Freedom from Bias
Criterion: Judgmental and Statistical (DIF) Reviews (see the sketch after this slide)
o Bias review panel
o Content
o Ethnicity
o Cultural
o Linguistic
o Socio-economic
o Geographic
o Students with disabilities
o Universal Design
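One common statistical DIF procedure is the Mantel-Haenszel analysis, in which examinees are matched on ability and item performance is compared across reference and focal groups. The sketch below uses invented counts; the strata, counts, and the flagging convention noted in the comment are assumptions for illustration, not results from the presentation.

```python
# Minimal sketch of a Mantel-Haenszel DIF statistic with hypothetical counts.
# Each stratum is an ability-matched group with counts of correct/incorrect
# responses for the reference and focal groups on one item.
from math import log

# (ref_correct, ref_incorrect, focal_correct, focal_incorrect) per stratum
strata = [
    (40, 10, 30, 20),
    (60, 15, 45, 25),
    (80, 10, 70, 15),
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
alpha_mh = num / den              # common odds ratio; 1.0 means no DIF
mh_d_dif = -2.35 * log(alpha_mh)  # ETS delta scale; |MH D-DIF| >= 1.5 is often flagged

print(f"MH common odds ratio: {alpha_mh:.2f}, MH D-DIF: {mh_d_dif:.2f}")
```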

Criteria (Testing System Cluster)
Criteria Cluster: Testing System (Superordinate) Criteria
Criterion: Form-Level Analyses
o N (sample size)
o Central Tendency (Mean, Median, Mode)
o Variation (Range, Variance, Standard Deviation)
o Standard Error of Measurement (see the sketch after this slide)
o Bias
o IRT fit (TCC)
o Equating
o Scaling
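Most of the form-level statistics above come directly from the observed score distribution; the standard error of measurement additionally requires a reliability estimate (SEM = SD * sqrt(1 - reliability)). The sketch below uses invented scores and an assumed reliability value purely for illustration.

```python
# Minimal sketch of form-level summary statistics, including the standard
# error of measurement. Scores and the reliability estimate are hypothetical.
import math
import statistics as stats

form_scores = [34, 41, 38, 29, 45, 37, 40, 33, 36, 42]
reliability = 0.88  # e.g., coefficient alpha reported for this form (assumed)

n = len(form_scores)
mean = stats.mean(form_scores)
sd = stats.stdev(form_scores)
sem = sd * math.sqrt(1 - reliability)  # SEM = SD * sqrt(1 - reliability)

print(f"N={n}, mean={mean:.1f}, SD={sd:.2f}, SEM={sem:.2f}")
```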

Criteria (Testing System Cluster) cont.
Criteria Cluster: Testing System (Superordinate) Criteria
Criterion: Reporting
o Student level
o ESEA Subgroups
o Class
o District
o State
o Populations
Criterion: Description of Standards Setting: Methods, Participants, Group Size
Criterion: Report Format
o Basic
o Custom