Accuracy, Transparency, and Incentives: Contrasting Criteria for Evaluating Growth Models
Andrew Ho, Harvard Graduate School of Education
Maryland Assessment Research Center for Education Success (MARCES) Assessment Conference: Value Added Modeling and Growth Modeling with Particular Application to Teacher and School Effectiveness
College Park, Maryland, October 18, 2012

What makes a good growth model?
How can we advance from passing judgment on schools and teachers to facilitating their improvement? By which criteria should we evaluate accountability models?
Predictive accuracy of individual student "growth" models for school-level accountability:
– Projection model
– Trajectory model
– Conditional status percentile rank models (e.g., SGPs)
Incentives:
– Conditional incentive diagrams and alignment to policy goals.
Transparency, black boxes, and score reporting.

Context
This talk concerns school accountability metrics that count students who are "on track" to proficiency, career and college readiness, or some other future outcome. A seemingly straightforward criterion for such metrics is minimization of the distance between predicted and actual future performance.

How "predictive accuracy" is like "standards"
Once the discussion is framed in terms of standards, the only rhetorically acceptable choice is high standards. Once the discussion is framed in terms of predictive accuracy, the only rhetorically acceptable choice is maximal accuracy.

A simple projection model

Minimize Squared Error

A simple trajectory model

Projections from Conditional Status Percentile Ranks
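
A minimal sketch of the CSPR idea, using hypothetical simulated scores: a student's conditional status percentile rank is the percentile of her current score among students with similar prior scores. Operational SGP implementations use quantile regression; the binning below, and the simulated score structure, are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference cohort: standardized Grade 6 and Grade 7 scores.
n = 10_000
g6 = rng.normal(0.0, 1.0, n)
g7 = 0.8 * g6 + rng.normal(0.0, 0.6, n)  # assumed year-to-year structure

def cspr(s6, s7, g6_ref, g7_ref, window=0.25):
    # Percentile rank of s7 among reference students whose Grade 6 scores
    # fall within `window` of s6 (a binning stand-in for the quantile
    # regression used in operational SGP models).
    peers = np.abs(g6_ref - s6) < window
    return 100.0 * np.mean(g7_ref[peers] < s7)

# The same Grade 7 status earns different percentile ranks,
# depending on where the student started:
print(cspr(-1.0, 0.0, g6, g7))  # low prior, average status: high CSPR
print(cspr(+1.0, 0.0, g6, g7))  # high prior, average status: low CSPR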

Contrasting all predictive models

An important contrast in "growth" use and interpretation
Growth description:
– Gain scores, trajectories: where a student was, where a student is, and what has been learned in between.
– Status beyond prediction, CSPR: where a student is, above and beyond where we would have predicted she would be, given past scores.
Growth prediction:
– Trajectory, gain-score model: extend past gains in systematic fashion into the future, then consider whether future performance is adequate.
– Projection/regression model: use a regression model to predict future scores from past scores, statistically and empirically.

Two Approaches to Growth Description
– Gain scores, trajectories: the gain from the previous score. In aggregation, two students with equal gains are treated alike.
– Status beyond prediction: a prediction from the previous score (or scores, or scores and demographics), and status beyond that prediction. In aggregation, two different students with equal status beyond prediction are treated alike.

Two Approaches to Growth Prediction
– Trajectory model: extends gains over time in straightforward fashion. With more prior years, a best-fit line or curve can be extended similarly; extended trajectories do not have to be linear.
– Projection/regression model: estimates a prediction equation for the "future" score. Because current students have unknown future scores, estimate the prediction equation from a previous cohort that does have its "future" year's score, then input current cohort data into this prediction equation.
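
A minimal sketch of the two prediction approaches, under an assumed three-grade score structure and a hypothetical simulated prior cohort; the coefficients and noise levels are illustrative, not values from the talk.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical previous cohort whose "future" (Grade 8) score is observed,
# used to estimate the projection/regression equation.
n = 5_000
g6 = rng.normal(0.0, 1.0, n)
g7 = 0.8 * g6 + rng.normal(0.0, 0.6, n)
g8 = 0.5 * g6 + 0.4 * g7 + rng.normal(0.0, 0.6, n)

# Projection model: regress Grade 8 on Grades 6 and 7 in the prior cohort.
X = np.column_stack([np.ones(n), g6, g7])
beta, *_ = np.linalg.lstsq(X, g8, rcond=None)

def project(s6, s7):
    # Plug a current student's scores into the estimated equation.
    return beta[0] + beta[1] * s6 + beta[2] * s7

def trajectory(s6, s7):
    # Extend the most recent gain linearly into the "future" year.
    return s7 + (s7 - s6)

# A current student with a large gain from a low starting point:
s6, s7 = -2.0, -1.0
print(f"trajectory prediction: {trajectory(s6, s7):+.2f}")  # extends the gain
print(f"projection prediction: {project(s6, s7):+.2f}")     # regresses to the mean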

Three students with equal projections from a regression model. The same three students’ predictions with a regression model. Three students with equal projections from a gain-score model Stark Contrasts in Projections

Models by RMSE
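
Continuing the sketch above, the models' predictive accuracy can be compared by root mean squared error on a held-out cohort simulated under the same assumed structure; by construction, the regression-based projection wins this comparison, since least squares minimizes squared error.

# Held-out cohort under the same assumed structure as the sketch above.
m = 2_000
g6_new = rng.normal(0.0, 1.0, m)
g7_new = 0.8 * g6_new + rng.normal(0.0, 0.6, m)
g8_new = 0.5 * g6_new + 0.4 * g7_new + rng.normal(0.0, 0.6, m)

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

proj_pred = beta[0] + beta[1] * g6_new + beta[2] * g7_new
traj_pred = g7_new + (g7_new - g6_new)
print("projection RMSE:", rmse(proj_pred, g8_new))  # smaller: it minimizes squared error
print("trajectory RMSE:", rmse(traj_pred, g8_new))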

Decision Plots
For a Grade 6 score of -2, projection models require a Grade 7 score of 1.6, CSPRs require a -.5, and trajectory models require a -1. For a Grade 6 score of 2, projection models require anything above a -1.6, CSPRs require a .5, and trajectory models require a 1.
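
A sketch of how these decision boundaries arise: each model is inverted to find the Grade 7 score required for the predicted Grade 8 score to reach a cut. The projection coefficients and cut score below are assumptions for illustration; they do not reproduce the plotted values above, but they show the same reversal of requirements.

# Required Grade 7 score to be "on track" for a Grade 8 cut, by model.
# The projection coefficients and the cut score are assumed values.
b0, b6, b7 = 0.0, 0.5, 0.4  # assumed projection equation for Grade 8
cut = 0.0                   # assumed Grade 8 proficiency cut

def required_g7_projection(s6):
    # Solve b0 + b6*s6 + b7*s7 = cut for s7.
    return (cut - b0 - b6 * s6) / b7

def required_g7_trajectory(s6):
    # Solve s7 + (s7 - s6) = cut for s7.
    return (cut + s6) / 2.0

for s6 in (-2.0, 2.0):
    print(f"Grade 6 = {s6:+.1f}: projection needs {required_g7_projection(s6):+.2f}, "
          f"trajectory needs {required_g7_trajectory(s6):+.2f}")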

Torquing Projection Lines

Aspirational Models vs. Predictive Models
The CSPR and trajectory model lines are, from this perspective, more aspirational than predictive. They envision a covariance structure in which relative position is less fixed over time than it is empirically. Can this be okay, even if it decreases predictive accuracy?

From Decision Plots to Effort Plots
The regression line, or "conditional expectation" line, gives us a baseline expectation given Grade 6 scores. Anything above this line may require "effort." We can plot this effort against prior-year scores by subtracting out the regression line, as sketched below.
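
A sketch of that subtraction, reusing the required-score functions from the decision-plot sketch above; the Grade 6-to-Grade 7 regression slope is an assumed value.

# Conditional effort: required Grade 7 score minus the regression
# ("conditional expectation") line for Grade 7 given Grade 6.
r = 0.8  # assumed standardized Grade 6 -> Grade 7 regression slope

def expected_g7(s6):
    return r * s6

def effort(required_g7, s6):
    # Gain beyond expectation needed to stay "on track."
    return required_g7(s6) - expected_g7(s6)

for s6 in (-2.0, 0.0, 2.0):
    print(f"Grade 6 = {s6:+.1f}: projection effort {effort(required_g7_projection, s6):+.2f}, "
          f"trajectory effort {effort(required_g7_trajectory, s6):+.2f}")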

Conditional Effort Plots
These plot the required gain beyond expectation. Maximizing predictive accuracy may lead to implausible gains required to get low-achieving students "on track": a low score becomes a "ball and chain," and a high score a "free pass."

Conditional Incentive Plots
In a zero-sum model for incentives to teach certain students, these conditional effort plots imply conditional incentive plots, as shown. The question may be: what is the goal of the policy? The answer informs conditional incentive plots, and these can inform model selection. This is a useful alternative to letting prediction drive model selection and then being surprised by the shape of incentives.

Stark Contrasts in Incentives
– Trajectory, gain-score model: lower initial scores can inflate trajectories. A plausible headline: "New Model Rewards Low Scores, Encourages 'Fail-First Strategy.'" Very intuitive; requires vertical scales; less accurate in terms of future classifications.
– Regression/prediction model: low scorers require huge gains, while high scorers can fall comfortably. A plausible headline: "New Model Labels High and Low Achievers Early, Permanently." Counterintuitive; does not require vertical scales; more accurate classifications.

Conditional Incentive Plots for VAMs
I argue that school accountability metrics should be designed with less attention to "standards" and "prediction" and closer attention to conditional incentives and their alignment with policy goals. What about teacher accountability metrics? Conditional incentive plots for VAMs are generally uniform across the distributions of variables included in the model. Scaling anomalies may lead to distortions if equal intervals do not correspond with equal "effort for gains," although this is difficult to game.

Transparency and Score Reporting
As accountability calculations become increasingly complicated, score reporting and transparency become even more necessary mechanisms for the improvement of schools and teaching. Systems will be more successful with clear reporting of actionable (and presumably defensible) results. An example: I used to be very suspicious of categorical/value-table models, as they create pseudo-vertical scales and sacrifice information.
– I still have reservations, but such models are excellent tools for reporting and communicating results, even when the underlying models are not themselves categorical.
– An actionable, interpretable categorical framework can be layered over a continuous model, as sketched below.
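
A minimal sketch of such a layering: a continuous growth estimate mapped to a small set of reportable categories. The cut points and labels are hypothetical, for illustration only.

import bisect

# Hypothetical cuts and labels on a continuous growth/effort metric.
CUTS = [-0.5, 0.0, 0.5]
LABELS = ["Well Below Expected Growth", "Below Expected Growth",
          "Meets Expected Growth", "Exceeds Expected Growth"]

def growth_category(growth_estimate):
    # Map a continuous estimate to an actionable, reportable category.
    return LABELS[bisect.bisect_right(CUTS, growth_estimate)]

print(growth_category(-0.7))  # Well Below Expected Growth
print(growth_category(0.3))   # Meets Expected Growth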

Communicating Incentives Clearly
If I had a choice between
– a simple VAM that communicated actionable responses and incentives clearly, vs.
– a complicated VAM that offered no guidance about how to improve...
...this would be a false dichotomy. We can make the complex seem simple. Conditional incentive plots, and similar attention to differential student contributions to VAM estimates, are one approach, both to anticipate gaming behavior and to encourage desired responses.

Haertel (2012): Measurement vs. Influence
In his NCME Career Award address, Haertel distinguished between two categories of purposes of large-scale testing: measurement and influence. An "influencing" purpose often depends less on the results of the test itself:
– directing student effort
– "shaping public perception"
Validation arguments for influencing purposes are rarely well described. These plots are a modest first step toward visualizing the influencing mechanisms of proposed models.

School/Teacher Effectiveness and House, MD
A medical analogy (thanks to Catherine McClellan) can be helpful in thinking about where VAM and school accountability research should continue to go. Doctors must gather data, identify symptoms, reach a diagnosis, and prescribe a treatment.
– In school and teacher effectiveness conversations, we often get stuck at "symptoms."
– Doctors do not average blood pressure results with fMRI results to get increasingly reliable and accurate measures of "health." Or at least they don't stop there.
– We need to continue advancing the science of diagnosis (what's wrong) and treatment (now what).
We must continue beyond predictive accuracy, and even conditional incentives, to a deeper understanding of teachers' and administrators' learning in response to evaluation systems.