“Value added” measures of teacher quality: use and policy validity
Sean P. Corcoran, New York University
NYU Abu Dhabi Conference, January 22, 2009

Overview
An introduction to the use of “value added” measures (VAM) of teacher effectiveness, in both research and practice.
A discussion of the policy validity of VAM, motivated by current work on “teacher effects” on multiple assessments of similar skills. With:
Jennifer L. Jennings (Columbia U)
Andrew A. Beveridge (Queens College)

What are “value added” measures?
Essentially, an indirect estimate of a teacher’s contribution to learning, measured using gains in students’ standardized test scores
What makes them “indirect”?
Uses a statistical model to account for certain student characteristics (key: past achievement), attributing remaining test score gains to the teacher
Clearly an improvement over test score levels
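To make the model concrete, here is a common value-added specification (a generic sketch, not necessarily the exact model behind any particular system discussed here): regress a student’s current score on the prior-year score and observed characteristics, and read off the teacher term:

A_{it} = \lambda A_{i,t-1} + X_{it}\beta + \theta_{j(i,t)} + \varepsilon_{it}

where A_{it} is student i’s score in year t, X_{it} are observed student characteristics, \theta_{j(i,t)} is the effect of the teacher to whom student i is assigned in year t, and \varepsilon_{it} is unexplained variation. The estimate of \theta_j is teacher j’s “value added.”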

What are “value added” measures?
Generally, “teacher effects” cannot be separated from “classroom effects”
E.g. two classrooms of similarly situated students where one has a particularly disruptive student
May be able to improve VAM with multiple years of results for teachers
This approach raises a range of additional issues and questions, some of which I will address in a moment
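One way to see why multiple years help (a sketch under textbook assumptions, not any district’s particular model): a single-year estimate for teacher j in year t mixes the persistent teacher effect with classroom-level and sampling noise,

\hat{\theta}_{jt} = \theta_j + c_{jt} + e_{jt}

where c_{jt} is a classroom-year shock (e.g. the disruptive student) and e_{jt} is estimation error. Averaging estimates over T years keeps \theta_j but cuts the variance contributed by c_{jt} and e_{jt} by roughly a factor of 1/T, provided those shocks are independent across years.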

Growth in VAM
VAM of teacher effectiveness were initially mostly of academic interest
Rivkin et al. (2005): effect size of .10/.11 SD for reading/math
Nye et al. (2004): a 25th-to-75th percentile shift in teacher quality increased reading/math by .35/.48 SD

Growth in VAM
Value added assessment of teachers is becoming widespread practice in the U.S.
Houston, Dallas, Denver, Minneapolis, Charlotte
Houston: EVAAS
New York City: for now a “development tool” only (the Teacher Data Tool Kit)

Why the sudden interest?
1. A logical extension of school accountability
Movement to collect and publicly report student achievement measures at the school level
In some cases, rewards and sanctions (e.g. NCLB)
Common sense appeal (both Obama and McCain supported “pay for performance” for teachers)

Why the sudden interest?
2. Data availability
Large longitudinal databases of student performance enabled these calculations
Concurrent advancements in methodology

Why the sudden interest?
3. Improving our assessment and measurement of teacher quality
Easily observed characteristics of teachers are often poor predictors of classroom achievement (Hanushek and Rivkin 2006)
Especially true of qualifications for which teachers are remunerated (e.g. education, certification, experience)

Issues with VAM (to name a few…)
1. Focus on a narrow measure of educational outcomes: does “the test” adequately reflect our expectations of the educational system?
E.g. skill content, short-term vs. long-term benefits
2. Validity: assuming “the test” reflects outcomes we care about, is the instrument a valid one?
Teaching to the test and test inflation (Koretz 2007) – even “good” tests lose validity over time

Issues with VAM (to name a few…)
3. Modeling for causal inference: how can we be confident that our VAM are providing “good” estimates of the teacher’s true (i.e. causal) contribution to student learning?
Students are not randomly assigned to teachers
Dynamic tracking
“Teacher effects” may be context dependent
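A minimal simulation of the non-random-assignment problem (all magnitudes are invented for illustration; this is not the model or data used in the study): when classrooms differ in unobserved growth potential, part of that sorting is attributed to teachers, and estimated effects line up less well with the true ones.

import numpy as np

rng = np.random.default_rng(42)
n_teachers, class_size = 300, 25
sd_teacher, sd_student = 0.10, 0.50   # assumed magnitudes, in student SD units

def simulate(sorting_sd):
    # classroom-average gain = true teacher effect + unobserved sorting + sampling noise
    true = rng.normal(0, sd_teacher, n_teachers)
    sorting = rng.normal(0, sorting_sd, n_teachers)             # unobserved classroom composition
    noise = rng.normal(0, sd_student / np.sqrt(class_size), n_teachers)
    est = true + sorting + noise
    return np.corrcoef(true, est)[0, 1]

print("random assignment :", round(simulate(0.00), 2))
print("tracked assignment:", round(simulate(0.10), 2))

Even under random assignment the correlation between true and estimated effects is well below one because of sampling noise; non-random sorting pushes it lower still.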

Issues with VAM (to name a few…)
4. Precision
Estimates of teacher effects are just that: estimates
Each student’s test score gain is a small and noisy indicator of teacher effectiveness
Are our estimates precise enough to base personnel decisions on them?
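A back-of-the-envelope illustration (with assumed, but plausible, numbers): if student test score gains have a standard deviation of roughly 0.5 student SD and a teacher is judged on one class of 25 students, the standard error of the class-average gain is about 0.5 / \sqrt{25} = 0.10 student SD. That is the same order of magnitude as the spread of teacher effects cited earlier (.10/.11 SD), so a single year of data can easily place an average teacher near the top or bottom of the ranking by chance.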

Issues with VAM (to name a few…)
5. Other
Perverse incentives (gaming / cheating)
Subject dependency
Persistence
Scaling issues – e.g. ceiling effects
Missing data – e.g. absent or exempted students

The “policy validity” of VAM
Do VAM of teacher effectiveness have “policy validity”? That is, are they appropriate for practical implementation, and for what purposes? (Harris 2007)
If one were to make personnel decisions based on VAM, at the very least these measures should be:
Convincing as “causal” estimates
Relatively precise

Our research question
If VAM are meaningful indicators of teacher effectiveness, they should be relatively consistent across alternative assessments of the same skills (especially for narrowly defined skills)
In most cases we only observe one assessment – the “high stakes” state assessment – upon which teacher effects are estimated

Houston
Houston is unusual in that one can observe two measures of student achievement:
TAKS – a “high stakes” exam
Stanford 10 – a “low stakes” exam
Both test reading and math skills
How consistent are VAM of effectiveness on these two tests?

Houston data and method
Longitudinal student-level data on all students in the Houston ISD, 1998–2006 (we use a subset of these years)
Students are linked to their teachers
Student background characteristics
About 127,000 students
We estimate teacher effects for 4th and 5th grade teachers on both TAKS and Stanford tests
Using 1 and 3 years of results
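A minimal sketch of the kind of comparison involved (hypothetical file and column names, and a deliberately simple specification; not the authors’ actual code or model): estimate each teacher’s average residual gain on each test, then correlate the two sets of estimates.

import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout: one row per student-year, with hypothetical columns
# taks, taks_lag, stanford, stanford_lag, teacher_id, grade, year
df = pd.read_csv("houston_students.csv")

def teacher_effects(data, score, lag):
    # Residualize the current score on the prior-year score (plus grade/year dummies),
    # then average the residuals within teacher as a crude value-added estimate.
    fit = smf.ols(f"{score} ~ {lag} + C(grade) + C(year)", data=data).fit()
    return data.assign(resid=fit.resid).groupby("teacher_id")["resid"].mean()

va_taks = teacher_effects(df, "taks", "taks_lag")
va_stanford = teacher_effects(df, "stanford", "stanford_lag")
print(va_taks.corr(va_stanford))   # how consistent are the two sets of teacher effects?

The same idea extends to pooling one versus three years of classrooms per teacher before averaging.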

Correlation across tests
[Chart: correlation coefficients for low- and high-stakes reading and for low- and high-stakes mathematics]

Teacher effects on multiple tests

Teacher effects on multiple tests (one year of data only)

Teacher effects on multiple subjects

Teacher effect stability

Conclusions
Teachers who are good at promoting growth on a high-stakes test are not necessarily those who are good at promoting growth on a low-stakes test of the same subject
Teacher effects vary significantly across years and subjects
Useful for policy? Probably, but we should resist relying too heavily on these measures
Of course, more research is needed!