“Value added” measures of teacher quality: use and policy validity Sean P. Corcoran New York University NYU Abu Dhabi Conference January 22, 2009.

Overview
- An introduction to the use of “value added” measures (VAM) of teacher effectiveness, in both research and practice.
- A discussion of the policy validity of VAM, motivated by current work on “teacher effects” on multiple assessments of similar skills. With:
  - Jennifer L. Jennings (Columbia U)
  - Andrew A. Beveridge (Queens College)

What are “value added” measures?
- Essentially, an indirect estimate of a teacher’s contribution to learning, measured using gains in students’ standardized test score results
- What makes them “indirect”?
  - Uses a statistical model to account for certain student characteristics (key: past achievement), attributing remaining test score gains to the teacher
- Clearly an improvement over test score levels
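The "indirect" step can be sketched in a few lines. This is a minimal illustration, not the model used in the research described here: it assumes a single covariate (prior score), an ordinary least squares fit, and made-up records. The function name `value_added` and the toy data are hypothetical.

```python
# Minimal sketch of a value-added calculation (illustrative assumptions only):
# 1) regress current score on prior score across all students,
# 2) attribute each teacher's average residual to that teacher.
from collections import defaultdict
from statistics import mean


def value_added(records):
    """records: list of (teacher_id, prior_score, current_score) tuples.

    Returns {teacher_id: mean residual} from a one-covariate OLS fit.
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    x_bar, y_bar = mean(priors), mean(currents)

    # Closed-form OLS slope and intercept for a single predictor.
    sxx = sum((x - x_bar) ** 2 for x in priors)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(priors, currents))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Gain not explained by prior achievement, grouped by teacher.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Hypothetical toy data: two teachers with identically prepared students.
toy = [
    ("A", 40, 52), ("A", 50, 61), ("A", 60, 72),
    ("B", 40, 45), ("B", 50, 56), ("B", 60, 64),
]
effects = value_added(toy)
```

In this toy case teacher A's students finish above the fitted line and teacher B's below it, so A receives a positive estimate and B a negative one. Real models add more student characteristics and often estimate teacher effects jointly with the covariates rather than from residuals.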

What are “value added” measures?
- Generally, “teacher effects” cannot be separated from “classroom effects”
  - E.g. two classrooms of similarly situated students where one has a particularly disruptive student
- May be able to improve VAM with multiple years of results for teachers
- This approach raises a range of additional issues and questions, some of which I will address in a moment

Growth in VAM
- VAM of teacher effectiveness were initially mostly of academic interest
  - Rivkin et al. (2005): effect size of 0.10/0.11 SD for reading/math
  - Nye et al. (2004): 25th to 75th percentile shift in teacher quality increased reading/math by 0.35/0.48 SD

Growth in VAM
- Value added assessment of teachers is becoming widespread practice in the U.S.
  - Houston, Dallas, Denver, Minneapolis, Charlotte
    - Houston EVAAS
  - New York City: for now a “development tool” only
    - The Teacher Data Tool Kit

Why the sudden interest?
1. A logical extension of school accountability
   - Movement to collect, publicly report student achievement measures at the school level
   - In some cases, rewards and sanctions (e.g. NCLB)
   - Common sense appeal (both Obama and McCain supported “pay for performance” for teachers)

Why the sudden interest?
2. Data availability
   - Large longitudinal databases of student performance enabled these calculations
   - Concurrent advancements in methodology

Why the sudden interest?
3. Improving our assessment and measurement of teacher quality
   - Easily observed characteristics of teachers are often poor predictors of classroom achievement (Hanushek and Rivkin 2006)
   - Especially true of qualifications for which teachers are remunerated (e.g. education, certification, experience)

Issues with VAM (to name a few…)
1. Focus on a narrow measure of educational outcomes: does “the test” adequately reflect our expectations of the educational system?
   - E.g. skill content, short-term vs. long-term benefits
2. Validity: assuming “the test” reflects outcomes we care about, is the instrument a valid one?
   - Teaching to the test and test inflation (Koretz 2007): even “good” tests lose validity over time

Issues with VAM (to name a few…)
3. Modeling for causal inference: how can we be confident that our VAM are providing “good” estimates of the teacher’s true (i.e. causal) contribution to student learning?
   - Students are not randomly assigned to teachers
   - Dynamic tracking
   - “Teacher effects” may be context dependent

Issues with VAM (to name a few…)
4. Precision
   - Estimates of teacher effects are just that: estimates
   - Each student’s test score gain is a small, and noisy, indicator of teacher effectiveness
   - Are our estimates precise enough to base personnel decisions on them?
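A back-of-the-envelope calculation shows why precision matters. It rests on simplifying assumptions that are not from the presentation: student gains within a class are independent, share a common standard deviation $\sigma$ (set to 1 student-level SD here), and a class has $n = 25$ students observed over $T$ years. The symbols $\bar{g}$, $\sigma$, $n$, and $T$ are introduced only for this illustration.

```latex
\mathrm{SE}(\bar{g}) = \frac{\sigma}{\sqrt{nT}}
\qquad
T = 1:\ \frac{1}{\sqrt{25}} = 0.20
\qquad
T = 3:\ \frac{1}{\sqrt{75}} \approx 0.12
```

Under these assumptions the sampling error of a single-year class average is about as large as, or larger than, effect sizes on the order of 0.10 SD. Pooling three years helps but does not remove the problem. Shared classroom shocks would violate the independence assumption and make the true standard error larger still.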

Issues with VAM (to name a few…)
5. Other
   - Perverse incentives (gaming / cheating)
   - Subject dependency
   - Persistence
   - Scaling issues, e.g. ceiling effects
   - Missing data, e.g. absent or exempted students

The “policy validity” of VAM
- Do VAM of teacher effectiveness have “policy validity”? That is, are they appropriate for practical implementation, and for what purposes? (Harris 2007)
- If one were to make personnel decisions based on VAM, at the very least these measures should be:
  - Convincing as “causal” estimates
  - Relatively precise

Our research question
- If VAM are meaningful indicators of teacher effectiveness, they should be relatively consistent across alternative assessments of the same skills (especially for narrowly defined skills)
- In most cases we only observe one assessment, the “high stakes” state assessment, upon which teacher effects are estimated

Houston
- Houston is unusual in that one can observe two measures of student achievement:
  - TAKS, a “high stakes” exam
  - Stanford 10, a “low stakes” exam
- Both test reading and math skills
- How consistent are VAM of effectiveness on these two tests?

Houston data and method
- Longitudinal student-level data on all students in the Houston ISD, 1998-2006 (we use 2003-06)
  - Students are linked to their teachers
  - Student background
  - About 127,000 students
- We estimate teacher effects for 4th and 5th grade teachers on both TAKS and Stanford tests
  - Using 1 and 3 years of results

Correlation across tests

                          Low- and high-stakes reading   Low- and high-stakes mathematics
Correlation coefficient   0.34                           0.41
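For readers unfamiliar with how such a cross-test comparison is computed, here is a minimal sketch. It assumes each teacher already has one estimated effect per test and uses the Pearson correlation. The helper `pearson`, the teacher IDs, and the effect values are hypothetical and are not the Houston estimates.

```python
# Sketch: correlation between teacher effects estimated on two tests.
# All values below are made up for illustration.
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical teacher effects (in student SD units) on each test.
high_stakes = {"T1": 0.20, "T2": -0.10, "T3": 0.05, "T4": -0.15}
low_stakes = {"T1": 0.05, "T2": -0.05, "T3": 0.15, "T4": -0.15}

# Compare only teachers with an estimate on both tests.
teachers = sorted(high_stakes.keys() & low_stakes.keys())
r = pearson(
    [high_stakes[t] for t in teachers],
    [low_stakes[t] for t in teachers],
)
```

A coefficient near 1 would mean the two tests rank teachers almost identically. Values well below 1 mean many teachers look stronger on one test than on the other.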

Teacher effects on multiple tests

Teacher effects on multiple tests (one year of data only)

Teacher effects on multiple subjects

Teacher effect stability

Conclusions
- Teachers who are good at promoting growth on a high-stakes test are not necessarily those who are good at promoting growth on a low-stakes test of the same subject.
- Teacher effects vary significantly across years and subjects
- Useful for policy? Probably, but we should resist relying too heavily on these measures
- Of course, more research is needed!
