1 Getting Value From Value-Added
Committee on Value-Added Methodology for Instructional Improvement, Program Evaluation, and Educational Accountability
National Research Council and National Academy of Education
Presentation at the annual meeting of the Society for Research on Educational Effectiveness
Washington, DC, March 5, 2010

2 Committee Members
Henry Braun, Boston College (Chair)
Jane Hannaway, Urban Institute
Kevin Lang, Boston University
Scott Marion, National Center for the Improvement of Educational Assessment
Lorrie Shepard, University of Colorado
Judith Singer, Harvard University
Mark Wilson, University of California, Berkeley

3 Today’s Presentation
Henry Braun:
  Introduction
  Uses of VAM
  Measurement Issues
  Analytic Issues
  Consequences of Using VAM
Judith Singer:
  Key System Components
  Considerations for Policy Makers
  Using VAM to Evaluate Teachers

4 Structure of the Workshop
Identified 4 themes:
  Goals and uses of VAM
  Measurement issues with VAM
  Analytic issues with VAM
  Consequences (policy considerations) of using VAM
Commissioned 4 papers (and 2 discussants) for each theme
Commissioned writers represented different disciplines:
  Economics
  Educational statistics
  Health/medicine
  Measurement/International assessment
  Program evaluation

5 Assignments for Workshop Presenters
Asked presenters to discuss what they judged to be:
  Critical issues with VAM
  Areas of consensus and disagreement in their fields
  The types of research needed to resolve the areas of disagreement
  Implications of these issues for uses of VAM in practice

6 Workshop Presenters
Dale Ballou, Vanderbilt University
Derek Briggs, University of Colorado at Boulder
John Q. Easton, CCSR (now at IES)
Adam Gamoran, University of Wisconsin, Madison
Robert Gordon, Center for American Progress
Ashish Jha, Harvard School of Public Health
Michael Kane, National Conference of Bar Examiners (now at ETS)
Michael J. Kolen, University of Iowa
Helen F. Ladd, Duke University
Robert L. Linn, University of Colorado, Boulder
J.R. Lockwood, RAND Corporation
Daniel F. McCaffrey, RAND Corporation
Sean Reardon, Stanford University
Mark D. Reckase, Michigan State University
Brian Stecher, RAND Corporation
J. Douglas Willms, University of New Brunswick

7 Structure of the Report
Workshop held Nov. 13-14, 2008
Report is a workshop summary, not a consensus report.
Structure of the report:
  Introduction to VAM
  Uses and Consequences of VAM
  Measurement Issues
  Analytic Issues
  Considerations for Policy Makers

8 Introduction: Goals for VAM
To estimate the contributions of schools and/or teachers to student learning as represented by test score trajectories
Intention is to make causal inferences by correcting for non-random pairings of students with teachers and schools
Differences between economists and statisticians in approaches, models, and assumptions

9 Measurement Issues
Tests are incomplete measures of student achievement. Value-added estimates are based on test scores that reflect a narrower set of educational goals (cognitive and other) than most parents and educators have for students.
Measurement error. Test scores are not perfectly precise.

10 Measurement Issues (cont.)
Interval scale. To provide a consistent ranking of schools’, teachers’, or programs’ value-added, one important assumption underlying value-added analyses employing regression models is that the tests used in the analyses are reported on an equal interval scale.

11 Measurement Issues (cont.)
Vertical linking of tests. Some value-added models require vertically linked test score scales; that is, the scores on tests from different grades are linked to a common scale so that students’ scores from different grades can be compared directly.
Models of learning. Some researchers argue that value-added models would be more useful if there were better content standards that laid out developmental pathways of learning and highlighted critical transitions; tests could then be aligned to such developmental standards.

12 Analytic Issues
Bias. In order to tackle the problem of nonrandom assignment of students to teachers and teachers to schools, value-added modeling adjusts for preexisting differences among students, using prior test scores and (sometimes) other observed student and school characteristics.
Precision and stability. Research on the precision of value-added estimates consistently finds large sampling errors.
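
The adjustment for prior test scores described on this slide can be illustrated with a deliberately simple sketch. It is not any specific model discussed at the workshop: it assumes a single prior score per student, a pooled least-squares line, and a teacher's "value-added" taken as the mean residual of that teacher's students. The function name, record layout, and toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def value_added_by_teacher(records):
    """Crude covariate-adjustment sketch (illustrative only).

    records: list of (teacher_id, prior_score, current_score).
    Fits one pooled least-squares line of current score on prior score,
    then averages each teacher's residuals.
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    mean_prior, mean_current = mean(priors), mean(currents)

    # Ordinary least-squares slope and intercept for one predictor.
    sxx = sum((x - mean_prior) ** 2 for x in priors)
    sxy = sum((x - mean_prior) * (y - mean_current)
              for x, y in zip(priors, currents))
    slope = sxy / sxx
    intercept = mean_current - slope * mean_prior

    # Residual = actual score minus the score predicted from the prior score.
    residuals = defaultdict(list)
    for teacher_id, prior, current in records:
        residuals[teacher_id].append(current - (intercept + slope * prior))
    return {teacher_id: mean(vals) for teacher_id, vals in residuals.items()}


# Hypothetical toy data: both classes start from the same prior scores.
toy_records = [
    ("A", 40, 52), ("A", 50, 62), ("A", 60, 72),
    ("B", 40, 48), ("B", 50, 58), ("B", 60, 68),
]
estimates = value_added_by_teacher(toy_records)
```

In this toy case the two classes have identical prior-score distributions, so the pooled line is not distorted by how students were assigned. When assignment is nonrandom in ways the prior score does not capture, a sketch like this inherits exactly the bias concern raised on the slide.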

13 Analytic Issues (cont.)
Data quality. Missing or faulty data can have a negative impact on the precision and stability of value-added estimates and can also contribute to bias.
Complexity versus transparency. More complex value-added models tend to have better technical qualities.

14 Possible Consequences of Using VAM
Incentives and consequences. If value-added indicators are part of an accountability system, they are likely to change educators’ behavior and to lead to unintended consequences, as well as to intended ones.
Attribution. In situations in which there is team teaching or a coordinated emphasis within a school (e.g., writing across the curriculum), is it appropriate to attribute students’ learning to a single teacher?

15 Key System Components
To maximize the utility of the models, the system needs:
  A longitudinal database that tracks individual students over time and links them to their teachers (for teacher accountability) or to their schools (school accountability)
  Confidence that missing data are missing for legitimate reasons (student mobility) and not because of data collection problems
  Expert staff to run the value-added analyses
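
As a rough illustration of the student-to-teacher linkage such a longitudinal database would support, the sketch below pairs each student's score with the same student's score from the previous year and sets aside records that cannot be matched. The record layout, field names, and sample values are hypothetical and not drawn from the workshop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRecord:
    # Hypothetical layout: one row per student per year.
    student_id: str
    year: int
    teacher_id: str
    score: float


def link_consecutive_years(records):
    """Pair each record with the same student's prior-year record.

    Returns (linked, unmatched):
      linked    - list of (teacher_id, prior_score, current_score)
      unmatched - records with no prior-year score on file, which includes
                  every student's first year in the data
    """
    by_student_year = {(r.student_id, r.year): r for r in records}
    linked, unmatched = [], []
    for record in records:
        prior = by_student_year.get((record.student_id, record.year - 1))
        if prior is None:
            unmatched.append(record)
        else:
            linked.append((record.teacher_id, prior.score, record.score))
    return linked, unmatched


sample = [
    ScoreRecord("s1", 2008, "T1", 50.0),
    ScoreRecord("s1", 2009, "T2", 60.0),
    ScoreRecord("s2", 2009, "T2", 55.0),  # no 2008 record on file
]
linked_pairs, unmatched_records = link_consecutive_years(sample)
```

Keeping the unmatched records, rather than silently dropping them, makes it possible to check whether gaps reflect student mobility or data collection problems.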

16 Key System Components (cont.)
Vertically coherent set of standards, curriculum, and pedagogical strategies that are linked to the standards, and a sequence of tests well aligned to that set of standards
Reporting system that effectively presents results and provides support so users are likely to make appropriate inferences

17 Key System Components (cont.)
Ongoing training for teachers and administrators so they can understand and use results
Mechanism to monitor the system’s effects on teachers and students so the program can be adapted if unintended consequences arise

18 Using VAM to Evaluate Teachers
Workshop participants were concerned about using VAM as the sole indicator for high-stakes decisions about teachers:
  Low numbers of students per teacher
  Issues with stability of year-to-year estimates
  Uncertainty about the extent to which causal inferences can be supported, particularly when students have multiple teachers
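
One way to see why low numbers of students per teacher matter: if a teacher's estimate is treated as a simple mean of student-level residuals, and those residuals are assumed independent with a common standard deviation, the standard error of the mean shrinks only with the square root of class size. The sketch below uses that textbook formula with an assumed residual standard deviation of 10 score points; both the simplification and the number are illustrative assumptions, not workshop findings.

```python
import math


def standard_error_of_mean(residual_sd, n_students):
    """Standard error of a mean of n independent residuals: sd / sqrt(n)."""
    if n_students < 1:
        raise ValueError("n_students must be at least 1")
    return residual_sd / math.sqrt(n_students)


ASSUMED_RESIDUAL_SD = 10.0  # hypothetical value, in test-score points

se_one_class = standard_error_of_mean(ASSUMED_RESIDUAL_SD, 25)     # one class of 25
se_four_classes = standard_error_of_mean(ASSUMED_RESIDUAL_SD, 100)  # four such classes pooled
```

Under these assumptions, quadrupling the number of students only halves the standard error, which is one reason pooling several years of data is often considered.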

19 Using VAM to Evaluate Teachers (cont.)
VAM might be useful for lower stakes purposes
  For instance, as the first step in identifying teachers who need improvement or who have pedagogical strategies that could be emulated
VAM estimates might be useful as one of several indicators considered in combination with other indicators for either higher or lower stakes uses
Consistent VAM estimates of teachers’ value-added over time could provide more conclusive evaluative evidence

20 Considerations for Policy Makers
Compared to what?
  Risks and rewards of VAM compared to other methods of evaluation/accountability
Is there a best VAM?
Data requirements for VAM
  Types of standards and tests
  ID, tracking, and warehouse systems
Stakes, stakes, stakes

21 A Note About Stakes
Participants noted that any considerations of VAM uses are contingent upon the intended stakes attached to the decisions
What is low stakes to some might feel high stakes to others

22 Key Research Areas
What are the effects of measurement error on accurately estimating teacher, school, or program effects?
What is the contribution of measurement error to the volatility in estimates (e.g., a teacher’s value-added estimates) over time?

23 Key Research Areas (cont.)
Since there are questions about the assumption that test score scales are equal-interval, to what extent are inferences from value-added modeling sensitive to monotonic transformations (transformations that preserve the original order) of test scores?
How might value-added analyses be given a thorough evaluation before being operationally implemented?
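
The question about monotonic transformations can be made concrete with a small, hypothetical example. The sketch below compares two invented classes by mean score gain, once on the raw scale and once after a logarithmic transformation, which preserves the order of individual scores. The scores and the choice of the log transform are assumptions made purely for illustration.

```python
import math
from statistics import mean

# Hypothetical (pre, post) scores: class A starts low, class B starts high.
classes = {
    "A": [(10, 20), (12, 22)],
    "B": [(80, 92), (85, 97)],
}


def mean_gain(pairs, transform=lambda score: score):
    """Average post-minus-pre gain after applying a score transformation."""
    return mean(transform(post) - transform(pre) for pre, post in pairs)


def rank_classes(transform=lambda score: score):
    """Class labels ordered from largest to smallest mean gain."""
    gains = {label: mean_gain(pairs, transform) for label, pairs in classes.items()}
    return sorted(gains, key=gains.get, reverse=True)


raw_ranking = rank_classes()          # gains measured on the raw scale
log_ranking = rank_classes(math.log)  # same scores, order-preserving transform
```

On the raw scale class B shows the larger mean gain (12 points versus 10), while on the log scale class A does, even though no individual student's standing relative to any other has changed. Without the equal-interval assumption, gain-based rankings of this kind are not guaranteed to be stable.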

24 Key Research Areas (cont.)
How might the econometric and statistical models incorporate features from the other’s approach that are missing from their own model?
How do violations of model assumptions affect the accuracy of value-added estimates?
  For example, how does not meeting assumptions about the assignment of students to classrooms affect accuracy?
How do the models perform in simulation studies?

25 Key Research Areas (cont.)
How could the precision of value-added estimates be improved?
What are the implications of Rothstein’s results about causality/bias for both the economic and statistical approaches?
How might value-added estimates of effectiveness be validated?
How do policy makers, educators, and the public use value-added information? What is the appropriate balance between the complex methods necessary for accurate measures and the need for measures to be transparent?

26 Workshop papers available at: http://www7.nationalacademies.org/bota/VAM_Workshop_Agenda.html
Report available at: http://www.nap.edu/catalog.php?record_id=12820
Further information:
  Stuart Elliott (selliott@nas.edu)
  Judy Koenig (jkoenig@nas.edu)

