Presentation is loading. Please wait.

Presentation is loading. Please wait.

Copyright © 2009 National Comprehensive Center for Teacher Quality. All rights reserved. Evaluating Teacher Effectiveness: Where Do We Go From Here? Laura.

Similar presentations

Presentation on theme: "Copyright © 2009 National Comprehensive Center for Teacher Quality. All rights reserved. Evaluating Teacher Effectiveness: Where Do We Go From Here? Laura."— Presentation transcript:

1 Copyright © 2009 National Comprehensive Center for Teacher Quality. All rights reserved. Evaluating Teacher Effectiveness: Where Do We Go From Here? Laura Goe, Ph.D. California Education Research Association 89 th Annual Conference San Diego, CA  November 18, 2010

2 22 Laura Goe, Ph.D.  Former teacher in rural & urban schools Special education (7 th & 8 th grade, Tunica, MS) Language arts (7 th grade, Memphis, TN)  Graduate of UC Berkeley’s Policy, Organizations, Measurement & Evaluation doctoral program  Principal Investigator for the National Comprehensive Center for Teacher Quality  Research Scientist in the Performance Research Group at ETS

3 3 National Comprehensive Center for Teacher Quality (the TQ Center) A federally-funded partnership whose mission is to help states carry out the teacher quality mandates of ESEA  Vanderbilt University Students with special needs, at-risk students  Learning Point Associates Technical assistance, research, fiscal agent  Educational Testing Service Technical assistance, research, dissemination

4 The goal of teacher evaluation The ultimate goal of all teacher evaluation should be… TO IMPROVE TEACHING AND LEARNING 4

5 Conundrums to be addressed  The many definitions of teacher effectiveness and the only one that matters  When all you have is a hammer, everything looks like a nail  The value of value-added  The locked room mystery: how to evaluate teachers without ever setting foot in a classroom  Gates and states and where they are now 5

6 Defining teacher effectiveness 6 Wherein we will consider some definitions of teacher effectiveness and the only one that matters

7 Definitions in the research & policy worlds  Much of the research on teacher effectiveness doesn’t define effectiveness at all though it is often assumed to be teachers’ contribution to student achievement  Bryan C. Hassel of Public Impact stated in 2009 that “The core of a state’s definition of teacher effectiveness must be student outcomes”  Checker Finn stated in 2010 that “An effective teacher is one whose pupils learn what they should while under his/her tutelage” 7

8 Goe, Bell, & Little (2008) definition of teacher effectiveness 1.Have high expectations for all students and help students learn, as measured by value-added or alternative measures. 2.Contribute to positive academic, attitudinal, and social outcomes for students, such as regular attendance, on-time promotion to the next grade, on-time graduation, self-efficacy, and cooperative behavior. 3.Use diverse resources to plan and structure engaging learning opportunities; monitor student progress formatively, adapting instruction as needed; and evaluate learning using multiple sources of evidence. 4.Contribute to the development of classrooms and schools that value diversity and civic-mindedness. 5.Collaborate with other teachers, administrators, parents, and education professionals to ensure student success, particularly the success of students with special needs and those at high risk for failure. 8

9 Race to the Top definition of effective & highly effective teacher Effective teacher: students achieve acceptable rates (e.g., at least one grade level in an academic year) of student growth (as defined in this notice). States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (pg 7) Highly effective teacher students achieve high rates (e.g., one and one-half grade levels in an academic year) of student growth (as defined in this notice). 9

10 Teacher evaluation 10 Wherein we will consider the statement “When all you have is a hammer, everything looks like a nail.”

11 Research Behind the Push for New Evaluation Measures and Systems  The Widget Effect report (Weisberg et al., 2009) “…examines our pervasive and longstanding failure to recognize and respond to variations in the effectiveness of our teachers.” (from Executive Summary)  Value-added research shows that teachers vary greatly in their contributions to student achievement (Rivkin, Hanushek, & Kain, 2005). 11

12 Measurement Instruments (from Holdheide et al., 2010) 12

13 Measures of teacher effectiveness  Evidence of growth in student learning and competency Standardized tests, pre/post tests in untested subjects Student performance (art, music, etc.) Curriculum-based tests given in a standardized manner Classroom-based tests such as DIBELS  Evidence of instructional quality Classroom observations Lesson plans, assignments, and student work  Evidence of professional responsibility Administrator/supervisor reports Surveys of students and/or parents An “evidence binder” created & presented by the teacher 13

14 14 Teacher observations: strengths and weaknesses  Strengths Great for teacher formative evaluation Helps evaluator understand teachers’ needs across school or across district  Weaknesses Only as good as the instruments and the observers Considered “less objective” Expensive to conduct (personnel time, training, calibrating) Validity of observation results may vary with who is doing them, depending on how well trained and calibrated they are

15 Example: University of Virginia’s CLASS observation tool Emotional Support Classroom Organization Instructional Support Pre-K and K-3 Positive Climate Negative Climate Teacher Sensitivity Regard for Student (Adolescent) Perspectives Behavior Management Productivity Instructional Learning Formats Concept Development Quality of Feedback Language Modeling Upper Elementary/ Secondary Content Understanding Analysis and Problem Solving Quality of Feedback

16 16 Example: Charlotte Danielson’s Framework for Teaching

17 Example: Kim Marshall’s Rubric Planning & Preparation for Learning Highly EffectiveEffectiveImprovement Necessary Does Not Meet Standards a. KnowledgeIs expert in the subject area and has a cutting- edge grasp of child development and how students learn. Knows the subject matter well and has a good grasp of child development and how students learn. Is somewhat familiar with the subject and has a few ideas of ways students develop and learn. Has little familiarity with the subject matter and few ideas on how to teach it and how students learn. b. StrategyHas a well-honed game plan for the year that is tightly aligned with state standards and assessments. Plans the year so students will meet state standards and be ready for external assessments. Has done some thinking about how to cover high standards and test requirements this year. Plans lesson by lesson and has little familiarity with state standards and tests. 17

18 18 Research on observations: Danielson Framework  Lots of research on Danielson Framework (1996) and whether its scores correlate with student achievement growth Goe (2007) reviews many studies, most finding weak or no correlation Kane et al. (2010) describes research linking observation scores with value-added scores (found some small, significant correlations) Sartain et al. (2010) describes challenges in implementation, differences researcher/principal ratings  Consortium on Chicago School Research has ongoing project studying implementation and results of replacing the “checklist” with the Danielson Framework

19 Research on observations: CLASS  Considerable research, mostly conducted by creators of CLASS Howes et al. (2008): children’s relationship with teachers, not teachers’ qualifications, mattered Pianta et al. (2007): “Children from nonpoor families and who scored high on achievement at 54 months were most likely to experience classrooms high in positive emotional or instructional climate throughout elementary school. Poor children were highly unlikely (only 10%) to experience classrooms with high instructional climate across multiple grades.” 19

20 Federal priorities (August 2010)  From “Race to the Top” and reiterated in the August 5, 2010 Federal Register (Vol. 75, No. 150) “Secretary’s Priorities for Discretionary Grant Programs” Teachers should be evaluated using state standardized tests where possible For non-tested subjects, other measures (including pre- and post-tests) can be used but must be “rigorous and comparable across classrooms” and must be “between two points in time” Multiple measures should be used, such as multiple classroom evaluations 20

21 Teacher behaviors/practices that correlate with achievement  High ratings on learning environment observed in a teacher’s classroom (Kane et al., 2010)  Positive student/teacher relationships (Howes et al., 2008)  Parent engagement by teachers and schools (Redding et al., 2004)  Teachers’ participation in intensive professional development with follow-up (Yoon et al., 2007) 21 IN MANY CURRENTLY-USED TEACHER EVALUATION MODELS, THESE ARE NEVER MEASURED.

22 The value of value-added 22 Wherein we will consider the value of value-added.

23 23 Most popular growth models  Value-added models There are many versions of value-added models (VAMs), but results from the different models are quite similar Most states and districts that use VAMs use the Sanders’ model, also called TVAAS Prior test scores (3+ years in the Sanders’ model) are used to predict the next test score for a student  Colorado Growth model Focuses on “growth to proficiency” Measures students against “peers”

24 24 What Value-Added Models Cannot Tell You  Value-added models are really measuring classroom effects, not teacher effects  Value-added models can’t tell you why a particular teacher’s students are scoring higher than expected Maybe the teacher is focusing instruction narrowly on test content Or maybe the teacher is offering a rich, engaging curriculum that fosters deep student learning.  How the teacher is achieving results matters!

25 Cautions about using value-added for teacher evaluation  Braun et al. (2010) provides some useful definitions and a good review of research; notes that most researchers are not comfortable with using VAMs as the sole measures of teacher effectiveness  Schochet & Chiang (2010) “Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data.” 25

26 Considerations in using value-added for teacher evaluation  Koedel & Betts (2009) suggest using multiple years of data for teacher evaluation to mitigate sorting bias; novice teachers cannot be evaluated under this system  McCaffrey et al. (2009) “…there are significant gains in the stability [of teachers’ value-added scores] obtained by using two- year average performance measures rather than singe-year estimates” 26

27 27 VAMs don’t measure most teachers  About 69% of teachers (Prince et al., 2006) can’t be accurately assessed with VAMs Teachers in subject areas that are not tested with annual standardized tests Teachers in grade levels (lower elementary) where no prior test scores are available Questions about the validity of measuring special education teachers and ELL teachers with VAMs

28 The Locked Room 28 Wherein we will speculate on the locked room mystery: how to evaluate teachers without ever setting foot in a classroom

29 New technologies: Videotaping  Videotaping teachers in classrooms There is considerable interest in videotapes that teachers start/stop themselves, then download and transmit to a central location for scoring The technology makes it possible to get a view of more than just what the teacher is doing—the wide angle allows reviewers to see what the students are doing as well 29

30 Pluses and minuses with video  Plus: Teachers’ videos can be scored from a distance Highly-trained and calibrated raters can examine the results Yields greater reliability and comparability across classrooms Cost savings: no need to train local evaluators Easy to aggregate scores to spot trends, identify problems  Minus: Teachers’ videos can be scored from a distance Teacher does not have an opportunity to have a conversation with or question his/her evaluator Removes one of the important benefits of observations to teachers—receiving individual, specific feedback on practice 30

31 Other uses for video technology  Can be used as an “audit” mechanism A district trains local evaluators but wants to see if their scores are valid and reliable The local evaluators can score videos that have already been scored by trained “master evaluators” who work with the video company By comparing local evaluator scores with “master evaluator” scores, it may be possible to see specific areas of the observation where additional training is required to ensure consistency of interpretation May be possible to identify raters who are not reliable or are interpreting evidence differently and retrain them 31

32 Efficiency vs. Opportunity  Even if technology and data eliminates the need to ever go into a classroom, the teachers often benefit greatly from classroom visits Informal drop-in visits 10-min visits with a “two stars and a wish” note left in a teacher’s mailbox Peer evaluators who can drop in regularly and provide feedback on changes they see in teachers’ practice, student engagement, etc. 32

33 Student achievement data  For most teachers who will be evaluated with standardized tests, the person conducting the evaluation will never be in the classroom  Some may argue that it is not necessary to see the classroom since it is outcomes that are important  But there is much to be learned from a teacher in challenging circumstances whose students are showing remarkable growth 33

34 Multiple measures are key  Multiple sources of evidence of a students’ learning provide… The teacher with better evidence about what the student knows and is able to do, so she can adapt instructional strategies accordingly The evaluator with better evidence about a teachers’ contribution to student learning  Results from a rubric-based assessment of performance and results from a standardized test may show very different aspects of a students’ knowledge and skills 34

35 The state of research & policy 35 Wherein we will explore Gates and states and where they are now

36 Capacity to link students to teachers is changing 36

37 Reprinted from Wall Street Journal, 9/7/10.

38 A few teacher evaluation models  TAP (Teacher Advancement Program)  Austin, TX  Rhode Island  Washington, DC  Delaware 38

39 Questions to ask about models  Are they “rigorous and comparable across classrooms”?  Do they show student learning growth “between two points in time”?  Are they based on grade level and subject standards?  Do they allow teachers from all subjects to be evaluated with evidence of student learning growth? 39

40 Teacher Advancement Program (TAP) Model  TAP requires that teachers in tested subjects be evaluated with value-added models  All teachers are observed in their classrooms (using a Charlotte Danielson type instrument) at least three times per year by different observers (usually one administrator and two teachers who have been appointed to the role)  Teacher effectiveness (for performance awards) determined by combination of value-added and observations  Teachers in non-tested subjects are given the school- wide average for their value-added component, which is combined with their observation scores 40

41 Austin Independent School District Student Learning Objectives:  Teachers determine two SLOs for the semester/year  One SLO must address all students, other may be targeted  Use broad array of assessments  Assess student needs more directly  Align classroom, campus, and district expectations  Aligned to state standards/campus improvement plans  Based on multiple sources of student data  Assessed with pre and post assessment  Targets of student growth  Peer collaboration 41

42 42 Rubric for student learning objectives

43 43 Rubric for student learning objectives (cont’d)

44 Rhode Island DOE Model: Framework for Applying Multiple Measures of Student Learning Category 1: Student growth on state standardized tests (e.g., NECAP, PARCC) Student learning rating Professional practice rating Professional responsibilities rating + + Final evaluation rating Category 2: Student growth on standardized district-wide tests (e.g., NWEA, AP exams, Stanford- 10, ACCESS, etc.) Category 3: Other local school-, administrator-, or teacher- selected measures of student performance The student learning rating is determined by a combination of different sources of evidence of student learning. These sources fall into three categories: 44

45 Rhode Island Model: Student Learning Group Guiding Principles “Not all teachers’ impact on student learning will be measured by the same mix of assessments, and the mix of assessments used for any given teacher group may vary from year to year.” Teacher A (5 th grade English) Teacher B (11 th grade English) Teacher C (middle school art) Category 1 (growth on NECAP) Category 2 (e.g., growth on NWEA) Category 3 (e.g., principal review of student work over a six month span) Teacher A’s student learning rating + += Category 2 (e.g., AP English exam) Category 3 (e.g., joint review of critical essay portfolio) Teacher B’s student learning rating += 45 Category 3 (e.g., joint review of art portfolio) This teacher may use several category 3 assessments Teacher C’s student learning rating =

46 Washington DC’s IMPACT: Teacher Groups  Group 1: general ed teachers for whom value-added data can be generated  Group 2: general ed teachers for whom value-added data cannot be generated  Group 3: special education teachers  Group 4: non-itinerant English Language Learner (ELL) teachers and bilingual teachers  Group 5: itinerant ELL teachers  Etc… 46

47 Score calculation for Groups 1 & 2 Group 1 (tested subjects) Group 2 (non- tested subjects Teacher value-added (based on test scores) 50%0% Teacher-assessed student achievement (based on non-VAM assessments) 0%10% Teacher and Learning Framework (observations) 35%75% Commitment to School Community 10% School Wide Value-Added 5% 47

48 Non-VAM tests (accepted under Washington, DC’s IMPACT evaluation system)  DC Benchmark Assessment System (DC BAS)  Dynamic Indicators of Basic Early Literacy Skills (DIBELS)  Developmental Reading Assessment (DRA)  Curriculum-based assessments (e.g., Everyday Mathematics)  Unit tests from DCPS-approved textbooks  Off-the-shelf standardized assessments that are aligned to the DCPS Content Standards  Rigorous teacher-created assessments that are aligned to the DCPS Content Standards  Rigorous portfolios of student work that are aligned to the DCPS Content Standards 48

49 Delaware Model  Standardized test will be used as part of teachers’ scores in some grades/subjects  “Group alike” teachers, meeting with facilitators, determine which assessments, rubrics, processes can be used in their subjects/grades (multiple measures)  Assessments must focus on standards, be given in a “standardized” way, i.e., giving pre-test on same day, for same length of time, with same preparation  Teachers recommend assessments to the state for approval  Teachers/groups of teachers take primary responsibility for determining student growth  State will monitor how assessments are “working” 49

50 Gates & Spencer Research (ETS)  How do results from different measures compare? Several multi-year research projects seek to answer that question Projects involve ETS, the Institute for Social Research, RAND, the University of Virginia, and the University of Michigan Numerous instruments (four types of classroom observations instruments), teacher assignment protocol, content knowledge measures, teacher self-efficacy measures 50

51 Basic Research Design (Gates, Spencer projects)  Observe and video-record teacher multiple times in a classroom(s) during a school year  Rate the classroom interactions using appropriate general and subject-specific rubrics  Collect student assignments & resulting student work  Assess teacher knowledge and beliefs (efficacy) through tests/surveys  Estimate VAM scores for teachers  Examine relationships within and across all measures 51

52 Instruments being used (Gates, Spencer)  PLATO (ELA) – Pam Grossman - Stanford  Math Quality Instruction (MQI) – Heather Hill, Harvard  CLASS (observation instrument) – Bob Pianta, U of VA  Framework for Teaching – Charlotte Danielson  Intellectual Demand Assignment Protocol (IDAP) – Fred Newmann et al.  Traditional Praxis measures – content knowledge  Tests of Knowledge For Teaching measures from Deborah Ball, Heather Hill and colleagues  Developing new measures of Knowledge For Teaching (collaboration with University of Michigan)  Carol Dweck measures on views of intelligence 52

53 Observation instruments Charlotte Danielson’s Framework for Teaching CLASS Kim Marshall Rubric 20Teacher%20Eval%20Rubrics%20Jan% 53

54 Models Austin (Student Learning Objectives) Teacher Advancement Program Washington DC IMPACT Guidebooks ccess/IMPACT+(Performance+Assessment)/IMPACT+Guidebooks Rhode Island Model rking%20Group% Delaware Model 54

55 References Braun, H., Chudowsky, N., & Koenig, J. A. (2010). Getting value out of value-added: Report of a workshop. Washington, DC: National Academies Press. Finn, Chester. (July 12, 2010). Blog response to topic “Defining Effective Teachers.” National Journal Expert Blogs: Education. Goe, L. (2007). The link between teacher quality and student outcomes: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality. Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality. Hassel, B. (Oct 30, 2009). How should states define teacher effectiveness? Presentation at the Center for American Progress, Washington, DC. how-should-states-define-teacher-effectiveness Holdheide, L., Goe, L., Croft, A., & Reschly, D. (2010). Challenges in evaluating special education teachers and English Language Learner specialists. Washington, DC: National Comprehensive Center for Teacher Quality. 55

56 References (continued) Howes, C., Burchinal, M., Pianta, R., Bryant, D., Early, D., Clifford, R., et al. (2008). Ready to learn? Children's pre-academic achievement in pre-kindergarten programs. Early Childhood Research Quarterly, 23(1), 27-50. Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2010). Identifying effective classroom practices using student achievement data. Cambridge, MA: National Bureau of Economic Research. Koedel, C., & Betts, J. R. (2009). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Cambridge, MA: National Bureau of Economic Research. McCaffrey, D., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal stability of teacher effect estimates. Education Finance and Policy, 4(4), 572-606. Pianta, R. C., Belsky, J., Houts, R., & Morrison, F. (2007). Opportunities to learn in America’s elementary classrooms. [Education Forum]. Science, 315, 1795-1796. 56

57 References (continued) Prince, C. D., Schuermann, P. J., Guthrie, J. W., Witham, P. J., Milanowski, A. T., & Thorn, C. A. (2006). The other 69 percent: Fairly rewarding the performance of teachers of non-tested subjects and grades. Washington, DC: U.S. Department of Education, Office of Elementary and Secondary Education. Race to the Top Application Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417 - 458. Sartain, L., Stoelinga, S. R., & Krone, E. (2010). Rethinking teacher evaluation: Findings from the first year of the Excellence in Teacher Project in Chicago public schools. Chicago, IL: Consortium on Chicago Public Schools Research at the University of Chicago. Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. 57

58 References (continued) Redding, S., Langdon, J., Meyer, J., & Sheley, P. (2004). The effects of comprehensive parent engagement on student learning outcomes. Paper presented at the American Educational Research Association Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. Brooklyn, NY: The New Teacher Project. Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. L. (2007). Reviewing the evidence on how teacher professional development affects student achievement (No. REL 2007-No. 033). Washington, D.C.: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest. 58

59 59 Questions?

60 60 Laura Goe, Ph.D. P: 609-734-1076 E-Mail: National Comprehensive Center for Teacher Quality 1100 17th Street NW, Suite 500 Washington, DC 20036-4632 877-322-8700 >

Download ppt "Copyright © 2009 National Comprehensive Center for Teacher Quality. All rights reserved. Evaluating Teacher Effectiveness: Where Do We Go From Here? Laura."

Similar presentations

Ads by Google