Monthly Webinar: Statewide Teacher Evaluation Systems, January 11, 2011, State Consortium on Educator Effectiveness (SCEE)

1 Monthly Webinar: Statewide Teacher Evaluation Systems, January 11, 2011, State Consortium on Educator Effectiveness (SCEE)

8 Who is on this webinar?
- Thirty-one states are members of SCEE
- State teams consist of 6 people each, drawn from the SEA, the professional standards board, and stakeholders in the education workforce arena

9 Poll of team leads on the webinar
1. Does your state have a statewide system of teacher evaluation?
   a) Yes (2 States)
   b) Yes, but we are revising it (4 States)
   c) No (4 States)
   d) No, but we are developing one (3 States)

10 Poll of team leads on the webinar
2. If your state does have a statewide system, or if you are revising or developing one, what components are included (you may select more than one)?
   a) Requirements that LEAs conduct teacher evaluations (5 States)
   b) Criteria that must be included in teacher evaluations (4 States)
   c) Statewide instrument(s) for teacher evaluation (2 States)

11 Poll of team leads on the webinar
3. If you have a statewide system, is there legislation that requires it?
   a) Yes (6 States)
   b) No (4 States)
   c) Unsure (1 State)

12 Poll of team leads on the webinar
4. Is teacher evaluation subject to collective bargaining in your state?
   a) Yes (6 States)
   b) No (4 States)
   c) Locally-determined (3 States)

13 Evaluating Teacher Effectiveness
Laura Goe, Ph.D.
CCSSO’s State Consortium on Educator Effectiveness (SCEE) Webinar, January 11, 2011
Copyright © 2009 National Comprehensive Center for Teacher Quality. All rights reserved.

14 Laura Goe, Ph.D.
- Former teacher in rural & urban schools
  - Special education (7th & 8th grade, Tunica, MS)
  - Language arts (7th grade, Memphis, TN)
- Graduate of UC Berkeley’s Policy, Organizations, Measurement & Evaluation doctoral program
- Principal Investigator for the National Comprehensive Center for Teacher Quality
- Research Scientist in the Performance Research Group at ETS

15 National Comprehensive Center for Teacher Quality (the TQ Center)
A federally-funded partnership whose mission is to help states carry out the teacher quality mandates of ESEA
- Vanderbilt University
  - Students with special needs, at-risk students
- Learning Point Associates
  - Technical assistance, research, fiscal agent
- Educational Testing Service
  - Technical assistance, research, dissemination

16 The goal of teacher evaluation
The ultimate goal of all teacher evaluation should be… TO IMPROVE TEACHING AND LEARNING

17 Defining teacher effectiveness
Wherein we will consider some definitions of teacher effectiveness and the only one that matters

18 Definitions in the research & policy worlds
- Much of the research on teacher effectiveness doesn’t define effectiveness at all, though it is often presumed to be teachers’ contribution to student achievement
- Bryan C. Hassel of Public Impact stated in 2009 that “The core of a state’s definition of teacher effectiveness must be student outcomes”
- Checker Finn stated in 2010 that “An effective teacher is one whose pupils learn what they should while under his/her tutelage”

19 Goe, Bell, & Little (2008) definition of teacher effectiveness
1. Have high expectations for all students and help students learn, as measured by value-added or alternative measures.
2. Contribute to positive academic, attitudinal, and social outcomes for students, such as regular attendance, on-time promotion to the next grade, on-time graduation, self-efficacy, and cooperative behavior.
3. Use diverse resources to plan and structure engaging learning opportunities; monitor student progress formatively, adapting instruction as needed; and evaluate learning using multiple sources of evidence.
Continued

20 Goe, Bell, & Little (2008) definition of teacher effectiveness
4. Contribute to the development of classrooms and schools that value diversity and civic-mindedness.
5. Collaborate with other teachers, administrators, parents, and education professionals to ensure student success, particularly the success of students with special needs and those at high risk for failure.

21 Race to the Top definition of effective & highly effective teacher
Effective teacher: students achieve acceptable rates (e.g., at least one grade level in an academic year) of student growth (as defined in this notice). States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (pg 7)

22 Race to the Top definition of effective & highly effective teacher
Highly effective teacher: students achieve high rates (e.g., one and one-half grade levels in an academic year) of student growth (as defined in this notice).
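
As a rough illustration of the student-growth thresholds quoted above, the sketch below labels an average growth figure using the notice's example rates (1.0 and 1.5 grade levels per academic year). The function name, the parameter names, the inclusive cutoffs, and the label for growth below the first threshold are all assumptions made for illustration. The notice offers these rates only as examples and also requires multiple measures, which this sketch does not model.

```python
def classify_growth(grade_levels_gained, effective_min=1.0, highly_effective_min=1.5):
    """Label average student growth using the notice's example rates.

    The defaults mirror the "e.g." figures in the definitions; treating
    them as inclusive cutoffs is an assumption, not part of the notice.
    """
    if grade_levels_gained >= highly_effective_min:
        return "highly effective"
    if grade_levels_gained >= effective_min:
        return "effective"
    return "below the example rate"


# Hypothetical growth figures, in grade levels per academic year.
labels = [classify_growth(g) for g in (0.8, 1.0, 1.6)]
```

A state would still need to combine this growth label with its other measures, such as observation-based assessments, before reaching an overall rating.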

23 Taking a stand on standards
Wherein we consider why standards matter

24 The role of teaching standards in evaluation systems
- Good teaching standards are the foundation of an evaluation system
  - The measures and tools should be chosen to collect evidence on the standards
  - Evidence of teachers’ level of proficiency on key standards should form the basis for evaluation
- Teachers and administrators should be very familiar with the standards for their state
  - Through teacher preparation programs, mentoring, and professional development

25 Why standards matter
- With standards, everyone knows what’s important
- Teachers know what is expected of them
- Evaluators know what they’ll need evidence for
- A common understanding about what matters makes for better discussion among teachers, administrators, evaluators, and other stakeholders
- Alignment with professional development is tighter
- Professional development can be targeted towards “problem” areas across groups of teachers, schools

26 Teacher evaluation
Wherein we will consider the statement “When all you have is a hammer, everything looks like a nail.”

27 Research Behind the Push for New Evaluation Measures and Systems
- The Widget Effect report (Weisberg et al., 2009) “…examines our pervasive and longstanding failure to recognize and respond to variations in the effectiveness of our teachers.” (from Executive Summary)
- Value-added research shows that teachers vary greatly in their contributions to student achievement (Rivkin, Hanushek, & Kain, 2005)

28 Measures of teacher effectiveness
- Evidence of growth in student learning and competency
  - Standardized tests, pre/post tests in untested subjects
  - Student performance (art, music, etc.)
  - Curriculum-based tests given in a standardized manner
  - Classroom-based tests such as DIBELS
- Evidence of instructional quality
  - Classroom observations
  - Lesson plans, assignments, and student work
- Evidence of professional responsibility
  - Administrator/supervisor reports
  - Surveys of students and/or parents
  - An “evidence binder” created & presented by the teacher

29 Teacher observations: strengths and weaknesses
- Strengths
  - Great for teacher formative evaluation
  - Helps evaluators understand teachers’ needs across school or across district
- Weaknesses
  - Only as good as the instruments and the observers
  - Considered “less objective”
  - Expensive to conduct (personnel time, training, calibrating)
  - Validity of observation results may vary with who is doing them, depending on how well trained and calibrated they are

30 Example: University of Virginia’s CLASS observation tool
- Emotional Support
  - Positive Climate
  - Negative Climate
  - Teacher Sensitivity
  - Regard for Student (Adolescent) Perspectives
- Classroom Organization
  - Behavior Management
  - Productivity
  - Instructional Learning Formats
- Instructional Support
  - Pre-K and K-3: Concept Development, Quality of Feedback, Language Modeling
  - Upper Elementary/Secondary: Content Understanding, Analysis and Problem Solving, Quality of Feedback

31 Example: Charlotte Danielson’s Framework for Teaching

32 Example: Kim Marshall’s Rubric
Planning & Preparation for Learning
a. Knowledge
  - Highly Effective: Is expert in the subject area and has a cutting-edge grasp of child development and how students learn.
  - Effective: Knows the subject matter well and has a good grasp of child development and how students learn.
  - Improvement Necessary: Is somewhat familiar with the subject and has a few ideas of ways students develop and learn.
  - Does Not Meet Standards: Has little familiarity with the subject matter and few ideas on how to teach it and how students learn.
b. Strategy
  - Highly Effective: Has a well-honed game plan for the year that is tightly aligned with state standards and assessments.
  - Effective: Plans the year so students will meet state standards and be ready for external assessments.
  - Improvement Necessary: Has done some thinking about how to cover high standards and test requirements this year.
  - Does Not Meet Standards: Plans lesson by lesson and has little familiarity with state standards and tests.

33 Research on observations: Danielson Framework
- Lots of research on Danielson Framework (1996) and whether its scores correlate with student achievement growth
  - Goe (2007) reviews many studies, most finding weak or no correlation with student achievement
  - Kane et al. (2010) describes research linking observation scores with value-added scores (found some small but significant correlations)
  - Sartain et al. (2010) describes challenges in implementation, differences in researcher/principal ratings
- Consortium on Chicago School Research has an ongoing project studying implementation and results of replacing the “checklist” with the Danielson Framework

34 Research on observations: CLASS
- Considerable research, mostly conducted by creators of CLASS
  - Howes et al. (2008): children’s relationship with teachers, not teachers’ qualifications, mattered
  - Pianta et al. (2007): “Children from nonpoor families and who scored high on achievement at 54 months were most likely to experience classrooms high in positive emotional or instructional climate throughout elementary school. Poor children were highly unlikely (only 10%) to experience classrooms with high instructional climate across multiple grades.”

35 Research on observations: Kim Marshall’s rubric
Oops. There isn’t any.

36 Federal priorities (August 2010)
- From “Race to the Top” and reiterated in the August 5, 2010 Federal Register (Vol. 75, No. 150) “Secretary’s Priorities for Discretionary Grant Programs”
  - Teachers should be evaluated using state standardized tests where possible
  - For non-tested subjects, other measures (including pre- and post-tests) can be used but must be “rigorous and comparable across classrooms” and must be “between two points in time”
  - Multiple measures should be used, such as multiple classroom evaluations

37 Teacher behaviors/practices that correlate with achievement
- High ratings on learning environment observed in a teacher’s classroom (Kane et al., 2010)
- Positive student/teacher relationships (Howes et al., 2008)
- Parent engagement by teachers and schools (Redding et al., 2004)
- Teachers’ participation in intensive professional development with follow-up (Yoon et al., 2007)
We’re still trying to figure out what matters, so it’s important to use multiple measures, never just one.

38 Multiple measures are key for evaluating all educators!
- Multiple sources of evidence of a student’s learning provide…
  - The teacher with better evidence about what the student knows and is able to do, so she can adapt instructional strategies accordingly
  - The evaluator with better evidence about a teacher’s contribution to student learning
- Results from a rubric-based assessment of performance and results from a standardized test may show very different aspects of a student’s knowledge and skills

39 The value of value-added
Wherein we will consider the value of value-added

40 Poll of team leads on the webinar
1. Does your state use a value-added assessment or growth model to gather information about teacher effectiveness?
   a) Yes (1 State)
   b) No (5 States)
   c) Not yet, but we are planning to (8 States)
   d) Unsure (1 State)

41 Poll of team leads on the webinar
2. Does your state use value-added assessment or growth data in teacher evaluation?
   a) Yes (1 State)
   b) No (4 States)
   c) Not yet, but we are planning to (10 States)

42 Most popular growth models
- Value-added models (requires prediction)
  - There are many versions of value-added models (VAMs), but results from the different models are quite similar
  - Most states and districts that use VAMs use the Sanders model, also called TVAAS
  - Prior test scores (3+ years in the Sanders model) are used to predict the next test score for a student
- Colorado Growth model (no prediction needed)
  - Focuses on “growth to proficiency”
  - Measures students against “academic peers”
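
To make the "prediction" idea concrete, here is a deliberately simplified sketch: a single prior score predicts the current score through an ordinary least squares line, and a teacher's estimate is the average amount by which that teacher's students beat or miss the prediction. This is not the Sanders/TVAAS model (which draws on several years of scores and a more elaborate statistical model) and not the Colorado Growth model. The function names and the sample scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(prior_scores, current_scores):
    """Ordinary least squares fit of current score on one prior score."""
    mean_prior = mean(prior_scores)
    mean_current = mean(current_scores)
    sxx = sum((x - mean_prior) ** 2 for x in prior_scores)
    sxy = sum((x - mean_prior) * (y - mean_current)
              for x, y in zip(prior_scores, current_scores))
    slope = sxy / sxx  # fails if every prior score is identical
    intercept = mean_current - slope * mean_prior
    return slope, intercept


def value_added_by_teacher(records):
    """records: iterable of (teacher, prior_score, current_score) tuples.

    Returns each teacher's mean residual (actual minus predicted score).
    """
    records = list(records)
    slope, intercept = fit_line([r[1] for r in records],
                                [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    return {teacher: mean(values) for teacher, values in residuals.items()}


# Hypothetical scores, invented purely for illustration.
sample = [("A", 40, 52), ("A", 60, 71), ("B", 40, 48), ("B", 60, 69)]
estimates = value_added_by_teacher(sample)
```

A positive estimate means the teacher's students scored above what their prior scores predicted. As the later slides stress, the number alone says nothing about why.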

43 What Value-Added Models Cannot Tell You
- Value-added models are really measuring classroom effects, not teacher effects
- Value-added models can’t tell you why a particular teacher’s students are scoring higher than expected
  - Maybe the teacher is focusing instruction narrowly on test content
  - Or maybe the teacher is offering a rich, engaging curriculum that fosters deep student learning.
- How the teacher is achieving results matters!

44 Cautions about using value-added for teacher evaluation
- Braun et al. (2010) provides some useful definitions and a good review of research; notes that most researchers are not comfortable with using VAMs as the sole measures of teacher effectiveness
- Schochet & Chiang (2010) “Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data.”

45 Considerations in using value-added for teacher evaluation
- Koedel & Betts (2009) suggest using multiple years of data for teacher evaluation to mitigate sorting bias; novice teachers cannot be evaluated under this system
- McCaffrey et al. (2009) “…there are significant gains in the stability [of teachers’ value-added scores] obtained by using two-year average performance measures rather than single-year estimates”
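
A minimal sketch of the multi-year idea, assuming a teacher's yearly value-added estimates are already available as a list ordered from oldest to newest. The function name, the two-year default window, and the choice to return `None` for teachers with too few years are illustrative assumptions; the cited studies do not prescribe this exact rule.

```python
from statistics import mean


def multi_year_score(yearly_estimates, window=2):
    """Average the most recent `window` yearly value-added estimates.

    Returns None when fewer than `window` years exist, which is the
    situation of a novice teacher under a multi-year rule.
    """
    if len(yearly_estimates) < window:
        return None
    return mean(yearly_estimates[-window:])


# Hypothetical yearly estimates, oldest first.
veteran = multi_year_score([0.3, 0.4, -0.2])
novice = multi_year_score([0.4])
```

Averaging dampens year-to-year noise, but it also means a first-year teacher has no score at all and needs other measures.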

46 VAMs don’t measure most teachers
- About 69% of teachers (Prince et al., 2006) can’t be accurately assessed with VAMs
  - Teachers in subject areas that are not tested with annual standardized tests
  - Teachers in grade levels (lower elementary) where no prior test scores are available
  - Questions about the validity of measuring special education teachers and ELL teachers with VAMs

47 The Locked Room
Wherein we will speculate on the locked room mystery: how to evaluate teachers without ever setting foot in a classroom

48 Videotaping instead of “live” observations: Pluses and minuses
- Plus: Teachers’ videos can be scored from a distance
  - Highly-trained and calibrated raters can examine the results
  - Yields greater reliability and comparability across classrooms
  - Cost savings: no need to train local evaluators
  - Easy to aggregate scores to spot trends, identify problems
- Minus: Teachers’ videos can be scored from a distance
  - Teacher does not have an opportunity to have a conversation with or question his/her evaluator
  - Removes one of the important benefits of observations to teachers: receiving individual, specific feedback on practice

49 Efficiency vs. Opportunity
- Even if technology and data eliminate the need to ever go into a classroom, teachers often benefit greatly from classroom visits
  - Informal drop-in visits
  - 10-min visits with a “two stars and a wish” note left in a teacher’s mailbox
  - Peer evaluators who can drop in regularly and provide feedback on changes they see in teachers’ practice, student engagement, etc.

50 Oh, the perils of attribution!
Wherein we will discuss horizontal and vertical sources of error in attributing student learning growth to particular teachers

51 Horizontal: Attributing learning gains to teachers in a single year
- How should teachers be held accountable for student learning gains when a student
  - Is only in classroom for a portion of the year?
  - Has a high rate of school absences?
  - Fails to complete assessments that will be used for determining teachers’ contribution to student growth?
- Which teacher(s) should be held accountable in a co-teaching or team-teaching situation?
  - Various co-teaching models make it difficult to evaluate individual teachers’ contributions
  - Pull-outs, situations where others teach subject, too

52 Vertical: Crediting/blaming teachers for students’ prior experiences
- Teacher effects persist
  - Three years of ineffective teachers can have serious consequences for later student outcomes (Sanders & Rivers, 1996)
  - “…long term educational advantages to individuals are most likely to come from a series of positive experiences over a sustained period” (Tymms et al., 2009)
- Many, but not all, value-added models incorporate information about prior teacher effects when calculating current teachers’ scores

53 The state of the states
Wherein we will explore a few examples of district and state evaluation systems

54 A few teacher evaluation models
- TAP (Teacher Advancement Program)
- Austin, TX
- Rhode Island
- Washington, DC
- Delaware

55 Questions to ask about models
- Are they “rigorous and comparable across classrooms”?
- Do they show student learning growth “between two points in time”?
- Are they based on grade level and subject standards?
- Do they allow teachers from all subjects and grades (not just 4-8 math & ELA) to be evaluated with evidence of student learning growth?

56 Teacher Advancement Program (TAP) Model
- TAP requires that teachers in tested subjects be evaluated with value-added models
- All teachers are observed in their classrooms (using a Charlotte Danielson type instrument) at least three times per year by different observers (usually one administrator and two teachers who have been appointed to the role)
- Teacher effectiveness (for performance awards) determined by combination of value-added and observations
- Teachers in non-tested subjects are given the school-wide average for their value-added component, which is combined with their observation scores
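
The combination rule described above can be sketched as follows. The slide does not give TAP's actual weights or score scales, so the 50/50 default weight, the shared scale for both components, and all names here are assumptions for illustration only. What the sketch does take from the slide is the fallback: a teacher without an individual value-added score receives the school-wide average for that component.

```python
from statistics import mean


def tap_style_composite(observation_scores, teacher_value_added,
                        school_value_added, value_added_weight=0.5):
    """Blend observation ratings with a value-added component.

    observation_scores: ratings from the year's observations.
    teacher_value_added: the teacher's own score, or None for a
        teacher in a non-tested subject.
    school_value_added: school-wide average used as the fallback.
    value_added_weight: hypothetical weight; not taken from TAP.
    """
    if teacher_value_added is None:
        value_added = school_value_added
    else:
        value_added = teacher_value_added
    observation = mean(observation_scores)
    return (value_added_weight * value_added
            + (1 - value_added_weight) * observation)


# Hypothetical ratings on a shared 1-5 scale, three observers each.
tested = tap_style_composite([4, 4, 4], 5.0, 3.0)
non_tested = tap_style_composite([4, 4, 4], None, 3.0)
```

With identical observation ratings, the two teachers above end up with different composites only because one is credited with the school-wide average rather than an individual score.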

57 Austin Independent School District
Student Learning Objectives:
- Teachers determine two SLOs for the semester/year
- One SLO must address all students, other may be targeted
- Use broad array of assessments
- Assess student needs more directly
- Align classroom, campus, and district expectations
- Aligned to state standards/campus improvement plans
- Based on multiple sources of student data
- Assessed with pre and post assessment
- Targets of student growth
- Peer collaboration

58 Rubric for student learning objectives

59 Rubric for student learning objectives (cont’d)

60 Rhode Island DOE Model: Framework for Applying Multiple Measures of Student Learning
Student learning rating + Professional practice rating + Professional responsibilities rating = Final evaluation rating
The student learning rating is determined by a combination of different sources of evidence of student learning. These sources fall into three categories:
- Category 1: Student growth on state standardized tests (e.g., NECAP, PARCC)
- Category 2: Student growth on standardized district-wide tests (e.g., NWEA, AP exams, Stanford-10, ACCESS, etc.)
- Category 3: Other local school-, administrator-, or teacher-selected measures of student performance

61 Rhode Island Model: Student Learning Group Guiding Principles
“Not all teachers’ impact on student learning will be measured by the same mix of assessments, and the mix of assessments used for any given teacher group may vary from year to year.”
- Teacher A (5th grade English): Category 1 (growth on NECAP) + Category 2 (e.g., growth on NWEA) + Category 3 (e.g., principal review of student work over a six month span) = Teacher A’s student learning rating
- Teacher B (11th grade English): Category 2 (e.g., AP English exam) + Category 3 (e.g., joint review of critical essay portfolio) = Teacher B’s student learning rating
- Teacher C (middle school art): Category 3 (e.g., joint review of art portfolio) = Teacher C’s student learning rating. This teacher may use several category 3 assessments

62 Washington DC’s IMPACT: Teacher Groups
- Group 1: general ed teachers for whom value-added data can be generated
- Group 2: general ed teachers for whom value-added data cannot be generated
- Group 3: special education teachers
- Group 4: non-itinerant English Language Learner (ELL) teachers and bilingual teachers
- Group 5: itinerant ELL teachers
- Etc…

63 Score comparison for Groups 1 & 2

Component                                                            Group 1 (tested subjects)   Group 2 (non-tested subjects)
Teacher value-added (based on test scores)                           50%                         0%
Teacher-assessed student achievement (based on non-VAM assessments)  0%                          10%
Teaching and Learning Framework (observations)                       35%                         75%
Commitment to School Community                                       10%                         10%
School-Wide Value-Added                                              5%                          5%
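
The weights above can be applied as a simple weighted average. The sketch assumes that the single 10% and 5% figures for Commitment to School Community and School-Wide Value-Added apply to both groups (each column then sums to 100%), and that every component is scored on the same scale. The dictionary keys, the function name, and the sample scores are hypothetical, and this is not presented as the official IMPACT calculation.

```python
# Percent weights per component, taken from the score comparison slide.
IMPACT_WEIGHTS = {
    "group_1": {
        "teacher_value_added": 50,
        "teacher_assessed_achievement": 0,
        "framework_observations": 35,
        "commitment_to_school_community": 10,
        "school_value_added": 5,
    },
    "group_2": {
        "teacher_value_added": 0,
        "teacher_assessed_achievement": 10,
        "framework_observations": 75,
        "commitment_to_school_community": 10,
        "school_value_added": 5,
    },
}


def weighted_score(group, component_scores):
    """Weighted average of component scores for one teacher group.

    Components with a zero weight may be omitted; a missing component
    with a nonzero weight raises KeyError rather than counting as 0.
    """
    weights = IMPACT_WEIGHTS[group]
    total = sum(weight * component_scores[name]
                for name, weight in weights.items() if weight)
    return total / 100


# Hypothetical component scores on a shared 1-4 scale.
group_1_teacher = weighted_score("group_1", {
    "teacher_value_added": 4.0,
    "framework_observations": 3.0,
    "commitment_to_school_community": 2.0,
    "school_value_added": 3.0,
})
group_2_teacher = weighted_score("group_2", {
    "teacher_assessed_achievement": 4.0,
    "framework_observations": 3.0,
    "commitment_to_school_community": 2.0,
    "school_value_added": 3.0,
})
```

The comparison makes the trade-off visible: for Group 2 teachers, observations carry three quarters of the score, so the quality of the observation instrument and the observers matters far more.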

64 Non-VAM tests (accepted under Washington, DC’s IMPACT evaluation system)
- DC Benchmark Assessment System (DC BAS)
- Dynamic Indicators of Basic Early Literacy Skills (DIBELS)
- Developmental Reading Assessment (DRA)
- Curriculum-based assessments (e.g., Everyday Mathematics)
- Unit tests from DCPS-approved textbooks
- Off-the-shelf standardized assessments that are aligned to the DCPS Content Standards
- Rigorous teacher-created assessments that are aligned to the DCPS Content Standards
- Rigorous portfolios of student work that are aligned to the DCPS Content Standards

65 Delaware Model
- Standardized test will be used as part of teachers’ scores in some grades/subjects
- “Group alike” teachers, meeting with facilitators, determine which assessments, rubrics, processes can be used in their subjects/grades (multiple measures)
- Assessments must focus on standards, be given in a “standardized” way, i.e., giving pre-test on same day, for same length of time, with same preparation
- Teachers recommend assessments to the state for approval
- Teachers/groups of teachers take primary responsibility for determining student growth
- State will monitor how assessments are “working”

66 Final thoughts
- We are not very good at predicting which sets of teacher qualifications, characteristics, and practices will result in the best student outcomes
  - Once in the classroom, multiple measures of teacher performance and student outcomes can help determine effectiveness
- There is not enough research yet to say which model and combination of measures will provide the most accurate and useful information about teacher effectiveness
  - Focus on models and measures that may help districts, schools, and teachers improve performance

67 Models and measures should provide useful information about effectiveness
- Those models that yield actionable information are most likely to contribute to improvements in teacher practice
  - Standardized test scores provide little information about how to change practice
  - Teacher practice linked to multiple student outcomes is most actionable
- Teachers benefit from knowing how their specific practices resulted in student learning
- Thus, create opportunities for teachers to examine outcomes in light of practice

68 Observation instruments
- Charlotte Danielson’s Framework for Teaching
- CLASS
- Kim Marshall Rubric: 20Teacher%20Eval%20Rubrics%20Jan%

69 Models
- Austin (Student Learning Objectives)
- Teacher Advancement Program
- Washington DC IMPACT Guidebooks: ccess/IMPACT+(Performance+Assessment)/IMPACT+Guidebooks
- Rhode Island Model: rking%20Group%
- Delaware Model

70 References
Braun, H., Chudowsky, N., & Koenig, J. A. (2010). Getting value out of value-added: Report of a workshop. Washington, DC: National Academies Press.
Finn, C. (July 12, 2010). Blog response to topic “Defining Effective Teachers.” National Journal Expert Blogs: Education.
Goe, L. (2007). The link between teacher quality and student outcomes: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality.
Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality.
Hassel, B. (Oct 30, 2009). How should states define teacher effectiveness? Presentation at the Center for American Progress, Washington, DC. performance/210-how-should-states-define-teacher-effectiveness

71 References (continued)
Howes, C., Burchinal, M., Pianta, R., Bryant, D., Early, D., Clifford, R., et al. (2008). Ready to learn? Children's pre-academic achievement in pre-kindergarten programs. Early Childhood Research Quarterly, 23(1), 27-50.
Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2010). Identifying effective classroom practices using student achievement data. Cambridge, MA: National Bureau of Economic Research.
Koedel, C., & Betts, J. R. (2009). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Cambridge, MA: National Bureau of Economic Research.
McCaffrey, D., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal stability of teacher effect estimates. Education Finance and Policy, 4(4), 572-606.
Pianta, R. C., Belsky, J., Houts, R., & Morrison, F. (2007). Opportunities to learn in America’s elementary classrooms. [Education Forum]. Science, 315, 1795-1796.

72 References (continued)
Prince, C. D., Schuermann, P. J., Guthrie, J. W., Witham, P. J., Milanowski, A. T., & Thorn, C. A. (2006). The other 69 percent: Fairly rewarding the performance of teachers of non-tested subjects and grades. Washington, DC: U.S. Department of Education, Office of Elementary and Secondary Education.
Race to the Top Application
Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417-458.
Sartain, L., Stoelinga, S. R., & Krone, E. (2010). Rethinking teacher evaluation: Findings from the first year of the Excellence in Teaching Project in Chicago public schools. Chicago, IL: Consortium on Chicago School Research at the University of Chicago.
Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

73 References (continued)
Redding, S., Langdon, J., Meyer, J., & Sheley, P. (2004). The effects of comprehensive parent engagement on student learning outcomes. Paper presented at the American Educational Research Association.
Tymms, P., Jones, P., Albone, S., & Henderson, B. (2009). The first seven years at school. Educational Assessment Evaluation and Accountability, 21(1), 67-80.
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. Brooklyn, NY: The New Teacher Project.
Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. L. (2007). Reviewing the evidence on how teacher professional development affects student achievement (No. REL 2007-No. 033). Washington, D.C.: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest.

74 Questions?

75 Laura Goe, Ph.D.
P: 609-734-1076
E-Mail:
National Comprehensive Center for Teacher Quality
1100 17th Street NW, Suite 500
Washington, DC 20036-4632
877-322-8700
