November 15
Regional Educational Laboratory - Southwest

The Effects of Teacher Professional Development on Student Achievement: Findings from a Systematic Review of Evidence

Presented by:
- Kwang Suk Yoon (AIR)
- Teresa Duncan (AIR)
- Sylvia Wen-Yu Lee (National Taiwan U)
- Kathy Shapley (Edvance Research, Inc.)

American Educational Research Association, New York, March 27, 2008
Background
- Teacher professional development (PD) is one of the key policy strategies for standards-based reform efforts
- No Child Left Behind Act: Teacher Quality provisions call for "high-quality" professional development
- Expectations: PD activities are to be regularly evaluated for their impact on teacher effectiveness and improved student achievement
- Until recently, little systematic effort to vet PD effectiveness
Objective
- AIR completed a fast-turnaround study, sponsored by the Regional Educational Laboratory Southwest (RELSW), which was funded by IES
- Goal: to conduct a systematic review of research-based evidence on the effects of teacher professional development (PD) on student achievement
Research-based Evidence on the Effects of PD on Student Achievement
- Challenges in demonstrating research-based evidence:
  - Quality of professional development
    - Workable theory of action / logic model
    - Sufficient implementation
  - Quality of empirical study
    - Valid causal inferences
    - Rigorous study design
Overview of Methodology
- Systematic review
  - Uses explicit and transparent methods
  - Follows a set of standards
  - Accountable, replicable, and updatable
- Review protocol
  - Aligned with What Works Clearinghouse (WWC) evidence standards
  - Study selection criteria
- Multi-stage, multi-coder review process
  - Screening, coding, and reconciliation
  - Evidence Review Tool (ERT)
Study Selection Criteria
- Topic: inservice teacher professional development
- Population: K-12 students and their teachers
- Subject: reading/English/language arts, mathematics, or science
- Study design
  - Randomized controlled trial (RCT)
  - Quasi-experimental design (QED) with matched comparison group
- Student achievement outcome
  - Measures and their psychometric properties
- Time:
- Country: Australia, Canada, the United Kingdom, or the United States
Overview of the Review Process
Literature Search & Prescreening
- Literature searches
  - Keyword searches on 7 major electronic databases
  - Contacted key researchers
  - Identified 1,343 citations potentially addressing the effects of PD on student achievement
- Prescreening
  - Quickly scanned abstracts against a few selection criteria (e.g., is it an empirical study?)
  - Narrowed the pool down to 132 relevant studies
Stage-1 Coding
- Reasons for failing Stage-1 screening criteria
- Of the 132 relevant studies, about two-thirds failed the "study design" criterion
Stage-2 Coding: Determining the Study Quality Ratings
- N = 27 relevant studies eligible for quality ratings
- 9 met WWC evidence standards
  - 5 met evidence standards with reservations
  - 4 met evidence standards without reservations
- 18 failed to meet the evidence standards
  - RCT: randomization, attrition, disruption, etc.
  - QED: baseline equivalence, attrition, disruption, etc.
Nine Studies
- Carpenter et al., 1989 (RCT)
- Cole, 1992 (RCT)
- Duffy et al., 1986 (RCT)
- Marek & Methven, 1991 (QED)
- McCutchen et al., 2002 (QED)
- McGill-Franzen et al., 1999 (RCT)
- Saxe et al., 2001 (QED)
- Sloan, 1993 (RCT)
- Tienken, 2003 (RCT with group-equivalence problems)
Stage-3 Coding: Documentation of Studies
- Effect size
- Characteristics of PD
  - Form
  - Duration, contact hours
  - Content (Kennedy's (1998) classification)
  - Provider
  - Participants (volunteers?)
- Information about the implementation of PD
Results (1)
- Paucity of rigorous studies
  - Only 9 studies met evidence standards
  - Mostly small-scale, underpowered efficacy trials
- Distribution of the 9 studies
  - By study design: 5 RCT, 4 QED
  - By content area: concentrated in reading/English/language arts
  - By grade level: all focused on the elementary school level
Results (2)
- Overall effect size
  - Average of 20 effect sizes (drawn from the 9 studies) = .54
  - Of the 20 effect sizes, 12 were not statistically significant
- Effects by subject area: fairly consistent across the three subject areas
- Effects by form, duration, and intensity of PD
  - Lack of variability in form: all workshops or summer institutes
  - Some evidence of an effect of intensive PD
- Effects by content of PD
  - No consistent pattern; failed to replicate Kennedy's (1998) finding
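The effect sizes averaged above are standardized mean differences. A minimal sketch of how such a statistic is typically computed (Cohen's d with a pooled SD, plus the Hedges' g small-sample correction); the means, SDs, and group sizes below are illustrative placeholders, not data from the reviewed studies:

```python
from math import sqrt

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def hedges_g(d, n1, n2):
    """Hedges' g: small-sample bias correction applied to Cohen's d."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))

# Hypothetical example: treatment classrooms average 10 points (SD 2, n=20),
# comparison classrooms average 8 points (SD 2, n=20).
d = cohens_d(10, 2, 20, 8, 2, 20)   # 1.0
g = hedges_g(d, 20, 20)             # slightly smaller than d
```

With the small samples typical of the nine reviewed studies, the Hedges correction matters: g is always a bit smaller than d, shrinking optimistic small-study estimates.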
Conclusion
- There is some evidence of positive effects of PD on student achievement.
- Caveats
  - The limited number of studies and the variability in their PD approaches preclude detailed conclusions about the effectiveness of particular PD programs, or about effectiveness by features such as form and content.
  - PD impact research is still at a developmental stage.
Suggestions (1)
- Conduct more well-designed efficacy trials, replications, and effectiveness trials
- Improve the design of PD impact studies
  - Redress common reasons for failing WWC evidence standards (e.g., lack of baseline equivalence in QEDs)
  - Increase statistical power to detect effects
  - Align outcome measures with the PD
  - Examine mediation effects
  - Address potential confounding of PD with other important instructional factors (e.g., curriculum)
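A back-of-the-envelope sketch of the power point above, using the standard normal-approximation formula for a two-group comparison, n per group ≈ 2((z_α/2 + z_β)/d)². This is a simplification: it ignores clustering, even though PD studies typically assign whole classrooms or schools, which raises the required sizes considerably.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample comparison,
    via the normal approximation: 2 * ((z_alpha/2 + z_beta) / d)^2."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # two-sided test
    z_beta = z(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Detecting an effect the size of the review's .54 average takes a modest
# sample, but a more typical field effect of .20 takes far more:
print(n_per_group(0.54))  # 54 per group
print(n_per_group(0.20))  # 393 per group
```

The gap between these two numbers illustrates why small-scale efficacy trials so often turn out underpowered when realistic effect sizes apply.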
Suggestions (2)
- Adequate documentation of the PD and the study
  - PD: theory of action, implementation
  - Sample and clusters, if any
  - Group assignment
  - Baseline equivalence
  - Effect size (ES)
- Use of structured abstracts to facilitate research synthesis
  - Structured abstract (Mosteller et al., 2004)
  - Claim-based structured abstract (Kelly & Yin, 2007)
Contact Info & Links
- Kwang Suk Yoon ( )
- Link to our RELSW report on the IES website: id=70
- Link to our presentation material
Thank you!
Message
- A "stainless steel" law of systematic reviews may be operating, namely: "the more rigorous the review, the less evidence there will be that the intervention is effective." (Peter Rossi)
Logic Model