Presentation on theme: "Teacher Evaluation: Issues of Validity and Reliability ASSESSMENT SRIG BIENNIAL MEETING MARCH 30, 2012 NAfME NATIONAL CONFERENCE 3:45PM-5:45PM GRAND B."— Presentation transcript:
Teacher Evaluation: Issues of Validity and Reliability ASSESSMENT SRIG BIENNIAL MEETING MARCH 30, 2012 NAfME NATIONAL CONFERENCE 3:45PM-5:45PM GRAND B TIMOTHY S. BROPHY, CHAIR KELLY PARKES, INCOMING CHAIR
TODAY’S PROGRAM
3:45pm. Greeting and Welcome; Election results. Timothy S. Brophy, Chair
3:55pm. Program begins: Teacher Evaluations – Issues of Validity and Reliability. Timothy S. Brophy and Richard Colwell. Teacher Evaluation: Issues of Validity and Reliability.
4:20pm. Dru Davison, Memphis City Schools. The Tennessee Fine Arts Pilot: A Multiple Measures Portfolio System (Perform, Create, Respond, Connect) with Blind Peer Review. Electronic presentation.
4:40pm. Keitha Lucas Hamann, U. Minnesota-Twin Cities, and Doug Orzolek, University of St. Thomas. Teacher Performance Assessment in Minnesota: Challenges for Music Educators.
5:05pm. Breakout groups – Strategies for Measuring Student Growth in Music
5:30pm. Leaders report
5:40pm. Announcements of upcoming events. Closing remarks by Kelly Parkes, Incoming Chair
TEACHER EVALUATIONS: ISSUES OF VALIDITY AND RELIABILITY TIMOTHY S. BROPHY, UNIVERSITY OF FLORIDA RICHARD COLWELL, PROFESSOR EMERITUS, UNIVERSITY OF ILLINOIS NAfME CONFERENCE ASSESSMENT SRIG MEETING MARCH 30, 2012
SESSION OVERVIEW
The Context for the Reform of Teacher Evaluation
The Problem: Determining Music Teacher Effectiveness
Validity and Reliability Issues
Challenges to the SRIG
Achieving Equity in Teacher Distribution The State will take actions to improve teacher effectiveness and comply with section 1111(b)(8)(C) of the ESEA (20 U.S.C. 6311(b)(8)(C)) in order to address inequities in the distribution of highly qualified teachers between high- and low-poverty schools, and to ensure that low-income and minority children are not taught at higher rates than other children by inexperienced, unqualified, or out-of-field teachers. (H.R.1, p. 169) THE POLITICAL CONTEXT: THE AMERICAN RECOVERY AND REINVESTMENT ACT (2009)
RTTT Phase 2 defines teacher evaluation: States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (p. 19499) THE POLITICAL CONTEXT: RACE TO THE TOP PHASE 2 - CFDA NUMBER: 84.395A (2010)
THE POLITICAL CONTEXT: RACE TO THE TOP PHASE 2
Student achievement means: (b) For non-tested grades and subjects: alternative measures of student learning and performance such as student scores on pre-tests and end-of-course tests; student performance on English language proficiency assessments; and other measures of student achievement that are rigorous and comparable across classrooms. (p. 19500)
Student growth means the change in student achievement for an individual student between two or more points in time. A State may also include other measures that are rigorous and comparable across classrooms. (p. 19500)
Source: Federal Register/Vol. 75, No. 71/Wednesday, April 14, 2010/Notices
THE NEW “EVALUATION EQUATION”
35-50% student achievement
50-65% observations or other methods
Teacher evaluation and “effectiveness” determination
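As a rough illustration of how the two weighted components above might combine, here is a minimal sketch. It assumes both components are already expressed on a shared 0-100 scale and that the two weights sum to 100%; the function name, the scale, and the default weight are illustrative assumptions, not part of RTTT or any state's actual formula.

```python
def composite_score(achievement, observation, achievement_weight=0.40):
    """Combine two component scores (assumed 0-100 scale) into one rating.

    achievement_weight is the share given to student achievement; the
    slide places it between 35% and 50%. The remainder goes to
    observations or other methods. All names here are hypothetical.
    """
    if not 0.35 <= achievement_weight <= 0.50:
        raise ValueError("achievement weight must fall in the 35-50% band")
    observation_weight = 1.0 - achievement_weight
    return achievement_weight * achievement + observation_weight * observation


# Example: the same teacher rated under the two ends of the band.
low_band = composite_score(70, 90, achievement_weight=0.35)
high_band = composite_score(70, 90, achievement_weight=0.50)
print(low_band, high_band)
```

The same pair of component scores yields a different composite depending on where in the band a state sets the weight, which is one reason the quality of the achievement measure matters so much.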
MUSIC TEACHER EFFECTIVENESS
RTTT defines effective teachers in very specific terms. We need to know what it means for music teachers to be:
“Effective” – when their students achieve at “acceptable rates” – at least one grade level in an academic year
“Highly effective” – when their students achieve at “high rates” – for example, 1.5 grade levels in an academic year
A BIG QUESTION: What is a “year’s growth” in music education? How do we find out?
THE “ELEPHANT IN THE LIVING ROOM” - GROWTH IN MUSIC
What do we need to measure “one grade level” of growth in music?
Rigorous, standards-based grade level music curriculum on all standards
Clear, consistent grade-level expectations
Valid, reliable assessments
Comparability across schools, districts, and states
PART 1 OF THE EQUATION: VALID AND RELIABLE ASSESSMENTS OF STUDENT MUSIC LEARNING
Student music learning = student achievement in RTTT
Assessment must be done well or not at all
NAEP is one reference for validity and reliability
NAfME continues to advocate for the arts as a core subject. Question: if music is a core subject, how do we define it? What is assessed?
The 2008 NAEP analysis omitted validity, reliability, item analysis, regressions, factor analysis, and other test characteristics
NAEP analysis was concerned with demographic and SES-related characteristics – race, gender, free and reduced lunch, community and school type, etc.
PART 2 OF THE EQUATION: “OTHER MEASURES” STRENGTHS AND CAUTIONS
Classroom Observation
Principal Evaluation
Instructional Artifact
Portfolio
Teacher Self-Report
Student Survey
Value-Added Model
Source: Goe, Holdheide, & Miller (2011). A practical guide to designing comprehensive teacher evaluation systems. National Comprehensive Center for Teacher Quality: Washington, DC.
GENERAL VALIDITY ISSUES - USING STUDENT LEARNING MEASURES IN TEACHER EVALUATIONS
To what extent do changes in a student’s performance reflect actual changes in his or her understanding of the underlying content?
When student test scores are used to estimate teaching effectiveness, what is the extent to which those estimates accurately represent the teacher’s contribution to student learning?
What evidence do we have regarding various threats to the validity of inferences for a particular use of a measure?
How do we attribute student performance to individual teachers when the assessments are intended to cover material from multiple courses?
Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.
VALIDITY ISSUES FOR MUSIC TEACHER EVALUATION
Observations and evaluative tools MUST be implemented by trained personnel who are content experts in music education
“Other measures” used MUST be valid for music teachers and account for the variables unique to music education
Student music achievement MUST be measured using valid, reliable instruments
Student achievement data used for music teacher evaluation MUST be from music assessments, not an arbitrary attribution of the effect of the music teacher on scores for the “usual tested subjects” of math, reading, science, and writing
GENERAL RELIABILITY ISSUES
Common approach: internal consistency reliability, which expresses the extent to which items on the test measure the same underlying construct.
Measures of internal consistency reliability do not take into account interrater reliability in the scoring of any open-response items that tests may include, and they also do not measure the reliability of the value-added estimates themselves.
Interrater reliability is an important consideration in the case of items that are assessed by human scorers because one wants to minimize the extent to which an individual’s score on the assessment is dependent on the idiosyncrasies of the rater who happens to score it.
Reliability of value-added estimates is an important consideration because, due to random classroom- and student-level error, value-added estimates are known to be unstable from year to year.
Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.
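Internal consistency reliability is commonly reported as Cronbach's alpha. The slide does not name a specific coefficient, so treating alpha as the statistic here is an assumption. A minimal sketch, with hypothetical data (one row per student, one column per test item):

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students-by-items table of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    where k is the number of items. Sample variances are used throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two students")
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) responses of four students to three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(responses))
```

As the slide notes, a high alpha says nothing about agreement between human scorers or about the stability of value-added estimates; those need separate evidence.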
RELIABILITY NEEDS FOR MUSIC TEACHER EVALUATION: STUDENT MUSIC ACHIEVEMENT
Clearly defining “open-ended” responses in music – prepared performance, on-demand performance, composition, improvisation, arrangement, etc.
Expert rubric development and training of scorers
Norming/calibration of rubrics used for open-ended responses
Thorough item analysis for all item types
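One way to check whether scorer training and rubric calibration are working is to have two raters score the same performances and compute a chance-corrected agreement index. The sketch below uses Cohen's kappa, which is an assumed choice rather than one named in the slides; the rubric levels and scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per performance.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per level.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single level")
    return (observed - expected) / (1 - expected)


# Hypothetical rubric levels (1-4) given by two scorers to eight performances.
scorer_1 = [4, 3, 3, 2, 4, 1, 2, 3]
scorer_2 = [4, 3, 2, 2, 4, 1, 3, 3]
print(cohens_kappa(scorer_1, scorer_2))
```

Plain kappa treats every disagreement as equally serious. For ordered rubric levels, a weighted variant that penalizes near-misses less may be a better fit.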
DEVELOPING ASSESSMENT RELIABILITY AND VALIDITY: ITEM ANALYSIS
Readily available analysis techniques allow us to obtain sophisticated item analysis data for music items.
Item Response Theory (IRT) models should become the standard analysis approach:
Three-parameter models for dichotomous items, which estimate difficulty and discrimination while accounting for guessing
Polytomous models (e.g., generalized rating scale models and Samejima’s graded response model), which extend IRT to the analysis of rubric-based assessments
Easy-to-use software programs such as XCalibre4™ make these complex calculations accessible.
Frank Baker’s classic book, The Basics of Item Response Theory, is now a free ERIC document.
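For readers unfamiliar with the three-parameter logistic (3PL) model, the sketch below computes the probability of a correct response under its standard form, P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))). The parameter values are hypothetical. Estimating a, b, and c from real response data is a separate step, normally handled by dedicated IRT software.

```python
import math


def three_pl_probability(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: student ability
    a: item discrimination
    b: item difficulty
    c: pseudo-guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical four-option multiple-choice music item: guessing floor of 0.25.
for ability in (-2.0, 0.0, 2.0):
    print(ability, round(three_pl_probability(ability, a=1.2, b=0.0, c=0.25), 3))
```

When ability equals the item difficulty, the probability sits halfway between the guessing floor and 1, which is why b is read as the item's location on the ability scale.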
Prince et al (2009) The Other 69 Percent: “Identifying highly effective teachers of subjects that are not tested with standardized achievement tests — such as teachers of art, music, physical education, vocational education, and foreign languages — requires a different approach.” (p. 5) “It is easy to believe that we can assess whether students read well or solve math problems well or understand social studies or science, but it is much more difficult to imagine how to assess whether students properly understand a subject such as art. Until we can agree on what constitutes effective teacher performance, it will be difficult to measure it and reward it.” (p. 6) TEACHER EFFECTIVENESS A CALL FOR ACTION IN MUSIC EDUCATION
MUSIC TEACHER EFFECTIVENESS: QUESTIONS FOR OUR PROFESSION
What is an effective music teacher?
What is a highly effective music teacher?
How do we measure music teacher effectiveness?
How do we evaluate music teacher effectiveness?
CHALLENGE TO THE SRIG: EVALUATION RESEARCH NEEDS
FIRST AND FOREMOST: We must lead the profession to develop technically sound, valid, reliable assessments of student music learning in every state that are thoroughly analyzed for validity, reliability, differential item functioning (DIF), and item characteristics
A process or model of assessment development for states and districts
In cooperation with SMTE, collect and evaluate the validity and reliability of music teacher evaluation systems in NAfME states
Design and implement studies to develop empirically supported criteria for music teacher evaluation, use these to develop music teacher evaluation models, and assess their validity and reliability
THE “EVALUATION DILEMMA” “Solutions to the evaluation dilemma are as complex as the issue itself. The evaluation of music teachers remains an area in need of relevant research, and the development of an appropriate evaluation and observation instrument must be urgently addressed. It is now the responsibility of the united music teaching profession, in tandem with active music education researchers, to address this challenge.” Source: Brophy (1993) Evaluation of music educators: Toward defining an appropriate instrument.