Scale Construction and Halo Effect in Secondary Student Ratings of Teacher Performance
Ph.D. Dissertation Defense
Eric Paul Rogers
Department of Instructional Psychology and Technology
Brigham Young University
30 June 2005
Background
Teacher evaluation has been a frustrating process for teachers and administrators. Scholars have identified a variety of inadequacies in the practice of teacher evaluation. The use of rating scales in teacher evaluation has been the target of particularly severe criticism, and employing students as raters has been one of the most controversial aspects of rating scale use. Much of the criticism is justified, given the poor design and implementation of rating scales in teacher evaluation.
It is proposed that these criticisms can be effectively mitigated by the careful design, implementation, and interpretation of student ratings of teacher performance. This study explores efforts to address these criticisms using student ratings of teachers in religious education settings sponsored by the Church Educational System (CES) of the Church of Jesus Christ of Latter-day Saints. In addition, among the various threats to the validity of decisions based on ratings, halo effect is considered ubiquitous. This study employs several approaches to diagnosing halo effect and seeks to determine whether males and females exhibit differing degrees of halo in their ratings of teachers, a question not previously addressed in the research literature.
Overview
This study employs a combination of qualitative and quantitative research techniques to answer five specific research questions:
1. What are the key areas of teacher performance valued by CES administrators, teachers, and students?
2. In what ways do students conceptualize these areas of valued teacher performance?
3. To what degree do the items derived from student conceptualizations function to produce reliable ratings from which valid conclusions may be drawn about teacher performance?
4. In what ways should items and scales be revised to improve reliability and validity?
5. To what extent do male and female seminary students exhibit differing degrees of halo effect in their ratings of teachers?
Results: Research Question 1
What are the key areas of teacher performance valued by CES administrators, teachers, and students?
An effective CES teacher:
- Teaches students the Gospel of Jesus Christ
- Teaches by the Spirit
- Teaches by example
- Establishes and maintains an appropriate setting
- Helps students accept responsibility for gospel learning
- Effectively decides what to teach
- Effectively decides how to teach
- Effectively uses scripture study skills
- Effectively uses teaching skills
- Relates well with students
- Prepares young people for effective church service
- Has high expectations for students
Results: Research Question 2
In what ways do students conceptualize these areas of valued teacher performance?
Examples of responses elicited during student focus group interviews, for "Teaches students the Gospel of Jesus Christ":
- They teach from the scriptures.
- They teach less opinion and more doctrine.
- They avoid expressing personal opinions.
- They recognize their own opinions.
- They teach what the prophets teach.
Results: Research Question 3
To what degree do the items derived from student conceptualizations function to produce reliable ratings from which valid conclusions may be drawn about teacher performance?
Although twelve scales were originally developed, only three scales are defensible based on established psychometric standards:
- Student-Teacher Rapport Scale (STRS)
- Scripture Mastery Expectation Scale (SMES)
- Spiritual Learning Environment Scale (SLES)
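Research Question 3 turns on whether scale items yield reliable ratings. The presentation does not say which reliability coefficients were computed or what cut-offs counted as "established psychometric standards." Within Classical Test Theory, a common internal-consistency index is Cronbach's alpha, and the function below is a minimal sketch of it, assuming a complete respondents-by-items score matrix with no missing responses. The function name and data are illustrative assumptions, not the study's code or results.

```python
import numpy as np

def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores.

    Assumes complete data (no missing responses) and at least two items.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least two item columns")
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()  # sum of the k item variances
    total_var = scores.sum(axis=1).var(ddof=1)       # variance of respondents' scale totals
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

# Hypothetical example: 3 respondents answering 2 items.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

For this toy matrix the function returns roughly 0.67. Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency rather than as proof that a scale measures a single trait.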
Results: Research Question 4
In what ways might items and scales be revised to improve reliability and validity?
- Semantic changes to improve item performance
  - Unidimensionality
    - Improved factor loadings
    - Reduced or eliminated secondary factor loadings
    - Improved fit statistics
  - Local item independence
    - Reduced or eliminated error correlations
- Response category changes
  - Better alignment of item difficulties with person measures
  - Better coverage of the upper end of the scales
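The terms "item difficulties," "person measures," and "fit statistics" come from the Rasch family of measurement models. The presentation does not state which Rasch model was applied; for polytomous rating items a common choice is Andrich's rating scale model, which is assumed here. In that model the probability of responding in category k depends on the person measure theta, the item difficulty delta, and step thresholds tau_1..tau_M shared by all items on a scale. The sketch below (hypothetical function names and values) shows how, when a student's measure sits well above an item's difficulty and top threshold, responses concentrate in the highest category. One reading of "better tap the upper end of the scales" is reducing that kind of ceiling.

```python
import math

def rsm_category_probs(theta: float, delta: float, thresholds: list[float]) -> list[float]:
    """Andrich rating scale model: probability of each category 0..M
    for one person-item pair (all parameters in logits)."""
    # Category k's numerator exponent is the cumulative sum of
    # (theta - delta - tau_j) for j = 1..k; category 0 has an empty sum of 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - delta - tau))
    # Normalize (softmax), shifting by the max for numerical stability.
    top = max(exponents)
    weights = [math.exp(x - top) for x in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta: float, delta: float, thresholds: list[float]) -> float:
    """Model-expected rating for one person-item pair."""
    probs = rsm_category_probs(theta, delta, thresholds)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical values: a 4-category item (three thresholds) and two students.
taus = [-1.5, 0.0, 1.5]
for theta in (0.0, 3.0):
    probs = rsm_category_probs(theta, delta=0.0, thresholds=taus)
    print(theta, [round(p, 3) for p in probs])
```

Fit statistics such as infit and outfit mean-squares compare observed ratings with these model-expected values, so they can only be computed after item, person, and threshold parameters have been estimated; that estimation step is outside this sketch.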
Results: Research Question 5
To what extent do male and female seminary students exhibit differing degrees of halo effect in their ratings of teachers?
Traditional approaches to halo diagnosis suggest that males are more likely to exhibit halo than females. The results of the Rasch model approaches to halo diagnosis are mixed, but also suggest that males are more likely to exhibit halo than females.
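The presentation does not specify which traditional halo indices were used. One widely used traditional indicator is the within-rater standard deviation: for a single rating, take the standard deviation of one rater's scores across the rated dimensions for the same teacher. Smaller values mean the rater differentiated less among dimensions, which is read as more halo. The sketch below computes that index and averages it by rater group, assuming each rating carries scores on at least two dimensions. The function names and the toy data are hypothetical and are not the study's data or results.

```python
import statistics
from collections import defaultdict

def within_rater_sd(scores: list[float]) -> float:
    """Sample SD of one rater's scores across dimensions for a single teacher.
    Smaller values indicate less differentiation (a traditional halo indicator)."""
    return statistics.stdev(scores)

def mean_sd_by_group(ratings: list[tuple[str, list[float]]]) -> dict[str, float]:
    """Average within-rater SD per group.
    ratings: (group label, per-dimension scores) pairs, one per rater-teacher rating."""
    sds: dict[str, list[float]] = defaultdict(list)
    for group, scores in ratings:
        sds[group].append(within_rater_sd(scores))
    return {group: statistics.mean(values) for group, values in sds.items()}

# Hypothetical toy data: three dimension scores per rating.
example = [
    ("female", [4.0, 2.5, 3.5]),
    ("female", [3.0, 4.0, 2.0]),
    ("male", [4.0, 4.0, 3.5]),
    ("male", [3.0, 3.0, 3.0]),
]
print(mean_sd_by_group(example))
```

A small within-rater SD cannot by itself separate halo from a teacher who genuinely performs evenly across dimensions. This ambiguity is the "limited variability versus halo effect" limitation.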
Study Limitations
- Representativeness
- Generalizability
- Limited variability versus halo effect
Instructional Design
Conclusions about the effectiveness of instruction, whatever the setting or the instructional design model, rest on evidence that objectives have been achieved. Despite the criticism of some scholars, rating scales that are carefully designed, developed, and implemented provide a basis for valid judgments about instruction and about the design models on which the instruction is based. Threats to the validity of conclusions about instructional interventions abound. Instructional designers should be aware of these threats and take appropriate steps to diagnose and mitigate them when conducting assessments. This study notes the requirements that fundamental measurement places on statistical analyses of assessment data and highlights the influence of halo effect on ratings.
Study Contributions
This study provides a developmental framework for scale construction that integrates Classical Test Theory, Item Response Theory, and factor analytic techniques in a way that leads to defensibly reliable data from which valid conclusions may be drawn. It also establishes a firm basis for three scales that measure traits of importance to CES and that meet widely accepted psychometric standards. Finally, this study provides evidence that, among secondary students, males exhibit halo to a greater degree than females on the traits examined. Although this finding is not generalizable to other traits or other instructional settings, it raises a caution about drawing conclusions about teachers from ratings produced by rater groups with differing gender distributions.
Future Research
- Do the twelve scales developed in this study function as desired with more mature raters (e.g., adult students, peer teachers, supervisors, trainers)?
- How does improved scale function affect the diagnosis of halo effect and the apparent gender-based differences revealed in this study?
- How can researchers meaningfully differentiate between restricted variability and halo effect?