Item Response Theory in the Secondary Classroom: What Rasch Modeling Can Reveal About Teachers, Students, and Tests. T. Jared Robinson tjaredrobinson.com.

Item Response Theory in the Secondary Classroom: What Rasch Modeling Can Reveal About Teachers, Students, and Tests. T. Jared Robinson tjaredrobinson.com David O. McKay School of Education Brigham Young University NRMERA, 2012, Park City, UT

Purpose My purpose is to show how Rasch modeling can be applied in certain secondary education situations, and how teachers, students, and tests might benefit. This case study examines a high school biology exam using the Rasch model to demonstrate some possible implications of item response theory (IRT) in a secondary setting. Along the way, I provide a very brief and basic introduction to IRT/Rasch modeling.

Context Brookhart (2003)—Measurement theory developed for large-scale assessments is not appropriate for classroom assessment. McMillan (2003)—Measurement specialists need to adapt to be more relevant to classroom assessment. Smith (2003)—Traditional notions of reliability are not appropriate for classrooms. Plake (1993), Stiggins (1991, 1995)—Empirically, teachers are under-trained in assessment and testing. Newfields (2006)—It is still important for teachers to develop assessment literacy. Rudner & Schafer (2002)—Teachers need to understand reliability and validity now more than ever.

Some of my assumptions While many types of classroom assessment defy application of measurement theory, teachers still use summative assessment in classroom settings. To the extent that thinking about such assessments in terms of measurement theory provides utility for teachers and students, it should be explored.

How big an N is big enough? Source: John Michael Linacre.

Case Study Design This study used data from a biology test given to sophomores at a suburban high school in the mountain west. The test consisted of 35 multiple-choice and true/false questions. The study analyzed data for 115 students from four sections of a biology class, all taught by the same teacher. A Rasch analysis of the data was conducted using the WINSTEPS software. Results were used to identify strengths and weaknesses of the test, as well as the general knowledge of students.

Classical Test Theory Reliability
sum of variance terms: 3.06
sum of covariance terms: 3.78
sum of all terms: 6.84
correction factor: 1.03
coefficient alpha: 0.57
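These quantities fit together via the covariance form of coefficient alpha: with k items, alpha = (k/(k-1)) × (1 − sum of item variances / variance of the total score). With k = 35 items, this reproduces the slide's correction factor of about 1.03 and alpha of about 0.57. A minimal sketch of the computation (the function name and toy score matrix are illustrative, not the study's data):

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha from an (n_students, n_items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                  # number of items (35 on this exam)
    cov = np.cov(scores, rowvar=False)   # item covariance matrix
    sum_var = np.trace(cov)              # sum of variance terms
    sum_all = cov.sum()                  # sum of all terms = variance of total score
    correction = k / (k - 1)             # correction factor
    return correction * (1 - sum_var / sum_all)

# Plugging the slide's summary numbers in directly:
# (35/34) * (1 - 3.06/6.84) ≈ 0.57
```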

Basics of IRT/Rasch Modeling IRT/Rasch modeling has several advantages over classical test theory. One is that we get much more information about how each individual item interacts with students as a function of their ability. Instead of reporting student ability scores on a percent scale of 0-100, Rasch models report scores on a logit scale centered at 0, with most scores ranging from -3 to +3 (although on this test, some students scored above 5). Students with positive logit scores are more able than average, and students with negative logit scores are less able than average.
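In the dichotomous Rasch model, the probability of a correct response depends only on the gap between student ability θ and item difficulty b, both measured in logits. A minimal sketch (the function name is illustrative):

```python
import math

def rasch_prob(theta, b):
    """P(correct) for a student of ability theta on an item of difficulty b,
    both in logits (dichotomous Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the student has a 50% chance of success;
# a 2-logit advantage pushes that to roughly 88%.
```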

Scalogram

What Rasch modeling can teach teachers about their tests One useful thing about IRT is that the item difficulty estimates are also computed on the logit scale. Thus, we can easily compare item difficulty with student ability, as in the chart on the next slide.
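Because abilities and difficulties share one scale, they can be lined up band by band, as in the item-person ("Wright") maps that WINSTEPS prints. A rough text sketch of the idea (the function and the sample values are hypothetical, not the study's data):

```python
def wright_map(abilities, difficulties, lo=-3.0, hi=3.0, step=0.5):
    """Build a crude item-person map: each row is one logit band, with one 'X'
    per student ability and one '#' per item difficulty falling in that band."""
    rows = []
    top = hi
    while top > lo:
        bottom = top - step
        persons = sum(bottom <= a < top for a in abilities)
        items = sum(bottom <= d < top for d in difficulties)
        rows.append(f"{top:5.1f} | {'X' * persons:<10} | {'#' * items}")
        top = bottom
    return rows

# Hypothetical values: able students clustered above the items, as on this exam.
for row in wright_map(abilities=[1.8, 1.2, 0.4], difficulties=[-1.2, 0.3]):
    print(row)
```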

What does this mean? WINSTEPS centers 0 on the logit scale at the mean item difficulty. The table visually demonstrates that most of the questions are much easier than these students' ability levels. Students like this because it means they get a good grade on the test, but it is not a good situation from a measurement perspective: a test with a pattern like the one above cannot reliably distinguish between the ability levels of most of the students.

Test Information Function

What does this mean? This graph illustrates that the test gives a lot of information about students with ability scores between about -2 and +2, with the amount of information dropping off sharply outside that range. In areas of the graph where information is high, there is little error in estimating student scores; in areas where information is low, there is a lot of error.

What does this tell us about this test? This lack of match between student ability and item difficulty leads to low score reliability. In this case, the reliability of the student ability estimates is just .34; you want it to be much closer to .90 or even higher. For example, 18 of the 115 students got a score of 32/35, or 91%. In reality, these estimates are pretty rough, because there are no questions at that difficulty level. Those students are probably not identical in ability or knowledge, but the test is designed in a way that means we cannot really know their ability with any kind of precision.

Limitations Evidence of multi-dimensionality, violating some key assumptions of Cronbach's alpha and Rasch measurement Only looking at one limited case ▫The difficulty-level gap might be non-representative ▫Rasch modeling might be less appropriate in other schools with different testing procedures ▫Only useful to the extent that it is plausible for teachers to access and understand the software

Conclusions This case is one example of where Rasch modeling has utility in understanding a test and the students who took it. Rasch software presents visual interpretation tools that may be easier for teachers to interpret than traditional reliability concepts. In instances where teachers teach multiple sections of one subject, or where assessments are common across teachers, Rasch modeling can be used to produce stable estimates in secondary settings.