Published by Benedict Reynolds. Modified over 8 years ago.
1
Item Response Theory
Dan Mungas, Ph.D.
Department of Neurology
University of California, Davis
2
What is it? Why should anyone care?
3
IRT Basics
4
Item Response Theory - What Is It
Modern approach to psychometric test development
Mathematical measurement theory
Associated numeric and computational methods
Widely used in large-scale educational, achievement, and aptitude testing
More than 50 years of conceptual and methodological development
5
Item Response Theory - Methods
Dataset consists of a rectangular table
  rows correspond to examinees
  columns correspond to items
IRT applications simultaneously estimate examinee ability and item parameters
  iterative, maximum likelihood estimation algorithms
  processor intensive, but this is no longer a problem
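One half of that iterative cycle can be sketched in a few lines: estimating a single examinee's ability by maximum likelihood when the item parameters are treated as known. The sketch assumes a two-parameter logistic model and Newton-Raphson updates; the function name `ml_ability` and the example item values are illustrative and do not come from the slides. A full calibration would alternate steps like this one with updates of the item parameters.

```python
import math

def ml_ability(responses, items, iterations=25):
    """Maximum likelihood ability estimate with item parameters held fixed.

    responses: list of 0/1 scores, one per item.
    items: list of (discrimination a, difficulty b) pairs (hypothetical values).
    A finite estimate exists only for mixed patterns (not all 0s or all 1s).
    """
    theta = 0.0
    for _ in range(iterations):
        gradient = 0.0      # first derivative of the log-likelihood
        information = 0.0   # negative second derivative
        for x, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            gradient += a * (x - p)
            information += a * a * p * (1.0 - p)
        step = gradient / information
        theta += max(-1.0, min(1.0, step))  # clamp the step for stability
        if abs(step) < 1e-8:
            break
    return theta

items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
theta_two_correct = ml_ability([1, 1, 0], items)
theta_one_correct = ml_ability([1, 0, 0], items)
```

Because the three illustrative items are symmetric around zero, the two estimates mirror each other: two correct responses give a positive ability estimate and one correct response gives the matching negative one.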
6
Basic Data Structure
Subject  Item1  Item2  Item3  Item4
S1       X11    X12    X13
7
Item Types
Dichotomous
Multiple choice
Polytomous
  Information is greater for a polytomous item than for the same item dichotomized at a cutpoint
8
What Is the Item-Level Response?
Smallest discrete unit (e.g., Object Naming)
Sum of correct responses (trials in a word list learning test)
For practical reasons, continuous measures might have to be recoded into ordinal scales with reduced response categories (10, 15)
9
Item Response Theory - Basic Results
Item parameters
  difficulty
  discrimination
  correction for guessing
    most applicable for multiple choice items
Subject ability (in the psychometric sense)
  Capacity to successfully respond to test items (or propensity to respond in a certain direction)
  Net result of all genetic and environmental influences
  Measured by scales composed of homogeneous items
Item difficulty and subject ability are on the same scale
10
Item Characteristic Curves
11
Item Response Theory - Outcomes
Item-Level Results
Item Characteristic Curve (ICC)
  non-linear function relating ability to probability of correct response to item
Item Information Curve (IIC)
  non-linear function showing precision of measurement (reliability) at different ability points
Both curves are defined by the item parameters
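As a rough illustration of how both curves follow from the item parameters, the sketch below uses the logistic form of the three-parameter model. The parameter names `a` (discrimination), `b` (difficulty), and `c` (guessing) and the example values are assumptions for illustration; the slides do not give the functional form, and some software adds a scaling constant of 1.7 that is omitted here.

```python
import math

def icc(theta, a, b, c=0.0):
    """Item characteristic curve: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c=0.0):
    """Item information curve for the same logistic model.

    With c = 0 this reduces to a^2 * P * (1 - P).
    """
    p = icc(theta, a, b, c)
    return a * a * ((p - c) / (1.0 - c)) ** 2 * ((1.0 - p) / p)

# Evaluate both curves on a coarse ability grid for one hypothetical item.
grid = [t / 2.0 for t in range(-6, 7)]
curve = [(t, icc(t, a=1.5, b=0.0), item_information(t, a=1.5, b=0.0)) for t in grid]
```

Without guessing, information peaks where ability equals the item's difficulty, which is why an item measures most precisely for examinees near its own difficulty level.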
12
Item Characteristic Curves
13
Information Curves
15
Item Response Theory - Outcomes
Test-Level Results
Test Characteristic Curve (TCC)
  non-linear function relating ability to expected total test score
Test Information Curve (TIC)
  non-linear function showing precision of measurement (reliability) at different ability points
Both are sums of the item-level functions of the included items
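The summation can be sketched directly. The example assumes dichotomous items under a two-parameter logistic model; the names `tcc` and `tic` and the three items are hypothetical.

```python
import math

def icc(theta, a, b):
    """Probability of a correct response to one item (2PL, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score."""
    return sum(icc(theta, a, b) for a, b in items)

def tic(theta, items):
    """Test information curve: sum of the item information values."""
    total = 0.0
    for a, b in items:
        p = icc(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total

items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # (a, b) pairs, illustrative
expected_score = tcc(0.0, items)
precision = tic(0.0, items)
```

Adding or removing an item changes both curves by exactly that item's contribution, which is what makes it possible to assemble a test with a target information curve.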
16
Test Characteristic Curve
Mini-Mental State Examination
17
Information Curves
18
Item Response Theory - Fundamental Assumptions
Unidimensionality - items measure a homogeneous, single domain
Local independence - covariance among items is determined only by the latent dimension measured by the item set
19
IRT Models
1PL (Rasch)
  Only difficulty and ability are estimated
  Discrimination is assumed to be equal across items
2PL
  Discrimination, difficulty, and ability are estimated
  Guessing is assumed to not have an effect
3PL
  Discrimination, difficulty, guessing, and ability are estimated (multiple choice items)
20
Item Response Theory - Invariance Properties
Invariance requires that basic assumptions are met
Item parameters are invariant across different samples
  Within the range of overlap of distributions
  Distributions of samples can differ
Ability estimates are invariant across different item sets
  Assumes that the ability range of the items spans the ability range of subjects that is of interest
21
Why Do We Care - Applications of IRT in Health Care Settings
Refined scoring of tests
Characterization of psychometric properties of existing tests
Construction of new tests
22
Test Scoring
IRT permits refined scoring that weights items differentially based on their item parameters
23
Physical Function Scale
Hays, Morales & Reise (2000)
Response options for each item: Limited a lot / Limited a little / Not limited at all
Items:
  Vigorous activities, running, lifting heavy objects, strenuous sports
  Climbing one flight
  Walking more than 1 mile
  Walking one block
  Bathing / dressing self
  Preparing meals / doing laundry
  Shopping
  Getting around inside home
  Feeding self
24
How to Score Test
Simple approach: there are numbers that will be circled; total these up, and we have a score.
But: should “limited a lot” for walking a mile receive the same weight as “limited a lot” in getting around inside the home?
Should “limited a lot” for walking one block be twice as bad as “limited a little” for walking one block?
25
How IRT Can Help
IRT provides us with a data-driven means of rational scoring for such measures
Items that are more discriminating are given greater weight
In practice, the simple sum score is often very good; improvement is at the margins
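The weighting idea is easiest to see in the dichotomous two-parameter model, where the maximum likelihood ability estimate depends on the responses only through the discrimination-weighted sum. The sketch below assumes that model (the physical function items are polytomous, so this is a simplification) and uses made-up discrimination values.

```python
def sum_score(responses):
    """Simple unweighted total."""
    return sum(responses)

def weighted_score(responses, discriminations):
    """Discrimination-weighted total (sufficient for ability under 2PL)."""
    return sum(a * x for a, x in zip(discriminations, responses))

discriminations = [2.0, 1.0, 0.5]  # hypothetical item discriminations
pattern_a = [1, 0, 0]              # endorses only the most discriminating item
pattern_b = [0, 0, 1]              # endorses only the least discriminating item

same_sum = sum_score(pattern_a) == sum_score(pattern_b)
weight_a = weighted_score(pattern_a, discriminations)
weight_b = weighted_score(pattern_b, discriminations)
```

The two patterns are indistinguishable by sum score but not by weighted score. When discriminations are similar across items, the two scores rank examinees almost identically, consistent with the remark that improvement is at the margins.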
26
Description of Psychometric Properties
The Test Information Curve (TIC) shows reliability that varies continuously with ability
  Depicts ability levels associated with high and low reliability
The standard error of measurement is directly related to the information value, I(θ)
  SEM(θ) = 1 / sqrt(I(θ))
SEM(θ) and I(θ) also have a direct correspondence to traditional r
  r(θ) = 1 - 1 / I(θ)
27
I(θ), SEM, r
I(θ)   SEM (s.d. units)   r
1      1.00               0.00
2      0.71               0.50
4      0.50               0.75
9      0.33               0.89
12     0.29               0.92
16     0.25               0.94
25     0.20               0.96
36     0.17               0.97
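The table values follow from the two formulas on the previous slide and can be recomputed as a check. The sketch assumes ability is scaled in standard deviation units (variance of 1), which is what lets reliability be written as 1 - 1/I; the function names are illustrative.

```python
import math

def sem(information):
    """Standard error of measurement in s.d. units."""
    return 1.0 / math.sqrt(information)

def reliability(information):
    """Reliability implied by the information value (ability variance = 1)."""
    return 1.0 - 1.0 / information

rows = [(i, round(sem(i), 2), round(reliability(i), 2))
        for i in (1, 2, 4, 9, 12, 16, 25, 36)]
for info, s, r in rows:
    print(f"{info:>3}  {s:.2f}  {r:.2f}")
```

Quadrupling the information halves the standard error, so gains in reliability shrink quickly once information is already high.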
28
TICs for English and Spanish language Versions of Two Scales
Mungas et al., 2004
29
Construction of New Scales
Items can be selected to create scales with desired measurement properties
Can be used for prospective test development
Can be used to create new scales from existing tests/item pools
IRT will not overcome inadequate items
30
TICs from an Existing Global Cognition Scale and Re-Calibrated Existing Cognitive Tests
Mungas et al., 2003
31
Principles of Scale Construction
Information corresponds to assessment goals
  Broad and flat TIC for a longitudinal change measure in a population with heterogeneous ability
  For a selection or diagnostic test, peak at the point of the ability continuum where discrimination is most important
But normal cognition spans a 4.0 s.d. range, and the range is even greater in demographically diverse populations
32
Other Issues in IRT
Polytomous IRT models are available
  Useful for ordinal (Likert) rating scales
  Each possible score of the item (minus 1) is treated like a separate item with a different difficulty parameter
  Information is greater for a polytomous item than for the same item dichotomized at a cutpoint
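One common way to realize the "separate item per score" idea is Samejima's graded response model, where an item with K ordered categories has K - 1 threshold (difficulty) parameters and one discrimination. The slides do not say which polytomous model is meant, so the choice of model, the function name, and the parameter values below are assumptions.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one ordinal item (graded response model).

    thresholds: K - 1 difficulty values in increasing order, so that the
    cumulative curves are ordered and every probability is non-negative.
    Returns K probabilities, for the lowest through the highest category.
    """
    # Cumulative curves: probability of scoring in category k or higher.
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the gap between adjacent cumulative curves.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# A three-category item (e.g. limited a lot / a little / not at all).
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 1.0])
```

Each cumulative curve has the same form as a dichotomous item's characteristic curve, which is the sense in which every score boundary behaves like its own item.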
33
Other Issues in IRT
Applicable to a broad range of content domains
  IRT certainly applies to cognitive abilities
  Also applies to other health outcomes
    Quality of life
    Physical function
    Fatigue
    Depression
    Pain
34
Other Issues in IRT
Differential Item Functioning - Test Bias
  IRT provides explicit methods to evaluate and quantify the extent to which items and tests have different measurement properties in different groups
  e.g., racial and ethnic groups, linguistic groups, gender
35
English and Spanish Item Characteristic Curves for “Lamb/Cordero” Item
36
English and Spanish Item Characteristic Curves for “Stone/Piedra” Item
37
Differential Item Functioning (DIF)
DIF refers to systematic bias in measuring “true” ability; it does not address group differences in ability
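One simple way to quantify DIF for a single item is the area between its characteristic curves in two groups: examinees at the same ability level should have the same probability of success, so any gap between the curves reflects bias rather than ability differences. The sketch assumes a two-parameter logistic model, parameters already calibrated on a common scale in each group, and a numerical midpoint integration over a fixed ability range; the function names and parameter values are hypothetical, and this is only one of several DIF indices.

```python
import math

def icc(theta, a, b):
    """Probability of a correct response (2PL, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def unsigned_area(params_ref, params_focal, lo=-4.0, hi=4.0, steps=800):
    """Area between two groups' ICCs for one item over [lo, hi]."""
    width = (hi - lo) / steps
    area = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * width  # midpoint of each slice
        gap = icc(theta, *params_ref) - icc(theta, *params_focal)
        area += abs(gap) * width
    return area

no_dif = unsigned_area((1.2, 0.0), (1.2, 0.0))
shifted = unsigned_area((1.2, 0.0), (1.2, 0.5))  # harder in the focal group
```

An area of zero means the item behaves identically in both groups; larger values indicate that the item is easier or harder for one group at matched ability.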
38
Challenges / Limitations of IRT
Large samples required for stable estimation
  for 1PL
  for 2PL
  for 3PL
Analytic methods are labor intensive
There are a number of (expensive *) applications readily available for IRT analyses
Evaluation of basic assumptions, identification of appropriate model, and systematic IRT analysis require considerable expertise and labor
* but, R!!
39
Computerized Adaptive Testing (CAT)
IRT-based, computer-driven method
Selects items that most closely match examinee’s ability
Administers only items needed to achieve a pre-specified level of precision in measurement (information, s.e.m., reliability)
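The select, administer, re-estimate, stop cycle can be sketched as follows. Several choices here are assumptions rather than anything stated in the slides: a two-parameter logistic model, maximum-information item selection, ability estimated as a posterior mean on a grid with a standard normal prior, and a stopping rule on the posterior standard deviation. The item bank, the `answer` callback, and all names are hypothetical.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4 to 4

def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate(responses):
    """Posterior mean and s.d. of ability given ((a, b), score) pairs."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # standard normal prior (assumption)
        for (a, b), x in responses:
            p = p_correct(t, a, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, target_sem=0.4, max_items=20):
    """Administer items until the precision target or item limit is reached."""
    remaining = list(bank)
    responses = []
    theta, sd = 0.0, 1.0  # start at the prior mean and s.d.
    while remaining and len(responses) < max_items and sd > target_sem:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda ab: item_info(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, sd = estimate(responses)
    return theta, sd, [item for item, _ in responses]

bank = [(1.5, b / 4.0) for b in range(-12, 13)]   # 25 items, b from -3 to 3
answer = lambda item: 1 if item[1] < 1.0 else 0   # idealized examinee near 1.0
theta_hat, sd_hat, administered = run_cat(bank, answer)
```

The loop typically stops well before the bank is exhausted, because items chosen near the current estimate contribute the most information per administration.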
40
Why CAT
Efficiency
  Time efficiency
  Data collection
  Scoring
Administration - Scoring Standardization
  Computer can implement complex scoring algorithms
41
CAT Example 1
42
CAT Example 2
43
Practical Considerations for CAT
44
What You Need for CAT
Computer technology
  Item selection
  Item administration
  Scale scoring
Item bank with IRT parameters
  Range of item difficulty relevant to measurement needs
45
What Is Straightforward/Easy?
Dichotomous items
Multiple choice items
Ordered polytomous response scales
  Up to response options
46
Technical Challenges
Continuous response scales (memory, timed tasks)
  Can be recoded into a smaller number of ordered response ranges
  Lose information
47
Methodological Challenges
Sample size requirements
  Minimally cases for stable estimation of item parameters
Differential Item Functioning and Measurement Bias
  Essentially involves item calibration within groups of interest
  e.g., age, education, language, gender, race
  Available literature provides minimal guidance
48
References
Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item response theory and health outcomes measurement in the 21st century. Med Care, 38(9 Suppl), II28-42.
Mungas, D., Reed, B. R., & Kramer, J. H. (2003). Psychometrically matched measures of global cognition, memory, and executive function for assessment of cognitive decline in older persons. Neuropsychology, 17(3).
Mungas, D., Reed, B. R., Crane, P. K., Haan, M. N., & González, H. (2004). Spanish and English Neuropsychological Assessment Scales (SENAS): Further development and psychometric characteristics. Psychological Assessment, 16(4).