1 ITEM RESPONSE THEORY MODELS & COMPOSITE MEASURES Sharon-Lise T. Normand FOCUS: How to deal with data that are dichotomous or ordinal? Notation: s indexes subject; j indexes item (or measure); $\theta_s$ = true unobserved score.
2 WHAT IS AN ITEM RESPONSE THEORY (IRT) MODEL? A statistical model that relates the probability of a response to an item to item-specific parameters and to the subject's underlying latent trait.
3 CLASSICAL TEST THEORY VS. ITEM RESPONSE THEORY
Classical Test Theory: Estimate reliability of items (coefficient $\alpha$). Model: $Y_{sj} = \theta_s + \epsilon_{sj}$, where $Y_{sj}$ = response, $\theta_s$ = underlying trait, and $\epsilon_{sj}$ = error, Normal with expectation 0 and constant variance.
Item Response Theory: Estimate discriminating ability of items using item-specific parameters. Responses within a subject are independent conditional on the latent trait. Normality & constant variance are not assumed. $\theta_s \sim N(0,1)$.
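The reliability coefficient on the classical test theory side can be sketched in a few lines, assuming the coefficient meant is Cronbach's alpha (the symbol did not survive in the transcript). The function name `cronbach_alpha` and the toy scores are illustrative, not from the slides.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects-by-items table.

    scores: list of rows, one per subject, each a list of item responses.
    (Assumption: the slide's "coefficient" is Cronbach's alpha.)
    """
    n_items = len(scores[0])
    # Variance of each item across subjects.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the total (sum) score across subjects.
    total_var = pvariance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical toy data: 4 subjects answering 3 items on a 0-4 scale.
toy = [[0, 1, 0], [2, 2, 1], [3, 3, 4], [4, 4, 4]]
alpha_hat = cronbach_alpha(toy)
```

Items that move together across subjects push the total-score variance well above the sum of the item variances, which drives the coefficient toward 1.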
4 DICHOTOMOUS OR ORDINAL RESPONSES Item response formulation: the observed response is $y_{sj}$. Generalized linear model formulation: $h(P(y_{sj} = 1 \mid \theta_s)) = \alpha_j(\theta_s - \beta_j)$, where $h$ = link function (logit or probit) and $\alpha_j$ and $\beta_j$ are item parameters.
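A minimal sketch of the two link choices, assuming the linear predictor is a discrimination times (trait minus difficulty); the names `alpha`, `beta`, and `response_prob` are illustrative. It also checks the familiar rule of thumb that a logistic curve rescaled by about 1.702 tracks the probit curve closely.

```python
import math

def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))

def inv_probit(x):
    """Inverse of the probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def response_prob(theta, alpha, beta, link="logit"):
    """P(y = 1 | theta) under h(P) = alpha * (theta - beta)."""
    eta = alpha * (theta - beta)
    if link == "logit":
        return inv_logit(eta)
    if link == "probit":
        return inv_probit(eta)
    raise ValueError("link must be 'logit' or 'probit'")

# Largest gap between the probit curve and a logistic curve scaled by 1.702,
# evaluated on a grid from -6 to 6.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(inv_probit(x) - inv_logit(1.702 * x)) for x in grid)
```

Under either link, a subject whose trait equals the item difficulty has a response probability of 0.5, so the choice of link mainly changes the scale of the discrimination parameter.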
5 RASCH MODEL (1-PARAMETER LOGISTIC) Simplest IRT model. $Y_{sj} = 1$ if subject s responds correctly to item j and 0 otherwise. $\theta_s$ = latent ability for subject s. $\beta_j$ = difficulty of the j-th item. Probability that subject s responds correctly to the j-th item: $P(Y_{sj} = 1 \mid \theta_s) = \dfrac{\exp(\theta_s - \beta_j)}{1 + \exp(\theta_s - \beta_j)}$
6 RASCH MODEL: 3 SUBJECTS WITH DIFFERENT TRAITS [Figure not reproduced; annotation: $\beta$ = difficulty.]
7 2-PARAMETER LOGISTIC $Y_{sj} = 1$ if subject s responds correctly to item j and 0 otherwise. $\theta_s$ = latent ability for subject s. $\beta_j$ = difficulty of the j-th item. $\alpha_j$ = discrimination of the j-th item ($\alpha_j > 0$). Probability that subject s responds correctly to the j-th item: $P(Y_{sj} = 1 \mid \theta_s) = \dfrac{\exp(\alpha_j(\theta_s - \beta_j))}{1 + \exp(\alpha_j(\theta_s - \beta_j))}$
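The role of the discrimination parameter can be illustrated with a short sketch. The function names and the parameter values are hypothetical; setting the discrimination to 1 recovers the Rasch model.

```python
import math

def two_pl(theta, alpha, beta):
    """2-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def slope_at_difficulty(alpha, beta, h=1e-5):
    """Numerical slope of the item curve at theta = beta (central difference)."""
    return (two_pl(beta + h, alpha, beta) - two_pl(beta - h, alpha, beta)) / (2 * h)

# Hypothetical items with the same difficulty but different discrimination.
flat_item = slope_at_difficulty(alpha=0.5, beta=0.3)
steep_item = slope_at_difficulty(alpha=2.0, beta=0.3)
```

The slope at the difficulty point equals the discrimination divided by 4, so a more discriminating item separates subjects just below and just above its difficulty more sharply.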
10 EXAMPLE 1: HOSPITAL QUALITY FOR HEART FAILURE (3376 US Hospitals; 2005)
| Measure | Median # Eligible Patients [10th; 90th percentile] | Median % Compliant [10th; 90th percentile] |
| LVF assessment | 200 [51; 580] | 89 [64; 98] |
| ACE or ARB for LVSD | 34 [5; 120] | 83 [60; 100] |
| Smoking cessation advice | 25 [1; 98] | 79 [40; 100] |
| Discharge instructions | 135 [0; 469] | 55 [15; 87] |
LVF = left ventricular function; LVSD = left ventricular systolic dysfunction. Source: Teixeira-Pinto and Normand, Statistics in Medicine (2008).
11 EXAMPLE 1: Hospital Performance $Y_{sj}$ = no. of eligible cases in the s-th hospital getting treatment j. $\beta_{0j}$ = difficulty of the j-th process measure. $\beta_{1j}$ = discriminating ability of the j-th process measure. $\theta_s$ = underlying quality of care for the s-th hospital.
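One way these pieces could fit together is sketched below. The transcript does not state the model equation, so the binomial count with a logit link in intercept-slope form, logit(p) = b0 + b1 * theta, is an assumption here, as are all parameter values and counts; only the eligible-patient numbers echo the medians in the Example 1 table.

```python
import math

def compliance_prob(theta, b0, b1):
    """Assumed form: logit(p) = b0 + b1 * theta."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * theta)))

def log_lik(theta, counts, eligible, b0, b1):
    """Binomial log-likelihood (up to a constant) for one hospital."""
    total = 0.0
    for y, n, b0j, b1j in zip(counts, eligible, b0, b1):
        p = compliance_prob(theta, b0j, b1j)
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return total

def grid_mle(counts, eligible, b0, b1, lo=-4.0, hi=4.0, steps=801):
    """Crude grid search for the quality score that maximizes the likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_lik(t, counts, eligible, b0, b1))

# Hypothetical item parameters for the four heart-failure measures.
b0 = [2.0, 1.5, 1.3, 0.2]
b1 = [1.0, 0.8, 0.9, 1.2]
eligible = [200, 34, 25, 135]
theta_high = grid_mle([190, 32, 23, 110], eligible, b0, b1)
theta_low = grid_mle([130, 20, 12, 40], eligible, b0, b1)
```

In this sketch the item parameters are treated as known; in practice they would be estimated jointly with the hospital scores, and the grid search stands in for a proper fitting routine.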
13 Comparing Composites: (Teixeira-Pinto and Normand, Statistics in Medicine (2008)) 2005 Data
14 EXAMPLE 2: BASIS-32 Background. BASIS-32 (Behavior and Symptom Identification Scale), an instrument to assess subjective distress, was originally developed using classical test theory based on a sample of psychiatric inpatients from one hospital. Data. Self-reports of symptom and problem difficulty obtained from 2,656 psychiatric inpatients discharged from 13 US hospitals between May 2001 and April 2002. Normand, Belanger, Eisen, Health Services Outcomes Research Methodology (2006).
15 Provide the answer that best describes the degree of difficulty you have been experiencing in each area during the PAST WEEK.
Areas: Managing day-to-day life; Household responsibilities; Leisure time or recreational activities; Adjusting to major life stresses; Relationships with family members; Getting along with people outside family; Isolation or feelings of loneliness; Being able to feel close to others; Depression, hopelessness; Controlling temper, outbursts of anger, violence; Drinking alcoholic beverages; Developing independence; Lack of self-confidence, feeling bad about yourself; Manic, bizarre behavior.
Response Options: 0 = No difficulty; 1 = A little difficulty; 2 = Moderate difficulty; 3 = Quite a bit of difficulty; 4 = Extreme difficulty.
16 GRADED RESPONSE MODEL (IRT MODEL) Used when response options are ordinal categorical, e.g., $Y_{sj}$ = 0, 1, 2, 3, or 4, where 0 = No difficulty; 1 = A little difficulty; 2 = Moderate difficulty; 3 = Quite a bit of difficulty; 4 = Extreme difficulty. We need to model the probability of responding in each category.
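17 GRADED RESPONSE MODEL Probability that subject s responds in threshold category k or higher: $P(Y_{sj} \ge k \mid \theta_s) = P^*_{jk}(\theta_s) = \dfrac{\exp[\alpha_j(\theta_s - \beta_{jk})]}{1 + \exp[\alpha_j(\theta_s - \beta_{jk})]}$. $\theta_s$ = latent trait (e.g., subjective distress). $\alpha_j$ = discrimination of the j-th item ($\alpha_j > 0$). $\beta_{j4} \ge \beta_{j3} \ge \beta_{j2} \ge \beta_{j1}$ = threshold parameters.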
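The step from cumulative probabilities to the probability of each individual category can be sketched as follows: the probability of answering exactly k is the difference between adjacent cumulative probabilities, with the conventions that "category 0 or higher" has probability 1 and "above the top category" has probability 0. Function names and the example parameter values are hypothetical.

```python
import math

def cumulative_prob(theta, alpha, threshold):
    """P(Y >= k | theta) for the threshold belonging to category k."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - threshold)))

def category_probs(theta, alpha, thresholds):
    """P(Y = k | theta) for k = 0..K, given K increasing thresholds."""
    # Pad with P(Y >= 0) = 1 and P(Y >= K + 1) = 0.
    cum = [1.0] + [cumulative_prob(theta, alpha, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item with five response options (0-4) and four thresholds.
probs = category_probs(theta=0.2, alpha=2.0, thresholds=[-1.0, -0.3, 0.4, 1.2])
```

The thresholds must be in increasing order; otherwise some of the differences would be negative and would not be valid probabilities.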
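18 CUMULATIVE PROBABILITIES [Figure not reproduced: cumulative probability curves $P_1^*(\theta)$ through $P_4^*(\theta)$; plot annotations "= 6.00" and "= 0.90".]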
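20 BASIS-10 IRT Parameter Estimates
| Item | $\alpha_j$ | $\beta_{j1}$ | $\beta_{j2}$ | $\beta_{j3}$ | $\beta_{j4}$ | $I_i(\theta)$ |
| Managing Life | 2.87 | -1.03 | -0.36 | 0.25 | 1.06 | 2.08 |
| Responsibilities at Home | 2.51 | -0.80 | -0.22 | 0.37 | 1.13 | 1.63 |
| Responsibilities Outside Home | 2.44 | -0.79 | -0.25 | 0.30 | 1.06 | 1.53 |
| Coping with Problems | 3.13 | -1.33 | -0.68 | -0.13 | 0.66 | 2.33 |
| Concentrating | 3.15 | -0.96 | -0.33 | 0.16 | 0.97 | 2.41 |
| Thinking Clearly | 3.10 | -0.89 | -0.28 | 0.27 | 1.02 | 2.36 |
| Sad or Depressed | 1.99 | -1.58 | -0.66 | -0.12 | 0.84 | 1.08 |
| Ending Your Life | 1.40 | -0.16 | 0.75 | 1.30 | 2.22 | 0.51 |
| Feel Nervous | 1.97 | -1.16 | -0.27 | 0.26 | 1.13 | 1.08 |
| Feel Afraid | 1.92 | -0.87 | -0.03 | 0.52 | 1.34 | 1.01 |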
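To see what such estimates imply, the sketch below plugs two rows of the table into the graded response model and computes the expected item score for a subject with an average trait level (0 on the standard normal scale). It assumes the columns are the discrimination followed by the four thresholds in increasing category order; `expected_score` is an illustrative helper, not something from the slides.

```python
import math

# (discrimination, [four thresholds]) copied from two rows of the table.
ITEMS = {
    "Managing Life": (2.87, [-1.03, -0.36, 0.25, 1.06]),
    "Ending Your Life": (1.40, [-0.16, 0.75, 1.30, 2.22]),
}

def expected_score(theta, alpha, thresholds):
    """E[Y | theta] on the 0-4 scale: the sum over k of P(Y >= k | theta)."""
    return sum(1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in thresholds)

# Expected item scores for a subject at the population mean of the trait.
scores_at_mean = {name: expected_score(0.0, a, b) for name, (a, b) in ITEMS.items()}
```

At the mean trait level the expected score is roughly 2.1 for "Managing Life" and roughly 1.0 for "Ending Your Life", which reflects the latter's higher thresholds: more distress is needed before subjects endorse the upper categories of that item.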
21 Concluding Remarks: (Kaplan & Normand 2006)
| Pros | Cons |
| Summarizes a large amount of information into a simpler measure | May be difficult to interpret – what do the units mean? |
| Facilitates provider ranking | Difficult to validate |
| Improves reliability of provider measure and thus reduces the number of individual quality measures that need to be collected | Does not necessarily guide quality improvement; the individual quality measures are needed |
| Fairer to providers – different ways to get good composite scores | Quality information may be wasted or hidden in the composite measure |
| Reduces the time frame over which quality is assessed by effectively increasing the sample size | The weighting scheme to create the composite score may not be transparent (scoring) |
Not new or unique to health care: intelligence, aptitude, mental illness, and personality (used for over a century); economics/business; education (student and teacher performance); clinical trials.