Application of Item Response Theory to PRO Development

1 Application of Item Response Theory to PRO Development
Michael A. Kallen, PhD, MPH Associate Professor, Research Faculty Department of Medical Social Sciences Feinberg School of Medicine Northwestern University Chicago, Illinois

2 Outline
The new measurement
Constructing measures
Extending the usefulness of measurement

4 To see what’s new… The Lower Extremity Function Scale (LEFS)
from a “classical” perspective

5 LEFS ITEMS “Activities”
Response options: Extreme difficulty or unable to perform activity (0); Quite a bit of difficulty (1); Moderate difficulty (2); A little bit of difficulty (3); No difficulty (4).
1. Perform any of your usual work, housework, or school activities
2. Perform your usual hobbies, recreational or sporting activities
3. Getting into or out of the bath
4. Walking between rooms
5. Putting on your shoes or socks
6. Squatting
7. Lifting an object, like a bag of groceries from the floor
8. Performing light activities around your home
9. Performing heavy activities around your home
10. Getting into or out of a car
11. Walking 2 blocks
12. Walking a mile
13. Going up or down 10 stairs (about 1 flight of stairs)
14. Standing for 1 hour
15. Sitting for 1 hour
16. Running on even ground
17. Running on uneven ground
18. Making sharp turns while running fast
19. Hopping
20. Rolling over in bed

6 Original LEFS Instructions
“We are interested in knowing whether you are having any difficulty at all with the activities listed below because of your lower limb problem for which you are currently seeking attention. Please provide an answer for each activity.” “Today, do you or would you have any difficulty at all with: (Circle one number on each line)”

8 Measurement products needed?
All patient individual item scores, so as to obtain a patient’s total score

9 The burden of measurement
For the healthcare provider/researcher: instrument administration, score summation, score interpretation.
For the patient: the actual time (and thoughtfulness) it requires to respond; being asked inappropriate questions.

11 The cost of this burden?
For the healthcare provider/researcher: Perhaps a loss of willingness to use the instrument to collect data, score responses, or interpret a patient’s self-reported condition.
For the patient: Perhaps a loss of willingness to respond in a focused, honest way to an instrument that seems unresponsive or even annoying.

12 Classical psychometrics
Its beginnings go back to the turn of the 20th century. Consider Spearman’s work in disattenuating the correlation coefficient. Demonstration of formulae for true measurement of correlation. American Journal of Psychology, 1907.
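
Spearman's disattenuation corrects an observed correlation for the unreliability of the two measures involved. The sketch below shows the standard correction; the function name and the example numbers are illustrative assumptions, not values from the presentation.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation.

    r_xy  : observed correlation between measures X and Y
    rel_x : reliability of X (0 < rel_x <= 1)
    rel_y : reliability of Y (0 < rel_y <= 1)
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical example: an observed r of 0.42 with reliabilities 0.70 and 0.80
corrected = disattenuate(0.42, 0.70, 0.80)
print(round(corrected, 3))
```

The corrected value is never smaller in magnitude than the observed one, because measurement error can only weaken an observed correlation.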

13 Psychometrics’ First “Golden Years”
The zenith of classical psychometrics arguably occurred during the time period surrounding the publication of Gulliksen’s book, Theory of Mental Tests (1950). This time period saw the development of many of the best and brightest aspects of classical psychometrics: e.g., 1) that measurements, not instruments, have psychometric properties; 2) the introduction of alpha reliability; 3) validity as the sine qua non of measurement.

14 Classical Test Theory (CTT)
CTT is a cornerstone of classical psychometrics. It is theory-based measurement: Individual scores are theory-defined as composed of a true score component and an error component. That is, observed score = true score + error
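
The decomposition "observed score = true score + error" can be illustrated with a small simulation. This is a sketch under assumed values (normally distributed true scores and errors, hypothetical means and standard deviations); it also computes reliability as the share of observed variance that is true-score variance, a standard CTT definition that the slide does not state explicitly.

```python
import random
import statistics


def simulate_ctt(n_people=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate observed = true + error and return the implied reliability."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    # Each observed score is the person's true score plus independent error
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
    return true_scores, observed, reliability


_, _, reliability = simulate_ctt()
# With these assumed SDs the theoretical value is 100 / (100 + 25) = 0.8
print(round(reliability, 2))
```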

15 CTT as Theory Because it is theory, CTT or “true score theory” does not provide: hypotheses that are testable, or models that are falsifiable.

16 CTT and Circular Definitions
In CTT, item difficulty is defined as: the proportion of examinees in a group of interest who answer an item correctly. Thus, an item being “hard” or “easy” depends on the ability of the group of examinees measured. As a result, it is a challenge to determine score meaning: Examinee and test characteristics are entangled; each can be interpreted only in the context of the other.
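
A minimal sketch of this circularity: the same item receives a different CTT difficulty value depending on which group answers it. The two response vectors are invented for illustration.

```python
def item_difficulty(responses):
    """CTT item difficulty: proportion of examinees answering correctly (1 = correct)."""
    return sum(responses) / len(responses)


# Hypothetical responses to the SAME item from two groups of ten examinees
more_able_group = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
less_able_group = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

print(item_difficulty(more_able_group))  # the item looks "easy" here
print(item_difficulty(less_able_group))  # the same item looks "hard" here
```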

17 Scores depend on the difficulty of test items
[Figure: the same person measured against three tests, each shown on a scale from 1 to 8. Very easy test: expected score 8. Medium test: expected score 5. Very hard test: expected score 0.]
Reprinted with permission from: Wright, B.D. & Stone, M. (1979) Best test design, Chicago: MESA Press, p. 5.

18 The New Measurement In the past 40-plus years, measurement has undergone a quiet evolution.

19 Modern Psychometrics
In 1968 Lord and Novick’s book, Statistical Theories of Mental Test Scores, was published. In it, model-based measurement, a foundation of modern psychometrics, was introduced.

20 Item Response Theory (IRT)
IRT, a form of model-based measurement, now plays a fundamental role in modern psychometrics. It addresses CTT’s major shortcomings by providing test-independent and group-independent measurement and by employing models that can be tested and falsified. If a proposed IRT model does NOT adequately explain the data at hand, it can be determined whether assumptions were met or an inappropriate model was used.

21 CTT vs. IRT
CTT, from classical psychometrics, is theory-based, sample and test-specific, and focuses on test performance.
IRT, from modern psychometrics, is model-based, ability or functioning level-specific, and focuses on item performance.

22 Outline
The new measurement
Constructing measures
Extending the usefulness of measurement

23 IRT approach
For any approach: Getting the measure right helps in getting the measurement right.
For the IRT approach: There have come to be accepted and recommended steps to take for getting a measure right.

24 Adapteval-HIV Project (Yang, Kallen)
Goal: To develop and evaluate a set of HIV-specific item banks to support IRT-based CAT assessment.
Implementation on multiple technology platforms (web/phone/PDA).

25 Item Response Theory
[Figure: a single continuum on which both the person latent trait (poor to good) and item location (easy to hard) are placed; individual questions (Q) are marked at their item locations.]
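
Placing persons and items on the same continuum is what lets an IRT model turn the gap between them into a response probability. The sketch below uses the one-parameter (Rasch) logistic form as an assumption; the slides do not say which IRT model the projects used, and the names `theta` and `b` are conventional labels rather than terms from the presentation.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of endorsing/passing an item under the Rasch model.

    theta : person location on the latent trait
    b     : item location (difficulty) on the same scale
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person located exactly at the item has a 50% chance of endorsing it;
# the further the person is above the item, the higher the probability.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(rasch_probability(theta, b=0.0), 3))
```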

26 IRT assumptions
Unidimensionality: Measure one “thing” only.
Monotonicity: The “better” the trait status on a single scale item, the “better” the trait status on the overall scale.
Local independence: Items are independent of each other statistically, after controlling for shared dimensionality.

27 Monotonicity The “better” the trait status on a single scale item, the “better” the trait status on the overall scale

28 Local Independence
Items are independent of each other statistically, after controlling for shared dimensionality.
Components of a scale item’s variance:
Required: shared variance, in common with the scale’s unidimensional factor.
Expected: some residual or noise/error variance.
Problem: If the residual variance of one item is correlated with that of another item, at some point the variance is no longer just noise.

29 Advantages of IRT approach to measurement
Focus is on the item, not the scale: Each item possesses trait estimation capacities.
Provides item- and group-independent measurement: Not tied to sample or particular items used.
Makes computer adaptive testing a reality: Accumulation of detailed knowledge about individual items and their functioning. Customized item presentation reduces the number of patient responses needed to achieve measurements of similar quality.

30 Adapteval-HIV: 14 item “sets”
Pain; Fatigue; Sleep; Emotion Functioning; Cognitive Functioning; Self Care/Daily Living; Physical Act./Leisure; Life Satisfaction; Body Image; Physical Symptoms; Sat w/ Medical Care; Work/Employment; Neg. Social Issues; Pos. Social Exp.

31 Measure development process
1. Identify initial set of items (adapted from 9 existing instruments) (248 items)
2. Panel (20 docs/nurses/patients/psychosocial) evaluation (226 items)
3. Panel (+ input from 3 psychosocial researchers) evaluation (192 items)
4. Pilot test: 50 patients. Analysis of floor/ceiling, missing data, low SD (146 items)
5. Primary data collection: 400 patients. Primary psychometric analysis (107 items)
6. IRT parameter calibration to implement CAT

32 Psychometric analyses: item exclusion/inclusion
Primary Criteria:
Missingness > 20% (2 items excluded)
CFA factor loadings < .50 (16 excluded)
Local dependence > .20 (5 excluded)
Lack of monotonicity (9 excluded)
Potential item bias (3 excluded)
Secondary Criteria:
Multi-dimensionality (3 excluded)
Failed IRT parameter convergence (2 excluded)

33 Project History
Phase I: 50 patients at Northwestern; web/phone; preliminary psychometric analyses (cognitive interviews, etc.)
Phase II: 225 each at Northwestern and UIC; web/phone/PDA; complete psychometric analyses

34 Ethnicity/race and gender characteristics of primary data collection (NU + UIC)
Columns (Sex/Gender): Females, Males, Unknown or Not Reported, Total
Ethnic Category
Hispanic or Latino: 12 females, 34 males, 46 total
Not Hispanic or Latino: 80 females, 266 males, 346 total
Unknown (individuals not reporting ethnicity): 1 female, 7 males, 8 total
Ethnic Category: Total of All Subjects: 93 females, 307 males, 400 total
Racial Categories
American Indian/Alaska Native 3 4 Asian 5 Native Hawaiian or Other Pacific Islander
Black or African American: 67 females, 133 males, 200 total
White: 14 females, 134 males, 148 total
More Than One Race: 2 females, 13 males, 15 total
16 24
Racial Categories: Total of All Subjects

35 Adapteval-HIV question distribution
Overall Total: Pool (107) Domain Pool Pain 4 Life Sat. 8 Fatigue 5 Body Img. Sleep Phy Symp 9 Emotional 23 Sat. Med. Cognitive Work 6 Self Care Neg. Soc. Phy. Act. 7 Pos. Soc.

36 Ceiling and Floor Effects
Aiming at < 20% Domain Sz Ceil Floor Pain 389 102 (26%) 0 (0%) Life Sat. 327 34 (10%) 8 (2%) Fatigue 393 49 (12%) 4 (1%) Body Img. 390 143 (37%) 3 (1%) Sleep 334 61 (18%) 5 (1%) Phy Sym 319 37 (12%) Emotion 382 Sat. Med. 362 151 (42%) Cognitive 392 88 (2%) Work 336 56 (17%) 11 (3%) Self Care 87 (24%) 17 (5%) Neg. Soc. 348 19 (5%) Phy. Act. 311 64 (21%) Pos. Soc. 333 35 (11%) *Entries in red denote the values falling within the expected ranges.
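
Ceiling and floor effects are simply the shares of respondents who score at the top or bottom of a scale. A minimal sketch, assuming integer scale scores and a hypothetical 0-10 score range; the 20% cut-off comes from the slide, everything else is illustrative.

```python
def floor_ceiling(scores, min_score, max_score):
    """Return (floor_rate, ceiling_rate): shares of respondents at the scale extremes."""
    n = len(scores)
    floor_rate = sum(1 for s in scores if s == min_score) / n
    ceiling_rate = sum(1 for s in scores if s == max_score) / n
    return floor_rate, ceiling_rate


# Hypothetical domain scores for ten respondents on a 0-10 scale
scores = [0, 0, 5, 10, 10, 10, 3, 7, 10, 2]
floor_rate, ceiling_rate = floor_ceiling(scores, min_score=0, max_score=10)

TARGET = 0.20  # the slide aims for less than 20% at either extreme
print("floor ok:", floor_rate < TARGET, "ceiling ok:", ceiling_rate < TARGET)
```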

37 CFA Goodness of Fit (Standards for Unidimensionality)
Root Mean Square Error of Approximation (RMSEA) < 0.10 Comparative Fit Index (CFI) > 0.90 Standardized Root Mean Square Residual (SRMR) < 0.08 RMSEA CFI SRMR Pain 0.051 1.00 0.021 Life Sat. 0.95 0.99 0.040 Fatigue 0.060 0.020 Body Img 0.017 Sleep 0.088 0.032 Phy Symp 0.11 0.090 Emotion 0.12 0.96 0.092 Sat. Med. 0.15 0.98 Cognitive 0.099 0.048 Work 0.16 0.086 Self Care 0.093 0.036 Neg. Soc. 0.13 0.94 Phy. Act. 0.0 0.024 Pos. Soc. 0.10 0.054 *Entries in red denote the values falling within the expected ranges.

38 Scale Inter-Correlations
Inter-correlations: ranged from N=400 pain fatig sleep emot cog selfca phyact satlif body physym satmed work socneg 0.710 0.596 0.610 0.461 0.535 0.550 0.491 0.557 0.541 0.662 0.315 0.371 0.265 0.280 0.287 0.434 0.486 0.348 0.325 0.356 0.653 0.296 0.365 0.374 0.609 0.470 0.492 0.526 0.278 0.360 0.330 0.496 0.429 0.175 0.196 0.336 0.563 0.511 0.582 0.613 0.324 0.445 0.417 0.502 0.123 0.126 0.113 0.150 0.165 0.290 0.236 0.367 0.078 0.101 0.489 0.559 0.424 0.453 0.444 0.307 0.420 0.368 0.442 0.061 0.438 0.525 0.695 0.591 0.282 0.488 0.638 0.614 0.069 0.611 socpos 0.213 0.226 0.261 0.415 0.361 0.380 0.585 0.242 0.231 0.448 0.259 0.389 *Entries in red denote the values falling within the expected ranges (i.e., <0.90).

39 Internal Consistency
Group comparison: Cronbach's alpha > 0.7
Individual comparison: Cronbach's alpha > 0.9
Pain .809
Life Sat. .915
Fatigue .907
Body Img. .777
Sleep .898
Phy. Symp. .818
Emotional .950
Sat. Med. .859
Cognitive .930
Work .866
Daily Living .899
Neg. Soc. .798
Phy. Act. .888
Pos. Soc. .885
*Entries in red denote the values falling within the expected ranges.
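
For reference, Cronbach's alpha can be computed from a respondents-by-items score matrix as sketched below. The function and the tiny data set are illustrative assumptions; the alphas reported on the slide come from the study data, which are not reproduced here.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Variance of each item across respondents
    item_vars = [statistics.variance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the summed scale score across respondents
    total_var = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical, perfectly consistent responses to a 3-item scale
data = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

The function assumes complete data and a total score that varies across respondents; missing responses would need handling before calling it.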

40 Computerized Adaptive Testing
1. Begin with initial trait estimate
2. Select & present optimal scale item
3. Record and score response
4. Re-estimate trait
5. Is stopping rule satisfied? No: return to step 2. Yes: go to step 6.
6. End of assessment
7. End of battery? No: go to step 8. Yes: go to step 9.
8. Administer next scale item
9. Stop
Source: Wainer et al. (2000). Computerized Adaptive Testing: A Primer, 2nd Ed. LEA, Mahwah, N.J.
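
The flow above can be sketched as a short loop for a single scale. This is a simplified illustration, not the projects' implementation: it assumes yes/no items under a Rasch model, picks the unused item closest to the current estimate, and re-estimates the trait as a posterior mean over a grid with a standard normal prior. The item bank, the `answer` callback, and the SEM target are all hypothetical.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate trait values, -4.0 to 4.0


def p_endorse(theta, b):
    """Rasch probability of a 'yes' (1) response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_trait(responses):
    """Posterior mean and SD of the trait given (difficulty, response) pairs."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized standard normal prior
        for b, x in responses:
            p = p_endorse(theta, b)
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(item_bank, answer, sem_target=0.5):
    """item_bank: dict of item name -> difficulty; answer: callable(name) -> 0 or 1."""
    remaining = dict(item_bank)
    responses, administered = [], []
    theta, sem = estimate_trait(responses)            # step 1: initial estimate
    while remaining and sem >= sem_target:            # step 5: stopping rule
        # step 2: choose the unused item located closest to the current estimate
        name = min(remaining, key=lambda n: abs(remaining[n] - theta))
        b = remaining.pop(name)
        responses.append((b, answer(name)))           # step 3: record the response
        administered.append(name)
        theta, sem = estimate_trait(responses)        # step 4: re-estimate trait
    return theta, sem, administered                   # step 6: end of assessment


bank = {"easy": -1.0, "medium": 0.0, "hard": 1.0}
print(run_cat(bank, answer=lambda name: 1, sem_target=0.0))
```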

41 CAT performance: # of questions
Overall Total: Pool (107), Average (61.1) Domain Pool Avg/SD Mx/Mn Pain 4 3.1/1.0 4/2 Life Sat. 8 3.1/1.5 7/2 Fatigue 5 2.6/1.1 5/2 Body Img. 3.7/0.5 4/3 Sleep 3.3/1.1 Phy Symp 9 7.7/1.9 9/3 Emotional 23 4.6/2.6 11/2 Sat. Med. 4.3/1.1 Cognitive 4.9/2.5 9/2 Work 6 4.9/1.0 6/3 Self Care 3.9/1.4 6/2 Neg. Soc. 6.9/1.0 8/5 Phy. Act. 7 3.7/2.2 Pos. Soc. 4.4/1.8 8/3

42 Static (Full) vs. CAT/IRT
Correlation of assessment scores between full instrument and CAT/IRT implementation Body Image (4 items, avg. 3.7): Cognitive Functioning (9 / 4.9): Emotional Functioning (23 / 4.9): .87

43 Relief of measurement burden
For the healthcare provider/researcher, computerizing the test can relieve some of the burden. Use a scoring algorithm. Deliver score-specific interpretation. This could be accomplished with a computer-administered test.

44 Relief for respondents?
Yes, if the measure is presented as a computer adaptive test (CAT), i.e., the test adapts itself or customizes itself according to the responses presented to it by an individual patient.

45 CATs and Item Structure
Item presentation order
CTT: standardized. All patients start at Item #1 and complete all items in order.
IRT: customized. Patients are presented new items based on their responses to previous items.
IRT “logic”: There is an underlying hierarchy of activities. Activities can be ordered (“calibrated”) from easiest to hardest.

46 CALIBRATIONS, ITEMS
62.8 (hardest item)
Making sharp turns while running fast (18.) 62.8
Running on uneven ground (17.) 60.7
Running on even ground (16.) 59.7
Hopping (19.) 53.6
Walking a mile (12.) 51.7
Performing your usual hobbies, recreational or sporting activities (2.) 51.6
Squatting (6.) 51.4
Standing for 1 hour (14.) 50.8
Performing heavy activities around your home (9.) 48.5
Going up or down 10 stairs (about 1 flight of stairs) (13.) 46.5
Performing any of your usual work, housework, or school activities (1.) 45.3
Walking 2 blocks (11.) 41.0
Lifting an object, like a bag of groceries from the floor (7.) 38.6
Getting into or out of a car (10.) 37.9
Getting into or out of the bath (3.) 37.6
Performing light activities around your home (8.) 36.1
Putting on your shoes or socks (5.) 32.1
(easiest item) Walking between rooms (4.)

47 CALIBRATION: 0-100 SCALE
[Figure: the 18 LEFS items, from Making sharp turns while running fast down to Walking between rooms, positioned on the 0-100 calibration scale.]

48 VIEW: 30-65 RANGE
[Figure: the same 18 LEFS items, shown on the 30-65 portion of the calibration scale.]

49 CATs
CATs have starting rules. LEFS CAT: Begin with an item of moderate difficulty.
And CATs have stopping rules. LEFS CAT: Stop when the SEM < 4 (score range: 0-100), or when the average score change for the last 3 score estimates < 1, or when all LEFS items are completed.
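
The three LEFS CAT stopping conditions can be written as one predicate. This is a sketch under one stated assumption: "average score change for the last 3 score estimates" is read as the mean absolute change attached to the three most recent estimates, which requires four estimates in the history. The function name and arguments are hypothetical.

```python
def should_stop(sem, score_history, n_administered, n_items,
                sem_threshold=4.0, change_threshold=1.0):
    """Return True if any LEFS CAT stopping condition from the slide is met.

    sem            : current standard error of measurement (0-100 score metric)
    score_history  : score estimates so far, oldest first
    n_administered : number of items already answered
    n_items        : number of items in the pool
    """
    if n_administered >= n_items:          # all items completed
        return True
    if sem < sem_threshold:                # precision target reached
        return True
    if len(score_history) >= 4:            # three changes need four estimates
        recent = score_history[-4:]
        changes = [abs(b - a) for a, b in zip(recent, recent[1:])]
        if sum(changes) / len(changes) < change_threshold:
            return True
    return False


# Hypothetical run: the estimate has settled even though SEM is still above 4
print(should_stop(5.0, [40.0, 46.0, 46.5, 46.2, 46.0], 5, 18))
```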

50 CAT simulation: Focus on item selection
Item Pool: 18 items from the Lower Extremity Function Scale (LEFS)

51 ITEM PRESENTATION ORDER AND RESPONSE FROM LEFS EXERCISE
-
1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

52 ITEM PRESENTATION ORDER AND RESPONSE FROM LEFS EXERCISE
-
2. Walking a mile (Quite a bit)
1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

53 ITEM PRESENTATION ORDER AND RESPONSE FROM LEFS EXERCISE
-
3. Running on uneven ground (Extreme/unable)
2. Walking a mile (Quite a bit)
1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

54 ITEM PRESENTATION ORDER AND RESPONSE FROM LEFS EXERCISE
4. Making sharp turns while running fast (Extreme/unable)
3. Running on uneven ground (Extreme/unable)
-
2. Walking a mile (Quite a bit)
1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

55 ITEM PRESENTATION ORDER AND RESPONSE FROM LEFS EXERCISE
4. Making sharp turns while running fast (Extreme/unable)
3. Running on uneven ground (Extreme/unable)
5. Running on even ground (Extreme/unable)
-
2. Walking a mile (Quite a bit)
1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

56 ITEM PRESENTATION ORDER AND RESPONSE FROM LEFS EXERCISE
4. Making sharp turns while running fast (Extreme/unable)
3. Running on uneven ground (Extreme/unable)
5. Running on even ground (Extreme/unable)
6. Hopping (Quite a bit)
2. Walking a mile (Quite a bit)
-
1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

57 ITEM PRESENTATION ORDER AND RESPONSE FROM LEFS EXERCISE
4. Making sharp turns while running fast (Extreme/unable)
3. Running on uneven ground (Extreme/unable)
5. Running on even ground (Extreme/unable)
6. Hopping (Quite a bit)
2. Walking a mile (Quite a bit)
7. Performing your usual hobbies, recreational or sporting activities (Moderate)
-
1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

58 ITEM PRESENTATION ORDER AND RESPONSE FROM LEFS EXERCISE
4. Making sharp turns while running fast (Extreme/unable)
3. Running on uneven ground (Extreme/unable)
5. Running on even ground (Extreme/unable)
6. Hopping (Quite a bit)
2. Walking a mile (Quite a bit)
7. Performing your usual hobbies, recreational or sporting activities (Moderate)
-
8. Performing heavy activities around your home (Moderate)
1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

59 The potential to relieve respondent burden
Only 8 individual LEFS items required responses. As many as 10 potentially inappropriate items did NOT require responses. This patient’s LEFS score is 46.02, the activity level at which the individual can function with little or no difficulty.

60 CAT simulation: Focus on precision
Item Pool: 23 items of the Modified Roland-Morris Low Back Pain Disability Questionnaire

61 Trait Continuum
[Figure: back disability trait continuum from low to high, with an axis running from -3.0 to 2.0.]

62 Q1: I find it difficult to get out of a chair because of my back.
A: Yes. SEM = 1.4

63 Q2: I stay at home most of the time.
A: No. SEM = 0.67

64 Q3: I can only walk short distances.
A: Yes. SEM = 0.59

65 Q4: I use a handrail to get up stairs.
A: Yes. SEM = 0.55

66 Q5: I’m not doing jobs that I usually do around the house.
A: No. SEM = 0.47

67 Back Disability
[Figure: the five responses (Yes, No, Yes, Yes, No) shown on the low-to-high back disability continuum, axis from -3.0 to 2.0.]

68 [Figure: repeated trait continuum panels, each with an axis from -3.0 to 2.0.]

69 Estimate gets more precise with every question
SEM after each question: Q1: 1.4; Q2: 0.67; Q3: 0.59; Q4: 0.55; Q5: 0.47
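
Why the SEM shrinks with each answer can be sketched with test information: under a Rasch model (an assumption here, since the slides do not name the model), each answered item adds information p(1 - p) at the current trait level, and the SEM is one over the square root of the accumulated information. The item difficulties below are hypothetical, so the output is not expected to reproduce the SEM values on the slides.

```python
import math


def rasch_information(theta, b):
    """Fisher information contributed by one Rasch item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def sem_after_each_item(theta, difficulties):
    """SEM after each successive item: 1 / sqrt(cumulative information)."""
    sems, info = [], 0.0
    for b in difficulties:
        info += rasch_information(theta, b)
        sems.append(1.0 / math.sqrt(info))
    return sems


# Hypothetical difficulties for five successively administered items
sems = sem_after_each_item(theta=0.0, difficulties=[0.0, 0.5, -0.5, 0.2, -0.2])
print([round(s, 2) for s in sems])
```

Because every item contributes a positive amount of information, each additional response can only lower the SEM, which is the pattern shown on the slides.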

70 What response efficiencies does a 9-item “Physical Functioning” CAT offer?
Objective: To evaluate the response efficiencies provided by a 9-item CAT: “Adapteval-HIV: Physical Functioning”

71 Methods
400 HIV patients from 2 clinics completed all 9 items of the “AD-HIV: Physical Functioning” scale. CAT simulations were then conducted, based on the original, full-scale responses. CATs used a set of commonly employed starting and stopping rules: begin with an item of moderate difficulty; end when a targeted score estimate precision is achieved (standard error-based).

72 Results
3099 out of 3600 potential responses were needed to obtain CAT-based scores for 400 patients. This eliminated the need for 501 responses (14%). 245 patients (61%) needed 9 responses to obtain physical functioning scores. 155 patients (39%) required <9 responses (894 responses instead of 1395).

73 Results
13 distinct CAT-based scales provided scores. These CATs ranged in length from 3-8 items. Of the 155 patients with reduced-item CATs: 35 (22.6%) had 6-item CATs, 30 (19.4%) had 8-item CATs, 29 (18.7%) had 5-item CATs, 26 (16.8%) had 7-item CATs, 23 (14.8%) had 3-item CATs, 12 (7.7%) had 4-item CATs.

74 Conclusions
Initial studies of simulated CAT responses to “AD-HIV: Physical Functioning” suggest that this 9-item CAT can provide response efficiencies: 155 patients (39%) required <9 responses. These patients averaged a 36% response reduction.
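
The reported efficiencies can be re-derived from the CAT length counts given in the Results slides. The short check below uses only numbers stated there; the variable names are illustrative.

```python
n_patients, n_items = 400, 9

# Patients with reduced-item CATs, keyed by CAT length (from the Results slide)
reduced = {6: 35, 8: 30, 5: 29, 7: 26, 3: 23, 4: 12}

reduced_patients = sum(reduced.values())
reduced_responses = sum(length * count for length, count in reduced.items())
full_patients = n_patients - reduced_patients

potential = n_patients * n_items                       # every patient answers all 9
needed = full_patients * n_items + reduced_responses   # responses the CAT required
saved = potential - needed

print(reduced_patients, reduced_responses)   # patients and responses with short CATs
print(needed, saved, round(100 * saved / potential))
# Response reduction among the patients who received short CATs
print(round(100 * (1 - reduced_responses / (reduced_patients * n_items))))
```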

75 Implications
“CAT AD-HIV: Physical Functioning” offers the potential to relieve patient response burden. Scales of <10 items may provide real response efficiencies when delivered as CATs.

76 Issues in constructing and employing measures

77 Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT’s unidimensionality assumption (Cook, Kallen, Amtmann)
Number of items is an influencing factor on the studied CFA model fit statistics: CFI, NNFI, RMSEA, SRMR, WRMR.
What about the context of CFA modeling: full scale analysis, short form analysis, item bank construction, static vs. dynamic (CAT) assessment?

78 Technology aversion?

79 Education Differences
Up to HS (172) vs. Some College or More (218) p-val eta2 Pain .009 .018 Life Sat. .015 Fatigue .014 .016 Body Img. .019 Sleep .176 .005 Phy. Symp. .000 .035 Emotional .001 .028 Sat. Med. .037 .011 Cognitive Work .006 .020 Self Care .004 .021 Neg. Soc. .079 .008 Phy. Act. .065 Pos. Soc. .029 *Higher education had better scores than lower education in all domains. Results based on ANOVA. 95% significance level (p-val)
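
The eta2 column is the ANOVA effect size eta squared, the share of total score variance that lies between groups. A minimal sketch with invented scores for two groups follows; it does not use the study data.

```python
def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Hypothetical domain scores for two education groups
up_to_hs = [1.0, 2.0, 3.0]
some_college = [4.0, 5.0, 6.0]
print(round(eta_squared([up_to_hs, some_college]), 3))
```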

80 Income Differences
<20k (140) vs. 20k+ (199) p-val eta2 Pain .000 .078 Life Sat. .046 Fatigue .066 Body Img. .001 .033 Sleep .049 Phy. Symp. .145 Emotional .060 Sat. Med. .010 .020 Cognitive .088 Work .097 Self Care .070 Neg. Soc. .057 Phy. Act. .184 Pos. Soc. .058 *Higher income had better scores than lower income in all domains. Results based on ANOVA. 95% significance level (p-val) and 0.10 effect size (eta2) assumed

81 Platform Selection Process
Semi-random assignments, allowing self-selection to ensure valid input from subjects. Platform selections were affected by subject sociodemographic and medical status.
p-values: Sex .643; Occupation .076; Ethnicity .062; Income .000; Race .153; CD4 .676; Age .921; Viral Load .440; Education; Months Since Diag .023
*Results based on Pearson Chi-Square. 95% significance level assumed.

82 Response invalidity

83 Outpatient population study of trust
Consecutive outpatients from one public, one private, and one VA clinic were recruited from waiting rooms prior to their appointments. Completed questionnaires were received from 104 African Americans and 131 White Americans. The overall sample composition was: 46% female, 57% incomes <= $20,000 per year, 36% completed high school or less, and 47% had experienced moderate to severe pain during the previous 4 weeks.

84 The Physician Trust Scale: In “easiest-to-hardest to endorse” order
1. Your doctor will do whatever it takes to get you all the care you need. (R)
6. Your doctor is totally honest in telling you about all of the different treatment options available for your condition. (R)
4. Your doctor is extremely thorough and careful. (R)
5. You completely trust your doctor’s decisions about which medical treatments are best for you. (R)
10. All in all, you have complete trust in your doctor. (R)
7. Your doctor only thinks about what is best for you. (R)
9. You have no worries about putting your life in your doctor’s hands. (R)
8. Sometimes your doctor does not pay full attention to what you are trying to tell him/her.
3. Your doctor’s medical skills are not as good as they should be.
2. Sometimes your doctor cares more about what is convenient for him/her than about your medical needs.
Response options: 1-Strongly agree 2-Agree 3-Neutral 4-Disagree 5-Strongly disagree
Scoring: Higher score = greater trust

85 Validity: Property of An Inference
A measure is not valid in an absolute sense. Validity interest is in: the appropriateness of inferences about individuals, which were made based on an interpretation of scores, which were derived from data collected in a specific context.

86 Studying Measurement Invalidity: Person Fit
A unique advantage of IRT measurement models: Individualized person fit information is calculated in conjunction with the estimation of person measures. This individual person level fit information, i.e., person fit statistics, can then be used for inquiries about measurement validity.
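
Person fit statistics compare each response with what the model expects for that person. The sketch below computes infit and outfit mean squares for yes/no items under a Rasch model. It is a simplification, since the scales discussed in this presentation use multi-category responses, and the trait level, item difficulties, and response patterns are hypothetical.

```python
import math


def person_fit(theta, difficulties, responses):
    """Return (infit, outfit) mean squares for one person's 0/1 responses."""
    sq_residuals, variances, z_squared = [], [], []
    for b, x in zip(difficulties, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))   # expected response
        var = p * (1.0 - p)                        # model variance of the response
        sq_residuals.append((x - p) ** 2)
        variances.append(var)
        z_squared.append((x - p) ** 2 / var)       # squared standardized residual
    outfit = sum(z_squared) / len(z_squared)       # unweighted mean square
    infit = sum(sq_residuals) / sum(variances)     # information-weighted mean square
    return infit, outfit


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]         # easiest to hardest
expected_pattern = [1, 1, 1, 0, 0]                  # passes easy items, fails hard ones
aberrant_pattern = [0, 0, 1, 1, 1]                  # fails easy items, passes hard ones

print(person_fit(0.0, difficulties, expected_pattern))
print(person_fit(0.0, difficulties, aberrant_pattern))
```

Mean squares well above 1 flag response patterns that the model finds surprising.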

87 SF-36 Physical Functioning Items: In “easiest-to-hardest” order
Columns: Measure, Item #, Item Content
2.36 SF3 (hardest) Vigorous activities
1.15 SF6 Climbing several flights of stairs
0.66 SF9 Walking more than a mile
0.48 SF8 Bending, kneeling, or stooping
0.10 SF4 Moderate activities
-0.18 SF10 Walking several hundred yards
-0.60 SF5 Lifting or carrying groceries
SF7 Climbing one flight of stairs
-0.89 SF11 Walking one hundred yards
-2.48 SF12 (easiest) Bathing or dressing yourself

88 SF-36 Physical Functioning Items: An Individual’s Response Validity
Item Content No, not limited at all. SF3 (hardest) Vigorous activities Yes, limited a little. SF6 Climbing several flights of stairs SF9 Walking more than a mile SF8 Bending, kneeling, or stooping SF4 Moderate activities Yes, limited a lot. SF10 Walking several hundred yards SF5 Lifting or carrying groceries SF7 Climbing one flight of stairs SF11 Walking one hundred yards SF12 (easiest) Bathing or dressing yourself

89 Using Person Fit Information
1) Identify individuals with high levels of invalidity in their responses. 2) What percent of the sample is represented by these high invalidity individuals? 3) Can these individuals be characterized – demographically or otherwise? If no, invalid responses may be randomly distributed. If yes, measurements may be less appropriate for certain groups.

90 Characterizing Invalidity
In the study employing the Physician Trust Scale: 19.4% were found to have high invalidity responses (person infit or outfit mean sq >2). This increased to 29.7% for those with up-to-HS education, and to 44.1% for those with up-to-HS education and little or no pain.

91 Results: Identify Invalid Responses
A sizeable percent of the sample (19.4%) was identified as having responses with questionable validity. Characteristics common to respondents with low validity responses could be identified: primarily level of education; to some extent, pain status. A question exists as to the stability of study findings when analyses are conducted with and without low validity data. The validity of instrument responses across all patient subgroups of interest may not be equivalent. What, then, is effect, and what is inappropriate measurement?

92 Potential for measurement bias

93 If there is uncertainty in measurement…
In employing a measurement instrument, the user’s expectation is that individuals and population subgroups are NOT likely to be advantaged or disadvantaged by the measurements they receive. If an “advantaging” or “disadvantaging” occurs, there will be systematic error in the way in which an instrument provides measures for members of a specific group or groups. The bottom line: Measurement disparities produce bias.

94 …there is uncertainty in results
The use of biased measurements in research introduces a significant question: Will conclusions hold? In a measure of health status, what if something other than state of health has influenced the way in which certain individuals respond? What, then, if observed group differences really reflect something other than what an outcomes instrument was intended to measure? A question will remain about a study’s findings: Are observations about disparities in health care due to true group differences or were they influenced by the use of culturally-biased measurements?

95 Studying Measurement Disparity: DIF
DIF is “differential item functioning.” A measurement item shows DIF if individuals of the same trait level (e.g., same level of trust) but originating from different groups do not have the same probability of endorsing an item.
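
One common way to screen for DIF, used here only as an illustration because the slides do not say which DIF procedure the study applied, is the Mantel-Haenszel common odds ratio: respondents are matched on trait level, and within each matched stratum the odds of endorsing the item are compared between a reference and a focal group. The counts below are invented.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched trait-level strata.

    strata: list of (ref_yes, ref_no, focal_yes, focal_no) counts, one tuple
    per trait level. A value near 1 suggests no DIF for the item.
    """
    numerator = denominator = 0.0
    for ref_yes, ref_no, focal_yes, focal_no in strata:
        n = ref_yes + ref_no + focal_yes + focal_no
        if n == 0:
            continue  # skip empty strata
        numerator += ref_yes * focal_no / n
        denominator += ref_no * focal_yes / n
    return numerator / denominator


# Hypothetical item with equal endorsement odds in both groups at each trait level
no_dif = [(20, 20, 10, 10), (30, 10, 15, 5)]
# Hypothetical item that the reference group endorses far more often at the same level
with_dif = [(30, 10, 10, 30)]

print(mantel_haenszel_odds_ratio(no_dif))
print(mantel_haenszel_odds_ratio(with_dif))
```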

96 Trust Item Difficulties: Up to HS vs. > HS Education

97 Trust Item Difficulties: Up to HS vs. > HS Education
1. Your doctor will do whatever it takes to get you all the care you need. 6. Your doctor is totally honest in telling you about all of the different treatment options available for your condition. 3. Your doctor’s medical skills are not as good as they should be. 2. Sometimes your doctor cares more about what is convenient for him/her than about your medical needs.

98 Address potential measurement bias
Four items show statistically significant DIF, with moderate to large DIF effect sizes. The trust SCALE, as a whole, appears to be understood and responded to differently by the lower education versus higher education group. Considering that 36% of the study’s sample completed high school or less, this trust instrument may not be able to provide interpretable scores for this study.

99 Outline
The new measurement
Constructing measures
Extending the usefulness of measurement

100 When patient-reported outcomes…
…are reported to patients

101 A pivotal study Measuring quality of life in routine oncology practice improves communication and patient well-being: a randomized controlled trial G. Velikova et al, Journal of Clinical Oncology, February 15, 2004

102 Objective To examine the effects of regular repeated collection and feedback of HRQL data to oncologists

103 Study Design: Groups
Patients randomly assigned to:
Intervention (I): complete HRQL questionnaires; feedback of results to physicians
Attention-Control (A-C): complete HRQL questionnaires; no feedback of results to physicians
Control (C): no HRQL measurement before clinic encounter

104 Improvement = “>+7 points” in FACT-G well-being from baseline
FACT-G Overall Physical Functional Emotional Social/family I vs C yes no I vs A-C A-C vs C

105 Fig 4. Proportions of patients showing clinically meaningful improvement, no change, or deterioration in Functional Assessment of Cancer Therapy–General (FACT-G) score after three encounters, by study arm. Intervention versus attention-control and control groups, P = .001; intervention and attention-control versus control, P = .003, using ordinal regression, controlling for baseline FACT-G, performance status, and time on study.

106 “Leftover” Finding Completion of questionnaires may have effect on patient well-being, without feedback to physicians

107 Improvement = “>+7 points” in FACT-G well-being from baseline
FACT-G Overall Physical Functional Emotional Social/family I vs C yes no I vs A-C A-C vs C

108 Patient-centric effect
Achieved by completing measures.
What if measures were systematically reported to patients, across time?
Effect yet to be harnessed by healthcare providers or systems.

109 Two PRO-based clinical care projects
Department of Palliative Care and Rehabilitation Medicine, MD Anderson
Cancer-Related Fatigue Clinic, MD Anderson

110 Palliative Care (Yang, Kallen, Bruera)

111 NCI-sponsored SBIR RFP Topic 246:
“Integrating Patient-Reported Outcomes in Hospice and Palliative Care Practices”

112 PRO Assessment
Home-Centered Teleoncology Care Model Phase I Findings
BrightOutcome/Caracal, Inc.

113 PRO Trending

114 PRO Assessment Snapshot

115 Expected benefits
Workflow efficiency
Data accuracy
Potential for temporal/causal analyses
Facilitate provider-provider and patient-provider communication

116 Cancer-Related Fatigue Clinic (Kallen, Yang, Escalante)

121 Expected benefits Timely presentation of interpreted measurement information to all involved parties Improve communication, shared decision-making, and outcomes

122 Summary
The new measurement
Constructing measures
Extending the usefulness of measurement

