Application of Item Response Theory to PRO Development Michael A. Kallen, PhD, MPH Associate Professor, Research Faculty Department of Medical Social Sciences.

1 Application of Item Response Theory to PRO Development Michael A. Kallen, PhD, MPH Associate Professor, Research Faculty Department of Medical Social Sciences Feinberg School of Medicine Northwestern University Chicago, Illinois

2 Outline The new measurement Constructing measures Extending the usefulness of measurement

3 Outline The new measurement Constructing measures Extending the usefulness of measurement

4 To see what's new… The Lower Extremity Functional Scale (LEFS) from a classical perspective

5 LEFS ITEMS
Response options: 0 = Extreme difficulty or unable to perform activity; 1 = Quite a bit of difficulty; 2 = Moderate difficulty; 3 = A little bit of difficulty; 4 = No difficulty
Activities:
1. Perform any of your usual work, housework, or school activities
2. Perform your usual hobbies, recreational or sporting activities
3. Getting into or out of the bath
4. Walking between rooms
5. Putting on your shoes or socks
6. Squatting
7. Lifting an object, like a bag of groceries from the floor
8. Performing light activities around your home
9. Performing heavy activities around your home
10. Getting into or out of a car
11. Walking 2 blocks
12. Walking a mile
13. Going up or down 10 stairs (about 1 flight of stairs)
14. Standing for 1 hour
15. Sitting for 1 hour
16. Running on even ground
17. Running on uneven ground
18. Making sharp turns while running fast
19. Hopping
20. Rolling over in bed

6 Original LEFS Instructions We are interested in knowing whether you are having any difficulty at all with the activities listed below because of your lower limb problem for which you are currently seeking attention. Please provide an answer for each activity. Today, do you or would you have any difficulty at all with: (Circle one number on each line)

8 Measurement products needed? All of the patient's individual item scores, so as to obtain the patient's total score

9 The burden of measurement
For the healthcare provider/researcher: instrument administration; score summation; score interpretation
For the patient: the actual time (and thoughtfulness) it requires to respond; being asked inappropriate questions

11 The cost of this burden? For the healthcare provider/researcher: perhaps a loss of willingness to use the instrument to collect data, score responses, or interpret a patient's self-reported condition. For the patient: perhaps a loss of willingness to respond in a focused, honest way to an instrument that seems unresponsive or even annoying.

12 Classical psychometrics Its beginnings go back to the turn of the 20th century. Consider Spearman's work on disattenuating the correlation coefficient: Demonstration of formulae for true measurement of correlation. American Journal of Psychology, 1907.
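Spearman's correction for attenuation estimates the correlation between two true scores by dividing the observed correlation by the square root of the product of the two measures' reliabilities. A minimal sketch; the function name and the example values are hypothetical illustrations, not figures from this presentation:

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation.

    r_xy: observed correlation between measures X and Y
    rel_x, rel_y: reliability coefficients of X and Y, each in (0, 1]
    Returns the estimated correlation between the true scores of X and Y.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical example: observed r = .42, reliabilities .70 and .80
print(round(disattenuate(0.42, 0.70, 0.80), 3))  # 0.561
```

Because reliabilities are at most 1, the corrected value is never smaller in magnitude than the observed correlation; low reliabilities produce large corrections.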

13 Psychometrics' First Golden Years The zenith of classical psychometrics arguably occurred during the period surrounding the publication of Gulliksen's book, Theory of Mental Tests (1950). This period saw the development of many of the best and brightest aspects of classical psychometrics, e.g., 1) that measurements, not instruments, have psychometric properties; 2) the introduction of alpha reliability; 3) validity as the sine qua non of measurement.

14 Classical Test Theory (CTT) CTT is a cornerstone of classical psychometrics. It is theory-based measurement: individual scores are defined by theory as composed of a true score component and an error component. That is, observed score = true score + error (X = T + E).
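Under this model, reliability is the share of observed-score variance that is true-score variance. The simulation below is a sketch with arbitrary hypothetical means and standard deviations; it generates X = T + E with errors independent of true scores and recovers that ratio:

```python
import random
import statistics


def simulate_ctt(n: int, true_mean: float, true_sd: float, error_sd: float, seed: int = 1):
    """Simulate classical test theory scores: observed X = true T + error E.

    E is drawn independently of T with mean 0, as CTT assumes.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, observed


true_scores, observed = simulate_ctt(20_000, true_mean=50.0, true_sd=8.0, error_sd=4.0)

# CTT reliability: proportion of observed-score variance that is true-score variance.
reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(f"simulated reliability = {reliability:.3f}")  # near 64 / (64 + 16) = 0.80
```

In real data T is never observed, which is why CTT estimates reliability indirectly (for example, with alpha) rather than by this direct ratio.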

15 CTT as Theory Because it is a theory, CTT (true score theory) does not provide testable hypotheses or falsifiable models.

16 CTT and Circular Definitions In CTT, item difficulty is defined as the proportion of examinees in a group of interest who answer an item correctly. Thus, whether an item is hard or easy depends on the ability of the group of examinees measured. As a result, determining score meaning is a challenge: examinee and test characteristics are entangled; each can be interpreted only in the context of the other.
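The circularity is easy to see numerically: the same item receives a different CTT difficulty depending on who answers it. A minimal sketch with hypothetical response data:

```python
def item_p_value(responses):
    """CTT item difficulty: proportion of examinees answering correctly (scored 1)."""
    return sum(responses) / len(responses)


# Hypothetical responses to the SAME item from two groups of examinees
high_ability_group = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
low_ability_group = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]

print(item_p_value(high_ability_group))  # 0.8: the item looks "easy"
print(item_p_value(low_ability_group))   # 0.2: the same item looks "hard"
```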

17 Scores depend on the difficulty of test items [Figure: the same person has an expected score of 8 on a very easy test, 5 on a medium test, and 0 on a very hard test] Reprinted with permission from: Wright, B.D. & Stone, M. (1979). Best test design. Chicago: MESA Press, p. 5.

18 The New Measurement In the past 40-plus years, measurement has undergone a quiet evolution.

19 Modern Psychometrics In 1968, Lord and Novick's book, Statistical Theories of Mental Test Scores, was published. It introduced model-based measurement, a foundation of modern psychometrics.

20 Item Response Theory (IRT) IRT, a form of model-based measurement, now plays a fundamental role in modern psychometrics. It addresses CTT's major shortcomings by providing test-independent and group-independent measurement and by employing models that can be tested and falsified. If a proposed IRT model does NOT adequately explain the data at hand, it can be determined whether assumptions were met or an inappropriate model was used.

21 CTT vs. IRT CTT, from classical psychometrics, is theory-based, sample- and test-specific, and focuses on test performance. IRT, from modern psychometrics, is model-based, specific to ability or functioning level, and focuses on item performance.

22 Outline The new measurement Constructing measures Extending the usefulness of measurement

23 IRT approach For any approach: getting the measure right helps in getting the measurement right. For the IRT approach: there are now accepted and recommended steps for getting a measure right.

24 Adapteval-HIV Project (Yang, Kallen) Goal: to develop and evaluate a set of HIV-specific item banks to support IRT-based CAT assessment, with implementation on multiple technology platforms (web/phone/PDA)

25 Item Response Theory [Figure: persons (θ) and items located on a common latent trait continuum; person trait level runs from poor to good, item location from easy to hard]
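Placing persons and items on one continuum means the probability of a response depends only on where the person sits relative to the item. The presentation does not name the IRT model used, so the sketch below assumes the common two-parameter logistic (2PL) model for a yes/no item; the function name and parameters are illustrative:

```python
import math


def p_endorse(theta: float, b: float, a: float = 1.0) -> float:
    """Two-parameter logistic (2PL) item response function.

    theta: person location on the latent trait
    b: item location (difficulty) on the same scale
    a: item discrimination; with a = 1 for every item this is a Rasch-type model
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# A person located exactly at an item's location has a 50% chance of endorsing it.
print(p_endorse(theta=0.0, b=0.0))  # 0.5
# For the same person, an easier item (lower b) is more likely to be endorsed.
print(p_endorse(0.0, b=-1.0) > p_endorse(0.0, b=1.0))  # True
```

Polytomous items such as the five-option LEFS items are usually handled with extensions of this idea (for example, graded-response or rating-scale models).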

26 IRT assumptions
Unidimensionality: measure one thing only
Monotonicity: the better the trait status on a single scale item, the better the trait status on the overall scale
Local independence: items are statistically independent of each other, after controlling for shared dimensionality

27 Monotonicity The better the trait status on a single scale item, the better the trait status on the overall scale

28 Local Independence Items are statistically independent of each other, after controlling for shared dimensionality.
Components of a scale item's variance:
Required: shared variance, in common with the scale's unidimensional factor
Expected: some residual (noise/error) variance
Problem: if the residual variance of one item is correlated with that of another item, at some point the variance is no longer just noise
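One common way to check for correlated residuals is Yen's Q3: correlate, across persons, each pair of items' residuals (observed response minus model-expected response). The slides do not say which statistic Adapteval-HIV used, so Q3 under a Rasch model is a swapped-in illustration; the 0.20 cutoff echoes the "local dependence > .20" exclusion criterion listed later. All data below are hypothetical:

```python
import math
import statistics


def rasch_p(theta: float, b: float) -> float:
    """Model-expected probability of a 1 response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def q3_matrix(responses, thetas, bs):
    """Yen's Q3: correlations between item residuals (observed minus expected).

    responses[i][j] is person i's 0/1 response to item j; thetas and bs are
    person and item locations taken from an already-fitted model.
    """
    residuals = [
        [responses[i][j] - rasch_p(thetas[i], bs[j]) for i in range(len(thetas))]
        for j in range(len(bs))
    ]
    return {
        (j, k): pearson(residuals[j], residuals[k])
        for j in range(len(bs))
        for k in range(j + 1, len(bs))
    }


def locally_dependent_pairs(q3, cutoff=0.20):
    """Item pairs whose residual correlation exceeds the cutoff."""
    return [pair for pair, r in q3.items() if r > cutoff]


# Hypothetical data: items 0 and 1 receive identical answers, so their
# residuals move together and the pair is flagged.
thetas = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
bs = [0.0, 0.0, 0.5]
responses = [[0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1], [0, 0, 1]]
print(locally_dependent_pairs(q3_matrix(responses, thetas, bs)))
```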

29 Advantages of IRT approach to measurement
Focus is on the item, not the scale: each item possesses trait estimation capacities
Provides item- and group-independent measurement: not tied to the sample or the particular items used
Makes computer adaptive testing a reality: accumulation of detailed knowledge about individual items and their functioning; customized item presentation reduces the number of patient responses needed to achieve measurements of similar quality

30 Adapteval-HIV: 14 item sets
Pain; Fatigue; Sleep; Emotional Functioning; Cognitive Functioning; Self Care/Daily Living; Physical Act./Leisure; Life Satisfaction; Body Image; Physical Symptoms; Sat. w/ Medical Care; Work/Employment; Neg. Social Issues; Pos. Social Exp.

31 Measure development process
1. Identify initial set of items (adapted from 9 existing instruments): 248 items
2. Panel (20 docs/nurses/patients/psychosocial) evaluation: 226 items
3. Panel (+ input from 3 psychosocial researchers) evaluation: 192 items
4. Pilot test: 50 patients
5. Analysis of floor/ceiling, missing data, low SD: 146 items
6. Primary data collection: 400 patients
7. Primary psychometric analysis: 107 items
8. IRT parameter calibration to implement CAT

32 Psychometric analyses: item exclusion/inclusion
Primary criteria:
Missingness > 20% (2 items excluded)
CFA factor loadings < .50 (16 excluded)
Local dependence > .20 (5 excluded)
Lack of monotonicity (9 excluded)
Potential item bias (3 excluded)
Secondary criteria:
Multidimensionality (3 excluded)
Failed IRT parameter convergence (2 excluded)

33 Project History
Phase I: 50 patients at Northwestern; web/phone; preliminary psychometric analyses (cognitive interviews, etc.)
Phase II: 225 each at Northwestern and UIC; web/phone/PDA; complete psychometric analyses

34 Ethnicity/race and gender characteristics of primary data collection (NU + UIC)
Columns: Females | Males | Unknown or Not Reported | Total
Ethnic category:
Hispanic or Latino | …
Not Hispanic or Latino | …
Unknown (individuals not reporting ethnicity) | 1 | 7 | 0 | 8
Ethnic Category: Total of All Subjects | …
Racial categories:
American Indian/Alaska Native | 1 | 3 | 0 | 4
Asian | 0 | 5 | 0 | 5
Native Hawaiian or Other Pacific Islander | 1 | 3 | 0 | 4
Black or African American | …
White | …
More Than One Race | …
Unknown or Not Reported | …
Racial Categories: Total of All Subjects | …

35 Adapteval-HIV question distribution (items per domain pool)
Pain 4 | Life Sat. 8
Fatigue 5 | Body Img. 4
Sleep 5 | Phy. Symp. 9
Emotional 23 | Sat. Med. 5
Cognitive 9 | Work 6
Self Care 6 | Neg. Soc. 8
Phy. Act. 7 | Pos. Soc. 8
Overall total: Pool (107)

36 Ceiling and Floor Effects (aiming at < 20%)
Domain | Size | Ceiling | Floor
Pain | … | … (26%) | 0 (0%)
Fatigue | 393 | 49 (12%) | 4 (1%)
Sleep | 334 | 61 (18%) | 5 (1%)
Emotion | 382 | 5 (1%) | 0 (0%)
Cognitive | 392 | 8 (2%) | 0 (0%)
Self Care | 362 | 87 (24%) | 17 (5%)
Phy. Act. | … | … (21%) | 0 (0%)
Life Sat. | … | … (10%) | 8 (2%)
Body Img. | … | … (37%) | 3 (1%)
Phy. Sym. | 319 | 37 (12%) | 0 (0%)
Sat. Med. | … | … (42%) | 8 (2%)
Work | 336 | 56 (17%) | 11 (3%)
Neg. Soc. | … | … (5%) | 0 (0%)
Pos. Soc. | … | … (11%) | 4 (1%)
*Entries in red denote the values falling within the expected ranges.
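Ceiling and floor effects are simply the shares of respondents at the highest and lowest possible scale scores. A minimal sketch; the score vector reproduces the Fatigue row's counts (n = 393, 49 at the ceiling, 4 at the floor), while the 0-100 range and the in-between scores are hypothetical placeholders:

```python
def ceiling_floor(scores, min_score, max_score):
    """Proportion of respondents at the scale maximum (ceiling) and minimum (floor)."""
    n = len(scores)
    ceiling = sum(1 for s in scores if s == max_score) / n
    floor = sum(1 for s in scores if s == min_score) / n
    return ceiling, floor


# Counts from the Fatigue row; the score values themselves are placeholders.
scores = [100] * 49 + [0] * 4 + [50] * 340
c, f = ceiling_floor(scores, 0, 100)
print(f"ceiling {c:.0%}, floor {f:.0%}, within target: {c < 0.20 and f < 0.20}")
# ceiling 12%, floor 1%, within target: True
```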

37 CFA Goodness of Fit (Standards for Unidimensionality)
Root Mean Square Error of Approximation (RMSEA) < 0.10
Comparative Fit Index (CFI) > 0.90
Standardized Root Mean Square Residual (SRMR) < 0.08
[Table: RMSEA, CFI, and SRMR for each domain: Pain, Fatigue, Sleep, Emotion, Cognitive, Self Care, Phy. Act., Life Sat., Body Img., Phy. Symp., Sat. Med., Work, Neg. Soc., Pos. Soc.]
*Entries in red denote the values falling within the expected ranges.
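RMSEA and CFI can both be computed from the model chi-square; CFI also needs the chi-square of the independence (null) model. The sketch below uses the conventional maximum-likelihood definitions, which is an assumption: estimators for ordinal data may compute these indices differently. SRMR requires the residual correlation matrix, so it is simply taken as an input. All numbers in the example are hypothetical:

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation from the model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Comparative fit index relative to the independence (null) model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def meets_standards(rmsea_value: float, cfi_value: float, srmr_value: float) -> bool:
    """Unidimensionality standards from the slide."""
    return rmsea_value < 0.10 and cfi_value > 0.90 and srmr_value < 0.08


# Hypothetical one-factor model: chi2 = 50 on 20 df, N = 400; null chi2 = 2000 on 28 df.
r, c = rmsea(50, 20, 400), cfi(50, 20, 2000, 28)
print(round(r, 3), round(c, 3), meets_standards(r, c, srmr_value=0.05))
```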

38 Scale Inter-Correlations (N = 400)
[Table: correlation matrix of the 14 domain scales (pain, fatig, sleep, emot, cog, selfca, phyact, satlif, body, physym, satmed, work, socneg, socpos); e.g., pain with fatigue: 0.710]
Inter-correlations: ranged from …
*Entries in red denote the values falling within the expected ranges (i.e., < 0.90).

39 Internal Consistency (Cronbach's alpha)
Pain .809 | Life Sat. .915
Fatigue .907 | Body Img. .777
Sleep .898 | Phy. Symp. .818
Emotional .950 | Sat. Med. .859
Cognitive .930 | Work .866
Daily Living .899 | Neg. Soc. .798
Phy. Act. .888 | Pos. Soc. .885
Group comparison: Cronbach's alpha > 0.7
Individual comparison: Cronbach's alpha > 0.9
*Entries in red denote the values falling within the expected ranges.
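Cronbach's alpha compares the sum of the item variances with the variance of the total score. A minimal sketch of the standard formula, plus a helper (a hypothetical name) that applies the two thresholds on this slide to two of the reported domain alphas:

```python
import statistics


def cronbach_alpha(data):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [statistics.variance([row[j] for row in data]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def recommended_use(alpha):
    """Thresholds from the slide: > 0.7 for group, > 0.9 for individual comparison."""
    if alpha > 0.9:
        return "individual and group comparison"
    if alpha > 0.7:
        return "group comparison"
    return "below both thresholds"


# Reported alphas for two Adapteval-HIV domains
for domain, alpha in [("Emotional", 0.950), ("Body Img.", 0.777)]:
    print(domain, recommended_use(alpha))
```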

40 Computerized Adaptive Testing
1. Begin with initial trait estimate
2. Select and present optimal scale item
3. Record and score response
4. Re-estimate trait
5. Is stopping rule satisfied? No: return to step 2. Yes: go to step 6.
6. End of assessment
7. End of battery? No: go to step 8. Yes: go to step 9.
8. Administer next scale item
9. Stop
Source: Wainer et al. (2000). Computerized Adaptive Testing: A Primer (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
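Steps 1 to 5 of this loop can be sketched for a single scale. The specific choices here are assumptions, not necessarily those of any CAT described in the presentation: a Rasch model for yes/no items, maximum-information item selection, expected a posteriori (EAP) scoring with a standard normal prior, and the posterior SD used as the standard error. The item bank and respondent are hypothetical:

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Probability of endorsing an item with location b at trait level theta (Rasch)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Quadrature grid for the latent trait, from -4 to +4 in steps of 0.1.
GRID = [-4.0 + 0.1 * i for i in range(81)]


def eap_estimate(responses, bs):
    """EAP trait estimate and posterior SD under a standard normal prior.

    responses: list of (item_index, 0/1 response); bs: item locations.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for item, x in responses:
            p = rasch_p(t, bs[item])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bs, answer, se_target=0.4, start_theta=0.0):
    """Administer items until the SE drops below se_target or the bank is exhausted.

    answer(item_index) -> 0/1 supplies the respondent's answer.
    """
    theta, se = start_theta, float("inf")  # 1. initial trait estimate
    administered = []
    remaining = set(range(len(bs)))
    while remaining:
        # 2. select the unused item with maximum information p(1 - p) at theta
        item = max(remaining, key=lambda j: rasch_p(theta, bs[j]) * (1.0 - rasch_p(theta, bs[j])))
        remaining.remove(item)
        administered.append((item, answer(item)))  # 3. record and score the response
        theta, se = eap_estimate(administered, bs)  # 4. re-estimate the trait
        if se < se_target:  # 5. stopping rule satisfied?
            break
    return theta, se, [item for item, _ in administered]


bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
# Hypothetical respondent who endorses every item located below 0.7.
theta, se, order = run_cat(bank, answer=lambda j: int(bank[j] < 0.7), se_target=0.6)
print(order, round(theta, 2), round(se, 2))
```

Starting at theta = 0 makes the first item the one of moderate difficulty, matching the common starting rule; steps 6 to 9 (moving on through a battery of scales) would wrap this function in an outer loop.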

41 CAT performance: number of questions
Domain | Pool | Avg/SD | Max/Min
Pain | 4 | 3.1/1.0 | 4/2
Fatigue | 5 | 2.6/1.1 | 5/2
Sleep | 5 | 3.3/1.1 | 5/2
Emotional | 23 | 4.6/2.6 | 11/2
Cognitive | 9 | 4.9/2.5 | 9/2
Self Care | 6 | 3.9/1.4 | 6/2
Phy. Act. | 7 | 3.7/2.2 | 7/2
Life Sat. | 8 | 3.1/1.5 | 7/2
Body Img. | 4 | 3.7/0.5 | 4/3
Phy. Symp. | 9 | 7.7/1.9 | 9/3
Sat. Med. | 5 | 4.3/1.1 | 5/2
Work | 6 | 4.9/1.0 | 6/3
Neg. Soc. | 8 | 6.9/1.0 | 8/5
Pos. Soc. | 8 | 4.4/1.8 | 8/3
Overall total: Pool (107), Average (61.1)

42 Static (Full) vs. CAT/IRT
Correlation of assessment scores between the full instrument and the CAT/IRT implementation:
Body Image (4 items, avg. 3.7): .99
Cognitive Functioning (9 / 4.9): .91
Emotional Functioning (23 / 4.9): .87

43 Relief of measurement burden For the healthcare provider/researcher, computerizing the test can relieve some of the burden: use a scoring algorithm and deliver score-specific interpretation. This could be accomplished with a computer-administered test.

44 Relief for respondents? Yes, if the measure is presented as a computer adaptive test (CAT), i.e., the test adapts itself or customizes itself according to the responses presented to it by an individual patient.

45 CATs and Item Structure
Item presentation order
CTT: standardized. All patients start at Item #1 and complete all items in order.
IRT: customized. Patients are presented new items based on their responses to previous items.
IRT logic: there is an underlying hierarchy of activities. Activities can be ordered (calibrated) from easiest to hardest.

46 Calibrations | Items
62.8 | Making sharp turns while running fast (18) (hardest item)
62.8 | Running on uneven ground (17)
60.7 | Running on even ground (16)
59.7 | Hopping (19)
53.6 | Walking a mile (12)
51.7 | Performing your usual hobbies, recreational or sporting activities (2)
51.6 | Squatting (6)
51.4 | Standing for 1 hour (14)
50.8 | Performing heavy activities around your home (9)
48.5 | Going up or down 10 stairs (about 1 flight of stairs) (13)
46.5 | Performing any of your usual work, housework, or school activities (1)
45.3 | Walking 2 blocks (11)
41.0 | Lifting an object, like a bag of groceries from the floor (7)
38.6 | Getting into or out of a car (10)
37.9 | Getting into or out of the bath (3)
37.6 | Performing light activities around your home (8)
36.1 | Putting on your shoes or socks (5)
32.1 | Walking between rooms (4) (easiest item)

47 Calibration: Scale [Figure: the 18 calibrated LEFS items arranged along the scale, from Making sharp turns while running fast (hardest) to Walking between rooms (easiest)]

48 View: Range [Figure: range view of the same 18 calibrated LEFS items, from Making sharp turns while running fast to Walking between rooms]

49 CATs CATs have starting rules. LEFS CAT: begin with an item of moderate difficulty. CATs also have stopping rules. LEFS CAT: stop when the SEM is < 4 (score range: 0-100), when the average score change over the last 3 score estimates is < 1, or when all LEFS items have been completed.
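The three LEFS stopping rules can be combined into one check that runs after each response. A minimal sketch; the function name is hypothetical, and reading "average score change for last 3 score estimates" as the mean absolute change between consecutive estimates among the last three is an assumption:

```python
def should_stop(estimates, sems, n_answered, n_items,
                sem_cut=4.0, change_cut=1.0, window=3):
    """LEFS CAT stopping rules from the slide (scores on a 0-100 metric).

    estimates: score estimates after each response, oldest first
    sems: matching standard errors of measurement
    Assumption: the score-change rule uses the mean absolute change between
    consecutive estimates among the last `window` estimates.
    """
    if n_answered >= n_items:  # all LEFS items completed
        return True
    if sems and sems[-1] < sem_cut:  # precision target reached
        return True
    if len(estimates) >= window:  # estimates have stabilized
        last = estimates[-window:]
        changes = [abs(b - a) for a, b in zip(last, last[1:])]
        if sum(changes) / len(changes) < change_cut:
            return True
    return False


print(should_stop([40.0, 45.0, 47.0], [6.0, 5.0, 4.5], 3, 18))  # False
print(should_stop([46.0, 46.5, 46.2], [6.0, 5.0, 4.5], 3, 18))  # True
```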

50 CAT simulation: Focus on item selection Item Pool: 18 items from the Lower Extremity Functional Scale (LEFS)

51 Item presentation order and response from LEFS exercise: 1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

52 Item presentation order and response from LEFS exercise: 2. Walking a mile (Quite a bit); 1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

53 Item presentation order and response from LEFS exercise: 3. Running on uneven ground (Extreme/unable); 2. Walking a mile (Quite a bit); 1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

54 Item presentation order and response from LEFS exercise: 4. Making sharp turns while running fast (Extreme/unable); 3. Running on uneven ground (Extreme/unable); 2. Walking a mile (Quite a bit); 1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

55 Item presentation order and response from LEFS exercise: 4. Making sharp turns while running fast (Extreme/unable); 3. Running on uneven ground (Extreme/unable); 5. Running on even ground (Extreme/unable); 2. Walking a mile (Quite a bit); 1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

56 Item presentation order and response from LEFS exercise: 4. Making sharp turns while running fast (Extreme/unable); 3. Running on uneven ground (Extreme/unable); 5. Running on even ground (Extreme/unable); 6. Hopping (Quite a bit); 2. Walking a mile (Quite a bit); 1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

57 Item presentation order and response from LEFS exercise: 4. Making sharp turns while running fast (Extreme/unable); 3. Running on uneven ground (Extreme/unable); 5. Running on even ground (Extreme/unable); 6. Hopping (Quite a bit); 2. Walking a mile (Quite a bit); 7. Performing your usual hobbies, recreational or sporting activities (Moderate); 1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

58 Item presentation order and response from LEFS exercise: 4. Making sharp turns while running fast (Extreme/unable); 3. Running on uneven ground (Extreme/unable); 5. Running on even ground (Extreme/unable); 6. Hopping (Quite a bit); 2. Walking a mile (Quite a bit); 7. Performing your usual hobbies, recreational or sporting activities (Moderate); 8. Performing heavy activities around your home (Moderate); 1. Performing any of your usual work, housework, or school activities (A little bit of difficulty)

59 The potential to relieve respondent burden Only 8 individual LEFS items required responses; as many as 10 potentially inappropriate items did NOT require responses. This patient's LEFS score is 46.02, the activity level at which the individual can function with little or no difficulty.

60 CAT simulation: Focus on precision Item Pool: 23 items of the Modified Roland-Morris Low Back Pain Disability Questionnaire

61 [Figure: back disability trait continuum, from low to high]

62 Q1: I find it difficult to get out of a chair because of my back SEM = 1.4 A: Yes

63 Q2: I stay at home most of the time. SEM = 0.67 A: No

64 Q3: I can only walk short distances. SEM = 0.59 A: Yes

65 Q4: I use a handrail to get up stairs. SEM = 0.55 A: Yes

66 Q5: I'm not doing jobs that I usually do around the house. SEM = 0.47 A: No

67 [Figure: Yes/No responses placed along the back disability continuum, from low to high]

69 The estimate gets more precise with every question. SEM after each question: Q1: 1.4; Q2: 0.67; Q3: 0.59; Q4: 0.55; Q5: 0.47
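In IRT the SEM at a trait level is 1 / sqrt(total information), and information adds up across administered items, so each new item can only shrink the SEM when the trait level is held fixed. A sketch assuming 2PL yes/no items with hypothetical (a, b) parameters (not the published Roland-Morris calibrations). In a live CAT the trait estimate is updated after each answer, so observed SEMs, like those above, need not follow this fixed-trait curve exactly:

```python
import math


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def sem_after_each_item(theta: float, items):
    """SEM(theta) = 1 / sqrt(total information) after each administered item.

    items: sequence of (a, b) parameter pairs in administration order.
    """
    total_info, sems = 0.0, []
    for a, b in items:
        total_info += item_information(theta, a, b)
        sems.append(1.0 / math.sqrt(total_info))
    return sems


# Hypothetical (a, b) parameters for five yes/no items.
items = [(1.8, 0.3), (2.0, 0.9), (1.5, 0.1), (1.7, 0.6), (1.9, 0.4)]
print([round(s, 2) for s in sem_after_each_item(0.5, items)])
```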

70 What response efficiencies does a 9-item Physical Functioning CAT offer? Objective: To evaluate the response efficiencies provided by a 9-item CAT: Adapteval-HIV: Physical Functioning

71 Methods 400 HIV patients from 2 clinics completed all 9 items of the AD-HIV: Physical Functioning scale. CAT simulations were then conducted, based on the original, full-scale responses. CATs used a set of commonly employed starting and stopping rules: begin with an item of moderate difficulty; end when a targeted score estimate precision is achieved (standard error-based).

72 Results 3099 out of 3600 potential responses needed to obtain CAT-based scores for 400 patients. Eliminated the need for 501 responses (14%). 245 patients (61%) needed 9 responses to obtain physical functioning scores. 155 patients (39%) required <9 responses (894 responses instead of 1395).

73 Results 13 distinct CAT-based scales provided scores. These CATs ranged in length from 3-8 items. Of the 155 patients with reduced-item CATs: 35 (22.6%) had 6-item CATs, 30 (19.4%) had 8-item CATs, 29 (18.7%) had 5-item CATs, 26 (16.8%) had 7-item CATs, 23 (14.8%) had 3-item CATs, 12 (7.7%) had 4-item CATs.
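The response counts on these two results slides fit together, and the 36% figure in the conclusions follows from them. A short arithmetic check using only numbers reported on the slides:

```python
# Reduced-length CATs from the results slide: CAT length -> number of patients
reduced = {6: 35, 8: 30, 5: 29, 7: 26, 3: 23, 4: 12}
n_patients, n_items, full_length_patients = 400, 9, 245

reduced_patients = sum(reduced.values())
reduced_responses = sum(length * count for length, count in reduced.items())
total_responses = full_length_patients * n_items + reduced_responses
potential = n_patients * n_items
saved = potential - total_responses

print(reduced_patients, reduced_responses, total_responses, potential, saved)
# 155 894 3099 3600 501
print(f"overall saving: {saved / potential:.1%}")  # 13.9%
print(f"saving among reduced-item CATs: {1 - reduced_responses / (reduced_patients * n_items):.1%}")  # 35.9%
```

The overall saving rounds to the reported 14%, and the 35.9% saving among the 155 reduced-item CATs (501 of 1,395 responses) rounds to the reported 36%.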

74 Conclusions Initial studies of simulated CAT responses to AD-HIV: Physical Functioning suggest that this 9-item CAT can provide response efficiencies: 155 patients (39%) required <9 responses. These patients averaged a 36% response reduction.

75 Implications CAT AD-HIV: Physical Functioning offers the potential to relieve patient response burden. Scales of <10 items may provide real response efficiencies when delivered as CATs.

76 Issues in constructing and employing measures

77 Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT's unidimensionality assumption (Cook, Kallen, Amtmann). The number of items is an influencing factor on the CFA model fit statistics studied: CFI, NNFI, RMSEA, SRMR, WRMR. What about the context of CFA modeling: full-scale analysis, short-form analysis, item bank construction, static vs. dynamic (CAT) assessment?

78 Technology aversion?

79 Education Differences: Up to HS (n = 172) vs. Some College or More (n = 218)
[Table: p-value and eta² for each domain (Pain, Fatigue, Sleep, Emotional, Cognitive, Self Care, Phy. Act., Life Sat., Body Img., Phy. Symp., Sat. Med., Work, Neg. Soc., Pos. Soc.); Life Sat.: .015]
*Higher education had better scores than lower education in all domains. Results based on ANOVA; 95% significance level (p-value).

80 Income Differences: <20k (n = 140) vs. 20k+ (n = 199)
[Table: p-value and eta² for each domain (Pain, Fatigue, Sleep, Emotional, Cognitive, Self Care, Phy. Act., Life Sat., Body Img., Phy. Symp., Sat. Med., Work, Neg. Soc., Pos. Soc.)]
*Higher income had better scores than lower income in all domains. Results based on ANOVA; 95% significance level (p-value) and 0.10 effect size (eta²) assumed.

81 Platform Selection Process
Semi-random assignments, allowing self-selection to ensure valid input from subjects. Platform selections were affected by subject sociodemographic and medical status.
Variable | p-value
Sex | .643
Ethnicity | .062
Race | .153
Age | .921
Education | .000
Occupation | .076
Income | .000
CD4 | .676
Viral Load | .440
Months Since Diag. | .023
*Results based on Pearson chi-square; 95% significance level assumed.

82 Response invalidity

83 Outpatient population study of trust Consecutive outpatients from one public, one private, and one VA clinic were recruited from waiting rooms prior to their appointments. Completed questionnaires were received from 104 African Americans and 131 White Americans. The overall sample composition was: 46% female, 57% incomes <= $20,000 per year, 36% completed high school or less, and 47% had experienced moderate to severe pain during the previous 4 weeks.

84 The Physician Trust Scale: items in easiest-to-hardest-to-endorse order
1. Your doctor will do whatever it takes to get you all the care you need. (R)
6. Your doctor is totally honest in telling you about all of the different treatment options available for your condition. (R)
4. Your doctor is extremely thorough and careful. (R)
5. You completely trust your doctor's decisions about which medical treatments are best for you. (R)
10. All in all, you have complete trust in your doctor. (R)
7. Your doctor only thinks about what is best for you. (R)
9. You have no worries about putting your life in your doctor's hands. (R)
8. Sometimes your doctor does not pay full attention to what you are trying to tell him/her.
3. Your doctor's medical skills are not as good as they should be.
2. Sometimes your doctor cares more about what is convenient for him/her than about your medical needs.
Response options: 1 = Strongly agree; 2 = Agree; 3 = Neutral; 4 = Disagree; 5 = Strongly disagree
Scoring: higher score = greater trust
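Because 1 = Strongly agree and higher scores mean greater trust, the positively worded items marked (R) must be reverse-scored (6 minus the response code), while the negatively worded items are scored as given. A sketch of a simple summed score under that reading; the plain sum and the function name are assumptions for illustration, since the study's analyses may instead have used IRT-based person measures:

```python
# Items marked (R) on the slide; all other items are scored as given.
REVERSED_ITEMS = {1, 4, 5, 6, 7, 9, 10}


def trust_score(responses):
    """Sum score for the 10 Physician Trust Scale items (higher = greater trust).

    responses: dict mapping item number (1-10) to a response code, where
    1 = Strongly agree ... 5 = Strongly disagree.
    """
    total = 0
    for item, code in responses.items():
        if not 1 <= code <= 5:
            raise ValueError(f"item {item}: response code {code} outside 1-5")
        total += 6 - code if item in REVERSED_ITEMS else code
    return total


# Strongly agreeing with every positive item and strongly disagreeing with
# every negative item yields the maximum score.
max_trust = {i: (1 if i in REVERSED_ITEMS else 5) for i in range(1, 11)}
print(trust_score(max_trust))  # 50
```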

85 Validity: Property of an Inference A measure is not valid in an absolute sense. Validity concerns the appropriateness of inferences about individuals that are made from an interpretation of scores, which were derived from data collected in a specific context.

86 Studying Measurement Invalidity: Person Fit A unique advantage of IRT measurement models: Individualized person fit information is calculated in conjunction with the estimation of person measures. This individual person level fit information, i.e., person fit statistics, can then be used for inquiries about measurement validity.
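Two widely used person fit statistics are the infit and outfit mean squares: outfit averages the squared standardized residuals, while infit weights each squared residual by the item's model variance. Both have an expected value near 1, and the trust study below flags values above 2. The sketch assumes a dichotomous Rasch model for simplicity (the trust items have five response options, so the actual analysis presumably used a polytomous model); item locations and response patterns are hypothetical:

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Expected probability of a 1 response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def person_fit(responses, theta, bs):
    """Infit and outfit mean-square statistics for one person.

    responses: 0/1 responses; theta: the person's estimated location;
    bs: item locations. Both statistics have expected value near 1.
    """
    sum_sq_resid = 0.0  # sum of squared residuals (x - p)^2
    sum_var = 0.0       # sum of model variances p(1 - p)
    sum_z2 = 0.0        # sum of squared standardized residuals
    for x, b in zip(responses, bs):
        p = rasch_p(theta, b)
        var = p * (1.0 - p)
        sum_sq_resid += (x - p) ** 2
        sum_var += var
        sum_z2 += (x - p) ** 2 / var
    outfit = sum_z2 / len(responses)  # unweighted mean square
    infit = sum_sq_resid / sum_var     # information-weighted mean square
    return infit, outfit


def high_invalidity(infit, outfit, cutoff=2.0):
    """Flag used in the trust study: infit or outfit mean square > 2."""
    return infit > cutoff or outfit > cutoff


bs = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = [1, 1, 1, 0, 0]  # endorses the easy items, not the hard ones
reversed_pattern = [0, 0, 0, 1, 1]  # endorses only the hardest items
for pattern in (expected_pattern, reversed_pattern):
    infit, outfit = person_fit(pattern, theta=0.0, bs=bs)
    print(pattern, round(infit, 2), round(outfit, 2), high_invalidity(infit, outfit))
```

The reversed pattern, like the SF-36 example that follows (not limited in vigorous activities but limited a lot in easier ones), is exactly the kind of response string these statistics are designed to catch.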

87 SF-36 Physical Functioning Items, ordered from hardest to easiest
Measure | Item # | Item Content
2.36 | SF3 (hardest) | Vigorous activities
1.15 | SF6 | Climbing several flights of stairs
0.66 | SF9 | Walking more than a mile
0.48 | SF8 | Bending, kneeling, or stooping
0.10 | SF4 | Moderate activities
-0.18 | SF10 | Walking several hundred yards
-0.60 | SF5 | Lifting or carrying groceries
-0.60 | SF7 | Climbing one flight of stairs
-0.89 | SF11 | Walking one hundred yards
-2.48 | SF12 (easiest) | Bathing or dressing yourself

88 SF-36 Physical Functioning Items: An Individual's Response Validity
Response | Item # | Item Content
No, not limited at all. | SF3 (hardest) | Vigorous activities
Yes, limited a little. | SF6 | Climbing several flights of stairs
No, not limited at all. | SF9 | Walking more than a mile
Yes, limited a little. | SF8 | Bending, kneeling, or stooping
Yes, limited a little. | SF4 | Moderate activities
Yes, limited a lot. | SF10 | Walking several hundred yards
Yes, limited a lot. | SF5 | Lifting or carrying groceries
Yes, limited a lot. | SF7 | Climbing one flight of stairs
Yes, limited a little. | SF11 | Walking one hundred yards
No, not limited at all. | SF12 (easiest) | Bathing or dressing yourself

89 Using Person Fit Information
1) Identify individuals with high levels of invalidity in their responses.
2) What percent of the sample is represented by these high-invalidity individuals?
3) Can these individuals be characterized, demographically or otherwise? If no, invalid responses may be randomly distributed. If yes, measurements may be less appropriate for certain groups.

90 Characterizing Invalidity In the study employing the Physician Trust Scale, 19.4% of respondents were found to have high-invalidity responses (person infit or outfit mean square > 2). This increased to 29.7% for those with up-to-HS education, and to 44.1% for those with up-to-HS education and little or no pain.

91 Results: Identify Invalid Responses A sizeable percentage of the sample (19.4%) was identified as having responses of questionable validity. Characteristics common to respondents with low-validity responses could be identified: primarily, level of education; to some extent, pain status. A question exists as to the stability of study findings when analyses are conducted with and without low-validity data. The validity of instrument responses may not be equivalent across all patient subgroups of interest. What, then, is a true effect, and what is an artifact of inappropriate measurement?

92 Potential for measurement bias

93 If there is uncertainty in measurement… In employing a measurement instrument, users expect that individuals and population subgroups are NOT likely to be advantaged or disadvantaged by the measurements they receive. If advantaging or disadvantaging occurs, there will be systematic error in the way an instrument provides measures for members of a specific group or groups. The bottom line: measurement disparities produce bias.

94 …there is uncertainty in results The use of biased measurements in research raises a significant question: will conclusions hold? In a measure of health status, what if something other than state of health has influenced the way certain individuals respond? What, then, if observed group differences really reflect something other than what an outcomes instrument was intended to measure? A question will remain about a study's findings: are observed disparities in health care due to true group differences, or were they influenced by the use of culturally biased measurements?

95 Studying Measurement Disparity: DIF DIF is differential item functioning. A measurement item shows DIF if individuals of the same trait level (e.g., same level of trust) but originating from different groups do not have the same probability of endorsing an item.
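One standard way to operationalize this definition is the Mantel-Haenszel procedure: respondents are matched on trait level (strata), and the odds of endorsing the item are compared between a reference and a focal group within each stratum. The presentation does not say which DIF method the trust study used, so Mantel-Haenszel is a swapped-in illustration; the function names and data are hypothetical:

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    records: iterable of (group, stratum, x) where group is "ref" or "focal",
    stratum is the matching trait level (e.g., total or rest score), x is 0/1.
    Returns the MH common odds ratio and the ETS delta metric, -2.35 * ln(alpha).
    """
    # Per stratum: [A, B, C, D] = ref endorse, ref not, focal endorse, focal not
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, x in records:
        offset = 0 if group == "ref" else 2
        tables[stratum][offset + (0 if x else 1)] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


def make_records(group, stratum, n_endorse, n_not):
    """Hypothetical helper that expands counts into individual records."""
    return [(group, stratum, 1)] * n_endorse + [(group, stratum, 0)] * n_not


# Hypothetical: within each matched stratum the focal group endorses less often.
records = (make_records("ref", 1, 8, 2) + make_records("focal", 1, 4, 6)
           + make_records("ref", 2, 9, 1) + make_records("focal", 2, 6, 4))
alpha, delta = mantel_haenszel_dif(records)
print(round(alpha, 2), round(delta, 2))
```

An odds ratio of 1 (delta 0) means no DIF; here alpha > 1 and a negative delta indicate that, at the same trait level, the focal group is less likely to endorse the item.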

96 Trust Item Difficulties: Up to HS vs. > HS Education [Figure: trust item difficulties for the Up to HS and > HS education groups]

97 Trust Item Difficulties: Up to HS vs. > HS Education [Figure: trust item difficulties for the Up to HS and > HS education groups; labeled items:]
1. Your doctor will do whatever it takes to get you all the care you need.
6. Your doctor is totally honest in telling you about all of the different treatment options available for your condition.
3. Your doctor's medical skills are not as good as they should be.
2. Sometimes your doctor cares more about what is convenient for him/her than about your medical needs.

98 Address potential measurement bias Four items show statistically significant DIF, with moderate to large DIF effect sizes. The trust SCALE, as a whole, appears to be understood and responded to differently by the lower-education versus higher-education group. Considering that 36% of the study's sample completed high school or less, this trust instrument may not be able to provide interpretable scores for this study.

99 Outline The new measurement Constructing measures Extending the usefulness of measurement

100 When patient-reported outcomes… …are reported to patients

101 A pivotal study Velikova, G., et al. Measuring quality of life in routine oncology practice improves communication and patient well-being: a randomized controlled trial. Journal of Clinical Oncology, February 15, 2004.

102 Objective To examine the effects of regular repeated collection and feedback of HRQL data to oncologists

103 Study Design: Groups
Patients randomly assigned to:
Intervention (I): complete HRQL questionnaires; feedback of results to physicians
Attention-Control (A-C): complete HRQL questionnaires; no feedback of results to physicians
Control (C): no HRQL measurement before clinic encounter

104 Improvement = > +7 points in FACT-G well-being from baseline
[Table: improvement (yes/no) for the comparisons I vs C, I vs A-C, and A-C vs C on FACT-G Overall, Physical, Functional, Emotional, and Social/family well-being]
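The improvement criterion is a simple threshold on change from baseline. A minimal sketch with a hypothetical function name; treating a drop of more than 7 points as deterioration is an assumption made for symmetry, since the slide defines only the improvement threshold:

```python
def classify_change(baseline: float, followup: float, threshold: float = 7.0) -> str:
    """Classify change in FACT-G score from baseline.

    Improvement follows the slide (> +7 points). Treating a drop of more than
    7 points as deterioration is an assumption made for symmetry.
    """
    change = followup - baseline
    if change > threshold:
        return "improvement"
    if change < -threshold:
        return "deterioration"
    return "no change"


print(classify_change(70, 78), classify_change(70, 75), classify_change(70, 60))
# improvement no change deterioration
```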

105 Fig 4. Proportions of patients showing clinically meaningful improvement, no change, or deterioration in Functional Assessment of Cancer–General (FACT-G) score after three encounters, by study arm. Intervention versus attention-control and control groups, P = .001; intervention and attention-control versus control, P = .003, using ordinal regression, controlling for baseline FACT-G, performance status, and time on study.

106 Leftover Finding Completion of questionnaires may have an effect on patient well-being, even without feedback to physicians

108 Patient-centric effect Achieved by completing measures. What if measures were systematically reported to patients, across time? An effect yet to be harnessed by healthcare providers or systems

109 Two PRO-based clinical care projects Department of Palliative Care and Rehabilitation Medicine, MD Anderson Cancer-Related Fatigue Clinic, MD Anderson

110 Palliative Care (Yang, Kallen, Bruera)

111 NCI-sponsored SBIR RFP Topic 246: Integrating Patient-Reported Outcomes in Hospice and Palliative Care Practices

112 PRO Assessment

113 PRO Trending

114 PRO Assessment Snapshot

115 Expected benefits: workflow efficiency; data accuracy; potential for temporal/causal analyses; facilitating provider-provider and patient-provider communication

116 Cancer-Related Fatigue Clinic (Kallen, Yang, Escalante)

121 Expected benefits: timely presentation of interpreted measurement information to all involved parties; improved communication, shared decision-making, and outcomes

122 Summary The new measurement Constructing measures Extending the usefulness of measurement

