Presentation on theme: "The Patient-Reported Outcome Measurement Information System (PROMIS)"— Presentation transcript:
1 The Patient-Reported Outcome Measurement Information System (PROMIS). Nan Rothrock, Ph.D., Northwestern University. May 22, 2012
2 Agenda: Problems in patient-reported outcome measures; PROMIS approach to PRO instrument development; Available PROMIS instruments; Reliability, validity; PROMIS and the FDA
3 Challenges in PRO Measurement: Many measures of the same health concept; Widely varying quality; Difficult to compare and combine data across studies and across conditions; Complex; Long. These challenges were appreciated by NIH many years ago.
4 What is wrong with today's static measures? [Figure: a questionnaire with high precision but a small range, compared with a questionnaire with a wide range but low precision.] Static questionnaires are either not precise enough or their measurement range is too narrow (ceiling and floor effects).
5 “The clinical outcomes research enterprise would be enhanced greatly by the availability of a psychometrically validated, dynamic system to measure PROs efficiently in study participants with a wide range of chronic diseases and demographic characteristics.” National Institutes of Health, 2003
6 PROMIS Aims: Attack the Patient-Reported Outcome (PRO) “Tower of Babel”; Harness modern psychometric methods; Improve quality and interpretability of PROs. According to Genesis, the monolingual Babylonians wanted to make a name for themselves by building a mighty city and a tower “with its top in the heavens.” God disrupted the work by confounding their speech: the workers spoke different languages and could no longer understand one another. "Then they said, 'Come, let us build ourselves a city, and a tower with its top in the heavens, and let us make a name for ourselves; otherwise we shall be scattered abroad upon the face of the whole earth.'" (Genesis 11:4). Image: Bruegel, 1563. Bruegel was a Flemish painter; this work is currently in Vienna.
7 Resources: Nine-year commitment of NIH; $80+ million investment; 15 funded research sites
8 What is PROMIS? Methodology; Measures (Instruments); Software
9 Glossary: Item = question or statement a patient answers. Instrument = collection of items. Legacy = existing instrument that is the “gold standard” or a commonly used and widely accepted instrument.
10 PROMIS Instruments: Domain focused, not disease focused. Domain = feeling, function or perception you want to measure (e.g., anxiety, physical function, general health perceptions). Item Banks: A large collection of items measuring one domain; Any and all items can be used to provide a score; Can be administered as Computerized Adaptive Tests (CATs) or fixed-length short forms.
11 The Life Story of a PROMIS Item: Focus groups; Archival data analysis; Binning and winnowing; Domain Framework; Literature review; Expert review/consensus; Literacy level analysis; Large-scale testing; Cognitive interviews; Expert item revision; Translation review; Statistical analysis. Development of new and modified items (beginning pool: 8000 total items); 28 focus groups; Binning and winnowing to 1064 items; Cognitive interviews (784 items); Intellectual property; Calibration decisions; Short form; CAT; Validation studies
14 PROMIS Current (2012) Physical Health Banks. Adult: Pain Behavior; Pain Interference; Pain Intensity; Fatigue; Sleep Disturbance; Sleep-related Impairment; Physical Function; Sexual Function. Pediatric/Parent Proxy: Pain Interference; Fatigue; Upper Extremity; Mobility; Asthma Impact.
15 PROMIS Current (2012) Mental Health Banks. Adult: Anxiety; Depression; Anger; Psychosocial Illness Impact; Applied Cognition Concerns; Applied Cognition Abilities; Alcohol Use; Alcohol Consequences; Alcohol Expectancies. Pediatric/Parent Proxy: Anxiety; Depression; Anger.
16 PROMIS Current (2012) Social Health Banks. Adult: Ability to Participate in Roles & Activities; Satisfaction with Roles & Activities; Companionship; Emotional Support; Informational Support; Instrumental Support; Social Isolation. Pediatric/Parent Proxy: Peer Relationships.
18 Language Availability. Available: Universal Spanish. In Process: German; Portuguese; Mandarin Chinese; French; Italian; Norwegian. Others: see nihpromis.org/measures/translations
19 What is the metric? T Score: Mean = 50; Standard Deviation = 10; Referenced to the US General Population
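As a minimal sketch of what this metric implies, the conventional linear T-score transform can be written as below. The slide gives only the mean (50) and standard deviation (10). The function names and the idea that the input is a standardized score (z-score) relative to the reference population are assumptions made for illustration.

```python
# Sketch of a T-score metric with mean 50 and SD 10.
# Assumption: `z` is a standardized score (mean 0, SD 1) relative to the
# reference population; the slide does not describe how that score is estimated.

T_MEAN = 50.0
T_SD = 10.0


def z_to_t_score(z: float) -> float:
    """Convert a standardized score to the T-score metric."""
    return T_MEAN + T_SD * z


def t_score_to_z(t: float) -> float:
    """Express a T score as standard deviations from the reference mean."""
    return (t - T_MEAN) / T_SD


if __name__ == "__main__":
    # A T score of 60 sits one SD away from the reference population mean.
    print(t_score_to_z(60.0))
```

Under this reading, a score difference of 5 T-score points corresponds to half a standard deviation in the reference population.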
20 Ongoing PROMIS Development. Adult: GI Symptoms; Self-efficacy for management of chronic disease. Pediatric: Pain Behavior, Quality, Intensity; Physical Activity; Experience of Stress; Subjective Well-being; Impact of Child Illness on Family; Family Belongingness.
21 How does PROMIS compare to other PRO instruments?
22 Reliability (Precision). This leads to precise measurement that improves the power and efficiency of clinical research.
23 Physical Function Measurement Precision and Range. [Figure. Axis: Error. Curves: PROMIS Short Form (20 items); PROMIS Short Form (10 items); SF items; CAT (10 items); HAQ (20 items). Reference levels: SE = 3.3 (rel = 0.90); SE = 2.3 (rel = 0.95). Samples: rheumatoid arthritis patients; US general population.]
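The pairing of standard errors with reliabilities on this slide can be illustrated with a small sketch. It assumes the classical relation rel = 1 - (SE / SD)^2 with SD = 10 on the T-score metric. The slide does not state which formula was used, so both the relation and the function name are assumptions.

```python
# Sketch: reliability implied by a standard error on a metric with SD = 10.
# Assumption: rel = 1 - (SE / SD)**2 (classical test theory relation);
# the slide itself only lists the SE and rel values side by side.

T_SD = 10.0


def reliability_from_se(se: float, sd: float = T_SD) -> float:
    """Reliability implied by a standard error of measurement."""
    return 1.0 - (se / sd) ** 2


if __name__ == "__main__":
    for se in (3.3, 2.3):
        print(f"SE = {se}: rel = {reliability_from_se(se):.3f}")
```

Under this assumption the two standard errors land close to the reliabilities quoted on the slide (0.90 and 0.95), which suggests the slide values are rounded.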
24 Validity. This leads to precise measurement that improves the power and efficiency of clinical research.
25 Scores on PROMIS measures should correlate with accepted measures of the same domain. (Concurrent Validity)
27 When people experience clinical benefit or decline, their PROMIS scores should also change. (Responsiveness)
28 Effect Size: PROMIS Pain Interference vs. BPI Interference (Patients with pain intensity > 4 at baseline). Back pain with sciatica for at least 6 weeks; Scheduled for an epidural steroid injection; Baseline, 1 month, 3 months
29 Effect Size: PROMIS Pain Behavior vs. Roland-Morris (Patients with pain intensity > 4 at baseline)
30 PROMIS and the FDA: Agreement. Importance of including patient voices in PRO development; Importance of sound measurement; Confusion in selecting an instrument because of the huge array of choices; Ongoing discussions via the Interagency Clinical Outcomes Assessment Working Group to qualify PROMIS Fatigue measures; attendance and presentations at the PRO Consortium
31 PROMIS and FDA: Differences Concerning Content Validity. FDA approach: evaluate content validity in each clinical population in which the measure may be used. PROMIS approach: there is commonality in patients’ experiences of symptoms/outcomes and their impact on QOL; the need to re-validate a well-developed and valid instrument in a target population is questionable. Content Validity = extent to which a scale or questionnaire represents the most relevant and important aspects of a concept in the context of a given measurement application. Magasi, S. et al. (2011). Content validity of patient-reported outcome measures: Perspectives from a PROMIS meeting. Quality of Life Research.
32 Generic fatigue = MS-specific fatigue (r=0.92) (Cook et al, QOLR, 2011). Content Validity = extent to which a scale or questionnaire represents the most relevant and important aspects of a concept in the context of a given measurement application. When the most relevant/important aspects of fatigue for MS are included, the result is the same as for the general fatigue measure.
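33 PROMIS FatigueSF v1.0 and PROMIS FatigueMS Scores by Disability Status, Fatigue Severity, and Vitality Scores
Columns: N; PROMIS FatigueSF v1.0 Mean, SD; PROMIS FatigueMS Mean, SD
Expanded Disability Status Scale (EDSS)
Mild (0-4): 83184.108.40.206.2
Moderate ( ): N 104; FatigueSF v1.0 60.5, 6.4; FatigueMS 60.7, 5.6
Severe ( ): 438.38.7
Fatigue Severity (0-10 NRS)
None/Mild (0-1): N 18; FatigueSF v1.0 43.0, 4.5; FatigueMS 42.5, 5.4
Moderate (2-4): N 58; FatigueSF v1.0 51.0, 6.0; FatigueMS 51.3, 6.6
Severe (5-10): N 154; FatigueSF v1.0 61.7, 5.8; FatigueMS 61.9, 5.5
Vitality (item from the MOS)
None/A little: N 52; FatigueSF v1.0 63.8, 5.3; FatigueMS 64.2
Some: N 88; FatigueSF v1.0 59.9, 6.3; FatigueMS 60.1
Quite a lot: N 44; FatigueSF v1.0 55.7; FatigueMS 56.0, 6.8
Very Much: N 45; FatigueSF v1.0 47.5, 7.3; FatigueMS 47.0, 7.9
Known groups validity equal across 2 PROMIS short forms. Cook et al, QOLR, 2011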
34 Use generic measures as the foundation for PRO content validity. Supplement with targeted measures. Item banking allows flexible item choice without loss of a standard scoring base. The alternative is a messy array of contenders that fail to communicate with one another regarding severity or result interpretation.
35 The “Promise” of PROMIS Instruments. Comparability. Provide the ability to compare or combine results from multiple studies. Reliability and Validity. Reduce response burden. Improve measurement precision. Simplify administration via computer-based administration, scoring, and reporting.
36 Questions? PROMIS website: www. nihpromis. Acknowledgements: National Institutes of Health (Grants U54 AR , U05 AR , U01 AR ). PROMIS PIs: David Cella, Richard Gershon, San Keller, Joan Broderick, Arthur Stone, Heidi Crane, Paul Crane, Donald Patrick, Dagmar Amtmann, Karon Cook, Darren DeWalt, Chris Forrest, Jim Fries, Stephen Haley, David Tulsky, Dinesh Khanna, Brennan Spiegel, Paul Pilkonis, Carol Moinpour, Arnold Potosky, Esi Morgan DeWitt, Lisa Shulman, Kevin Weinfurt
39 MID Methods: IRT-based MIDs on a T-Score scale; Multiple cross-sectional and longitudinal anchors (18); Summarized with nonparametric statistics (median, interquartile range).
Because of the large number of anchor-based MID estimates calculated, we summarized the results using nonparametric statistics, namely medians and interquartile ranges. All anchors were self-reported (not clinical). Longitudinal anchors were prospective (e.g., performance status) or retrospective (e.g., global rating of change).
The MID for a scale should be larger than its measurement error. To ensure this, the lower bound of the anchor-based MID range was not allowed to go below the SEM. The SEM for the IRT-based MIDs was the average standard error for the sample. The SEM for the raw score MIDs was based on the standard formula (SEM = SD*sqrt(1-r)).
In the cross-sectional analysis, anchors were used to categorize patients into multiple clinically distinct groups. Many different anchors can be used for this purpose, provided individuals can be classified into distinct categories that are both clinically relevant and minimally different. Score differences between adjacent, clinically distinct categories represent estimates of the MID. Effect sizes for these estimates were computed by dividing the adjacent category score difference by the overall SD for the sample.
Both prospective and retrospective anchors were used. For prospective anchors, we identified the amount of change in the anchor that represents minimally important change (e.g., 1-pt change in ECOG, 2-pt change on a 0-10 pain scale, or change >= MID for a multi-item scale like FACIT-Fatigue). The MID is the mean score change within those categories of minimally important change (decline or improvement). For retrospective anchors, mean change scores on the PROMIS-Cancer scales corresponding to GRC item responses of +1 or +2 (“a little better,” “moderately better”) and -1 or -2 (“a little worse,” “moderately worse”) were considered estimates of the MID.
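The arithmetic in these notes can be sketched roughly as follows. Only the SEM formula, the effect-size definition, the median/interquartile-range summary, and the rule that the lower bound may not fall below the SEM come from the slide. The function names, the choice of the interquartile range as the reported MID range, and all numbers in the example are hypothetical.

```python
import math
import statistics

# Sketch of the MID arithmetic described in the notes.
# Assumptions: function names are illustrative, the reported range is taken
# to be the interquartile range, and the example data are made up.


def sem_from_reliability(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def effect_size(score_difference: float, sample_sd: float) -> float:
    """Adjacent-category score difference divided by the overall sample SD."""
    return score_difference / sample_sd


def summarize_mid_estimates(estimates: list[float], sem: float) -> dict[str, float]:
    """Median and interquartile range of anchor-based MID estimates,
    with the lower bound not allowed to go below the SEM."""
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return {
        "median": statistics.median(estimates),
        "lower": max(q1, sem),
        "upper": q3,
    }


if __name__ == "__main__":
    # Hypothetical anchor-based estimates on a T-score scale.
    estimates = [2.0, 3.0, 4.0, 5.0, 6.0]
    sem = sem_from_reliability(sd=10.0, reliability=0.91)
    print(summarize_mid_estimates(estimates, sem))
    print(effect_size(4.0, 8.0))
```

`statistics.quantiles` uses its default "exclusive" method here; a different quartile convention would shift the bounds slightly.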
40 Short Form Results. Recommended IRT-based T-Score MIDs and Raw Score MIDs for PROMIS-Cancer Short Forms in Advanced Cancer Patients
Columns: Instrument; T-Score MID Points; T-Score MID Effect Sizes*; Raw Score MID Points; Raw Score MID Effect Sizes‡
Fatigue: T-Score MID 3-5 points; Raw Score MID 2-3 points
Pain Interference: T-Score MID 4-6 points; Raw Score MID 4-7 points
Physical Function: 3-6
Anxiety: T-Score MID 3-5 points; Raw Score MID 3-4 points
Depression
*Calculated as the T-Score MID divided by the Assessment 1 T-Score standard deviation
‡Calculated as the Raw Score MID divided by the Assessment 1 Raw Score standard deviation
All lower bounds of the MID ranges were greater than the SEMs for both IRT and raw score MIDs. Although methods and results for IRT-based T-Scores were presented in detail in this presentation, the exact same methods were used to derive the raw score MIDs. That is, we did not simply take the IRT-based MIDs and transform them to a raw score scale. 7-item fatigue SF
41 CAT Results. Recommended IRT-based T-Score MIDs for PROMIS-Cancer CATs. Columns: CAT; T-Score MID Points; T-Score MID Effect Sizes. Rows: Fatigue; Pain Interference; Physical Function; Anxiety; Depression. All lower bounds of the MID ranges were greater than the SEMs for both IRT and raw score MIDs.