Psychometric Evaluation of Survey Data. Questionnaire Design and Testing Workshop, December 7, 2015, 10:00-11:30 am, 10940 Wilshire Blvd., Suite 710.



Patient-Reported Outcomes (PROs)
"Any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else."
Patient reports about their health:
– What they can do (functioning)
– How they feel (well-being)

PRO Development Process [figure; source URL truncated in the original]

PRO Iterative Development Process [figure; source URL truncated in the original]

Physical Functioning
Ability to conduct a variety of activities, ranging from self-care to running
Predictor of hospitalizations, institutionalization, and mortality
Six physical functioning items were included in the 2010 Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Medicare Survey

Because of a health or physical problem, are you unable to do or have any difficulty doing the following activities? Walking? Getting in or out of chairs? Bathing? Dressing? Using the toilet? Eating?
– I am unable to do this activity (0)
– Yes, I have difficulty (1)
– No, I do not have difficulty (2)

Medicare beneficiary sample (n = 366,701)
58% female
57% high school education or less
Age: 14% 18-64; 48% 65-74; 29% 75-84; 9% 85+

Normal Curve: bell-shaped "normal" curve (68.2% of scores within 1 SD of the mean, 95.4% within 2 SDs, 99.6% within 3 SDs)


Item-Scale Correlations
– Walking (0, 1, 2): 0.71
– Chairs (0, 1, 2): 0.80
– Bathing (0, 1, 2): 0.83
– Dressing (0, 1, 2): 0.86
– Toileting (0, 1, 2): 0.84
– Eating (0, 1, 2): 0.75
Possible 6-item scale range: 0-12 (2% floor, 65% ceiling)
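To show how corrected item-scale (item-rest) correlations like those above are computed, here is a minimal sketch on synthetic data; the trait model, sample size, and noise level are all made up for illustration:

```python
import numpy as np

# Illustrative only: synthetic 0/1/2 responses for 500 people on 6 items,
# driven by a common trait so the items intercorrelate (all values made up).
rng = np.random.default_rng(0)
trait = rng.normal(size=500)
items = np.clip(np.round(trait[:, None] + rng.normal(scale=0.7, size=(500, 6))) + 1,
                0, 2)

total = items.sum(axis=1)
for i in range(items.shape[1]):
    rest = total - items[:, i]                  # scale score excluding item i
    r = np.corrcoef(items[:, i], rest)[0, 1]    # corrected item-scale correlation
    print(f"item {i}: r = {r:.2f}")

# Floor/ceiling effects on the 0-12 summed scale
print("floor %:", 100 * np.mean(total == 0), "ceiling %:", 100 * np.mean(total == 12))
```

Correlating each item with the sum of the remaining items (rather than the full total) avoids inflating the correlation by the item's overlap with itself.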

Confirmatory Factor Analysis (Polychoric* Correlations) [path diagram: Walking, Chairs, Bathing, Dressing, Toileting, and Eating loading on one factor]
* Estimated correlation between two underlying normally distributed continuous variables
Residual correlations <= 0.04

Reliability
Degree to which the same score is obtained when the target or thing being measured (person, plant, or whatever) has not changed.
Internal consistency (items): need 2 or more items
Test-retest (administrations): need 2 or more time points

Reliability Formulas
– One-way: ICC = (BMS – WMS) / [BMS + (k – 1)WMS]; Reliability = (BMS – WMS) / BMS
– Two-way mixed: ICC = (BMS – EMS) / [BMS + (k – 1)EMS]; Reliability = (BMS – EMS) / BMS
– Two-way random: ICC = (BMS – EMS) / [BMS + (k – 1)EMS + k(JMS – EMS)/N]; Reliability = (BMS – EMS) / [BMS + (JMS – EMS)/N]
BMS = Between-Ratee Mean Square; WMS = Within Mean Square; JMS = Item (or Rater) Mean Square; EMS = Ratee × Item (or Rater) Mean Square; k = number of items or raters; N = number of ratees
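A sketch of computing these intraclass correlations from a ratee-by-item matrix, using the mean-square definitions on the slide (the function name and demo usage are my own):

```python
import numpy as np

def icc_table(X):
    """Intraclass correlations and reliabilities for an N x k matrix
    (rows = ratees, columns = items or raters), from ANOVA mean squares."""
    n, k = X.shape
    grand = X.mean()
    row_m, col_m = X.mean(axis=1), X.mean(axis=0)
    bms = k * ((row_m - grand) ** 2).sum() / (n - 1)            # between-ratee MS
    wms = ((X - row_m[:, None]) ** 2).sum() / (n * (k - 1))     # within-ratee MS
    jms = n * ((col_m - grand) ** 2).sum() / (k - 1)            # item/rater MS
    ems = (((X - row_m[:, None] - col_m[None, :] + grand) ** 2).sum()
           / ((n - 1) * (k - 1)))                               # residual MS
    return {
        "one_way":        {"icc": (bms - wms) / (bms + (k - 1) * wms),
                           "rel": (bms - wms) / bms},
        "two_way_mixed":  {"icc": (bms - ems) / (bms + (k - 1) * ems),
                           "rel": (bms - ems) / bms},           # = coefficient alpha
        "two_way_random": {"icc": (bms - ems) / (bms + (k - 1) * ems
                                                 + k * (jms - ems) / n),
                           "rel": (bms - ems) / (bms + (jms - ems) / n)},
    }
```

With identical columns every entry equals 1, and the two-way mixed reliability reproduces coefficient alpha (Hoyt's ANOVA identity).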

Internal Consistency Reliability (Coefficient Alpha)
Coefficient alpha = (MS_BMS – MS_EMS) / MS_BMS = 0.92
Ordinal alpha (from polychoric correlations): [value and source URL truncated in the original]
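Coefficient alpha can equivalently be computed from item and total-score variances; a minimal sketch (the 5-person demo matrix is made up, scored 0-2 like the items above):

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha via the variance formulation:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    n, k = X.shape
    item_var_sum = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up 5-person x 3-item example
X = np.array([[1, 1, 2],
              [0, 1, 1],
              [2, 2, 2],
              [0, 0, 1],
              [1, 2, 2]], dtype=float)
print(round(cronbach_alpha(X), 2))  # → 0.91
```

This variance form gives the same number as the mean-square form on the slide when both are computed from the same person-by-item matrix.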

Item-scale correlation matrix ("multitrait scaling") [figure]

Item Response Theory (IRT)
IRT models the relationship between a person's response Y_i to item i and his or her level of the latent construct (θ) being measured:
– b_ik estimates how difficult it is to have a score of k or more on item i
– a_i estimates the discriminatory power of the item
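Under the graded response model used later in the talk, category probabilities come from differences of adjacent cumulative logistic curves. A minimal sketch; the slope a = 2.0 is a hypothetical value, and the two thresholds echo the walking item's reported values:

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Graded response model for one item:
    P*(Y >= k | theta) = 1 / (1 + exp(-a * (theta - b_k))), with increasing
    thresholds b_1..b_{K-1}; category probabilities are differences of
    adjacent cumulative curves."""
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # P(Y>=1), ..., P(Y>=K-1)
    p_star = np.concatenate(([1.0], p_star, [0.0]))
    return -np.diff(p_star)                          # P(Y=0), ..., P(Y=K-1)

# Thresholds echo the walking item (-1.86, -0.55); the slope a = 2.0 is made up.
probs = grm_category_probs(theta=0.0, a=2.0, b=[-1.86, -0.55])
print(probs.round(3))   # P(unable), P(difficulty), P(no difficulty)
```

Note that P(unable) at average functioning comes out near 0.02, matching the "lowest 2%" interpretation of the first threshold.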

Normal Curve (bell-shaped) z = -1 to 1 (68.2%); z = -2 to 2 (95.4%); z = -3 to 3 (99.6%)


Threshold #1 Parameter (Graded Response Model): 1st threshold, "Unable to do" (lowest 2%)
– Walking: -1.86
– Chairs: -1.91
– Bathing: -1.72
– Dressing: -1.78
– Toileting: -1.87
– Eating: [value missing in the original]

Threshold #2 Parameter (Graded Response Model): 2nd threshold, "Unable to do or have difficulty"
– Walking: -0.55 (lowest 32%)
– Chairs: -0.81
– Bathing: -1.02
– Dressing: -1.10
– Toileting: -1.27
– Eating: -1.53 (lowest 9%)

Item Parameters (Graded Response Model) [table]: 1st threshold ("Unable to do"), 2nd threshold ("Have difficulty"), and slope (discrimination) for Walking, Chairs, Bathing, Dressing, Toileting, and Eating; the numeric values did not survive in this transcript.


Reliability = (Information – 1) / Information [figure: scale information curve, annotated at reliability = 0.95 and reliability = 0.90]
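The relation above can be checked directly: the reliability values annotated on the curve (0.90 and 0.95) correspond to information of 10 and 20, respectively.

```python
def reliability_from_information(info):
    # Reliability = (Information - 1) / Information
    return (info - 1) / info

def information_from_reliability(rel):
    # Inverse: Information = 1 / (1 - Reliability)
    return 1.0 / (1.0 - rel)

print(reliability_from_information(10))   # → 0.9
print(reliability_from_information(20))   # → 0.95
```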

Validity
Content validity: Does the measure "appear" to reflect what it is intended to (expert judges or patient judgments)?
– Do items operationalize the concept?
– Do items cover all aspects of the concept?
– Does the scale name represent the item content?
Construct validity
– Are the associations of the measure with other variables consistent with hypotheses?

Physical Function Scale Correlations
r = 0.39 with self-rated general health
r = [value missing in the original] with number of chronic conditions
Cohen's rule of thumb for correlations corresponding to effect sizes of 0.20 SD, 0.50 SD, and 0.80 SD (via r = d / sqrt(d² + 4)): r ≈ 0.10 is a small, r ≈ 0.24 a medium, and r ≈ 0.37 a large correlation (r's of 0.10, 0.30, and 0.50 are often cited as small, medium, and large, respectively).

Summary
The 6 physical functioning items target relatively easy activities.
Items representing higher levels of physical functioning are needed for the majority of Medicare beneficiaries:
– Lifting or carrying groceries
– Doing chores like vacuuming or yard work
– Running a short distance

Differential Item Functioning (DIF)
The probability of choosing each response category should be the same for people with the same estimated scale score, regardless of other characteristics.
Evaluate DIF by subgroups.
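The slides evaluate DIF with a 2-parameter IRT model; a simpler, commonly used screen is the Mantel-Haenszel procedure, sketched here on synthetic data. All data-generating choices are made up, and no DIF is built in, so the common odds ratio should land near 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
group = rng.integers(0, 2, size=n)                  # 0 = reference, 1 = focal
ability = rng.normal(size=n)
strata = np.clip(np.round(2 * ability + 5), 0, 10).astype(int)  # matching score
# Binary item response depends only on ability -> no DIF by construction
item = (rng.random(n) < 1 / (1 + np.exp(-(ability - 0.2)))).astype(int)

num = den = 0.0
for s in np.unique(strata):                         # one 2x2 table per stratum
    m = strata == s
    a = np.sum((group[m] == 0) & (item[m] == 1))    # reference, endorsed
    b = np.sum((group[m] == 0) & (item[m] == 0))    # reference, not endorsed
    c = np.sum((group[m] == 1) & (item[m] == 1))    # focal, endorsed
    d = np.sum((group[m] == 1) & (item[m] == 0))    # focal, not endorsed
    t = m.sum()
    num += a * d / t
    den += b * c / t

or_mh = num / den                                   # ~1.0 indicates no DIF
print(round(or_mh, 2))
```

Stratifying on the total (matching) score before comparing groups is what separates true DIF from ordinary group differences in the trait itself.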

DIF (2-parameter model) [figure]: location DIF for women vs. men on "I cry when upset," and slope DIF for African American vs. white respondents on "I get sad for no reason"; higher score = more depressive symptoms.

Item Responses and Trait Levels [figure]: items 1-3 and persons 1-3 located along the trait continuum.

Computer Adaptive Testing (CAT)

Reliability Target for Use of Measures with Individuals
Reliability ranges from 0 to 1; 0.90 or above is the goal.
SE = SD × (1 – reliability)^1/2
Reliability = 1 – (SE/10)^2 (T-score metric, SD = 10)
Reliability = 0.90 when SE = 3.2
95% CI = true score ± 1.96 × SE
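The formulas above, in the T-score metric (SD = 10), as a small sketch:

```python
import math

def se_from_reliability(rel, sd=10.0):
    # SE = SD * sqrt(1 - reliability)
    return sd * math.sqrt(1.0 - rel)

def reliability_from_se(se, sd=10.0):
    # Reliability = 1 - (SE / SD)^2
    return 1.0 - (se / sd) ** 2

def ci95(score, se):
    # 95% CI: score +/- 1.96 * SE
    return score - 1.96 * se, score + 1.96 * se

print(round(reliability_from_se(3.2), 2))   # → 0.9
```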

In the past 7 days … I was grouchy [1st question]
– Never [39] – Rarely [48] – Sometimes [56] – Often [64] – Always [72]
(bracketed values = estimated score for each response)
Estimated Anger = 56.1, SE = 5.7 (rel. = 0.68)

In the past 7 days … I felt like I was ready to explode [2nd question]
– Never – Rarely – Sometimes – Often – Always
Estimated Anger = 51.9, SE = 4.8 (rel. = 0.77)

In the past 7 days … I felt angry [3rd question]
– Never – Rarely – Sometimes – Often – Always
Estimated Anger = 50.5, SE = 3.9 (rel. = 0.85)

In the past 7 days … I felt angrier than I thought I should [4th question]
– Never – Rarely – Sometimes – Often – Always
Estimated Anger = 48.8, SE = 3.6 (rel. = 0.87)

In the past 7 days … I felt annoyed [5th question]
– Never – Rarely – Sometimes – Often – Always
Estimated Anger = 50.1, SE = 3.2 (rel. = 0.90)

In the past 7 days … I made myself angry about something just by thinking about it. [6th question]
– Never – Rarely – Sometimes – Often – Always
Estimated Anger = 50.2, SE = 2.8 (rel. = 0.92) (95% CI: 44.7 to 55.7)
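The stopping behavior in the CAT sequence above (administer items until reliability reaches 0.90) can be sketched as follows, using the reported standard errors; the function name and the 2-decimal rounding convention are my own:

```python
def cat_stop_index(ses, target_rel=0.90, sd=10.0):
    """Number of items administered until round(1 - (SE/SD)^2, 2) first
    reaches the target reliability; falls back to using every item."""
    for n, se in enumerate(ses, start=1):
        if round(1.0 - (se / sd) ** 2, 2) >= target_rel:
            return n
    return len(ses)

# Standard errors after each Anger item in the sequence above
ses = [5.7, 4.8, 3.9, 3.6, 3.2, 2.8]
print(cat_stop_index(ses))                    # → 5 (rel. reaches 0.90)
print(cat_stop_index(ses, target_rel=0.95))   # → 6 (never reached; all items used)
```

This is why CAT reduces respondent burden: with a 0.90 target, the fifth item already suffices and the sixth adds precision beyond the target.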

Recommended Reading
Cappelleri, J. C., Lundy, J. J., & Hays, R. D. (2014). Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcome measures. Clinical Therapeutics, 36(5), [page numbers truncated in the original].
