Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 6, 2008 (3:30-6:30pm) HS 249F.

Slides:



Advertisements
Similar presentations
ASSESSING RESPONSIVENESS OF HEALTH MEASUREMENTS. Link validity & reliability testing to purpose of the measure Some examples: In a diagnostic instrument,
Advertisements

RELIABILITY Reliability refers to the consistency of a test or measurement. Reliability studies Test-retest reliability Equipment and/or procedures Intra-
© McGraw-Hill Higher Education. All rights reserved. Chapter 3 Reliability and Objectivity.
 A description of the ways a research will observe and measure a variable, so called because it specifies the operations that will be taken into account.
Part II Sigma Freud & Descriptive Statistics
Part II Sigma Freud & Descriptive Statistics
Part II Knowing How to Assess Chapter 5 Minimizing Error p115 Review of Appl 644 – Measurement Theory – Reliability – Validity Assessment is broader term.
QUANTITATIVE DATA ANALYSIS
Concept of Measurement
Cross-Cultural Use of Measurements: Development of the Chinese SF-36 Health Survey Xinhua S. Ren, Ph.D. Boston University School of Public Health, Boston,
Chapter 14 Conducting & Reading Research Baumgartner et al Chapter 14 Inferential Data Analysis.
UNDERSTANDING RESEARCH RESULTS: STATISTICAL INFERENCE © 2012 The McGraw-Hill Companies, Inc.
The Excel NORMDIST Function Computes the cumulative probability to the value X Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc
Measurement Joseph Stevens, Ph.D. ©  Measurement Process of assigning quantitative or qualitative descriptions to some attribute Operational Definitions.
Chapter 14 Inferential Data Analysis
1 8/14/2015 Evaluating the Significance of Health-Related Quality of Life Change in Individual Patients Ron Hays October 8, 2004 UCLA GIM/HSR.
1 Health-Related Quality of Life Ron D. Hays, Ph.D. - UCLA Department of Medicine: Division of General Internal Medicine.
Chapter 12 Inferential Statistics Gay, Mills, and Airasian
AM Recitation 2/10/11.
Basic Methods for Measurement of Patient-Reported Outcome Measures Ron D. Hays, Ph.D. UCLA/RAND ISOQOL Conference on.
Repeated Measures ANOVA
PTP 560 Research Methods Week 3 Thomas Ruediger, PT.
Instrumentation.
MEASUREMENT CHARACTERISTICS Error & Confidence Reliability, Validity, & Usability.
Is the Minimally Important Difference Really 0.50 of a Standard Deviation? Ron D. Hays, Ph.D. June 18, 2004.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Inferential Statistics.
Statistics 1 The Basics Sherril M. Stone, Ph.D. Department of Family Medicine OSU-College of Osteopathic Medicine.
Which Test Do I Use? Statistics for Two Group Experiments The Chi Square Test The t Test Analyzing Multiple Groups and Factorial Experiments Analysis of.
Education Research 250:205 Writing Chapter 3. Objectives Subjects Instrumentation Procedures Experimental Design Statistical Analysis  Displaying data.
Use of Health-Related Quality of Life Measures to Assess Individual Patients July 24, 2014 (1:00 – 2:00 PDT) Kaiser Permanente Methods Webinar Series Ron.
Patient-Centered Outcomes of Health Care Comparative Effectiveness Research February 3, :00am – 12:00pm CHS 1 Ron D.Hays, Ph.D.
1 Assessing the Minimally Important Difference in Health-Related Quality of Life Scores Ron D. Hays, Ph.D. UCLA Department of Medicine October 25, 2006,
Counseling Research: Quantitative, Qualitative, and Mixed Methods, 1e © 2010 Pearson Education, Inc. All rights reserved. Basic Statistical Concepts Sang.
Tests and Measurements Intersession 2006.
Evaluating Health-Related Quality of Life Measures June 12, 2014 (1:00 – 2:00 PDT) Kaiser Methods Webinar Series Ron D.Hays, Ph.D.
Reliability & Agreement DeShon Internal Consistency Reliability Parallel forms reliability Parallel forms reliability Split-Half reliability Split-Half.
Measuring Health-Related Quality of Life
1 Finding and Verifying Item Clusters in Outcomes Research Using Confirmatory and Exploratory Analyses Ron D. Hays, PhD May 13, 2003; 1-4 pm; Factor
Measurement Issues by Ron D. Hays UCLA Division of General Internal Medicine & Health Services Research (June 30, 2008, 4:55-5:35pm) Diabetes & Obesity.
Test Score Distribution * Low Variability.
1 Session 6 Minimally Important Differences Dave Cella Dennis Revicki Jeff Sloan David Feeny Ron Hays.
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.
IMPORTANCE OF STATISTICS MR.CHITHRAVEL.V ASST.PROFESSOR ACN.
Assessing Responsiveness of Health Measurements Ian McDowell, INTA, Santiago, March 20, 2001.
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.
Psychometric Evaluation of Questionnaire Design and Testing Workshop December , 10:00-11:30 am Wilshire Suite 710 DATA.
Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 8, 2006 (3:00-6:00pm) HS 249F.
Measurement of Outcomes Ron D. Hays Accelerating eXcellence In translational Science (AXIS) January 17, 2013 (2:00-3:00 pm) 1720 E. 120 th Street, L.A.,
Chapter 13 Understanding research results: statistical inference.
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire.
Evaluating Health-Related Quality of Life Measures Ron D. Hays, Ph.D. UCLA GIM & HSR February 9, 2015 (9:00-11:50 am) HPM 214, Los Angeles, CA.
Evaluating Multi-item Scales Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research HS237A 11/10/09, 2-4pm,
Evaluating Multi-Item Scales Health Services Research Design (HS 225B) January 26, 2015, 1:00-3:00pm CHS.
Health-Related Quality of Life in Outcome Studies Ron D. Hays, Ph.D UCLA Division of General Internal Medicine & Health Services Research GCRC Summer Session.
Educational Research Inferential Statistics Chapter th Chapter 12- 8th Gay and Airasian.
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Measurement and Data Quality.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Chapter 2 Norms and Reliability. The essential objective of test standardization is to determine the distribution of raw scores in the norm group so that.
Descriptive Statistics Report Reliability test Validity test & Summated scale Dr. Peerayuth Charoensukmongkol, ICO NIDA Research Methods in Management.
Final Session of Summer: Psychometric Evaluation
Health-Related Quality of Life Assessment in Outcome Studies
Natalie Robinson Centre for Evidence-based Veterinary Medicine
15.1 The Role of Statistics in the Research Process
Evaluating the Psychometric Properties of Multi-Item Scales
Evaluating Multi-item Scales
Multitrait Scaling and IRT: Part I
Evaluating Multi-item Scales
Introduction to Psychometric Analysis of Survey Data
Evaluating the Significance of Individual Change
UCLA Department of Medicine
Presentation transcript:

Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 6, 2008 (3:30-6:30pm) HS 249F

H E A L T H Individual Change Interest in –Knowing how many patients benefit from group intervention, or –Tracking progress on individual patients Sample –54 patients –Average age = 56; 84% white; 58% female Method –Self-administered SF-36 version 2 at baseline and at end of therapy (about 6 weeks later).

H E A L T H Physical Functioning and Emotional Well-Being at Baseline for 54 Patients at UCLA-Center for East West Medicine Hays et al. (2000), American Journal of Medicine

H E A L T H Change in SF-36 Scores Over Time Effect Size

H E A L T H t-test for within group change X D /(SD d /n ½ ) X D /(SD d /n ½ ) X D = is mean difference, SD d = standard deviation of difference

H E A L T H Significance of Group Change (T-scores) Changet-testprob. PF RP BP GH EN SF RE <- EWB PCS MCS

H E A L T H Reliable Change Index (X 2 – X 1 )/ (SEM * SQRT [2]) SEM = SD b * (1- reliability) 1/2

H E A L T H Amount of Change in Observed Score Needed for Significant Individual Change RCIEffect size PF RP BP GH EN SF RE EWB PCS MCS

H E A L T H Significant Change for 54 Cases % Improving % Declining Difference PF-1013% 2%+ 11% RP-431% 2%+ 29% BP-222% 7%+ 15% GH-5 7% 0%+ 7% EN-4 9% 2%+ 7% SF-217% 4%+ 13% RE-315% 0% EWB-519% 4%+ 15% PCS24% 7%+ 17% MCS22%11%+ 11%

H E A L T H Multiple Steps in Developing Good Survey Review literature Expert input (patients and clinicians) Define constructs you are interested in Draft items (item generation) Pretest – Cognitive interviews – Field and pilot testing Revise and test again Translate/harmonize across languages

H E A L T H What’s a Good Measure? Same person gets same score (reliability) Different people get different scores (validity) People get scores you expect (validity) It is practical to use (feasibility)

H E A L T H Scales of Measurement and Their Properties Nominal Ordinal + Interval + + Ratio Type of Scale Rank Order Equal Interval Absolute 0 Property of Numbers

H E A L T H Measurement Range for Health Outcome Measures NominalOrdinalIntervalRatio

H E A L T H Indicators of Acceptability Unit non-response Item non-response Administration time

H E A L T H Variability All scale levels are represented Distribution approximates bell-shaped "normal"

H E A L T H Measurement Error observed = true score + systematic error + random error (bias)

H E A L T H Coverage Error Does each person in population have an equal chance of selection? Sampling Error Are only some members of the population sampled? Nonresponse Error Do people in the sample who respond differ from those who do not? Measurement Error Are inaccurate answers given to survey questions? Four Types of Data Collection Errors

H E A L T H Flavors of Reliability Test-retest (administrations) Intra-rater (raters) Internal consistency (items)

H E A L T H Test-retest Reliability of MMPI r = 0.75 MMPI 317 True False True False MMPI I am more sensitive than most other people.

H E A L T H Kappa Coefficient of Agreement (Corrects for Chance) (observed - chance) kappa = (1 - chance)

H E A L T H Example of Computing KAPPA Column Sum Rater B Row Sum Rater A

H E A L T H Example of Computing KAPPA (Continued) P = (1 x 2) + (3 x 2) + (2 x 2) + (2 x 2) + (2 x 2) (10 x 10) = 0.20 c P = 9 10 = 0.90 obs. Kappa = = 0.87

H E A L T H Conclusion Kappa Poor < 0.0 Slight Poor <.40 Fair Fair Moderate Good Substantial Excellent >.74 Almost perfect Fleiss (1981) Landis and Koch (1977) Guidelines for Interpreting Kappa

H E A L T H Intraclass Correlation and Reliability ModelReliability Intraclass Correlation One-Way MS - MS MS - MS MS MS + (K-1)MS Two-Way MS - MS MS - MS Fixed MS MS + (K-1)MS Two-Way N (MS - MS ) MS - MS Random NMS +MS - MS MS + (K-1)MS + K (MS - MS )/N BMS JMS EMS BMS WMS BMS EMS BMS WMS BMS EMS BMS EMS BMS EMS JMS EMS WMS BMS EMS

H E A L T H Summary of Reliability of Plant Ratings Baseline Follow-up R TT R II R TT R II One-Way Anova Two-Way Random Effects Two-Way Fixed Effects SourceLabel Baseline MS PlantsBMS WithinWMS RatersJMS Raters X PlantsEMS

H E A L T H Raw Data for Ratings of Height ( 1/16 inch) of Houseplants (A1, A2, etc.) by Two Raters (R1, R2) A1 R R A2 R R B1 R R B2 R R C1 R R Plant Baseline Height Follow-up Height Experimental Condition

H E A L T H Ratings of Height of Houseplants (Cont.) C2 R R D1 R R D2 R R E1 R R E2 R R Plant Baseline Height Follow-up Height Experimental Condition

H E A L T H Reliability of Baseline Houseplant Ratings SourceDFSSMSF Plants Within Raters Raters x Plants Total Baseline Results Ratings of Height of Plants: 10 plants, 2 raters

H E A L T H Sources of Variance in Baseline Houseplant Height Source dfs MS Plants (N) (BMS) Within (WMS) Raters (K) (JMS) Raters x Plants (EMS) Total19

H E A L T H Cronbach’s Alpha Respondents (BMS) Items (JMS) Resp. x Items (EMS) Total Source df SSMS Alpha = = 1.8 =

H E A L T H Alpha for Different Numbers of Items and Homogeneity Number of Items (k) Average Inter-item Correlation ( r ) Alpha st = k * r 1 + (k -1) * r

H E A L T H Spearman-Brown Prophecy Formula alpha y = N alpha x 1 + (N - 1) * alpha x N = how much longer scale y is than scale x ) (

H E A L T H Example Spearman-Brown Calculations MHI-18 18/32 (0.98) (1+(18/32 –1)*0.98 = / = 0.96

H E A L T H Number of Items and Reliability for Three Versions of the Mental Health Inventory (MHI)

H E A L T H Reliability Minimum Standards 0.70 or above (for group comparisons) 0.90 or higher (for individual assessment)  SEM = SD (1- reliability) 1/2

H E A L T H Reliability of a Composite Score

H E A L T H Hypothetical Multitrait/Multi-Item Correlation Matrix

H E A L T H Multitrait/Multi-Item Correlation Matrix for Patient Satisfaction Ratings Technical Interpersonal Communication Financial Technical 10.66*0.63†0.67† *0.54†0.50† * † * † *0.60†0.56† *0.58†0.57†0.23 Interpersonal *0.63† †0.58*0.61† †0.65*0.67† †0.57*0.60† *0.58† †0.48*0.46†0.24 Note – Standard error of correlation is Technical = satisfaction with technical quality. Interpersonal = satisfaction with the interpersonal aspects. Communication = satisfaction with communication. Financial = satisfaction with financial arrangements. *Item-scale correlations for hypothesized scales (corrected for item overlap). †Correlation within two standard errors of the correlation of the item with its hypothesized scale.

H E A L T H Construct Validity Does measure relate to other measures in ways consistent with hypotheses? Responsiveness to change including minimally important difference

H E A L T H

Construct Validity for Scales Measuring Physical Functioning NoneMildSevereF-ratio Relative Validity Scale #1 Scale #2 Scale #3 Severity of Heart Disease

H E A L T H Responsiveness to Change and Minimally Important Difference (MID) HRQOL measures should be responsive to interventions that changes HRQOL Need e xternal indicators of change (Anchors) –mean change in HRQOL scores among people who have changed (“minimal” change for MID).

H E A L T H Self-Report Indicator of Change Overall has there been any change in your asthma since the beginning of the study? Much improved; Moderately improved; Minimally improved No change Much worse; Moderately worse; Minimally worse

H E A L T H Clinical Indicator of Change –“changed” group = seizure free (100% reduction in seizure frequency) –“unchanged” group = <50% change in seizure frequency

H E A L T H Responsiveness Indices (1) Effect size (ES) = D/SD (2) Standardized Response Mean (SRM) = D/SD † (3) Guyatt responsiveness statistic (RS) = D/SD ‡ D = raw score change in “changed” group; SD = baseline SD; SD † = SD of D; SD ‡ = SD of D among “unchanged”

H E A L T H Effect Size Benchmarks Small: 0.20->0.49 Moderate: 0.50->0.79 Large: 0.80 or above

H E A L T H Treatment Impact on PCS

H E A L T H Treatment Impact on MCS

H E A L T H IRT

H E A L T H Latent Trait and Item Responses Latent Trait Item 1 Response P(X 1 =1) P(X 1 =0) 1 0 Item 2 Response P(X 2 =1) P(X 2 =0) 1 0 Item 3 Response P(X 3 =0) 0 P(X 3 =2) 2 P(X 3 =1) 1

H E A L T H Item Responses and Trait Levels Item 1 Item 2 Item 3 Person 1Person 2Person 3 Trait Continuum

H E A L T H Item Characteristic Curves (1-Parameter Model) Prob. of “Yes”

H E A L T H Item Characteristic Curves (2-Parameter Model)

H E A L T H DIF – Location (Item 1) DIF – Slope (Item 2) Hispanics Whites Hispanics Dichotomous Items Showing DIF (2-Parameter Model)

H E A L T H