Assessing Responsiveness of Health Measurements
Ian McDowell, INTA, Santiago, March 20, 2001

Link Purpose of Measure to Validation Method
For example:
- In a diagnostic instrument, inter-rater and test-retest reliability are important; for an evaluative measure, internal consistency is paramount.
- For a prognostic or diagnostic instrument, criterion validity is relevant; for an evaluative measure, construct validation is central.

Responsiveness
- For outcome measures, sensitivity to change is a crucial characteristic.
- ‘Responsiveness’ refers to how sensitively a measure indicates change over time, or contrasts between groups.
- It is normally considered an element of validity for an evaluative measurement.

(Responsiveness, cont’d)
There is little consensus over how responsiveness should be assessed. This may be because responsiveness requires a finer breakdown than it is normally given: different facets of responsiveness are relevant to different types of measure.

Conceptions of Responsiveness
- The smallest change that could potentially be detected;
- The smallest change that could reliably be detected beyond error;
- The change typically observed in a population;
- The change observed in the subset of the population judged to have changed;
- The change seen in those judged to have made an important change.

Preliminary Decisions (Before We Begin!)
- What parameter is to be measured? (Pain, QoL, etc.)
- Whose perspective is important: the patient’s, the clinician’s, or society’s? What if these conflict?
- Responsive to what? Differences between groups, changes within a group over time, or a comparison of changes over time between two groups?
- What is the unit of analysis? (Average scores, or individual classifications such as a diagnosis?)

Approaches to Estimating Responsiveness
1. Theoretical (equivalent to content validity)
2. Empirical: internal evaluation (equivalent to concurrent validity)
3. Empirical: external comparison (equivalent to criterion validity)

1. Modeling Approach
- Content should reflect the types of change expected to occur with the therapy: states, not traits.
- There should be no floor or ceiling effects.
- Scoring must ensure that change is not diluted by other factors that do not vary.
- The scale must have fine enough gradations.

2. Internal Empirical Approach
Apply the scale before and after treatment and calculate an effect size statistic. Because measurement scales vary, results are expressed in standard deviation units: (M_t − M_c) / SD_c, i.e., the mean difference divided by a reference standard deviation. The effect size is comparable to a z score: assuming a normal distribution, it indicates how many percentiles a patient will move following treatment.
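
To make the calculation concrete, here is a minimal Python sketch of this approach, using made-up before-and-after scores on an arbitrary 0–100 scale; the percentile interpretation assumes normally distributed scores, as the slide notes.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical paired scores on an arbitrary 0-100 health scale.
baseline = np.array([42.0, 55.0, 38.0, 61.0, 47.0, 50.0, 44.0, 58.0])
followup = np.array([51.0, 60.0, 45.0, 70.0, 49.0, 62.0, 50.0, 66.0])

# Effect size: mean change divided by the baseline SD, so the result is
# expressed in standard deviation units regardless of the scale's range.
effect_size = (followup.mean() - baseline.mean()) / baseline.std(ddof=1)
print(f"Effect size: {effect_size:.2f}")          # ~0.89

# z-score interpretation: an average patient (50th percentile at baseline)
# would move to roughly this percentile after treatment, if scores are normal.
print(f"Post-treatment percentile: {norm.cdf(effect_size):.0%}")  # ~81%
```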

Effect Size Statistics
1. Use a t-test and report statistically significant differences as indicators of responsiveness.
2. Remove the n from the denominator to make the statistic independent of sample size.
3. The denominator can be the SD of the baseline scores, of scores among stable subjects, or of change in scores.
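
A sketch contrasting these options, reusing the hypothetical data above; `stable_change` stands in for change scores observed among subjects judged stable. The labels ES, SRM, and RI (effect size, standardized response mean, responsiveness index) are common names for these denominator choices, not terms from the slides.

```python
import numpy as np
from scipy import stats

baseline = np.array([42.0, 55.0, 38.0, 61.0, 47.0, 50.0, 44.0, 58.0])
followup = np.array([51.0, 60.0, 45.0, 70.0, 49.0, 62.0, 50.0, 66.0])
change = followup - baseline
stable_change = np.array([1.0, -2.0, 0.5, -1.0, 2.0, 0.0])  # stable subjects

# 1. Paired t-test: is the change statistically significant?
t, p = stats.ttest_rel(followup, baseline)
print(f"t = {t:.2f}, p = {p:.4f}")

# 2-3. Sample-size-free indices, differing only in the denominator:
es  = change.mean() / baseline.std(ddof=1)       # SD of baseline scores
srm = change.mean() / change.std(ddof=1)         # SD of change scores
ri  = change.mean() / stable_change.std(ddof=1)  # SD of change, stable subjects
print(f"ES = {es:.2f}, SRM = {srm:.2f}, RI = {ri:.2f}")
```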

Effect Size Statistics (2)
Refinements include a correction for the level of reliability. For example, Wyrwich proposed putting the standard error of measurement in the denominator: SEM = SD₁ × √(1 − α). However, a high alpha does not ensure responsiveness if the measure includes inter-correlated traits that do not change.
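
A minimal sketch of this correction, using the difference of 1.5 and SD of 3 from the next slide and an illustrative alpha of 0.85:

```python
import math

sd_baseline = 3.0    # SD_1: SD of baseline scores
alpha = 0.85         # Cronbach's alpha (illustrative value)
mean_change = 1.5    # observed mean change

sem = sd_baseline * math.sqrt(1.0 - alpha)   # standard error of measurement
print(f"SEM = {sem:.2f}")                                       # 1.16
print(f"Plain effect size:   {mean_change / sd_baseline:.2f}")  # 0.50
print(f"SEM-based statistic: {mean_change / sem:.2f}")          # 1.29
```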

[Figure: Impact of including alpha in the effect size calculation (at a difference of 1.5 and SD of 3); axes: effect size vs. alpha.]
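
A short sketch recreating the relationship the figure plotted: at the fixed difference of 1.5 and SD of 3, the SEM-based effect size grows as alpha rises, because the SEM shrinks. The particular alpha values swept here are my choice for illustration.

```python
import math

for alpha in (0.70, 0.80, 0.90, 0.95):
    es_sem = 1.5 / (3.0 * math.sqrt(1.0 - alpha))
    print(f"alpha = {alpha:.2f}  ->  effect size = {es_sem:.2f}")
# alpha 0.70 -> 0.91, 0.80 -> 1.12, 0.90 -> 1.58, 0.95 -> 2.24
```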

Comment: Effect Sizes
- Useful for comparing the responsiveness of different health measures.
- Helpful in calculating the power of a study.
However:
- The formulae seem somewhat arbitrary.
- Effect sizes offer no indication of the clinical change represented by a given shift in scores.

The MID as a Criterion
This introduces the theme of the Minimally Important Difference (MID) and its cousin, the Minimal Clinically Important Difference (MCID). MCID: “The smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management.” MIDs can be estimated internally (using the scale itself) or externally (using some other criterion).

Setting Internal MIDs
1. Apply the measurement and select the change threshold seen as important by clinical experts: how much would the outcome have to change before they would alter treatment?
2. Present clinicians with written scenarios, comparing each with the previous one. The MCID is the average difference in scores between pairs rated as ‘a little less’ or ‘a little more’ (sketched below).
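
A sketch of the scenario-pair method in item 2, with hypothetical scenario scores and ratings; under this reading, the MCID estimate is simply the mean absolute score difference across pairs rated ‘a little more’ or ‘a little less’.

```python
import numpy as np

# (score of scenario A, score of scenario B, rating of B relative to A)
pairs = [
    (40.0, 46.0, "a little more"),
    (55.0, 50.0, "a little less"),
    (62.0, 70.0, "much more"),      # excluded: not a minimal difference
    (48.0, 53.0, "a little more"),
]

diffs = [abs(b - a) for a, b, rating in pairs if rating.startswith("a little")]
print(f"Estimated MCID = {np.mean(diffs):.1f}")   # mean of 6, 5, 5 = 5.3
```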

Externally-Based MIDs
- Clinicians view patient scenarios and rate whether or not each patient changed significantly.
- Patients can judge the change in their own condition: ‘no change’, ‘a little better’, etc.
- Alternatively, clinically assess patients, then randomly assign pairs of them to hold conversations about their illness, leading to ratings of whether one was ‘better’ than the other, ‘much better’, etc.

3. External Criteria for Responsiveness
1. Establish the MID or MCID. Group the patients who improve (or deteriorate) by more than the MID and compare them to the rest using the measure.
2. Apply various statistics (see the sketch below):
- Sensitivity, specificity & ROC curves;
- Point-biserial correlations;
- Regression, to analyse the average change on the measure per MCID unit of change.
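
A sketch of two of these statistics, assuming a hypothetical external criterion (improved by more than the MCID, yes/no) and observed change scores on the measure:

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

change = np.array([9.0, 5.0, 7.0, 9.0, 2.0, 12.0, 6.0, 8.0, 1.0, 3.0])
improved = np.array([1, 1, 1, 1, 0, 1, 0, 1, 0, 0])   # external criterion

# Area under the ROC curve: how well change scores separate the two groups.
auc = roc_auc_score(improved, change)

# Point-biserial correlation between the binary criterion and change scores.
r_pb, p = stats.pointbiserialr(improved, change)

print(f"AUC = {auc:.2f}, point-biserial r = {r_pb:.2f} (p = {p:.3f})")
```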

[Figure: ROC curves for three instruments (AIMS2, HAQ, SF-36) in detecting an MCID; axes: sensitivity (true positives) vs. 1 − specificity (false positives).]

Questions for Discussion
- Are MIDs constant across the range of a measure? (next slide)
- How can we encourage people to routinely report before-and-after changes in scores, SDs, and alpha?
- Should we apply a measure to standard scenarios to get X₁ and X₂, and use these to simulate the effect size?
- How does this all apply to nutritional assessments?

[Figure: Notional size of an MCID at various levels of overall health; axes: size of change (none to large) vs. health status (poor to good); separate curves suggested for “Physical function?” and “Cognition?”.]