MEASUREMENT. Goal: To develop reliable and valid measures using state-of-the-art measurement models. Members: Chang, Berdes, Gehlert, Gibbons, Schrauf, Weiss.

Similar presentations
Labeling claims for patient-reported outcomes (A regulatory perspective) FDA/Industry Workshop Washington, DC September 16, 2005 Lisa A. Kammerman, Ph.D.

Introduction to IRT/Rasch Measurement with Winsteps Ken Conrad, University of Illinois at Chicago Barth Riley and Michael Dennis, Chestnut Health Systems.
Implications and Extensions of Rasch Measurement.
Test Development.
Hong Jiao, George Macredy, Junhui Liu, & Youngmi Cho (2012)
The effect of differential item functioning in anchor items on population invariance of equating Anne Corinne Huggins University of Florida.
Research Curriculum Session II –Study Subjects, Variables and Outcome Measures Jim Quinn MD MS Research Director, Division of Emergency Medicine Stanford.
Lecture 3 Validity of screening and diagnostic tests
Addition 1’s to 20.
Principles of Measurement Lunch & Learn Oct 16, 2013 J Tobon & M Boyle.
Week 1.
ASSESSING RESPONSIVENESS OF HEALTH MEASUREMENTS. Link validity & reliability testing to purpose of the measure Some examples: In a diagnostic instrument,
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
Part II Sigma Freud & Descriptive Statistics
Item Response Theory in Health Measurement
PROMIS DEVELOPMENT METHODS, ANALYSES AND APPLICATIONS Presented at the Patient-Reported Outcomes Measurement Information System (PROMIS): A Resource for.
Introduction to Item Response Theory
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
Models for Measuring. What do the models have in common? They are all cases of a general model. How are people responding? What are your intentions in.
Overview of field trial analysis procedures National Research Coordinators Meeting Windsor, June 2008.
Latent Change in Discrete Data: Rasch Models
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Chapter 9 Flashcards. measurement method that uses uniform procedures to collect, score, interpret, and report numerical results; usually has norms and.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
1 Reducing the duration and cost of assessment with the GAIN: Computer Adaptive Testing.
Computerized Adaptive Testing: What is it and How Does it Work?
Measurement and Data Quality
Identification of Misfit Item Using IRT Models Dr Muhammad Naveed Khalid.
Item Response Theory for Survey Data Analysis EPSY 5245 Michael C. Rodriguez.
Item Response Theory. What’s wrong with the old approach? Classical test theory –Sample dependent –Parallel test form issue Comparing examinee scores.
1 Item Analysis - Outline 1. Types of test items A. Selected response items B. Constructed response items 2. Parts of test items 3. Guidelines for writing.
Modern Test Theory Item Response Theory (IRT). Limitations of classical test theory An examinee’s ability is defined in terms of a particular test The.
Instrumentation.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
PROMIS®: Advancing the Science of PRO measurement Common Data Elements NIH CDE Webinar September 8, 2015 Ashley Wilder Smith, PhD, MPH Chief, Outcomes.
Multiple Perspectives on CAT for K-12 Assessments: Possibilities and Realities Alan Nicewander Pacific Metrics 1.
MEASUREMENT: SCALE DEVELOPMENT Lu Ann Aday, Ph.D. The University of Texas School of Public Health.
Pearson Copyright 2010 Some Perspectives on CAT for K-12 Assessments Denny Way, Ph.D. Presented at the 2010 National Conference on Student Assessment June.
Item Response Theory (IRT) Models for Questionnaire Evaluation: Response to Reeve Ron D. Hays October 22, 2009, ~3:45-4:05pm
Assessing Responsiveness of Health Measurements Ian McDowell, INTA, Santiago, March 20, 2001.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Psychometric Evaluation of Questionnaire Design and Testing Workshop December , 10:00-11:30 am Wilshire Suite 710 DATA.
Reliability a measure is reliable if it gives the same information every time it is used. reliability is assessed by a number – typically a correlation.
Considerations in Comparing Groups of People with PROs Ron D. Hays, Ph.D. UCLA Department of Medicine May 6, 2008, 3:45-5:00pm ISPOR, Toronto, Canada.
Item Response Theory Dan Mungas, Ph.D. Department of Neurology University of California, Davis.
Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient-Reported.
The Invariance of the easyCBM® Mathematics Measures Across Educational Setting, Language, and Ethnic Groups Joseph F. Nese, Daniel Anderson, and Gerald.
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Measurement and Data Quality.
© 2009 Pearson Prentice Hall, Salkind. Chapter 5 Measurement, Reliability and Validity.
Instrument Development and Psychometric Evaluation: Scientific Standards May 2012 Dynamic Tools to Measure Health Outcomes from the Patient Perspective.
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 25 Critiquing Assessments Sherrilene Classen, Craig A. Velozo.
Vertical Scaling in Value-Added Models for Student Learning
UCLA Department of Medicine
Evaluating Multi-Item Scales
Assessment Research Centre Online Testing System (ARCOTS)
Introduction to ASCQ-Me®
The psychometrics of Likert surveys: Lessons learned from analyses of the 16pf Questionnaire Alan D. Mead.
Mohamed Dirir, Norma Sinclair, and Erin Strauts
A Multi-Dimensional PSER Stopping Rule
Item Response Theory Applications in Health Ron D. Hays, Discussant
Presentation transcript:

MEASUREMENT. Goal: To develop reliable and valid measures using state-of-the-art measurement models. Members: Chang, Berdes, Gehlert, Gibbons, Schrauf, Weiss.

Why Item Response Theory?

Classical Test Theory (Traditional) | Item Response Theory (Modern)
Measures of precision fixed for all scores | Precision measures vary across scores
Longer scales increase reliability | Shorter, targeted scales can be equally reliable (short forms)
Scale properties are sample dependent | Item & scale properties are invariant within a linear transformation (DIF)
Comparing person scores depends on the item set | Person scores are comparable across different item sets (CAT)
Comparing respondents requires parallel scales | Different scales can be placed on a common metric (instrument linking/equating)
Mixed item formats lead to unbalanced impact on total scale scores | Easily handles mixed item formats
Summed scores are on an ordinal scale | Scores are on an interval scale
(no equivalent) | Graphical tools for item and scale analysis
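
The first row is the practical heart of the comparison: in IRT, measurement precision depends on where a respondent falls on the trait. A minimal numeric sketch, assuming the standard two-parameter logistic model with made-up item parameters (nothing here comes from the slides):

```python
import math

def p_2pl(theta, a, b):
    """2-PL probability of endorsing an item (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def test_se(theta, items):
    """Standard error of measurement at theta: 1 / sqrt(test information)."""
    info = sum(a * a * p_2pl(theta, a, b) * (1.0 - p_2pl(theta, a, b))
               for a, b in items)
    return 1.0 / math.sqrt(info)

# Five hypothetical items clustered around the middle of the trait range
items = [(1.5, -1.0), (1.7, -0.5), (2.0, 0.0), (1.7, 0.5), (1.5, 1.0)]
for theta in (-3, -1, 0, 1, 3):
    print(f"theta={theta:+d}  SE={test_se(theta, items):.2f}")
# SE is smallest near the middle of the scale and grows toward the extremes,
# unlike CTT's single reliability figure applied to every score.
```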

Item Response Theory (IRT): A family of mathematical descriptions of what happens when a person meets a test or survey question. IRT relates characteristics of items (item parameters) and characteristics of persons (latent traits) to the probability of a correct, rated, or categorical response, modeling test-taking behavior at the item level. Formally, IRT is a statistical theory consisting of mathematical models that express the probability of endorsing a particular response to a test or survey item as a function of the person's ability or latent trait and certain characteristics of the item.
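
As a concrete illustration (the standard two-parameter logistic form; the notation is conventional rather than taken from these slides), the probability that a person with latent trait $\theta$ endorses item $i$ is

$$P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},$$

where $b_i$ locates the item on the trait continuum (difficulty) and $a_i$ governs how sharply the item separates persons above and below that location (discrimination).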

Item-Person Map

[Figure: an item-person map. Persons are ordered on the latent trait from Poor to Good; item locations span the same continuum, from likely ("easy") to unlikely ("hard"). Chang & Gehlert (2002).]

Dichotomous Unidimensional IRT Models
- 1-PL (Rasch): difficulty (b)
- 2-PL: adds discrimination (a)
- 3-PL: adds guessing (c)
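
A minimal sketch of how these nested models turn person and item parameters into a response probability (the logistic form and the function name are illustrative, not from the slides):

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """3-PL probability of a correct/endorsed response.

    With a=1 and c=0 this reduces to the 1-PL (Rasch) model;
    with c=0 it reduces to the 2-PL model.
    theta: person trait; b: difficulty; a: discrimination; c: guessing floor.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A person slightly above an item's difficulty under each model
print(irt_probability(0.5, b=0.0))                 # 1-PL
print(irt_probability(0.5, b=0.0, a=1.7))          # 2-PL: steeper item
print(irt_probability(0.5, b=0.0, a=1.7, c=0.2))   # 3-PL: guessing raises the floor
```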

Polytomous IRT Models
- 1-PL (threshold): Partial Credit, Rating Scale
- 2-PL (threshold & discrimination): Nominal, Graded Response, Generalized Partial Credit

Example item: "Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports"
1 = Yes, limited a lot; 2 = Yes, limited a little; 3 = No, not limited at all
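
For instance, the graded response model (a standard formulation, not specific to these slides) works with the cumulative probability of responding in category $k$ or higher,

$$P(X_i \ge k \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_{ik})}},$$

so the probability of choosing exactly category $k$ is $P(X_i \ge k \mid \theta) - P(X_i \ge k+1 \mid \theta)$, with ordered thresholds $b_{i1} < b_{i2} < \dots$ separating "limited a lot", "limited a little", and "not limited at all".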

Potential Advantages of Using IRT in "Geriatric" Pain Assessment
- Refine existing instruments
- Evaluate item and scale characteristics
- Evaluate different response formats
- Detect differential item functioning
- Evaluate person fit (clinical diagnosis)
- Equate/link instruments
- Establish item banks and brief forms
- Develop computerized adaptive testing

Item Banking and CAT

[Figure: flow diagram. Item pools (sets of questions A through F) are IRT-calibrated, new items are added over time, and the calibrated items form an item bank (catalogued and hierarchically structured), which in turn feeds CAT and brief forms.]

Principles of Adaptive Testing
- An IRT pre-calibrated item bank
- Initial item selection
- A test scoring method: a procedure for estimating a person's trait or ability level
- Item selection during test administration: a procedure for choosing, from the available item bank, the item that is maximally informative at the person's current trait-level estimate
- Stopping rules: a termination rule used to discontinue item administration

Item Bank
- A set of carefully IRT-calibrated questions
- Items cover the entire latent trait continuum
- Items represent differing amounts of the trait
- Items represent differing amounts of information
- The basis for tailored/adaptive testing
- Items can be selected to maximize precision and retain clinical relevance
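
A minimal sketch of what one calibrated entry in such a bank might store (the field names and the example item are illustrative assumptions, not a published schema):

```python
from dataclasses import dataclass, field

@dataclass
class BankItem:
    """One IRT-calibrated question in an item bank."""
    item_id: str
    text: str
    domain: str                    # content area, e.g. "pain interference"
    a: float                       # discrimination
    b: float                       # difficulty / location on the trait continuum
    c: float = 0.0                 # guessing floor (usually 0 for health items)
    response_categories: list[str] = field(default_factory=list)

item = BankItem(
    item_id="PAIN-017",
    text="How much did pain interfere with your day-to-day activities?",
    domain="pain interference",
    a=1.8,
    b=0.4,
    response_categories=["Not at all", "A little bit", "Somewhat",
                         "Quite a bit", "Very much"],
)
```

Cataloguing items this way is what lets a CAT engine filter by domain and pick the next item by its calibrated parameters.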

Item Banking is Inter-disciplinary
- Psychometricians
- Information scientists
- Clinicians/healthcare providers
- Outcomes researchers
- Content experts
- …

Approaches to Develop Item Banks
- Top-down approach
- Bottom-up approach

Development and Maintenance of an Item Bank
- How best to calibrate existing items? Model selection; whose item parameters to use; standardization; generic vs. disease-specific
- Item parameter drift: anchor or re-calibrate?
- How to write and best test new items?

Adaptive Test: An adaptive test is a tailored, individualized measure that involves selecting a set of test items for each individual that best measures the psychological characteristics of that person (Weiss, 1985). Weiss DJ. Adaptive testing by computer. J Consult Clin Psychol. Dec 1985;53(6):774-789.

Why Computerized Adaptive Testing?
- Selects questions based on previous responses
- Tailors item and test difficulty to the examinee
- Eliminates floor and ceiling effects
- Requires fewer questions to arrive at an accurate estimate
- Automates question administration, data recording, scoring, and prompt reporting
- Allows for immediate feedback

Adaptive testing is a process of test administration in which items are selected on the basis of the examinee's responses to previously administered items. CAT is a special type of computerized testing that targets the "difficulty" of questions to the "ability" of examinees.

CAT Algorithm
1. Administer an item of median difficulty (or a screening item).
2. Score the item.
3. Estimate the latent trait (theta).
4. If the termination criterion is satisfied, stop; otherwise choose and administer the next item with maximum information and return to step 2.
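
A minimal sketch of this loop for a bank of dichotomous 2-PL items, assuming maximum-information item selection and a coarse grid-search maximum-likelihood estimate of theta (the function names, grid, and stopping threshold are illustrative choices, not the slide authors' implementation):

```python
import math

def p_2pl(theta, a, b):
    """2-PL probability of a correct/endorsed response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2-PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses, grid):
    """Grid-search MLE of theta; responses is a list of (a, b, score in {0, 1})."""
    def log_lik(theta):
        return sum(math.log(p_2pl(theta, a, b) if score == 1
                            else 1.0 - p_2pl(theta, a, b))
                   for a, b, score in responses)
    return max(grid, key=log_lik)

def run_cat(bank, answer_item, max_items=20, min_info=0.2):
    """bank: dict item_id -> (a, b); answer_item: callable item_id -> 0 or 1."""
    grid = [g / 10.0 for g in range(-40, 41)]    # theta grid from -4 to +4
    available = dict(bank)
    theta, responses = 0.0, []
    while available and len(responses) < max_items:
        # Steps 1/4: pick the item most informative at the current estimate;
        # with theta starting at 0, this approximates "median difficulty".
        item_id = max(available,
                      key=lambda i: item_information(theta, *available[i]))
        a, b = available.pop(item_id)
        score = answer_item(item_id)             # Step 2: administer and score
        responses.append((a, b, score))
        theta = estimate_theta(responses, grid)  # Step 3: update theta
        # Stop when no remaining item would add much information
        remaining = (item_information(theta, *ab) for ab in available.values())
        if max(remaining, default=0.0) < min_info:
            break
    return theta, len(responses)
```

Here answer_item stands in for the actual administration step, e.g. presenting the question on screen and recording the respondent's answer.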

Increase in Accuracy of Ability or Latent Trait Estimation in CAT

[Figure: confidence intervals around the ability (θ) estimate after items 1, 1-2, 1-3, 1-4, and 1-5. For each item added to the test, the width of the interval decreases.]
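
The standard results behind this narrowing (conventional IRT formulas, not specific to these slides): test information is the sum of the administered items' informations, and the standard error of the trait estimate is its inverse square root,

$$I(\theta) = \sum_{i=1}^{n} I_i(\theta), \qquad \mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}},$$

so each additional item can only increase $I(\theta)$ and therefore shrink the interval around $\hat{\theta}$.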

Potential Problems with CAT in Pain and Health Outcomes Measurement
- Context effects
- Unbalanced content
- Time frame
- Response categories
- Multidimensionality

What kind of short form? Candidate items come in very different formats:

Graded severity statements (Question 1):
0 I do not feel sad.
1 I feel sad.
2 I am sad all the time and I can't snap out of it.
3 I am so sad or unhappy that I can't stand it.

Frequency ratings:
1. I was bothered by things that usually don't bother me.
Rarely or none of the time (less than 1 day) / Some or a little of the time (1-2 days) / Occasionally or a moderate amount of time (3-4 days) / All of the time (5-7 days)

Dichotomous: Are you basically satisfied with your life? True/False

MORE Research Still Needed for Effective CAT Implementation
- Item production
- Item statistics
- Item exposure
- Maintaining a valid bank of items for test construction
- Fairness
- Delivery options
- Effects of modes of administration
- Cost-benefit considerations

Infrastructure of a National Geriatric Pain Item Bank

[Figure: architecture diagram. Subscribers (the public, individual researchers, pharmaceutical industries, non-profit institutions, government agencies) access a national "central" item bank through four components: Collector, Analyzer, Builder, and Retriever. Items enter the bank via consortium approval and IRT analyses that supply item parameters; the bank delivers customized information retrieval, CAT, and (automated) brief forms.]

An Integrated Solution for Pain and Outcomes Assessments. Chang, C.-H., & Yang, D. (2003, April 15). Patient-Reported Outcomes Information Technology: The PROsIT™ System. ISPOR Connections, 9(2), 5-6.