Multitrait Scaling and IRT: Part I

Similar presentations
Structural Equation Modeling Using Mplus Chongming Yang Research Support Center FHSS College.

Reliability Definition: The stability or consistency of a test. Assumption: True score = obtained score +/- error Domain Sampling Model Item Domain Test.
1 Evaluating Multi-Item Scales Using Item-Scale Correlations and Confirmatory Factor Analysis Ron D. Hays, Ph.D. RCMAR.
Overview of field trial analysis procedures National Research Coordinators Meeting Windsor, June 2008.
Reliability and Validity
When Measurement Models and Factor Models Conflict: Maximizing Internal Consistency James M. Graham, Ph.D. Western Washington University ABSTRACT: The.
Session 3 Normal Distribution Scores Reliability.
Classical Test Theory By ____________________. What is CTT?
Structural Equation Modeling Intro to SEM Psy 524 Ainsworth.
Multivariate Methods EPSY 5245 Michael C. Rodriguez.
Primer on Evaluating Reliability and Validity of Multi-Item Scales Questionnaire Design and Testing Workshop October 25, 2013, 3:30-5:00pm Wilshire.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
Unanswered Questions in Typical Literature Review 1. Thoroughness – How thorough was the literature search? – Did it include a computer search and a hand.
The ABC’s of Pattern Scoring Dr. Cornelia Orr. Slide 2 Vocabulary Measurement – Psychometrics is a type of measurement Classical test theory Item Response.
User Study Evaluation Human-Computer Interaction.
1 Chapter 4 – Reliability 1. Observed Scores and True Scores 2. Error 3. How We Deal with Sources of Error: A. Domain sampling – test items B. Time sampling.
Tests and Measurements Intersession 2006.
Assessing Learners with Special Needs: An Applied Approach, 6e © 2009 Pearson Education, Inc. All rights reserved. Chapter 4:Reliability and Validity.
1 Finding and Verifying Item Clusters in Outcomes Research Using Confirmatory and Exploratory Analyses Ron D. Hays, PhD May 13, 2003; 1-4 pm; Factor
Measurement Models: Exploratory and Confirmatory Factor Analysis James G. Anderson, Ph.D. Purdue University.
Measurement Issues by Ron D. Hays UCLA Division of General Internal Medicine & Health Services Research (June 30, 2008, 4:55-5:35pm) Diabetes & Obesity.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Assessing Measurement Quality in Quantitative Studies.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 6, 2008 (3:30-6:30pm) HS 249F.
Multi-item Scale Evaluation Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.
CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.
Psychometric Evaluation of Questionnaire Design and Testing Workshop December , 10:00-11:30 am Wilshire Suite 710 DATA.
Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 8, 2006 (3:00-6:00pm) HS 249F.
Reliability: Introduction. Reliability Session Definitions & Basic Concepts of Reliability Theoretical Approaches Empirical Assessments of Reliability.
Measurement Experiment - effect of IV on DV. Independent Variable (2 or more levels) MANIPULATED a) situational - features in the environment b) task.
2. Main Test Theories: The Classical Test Theory (CTT) Psychometrics. 2011/12. Group A (English)
Evaluation of structural equation models Hans Baumgartner Penn State University.
Discussion of Issues and Approaches Presented by Templin and Teresi
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire.
Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient- Reported.
Evaluating Multi-item Scales Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research HS237A 11/10/09, 2-4pm,
Lesson 2 Main Test Theories: The Classical Test Theory (CTT)
Chapter 2 Norms and Reliability. The essential objective of test standardization is to determine the distribution of raw scores in the norm group so that.
Evaluating Patient-Reports about Health
MGMT 588 Research Methods for Business Studies
Psychometric Evaluation of Items Ron D. Hays
UCLA Department of Medicine
Evaluating Multi-Item Scales
assessing scale reliability
Classical Test Theory Margaret Wu.
Evaluating IRT Assumptions
12 Inferential Analysis.
Final Session of Summer: Psychometric Evaluation
Reliability, validity, and scaling
PSY 614 Instructor: Emily Bullock, Ph.D.
By ____________________
EPSY 5245 EPSY 5245 Michael C. Rodriguez
Confirmatory Factor Analysis
15.1 The Role of Statistics in the Research Process
Evaluating the Psychometric Properties of Multi-Item Scales
Evaluating Multi-item Scales
Item Analysis: Classical and Beyond
Psychometric testing and validation (Multi-trait scaling and IRT)
Introduction to Psychometric Analysis of Survey Data
Structural Equation Modeling
Evaluating the Significance of Individual Change
Presentation transcript:

Multitrait Scaling and IRT: Part I
Ron D. Hays, Ph.D. (drhays@ucla.edu)
http://www.gim.med.ucla.edu/FacultyPages/Hays/
Questionnaire Design and Testing Workshop
2nd Floor Conference Room
October 16, 2009 (3-5pm)

Multitrait Scaling Analysis • Internal consistency reliability – Item convergence

Hypothetical Item-Scale Correlations

Measurement Error

observed score = true score + systematic error (bias) + random error

Cronbach’s Alpha

Ratings of two items by five respondents:

Respondent   Item 1   Item 2
01              5        5
02              4        5
03              4        2
04              3        5
05              2        2

Source                 df     SS     MS
Respondents (BMS)       4   11.6    2.9
Items (JMS)             1    0.1    0.1
Resp. x Items (EMS)     4    4.4    1.1
Total                   9   16.1

Alpha = (MS_BMS - MS_EMS) / MS_BMS = (2.9 - 1.1) / 2.9 = 1.8 / 2.9 = 0.62
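A minimal Python check of this calculation, using only numpy and the 5 x 2 rating matrix from the table above:

import numpy as np

# Ratings from the slide: 5 respondents x 2 items
X = np.array([[5, 5],
              [4, 5],
              [4, 2],
              [3, 5],
              [2, 2]], dtype=float)

n, k = X.shape                 # respondents, items
grand = X.mean()

ss_total = ((X - grand) ** 2).sum()                    # 16.1
ss_resp  = k * ((X.mean(axis=1) - grand) ** 2).sum()   # 11.6  (BMS)
ss_items = n * ((X.mean(axis=0) - grand) ** 2).sum()   #  0.1  (JMS)
ss_error = ss_total - ss_resp - ss_items               #  4.4  (EMS)

ms_bms = ss_resp / (n - 1)                 # 2.9
ms_ems = ss_error / ((n - 1) * (k - 1))    # 1.1

alpha = (ms_bms - ms_ems) / ms_bms
print(round(alpha, 2))                     # 0.62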

Alpha for Different Numbers of Items and Homogeneity

Standardized alpha: alpha_st = (k * r) / (1 + (k - 1) * r)

Number of          Average Inter-item Correlation (r)
Items (k)      .0     .2     .4     .6     .8    1.0
    2        .000   .333   .572   .750   .889  1.000
    4        .000   .500   .727   .857   .941  1.000
    6        .000   .600   .800   .900   .960  1.000
    8        .000   .666   .842   .924   .970  1.000
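A quick Python check that reproduces the table (up to rounding) from the standardized alpha formula:

def alpha_standardized(k, r):
    # Standardized alpha for k items with average inter-item correlation r
    return k * r / (1 + (k - 1) * r)

for k in (2, 4, 6, 8):
    row = [round(alpha_standardized(k, r), 3) for r in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)]
    print(k, row)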

Spearman-Brown Prophecy Formula

alpha_y = (N * alpha_x) / (1 + (N - 1) * alpha_x)

N = how much longer scale y is than scale x

Example Spearman-Brown Calculation

Projected alpha for the MHI-18, given alpha = 0.98 for the 32-item MHI (N = 18/32):

alpha = (18/32)(0.98) / (1 + (18/32 - 1)(0.98)) = 0.55125 / 0.57125 = 0.96
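The same calculation as a small Python helper:

def spearman_brown(alpha_x, n):
    # Projected alpha for scale y, which is n times as long as scale x
    return n * alpha_x / (1 + (n - 1) * alpha_x)

# MHI-18 projected from the 32-item MHI (alpha = 0.98): n = 18/32
print(round(spearman_brown(0.98, 18 / 32), 2))   # 0.96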

Number of Items and Reliability for Three Versions of the Mental Health Inventory (MHI)

Reliability Minimum Standards
• 0.70 or above (for group comparisons)
• 0.90 or higher (for individual assessment)

Standard error of measurement: SEM = SD * (1 - reliability)^(1/2)
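A one-line Python version of the SEM formula; the SD = 10 (T-score metric) and reliability = 0.90 values below are illustrative, not from the slides:

def sem(sd, reliability):
    # Standard error of measurement: SD * sqrt(1 - reliability)
    return sd * (1 - reliability) ** 0.5

print(round(sem(10, 0.90), 2))   # 3.16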

Intraclass Correlation and Reliability

One-way model:
  Reliability: (MS_BMS - MS_WMS) / MS_BMS
  ICC:         (MS_BMS - MS_WMS) / (MS_BMS + (K - 1) MS_WMS)

Two-way fixed model:
  Reliability: (MS_BMS - MS_EMS) / MS_BMS
  ICC:         (MS_BMS - MS_EMS) / (MS_BMS + (K - 1) MS_EMS)

Two-way random model:
  Reliability: N (MS_BMS - MS_EMS) / (N MS_BMS + MS_JMS - MS_EMS)
  ICC:         (MS_BMS - MS_EMS) / (MS_BMS + (K - 1) MS_EMS + K (MS_JMS - MS_EMS) / N)

(BMS = between-respondents, WMS = within-respondents, JMS = between-items, EMS = respondent x item error; K = number of items, N = number of respondents.)
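A minimal sketch applying the three models to the mean squares from the Cronbach's alpha example above (n = 5 respondents, k = 2 items):

n, k = 5, 2
ms_bms, ms_jms, ms_ems = 2.9, 0.1, 1.1
ms_wms = (0.1 + 4.4) / (n * (k - 1))   # items + error SS pooled within respondents = 0.9

rel_one_way = (ms_bms - ms_wms) / ms_bms                              # 0.69
icc_one_way = (ms_bms - ms_wms) / (ms_bms + (k - 1) * ms_wms)         # 0.53

rel_fixed   = (ms_bms - ms_ems) / ms_bms                              # 0.62 = alpha
icc_fixed   = (ms_bms - ms_ems) / (ms_bms + (k - 1) * ms_ems)         # 0.45

rel_random  = n * (ms_bms - ms_ems) / (n * ms_bms + ms_jms - ms_ems)  # 0.67
icc_random  = (ms_bms - ms_ems) / (ms_bms + (k - 1) * ms_ems
                                   + k * (ms_jms - ms_ems) / n)       # 0.50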

Multitrait Scaling Analysis • Internal consistency reliability – Item convergence • Item discrimination

Hypothetical Multitrait/Multi-Item Correlation Matrix

Multitrait/Multi-Item Correlation Matrix for Patient Satisfaction Ratings

                   Technical  Interpersonal  Communication  Financial
Technical 1          0.66*       0.63†          0.67†         0.28
Technical 2          0.55*       0.54†          0.50†         0.25
Technical 3          0.48*       0.41           0.44†         0.26
Technical 4          0.59*       0.53           0.56†         0.26
Technical 5          0.55*       0.60†          0.56†         0.16
Technical 6          0.59*       0.58†          0.57†         0.23
Interpersonal 1      0.58        0.68*          0.63†         0.24
Interpersonal 2      0.59†       0.58*          0.61†         0.18
Interpersonal 3      0.62†       0.65*          0.67†         0.19
Interpersonal 4      0.53†       0.57*          0.60†         0.32
Interpersonal 5      0.54        0.62*          0.58†         0.18
Interpersonal 6      0.48†       0.48*          0.46†         0.24

Note – Standard error of each correlation is 0.03. Technical = satisfaction with technical quality. Interpersonal = satisfaction with the interpersonal aspects. Communication = satisfaction with communication. Financial = satisfaction with financial arrangements.
* Item-scale correlation for the hypothesized scale (corrected for item overlap).
† Correlation within two standard errors of the correlation of the item with its hypothesized scale.
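A sketch of how such a matrix can be computed in Python with pandas; the scale layout and column names are hypothetical, and an item's correlation with its own scale drops the item from the scale score (the overlap correction noted above):

import pandas as pd

def multitrait_matrix(items: pd.DataFrame, scales: dict) -> pd.DataFrame:
    # items:  respondents x item columns
    # scales: {scale name: [item column names]}
    out = {}
    for scale, cols in scales.items():
        for item in cols:
            row = {}
            for s, s_cols in scales.items():
                keep = [c for c in s_cols if c != item]   # overlap correction
                row[s] = items[item].corr(items[keep].sum(axis=1))
            out[item] = row
    return pd.DataFrame(out).T   # rows = items, columns = scales

Item convergence is commonly judged by an own-scale correlation of at least about 0.30-0.40; item discrimination by the own-scale correlation exceeding the item's correlations with the other scales.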

Confirmatory Factor Analysis
• Compares observed covariances with covariances generated by the hypothesized model
• Statistical and practical tests of fit
• Factor loadings
• Correlations between factors
• Regression coefficients
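As a sketch, a CFA of this kind can be run in Python with the semopy package (one option among several; the factor names, item names, and random data below are hypothetical, for syntax illustration only):

import numpy as np
import pandas as pd
import semopy

# Hypothetical data: 200 respondents, 3 items per factor (random for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 6)),
                  columns=["t1", "t2", "t3", "i1", "i2", "i3"])

desc = """
Technical     =~ t1 + t2 + t3
Interpersonal =~ i1 + i2 + i3
"""

model = semopy.Model(desc)        # lavaan-style measurement model
model.fit(df)
print(model.inspect())            # loadings and factor correlations
print(semopy.calc_stats(model))   # chi-square and fit indices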

Fit Indices

Normed fit index (NFI):
  (chi2_null - chi2_model) / chi2_null

Non-normed fit index (NNFI):
  (chi2_null/df_null - chi2_model/df_model) / (chi2_null/df_null - 1)

Comparative fit index (CFI):
  1 - (chi2_model - df_model) / (chi2_null - df_null)
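The three indices as a Python function; the chi-square values in the example call are illustrative, not from the slides (in practice CFI is also truncated to the 0-1 range):

def fit_indices(chi2_model, df_model, chi2_null, df_null):
    nfi = (chi2_null - chi2_model) / chi2_null
    nnfi = ((chi2_null / df_null - chi2_model / df_model)
            / (chi2_null / df_null - 1))
    cfi = 1 - (chi2_model - df_model) / (chi2_null - df_null)
    return nfi, nnfi, cfi

nfi, nnfi, cfi = fit_indices(chi2_model=45.0, df_model=24,
                             chi2_null=500.0, df_null=36)
print(round(nfi, 3), round(nnfi, 3), round(cfi, 3))   # 0.91 0.932 0.955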

Latent Trait and Item Responses

[Path diagram: a latent trait points to the responses for Items 1-3, with probabilities P(X1=0), P(X1=1); P(X2=0), P(X2=1); and P(X3=0), P(X3=1), P(X3=2).]

In IRT, the observed variables are responses to specific items (e.g., 0, 1). The latent variable (trait) influences the probabilities of responses to the items (e.g., P(X1=1)). Notice that the number of response options varies across items: in IRT, the response formats do not need to be the same across items.

Item Responses and Trait Levels

[Figure: Persons 1, 2, and 3 located at different points along the trait continuum, with their expected item responses.]

Item Response Theory (IRT)

IRT models the relationship between a person's response Y_i to question i and his or her level of the latent construct θ being measured. The threshold parameter b_ik estimates how difficult it is for item i to receive a score of k or more, and the discrimination parameter a_i estimates the discriminatory power of item i. If, for one group versus another at the same level of θ, we observe systematically different probabilities of scoring k or above, then item i displays differential item functioning (DIF).
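A minimal sketch of the boundary probability in a 2-parameter logistic model, showing the roles of a_i and b_ik (the parameter values are illustrative):

import math

def p_k_or_more(theta, a_i, b_ik):
    # 2-parameter logistic: P(score >= k) given trait level theta,
    # discrimination a_i, and threshold b_ik
    return 1 / (1 + math.exp(-a_i * (theta - b_ik)))

# A steeper (more discriminating) item separates nearby trait levels more sharply
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(p_k_or_more(theta, a_i=2.0, b_ik=0.0), 2))   # 0.12, 0.5, 0.88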

Item Characteristic Curves (2-Parameter Model)

PROMIS Assessment Center http://www.nihpromis.org/ http://www.assessmentcenter.net/ac1/