Download presentation
Presentation is loading. Please wait.
1
Psychometric analyses of ADNI data Paul K. Crane, MD MPH Department of Medicine University of Washington
2
Disclaimer Funding for this conference was made possible, in part by Grant R13 AG030995 from the National Institute on Aging. The views expressed do not necessarily reflect official policies of the Department of Health and Human Services; nor does mention by trade names, commercial practices, or organizations imply endorsement by the U.S. Government.
3
Outline ADNI neuropsychological battery Latent variable approaches –SEM and IRT ADAS-Cog in ADNI Memory in ADNI Executive functioning in ADNI
4
ADNI Neuropsychological Battery
5
Handout There is a handout that summarizes these tests and provides the variable names for the variables in the dataset
6
ADNI Neuropsychological Battery
7
MCI
8
AD
9
Alternate Word Lists There are also two versions of the Rey AVLT that are alternated
10
Summary Repeated administration of a rich neuropsychological battery at 6 month intervals for 2 (AD) or 3 (NC, MCI) years How do we drink from that fire hose?
11
Strategies for analyzing these data Pick a couple of tests and ignore the others –ADAS-Cog and MMSE –CDR and CDR-SB Modifications of those tests –ADAS-Tree –ADAS-Rasch Composite scores for specific domains –Z score –Something fancier using latent variable approach
12
Outline ADNI neuropsychological battery Latent variable approaches –SEM and IRT ADAS-Cog in ADNI Memory in ADNI Executive functioning in ADNI
13
Latent variable approach “Items” not intrinsically interesting, only as indicators of the underlying thing measured by the test Many nice properties follow
14
Parallel development 1: SEM “Measurement part” of the model specifies how latent constructs are modeled “Structural part” of the model specifies relationships between latent constructs and each other, and between latent constructs and other covariates
15
http://sites.google.com/site/lvmworkshop/home/downloads-general/2010-downloads
16
Bunch of indicators limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl
17
“Memory” Underlying single factor with many indicators limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory
18
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory Rey word list ADAS word list MMSE words LM story
19
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory Rey word list ADAS word list MMSE words LM story HC volume This is what we care about!
20
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory* HC volume This is what we care about! … and typically don’t care whether memory is modeled this way or with all that secondary structure
21
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory Rey word list ADAS word list MMSE words LM story HC volume This is what we care about! And sometimes we care about it for 600,000 SNPs, or for voxels – we need to move outside of an SEM package for some of the analyses we want to do
22
Parallel development 2: IRT Models are nested within SEM Single factor confirmatory factor analysis model Initially worked out with binary indicators Extended 1960s to polytomous items (Samejima) It’s only the measurement part Attention to the quality of measurement and the quality of scores
23
Typical SEM example Construct Indicator 1 Indicator 2 Indicator 3 Indicator 4
24
Depression Beck Zung CESD PHQ-9
25
A closer look at PHQ-9 A 9 item depression scale Standard scores totalled up Typical SEM model would take that total score and treat it as a continuous indicator by using a linear link (single loading parameter) PHQ-9
26
PHQ-9 measurement properties
27
IRT approach to PHQ-9
28
SEM and IRT, then and now SEM was initially about total scores as indicators of constructs measured in common across tests IRT was initially about item level data that had to satisfy assumptions More recently: merging the strengths of the two approaches –Computational rather than conceptual advances, or maybe computational advances have fueled conceptual advances
29
Categorical data in SEM Runmplus code: That little code snippet tells Mplus to treat all of the elements in the local `vlist’ as categorical data Mplus default is WLSMV –Other appropriate ways of handling Major reason Mplus is dominant SEM software used at FH
30
What about IRT? Array of tools to address measurement precision Explicit focus on measurement properties and measurement precision differentiates it from SEM
31
Pretend for a moment that a single factor model was appropriate… Item response theory (IRT) developed middle of last century –Lord and Novick / Birnbaum (1968) Polytomous extension Samejima 1969 Lord 1980 Hambleton et al. 1991 XCALIBRE, Parscale, Multilog –All variations on a single factor CFA model
32
4 items each at 0.5 increments
33
Comments on that test Essentially linear test characteristic curve –Immaterial whether the standard score or the IRT score is used in analyses No ceiling or floor effect –People at the extremes of the thing measured by the test will get some right and get some wrong Pretty nice test!
34
2 items each at 0.5 increments
35
Comments on that test Essentially linear test characteristic curve –Immaterial whether the standard score or the IRT score is used in analyses No ceiling or floor effect –People at the extremes of the thing measured by the test will get some right and get some wrong Pretty nice test! But that’s what we said about the last one and it had twice as many items!
36
Why might we want twice as many items? Measurement precision / reliability CTT: summarized in a single number: alpha IRT: conceptualized as a quantity that may vary across the range of the test Information Mathematical relationship between information and standard error of measurement Intuitively makes sense that a test with 2x the items will measure more precisely / more reliably than a test with 1x the items
37
Test information curves for those two tests
38
Standard errors of measurement for the two tests
39
Comments about these information and SEM curves Information curves look more different than the SEM curves –Inverse square root relationship –TIC 100 SEM 0.10(1/10) –TIC 25 SEM 0.20(1/5) –TIC 16 SEM 0.25(1/4) –TIC 9 SEM 0.33(1/3) –TIC 4 SEM 0.50(1/2) Trade off between test length and measurement precision
40
These were highly selected “tests” It would be possible to design such a test if we started with a robust item pool Almost certainly not going to happen by accident / history What are more realistic tests?
41
Test characteristic curves for 2 26- item dichotomous tests
42
Comments on these TCCs Same number of items but very different shapes Now it may matter whether you use an IRT score or a standard score in analyses Both ceiling and floor effects
43
TICs
44
SEMs
45
Comments on the TICs and SEMs Comparing the red test and the blue test: the red test is better for people of moderate ability (more items close to where they are) –For people right in the middle, measurement precision is just as good as a test twice as long Items far away from your ability level don’t help your standard error The blue test is better for people at the extremes (more items close to where they are)
46
Where do information curves come from? Item information curves use the same parameters as the item characteristic curves (difficulty level, b, and strength of association with latent trait or ability, a) (see next slides) Test information is the sum of all of the item information curves –We can do that because of local independence
47
I(θ) = D 2 a 2 *P(θ)*Q(θ)
52
More thoughts on IICs The b parameters shift the location of the curve left and right (where is the bead on the string?) The a parameters modify the height of the curve –But not tons; location location location IRT gives us tools to evaluate and manage measurement precision
53
What about cognitive tests?
54
Cognitive screening tests
58
Typical SEM example Construct Indicator 1 Indicator 2 Indicator 3 Indicator 4 Transformation of the scale of the indicator only works if it happens to be the inverse of the function from Cecile’s model! (Blom comment)
59
How about we fix the tools? Testlet models in IRT –Wainer, Bradlow, Wang (2007) –Li, Bolt, Fu (2006) Embed everything in hierarchical Bayesian framework or SEM framework –Curtis (2010) Modify IRT tools to integrate over nuisance factors for measurement precision –Curtis and Crane (2011 under review) Sample from posterior –Mislevy (1992) –Crane, Feldman, et al., (2011 under development)
60
Outline ADNI neuropsychological battery Latent variable approaches –SEM and IRT ADAS-Cog in ADNI Memory in ADNI Executive functioning in ADNI
61
ADAS-Cog in ADNI Brief cognitive test Commonly used outcome in drug trials Described in handout Two parts: cognitive part, rater part –Ignored in several manuscripts I have reviewed Eliminating rater items does not reduce respondent burden!
62
Does it meet IRT assumptions?
63
ADAS Cog 1 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 1 q2 q1 q14
64
More on assumptions: 1 factor CFA TESTS OF MODEL FIT Chi-Square Test of Model Fit Value 273.016* Degrees of Freedom 39** P-Value 0.0000 Chi-Square Test of Model Fit for the Baseline Model Value 4011.878 Degrees of Freedom 29 P-Value 0.0000 CFI/TLI CFI 0.941 TLI 0.956 Number of Free Parameters 63 RMSEA (Root Mean Square Error Of Approximation) Estimate 0.086 WRMR (Weighted Root Mean Square Residual) Value 1.515
65
More on assumptions TESTS OF MODEL FIT Chi-Square Test of Model Fit Value 273.016* Degrees of Freedom 39** P-Value 0.0000 Chi-Square Test of Model Fit for the Baseline Model Value 4011.878 Degrees of Freedom 29 P-Value 0.0000 CFI/TLI CFI 0.941 TLI 0.956 Number of Free Parameters 63 RMSEA (Root Mean Square Error Of Approximation) Estimate 0.086 WRMR (Weighted Root Mean Square Residual) Value 1.515 0.90 / 0.95 0.08 / 0.05
66
BUT the details Item 1 is an average across three learning trials –Already assumes we can just total them up What if we treat the data as more granular, as 3 different indicators
67
Scree
69
ADAS Cog 2 q1b q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 2 q1c q2 q1a q14
70
ADAS Cog 2 vs standard scores
71
Typical SEM example Construct Indicator 1 Indicator 2 Indicator 3 Indicator 4 Transformation of the scale of the indicator only works if it happens to be the inverse of the function from Cecile’s model! (Blom comment)
72
Broad range of scores for each standard score
73
Typical SEM example Construct Indicator 1 Indicator 2 Indicator 3 Indicator 4 1. Transformation of the scale of the indicator only works if it happens to be the inverse of the function from Cecile’s model! (Blom comment) 2. Transformations will only work at the level of granularity included in the model
74
Returning to the ADAS story Lot more parameters (20 more) CFI, TLI a bit better RMSEA a bit worse Item 1Item 1a, 1b, 1c 0.90 / 0.95 0.08 / 0.05
75
Secondary domain structure Methods effects –Same word list multiple times –Rater items Subdomain effects
76
ADAS Cog 2 bi-factor q1b q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 2 q1c q2 q1a q14 Word list
77
Fit for single vs bi-factor 0.90 / 0.95 0.08 / 0.05
78
Loadings
79
ADAS Cog 2 bi-factor q1b q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 2 q1c q2 q1a q14 Word list
80
Single factor vs. secondary factor structure Single factor easier to explain but not consistent with theory, may or may not fit data as well –Single factor model has a nice history of use in educational testing –Nice tools to understand the quality of measurement Bi-factor harder to explain, more consistent with theory, may fit data better –Tools not widely available to understand the quality of measurement –Scores often highly HIGHLY correlated between single factor and bi-factor models
81
Scores correlate 0.95
82
ADAS Cog 3 bi-factor q1b q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 3 q1c q2 q1a q14 Word list Ratings
83
Further improves fit 0.90 / 0.95 0.08 / 0.05
84
Differences in loadings
85
Information content: single factor
86
ADAS Single vs. Bi-factor Single factor is sure nice –Easy to explain –But it sure fits less well –And over-estimates measurement precision somewhat Bi-factor is sure nice –Not so hard to explain –More consistent with our theory –Fits better Measurement precision appropriately estimated at least within the SEM framework
87
I(θ) in presence of nuisance factors
89
So we integrate over the nuisance dimensions….
90
What about this?
91
Multiple versions longitudinally A bit of a challenge But nothing we can’t handle! Does require some assumptions
92
Different versions of the word list
95
Mplus ADAS model constraints 1
96
ADAS model constraints 2
97
ADAS model constraints 3
98
ADAS model constraints 4
99
ADAS model constraints 5
100
ADAS model constraints 6
101
And there’s a cost 320 lines of code
102
Limitations Only tried this with the single factor model –Thresholds likely to be unaffected –But loadings may vary Also relies on a crucial assumption that may not hold
103
Essentially co-calibration over time Cross sectional co-calibration assumption: people are people. Relationship of items to the underlying domain, controlling for the latent ability level, is invariant across people This seems problematic with memory vs. other domains when we have people with MCI and AD –Memory declines faster –Maybe not hugely so over 6 months
104
Outline ADNI neuropsychological battery Latent variable approaches –SEM and IRT ADAS-Cog in ADNI Memory in ADNI Executive functioning in ADNI
105
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0
106
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/06
107
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0612
108
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0612 avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 18
109
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0612 avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 18 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 24
110
limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0612 avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 18 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 24 avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 36 limmtotal ldeltotal
111
Strategies Ignore (likely a bad idea) Use flexible approach we used for ADAS-Cog –Mplus to Paul: “That model looks hard. Perhaps you could give me an easier model.” –Paul to Mplus: “No, computers don’t date.” –Mplus: “Fine. I’m not talking to you anymore.” Use dummy variables for study visit and regress versions on visit (Rich Jones) –But: single parameter does not address all differences in ADAS-Cog versions ???
112
The “ignore” strategy
113
Executive functioning in ADNI Report is in the documentation Bifactor scores
114
Acknowledgements R01 AG 029672, “Improving cognitive outcome precision & responsiveness with modern psychometrics” UW –Betsy –Elena –Elizabeth –Joey –Jonathan –Laura –McKay –RJ Hebrew SeniorLife, UC Davis, Washington U –Alden –Chengjie –Dan –Danielle –Doug –Frances –Rich
116
Depression* Item 2 Item 1 Item 4 Item 3 Item 6 Item 5 Item 8 Item 7 Item 9
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.