Presentation is loading. Please wait.

Presentation is loading. Please wait.

Psychometric analyses of ADNI data Paul K. Crane, MD MPH Department of Medicine University of Washington.

Similar presentations


Presentation on theme: "Psychometric analyses of ADNI data Paul K. Crane, MD MPH Department of Medicine University of Washington."— Presentation transcript:

1 Psychometric analyses of ADNI data Paul K. Crane, MD MPH Department of Medicine University of Washington

2 Disclaimer Funding for this conference was made possible, in part by Grant R13 AG030995 from the National Institute on Aging. The views expressed do not necessarily reflect official policies of the Department of Health and Human Services; nor does mention by trade names, commercial practices, or organizations imply endorsement by the U.S. Government.

3 Outline ADNI neuropsychological battery Latent variable approaches –SEM and IRT ADAS-Cog in ADNI Memory in ADNI Executive functioning in ADNI

4 ADNI Neuropsychological Battery

5 Handout There is a handout that summarizes these tests and provides the variable names for the variables in the dataset

6 ADNI Neuropsychological Battery

7 MCI

8 AD

9 Alternate Word Lists There are also two versions of the Rey AVLT that are alternated

10 Summary Repeated administration of a rich neuropsychological battery at 6 month intervals for 2 (AD) or 3 (NC, MCI) years How do we drink from that fire hose?

11 Strategies for analyzing these data Pick a couple of tests and ignore the others –ADAS-Cog and MMSE –CDR and CDR-SB Modifications of those tests –ADAS-Tree –ADAS-Rasch Composite scores for specific domains –Z score –Something fancier using latent variable approach

12 Outline ADNI neuropsychological battery Latent variable approaches –SEM and IRT ADAS-Cog in ADNI Memory in ADNI Executive functioning in ADNI

13 Latent variable approach “Items” not intrinsically interesting, only as indicators of the underlying thing measured by the test Many nice properties follow

14 Parallel development 1: SEM “Measurement part” of the model specifies how latent constructs are modeled “Structural part” of the model specifies relationships between latent constructs and each other, and between latent constructs and other covariates

15 http://sites.google.com/site/lvmworkshop/home/downloads-general/2010-downloads

16 Bunch of indicators limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl

17 “Memory” Underlying single factor with many indicators limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory

18 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory Rey word list ADAS word list MMSE words LM story

19 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory Rey word list ADAS word list MMSE words LM story HC volume This is what we care about!

20 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory* HC volume This is what we care about! … and typically don’t care whether memory is modeled this way or with all that secondary structure

21 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Memory Rey word list ADAS word list MMSE words LM story HC volume This is what we care about! And sometimes we care about it for 600,000 SNPs, or for voxels – we need to move outside of an SEM package for some of the analyses we want to do

22 Parallel development 2: IRT Models are nested within SEM Single factor confirmatory factor analysis model Initially worked out with binary indicators Extended 1960s to polytomous items (Samejima) It’s only the measurement part Attention to the quality of measurement and the quality of scores

23 Typical SEM example Construct Indicator 1 Indicator 2 Indicator 3 Indicator 4

24 Depression Beck Zung CESD PHQ-9

25 A closer look at PHQ-9 A 9 item depression scale Standard scores totalled up Typical SEM model would take that total score and treat it as a continuous indicator by using a linear link (single loading parameter) PHQ-9

26 PHQ-9 measurement properties

27 IRT approach to PHQ-9

28 SEM and IRT, then and now SEM was initially about total scores as indicators of constructs measured in common across tests IRT was initially about item level data that had to satisfy assumptions More recently: merging the strengths of the two approaches –Computational rather than conceptual advances, or maybe computational advances have fueled conceptual advances

29 Categorical data in SEM Runmplus code: That little code snippet tells Mplus to treat all of the elements in the local `vlist’ as categorical data Mplus default is WLSMV –Other appropriate ways of handling Major reason Mplus is dominant SEM software used at FH

30 What about IRT? Array of tools to address measurement precision Explicit focus on measurement properties and measurement precision differentiates it from SEM

31 Pretend for a moment that a single factor model was appropriate… Item response theory (IRT) developed middle of last century –Lord and Novick / Birnbaum (1968) Polytomous extension Samejima 1969 Lord 1980 Hambleton et al. 1991 XCALIBRE, Parscale, Multilog –All variations on a single factor CFA model

32 4 items each at 0.5 increments

33 Comments on that test Essentially linear test characteristic curve –Immaterial whether the standard score or the IRT score is used in analyses No ceiling or floor effect –People at the extremes of the thing measured by the test will get some right and get some wrong Pretty nice test!

34 2 items each at 0.5 increments

35 Comments on that test Essentially linear test characteristic curve –Immaterial whether the standard score or the IRT score is used in analyses No ceiling or floor effect –People at the extremes of the thing measured by the test will get some right and get some wrong Pretty nice test! But that’s what we said about the last one and it had twice as many items!

36 Why might we want twice as many items? Measurement precision / reliability CTT: summarized in a single number: alpha IRT: conceptualized as a quantity that may vary across the range of the test Information Mathematical relationship between information and standard error of measurement Intuitively makes sense that a test with 2x the items will measure more precisely / more reliably than a test with 1x the items

37 Test information curves for those two tests

38 Standard errors of measurement for the two tests

39 Comments about these information and SEM curves Information curves look more different than the SEM curves –Inverse square root relationship –TIC 100  SEM 0.10(1/10) –TIC 25  SEM 0.20(1/5) –TIC 16  SEM 0.25(1/4) –TIC 9  SEM 0.33(1/3) –TIC 4  SEM 0.50(1/2) Trade off between test length and measurement precision

40 These were highly selected “tests” It would be possible to design such a test if we started with a robust item pool Almost certainly not going to happen by accident / history What are more realistic tests?

41 Test characteristic curves for 2 26- item dichotomous tests

42 Comments on these TCCs Same number of items but very different shapes Now it may matter whether you use an IRT score or a standard score in analyses Both ceiling and floor effects

43 TICs

44 SEMs

45 Comments on the TICs and SEMs Comparing the red test and the blue test: the red test is better for people of moderate ability (more items close to where they are) –For people right in the middle, measurement precision is just as good as a test twice as long Items far away from your ability level don’t help your standard error The blue test is better for people at the extremes (more items close to where they are)

46 Where do information curves come from? Item information curves use the same parameters as the item characteristic curves (difficulty level, b, and strength of association with latent trait or ability, a) (see next slides) Test information is the sum of all of the item information curves –We can do that because of local independence

47 I(θ) = D 2 a 2 *P(θ)*Q(θ)

48

49

50

51

52 More thoughts on IICs The b parameters shift the location of the curve left and right (where is the bead on the string?) The a parameters modify the height of the curve –But not tons; location location location IRT gives us tools to evaluate and manage measurement precision

53 What about cognitive tests?

54 Cognitive screening tests

55

56

57

58 Typical SEM example Construct Indicator 1 Indicator 2 Indicator 3 Indicator 4 Transformation of the scale of the indicator only works if it happens to be the inverse of the function from Cecile’s model! (Blom comment)

59 How about we fix the tools? Testlet models in IRT –Wainer, Bradlow, Wang (2007) –Li, Bolt, Fu (2006) Embed everything in hierarchical Bayesian framework or SEM framework –Curtis (2010) Modify IRT tools to integrate over nuisance factors for measurement precision –Curtis and Crane (2011 under review) Sample from posterior –Mislevy (1992) –Crane, Feldman, et al., (2011 under development)

60 Outline ADNI neuropsychological battery Latent variable approaches –SEM and IRT ADAS-Cog in ADNI Memory in ADNI Executive functioning in ADNI

61 ADAS-Cog in ADNI Brief cognitive test Commonly used outcome in drug trials Described in handout Two parts: cognitive part, rater part –Ignored in several manuscripts I have reviewed Eliminating rater items does not reduce respondent burden!

62 Does it meet IRT assumptions?

63 ADAS Cog 1 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 1 q2 q1 q14

64 More on assumptions: 1 factor CFA TESTS OF MODEL FIT Chi-Square Test of Model Fit Value 273.016* Degrees of Freedom 39** P-Value 0.0000 Chi-Square Test of Model Fit for the Baseline Model Value 4011.878 Degrees of Freedom 29 P-Value 0.0000 CFI/TLI CFI 0.941 TLI 0.956 Number of Free Parameters 63 RMSEA (Root Mean Square Error Of Approximation) Estimate 0.086 WRMR (Weighted Root Mean Square Residual) Value 1.515

65 More on assumptions TESTS OF MODEL FIT Chi-Square Test of Model Fit Value 273.016* Degrees of Freedom 39** P-Value 0.0000 Chi-Square Test of Model Fit for the Baseline Model Value 4011.878 Degrees of Freedom 29 P-Value 0.0000 CFI/TLI CFI 0.941 TLI 0.956 Number of Free Parameters 63 RMSEA (Root Mean Square Error Of Approximation) Estimate 0.086 WRMR (Weighted Root Mean Square Residual) Value 1.515 0.90 / 0.95 0.08 / 0.05

66 BUT the details Item 1 is an average across three learning trials –Already assumes we can just total them up What if we treat the data as more granular, as 3 different indicators

67 Scree

68

69 ADAS Cog 2 q1b q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 2 q1c q2 q1a q14

70 ADAS Cog 2 vs standard scores

71 Typical SEM example Construct Indicator 1 Indicator 2 Indicator 3 Indicator 4 Transformation of the scale of the indicator only works if it happens to be the inverse of the function from Cecile’s model! (Blom comment)

72 Broad range of scores for each standard score

73 Typical SEM example Construct Indicator 1 Indicator 2 Indicator 3 Indicator 4 1. Transformation of the scale of the indicator only works if it happens to be the inverse of the function from Cecile’s model! (Blom comment) 2. Transformations will only work at the level of granularity included in the model

74 Returning to the ADAS story Lot more parameters (20 more) CFI, TLI a bit better RMSEA a bit worse Item 1Item 1a, 1b, 1c 0.90 / 0.95 0.08 / 0.05

75 Secondary domain structure Methods effects –Same word list multiple times –Rater items Subdomain effects

76 ADAS Cog 2 bi-factor q1b q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 2 q1c q2 q1a q14 Word list

77 Fit for single vs bi-factor 0.90 / 0.95 0.08 / 0.05

78 Loadings

79 ADAS Cog 2 bi-factor q1b q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 2 q1c q2 q1a q14 Word list

80 Single factor vs. secondary factor structure Single factor easier to explain but not consistent with theory, may or may not fit data as well –Single factor model has a nice history of use in educational testing –Nice tools to understand the quality of measurement Bi-factor harder to explain, more consistent with theory, may fit data better –Tools not widely available to understand the quality of measurement –Scores often highly HIGHLY correlated between single factor and bi-factor models

81 Scores correlate 0.95

82 ADAS Cog 3 bi-factor q1b q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 ADAS 3 q1c q2 q1a q14 Word list Ratings

83 Further improves fit 0.90 / 0.95 0.08 / 0.05

84 Differences in loadings

85 Information content: single factor

86 ADAS Single vs. Bi-factor Single factor is sure nice –Easy to explain –But it sure fits less well –And over-estimates measurement precision somewhat Bi-factor is sure nice –Not so hard to explain –More consistent with our theory –Fits better Measurement precision appropriately estimated at least within the SEM framework

87 I(θ) in presence of nuisance factors

88

89 So we integrate over the nuisance dimensions….

90 What about this?

91 Multiple versions longitudinally A bit of a challenge But nothing we can’t handle! Does require some assumptions

92 Different versions of the word list

93

94

95 Mplus ADAS model constraints 1

96 ADAS model constraints 2

97 ADAS model constraints 3

98 ADAS model constraints 4

99 ADAS model constraints 5

100 ADAS model constraints 6

101 And there’s a cost 320 lines of code

102 Limitations Only tried this with the single factor model –Thresholds likely to be unaffected –But loadings may vary Also relies on a crucial assumption that may not hold

103 Essentially co-calibration over time Cross sectional co-calibration assumption: people are people. Relationship of items to the underlying domain, controlling for the latent ability level, is invariant across people This seems problematic with memory vs. other domains when we have people with MCI and AD –Memory declines faster –Maybe not hugely so over 6 months

104 Outline ADNI neuropsychological battery Latent variable approaches –SEM and IRT ADAS-Cog in ADNI Memory in ADNI Executive functioning in ADNI

105 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0

106 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/06

107 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0612

108 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0612 avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 18

109 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0612 avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 18 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 24

110 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl Scr/0612 avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 18 limmtotal avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 ldeltotal avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 24 avtot1 avtot2 avtot3 avtot4 avtot5 avtotb avtot6 avdel30min avdeltot cot1scor cot2scor cot3scor cot4totl mmballdl mmflagdl mmtreedl 36 limmtotal ldeltotal

111 Strategies Ignore (likely a bad idea) Use flexible approach we used for ADAS-Cog –Mplus to Paul: “That model looks hard. Perhaps you could give me an easier model.” –Paul to Mplus: “No, computers don’t date.” –Mplus: “Fine. I’m not talking to you anymore.” Use dummy variables for study visit and regress versions on visit (Rich Jones) –But: single parameter does not address all differences in ADAS-Cog versions ???

112 The “ignore” strategy

113 Executive functioning in ADNI Report is in the documentation Bifactor scores

114 Acknowledgements R01 AG 029672, “Improving cognitive outcome precision & responsiveness with modern psychometrics” UW –Betsy –Elena –Elizabeth –Joey –Jonathan –Laura –McKay –RJ Hebrew SeniorLife, UC Davis, Washington U –Alden –Chengjie –Dan –Danielle –Doug –Frances –Rich

115

116 Depression* Item 2 Item 1 Item 4 Item 3 Item 6 Item 5 Item 8 Item 7 Item 9


Download ppt "Psychometric analyses of ADNI data Paul K. Crane, MD MPH Department of Medicine University of Washington."

Similar presentations


Ads by Google