Continuous heterogeneity Shaun Purcell Boulder Twin Workshop March 2004
Raw data VS summary statistics ZygT1 T MZ DZ
Raw data VS summary statistics ZygT1 T
Raw data VS summary statistics ZygT1 T2age
Variance Bivariate normal distribution DataMean
Introducing Definition variables Zygosity as a definition variable “Rectangular” file data.raw
!Using definition variables Group1: Defines Matrices Calc NGroups=2 Begin Matrices; X Lower 1 1 free Y Lower 1 1 free Z Lower 1 1 free M full 1 1 free H Full 1 1 End Matrices; Begin Algebra; A= X*X'; C = Y*Y'; E = Z*Z'; End Algebra; Ma X 0 Ma Y 0 Ma Z 1 Ma M 0 Options MX%P=rawfit.txt End Group2: MZ & DZ twin pairs Data NInput_vars=4 NObservations=0 RE file=data.raw Labels id zyg t1 t2 Select t1 t2 zyg / Definition zyg / Matrices = Group 1 Means M | M / Covariances A + C + E | + C _ + C | A + C + E / Specify H -1 End H will be specified as a definition variable M, necessary for the means model Optional: request individual fit statistics for each pair A single group for both MZ & DZ twins Points to a “REctangular” data file No need to specify number of pairs Zygosity is a “Definition” variable Multiply A component by 1/H 1 x 1 matrix H represents each pair’s zygosity A model for the means [ twin1 | twin 2]
Output from zyg.mx RE FILE=DATA.RAW Rectangular continuous data read initiated NOTE: Rectangular file contained 500 records with data that contained a total of 2000 observations LABELS ID ZYG T1 T2 SELECT T1 T2 ZYG / DEFINITION ZYG / NOTE: Selection yields 500 data vectors for analysis NOTE: Vectors contain a total of 1500 observations NOTE: Definition yields 500 data vectors for analysis NOTE: Vectors contain a total of 1000 observations
Output from zyg.mx Summary of VL file data for group 2 ZYG T1 T2 Code Number Mean Variance Minimum Maximum
Output from zyg.mx MATRIX H This is a FULL matrix of order 1 by MATRIX M This is a FULL matrix of order 1 by MATRIX X This is a LOWER TRIANGULAR matrix of order 1 by MATRIX Y This is a LOWER TRIANGULAR matrix of order 1 by MATRIX Z This is a LOWER TRIANGULAR matrix of order 1 by Specify H -1
Output from zyg.mx Your model has 4 estimated parameters and 1000 Observed statistics -2 times log-likelihood of data >>> Degrees of freedom >>>>>>>>>>>>>>>> 996 Fixing X to zero Your model has 3 estimated parameters and 1000 Observed statistics -2 times log-likelihood of data >>> Degrees of freedom >>>>>>>>>>>>>>>> 997
Continuous moderators Traits often best defined continuously Many environmental moderators also likely to be continuous in nature –Age –Gestational age –Socio-economic status –Educational level –Consumption of food / alcohol / drugs How to test for G x E interaction in this case?
Continuous moderators Problems? –Stratification of sample reduced sample size –Modelling proportions of variance implicitly assumes equality of variance w.r.t moderator –Logical to assume a linear G E interaction linearity at the level of effect, not variance –No obvious statistical test for heterogeneity Heritability Age (yrs) 0% 100%
Biometrical G E model At a hypothetical single locus –additive genetic valuea –allele frequency p –QTL variance2p(1-p)a 2 Assuming a linear interaction –additive genetic valuea + M –allele frequency p –QTL variance2p(1-p)(a + M) 2
Biometrical G E model M No interaction AaAAaa a 0 -a M Interaction 1 -- 1 M Equivalently… 22 1 1
Model-fitting approach to GxE Twin 1 ACE Twin 2 ACE a c e ce a
Model-fitting approach to GxE Twin 1 ACE Twin 2 ACE a+ X M c e ce a+XMa+XM Continuous moderator variable M Can be coded 0 / 1 in the dichotomous case
Individual specific moderators Twin 1 ACE Twin 2 ACE a+ X M 1 c e ce a+XM2a+XM2
E x E interactions Twin 1 ACE Twin 2 ACE a+ X M 1 c+ Y M 1 e+ Z M 1 a+ X M 2 c+ Y M 2 e+ Z M 2
ACE - XYZ - M Twin 1 ACE Twin 2 ACE a+XM1a+XM1 c+YM1c+YM1 e+ZM1e+ZM1 a+XM2a+XM2 c+YM2c+YM2 e+ZM2e+ZM2 M m+MM1m+MM1 M m+MM2m+MM2 Main effects and moderating effects statistically and conceptually distinct
Model-fitting approach to GxE C A E Component of variance Moderator variable
Turkheimer et al (2003) 320 twin pairs recruited at birth from urban hospitals G : additive genetic variance E : SES –parental education, occupation, income X : IQ –Wechsler; Verbal, Performance, Full
A CE Full scale IQ Verbal IQ Non-Verbal IQ
Standard model Means vector Covariance matrix
Allowing for a main effect of X Means vector Covariance matrix
! Basic model + main effect of a definition variable G1: Define Matrices Data Calc NGroups=3 Begin Matrices; A full 1 1 free C full 1 1 free E full 1 1 free M full 1 1 free! grand mean B full 1 1 free ! moderator-linked means model H full 1 1 R full 1 1! twin 1 moderator (definition variable) S full 1 1! twin 2 moderator (definition variable) End Matrices; Ma M 0 Ma B 0 Ma A 1 Ma C 1 Ma E 1 Matrix H.5 Options NO_Output End
G2: MZ Data NInput_vars=6 NObservations=0 Missing =-999 RE File=f1.dat Labels id zyg p1 p2 m1 m2 Select if zyg = 1 / Select p1 p2 m1 m2 / Definition m1 m2 / Matrices = Group 1 Means M + B*R | M + B*S / Covariance A*A' + C*C' + E*E' | A*A' + C*C' _ A*A' + C*C' | A*A' + C*C' + E*E' / !twin 1 moderator variable Specify R -1 !twin 2 moderator variable Specify S -2 Options NO_Output End
G3: DZ Data NInput_vars=6 NObservations=0 Missing =-999 RE File=f1.dat Labels id zyg p1 p2 m1 m2 Select if zyg = 2 / Select p1 p2 m1 m2 / Definition m1 m2 / Matrices = Group 1 Means M + B*R | M + B*S / Covariance A*A' + C*C' + E*E' | + C*C' _ + C*C' | A*A' + C*C' + E*E' / !twin 1 moderator variable Specify R -1 !twin 2 moderator variable Specify S -2 End
MATRIX A This is a FULL matrix of order 1 by MATRIX B This is a FULL matrix of order 1 by MATRIX C This is a FULL matrix of order 1 by MATRIX E This is a FULL matrix of order 1 by MATRIX M This is a FULL matrix of order 1 by Your model has 5 estimated parameters and 800 Observed statistics -2 times log-likelihood of data >>> Degrees of freedom >>>>>>>>>>>>>>>> 795
MATRIX A This is a FULL matrix of order 1 by MATRIX B This is a FULL matrix of order 1 by MATRIX C This is a FULL matrix of order 1 by MATRIX E This is a FULL matrix of order 1 by MATRIX M This is a FULL matrix of order 1 by Your model has 4 estimated parameters and 800 Observed statistics -2 times log-likelihood of data >>> Degrees of freedom >>>>>>>>>>>>>>>> 796
Continuous heterogeneity model Means vector Covariance matrix
! GxE - Basic model G1: Define Matrices Data Calc NGroups=3 Begin Matrices; A full 1 1 free C full 1 1 free E full 1 1 free T full 1 1 free! moderator-linked A component U full 1 1 free! moderator-linked C component V full 1 1 free! moderator-linked E component M full 1 1 free! grand mean B full 1 1 free ! moderator-linked means model H full 1 1 R full 1 1! twin 1 moderator (definition variable) S full 1 1! twin 2 moderator (definition variable) End Matrices; Ma T 0 Ma U 0 Ma V 0 Ma M 0 Ma B 0 Ma A 1 Ma C 1 Ma E 1 Matrix H.5 Options NO_Output End
G2: MZ Data NInput_vars=6 NObservations=0 Missing =-999 RE File=f1.dat Labels id zyg p1 p2 m1 m2 Select if zyg = 1 / Select p1 p2 m1 m2 / Definition m1 m2 / Matrices = Group 1 Means M + B*R | M + B*S / Covariance (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) | (A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _ (A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) | (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) / !twin 1 moderator variable Specify R -1 !twin 2 moderator variable Specify S -2 Options NO_Output End
G3: DZ Data NInput_vars=6 NObservations=0 Missing =-999 RE File=f1.dat Labels id zyg p1 p2 m1 m2 Select if zyg = 2 / Select p1 p2 m1 m2 / Definition m1 m2 / Matrices = Group 1 Means M + B*R | M + B*S / Covariance (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) | + (C+U*R)*(C+U*S) _ + (C+U*S)*(C+U*R) | (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) / !twin 1 moderator variable Specify R -1 !twin 2 moderator variable Specify S -2 End
Practical 1 The script: mod.mx The data: f1.dat IDzygositytrait_twin_1trait_twin_2mod_twin_1mod_twin_2 1.Any evidence for G × E for this trait ? i.e. does the A latent variable show heterogeneity with respect to the moderator variable 2.If so, in what way? i.e. how would you interpret/describe the effect?
Practical 1 : f1.dat Moderator distributionMZ pairs (trait) DZ pairs (trait) All twin 1’s v.s. moderator
nomod.mx a1.3078a 2 ~ 1.7 c1.1733c 2 ~ 1.4 e e 2 ~ 0.95 a 2 +c 2 +e 2 = 4.05 i.e. % variance is 42%, 35% and 23%
Parameter estimates: mod.mx ACE-XYZ-MACE-YZ-M A C E T U V M B
Plotting VCs For the additive genetic VC, for example –Given a, and a range of values for the moderator variable For example, a = 0.5, = -0.2 and M ranges from -2 to +2 M (a+M)2(a+M)2 (a+M)2(a+M)2 -2(0.5+(-0.2×-2)) (0.5+(-0.2×-1.5)) … +2(0.5+(-0.2×2))
Specific test of G×E -2LLDf Full model ACE-XYZ-M Sub model ACE-YZ-M Difference p-value =
Other tests TestSubmodel-2LLΔdfp-value Y ACE-XZ-M Z ACE-XY-M < 1× M ACE-XYZ C & Y AE-XZ-M All made against the full model ACE-XYZ-M, -2LL =
Confidence intervals Easy to get CIs for individual parameters Additionally, CIs on the moderated VCs are useful for interpretation e.g. a 95% CI for (a+ M) 2, for a specific M
Define two extra vectors in Group 1 P full 1 13 O Unit 1 13 Matrix P Add a 4 th group to calculate the CIs CIs Calc Matrices = Group 1 Begin Algebra; F= ( + ). ( + ) / G= ( + ). ( + ) / I= ( + ). ( + ) / End Algebra; 95 F 1 1 to F G 1 1 to G I 1 1 to I 1 13 End;
Calculation of CIs F= ( + ). ( + ) / E.g. if P were then ( + ) equals or Finally, the dot-product squares all elements to give or
Confidence intervals on VCs A C E
Other considerations Simple approach to test for heterogeneity –easily adapted, e.g. for ordinal data models Extensions / things to watch for… –scalar v.s. qualitative heterogeneity v. low power –the environment may show shared genetic influence with the trait –nonlinear effects in both mediation and moderation
X E G Main effect Moderating G E r GE
Turkheimer et al, 2003 SES IQ SES V(IQ)
Simulated twin data Moderator Standard Quadratic E(Trait) A 3 df test of any moderating effect Standard analysis : linear means model (in H A and H 0 ) Quadratic analysis : linear and quadratic means model (in H A and H 0 ) 18/50 replicates significant i.e. type I error 36% for nominal 5% level
More complex G E interaction E-risk Trait P(disease)
Include E-risk in means model E-risk Residual Trait P(disease | E-risk)
Biometrical model E-risk Additive genetic effect Quadratic form AaAAaa
ACE - XYZ - X 2 Y 2 Z 2 - M Twin 1 ACE Twin 2 ACE a + X M 1 + X M 2 1 c e ce a+XM2+XM22a+XM2+XM22