Continuous heterogeneity Shaun Purcell Boulder Twin Workshop March 2004.


Raw data vs. summary statistics [Tables not recovered; column labels: Zyg, T1, T2, age; group labels: MZ, DZ]

Bivariate normal distribution [Figure not recovered; labels: data, mean, variance]

Introducing definition variables: zygosity as a definition variable; “rectangular” file data.raw

!Using definition variables
Group1: Defines Matrices
Calc NGroups=2
Begin Matrices;
 X Lower 1 1 free
 Y Lower 1 1 free
 Z Lower 1 1 free
 M full 1 1 free    ! M, necessary for the means model
 H Full 1 1         ! H will be specified as a definition variable
End Matrices;
Begin Algebra;
 A = X*X';
 C = Y*Y';
 E = Z*Z';
End Algebra;
Ma X 0
Ma Y 0
Ma Z 1
Ma M 0
! Optional: request individual fit statistics for each pair
Options MX%P=rawfit.txt
End

! A single group for both MZ & DZ twins
Group2: MZ & DZ twin pairs
! No need to specify number of pairs
Data NInput_vars=4 NObservations=0
! Points to a “REctangular” data file
RE file=data.raw
Labels id zyg t1 t2
Select t1 t2 zyg /
! Zygosity is a “Definition” variable
Definition zyg /
Matrices = Group 1
! A model for the means [ twin 1 | twin 2 ]
Means M | M /
! Multiply A component by 1/H (the off-diagonal A term, shown as [...], did not survive extraction)
Covariances A + C + E | [...] + C _
            [...] + C | A + C + E /
! 1 x 1 matrix H represents each pair’s zygosity
Specify H -1
End
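
How a definition variable enters the raw-data likelihood can be sketched outside Mx. The Python below is a minimal illustration and not the workshop's code. It assumes zygosity is coded 1 (MZ) and 2 (DZ), so that dividing the A component by the definition variable gives 1 or 1/2, and it works directly with variance components `a2`, `c2`, `e2` rather than the path matrices X, Y, Z. All function names and the example rows are hypothetical.

```python
import math

def expected_cov(zyg, a2, c2, e2):
    """Expected 2x2 covariance for one pair; zyg is the definition variable
    (assumed coding: 1 = MZ, 2 = DZ), so the A covariance is a2 / zyg."""
    var = a2 + c2 + e2
    cov = a2 / zyg + c2
    return [[var, cov], [cov, var]]

def pair_minus2ll(t1, t2, zyg, a2, c2, e2, mean):
    """-2 log-likelihood of one twin pair under a bivariate normal model."""
    (v11, v12), (_, v22) = expected_cov(zyg, a2, c2, e2)
    det = v11 * v22 - v12 * v12
    d1, d2 = t1 - mean, t2 - mean
    # Quadratic form d' * inverse(Sigma) * d for a 2x2 matrix
    quad = (v22 * d1 * d1 - 2.0 * v12 * d1 * d2 + v11 * d2 * d2) / det
    return 2.0 * math.log(2.0 * math.pi) + math.log(det) + quad

def sample_minus2ll(pairs, a2, c2, e2, mean):
    """Sum the pairwise -2LL over rows (zyg, t1, t2) of a rectangular file."""
    return sum(pair_minus2ll(t1, t2, zyg, a2, c2, e2, mean)
               for zyg, t1, t2 in pairs)

# Made-up rows, for illustration only
pairs = [(1, 0.3, 0.5), (2, -1.1, 0.2), (2, 0.9, 1.4)]
print(sample_minus2ll(pairs, a2=0.4, c2=0.3, e2=0.3, mean=0.0))
```

Because the covariance is rebuilt for every row, MZ and DZ pairs can share one group, which is the point the script above makes with `Specify H -1`.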

Output from zyg.mx
RE FILE=DATA.RAW
Rectangular continuous data read initiated
NOTE: Rectangular file contained 500 records with data that contained a total of 2000 observations
LABELS ID ZYG T1 T2
SELECT T1 T2 ZYG /
DEFINITION ZYG /
NOTE: Selection yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1500 observations
NOTE: Definition yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1000 observations

Output from zyg.mx
Summary of VL file data for group 2
[Table values not recovered; columns: ZYG, T1, T2; rows: Code, Number, Mean, Variance, Minimum, Maximum]

Output from zyg.mx [matrix values not recovered]
MATRIX H: This is a FULL matrix of order 1 by 1
MATRIX M: This is a FULL matrix of order 1 by 1
MATRIX X: This is a LOWER TRIANGULAR matrix of order 1 by 1
MATRIX Y: This is a LOWER TRIANGULAR matrix of order 1 by 1
MATRIX Z: This is a LOWER TRIANGULAR matrix of order 1 by 1
Specify H -1

Output from zyg.mx
Your model has 4 estimated parameters and 1000 Observed statistics
-2 times log-likelihood of data >>> [value not recovered]
Degrees of freedom >>>>>>>>>>>>>>>> 996
Fixing X to zero:
Your model has 3 estimated parameters and 1000 Observed statistics
-2 times log-likelihood of data >>> [value not recovered]
Degrees of freedom >>>>>>>>>>>>>>>> 997

Continuous moderators
Traits often best defined continuously
Many environmental moderators also likely to be continuous in nature
– Age
– Gestational age
– Socio-economic status
– Educational level
– Consumption of food / alcohol / drugs
How to test for G x E interaction in this case?

Continuous moderators
Problems?
– Stratification of sample → reduced sample size
– Modelling proportions of variance implicitly assumes equality of variance w.r.t. moderator
– Logical to assume a linear G × E interaction: linearity at the level of effect, not variance
– No obvious statistical test for heterogeneity
[Figure: heritability (0% to 100%) against age (yrs)]

Biometrical G × E model
At a hypothetical single locus
– additive genetic value: $a$
– allele frequency: $p$
– QTL variance: $2p(1-p)a^2$
Assuming a linear interaction
– additive genetic value: $a + \beta M$
– allele frequency: $p$
– QTL variance: $2p(1-p)(a + \beta M)^2$
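
The single-locus variance formula can be checked numerically. The short Python sketch below evaluates the QTL variance at a few moderator values; the function name and the parameter values (p = 0.5, a = 1, β = 0.5) are illustrative assumptions, not taken from the slides.

```python
def qtl_variance(p, a, beta=0.0, m=0.0):
    """QTL variance 2p(1-p)(a + beta*M)^2; beta = 0 gives the unmoderated 2p(1-p)a^2."""
    return 2.0 * p * (1.0 - p) * (a + beta * m) ** 2

# Hypothetical values: the same locus explains more or less variance
# depending on where an individual sits on the moderator M.
for m in (-1.0, 0.0, 1.0):
    print(m, qtl_variance(p=0.5, a=1.0, beta=0.5, m=m))
```

The effect is linear in M, while the variance it implies is quadratic in M.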

Biometrical G × E model [Figures not recovered. Panels: genotypic values of AA, Aa and aa (a, 0, -a) against moderator M with no interaction; the same with interaction; an equivalent parameterisation]

Model-fitting approach to GxE [Path diagram: Twin 1 and Twin 2, each with latent A, C and E; paths a, c, e]

Model-fitting approach to GxE [Path diagram: Twin 1 and Twin 2, each with latent A, C and E; paths $a + \beta_X M$, c, e]
Continuous moderator variable M
Can be coded 0 / 1 in the dichotomous case

Individual specific moderators [Path diagram: A path $a + \beta_X M_1$ for twin 1 and $a + \beta_X M_2$ for twin 2; paths c and e unmoderated]

E x E interactions [Path diagram: paths $a + \beta_X M_1$, $c + \beta_Y M_1$, $e + \beta_Z M_1$ for twin 1; $a + \beta_X M_2$, $c + \beta_Y M_2$, $e + \beta_Z M_2$ for twin 2]

ACE - XYZ - M [Path diagram: paths $a + \beta_X M_i$, $c + \beta_Y M_i$, $e + \beta_Z M_i$ and mean $m + \beta_M M_i$ for twin $i = 1, 2$]
Main effects and moderating effects are statistically and conceptually distinct

Model-fitting approach to GxE [Figure: components of variance A, C, E against the moderator variable]

Turkheimer et al (2003)
320 twin pairs recruited at birth from urban hospitals
G : additive genetic variance
E : SES
– parental education, occupation, income
X : IQ
– Wechsler; Verbal, Performance, Full

[Figure: A, C and E components for Full scale IQ, Verbal IQ and Non-Verbal IQ]

Standard model [Equations not recovered: means vector, covariance matrix]

Allowing for a main effect of X [Equations not recovered: means vector, covariance matrix]

! Basic model + main effect of a definition variable
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
 A full 1 1 free
 C full 1 1 free
 E full 1 1 free
 M full 1 1 free   ! grand mean
 B full 1 1 free   ! moderator-linked means model
 H full 1 1
 R full 1 1        ! twin 1 moderator (definition variable)
 S full 1 1        ! twin 2 moderator (definition variable)
End Matrices;
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End

G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 A*A' + C*C' + E*E' | A*A' + C*C' _
 A*A' + C*C' | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End

G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 A*A' + C*C' + E*E' | H@A*A' + C*C' _
 H@A*A' + C*C' | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End

Output, with B estimated [matrix values not recovered]
MATRIX A: This is a FULL matrix of order 1 by 1
MATRIX B: This is a FULL matrix of order 1 by 1
MATRIX C: This is a FULL matrix of order 1 by 1
MATRIX E: This is a FULL matrix of order 1 by 1
MATRIX M: This is a FULL matrix of order 1 by 1
Your model has 5 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> [value not recovered]
Degrees of freedom >>>>>>>>>>>>>>>> 795

Output, with one parameter fewer [matrix values not recovered]
MATRIX A: This is a FULL matrix of order 1 by 1
MATRIX B: This is a FULL matrix of order 1 by 1
MATRIX C: This is a FULL matrix of order 1 by 1
MATRIX E: This is a FULL matrix of order 1 by 1
MATRIX M: This is a FULL matrix of order 1 by 1
Your model has 4 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> [value not recovered]
Degrees of freedom >>>>>>>>>>>>>>>> 796

Continuous heterogeneity model [Equations not recovered: means vector, covariance matrix]
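
The equations for this slide are not present in the transcript. The block below is a reconstruction from the path labels used earlier ($a + \beta_X M$, $c + \beta_Y M$, $e + \beta_Z M$, $m + \beta_M M$) and from the moderated Mx scripts, offered as a sketch rather than the slide's exact notation. The symbol $r_A$ is introduced here for convenience: it is assumed to be 1 for MZ pairs and 1/2 for DZ pairs.

```latex
\begin{aligned}
\mu_i &= m + \beta_M M_i, \qquad i = 1, 2 \\
\operatorname{Var}(T_i) &= (a + \beta_X M_i)^2 + (c + \beta_Y M_i)^2 + (e + \beta_Z M_i)^2 \\
\operatorname{Cov}(T_1, T_2) &= r_A\,(a + \beta_X M_1)(a + \beta_X M_2) + (c + \beta_Y M_1)(c + \beta_Y M_2)
\end{aligned}
```

Setting $\beta_X = \beta_Y = \beta_Z = 0$ recovers the model with only a main effect of the moderator, and additionally setting $\beta_M = 0$ recovers the standard ACE model.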

! GxE - Basic model
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
 A full 1 1 free
 C full 1 1 free
 E full 1 1 free
 T full 1 1 free   ! moderator-linked A component
 U full 1 1 free   ! moderator-linked C component
 V full 1 1 free   ! moderator-linked E component
 M full 1 1 free   ! grand mean
 B full 1 1 free   ! moderator-linked means model
 H full 1 1
 R full 1 1        ! twin 1 moderator (definition variable)
 S full 1 1        ! twin 2 moderator (definition variable)
End Matrices;
Ma T 0
Ma U 0
Ma V 0
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End

G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
 (A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
 (A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
 (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End

G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
 H@(A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
 H@(A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
 (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End
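
For readers who want to see what the MZ and DZ groups compute for a single pair, here is a minimal Python sketch of the expected means and covariance matrix. Argument names follow the Mx matrices (A, C, E, T, U, V, M, B, with R and S as the two twins' moderator values). The function name is hypothetical, and the DZ genetic covariance is assumed to be weighted by 0.5, matching the fixed value of H in group 1.

```python
def expected_moments(a, c, e, t, u, v, m, b, r, s, is_mz):
    """Expected means and 2x2 covariance for one pair with moderators r (twin 1), s (twin 2)."""
    h = 1.0 if is_mz else 0.5           # assumed genetic weight: 1 for MZ, 0.5 for DZ
    a1, a2 = a + t * r, a + t * s       # moderated A paths
    c1, c2 = c + u * r, c + u * s       # moderated C paths
    e1, e2 = e + v * r, e + v * s       # moderated E paths
    means = (m + b * r, m + b * s)      # main effect of the moderator on the means
    var1 = a1 * a1 + c1 * c1 + e1 * e1
    var2 = a2 * a2 + c2 * c2 + e2 * e2
    cov12 = h * a1 * a2 + c1 * c2       # E does not contribute to the twin covariance
    return means, [[var1, cov12], [cov12, var2]]

# Illustrative values only: a DZ pair whose twins differ on the moderator
means, cov = expected_moments(a=1.0, c=1.0, e=1.0, t=0.2, u=0.0, v=0.0,
                              m=0.0, b=0.1, r=-1.0, s=1.0, is_mz=False)
print(means)
print(cov)
```

With T, U and V fixed at zero the result reduces to the covariance used in the main-effect-only script.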

Practical 1
The script: mod.mx
The data: f1.dat
Columns: ID, zygosity, trait_twin_1, trait_twin_2, mod_twin_1, mod_twin_2
1. Any evidence for G × E for this trait? i.e. does the A latent variable show heterogeneity with respect to the moderator variable?
2. If so, in what way? i.e. how would you interpret/describe the effect?

Practical 1 : f1.dat [Figure panels: moderator distribution; MZ pairs (trait); DZ pairs (trait); all twin 1’s vs. moderator]

nomod.mx
a = 1.3078, $a^2 \approx 1.7$
c = 1.1733, $c^2 \approx 1.4$
e = [value not recovered], $e^2 \approx 0.95$
$a^2 + c^2 + e^2 = 4.05$, i.e. % variance is 42%, 35% and 23%
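
The percentages follow directly from the rounded variance components quoted on the slide. A small Python check, using only those three numbers (the variable names are illustrative):

```python
# Rounded variance components from the nomod.mx slide
components = {"A": 1.7, "C": 1.4, "E": 0.95}

total = sum(components.values())
shares = {name: value / total for name, value in components.items()}

for name, share in shares.items():
    print(f"{name}: {100 * share:.0f}%")
```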

Parameter estimates: mod.mx [Table values not recovered; columns: ACE-XYZ-M, ACE-YZ-M; rows: A, C, E, T, U, V, M, B]

Plotting VCs
For the additive genetic VC, for example
– Given $a$, $\beta$ and a range of values for the moderator variable
For example, $a = 0.5$, $\beta = -0.2$ and M ranges from -2 to +2
M      $(a + \beta M)^2$
-2     $(0.5 + (-0.2 \times -2))^2 = 0.81$
-1.5   $(0.5 + (-0.2 \times -1.5))^2 = 0.64$
…
+2     $(0.5 + (-0.2 \times 2))^2 = 0.01$
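
The table can be generated for the whole moderator range with a few lines of Python. This is a sketch using the slide's example values (a = 0.5, β = -0.2); the step of 0.5 is inferred from the first two rows of the table, and the function name is hypothetical.

```python
def moderated_vc(a, beta, m):
    """Moderated variance component (a + beta*M)^2 at moderator value m."""
    return (a + beta * m) ** 2

a, beta = 0.5, -0.2
# M from -2 to +2 in steps of 0.5 (step size assumed from the slide's first rows)
moderator_values = [-2.0 + 0.5 * i for i in range(9)]
table = [(m, moderated_vc(a, beta, m)) for m in moderator_values]

for m, vc in table:
    print(f"{m:+.1f}  {vc:.4f}")
```

The resulting pairs can be passed to any plotting tool to draw the variance component against the moderator.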

Specific test of G×E [Table values not recovered]
Columns: -2LL, Df
Full model: ACE-XYZ-M
Sub model: ACE-YZ-M
Difference
p-value = [not recovered]

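Other tests [Table values not recovered]
Columns: Test, Submodel, -2LL, Δdf, p-value
Y: ACE-XZ-M
Z: ACE-XY-M (p-value < 1× [exponent not recovered])
M: ACE-XYZ
C & Y: AE-XZ-M
All made against the full model ACE-XYZ-M, -2LL = [not recovered]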
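
Each of these comparisons is a likelihood-ratio test: the difference in -2LL between a submodel and the full model is referred to a chi-square distribution with Δdf degrees of freedom. The Python sketch below computes the p-value with the standard library only. The -2LL values in the example are made up, since the slide's numbers were not recovered, and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc term plus a finite series
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total)

def likelihood_ratio_test(full_minus2ll, sub_minus2ll, delta_df):
    """Return (chi-square statistic, p-value) for a nested submodel."""
    diff = sub_minus2ll - full_minus2ll
    return diff, chi2_sf(diff, delta_df)

# Hypothetical -2LL values, for illustration only
stat, p = likelihood_ratio_test(full_minus2ll=3000.0, sub_minus2ll=3006.5, delta_df=1)
print(stat, p)
```

Dropping one moderation parameter (for example X) gives Δdf = 1, while dropping C and Y together gives Δdf = 2.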

Confidence intervals
Easy to get CIs for individual parameters
Additionally, CIs on the moderated VCs are useful for interpretation
e.g. a 95% CI for $(a + \beta M)^2$, for a specific M

Define two extra vectors in Group 1
 P full 1 13
 O Unit 1 13
 Matrix P [13 moderator values not recovered]
Add a 4th group to calculate the CIs
 CIs
 Calc
 Matrices = Group 1
 Begin Algebra;
  F = (A@O + T@P) . (A@O + T@P) /
  G = (C@O + U@P) . (C@O + U@P) /
  I = (E@O + V@P) . (E@O + V@P) /
 End Algebra;
 95 F 1 1 to F G 1 1 to G I 1 1 to I 1 13   ! interval request, partly lost in extraction
 End;

Calculation of CIs
F = (A@O + T@P) . (A@O + T@P) /
E.g. if P were [values not recovered], then (A@O + T@P) equals [values not recovered]
Finally, the dot-product squares all elements to give [values not recovered]

Confidence intervals on VCs [Figure panels: A, C, E]

Other considerations
Simple approach to test for heterogeneity
– easily adapted, e.g. for ordinal data models
Extensions / things to watch for…
– scalar vs. qualitative heterogeneity; v. low power
– the environment may show shared genetic influence with the trait
– nonlinear effects in both mediation and moderation

[Path diagram: G and E acting on X; labels: main effect, moderating (G × E), $r_{GE}$]

Turkheimer et al, 2003 [Figure panels: IQ against SES; V(IQ) against SES]

Simulated twin data [Figure labels: Moderator, E(Trait); Standard, Quadratic]
A 3 df test of any moderating effect
Standard analysis: linear means model (in $H_A$ and $H_0$)
Quadratic analysis: linear and quadratic means model (in $H_A$ and $H_0$)
18/50 replicates significant, i.e. type I error 36% for nominal 5% level

More complex G × E interaction [Figure labels: E-risk, Trait, P(disease)]

Include E-risk in means model [Figure labels: E-risk, Residual Trait, P(disease | E-risk)]

Biometrical model [Figure: additive genetic effect against E-risk, quadratic form; genotypes AA, Aa, aa]

ACE - XYZ - $X^2 Y^2 Z^2$ - M [Path diagram: A path $a + \beta_X M_1 + \beta_{X^2} M_1^2$ for twin 1 and $a + \beta_X M_2 + \beta_{X^2} M_2^2$ for twin 2; paths c and e as before]