Multivariate Twin Analysis

Presentation transcript:

Multivariate Twin Analysis
Multivariate Analysis of Twin Data
Tom Price, Frühling Rijsdijk
[Figure: bivariate twin model path diagram. Variable 1 and Variable 2 each load on latent factors A, C and E through paths a1, c1, e1 and a2, c2, e2; the factors are linked by the correlations rA, rC and rE.]
Multivariate analyses (analyses of more than one variable at a time) can tell us about the relationships between variables. We can look at the variance that is shared between different variables (the covariance between them) and the variance that is unique to each variable. The degree to which two variables share variance is given by the phenotypic correlation between them.

Bivariate Cholesky Decomposition
[Figure: path diagram. Latent factor V1 loads on Variable 1 (path v1) and on Variable 2 (path v2); latent factor V2 loads on Variable 2 only (path v3).]
The Cholesky decomposition (above) assumes that one variable is causally prior to another. In this example, it is assumed that the covariance between Variable 1 and Variable 2 arises from the causal influence of Variable 1 on Variable 2. All the variance in Variable 1 is due to the latent factor V1, and is equal to v1 x v1. The covariance due to the influence of V1 on Variable 2 is given by the product of paths v1 x v2. The latent factor V2 is uncorrelated with V1 and accounts for the unique variance in Variable 2, given by v3 x v3.
This model is commonly used to model longitudinal data, for example where Variable 1 is a measurement at one time point and Variable 2 is a measurement at a later time point. In this case, the covariance between the measures represents continuity of measurement and the unique variance represents measurement change.
Another use for this model might be to analyse the variance in Variable 2 distinct from its covariance with Variable 1. For example, we might wish to examine reading ability (Variable 2) distinct from IQ (Variable 1). The example above is equivalent to a linear regression of Variable 2 on Variable 1; when Variable 1 is standardised to unit variance (v1 = 1), v2 estimates the regression coefficient.
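The path-tracing rules for the bivariate Cholesky model can be sketched in a few lines of Python. This is only an illustration: the function name, the argument names and the path values are hypothetical. It assumes that the first path runs from V1 to Variable 1, the cross path from V1 to Variable 2, and the unique path from V2 to Variable 2.

```python
def cholesky_implied_cov(path_v1_var1, cross_path, unique_path):
    """Covariance matrix implied by a bivariate Cholesky model.

    path_v1_var1: path from latent factor V1 to Variable 1
    cross_path:   path from latent factor V1 to Variable 2
    unique_path:  path from latent factor V2 to Variable 2
    """
    var1 = path_v1_var1 * path_v1_var1                      # all variance in Variable 1 comes from V1
    cov12 = path_v1_var1 * cross_path                       # covariance runs through V1 only
    var2 = cross_path * cross_path + unique_path * unique_path  # shared part plus unique part
    return [[var1, cov12], [cov12, var2]]


# Hypothetical path values, chosen only for illustration.
cov = cholesky_implied_cov(1.0, 0.5, 0.8)
for row in cov:
    print(row)
```

Three paths reproduce the three observed statistics (two variances and one covariance), which is why the full model fits a 2 x 2 covariance matrix exactly.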

Longitudinal Analysis
Boomsma & van Baal, 1998
[Figure: Cholesky path diagram for IQ at age 5 and IQ at age 7, with latent factors V1 and V2 and standardised path estimates 1.0, .63 and .78.]
The Cholesky model is commonly used to model longitudinal data, for example where Variable 1 is IQ measured at age 5 and Variable 2 is IQ measured at age 7. In this case, the covariance between the measures represents continuity of measurement and the unique variance represents measurement change. The example here shows the results from a Dutch study, standardised so that the observed variables have unit variance. In this case, the covariance (.63) is the same as the observed correlation.
Question
1. Is there more continuity, or more change? How much more?

Another use for the Cholesky
[Figure: Cholesky path diagram with latent factors V1 and V2, paths v1, v2 and v3, and observed variables IQ and Reading.]
Another use for the Cholesky model is to analyse the variance in one variable distinct from its covariance with another. For example, we might wish to examine reading ability (second variable) distinct from IQ (first variable). The example above is equivalent to a linear regression of Reading on IQ; with IQ standardised to unit variance, v2 estimates the regression coefficient.
Questions to consider
1. What might this model tell us?
2. What assumptions does it make?

Bivariate Correlated Factors Model
[Figure: path diagram. Latent factor V1 loads on Variable 1 (path v1), latent factor V2 loads on Variable 2 (path v2), and V1 and V2 are linked by the correlation r.]
An alternative multivariate model is the correlated factors model. The bivariate (2 variable) model above assumes that the variances of Variable 1 and Variable 2 are due to the influence of the latent factors V1 and V2, respectively. These latent factors are linked by a correlation r. In the model above, r is equivalent to the estimated phenotypic correlation between Variable 1 and Variable 2. The covariance is given by the pathway v1 x r x v2.
This model is often used to investigate the relationships between two outcomes which might be influenced by shared risk factors. The correlation r measures the degree of shared risk.

Conversion
[Figure: the bivariate Cholesky decomposition (latent factors V1 and V2, paths x1, x2 and x3) shown beside the bivariate correlated factors model (latent factors V1 and V2, paths y1 and y2, correlation r).]
Conversion rules: $y_1 = x_1$, $y_2 = \sqrt{x_2^2 + x_3^2}$, $r = x_2 / y_2$
If we estimate all the paths in a Cholesky or Correlated Factors model, it is called a saturated model. This means that it is the most complete model that we can specify. In fact these models estimate one parameter for each observed statistic (here, two variances and one covariance). If we tried to estimate more parameters, the parameters could not be determined uniquely. When this happens we say that the parameters are not identified.
Saturated Cholesky and Correlated Factors models are actually equivalent to each other. The parameter estimates from one model can be converted into parameter estimates for the other, using the rules laid out above. The Cholesky model is simple to specify and gives robust estimates. Because of this, we often use the Cholesky model to get our parameter estimates, and then convert the results into the parameters that we would have got from a Correlated Factors model.
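The equivalence of the two saturated models can be checked numerically. The sketch below applies the conversion rules from the slide and confirms that both parameterisations imply the same covariance matrix. The function names and the path values are hypothetical, and x1 is assumed to be positive.

```python
import math


def cholesky_to_correlated(x1, x2, x3):
    """Convert Cholesky paths (x1, x2, x3) to correlated-factors parameters (y1, y2, r)."""
    y1 = x1
    y2 = math.sqrt(x2 ** 2 + x3 ** 2)
    r = x2 / y2
    return y1, y2, r


def cov_cholesky(x1, x2, x3):
    """Covariance matrix implied by the Cholesky model."""
    return [[x1 * x1, x1 * x2], [x1 * x2, x2 * x2 + x3 * x3]]


def cov_correlated(y1, y2, r):
    """Covariance matrix implied by the correlated factors model."""
    return [[y1 * y1, y1 * r * y2], [y1 * r * y2, y2 * y2]]


# Hypothetical Cholesky estimates, chosen only for illustration.
x1, x2, x3 = 1.2, 0.6, 0.8
y1, y2, r = cholesky_to_correlated(x1, x2, x3)
print(y1, y2, r)
print(cov_cholesky(x1, x2, x3))
print(cov_correlated(y1, y2, r))
```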

Univariate Twin Model
[Figure: univariate ACE path diagram for Twin 1 and Twin 2. Each twin's phenotype loads on latent factors A, C and E through paths a, c and e. The A factors of the two twins correlate 1.0 in MZ pairs and 0.5 in DZ pairs; the C factors correlate 1.0 in both MZ and DZ pairs.]
This is a recap of the univariate (1 variable) twin model. The variance is divided into components: additive genetics ($a^2$), shared environment ($c^2$), and nonshared environment ($e^2$). MZ twins share all their additive genetic variance; DZ twins share half their additive genetic variance. Shared environment is assumed to be the same within twin pairs. Nonshared environment is not shared within twin pairs.
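The expectations that follow from these sharing rules can be written out directly. The sketch below uses hypothetical path values, and the function name is my own. It returns the expected phenotypic variance and the expected MZ and DZ twin covariances.

```python
def ace_expectations(a, c, e):
    """Expected variance and twin covariances under the univariate ACE model."""
    variance = a * a + c * c + e * e   # total phenotypic variance
    cov_mz = 1.0 * a * a + c * c       # MZ pairs share all of A and all of C
    cov_dz = 0.5 * a * a + c * c       # DZ pairs share half of A and all of C
    return {"variance": variance, "cov_mz": cov_mz, "cov_dz": cov_dz}


# Hypothetical path coefficients, chosen only for illustration.
expected = ace_expectations(a=0.6, c=0.5, e=0.4)
print(expected)
```

The nonshared environment path e appears only in the variance, never in the twin covariances, because it is by definition not shared within pairs.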

Bivariate Cholesky Decomposition
[Figure: bivariate ACE Cholesky path diagram for one twin (Variable 1 Twin 1, Variable 2 Twin 1), with latent factors A1, C1, E1 and A2, C2, E2 and paths x1, y1, z1, x2, y2, z2, x3, y3, z3.]
When we have multivariate twin data we can use an extension of the Cholesky decomposition to analyse continuity and change in the genetic and environmental influences on our variables. The figure above shows the bivariate model for one twin. In the univariate genetic model we broke down the variance of our variable into additive genetics, shared environment and nonshared environment. In this multivariate model, we can break down the unique variances and the covariance in the same way.

Childhood IQ
Boomsma & van Baal, 1998
[Figure: genetic Cholesky path diagram for IQ at age 5 and IQ at age 7, with latent factors A1, C1, E1 and A2, E2.]
If we take our previous example, in which Variable 1 and Variable 2 represent IQ measured at 5 and 7 years of age, then using this model we can look at two things:
(1) The degree to which continuity (covariance) and change (unique variance in the later variable) are mediated by genetic and environmental pathways. In other words, we can decompose the phenotypic correlation between our variables into genetic and environmental covariances. Similarly, we can divide the unique variances into genetic and environmental components.
(2) The continuity and change in the genetic and environmental influences themselves. This can lead to interesting results. For example, environmental effects on childhood IQ are often found to be stable for shared environment and age-specific for nonshared environment.

Bivariate Correlated Factors Model
[Figure: bivariate ACE correlated factors path diagram for one twin (Variable 1 Twin 1, Variable 2 Twin 1). Factors A1, C1, E1 load on Variable 1 through a1, c1, e1; factors A2, C2, E2 load on Variable 2 through a2, c2, e2; the factor pairs are linked by the correlations rA, rC and rE.]
We can also extend the correlated factors model to analyse twin data. The figure above illustrates a bivariate genetic model for one twin. Using this model we can break down the phenotypic correlation between Variable 1 and Variable 2 into its genetic and environmental constituents. The proportion of the phenotypic correlation that is mediated by shared genetic factors is called the bivariate heritability.
The genetic covariance is given by the pathway a1 x rA x a2, where rA is the genetic correlation. The genetic correlation represents the overlap between genetic influences on Variable 1 and Variable 2. Similarly, the shared environment and nonshared environment components of covariance are given by c1 x rC x c2 and e1 x rE x e2, where rC and rE are the shared environment correlation and the nonshared environment correlation.
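As a rough illustration of how the phenotypic covariance splits into the three pathways, the sketch below computes each component and the bivariate heritability. All parameter values are hypothetical, the function name is my own, and rA denotes the genetic correlation as in the figure. The variables are assumed to be standardised, so the total covariance equals the phenotypic correlation.

```python
import math


def covariance_components(a1, a2, c1, c2, e1, e2, rA, rC, rE):
    """Split the covariance between two variables into A, C and E pathways."""
    genetic = a1 * rA * a2      # covariance through correlated genetic factors
    shared = c1 * rC * c2       # covariance through correlated shared environment
    nonshared = e1 * rE * e2    # covariance through correlated nonshared environment
    total = genetic + shared + nonshared
    return genetic, shared, nonshared, total


# Hypothetical standardised paths: a^2 + c^2 + e^2 = 1 for each variable.
a1, c1 = 0.8, 0.4
e1 = math.sqrt(1 - a1 ** 2 - c1 ** 2)
a2, c2 = 0.6, 0.6
e2 = math.sqrt(1 - a2 ** 2 - c2 ** 2)

genetic, shared, nonshared, total = covariance_components(
    a1, a2, c1, c2, e1, e2, rA=0.5, rC=1.0, rE=0.1
)
bivariate_h2 = genetic / total  # share of the phenotypic correlation due to genes
print(genetic, shared, nonshared, total, bivariate_h2)
```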

Genetic Correlation
[Figure: two path diagrams, each with latent factors A1, E1, A2, E2 and observed Variable 1 and Variable 2. One panel: low heritability, high genetic correlation (rA = 1.0). Other panel: high heritability, low genetic correlation. Path values shown in the figure: .30, .40, .90, .80.]
The genetic correlation is independent of the heritability of either variable. So, for instance, the variables could be influenced mainly by the environment, but what genetic influences there are, are the same for both variables. Or, both measures could be highly heritable but not share any genetic influences (rA = 0).

Full Bivariate Model
[Figure: full bivariate correlated factors path diagram for both twins (Variable 1 and Variable 2 for Twin 1 and Twin 2), with latent factors A1, C1, E1 and A2, C2, E2 and paths a1, c1, e1 and a2, c2, e2 for each twin.]
This figure illustrates the full bivariate correlated factors model for both twins.
Questions
1. What is bivariate heritability? Can you have a large bivariate heritability between two variables that are not very heritable?
2. What is a genetic correlation? Can you have a large genetic correlation between two variables that are not very heritable?

Conversion
J. C. Loehlin, Behavior Genetics, 26, 65-69.
[Figure: the bivariate genetic Cholesky decomposition (latent factors A, C, E; paths x1, y1, z1, x2, y2, z2, x3, y3, z3) shown beside the bivariate genetic correlated factors model (latent factors A, C, E; paths a1, c1, e1, a2, c2, e2; correlations rA, rC, rE), each drawn for one twin.]
Conversion rules:
$a_1 = x_1$, $c_1 = y_1$, $e_1 = z_1$
$a_2 = \sqrt{x_2^2 + x_3^2}$, $c_2 = \sqrt{y_2^2 + y_3^2}$, $e_2 = \sqrt{z_2^2 + z_3^2}$
$r_A = x_2 / a_2$, $r_C = y_2 / c_2$, $r_E = z_2 / e_2$
The figure above illustrates the saturated genetic Cholesky and genetic Correlated Factors models. As with the phenotypic models, the saturated genetic models are actually equivalent to each other. They should fit the data equally well (or badly). The parameter estimates from one model can be converted into parameter estimates for the other, using the simple transformations above. As before, we often use the Cholesky model to get our parameter estimates, and then convert the results into the parameters that we would have got from a Correlated Factors model.
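A short derivation shows where the genetic transformations come from. It assumes that the genetic covariance matrix is built from a lower triangular matrix of Cholesky paths as A = XX', and that x1 is positive. The same steps apply to C and E with the y and z paths.

```latex
\[
X = \begin{pmatrix} x_1 & 0 \\ x_2 & x_3 \end{pmatrix},
\qquad
A = X X^{\top} =
\begin{pmatrix} x_1^2 & x_1 x_2 \\ x_1 x_2 & x_2^2 + x_3^2 \end{pmatrix}
\]
\[
a_1 = \sqrt{A_{11}} = x_1,
\qquad
a_2 = \sqrt{A_{22}} = \sqrt{x_2^2 + x_3^2},
\qquad
r_A = \frac{A_{21}}{a_1 a_2}
    = \frac{x_1 x_2}{x_1 \sqrt{x_2^2 + x_3^2}}
    = \frac{x_2}{a_2}
\]
```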

Practical session
1. Use the TEDS dataset to derive MZ and DZ covariance matrices for the variables PARCA1, VOCAB1, PARCA2, VOCAB2. (The instructors will show you where to find the SPSS dataset and script that you will need.)
2. Insert the covariance matrices into the bivariate correlated factors Mx script. Is the script ready to run yet? What else will you need to do before running the script?
3. Run the Mx script and check the output. Has it run properly or are there error messages? What does the output tell you?
4. Think how you might modify the script to test the data in other ways.

SPSS script to make covariance matrices
USE ALL.
COMPUTE filter_$=(atwin=1 and zyg=1).
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
REGRESSION VARIABLES (COLLECT)
  /MISSING LISTWISE
  /DESCRIPTIVES COVARIANCES
  /DEPENDENT PARCA1
  /METHOD=ENTER VOCAB1 PARCA2 VOCAB2.
COMPUTE filter_$=(atwin=1 and zyg=2).

Bivariate correlated factors Mx script
! Genetic correlated factors model
#Define nvar= 2
G1: Model parameters
Data Calc NGroups=4
Begin Matrices;
  X Lower nvar nvar Free ! genetic parameters
  Y Lower nvar nvar Free ! shared environment parameters
  Z Lower nvar nvar Free ! nonshared environment parameters
  L Diag nvar nvar Free ! variance estimates
  H Full 1 1 ! scalar .5
  O Zero nvar nvar
End Matrices;
Begin Algebra;
  A= X * X' ; ! genetic variance/covariance
  C= Y * Y' ; ! shared environment variance/covariance
  E= Z * Z' ; ! nonshared environment variance/covariance
End Algebra;
Start .5 All
Start 1 L 1 1 - L nvar nvar
End
[continued]

Bivariate correlated factors Mx script
G2: MZ twin pairs
Data NInput_vars= 4 NObservations= XXX
Cmatrix Full
  XXX XXX XXX XXX
Labels PARCA1 VOCAB1 PARCA2 VOCAB2
Matrices= Group 1
Covariances ( L | O _
              O | L ) &
            ( A + C + E | A + C _
              A + C | A + C + E ) /
Option RSidual
End
[continued]

Bivariate correlated factors Mx script
G3: DZ twin pairs
Data NInput_vars= 4 NObservations= XXX
Cmatrix Full
  XXX XXX XXX XXX
Labels PARCA1 VOCAB1 PARCA2 VOCAB2
Matrices= Group 1
Covariances ( L | O _
              O | L ) &
            ( A + C + E | H@A + C _
              H@A + C | A + C + E ) /
Option RSidual
End
[continued]
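The block structure of the expected covariance matrices in the MZ and DZ groups can be sketched outside Mx. The Python below is only an illustration: the helper names and the numerical A, C and E matrices are hypothetical, and the scaling by the L matrix in the script is left out (equivalent to setting L to an identity matrix).

```python
def mat_add(*matrices):
    """Element-wise sum of equally sized matrices (lists of rows)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]


def mat_scale(k, matrix):
    """Multiply every element of a matrix by the scalar k."""
    return [[k * value for value in row] for row in matrix]


def expected_twin_cov(A, C, E, genetic_sharing):
    """Expected covariance matrix for a twin pair.

    Within-twin blocks are A + C + E; cross-twin blocks are
    genetic_sharing * A + C (1.0 for MZ pairs, 0.5 for DZ pairs).
    """
    within = mat_add(A, C, E)
    cross = mat_add(mat_scale(genetic_sharing, A), C)
    top = [w_row + c_row for w_row, c_row in zip(within, cross)]
    bottom = [c_row + w_row for c_row, w_row in zip(cross, within)]
    return top + bottom


# Hypothetical 2 x 2 variance/covariance components for two variables.
A = [[0.5, 0.2], [0.2, 0.4]]
C = [[0.2, 0.1], [0.1, 0.3]]
E = [[0.3, 0.05], [0.05, 0.3]]

mz = expected_twin_cov(A, C, E, genetic_sharing=1.0)
dz = expected_twin_cov(A, C, E, genetic_sharing=0.5)
for row in mz:
    print(row)
for row in dz:
    print(row)
```

The only difference between the two groups is the weight on A in the cross-twin blocks, which is the role played by the scalar H in the DZ group of the script.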

Bivariate correlated factors Mx script
G4: Standardise Estimates by constraining A + C + E = 1
Data Constraint
Matrices = Group 1
  I Unit 1 nvar
End Matrices;
Constrain \d2v( P ) = I; ! constrain to unit variance
End
G5: Calculate genetic / environmental correlations
Data Calc
  I Iden nvar nvar
Begin Algebra;
  U = \sqrt( I . A )~ * A * \sqrt( I . A )~; ! genetic correlations
  V = \sqrt( I . C )~ * C * \sqrt( I . C )~; ! SE correlations
  W = \sqrt( I . E )~ * E * \sqrt( I . E )~; ! NE environment correlations
  ! NB these are all versions of equation [7]
  ! another way of writing these equations is :
  ! U = \stnd( A ) ; etc.
End Algebra;
Intervals @95 A 1 1 A 2 2 C 1 1 C 2 2 E 1 1 E 2 2 U 2 1 V 2 1 W 2 1
! See below for explanations of the matrix equations
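As I read the algebra in group G5, each line rescales a covariance matrix by the inverse standard deviations on its diagonal, which turns it into a correlation matrix. A minimal Python sketch of that standardisation follows; the function name and the example matrix are hypothetical.

```python
import math


def standardise(cov):
    """Convert a covariance matrix (list of rows) into a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]  # standard deviations from the diagonal
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical genetic variance/covariance matrix.
A = [[0.5, 0.2], [0.2, 0.4]]
U = standardise(A)
print(U[1][0])  # genetic correlation between the two variables
```

The element U 2 1 requested in the Intervals line corresponds to U[1][0] here, because Python indexes from zero.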