Multivariate Twin Analysis

Multivariate Analysis of Twin Data
Tom Price, Frühling Rijsdijk

[Figure: path diagram linking Variable 1 and Variable 2 through latent factors A, C and E, with paths a1, c1, e1, a2, c2, e2 and correlations rA, rC, rE.]

Multivariate analyses (analyses of more than one variable at a time) can tell us about the relationships between variables. We can look at the variance that is shared between different variables (the covariance between them) and the variance that is unique to each variable. The degree to which two variables share variance is given by the phenotypic correlation between them.

Bivariate Cholesky Decomposition
[Figure: path diagram in which latent factor V1 influences Variable 1 (path v1) and Variable 2 (path v2), and latent factor V2 influences Variable 2 only (path v3).]

The Cholesky decomposition (above) assumes that one variable is causally prior to another. In this example, it is assumed that the covariance between Variable 1 and Variable 2 arises from the causal influence of Variable 1 on Variable 2. All the variance on Variable 1 is due to the latent factor V1, and is equivalent to v1 x v1. The covariance due to the influence of V1 on Variable 2 is represented by the path v1 x v2. The latent factor V2 is uncorrelated with V1 and causes the unique variance on Variable 2, represented by v3 x v3.

This model is commonly used to model longitudinal data, for example where Variable 1 is a measurement at one time point and Variable 2 is a measurement at a later time point. In this case, the covariance between the measures represents continuity of measurement and the unique variance represents measurement change.

Another use for this model is to analyse the variance on Variable 2 that is distinct from the covariance with Variable 1. For example, we might wish to examine reading ability (Variable 2) distinct from IQ (Variable 1). The example above is equivalent to a linear regression of Variable 2 on Variable 1, where v2 estimates the regression coefficient (exactly so when Variable 1 is standardised, so that v1 = 1).
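The path-tracing rules for this model can be written out as a short Python sketch. It assumes the labelling used in the conversion rules later in the deck (v1 from V1 to Variable 1, v2 from V1 to Variable 2, v3 from V2 to Variable 2) and latent factors with unit variance. The function name and the numeric values are hypothetical, chosen only for illustration.

```python
def cholesky_implied_cov(v1, v2, v3):
    """Covariance matrix implied by a bivariate Cholesky model.

    v1: path V1 -> Variable 1
    v2: path V1 -> Variable 2
    v3: path V2 -> Variable 2
    Latent factors V1 and V2 are assumed uncorrelated with unit variance.
    """
    var1 = v1 * v1             # all variance on Variable 1 comes from V1
    cov12 = v1 * v2            # covariance runs through V1 only
    var2 = v2 * v2 + v3 * v3   # shared part plus unique part
    return [[var1, cov12], [cov12, var2]]


# Hypothetical path values, for illustration only.
cov = cholesky_implied_cov(1.0, 0.5, 0.8)
print(cov)
```

The variance on Variable 2 splits into a part shared with Variable 1 (v2 x v2) and a unique part (v3 x v3), which is the split the longitudinal interpretation relies on.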

Longitudinal Analysis
Boomsma & van Baal, 1998

[Figure: Cholesky path diagram for IQ age 5 and IQ age 7, with latent factors V1 and V2 and standardised path estimates 1.0, .63 and .78.]

The Cholesky model is commonly used to model longitudinal data, for example where Variable 1 is IQ measured at age 5 and Variable 2 is IQ measured at age 7. In this case, the covariance between the measures represents continuity of measurement and the unique variance represents measurement change. The example here shows the results from a Dutch study, standardised so that the observed variables have unit variance. In this case, the covariance (.63) is the same as the observed correlation.

Question
1. Is there more continuity, or more change? How much more?
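One way to check an answer to the question above is to square the standardised paths. The sketch below assumes that .63 is the path from V1 to IQ at age 7 and .78 is the path from V2 to IQ at age 7, which is how the figure appears to be laid out; the variable names are illustrative.

```python
# Standardised path estimates read from the slide (assumed assignment).
continuity_path = 0.63  # V1 -> IQ age 7, shared with IQ age 5
change_path = 0.78      # V2 -> IQ age 7, new at age 7

continuity_var = continuity_path ** 2  # proportion of age-7 variance that is continuous
change_var = change_path ** 2          # proportion of age-7 variance that is new
total_var = continuity_var + change_var
ratio = change_var / continuity_var

print(f"continuity: {continuity_var:.4f}")
print(f"change:     {change_var:.4f}")
print(f"total:      {total_var:.4f}")
print(f"change / continuity: {ratio:.2f}")
```

The total is close to 1 rather than exactly 1 because the published paths are rounded to two decimals.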

Another use for the Cholesky
[Figure: Cholesky path diagram for IQ and Reading, with latent factors V1 and V2 and paths v1, v2 and v3.]

Another use for the Cholesky model is to analyse the variance on one variable distinct from the covariance with another. For example, we might wish to examine reading ability (second variable) distinct from IQ (first variable). The example above is equivalent to a linear regression of Reading on IQ, where v2 estimates the regression coefficient.

Questions to consider
1. What might this model tell us?
2. What assumptions does it make?

Bivariate Correlated Factors Model
[Figure: path diagram in which latent factor V1 influences Variable 1 (path v1), latent factor V2 influences Variable 2 (path v2), and V1 and V2 are linked by a correlation r.]

An alternative multivariate model is the correlated factors model. The bivariate (2 variable) model above assumes that the variances on Variable 1 and Variable 2 are due to the influence of the latent factors V1 and V2, respectively. These latent factors are linked by a correlation r. In the model above, r is equivalent to the estimated phenotypic correlation between Variable 1 and Variable 2. The covariance is given by the pathway v1 x r x v2.

This model is often used to investigate the relationships between two outcomes which might be influenced by shared risk factors. The correlation r measures the degree of shared risk.

Conversion
[Figure: two path diagrams side by side. Bivariate Cholesky Decomposition: V1 influences Variable 1 (path x1) and Variable 2 (path x2); V2 influences Variable 2 (path x3). Bivariate Correlated Factors Model: V1 influences Variable 1 (path y1), V2 influences Variable 2 (path y2), and V1 and V2 correlate r.]

If we estimate all the paths in a Cholesky or Correlated Factors model, it is called a saturated model. This means that it is the most complete model that we can specify. In fact these models estimate one parameter for each observed statistic (here, two variances and one covariance). If we tried to estimate more parameters, the parameters would not be specified uniquely. When this happens we say that the parameters are not identified.

Saturated Cholesky and Correlated Factors models are actually equivalent to each other. The parameter estimates from one model can be converted into parameter estimates for the other, using the rules below. The Cholesky model is simple to specify and gives robust estimates. Because of this, we often use the Cholesky model to get our parameter estimates, and then convert the results into the parameters that we would have got from a Correlated Factors model.

$$y_1 = x_1, \qquad y_2 = \sqrt{x_2^2 + x_3^2}, \qquad r = \frac{x_2}{y_2}$$
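The conversion rules can be checked numerically. The following is a minimal sketch, assuming unit-variance latent factors; the function names are hypothetical, and the input values reuse the standardised estimates from the longitudinal IQ example purely as test numbers. It converts Cholesky paths to correlated factors parameters and confirms that both parameterisations imply the same covariance matrix.

```python
import math


def cholesky_to_correlated(x1, x2, x3):
    """Convert Cholesky paths (x1, x2, x3) to correlated factors (y1, y2, r)."""
    y1 = x1
    y2 = math.sqrt(x2 ** 2 + x3 ** 2)  # undefined conversion if x2 = x3 = 0
    r = x2 / y2
    return y1, y2, r


def cholesky_cov(x1, x2, x3):
    """Covariance matrix implied by the Cholesky paths."""
    return [[x1 * x1, x1 * x2], [x1 * x2, x2 * x2 + x3 * x3]]


def correlated_cov(y1, y2, r):
    """Covariance matrix implied by the correlated factors parameters."""
    return [[y1 * y1, y1 * r * y2], [y1 * r * y2, y2 * y2]]


x = (1.0, 0.63, 0.78)
y1, y2, r = cholesky_to_correlated(*x)
cov_a = cholesky_cov(*x)
cov_b = correlated_cov(y1, y2, r)
print(y1, y2, r)
```

Because the two covariance matrices agree element by element, the two saturated models necessarily fit any data set equally well.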

Univariate Twin Model
[Figure: univariate ACE path diagram for Twin 1 and Twin 2, with paths a, c and e. The A factors correlate 1.0 in MZ pairs and 0.5 in DZ pairs; the C factors correlate 1.0 in both MZ and DZ pairs.]

This is a recap of the univariate (1 variable) twin model. The variance is divided into components: additive genetics (a²), shared environment (c²), and nonshared environment (e²). MZ twins share all their additive genetic variance; DZ twins share half their additive genetic variance. Shared environment is assumed to be the same within twin pairs. Nonshared environment is not shared within twin pairs.
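The expected twin-pair covariance matrices follow directly from these assumptions. The sketch below is illustrative only: the function name is hypothetical and the variance components (a² = .5, c² = .3, e² = .2) are made-up values.

```python
import math


def expected_twin_cov(a, c, e, zygosity):
    """Expected 2x2 covariance matrix (Twin 1, Twin 2) under the ACE model."""
    if zygosity not in ("MZ", "DZ"):
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    r_a = 1.0 if zygosity == "MZ" else 0.5  # correlation between the A factors
    variance = a * a + c * c + e * e         # same for both twins
    covariance = r_a * a * a + c * c         # e is not shared within pairs
    return [[variance, covariance], [covariance, variance]]


# Hypothetical paths giving a^2 = .5, c^2 = .3, e^2 = .2.
a, c, e = math.sqrt(0.5), math.sqrt(0.3), math.sqrt(0.2)
mz = expected_twin_cov(a, c, e, "MZ")
dz = expected_twin_cov(a, c, e, "DZ")
print(mz[0][1], dz[0][1])
```

The difference between the MZ and DZ covariances is half the additive genetic variance, which is what allows a² to be separated from c².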

Bivariate Cholesky Decomposition
[Figure: bivariate Cholesky path diagram for one twin. A1, C1 and E1 influence Variable 1 (paths x1, y1, z1) and Variable 2 (paths x2, y2, z2); A2, C2 and E2 influence Variable 2 only (paths x3, y3, z3).]

When we have multivariate twin data we can use an extension of the Cholesky decomposition to analyse continuity and change in the genetic and environmental influences on our variables. The figure above shows the bivariate model for one twin. In the univariate genetic model we broke down the variance on our variable into additive genetics, shared environment and nonshared environment. In this multivariate model, we can break down the unique variances and the covariance in the same way.

Childhood IQ
Boomsma & van Baal, 1998

[Figure: bivariate Cholesky path diagram for IQ age 5 and IQ age 7, with latent factors A1, C1, E1, A2 and E2.]

If we take our previous example, in which Variable 1 and Variable 2 represent IQ measured at 5 and 7 years of age, then using this model we can look at two things:

(1) The degree to which continuity (covariance) and change (unique variance on the later variable) are mediated by genetic and environmental pathways. In other words, we can decompose the phenotypic correlation between our variables into genetic and environmental covariances. Similarly, we can divide the unique variances into genetic and environmental components.

(2) The continuity and change in the genetic and environmental influences themselves. This can lead to interesting results. For example, environmental effects on childhood IQ are often found to be stable for shared environment and age-specific for nonshared environment.

Bivariate Correlated Factors Model
[Figure: bivariate correlated factors path diagram for one twin. A1, C1 and E1 influence Variable 1 (paths a1, c1, e1); A2, C2 and E2 influence Variable 2 (paths a2, c2, e2); the factor pairs are linked by the correlations rA, rC and rE.]

We can also extend the correlated factors model to analyse twin data. The figure above illustrates a bivariate genetic model for one twin. Using this model we can break down the phenotypic correlation between Variable 1 and Variable 2 into its genetic and environmental constituents. The proportion of the phenotypic correlation that is mediated by shared genetic factors is called the bivariate heritability.

The genetic covariance is given by the pathway a1 x rA x a2, where rA is the genetic correlation. The genetic correlation represents the overlap between genetic influences on Variable 1 and Variable 2. Similarly, the shared environment and nonshared environment components of covariance are given by c1 x rC x c2 and e1 x rE x e2, where rC and rE are the shared environment correlation and the nonshared environment correlation.
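As a rough illustration of this decomposition, the sketch below sums the three pathways and reports the genetic share. It assumes both variables are standardised (a² + c² + e² = 1 for each), so that the sum of the pathways is the phenotypic correlation. The function name and all parameter values are hypothetical.

```python
import math


def decompose_phenotypic_correlation(a1, a2, r_a, c1, c2, r_c, e1, e2, r_e):
    """Split the covariance between two standardised variables into A, C, E parts."""
    genetic = a1 * r_a * a2
    shared_env = c1 * r_c * c2
    nonshared_env = e1 * r_e * e2
    total = genetic + shared_env + nonshared_env
    return {
        "genetic": genetic,
        "shared_env": shared_env,
        "nonshared_env": nonshared_env,
        "phenotypic_r": total,
        "bivariate_heritability": genetic / total,
    }


# Hypothetical values: heritabilities of .25 and .16, genetic correlation of 1.
result = decompose_phenotypic_correlation(
    a1=0.5, a2=0.4, r_a=1.0,
    c1=0.5, c2=0.6, r_c=0.2,
    e1=math.sqrt(0.5), e2=math.sqrt(0.48), r_e=0.1,
)
print(result)
```

With these made-up numbers neither variable is very heritable, yet most of the phenotypic correlation runs through the genetic pathway, because the genetic correlation is high while rC and rE are low.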

Genetic Correlation
[Figure: two path diagrams, each with latent factors A1, E1, A2 and E2 for Variable 1 and Variable 2. One panel is titled "Low heritability, high genetic correlation" (rA = 1.0) and the other "High heritability, low genetic correlation". Path values shown: .30, .40, .90, .80.]

The genetic correlation is independent of the heritability of either variable. So, for instance, the variables could be influenced mainly by the environment, but what genetic influences there are could be the same for both variables. Or both measures could be highly heritable but not share any genetic influences (rA = 0).

Full Bivariate Model
[Figure: full bivariate correlated factors path diagram for Twin 1 and Twin 2, with latent factors A1, C1, E1, A2, C2, E2 and paths a1, c1, e1, a2, c2, e2 for each twin.]

This figure illustrates the full bivariate correlated factors model for both twins.

Questions
1. What is bivariate heritability? Can you have a large bivariate heritability between two variables that are not very heritable?
2. What is a genetic correlation? Can you have a large genetic correlation between two variables that are not very heritable?

Conversion
J. C. Loehlin, Behavior Genetics, 26,

[Figure: two path diagrams for one twin, side by side. Bivariate Cholesky Decomposition with paths x1, y1, z1, x2, y2, z2, x3, y3, z3; Bivariate Correlated Factors Model with paths a1, c1, e1, a2, c2, e2 and correlations rA, rC, rE.]

The figure above illustrates the saturated genetic Cholesky and genetic Correlated Factors models. As with the phenotypic models, the saturated genetic models are actually equivalent to each other. They should fit the data equally well (or badly). The parameter estimates from one model can be converted into parameter estimates for the other, using the simple transformations below. As before, we often use the Cholesky model to get our parameter estimates, and then convert the results into the parameters that we would have got from a Correlated Factors model.

$$a_1 = x_1, \qquad c_1 = y_1, \qquad e_1 = z_1$$

$$a_2 = \sqrt{x_2^2 + x_3^2}, \qquad c_2 = \sqrt{y_2^2 + y_3^2}, \qquad e_2 = \sqrt{z_2^2 + z_3^2}$$

$$r_A = \frac{x_2}{a_2}, \qquad r_C = \frac{y_2}{c_2}, \qquad r_E = \frac{z_2}{e_2}$$
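Applied to twin data, the same transformation is simply repeated for the A, C and E components. The sketch below uses hypothetical Cholesky estimates and a hypothetical helper name; it also checks that the phenotypic covariance implied by the converted parameters matches the one implied by the Cholesky paths.

```python
import math


def convert_component(p1, p2, p3):
    """Cholesky paths (p1, p2, p3) -> correlated factors (path 1, path 2, correlation)."""
    path2 = math.sqrt(p2 ** 2 + p3 ** 2)
    return p1, path2, p2 / path2


# Hypothetical Cholesky estimates: (x1, x2, x3), (y1, y2, y3), (z1, z2, z3).
cholesky = {
    "A": (0.6, 0.3, 0.5),
    "C": (0.5, 0.2, 0.4),
    "E": (0.6, 0.1, 0.5),
}
correlated = {name: convert_component(*paths) for name, paths in cholesky.items()}

# Phenotypic covariance between Variable 1 and Variable 2 under each model.
cov_cholesky = sum(p1 * p2 for p1, p2, _ in cholesky.values())
cov_correlated = sum(q1 * r * q2 for q1, q2, r in correlated.values())

for name, (q1, q2, r) in correlated.items():
    print(f"{name}: path1={q1:.3f} path2={q2:.3f} r={r:.3f}")
```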

Practical session
1. Use the TEDS dataset to derive MZ and DZ covariance matrices for the variables PARCA1, VOCAB1, PARCA2, VOCAB2. (The instructors will show you where to find the SPSS dataset and script that you will need.)
2. Insert the covariance matrices into the bivariate correlated factors Mx script. Is the script ready to run yet? What else will you need to do before running the script?
3. Run the Mx script and check the output. Has it run properly or are there error messages? What does the output tell you?
4. Think how you might modify the script to test the data in other ways.

SPSS script to make covariance matrices
    * Filter to cases with atwin=1 and zyg=1.
    USE ALL.
    COMPUTE filter_$=(atwin=1 and zyg=1).
    VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
    FORMAT filter_$(f1.0).
    FILTER BY filter_$.
    EXECUTE .
    * REGRESSION is run here to print the covariance matrix of the four variables.
    REGRESSION VARIABLES (COLLECT)
      /MISSING LISTWISE
      /DESCRIPTIVES COVARIANCES
      /DEPENDENT PARCA1
      /METHOD=ENTER VOCAB1 PARCA2 VOCAB2.
    * Second zygosity group (zyg=2); the slide shows only this first command.
    COMPUTE filter_$=(atwin=1 and zyg=2).

Bivariate correlated factors Mx script
    ! Genetic correlated factors model
    #Define nvar= 2
    G1: Model parameters
    Data Calc NGroups=4
    Begin Matrices;
     X Lower nvar nvar Free ! genetic parameters
     Y Lower nvar nvar Free ! shared environment parameters
     Z Lower nvar nvar Free ! nonshared environment parameters
     L Diag nvar nvar Free ! variance estimates
     H Full 1 1 ! scalar .5
     O Zero nvar nvar
    End Matrices;
    Begin Algebra;
     A= X * X' ; ! genetic variance/covariance
     C= Y * Y' ; ! shared environment variance/covariance
     E= Z * Z' ; ! nonshared environment variance/covariance
    End Algebra;
    Start .5 All
    Start 1 L L nvar nvar
    End
    [continued]

Bivariate correlated factors Mx script (continued)
    G2: MZ twin pairs
    Data NInput_vars= 4 NObservations= XXX
    Cmatrix Full
     XXX XXX XXX XXX
    Labels PARCA1 VOCAB1 PARCA2 VOCAB2
    Matrices= Group 1
    Covariances ( L | O _ O | L ) & ( A + C + E | A + C _ A + C | A + C + E ) /
    Option RSidual
    End
    [continued]

Bivariate correlated factors Mx script (continued)
    G3: DZ twin pairs
    Data NInput_vars= 4 NObservations= XXX
    Cmatrix Full
     XXX XXX XXX XXX
    Labels PARCA1 VOCAB1 PARCA2 VOCAB2
    Matrices= Group 1
    ! DZ pairs share half the additive genetic covariance (H is the scalar .5)
    Covariances ( L | O _ O | L ) & ( A + C + E | H@A + C _ H@A + C | A + C + E ) /
    Option RSidual
    End
    [continued]
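For readers unfamiliar with Mx matrix notation, the covariance statements in the MZ and DZ groups can be paraphrased in Python. This is a sketch under stated assumptions, not a translation of the full script: it leaves out the scaling by the diagonal matrix L, uses plain nested lists, and the function names and starting values are hypothetical. The variable order is Twin 1 Variable 1, Twin 1 Variable 2, Twin 2 Variable 1, Twin 2 Variable 2.

```python
def times_transpose(m):
    """Return m * m' for a square matrix given as nested lists."""
    n = len(m)
    return [[sum(m[i][k] * m[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expected_pair_cov(X, Y, Z, r_a):
    """Expected twin-pair covariance matrix; r_a is 1.0 for MZ and 0.5 for DZ."""
    A, C, E = (times_transpose(M) for M in (X, Y, Z))
    n = len(A)
    within = [[A[i][j] + C[i][j] + E[i][j] for j in range(n)] for i in range(n)]
    cross = [[r_a * A[i][j] + C[i][j] for j in range(n)] for i in range(n)]
    top = [within[i] + cross[i] for i in range(n)]      # ( A+C+E | rA*A+C )
    bottom = [cross[i] + within[i] for i in range(n)]   # ( rA*A+C | A+C+E )
    return top + bottom


# Hypothetical lower-triangular path matrices.
X = [[0.6, 0.0], [0.3, 0.5]]
Y = [[0.5, 0.0], [0.2, 0.4]]
Z = [[0.6, 0.0], [0.1, 0.5]]

mz_cov = expected_pair_cov(X, Y, Z, 1.0)
dz_cov = expected_pair_cov(X, Y, Z, 0.5)
for row in dz_cov:
    print([round(v, 3) for v in row])
```

The only difference between the two groups is the weight on A in the cross-twin blocks, which is what identifies the genetic parameters.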

Bivariate correlated factors Mx script (continued)
    G4: Standardise Estimates by constraining A + C + E = 1
    Data Constraint
    Matrices = Group 1
     I Unit 1 nvar
    End Matrices;
    Constrain \d2v( P ) = I; ! constrain to unit variance
    End

    G5: Calculate genetic / environmental correlations
    Data Calc
     I Iden nvar nvar
    Begin Algebra;
     U = \sqrt( I . A )~ * A * \sqrt( I . A )~; ! genetic correlations
     V = \sqrt( I . C )~ * C * \sqrt( I . C )~; ! shared environment correlations
     W = \sqrt( I . E )~ * E * \sqrt( I . E )~; ! nonshared environment correlations
     ! NB these are all versions of equation [7]
     ! another way of writing these equations is :
     ! U = \stnd( A ) ; etc.
    End Algebra;
    A 1 1 A 2 2 C 1 1 C 2 2 E 1 1 E 2 2 U 2 1 V 2 1 W 2 1
    ! See below for explanations of the matrix equations
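The algebra in G5 rescales each covariance matrix to a correlation matrix by dividing every element by the product of the two corresponding standard deviations. A minimal Python sketch of that operation follows; the function name mirrors the Mx function for readability only, and the path values are hypothetical. It also checks that the off-diagonal element equals the genetic correlation given by the conversion rule rA = x2 / a2.

```python
import math


def stnd(m):
    """Rescale a covariance matrix (nested lists) to a correlation matrix."""
    n = len(m)
    sd = [math.sqrt(m[i][i]) for i in range(n)]
    return [[m[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical genetic Cholesky paths x1, x2, x3.
x1, x2, x3 = 0.6, 0.3, 0.5

# A = X * X' for the lower-triangular X = [[x1, 0], [x2, x3]].
A = [[x1 * x1, x1 * x2],
     [x1 * x2, x2 * x2 + x3 * x3]]

U = stnd(A)
r_a_from_rule = x2 / math.sqrt(x2 ** 2 + x3 ** 2)
print(U[1][0], r_a_from_rule)
```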