Institute of Psychiatry King’s College London, UK

Slides:



Advertisements
Similar presentations
OpenMx Frühling Rijsdijk.
Advertisements

CORRELATIONAL RESEARCH o What are the Uses of Correlational Research?What are the Uses of Correlational Research? o What are the Requirements for Correlational.
1 Regression as Moment Structure. 2 Regression Equation Y =  X + v Observable Variables Y z = X Moment matrix  YY  YX  =  YX  XX Moment structure.
Topic 12: Multiple Linear Regression
StatisticalDesign&ModelsValidation. Introduction.
Structural Equation Modeling Using Mplus Chongming Yang Research Support Center FHSS College.
General Structural Equation (LISREL) Models
Structural Equation Modeling
Confirmatory Factor Analysis
Linear Regression and Binary Variables The independent variable does not necessarily need to be continuous. If the independent variable is binary (e.g.,
Structural Equation Modeling
Causal Modelling and Path Analysis. Some Notes on Causal Modelling and Path Analysis. “Path analysis is... superior to ordinary regression analysis since.
Correlation and Simple Regression Introduction to Business Statistics, 5e Kvanli/Guynes/Pavur (c)2000 South-Western College Publishing.
Path Analysis Danielle Dick Boulder Path Analysis Allows us to represent linear models for the relationships between variables in diagrammatic form.
Multivariate Genetic Analysis: Introduction(II) Frühling Rijsdijk & Shaun Purcell Wednesday March 6, 2002.
Summarizing Variation Michael C Neale PhD Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.
ACDE model and estimability Why can’t we estimate (co)variances due to A, C, D and E simultaneously in a standard twin design?
“Ghost Chasing”: Demystifying Latent Variables and SEM
Univariate Analysis in Mx Boulder, Group Structure Title Type: Data/ Calculation/ Constraint Reading Data Matrices Declaration Assigning Specifications/
David M. Evans Sarah E. Medland Developmental Models in Genetic Research Wellcome Trust Centre for Human Genetics Oxford United Kingdom Twin Workshop Boulder.
Path Analysis Frühling Rijsdijk SGDP Centre Institute of Psychiatry King’s College London, UK.
Basic Mathematics for Portfolio Management. Statistics Variables x, y, z Constants a, b Observations {x n, y n |n=1,…N} Mean.
Introduction to Multivariate Genetic Analysis Kate Morley and Frühling Rijsdijk 21st Twin and Family Methodology Workshop, March 2008.
Path Analysis Frühling Rijsdijk. Biometrical Genetic Theory Aims of session:  Derivation of Predicted Var/Cov matrices Using: (1)Path Tracing Rules (2)Covariance.
Raw data analysis S. Purcell & M. C. Neale Twin Workshop, IBG Colorado, March 2002.
LECTURE 16 STRUCTURAL EQUATION MODELING.
Correlation. The sample covariance matrix: where.
Structural Equation Modeling Intro to SEM Psy 524 Ainsworth.
Lecture 5 Correlation and Regression
Structural Equation Modeling Continued: Lecture 2 Psy 524 Ainsworth.
Path Analysis HGEN619 class Method of Path Analysis allows us to represent linear models for the relationship between variables in diagrammatic.
Factor Analysis and Inference for Structured Covariance Matrices
Summarizing Variation Matrix Algebra Benjamin Neale Analytic and Translational Genetics Unit, Massachusetts General Hospital Program in Medical and Population.
The Multiple Correlation Coefficient. has (p +1)-variate Normal distribution with mean vector and Covariance matrix We are interested if the variable.
Karri Silventoinen University of Helsinki Osaka University.
10 SEM is Based on the Analysis of Covariances! Why?Analysis of correlations represents loss of information. A B r = 0.86r = 0.50 illustration.
Univariate modeling Sarah Medland. Starting at the beginning… Data preparation – The algebra style used in Mx expects 1 line per case/family – (Almost)
STATISTICS 12.0 Correlation and Linear Regression “Correlation and Linear Regression -”Causal Forecasting Method.
Path Analysis. Path Diagram Single headed arrowruns from cause to effect Double headed bent arrow: correlation The model above assumes that all 5 variables.
MANAGEMENT SCIENCE AN INTRODUCTION TO
Genetic Theory Pak Sham SGDP, IoP, London, UK. Theory Model Data Inference Experiment Formulation Interpretation.
Module III Multivariate Analysis Techniques- Framework, Factor Analysis, Cluster Analysis and Conjoint Analysis Research Report.
STATISTICS 12.0 Correlation and Linear Regression “Correlation and Linear Regression -”Causal Forecasting Method.
Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association.
The Chain Rule. The Chain Rule Case I z x y t t start with z z is a function of x and y x and y are functions of t Put the appropriate derivatives along.
Categorical Data Frühling Rijsdijk 1 & Caroline van Baal 2 1 IoP, London 2 Vrije Universiteit, A’dam Twin Workshop, Boulder Tuesday March 2, 2004.
MathematicalMarketing Slide 5.1 OLS Chapter 5: Ordinary Least Square Regression We will be discussing  The Linear Regression Model  Estimation of the.
Introduction to Multivariate Genetic Analysis Danielle Posthuma & Meike Bartels.
March 7, 2012M. de Moor, Twin Workshop Boulder1 Copy files Go to Faculty\marleen\Boulder2012\Multivariate Copy all files to your own directory Go to Faculty\kees\Boulder2012\Multivariate.
Multivariate Genetic Analysis (Introduction) Frühling Rijsdijk Wednesday March 8, 2006.
Introduction to Multivariate Genetic Analysis
MRC SGDP Centre, Institute of Psychiatry, Psychology & Neuroscience
Path Analysis Danielle Dick Boulder 2008
Chapter 3: TWO-VARIABLE REGRESSION MODEL: The problem of Estimation
Structural Equation Modeling
Univariate modeling Sarah Medland.
Two-Variable Regression Model: The Problem of Estimation
عنوان عنوان فرعی.
CHAPTER- 17 CORRELATION AND REGRESSION
Correlation and Regression
Linear regression Fitting a straight line to observations.
Pak Sham & Shaun Purcell Twin Workshop, March 2002
Structural Equation Modeling
Sarah Medland faculty/sarah/2018/Tuesday
BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD MODELS
Brad Verhulst & Sarah Medland
Causal Relationships with measurement error in the data
Multivariate Genetic Analysis: Introduction
Autoregressive and Growth Curve Models
Structural Equation Modeling
Presentation transcript:

Institute of Psychiatry King’s College London, UK Path Analysis Frühling Rijsdijk SGDP Centre Institute of Psychiatry King’s College London, UK

Structural Equation Modelling Twin Model Twin Data Assumptions Data Preparation Biometrical Genetic Theory Hypothesised Sources of Variation Observed Variation Summary Statistics Matrix Algebra Model Equations Path Diagrams Covariance Algebra This slide represents an overview of the topics and lectures that will be covered this week. It is intended to serve as a visual aid to understand where different lectures fit in. The diagram consists of 2 ‘arms’: Twin Model and Twin Data. In the Model side are addressed the hypothesised sources of variation (based on BG theory) and how the model is expressed in both equation form and in path diagrams: by means of cov algebra and/or path tracing we derive the predicted Var/Cov of the Model. The Data arm is concerned with all data preparation processes in order to derive the observed var/Cov of the data. The link between both arms represents structural equation model fitting: the estimation of the parameters of the tested model; estimation method; deciding on the model fit > decision process etc. Path Tracing Rules Predicted Var/Cov from Model Observed Var/Cov from Data Structural Equation Modelling (Maximum Likelihood)

Path Analysis Developed by the geneticist Sewall Wright (1920) Now widely applied to problems in genetics and the behavioural sciences.

Path Analysis This technique allows us to present linear relationships between variables in diagrams and to derive predictions for the variances and covariances of the variables under the specified model. The relationships can also be represented as structural equations and covariance matrices All three forms are mathematically complete, it is possible to translate from one to the other. Structural equation modelling (SEM) represents a unified platform for path analytic and variance components models.

A system of linear model equations or In (twin) models, expected relationships between observed variables are expressed by: A system of linear model equations or Path diagrams which allow the model to be represented in schematic form Both allow us to derive predictions for the variances and covariances of the variables under the specified model

Aims of this Session Derivation of Predicted Var-Cov matrices of a model using: (1) Path Tracing (2) Covariance Algebra

Path Diagram Conventions Observed Variable Latent Variable In path diagrams we use square boxes to represent observed or measured variables and circles to represent unobserved or latent variables. The relationship among variables is represented by two kinds of arrows: a straight, single-headed arrow represents a causal relationship, and a curved two-headed arrow represents a covariance or correlational path. Causal Path Covariance Path

Path Diagrams for the Classical Twin Model

Model for an MZ PAIR e c a a c e 1 1 E C A A C E 1 1 1 1 1 1 e c a a c e Twin 1 Twin 2 This path diagram represents an ACE twin model for MZ pairs. The model represents specific hypotheses about (causal) relationships of latent genetic and environmental factors (single-headed arrows) on the observed variables and also about the correlations (curved arrows) between these latent factors between individuals in a pair. The meaning of ‘causal’ is the assumption that change in the variable at the tail of the arrow will result in change in the variable at the head of the arrow, with all other variables in the diagram held constant. The causal relationships represented by straight arrows are assumed to be linear. Variables that do not receive causal input from any one variable in the diagram are referred to as independent, source or predictor variables. Variables that do, are referred to as dependent variables. In general, only independent variables are connected by double-headed arrows. Because the latent variables have no scale, the double-headed arrow from the factor to itself is fixed to 1 in order to give the explained variance of A, C and E the same scale as the observed variable (e.g. height). The path coefficients a, c and e are the regression coefficient which tells us to what extent a change in the variable at the tail of the arrow is transmitted to the variable at the head of the arrow. Model for an MZ PAIR Note: a, c and e are the same cross twins

Model for a DZ PAIR e c a a c e 1 .5 E C A A C E 1 1 1 1 1 1 e c a a c e Twin 1 Twin 2 Model for a DZ PAIR Note: a, c and e are also the same cross groups

Path Tracing The covariance between any two variables is the sum of all legitimate chains connecting the variables The numerical value of a chain is the product of all traced path coefficients in it A legitimate chain is a path along arrows that follow 3 rules: The numerical value of each chain is the product of the path coefficients of all constituent arrows: both causal effects and correlations.

(i) Trace backward, then forward, or simply forward from one variable to another. NEVER forward then backward! Include double-headed arrows from the independent variables to itself. These variances will be 1 for latent variables Loops are not allowed, i.e. we can not trace twice through the same variable The first rule may seem less arbitrary if you realise that it allows variables to be connected by common cause, but not by common consequence. (iii) There is a maximum of one curved arrow per path. So, the double-headed arrow from the independent variable to itself is included, unless the chain includes another double-headed arrow (e.g. a correlation path)

The Variance Since the variance of a variable is the covariance of the variable with itself, the expected variance will be the sum of all paths from the variable to itself, which follow Wright’s rules

Variance of Twin 1 AND Twin 2 (for MZ and DZ pairs) E C A 1 1 1 e c a Twin 1

Variance of Twin 1 AND Twin 2 (for MZ and DZ pairs) E C A 1 1 1 e c a Twin 1

Variance of Twin 1 AND Twin 2 (for MZ and DZ pairs) E C A 1 1 1 e c a Twin 1

Variance of Twin 1 AND Twin 2 (for MZ and DZ pairs) a*1*a = a2 E C A 1 1 1 + e c a Twin 1

Variance of Twin 1 AND Twin 2 (for MZ and DZ pairs) a*1*a = a2 E C A 1 1 1 + c*1*c = c2 e c a + e*1*e = e2 Twin 1 Total Variance = a2 + c2 + e2

Covariance Twin 1-2: MZ pairs

Covariance Twin 1-2: MZ pairs

Covariance Twin 1-2: MZ pairs Total Covariance = a2 +

Covariance Twin 1-2: MZ pairs Total Covariance = a2 + c2

Predicted Var-Cov Matrices Tw1 Tw2 Tw1 Tw2 Tw1 Tw2 Tw1 Tw2

ADE Model e d a a d e 1(MZ) / 0.25 (DZ) E D A A D E Twin 1 Twin 2 1/.5 Why ¼? This will hopefully have been explained in previous session. Given the 4 possible offspring genotypes Amam, Afaf, Afam, afAm (m=mother/f=father) two siblings (or DZ twin pairs) will have 16 possible genotype combinations. If we tabulate these 16, you can see that we can identify 3 distinct combinations: When sibs share 2 alleles IBD (from same parent), left diagonal, 4 cells; P2= 4/16 = 1/4 When sibs share 0 alleles IBD, right diagonal, 4 cells; P0 = 4/16 = ¼ When sibs share 1 alleles IBD, rest of the cells, 8 cells; P1 = 8/16 = 1/2 But why is P2 used for the Dominance Genetic Variance contribution to correlations across individuals? Since D is the interaction effect between alleles at the same locus, these interaction effects can only be correlated across relatives if the genotypes are exactly the same, i.e. IBD 2. This is ¼ in sibs and DZ twins and 1 in MZ twins. Since the prob of IBD=2 is 0 in all other relations, Dominance does not contribute to the genetic covariance (Covg = Ra * Va + P2*Vd). Twin 1 Twin 2

Predicted Var-Cov Matrices Tw1 Tw2 Tw1 Tw2 Tw1 Tw2 Tw1 Tw2

this model is just identified ACE or ADE Cov(mz) = a2 + c2 or a2 + d2 Cov(dz) = ½ a2 + c2 or ½ a2 + ¼ d2 VP = a2 + c2 + e2 or a2 + d2 + e2 3 unknown parameters (a, c, e or a, d, e), and only 3 distinct predicted statistics: Cov MZ, Cov DZ, Vp) this model is just identified In the classical twin model (twin data only) we have enough information to estimate at the most 3 model parameters since we have only 3 distinct predicted statistics. However not any combination of A C D or E is identified: we always need E in the model since it includes measurement error. C and D cannot be tested at the same time with twin only data: we can have an ACE or ADE model.

Effects of C and D are confounded The twin correlations indicate which of the two components is more likely to be present: Cor(mz) = a2 + c2 or a2 + d2 Cor(dz) = ½ a2 + c2 or ½ a2 + ¼ d2 If a2 =.40, c2 =.20 rmz = 0.60 rdz = 0.40 If a2 =.40, d2 =.20 rdz = 0.25 ACE ADE

ADCE: classical twin design + adoption data Cov(mz) = a2 + d2 + c2 Cov(dz) = ½ a2 + ¼ d2 + c2 Cov(adopSibs) = c2 VP = a2 + d2 + c2 + e2 4 unknown parameters (a, c, d, e), and 4 distinct predicted statistics: Cov MZ, Cov DZ, Cov adopSibs, Vp) this model is just identified In the classical twin model (twin data only) we have enough information to estimate at the most 3 model parameters since we have only 3 distinct predicted statistics. However not any combination of A C D or E is identified: we always need E in the model since it includes measurement error. C and D cannot be tested at the same time with twin only data: we can have an ACE or ADE model.

Path Tracing Rules are based on Covariance Algebra

Three Fundamental Covariance Algebra Rules Var (X) = Cov(X,X) Cov (aX,bY) = ab Cov(X,Y) Cov (X,Y+Z) = Cov (X,Y) + Cov (X,Z)

Example 1 1 A a Y Y = aA The variance of a dependent variable (Y) caused by independent variable A, is the squared regression coefficient multiplied by the variance of the independent variable

Example 2 .5 1 1 A A a a Y Z Y = aA Z = aA

Summary Path Tracing and Covariance Algebra have the same aim : to work out the predicted Variances and Covariances of variables, given a specified model