Some Extensions to GCTA

Slides:



Advertisements
Similar presentations
Gene-environment interaction models
Advertisements

Modeling Interaction Between Family Members When you are by yourself and have no one to ask.
Extending GCTA: Effects of the Genetic Environment 2014 International Workshop on Statistical Genetic Methods for Human Complex Traits Boulder, CO. Lindon.
1 Statistical Considerations for Population-Based Studies in Cancer I Special Topic: Statistical analyses of twin and family data Kim-Anh Do, Ph.D. Associate.
Extended Twin Kinship Designs. Limitations of Twin Studies Limited to “ACE” model No test of assumptions of twin design C biased by assortative mating,
Quantitative Genetics Theoretical justification Estimation of heritability –Family studies –Response to selection –Inbred strain comparisons Quantitative.
The Inheritance of Complex Traits
Biometrical Genetics QIMR workshop 2013 Slides from Lindon Eaves.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Estimating “Heritability” using Genetic Data David Evans University of Queensland.
Genetic Theory Manuel AR Ferreira Egmond, 2007 Massachusetts General Hospital Harvard Medical School Boston.
Bivariate Model: traits A and B measured in twins cbca.
Extended sibships Danielle Posthuma Kate Morley Files: \\danielle\ExtSibs.
Biometrical Genetics Pak Sham & Shaun Purcell Twin Workshop, March 2002.
Gene x Environment Interactions Brad Verhulst (With lots of help from slides written by Hermine and Liz) September 30, 2014.
Methods in Genetic Epidemiology Lindon Eaves VIPBG March 2006.
A B C D (30) (44) (28) (10) (9) What is a gene? A functional group of DNA molecules (nucleotides) that is responsible for the production of a protein.
“Beyond Twins” Lindon Eaves, VIPBG, Richmond Boulder CO, March 2012.
Multifactorial Traits
Karri Silventoinen University of Helsinki Osaka University.
Continuously moderated effects of A,C, and E in the twin design Conor V Dolan & Sanja Franić (BioPsy VU) Boulder Twin Workshop March 4, 2014 Based on PPTs.
Causes of schizophrenia
Copyright © 2010, Pearson Education Inc., All rights reserved.  Prepared by Katherine E. L. Norris, Ed.D.  West Chester University This multimedia product.
The Causes of Variation 2014 International Workshop on Statistical Genetic Methods for Human Complex Traits Boulder, CO. Lindon Eaves, VIPBG, Richmond.
Genetic Analysis of Human Diseases Chapter Overview Due to thousands of human diseases having an underlying genetic basis, human genetic analysis.
Regression-Based Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
Gene-Environment Interaction & Correlation Danielle Dick & Danielle Posthuma Leuven 2008.
Achievement & Ascription in Educational Attainment Genetic & Environmental Influences on Adolescent Schooling François Nielsen.
Biometrical Genetics Lindon Eaves, VIPBG Richmond Boulder CO, 2012.
Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association.
Powerful Regression-based Quantitative Trait Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
Model building & assumptions Matt Keller, Sarah Medland, Hermine Maes TC21 March 2008.
Introduction to Genetic Theory
Biometrical Genetics Shaun Purcell Twin Workshop, March 2004.
Causes of schizophrenia The Genetic Explanation. Learning Objectives By the end of this lesson you will: Be able to outline how the genetic approach explains.
Partitioning of variance – Genetic models
University of Colorado at Boulder
Extended Pedigrees HGEN619 class 2007.
Gene-Environment Interaction
NORMAL DISTRIBUTIONS OF PHENOTYPES
PBG 650 Advanced Plant Breeding
Biometrical Genetics 2016 International Workshop on Statistical Genetic Methods for Human Complex Traits Boulder, CO. Lindon Eaves, VIPBG, Richmond VA.
Univariate Twin Analysis
Gene-environment interaction
Introduction to Multivariate Genetic Analysis
PSYC 206 Lifespan Development Bilge Yagmurlu.
Combining SEM & GREML in OpenMx
NORMAL DISTRIBUTIONS OF PHENOTYPES
MRC SGDP Centre, Institute of Psychiatry, Psychology & Neuroscience
Can resemblance (e.g. correlations) between sib pairs, or DZ twins, be modeled as a function of DNA marker sharing at a particular chromosomal location?
Behavior Genetics The Study of Variation and Heredity
Marker heritability Biases, confounding factors, current methods, and best practices Luke Evans, Matthew Keller.
The Causes of Variation
The Genetic Basis of Complex Inheritance
GENES AND HEREDITY.
Genetics of qualitative and quantitative phenotypes
GxE and GxG interactions
Correlation for a pair of relatives
Pak Sham & Shaun Purcell Twin Workshop, March 2002
Chapter 7 Multifactorial Traits
Exercise: Effect of the IL6R gene on IL-6R concentration
(Virginia Commonwealth University)
Pi = Gi + Ei Pi = pi - p Gi = gi - g Ei = ei - e _ _ _ Phenotype
Rodney White Conference on Financial Decisions and Asset Markets
BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD MODELS
TO WHAT EXTENT DOES GENETIC INHERITANCE INFLUENCE BEHAVIOR?
The Basic Genetic Model
Rater Bias & Sibling Interaction Meike Bartels Boulder 2004
Presentation transcript:

Some Extensions to GCTA International Workshop on Statistical Methods for Human Complex Traits Boulder, CO, March 2016 Lindon Eaves, Beate St. Pourcain, David Evans

“Quantitative Genetics” Estimates genetic (and environmental) contribution to variation by regressing measures of phenotypic similarity on genetic (and environmental) similarity

Classical Approach (Fisher, Wright, Falconer, Mather etc.) Estimates genetic correlation from expected genetic resemblance based on genetic relatedness (MZ, DZ twins, siblings, half-siblings, cousins etc.)

“GCTA” “Genome-Wide Complex Trait Analysis” Visscher, Yang, Lee, Goddard Estimates genetic correlation from empirical genetic resemblance of “unrelated” individuals based on identity by state across using genome-wide single nucleotide polymorphisms (“SNPs”)

Basic GCTA Model V = G + E V=Phenotypic variance between individuals G=(Additive) genetic variance E=(Unique) environmental variance S= AG + IE Where S = expected phenotypic covariance matrix between individuals; A = empirical genetic relatedness matrix between individuals

Genetic Effects on Behavioral phenotype extend beyond individual BUT…. Humans are “More than” individual packets of DNA – have families, feet, brains and communities Genetic Effects on Behavioral phenotype extend beyond individual

Questions for GCTA: 1. What can go wrong with GCTA if you ignore other genetic and environmental processes known to affect variation in natural populations and human traits? [Negative heuristic] 2 How might GCTA be enhanced to encompass some of the subtleties of quantitative inheritance recognized in plant and animal species and human kinship studies? [Positive heuristic]

Extending the Phenotype Parents World Me Spouse Siblings Child Extended Phenotype Eaves et al., 2005, In Kendler and Eaves, 2005.

Can analyze in family studies e.g. twins and kinships of twins

“The Genetic Environment” A Place to Start: M-GCTA Include effects of the Maternal Genotype on Offspring Behavior (“Genetic Maternal Effects”) Extensive background in plant and animal genetics (reciprocal crosses, diallels, dam and sire effects) Good models and examples from human studies (extended kinships of twins, maternal and paternal half-sibships) Extensively measured and genotyped mother-child pairs (e.g. “ALSPAC”)

Environmental Effects of the Maternal Genotype

Reference: Eaves LJ, St. Pourcain B, Davey-Smith G, York TP, Evans DM. (2014) Resolving the effects of maternal and offspring genotype on dyadic outcomes in Genome Wide Complex Trait Analysis (“M-GCTA”). Behavior Genetics, 44:445-555.

Path model for offspring and maternal effects in mother-child dyads. GCM Maternal genotype P Phenotype 1/2 e h E GCC Environment Offspring genotype Offspring

Path model for offsrpring and maternal effects in mother-child dyads. GMM GCM Maternal genotype m P Phenotype (Dyadic) 1/2 1/2 e h E GMC GCC Environment Maternal Offspring genotype Offspring

Path model for offspring and maternal effects in mother-child dyads. GMM GCM Maternal genotype m P Phenotype (Dyadic) 1/2 1/2 e c h E GMC GCC Environment Maternal Offspring genotype Offspring

V = h2 + c2 + m2 + mc + e2 (path model) Or: V = G + M + Q + E Basic model for components of variance V = h2 + c2 + m2 + mc + e2 (path model) Or: V = G + M + Q + E (Biometrical-genetic model where: M=m2, G=(c2 + h2) and Q = mc) V=Phenotypic variance G=(Additive) genetic variance (“A” in ACE model) M=Shared environmental variance due to indirect effects of maternal genotype on offspring phenotype (“genetic maternal effects”) Q=Covariance arising from overlap between genes with direct and indirect effect (“passive rGE”, +ve or –ve) (M+Q->”C” in ACE model)

Pair i Pair j aij Mother gi dij dji gj Offspring bij Egi=1/2 Pattern of Genetic Relatedness Within and Between Typical Unrelated Mother-Child Pairs Pair i Pair j aij GMi GMj Mother gi dij dji gj GCi GCj Offspring bij Egi=1/2

Components of genetic relatedness matrix for mothers and offspring Notes: Key to genetic components (c.f. Figure 2): MM=maternal copies of genes having indirect maternal effects (m) when present in the mother; CM = maternal copies of genes having direct effects (h) when present in offspring; MC= offspring copies of genes having indirect maternal effects (m) when present in the mother (may also have a direct effect, c>0, on offspring when present in offspring); CC = offspring copies of genes having direct effects (h) when present in offspring. Component matrices are NxN where N=number of mother-child pairs in the sample.

Including Offspring Phenotype in Genetic Model Genes with offspring-specific effects aij CMi CMj Mother-child Pair “I” Mother-child Pair “j” aij MMi gj MMj gi dji dij CCi CCj m m bij gj h h gi Pi Pj dji dij c c Environment “Residuals” e Ei Ej e MCi MCj bij Genes with maternal effects (may also have direct expression in child)

= AM + BG + DQ + IE where D=D+D’ Covariance structure of unrelated individuals given empirical estimates of genetic relatedness = AM + BG + DQ + IE where D=D+D’ Given phenotypes of N subjects and genetic relatedness matrices, A, M and D estimated from SNPs, can estimate genetic parameters by FIML (openMX) or REML (GCTA using Yang et als. program).

Can Fudge it in GCTA Package (If you ignore constraint on Q) Because Yang et al’s GCTA program allows total covariance to be partitioned among multiple (independent) sources (chromosomes) Thus: S = B1 G1 + B2 G2 + B3 G3 + ….+IE where Gi is the genetic variance contributed by the ith chromosome and Bi the genetic relatedness derived from SNPs on that chromosome. Our M-GCTA model has a similar linear form.

Results of fitting “GCTA” model (OpenMx) for direct and maternal effects to simulated offspring phenotypes in random samples of 1000 genotyped mother-child pairs

Comparison of estimates obtained from simulated data by FIML in OpenMx and REML in GCTA.

Application to Real Data

“ALSPAC” The Avon Longitudinal Study of Parents and Children (“ALSPAC”) UK Medical Research Council Wellcome Trust University of Bristol http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary

ALSPAC Early 1990’s Avon County, UK Population-based Birth cohort study Longitudinal follow-up 14,541 women and children Mothers genotyped on the Illumina 660K SNP chip. Children genotyped on the Illumina 550K SNP chip. After cleaning etc. genetic relatedness computed for 4761 unrelated mother-offspring pairs

Proof of Principle Maternal, self-reported, stature - should be no effects of offspring genotype. Neonatal birth length (crown-heel) – might show effects of both maternal and fetal genotype

Results of fitting “GCTA” model for direct and maternal effects to maternal stature and birth length data in the ALSPAC cohort. Results are presented as standardized variance components.

Conclusions? It seems to work – so far Subject to all the ifs and buts that go with using genome-wide SNPs to estimate genetic relatedness (?assortative mating – parental behavior affects organization of genome – “top-down causality”).

Still need Software that can handle large matrices AND Allows flexible estimation of structural models with non-linear constraints

Update # Fit full model TIME1<-proc.time() MxTest<-mxModel(“mGCTA", mxData(observed=Yobs, type="raw"), mxMatrix(type="Full",nrow=1,ncol=1,free=TRUE,values=5,name="M"), mxMatrix(type="Full",nrow=1,ncol=1,free=TRUE,values=5,name="G"), mxMatrix(type="Full",nrow=1,ncol=1,free=TRUE,values=6,name="Q"), mxMatrix(type="Full",nrow=1,ncol=1,free=TRUE,values=5,name="E"), mxMatrix(type="Iden",nrow=N,ncol=N, name="I"), mxMatrix(type="Full",nrow=1,ncol=N, values=rep(1,N),name="Unit") , mxMatrix(type="Full",nrow=1,ncol=1,free=TRUE,values=Ybar,name="mu"), mxMatrix(type="Full",nrow=N,ncol=N, values=Alpha,condenseSlots=TRUE,name="Alpha"), mxMatrix(type="Full",nrow=N,ncol=N, values=Beta,condenseSlots=TRUE,name="Beta"), mxMatrix(type="Full",nrow=N,ncol=N, values=Delta,condenseSlots=TRUE,name="Delta"), mxAlgebra(expression=G%x%Beta+M%x%Alpha+Q%x%Delta+E%x%I,name="ESigma"), mxAlgebra(expression=mu%x%Unit,name="Ey"), mxFIMLObjective(covariance="GCTA.ESigma", means="GCTA.Ey",dimnames=selvars) ) Results<-mxRun(MxTest) TIME2<-proc.time() - TIME1 TIME2 summary(Results)

So far: GRM matrices simulated for N=5000 mother-child pairs Single outcome variable Fitted A and E (simple genetic model) Used NPSOL optimizer with numerical differentiation Took 1hr 47min on laptop Doesn’t blow up Seems to get right answer Code still needs updating

ROB?