Gene Mapping Quantitative Traits using IBD sharing


References:
Introduction to Quantitative Genetics, by D.S. Falconer and T.F.C. Mackay (1996), Longman Press.
Chapter 5, Statistics in Human Genetics, by P. Sham (1998), Arnold Press.
Chapter 8, Mathematical and Statistical Methods for Genetic Analysis, by K. Lange (2002), Springer.

What is a Quantitative Trait?
A quantitative trait has numerical values that can be ordered from highest to lowest. Examples include height, weight, cholesterol level and reading scores. Values may be discrete, differing by a fixed amount, or continuous, where the difference between two values can be arbitrarily small. Most methods for quantitative traits assume that the data are continuous (at least approximately).

Why use quantitative traits?
(1) More power. Fewer subjects may need to be examined (phenotyped) if one uses the quantitative trait rather than dichotomizing it to create a qualitative trait.
[Figure: individuals v, w, x, y and z placed along the trait-value scale, which is divided into unaffecteds and affecteds.]
Individuals w and x have similar trait values, yet w is grouped with z and x is grouped with y. Note that even among affecteds, knowing the trait value is useful (v and z are more similar than v and w).

Why use quantitative traits?
(2) The genotype-to-phenotype relationship may be more direct. Affection with a disease could be the culmination of many underlying events involving gene products, environmental factors and gene-environment interactions. The underlying events may differ among people, resulting in heterogeneity. Some quantitative traits are more likely than others to be under the control of a single gene. An example: intermediate traits like factor IX levels are influenced by fewer genes than clotting times, so genes influencing factor IX level will be easier to map than genes influencing clotting times.

Why use quantitative traits?
(3) End-stage disease may be too late. If the disease is late onset, the parents may no longer be available. However, if there is a quantitative trait that is known to predict increased risk of the disease, it might be measured earlier in a person’s lifetime. The parents may then also be available for genotyping, resulting in more information.

Why not use quantitative traits?
(1) The quantitative trait doesn’t meet the assumptions of the proposed statistical method. For example, many methods assume the quantitative traits are unimodal, but not all quantitative traits are unimodal.
(2) The values of the quantitative trait might be very unreliable.
(3) There are no good intermediate quantitative phenotypes for a particular disease. The quantitative traits available aren’t telling the whole story.

Components of the Phenotypic Variance of a Quantitative Trait
The total variance in a quantitative trait, termed the phenotypic variance, can be partitioned into the variance due to genetic components, environmental components and gene-environment interaction components.
[Figure: diagram in which genes, polygenes, shared environment and independent environment contribute to the trait value.]

Components of Phenotypic Variance of a Quantitative Trait
Often we make simplifying assumptions, for example that there is no variance component due to interactions, that there is no shared environment and that all genes are acting independently.
[Figure: diagram in which genes and independent environment contribute to the trait value.]
In this case we can write the phenotypic variance, $V_P$, as the sum of the genetic variance, $V_G$, and the environmental variance, $V_E$:
$V_P = V_G + V_E$

The Additive and Dominance Components of Variance
$V_G = V_A + V_D$
$V_A$, the additive genetic variance, is attributed to the inheritance of individual alleles.
$V_D$, the dominance genetic variance, is attributed to the alleles acting together as genotypes.
$V_G / V_P$ = heritability in the broad sense.
$V_A / V_P$ = heritability in the narrow sense.

The degree of correlation between two relatives depends on the theoretical kinship coefficient
An important measure of family relationship is the theoretical kinship coefficient. It is the probability that two alleles at a randomly chosen locus, one chosen randomly from individual i and one from individual j, are identical by descent. The kinship coefficient does not depend on the observed genotype data.
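The definition above can be turned into a short recursive calculation. The following is a minimal sketch, assuming a pedigree listed with parents before their children and unrelated, non-inbred founders; the function name `kinship` and the example pedigree (labels GF, GM, A, B, SA, SB, C, D) are hypothetical and not part of the slides.

```python
from fractions import Fraction


def kinship(pedigree):
    """Return {(i, j): theoretical kinship coefficient} for all pairs.

    `pedigree` is a list of (person, father, mother) tuples in which parents
    appear before their children; founders have father = mother = None.
    """
    order = [person for person, _, _ in pedigree]
    parents = {person: (f, m) for person, f, m in pedigree}
    phi = {}

    def get(a, b):
        # A missing parent (None) is treated as unrelated to everyone.
        if a is None or b is None:
            return Fraction(0)
        return phi[(a, b)]

    for idx, i in enumerate(order):
        f, m = parents[i]
        # Kinship with anyone listed earlier: average over i's two parents.
        for j in order[:idx]:
            phi[(i, j)] = phi[(j, i)] = (get(f, j) + get(m, j)) / 2
        # Self-kinship: 1/2 * (1 + kinship between i's parents).
        phi[(i, i)] = (1 + get(f, m)) / 2
    return phi


# Hypothetical three-generation pedigree: A and B are full sibs,
# C is A's child and D is B's child.
example = [
    ("GF", None, None), ("GM", None, None),
    ("A", "GF", "GM"), ("B", "GF", "GM"),
    ("SA", None, None), ("SB", None, None),
    ("C", "A", "SA"), ("D", "B", "SB"),
]
phi = kinship(example)
```

In this example `phi[("A", "C")]` is the parent-offspring value, `phi[("A", "B")]` the full-sibling value, `phi[("B", "C")]` the uncle-nephew value and `phi[("C", "D")]` the first-cousin value. No genotype data enter the calculation.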

Covariance between relatives under a polygenic model depends on the theoretical kinship coefficient and the probability that, at any arbitrary autosomal locus, the pair share both genes IBD

Relationship       Kinship coefficient   P(IBD=2)   Covariance
parent-offspring   1/4                   0          1/2*V_A
full siblings      1/4                   1/4        1/2*V_A + 1/4*V_D
uncle-nephew       1/8                   0          1/4*V_A
first cousins      1/16                  0          1/8*V_A

Note: This doesn’t depend on any measured genotype effects (marker information).
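Each covariance entry in this table follows from a single expression. As a worked sketch, writing $\varphi_{ij}$ for the theoretical kinship coefficient and $\Delta_{ij}$ for P(IBD=2) (these two symbols are notation assumed here, not taken from the slides):

```latex
\begin{align*}
\operatorname{cov}(Y_i, Y_j) &= 2\varphi_{ij} V_A + \Delta_{ij} V_D \\
\text{full siblings:}\quad
  \operatorname{cov} &= 2 \cdot \tfrac{1}{4} V_A + \tfrac{1}{4} V_D
                      = \tfrac{1}{2} V_A + \tfrac{1}{4} V_D \\
\text{first cousins:}\quad
  \operatorname{cov} &= 2 \cdot \tfrac{1}{16} V_A + 0 \cdot V_D
                      = \tfrac{1}{8} V_A
\end{align*}
```

The dominance term drops out whenever the pair cannot share both alleles IBD, which is why only full siblings carry a $V_D$ contribution among the relationships listed.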

Covariance among relatives also depends upon the allele sharing at a trait locus
Allele Sharing: Identity-by-Descent (IBD)
[Figure: parents with genotypes 1/2 and 3/4 and pairs of offspring genotypes (1/3, 2/4, 1/4, 2/3), showing for each pair the number of alleles shared IBD and the proportion of alleles shared IBD.]
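For a mating like the one pictured, where all four parental alleles are distinct, IBD sharing between two sibs can be counted directly. The sketch below assumes that fully informative setting and that each offspring genotype is written as (paternal allele, maternal allele); the names `ibd_shared`, `FATHER` and `MOTHER` are illustrative only.

```python
from itertools import product


def ibd_shared(sib1, sib2):
    """Number of alleles (0, 1 or 2) two sibs share IBD.

    Each sib is a (paternal_allele, maternal_allele) tuple. Because the four
    parental alleles are assumed distinct, matching alleles are IBD.
    """
    paternal_match = sib1[0] == sib2[0]
    maternal_match = sib1[1] == sib2[1]
    return int(paternal_match) + int(maternal_match)


FATHER = (1, 2)
MOTHER = (3, 4)

# The four equally likely offspring genotypes: 1/3, 1/4, 2/3, 2/4.
offspring = list(product(FATHER, MOTHER))

# Tally IBD sharing over all 16 equally likely ordered sib pairs.
counts = {0: 0, 1: 0, 2: 0}
for a, b in product(offspring, repeat=2):
    counts[ibd_shared(a, b)] += 1

# Proportion of alleles shared IBD for one example pair (1/3 and 1/4).
proportion = ibd_shared((1, 3), (1, 4)) / 2
```

The tally reproduces the familiar sib-pair distribution of sharing 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4. With less informative markers the sharing has to be estimated rather than counted.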

The proportion of alleles shared IBD is equivalent to twice the conditional kinship coefficient. The conditional kinship coefficient is the probability that a gene chosen randomly from person i at a specific locus matches a gene chosen randomly from person j given the available genotype information at markers.
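The factor of two can be checked in a few lines. This is a sketch in notation assumed here ($\hat{\pi}_{ij}$ for the estimated proportion shared IBD, $\hat{\varphi}_{ij}$ for the conditional kinship coefficient, and $P_k$ for the probability that the pair shares $k$ alleles IBD given the marker data):

```latex
\begin{align*}
\hat{\pi}_{ij}     &= \tfrac{1}{2} P_1 + P_2 \\
\hat{\varphi}_{ij} &= \tfrac{1}{4} P_1 + \tfrac{1}{2} P_2 \\
\hat{\pi}_{ij}     &= 2\,\hat{\varphi}_{ij}
\end{align*}
```

The coefficients in the second line come from the random draws: if the pair shares one allele IBD, both draws hit it with probability $\tfrac12 \cdot \tfrac12 = \tfrac14$; if they share both, the second draw matches the first with probability $\tfrac12$.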

We expect two siblings with similar, extreme trait values to share more alleles IBD at the trait locus than two siblings who have dissimilar extreme trait values.

The dependence of the trait’s covariance on the IBD sharing at a marker is a function of the distance between the trait and the marker loci as well as the strength of the QTL. As the map distance increases, the covariance of the trait values becomes less dependent on IBD sharing at the marker and so the apparent QTL variance component will decrease.
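To give a feel for how quickly the signal decays with distance, here is a minimal sketch for sib pairs. It assumes the Haldane map function (no interference) and uses the standard sib-pair result that IBD sharing at two loci separated by recombination fraction theta is correlated by (1 - 2*theta)^2, so the additive QTL variance apparent at the marker is roughly that fraction of the true value. The function names are hypothetical.

```python
import math


def recombination_fraction(d_morgans):
    """Haldane map function: theta = (1 - exp(-2d)) / 2, with d in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def apparent_qtl_fraction(d_morgans):
    """Approximate fraction of the QTL additive variance seen at a marker.

    Sib-pair approximation: (1 - 2*theta)^2, which under the Haldane map
    function equals exp(-4d).
    """
    theta = recombination_fraction(d_morgans)
    return (1.0 - 2.0 * theta) ** 2


# Attenuation at 0, 5, 10 and 20 cM from the QTL.
attenuation = {cm: apparent_qtl_fraction(cm / 100.0) for cm in (0, 5, 10, 20)}
```

Under these assumptions the apparent variance falls off exponentially with map distance, which is why the location score peaks near the QTL and why a grid of map positions is scanned.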

We expect two siblings with similar, extreme trait values to share more alleles IBD at the trait locus than two siblings who have dissimilar extreme trait values. Or, another way to think about it: we expect that the correlation among trait values will depend on IBD sharing.

QTL mapping using a variance component model
Another way to test whether the covariance among relatives’ trait values is correlated with the IBD sharing at a locus is to use a variance component model.

A simple variance component model has one major trait locus, a polygenic effect, and environmental factors that are independent of genetic effects and independent across family members (no household effects). The major gene and polygenic effects are also independent.
[Figure: diagram in which the QTL, polygenes and independent environment contribute to the trait value.]

Mathematically:
$Y_i = \mu + \beta^{T} a_i + g_i + q_i + e_i$
where $\mu$ is the population mean, $a_i$ are the “environmental” predictor variables, $q_i$ is the major trait locus effect, $g_i$ is the polygenic effect, and $e_i$ is the residual error.
[Figure: diagram in which the QTL, polygenes and independent environment contribute to the trait value.]

The variance of Y is:
$\mathrm{var}(Y) = V_A + V_D + V_G + V_E$
and for relatives i and j:
$\mathrm{cov}(Y_i, Y_j) = 2\hat{\varphi}_{ij} V_A + \hat{\Delta}_{ij} V_D + 2\varphi_{ij} V_G$
where $V_A$ = additive genetic variance, $V_D$ = dominance genetic variance, $V_G$ = polygenic variance, $\varphi_{ij}$ = the theoretical kinship coefficient for i and j, $\hat{\varphi}_{ij}$ = the conditional kinship coefficient for i and j at a map location, and $\hat{\Delta}_{ij}$ = the probability that i and j share both alleles IBD at a map location.
Bottom line: the trait covariance increases as IBD sharing increases.

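Some things to consider:
The estimates of $V_A$ and $V_D$ are not the actual variances due to the QTL: they depend on how far the map location is from the QTL and on the sampled data.
For a parent-child pair, $\mathrm{cov}(Y_i, Y_j) = \tfrac{1}{2} V_A + \tfrac{1}{2} V_G$ for any map location. Why is the conditional kinship coefficient always 1/4? Why is the dominance variance missing from this equation?
For two siblings i and j, $\mathrm{cov}(Y_i, Y_j) = 2\hat{\varphi}_{ij} V_A + \hat{\Delta}_{ij} V_D + \tfrac{1}{2} V_G$, where $\hat{\varphi}_{ij}$ is the conditional kinship coefficient and $\hat{\Delta}_{ij}$ is the probability of sharing both alleles IBD at the map location.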

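Often $V_D$ is assumed to be negligible ($V_D = 0$). Then for any relative pair i and j, with $\hat{\varphi}_{ij}$ the conditional and $\varphi_{ij}$ the theoretical kinship coefficient,
$\mathrm{cov}(Y_i, Y_j) = 2\hat{\varphi}_{ij} V_A + 2\varphi_{ij} V_G$
Under the null hypothesis of no linkage to the map location ($V_A = 0$),
$\mathrm{cov}(Y_i, Y_j) = 2\varphi_{ij} V_G$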
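With $V_D = 0$, the trait covariance matrix for a family can be assembled directly from the two kinship matrices. The sketch below is an illustration under that assumption, with independent environmental variance added only on the diagonal; the function name `trait_covariance` and the numeric variance values are hypothetical.

```python
def trait_covariance(cond_kinship, kinship, v_a, v_g, v_e):
    """Build the trait covariance matrix for one family (V_D assumed 0).

    Entry (i, j) is 2*cond_kinship[i][j]*V_A + 2*kinship[i][j]*V_G, and V_E is
    added on the diagonal because environments are assumed independent.
    """
    n = len(kinship)
    omega = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            omega[i][j] = 2.0 * cond_kinship[i][j] * v_a + 2.0 * kinship[i][j] * v_g
            if i == j:
                omega[i][j] += v_e
    return omega


# Two non-inbred full sibs: theoretical kinship 1/4, self-kinship 1/2.
kinship = [[0.5, 0.25], [0.25, 0.5]]
# Suppose the markers show the sibs share both alleles IBD at this location,
# so their conditional kinship there is 1/2.
cond_kinship = [[0.5, 0.5], [0.5, 0.5]]

linked = trait_covariance(cond_kinship, kinship, v_a=2.0, v_g=4.0, v_e=3.0)
null = trait_covariance(cond_kinship, kinship, v_a=0.0, v_g=4.0, v_e=3.0)
```

Under the null model the IBD information has no effect on the matrix, whereas in the linked model the off-diagonal entry grows with the conditional kinship. That difference is what the likelihood comparison picks up.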

Variance component methods of linkage analysis: example overview
(1) Estimate the IBD sharing at specified locations along the genome using marker data.
(2) Estimate the variance components $V_G$ and $V_E$ under the null model by maximizing the likelihood.
(3) Given the IBD sharing at specified map positions Z, estimate the variance components $V_A$, $V_G$ and $V_E$ by maximizing the likelihood.
(4) Calculate the location score for each map position Z. Identify the map positions where the location score is large.
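Steps (2) to (4) can be sketched for the simplest case of independent sib pairs. This is an illustration only, under several assumptions not stated in the slides: trait values are already mean-centred, $V_D = 0$, the likelihood is bivariate normal, the location score is taken to be a log10 likelihood ratio, and a crude grid search stands in for a proper optimizer. All names and data values are hypothetical.

```python
import math
from itertools import product


def pair_loglik(y1, y2, pi_hat, v_a, v_g, v_e):
    """Bivariate normal log-likelihood for one sib pair with zero mean.

    Variance is V_A + V_G + V_E; covariance is pi_hat*V_A + V_G/2, where
    pi_hat is the proportion of alleles shared IBD at the map position.
    """
    s = v_a + v_g + v_e
    c = pi_hat * v_a + 0.5 * v_g
    det = s * s - c * c
    quad = (s * (y1 * y1 + y2 * y2) - 2.0 * c * y1 * y2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def max_loglik(pairs, grid_a, grid_g, grid_e):
    """Maximize the summed log-likelihood over a grid of variance components."""
    best = -math.inf
    for v_a, v_g, v_e in product(grid_a, grid_g, grid_e):
        total = sum(pair_loglik(y1, y2, p, v_a, v_g, v_e) for y1, y2, p in pairs)
        best = max(best, total)
    return best


def location_score(pairs, grid):
    """log10 likelihood ratio of the QTL model against the null (V_A = 0)."""
    positive = [v for v in grid if v > 0]  # keep V_E > 0 so det stays positive
    null = max_loglik(pairs, [0.0], grid, positive)
    alt = max_loglik(pairs, grid, grid, positive)
    return (alt - null) / math.log(10.0)


# Hypothetical data: (trait of sib 1, trait of sib 2, pi_hat at position Z).
pairs = [
    (1.2, 1.0, 1.0), (-0.9, -1.1, 1.0),
    (0.8, 0.5, 0.5), (-0.4, 0.3, 0.5),
    (1.1, -0.9, 0.0), (-1.0, 0.7, 0.0),
]
grid = [0.25 * k for k in range(9)]  # 0.0, 0.25, ..., 2.0
score = location_score(pairs, grid)
```

Because the null grid is contained in the alternative grid, the score cannot be negative. A genome scan would repeat the calculation with the `pi_hat` values estimated at each map position and look for positions where the score is large.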

As the number of traits increases, the complexity of the log-likelihood also increases
The log-likelihood is maximized using a steepest ascent algorithm. It becomes more and more difficult to find the global maximum because multiple local maxima exist. One “solution” is to use several starting points for the maximization.
[Figure: likelihood surface with points labelled s1, f1, s2 and f2.]
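The multiple-starting-point idea can be illustrated on a toy one-dimensional surface with two peaks. This sketch uses plain gradient ascent with a numerical derivative as a stand-in for the steepest ascent algorithm mentioned above; the objective `f` and all names are made up for illustration and have nothing to do with a real likelihood.

```python
import math


def f(x):
    """Toy objective with a local maximum near x = -2 and a global one near x = 2."""
    return math.exp(-(x + 2.0) ** 2) + 2.0 * math.exp(-(x - 2.0) ** 2)


def ascend(func, x, step=0.1, iters=2000, h=1e-6):
    """Gradient ascent from starting point x using a central-difference slope."""
    for _ in range(iters):
        grad = (func(x + h) - func(x - h)) / (2.0 * h)
        x += step * grad
    return x


def multistart(func, starts):
    """Run the ascent from each start and keep the finish with the highest value."""
    finishes = [ascend(func, s) for s in starts]
    return max(finishes, key=func)


from_left = ascend(f, -3.0)           # gets stuck on the lower peak
best = multistart(f, [-3.0, 3.0])     # the second start finds the higher peak
```

A single run started at -3 stops at the lower peak, while keeping the best of several runs recovers the higher one. Multiple starts improve the odds of finding the global maximum but do not guarantee it.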

QTL Example: The mystery trait example from the Mendel manual
Besides the usual commands:
PREDICTOR = Grand :: Trait1
PREDICTOR = SEX :: Trait1
PREDICTOR = AGE :: Trait1
PREDICTOR = BMI :: Trait1
COEFFICIENT_FILE = Coefficient19b.in    <- ibd info from sibwalk
QUANTITATIVE_TRAIT = Trait1
COVARIANCE_CLASS = Additive    <- polygenic
COVARIANCE_CLASS = Environmental
COVARIANCE_CLASS = Qtl    <- now specify an additive qtl
GRID_INCREMENT =     <- spacing of the map points
ANALYSIS_OPTION = Polygenic_Qtl
VARIABLE_FILE = Variable19b.in
PROBAND = 1
PROBAND_FACTOR = PROBAND

Results
Get a summary file and a full output file. The summary file looks like:
[Table: columns MARKER, MAP DISTANCE, LOCATION SCORE, AIC and NUMBER OF FACTORS, with one row per marker (four markers).]
AIC = -2*ln(L(Z)) + 2n. The smaller the AIC, the better the fit.
n = number of parameters – number of constraints.
Factors will be explained in a little while.
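The AIC definition above is easy to apply by hand when comparing two fitted models at the same map position. The following sketch uses made-up log-likelihoods and parameter counts purely for illustration; the function name `aic` and the numbers are not from the Mendel output.

```python
def aic(log_likelihood, n_parameters, n_constraints=0):
    """AIC = -2*ln(L) + 2n, with n = parameters minus constraints."""
    n = n_parameters - n_constraints
    return -2.0 * log_likelihood + 2.0 * n


# Hypothetical fits at one map position: model B has one more parameter
# but only a slightly higher log-likelihood.
aic_a = aic(log_likelihood=-100.0, n_parameters=5)
aic_b = aic(log_likelihood=-99.8, n_parameters=6)
preferred = "A" if aic_a < aic_b else "B"
```

In this made-up comparison the extra parameter raises the log-likelihood by less than one unit, so the smaller model keeps the lower AIC and would be preferred.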

There is more information in the output file, including parameter estimates. However, the estimates of locus-specific additive variance and narrow-sense heritability obtained from genome-wide scans are upwardly biased. Therefore these estimates could lead one to overestimate the importance of the QTL in determining trait values (Goring et al, 2001, AJHG 69: ).

The model we have been considering is very simple:
(1) When examining large pedigrees, it may be possible to consider more realistic models for the environmental covariance. Example: modeling common environmental effects using a household indicator, with H=1 if i and j are members of the same household and H=0 if they are not.
(2) The variance component model can use more than one quantitative trait at once as the outcome.

Using more than one quantitative trait in the analysis
The model extends so that multiple traits can be considered at the same time. The phenotypic variance is now a matrix, and the variance components get more complicated. Instead of one term per variance component, there are $1 + \dots + n = n(n+1)/2$ terms, where n is the number of quantitative traits. As an example, consider two traits X and Y.

For technical reasons it is better to reparameterize the variances using a factor analytic approach
Factors are hidden underlying variables that capture the essence of the data. Each variance component is parameterized in terms of factors. We will illustrate with the additive genetic variance matrix for two traits X and Y (in principle any number of traits or any of the components could have been used). There exists a matrix $\Lambda_A$ such that $\Sigma_A = \Lambda_A \Lambda_A^{T}$, where $\Sigma_A$ is the additive genetic variance matrix.
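A small numerical sketch may help show what the factor parameterization buys. It assumes a lower-triangular factor matrix for two traits, so that the variance matrix is the factor matrix times its transpose; the function name and the loading values are hypothetical.

```python
def covariance_from_factors(lam):
    """Return lam * lam^T for a square factor matrix given as nested lists."""
    n = len(lam)
    return [
        [sum(lam[i][k] * lam[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def determinant_2x2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Two factors: three free loadings, matching n(n+1)/2 = 3 terms for n = 2.
two_factor = [[1.5, 0.0],
              [0.8, 0.6]]
# One factor: the second column is set to zero, leaving two free loadings.
one_factor = [[1.5, 0.0],
              [0.8, 0.0]]

sigma_two = covariance_from_factors(two_factor)
sigma_one = covariance_from_factors(one_factor)
```

Any choice of loadings yields a symmetric matrix that cannot have negative variances, so the maximization needs no extra constraints. With a single factor the matrix has determinant zero, meaning the component's contributions to the two traits are perfectly correlated.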

Factors can be used to search for pleiotropic effects
Could a single factor explain the QTL variance component? A single factor is consistent with pleiotropy, although there may be other explanations for a single factor. When there are more than two traits, we could have a reduced number of factors.

Reduction in Parameters
Recall the original factor matrix for the QTL. Set the element with subscript A2 to 0.

Modifications to the control file
QUANTITATIVE_TRAIT = Trait1
QUANTITATIVE_TRAIT = Trait2
PREDICTOR = Grand :: Trait1
PREDICTOR = SEX :: Trait1
PREDICTOR = AGE :: Trait1
PREDICTOR = BMI :: Trait1
PREDICTOR = Grand :: Trait2
PREDICTOR = SEX :: Trait2
PREDICTOR = AGE :: Trait2
PREDICTOR = BMI :: Trait2
COVARIANCE_CLASS = Additive
COVARIANCE_CLASS = Environmental
COVARIANCE_CLASS = Qtl

One factor explains the results as well as two factors:
[Tables: summary output for the two models, each with columns MARKER, MAP DISTANCE, LOCATION SCORE, AIC and NUMBER OF FACTORS and one row per marker (four markers).]

Summary
Variance component models can be used to understand the correlations among traits in families.
They can also be used to map QTLs.
Variance component models provide a powerful approach for multivariate quantitative trait data.