Bayesian MCMC QTL mapping in outbred mice Andrew Morris, Binnaz Yalcin, Jan Fullerton, Angela Meesaq, Rob Deacon, Nick Rawlins and Jonathan Flint Wellcome.

Presentation transcript:

Bayesian MCMC QTL mapping in outbred mice Andrew Morris, Binnaz Yalcin, Jan Fullerton, Angela Meesaq, Rob Deacon, Nick Rawlins and Jonathan Flint Wellcome Trust Centre for Human Genetics University of Oxford

Motivation Analysis of heterogeneous stock (HS) mice provides reasonable evidence of (at least) one QTL for an anxiety-related trait in a ~4 Mb region of chromosome 1, encompassing a cluster of RGS genes. Intensive sequencing of HS founder strains has identified two sub-regions with high probability of containing QTLs. Can we replicate these findings in other samples of mice? Can we refine the location of potential QTLs? Can we distinguish between single and multiple QTL effects?

MF1 sample Sample of MF1 outbred mice obtained from Harlan. Large sibships of mice phenotyped for an anxiety-related trait and genotyped at more than 40 SNPs and microsatellites in the 4 Mb candidate region. Parental phenotype and marker information not generally available. 93 sibships, 729 phenotyped offspring.

Method outline Reconstruct marker haplotypes in sibships and estimate inheritance vectors, taking account of uncertainty in phase assignment. Approximate the distribution of the location of diallelic QTL(s) (mutant/normal) in the candidate region given phenotypes and inheritance vectors, allowing for uncertainty in parental QTL alleles. Compare additive vs dominant genetic effect models and single QTL vs multiple QTL models.

Inheritance vectors (1)

Inheritance vectors (2) Parental genotypes Possible offspring inheritance vectors: Offspring genotype

Inheritance vectors (2) Homozygous parents. θ: recombination fraction. Figure labels: parental genotypes; offspring genotype; possible offspring inheritance vectors, with probabilities (1-θ), 0.5θ, 0.5(1-θ).

Inheritance vectors (2) Unknown parental phase. θ: recombination fraction. Figure labels: parental genotypes, labelled p and 1-p; offspring genotype; possible offspring inheritance vectors, with probabilities p(1-θ)/T and (1-p)θ/T. Estimate p from offspring genotypes across the sibship.
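The two weights on this slide can be illustrated with a small sketch. It assumes (the transcript does not say so) that T is the normalising constant, i.e. the sum of the two listed terms, and that p is the probability of the parental phase under which the offspring haplotype is non-recombinant. The function name is hypothetical.

```python
def phase_weighted_probs(p, theta):
    """Normalise the slide's two terms, p(1-theta) and (1-p)theta.

    Assumption: T is the sum of the two terms, so the returned
    weights add up to one.
    """
    non_recombinant = p * (1.0 - theta)   # p(1-theta)
    recombinant = (1.0 - p) * theta       # (1-p)theta
    total = non_recombinant + recombinant  # T (assumed normalising constant)
    if total == 0.0:
        raise ValueError("both terms are zero; weights are undefined")
    return non_recombinant / total, recombinant / total


# Example: phase completely uncertain (p = 0.5), theta = 0.1
w_non_rec, w_rec = phase_weighted_probs(0.5, 0.1)
```

With p = 0.5 the phase carries no information, so the weights reduce to 1-θ and θ.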

Inheritance vectors (2) Missing parental genotypes. θ: recombination fraction. Figure labels: parental genotypes, labelled p and 1-p; offspring genotype; possible offspring inheritance vectors. Sum probabilities over all parental genotypes!

Inheritance vectors (2) Parental genotypes Possible offspring inheritance vectors: Offspring genotype Sensitivity to genotyping errors

Inheritance vectors (3) For the MF1 sample, parental genotypes are not currently available. There are too many combinations of parental genotypes to consider all markers simultaneously. Use an overlapping sliding window of five markers, and combine information across windows.
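A minimal sketch of how overlapping five-marker windows could be enumerated. The slide gives only the window width; the step between windows, the handling of the last window, and the function name are assumptions, and the way information is combined across windows is not shown.

```python
def sliding_windows(n_markers, width=5, step=1):
    """Return marker-index lists for overlapping windows.

    width=5 follows the slide; step is an assumption.
    """
    if n_markers <= width:
        return [list(range(n_markers))]
    starts = list(range(0, n_markers - width + 1, step))
    # Add a final window if the step would leave the last markers uncovered.
    if starts[-1] != n_markers - width:
        starts.append(n_markers - width)
    return [list(range(s, s + width)) for s in starts]


# Example: 8 markers, windows of 5, moving 2 markers at a time
windows = sliding_windows(8, width=5, step=2)
```

A smaller step gives more overlap between neighbouring windows at the cost of more windows to evaluate.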

Bayesian framework (1) Goal is to approximate the posterior distribution of the location(s) of QTL(s), $f(x \mid Y,V)$, given offspring phenotypes, Y, and estimated inheritance vectors, V. Recovered by integration, $f(x \mid Y,V) = \int_M \int_Q f(x,M,Q \mid Y,V)\,dQ\,dM$, over genetic effect model parameters, M, and parental QTL alleles, Q.

Bayesian framework (2) By Bayes' theorem, $f(x,M,Q \mid Y,V) = C\,f(Y \mid x,M,Q,V)\,f(x,M,Q)$, where C is a normalising constant, $f(Y \mid x,M,Q,V)$ is the likelihood of the phenotype data and $f(x,M,Q)$ is the prior density of location(s), genetic effect model parameters and parental QTL alleles. Assume independent uniform priors for location(s) and genetic effect parameters, and that all assignments of parental QTL alleles are equally likely, a priori. Hence $f(x,M,Q)$ is constant, and the posterior is proportional to the likelihood.

Likelihood calculations (1) Conditional on inheritance vectors and parental QTL alleles, the phenotypes of offspring k in the same sibship i are independent: $f(Y \mid x,M,Q,V) = \prod_i \prod_k f(y_{ik} \mid x,M,Q,V)$, where $y_{ik}$ is distributed $N(\mu_{ik}, \sigma^2)$ and, under a single QTL model, $\mu_{ik} = s_i + a q_{ik} + d\,I(q_{ik} = 1)$. The sibship effect, $s_i$, is distributed $N(\lambda, \sigma_S^2)$. The number of mutant QTL alleles, $q_{ik} \in \{0, 1, 2\}$, has a distribution determined by x, Q and V, so weight the likelihood according to the corresponding inheritance vector probabilities…
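The weighting described on this slide can be sketched as a mixture over the three possible allele counts. This is an illustration, not the authors' implementation: the function names are hypothetical, and `q_probs` stands in for the probabilities of q = 0, 1, 2 that would be derived from x, Q and V.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def expected_phenotype(s_i, a, d, q):
    """mu_ik = s_i + a*q + d*I(q == 1), as on the slide."""
    return s_i + a * q + (d if q == 1 else 0.0)


def offspring_likelihood(y, s_i, a, d, sigma, q_probs):
    """Likelihood of one phenotype, weighted over q = 0, 1, 2.

    q_probs[q] is the (assumed given) probability that the offspring
    carries q mutant QTL alleles.
    """
    return sum(
        w * normal_pdf(y, expected_phenotype(s_i, a, d, q), sigma)
        for q, w in enumerate(q_probs)
    )


def sibship_log_likelihood(ys, s_i, a, d, sigma, q_probs_per_offspring):
    """Sum of log-likelihoods over offspring, using conditional independence."""
    return sum(
        math.log(offspring_likelihood(y, s_i, a, d, sigma, q_probs))
        for y, q_probs in zip(ys, q_probs_per_offspring)
    )
```

Setting d = 0 in this sketch leaves only the additive term a*q in the mean.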

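Likelihood calculations (2) Figure: putative QTL at location x between flanking markers L and R, with recombination fractions $\theta_L$ (L to x) and $\theta_R$ (x to R); parents and offspring. Probabilities shown on the slide: $(1-\theta_L)^2(1-\theta_R)^2/T$, $(1-\theta_L)\theta_L(1-\theta_R)\theta_R/T$, $\theta_L^2\theta_R^2/T$, $(1-\theta_L)\theta_L(1-\theta_R)\theta_R/T$. Allele counts shown on the slide: $q_{ik} = 1$, $q_{ik} = 2$, $q_{ik} = 0$, $q_{ik} = 1$.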

MCMC algorithm Employ a Metropolis-Hastings MCMC algorithm to approximate the target posterior distribution f(x,M,Q|Y,V). Random initial parameter configuration, P = {x,M,Q}. Propose a small change to the parameter configuration, P*. Accept the new parameter configuration with probability min{1, f(P*|Y,V)/f(P|Y,V)} (for a symmetric proposal), otherwise the current configuration is retained. On convergence, each configuration accepted (or retained) by the algorithm represents a random draw from f(P|Y,V).
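The accept/reject step can be sketched with a one-parameter toy problem. Everything specific here is an assumption for illustration: the target is a made-up log posterior for a single location x with a uniform prior on a 0 to 4 Mb interval, the proposal is a symmetric Gaussian random walk, and the function names are hypothetical. The full method also updates M and Q, which this sketch omits.

```python
import math
import random


def metropolis(log_target, start, step, n_iter, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    chain = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)  # small symmetric change
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, f(P*) / f(P)), computed on the log scale.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        chain.append(current)  # accepted or retained configuration
    return chain


def toy_log_posterior(x):
    """Hypothetical target: flat prior on [0, 4], Gaussian-shaped likelihood."""
    if x < 0.0 or x > 4.0:
        return float("-inf")  # outside the candidate region
    return -0.5 * ((x - 1.5) / 0.3) ** 2


chain = metropolis(toy_log_posterior, start=2.0, step=0.3, n_iter=20000)
kept = chain[2000:]                      # discard burn-in
posterior_mean = sum(kept) / len(kept)   # summary of the draws of x
```

Because only the ratio of target densities enters the acceptance step, the normalising constant C never needs to be computed.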

MF1 analysis Comparison of four models: no QTLs in candidate region (null); one additive QTL in candidate region; one dominant QTL in candidate region; two dominant QTLs in candidate region. Assume uniform recombination rate across candidate region, a priori. 2.2 million iterations of MCMC algorithm, thinned to every 2,000th output. Initial 200,000 iterations excluded as burn-in, resulting in 1,000 thinned sampling outputs.
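The run-length arithmetic on this slide can be checked in a few lines. Which iteration within each block of 2,000 is retained is not stated, so taking the first is an assumption.

```python
n_iter = 2_200_000   # total MCMC iterations
burn_in = 200_000    # initial iterations discarded
thin = 2_000         # keep every 2,000th output

# Indices of retained iterations (assumption: first of each block is kept).
kept_iterations = range(burn_in, n_iter, thin)
n_kept = len(kept_iterations)  # (2,200,000 - 200,000) / 2,000
```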

MF1 analysis: 1 dominant QTL 95% credibility interval: Mb

MF1 analysis: 2 dominant QTLs 95% credibility intervals: Mb (DOM1) Mb (DOM2) DOM1 DOM2


MF1 analysis: Comparison of models Table columns: Model m; Scaled log[f(M=m|Y,V)]; Posterior probability. Rows: Null; 1 additive QTL; 1 dominant QTL; 2 dominant QTLs.
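One way the two table columns could be related is sketched below, assuming the scaled log values are unnormalised log posterior model probabilities. The numbers are invented purely to exercise the function and are not results from the presentation; the function name is hypothetical.

```python
import math


def posterior_model_probs(scaled_logs):
    """Convert unnormalised log values into probabilities that sum to one."""
    peak = max(scaled_logs.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {m: math.exp(v - peak) for m, v in scaled_logs.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


# HYPOTHETICAL values, not taken from the slide.
example_logs = {
    "Null": -12.0,
    "1 additive QTL": -6.0,
    "1 dominant QTL": -3.0,
    "2 dominant QTLs": 0.0,
}
probs = posterior_model_probs(example_logs)
best_model = max(probs, key=probs.get)
```

Adding the same constant to every log value leaves the resulting probabilities unchanged, which is why a scaled log column is sufficient.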

MF1 analysis: ongoing work Model 3 dominant QTLs in candidate region. Incorporate parental genotype information. Additional genotyping in vicinity of DOM1. Sensitivity to marker selection and genotyping error. Investigate properties of algorithm under null model by random permutation of offspring phenotypes.

Summary Bayesian MCMC method developed to approximate distribution of location of QTLs in candidate region. Designed for use with large sibships of outbred mice, but could be generalised to other pedigree structures. Analysis of MF1 sample suggests evidence of (at least) two QTLs, one in the vicinity of RGS18.