Estimating divergence times and testing for migration using multi-locus polymorphism data

Céline Becquet (a), Andrea S. Putnam (b), Peter Andolfatto (b), and Molly Przeworski (a)
(a) Dept. of Human Genetics, Chicago, IL, USA; (b) University of California at San Diego, La Jolla, CA

Keystone Symposia. Genome Sequence Variation. Jan 08 – Jan 13, 2006

Abstract
Population divergence times are of interest in many contexts, from human genetics to conservation biology. These times can be estimated from polymorphism data. However, existing approaches make a number of assumptions (e.g., no recombination within loci or no migration since the split) that limit their applicability. To overcome these limitations, we developed an Approximate Bayesian Computation approach to estimate population parameters for a simple split model, allowing for migration as well as intralocus recombination. Application to simulated data suggests that the approach provides fairly accurate estimates of population sizes and divergence times and has high power to detect migration since the split. We illustrate the potential of the method by applying it to polymorphism data from five highly recombining loci surveyed in two closely related species of Lepidoptera (Papilio glaucus and P. canadensis).

Background
The demographic events experienced by populations influence their genealogical history and therefore the pattern of neutral polymorphism observable within and between extant populations (see Figure 1). For example, the number of alleles shared between very closely related species depends on the time at which the species split and whether gene flow occurred since the split (see Figure 2). Thus, polymorphism data can be used to estimate the demographic parameters describing the history of two incipient species (see Figure 3).

Here, we consider a simple model in which two populations split T generations ago and the number of migrants exchanged between them is M per generation. Na, N1 and N2 are the effective population sizes for the ancestral, first and second descendant populations, respectively. We denote the set of parameters by Θ. Our goal is to estimate the posterior distribution of Θ given the data.

Rather than using all the data to estimate these parameters, we summarize the data for each locus by four statistics known to be sensitive to the parameters of interest (see Figure 1 for details). Given a genealogy, the probability of obtaining these statistics can be calculated explicitly. We therefore take the following approach to estimate the posterior distribution of the parameters: we pick a set of parameters independently from prior distributions, then simulate a genealogical history for each locus and calculate u = p(D|G, Θ). We then weight the values of the parameters by u to obtain an estimate of their posterior probability.

[Diagram labels: posterior distribution; calculated explicitly; estimated from coalescent simulations; prior distributions on parameters]

Table 1 – Comparison between methods
Paper | Speed | Advantages | Drawbacks
Wakeley & Hey 1997 | Fast | Method of moments estimator; allows for recombination; multiple loci; can use genotype data | Summary of data; low accuracy; S_s & S_f > 0 required; model with no migration
Nielsen & Wakeley 2001 | Slow | Uses all the data; allows for uncertainty in nuisance parameters; more general model (allows for migration) | Uses one locus; recombination not allowed; haplotype data required
Hey & Nielsen 2004 | Slow | Same as Nielsen & Wakeley 2001; multiple loci | Recombination not allowed; haplotype data required
Leman et al. 2005 | Fast | Approximate Bayesian Computation approach; no need for S_s & S_f > 0; can use genotype data | Summary of data; uses one locus; recombination not allowed; model with no migration
Our method | Slow | Same as Leman et al. 2005; more general model; allows for recombination; multiple loci | Summary of data

Fig-1. Summary statistics used for estimation
An example of polymorphism data at a locus in three sequences sampled from each of two populations. The horizontal lines represent aligned sequences; the colored squares, disc and ovals stand for segregating sites. We use the following summaries of the polymorphism data at each locus: the number of segregating sites specific to sample one (S_1), specific to sample two (S_2), shared between samples from both populations (S_shared) and fixed in either population sample (S_fixed). In the example: S_1 = 1, S_2 = 2, S_shared = 1, S_fixed = [missing].

Fig-2. Effects of divergence time and migration on polymorphism data
Examples of genealogical histories for three sequences sampled from each of two closely related populations, under different models. The patterns of polymorphism and divergence expected under each model are indicated below. For simplicity, we present a single genealogy, but for recombining loci, there may be many histories within a single region (i.e. there is an ancestral recombination graph, rather than a tree). The vertical branches represent ancestral lineages for the six sequences; they are colored according to whether a mutation would lead to a fixed, shared or unique polymorphism in the sample (see Figure 1). In c, gene flow occurred (yellow line), thus sequence 3 was sampled in population one but its ancestor came from population two.
a. A gene genealogy for a recent divergence time without migration: excess of shared polymorphisms (occurring along the red branch) and few fixed sites (purple branch).
b. A gene genealogy for an old divergence time without migration: few shared polymorphisms (none here) and an excess of fixed sites.
c. A gene genealogy for an old divergence time with migration: excess of shared polymorphisms and few fixed sites.

Fig-3. Performance on a small simulated data set
Mean of the divergence time (a) and the ratio of ancestral to current population size (b). The estimates are based on polymorphism data from ten simulated loci of 1 kb, generated with a sample size of 20 individuals from each population, population mutation rates θ1 = θ2 = θa = 0.001, T = 5×10^4 generations and M = 5. Each vertical line refers to a data set (Y-axis), the red line indicates the true value and the X-axis range corresponds to the range of the prior distribution. As can be seen, the divergence times tend to be overestimated, while the ancestral population size estimates are more accurate.

Application to two Papilio species
We applied our method to data from five highly recombining loci sampled in two species of Lepidoptera (Papilio glaucus and P. canadensis). These two species are known to exchange migrants and experience high levels of recombination. To examine the sensitivity to assumptions about migration, we compared the parameter estimates obtained in models with and without gene flow: the time of divergence appears to be underestimated and the ancestral population size overestimated when migration is ignored (see Table 2).

Fig-4. Ranges of P. glaucus and P. canadensis. A narrow hybrid zone forms where the ranges meet. Female mimetic morph of P. glaucus is shown with yellow morphs.

Table 2 - Effect of model on estimation
Columns: Estimator, Model, N1, N2, Na, T (in generations), Migration rate*
Mean, Migration: 2.65E E E E
Mean, No Migration: 2.59E E E E+05
Median, Migration: 2.55E E E E
Median, No Migration: 2.52E E E E+05
* The posterior probability of migration is > .999, while the prior probability is only .5. Thus, there is strong support for gene flow, as expected.

Future directions
Our current method is relatively slow when using data from multiple loci because it searches a huge space of possible histories and parameters. We would like to speed up the method and extend it to more complex models. To do so, we will need to account for two sources of variance: in the genealogies and in the parameters. We therefore plan to generate many genealogies for the same set of parameters in order to improve the accuracy of our estimate of p(D|Θ), and to use Markov Chain Monte Carlo in order to better explore the parameter space.
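
The four per-locus summaries defined under Fig-1 can be computed mechanically from aligned sequences. The sketch below is an illustration only: it assumes biallelic sites coded 0/1 with no missing data, the function name `summary_statistics` is hypothetical, and the toy alignment is made up for the example rather than taken from the poster's figure.

```python
import numpy as np

def summary_statistics(pop1, pop2):
    """Count S_1, S_2, S_shared and S_fixed for one locus.

    pop1, pop2: arrays of shape (sequences, sites) holding 0/1 alleles
    (an assumed encoding, not specified in the poster).
    """
    pop1 = np.asarray(pop1)
    pop2 = np.asarray(pop2)
    # A site is polymorphic within a sample if both alleles are present.
    poly1 = pop1.min(axis=0) != pop1.max(axis=0)
    poly2 = pop2.min(axis=0) != pop2.max(axis=0)
    # Fixed difference: each sample is monomorphic, but for different alleles.
    fixed = ~poly1 & ~poly2 & (pop1[0] != pop2[0])
    return {
        "S_1": int((poly1 & ~poly2).sum()),
        "S_2": int((~poly1 & poly2).sum()),
        "S_shared": int((poly1 & poly2).sum()),
        "S_fixed": int(fixed.sum()),
    }

# Hypothetical toy locus: three sequences per population, five sites.
toy_pop1 = [[0, 0, 1, 0, 0],
            [1, 0, 1, 1, 0],
            [0, 0, 1, 1, 0]]
toy_pop2 = [[0, 1, 0, 1, 1],
            [0, 0, 1, 0, 1],
            [0, 0, 1, 0, 1]]
toy_stats = summary_statistics(toy_pop1, toy_pop2)
print(toy_stats)
```

Sites that are monomorphic across the combined sample contribute to none of the four counts, so only segregating sites matter.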
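
The weighting scheme described in the Background can be written compactly. This is one reading of that description, not a formula taken from the poster: Θ denotes the parameter set, and the indices i and K, the test function f, and the product over loci (which assumes the loci are independent) are notation introduced here.

```latex
\begin{align*}
p(\Theta \mid D) &\propto p(\Theta) \int p(D \mid G, \Theta)\, p(G \mid \Theta)\, dG, \\
\Theta_i \sim p(\Theta), \quad G_{i,\ell} \sim p(G \mid \Theta_i), \quad
u_i &= \prod_{\ell} p(D_\ell \mid G_{i,\ell}, \Theta_i), \\
\mathbb{E}\left[ f(\Theta) \mid D \right] &\approx
\frac{\sum_{i=1}^{K} u_i\, f(\Theta_i)}{\sum_{i=1}^{K} u_i}.
\end{align*}
```

Drawing Θ from the prior and G from the coalescent means that only the explicitly computable term p(D|G, Θ) has to be evaluated, which is why it serves directly as the weight.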
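
To make the "draw from the prior, simulate a genealogy per locus, weight by u" recipe concrete, here is a deliberately simplified sketch. It is not the poster's model. It assumes a single constant-size population, no recombination, one summary statistic per locus (the number of segregating sites), independent loci, and a uniform prior on the population mutation rate θ. All function names, the prior bounds and the observed counts are hypothetical.

```python
import numpy as np
from math import lgamma

def simulate_tree_lengths(n, shape, rng):
    """Total branch lengths of standard coalescent trees for n sequences.

    Time is measured in units of 2N generations; `shape` is the shape of
    the returned array (one tree per entry).
    """
    k = np.arange(2, n + 1)                  # number of lineages remaining
    rates = k * (k - 1) / 2.0                # coalescence rate with k lineages
    times = rng.exponential(1.0 / rates, size=shape + (n - 1,))
    return (times * k).sum(axis=-1)          # each interval counts k branches

def weighted_posterior(s_obs, n, n_draws=20000, theta_min=0.01,
                       theta_max=20.0, seed=0):
    """Return prior draws of theta and their normalized weights u."""
    rng = np.random.default_rng(seed)
    s_obs = np.asarray(s_obs)
    # 1. Pick parameters from the (assumed uniform) prior.
    theta = rng.uniform(theta_min, theta_max, size=n_draws)
    # 2. Simulate one genealogy per locus for each parameter draw.
    lengths = simulate_tree_lengths(n, (n_draws, s_obs.size), rng)
    # 3. u = p(D | G, theta): given the tree, the number of segregating
    #    sites is Poisson with mean theta * L / 2; multiply across loci.
    lam = theta[:, None] * lengths / 2.0
    log_fact = np.array([lgamma(s + 1) for s in s_obs])
    log_u = (s_obs * np.log(lam) - lam - log_fact).sum(axis=1)
    # 4. Weight the parameter values by u (normalized in log space).
    weights = np.exp(log_u - log_u.max())
    weights /= weights.sum()
    return theta, weights

# Hypothetical data: segregating-site counts at five loci, ten sequences each.
theta_draws, u_weights = weighted_posterior([8, 12, 9, 11, 10], n=10)
posterior_mean = float(np.sum(u_weights * theta_draws))
print(f"weighted posterior mean of theta: {posterior_mean:.2f}")
```

Because each parameter draw is paired with a single simulated genealogy per locus, the weights can be very uneven. This is the variance issue that the Future directions section proposes to reduce by simulating many genealogies per parameter set.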