Scott Williamson and Carlos Bustamante

Non-stationary population genetic models with selection: Theory and Inference
Cornell University

Inferring natural selection from samples
- Statistical tests of the neutral theory (lots)
- Methods for detecting selective sweeps (lots)
- Parametric inference: estimating selection parameters, etc.
- Quantification of selective constraint, deleterious mutation

The demography problem
- Many existing methods assume random mating and constant population size
- These assumptions don’t apply in most natural populations
- The effect of demography can mimic the effect of natural selection

Natural selection and population growth
- Inferring selection from the frequency spectrum while correcting for demography
- The McDonald-Kreitman test: does recent population growth cause you to misidentify negative selection as adaptive evolution?

The frequency spectrum: an example
[Figure: aligned sequences showing ancestral and derived alleles at sites 163, 975, 1972, 2188, 3529, 4424, 4961, 5286 and 7019, with a histogram of site count by frequency class.]
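As a rough illustration of how a frequency spectrum like the one on this slide is tallied, here is a minimal sketch. It assumes biallelic sites and a known ancestral sequence; the function name and the toy haplotypes are hypothetical and are not the data from the slide.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes, ancestral):
    """Unfolded site frequency spectrum for n aligned haplotypes.

    Returns a list whose entry i-1 is the number of sites at which the
    derived allele is carried by exactly i of the n sequences (i = 1..n-1).
    Assumption: every non-ancestral base at a site is the same derived allele.
    """
    n = len(haplotypes)
    derived_counts = [
        sum(h[j] != ancestral[j] for h in haplotypes)
        for j in range(len(ancestral))
    ]
    # Only segregating sites (derived count strictly between 0 and n) are tallied.
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]

# Hypothetical toy alignment: 4 sequences, 3 variable sites.
toy_haplotypes = ["AAG", "ATG", "TTG", "ATC"]
toy_ancestral = "AAG"
toy_sfs = site_frequency_spectrum(toy_haplotypes, toy_ancestral)
```

Here two sites carry the derived allele once and one site carries it three times, so the spectrum has entries for classes 1 and 3 only.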

Natural selection and the frequency spectrum
Equilibrium neutral and positively selected frequency spectra
[Figure: count by frequency class; series: neutral, 2Ns=2.]

Natural selection and the frequency spectrum
Equilibrium neutral and negatively selected frequency spectra
[Figure: count by frequency class; series: neutral, 2Ns=-2.]
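The equilibrium spectra compared on these two slides can be approximated numerically. The sketch below assumes the stationary Poisson Random Field density of Sawyer and Hartl (1992) with gamma = 2Ns and genic selection; the function name, the mutation scale theta and the midpoint-rule grid are illustrative choices, not details taken from the talk.

```python
import math
import numpy as np

def expected_sfs(n, gamma, theta=1.0, grid=20000):
    """Expected equilibrium spectrum E[x_i], i = 1..n-1, for sample size n.

    Assumed density of derived-allele frequency q:
        theta * (1 - exp(-2*gamma*(1-q))) / ((1 - exp(-2*gamma)) * q * (1-q))
    which reduces to theta / q when gamma = 0 (neutrality).
    """
    q = (np.arange(grid) + 0.5) / grid  # midpoints of a uniform grid on (0, 1)
    if gamma == 0.0:
        weight = 1.0 - q
    else:
        weight = np.expm1(-2.0 * gamma * (1.0 - q)) / math.expm1(-2.0 * gamma)
    spectrum = []
    for i in range(1, n):
        # Binomial sampling times the density; the q*(1-q) denominator is
        # cancelled analytically to avoid dividing by values near zero.
        integrand = math.comb(n, i) * q ** (i - 1) * (1.0 - q) ** (n - i - 1) * weight
        spectrum.append(theta * integrand.mean())  # midpoint rule on [0, 1]
    return np.array(spectrum)

neutral = expected_sfs(10, 0.0)
positive = expected_sfs(10, 2.0)
negative = expected_sfs(10, -2.0)
```

Under these assumptions the neutral spectrum follows theta/i, positive selection shifts weight toward high-frequency classes, and negative selection shifts it toward rare classes.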

Natural selection vs. demography
Non-stationary neutral and equilibrium selected frequency spectra
[Figure: count by frequency class; series: population growth (neutral), equilibrium (2Ns=-2).]

How do we distinguish selection from demography?
McDonald-Kreitman approach:
- Use a priori information to classify changes as “neutral” (e.g. synonymous, non-coding) or “potentially selected” (e.g. non-synonymous)
- Putatively neutral changes are treated as a standard for patterns of neutral evolution in a particular sample
- Potentially selected sites are compared to the neutral standard
Can we develop a neutral standard for the frequency spectrum?

Comparing frequency spectra for different classes of mutation
This talk:
- Likelihood ratio test of neutrality at potentially selected sites, using information from the neutral sites
- Biologically meaningful measure of the difference between the two spectra
[Figure: observed frequency spectra, count by frequency class; series: putatively neutral, potentially selected.]

Comparing frequency spectra for different classes of mutation
A model-based approach:
- Fit a neutral demographic model to estimate demographic parameters
- Given those parameter estimates, fit a selective demographic model to estimate selection parameters and test hypotheses
[Figure: observed frequency spectra, count by frequency class; series: putatively neutral, potentially selected.]

Comparing frequency spectra for different classes of mutation
Requirements:
- Demographic model
- Frequency spectrum predictions from the model under neutrality
- Frequency spectrum predictions from the model subject to natural selection
[Figure: observed frequency spectra, count by frequency class; series: putatively neutral, potentially selected.]

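Theory: population growth model
2-epoch model: ancestral population size N_A, current population size N_C
[Figure: population size against time, from N_A in the past to N_C now.]
Model parameters: the size ratio N_A/N_C and the time of the size change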

Theory: predicting the frequency spectrum
Definitions:
- x_i: number of sites in frequency class i
- f(q,t; model parameters): distribution of allele frequency, q, at time t
- n: sample size
Predictions: [equation not preserved in the transcript]

Theory: the distribution of allele frequency
Poisson Random Field approach (Sawyer and Hartl 1992):
- Use single-locus diffusion theory to predict the distribution of allele frequency
- If sites are independent (i.e. in linkage equilibrium) and identically distributed, then the single-locus theory applies across sites
- To get f, we need to solve the diffusion equation

Theory: time-dependent solution, neutral case
- The forward equation under neutrality: [equation not preserved in the transcript]
- Kimura’s (1964) solution, given some initial allele frequency, p: [equation not preserved in the transcript]

Theory: time-dependent solution, neutral case
Applying Kimura’s solution to the 2-epoch model:
- Ancestral mutations: Kimura’s (1964) solution, given some initial allele frequency, p
- Distribution of allele frequency: [equation not preserved in the transcript]

Theory: time-dependent solution, neutral case
Expected frequency spectrum after a change in population size
[Figure: expected spectrum P(i,n) against frequency class (1 to 9), with one demographic parameter set to 0.01.]

Theory: time-dependent solution, neutral case
- Multinomial likelihood
- Maximum likelihood estimates of the two demographic parameters
- Likelihood ratio test of population growth
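The multinomial likelihood and the likelihood ratio test named on this slide can be sketched as follows. This is only an illustration under stated assumptions: the test statistic is treated as asymptotically chi-square, the 2 degrees of freedom match the two extra parameters of the 2-epoch model, and the function names are hypothetical. The two log-likelihoods are the ones reported later in the talk for non-coding changes.

```python
import math

def multinomial_loglik(counts, probs):
    """Log-likelihood of observed class counts given class probabilities.

    The multinomial coefficient is dropped because it does not depend on
    the model parameters and cancels in likelihood ratios.
    """
    return sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)

def lrt_two_df(lnL_null, lnL_alt):
    """Likelihood ratio statistic and its chi-square P-value for 2 d.f.

    For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    """
    stat = 2.0 * (lnL_alt - lnL_null)
    return stat, math.exp(-stat / 2.0)

# Equilibrium neutral (null) against the 2-epoch growth model (alternative).
stat, p_value = lrt_two_df(lnL_null=-6046.6, lnL_alt=-5674.6)
```

With these inputs the statistic is about 744, so the P-value is effectively zero, in line with the rejection of the equilibrium neutral model reported in the results.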

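Theory: time-dependent solution, selected case
- The forward equation with selection: [equation not preserved in the transcript], where the selection parameter equals 2N_C s
- Initial condition: [equation not preserved in the transcript]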
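For orientation, a standard textbook form of the forward (Kolmogorov) equation with genic selection is given below. It is an assumption that the slide used this form: the symbol \gamma for the scaled selection coefficient, the time unit of 2N generations and the constant population size are choices made here for illustration, and the 2-epoch model would additionally rescale the drift term when the population size changes.

```latex
\frac{\partial f(q,t)}{\partial t}
  = \frac{1}{2}\,\frac{\partial^{2}}{\partial q^{2}}\bigl[\,q(1-q)\,f(q,t)\bigr]
  - \gamma\,\frac{\partial}{\partial q}\bigl[\,q(1-q)\,f(q,t)\bigr],
\qquad \gamma = 2Ns .
```

Setting \gamma = 0 recovers the neutral forward equation, which is the case with a closed-form time-dependent solution.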

Theory: time-dependent solution, selected case
- Numerically solve the forward equation using the Crank-Nicolson finite-difference scheme
- Use this approximation of f to evaluate the likelihood function
- Fix the two demographic parameters to their MLEs from the neutral data
- Optimize the likelihood for the selection parameter
- Likelihood ratio test of neutrality
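A minimal sketch of a Crank-Nicolson step for a diffusion of this kind is shown below. It is not the authors' implementation. It assumes constant population size, time in units of 2N generations, genic selection with gamma = 2Ns, central differences on a uniform grid, and absorbing boundaries at q = 0 and q = 1; all function names, the grid size and the smooth starting density are hypothetical.

```python
import numpy as np

def crank_nicolson_matrices(m, gamma, dt):
    """Matrices for one Crank-Nicolson step on interior points q_j = j/m.

    Assumed equation: df/dt = 0.5 * d2/dq2 [q(1-q) f] - gamma * d/dq [q(1-q) f],
    with q(1-q) f taken as zero at both boundaries.
    """
    h = 1.0 / m
    k = m - 1
    q = np.arange(1, m) * h
    v = q * (1.0 - q)
    second = (np.diag(-2.0 * np.ones(k))
              + np.diag(np.ones(k - 1), 1)
              + np.diag(np.ones(k - 1), -1)) / h ** 2
    first = (np.diag(np.ones(k - 1), 1) - np.diag(np.ones(k - 1), -1)) / (2.0 * h)
    operator = (0.5 * second - gamma * first) @ np.diag(v)
    identity = np.eye(k)
    # Crank-Nicolson averages the operator over the old and new time levels.
    return q, identity - 0.5 * dt * operator, identity + 0.5 * dt * operator

def evolve(initial_density, gamma, t_end, dt=1e-3, m=100):
    """Advance an initial density to time t_end; returns (q, f)."""
    q, lhs, rhs = crank_nicolson_matrices(m, gamma, dt)
    f = initial_density(q)
    for _ in range(int(round(t_end / dt))):
        f = np.linalg.solve(lhs, rhs @ f)
    return q, f

def bump(q):
    """Hypothetical smooth starting density centred on q = 0.3."""
    return np.exp(-0.5 * ((q - 0.3) / 0.05) ** 2)

q_grid, f_neutral = evolve(bump, gamma=0.0, t_end=0.1)
_, f_selected = evolve(bump, gamma=5.0, t_end=0.1)
```

A dense solve keeps the sketch short. Because the system is tridiagonal, a banded solver would be the natural choice for finer grids or for repeated likelihood evaluations.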

Theory: time-dependent solution, selected case
How can we be sure that the numerical solution actually works?
- Von Neumann stability analysis: solution is unconditionally stable
- Numerical solution converges to the stationary distribution after ~4N_C generations
- Comparison with time-dependent neutral predictions: Kimura, Crank, and Nicolson all agree with each other

Human Polymorphism Data
- From Stephens et al. (2001)
- 80 individuals, geographically diverse ancestry
- 313 genes, 720 kb sequenced
- ~3000 SNPs (72% non-coding, 13% synonymous, 15% non-synonymous)

Results for non-coding changes, assuming neutrality

Model                  MLEs           ln(L)
2-epoch                0.016, 0.13    -5674.6
Equilibrium neutral                   -6046.6  (P≈0, d.f. 2)
Goodness-of-fit                       -5608.3  (P=0.54, d.f. 76)

Results for non-synonymous changes, categorized by Grantham’s distance

Category       S      2N_C s    P-value
conservative   136    -2.24     0.52
moderate       137    -6.08     0.07
radical        107    -8.44     0.02
all nonsyn     380    -4.88     0.10

Ongoing work and future directions
- Simulate, simulate, simulate
  - How robust is the method to different types of demographic forces?
  - How does linkage among some sites affect the analysis?
  - How does estimation error affect the LRTs?
- Numerical solution for different demographic scenarios (e.g. bottleneck, population structure)
- Variable selective effects among new mutations

The McDonald-Kreitman test
- Sn: number of non-synonymous segregating sites
- Dn: number of non-synonymous fixed differences
- Ss: number of synonymous segregating sites
- Ds: number of synonymous fixed differences
- Departures from neutrality: adaptive evolution, negative selection
- Extensions: Sawyer and Hartl (1992), Rand and Kann (1996), Smith and Eyre-Walker (2002), Bustamante et al. (2002), others
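The four counts on this slide are commonly summarized with the neutrality index of Rand and Kann (1996), the quantity plotted later in the talk. The sketch below uses hypothetical counts and a hypothetical function name. It follows the usual reading in which an index above 1 reflects an excess of non-synonymous polymorphism (consistent with negative selection) and an index below 1 reflects an excess of non-synonymous divergence (consistent with adaptive evolution).

```python
def neutrality_index(Sn, Ss, Dn, Ds):
    """Neutrality index NI = (Sn / Ss) / (Dn / Ds)."""
    return (Sn / Ss) / (Dn / Ds)

def interpret(ni):
    """Direction of the departure from NI = 1 (no significance test applied)."""
    if ni > 1.0:
        return "excess non-synonymous polymorphism"
    if ni < 1.0:
        return "excess non-synonymous divergence"
    return "no departure"

# Hypothetical counts, chosen only to illustrate the arithmetic.
ni = neutrality_index(Sn=20, Ss=40, Dn=30, Ds=30)
label = interpret(ni)
```

In practice the 2x2 table of counts is also tested for independence before the direction of the index is interpreted.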

Demography and the McDonald-Kreitman test
- Robust to different demographic scenarios because it implicitly conditions on the underlying genealogy (see Nielsen 2001)
- However, under some demographic scenarios it’s possible to misidentify the type of selection
- Weak negative selection with population growth:
  - When the population size is small, non-synonymous deleterious mutations might be fixed by drift
  - Once the population size becomes large, the level of non-synonymous polymorphism would be reduced (relative to the level of synonymous polymorphism)

Demography and the McDonald-Kreitman test
- Over what range of parameter values might you misidentify negative selection as adaptive evolution? How large is the effect?
- Eyre-Walker (2002): addressed these questions, finding that recent population growth or bottlenecks can cause you to misidentify negative selection as adaptive evolution
  - Assumed that levels of polymorphism and fixation rates changed instantaneously with population size

Demography and the McDonald-Kreitman test
[Equation not preserved in the transcript], where t_div is the divergence time, measured in units of 2N_C generations

Demography and the McDonald-Kreitman test =0.1, tdiv=10 10 10 =0.1, tdiv=4 1 1 0.01 0.1 1 0.01 0.1 1 Expected Neutrality Index (NI) =1, tdiv=4 =1, tdiv=10 10 10 1 1 0.01 0.1 1 0.01 0.1 1  (=NA/NC)

Demography and the McDonald-Kreitman test: Preliminary results
- It is possible to misidentify negative selection for some parameter combinations
- But the parameter range over which this is true is probably smaller than previously thought, as is the magnitude of the effect

Summary
- Model-based approach to correcting for demography while inferring selection
- Evidence for very recent population growth in humans
- Reasonable estimates of selection parameters for classes of non-synonymous changes
- McDonald-Kreitman test: negative selection + population growth problem not as severe as previously thought
- Numerical methods for solving the diffusion are fast, accurate, and fun!

Acknowledgements
- Collaborator: Carlos Bustamante
- Data: Genaissance Pharmaceuticals
- Helpful discussions: Bret Payseur, Rasmus Nielsen, Matt Dimmic, Jim Crow, Hiroshi Akashi, Graham Coop