Molecular evolution


Molecular evolution
- How do we explain the patterns of variation observed in DNA sequences?
- How do we detect selection by comparing silent-site substitutions to replacement substitutions?
- How do we detect selection by comparing fixed differences between species to polymorphisms within species?
- How do we detect selection by using hitchhiking?
Goal: understand the logic behind the key tests.

Neutralist vs. selectionist view
Are most substitutions due to drift or natural selection? “Neutralist” vs. “selectionist”.
Both sides agree that:
- Most mutations are deleterious and are removed.
- Some mutations are favourable and are fixed.
In dispute:
- Are most replacement mutations that fix beneficial or neutral?
- Is observed polymorphism due to selection or drift?
We don’t know! Different faculty here have different views.

Reminder: substitution vs. polymorphism
What happens after a mutation changes a nucleotide at a locus?
- Polymorphism: the mutant allele is one of several present in the population.
- Substitution: the mutant allele fixes in the population. (New mutations at other nucleotides may occur later.)

Substitution schematic
Individual    1     2     3     4     5     6     7
Time 0:     aaat  aaat  aaat  aaat  aaat  aaat  aaat
Time 10:    aaat  aaat  aaat  aaat  acat  aaat  aaat
Time 20:    aaat  aaat  acat  aaat  acat  acat  acat
Time 30:    acat  acat  acat  acat  acat  acat  acat
Time 40:    acat  acat  actt  acat  acat  acat  acat
Times 10-29: polymorphism
Time 30: mutation fixed -> substitution
Time 40: new mutation: polymorphism
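The schematic can be read mechanically: a time point is polymorphic when the sample holds more than one allele, and a substitution has happened when the sample is fixed for an allele different from the previously fixed one. The short Python sketch below applies that rule to the sequences in the schematic. The function name and the status labels are illustrative choices, not part of the original slides.

```python
def classify_time_points(samples):
    """Label each sampled time point as fixed, polymorphism, or substitution.

    samples maps a time point to the list of sequences sampled at that time.
    """
    statuses = {}
    last_fixed = None  # allele the population was most recently fixed for
    for time in sorted(samples):
        alleles = set(samples[time])
        if len(alleles) > 1:
            statuses[time] = "polymorphism"
            continue
        (allele,) = alleles
        if last_fixed is not None and allele != last_fixed:
            statuses[time] = "substitution"
        else:
            statuses[time] = "fixed"
        last_fixed = allele
    return statuses


# Sequences copied from the schematic (individuals 1 to 7).
schematic = {
    0: ["aaat"] * 7,
    10: ["aaat", "aaat", "aaat", "aaat", "acat", "aaat", "aaat"],
    20: ["aaat", "aaat", "acat", "aaat", "acat", "acat", "acat"],
    30: ["acat"] * 7,
    40: ["acat", "acat", "actt", "acat", "acat", "acat", "acat"],
}

for time, status in classify_time_points(schematic).items():
    print(time, status)
```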

Reminder: substitution rates for neutral mutations
- Most neutral mutations are lost: only 1 out of 2N fix.
- Most that are lost go quickly (< 20 generations for population sizes from 100 to 2000).
- Most replacement mutations are lost because they are deleterious: their rate of loss is faster than neutral.

Data in favor of neutrality
Substitutions in DNA appear to be clock-like (Figure 6.21).
Why are these observations evidence for neutrality? It’s a bit subtle. We think silent sites don’t matter for selection, so they must be neutral. If the mutation rate is relatively constant, we then expect a steady rate of silent substitution. Fine. But we expect replacement substitutions to matter to selection. Yet we find a steady rate for them too, suggesting that these substitutions are being driven by drift rather than by selection.

Drift model pseudocode
Population with 2N – 1 copies of allele A, 1 of allele a
For each generation, draw alleles from the prior generation:
-> generate a random number. If less than f(A), new allele = A. Otherwise, new allele = a.
-> repeat until 2N alleles drawn
Check the outcome of drift:
-> If a is lost, start over.
-> If a has fixed, note the number of generations.
-> Otherwise, go to the next generation with the new allele frequencies.
Repeat 100x per population size
Test populations of 100, 500, 1000, 1500, and 2000
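The drift pseudocode can be sketched in Python roughly as follows. This is a minimal sketch, not the code behind the lecture's results: the function names, the seed, and the small population sizes in the usage loop are assumptions chosen so that the pure-Python loop finishes quickly. The population sizes listed on the slide would run the same way, only more slowly.

```python
import random


def neutral_fixation_time(n_diploid, rng):
    """Generations until one new neutral copy of allele a fixes.

    Runs in which a is lost are discarded and restarted, so the
    returned time is conditional on fixation.
    """
    copies = 2 * n_diploid
    while True:
        count_a = 1  # a single new mutant copy of allele a
        generations = 0
        while 0 < count_a < copies:
            freq_a = count_a / copies
            # Draw 2N alleles using the previous generation's frequencies.
            count_a = sum(1 for _ in range(copies) if rng.random() < freq_a)
            generations += 1
        if count_a == copies:
            return generations
        # Otherwise a was lost: start over with a fresh mutant.


def mean_fixation_time(n_diploid, replicates=100, seed=1):
    """Average conditional fixation time over independent replicates."""
    rng = random.Random(seed)
    times = [neutral_fixation_time(n_diploid, rng) for _ in range(replicates)]
    return sum(times) / len(times)


if __name__ == "__main__":
    for n in (10, 20, 40):
        mean_time = mean_fixation_time(n, replicates=30)
        print(f"N = {n}: mean time {mean_time:.1f} generations, 4N = {4 * n}")
```

With only a few dozen replicates the averages are noisy, so they should land near 4N rather than exactly on it.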

Times to fix for neutral alleles
Only 1/2N fix: how long do they take?
Estimated formula: fixation time = 4.07 * N – 57
Theoretical formula: fixation time = 4N

Puzzle for neutrality
Rates of substitution are clock-like per year, not per generation.
[Figure: substitutions plotted against years for rabbits and elephants, in two panels: "Expected pattern" and "Actual pattern"]
However, there are some strange patterns that don’t quite fit with neutrality. Mutations arise each generation, so from mutation data we expect a correlation between substitution rate per year and generation time: a short-generation species such as the rabbit should accumulate substitutions faster than a long-generation species such as the elephant. We don’t see one. This means that something more complex is going on.

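Revised theory: the nearly neutral theory (Figure 6.22)
So a compromise theory was born: the nearly neutral theory. The idea is that generation time and population size balance out. A longer generation time means a slower mutation rate per year, but species with long generation times tend to have small populations, in which more slightly deleterious mutations behave as if neutral and can fix by drift.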

Can we distinguish selection from drift using sequence data?
Compare two species: infer where substitutions have occurred.
- Silent-site (synonymous) substitutions should be neutral (dS).
- Non-synonymous substitutions are expected to be deleterious, usually (dN).
So, expect dN / dS < 1. Translation: the rate of non-synonymous substitutions (dN) is less than the rate of synonymous substitutions (dS).
So, does this theory match the data? We will look at this in two more ways. First, we can look at lots of genes: in any gene, we expect to find more silent substitutions than replacements.

dN / dS and inferences about selection
dN / dS < 1: replacements are deleterious
dN / dS = 1: replacements are neutral
But sometimes we find more replacements. How can this be?
dN / dS > 1: replacements are beneficial

What happens to fixation time with selection? Model pseudocode
Population with 2N – 1 copies of allele A, 1 of allele a
WA = 1 – s; Wa = 1 (the new allele a is the favoured one)
For each generation, draw alleles from the prior generation:
-> generate a random number. If greater than f(A), new allele = a. Otherwise, test fitness: if a second random number < WA, new allele = A; if not, discard the draw.
-> repeat until 2N alleles drawn
Check the outcome:
-> If a is lost, start over.
-> If a has fixed, note the number of generations.
-> Otherwise, go to the next generation with the new allele frequencies.
Repeat 100x per fitness
Test populations of 100
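A possible Python sketch of the selection model follows. It assumes the new allele a is the favoured one, with relative fitness 1 for a and 1 - s for A; that reading of the fitness values is an assumption, as are the function name, the seed, and the parameter values in the usage lines. Rather than drawing an allele and then accepting or rejecting it, the sketch uses the fitness-weighted frequency directly, which gives the same sampling distribution with fewer random numbers.

```python
import random


def selected_fixation_time(n_diploid, s, rng):
    """Generations until a new favoured allele a fixes (lost runs restart).

    Assumed relative fitnesses: 1 for a, 1 - s for A. With s = 0 this
    reduces to pure drift.
    """
    copies = 2 * n_diploid
    w_a = 1.0
    w_big_a = 1.0 - s
    while True:
        count_a = 1  # a single new mutant copy of allele a
        generations = 0
        while 0 < count_a < copies:
            p = count_a / copies
            # Probability that an accepted draw is a after fitness weighting.
            p_sel = p * w_a / (p * w_a + (1 - p) * w_big_a)
            count_a = sum(1 for _ in range(copies) if rng.random() < p_sel)
            generations += 1
        if count_a == copies:
            return generations
        # Otherwise a was lost: start over with a fresh mutant.


if __name__ == "__main__":
    rng = random.Random(2)
    n = 50
    for s in (0.0, 0.05, 0.2):
        times = [selected_fixation_time(n, s, rng) for _ in range(20)]
        print(f"s = {s}: mean time to fix {sum(times) / len(times):.1f} generations")
```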

Time to fix favourable allele

Time to fix: neutral vs. favourable Simulation results: black – neutral mutations; red – favourable mutations

Time to fixation: drift is slow
Neutral:
- New mutations per generation: 2Neu
- Probability of fixing a new mutation: 1 / 2Ne
- Fixations per generation: 2Neu * (1 / 2Ne) = u
- Time to fix: 4Ne
Favored by selection:
- New mutations per generation: 2Neu (but how many are favourable?)
- Probability of fixing a favoured mutation: 2|s|
- Fixations per generation: 2Neu * 2|s| * prob. favourable
- Time to fix: 2 ln(2Ne) / |s|
2 ln(2Ne) / |s| << 4Ne
Key: drift is slow; selection gives a shorter time to fixation.
Derivations of these results are tough! See Kimura (1962) and Kimura and Ohta (1969).
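To get a feel for how different the two fixation times are, the formulas above can simply be evaluated. The sketch below does this in Python. The values of Ne and s are illustrative assumptions, and the selected-allele formula is an approximation that is only meaningful when selection is strong relative to drift (2Ne|s| much larger than 1).

```python
import math


def expected_neutral_time(ne):
    """Expected generations to fixation for a neutral mutation: 4Ne."""
    return 4 * ne


def expected_selected_time(ne, s):
    """Approximate generations to fixation for a favoured mutation: 2 ln(2Ne) / |s|."""
    return 2 * math.log(2 * ne) / abs(s)


s = 0.01  # illustrative selection coefficient
for ne in (1_000, 10_000, 100_000):
    neutral = expected_neutral_time(ne)
    selected = expected_selected_time(ne, s)
    print(f"Ne = {ne}: neutral {neutral} generations, favoured {selected:.0f} generations")
```

The neutral time grows in proportion to Ne, while the favoured time grows only with the logarithm of Ne, so the gap widens quickly as populations get larger.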

Time to fixation: favourable and neutral

dN / dS data: BRCA1
[Figure 6.21: dN / dS for BRCA1, with values labelled "> 1" and "< 1"]

Molecular evidence of selection II: McDonald-Kreitman
A gene-wide dN / dS test is very conservative: many selective events may be missed.
Example: immunoglobulins. dN / dS = 0.37 overall.
We suspect selection favoring new combinations at key sites. Antigen recognition sites: dN / dS > 3.0

Evidence of selection II: McDonald-Kreitman test

McDonald-Kreitman test III
If the evolution of a protein is neutral, the percentage of mutations that alter amino acids should be the same along any branch.
If all mutations are neutral, all should have the same probability of persisting.
So: dN / dS among polymorphisms should be the same as among fixed differences.

McDonald-Kreitman logic
Silent sites:
- always neutral
- fix slowly
- contribute to polymorphism
Replacement sites:
- mainly unfavourable
- if neutral, fix at the same rate as silent sites and contribute to polymorphism; the proportion of replacement mutations that are neutral determines dN / dS for polymorphism
- if favourable, fix quickly and do not contribute to polymorphism: higher dN / dS for fixed differences, lower for polymorphism

Time to fixation: favourable and neutral

Polymorphism and fixation
[Diagram labels: Neutral, Deleterious; Silent, Replacement]
1 / 2N neutral mutations fix

Polymorphism and fixation
[Diagram labels: Neutral, Deleterious, Favourable; Silent, Replacement]
Neutral: 1 / 2N of mutations fix (slow)
Favourable: 2|s| of mutations fix (fast)

dN / dS for neutral and favourable
Neutral: dN / dS (polymorphism) = dN / dS (fixation)
Favourable: dN / dS (polymorphism) < dN / dS (fixation)

McDonald-Kreitman hypotheses
H0: All mutations are neutral. Then dN / dS for polymorphic sites should equal dN / dS for fixed differences.
H1: Replacements are favoured. Favoured mutations fix rapidly, so dN / dS for polymorphic sites < dN / dS for fixed differences.

Example of MK test: ADH in Drosophila
Compare sequences of D. simulans and D. yakuba for ADH (alcohol dehydrogenase)
                 Fixed differences   Polymorphic sites
Replacement              7                   2
Silent                  17                  42
% replacement      7 / 24 = 29%        2 / 44 = 5%
Significance? Use a χ² test for independence
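As a rough check on the ADH table, the χ² test for independence on a 2 x 2 table can be computed by hand. The Python sketch below uses the standard shortcut formula for a 2 x 2 table without continuity correction, and gets the p-value for one degree of freedom from the complementary error function. The function name is illustrative, and other tests (a G-test or Fisher's exact test) may be preferred for counts this small.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom and no continuity
    correction.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 d.f., P(X > chi2) equals P(|Z| > sqrt(chi2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: replacement, silent. Columns: fixed differences, polymorphic sites.
chi2, p_value = chi_square_2x2(7, 2, 17, 42)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value here means the proportion of replacement changes differs between fixed differences and polymorphisms, which is the pattern H1 predicts.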

Evidence of selection III: selective sweeps Imagine a new mutation that is strongly favored (e.g. insecticide resistance in mosquitoes)

Detecting selection using linkage: G6PD in humans
Natural history:
- Located on the X chromosome; encodes glucose-6-phosphate dehydrogenase
- Red blood cells lack mitochondria: no citric acid cycle, no electron transport chain
- Glycolysis only
- NADPH only via the pentose-phosphate shunt, which requires G6PD
- NADPH is needed for glutathione, which protects against oxidation

G6PD and malaria
Malaria (Plasmodium falciparum) infects red blood cells.
The parasite typically has limited G6PD function (but can produce the enzyme).
It uses NADPH from the red blood cell.
In G6PD-deficient individuals?

G6PD mutants
Different mutants result in different levels of enzymatic activity.
Severe mutants result in destruction of red blood cells and anemia.
Most common mutant: G6PD-202A. Usually mild effects; may increase risk of miscarriage.
Prediction: G6PD and malaria?

Frequency of G6PD deficiency

Has G6PD-202A been selected?
14 markers up to 413,000 bp from G6PD: is there LD?
Long-distance LD implies strong, recent selection.

Has G6PD-202A been selected?
[Fig 7.14: linkage disequilibrium against distance (kb) from the core region]

Alternative hypothesis: drift caused the linkage disequilibrium
Three possibilities for a new allele under drift:
- it has high LD but is rare;
- if it is common, it is old, so it has low LD;
- or it disappears.
So an allele with both high frequency and high LD can only be explained by selection.
[Figure 7.14b: axis label "Allele frequency"; G6PD-202A marked]

Detecting selection II: CCR532

Detecting selection II: CCR5Δ32
Stephens (1998) found strong disequilibrium between CCR5-Δ32 and nearby markers.
Implies recent origin (< 2000 years), because recombination breaks down linkage over time.
Implies selection.

Detecting selection II: CCR5Δ32 But: new data – November 2005. Better map:

Detecting selection: summary
Several approaches to detecting selection:
- dN / dS
- McDonald-Kreitman test
- using hitchhiking
Challenges of each method?

Other uses of molecular data: the coalescent
Any two alleles in a population share a common ancestor in the previous generation with probability 1 / 2Ne.
Therefore, going backwards in time, the expected time to find the common ancestor is 1 / (1 / 2Ne) = 2Ne generations.

Coalescent II

Coalescent and sequences
Imagine that you have two sequences at a locus. On average, they shared a common ancestor 2Ne generations ago. They accumulate mutations at rate u per generation per base pair.
2Ne generations / lineage * 2 lineages * u = 4Neu differences per base pair between the two sequences.

Coalescent example
We sequence 1000 base pairs from two sequences and find 16 base pair differences. How large is the population? Assume u = 2 x 10^-8.
4Neu * 1000 = 16; 8 x 10^-5 * Ne = 16; Ne * 10^-5 = 2; Ne = 200,000
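The same arithmetic can be written as a small Python function, which makes it easy to try other numbers. This is only a restatement of the calculation above under the same assumption (expected differences per base pair = 4Neu); the function and parameter names are illustrative.

```python
def effective_population_size(differences, sequence_length, mutation_rate):
    """Estimate Ne from pairwise differences, assuming diversity = 4 * Ne * u.

    mutation_rate is u, per base pair per generation.
    """
    per_site_differences = differences / sequence_length
    return per_site_differences / (4 * mutation_rate)


ne = effective_population_size(differences=16, sequence_length=1000, mutation_rate=2e-8)
print(f"Estimated Ne: {ne:,.0f}")
```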

Neutral theory as a null model

Additional readings
Eyre-Walker (2006) The genomic rate of adaptive evolution. Trends in Ecology and Evolution 21:569-575. (Well-written review)
Gillespie (2004) Population genetics: a concise guide. Johns Hopkins University Press: Baltimore, MD. (Very short, clear, but dense!)
Graur and Li (2000) Fundamentals of molecular evolution. Sinauer: Sunderland, MA. (Very clear)
Kimura (1962) On the probability of fixation of mutant genes in populations. Genetics 47:713-719. (If you really want the derivation)
Kimura and Ohta (1969) The average number of generations until fixation of a mutant gene in a finite population. Genetics 61:763-771. (If you really want the derivation)
Sabeti et al. (2005) The case for selection at CCR5-Δ32. PLoS Biology 3:1963-1969.
Questions:
1. Explain why clock-like rates of substitutions per year did not fit with the neutral theory.
See posted molecular evolution practice questions: highly recommended!