Genome Evolution © Amos Tanay, The Weizmann Institute Genome evolution Lecture 2: population genetics I: models and drift.

Reading: course slides on the web; Hartl and Clark, Principles of Population Genetics; Rick Durrett, Probability Models for DNA Sequence Evolution; Gillespie, Population Genetics: A Concise Guide; Hein et al., Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory; Wakeley, Coalescent Theory: An Introduction; Graur and Li, Fundamentals of Molecular Evolution. Classics: Kimura, Dobzhansky, ...

Studying populations. Models: a set of individuals (genomes); ancestry relations or hierarchies. Experiments: field studies; diversity/genotyping; experimental evolution. (Figures: the Glanville fritillary population of the Åland Islands; human migration patterns inferred from mtDNA.)

Human population growth. (Figure: estimated population size in millions, by year.)

The data: the HapMap project. About 1 million SNPs (single nucleotide polymorphisms) in 4 populations: 30 trios (two parents and a child) from Nigeria (Yoruba, YRI); 30 trios from Utah (CEU); 45 Han Chinese from Beijing (CHB); 44 Japanese from Tokyo (JPT). Haplotyping: each SNP is resolved in each individual. Rather than just determining heterozygosity/homozygosity, haplotyping completely resolves the genotypes (phasing).

The data: the HapMap project. Because of linkage, a partial SNP map largely determines all other SNPs. The idea is that a group of "tag SNPs" can be used to represent all genetic variation in the human population. This is extremely important in association studies that look for the genetic causes of disease.

Correlation of SNPs between populations

Modeling…

The Hardy-Weinberg model. Diploid organisms: two copies of each allele/gene/base (homozygous/heterozygous). Sexual reproduction: mating haplotypes. Large population, no migration: fixed size, closed system. Non-overlapping generations: a synchronous process (not as bad as it may look). Random mating: the new generation is selected from the existing haplotypes with replacement. No mutations, no selection (these will be added later).

The Hardy-Weinberg model: Hardy-Weinberg equilibrium. (Figure: genotypes AA, Aa, aA, aa in one generation produce AA, Aa, aA, aa in the next, through random mating with non-overlapping generations.) If the frequency of allele A is p and that of a is q = 1 - p, random mating yields genotype frequencies $p^2$ (AA), $2pq$ (Aa or aA) and $q^2$ (aa). Under the model assumptions, equilibrium is reached within one generation.
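A minimal sketch, assuming an infinite population and the model assumptions above (the helper names `hw_genotypes` and `random_mating` are hypothetical). Starting from a population made only of homozygotes, one round of random mating already gives the HW proportions, and a second round leaves them unchanged.

```python
def hw_genotypes(p):
    """Expected (AA, Aa, aa) frequencies under Hardy-Weinberg; 'Aa' counts both Aa and aA."""
    q = 1.0 - p
    return (p * p, 2 * p * q, q * q)


def random_mating(f_AA, f_Aa, f_aa):
    """One generation of random mating in an infinite population.

    Returns offspring genotype frequencies (AA, Aa, aa).
    """
    assert abs(f_AA + f_Aa + f_aa - 1.0) < 1e-9, "genotype frequencies must sum to 1"
    # Frequency of A among the gametes produced by the parents
    p = f_AA + 0.5 * f_Aa
    return hw_genotypes(p)


# Start far from equilibrium: only homozygotes, with p = 0.3
gen1 = random_mating(0.3, 0.0, 0.7)
gen2 = random_mating(*gen1)
print(gen1)
print(gen2)
```

The allele frequency is unchanged by random mating, so the offspring proportions depend only on p; that is why a single generation suffices.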

Frequency estimates. We will be dealing with estimation of allele frequencies. As a reminder, when sampling n times from a population in which an allele has frequency p, the number of copies observed is a binomial variable, $X \sim \mathrm{Bin}(n, p)$. This can be further approximated by a normal distribution: $X \approx \mathcal{N}\big(np,\; np(1-p)\big)$. When estimating the frequency from the number of successes, $\hat{p} = X/n$, we therefore have an error that looks like $\hat{p} \approx \mathcal{N}\!\left(p,\; \frac{p(1-p)}{n}\right)$, i.e. a standard error of $\sqrt{p(1-p)/n}$.
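A hedged illustration of the estimate and its normal-approximation error. The sample is simulated (not real data), and `estimate_frequency` and `binomial_se` are hypothetical helper names.

```python
import math
import random


def binomial_se(p, n):
    """Standard error sqrt(p(1-p)/n) of the frequency estimate X/n."""
    return math.sqrt(p * (1 - p) / n)


def estimate_frequency(sample):
    """Return p_hat = (#A)/n and its plug-in standard error for a 0/1 sample."""
    n = len(sample)
    p_hat = sum(sample) / n
    return p_hat, binomial_se(p_hat, n)


random.seed(1)
p_true, n = 0.3, 400
# Each draw is 1 (allele A) with probability p_true, else 0
sample = [1 if random.random() < p_true else 0 for _ in range(n)]
p_hat, se = estimate_frequency(sample)
print(f"p_hat = {p_hat:.3f}, approx. 95% CI = [{p_hat - 1.96 * se:.3f}, {p_hat + 1.96 * se:.3f}]")
print("theoretical SE:", binomial_se(p_true, n))
```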

Testing Hardy-Weinberg (HW) using chi-square statistics. HW oversimplifies everything, but it can be used as a baseline to test whether interesting evolution is going on for some allele. A classical example is the blood group genotypes M/N (Sanger 1975); this genotype determines a glycoprotein antigen on the surface of red blood cells, so it was quantifiable before the genomic era. (Table: observed vs. HW-expected counts for MM, MN and NN.) The statistic is $\chi^2 = \sum_{\text{classes}} \frac{(O - E)^2}{E}$, and its significance can be computed from the chi-square distribution with df degrees of freedom. Here: df = #classes - #parameters - 1 = 3 (MM/MN/NN) - 1 (p) - 1 = 1.
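A sketch of the test, assuming genotype counts for MM, MN and NN are available. The counts in the example are made up for illustration and are not the Sanger data. For df = 1 the p-value uses the identity $\Pr(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, which avoids needing a statistics library; `hw_chi_square` is a hypothetical name.

```python
import math


def hw_chi_square(n_MM, n_MN, n_NN):
    """Chi-square goodness-of-fit test of Hardy-Weinberg proportions (df = 1)."""
    n = n_MM + n_MN + n_NN
    # Estimate the M allele frequency from the counts (the one fitted parameter)
    p = (2 * n_MM + n_MN) / (2 * n)
    q = 1 - p
    observed = [n_MM, n_MN, n_NN]
    expected = [n * p * p, n * 2 * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # A chi-square variable with 1 df is Z^2, so P(chi2_1 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected


# Hypothetical counts, for illustration only
chi2, p_value, expected = hw_chi_square(300, 500, 200)
print(f"chi2 = {chi2:.4f}, p-value = {p_value:.3f}, expected = {expected}")
```

A large p-value means the counts are compatible with HW proportions; it does not prove that no evolutionary force is acting.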

Modeling populations: the Wright-Fisher model. (Figure: generation t and generation t+1; haploid model with N individuals 1, 2, 3, 4, ...; diploid model with $N_f$ females and $N_m$ males.)

Wright-Fisher model for genetic drift. We follow the number of copies of an allele in the population until fixation (X = 2N) or loss (X = 0). We can model it as a Markov process on a variable X (the number of A alleles) with transition probabilities $P_{ij} = \Pr(X_{t+1} = j \mid X_t = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}$, i.e. sampling j alleles from a population of 2N alleles, i of which are A. In larger populations the frequency changes more slowly: the variance of the binomial frequency is pq/2N, so sampling does not change it much. (Figure: states 0, 1, ..., 2N-1, 2N; state 0 is loss, state 2N is fixation.)
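A minimal sketch of this Markov chain in plain Python (the function names are hypothetical): it builds the binomial transition matrix and simulates one trajectory until absorption at 0 or 2N.

```python
import math
import random


def wf_transition_matrix(two_n):
    """P[i][j] = C(2N, j) (i/2N)^j (1 - i/2N)^(2N - j)."""
    P = []
    for i in range(two_n + 1):
        x = i / two_n
        # Python evaluates 0.0 ** 0 as 1.0, so rows 0 and 2N are absorbing
        row = [math.comb(two_n, j) * x ** j * (1 - x) ** (two_n - j) for j in range(two_n + 1)]
        P.append(row)
    return P


def wf_trajectory(i0, two_n, rng, max_gens=10_000):
    """Simulate the number of A alleles until loss (0) or fixation (2N)."""
    counts = [i0]
    i = i0
    while 0 < i < two_n and len(counts) <= max_gens:
        x = i / two_n
        # Binomial sampling: each of the 2N new alleles is A with probability x
        i = sum(1 for _ in range(two_n) if rng.random() < x)
        counts.append(i)
    return counts


P = wf_transition_matrix(10)
print(wf_trajectory(5, 10, random.Random(3)))
```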

Coalescent and fixation

Drift and fixation probability. Theorem (fixation under drift): in the Wright-Fisher model, the probability of fixation of allele A, given a population of 2N alleles of which i are A, is $\pi_i = \frac{i}{2N}$. Proof: the mean of the binomial sample at each step is np: $E[X_{t+1} \mid X_t = i] = 2N \cdot \frac{i}{2N} = i$, which means that the expected number of A alleles is constant in time, $E[X_t] = i$ for all t. Intuitively: since 0 and 2N are absorbing states, given sufficient time the Wright-Fisher process will converge to either 0 or 2N. Define $\pi_i = \Pr(X_\infty = 2N \mid X_0 = i)$. More formally: $i = \lim_{t \to \infty} E[X_t] = 2N \cdot \pi_i + 0 \cdot (1 - \pi_i)$, hence $\pi_i = i/2N$.
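The theorem can be checked numerically by pushing a probability distribution through the transition matrix until almost all mass is absorbed. This is a brute-force sketch for small 2N, not the proof; `fixation_probability` is a hypothetical helper.

```python
import math


def fixation_probability(i0, two_n, generations=500):
    """Iterate the Wright-Fisher chain from X_0 = i0; return the mass absorbed at 2N."""
    # Binomial transition matrix P[i][j]
    P = [[math.comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
          for j in range(two_n + 1)] for i in range(two_n + 1)]
    dist = [0.0] * (two_n + 1)
    dist[i0] = 1.0
    for _ in range(generations):
        # dist_{t+1}[j] = sum_i dist_t[i] * P[i][j]
        dist = [sum(dist[i] * P[i][j] for i in range(two_n + 1)) for j in range(two_n + 1)]
    return dist[two_n]


print(fixation_probability(3, 10))   # close to 3/10
```

For 2N = 10 the unabsorbed mass shrinks by roughly a factor $1 - 1/2N$ per generation, so 500 generations is far more than enough.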

Figure 7.4 Drift: experiments with drifting fly populations. 107 Drosophila melanogaster populations, each originally consisting of 16 brown-eye (bw75) heterozygotes. At each generation, 8 males and 8 females were selected at random from the progeny of the previous generation. The bars show the distribution of allele frequencies in the 107 populations. (Axes: number of bw75 alleles; generation; number of populations.)
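A rough sketch that mimics the design of this experiment under idealized Wright-Fisher sampling, assuming 16 diploid flies per population (2N = 32 gene copies) and a starting count of 16 bw75 copies. The real experiment sampled 8 males and 8 females, so this is only an approximation; the number of generations (20), the seed and the function name are arbitrary, hypothetical choices.

```python
import random
from collections import Counter


def drift_replicates(n_pops, two_n, i0, generations, seed=0):
    """Wright-Fisher drift in replicate populations.

    Returns a histogram: number of populations with k copies of the allele, k = 0..2N.
    """
    rng = random.Random(seed)
    counts = [i0] * n_pops
    for _ in range(generations):
        new_counts = []
        for i in counts:
            x = i / two_n
            # Binomial resampling of the 2N gene copies in this population
            new_counts.append(sum(rng.random() < x for _ in range(two_n)))
        counts = new_counts
    hist = Counter(counts)
    return [hist.get(k, 0) for k in range(two_n + 1)]


hist = drift_replicates(n_pops=107, two_n=32, i0=16, generations=20)
for k, n_pops in enumerate(hist):
    print(f"{k:2d} {'#' * n_pops}")
```

Printing the histogram for increasing `generations` shows the initial peak at 16 spreading out and mass accumulating at 0 and 32.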

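The geometric distribution: reminder. Roll a die and record the time T until the first appearance of a given face k (the waiting time), with success probability p = 1/6 per roll: $\Pr(T = t) = (1-p)^{t-1} p$, t = 1, 2, ... Lack of memory: $\Pr(T > s + t \mid T > s) = \Pr(T > t) = (1-p)^t$. Moments: $E[T] = 1/p$, $\mathrm{Var}(T) = (1-p)/p^2$. "Intersection": for independent $T_1 \sim \mathrm{Geom}(p_1)$ and $T_2 \sim \mathrm{Geom}(p_2)$, $\Pr(T_1 > t \cap T_2 > t) = \big[(1-p_1)(1-p_2)\big]^t$, so $\min(T_1, T_2) \sim \mathrm{Geom}\big(1 - (1-p_1)(1-p_2)\big)$.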
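A small check of these properties, using exact rational arithmetic from Python's `fractions` module. The function names are hypothetical, and "intersection" is read here as the event that neither of two independent waiting times has ended by time t.

```python
from fractions import Fraction


def geometric_pmf(t, p):
    """P(T = t) = (1 - p)^(t - 1) p, for t = 1, 2, ..."""
    return (1 - p) ** (t - 1) * p


def geometric_tail(t, p):
    """P(T > t) = (1 - p)^t."""
    return (1 - p) ** t


p = Fraction(1, 6)  # waiting for one given face of a fair die
# Lack of memory, checked exactly: P(T > s + t) / P(T > s) == P(T > t)
s, t = 4, 3
memoryless = geometric_tail(s + t, p) / geometric_tail(s, p) == geometric_tail(t, p)

# Mean E[T] = 1/p, via a truncated sum in floating point
mean_T = sum(k * geometric_pmf(k, 1 / 6) for k in range(1, 2000))

# Minimum of two independent geometric times is geometric with 1 - (1-p1)(1-p2)
p1, p2 = Fraction(1, 6), Fraction(1, 4)
p_min = 1 - (1 - p1) * (1 - p2)
min_is_geometric = all(
    geometric_tail(k, p1) * geometric_tail(k, p2) == geometric_tail(k, p_min) for k in range(1, 20)
)
print(memoryless, mean_T, p_min, min_is_geometric)
```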

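The exponential distribution: reminder. The limit of the geometric distribution when the time step goes to 0: with per-step probability $\lambda \Delta t$, $\Pr(T > t) = \lim_{\Delta t \to 0} (1 - \lambda \Delta t)^{t/\Delta t} = e^{-\lambda t}$. Density: $f(t) = \lambda e^{-\lambda t}$. Moments: $E[T] = 1/\lambda$, $\mathrm{Var}(T) = 1/\lambda^2$. "Intersection": the minimum of independent exponentials with rates $\lambda_1$ and $\lambda_2$ is exponential with rate $\lambda_1 + \lambda_2$. Memoryless! The probability of an event in a short interval of length $\Delta t$ is approximately $\lambda \Delta t$. (Figure panels: M = 2, M = 4.)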
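A numerical sketch of the limit, assuming a rate λ (`rate`) and a per-step probability λΔt; the function name is hypothetical. As Δt shrinks, the geometric tail approaches $e^{-\lambda t}$.

```python
import math


def geometric_tail_scaled(t, rate, dt):
    """P(T > t) for a geometric waiting time with steps of length dt
    and per-step event probability rate * dt."""
    steps = int(round(t / dt))
    return (1 - rate * dt) ** steps


rate, t = 2.0, 0.7
for dt in (0.1, 0.01, 0.001, 1e-5):
    print(dt, geometric_tail_scaled(t, rate, dt), math.exp(-rate * t))
```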

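Coalescence. Consider lineages in a population of 2N gene copies, each picking its parent uniformly at random. Coalescence of two lineages at time -1: probability $\frac{1}{2N}$. Coalescence at time -T: $\left(1 - \frac{1}{2N}\right)^{T-1} \frac{1}{2N}$, a geometric distribution. No coalescence for k samples in one generation: $\prod_{j=1}^{k-1} \left(1 - \frac{j}{2N}\right) \approx 1 - \binom{k}{2} \frac{1}{2N}$. Distribution of time from k to k-1 lineages: approximately geometric, $\Pr(T_k = t) \approx \left(1 - \binom{k}{2}\frac{1}{2N}\right)^{t-1} \binom{k}{2} \frac{1}{2N}$.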
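A sketch comparing the exact one-generation probability that k lineages find k distinct parents among 2N gene copies with the approximation $1 - \binom{k}{2}/2N$; the helper names are hypothetical.

```python
import math


def p_no_coalescence_exact(k, two_n):
    """Probability that k lineages pick k distinct parents among 2N gene copies."""
    prob = 1.0
    for j in range(1, k):
        # The (j+1)-th lineage must avoid the j parents already chosen
        prob *= 1 - j / two_n
    return prob


def p_no_coalescence_approx(k, two_n):
    """First-order approximation 1 - C(k, 2) / 2N."""
    return 1 - math.comb(k, 2) / two_n


def mean_waiting_time(k, two_n):
    """Mean of the (approximately geometric) waiting time from k to k-1 lineages."""
    return 1 / (1 - p_no_coalescence_exact(k, two_n))


for k in (2, 5, 10):
    print(k, p_no_coalescence_exact(k, 10_000), p_no_coalescence_approx(k, 10_000))
```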

The continuous-time coalescent. When sampling k new individuals, the chance of picking the same parent twice is roughly $\binom{k}{2} \frac{1}{2N} = \frac{k(k-1)}{4N}$. (Figure: present at the bottom, past at the top.) When looking at k individuals, we can trace their coalescent backwards and ask when they had k-1, k-2, ..., or one common ancestor. Theorem: the amount of time $t_k$ during which there are k lineages has approximately an exponential distribution with mean $2N \cdot \frac{2}{k(k-1)}$. Proof: the probability of not merging k lineages in n generations is $\left(1 - \frac{k(k-1)}{4N}\right)^{n}$. We scale time in units of 2N generations, $n = 2N\tau$, so it is like an exponential: $\left(1 - \frac{k(k-1)}{4N}\right)^{2N\tau} \to e^{-\binom{k}{2}\tau}$. This is correct for any k, so going backward from the present we can estimate the time to coalescence at each step. The expected value is $E[t_k] = \frac{4N}{k(k-1)}$ generations.
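A minimal sketch of sampling coalescence times under this exponential approximation, with time measured in generations; `sample_coalescent_times` is a hypothetical helper. Summing the $t_k$ gives the time to the most recent common ancestor, whose average over many replicates should be close to $4N(1 - 1/n)$.

```python
import random


def sample_coalescent_times(n, two_n, rng):
    """Sample t_n, ..., t_2 (in generations), with t_k ~ Exp(mean = 2N * 2 / (k (k - 1)))."""
    times = {}
    for k in range(n, 1, -1):
        mean = two_n * 2 / (k * (k - 1))
        # expovariate takes the rate, i.e. 1 / mean
        times[k] = rng.expovariate(1 / mean)
    return times


rng = random.Random(42)
N, n = 1000, 10
reps = [sum(sample_coalescent_times(n, 2 * N, rng).values()) for _ in range(20_000)]
print("simulated mean TMRCA:", sum(reps) / len(reps))
print("4N(1 - 1/n):", 4 * N * (1 - 1 / n))
```

Note that the last interval (k = 2) has mean 2N and usually dominates the total.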

The coalescent. The expected time to the common ancestor of n individuals: $E[T_{MRCA}] = \sum_{k=2}^{n} \frac{4N}{k(k-1)} = 4N\left(1 - \frac{1}{n}\right)$. (Figure: present at the bottom, past at the top.) When looking at n individuals, we can trace their coalescent backwards and ask when they had n-1, n-2, ..., or one common ancestor. Theorem: the probability that the most recent common ancestor of a sample of size n is the same as that of the whole population converges to (n-1)/(n+1) as the population size increases. 4N is the magic number: as n grows, the expected time to the common ancestor approaches 4N generations.
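The sum telescopes, since $\frac{1}{k(k-1)} = \frac{1}{k-1} - \frac{1}{k}$. A hedged sketch that checks this exactly with rational arithmetic (helper names are hypothetical):

```python
from fractions import Fraction


def expected_tmrca(n, N):
    """E[T_MRCA] = sum_{k=2}^{n} 4N / (k(k-1)) generations, for N diploid individuals."""
    return sum(Fraction(4 * N, k * (k - 1)) for k in range(2, n + 1))


def p_sample_mrca_is_population_mrca(n):
    """Large-population limit quoted in the theorem: (n-1)/(n+1)."""
    return Fraction(n - 1, n + 1)


for n in (2, 10, 100):
    print(n, float(expected_tmrca(n, N=10_000)), float(p_sample_mrca_is_population_mrca(n)))
```

Already for n = 10 the expected time is 90% of the 4N limit, and the chance that the sample's MRCA is the population's MRCA is 9/11.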

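Diffusion approximation and Kimura's solution. Fisher, and then Kimura, approximated the drift process using a diffusion equation (a heat equation). Let $\phi(x,t)$ be the density of populations with frequency in x..x+dx at time t, and let $J(x,t)$ be the flux of probability at time t and frequency x. The change in the density equals the difference between the fluxes J(x,t) and J(x+dx,t); taking dx to the limit we have $\frac{\partial \phi}{\partial t} = -\frac{\partial J}{\partial x}$. If M(x) is the mean change in allele frequency when the frequency is x, and V(x) is the variance of that change, then the probability flux equals $J(x,t) = M(x)\phi(x,t) - \frac{1}{2}\frac{\partial}{\partial x}\big[V(x)\phi(x,t)\big]$, which gives $\frac{\partial \phi}{\partial t} = -\frac{\partial}{\partial x}\big[M(x)\phi\big] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\big[V(x)\phi\big]$ (heat diffusion; the Fokker-Planck, or Kolmogorov forward, equation).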

Diffusion approximation and Kimura's solution (derivation). We start by working with a time step dt and a frequency step dx. Let $\phi(x,t)$ be the probability that the population has allele frequency x at time t. We limit changes from t to t+dt to x ± dx. The population can be at x at t+dt if: it was at x and stayed there; it was at x-dx and moved to x; or it was at x+dx and moved to x: $\phi(x, t+dt) = \phi(x,t)\Pr(\text{stay} \mid x) + \phi(x-dx,t)\Pr(\text{up} \mid x-dx) + \phi(x+dx,t)\Pr(\text{down} \mid x+dx)$. The transition probabilities combine the probability that the frequency increases from x by dx due to mutation/selection and the probability of a dx increase or decrease due to drift.

Diffusion approximation and Kimura's solution (drift only). For drift the variance is binomial: $V(x) = \frac{x(1-x)}{2N}$. And we assume no selection: $M(x) = 0$. The forward equation then becomes $\frac{\partial \phi}{\partial t} = \frac{1}{4N}\frac{\partial^2}{\partial x^2}\big[x(1-x)\phi(x,t)\big]$. This is still not easy to solve analytically…
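The equation can at least be integrated numerically. The following is a rough explicit finite-difference sketch, not Kimura's analytical solution: it assumes a uniform grid on (0, 1), treats probability that reaches x = 0 or x = 1 as absorbed (loss or fixation), and picks a time step small enough to keep the density non-negative. All names are hypothetical.

```python
def drift_diffusion(phi0, N, t_max, dx):
    """Explicit finite-difference sketch of
        d phi / dt = 1/(4N) * d^2/dx^2 [ x (1 - x) phi ]   (t in generations).

    phi0 holds the density at interior grid points x_i = i*dx, i = 1..M-1.
    Boundary values are held at 0, so mass reaching x = 0 or x = 1 leaves
    the interior and is treated as absorbed.
    """
    M = round(1 / dx)
    xs = [i * dx for i in range(M + 1)]
    # The scheme keeps phi >= 0 when dt <= 8 N dx^2; use half of that bound.
    dt = 4 * N * dx * dx
    c = dt / (4 * N * dx * dx)  # equals 1 with this choice of dt
    phi = [0.0] + list(phi0) + [0.0]
    t = 0.0
    while t < t_max:
        u = [x * (1 - x) * f for x, f in zip(xs, phi)]
        phi = [0.0] + [phi[i] + c * (u[i + 1] - 2 * u[i] + u[i - 1]) for i in range(1, M)] + [0.0]
        t += dt
    return xs, phi


def interior_mass(phi, dx):
    """Probability that the allele is still segregating (neither lost nor fixed)."""
    return sum(phi) * dx


N, dx = 100, 0.02
M = round(1 / dx)
phi0 = [0.0] * (M - 1)
phi0[M // 2 - 1] = 1 / dx  # every population starts at x = 0.5
for t_max in (N, 2 * N, 4 * N):
    _, phi = drift_diffusion(phi0, N, t_max, dx)
    print(t_max, round(interior_mass(phi, dx), 3))
```

The interior mass decreases over time because of absorption at the boundaries, and the interior density spreads out and flattens.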

Changes in allele frequencies, Wright-Fisher model. After about 4N generations, only about 10% of the populations have not yet been fixed or lost, and the distribution of their allele frequencies becomes flat.

Absorption time and time to fixation. According to Kimura's solution, the mean time to fixation of an allele with initial frequency p, given that it is not lost, is $\bar{t}_1(p) = -\frac{4N(1-p)\ln(1-p)}{p}$. The mean time to loss, given loss (the fixation time of the complementary allele), is $\bar{t}_0(p) = -\frac{4N p \ln p}{1-p}$. For a new mutation, p = 1/2N, this gives $\bar{t}_1 \approx 4N$ generations.
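These two formulas are easy to evaluate. A sketch with hypothetical function names, assuming N is the (effective) number of diploid individuals:

```python
import math


def mean_time_to_fixation(p, N):
    """Mean generations to fixation, conditional on eventual fixation."""
    return -4 * N * (1 - p) * math.log(1 - p) / p


def mean_time_to_loss(p, N):
    """Mean generations to loss, conditional on eventual loss."""
    return -4 * N * p * math.log(p) / (1 - p)


N = 10_000
p_new = 1 / (2 * N)  # a single new copy
print(mean_time_to_fixation(p_new, N))  # close to 4N = 40,000
print(mean_time_to_loss(p_new, N))      # about 2 ln(2N), i.e. roughly 20 generations
```

The asymmetry is the point: a new neutral mutation that is lost disappears within a few dozen generations, while one that fixes takes on the order of 4N generations.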

Effective population size. 4N generations looks like a huge number (in a population of billions!). But in fact the Wright-Fisher model (like the Hardy-Weinberg model) is based on many unrealistic assumptions, including random mating (any two individuals can mate). The effective population size is defined as the size of an idealized population for which the predicted dynamics of changes in allele frequency are similar to the observed ones. For each measurable statistic of population dynamics, a different effective population size can be computed. For example, the expected variance of the change in allele frequency over one generation is $\mathrm{Var}(\Delta p) = \frac{p(1-p)}{2N}$. We can use the same formula to define the effective population size given the observed variance: $N_e = \frac{p(1-p)}{2\,\mathrm{Var}(\Delta p)}$.
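A hedged sketch of the variance-based definition: `effective_size_from_variance` is a hypothetical helper, checked here against simulated one-generation Wright-Fisher changes with a known N = 50 (the simulation settings are arbitrary).

```python
import random


def effective_size_from_variance(p, var_dp):
    """Invert Var(dp) = p(1-p) / (2 Ne)  to get  Ne = p(1-p) / (2 Var(dp))."""
    return p * (1 - p) / (2 * var_dp)


# Sanity check on simulated one-generation Wright-Fisher changes (N = 50 diploids)
rng = random.Random(7)
N, p = 50, 0.4
changes = []
for _ in range(20_000):
    count = sum(rng.random() < p for _ in range(2 * N))
    changes.append(count / (2 * N) - p)
mean = sum(changes) / len(changes)
var = sum((d - mean) ** 2 for d in changes) / (len(changes) - 1)
print("estimated Ne:", effective_size_from_variance(p, var))
```

In real data the observed variance would come from repeated frequency measurements; here it is simulated so that the true N is known.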

Effective population size: changing populations. If the population size changes over time, the dynamics are governed by the harmonic mean of the sizes: $\frac{1}{N_e} = \frac{1}{T}\sum_{t=1}^{T} \frac{1}{N_t}$. So the effective population size is dominated by the size of the smallest bottleneck. Bottlenecks can occur during migration, environmental stress or isolation. Such effects greatly decrease heterozygosity (the founder effect, for example Tay-Sachs disease among Ashkenazim). Bottlenecks can accelerate the fixation of neutral or even deleterious mutations, as we shall see later. Human effective population size over the last 2 million years is estimated at around 10,000 (due to bottlenecks). (So when was our $T_1$?)
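A sketch with a made-up size history (not real data), showing how a single small generation dominates the harmonic mean; the function name is hypothetical.

```python
def effective_size_harmonic(sizes):
    """Ne over T generations: 1/Ne = (1/T) * sum(1/N_t)."""
    return len(sizes) / sum(1 / n for n in sizes)


# Hypothetical history: 10 generations at 10,000, one generation at 10, then 9 more at 10,000
sizes = [10_000] * 10 + [10] + [10_000] * 9
print(effective_size_harmonic(sizes))  # about 196
print(sum(sizes) / len(sizes))         # arithmetic mean: about 9,500
```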

Effective population size: unequal sex ratio, and sex chromosomes. If there are more females than males, or fewer males participate in reproduction, the effective population size will be smaller: $N_e = \frac{4 N_m N_f}{N_m + N_f}$, since every offspring combines an allele from a male with an allele from a female. So if there are 10 times more females than males ($N_m = x$, $N_f = 10x$), the effective population size is $4 \cdot x \cdot 10x/(11x) = 40x/11 \approx 3.6x$, much less than the size of the population (11x). Another example is the X chromosome, of which males carry only one copy.
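The sex-ratio formula as a one-line sketch (hypothetical function name), reproducing the 10:1 example with x = 1000:

```python
def effective_size_sex_ratio(n_males, n_females):
    """Ne = 4 Nm Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)


x = 1000
print(effective_size_sex_ratio(x, 10 * x))  # about 3,636 (= 40x/11), vs. a census size of 11,000
print(effective_size_sex_ratio(x, x))       # equal sexes: Ne = 2x, the census size
```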

Population genetics. Drift: the process by which allele frequencies change through generations. Mutation: the process by which new alleles are introduced. Recombination: the process by which multi-allelic genomes are mixed. Selection: the effect of fitness on the dynamics of allele drift. Epistasis: the effect on drift of fitness dependencies among different alleles. "Organismal" effects: ecology, geography, behavior.