Lecture 3: population genetics II: selection

Genome evolution Lecture 3: population genetics II: selection

Population genetics
Drift: the process by which allele frequencies change from generation to generation.
Mutation: the process by which new alleles are introduced.
Recombination: the process by which multi-allelic genomes are mixed.
Selection: the effect of fitness on the dynamics of allele drift.
Epistasis: the effect on drift of fitness dependencies among different alleles.
“Organismal” effects: ecology, geography, behavior.

Wright-Fisher model for genetic drift
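[Figure: N individuals produce an effectively infinite pool of gametes, from which the next N individuals are sampled]
We follow the number of copies of an allele A in the population until fixation ($X = 2N$) or loss ($X = 0$). We can model this as a Markov process on a variable $X$ (the number of A alleles): the next generation is obtained by sampling $2N$ alleles, with replacement, from a population of $2N$ alleles of which $i$ are A. The transition probabilities are:
$$P(X_{t+1} = j \mid X_t = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}$$
In a larger population the frequency changes more slowly: with $p = i/2N$ and $q = 1 - p$, the variance of the allele frequency after one round of sampling is $pq/2N$, so sampling does not change it much.
[Figure: chain of states 0 (loss), 1, ..., 2N-1, 2N (fixation)]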
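A minimal Python sketch of the Wright-Fisher chain on $2N$ allele copies. `wf_transition` evaluates the binomial transition probability, and `wf_fixation_fraction` (a hypothetical helper name) simulates runs until loss or fixation; under neutrality the simulated fixation fraction should be close to $i/2N$. Population size, starting count and seed are arbitrary illustration values.

```python
import math
import random

def wf_transition(two_n, i, j):
    """P(X_{t+1} = j | X_t = i): binomial sampling of 2N alleles with success probability i/2N."""
    p = i / two_n
    return math.comb(two_n, j) * p**j * (1 - p) ** (two_n - j)

def wf_fixation_fraction(two_n, i, reps=3000, seed=1):
    """Fraction of simulated Wright-Fisher runs, started from i copies of A, that end in fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(reps):
        x = i
        while 0 < x < two_n:
            p = x / two_n
            # Next generation: each of the 2N sampled alleles is A with probability p.
            x = sum(1 for _ in range(two_n) if rng.random() < p)
        fixed += (x == two_n)
    return fixed / reps

two_n, i = 20, 5
row = [wf_transition(two_n, i, j) for j in range(two_n + 1)]
print(sum(row))                                 # probabilities out of state i sum to 1
print(sum(j * pj for j, pj in enumerate(row)))  # expected next state equals i (no drift in the mean)
print(wf_fixation_fraction(two_n, i), i / two_n)
```

The unchanged expected value of $X$ is exactly what makes the fixation probability equal to $i/2N$.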

The Moran model: instead of working with discrete generations, we replace at most one individual at each time step. An individual is removed and replaced by a copy of an individual sampled from the current population.
[Figure: one individual is removed (X) and replaced by a copy of a sampled individual]
If we assume the time steps are small, what kind of mathematical model describes the process?

Continuous time Markov processes
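Conditions on the transition probabilities $p_{ij}(t) = P(X(s+t) = j \mid X(s) = i)$:
Markov: $P(X(s+t) = j \mid X(s) = i,\ X(u), u < s) = p_{ij}(t)$
Kolmogorov theorem:
$q_i = \lim_{h \to 0} \frac{1 - p_{ii}(h)}{h}$ exists (may be infinite)
$q_{ij} = \lim_{h \to 0} \frac{p_{ij}(h)}{h}$ exists and is finite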

Rates and transition probabilities
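The process’s rate matrix $Q$ has off-diagonal entries $Q_{ij} = q_{ij}$ and diagonal entries $Q_{ii} = -q_i$. The transition probabilities satisfy the differential equations (backward form):
$$p'_{ij}(t) = \sum_{k \ne i} q_{ik}\, p_{kj}(t) - q_i\, p_{ij}(t), \qquad \text{i.e.} \quad P'(t) = Q\,P(t)$$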
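To see the backward equation at work, this sketch integrates $P'(t) = QP(t)$ from $P(0) = I$ with a fixed-step Runge-Kutta (RK4) scheme, a numerical choice of ours rather than anything from the lecture, and compares the result with the known closed form for a two-state chain with rates $\alpha$ ($0 \to 1$) and $\beta$ ($1 \to 0$). The rates and the time point are hypothetical values.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_add(A, B, c=1.0):
    """Return A + c*B."""
    return [[a + c * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transition_matrix(Q, t, steps=1000):
    """Integrate the backward equation P'(t) = Q P(t), P(0) = I, with classical RK4."""
    n = len(Q)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = t / steps
    for _ in range(steps):
        k1 = mat_mul(Q, P)
        k2 = mat_mul(Q, mat_add(P, k1, h / 2))
        k3 = mat_mul(Q, mat_add(P, k2, h / 2))
        k4 = mat_mul(Q, mat_add(P, k3, h))
        incr = [[(a + 2 * b + 2 * c + d) / 6 for a, b, c, d in zip(r1, r2, r3, r4)]
                for r1, r2, r3, r4 in zip(k1, k2, k3, k4)]
        P = mat_add(P, incr, h)
    return P

# Two-state chain (hypothetical rates): 0 -> 1 at rate alpha, 1 -> 0 at rate beta.
alpha, beta, t = 2.0, 1.0, 0.7
Q = [[-alpha, alpha], [beta, -beta]]
P = transition_matrix(Q, t)
exact_p00 = beta / (alpha + beta) + alpha / (alpha + beta) * math.exp(-(alpha + beta) * t)
print(P[0][0], exact_p00)
```

Because every row of $Q$ sums to zero, the rows of $P(t)$ keep summing to one, as transition probabilities must.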

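The Moran model: an individual (X) is replaced by a copy of an individual sampled from the current population.
Assume the rate of replacement for each individual is 1. We derive a model similar to Wright-Fisher, but in continuous time: a birth-death process on the random variable $X$ counting the number of A alleles, with states 0 (loss), 1, ..., i-1, i, i+1, ..., 2N-1, 2N (fixation). The rates are:
“Birth” ($i \to i+1$, an a is replaced by a copy of an A): $b_i = (2N - i)\,\frac{i}{2N}$
“Death” ($i \to i-1$, an A is replaced by a copy of an a): $d_i = i\,\frac{2N - i}{2N}$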

Fixation probability
In fact, in the limit, the Moran model converges to the Wright-Fisher model. For example:
Theorem: going backward in time, the Moran model generates the same distribution of genealogies as Wright-Fisher, only with time running twice as fast.
Theorem: in the Moran model, the probability that A becomes fixed when there are initially $i$ copies is $i/2N$.
Proof: as for the Wright-Fisher model. Since $b_i = d_i$, births and deaths are equally likely, so the expected value of $X$ never changes. At absorption $X$ is either 0 or $2N$, hence $2N \cdot P(\text{fixation}) = i$.
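The fixation probability $i/2N$ can be checked by simulating the Moran model one replacement at a time. This sketch assumes the replacing copy is drawn uniformly from the whole current population (possibly the dying individual itself). Waiting times between events are omitted because they do not affect which absorbing state is reached; the population size, starting count and seed are illustrative.

```python
import random

def moran_fixation_fraction(two_n, i, reps=3000, seed=2):
    """Simulate the neutral Moran model event by event; return the fraction of runs fixing A."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(reps):
        x = i  # current number of A alleles
        while 0 < x < two_n:
            # One replacement: a uniformly chosen individual dies and is replaced
            # by a copy of a uniformly chosen individual from the current population.
            dies_A = rng.random() < x / two_n
            parent_A = rng.random() < x / two_n
            x += parent_A - dies_A  # +1 (birth), -1 (death) or 0 (no change)
        fixed += (x == two_n)
    return fixed / reps

print(moran_fixation_fraction(20, 5), 5 / 20)
```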

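Fixation time: expected fixation time, assuming fixation occurs
Theorem: in the Moran model, let $p = i/2N$; then the expected time to fixation, given fixation, is approximately
$$\bar{t}_i \approx -2N\,\frac{1-p}{p}\,\log(1-p)$$
Proof: not given here.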
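A sketch comparing this approximation with an exact value for the neutral Moran model. The exact value is our own addition, not from the slides: by a standard Green's-function argument for the embedded symmetric random walk, the expected time spent in state $j$, conditioned on fixation from $i$, is $1$ for $j \ge i$ and $j(2N-i)/(i(2N-j))$ for $j < i$, so the conditional fixation time is a finite sum. Treat that derivation as an assumption to check; the function names are hypothetical.

```python
import math

def moran_conditional_fixation_time(two_n, i):
    """Exact E[fixation time | fixation] for the neutral Moran model (time unit: one replacement per individual).

    Sums the conditional expected time spent in each transient state j:
    1 for j >= i, and j (2N - i) / (i (2N - j)) for j < i.
    """
    upper = two_n - i  # states i, ..., 2N-1 each contribute 1
    lower = sum(j * (two_n - i) / (i * (two_n - j)) for j in range(1, i))
    return upper + lower

def approx_fixation_time(two_n, i):
    """Diffusion approximation -2N (1 - p)/p log(1 - p), with p = i/2N."""
    p = i / two_n
    return -two_n * (1 - p) / p * math.log(1 - p)

two_n = 200
for i in (1, 50, 100, 150):
    print(i, moran_conditional_fixation_time(two_n, i), approx_fixation_time(two_n, i))
```

For a single new mutant ($i = 1$) both values are close to $2N$, which is the "fixation time equals coalescent time" statement used later in the lecture.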

Selection
Fitness: the relative reproductive success of an individual (or genome). Fitness is only defined with respect to the current population, and it is unlikely to remain constant across all conditions and environments. In the models, the sampling probability is multiplied by a selection factor $1+s$.
Mutations can change fitness:
A deleterious mutation decreases fitness, so it is selected against. This process is called negative or purifying selection.
An advantageous (beneficial) mutation increases fitness, so it is subject to positive selection.
A neutral mutation is one that does not change fitness.

Neutrality: don’t let the terminology confuse you…
Forces that drive genomic conservation: background, purifying, negative selection.
Forces that drive genome change: directed, adaptive, positive selection.

Adaptive evolution in a tumor model
Selection: human fibroblasts expressing telomerase were passaged in the lab for many months and spontaneously increased their growth rate (V. Rotter).

Selection in haploids: infinite populations, discrete generations
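This is a common situation: bacteria gaining antibiotic resistance, yeast evolving to adapt to a new environment, tumor cells taking over a tissue.
Let $p_t$ be the frequency of allele A in generation $t$ and $q_t = 1 - p_t$ that of allele a. The relative fitness $w$ of A (with fitness 1 for a) represents the relative growth rate of the strain carrying A. It is common to write $w = 1 + s$, defining the selection coefficient $s$.
Gametes after selection, generation $t+1$:
$$p_{t+1} = \frac{w\,p_t}{w\,p_t + q_t}$$
Ratio as a function of time:
$$\frac{p_t}{q_t} = w^t\,\frac{p_0}{q_0}$$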
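A minimal sketch of the haploid recursion, with hypothetical values for the starting frequency and the fitness. It checks that the odds $p/q$ are multiplied by exactly $w$ each generation, which is what makes the log-odds linear in time.

```python
def select_haploid(p, w):
    """One generation of haploid selection: A has relative fitness w, a has fitness 1."""
    return p * w / (p * w + (1 - p))

p0, w, generations = 0.1, 1.05, 50  # hypothetical values
p = p0
for _ in range(generations):
    p = select_haploid(p, w)

# The odds p/q grow by exactly a factor w per generation.
predicted_odds = (p0 / (1 - p0)) * w**generations
print(p / (1 - p), predicted_odds)
```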

Selection in haploid populations: dynamics
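[Figure: allele frequency over time for growth (fitness) 1.5 and 1.2]
We can model it in continuous time, with $s \approx \log w$:
$$\frac{dp}{dt} = s\,p\,(1-p)$$
In an infinite population, we can just consider the ratios:
$$\frac{p(t)}{q(t)} = e^{st}\,\frac{p(0)}{q(0)}$$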
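This sketch compares the logistic solution $p(t) = p_0 e^{st} / (1 - p_0 + p_0 e^{st})$ of $dp/dt = s\,p(1-p)$ with the discrete haploid recursion. With $w = e^s$ (so that $\log w = s$) the two agree exactly at integer times, because both multiply the odds by $e^s$ per unit time; with $w = 1 + s$ they agree only approximately. The parameter values are illustrative.

```python
import math

def logistic(p0, s, t):
    """Solution of dp/dt = s p (1 - p) with p(0) = p0."""
    growth = math.exp(s * t)
    return p0 * growth / (1 - p0 + p0 * growth)

def discrete(p0, w, t):
    """Iterate the haploid recursion p' = w p / (w p + 1 - p) for t generations."""
    p = p0
    for _ in range(t):
        p = p * w / (p * w + 1 - p)
    return p

p0, s = 0.01, 0.02
for t in (0, 100, 200, 400):
    print(t, logistic(p0, s, t), discrete(p0, math.exp(s), t), discrete(p0, 1 + s, t))
```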

Computing w: example (Hartl and Dykhuizen, 1981)
E. coli strains carrying two different gnd alleles, one of which is beneficial for growth on gluconate. A population of E. coli was tracked for 35 generations, evolving on two media. The observed final frequencies of the beneficial allele were: gluconate: 0.898; ribose: 0.587.
For gluconate, starting from frequency 0.455: $\log_{10}(0.898/0.102) - \log_{10}(0.455/0.545) = 35 \log_{10} w$, so $\log_{10} w \approx 0.0292$ and $w \approx 1.0696$. Compare to $w = 0.999$ on ribose.
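The estimate of $w$ from two frequencies can be wrapped in a small function (the name `fitness_from_frequencies` is hypothetical). Any log base works if used consistently; this sketch uses natural logs, for which $\ln w \approx 0.0673$ rather than $\log_{10} w \approx 0.0292$.

```python
import math

def fitness_from_frequencies(p_start, p_end, generations):
    """Solve log(p_t/q_t) - log(p_0/q_0) = t log w for w."""
    log_odds_change = math.log(p_end / (1 - p_end)) - math.log(p_start / (1 - p_start))
    return math.exp(log_odds_change / generations)

w = fitness_from_frequencies(0.455, 0.898, 35)
print(w)  # approximately 1.0696 (gluconate)
```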

Fixation probability: selection in the Moran model
When the population is finite, we should consider the effect of selection more carefully. The model assumes that fitness is the probability that the offspring is viable; if the offspring is not viable, no replacement takes place. With A of fitness 1 and a of fitness $1-s$, the rates on the states 0 (loss), 1, ..., 2N (fixation) are:
“Birth”: $b_i = (2N-i)\,\frac{i}{2N}$
“Death”: $d_i = (1-s)\, i\,\frac{2N-i}{2N}$

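Theorem: in the Moran model with selection $s > 0$, the probability that A becomes fixed starting from $i$ copies is
$$P_i(\text{fixation}) = \frac{1 - (1-s)^i}{1 - (1-s)^{2N}}$$
In particular, a single new mutant ($i = 1$) in a large population fixes with probability approximately $s$, and as $s \to 0$ the formula tends to the neutral value $i/2N$.
Variant (Kimura 62): the probability of fixation in the Wright-Fisher model with selection, starting from frequency $p$, is
$$u(p) = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}}$$
Reminder: we should be using the effective population size $N_e$.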

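Proof: first define the hitting time $T_y = \min\{t : X_t = y\}$ and the fixation probability given $i$ initial A’s, $h(i) = P_i(T_{2N} < T_0)$. The birth rate is $b_i$ and the death rate is $d_i$, so the probability that a birth occurs before a death is $b_i/(b_i+d_i)$. Therefore:
$$h(i) = \frac{b_i}{b_i+d_i}\,h(i+1) + \frac{d_i}{b_i+d_i}\,h(i-1), \qquad h(0) = 0,\ h(2N) = 1$$
Under the viability model $d_i/b_i = 1-s$, so $h(i+1) - h(i) = (1-s)\,\big(h(i) - h(i-1)\big)$: the increments form a geometric sequence, and summing them gives the formula.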
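A sketch that checks the fixation formula against a simulation of the embedded jump chain, assuming the viability version of the model (A has fitness 1, a has fitness $1-s$), for which a birth occurs before a death with probability $b_i/(b_i+d_i) = 1/(2-s)$. It also evaluates the formula for a single mutant in populations of increasing size. The values of $2N$, $s$ and the seed are hypothetical.

```python
import random

def fixation_prob(two_n, i, s):
    """Moran model with selection, d_i / b_i = 1 - s: P_i = (1 - (1-s)^i) / (1 - (1-s)^(2N))."""
    r = 1 - s
    return (1 - r**i) / (1 - r**two_n)

def simulate_fixation(two_n, i, s, reps=20000, seed=3):
    """Embedded jump chain: step up (birth) with probability 1/(2-s), down (death) otherwise."""
    rng = random.Random(seed)
    p_up = 1 / (2 - s)
    fixed = 0
    for _ in range(reps):
        x = i
        while 0 < x < two_n:
            x += 1 if rng.random() < p_up else -1
        fixed += (x == two_n)
    return fixed / reps

two_n, s = 20, 0.1
print(fixation_prob(two_n, 1, s), simulate_fixation(two_n, 1, s))
# A single new mutant: the fixation probability approaches s as the population grows.
for size in (10, 100, 1000):
    print(size, fixation_prob(size, 1, s))
```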

Fixation probabilities and population size

Selection and fixation
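Recall that the fixation time of a neutral mutation (assuming fixation occurs) is equal to the coalescent time:
Theorem: in the Moran model, the expected fixation time of a single new mutant, given fixation, is approximately $2N$.
Theorem (Kimura): in the Wright-Fisher model it is approximately $4N$ generations (as said, twice slower).
The fixation process of a selected allele has three phases:
1. The allele is rare: the number of A’s is a supercritical branching process.
2. The allele is at intermediate frequency ($0 \ll p \ll 1$): the dynamics follow the logistic differential equation and are essentially deterministic.
3. The allele is close to fixation: the number of a’s is a subcritical branching process.
[Figure: selection and drift along the fixation trajectory]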

Selection in diploids. Assume (Hardy-Weinberg!):
Genotype:    AA     Aa     aa
Fitness:     w11    w12    w22
Frequency:   p^2    2pq    q^2
There are different alternatives for the interaction between alleles:
a is completely dominant (one a is enough): f(Aa) = f(aa)
a is completely recessive: f(Aa) = f(AA)
codominance: f(AA) = 1, f(Aa) = 1+s, f(aa) = 1+2s
overdominance: f(Aa) > f(AA), f(aa)
The simple (linear) cases are not qualitatively different from the haploid scenario.
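A minimal sketch of one generation of diploid selection under Hardy-Weinberg proportions, where $p$ is the frequency of A and $p' = (p^2 w_{11} + pq\,w_{12})/\bar{w}$ with $\bar{w} = p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}$. It runs a codominant case and an overdominant case; the interior equilibrium $p^* = (w_{12} - w_{22})/(2w_{12} - w_{11} - w_{22})$ used for comparison is the standard result of setting $\Delta p = 0$, not stated on the slide. The fitness values are hypothetical.

```python
def diploid_step(p, w11, w12, w22):
    """One generation of selection: p is the frequency of A; AA, Aa, aa have fitness w11, w12, w22."""
    q = 1 - p
    mean_fitness = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / mean_fitness

def run(p, w11, w12, w22, generations):
    for _ in range(generations):
        p = diploid_step(p, w11, w12, w22)
    return p

# Codominance with s = 0.05: a is favored, so A is driven toward loss.
print(run(0.5, 1.0, 1.05, 1.10, 500))
# Overdominance: both alleles are maintained at an interior equilibrium.
w11, w12, w22 = 0.9, 1.0, 0.8
print(run(0.1, w11, w12, w22, 500), (w12 - w22) / (2 * w12 - w11 - w22))
```

The overdominant case is the situation illustrated by sickle-cell anemia.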

Mutation-Selection balance
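When an allele is weakly deleterious, mutation can play a major role in driving allele frequencies.
Genotype:         AA     Aa        aa
Fitness:          1      1 - hs    1 - s
Frequency (HW):   p^2    2pq       q^2
New frequency of the deleterious allele a, without mutation (ignoring terms in $q^2$, since $q \ll 1$): $q' \approx q(1 - hs)$
New allele frequency, assuming mutation A → a at rate $\mu$: $q' \approx q(1 - hs) + \mu$
What is the equilibrium frequency of the deleterious allele?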
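One way to answer the question is to iterate the exact diploid recursion (selection, then mutation A → a at rate $\mu$) and compare the equilibrium with the standard approximations $\hat{q} \approx \mu/(hs)$ for $h > 0$ and $\hat{q} \approx \sqrt{\mu/s}$ for $h = 0$. These follow from setting $q' = q$ in the approximate recursion; for $h = 0$ the $q^2$ term must be kept, giving $sq^2 \approx \mu$. The values of $\mu$, $s$ and $h$ are hypothetical.

```python
import math

def step(q, s, h, mu):
    """One generation: selection (fitness 1, 1 - hs, 1 - s for AA, Aa, aa), then mutation A -> a at rate mu."""
    p = 1 - q
    mean_fitness = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
    q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / mean_fitness
    return q_sel + mu * (1 - q_sel)

def equilibrium(s, h, mu, generations=50_000):
    """Iterate from q = 0 until (numerically) at mutation-selection balance."""
    q = 0.0
    for _ in range(generations):
        q = step(q, s, h, mu)
    return q

mu, s = 1e-5, 0.1  # hypothetical values
print(equilibrium(s, 0.5, mu), mu / (0.5 * s))     # partial dominance: q ~ mu / (hs)
print(equilibrium(s, 0.0, mu), math.sqrt(mu / s))  # full recessivity: q ~ sqrt(mu / s)
```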

Mutation-Selection balance: Huntington disease
Huntington disease is a neurological genetic disease appearing after age 35. It results from a dominant mutation, so how does the disease survive in the human population? Although it may be fatal, the fitness is not very low because of the late age of onset (estimated $w_{12} = 0.81$). Frequency in human populations: 70 per million (Europe) to 1 per million (Africa). Since $h > 0$, we can estimate the mutation rate at the Huntington locus as $\mu = hs\hat{q}$: from $10^{-6}(1 - 0.81) = 1.9 \times 10^{-7}$ to $70 \times 10^{-6}(1 - 0.81) = 1.3 \times 10^{-5}$.
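The arithmetic of the estimate as a quick check, treating the two population figures as $\hat{q}$, as the slide does, and using $hs = 1 - w_{12}$.

```python
# Mutation rate implied by mutation-selection balance, mu = hs * q,
# with hs = 1 - w12 = 1 - 0.81 and q taken as 1 and 70 per million.
hs = 1 - 0.81
for q in (1e-6, 70e-6):
    print(f"q = {q:.0e}: mu = {hs * q:.2e}")
```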

Mutation-Selection balance: Haldane-Muller
The average fitness of the population, given recurrent mutations at rate $\mu$ at a locus with fitness cost $s$:
Assuming perfect recessivity ($h = 0$): $\hat{q} \approx \sqrt{\mu/s}$ and $\bar{w} \approx 1 - s\hat{q}^2 = 1 - \mu$
Assuming partial dominance ($h > 0$): $\hat{q} \approx \mu/(hs)$ and $\bar{w} \approx 1 - 2hs\hat{q} = 1 - 2\mu$
The Haldane-Muller principle: the effect of mutation on the average population fitness depends only on the mutation rate, not on the fitness of the alleles!
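A numerical illustration of the Haldane-Muller principle: iterate a selection-then-mutation recursion to equilibrium and measure the load $1 - \bar{w}$ in units of $\mu$. The load should be close to $\mu$ for $h = 0$ and close to $2\mu$ for $h = 0.5$, whichever value of $s$ is used. All parameter values are hypothetical.

```python
def step(q, s, h, mu):
    """Selection (fitness 1, 1 - hs, 1 - s for AA, Aa, aa), then mutation A -> a at rate mu."""
    p = 1 - q
    mean_fitness = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
    q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / mean_fitness
    return q_sel + mu * (1 - q_sel)

def load_at_equilibrium(s, h, mu, generations=50_000):
    """Mutation load 1 - mean fitness at mutation-selection balance."""
    q = 0.0
    for _ in range(generations):
        q = step(q, s, h, mu)
    p = 1 - q
    mean_fitness = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
    return 1 - mean_fitness

mu = 1e-5
for s in (0.1, 0.01):
    # load / mu: about 1 for h = 0 and about 2 for h = 0.5, independent of s
    print(s, load_at_equilibrium(s, 0.0, mu) / mu, load_at_equilibrium(s, 0.5, mu) / mu)
```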

Overdominance: a SNP in the beta-globin gene makes the encoded protein defective. The resulting red blood cells are curved and elongated (sickled) and are removed from the circulation. Individuals homozygous for the mutation usually die from anemia without intensive care. Heterozygous individuals have mild anemia, but cope better with the malaria parasite Plasmodium falciparum (possibly because infected red cells become sickled).
[Figure: historical malaria distribution vs. sickle-cell anemia distribution (Wikipedia)]

Other types of selection
Different fitness for different individuals, e.g., males vs. females. For example, genes inherited from the father that take up the mother’s resources in mammals. This was suggested to lead to the phenomenon of imprinting, in which cells express only the maternal or only the paternal allele. Imprinted genes behave much like haploids.

Other types of selection
Frequency- and density-dependent selection: the fitness depends on the frequency of the allele or on the population size.
Fecundity selection: different reproductive potential for different mating pairs.
Effects of a heterogeneous environment.
Effects that apply directly to the haplotype: gametic selection / meiotic drive (e.g., destroying the reproductive potential of gametes carrying the homologous chromosome).
Sexual selection: males advertising their reproductive potential, or confronting other males.
Kin selection (“the origin of altruism”).

Recombination and selection

Linkage and selection: linkage interferes with the purging of deleterious mutations and reduces the efficiency of positive selection!
[Figure: beneficial mutations arising on haplotypes that also carry weakly deleterious alleles]
Selective sweep, also called the hitchhiking effect or genetic draft (Gillespie)
Hill-Robertson effect

Linkage and selection: the variance in allele frequency is used to define the effective population size. Simplistically, assume a neutral locus is evolving while selective sweeps at a fully linked locus occur at rate $\delta$. A sweep fixes the neutral allele with probability $p$, and we further assume that each sweep happens instantly. This is very rough, but it demonstrates the basic intuition: sweeps reduce the efficacy of selection, in a way that can be quantified as a reduction in the effective population size.
C: the average frequency of the neutral allele after the sweep.

Infinite alleles model
Adding mutations with probability $\mu$ per generation, the coalescent process is extended by killing lineages (time is sped up by a factor of 2N). With $k$ lineages and $\theta = 4N\mu$:
Coalescent: rate $k(k-1)/2$
Mutation: rate $k\theta/2$
Probability model (Hoppe’s urn): draw from an urn holding one black ball of mass $\theta$ and other colored balls of mass 1. Each time the black ball is selected, it is returned together with a new ball of a new color. If another color is selected, the selected ball and another ball of the same color are returned to the urn.
Theorem: Hoppe’s urn and the coalescent with killing are equivalent.
[Figure: coalescent with killing, back in time]
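A sketch of Hoppe's urn that counts the number of distinct colors (alleles) $K_n$ among $n$ draws. It compares the simulated mean with $E[K_n] = \sum_{i=0}^{n-1} \theta/(\theta + i)$, a standard consequence of the urn (the expected number of alleles under the Ewens sampling formula) that is not stated on the slide. $\theta$, $n$ and the seed are illustrative.

```python
import random

def hoppe_urn_alleles(n, theta, rng):
    """Draw n balls from Hoppe's urn; return the number of distinct colors (alleles)."""
    counts = []  # counts[c] = number of balls of color c
    total = 0    # total mass of colored balls
    for _ in range(n):
        if rng.random() < theta / (theta + total):
            counts.append(1)  # black ball: a new color (a new allele) appears
        else:
            # Pick a colored ball uniformly, i.e. a color with probability proportional to its count.
            r = rng.randrange(total)
            for c, k in enumerate(counts):
                if r < k:
                    counts[c] += 1
                    break
                r -= k
        total += 1
    return len(counts)

def expected_alleles(n, theta):
    return sum(theta / (theta + i) for i in range(n))

rng = random.Random(4)
n, theta, reps = 20, 2.0, 20000
mean_k = sum(hoppe_urn_alleles(n, theta, rng) for _ in range(reps)) / reps
print(mean_k, expected_alleles(n, theta))
```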

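Infinite sites model: in the infinite sites model, each mutation occurs at a distinct site. It is better suited to current datasets, which include vast DNA sequences.
Theorem: let $u$ be the mutation rate for the locus under consideration, and set $\theta = 4Nu$. Under the infinite sites model, the expected number of segregating sites in a sample of size $n$ is
$$E[S_n] = \theta \sum_{i=1}^{n-1} \frac{1}{i}$$
Proof: let $t_j$ be the amount of time in the coalescent during which there are $j$ lineages. We showed earlier that $t_j$ has approximately an exponential distribution with mean $2/(j(j-1))$ (time in units of 2N generations). The total amount of time in the tree for a sample of size $n$ is
$$T_{tot} = \sum_{j=2}^{n} j\,t_j, \qquad E[T_{tot}] = \sum_{j=2}^{n} \frac{2}{j-1} = 2\sum_{i=1}^{n-1}\frac{1}{i}$$
Mutations occur along the branches at rate $2Nu = \theta/2$, so $E[S_n] = \frac{\theta}{2}\,E[T_{tot}] = \theta \sum_{i=1}^{n-1} \frac{1}{i}$.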
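The key step of the proof, $E[T_{tot}] = 2\sum_{i=1}^{n-1} 1/i$, can be checked by simulating the coalescent waiting times directly (time in units of 2N generations). This is a sketch with arbitrary $n$, $\theta$ and seed.

```python
import random

def total_branch_length(n, rng):
    """Kingman coalescent: with j lineages, wait an Exp(rate j(j-1)/2) time; j branches grow meanwhile."""
    total = 0.0
    for j in range(n, 1, -1):
        t_j = rng.expovariate(j * (j - 1) / 2)
        total += j * t_j
    return total

n, reps = 10, 20000
rng = random.Random(5)
mean_length = sum(total_branch_length(n, rng) for _ in range(reps)) / reps
a_n = sum(1 / i for i in range(1, n))
print(mean_length, 2 * a_n)

theta = 3.0
print("E[S_n] =", theta / 2 * 2 * a_n)  # mutations at rate theta/2 per unit branch length
```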

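Infinite sites model. Theorem: let $\theta = 4Nu$. Under the infinite sites model, the number of segregating sites $S_n$ has variance
$$\mathrm{Var}(S_n) = \theta \sum_{i=1}^{n-1}\frac{1}{i} + \theta^2 \sum_{i=1}^{n-1}\frac{1}{i^2}$$
Proof: let $s_j$ be the number of segregating sites created while there were $j$ lineages. While there are $j$ lineages, mutations occur at rate $2Nuj = \theta j/2$ and coalescence at rate $j(j-1)/2$. A mutation occurs before the next coalescence with probability
$$\frac{\theta j/2}{\theta j/2 + j(j-1)/2} = \frac{\theta}{\theta + j - 1}$$
so the probability of $k$ successes (mutations) before the coalescence is
$$P(s_j = k) = \left(\frac{\theta}{\theta + j - 1}\right)^{k} \frac{j-1}{\theta + j - 1}$$
It is a shifted geometric distribution, with mean $\theta/(j-1)$ and variance $\theta/(j-1) + \theta^2/(j-1)^2$. The $s_j$ are independent, and summing over $j = 2, \dots, n$ gives the result.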
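The proof's decomposition translates directly into a sampler: for each $j$, repeat Bernoulli trials with success (mutation) probability $\theta/(\theta + j - 1)$ until the first failure (coalescence). This sketch compares the sample mean and variance of $S_n$ with $\theta a_n$ and $\theta a_n + \theta^2 b_n$, where $a_n = \sum_{i=1}^{n-1} 1/i$ and $b_n = \sum_{i=1}^{n-1} 1/i^2$. The parameter values are illustrative.

```python
import random

def segregating_sites(n, theta, rng):
    """S_n as a sum over j = n..2 of the mutations that occur before the next coalescence."""
    s = 0
    for j in range(n, 1, -1):
        p_mut = theta / (theta + j - 1)
        while rng.random() < p_mut:  # shifted geometric number of mutations
            s += 1
    return s

n, theta, reps = 10, 3.0, 20000
rng = random.Random(6)
samples = [segregating_sites(n, theta, rng) for _ in range(reps)]
mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / (reps - 1)
a_n = sum(1 / i for i in range(1, n))
b_n = sum(1 / i**2 for i in range(1, n))
print(mean, theta * a_n)
print(var, theta * a_n + theta**2 * b_n)
```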

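Watterson’s estimator, using the infinite sites model
We can estimate $\theta = 4Nu$ from $S_n$ with Watterson’s estimator:
$$\hat{\theta}_W = \frac{S_n}{\sum_{i=1}^{n-1} 1/i}$$
Theorem: Watterson’s estimator is unbiased, $E[\hat{\theta}_W] = \theta$, with variance
$$\mathrm{Var}(\hat{\theta}_W) = \frac{\theta}{a_n} + \theta^2\,\frac{b_n}{a_n^2}, \qquad a_n = \sum_{i=1}^{n-1}\frac{1}{i}, \quad b_n = \sum_{i=1}^{n-1}\frac{1}{i^2}$$
It is possible to compute other statistics using the infinite sites model and compare them to the neutral expectation. Today this is very generally done by sampling:
1. Generate a large number of random genealogies (using the model we presented).
2. Compute the distribution of your statistic on these random genealogies.
3. Compare it to the value you observe in your population.
If you find a significant bias, then the model is wrong; possibly the locus is not neutral.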
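A sketch of the sampling recipe for the simplest statistic, $S_n$ itself, under an assumed value of $\theta$. `simulate_s` draws a random genealogy and Poisson mutations along its branches (rate $\theta/2$ per unit branch length), and `empirical_p_value` (a hypothetical helper) returns a two-sided Monte Carlo p-value for an observed count. The Poisson sampler is Knuth's multiplication method, adequate for moderate means; the observed counts in the usage lines are made up.

```python
import math
import random

def watterson(s_n, n):
    """Watterson's estimator: S_n divided by a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1 / i for i in range(1, n))
    return s_n / a_n

def poisson(lam, rng):
    """Knuth's method: count uniforms multiplied before the product drops below exp(-lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_s(n, theta, rng):
    """Random Kingman genealogy plus Poisson mutations at rate theta/2 per unit branch length."""
    length = sum(j * rng.expovariate(j * (j - 1) / 2) for j in range(2, n + 1))
    return poisson(theta / 2 * length, rng)

def empirical_p_value(observed, n, theta, reps=5000, seed=7):
    """Two-sided Monte Carlo p-value of an observed S_n under the neutral model."""
    rng = random.Random(seed)
    null = [simulate_s(n, theta, rng) for _ in range(reps)]
    low = sum(x <= observed for x in null)
    high = sum(x >= observed for x in null)
    return min(1.0, 2 * (min(low, high) + 1) / (reps + 1))

print(watterson(10, 5))                  # S_n = 10 segregating sites in a sample of 5
print(empirical_p_value(8, 10, 3.0))     # typical under neutrality
print(empirical_p_value(40, 10, 3.0))    # far in the tail
```

In practice $\theta$ itself must be estimated, which is why published neutrality tests compare two estimators of $\theta$ rather than a single count against a fixed value.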
