Basic Principles of Population Genetics. Lecture 4. This slide show closely follows Chapter 1 of Lange's book. Prepared by Dan Geiger. Background Readings: Chapter 1, Mathematical and Statistical Methods for Genetic Analysis, 1997, Kenneth Lange.
2 Founders' allele frequency
[Figure: pedigree with genotypes $A_1/A_2,\ B_1/B_2$; $A'_1/A'_2,\ B'_1/B'_2$; and $A''_1/A''_2,\ B''_1/B''_2$.]
In order to write down the likelihood function of the data given a pedigree structure and a recombination value $\theta$, one needs to specify the probability of the possible genotypes of each founder. Assuming random mating we have
$\Pr(G_1, G_2) = \Pr(A_1/A_2,\ B_1/B_2)\,\Pr(A'_1/A'_2,\ B'_1/B'_2)$.
The likelihood function also consists of transmission matrices that depend on $\theta$ and penetrance matrices, to be discussed later.
3 Hardy-Weinberg and Linkage Equilibria
The task at hand is to establish a theoretical basis for specifying the probability $\Pr(A_1/A_2,\ B_1/B_2)$ of a multilocus genotype from allele frequencies. We will derive, under various assumptions, the following two rules, which are widely used in genetic analysis (linkage and association) and which ease computations a great deal. Of course, the assumptions are not satisfied for all genetic analyses.
Hardy-Weinberg (HW) equilibrium: $\Pr(A_1/A_2) = P_{A_1} \cdot P_{A_2}$, namely, the probability of an ordered genotype $A_1/A_2$ is the product of the frequencies of the alleles constituting that genotype.
Linkage equilibrium: $\Pr(A_1B_1) = P_{A_1} \cdot P_{B_1}$, namely, the probability of a haplotype $A_1, B_1$ is the product of the frequencies of the alleles constituting that haplotype.
[Figure: chromosome pair with alleles $A_1$, $A_2$ at locus A and $B_1$, $B_2$ at locus B.]
These rules imply: $\Pr(A_1/A_2,\ B_1/B_2) = P_{A_1} \cdot P_{A_2} \cdot P_{B_1} \cdot P_{B_2}$
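As a minimal sketch of how the two rules combine, the following Python function multiplies allele frequencies to obtain the probability of an ordered two-locus genotype. The function name and the example frequencies are illustrative assumptions, not part of the lecture.

```python
def ordered_genotype_prob(freq_a, freq_b, genotype):
    """Pr(A1/A2, B1/B2) assuming both HW and linkage equilibrium.

    freq_a, freq_b: dicts mapping allele name -> population frequency.
    genotype: ((a1, a2), (b1, b2)), the ordered genotype at loci A and B.
    """
    (a1, a2), (b1, b2) = genotype
    return freq_a[a1] * freq_a[a2] * freq_b[b1] * freq_b[b2]


# Hypothetical allele frequencies, chosen only for illustration.
freq_a = {"A1": 0.7, "A2": 0.3}
freq_b = {"B1": 0.4, "B2": 0.6}
prob = ordered_genotype_prob(freq_a, freq_b, (("A1", "A2"), ("B1", "B2")))
print(prob)
```

If either equilibrium assumption fails, this product is no longer the right founder probability and haplotype frequencies would have to be supplied directly.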
4 A simple setup to study HW equilibrium
Consider a bi-allelic locus A with alleles $A_1$, $A_2$. Let $u$, $v$, and $w$ be the frequencies of the unordered genotypes $A_1/A_1$, $A_1/A_2$, $A_2/A_2$. Clearly, $u + v + w = 1$.
How are these frequencies related to the allele frequencies $p_1$ and $p_2$ of $A_1$ and $A_2$, respectively? Answer: $p_1 = u + \tfrac12 v$ and $p_2 = \tfrac12 v + w$.
But the Hardy-Weinberg equilibrium states that also
$u = p_1^2$, $v = 2p_1p_2$ (the factor 2 appears because $A_1/A_2$ genotypes are not ordered), $w = p_2^2$, so that $u + v + w = (p_1 + p_2)^2 = 1$.
Clearly these relations do not hold for arbitrary frequencies $u, v, w$; they hold only for those values in the image of this polynomial mapping.
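The relations on this slide can be checked numerically. The sketch below (function names are our own) recovers allele frequencies from genotype frequencies and tests whether a triple $(u, v, w)$ lies in the image of the HW mapping.

```python
def allele_freqs(u, v, w):
    """Allele frequencies (p1, p2) from unordered genotype frequencies."""
    return u + v / 2, v / 2 + w


def is_hardy_weinberg(u, v, w, tol=1e-12):
    """True if (u, v, w) equals (p1^2, 2 p1 p2, p2^2) up to tolerance."""
    p1, p2 = allele_freqs(u, v, w)
    return (
        abs(u - p1 * p1) < tol
        and abs(v - 2 * p1 * p2) < tol
        and abs(w - p2 * p2) < tol
    )


# Both populations have p1 = p2 = 0.5, but only the first is in HW proportions.
print(is_hardy_weinberg(0.25, 0.5, 0.25))
print(is_hardy_weinberg(0.5, 0.0, 0.5))
```

The second population has the same allele frequencies as the first but no heterozygotes, which shows that allele frequencies alone do not determine genotype frequencies unless HW holds.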
5 Assumptions made to justify HW
1. Infinite population size
2. Discrete generations
3. Random mating
4. No selection
5. No migration
6. No mutation
7. Equal initial genotype frequencies in the two sexes
These assumptions are clearly not universal. HW equilibrium can be shown to hold under more relaxed sets of assumptions as well.
6 What happens after one generation?
Mating type (unordered genotypes) | Nature of offspring and segregation ratios | Frequency of mates
$A_1/A_1 \times A_1/A_1$ | $A_1/A_1$ | $u^2$
$A_1/A_1 \times A_1/A_2$ | $\tfrac12 A_1/A_1 + \tfrac12 A_1/A_2$ | $2uv$
$A_1/A_1 \times A_2/A_2$ | $A_1/A_2$ | $2uw$
$A_1/A_2 \times A_1/A_2$ | $\tfrac14 A_1/A_1 + \tfrac12 A_1/A_2 + \tfrac14 A_2/A_2$ | $v^2$
$A_1/A_2 \times A_2/A_2$ | $\tfrac12 A_1/A_2 + \tfrac12 A_2/A_2$ | $2vw$
$A_2/A_2 \times A_2/A_2$ | $A_2/A_2$ | $w^2$
Total | | $(u+v+w)^2 = 1$
Frequency of $A_1/A_1$ after one generation: $u' = u^2 + \tfrac12(2uv) + \tfrac14 v^2 = (u + \tfrac12 v)^2 = p_1^2$
7 After one generation …
So, after one generation the genotype frequencies $u, v, w$ change to $u', v', w'$ as follows (using the previous table):
Frequency of $A_1/A_1$: $u' = u^2 + uv + \tfrac14 v^2 = (u + \tfrac12 v)^2 = p_1^2$
Frequency of $A_1/A_2$: $v' = uv + 2uw + \tfrac12 v^2 + vw = 2(u + \tfrac12 v)(\tfrac12 v + w) = 2p_1p_2$
Frequency of $A_2/A_2$: $w' = \tfrac14 v^2 + vw + w^2 = (\tfrac12 v + w)^2 = p_2^2$
Hardy-Weinberg seems to be established after one generation, but $u', v', w'$ are frequencies for the second generation while $p_1$ and $p_2$ are defined as the allele frequencies of the first generation. Are these also the allele frequencies of the second generation?
Yes! Because $p'_1 = u' + \tfrac12 v' = p_1^2 + p_1p_2 = p_1(p_1 + p_2) = p_1$, and similarly $p'_2 = p_2$.
8 After yet another generation …
Have we reached equilibrium? Let's look at one more generation and see that genotype frequencies are now fixed.
Frequency of $A_1/A_1$: $u'' = (u' + \tfrac12 v')^2 = (p_1^2 + p_1p_2)^2 = p_1^2$
Frequency of $A_1/A_2$: $v'' = 2(u' + \tfrac12 v')(\tfrac12 v' + w') = 2(p_1^2 + p_1p_2)(p_1p_2 + p_2^2) = 2p_1p_2$
Frequency of $A_2/A_2$: $w'' = (\tfrac12 v' + w')^2 = (p_1p_2 + p_2^2)^2 = p_2^2$
Hardy-Weinberg is indeed established after one generation; allele and genotype frequencies do not change under the assumptions we have made. Can you trace where each assumption is used?
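The two-generation argument can be replayed with exact rational arithmetic. This is a minimal sketch that sums the mating table row by row; the starting frequencies are arbitrary illustrative values and the function name is our own.

```python
from fractions import Fraction as F


def next_generation(u, v, w):
    """One round of random mating, summing the mating table row by row."""
    half, quarter = F(1, 2), F(1, 4)
    u_next = u * u + half * (2 * u * v) + quarter * v * v
    v_next = half * (2 * u * v) + 2 * u * w + half * v * v + half * (2 * v * w)
    w_next = quarter * v * v + half * (2 * v * w) + w * w
    return u_next, v_next, w_next


# Arbitrary starting point that is NOT in HW proportions.
u, v, w = F(1, 2), F(1, 5), F(3, 10)
p1 = u + v / 2
p2 = v / 2 + w

first = next_generation(u, v, w)
second = next_generation(*first)

print(first == (p1 * p1, 2 * p1 * p2, p2 * p2))  # HW after one generation
print(second == first)                           # unchanged afterwards
```

Using `Fraction` avoids floating-point tolerance, so the equalities hold exactly as in the algebra.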
9 Use of assumptions in the derivation
1. Infinite population size
2. Discrete generations (mating amongst $i$-th generation members only)
3. Random mating
4. No selection
5. No migration
6. No mutation
7. Equal initial genotype frequencies in the two sexes
Mating type (unordered genotypes) | Nature of offspring and segregation ratios | Frequency of mates
$A_1/A_1 \times A_1/A_2$ | $\tfrac12 A_1/A_1 + \tfrac12 A_1/A_2$ | $2uv$
The segregation ratios in the table row above assume 1, 2, 3, 7. The frequency formula of $A_1/A_1$ after one generation, $u^2 + \tfrac12(2uv) + \tfrac14 v^2$, assumes 4, 5, 6.
10 An alternative justification
Previously, we started with arbitrary genotype frequencies $u, v, w$ and showed that they are modified after one generation to satisfy HW equilibrium. Now we start with arbitrary allele frequencies $p_1$ and $p_2$.
Random mating is equivalent to random pairing of alleles; each person contributes one allele with the prescribed frequencies. So the frequency of $A_1/A_1$ in the new generation is $p_1^2$, that of $A_1/A_2$ is $2p_1p_2$, and that of $A_2/A_2$ is $p_2^2$. Argument completed?
Not quite: we must check the allele frequencies of the new generation. The frequency $p'_1$ of $A_1$ in this new generation is $p'_1 = p_1^2 + \tfrac12(2p_1p_2) = p_1$, and the frequency of $A_2$ in this new generation is $p'_2 = p_2^2 + \tfrac12(2p_1p_2) = p_2$. So after one generation the allele frequencies are fixed and the genotype frequencies satisfy the HW equilibrium.
Exercise: Generalize the argument to k-allelic loci.
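For readers who want a numerical check before attempting the exercise, here is a small sketch of random pairing of alleles at a k-allelic locus. It is not a proof; the allele frequencies and helper names are illustrative assumptions.

```python
from fractions import Fraction as F
from itertools import combinations_with_replacement


def genotype_freqs(p):
    """Unordered genotype frequencies produced by random pairing of alleles."""
    freqs = {}
    for i, j in combinations_with_replacement(range(len(p)), 2):
        # Homozygotes arise one way, heterozygotes in two orders.
        freqs[(i, j)] = p[i] * p[j] if i == j else 2 * p[i] * p[j]
    return freqs


def allele_freqs(genotypes, k):
    """Allele frequencies recovered from unordered genotype frequencies."""
    p = [F(0)] * k
    for (i, j), freq in genotypes.items():
        p[i] += freq / 2  # each genotype contributes half its frequency
        p[j] += freq / 2  # per allele copy (twice for a homozygote)
    return p


p = [F(1, 2), F(3, 10), F(1, 5)]  # hypothetical frequencies for k = 3
g = genotype_freqs(p)
print(sum(g.values()) == 1, allele_freqs(g, len(p)) == p)
```

The check confirms that the allele frequencies are reproduced unchanged in the next generation, which is the key step of the general argument.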
11 HW equilibrium at X-linked loci
Consider an allele at an X-linked locus. At generation $n$, let $q_n$ denote that allele's frequency in females and $r_n$ denote that allele's frequency in males.
Questions: What is the frequency $p_n$ of the allele in the population? Does $p_n$ converge, and to which value $p$? Do $q_n$ and $r_n$ converge to the same value?
12 Argument outline
Assuming equal numbers of males and females, we have $p_n = \tfrac23 q_n + \tfrac13 r_n$ for every $n$, because females carry two X chromosomes and males carry one.
Let $p = p_0 = \tfrac23 q_0 + \tfrac13 r_0$. We will now show that both $q_n$ and $r_n$ converge quickly to $p$ (but not in one generation as before).
Having shown this claim, the female genotype frequency of $A_1/A_1$ must be $p^2$, that of $A_1/A_2$ is $2p(1-p)$, and that of $A_2/A_2$ is $(1-p)^2$, satisfying HW equilibrium. For males, the genotypes $A_1$ and $A_2$ have frequencies $p$ and $1-p$.
13 The recursion equations
Because a male always gets his X chromosome from his mother, and his mother precedes him by one generation,
$r_n = q_{n-1}$ (Eq. 1.1)
Similarly, females get half their X chromosomes from females and half from males,
$q_n = \tfrac12 q_{n-1} + \tfrac12 r_{n-1}$ (Eq. 1.2)
Eqs. 1.1 and 1.2 imply:
$\tfrac23 q_n + \tfrac13 r_n = \tfrac23\left(\tfrac12 q_{n-1} + \tfrac12 r_{n-1}\right) + \tfrac13 q_{n-1} = \tfrac23 q_{n-1} + \tfrac13 r_{n-1}$
It follows that the allele frequency $p_n = \tfrac23 q_n + \tfrac13 r_n$ never changes and remains equal to $p_0 = p$.
To see that $q_n$ converges to $p$, we need to relate the difference $q_n - p$ to the difference $q_{n-1} - p$.
14 The fixed point solution
$q_n - p = q_n - \tfrac32 p + \tfrac12 p$
$= \tfrac12 q_{n-1} + \tfrac12 r_{n-1} - \tfrac32\left(\tfrac23 q_{n-1} + \tfrac13 r_{n-1}\right) + \tfrac12 p$
$= -\tfrac12 q_{n-1} + \tfrac12 p$ (just cancel terms)
$= -\tfrac12 (q_{n-1} - p)$
So in each step the difference diminishes by half and $q_n$ approaches $p$ in a zigzag manner. Continuing in this manner,
$q_n - p = -\tfrac12(q_{n-1} - p) = (-\tfrac12)^2 (q_{n-2} - p) = \dots = (-\tfrac12)^n (q_0 - p) \to 0$.
Hence $r_n = q_{n-1}$ also converges to $p$.
What does this mean? Having shown this claim, the female genotype frequency of $A_1/A_1$ must be $p^2$, that of $A_1/A_2$ is $2p(1-p)$, and that of $A_2/A_2$ is $(1-p)^2$, satisfying HW equilibrium. For males, the genotypes $A_1$ and $A_2$ have frequencies $p$ and $1-p$. HW equilibrium is not reached in one generation but is approached fast: after 5 generations the deviation $q_n - p$ is down to $1/32$ of its initial size.
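The zigzag convergence is easy to see by iterating Eqs. 1.1 and 1.2 directly. The following sketch uses hypothetical starting frequencies (0.9 in females, 0.3 in males) and exact fractions; the function name is our own.

```python
from fractions import Fraction as F


def x_linked_step(q, r):
    """One generation: Eq. 1.2 for females, Eq. 1.1 for males."""
    return (q + r) / 2, q


q, r = F(9, 10), F(3, 10)       # hypothetical starting frequencies
p = F(2, 3) * q + F(1, 3) * r   # population frequency, conserved over time

history = [(q, r)]
for _ in range(5):
    q, r = x_linked_step(q, r)
    history.append((q, r))

for n, (qn, rn) in enumerate(history):
    # The last column alternates in sign and halves in size each step.
    print(n, float(qn), float(rn), float(qn - p))
```

With these inputs the female deviation starts at 1/5 and follows $(-\tfrac12)^n \cdot \tfrac15$, while $\tfrac23 q_n + \tfrac13 r_n$ stays fixed at $p$.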
15 Linkage equilibrium
Let $A_i$ be an allele at locus A with frequency $p_i$. Let $B_j$ be an allele at locus B with frequency $q_j$. Denote the recombination fraction between these loci by $\theta_f$ and $\theta_m$ for females and males, respectively. Let $\theta = (\theta_f + \theta_m)/2$.
[Figure: chromosome pair labelled $A_i$, $A'_i$ at locus A and $B_j$, $B'_j$ at locus B.]
Linkage equilibrium means that $\Pr(A_iB_j) = p_iq_j$.
We use the same assumptions employed earlier to demonstrate linkage equilibrium, namely, to show that $P_n(A_iB_j)$ converges to $p_iq_j$ at a rate that is fastest when the recombination fraction is largest.
16 Convergence proof
$P_n(A_iB_j) = \tfrac12\,[\text{gamete from female}] + \tfrac12\,[\text{gamete from male}]$
$= \tfrac12\left[(1-\theta_f)\,P_{n-1}(A_iB_j) + \theta_f\,p_iq_j\right] + \tfrac12\left[(1-\theta_m)\,P_{n-1}(A_iB_j) + \theta_m\,p_iq_j\right]$
$= (1-\theta)\,P_{n-1}(A_iB_j) + \theta\,p_iq_j$
In each bracket the first term is the no-recombination case and the second term is the recombination case.
So, $P_n(A_iB_j) - p_iq_j = (1-\theta)\left[P_{n-1}(A_iB_j) - p_iq_j\right] = \dots = (1-\theta)^n\left[P_0(A_iB_j) - p_iq_j\right]$
In short, we have established that $P_n(A_iB_j) \to p_iq_j$. For loci on different chromosomes ($\theta = \tfrac12$), the deviation from linkage equilibrium is halved each generation. For close loci with small $\theta$, convergence is slow.
Exercise: Repeat this analysis for three loci (Problem 7, with guidance, in Kenneth Lange's book).
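To get a feel for how strongly the rate depends on the recombination fraction, the sketch below evaluates the closed form and counts how many generations it takes for the deviation to shrink to 1% of its initial value. The 1% threshold, the sample values of theta, and the function names are illustrative choices, not from the lecture.

```python
def haplotype_freq(p0, pi, qj, theta, n):
    """P_n(A_i B_j) = p_i q_j + (1 - theta)^n [P_0(A_i B_j) - p_i q_j]."""
    return pi * qj + (1 - theta) ** n * (p0 - pi * qj)


def generations_to_shrink(theta, fraction):
    """Smallest n with (1 - theta)^n <= fraction, found by direct iteration."""
    n, remaining = 0, 1.0
    while remaining > fraction:
        remaining *= 1 - theta
        n += 1
    return n


for theta in (0.5, 0.1, 0.01):
    print(theta, generations_to_shrink(theta, 0.01))
```

Unlinked loci need only a handful of generations, whereas tightly linked loci need hundreds, which is what makes close markers informative in the association setting discussed next.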
17 Ramifications for association studies
Many diseases are thought to have been caused by a single random mutation that survived and propagated to offspring, generation after generation. Would we see association in random population samples?
Suppose there is a close marker: [Figure: a marker next to the mutated locus.]
If the mutation happened many generations ago, no significant trace will remain: the haplotype frequencies will have reached linkage equilibrium! We need a combination of close markers and a disease allele of recent age.
Association studies like that are also called linkage disequilibrium mapping, or LD mapping in short.
18 Selection and fitness
The fitness of a genotype is the expected genetic contribution of that genotype to the next generation, that is, the number of offspring to which it contributes an allele.
Let the fitness of the three genotypes of an autosomal bi-allelic locus be denoted by $w_{A/A}$, $w_{A/a}$ and $w_{a/a}$. If $p_n$ and $q_n$ are the allele frequencies of A and a, then the average fitness under HW equilibrium is $w_{A/A}\,p_n^2 + w_{A/a}\,2p_nq_n + w_{a/a}\,q_n^2$.
Conventions: Since only the ratios of the fitness of the various genotypes matter, namely, $w_{A/A}/w_{A/a}$ and $w_{a/a}/w_{A/a}$, we arbitrarily set $w_{A/a} = 1$ and define $w_{A/A} = 1-r$, $w_{a/a} = 1-s$, where $r \le 1$ and $s \le 1$.
Interpretation: When $s = r = 0$, there is no selection. When $r$ is negative, A/A has an advantage over A/a; similarly for a/a with negative $s$. When $r$ is positive (it can be at most 1), A/A has a disadvantage relative to A/a. When both $s$ and $r$ are positive, there is a heterozygous advantage.
19 Assuming selection exists …
Our goal is to study the equilibrium of allele frequencies under various selection possibilities (namely, different values of $r$ and $s$). To find the equilibrium we study the difference $\Delta p_n = p_{n+1} - p_n$.
In our new notation the average fitness $\bar w_n$ at generation $n$ is given by
$\bar w_n = (1-r)\,p_n^2 + 2p_nq_n + (1-s)\,q_n^2 = 1 - r p_n^2 - s q_n^2$,
where the three terms are the contributions of genotypes A/A, A/a, and a/a.
First, note that $p_{n+1} = \left[(1-r)\,p_n^2 + p_nq_n\right] / \bar w_n$. Therefore
$\Delta p_n = p_{n+1} - p_n = \left[(1-r)\,p_n^2 + p_nq_n\right] / \bar w_n - p_n$
$= \left[(1-r)\,p_n^2 + p_nq_n - (1 - r p_n^2 - s q_n^2)\,p_n\right] / \bar w_n$
$= \left[p_nq_n\,(s - (r+s)\,p_n)\right] / \bar w_n$
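The algebra leading to the closed form for $\Delta p_n$ can be double-checked with exact arithmetic. This is a small sketch with hypothetical values of $p$, $r$, and $s$; the function names are our own.

```python
from fractions import Fraction as F


def mean_fitness(p, r, s):
    """Average fitness 1 - r p^2 - s q^2 under HW proportions."""
    q = 1 - p
    return 1 - r * p * p - s * q * q


def next_p(p, r, s):
    """Frequency of allele A after one round of selection."""
    q = 1 - p
    return ((1 - r) * p * p + p * q) / mean_fitness(p, r, s)


def delta_p(p, r, s):
    """Closed form p q (s - (r + s) p) / mean fitness."""
    q = 1 - p
    return p * q * (s - (r + s) * p) / mean_fitness(p, r, s)


p, r, s = F(3, 10), F(1, 5), F(1, 10)  # hypothetical values
print(next_p(p, r, s) - p == delta_p(p, r, s))
```

Because `Fraction` arithmetic is exact, equality here confirms the simplification for these inputs without any rounding tolerance.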
20 Interpretation when $r > 0$ and $s \le 0$
We just derived $\Delta p_n = \left[p_nq_n\,(s - (r+s)\,p_n)\right] / \bar w_n$. Where should $p_n$ converge to? Convergence occurs when $\Delta p_n = 0$, namely, when $p_n = 0$, $p_n = 1$ (i.e., $q_n = 0$), or $p_n = s/(r+s)$.
Claim: When $r > 0$ and $s \le 0$, $p_n \to 0$, i.e., allele A disappears.
Proof: When $r > 0$ and $s \le 0$, the linear function $g(p) = s - (r+s)\,p$ satisfies $g(0) \le 0$ and $g(1) < 0$, hence it is negative on $(0,1)$. Thus, $p_n$ decreases monotonically at each step, so $p_n$ must approach 0 at equilibrium. The other case is similar.
In the opposite case ($r \le 0$ and $s > 0$), allele a should be driven to extinction. (Why is this extinction process sometimes halted in real life?)
21 When r and s have the same sign
Conclusion I (for negative sign): If $r$ and $s$ are both negative, each step multiplies the deviation of $p_n$ from $s/(r+s)$ by a factor greater than 1, so $p_n \to 1$ for $p_0$ above $s/(r+s)$, and $p_n \to 0$ for $p_0$ below $s/(r+s)$. In other words, $s/(r+s)$ is an unstable equilibrium.
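One way to see the multiplicative behaviour around the interior equilibrium is to rewrite the recursion $\Delta p_n = p_nq_n\,(s-(r+s)p_n)/\bar w_n$ in terms of the deviation from $s/(r+s)$. The symbols $p_\infty$ and $\lambda$ below are labels introduced for this sketch and are not taken from the slides.

```latex
\begin{aligned}
p_\infty &= \frac{s}{r+s}, \qquad s-(r+s)\,p_n = -(r+s)\,(p_n-p_\infty),\\[4pt]
p_{n+1}-p_\infty &= (p_n-p_\infty)+\Delta p_n
   = \lambda(p_n)\,(p_n-p_\infty),\\[4pt]
\lambda(p_n) &= 1-\frac{(r+s)\,p_nq_n}{\bar w_n}
   = \frac{1-r\,p_n-s\,q_n}{\bar w_n}.
\end{aligned}
```

For $0 < p_n < 1$: if $r$ and $s$ are both negative then $r+s<0$ and $\lambda(p_n) > 1$, so the deviation grows; if both are positive (and at most 1) then $0 \le \lambda(p_n) < 1$, so the deviation keeps its sign and shrinks. Evaluating at $p_n = p_\infty$ gives the local rate $\lambda(p_\infty) = (r+s-2rs)/(r+s-rs)$.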
22 When r and s are both positive
If both $r$ and $s$ are positive (heterozygous advantage), then each step multiplies the deviation $p_n - s/(r+s)$ by a factor that is nonnegative and smaller than 1. Hence $p_n - s/(r+s)$ has a constant sign and declines in magnitude.
Conclusion II: If both $r$ and $s$ are positive, $p_n \to s/(r+s)$ and this point is a stable equilibrium.
Conclusion III (rate of convergence): If $p_0 \approx s/(r+s)$, namely the starting point is near equilibrium, then this factor is nearly constant from step to step, and we get (locally) a geometric convergence.
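The three regimes can be explored by iterating the selection recursion numerically. This is a minimal sketch; the selection coefficients, starting frequencies, and the number of generations are hypothetical choices made only to illustrate each case.

```python
def next_p(p, r, s):
    """One round of selection with fitnesses 1-r, 1, 1-s for A/A, A/a, a/a."""
    q = 1 - p
    return ((1 - r) * p * p + p * q) / (1 - r * p * p - s * q * q)


def iterate(p0, r, s, generations=2000):
    """Apply the selection recursion repeatedly and return the final p."""
    p = p0
    for _ in range(generations):
        p = next_p(p, r, s)
    return p


# r > 0, s <= 0: allele A is lost.
print(iterate(0.9, 0.2, -0.1))
# Both negative: s/(r+s) = 1/3 is unstable; the outcome depends on the start.
print(iterate(0.5, -0.2, -0.1))
print(iterate(0.2, -0.2, -0.1))
# Both positive: p approaches the stable equilibrium s/(r+s) = 1/3.
print(iterate(0.9, 0.2, 0.1))
```

Starting just above or just below 1/3 in the both-negative case sends the population to opposite fixation states, while in the both-positive case every interior start ends at the same point.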
23 Heterozygous advantage
If we observe a recessive disease that is maintained at high frequency, how can we explain it? Intuition says that it should disappear.
However, if the A/a genotype has an advantage over the other genotypes, then the defective allele would be kept around. Technically, if both $r$ and $s$ are positive, then the A/a genotype has the highest fitness.
The best evidence for such a phenomenon is sickle cell anemia. In some parts of Africa, this anemia, despite being a recessive disease, is kept at high frequency. It turns out that the A/a genotype appears to provide protection against malaria! (So it has high fitness in swamp-like areas.)
24 Sickle cell anemia - Medical Encyclopedia
[Figure: photomicrograph. Red blood cells, sickle cell.]
Sickle cell anemia is an inherited autosomal recessive blood disease in which the red blood cells produce abnormal pigment (hemoglobin). The abnormal hemoglobin causes deformity of the red blood cells into crescent or sickle shapes, as seen in this photomicrograph.
The sickle cell mutation is a single nucleotide substitution (A → T) at codon 6 in the beta-hemoglobin gene, resulting in the following substitution of amino acids: GAG (Glu) → GTG (Val).
Source (Edited):
25 Facts about sickle cell disease
Sickle cell disease is much more common in certain ethnic groups, affecting approximately one out of every 500 African Americans.
Because people with sickle trait were more likely to survive malaria outbreaks in Africa than those with normal hemoglobin, it is believed that this genetically aberrant hemoglobin evolved as a protection against malaria.
Although sickle cell disease is inherited and present at birth, symptoms usually don't occur until after 4 months of age.
Sickle cell anemia may become life-threatening when damaged red blood cells break down (and in other circumstances).
Blocked blood vessels and damaged organs can cause acute painful episodes. These painful crises occur in almost all patients at some point in their lives. Some patients have one episode every few years, while others have many episodes per year. The crises can be severe enough to require admission to the hospital for pain control.
Repeated crises can cause damage to the kidneys, lungs, bones, eyes, and central nervous system.
26 Balance of mutation and selection
Most mutations are neutral or deleterious. We discuss the balance between deleterious mutations and selection.
Let $\mu$ denote the mutation rate from a to A. Suppose the equilibrium frequency of allele A is $p$ and that of a is $q = 1-p$, with $r > 0$ and $s = 0$ (a recessive disease). When is a balance achieved between selection (here, preferring allele a) and mutation that changes allele a back to allele A?
The frequencies $p$ and $q$ must satisfy the equilibrium condition
$q = (1-\mu)\,\dfrac{q}{1 - r p^2}$:
selection raises the frequency of a to $q/(1 - rp^2)$, and a fraction $1-\mu$ of those copies escapes mutation.
This yields $1 - rp^2 = 1 - \mu$ and thus $p^2 = \mu/r$, so a balance is achieved that retains both alleles.
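The balance point can be checked by iterating a selection-then-mutation recursion until it settles. The sketch below assumes that ordering within a generation (selection first, then mutation of a to A at rate mu), which is one reading consistent with the result $1 - rp^2 = 1 - \mu$; the parameter values are hypothetical.

```python
import math


def next_q(q, r, mu):
    """Selection against A/A (fitness 1 - r, s = 0), then mutation a -> A."""
    p = 1 - q
    q_after_selection = q / (1 - r * p * p)  # (pq + q^2) / mean fitness
    return (1 - mu) * q_after_selection


r, mu = 0.5, 1e-4   # hypothetical selection coefficient and mutation rate
q = 0.5             # arbitrary starting frequency of allele a
for _ in range(5000):
    q = next_q(q, r, mu)

p = 1 - q
print(p, math.sqrt(mu / r))
```

The two printed numbers should agree closely, illustrating that a rare deleterious recessive allele settles near $\sqrt{\mu/r}$ rather than disappearing.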
27 Finite Population Genetic Drift
[Figure: simulated allele frequencies over the generations, with curves labelled Allele 10 and Allele 5. Source: Gideon Greenspan.]
After 800 generations, by simulation, only two of the ten alleles remain: number 5 and number 7.
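A drift experiment of this kind can be sketched with a simple neutral resampling model (often called Wright-Fisher sampling). This is not the simulation behind the figure: the population size, the equal starting frequencies, and the random seed are all assumptions made here for illustration, so the surviving alleles will generally differ from those on the slide.

```python
import random


def wright_fisher(num_alleles=10, pop_size=100, generations=800, seed=0):
    """Neutral drift: each generation resamples 2N gene copies from the last.

    Returns a dict mapping allele label -> count in the final generation.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    # Start with alleles labelled 1..num_alleles, equally represented.
    pool = [1 + i % num_alleles for i in range(copies)]
    for _ in range(generations):
        # Sampling with replacement models random mating in a finite population.
        pool = rng.choices(pool, k=copies)
    counts = {}
    for allele in pool:
        counts[allele] = counts.get(allele, 0) + 1
    return counts


counts = wright_fisher()
print(sorted(counts.items()))
```

Rerunning with different seeds or a smaller `pop_size` shows how quickly alleles are lost by chance alone once the infinite-population assumption is dropped.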