Estimating recombination rates using three-site likelihoods Jeff Wall Program in Molecular and Computational Biology, USC.


1 Estimating recombination rates using three-site likelihoods. Jeff Wall, Program in Molecular and Computational Biology, USC

2 DNA sequence variation. Patterns of DNA sequence variation are affected by: mutation; recombination; population structure; changes in population size; natural selection; genetic drift.

4 Standard double strand break model of recombination. Outcomes: gene conversion, or crossover (with gene conversion). Slide courtesy of M. Przeworski.

5 Standard double strand break model of recombination. Outcomes: gene conversion, or crossover (with gene conversion). Approximated as gene conversion or crossover, ignoring patchworks. Slide courtesy of M. Przeworski.

6 Gene conversion. Most population genetic models ignore gene conversion. However, gene conversion has a strong effect on the levels of linkage disequilibrium between closely linked sites. Crossing over: recombinants are produced at a rate proportional to the genetic distance between the sites. Gene conversion: recombinants are produced at a rate that is roughly independent of the distance between the sites.

7 Effect of gene conversion on patterns of linkage disequilibrium (LD). Gene conversion leads to a steeper decay of LD at short distances. [Figure: average r² against physical distance between markers (bp), with and without gene conversion. Figure courtesy of M. Przeworski.]

8 Implications of high levels of gene conversion. For detecting natural selection (Andolfatto and Nordborg 1998; Berry and Barbadilla 2000).

9 Implications of high levels of gene conversion. For detecting natural selection (Andolfatto and Nordborg 1998; Berry and Barbadilla 2000). For linkage disequilibrium-based association studies.

10 Parameters. ρ = 4 N_e r_co, where N_e is the effective population size and r_co is the crossover rate per bp per generation. f = r_gc / r_co, where r_gc is the rate of gene conversion initiation per bp per generation. t = mean gene conversion tract length. We assume that gene conversion tract lengths follow a geometric distribution.
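
One way to see how these three parameters combine is to compute the population-scaled rate at which two sites d bp apart are separated by recombination. The sketch below assumes that tracts initiate uniformly along the sequence and extend in one direction with a geometric length of mean t (the direction assumption is ours; the slide states only the geometric distribution). Under those assumptions the crossover contribution is ρd and the gene conversion contribution is 2fρt[1 − (1 − 1/t)^d]. The function name is illustrative, and the default values are the ones quoted on the Simulations slide.

```python
def scaled_recombination_rate(d, rho=0.001, f=4.0, t=125.0):
    """Return (crossover, gene_conversion) contributions to the scaled
    recombination rate between two sites d bp apart.

    Assumes geometric tract lengths with mean t bp that extend in one
    direction from a uniformly placed initiation point (our assumption).
    """
    if d < 0 or t < 1:
        raise ValueError("d must be non-negative and t must be at least 1")
    crossover = rho * d
    # A tract separates the sites if it covers exactly one of them.
    gene_conversion = 2.0 * f * rho * t * (1.0 - (1.0 - 1.0 / t) ** d)
    return crossover, gene_conversion


if __name__ == "__main__":
    for distance in (1, 10, 100, 1000, 5000):
        co, gc = scaled_recombination_rate(distance)
        print(distance, co, gc)
```

With these defaults the gene conversion term is eight times the crossover term for adjacent sites and levels off near 2fρt at large distances, while the crossover term keeps growing linearly. This is consistent with gene conversion mattering most for closely linked sites.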

11 General Approach Ideally we would calculate the probability of the data as a function of the recombination parameters. However, full likelihood methods (e.g., Fearnhead & Donnelly 2001) are too computationally intensive. The composite likelihood approach calculates likelihoods for small subsets of the data, then multiplies these likelihoods over many subsets.

12 Composite likelihood (Frisse et al. 2001)
Sequence 1   a c c g a t g c g t a a g c t
Sequence 2   g t a g a t g c g t c a g c t
Sequence 3   g t a g t c g t g t c g g c c
Sequence 4   a c a g t c g t g t c g g t t
Sequence 5   a c a g t c g t g t a g g t t
Sequence 6   a c c g a c g c c c a a g c t
Sequence 7   a c c g a t g c c c a a g c t
Sequence 8   a c c g a t g c c c a a g c c
Sequence 9   a c c t a t g c g t a a g c t
Sequence 10  a c c g a t a c g t c g g t t
Sequence 11  a c a g a c g c g t c g c c t
Sequence 12  g t a g a t g c c c a a g c t
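
The pairwise approach can be sketched on this example alignment as follows: tabulate the two-site haplotype counts for every pair of sites, then sum a per-pair log-likelihood over all pairs. This is a minimal illustration, not the authors' implementation. The `log_lik` argument is a hypothetical placeholder for whatever supplies the two-site likelihood at given recombination parameters, and the function names are ours.

```python
from collections import Counter
from itertools import combinations

# The 12 sequences from the slide, one string per sequence (15 sites each).
ALIGNMENT = [
    "accgatgcgtaagct", "gtagatgcgtcagct", "gtagtcgtgtcggcc",
    "acagtcgtgtcggtt", "acagtcgtgtaggtt", "accgacgcccaagct",
    "accgatgcccaagct", "accgatgcccaagcc", "acctatgcgtaagct",
    "accgatacgtcggtt", "acagacgcgtcgcct", "gtagatgcccaagct",
]


def pair_configs(seqs):
    """Map each site pair (i, j), i < j, to its two-site haplotype counts."""
    n_sites = len(seqs[0])
    return {
        (i, j): Counter((s[i], s[j]) for s in seqs)
        for i, j in combinations(range(n_sites), 2)
    }


def composite_log_likelihood(configs, log_lik):
    """Sum per-subset log-likelihoods, i.e. multiply the likelihoods.

    log_lik is a hypothetical callable taking one haplotype Counter and
    returning its log-likelihood under some fixed parameter values.
    """
    return sum(log_lik(counts) for counts in configs.values())


if __name__ == "__main__":
    configs = pair_configs(ALIGNMENT)
    print(len(configs), "pairs of sites")
    print("sites 1 and 2:", dict(configs[(0, 1)]))
```

Summing log-likelihoods treats the pairs as if they were independent, which they are not. That is why the result is a composite likelihood rather than a full likelihood.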

16 Composite likelihood (Wall 2004)
Sequence 1   a c c g a t g c g t a a g c t
Sequence 2   g t a g a t g c g t c a g c t
Sequence 3   g t a g t c g t g t c g g c c
Sequence 4   a c a g t c g t g t c g g t t
Sequence 5   a c a g t c g t g t a g g t t
Sequence 6   a c c g a c g c c c a a g c t
Sequence 7   a c c g a t g c c c a a g c t
Sequence 8   a c c g a t g c c c a a g c c
Sequence 9   a c c t a t g c g t a a g c t
Sequence 10  a c c g a t a c g t c g g t t
Sequence 11  a c a g a c g c g t c g c c t
Sequence 12  g t a g a t g c c c a a g c t
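
The three-site version changes only which subsets of sites are tabulated. The sketch below, with an illustrative function name of our own, counts haplotypes for every subset of k sites in the example alignment, so k = 3 gives the triplet configurations. How each configuration is then scored is outside the scope of the sketch.

```python
from collections import Counter
from itertools import combinations

# The 12 sequences from the slide, one string per sequence (15 sites each).
ALIGNMENT = [
    "accgatgcgtaagct", "gtagatgcgtcagct", "gtagtcgtgtcggcc",
    "acagtcgtgtcggtt", "acagtcgtgtaggtt", "accgacgcccaagct",
    "accgatgcccaagct", "accgatgcccaagcc", "acctatgcgtaagct",
    "accgatacgtcggtt", "acagacgcgtcgcct", "gtagatgcccaagct",
]


def subset_configs(seqs, k):
    """Map each k-subset of site indices to its k-site haplotype counts."""
    n_sites = len(seqs[0])
    return {
        sites: Counter("".join(s[i] for i in sites) for s in seqs)
        for sites in combinations(range(n_sites), k)
    }


if __name__ == "__main__":
    triplets = subset_configs(ALIGNMENT, 3)
    print(len(triplets), "triplets of sites")
    print("sites 1-3:", dict(triplets[(0, 1, 2)]))
```

With 15 sites there are 455 triplets against 105 pairs, so the triplet method evaluates more subsets, and each subset carries joint information about three sites at once.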

20 Simulations. We ran simulations of 5 kb loci with n = 50, θ = ρ = 0.001 / bp, f = 4 and t = 125 bp. We analyzed each locus individually as well as in groups of 5, 20 and 100 loci (assuming each locus is evolutionarily independent). For each group, we estimated f over a grid of values using the methods of Frisse et al. (2001) and Wall (2004).
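
Because the loci are treated as independent, combining a group amounts to summing the per-locus composite log-likelihoods at each grid value of f and taking the value with the largest total. The sketch below illustrates that step only. The grid and the log-likelihood numbers are made-up placeholders, not values from the talk, and the function name is ours.

```python
def estimate_f(per_locus_loglik, f_grid):
    """Return (f_hat, totals) for a group of independent loci.

    per_locus_loglik holds one row per locus; each row has one composite
    log-likelihood per value in f_grid.
    """
    if not per_locus_loglik:
        raise ValueError("need at least one locus")
    if any(len(row) != len(f_grid) for row in per_locus_loglik):
        raise ValueError("each locus needs one value per grid point")
    # Independence across loci: log-likelihoods add.
    totals = [sum(column) for column in zip(*per_locus_loglik)]
    best_index = max(range(len(f_grid)), key=lambda k: totals[k])
    return f_grid[best_index], totals


# Placeholder grid and placeholder log-likelihoods for three loci.
F_GRID = [0.0, 1.0, 2.0, 4.0, 8.0]
TOY_LOGLIK = [
    [-12.0, -11.0, -10.5, -10.8, -11.9],
    [-9.0, -8.8, -8.9, -8.2, -8.6],
    [-15.0, -14.1, -13.9, -13.5, -14.4],
]

if __name__ == "__main__":
    f_hat, totals = estimate_f(TOY_LOGLIK, F_GRID)
    print("estimated f:", f_hat)
```

In the toy numbers the first locus on its own would favor a different grid value from the group, which is the kind of single-locus noise that pooling loci is meant to reduce.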

21 Distribution of estimates of f (1 locus). [Figure: frequency against estimated value of f, for the triplet method and the pair method.]

22 Distribution of estimates of f (5 loci). [Figure: frequency against estimated value of f, for the triplet method and the pair method.]

23 Distribution of estimates of f (20 loci). [Figure: frequency against estimated value of f, for the triplet method and the pair method.]

24 Distribution of estimates of f (100 loci). [Figure: frequency against estimated value of f, for the triplet method and the pair method.]

25 Estimating ρ and f jointly. [Figure: probability against number of loci, for the triplet method and the pair method.]

26 Conclusions. For estimating gene conversion rates, the triplet composite likelihood method is slightly more accurate than the pairwise composite likelihood method. Neither method is very accurate on an absolute scale.

27 Further directions. Modify the method to handle unphased data, missing data, ascertainment bias, etc. Variation in recombination rates. Confounding factors: multiple hits; sequencing errors; population history; natural selection.

