1 Additive Distances Between DNA Sequences MPI, June 2012.


2 Additive evolutionary distance: the number of substitutions that occurred during the evolution of the sequences. [Figure: sequences ACAC and CCCC, with the number of substitutions at site 1, site 2 and site 3.] Some substitutions are hidden due to overwriting. Therefore, the exact number of substitutions is usually larger than the number of observed changes.

3 Edge weight = expected number of substitutions per site. u: AACA…GTCTTCGAGGCCC; v: AGCA…GCCTATGCGACCT; substitutions at each site: 0100…0200110121001; number of substitutions per site: 0.321

4 When the exact number of substitutions between every pair of sequences is known, NJ (and any other algorithm that reconstructs trees from exact distances) returns the correct evolutionary tree. Inter-leaf distances: the sum of the edge weights on the path between the two leaves. [Figure: tree with leaves u and v; edge weights 0.5, 0.42, 0.3; d(u,v) = 1.12.]
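
The inter-leaf distance on this slide can be illustrated with a short sketch. The tree, its node names and its edge weights below are hypothetical (they are not the tree from the slide figure), and `leaf_distance` is a name chosen for illustration only.

```python
from collections import defaultdict


def leaf_distance(edges, u, v):
    """Sum of edge weights on the unique path between u and v in a tree."""
    adjacency = defaultdict(list)
    for a, b, weight in edges:
        adjacency[a].append((b, weight))
        adjacency[b].append((a, weight))
    # Depth-first search; in a tree there is exactly one path from u to v.
    stack = [(u, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == v:
            return dist
        for nxt, weight in adjacency[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + weight))
    raise ValueError("u and v are not connected")


# Hypothetical weighted tree: leaves u, v, a, b and internal nodes x, y.
edges = [
    ("u", "x", 0.2),
    ("x", "y", 0.1),
    ("y", "v", 0.3),
    ("x", "a", 0.4),
    ("y", "b", 0.5),
]
d_uv = leaf_distance(edges, "u", "v")  # path u-x-y-v
d_ab = leaf_distance(edges, "a", "b")  # path a-x-y-b
```

Distances computed this way are additive by construction, which is the property that exact-distance methods such as NJ rely on.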

5 Estimating the number of substitutions from the observed substitutions requires a substitution model: JC [Jukes and Cantor 1969]; Kimura 2 Parameter (K2P) [Kimura 1980]; HKY [Hasegawa, Kishino and Yano 1985]; TN [Tamura and Nei 1993]; GTR, generalised time-reversible [Tavaré 1986]; …and more…

6 Distance estimation in the Jukes-Cantor model

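7 Jukes-Cantor model: all substitutions are equally likely. JC generic rate matrix for an edge $(u,v)$, where $t_{uv}$ is the expected number of substitutions per site: $R_{uv} = t_{uv} \begin{pmatrix} -1 & 1/3 & 1/3 & 1/3 \\ 1/3 & -1 & 1/3 & 1/3 \\ 1/3 & 1/3 & -1 & 1/3 \\ 1/3 & 1/3 & 1/3 & -1 \end{pmatrix}$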

8 Substitution matrix P (theory of Markov processes): if $R$ is the rate matrix, then $P = e^{R}$
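
For the Jukes-Cantor case the matrix exponential has a well-known closed form, which the sketch below checks numerically. It assumes the usual scaling in which the rate matrix for an edge of length t has off-diagonal entries t/3 and diagonal entries -t; the function names are illustrative, and the Taylor series is used only to keep the example free of external libraries.

```python
import math


def jc_rate_matrix(t):
    """JC rate matrix scaled so that t is the expected substitutions per site."""
    return [[-t if i == j else t / 3.0 for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_taylor(r, terms=60):
    """Matrix exponential via the truncated series sum_k r^k / k!."""
    n = len(r)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, r)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def jc_closed_form(t):
    """Closed form: 1/4 + 3/4 e^(-4t/3) on the diagonal, 1/4 - 1/4 e^(-4t/3) elsewhere."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


P_numeric = expm_taylor(jc_rate_matrix(0.3))
P_closed = jc_closed_form(0.3)
```

In practice a library routine for the matrix exponential would replace `expm_taylor`; the series is adequate here because the matrix is small and t is moderate.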

9 JC distance estimation: first estimate the substitution matrix. From the observed substitutions between u: AACA…GTCTTCGAGGCCC and v: AGCA…GCCTATGCGACCT, compute an estimate of $P_{uv}$.
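
One way to obtain such an estimate is to count aligned base pairs, as in the sketch below. The two short sequences are hypothetical stand-ins (the sequences on the slide are elided and are not reproduced here), and the function names are chosen for illustration.

```python
def estimate_substitution_matrix(seq_u, seq_v, alphabet="ACGT"):
    """Return (counts, p_hat): pair counts and row-normalised frequencies."""
    if len(seq_u) != len(seq_v):
        raise ValueError("sequences must be aligned and of equal length")
    index = {base: i for i, base in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(seq_u, seq_v):
        counts[index[a]][index[b]] += 1
    p_hat = []
    for row in counts:
        total = sum(row)
        # Rows for bases that never occur in seq_u are left as zeros.
        p_hat.append([c / total if total else 0.0 for c in row])
    return counts, p_hat


def mismatch_fraction(seq_u, seq_v):
    """Fraction of aligned sites at which the two sequences differ."""
    return sum(a != b for a, b in zip(seq_u, seq_v)) / len(seq_u)


# Hypothetical aligned sequences, for illustration only.
seq_u = "AACAGTCT"
seq_v = "AGCAGCCT"
counts, p_hat = estimate_substitution_matrix(seq_u, seq_v)
p_diff = mismatch_fraction(seq_u, seq_v)
```

Under the JC model all substitutions are equally likely, so the single number `p_diff` carries the information needed for the distance estimate; the full matrix matters for richer models such as K2P or GTR.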

10 Estimate t from the estimate of p(t) by "reverse engineering": solve the formula for p(t) for t.
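
A minimal sketch of this inversion follows. It assumes p is the observed fraction of differing sites, for which the JC model gives p(t) = 3/4 (1 - e^(-4t/3)) and therefore t = -3/4 ln(1 - 4p/3); the slide's own definition of p(t) is not recoverable from the transcript and may differ. The function names are illustrative.

```python
import math


def jc_expected_mismatch(t):
    """Expected fraction of differing sites after t substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def jc_distance(p_diff):
    """Invert p(t) to estimate t from an observed mismatch fraction."""
    if not 0.0 <= p_diff < 0.75:
        # At p >= 3/4 the logarithm is undefined: the sites are saturated.
        raise ValueError("mismatch fraction must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_diff / 3.0)


t_true = 0.5
p = jc_expected_mismatch(t_true)
t_est = jc_distance(p)
```

The estimate is always at least as large as the observed mismatch fraction, reflecting the hidden substitutions that overwrite one another.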

11 Checking the effect of estimation errors in reconstructing quartets

12 Quartet reconstruction = finding the correct split. Quartets are trees with four leaves. They have three possible (fully resolved) topologies, called splits: AB|CD, AC|BD and AD|BC. Distance methods resolve splits by the 4-point method.

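13 The 4-point method. [Figure: quartet with split AB|CD and a separating edge of weight $w_{sep}$.] The 4-point condition: $d(A,B) + d(C,D) = d(A,C) + d(B,D) - 2w_{sep} = d(A,D) + d(B,C) - 2w_{sep}$. The 4-point condition for estimated distances: $\hat{d}(A,B) + \hat{d}(C,D) < \min\{\hat{d}(A,C) + \hat{d}(B,D),\ \hat{d}(A,D) + \hat{d}(B,C)\}$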
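
The 4-point method can be sketched as follows: form the three pairwise sums and pick the split with the smallest one. The quartet used in the example is hypothetical (leaf edges 0.1, 0.2, 0.3, 0.4 and a separating edge of weight 0.05, with true split AB|CD), and the function name is illustrative.

```python
def four_point_split(dist):
    """Return (best_split, sums) for leaves A, B, C, D.

    dist maps a frozenset of two leaf names to their distance.
    """
    def d(x, y):
        return dist[frozenset((x, y))]

    sums = {
        "AB|CD": d("A", "B") + d("C", "D"),
        "AC|BD": d("A", "C") + d("B", "D"),
        "AD|BC": d("A", "D") + d("B", "C"),
    }
    # The split whose pairs are closest together has the smallest sum.
    return min(sums, key=sums.get), sums


# Hypothetical additive distances for a quartet with split AB|CD.
leaf = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}
w_sep = 0.05
dist = {
    frozenset(("A", "B")): leaf["A"] + leaf["B"],
    frozenset(("C", "D")): leaf["C"] + leaf["D"],
    frozenset(("A", "C")): leaf["A"] + w_sep + leaf["C"],
    frozenset(("B", "D")): leaf["B"] + w_sep + leaf["D"],
    frozenset(("A", "D")): leaf["A"] + w_sep + leaf["D"],
    frozenset(("B", "C")): leaf["B"] + w_sep + leaf["C"],
}
best, sums = four_point_split(dist)
```

With exact distances the two larger sums are equal and exceed the smallest by twice the separating edge weight; with estimated distances only the ordering is used.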

14 Evaluate the accuracy of reconstructing quartets using evolutionary distances. [Figure: rooted model quartet with leaves A, B, C, D.] t is the "evolutionary time". The diameter of the quartet is 22t.

15 Phase A: simulate evolution. [Figure: quartet with leaves A, B, C, D.]

16 Phase B: reconstruct the split by the 4-point condition. Compute the distances between the sequences at leaves A, B, C, D, then apply the 4-point condition. Is the reconstruction correct? Repeat this process 10,000 times and count the number of failures.
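
The two phases can be sketched end to end under the JC model. Everything specific below is an assumption made for illustration: the quartet shape (four equal leaf edges around one separating edge, true split AB|CD), the edge lengths, the sequence length and the small trial count. The model quartet and the K2P setting used in the talk are not reproduced here.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, rng):
    """Phase A step: evolve a sequence along an edge of length t under JC."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return out


def jc_distance(s1, s2):
    """JC distance estimate; infinite when the sites are saturated."""
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def simulate_failure_rate(leaf_len, internal_len, n_sites, n_trials, seed=0):
    """Fraction of trials in which the 4-point condition misses split AB|CD."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        # Phase A: internal nodes x and y, then the four leaves.
        x = [rng.choice(BASES) for _ in range(n_sites)]
        y = evolve(x, internal_len, rng)
        leaves = {
            "A": evolve(x, leaf_len, rng),
            "B": evolve(x, leaf_len, rng),
            "C": evolve(y, leaf_len, rng),
            "D": evolve(y, leaf_len, rng),
        }

        def d(a, b):
            return jc_distance(leaves[a], leaves[b])

        # Phase B: estimated distances and the 4-point condition.
        s_true = d("A", "B") + d("C", "D")
        s_alt1 = d("A", "C") + d("B", "D")
        s_alt2 = d("A", "D") + d("B", "C")
        # Ties and saturated distances are counted as failures.
        if not (s_true < s_alt1 and s_true < s_alt2):
            failures += 1
    return failures / n_trials


easy = simulate_failure_rate(leaf_len=0.05, internal_len=0.1,
                             n_sites=500, n_trials=100, seed=1)
```

Raising `n_trials` to 10,000 and sweeping the edge lengths would give a failure curve as a function of quartet diameter.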

17 This test was applied to the model quartet with various diameters. For each diameter, mark the fraction (percentage) of the simulations in which the reconstruction failed (next slide).

18 Performance of K2P distances in resolving quartets, small diameters: 0.01-0.2. [Figure: template quartet.]

19 Performance for larger diameters: "site saturation"

20 Repeat this experiment on the Hasegawa tree. Assume the JC model. Reconstruct with the NJ algorithm (use any variant of NJ available in MATLAB).

21 Hasegawa tree

