Distance-Based Phylogenetic Reconstruction (Part II), Tutorial #11 © Ilan Gronau



2. Phylogenetic Reconstruction
We’d like to study the evolutionary history of species.

3. Distance-Based Reconstruction
Given ML (maximum-likelihood) pairwise evolutionary distances between species, find the edge-weighted tree that best describes this metric.
The input: a distance matrix D with
– D(i,i) = 0
– D(i,j) = D(j,i)
– [D(i,j) ≤ D(i,k) + D(k,j)]
The output: an edge-weighted tree T, with D_T denoting the path-length distances it induces between the taxa.
If D is additive, then D_T = D. Otherwise, return a tree that best ‘fits’ the input D.
Note: ML-estimated pairwise distances are usually not additive, but they are ‘close’ to some additive metric.
Example input D:
         Bear  Raccoon  Weasel  Seal  Dog
Bear       0      26      34     29    32
Raccoon   26       0      42     44    48
Weasel    34      42       0     44    51
Seal      29      44      44      0    50
Dog       32      48      51     50     0
[Figure: an edge-weighted tree over Bear, Raccoon, Weasel, Seal and Dog]
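The three input conditions translate directly into a check. Below is a minimal Python sketch with the slide’s matrix as data; the function name `is_distance_matrix` is our own, and the triangle inequality is optional, mirroring the brackets on the slide.

```python
from itertools import permutations

# Distance matrix from the slide, rows/columns in the order of TAXA
TAXA = ["Bear", "Raccoon", "Weasel", "Seal", "Dog"]
D = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]


def is_distance_matrix(D, check_triangle=True):
    """Check D(i,i) = 0, D(i,j) = D(j,i) and, optionally, the triangle inequality."""
    n = len(D)
    for i in range(n):
        if D[i][i] != 0:
            return False
        for j in range(n):
            if D[i][j] != D[j][i]:
                return False
    if check_triangle:
        # D(i,j) <= D(i,k) + D(k,j) for every ordered triple of distinct taxa
        for i, j, k in permutations(range(n), 3):
            if D[i][j] > D[i][k] + D[k][j]:
                return False
    return True


print(is_distance_matrix(D))  # True
```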

4. Neighbor-Joining Algorithms
Agglomerative (bottom-up) approach:
1. Find a pair of neighboring taxa i, j.
2. Connect them to a new internal vertex v (define the edge weights).
3. Remove i, j from the taxon set and add v (define the distances from v).
4. Return to (1). When only 2 taxa are left, connect them.
Neighbors: taxa connected by a 2-edge path.
Consistency: given an additive metric D_T,
– we always choose a pair of neighbors in T (stage 1);
– the reduced distance matrix is consistent with the reduced tree (stage 3).
By induction, we eventually reconstruct T.
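Since neighbors are leaves joined by a 2-edge path, they are exactly the leaves hanging off a common vertex. A small sketch, assuming the tree is given as an adjacency dict (a representation chosen here for illustration; `neighbor_pairs` is a hypothetical name):

```python
from itertools import combinations


def neighbor_pairs(adj):
    """Return all pairs of leaves joined by a 2-edge path.

    adj maps each vertex to the set of its adjacent vertices (unweighted tree);
    leaves are the vertices of degree 1.
    """
    leaves = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    pairs = set()
    for v, nbrs in adj.items():
        # leaves attached to the same vertex v are exactly 2 edges apart
        hanging = sorted(u for u in nbrs if u in leaves)
        pairs.update(combinations(hanging, 2))
    return pairs


# The tree ((a,b),(c,d)) with internal vertices u and w
tree = {
    "a": {"u"}, "b": {"u"}, "c": {"w"}, "d": {"w"},
    "u": {"a", "b", "w"}, "w": {"c", "d", "u"},
}
print(sorted(neighbor_pairs(tree)))  # [('a', 'b'), ('c', 'd')]
```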

5. UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
UPGMA algorithm:
1. Find a pair of taxa i, j of minimal distance.
2. Connect them to a new internal vertex v.
3. Remove i, j from the taxon set and add v, with D(v,k) = αD(i,k) + (1 - α)D(j,k), where α and 1 - α are proportional to the number of ‘original’ taxa that i and j represent.
4. Return to (1). When only 2 taxa are left, connect them.
Consistency? Given an additive metric D_T, do we always choose a pair of neighbors in T?
     a    b    c    d
a    0   14   15   27
b         0    3   15
c              0   14
d                   0
This matrix is realized by the tree ((a,b),(c,d)) with pendant edges a: 13, b: 1, c: 1, d: 13 and an internal edge of length 1. UPGMA chooses b, c (distance 3), yet b and c are not neighbors: the closest taxon is not necessarily a neighbor.
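A minimal Python sketch of the four UPGMA steps, assuming a symmetric list-of-lists matrix. The function name, the cluster labels such as `(b,c)`, and the tie-breaking rule (first minimal pair in scan order) are our own choices; this naive loop is Θ(n³).

```python
def upgma(names, D):
    """UPGMA sketch: repeatedly merge the two closest clusters.

    Returns the merges in order as (cluster_i, cluster_j, D(i,j)); the new
    vertex v sits at height D(i,j) / 2 above the leaves.
    """
    dist = {a: {b: D[x][y] for y, b in enumerate(names)} for x, a in enumerate(names)}
    size = {a: 1 for a in names}  # number of original taxa each cluster represents
    merges = []
    while len(size) > 1:
        active = list(size)
        # 1. pair of minimal distance (the first such pair wins ties)
        i, j = min(
            ((a, b) for t, a in enumerate(active) for b in active[t + 1:]),
            key=lambda p: dist[p[0]][p[1]],
        )
        merges.append((i, j, dist[i][j]))
        # 2-3. replace i, j by v with D(v,k) = alpha*D(i,k) + (1 - alpha)*D(j,k)
        v = f"({i},{j})"
        alpha = size[i] / (size[i] + size[j])
        dist[v] = {v: 0}
        for k in active:
            if k not in (i, j):
                dist[v][k] = dist[k][v] = alpha * dist[i][k] + (1 - alpha) * dist[j][k]
        size[v] = size.pop(i) + size.pop(j)
    return merges


names = ["a", "b", "c", "d"]
D = [[0, 14, 15, 27], [14, 0, 3, 15], [15, 3, 0, 14], [27, 15, 14, 0]]
print(upgma(names, D)[0])  # ('b', 'c', 3)
```

On the Bear/Raccoon/Weasel/Seal/Dog matrix the merge heights come out as 13, 18.25, 20 and 22.625, which matches the UPGMA tree compared with NJ in slide 16.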

6. Ultrametric Trees
Edge-weighted trees that have a point (the root) equidistant from all leaves.
Additive metrics consistent with an ultrametric tree are called ultrametrics.
A distance matrix is ultrametric iff it obeys the 3-point condition:
“Any subset of three taxa can be labelled i, j, k such that d(i,j) ≤ d(j,k) = d(i,k).”
[Figure: an ultrametric tree drawn against a time axis]
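The 3-point condition says that in every triple the two largest distances coincide. A sketch of the check (the function name and the small matrix `U` are hypothetical examples of ours; `A` is the matrix from the UPGMA counterexample in slide 5):

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """3-point condition: in every triple of taxa the two largest distances are equal."""
    for i, j, k in combinations(range(len(D)), 3):
        _, mid, top = sorted((D[i][j], D[i][k], D[j][k]))
        if abs(mid - top) > tol:  # need d(i,j) <= d(j,k) = d(i,k)
            return False
    return True


# hypothetical ultrametric: {a,b} at distance 4, {c,d} at 2, every other pair at 6
U = [[0, 4, 6, 6], [4, 0, 6, 6], [6, 6, 0, 2], [6, 6, 2, 0]]
# additive but not ultrametric (UPGMA counterexample)
A = [[0, 14, 15, 27], [14, 0, 3, 15], [15, 3, 0, 14], [27, 15, 14, 0]]
print(is_ultrametric(U), is_ultrametric(A))  # True False
```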

7. UPGMA: Additional Notes
In the reduction formula, D(v,k) can be set to any value within the interval defined by D(i,k) and D(j,k). In particular:
– D(v,k) = ½(D(i,k) + D(j,k)) gives the WPGMA algorithm.
– D(v,k) = min{D(i,k), D(j,k)} gives the ‘closest’ ultrametric from below (the unique subdominant ultrametric).
Run-time analysis:
– Naïve implementation: Θ(n³).
– Keeping a sorted version of each row of D: Θ(n² log n).
– The third (min) variant can be executed in Θ(n²).
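The min variant is single-linkage clustering. The sketch below (our own function name, integer taxon indices) runs the agglomeration with D(v,k) = min{D(i,k), D(j,k)} and records, for every pair of original taxa, the distance at which they first land in the same cluster. Per the slide, the resulting matrix U is the subdominant ultrametric, so U ≤ D entrywise. Note that this naive loop is Θ(n³), not the Θ(n²) implementation mentioned above.

```python
def subdominant_ultrametric(D):
    """Agglomerate with D(v,k) = min(D(i,k), D(j,k)) and record merge distances."""
    n = len(D)
    clusters = {x: [x] for x in range(n)}  # cluster id -> original taxa it contains
    dist = {x: {y: D[x][y] for y in range(n)} for x in range(n)}
    U = [[0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        ids = list(clusters)
        i, j = min(((a, b) for t, a in enumerate(ids) for b in ids[t + 1:]),
                   key=lambda p: dist[p[0]][p[1]])
        h = dist[i][j]
        # every pair of taxa split between clusters i and j meets at distance h
        for x in clusters[i]:
            for y in clusters[j]:
                U[x][y] = U[y][x] = h
        v, next_id = next_id, next_id + 1
        dist[v] = {}
        for k in ids:
            if k not in (i, j):
                dist[v][k] = dist[k][v] = min(dist[i][k], dist[j][k])
        clusters[v] = clusters.pop(i) + clusters.pop(j)
    return U


D = [[0, 14, 15, 27], [14, 0, 3, 15], [15, 3, 0, 14], [27, 15, 14, 0]]
print(subdominant_ultrametric(D))
# [[0, 14, 14, 14], [14, 0, 3, 14], [14, 3, 0, 14], [14, 14, 14, 0]]
```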

8. Consistent Distance-Based Reconstruction
Given an additive metric D, find the unique tree T such that D_T = D.
Reminder: a metric is additive iff it obeys the 4-point condition:
“Any subset of four taxa can be labelled i, j, k, l such that d(i,j) + d(k,l) ≤ d(i,l) + d(j,k) = d(i,k) + d(j,l).”
Next time …
[Figure: nested classes: distance matrices ⊃ additive matrices ⊃ ultrametric matrices]
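Equivalently, among the three pair sums of any quartet, the two largest must be equal. A sketch of that check, assuming D is already a metric (the quartet test below does not re-check the triangle inequality) and checking only quartets of distinct taxa; the function name is our own:

```python
from itertools import combinations


def is_additive(D, tol=1e-9):
    """4-point condition over all quartets of distinct taxa: the two largest of
    D(i,j)+D(k,l), D(i,k)+D(j,l), D(i,l)+D(j,k) must be equal."""
    for i, j, k, l in combinations(range(len(D)), 4):
        _, mid, top = sorted((D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]))
        if abs(mid - top) > tol:
            return False
    return True


A = [[0, 14, 15, 27], [14, 0, 3, 15], [15, 3, 0, 14], [27, 15, 14, 0]]
BEAR = [[0, 26, 34, 29, 32], [26, 0, 42, 44, 48], [34, 42, 0, 44, 51],
        [29, 44, 44, 0, 50], [32, 48, 51, 50, 0]]
print(is_additive(A), is_additive(BEAR))  # True False
```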

9. Saitou & Nei’s Neighbor Joining
S&N algorithm (n = current number of taxa, r(i) = Σ_k D(i,k)):
1. Find a pair of taxa i, j maximizing Q(i,j) = r(i) + r(j) - (n - 2)D(i,j).
2. Connect them to a new internal vertex v with edges of weights D(i,v) = ½(D(i,j) + (r(i) - r(j))/(n - 2)) and D(j,v) = D(i,j) - D(i,v).
3. Remove i, j from the taxon set and add v, with D(v,k) = ½(D(i,k) + D(j,k) - D(i,j)).
4. Return to (1). When only 2 taxa are left, connect them (with an edge of length D(i,j)).
If D is additive (consistent with some tree T):
– Q(i,j) is maximized for neighbor pairs (shown in class).
– If i, j are neighbors, then stages (2, 3) are consistent.
Conclusion: in such a case, given D, NJ returns T.
[Figure: new vertex v joining neighbors i and j, with another taxon k]
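A compact Python sketch of the four stages, assuming a symmetric list-of-lists matrix. The stage-2 weights use D(i,v) = ½(D(i,j) + (r(i) - r(j))/(n - 2)), which reproduces every edge weight of the worked example in slides 12-14. The function name and vertex labels such as `Bear-Dog` are our own. Ties in Q are broken by taking the first pair in scan order, an assumption that happens to match the choices made in the example.

```python
def neighbor_joining(names, D):
    """Saitou & Nei neighbor joining (sketch). Returns edges (u, v, length)."""
    dist = {a: {b: float(D[x][y]) for y, b in enumerate(names)}
            for x, a in enumerate(names)}
    taxa = list(names)
    edges = []
    while len(taxa) > 2:
        n = len(taxa)
        r = {i: sum(dist[i][k] for k in taxa) for i in taxa}
        # 1. pair maximizing Q(i,j) = r(i) + r(j) - (n-2) D(i,j); first pair wins ties
        best = None
        for x, i in enumerate(taxa):
            for j in taxa[x + 1:]:
                q = r[i] + r[j] - (n - 2) * dist[i][j]
                if best is None or q > best[0]:
                    best = (q, i, j)
        _, i, j = best
        # 2. connect i, j to a new internal vertex v
        v = f"{i}-{j}"
        d_iv = 0.5 * (dist[i][j] + (r[i] - r[j]) / (n - 2))
        edges += [(i, v, d_iv), (j, v, dist[i][j] - d_iv)]
        # 3. distances from v: D(v,k) = (D(i,k) + D(j,k) - D(i,j)) / 2
        dist[v] = {v: 0.0}
        for k in taxa:
            if k not in (i, j):
                dist[v][k] = dist[k][v] = 0.5 * (dist[i][k] + dist[j][k] - dist[i][j])
        taxa = [k for k in taxa if k not in (i, j)] + [v]
    a, b = taxa
    edges.append((a, b, dist[a][b]))  # last two taxa: one edge of length D(a,b)
    return edges


TAXA = ["Bear", "Raccoon", "Weasel", "Seal", "Dog"]
D = [[0, 26, 34, 29, 32], [26, 0, 42, 44, 48], [34, 42, 0, 44, 51],
     [29, 44, 44, 0, 50], [32, 48, 51, 50, 0]]
for edge in neighbor_joining(TAXA, D):
    print(edge)
```

On this matrix the sketch yields pendant edges Bear 6, Dog 26, Raccoon 19.5, Weasel 22.25, Seal 21.75 and two internal edges of length 1.5, i.e. the tree of the worked example.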

10. Saitou & Nei’s Neighbor Joining: Complexity Analysis
Run-time analysis:
– In each iteration we need to recalculate r(·) for all taxa.
– The Q(·,·) values are ‘scrambled’ in each iteration, so stage (1) takes O(n²).
– Total complexity: O(n³).
– There is no known way to speed this up significantly.
Note: there are consistent reconstruction algorithms that run in O(n²) or even O(n log n) time.
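Recalculating r(·) is cheap on its own: each remaining r(k) only loses D(i,k) and D(j,k) and gains D(v,k), which is O(1) per taxon. The sketch below (dict-of-dicts representation and the name `update_row_sums` are our own) reproduces the r values of the 2nd iteration of the worked example from those of the 1st. Because n and every r(k) change, all Q values still change, so the O(n²) scan in stage (1) dominates.

```python
def update_row_sums(r, D, i, j, v, d_v):
    """Refresh r(k) = sum_l D(k,l) after joining i, j into a new vertex v.

    r: old row sums; D: old distances (dict of dicts); d_v: new distances D(v,k).
    """
    new_r = {k: r[k] - D[i][k] - D[j][k] + d_v[k] for k in r if k not in (i, j)}
    new_r[v] = sum(d_v.values())
    return new_r


# 1st iteration of the worked example: Bear (B) and Dog (D) are joined into B-D
D = {
    "B": {"B": 0, "R": 26, "W": 34, "S": 29, "D": 32},
    "R": {"B": 26, "R": 0, "W": 42, "S": 44, "D": 48},
    "W": {"B": 34, "R": 42, "W": 0, "S": 44, "D": 51},
    "S": {"B": 29, "R": 44, "W": 44, "S": 0, "D": 50},
    "D": {"B": 32, "R": 48, "W": 51, "S": 50, "D": 0},
}
r = {k: sum(row.values()) for k, row in D.items()}  # B 121, R 160, W 171, S 167, D 181
d_v = {k: (D["B"][k] + D["D"][k] - D["B"]["D"]) / 2 for k in ("R", "W", "S")}
print(update_row_sums(r, D, "B", "D", "B-D", d_v))
# {'R': 107.0, 'W': 112.5, 'S': 111.5, 'B-D': 71.0}
```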

11. S&N’s NJ on Non-Additive Data
Example: the Bear/Raccoon/Weasel/Seal/Dog matrix D from slide 3.
The three pair sums for the quartet {B, R, W, S}:
D(B,R) + D(W,S) = 26 + 44 = 70
D(B,W) + D(R,S) = 34 + 44 = 78
D(B,S) + D(R,W) = 29 + 42 = 71
The two largest sums (78 and 71) differ, so the 4-point condition fails: D is not additive.

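12. S&N’s NJ Example: 1st iteration (n = 5)
D:
      B    R    W    S    D
B     0   26   34   29   32
R          0   42   44   48
W               0   44   51
S                    0   50
D                         0
r: B 121, R 160, W 171, S 167, D 181
Q:
      B    R    W    S    D
B     0  203  190  201  206
R          0  205  195  197
W               0  206  199
S                    0  198
D                         0
Q is maximal (206) for both (B,D) and (W,S). Bear and Dog are joined to a new vertex B-D, with edges Bear: 6 and Dog: 26.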

13. S&N’s NJ Example: 2nd iteration (n = 4)
D:
        B-D    R     W     S
B-D      0    21   26.5  23.5
R              0    42    44
W                    0    44
S                          0
r: B-D 71, R 107, W 112.5, S 111.5
Q:
        B-D    R      W      S
B-D      0   136   130.5  135.5
R              0   135.5  130.5
W                    0     136
S                           0
B-D and Raccoon are joined to a new vertex B-D-R, with edges B-D: 1.5 and Raccoon: 19.5.
Calculate the difference between the old values and the new ones.

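14. S&N’s NJ Example: 3rd iteration (n = 3)
D:
         B-D-R    W      S
B-D-R      0    23.75  23.25
W                 0      44
S                        0
r: B-D-R 47, W 67.75, S 67.25
Q:
         B-D-R   W    S
B-D-R      0    91   91
W                0   91
S                     0
With three taxa all Q values are equal, so we reconstruct the unique tree over the 3 taxa: Weasel (22.25) and Seal (21.75) join at a new vertex W-S, which connects to B-D-R by an edge of length 1.5.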
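For three taxa the tree is a star, and each edge length follows from the three distances (e.g. the edge to a is ½(D(a,b) + D(a,c) - D(b,c))). A tiny sketch with our own function name, applied to the 3rd-iteration matrix:

```python
def three_taxon_tree(d_ab, d_ac, d_bc):
    """Edge lengths from the central vertex to a, b, c in the unique 3-taxon tree."""
    a = 0.5 * (d_ab + d_ac - d_bc)
    b = 0.5 * (d_ab + d_bc - d_ac)
    c = 0.5 * (d_ac + d_bc - d_ab)
    return a, b, c


# a = B-D-R, b = W, c = S
print(three_taxon_tree(23.75, 23.25, 44))  # (1.5, 22.25, 21.75)
```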

15. How Good Is The Tree?
NJ output tree: Bear (6) and Dog (26) join at B-D; B-D (1.5) and Raccoon (19.5) join at B-D-R; Weasel (22.25) and Seal (21.75) join at W-S; B-D-R and W-S are connected by an edge of length 1.5.
We observe the perturbations from the input matrix D to the matrix D_T implied by the output tree.
D:
      B    R    W    S    D
B     0   26   34   29   32
R          0   42   44   48
W               0   44   51
S                    0   50
D                         0
D_T:
      B    R      W      S      D
B     0   27   31.25  30.75    32
R          0   43.25  42.75    47
W                 0     44   51.25
S                        0   50.75
D                               0
|D - D_T|:
      B    R    W     S     D
B     0    1  2.75  1.75    0
R          0  1.25  1.25    1
W               0     0  0.25
S                     0  0.75
D                           0
How good is this?
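D_T is obtained by summing edge weights along the unique path between each pair of leaves. A sketch, with the NJ tree written as an edge list (internal labels follow the slides’ B-D, B-D-R, W-S notation; the function name is our own):

```python
from collections import defaultdict

TAXA = ["Bear", "Raccoon", "Weasel", "Seal", "Dog"]
D = [[0, 26, 34, 29, 32], [26, 0, 42, 44, 48], [34, 42, 0, 44, 51],
     [29, 44, 44, 0, 50], [32, 48, 51, 50, 0]]
EDGES = [
    ("Bear", "B-D", 6), ("Dog", "B-D", 26),
    ("B-D", "B-D-R", 1.5), ("Raccoon", "B-D-R", 19.5),
    ("B-D-R", "W-S", 1.5), ("Weasel", "W-S", 22.25), ("Seal", "W-S", 21.75),
]


def tree_distances(edges, leaves):
    """D_T(x,y) = total edge weight on the unique tree path between leaves x and y."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    DT = {}
    for src in leaves:
        # depth-first traversal from src, accumulating path lengths
        seen, stack = {src: 0.0}, [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + w
                    stack.append(v)
        DT[src] = {y: seen[y] for y in leaves}
    return DT


DT = tree_distances(EDGES, TAXA)
diff = [[abs(D[x][y] - DT[a][b]) for y, b in enumerate(TAXA)] for x, a in enumerate(TAXA)]
print(DT["Bear"]["Weasel"], diff[1][3])  # 31.25 1.25
```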

16. How Good Is The Tree?
Compare with other algorithms.
NJ, |D - D_T|:
      B    R    W     S     D
B     0    1  2.75  1.75    0
R          0  1.25  1.25    1
W               0     0  0.25
S                     0  0.75
D                           0
UPGMA tree: Bear and Raccoon join at BR (edges of 13); BR (5.25) and Seal (18.25) join at BRS; BRS (1.75) and Weasel (20) join at BRSW; BRSW (2.625) and Dog (22.625) join at the root BRSWD.
UPGMA, |D - D_T|:
      B    R    W    S      D
B     0    0    6   7.5  13.25
R          0    2   7.5   2.75
W               0    4    5.75
S                    0    4.75
D                           0

17. Can We Do Better?
Given a distance matrix D, find an edge-weighted tree T that minimizes ||D - D_T||_p.
– For p = 1, 2, ∞ this task was shown to be NP-hard.
– For p = 1, 2 this task was shown to be NP-hard for ultrametric trees as well.
– For p = ∞: the task is easy (O(n²) algorithm) for ultrametric trees, and there is a 3-approximation algorithm for general trees.
No algorithm gives any good guarantees for non-additive data.
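To make the objective concrete, the sketch below evaluates ||D - D_T||_p for the NJ and UPGMA trees of slides 15-16, taking the norm over the 10 unordered taxon pairs. Counting each pair once is our assumption; the slide does not say whether each pair is counted once or twice. The list names and `lp_norm` are our own.

```python
import math

# Upper-triangle entries of |D - D_T| from slides 15-16, pair order:
# (B,R) (B,W) (B,S) (B,D) (R,W) (R,S) (R,D) (W,S) (W,D) (S,D)
NJ_DIFF = [1, 2.75, 1.75, 0, 1.25, 1.25, 1, 0, 0.25, 0.75]
UPGMA_DIFF = [0, 6, 7.5, 13.25, 2, 7.5, 2.75, 4, 5.75, 4.75]


def lp_norm(diffs, p):
    """||D - D_T||_p over unordered taxon pairs; p = math.inf gives the maximum."""
    if p == math.inf:
        return max(diffs)
    return sum(x ** p for x in diffs) ** (1 / p)


for p in (1, 2, math.inf):
    print(p, lp_norm(NJ_DIFF, p), lp_norm(UPGMA_DIFF, p))
```

On this data NJ fits better than UPGMA under all three norms (e.g. 10.0 vs 53.5 for p = 1), although neither comes with a guarantee.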

