
1 PLGW01 - September 2007. Inferring Phylogenies from LCA Distances (back to the basics of distance-based phylogenetic reconstruction). Ilan Gronau and Shlomo Moran, Technion, Israel

2 Distance-Based Phylogenetic Reconstruction
- Compute distances between all taxon pairs.
- Find an edge-weighted tree that best describes the distances.
[Figure: example edge-weighted tree]

3 Distance-Based Reconstruction
Basics (sanity check): reconstruction algorithms should be consistent, i.e., they should reconstruct the true tree from accurate (i.e., additive) distances.
Essential extras:
- Robustness to noise: reconstruct the correct tree (or parts of it) given noisy distances.
- Efficiency: low time and space complexity.
[Figure: example edge-weighted tree]
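To make "additive" concrete: a symmetric distance matrix with zero diagonal fits some edge-weighted tree exactly when every quartet satisfies the four-point condition, i.e., of the three pairwise sums, the two largest are equal. The sketch below checks that condition by brute force. The function name `is_additive`, the tolerance, and the example matrix (a quartet tree ((0,1),(2,3)) with leaf edges 1, 2, 3, 4 and internal edge 5) are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations


def is_additive(D, tol=1e-9):
    """Return True if D satisfies the four-point condition on every quartet."""
    n = len(D)
    for i, j, k, l in combinations(range(n), 4):
        # The three ways of pairing up the quartet.
        sums = sorted([
            D[i][j] + D[k][l],
            D[i][k] + D[j][l],
            D[i][l] + D[j][k],
        ])
        # Additive distances: the two largest sums coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical example: distances read off the quartet tree described above.
D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(is_additive(D))  # True

# Perturbing a single entry (symmetrically) breaks additivity.
noisy = [row[:] for row in D]
noisy[0][2] = noisy[2][0] = 9.5
print(is_additive(noisy))  # False
```

The check is O(n^4) and is meant only as a sanity test on small inputs.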

4 Neighbor Joining Methods
An agglomerative clustering approach:
- A taxon pair i, j is chosen and replaced by a new taxon v.
- i and j are connected to v (i.e., they form a cherry in the reconstructed tree).
- The method is applied recursively to the reduced matrix.

5 The Two Basic Components of NJ Methods
At each iteration the algorithm performs:
1. Selection: select neighboring taxa.
   Consistency: if the input is additive, the selected taxa form a cherry in the corresponding tree.
2. Reduction: compute distances from the new taxon.
   Consistency: the reduced matrix should fit the reduced tree. This can usually be achieved in more than one way.

6 Saitou & Nei's NJ Algorithm (1987)
- Robustness: considered highly reliable in practice
- Time complexity: Θ(n^3)
- ~13,000 citations (Science Citation Index)
- Implemented in numerous phylogenetic packages
[Formula on slide: Saitou & Nei's selection criterion]
Questions:
- What makes Saitou & Nei's neighbor-selection criterion so good?
- Is there a simpler consistent neighbor-selection criterion?

7 Simple Selection Criterion: LCA Distances
- In a rooted tree, LCA(i,j) is the distance between the root and the least common ancestor of i and j.
- The taxon pair with the deepest LCA are neighbors.
- So is any pair i, j with a "locally deepest" LCA. For neighbors i, j with parent v: [formula on slide not preserved in the transcript]
This gives a consistent (and complete) neighbor-selection criterion.
[Figure: rooted trees with root r, taxa i, j, k and parent v]

8 Deepest LCA Neighbor Joining Algorithm, Phase I: calculate LCA-distances
- Choose a root taxon r.
- Calculate LCA-distances from r using the Farris transform: L(i,j) = ½ (D(r,i) + D(r,j) - D(i,j))
[Figure: root r with taxa i and j]
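A minimal sketch of Phase I, assuming D is a symmetric list-of-lists matrix with zero diagonal. The function name `farris_transform` and the example matrix (the quartet tree ((0,1),(2,3)) with leaf edges 1, 2, 3, 4 and internal edge 5) are illustrative assumptions.

```python
def farris_transform(D, r):
    """LCA-distances from root taxon r: L(i,j) = 1/2 (D(r,i) + D(r,j) - D(i,j))."""
    n = len(D)
    return [
        [0.5 * (D[r][i] + D[r][j] - D[i][j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical additive example (see the lead-in for the underlying tree).
D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
L = farris_transform(D, r=0)
print(L[2][3])  # 6.0: LCA of taxa 2 and 3 lies 1 + 5 below the root taxon
print(L[1][2])  # 1.0: LCA of taxa 1 and 2 is the node adjacent to the root taxon
print(L[1][1])  # 3.0: on the diagonal, L(i,i) = D(r,i)
```

The transform takes Θ(n^2) time, matching the cost quoted for computing the matrix L.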

9 Deepest LCA Neighbor Joining Algorithm, Phase II: n-1 neighbor-joining iterations
At each iteration:
- Selection: choose a taxon pair i, j such that L(i,j) = max over i'≠j' of L(i',j'). Connect i and j to a new taxon v.
- Reduction: replace i and j with the new taxon v, and reduce L: for k≠v, L(v,k) = α L(i,k) + (1-α) L(j,k) (α is the reduction parameter; it may be redefined at each iteration).
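The two phases can be sketched together as follows. This is a naive version that scans all pairs for the globally deepest LCA at every iteration, so it runs in cubic time. Several details are assumptions made for illustration: the function name `dlca`, a fixed α, returning only the topology as nested tuples (no edge lengths), and attaching the root taxon r to the last remaining cluster at the end.

```python
def dlca(D, r, alpha=0.5):
    """Naive Deepest-LCA neighbor joining; returns a nested-tuple topology."""
    n = len(D)
    # Active clusters: id -> subtree built so far (leaves are taxon indices).
    nodes = {i: i for i in range(n) if i != r}

    # Phase I: Farris transform relative to the root taxon r.
    L = {}
    for i in nodes:
        for j in nodes:
            if i != j:
                L[i, j] = 0.5 * (D[r][i] + D[r][j] - D[i][j])

    # Phase II: repeatedly join the pair with the deepest LCA.
    next_id = n
    while len(nodes) > 1:
        pairs = ((a, b) for a in nodes for b in nodes if a < b)
        i, j = max(pairs, key=lambda p: L[p])  # selection
        v = next_id
        next_id += 1
        for k in nodes:  # reduction: convex combination of the two rows
            if k != i and k != j:
                L[v, k] = L[k, v] = alpha * L[i, k] + (1 - alpha) * L[j, k]
        nodes[v] = (nodes.pop(i), nodes.pop(j))

    (subtree,) = nodes.values()
    return (r, subtree)


# Hypothetical additive input: quartet tree ((0,1),(2,3)).
D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(dlca(D, r=0))  # (0, (1, (2, 3)))
```

On this additive input, taxa 2 and 3 have the deepest LCA (6.0) and are joined first, which recovers the cherry of the underlying tree.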

10 Simple and Optimal Θ(n^2) Implementation of DLCA
- Calculating the LCA-distances (the matrix L): Θ(n^2) time.
- Neighbor-joining phase: n-1 iterations.
  - The reduction step takes O(n) time per iteration.
  - The bottleneck is neighbor selection.
- An amortized Θ(n^2) implementation of neighbor selection: join a "locally deepest" pair, not necessarily the "globally deepest" pair, using the "Nearest Neighbor Chain" clustering technique [Benzecri 82, Juan 82, Murtagh 84, +].
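One way the nearest-neighbor-chain idea could look for LCA-distances is sketched below. It is an illustration under assumptions, not the authors' implementation: "locally deepest" is read as a pair in which each member is the other's deepest-LCA partner, α is fixed, and the names `dlca_nn_chain`, `active`, and `chain` are hypothetical. The chain follows deepest-LCA partners until two clusters point at each other, then joins them. Because the reduction is a convex combination, a join never makes the new cluster deeper with respect to a third cluster than both of its parts were, so the rest of the chain stays valid.

```python
def dlca_nn_chain(D, r, alpha=0.5):
    """DLCA joining mutually deepest pairs found by a nearest-neighbor chain."""
    n = len(D)
    active = {i: i for i in range(n) if i != r}  # id -> nested-tuple subtree
    # Row-wise dictionaries of LCA-distances (Farris transform from root r).
    L = {
        i: {j: 0.5 * (D[r][i] + D[r][j] - D[i][j]) for j in active if j != i}
        for i in active
    }
    next_id = n
    chain = []
    while len(active) > 1:
        if not chain:
            chain.append(next(iter(active)))  # start a new chain anywhere
        top = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Deepest-LCA partner of `top`; ties are broken in favour of `prev`
        # so that LCA depth strictly increases along the chain.
        best = max(L[top], key=lambda k: (L[top][k], k == prev))
        if best != prev:
            chain.append(best)
            continue
        # `top` and `prev` are mutually deepest: join them.
        chain.pop()
        chain.pop()
        i, j = top, prev
        v = next_id
        next_id += 1
        L[v] = {}
        for k in active:
            if k != i and k != j:
                value = alpha * L[i][k] + (1 - alpha) * L[j][k]
                L[v][k] = L[k][v] = value
                del L[k][i], L[k][j]
        del L[i], L[j]
        active[v] = (active.pop(i), active.pop(j))
    (subtree,) = active.values()
    return (r, subtree)


# Hypothetical additive input: quartet tree ((0,1),(2,3)).
D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(dlca_nn_chain(D, r=0))  # (0, ((3, 2), 1))
```

Each step either extends the chain or performs a join, and each step costs O(n), which is the usual argument for the amortized quadratic bound of nearest-neighbor-chain clustering.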

11 DLCA: Intermediate Summary
- A simple and intuitive consistent neighbor-selection criterion.
- Implemented in optimal time complexity (faster than NJ).
What about the noise?!
Robustness to noise: we consider two theoretical criteria for robustness:
- Reconstruction of "Buneman edges"
- Atteson's "edge-reconstruction radius"

12 Buneman's Edges [Buneman '71]
- An edge e induces a split (P|Q) of the taxon set.
- e is a "Buneman edge" (w.r.t. distance matrix D) iff every taxon quartet (i,j,k,l) that "crosses" e (i.e., i,j ∈ P and k,l ∈ Q) agrees with e's split: D(i,j) + D(k,l) < min{ D(i,k) + D(j,l), D(i,l) + D(j,k) }
- "Buneman robustness criterion": the algorithm should reconstruct all the Buneman edges.
[Figure: edge e separating P (taxa i, j) from Q (taxa k, l)]
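The quartet test for a single split can be sketched as a brute-force check. The function name `is_buneman_edge` and the example matrix are illustrative assumptions. The sketch only considers quartets of four distinct taxa, so a split with a single taxon on one side passes vacuously.

```python
from itertools import combinations


def is_buneman_edge(D, P, Q):
    """True if every quartet crossing the split (P|Q) agrees with it under D."""
    for i, j in combinations(sorted(P), 2):
        for k, l in combinations(sorted(Q), 2):
            within = D[i][j] + D[k][l]
            across = min(D[i][k] + D[j][l], D[i][l] + D[j][k])
            if not within < across:
                return False
    return True


# Hypothetical additive example: quartet tree ((0,1),(2,3)).
D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(is_buneman_edge(D, {0, 1}, {2, 3}))  # True: 3 + 7 < min(20, 20)
print(is_buneman_edge(D, {0, 2}, {1, 3}))  # False: 9 + 11 is not < min(10, 20)
```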

13 Atteson's Edge-Reconstruction Radius [Atteson '99]
- Edge-reconstruction radius: an algorithm A has edge-reconstruction radius ε if, for each edge e of the true tree T, whenever ||D - D_T||_∞ < ε·w(e), A correctly reconstructs e.
- Atteson: the edge-reconstruction radius of any algorithm is ≤ ½.
- If A satisfies Buneman's criterion, then A has the optimal edge-reconstruction radius of ½.
[Figure: edge e of weight w(e); noise ≤ ε·w(e) for all distances]
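As a small worked illustration of the definition, the sketch below computes the maximum entrywise noise ||D - D_T||_∞ and lists the edges whose weight is large enough for the guarantee to apply at a given radius ε. The function names, the edge labels, and the example data are hypothetical.

```python
def linf_noise(D, D_T):
    """Maximum absolute entrywise difference between D and the tree metric D_T."""
    n = len(D)
    return max(abs(D[i][j] - D_T[i][j]) for i in range(n) for j in range(n))


def edges_within_radius(D, D_T, edge_weights, eps=0.5):
    """Edges e with ||D - D_T||_inf < eps * w(e), i.e. covered by radius eps."""
    noise = linf_noise(D, D_T)
    return {e for e, w in edge_weights.items() if noise < eps * w}


# Hypothetical true tree ((0,1),(2,3)) with these edge weights.
weights = {"leaf0": 1, "leaf1": 2, "leaf2": 3, "leaf3": 4, "internal": 5}
D_T = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
# Noisy estimate: one distance is off by 0.5.
D = [row[:] for row in D_T]
D[0][2] = D[2][0] = 9.5

print(linf_noise(D, D_T))  # 0.5
print(sorted(edges_within_radius(D, D_T, weights, eps=0.5)))
# ['internal', 'leaf1', 'leaf2', 'leaf3']
print(sorted(edges_within_radius(D, D_T, weights, eps=0.25)))
# ['internal', 'leaf2', 'leaf3']
```

With radius ½ the guarantee covers every edge heavier than 1 in this example; with radius ¼ it covers only edges heavier than 2.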

14 Robustness of NJ and DLCA
NJ:
- Edge-reconstruction radius = ¼ [Atteson '99, Mihaescu et al. '06] (hence it does not satisfy the Buneman criterion).
DLCA (using "conservative reductions"):
- Satisfies the Buneman criterion.
- Hence it has edge-reconstruction radius = ½.
By these criteria, DLCA is also more robust than NJ.
And in practice...???

15 Testing on Simulated Data
[Figure: pipeline. True tree T → simulated sequences (ATTCG…, ATACG…, ACTGG…, …) → distance matrix D computed by DNAdist from PHYLIP → DLCA / NJ → reconstructed tree T'. The topologies of T and T' are compared through RF-distance.]
Note that DLCA may produce n different trees, one for each choice of root taxon.

16 DLCA vs. Saitou & Nei's NJ
[Chart: DLCA under two reductions, L(i,k) ← max{L(i,k), L(j,k)} and L(i,k) ← ½(L(i,k) + L(j,k)), compared with NJ]
- 2000 trees
- 1 simulation per tree
Tree source: the Methods and Algorithms in Bioinformatics (MAB) lab, LIRMM. http://www.lirmm.fr/~guindon/simul/

17 Robustness of DLCA: a Summary
DLCA is superior to NJ by the Buneman and Atteson criteria, but is (on average) inferior to NJ on simulated data.
What is the reason for this "conflict"?
Take another look at Saitou & Nei's selection criterion.

18 Saitou & Nei's Selection Criterion, expressed by LCA distances
[Formulas on slide not preserved in the transcript]
I.e., NJ tends to select taxon pairs with the deepest average LCA.
- Averaging "smooths" noise.
- Averaging does not affect worst-case noise. (The bound of ¼ on the reconstruction radius of NJ uses a highly improbable scenario.)

19 Future Directions
- Use the pivotal nature of DLCA to achieve better results:
  - Pre-processing: use "good" taxa as roots.
  - Post-processing: return the "best" tree among the n possible outputs.
- Find robustness criteria that explain the robustness of NJ: instead of considering worst-case noise (as Atteson's criterion does), consider stochastic noise.

20 For more information…
- "Neighbor Joining Algorithms for Inferring Phylogenies via LCA-Distances" (JCB 14(1), pp. 1-15, 2007)
- "Optimal Implementations of UPGMA and Other Common Clustering Algorithms" (to appear in IPL)
Our websites: www.cs.technion.ac.il/~ilangr and www.cs.technion.ac.il/~moran

21 Thank You

