Distance-Based Phylogenetic Reconstruction Tutorial #8 © Ilan Gronau, edited by Itai Sharon.


1 Distance-Based Phylogenetic Reconstruction Tutorial #8 © Ilan Gronau, edited by Itai Sharon

2 Phylogenetic Reconstruction
We'd like to study the evolutionary history of species.
Problems:
- No information regarding extinct species
- Many possible tree topologies

3 Common Terminology
[Figure: rooted tree with leaves A, B, C, D, E]
- Root (ancestral node)
- Internal nodes (common ancestors)
- Leaves: TAXA (genes, proteins, species etc.)
- Edges represent distance between nodes

4 Phylogenetic Reconstruction
Approach 1 (character based):
- Given a probabilistic model (HMM) of evolution, find the most probable tree to yield the known set of species.
Problem: finding the ML tree is very hard
- Evolutionary models are very complex, with many parameters
- Estimating parameters using EM:
  - Many local maxima
  - Small trees (up to 5 taxa) are relatively easy
  - Big trees (more than 50 taxa) are almost impossible
Approach 2 (distance based):
- Given ML pairwise (evolutionary) distances between species, find the edge-weighted tree best describing this metric
Note: ML pairwise distances = ML trees spanning two species

5 Distance-Based Reconstruction
Given ML pairwise (evolutionary) distances between species, find the edge-weighted tree best describing this metric.
The input: distance matrix D
- D(i,j) ≥ 0
- D(i,i) = 0
- D(i,j) = D(j,i)
- D(i,j) ≤ D(i,k) + D(k,j)
The output: edge-weighted tree T
- If D is additive, then D_T = D (D_T is the metric induced by T)
- Otherwise, return a tree best ‘fitting’ the input D
Note: Usually ML-estimated pairwise distances are not additive, but they are ‘close’ to some additive metric.
Example:
         Bear  Raccoon  Weasel  Seal  Dog
Bear        0       26      34    29   32
Raccoon    26        0      42    44   48
Weasel     34       42       0    44   51
Seal       29       44      44     0   50
Dog        32       48      51    50    0
[Figure: edge-weighted tree on Bear, Raccoon, Weasel, Seal, Dog; edge labels 13, 25.25, 20, 5.25, 18.25, 1.75]
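The input conditions on this slide can be checked mechanically. The following is a minimal sketch, assuming D is stored as a plain list of lists; the names `is_metric` and `is_additive` are illustrative and do not come from the slides. The additivity test uses the standard four-point condition (of the three pairwise sums over any four taxa, the two largest are equal), which the slide itself does not state.

```python
from itertools import combinations

# Distance matrix from the slide (order: Bear, Raccoon, Weasel, Seal, Dog).
D = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]


def is_metric(d):
    """Check non-negativity, zero diagonal, symmetry and the triangle inequality."""
    n = len(d)
    for i in range(n):
        if d[i][i] != 0:
            return False
        for j in range(n):
            if d[i][j] < 0 or d[i][j] != d[j][i]:
                return False
            for k in range(n):
                if d[i][j] > d[i][k] + d[k][j]:
                    return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: for every quartet, the two largest of the
    three pairwise sums must be equal (assumption: standard criterion,
    not taken from the slides)."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


print(is_metric(D), is_additive(D))
```

On the example matrix the first check passes and the second fails, which agrees with the note that estimated distances are usually a metric but not additive.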

6 Neighbor-Joining Algorithms
Agglomerative approach (bottom-up):
1. Find a pair of taxa neighbors i,j
2. Connect them to a new internal vertex v (define edge weights)
3. Remove i,j from the taxon set, and add v (define distances from v)
4. Return to (1); when only 2 taxa are left, connect them
Consistency: Given an additive metric D_T:
- We always choose a pair of neighbors in T (stage 1)
- The reduced distance-matrix is consistent with the reduced tree (stage 3)
Neighbors: taxa connected by a 2-edge path
By induction: we eventually reconstruct T

7 UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
UPGMA algorithm:
1. Find a pair of taxa i,j of minimal distance
2. Connect them to a new internal vertex v
3. Remove i,j from the taxon set, and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k)
4. Return to (1); when only 2 taxa are left, connect them
α and 1-α are proportional to the number of ‘original’ taxa that i and j represent.
Consistency? Given an additive metric D_T, do we always choose a pair of neighbors in T?
     a   b   c   d
a    0  14  15  27
b        0   3  15
c            0  14
d                0
[Figure: tree T on taxa a, b, c, d; edge labels shown: 13, 1, 1, 1]
UPGMA chooses b,c: the closest taxon is not necessarily a neighbor.
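The counterexample can be verified numerically. This is a small sketch, assuming the upper-triangular matrix on the slide is completed by symmetry; `closest_pair` and `quartet_split` are hypothetical helper names. For four taxa under an additive metric, the pairing with the smallest sum of within-pair distances gives the two neighbor pairs (a consequence of the four-point condition, which is not stated on the slide).

```python
from itertools import combinations

NAMES = "abcd"
# Matrix from the slide, filled in symmetrically.
D = [
    [0, 14, 15, 27],
    [14, 0, 3, 15],
    [15, 3, 0, 14],
    [27, 15, 14, 0],
]


def closest_pair(d):
    """Pair of indices with minimal distance: what UPGMA picks in stage 1."""
    return min(combinations(range(len(d)), 2), key=lambda p: d[p[0]][p[1]])


def quartet_split(d):
    """For taxa 0..3, return the pairing with the smallest distance sum.
    Under an additive metric this pairing is the pair of neighbor pairs."""
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    return min(pairings, key=lambda p: d[p[0][0]][p[0][1]] + d[p[1][0]][p[1][1]])


i, j = closest_pair(D)
print("UPGMA would join:", NAMES[i], NAMES[j])
for x, y in quartet_split(D):
    print("true neighbors:", NAMES[x], NAMES[y])
```

The closest pair is b,c, while the neighbor pairs of the underlying tree are a,b and c,d, so the first UPGMA merge is already wrong on this additive input.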

8 Molecular Clock
Reminder: Edge weights correspond to evolutionary distance.
If the rate of evolution is universally constant:
- The root is equidistant from all taxa
- The closest taxon-pair is a neighbor-pair
[Figure: tree drawn against a time axis; edge labels 6, 6.5, 3.5, 4, 3, 2, 2]

9 Molecular Clock
Reminder: Edge weights correspond to evolutionary distance.
If the rate of evolution is different in each branch:
- This is the case for most observed evolutionary trees
- The closest taxon-pair is not necessarily a neighbor-pair
[Figure: tree drawn against a time axis; edge labels 6, 5, 1, 9, 3, 1, 2, 3.5]

10 Ultrametric Trees
- Ultrametric trees: edge-weighted trees which have a point (root) equidistant from all leaves
- Additive metrics consistent with an ultrametric tree are called ultrametrics
- A distance-matrix is ultrametric iff it obeys the 3-point condition:
  “Any subset of three taxa can be labelled i,j,k such that d(i,j) ≤ d(j,k) = d(i,k)”
[Figure: ultrametric tree; edge labels 6, 6.5, 3.5, 4, 3, 2, 2]
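The 3-point condition translates directly into a test over all triples: sort the three pairwise distances and require the two largest to be equal. A minimal sketch follows; the function name `is_ultrametric`, the tolerance parameter and the three-taxon toy matrix are assumptions for illustration, while the five-taxon matrix is the Bear/Raccoon/Weasel/Seal/Dog example from the tutorial.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """3-point condition: in every triple the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        smallest, middle, largest = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical toy ultrametric: x,y are close, z is equidistant from both.
TOY = [
    [0, 4, 10],
    [4, 0, 10],
    [10, 10, 0],
]

# Tutorial example (Bear, Raccoon, Weasel, Seal, Dog).
BEARS = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]

print(is_ultrametric(TOY), is_ultrametric(BEARS))
```

The triple Bear, Raccoon, Weasel already violates the condition (26, 34, 42), so the tutorial matrix is not ultrametric.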

11 UPGMA on Ultrametrics
UPGMA algorithm:
1. Find a pair of taxa i,j of minimal distance
2. Connect them to a new internal vertex v
3. Remove i,j from the taxon set, and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k)
4. Return to (1); when only 2 taxa are left, connect them
Consistency for ultrametrics: Given an ultrametric U_T:
- We always choose a pair of neighbors in T (stage 1)
- The reduced distance-matrix is consistent with the reduced tree (stage 3)

12 UPGMA on Ultrametrics
Consistency for ultrametrics: Given an ultrametric U_T:
- We always choose a pair of neighbors in T (stage 1)
- The reduced distance-matrix is consistent with the reduced tree (stage 3)
If i,j are neighbors in an ultrametric tree, then D(i,k) = D(j,k) for all k.
- or -
If D(i,j) is minimal in an ultrametric, then D(i,k) = D(j,k) for all k.
[Figure: neighbors i, j and a third taxon k]
Recap: (1) find a pair of taxa i,j of minimal distance; (2) connect them to a new internal vertex v; (3) remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k).

13 UPGMA on Ultrametrics
Consistency for ultrametrics: Given an ultrametric U_T:
- We always choose a pair of neighbors in T (stage 1)
- The reduced distance-matrix is consistent with the reduced tree (stage 3)
Proof of stage 1: Let D(i,j) be minimal and assume, to the contrary, that i,j are not neighbors.
- The path connecting i,j contains at least 3 non-zero weight edges
- Let v be the least common ancestor (lca) of i,j
- Then there is a taxon k such that D(j,k) (or D(i,k)) is smaller than D(i,j): contradiction
[Figure: taxa i, j with lca v, and a taxon k below v]
Recap: (1) find a pair of taxa i,j of minimal distance; (2) connect them to a new internal vertex v; (3) remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k).

14 UPGMA on Non-Ultrametric Data
Edge-weights are set so that UPGMA always returns an ultrametric tree (we won’t prove).
Example:
D:
         Bear  Raccoon  Weasel  Seal  Dog
Bear        0       26      34    29   32
Raccoon    26        0      42    44   48
Weasel     34       42       0    44   51
Seal       29       44      44     0   50
Dog        32       48      51    50    0
D is not ultrametric.

15 UPGMA on Non-Ultrametric Data
Example: 1st iteration
D:
      B    R    W    S    D
B     0   26   34   29   32
R          0   42   44   48
W               0   44   51
S                    0   50
D                         0
[Figure: leaves Bear, Raccoon, Weasel, Seal, Dog; Bear and Raccoon joined at new vertex B-R, height 13]
Reduced matrix (α = ½):
       B-R     W     S     D
B-R      0    38  36.5    40
W              0    44    51
S                    0    50
D                          0
Recap: (1) find a pair of taxa i,j of minimal distance; (2) connect them to a new internal vertex v; (3) remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k).

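16 UPGMA on Non-Ultrametric Data
Example: 2nd iteration
D:
       B-R     W     S     D
B-R      0    38  36.5    40
W              0    44    51
S                    0    50
D                          0
[Figure: B-R at height 13; B-R and Seal joined at new vertex B-R-S, height 18.25; edge from B-R to B-R-S: 18.25 - 13 = 5.25]
Reduced matrix (α = ⅓ is the weight of Seal; B-R, which represents two taxa, gets ⅔):
         B-R-S     W      D
B-R-S        0    40   43⅓
W                  0     51
D                         0
Recap: (1) find a pair of taxa i,j of minimal distance; (2) connect them to a new internal vertex v; (3) remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k).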

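17 UPGMA on Non-Ultrametric Data
Example: 3rd iteration
D:
         B-R-S     W      D
B-R-S        0    40   43⅓
W                  0     51
D                         0
[Figure: B-R at height 13; B-R-S at height 18.25 (edge 18.25 - 13 = 5.25); B-R-S and Weasel joined at new vertex B-R-S-W, height 20; edge from B-R-S to B-R-S-W: 20 - 18.25 = 1.75]
Reduced matrix (α = ¼ is the weight of Weasel; B-R-S gets ¾):
           B-R-S-W      D
B-R-S-W          0   45¼
D                       0
Recap: (1) find a pair of taxa i,j of minimal distance; (2) connect them to a new internal vertex v; (3) remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k).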

18 UPGMA on Non-Ultrametric Data
Example: 4th iteration
D:
           B-R-S-W       D
B-R-S-W          0   45.25
D                        0
[Figure: final tree on Bear, Raccoon, Weasel, Seal, Dog]
- B-R at height 13
- B-R-S at height 18.25 (edge 18.25 - 13 = 5.25)
- B-R-S-W at height 20 (edge 20 - 18.25 = 1.75)
- B-R-S-W-D at height 22.625 (edge 22.625 - 20 = 2.625)
Recap: (1) find a pair of taxa i,j of minimal distance; (2) connect them to a new internal vertex v; (3) remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k).
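The four iterations of the worked example can be reproduced with a short program. This is a sketch under stated assumptions, not the tutorial's own implementation: the function name `upgma`, the dictionary-based data layout and the use of `fractions.Fraction` for exact arithmetic are all choices made here. It is the naive O(n³) variant and records, for each merge, the taxa in the new cluster and the height of the new vertex (half the distance between the joined clusters).

```python
from fractions import Fraction

TAXA = ["Bear", "Raccoon", "Weasel", "Seal", "Dog"]
D = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]


def upgma(names, d):
    """Naive UPGMA. Returns a list of (members, height), one per merge."""
    n = len(names)
    members = {i: (names[i],) for i in range(n)}  # cluster id -> taxa it represents
    dist = {(i, j): Fraction(d[i][j]) for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(members) > 1:
        # Stage 1: pair of clusters at minimal distance.
        i, j = min(dist, key=dist.get)
        height = dist[(i, j)] / 2
        # alpha is proportional to the number of original taxa in i.
        alpha = Fraction(len(members[i]), len(members[i]) + len(members[j]))
        v = next_id
        next_id += 1
        # Stage 3: distances from the new vertex v to every remaining cluster.
        reduced = {}
        for k in members:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            reduced[(k, v)] = alpha * d_ik + (1 - alpha) * d_jk
        dist = {pair: x for pair, x in dist.items() if i not in pair and j not in pair}
        dist.update(reduced)
        members[v] = tuple(sorted(members.pop(i) + members.pop(j)))
        merges.append((members[v], height))
    return merges


for cluster, height in upgma(TAXA, D):
    print("-".join(cluster), float(height))
```

The printed heights are 13.0, 18.25, 20.0 and 22.625, matching the four iterations above; the internal edge lengths are the differences between consecutive heights.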

19 UPGMA
Additional notes:
In the reduction formula, D(v,k) can be set to any value within the interval defined by D(i,k) and D(j,k). In particular:
- D(v,k) = ½(D(i,k) + D(j,k)) gives the WPGMA algorithm
- D(v,k) = min{D(i,k), D(j,k)} gives the ‘closest’ ultrametric from below (the unique subdominant ultrametric)
Run-time analysis:
- Naïve implementation: O(n³)
- By keeping a sorted version of each row in D: O(n² log n)
- The third variant (the min rule) can be executed in O(n²)
Recap: (1) find a pair of taxa i,j of minimal distance; (2) connect them to a new internal vertex v; (3) remove i,j from the taxon set and add v, with D(v,k) = αD(i,k) + (1-α)D(j,k).
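The three reduction rules differ only in how D(v,k) is computed from D(i,k) and D(j,k). The sketch below puts them side by side; the function `reduce_distance` and its `rule` strings are hypothetical names introduced for illustration. The sample call uses the 2nd iteration of the worked example, where B-R (2 taxa) is joined with Seal (1 taxon) and k is Weasel.

```python
def reduce_distance(d_ik, d_jk, n_i, n_j, rule="upgma"):
    """Distance from the new vertex v to cluster k under one of three rules.

    n_i and n_j are the numbers of original taxa represented by i and j.
    """
    if rule == "upgma":
        # Weights proportional to cluster sizes: alpha = n_i / (n_i + n_j).
        return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
    if rule == "wpgma":
        # Plain average, regardless of cluster sizes.
        return (d_ik + d_jk) / 2
    if rule == "min":
        # Minimum rule: yields the subdominant ultrametric.
        return min(d_ik, d_jk)
    raise ValueError(f"unknown rule: {rule}")


# D(B-R, Weasel) = 38, D(Seal, Weasel) = 44, |B-R| = 2, |Seal| = 1.
for rule in ("upgma", "wpgma", "min"):
    print(rule, reduce_distance(38, 44, 2, 1, rule))
```

All three values (40, 41 and 38) lie in the interval [38, 44] defined by D(i,k) and D(j,k), as the note requires.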

