9/1/2005 1 Ultrametric phylogenies By Sivan Yogev Based on Chapter 11 from “Inferring Phylogenies” by J. Felsenstein.



2 Introduction: additive trees
In the last lecture we saw the concept of distance-based phylogenetic trees, where d(i,j) is the distance between the objects indexed i and j.
In particular, we discussed additive sets, in which:
- For each i: d(i,i) = 0, and for each j ≠ i: d(i,j) > 0
- For each i,j: d(i,j) = d(j,i)
- For each i,j,k: d(i,k) ≤ d(i,j) + d(j,k) [triangle inequality]
- Any subset of four objects can be labelled i,j,k,l such that d(i,j) + d(k,l) ≤ d(i,l) + d(j,k) = d(i,k) + d(j,l) [four points condition]
An additive set defines a tree, and every tree defines an additive distance matrix between its leaves.
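The four points condition can be checked mechanically. The sketch below is only an illustration, not part of the lecture: the function name `satisfies_four_point`, the tolerance `tol` and the small example tree are assumptions made here. It relies on the fact that the condition is equivalent to "of the three pairwise sums, the two largest are equal". The mammal matrix is the one used later in the UPGMA example.

```python
from itertools import combinations

def satisfies_four_point(d, tol=1e-9):
    """Check the four points condition on a symmetric distance matrix d."""
    for i, j, k, l in combinations(range(len(d)), 4):
        # The three ways to split {i, j, k, l} into two pairs.
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # The condition holds iff the two largest sums are equal.
        if sums[2] - sums[1] > tol:
            return False
    return True

# Hypothetical tree ((a,b),(c,d)): pendant edges 1, 2, 3, 4, internal edge 5.
tree_distances = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

# Mammal distances (Bear, Raccoon, Weasel, Seal, Dog) from the UPGMA example.
mammals = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]

print(satisfies_four_point(tree_distances))  # True
print(satisfies_four_point(mammals))         # False
```

The check costs O(n^4) because it visits every subset of four objects; it is meant to make the definition concrete, not to be efficient.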

3 Molecular clocks
Let us assume that “stable” mutations in the genome occur at a uniform rate over long time periods.
This defines a “molecular clock”: each mutation stands for a constant period of time.
We can therefore approximate the time since any two taxa diverged from their last common ancestor by the number of differences between their genomes in conserved regions.

4 Ultrametric trees
Given a group of taxa with distances, if we assume the “molecular clock” model and wish to find the evolutionary tree, the number of mutations from the last common ancestor to every taxon should be approximately the same.
In the idealized model, this means that the distance from the root of the evolutionary tree to each leaf is the same.
Such a tree is called an ultrametric tree.

5 Ultrametric trees (cont.)
If we have a set of objects with a distance between them, we want to know if this set is ultrametric.
For ultrametric sets, these conditions hold:
- For each i: d(i,i) = 0, and for each j ≠ i: d(i,j) > 0
- For each i,j: d(i,j) = d(j,i)
- For each i,j,k: d(i,k) ≤ max{d(i,j), d(j,k)} [ultrametric condition]
The last condition can be replaced by this one:
- Any subset of three objects can be labelled i,j,k such that d(i,j) ≤ d(j,k) = d(i,k)

6 Ultrametric trees (cont.)
An ultrametric set is also additive.
The opposite is not always true.
[Figure: nested sets, with ultrametric matrices inside additive matrices inside distance matrices]
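Part of this containment can be seen in one line. The block below is a sketch rather than a full proof: it only shows that the ultrametric condition implies the triangle inequality (the four points condition needs a longer argument), and it gives one counterexample for the opposite direction. The three objects a, b, c and their distances are hypothetical values chosen here.

```latex
\begin{align*}
d(i,k) &\le \max\{d(i,j),\, d(j,k)\} \le d(i,j) + d(j,k)
  && \text{(ultrametric condition implies the triangle inequality)} \\
d(a,b) &= 3,\quad d(a,c) = 9,\quad d(b,c) = 10
  && \text{(a metric on three points, hence additive)} \\
9 &\ne 10
  && \text{(the two largest distances differ, so not ultrametric)}
\end{align*}
```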

7 Ultrametric decision
Given a set of n objects with distances, we want to determine whether the set is ultrametric.
The naïve approach: go over all triplets and check whether the ultrametric condition holds. Complexity: O(n^3).
More efficient algorithms exist (Gusfield gives a simple O(n^2 log n) algorithm and a more sophisticated O(n^2) algorithm, with partial proofs).
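The naïve check is short enough to write out. This is a minimal sketch under a few assumptions: the matrix is symmetric with a zero diagonal, and the name `is_ultrametric`, the tolerance `tol` and the two 3×3 matrices are illustrative choices made here. It uses the three-object form of the condition, where the two largest of the three distances must be equal.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Naive O(n^3) test: check the three-point condition on every triplet."""
    for i, j, k in combinations(range(len(d)), 3):
        # Sort the three pairwise distances of the triplet.
        smallest, middle, largest = sorted([d[i][j], d[i][k], d[j][k]])
        # Ultrametric iff the two largest distances are equal.
        if largest - middle > tol:
            return False
    return True

# Hypothetical examples.
ultrametric_example = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]
non_ultrametric_example = [
    [0, 2, 6],
    [2, 0, 5],
    [6, 5, 0],
]

print(is_ultrametric(ultrametric_example))      # True
print(is_ultrametric(non_ultrametric_example))  # False
```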

8 Approximations
However, for most biological data there is no exact “ultrametric solution”.
This means that some heuristic is needed.
The most popular method is UPGMA, which stands for Unweighted Pair Group Method using Arithmetic mean.
It was introduced by Sokal and Michener (1958).

9 UPGMA
Input: a set of n objects, with a distance between every two objects.
Output: an ultrametric tree with the given objects as leaves.
The main data structures used by the algorithm are a graph G=(V,E), which contains trees with the objects as leaves, and a matrix of distances between every pair of tree roots in the graph.

10 UPGMA (cont.)
Initialization: each object is in a separate tree, with distances given by the input.
We will use an example of 5 mammal species:

          Bear  Raccoon  Weasel  Seal  Dog
Bear         0       26      34    29   32
Raccoon     26        0      42    44   48
Weasel      34       42       0    44   51
Seal        29       44      44     0   50
Dog         32       48      51    50    0

[Figure: the five leaves Bear, Raccoon, Weasel, Seal and Dog, not yet connected]

11 UPGMA (cont.)
We iterate until there is only one tree. At each iteration we perform:
- Find the two trees x and y with minimal distance d(x,y)
- Add a new node, and connect the roots of x and y to this node. The result is a new tree z. The height of the root of z is d(x,y)/2
- Compute the distance between z and the other remaining trees (without x and y)
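The iteration above can be sketched in a few lines of Python. This is an illustrative, unoptimized version and not the lecture's own code: the function name `upgma`, the dictionary-based data structures and the parenthesized labels for merged trees are assumptions made here. The distance to a merged tree is the size-weighted average of the distances to its two parts, which is the standard UPGMA update and reproduces the numbers of the mammal example.

```python
def upgma(names, d):
    """Run UPGMA; return the merges as (tree_x, tree_y, height) in order."""
    size = {name: 1 for name in names}  # number of leaves in each current tree
    # Distances between current trees, keyed by the unordered pair of labels.
    dist = {frozenset((a, b)): d[i][j]
            for i, a in enumerate(names)
            for j, b in enumerate(names) if i < j}
    merges = []
    while len(size) > 1:
        # Step 1: find the two trees x and y with minimal distance.
        pair = min(dist, key=dist.get)
        x, y = sorted(pair)
        # Step 2: join them under a new root z at height d(x,y)/2.
        height = dist.pop(pair) / 2
        n_x, n_y = size.pop(x), size.pop(y)
        z = f"({x},{y})"
        # Step 3: distance from z to every remaining tree t
        # (average of the old distances, weighted by leaf counts).
        for t in size:
            d_xt = dist.pop(frozenset((x, t)))
            d_yt = dist.pop(frozenset((y, t)))
            dist[frozenset((z, t))] = (n_x * d_xt + n_y * d_yt) / (n_x + n_y)
        size[z] = n_x + n_y
        merges.append((x, y, height))
    return merges

names = ["Bear", "Raccoon", "Weasel", "Seal", "Dog"]
mammals = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]

for x, y, height in upgma(names, mammals):
    print(f"join {x} and {y} at height {round(height, 3)}")
```

On the mammal matrix the merge heights come out as 13, 18.25, 20 and 22.625. Each pass scans all remaining pairs for the minimum, so this sketch runs in O(n^3) overall.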

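12 UPGMA (cont.)
First iteration: the minimal distance is d(Bear, Raccoon) = 26, so Bear and Raccoon are joined under a new node BR at height 26/2 = 13.

          Bear  Raccoon  Weasel  Seal  Dog
Bear         0       26      34    29   32
Raccoon     26        0      42    44   48
Weasel      34       42       0    44   51
Seal        29       44      44     0   50
Dog         32       48      51    50    0

[Figure: leaves Bear, Raccoon, Weasel, Seal, Dog; Bear and Raccoon joined at node BR, height 13]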

13 UPGMA (cont.)
Update computation: denote the number of leaves in the tree x by n_x. Then for each remaining tree t ≠ x,y we set:
d(z,t) = (n_x·d(x,t) + n_y·d(y,t)) / (n_x + n_y)
For example, d(BR, Weasel) = (1·34 + 1·42) / 2 = 38.

          BR    Weasel  Seal  Dog
BR         0        38  36.5   40
Weasel    38         0    44   51
Seal    36.5        44     0   50
Dog       40        51    50    0

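14 UPGMA (cont.)
Second iteration: the minimal distance is d(BR, Seal) = 36.5, so BR and Seal are joined under a new node BRS at height 36.5/2 = 18.25.

          BR    Weasel  Seal  Dog
BR         0        38  36.5   40
Weasel    38         0    44   51
Seal    36.5        44     0   50
Dog       40        51    50    0

[Figure: BR at height 13; BRS at height 18.25; branch from BR up to BRS of length 18.25-13=5.25]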

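15 UPGMA (cont.)
Third iteration: the minimal distance is d(BRS, Weasel) = 40, so BRS and Weasel are joined under a new node BRSW at height 40/2 = 20.

          BRS   Weasel   Dog
BRS         0       40  43.3
Weasel     40        0    51
Dog      43.3       51     0

[Figure: BR at height 13; BRS at height 18.25 (branch 18.25-13=5.25); BRSW at height 20 (branch 20-18.25=1.75)]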

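16 UPGMA (cont.)
Fourth (and last) iteration: the only remaining distance is d(BRSW, Dog) = 45.25, so BRSW and Dog are joined under the root BRSWD at height 45.25/2 = 22.625.

          BRSW    Dog
BRSW         0  45.25
Dog      45.25      0

[Figure: BR at height 13; BRS at height 18.25 (branch 18.25-13=5.25); BRSW at height 20 (branch 20-18.25=1.75); BRSWD at height 22.625 (branch 22.625-20=2.625)]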

17 UPGMA: complexity
A simple implementation takes n-1 iterations, and in each iteration finding the minimal distance takes O(n^2), for a total complexity of O(n^3).
We can instead keep a list of the smallest distance in each row. This way it takes O(n) to find the minimal distance, while updating the list is also O(n) at each iteration. Therefore, the total complexity is O(n^2).

18 Ultrametric evaluation
UPGMA gives us an ultrametric tree. Is this tree the best possible?
That depends on how we measure the quality of an approximated tree for a given matrix.
Let U(i,j) be the distance in the ultrametric tree U between the objects indexed i and j.
The L∞ norm is defined by:
L∞(d,U) = max_{i<j} |d(i,j) − U(i,j)|

19 Ultrametric evaluation (cont.)
There is an O(n^2) algorithm for finding the ultrametric tree U with minimal L∞ norm (Farach, Kannan and Warnow, 1995).
Is this tree the best possible? The L∞ norm reflects only the single worst pair, so it would be better to include all distances.
The L1 norm is defined by:
L1(d,U) = Σ_{i<j} |d(i,j) − U(i,j)|
Finding U with minimal L1 norm is NP-hard (Day, 1987).
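To make the two measures concrete, the sketch below evaluates the UPGMA tree from the mammal example against the input matrix. Several things here are assumptions rather than content of the slides: the norms are taken as the maximum and the sum of |d(i,j) − U(i,j)| over unordered pairs i < j, the tree distance U(i,j) is taken as twice the height of the lowest common ancestor of i and j, and the function names are illustrative.

```python
from itertools import combinations

def l_inf_norm(d, u):
    """Largest absolute difference over all unordered pairs (assumed definition)."""
    return max(abs(d[i][j] - u[i][j]) for i, j in combinations(range(len(d)), 2))

def l1_norm(d, u):
    """Sum of absolute differences over all unordered pairs (assumed definition)."""
    return sum(abs(d[i][j] - u[i][j]) for i, j in combinations(range(len(d)), 2))

# Input distances, in the order Bear, Raccoon, Weasel, Seal, Dog.
D = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]

# Distances in the UPGMA tree: twice the height of the lowest common ancestor
# (BR = 13, BRS = 18.25, BRSW = 20, BRSWD = 22.625).
U = [
    [0, 26, 40, 36.5, 45.25],
    [26, 0, 40, 36.5, 45.25],
    [40, 40, 0, 40, 45.25],
    [36.5, 36.5, 40, 0, 45.25],
    [45.25, 45.25, 45.25, 45.25, 0],
]

print(l_inf_norm(D, U))  # 13.25, from the pair Bear-Dog (32 vs 45.25)
print(l1_norm(D, U))     # 53.5
```

The L∞ value is driven entirely by the Bear-Dog pair, while the L1 value accumulates the error over all ten pairs.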

