Phylogenetic Trees (2), Lecture 13. Based on: Durbin et al. 7.4, Gusfield 17.1-17.3, Setubal & Meidanis 6.1.


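Slide 2: Ultrametric trees as special weighted trees
Definition: An ultrametric tree is a rooted weighted tree all of whose leaves are at the same depth. Its edge weights can be represented by the heights of the internal vertices, i.e., their distances from the leaves: the weight of an edge is the height of its upper endpoint minus the height of its lower endpoint.
Note: each internal vertex has at least two children.
[Figure: an ultrametric tree on leaves A, E, D, C, B. The root has height 8. Its children are a vertex of height 5 and a vertex of height 3 whose leaves are C and B. The height-5 vertex has children D and a vertex of height 3 whose leaves are A and E. Edge weights: root to the height-5 vertex 3; height-5 vertex to the A-E vertex 2 and to D 5; root to the C-B vertex 5; each edge from a height-3 vertex to its leaves 3.]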

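Slide 3: LCA and distances in an ultrametric tree
Let LCA(i,j) denote the lowest common ancestor of leaves i and j. Let D(i,j) be the height of LCA(i,j), and let dist(i,j) be the length of the path from i to j.
Claim: For any pair of leaves i, j in an ultrametric tree, D(i,j) = ½ dist(i,j). The path from i to j goes up to LCA(i,j) and back down. Since all leaves are at the same depth, each half has length equal to the height of LCA(i,j).
For the tree of slide 2:

      A  B  C  D  E
  A   0  8  8  5  3
  B      0  3  8  8
  C         0  8  8
  D            0  5
  E               0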
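As a sanity check of the claim, the sketch below encodes the slide-2 tree as nested tuples. This encoding is our own, hypothetical one: an internal vertex is (height, children) and a leaf is its name. The sketch computes dist(i,j) by summing edge weights along the path, where each edge weighs the height of its parent minus the height of its child. It then confirms that height(LCA(i,j)) equals the matrix entry and half the path length.

```python
# Hypothetical encoding of the slide-2 tree: an internal vertex is
# (height, [children]); a leaf is its name and has height 0.
TREE = (8, [(5, [(3, ["A", "E"]), "D"]), (3, ["C", "B"])])

def height(v):
    return 0 if isinstance(v, str) else v[0]

def root_path(v, leaf):
    """Vertices on the path from v down to `leaf`, or None if `leaf` is not below v."""
    if isinstance(v, str):
        return [v] if v == leaf else None
    for child in v[1]:
        p = root_path(child, leaf)
        if p is not None:
            return [v] + p
    return None

def dist_and_lca_height(tree, i, j):
    """Return (dist(i, j), height(LCA(i, j))) for two distinct leaves i, j."""
    pi, pj = root_path(tree, i), root_path(tree, j)
    k = 0  # length of the shared prefix; pi[k - 1] is the LCA
    while k < min(len(pi), len(pj)) and pi[k] is pj[k]:
        k += 1
    lca = pi[k - 1]
    total = 0
    for p in (pi, pj):
        # each edge weighs height(upper endpoint) - height(lower endpoint)
        for upper, lower in zip(p[k - 1:], p[k:]):
            total += height(upper) - height(lower)
    return total, height(lca)

# Matrix D of slide 3 (upper triangle).
D = {("A", "B"): 8, ("A", "C"): 8, ("A", "D"): 5, ("A", "E"): 3,
     ("B", "C"): 3, ("B", "D"): 8, ("B", "E"): 8,
     ("C", "D"): 8, ("C", "E"): 8, ("D", "E"): 5}

for (i, j), d in D.items():
    total, h = dist_and_lca_height(TREE, i, j)
    assert h == d and total == 2 * d  # D(i,j) = height(LCA) = dist/2
print("claim holds for all", len(D), "pairs")
```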

Slide 4: Identifying ultrametric distances
Definition: A symmetric L × L matrix D is ultrametric iff for every three indices i, j, k: D(i,j) ≤ max{D(i,k), D(j,k)}.
Equivalently, the maximum among D(i,j), D(i,k), D(j,k) is attained at least twice. For example, if D(i,j) = 9 and D(i,k) = 6, then D(j,k) must be 9.
Theorem: The following conditions are equivalent for an L × L symmetric matrix D:
1. D is ultrametric.
2. There is an ultrametric tree with L leaves such that for each pair of leaves i, j: D(i,j) = height(LCA(i,j)) = ½ dist(i,j).
Note: the condition D(i,j) ≤ max{D(i,k), D(j,k)} is easier to check than the four-point condition. Therefore the theorem implies that ultrametric additive sets are easier to characterize than arbitrary additive sets.
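A direct check of the definition takes O(L³) time. The sketch below assumes the matrix is a Python list of lists with a zero diagonal. It tests the matrix of slide 3 (ultrametric) and the additive matrix D of slide 13 (not ultrametric).

```python
def is_ultrametric(D):
    """True iff D(i,j) <= max(D(i,k), D(j,k)) for all indices i, j, k.

    D is a symmetric list of lists with a zero diagonal; this is the direct
    O(L^3) check of the definition."""
    n = len(D)
    return all(D[i][j] <= max(D[i][k], D[j][k])
               for i in range(n) for j in range(n) for k in range(n))

# Slide 3 matrix, rows/columns in the order A, B, C, D, E.
D_ultra = [[0, 8, 8, 5, 3],
           [8, 0, 3, 8, 8],
           [8, 3, 0, 8, 8],
           [5, 8, 8, 0, 5],
           [3, 8, 8, 5, 0]]
# Additive matrix D of slide 13 (order a, b, c, d): D(a,c) = 9 > max(D(a,b), D(c,b)) = 8.
D_additive = [[0, 3, 9, 7],
              [3, 0, 8, 6],
              [9, 8, 0, 6],
              [7, 6, 6, 0]]
print(is_ultrametric(D_ultra), is_ultrametric(D_additive))  # True False
```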

Slide 5: Properties of ultrametric matrices used in the proof of the theorem
Definition: Let D be an L × L matrix, and let S ⊆ {1,...,L}. D[S] is the submatrix of D consisting of the rows and columns with indices in S.
Claim 1: D is ultrametric iff for every S ⊆ {1,...,L}, D[S] is ultrametric.
Claim 2: If D is ultrametric and max_{i,j} D(i,j) = m, then m appears in every row of D.
Proof of Claim 2: choose j, k with D(j,k) = m. For any row i, m = D(j,k) ≤ max{D(i,j), D(i,k)} ≤ m, so one of D(i,j), D(i,k) must be m.
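The two claims can be checked together on the slide-3 matrix. By Claim 1, every submatrix D[S] is again ultrametric, so Claim 2 must hold for each of them. The helper names below are our own.

```python
from itertools import combinations

def submatrix(D, S):
    """D[S]: the rows and columns of D whose indices are in S."""
    return [[D[i][j] for j in S] for i in S]

def max_in_every_row(D):
    """Claim 2: the largest entry m of an ultrametric matrix occurs in every row."""
    m = max(max(row) for row in D)
    return all(m in row for row in D)

# Slide 3 matrix (order A, B, C, D, E); it is ultrametric.
D = [[0, 8, 8, 5, 3],
     [8, 0, 3, 8, 8],
     [8, 3, 0, 8, 8],
     [5, 8, 8, 0, 5],
     [3, 8, 8, 5, 0]]

# By Claim 1 every D[S] is ultrametric, so Claim 2 applies to each of them.
ok = all(max_in_every_row(submatrix(D, S))
         for r in range(1, len(D) + 1)
         for S in combinations(range(len(D)), r))
print(ok)  # True
```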

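Slide 6: Ultrametric tree ⇒ ultrametric matrix
If there is an ultrametric tree such that D(i,j) = height(LCA(i,j)) for all leaves i, j, then D is an ultrametric matrix.
Proof: by properties of lowest common ancestors in trees. For any three leaves i, j, k, two of the three LCAs coincide and the third lies below them. If LCA(j,k) is the lowest, then LCA(k,i) = LCA(j,i), so D(k,i) = D(j,i) ≥ D(k,j). Thus the maximum of the three values is attained twice, which is exactly the ultrametric condition.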

Slide 7: Ultrametric matrix ⇒ ultrametric tree: induction base
If D is an ultrametric matrix, then D has an ultrametric tree. The proof is by induction on L, the size of D.
Basis: L = 1: T is a single leaf. L = 2: T is a root labeled D(i,j) whose two children are the leaves i and j. For example, if D(i,j) = 9, the root is labeled 9.

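Slide 8: Induction step
Induction step: L > 2. Let m = max_{i,j} D(i,j), let S1 = {i : D(1,i) = m}, and let S2 = {1,...,L} − S1. Note that 0 < |S1| < L: by Claim 2, m appears in row 1, and 1 ∉ S1 because D(1,1) = 0 < m.
By Claim 1, D[S1] and D[S2] are ultrametric.
Construct (by induction) a tree T1 for S1, with root labeled m1 ≤ m.
Construct a tree T2 for S2, with root labeled m2 < m (if m2 = 0, T2 is a leaf). Here m2 < m because for i, j ∈ S2, D(i,j) ≤ max{D(1,i), D(1,j)} < m.
Join T1 and T2 into T with a root labeled m. In the construction when m1 = m, T2 is attached directly as an additional child of the root of T1, which is labeled m = m1.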
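A sketch of this recursive construction follows. It assumes D is a Python list of lists that is ultrametric, with positive off-diagonal entries (distinct objects). The tree encoding is our own choice: a tree is either a leaf index or a pair (label, children). The merge when m1 = m follows the construction described above. The sketch passes index lists rather than copying submatrices, and it finds m by scanning a single row, as Claim 2 allows.

```python
def build_ultrametric(D, S=None):
    """Build an ultrametric tree for D[S] following the induction of slides 7-8.

    A tree is either a leaf index or a pair (label, children). Assumes D is
    ultrametric with positive off-diagonal entries (distinct objects)."""
    if S is None:
        S = list(range(len(D)))
    if len(S) == 1:
        return S[0]                                  # basis: a single leaf
    first = S[0]                                     # plays the role of object "1"
    m = max(D[first][i] for i in S)                  # Claim 2: row `first` holds max of D[S]
    if m <= 0:
        raise ValueError("off-diagonal entries must be positive")
    S1 = [i for i in S if D[first][i] == m]
    S2 = [i for i in S if D[first][i] != m]          # contains `first` itself
    T1 = build_ultrametric(D, S1)
    T2 = build_ultrametric(D, S2)                    # its root label is < m
    if isinstance(T1, tuple) and T1[0] == m:
        return (m, T1[1] + [T2])                     # case m1 = m: hang T2 under T1's root
    return (m, [T1, T2])

def leaf_set(T):
    return {T} if isinstance(T, int) else set().union(*(leaf_set(c) for c in T[1]))

def lca_label(T, i, j):
    """Label of the lowest common ancestor of leaves i != j in T."""
    for child in T[1]:
        if isinstance(child, tuple) and {i, j} <= leaf_set(child):
            return lca_label(child, i, j)
    return T[0]

D = [[0, 8, 8, 5, 3],   # slide 3 matrix, order A, B, C, D, E
     [8, 0, 3, 8, 8],
     [8, 3, 0, 8, 8],
     [5, 8, 8, 0, 5],
     [3, 8, 8, 5, 0]]
T = build_ultrametric(D)
print(T)  # (8, [(3, [2, 1]), (5, [3, (3, [4, 0])])])
assert all(lca_label(T, i, j) == D[i][j]
           for i in range(5) for j in range(5) if i != j)
```

With the indices read as A=0, ..., E=4, the result is the tree of slide 2.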

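Slide 9: Correctness proof
Need to prove: T is an ultrametric tree for D, i.e., D(i,j) is the label of the LCA of i and j in T.
If i and j are in the same subtree, this holds by induction.
Otherwise, say i ∈ S1 and j ∈ S2. Then D(1,i) = m and D(1,j) ≠ m, hence D(i,j) = m. The reason is that the maximum of D(1,i), D(1,j), D(i,j) is m and must be attained twice, while D(1,j) < m. This matches the label m of the root, which is the LCA of i and j in T.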

Slide 10: Complexity analysis
Let f(L) be the time complexity for an L × L matrix. f(1) = f(2) = constant. For L > 2:
• Constructing S1 and S2: O(L). By Claim 2, m is the maximum of row 1, so only row 1 needs to be scanned. Let |S1| = k and |S2| = L − k.
• Constructing T1 and T2: f(k) + f(L − k).
• Joining T1 and T2 into T: constant.
Thus f(L) ≤ max_{0<k<L} [f(k) + f(L − k)] + cL, and f(L) = cL² satisfies this recurrence.
This bound needs an appropriate data structure: D[S1] and D[S2] must be accessed through their index sets rather than copied.
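To see that f(L) = cL² satisfies the recurrence, choose c large enough to cover the base cases. Then note that k² + (L − k)² is convex in k, so over 0 < k < L it is largest at k = 1 or k = L − 1:

```latex
\begin{aligned}
\max_{0<k<L}\bigl[f(k) + f(L-k)\bigr] + cL
  &= c \max_{0<k<L}\bigl[k^2 + (L-k)^2\bigr] + cL \\
  &= c\bigl[1 + (L-1)^2\bigr] + cL \\
  &= c\,(L^2 - L + 2) \;\le\; cL^2 \qquad \text{for } L \ge 2.
\end{aligned}
```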

Slide 11: Recall: identifying additive trees via ultrametric trees
We solve the additive tree problem by reducing it to the ultrametric problem as follows:
1. Given an input matrix D = D(i,j) of distances, transform it into a matrix D' = D'(i,j), where D'(i,j) is the height of the LCA of i and j in the corresponding ultrametric tree T'.
2. Construct the ultrametric tree T' for D'.
3. Reconstruct the additive tree T from T'.

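Slide 12: How D' is constructed from D
D'(i,j) should be the height of the lowest common ancestor of i and j in T', the ultrametric tree obtained by hanging the tree from leaf k. Thus D'(i,j) = M − d(k,m). Here m is the vertex at which the paths from k to i and from k to j separate, and d(k,m) is computed by
d(k,m) = ½ (D(k,i) + D(k,j) − D(i,j)).
[Figure: the additive tree on leaves a, b, c, d with edge weights 2, 1, 3, 4, 2, hung from a leaf.]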

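Slide 13: The transformation D → D' → T' → T
D (additive):
      a  b  c  d
  a   0  3  9  7
  b      0  8  6
  c         0  6
  d            0
D' (ultrametric), with the tree hung from k = a and M = 9:
      a  b  c  d
  a   0  9  9  9
  b      0  7  7
  c         0  4
  d            0
[Figure: T, the additive tree for D. Its pendant edges are a: 2, b: 1, c: 4, d: 2, and an internal edge of weight 3 separates {a, b} from {c, d}. T', the ultrametric tree for D', has root height 9, a vertex of height 7 joining b with the rest, and a vertex of height 4 joining c and d.]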
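The sketch below applies D'(i,j) = M − d(k,m), with d(k,m) = ½ (D(k,i) + D(k,j) − D(i,j)), to this example. The slides do not say how k and M are chosen. Here we assume k = a and take M as the largest entry of row k, which gives M = 9 and reproduces the matrix D' above.

```python
def to_ultrametric(D, k):
    """Transform an additive matrix D into D' by hanging the tree from leaf k.

    For i != j: D'(i,j) = M - d(k,m) with d(k,m) = (D(k,i) + D(k,j) - D(i,j)) / 2,
    where m is the vertex at which the paths from k to i and to j split.
    Assumption: M is the largest entry in row k."""
    n = len(D)
    M = max(D[k])
    Dp = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d_km = (D[k][i] + D[k][j] - D[i][j]) / 2
                Dp[i][j] = M - d_km
    return Dp

# Slide 13 example, order a, b, c, d; hang from k = a (index 0).
D = [[0, 3, 9, 7],
     [3, 0, 8, 6],
     [9, 8, 0, 6],
     [7, 6, 6, 0]]
Dp = to_ultrametric(D, 0)
print(Dp)
```

The entries come out as floats because of the division by 2. Feeding D' to an ultrametric-tree construction then yields T'.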

Slide 14: Character-based methods for constructing phylogenies
In this approach, trees are constructed by comparing the characters of the corresponding species. Characters may be morphological (e.g., tooth structure) or molecular (e.g., homologous DNA sequences). One common approach is maximum parsimony.
Assumptions:
• Independence of characters (no interactions).
• The best tree is the one in which the fewest changes take place.

Slide 15: 1. Maximum Parsimony
Input: four nucleotide sequences, AAG, AAA, GGA, and AGA, taken from four species.
Question: Which evolutionary tree best explains these sequences?
[Figure: a candidate tree for these four sequences with internal vertices labeled AAA; total number of substitutions = 4.]
One answer (the parsimony principle): pick a tree that has the minimum total number of symbol substitutions between each species and its ancestor in the phylogenetic tree.

Slide 16: Example continued
There are many possible trees. For example:
[Figure, left: a tree on leaves AGA, GGA, AAA, AAG with internal vertices labeled AAA and AGA; total number of substitutions = 3. Right: a tree on the same leaves with internal vertices labeled AAA; total number of substitutions = 4.]
The left tree is preferred over the right tree. The total number of changes is called the parsimony score.
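For a tree whose internal vertices are already labeled, the parsimony score is simply the sum, over all edges, of the number of positions at which the two endpoint sequences differ. In the sketch below, the edge list for the left tree is our reading of the figure: a root AAA with children AGA (parent of leaves AGA and GGA) and AAA (parent of leaves AAA and AAG). For the right tree, if every internal vertex is labeled AAA then internal edges cost 0, so only the four leaf edges matter whatever the exact topology.

```python
def hamming(u, v):
    """Number of positions at which sequences u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def parsimony_score(edges):
    """Total substitutions over the edges of a fully labeled tree.
    Each edge is given as the pair of sequences at its two endpoints."""
    return sum(hamming(u, v) for u, v in edges)

# Our reading of the left tree: root AAA with children AGA (parent of
# leaves AGA, GGA) and AAA (parent of leaves AAA, AAG).
left = [("AAA", "AGA"), ("AGA", "AGA"), ("AGA", "GGA"),
        ("AAA", "AAA"), ("AAA", "AAA"), ("AAA", "AAG")]
# Right tree: with all internal vertices labeled AAA, only leaf edges cost anything.
right_leaf_edges = [("AAA", leaf) for leaf in ("GGA", "AAA", "AGA", "AAG")]
print(parsimony_score(left), parsimony_score(right_leaf_edges))  # 3 4
```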

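Slide 17: Example with one letter
• Suppose we have five species, three of which have 'C' and two of which have 'T' at a specified position.
• A minimal tree has one evolutionary change.
[Figure: a tree with leaves C, C, C, T, T whose internal vertices are labeled so that a single edge carries the change C → T.]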

Slide 18: Extension to many letters
• What is the parsimony score of the following sequences?
  Aardvark  CAGGTA
  Bison     CAGACA
  Chimp     CGGGTA
  Dog       TGCACT
  Elephant  TGCGTA
We do it character by character: each position's score is computed independently of the others.

Slide 19: Weighted parsimony scores
Weighted parsimony score:
• Each change from a to b is weighted by a cost c(a,b).
• The weighted parsimony score reduces to the ordinary parsimony score when c(a,a) = 0 and c(a,b) = 1 for all b ≠ a.

Slide 20: Evaluating weighted parsimony scores
Each position is independent and is computed by itself. Use dynamic programming on a given tree.
Let S(k,a) be the minimum score of the subtree rooted at k when k has character a. If k is a node with children i and j, then
S(k,a) = min_x (S(i,x) + c(a,x)) + min_y (S(j,y) + c(a,y)).

Slide 21: Evaluating parsimony scores
Dynamic programming on a given tree:
• Initialization: for each leaf i, set S(i,a) = 0 if i is labeled by a, and S(i,a) = ∞ otherwise.
• Iteration: if k is a node with children i and j, then S(k,a) = min_x (S(i,x) + c(a,x)) + min_y (S(j,y) + c(a,y)).
• Termination: the cost of the tree is min_x S(r,x), where r is the root.
Comment: to reconstruct an optimal assignment, we keep, for each node k and each character a, the two characters x and y that attain the minima when k has character a.
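A minimal sketch of this dynamic program (Sankoff's algorithm) for rooted binary trees follows. Trees are nested Python tuples, an encoding of our own. Only the score is computed; the traceback mentioned in the comment is omitted. The examples use unit costs and reuse the one-letter example of slide 17 and the sequences of slides 15-16 on two rooted tree shapes.

```python
import math

def unit_cost(a, b):
    """c(a,a) = 0 and c(a,b) = 1 for b != a: ordinary parsimony."""
    return 0 if a == b else 1

def scores(tree, alphabet, cost):
    """Return {a: S(node, a)} for the root of `tree`.

    A tree is a single character (a leaf) or a pair (left, right) of subtrees."""
    if isinstance(tree, str):                     # initialization at a leaf
        return {a: 0 if a == tree else math.inf for a in alphabet}
    S_i = scores(tree[0], alphabet, cost)
    S_j = scores(tree[1], alphabet, cost)
    return {a: min(S_i[x] + cost(a, x) for x in alphabet)   # iteration
               + min(S_j[y] + cost(a, y) for y in alphabet)
            for a in alphabet}

def tree_cost(tree, alphabet, cost=unit_cost):
    """Termination: min_x S(r, x) at the root r."""
    return min(scores(tree, alphabet, cost).values())

def sequence_cost(shape, seqs, alphabet="ACGT", cost=unit_cost):
    """Score each position independently on the same tree and add the results.
    `shape` is a nested pair of species names; `seqs` maps names to sequences."""
    def column(t, p):
        if isinstance(t, str):
            return seqs[t][p]
        return (column(t[0], p), column(t[1], p))
    length = len(next(iter(seqs.values())))
    return sum(tree_cost(column(shape, p), alphabet, cost) for p in range(length))

# Slide 17: three C's and two T's need only one change on this tree.
print(tree_cost(((("C", "C"), "C"), ("T", "T")), "CT"))  # 1
# Slides 15-16 sequences on two rooted shapes.
seqs = {"s1": "AGA", "s2": "GGA", "s3": "AAA", "s4": "AAG"}
print(sequence_cost((("s1", "s2"), ("s3", "s4")), seqs))  # 3
print(sequence_cost((("s2", "s3"), ("s1", "s4")), seqs))  # 4
```

Each node does k minimizations over k values for each of the k characters a, which is the O(k²) per node and per position counted on slide 22.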

Slide 22: Cost of evaluating parsimony for binary trees
If there are n nodes, m characters, and k possible values for each character, then the complexity is O(nmk²). For each node, each character, and each of the k values a, the two minimizations take O(k) time.
Of course, we still need to search over possible trees and find the best one. One usually resorts to heuristic search techniques.

Slide 23: 2. Perfect Phylogeny
Data on species is given by a character state matrix: cell (p,i) has value j iff character i of object (species) p has state j.
Goal: construct an evolutionary tree for the species.

          Character
  Object  c1  c2  c3  c4  c5
  A        1   1   2   0   0
  B        2   0   1   2   1
  C        3   2   3   3   1
  D        0   3   4   1   0
  E        1   1   0   0   1

Slide 24: Motivation: evolutionary tree
Internal nodes correspond to speciation events, at which some character (attribute) is acquired.
Assumptions:
1. No reversals (characters are not lost).
2. No convergence (a character is created only once).

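Slide 25: Perfect phylogeny for a 0-1 matrix
A 0-1 matrix: each character is either 0 (absent) or 1 (present).
A perfect phylogeny for the matrix is a tree T with no convergence and no reversals:
• Each of the n objects labels exactly one leaf of T.
• Each of the m characters labels exactly one edge of T.
• Object p has exactly the characters labeling the path from p to the root.

      1  2  3  4  5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  0

[Figure: a perfect phylogeny for this matrix. From the root, an edge labeled 2 leads to a vertex with leaf E and an edge labeled 1. Below edge 1 are leaf A and, via an edge labeled 5, leaf C. An edge labeled 3 leads from the root to a vertex with leaf B and, via an edge labeled 4, leaf D.]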
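To make the three conditions concrete, the sketch below encodes one tree that satisfies them for this matrix. The tree is our reconstruction of the figure, and the internal vertex names u1-u3 are hypothetical. The sketch checks directly that every character labels exactly one edge and that each object's path to the root carries exactly its characters.

```python
# Our reconstruction of the tree: node -> (parent, character on the edge to
# the parent, or None for an unlabeled edge). Names u1..u3 are hypothetical.
PARENT = {
    "u2": ("root", 2), "E": ("u2", None), "u1": ("u2", 1),
    "A": ("u1", None), "C": ("u1", 5),
    "u3": ("root", 3), "B": ("u3", None), "D": ("u3", 4),
}
M = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 0], "C": [1, 1, 0, 0, 1],
     "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 0]}

def path_characters(leaf):
    """Characters labeling the edges on the path from `leaf` to the root."""
    chars, v = set(), leaf
    while v != "root":
        v, c = PARENT[v]
        if c is not None:
            chars.add(c)
    return chars

labels = [c for _, c in PARENT.values() if c is not None]
each_char_once = sorted(labels) == [1, 2, 3, 4, 5]
rows_match = all(path_characters(p) == {k + 1 for k, bit in enumerate(row) if bit}
                 for p, row in M.items())
print(each_char_once, rows_match)  # True True
```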

Slide 26: The (binary) perfect phylogeny problem
Problem: Given a 0-1 matrix M, determine whether it has a perfect phylogeny, and construct one if it does.
(Note: edges are labeled by characters; an edge labeled i represents a change of character i's state from 0 to 1.)
[Example: the matrix and tree of slide 25.]

Slide 27: Solution to the perfect phylogeny problem
Definition: Given a 0-1 matrix M, let O_k = {j : M_jk = 1}, i.e., O_k is the set of objects that have character k.
Theorem: M has a perfect phylogeny iff the sets {O_i} are laminar, i.e., for all i, j, either O_i and O_j are disjoint, or one includes the other.

Laminar:
      1  2  3  4  5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  0

Not laminar:
      1  2  3  4  5
  A   1  1  0  0  0
  B   0  0  1  0  1
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  1

In the second matrix, O_1 = {A, C} and O_5 = {B, C, E} intersect, but neither contains the other.
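The laminarity condition can be checked by brute force over all pairs of columns. The sketch below assumes the matrix is a dictionary from object names to 0-1 rows (our own encoding) and runs the check on the two matrices above.

```python
from itertools import combinations

def object_sets(M):
    """O_k = {objects p : M[p][k] = 1} for every character (column) k."""
    n_chars = len(next(iter(M.values())))
    return [{p for p, row in M.items() if row[k] == 1} for k in range(n_chars)]

def is_laminar(sets):
    """Every pair of sets is disjoint or nested (brute force over all pairs)."""
    return all(not (a & b) or a <= b or b <= a for a, b in combinations(sets, 2))

laminar = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 0], "C": [1, 1, 0, 0, 1],
           "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 0]}
not_laminar = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 1], "C": [1, 1, 0, 0, 1],
               "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 1]}
print(is_laminar(object_sets(laminar)), is_laminar(object_sets(not_laminar)))  # True False
```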

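Slide 28: Proof
⇒: Assume M has a perfect phylogeny, and let i, j be given. Consider the edges labeled i and j. By the third condition of the definition, O_i is exactly the set of leaves below edge i, and similarly for O_j.
Case 1: there is a root-to-leaf path containing both edges. Then one of O_i, O_j is included in the other. For example, edges 2 and 1 in the tree of slide 25 give O_1 = {A, C} ⊆ O_2 = {A, C, E}.
Case 2: not case 1. Then O_i and O_j are disjoint. For example, edges 2 and 3 give O_2 = {A, C, E} and O_3 = {B, D}.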

Slide 29: Proof (cont.)
⇐: Assume that for all i, j, either O_i and O_j are disjoint, or one includes the other. We prove by induction on the number of characters that M has a perfect phylogeny.
Basis: one character. Then, assuming the rows of M are distinct, there are at most two objects, one with and one without this character. For example, if A has character 1 and B does not, the tree is a root with leaf B and an edge labeled 1 leading to A.

Slide 30: Proof (cont.)
⇐, induction step: Assume correctness for n − 1 characters, and consider a matrix with n characters (non-zero columns). WLOG assume that O_1 is not contained in O_j for any j > 1. Let S1 be the set of objects that have character 1, and S2 the remaining objects. Then each character belongs to objects in S1 or to objects in S2, but not both (prove!). By induction there are trees T1 and T2 for S1 and S2. Combining them as below gives the desired tree.

      1  2  3  4  5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   1  0  0  0  0

[Figure: the combined tree. The root of T2 is the root of the new tree, and an edge labeled 1 leads from it to the root of T1.]
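The sketch below implements this inductive construction, assuming the rows of M are distinct. The encoding is our own: rows are sets of characters, and a tree is a list of (edge label, subtree) pairs hanging from a root, where a subtree is an object name or another such list and None marks an unlabeled edge. At each step, a character with the largest O_k plays the role of character 1, so no other O_j properly contains it. On a laminar matrix every character ends up on exactly one edge. On a non-laminar matrix some character is repeated, which is how the sketch detects failure.

```python
def build_phylogeny(rows, chars):
    """rows: object -> set of its characters; chars: characters still to place.
    Returns the list of (edge label, subtree) pairs below the current root."""
    live = [c for c in chars if any(c in r for r in rows.values())]
    if not live:
        # no characters left: each object hangs from the root by an unlabeled edge
        return [(None, p) for p in rows]
    # a character with the largest O_k plays the role of character 1
    k = max(live, key=lambda c: sum(c in r for r in rows.values()))
    S1 = {p: r for p, r in rows.items() if k in r}
    S2 = {p: r for p, r in rows.items() if k not in r}
    rest = [c for c in live if c != k]
    T1 = build_phylogeny(S1, rest)
    T2 = build_phylogeny(S2, rest)
    # the root of T2 becomes the new root; an edge labeled k leads to T1
    return [(k, T1)] + T2

def edge_labels(tree):
    """All characters labeling edges of the tree (with repetitions)."""
    for label, sub in tree:
        if label is not None:
            yield label
        if not isinstance(sub, str):
            yield from edge_labels(sub)

def path_sets(tree, above=frozenset()):
    """Map each object to the set of characters on its path to the root."""
    out = {}
    for label, sub in tree:
        here = (above | {label}) if label is not None else above
        if isinstance(sub, str):
            out[sub] = here
        else:
            out.update(path_sets(sub, here))
    return out

def to_rows(M):
    return {p: {k + 1 for k, bit in enumerate(row) if bit} for p, row in M.items()}

rows = to_rows({"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 0], "C": [1, 1, 0, 0, 1],
                "D": [0, 0, 1, 1, 0], "E": [1, 0, 0, 0, 0]})
T = build_phylogeny(rows, [1, 2, 3, 4, 5])
print(sorted(edge_labels(T)))   # [1, 2, 3, 4, 5]: each character on one edge
print(path_sets(T) == rows)     # True

bad = to_rows({"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 1], "C": [1, 1, 0, 0, 1],
               "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 1]})
labels = list(edge_labels(build_phylogeny(bad, [1, 2, 3, 4, 5])))
print(len(labels) == len(set(labels)))  # False: some character labels several edges
```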

