# Maximum Parsimony, Probabilistic Models of Evolution, Distance-Based Methods. Lecture 12 © Shlomo Moran, Ilan Gronau.


2 Maximum Parsimony: a character-based reconstruction method. Input: h sequences (one per species), all of length k. Goal: find a tree whose leaves are labeled by the input sequences, and an assignment of sequences to internal nodes, such that the total number of substitutions is minimized.

3 Parsimony score. The parsimony score of a leaf-labeled tree T is the minimum possible number of mutations over all assignments of sequences to internal vertices of T. [Figure: two example trees over the sequences AGA, GGA, AAA, AAG with mutation counts on the edges; one has parsimony score 3 (1+1+1), the other parsimony score 4 (1+1+2).]

4 Parsimony-Based Reconstruction. There are two problems here, a small one and a big one: 1. The small problem: find the parsimony score of a given leaf-labeled tree. 2. The big problem: find a tree whose leaves are labeled by the input sequences, with the minimum possible parsimony score. We will see efficient algorithms for (1); (2) is hard.

5 Fitch Algorithm: Maximum Parsimony for a Given Tree. Input: a rooted binary leaf-labeled tree. Output: a most parsimonious assignment of states to internal vertices. Work on each position independently. Make one pass from the leaves to the root, and another pass from the root to the leaves. [Figure: example tree with leaves labeled A, C, T and candidate sets such as A/T and A/C at internal vertices.]

6 Fitch’s Algorithm, in more detail. 1. Traverse the tree from leaves to root and fix a set of possible states (e.g. nucleotides) for each internal vertex. 2. Traverse the tree from root to leaves and pick a unique state for each internal vertex.

7 Fitch’s Algorithm – Phase 1. Do a post-order (from leaves to root) traversal of the tree and assign to each vertex a set of possible states. Each leaf has a unique possible state, given by the input. The set of possible states R_i of an internal node i with children j and k is given by: R_i = R_j ∩ R_k if R_j ∩ R_k ≠ ∅, and R_i = R_j ∪ R_k otherwise.

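8 Fitch’s Algorithm – Phase 1. Claim (to be proved soon): the number of substitutions in an optimal solution equals the number of union operations. [Figure: example tree with leaves over {A, C, G, T} and phase-1 sets such as CT, GC and AGC at internal vertices.]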

9 Fitch’s Algorithm – Phase 2. Do a pre-order (from root to leaves) traversal of the tree. The state of the root is an arbitrary r_root ∈ R_root. The state r_j of an internal node j with parent i is selected as follows: r_j = r_i if r_i ∈ R_j; otherwise r_j is an arbitrary state in R_j.

10 Fitch’s Algorithm – Phase 2. [Figure: the example tree after phase 2, with a single state chosen at every vertex.] The algorithm could also select C as the assignment to the root. All other assignments cannot be changed. Complexity: O(nk), where n is the number of leaves and k is the number of states. For m characters the complexity is O(nmk).
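The two passes can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the nested-tuple tree encoding, the names `fitch_phase1` and `fitch_phase2`, and the four-leaf example are assumptions made here, and ties in phase 2 are broken with `min` only to keep the output deterministic.

```python
def fitch_phase1(tree, leaf_state):
    """Post-order pass: compute the candidate set R_i of every vertex.

    `tree` is a leaf name (str) or a pair (left_subtree, right_subtree).
    Returns (annotated_node, unions), where annotated_node is
    (R_i, left_annotated, right_annotated) and leaves have None children.
    """
    if isinstance(tree, str):
        return ({leaf_state[tree]}, None, None), 0
    left, left_unions = fitch_phase1(tree[0], leaf_state)
    right, right_unions = fitch_phase1(tree[1], leaf_state)
    common = left[0] & right[0]
    if common:  # non-empty intersection: no new substitution
        return (common, left, right), left_unions + right_unions
    # empty intersection: take the union and count one substitution
    return (left[0] | right[0], left, right), left_unions + right_unions + 1


def fitch_phase2(node, parent_state=None):
    """Pre-order pass: keep the parent's state when allowed, else pick any."""
    states, left, right = node
    state = parent_state if parent_state in states else min(states)
    if left is None:  # leaf
        return state
    return (state, fitch_phase2(left, state), fitch_phase2(right, state))


# Hypothetical example: one character on a four-leaf tree.
tree = (("s1", "s2"), ("s3", "s4"))
leaf_state = {"s1": "A", "s2": "C", "s3": "A", "s4": "T"}
annotated, unions = fitch_phase1(tree, leaf_state)
labelled = fitch_phase2(annotated)
print(unions)    # 2
print(labelled)  # ('A', ('A', 'A', 'C'), ('A', 'A', 'T'))
```

The union count returned by phase 1 is the parsimony score for this character; for sequences of length m the same routine would be run once per position.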

11 Proof of Fitch’s Algorithm. We’ll show that Fitch’s algorithm minimizes the parsimony score of the leaf-labeled input tree. Definitions: For a leaf-labeled tree T, let T* be an optimal assignment of labels to the internal nodes of T, and let T*(v) be the label it assigns to internal node v. Let T_v be the subtree rooted at v.

12 Claim: Let R_i be the set of states kept at vertex i in the 1st phase. Then s ∈ R_i iff there exists an optimal assignment T_i* with T_i*(i) = s. Proof: By induction on the tree height h. Basis: h=1. I. If both children have the same state: zero changes. II. Otherwise: exactly one change. [Figure: a height-1 tree whose children are both A, and one whose children are A and B.]

13 Induction step: Assume correctness for height h and prove for h+1. Let p_1 and p_2 be the optimal costs of the subtrees rooted at i’s children. If the intersection of the children’s lists is not empty, then the optimal score is p_1+p_2, and it can be achieved by labeling i with any member of the intersection, and only in this way. Otherwise, the optimal score is p_1+p_2+1, and it can be achieved by labeling i with any member of the union of the lists, and only in this way. [Figure: children lists {A,B} and {C,D} give the union {A,B,C,D}; children lists {A,B} and {B,C} give the intersection {B}.]

14 Weighted Maximum Parsimony. Some mutations may be more probable than others. Hence, a natural generalization of the Maximum Parsimony problem is Weighted Parsimony. You’ll see it in the tutorial.

15 Weighted Parsimony (Sankoff’s algorithm). Weighted Parsimony score: Input: a tree with characters at the leaves, and a weight function on the mutations: c(a,b) is the weight of the mutation a → b. Output: an assignment of characters to internal vertices which minimizes the total weight of the mutations. The weighted parsimony score reduces to the parsimony score when c(a,a)=0 and c(a,b)=1 for all b other than a.

16 Weighted Parsimony on a Given Tree. Each position is independent and is computed separately, using dynamic programming. If i is a node with children j and k, then S(i,a) = min_b (S(j,b)+c(a,b)) + min_b' (S(k,b')+c(a,b')), where S(j,b) is the optimal score of the subtree rooted at j when j has the character b. [Figure: node i with children j and k, annotated with S(i,a), S(j,b) and S(k,b').]

17 Evaluating Parsimony Scores: dynamic programming on a given tree. Initialization: for each leaf i set S(i,a) = 0 if i is labeled by a, otherwise S(i,a) = ∞. Iteration: for each node i with children j and k: S(i,a) = min_x (S(j,x)+c(a,x)) + min_y (S(k,y)+c(a,y)). Termination: the cost of the tree is min_x S(r,x), where r is the root. Comment: to reconstruct an optimal assignment, we need to keep, in each node i and for each character a, two characters x, y that minimize the cost when i has character a.
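The recurrence above translates almost directly into code. The following is a sketch under assumptions made here (nested-tuple trees, a `cost(a, b)` callable, the name `sankoff`, and the small example); it returns only the table S(r, ·) at the root and does not keep the back-pointers needed to reconstruct an assignment.

```python
import math


def sankoff(tree, leaf_state, alphabet, cost):
    """Return {a: S(root, a)} for the subtree `tree`.

    `tree` is a leaf name (str) or a pair (left_subtree, right_subtree);
    `cost(a, b)` is the weight c(a, b) of the mutation a -> b.
    """
    if isinstance(tree, str):
        # Initialization: 0 for the observed character, infinity otherwise.
        return {a: (0 if leaf_state[tree] == a else math.inf) for a in alphabet}
    s_j = sankoff(tree[0], leaf_state, alphabet, cost)
    s_k = sankoff(tree[1], leaf_state, alphabet, cost)
    # Iteration: S(i,a) = min_x (S(j,x)+c(a,x)) + min_y (S(k,y)+c(a,y)).
    return {
        a: min(s_j[x] + cost(a, x) for x in alphabet)
        + min(s_k[y] + cost(a, y) for y in alphabet)
        for a in alphabet
    }


def unit_cost(a, b):
    """c(a,a)=0 and c(a,b)=1: reduces to the unweighted parsimony score."""
    return 0 if a == b else 1


# Hypothetical example: one character on a four-leaf tree.
tree = (("s1", "s2"), ("s3", "s4"))
leaf_state = {"s1": "A", "s2": "C", "s3": "A", "s4": "T"}
scores = sankoff(tree, leaf_state, "ACGT", unit_cost)
print(min(scores.values()))  # 2 (termination: min over root characters)
```

Each internal node fills k entries, each a minimum over k candidates per child, which is where the k² factor in the running time comes from.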

18 Cost of Evaluating Parsimony for binary trees. For a tree with n nodes and a single character with k values, the complexity is O(nk²). When there are m such characters, it is O(nmk²).

19 Is Maximum Parsimony a Reliable Criterion? The motivation for the Perfect Phylogeny and Maximum Parsimony methods comes from models where the characters are “significant”, and hence the number of observed mutations is likely to be as small as possible. When the characters are DNA sequences, common models of evolution assume that mutations are random events. A natural question is whether maximum parsimony is a good method for reconstructing phylogenies in such models. Next we formulate and discuss this question.

20 Probabilistic Models of Evolution. A simple (yet quite common) model of evolution, called Jukes-Cantor (JC), assumes: 1. Mutations at different “sites” are i.i.d. (independent and identically distributed). 2. On each edge, all mutations have the same probability. Other models usually assume 1, but give different probabilities to different types of mutations.

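21 The JC model: each edge (u,v) corresponds to a probabilistic mutation matrix P_uv, with rows indexed by the letter at u and columns by the letter at v:

|   | A | G | C | T |
|---|---|---|---|---|
| A | 1-3p | p | p | p |
| G | p | 1-3p | p | p |
| C | p | p | 1-3p | p |
| T | p | p | p | 1-3p |

p depends on the “length” of the edge.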

22 A “Model Tree”. A model tree in the JC model is an evolution tree which evolves according to the JC model. Formally, it consists of: 1. A directed tree T=(V,E). 2. A distribution of DNA letters at the root. 3. An assignment of JC transition matrices to the edges of T. The JC model (and other common models) assume that the distribution at the root is uniform: each letter occurs with probability 0.25. This distribution is preserved in all other vertices of the tree.
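That the uniform distribution is preserved along an edge can be checked numerically. The sketch below is only an illustration: the helper names `jc_matrix` and `push`, and the value p = 0.1, are assumptions chosen here.

```python
BASES = "AGCT"


def jc_matrix(p):
    """JC mutation matrix: 1-3p on the diagonal, p everywhere else."""
    return {a: {b: (1 - 3 * p if a == b else p) for b in BASES} for a in BASES}


def push(dist, P):
    """Distribution at the child, given the parent's distribution and P_uv."""
    return {b: sum(dist[a] * P[a][b] for a in BASES) for b in BASES}


P = jc_matrix(0.1)  # hypothetical edge with p = 0.1
uniform = {b: 0.25 for b in BASES}
skewed = {"A": 0.7, "G": 0.1, "C": 0.1, "T": 0.1}

print(push(uniform, P))  # stays uniform, up to floating-point rounding
print(push(skewed, P))   # a non-uniform distribution is pulled toward uniform
```

Each column of the matrix sums to 1 as well as each row, so 0.25·(1-3p) + 3·0.25·p = 0.25 for every letter.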

23 A “model quartet” in the JC model. [Figure: a rooted quartet with leaves A, B, C, D.] Each edge may have a different mutation probability.

24 Consistency of Reconstruction Algorithms. A tree reconstruction method (like maximum parsimony) is said to be “consistent” for a probabilistic model of evolution if the following holds for any phylogenetic tree which fits the model: when the sequence length goes to ∞, the reconstructed tree is w.h.p. the true tree. For the maximum parsimony method, this is equivalent to: the true tree is w.h.p. a most parsimonious tree.

25 Of specific interest: reconstructing quartets. [Figure: an unrooted quartet with leaves A, B, C, D.] Correct reconstruction of (undirected) quartets is equivalent to finding the split defined by the middle edge, (A,B;C,D).

26 Example: Checking Consistency of Maximum Parsimony on Quartet Reconstruction. Phase 1: Simulate evolution on the given quartet (500 DNA bases). [Figure: model quartet with leaves A, B, C, D.]

27 Phase 2: Find a most parsimonious tree for the sequences at the leaves. [Figure: the four leaf sequences A, B, C, D.]

28 MP is consistent for the given model tree if, w.h.p., the most parsimonious tree gives the correct split. As we will see next, Maximum Parsimony is not consistent for certain quartets. [Figure: the most parsimonious tree.]
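The two-phase experiment can be sketched as a small simulation. Everything specific here is an assumption for illustration: the leaf names 1 to 4, the true split 12|34, the per-edge probabilities in `p`, the seed, and the function names. In `p`, each value is the probability of changing to one specific other letter, so it must be at most 1/3.

```python
import random

BASES = "ACGT"
SPLITS = {"12|34": "1234", "13|24": "1324", "14|23": "1423"}


def mutate(base, p, rng):
    """One JC edge: with probability 3p, switch to one of the other letters."""
    if rng.random() < 3 * p:
        return rng.choice([b for b in BASES if b != base])
    return base


def simulate_quartet(n_sites, p, rng):
    """Phase 1: evolve n_sites i.i.d. sites on the model quartet 12|34."""
    columns = {leaf: [] for leaf in "1234"}
    for _ in range(n_sites):
        u = rng.choice(BASES)         # uniform letter at internal node u
        w = mutate(u, p["mid"], rng)  # the other internal node
        for leaf, parent in (("1", u), ("2", u), ("3", w), ("4", w)):
            columns[leaf].append(mutate(parent, p[leaf], rng))
    return {leaf: "".join(col) for leaf, col in columns.items()}


def site_score(x, y, z, w):
    """Fitch score of one site on the quartet with split (x,y | z,w)."""
    left = ({x} & {y}) or {x, y}
    right = ({z} & {w}) or {z, w}
    return (x != y) + (z != w) + (0 if left & right else 1)


def parsimony_scores(seqs):
    """Phase 2: total parsimony score of each of the three topologies."""
    n = len(seqs["1"])
    return {name: sum(site_score(*(seqs[leaf][s] for leaf in order))
                      for s in range(n))
            for name, order in SPLITS.items()}


rng = random.Random(0)
p = {"1": 0.2, "2": 0.01, "3": 0.01, "4": 0.2, "mid": 0.01}
seqs = simulate_quartet(500, p, rng)
scores = parsimony_scores(seqs)
print(scores)
print(min(scores, key=scores.get))  # the split maximum parsimony would pick
```

Repeating the run over many seeds and sequence lengths, and counting how often the selected split equals 12|34, gives an empirical view of consistency for a chosen set of edge probabilities.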

29 Consistency Question for Maximum Parsimony. Assuming the JC model, the consistency question for the Maximum Parsimony method for a given model tree is the following: assume that the mutations along the edges occurred according to the JC model. Is the true tree likely to have a minimum parsimony score?

30 Inconsistency of Maximum Parsimony. Maximum Parsimony is not consistent for the JC model and other similar probabilistic models of DNA evolution. In such models there are some scenarios of evolution in which the most parsimonious tree is w.h.p. different from the true tree. We illustrate this on quartets. A quartet on 4 species has 3 possible topologies (splits): (1,2;3,4), (1,3;2,4) and (1,4;2,3).

31 A quartet which is unlikely to be reconstructed by maximum parsimony. Consider the following model quartet, where the probability of a substitution is proportional to edge length. [Figure: model quartet with leaves 1, 2, 3, 4 and the character A at the internal vertices.] In this tree, the characters at 2 and 3 are w.h.p. the same as at the origin, while those at 1 and 4 are more likely to be different.

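32 Parsimony may be useless/misleading for reconstructing the true tree. Assume the (likely) scenario where leaves 2 and 3 are the same (A). There are 4 patterns of substitution for leaves 1 and 4: I. both are A; II. one is A and the other differs (e.g. A, G); III. both differ from A and from each other (e.g. C, G); IV. both differ from A and are equal to each other (e.g. G, G). [Figure: the model quartet with leaves 1, 2, 3, 4.]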

33 Case I: all topologies get the same parsimony score. [Figure: all four leaves are A; each of the topologies (1,2;3,4), (1,3;2,4), (1,4;2,3) has score 0.]

34 Case II: all topologies get the same score. [Figure: leaf 1 is G and leaves 2, 3, 4 are A; each of the topologies (1,2;3,4), (1,3;2,4), (1,4;2,3) has score 1.]

35 Case III: again all topologies get the same score. [Figure: leaf 1 is G, leaf 4 is C and leaves 2, 3 are A; each of the topologies (1,2;3,4), (1,3;2,4), (1,4;2,3) has score 2.]

36 Case IV: the most parsimonious topology is wrong. [Figure: leaves 1 and 4 are C and leaves 2, 3 are A; the topologies (1,2;3,4) and (1,3;2,4) have score 2, while (1,4;2,3) has score 1.]

37 Parsimony is useful only in the least likely cases. For the most parsimonious tree to be the correct tree, it is necessary that 2 and 3 have different characters, which is less likely than all other cases. [Figure: the model quartet with different characters (A and C) at leaves 2 and 3.]
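The scores quoted for cases I to IV can be recomputed mechanically. The sketch below scores one site pattern on each of the three quartet topologies using the Fitch rule; the function names and the encoding of a pattern as a 4-letter string (characters of leaves 1, 2, 3, 4 in that order) are assumptions made here.

```python
def site_score(x, y, z, w):
    """Fitch score of one site on the quartet with split (x,y | z,w)."""
    left = ({x} & {y}) or {x, y}
    right = ({z} & {w}) or {z, w}
    return (x != y) + (z != w) + (0 if left & right else 1)


def split_scores(pattern):
    """Score of a pattern (leaves 1,2,3,4) on each of the three topologies."""
    c1, c2, c3, c4 = pattern
    return {
        "12|34": site_score(c1, c2, c3, c4),
        "13|24": site_score(c1, c3, c2, c4),
        "14|23": site_score(c1, c4, c2, c3),
    }


cases = {"I": "AAAA", "II": "GAAA", "III": "GAAC", "IV": "CAAC"}
for name, pattern in cases.items():
    print(name, pattern, split_scores(pattern))
# I   AAAA {'12|34': 0, '13|24': 0, '14|23': 0}
# II  GAAA {'12|34': 1, '13|24': 1, '14|23': 1}
# III GAAC {'12|34': 2, '13|24': 2, '14|23': 2}
# IV  CAAC {'12|34': 2, '13|24': 2, '14|23': 1}

# A pattern in which leaves 2 and 3 differ is the only kind that favors 12|34.
print(split_scores("AACC"))  # {'12|34': 1, '13|24': 2, '14|23': 2}
```

Only case IV and patterns with different characters at 2 and 3 distinguish between topologies, and case IV votes for the split that groups leaves 1 and 4.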

38 Another problem with Maximum Parsimony (and other character-based algorithms): efficiency. There are no known efficient algorithms for solving the “big” problem for Maximum Parsimony or Perfect Phylogeny (both are known to be NP-hard). Mainly for this reason, the most used approaches for solving the big problem are distance-based methods.

39 Distance-based Methods for Constructing Phylogenies. This approach attempts to overcome the two weaknesses of maximum parsimony: 1. It starts by estimating inter-taxa distances from a well-defined statistical model of evolution (distances correspond to probabilities of change). 2. It provides efficient algorithms for the big problem. Basic idea: the differences between species (usually represented by sequences of characters) are transformed to numerical distances, and an edge-weighted tree realizing these distances is constructed.

40 Distance-Based Reconstruction. Compute distances between all taxon pairs. Find an edge-weighted tree that best describes the distances. [Figure: an example with the numeric labels 4, 5, 7, 2, 1, 2, 10, 6, 1.]

41 Distance-based methods for constructing phylogenies. Common issues: 1. Evolutionary model: molecular clocks vs. variable rates of evolution. 2. Algorithms for exact distances: these do not handle real data. 3. Algorithms for noisy distances.

42 Data → Distances → Trees. 1. Modeling question: given the data (e.g. DNA sequences of the taxa), how do we define distances between taxa? 2. Algorithmic question: decide if the distances define a tree (ultrametric or additive, to be defined later), and if so, construct that tree. 3. In reality, the computed distances are noisy, so we need the algorithm to return a tree which approximates the distances of the input data. In the following we shall study items 2 and 1, and briefly discuss item 3.

43 Ultrametric and Tree Metric. A distance metric on a set M of L objects is a function d (represented by a symmetric matrix) satisfying: 1. d(i,i)=0, and for i≠j, d(i,j)>0. 2. d(i,j)=d(j,i). 3. For all i,j,k it holds that d(i,k) ≤ d(i,j)+d(j,k). A metric is ultrametric if it corresponds to distances between leaves of a tree which admits a molecular clock. It is a tree metric, or additive, if it corresponds to distances between nodes in a weighted tree.

44 1st model: Molecular Clock → Ultrametric Trees. The molecular clock assumes a constant rate of evolution. Namely, the distances from any extinct taxon (internal vertex) to all its current descendants are identical. A rooted tree satisfying this property is called ultrametric.

45 Ultrametric trees. Definition: An ultrametric tree is a rooted weighted tree all of whose leaves are at the same depth. Basic property: Define the height of the leaves to be 0. Then edge weights can be represented by the heights of internal vertices. [Figure: an ultrametric tree on leaves A, E, D, C, B (height 0), drawn once with edge weights and once with internal-vertex heights 8, 5, 3, 3.]

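46 Least Common Ancestor and distances in an Ultrametric Tree. Let LCA(i,j) denote the least common ancestor of leaves i and j. Let height(LCA(i,j)) be its distance from the leaves, and dist(i,j) be the distance from i to j. Observation: For any pair of leaves i, j in an ultrametric tree: height(LCA(i,j)) = 0.5 dist(i,j).

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 0 | 8 | 8 | 5 | 3 |
| B |   | 0 | 3 | 8 | 8 |
| C |   |   | 0 | 8 | 8 |
| D |   |   |   | 0 | 5 |
| E |   |   |   |   | 0 |

[Figure: the ultrametric tree on leaves A, E, D, C, B with internal-vertex heights 8, 5, 3, 3.]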

47 Ultrametric Matrices. Definition: A distance matrix* U of dimension L × L is ultrametric iff for each 3 indices i, j, k: U(i,j) ≤ max {U(i,k), U(j,k)}. Example: U(i,j) = 9, U(i,k) = 6, U(j,k) = 9. Theorem: The following conditions are equivalent for an L × L distance matrix U: 1. U is an ultrametric matrix. 2. There is an ultrametric tree with L leaves such that for each pair of leaves i,j: U(i,j) = height(LCA(i,j)) = ½ dist(i,j). * Recall: a distance matrix is a symmetric matrix with positive non-diagonal entries and 0 diagonal entries which satisfies the triangle inequality.
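The three-index condition in the definition can be tested directly in O(L³) time. A minimal sketch, assuming the matrix is given as a full symmetric list of lists; the function name `is_ultrametric` and the counterexample `V` are made up here, while `U` is the 5 × 5 matrix from the A to E example.

```python
from itertools import permutations


def is_ultrametric(U):
    """Check U(i,j) <= max(U(i,k), U(j,k)) for every three distinct indices."""
    n = len(U)
    return all(U[i][j] <= max(U[i][k], U[j][k])
               for i, j, k in permutations(range(n), 3))


# Rows and columns in the order A, B, C, D, E.
U = [
    [0, 8, 8, 5, 3],
    [8, 0, 3, 8, 8],
    [8, 3, 0, 8, 8],
    [5, 8, 8, 0, 5],
    [3, 8, 8, 5, 0],
]
# Hypothetical counterexample: 3 > max(1, 1).
V = [
    [0, 1, 3],
    [1, 0, 1],
    [3, 1, 0],
]
print(is_ultrametric(U))  # True
print(is_ultrametric(V))  # False
```

Equivalently, among the three values U(i,j), U(i,k), U(j,k) the maximum is always attained at least twice.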

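48 Ultrametric tree ⇒ Ultrametric matrix. If there is an ultrametric tree such that U(i,j) = ½ dist(i,j), then U is an ultrametric matrix. This follows from properties of Least Common Ancestors in trees: for leaves i, j, k arranged as in the figure, U(k,i) = U(j,i) ≥ U(k,j). [Figure: a tree with leaves i, j, k.]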

49 Ultrametric matrix ⇒ Ultrametric tree. The proof is based on the two observations below. Definition: Let U be an L × L matrix, and let S ⊆ {1,...,L}. U[S] is the submatrix of U consisting of the rows and columns with indices from S. Observation 1: U is ultrametric iff for every S ⊆ {1,...,L}, U[S] is ultrametric. Observation 2: If U is ultrametric and max_{i,j} U(i,j) = M, then M appears in every row of U. Example: if U(j,k) = M, then for any row i at least one of U(i,j), U(i,k) must be M.

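50 Ultrametric matrix ⇒ Ultrametric tree: proof by induction. U is an ultrametric matrix ⇒ U has an ultrametric tree: by induction on L, the size of U. Basis: L=1: T is a leaf. L=2: T is a tree with two leaves. [Figure: for L=1, the matrix (0) and a single leaf i; for L=2, the matrix with U(i,j) = 9 and a tree whose root, labeled 9, has the leaves i and j as children.]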

51 Induction step: L>2. Use the 1st row to split the set {1,…,L} into two subsets: S_1 = {i: U(1,i) = M}, S_2 = {1,…,L} − S_1 (note: 0 < |S_i| …

52 Induction step. By Observation 1, U_1 = U[S_1] and U_2 = U[S_2] are ultrametric. Let M_1 (resp. M_2) be the maximal entry in U_1 (resp. U_2). Note that M_1 ≤ M and M_2 < M (M_2 is the 2nd largest element in row 1; if M_2 = 0 then T_2 is a leaf). By induction there are ultrametric trees T_1 and T_2 for U_1 and U_2. Join T_1 and T_2 into T with a new root as shown. [Figure: a root labeled M whose children are the roots of T_1 (labeled M_1) and T_2 (labeled M_2).]

53 Proof (end). Need to prove: T is an ultrametric tree for U, i.e., U(i,j) is the label of the LCA of i and j in T. If i and j are in the same subtree, this holds by induction. Otherwise the label of LCA(i,j) is M (since they are in different subtrees). Also, [U(1,i) = M and U(1,j) ≠ M] ⇒ U(i,j) = M. [Figure: the tree T with root M and subtrees T_1 (root M_1) and T_2 (root M_2), with i and j in different subtrees.]
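The inductive proof is constructive, and the construction can be written as a short recursion. This is a sketch, not the lecture's code: the name `build_tree`, the nested-tuple output format `(label, T1, T2)`, and the use of the A to E example matrix are assumptions made here. It also assumes the input really is an ultrametric distance matrix with positive off-diagonal entries; otherwise the split may be degenerate.

```python
def build_tree(U, names, idx=None):
    """Build an ultrametric tree for U restricted to the indices `idx`.

    Leaves are returned as names; an internal vertex is (label, T1, T2),
    where label is the height of the vertex (the value M of the proof).
    """
    if idx is None:
        idx = list(range(len(U)))
    if len(idx) == 1:  # basis: a single leaf
        return names[idx[0]]
    first = idx[0]
    # By Observation 2, the maximum of the whole submatrix appears in this row.
    M = max(U[first][j] for j in idx)
    s1 = [j for j in idx if U[first][j] == M]  # S1: at distance M from `first`
    s2 = [j for j in idx if U[first][j] != M]  # S2: the rest, includes `first`
    return (M, build_tree(U, names, s1), build_tree(U, names, s2))


# Rows and columns in the order A, B, C, D, E.
U = [
    [0, 8, 8, 5, 3],
    [8, 0, 3, 8, 8],
    [8, 3, 0, 8, 8],
    [5, 8, 8, 0, 5],
    [3, 8, 8, 5, 0],
]
tree = build_tree(U, "ABCDE")
print(tree)  # (8, (3, 'C', 'B'), (5, 'D', (3, 'E', 'A')))
```

The internal labels 8, 5, 3, 3 match the internal-vertex heights of the example tree, and the label of the LCA of any two leaves equals the corresponding matrix entry.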
