Phylogenetic Trees (2), Lecture 13. Based on: Durbin et al. 7.4, Gusfield 17.1-17.3, Setubal & Meidanis 6.1.



2 Ultrametric trees as special weighted trees
Definition: An ultrametric tree is a rooted weighted tree all of whose leaves are at the same distance from the root. Edge weights can be represented by labeling each internal vertex with its height, i.e. its distance from the leaves below it.
Note: each internal vertex has at least two children.
[Figure: an ultrametric tree with leaves A, E, D, C, B.]

3 LCA and distances in an ultrametric tree
Let LCA(i,j) denote the lowest common ancestor of leaves i and j. Let D(i,j) be the height of LCA(i,j), and let dist(i,j) be the distance from i to j in the tree.
Claim: For any pair of leaves i, j in an ultrametric tree: D(i,j) = 0.5 dist(i,j).
Example (the heights D(i,j) for a tree with leaf order A, E, D, C, B):

      A  B  C  D  E
  A   0  8  8  5  3
  B      0  3  8  8
  C         0  8  8
  D            0  5
  E               0

4 Identifying ultrametric distances
Definition: A symmetric L × L matrix D is ultrametric iff for every three indices i, j, k: D(i,j) ≤ max{D(i,k), D(j,k)}.
Example: D(i,j) = 9, D(i,k) = 6, D(j,k) = 9.
Theorem: The following conditions are equivalent for an L × L symmetric matrix D:
1. D is ultrametric.
2. There is an ultrametric tree with L leaves such that for each pair of leaves i, j: D(i,j) = height(LCA(i,j)) = ½ dist(i,j).
Note: the condition D(i,j) ≤ max{D(i,k), D(j,k)} is easier to check than the four-point condition. The theorem therefore implies that ultrametric sets are easier to characterize than arbitrary additive sets.
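The definition above translates directly into a brute-force check over all triples. The following is a minimal sketch, assuming the matrix is given as a full symmetric list of lists; the names is_ultrametric, SLIDE3 and ADDITIVE are illustrative and not part of the lecture. SLIDE3 is the A..E matrix from slide 3 and ADDITIVE is the input matrix D from slide 13.

```python
from itertools import permutations

def is_ultrametric(D):
    """Check D(i,j) <= max(D(i,k), D(j,k)) for every triple of distinct indices."""
    n = len(D)
    return all(D[i][j] <= max(D[i][k], D[j][k])
               for i, j, k in permutations(range(n), 3))

# Slide 3 matrix, rows and columns in the order A, B, C, D, E.
SLIDE3 = [[0, 8, 8, 5, 3],
          [8, 0, 3, 8, 8],
          [8, 3, 0, 8, 8],
          [5, 8, 8, 0, 5],
          [3, 8, 8, 5, 0]]

# Slide 13 input matrix D (order a, b, c, d): additive, but not ultrametric.
ADDITIVE = [[0, 3, 9, 7],
            [3, 0, 8, 6],
            [9, 8, 0, 6],
            [7, 6, 6, 0]]

print(is_ultrametric(SLIDE3), is_ultrametric(ADDITIVE))
```

The check takes O(L³) time, which is cheaper than testing all quadruples for the four-point condition.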

5 Properties of ultrametric matrices used in the proof of the theorem
Definition: Let D be an L × L matrix, and let S ⊆ {1,...,L}. D[S] is the submatrix of D consisting of the rows and columns with indices from S.
Claim 1: D is ultrametric iff for every S ⊆ {1,...,L}, D[S] is ultrametric.
Claim 2: If D is ultrametric and max_{i,j} D(i,j) = m, then m appears in every row of D.
Illustration of Claim 2: if D(j,k) = m, then in any row i at least one of D(i,j), D(i,k) must be m, because m = D(j,k) ≤ max{D(i,j), D(i,k)} ≤ m.

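6 Ultrametric tree ⇒ ultrametric matrix
If there is an ultrametric tree such that D(i,j) = height(LCA(i,j)) = ½ dist(i,j), then D is an ultrametric matrix. This follows from properties of lowest common ancestors in trees: among any three leaves i, j, k, two of the three pairs have the same LCA, and the LCA of the third pair is at or below it. For example, if j and k are the closest pair, then D(k,i) = D(j,i) ≥ D(k,j).
[Figure: three leaves i, j, k, with j and k joined below their common ancestor with i.]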

7 Ultrametric matrix ⇒ ultrametric tree: induction base
Claim: if D is an ultrametric matrix, then D has an ultrametric tree. The proof is by induction on L, the size of D.
Basis:
- L = 1: T is a single leaf.
- L = 2: T is a tree with two leaves, i and j, whose root is labeled D(i,j) (in the slide's example, D(i,j) = 9).

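8 Induction step
Induction step: L > 2. Let m be the maximum entry of D, let S_1 = {i : D(1,i) = m}, and let S_2 = {1,...,L} − S_1 (note: 0 < |S_1| < L, since by Claim 2 m appears in row 1, and leaf 1 itself is in S_2).
By Claim 1, D[S_1] and D[S_2] are ultrametric.
Construct a tree T_1 for S_1, with root labeled m_1 ≤ m.
Construct a tree T_2 for S_2, with root labeled m_2 < m (if m_2 = 0 then T_2 is a leaf).
Join T_1 and T_2 into a tree T with a root labeled m.
[Figure: the construction when m_1 = m: the root labeled m = m_1 is shared, and T_2, with root label m_2 < m, hangs below it.]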
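The inductive construction can be sketched as a short recursive function. This is an illustrative sketch under stated assumptions, not the lecture's implementation: leaves are integer indices, an internal vertex is a pair (height, list of children), off-diagonal entries are assumed positive, and the input is assumed to be ultrametric (so, by Claim 2, the maximum of any row equals the maximum of the submatrix). The names build_ultrametric_tree and lca_heights are hypothetical. When m_1 = m the sketch merges T_2 into the root of T_1 instead of adding a zero-length edge.

```python
def build_ultrametric_tree(D, leaves=None):
    """Recursively build a tree whose LCA heights reproduce the ultrametric matrix D."""
    if leaves is None:
        leaves = list(range(len(D)))
    if len(leaves) == 1:
        return leaves[0]                      # a single leaf
    first = leaves[0]
    m = max(D[first][i] for i in leaves)      # Claim 2: row maximum = overall maximum
    s1 = [i for i in leaves if D[first][i] == m]
    s2 = [i for i in leaves if D[first][i] != m]
    t1 = build_ultrametric_tree(D, s1)
    t2 = build_ultrametric_tree(D, s2)
    if isinstance(t1, tuple) and t1[0] == m:  # case m_1 = m: share the root
        return (m, t1[1] + [t2])
    return (m, [t1, t2])                      # case m_1 < m: new root labeled m

def lca_heights(tree, table=None):
    """Return (leaves below tree, dict mapping leaf pairs to their LCA height)."""
    if table is None:
        table = {}
    if not isinstance(tree, tuple):
        return [tree], table
    height, children = tree
    groups = [lca_heights(child, table)[0] for child in children]
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for i in groups[a]:
                for j in groups[b]:
                    table[(i, j)] = table[(j, i)] = height
    return [leaf for group in groups for leaf in group], table

# Slide 3 matrix, rows and columns in the order A, B, C, D, E (indices 0..4).
SLIDE3 = [[0, 8, 8, 5, 3],
          [8, 0, 3, 8, 8],
          [8, 3, 0, 8, 8],
          [5, 8, 8, 0, 5],
          [3, 8, 8, 5, 0]]
tree = build_ultrametric_tree(SLIDE3)
_, heights = lca_heights(tree)
```

Reading the LCA heights back from the constructed tree and comparing them with D is a direct way to test the correctness claim D(i,j) = label of LCA(i,j).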

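9 Correctness proof
Need to prove: T is an ultrametric tree for D, i.e. D(i,j) is the label of the LCA of i and j in T.
If i and j are in the same subtree, this holds by induction.
Otherwise, say i ∈ S_1 and j ∈ S_2, so D(1,i) = m and D(1,j) < m. Then m = D(1,i) ≤ max{D(1,j), D(i,j)}, hence D(i,j) = m, which is the label of the root of T, the LCA of i and j.
[Figure: T_1 and T_2 joined under the root labeled m.]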

10 Complexity analysis
Let f(L) be the time complexity for an L × L matrix. f(1) = f(2) = constant. For L > 2:
- Constructing S_1 and S_2: O(L). Let |S_1| = k, |S_2| = L − k.
- Constructing T_1 and T_2: f(k) + f(L − k).
- Joining T_1 and T_2 into T: constant.
Thus we have: f(L) ≤ max_k [f(k) + f(L − k)] + cL, 0 < k < L.
f(L) = cL² satisfies the above.
Need an appropriate data structure!
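The claim that cL² satisfies the recurrence can be verified by induction. The derivation below is a sketch that assumes c is chosen large enough to cover the constant base cases f(1) and f(2); it uses the fact that k² + (L−k)² is convex in k, so its maximum over 0 < k < L is attained at k = 1 (or k = L − 1).

```latex
\begin{align*}
f(L) &\le \max_{0<k<L}\bigl[f(k) + f(L-k)\bigr] + cL \\
     &\le \max_{0<k<L}\bigl[ck^2 + c(L-k)^2\bigr] + cL \\
     &= c\bigl[1 + (L-1)^2\bigr] + cL \\
     &= cL^2 - cL + 2c \;\le\; cL^2 \qquad \text{for } L \ge 2 .
\end{align*}
```

The worst case is the unbalanced split k = 1, which is what makes the bound quadratic rather than O(L log L).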

11 Recall: identifying additive trees via ultrametric trees
We solve the additive tree problem by reducing it to the ultrametric problem as follows:
1. Given an input matrix D = D(i,j) of distances, transform it to a matrix D’ = D’(i,j), where D’(i,j) is the height of the LCA of i and j in the corresponding ultrametric tree T’.
2. Construct the ultrametric tree, T’, for D’.
3. Reconstruct the additive tree T from T’.

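12 How D’ is constructed from D
D’(i,j) should be the height of the lowest common ancestor of i and j in T’, the ultrametric tree obtained by hanging T from a leaf k, where M = max_i d(k,i).
Let m denote the LCA of i and j when T is rooted at k. Then D’(i,j) = M − d(k,m), where d(k,m) is computed by: d(k,m) = ½ (d(k,i) + d(k,j) − d(i,j)).
[Figure: the tree on leaves a, b, c, d hung from one of its leaves.]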
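The transformation can be sketched in a few lines. The sketch assumes the standard expression for the distance from k to the LCA m of i and j in an additive tree, d(k,m) = ½(d(k,i) + d(k,j) − d(i,j)), and assumes M is taken as the largest distance from k; the function name to_ultrametric is hypothetical. The example data are the two matrices shown on slide 13, with k = a.

```python
def to_ultrametric(d, k):
    """Map additive distances d to LCA heights D'(i,j) = M - d(k, LCA(i,j))."""
    n = len(d)
    M = max(d[k])                                     # assumption: M = max_i d(k,i)
    Dp = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d_k_lca = (d[k][i] + d[k][j] - d[i][j]) / 2
                Dp[i][j] = M - d_k_lca
    return Dp

# Slide 13 input matrix D, rows and columns in the order a, b, c, d.
D_INPUT = [[0, 3, 9, 7],
           [3, 0, 8, 6],
           [9, 8, 0, 6],
           [7, 6, 6, 0]]
D_PRIME = to_ultrametric(D_INPUT, k=0)
```

With k = a the result agrees with the D’ matrix of slide 13 (M = 9), which also serves as a consistency check on the formula.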

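13 The transformation D → D’ → T’ → T
D (input distances):

      a  b  c  d
  a   0  3  9  7
  b      0  8  6
  c         0  6
  d            0

D’ (LCA heights in T’, with M = 9):

      a  b  c  d
  a   0  9  9  9
  b      0  7  7
  c         0  4
  d            0

[Figure: the additive tree T and the ultrametric tree T’ on leaves a, b, c, d.]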

14 Character-based methods for constructing phylogenies
In this approach, trees are constructed by comparing the characters of the corresponding species. Characters may be morphological (e.g. teeth structure) or molecular (homologous DNA sequences). One common approach is Maximum Parsimony.
Assumptions:
- Independence of characters (no interactions).
- The best tree is the one in which the minimal number of changes takes place.

15 1. Maximum Parsimony
Input: four nucleotide sequences, AAG, AAA, GGA, AGA, taken from four species.
Question: Which evolutionary tree best explains these sequences?
One answer (the parsimony principle): Pick a tree that has a minimum total number of substitutions of symbols between species and their ancestors in the phylogenetic tree.
[Figure: a tree over AGA, AAA, GGA, AAG with an internal node labeled AAA; total #substitutions = 4.]

16 Example continued
There are many possible trees. For example:
[Figure, left tree: node labels AGA, GGA, AAA, AAG, AAA, AGA, AAA; total #substitutions = 3.]
[Figure, right tree: node labels GGA, AAA, AGA, AAG, AAA; total #substitutions = 4.]
The left tree is preferred over the right tree. The total number of changes is called the parsimony score.

17 Example with one letter
- Suppose we have five species, such that three have ‘C’ and two have ‘T’ at a specified position.
- The minimal tree has one evolutionary change.
[Figure: a tree whose leaves are labeled C, C, C, T, T, with a single change between T and C on one edge.]

18 Extension to many letters
What is the parsimony score of a tree whose leaves are the following sequences?

  Species    Label   Sequence
  Aardvark   A       CAGGTA
  Bison      B       CAGACA
  Chimp      C       CGGGTA
  Dog        D       TGCACT
  Elephant   E       TGCGTA

We do it character by character; each score is computed independently of the others.
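Scoring a tree position by position can be sketched as follows. For the per-position count this sketch uses Fitch's set-based algorithm for unit costs, which is a swapped-in technique and not taken from the slides. The tree encoding (a leaf is a sequence string, an internal node is a pair of subtrees) and the two example topologies are assumptions made for illustration; the sequences are those of the four-species example on slides 15 and 16.

```python
def fitch(tree):
    """Return (candidate states at the root, number of changes) for one character."""
    if isinstance(tree, str):
        return {tree}, 0
    (left_states, left_cost), (right_states, right_cost) = fitch(tree[0]), fitch(tree[1])
    common = left_states & right_states
    if common:                                   # children can agree: no new change
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def column_tree(tree, pos):
    """Project a tree of sequences onto the single character at position pos."""
    if isinstance(tree, str):
        return tree[pos]
    return tuple(column_tree(child, pos) for child in tree)

def parsimony_score(tree, length):
    """Sum the independent per-position scores."""
    return sum(fitch(column_tree(tree, pos))[1] for pos in range(length))

# Two hypothetical topologies over the four sequences AAG, AAA, GGA, AGA.
TREE_1 = (("AGA", "GGA"), ("AAA", "AAG"))
TREE_2 = (("AAA", "GGA"), ("AGA", "AAG"))
print(parsimony_score(TREE_1, 3), parsimony_score(TREE_2, 3))
```

Under these assumed topologies the first tree needs 3 substitutions and the second needs 4, matching the totals quoted in the four-species example.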

19 Weighted parsimony scores
Weighted parsimony score:
- Each change is weighted by a score c(a,b).
- The weighted parsimony score reduces to the parsimony score when c(a,a) = 0 and c(a,b) = 1 for all b ≠ a.

20 Evaluating weighted parsimony scores
Each position is independent and is computed by itself. Use dynamic programming on a given tree.
S(k,a) = the minimum score of the subtree rooted at k when k has character a.
- If k is a node with children i and j, then S(k,a) = min_x (S(i,x) + c(a,x)) + min_y (S(j,y) + c(a,y)).
[Figure: node k with children i and j, annotated with S(k,a), S(i,x), S(j,y).]

21 Evaluating parsimony scores
Dynamic programming on a given tree.
Initialization:
- For each leaf i, set S(i,a) = 0 if i is labeled by a; otherwise S(i,a) = ∞.
Iteration:
- If k is a node with children i and j, then S(k,a) = min_x (S(i,x) + c(a,x)) + min_y (S(j,y) + c(a,y)).
Termination:
- The cost of the tree is min_x S(r,x), where r is the root.
Comment: To reconstruct an optimal assignment, we need to keep, in each node k and for each character a, the two characters x, y that attain the minimum when k has character a.
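The three phases above (initialization, iteration, termination) can be sketched for a single character as follows. The tree encoding (a leaf is a one-letter string, an internal node is a pair of subtrees), the function names, and the example tree are illustrative assumptions; the default cost function is the unit cost, under which the weighted score reduces to the plain parsimony score.

```python
import math

def subtree_scores(tree, alphabet, cost):
    """Return {a: S(k,a)} for the root k of tree."""
    if isinstance(tree, str):
        # Initialization: 0 for the observed character, infinity otherwise.
        return {a: (0 if a == tree else math.inf) for a in alphabet}
    left = subtree_scores(tree[0], alphabet, cost)
    right = subtree_scores(tree[1], alphabet, cost)
    # Iteration: S(k,a) = min_x (S(i,x) + c(a,x)) + min_y (S(j,y) + c(a,y)).
    return {a: min(left[x] + cost(a, x) for x in alphabet)
               + min(right[y] + cost(a, y) for y in alphabet)
            for a in alphabet}

def weighted_parsimony(tree, alphabet="ACGT", cost=lambda a, b: 0 if a == b else 1):
    """Termination: the cost of the tree is min_x S(r,x) at the root r."""
    return min(subtree_scores(tree, alphabet, cost).values())

# A hypothetical tree for the one-letter example: three species with C, two with T.
ONE_LETTER_TREE = (("C", "C"), ("C", ("T", "T")))
print(weighted_parsimony(ONE_LETTER_TREE))
```

For this assumed topology the score is 1, in line with the one-letter example. Each internal node costs O(k²) work for an alphabet of size k, which is where the k² factor in the overall complexity comes from.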

22 Cost of evaluating parsimony for binary trees
If there are n nodes, m characters, and k possible values for each character, then the complexity is O(nmk²). Of course, we still need to search over the possible trees and find the best one. One usually resorts to heuristic search techniques.

23 2. Perfect Phylogeny
Data on species is given by a character state matrix. Cell (p,i) has value j iff character i of object (species) p has state j.
Goal: construct an evolutionary tree for the species.

  Object   c1  c2  c3  c4  c5
  A         1   1   2   0   0
  B         2   0   1   2   1
  C         3   2   3   3   1
  D         0   3   4   1   0
  E         1   1   0   0   1

24 Motivation: Evolution Tree Internal nodes correspond to speciation events, where some character (attribute) is acquired. Assumptions: 1. No reversals (characters are not lost) 2. No convergences (a character is created only once)

25 Perfect phylogeny for a 0-1 matrix
A 0-1 matrix: each character is either 0 (does not exist) or 1 (exists).
A perfect phylogeny for the matrix is a tree T with no convergences and no reversals:
- Each of the n objects labels exactly one leaf of T.
- Each of the m characters labels exactly one edge of T.
- Object p has exactly the characters labeling the path from p to the root.

  Object   1  2  3  4  5
  A        1  1  0  0  0
  B        0  0  1  0  0
  C        1  1  0  0  1
  D        0  0  1  1  0
  E        0  1  0  0  0

[Figure: a perfect phylogeny for the matrix, with leaves A, E, D, C, B.]

26 The (binary) perfect phylogeny problem
Problem: Given a 0-1 matrix M, determine whether it has a perfect phylogeny, and construct one if it does.
(Note: edges are labeled by characters; an edge labeled i represents changing the state of character i from 0 to 1.)

  Object   1  2  3  4  5
  A        1  1  0  0  0
  B        0  0  1  0  0
  C        1  1  0  0  1
  D        0  0  1  1  0
  E        0  1  0  0  0

[Figure: a perfect phylogeny for the matrix, with leaves A, E, D, C, B.]

27 Solution to the perfect phylogeny problem
Definition: Given a 0-1 matrix M, O_k = {j : M_jk = 1}, i.e. O_k is the set of objects that have character k.
Theorem: M has a perfect phylogenetic tree iff the sets {O_i} are laminar, i.e. for all i, j, either O_i and O_j are disjoint, or one includes the other.

Laminar:

  Object   1  2  3  4  5
  A        1  1  0  0  0
  B        0  0  1  0  0
  C        1  1  0  0  1
  D        0  0  1  1  0
  E        (row missing in the transcript)

Not laminar:

  Object   1  2  3  4  5
  A        1  1  0  0  0
  B        0  0  1  0  1
  C        1  1  0  0  1
  D        0  0  1  1  0
  E        0  1  0  0  1
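The laminarity condition of the theorem is easy to test directly. The sketch below assumes the matrix is stored as a dictionary from object names to rows of 0/1 values; the names object_sets and is_laminar are illustrative. The first example is the matrix shown with the perfect phylogeny definition (slide 25) and the second is the "not laminar" matrix above.

```python
from itertools import combinations

def object_sets(M):
    """Return the list [O_1, ..., O_m], where O_k is the set of objects with character k."""
    n_chars = len(next(iter(M.values())))
    return [{obj for obj, row in M.items() if row[k] == 1} for k in range(n_chars)]

def is_laminar(sets):
    """True iff every two sets are disjoint or one includes the other."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))

SLIDE25 = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 0], "C": [1, 1, 0, 0, 1],
           "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 0]}
NOT_LAMINAR = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 1], "C": [1, 1, 0, 0, 1],
               "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 1]}
print(is_laminar(object_sets(SLIDE25)), is_laminar(object_sets(NOT_LAMINAR)))
```

In the second matrix, O_3 = {B, D} and O_5 = {B, C, E} share B while neither includes the other, so the family is not laminar.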

28 Proof
(⇒): Assume M has a perfect phylogeny, and let i, j be given. Consider the edges labeled i and j.
Case 1: There is a root-to-leaf path containing both edges. Then one of O_i, O_j is included in the other (characters 2 and 1 in the figure).
Case 2: Not case 1. Then O_i and O_j are disjoint (characters 2 and 3 in the figure).
[Figure: the perfect phylogeny with leaves A, E, D, C, B.]

29 Proof (cont.)
(⇐): Assume that for all i, j, either O_i and O_j are disjoint, or one includes the other. We prove by induction on the number of characters that M has a perfect phylogenetic tree.
Basis: one character. Then there are at most two distinct objects, one with and one without this character.
[Figure: the one-character matrix with A = 1 and B = 0, and its tree with a single edge labeled 1.]

30 Proof (cont.)
(⇐), induction step: Assume correctness for n−1 characters, and consider a matrix with n characters (non-zero columns). WLOG assume that O_1 is not contained in O_j for any j > 1. Let S_1 be the set of objects that have character 1, and let S_2 be the remaining objects. Then each character belongs to objects in S_1 or in S_2, but not both (prove!). By induction there are trees T_1 and T_2 for S_1 and S_2. Combining them as in the figure gives the desired tree.

  Object   1  2  3  4  5
  A        1  1  0  0  0
  B        0  0  1  0  0
  C        1  1  0  0  1
  D        0  0  1  1  0
  E        1  0  0  0  0

[Figure: T_1 hangs below an edge labeled 1, and T_2 is attached at the root.]
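A tree can also be built directly rather than by following the induction. The sketch below uses a different, well-known route from the one in the proof: characters are sorted by decreasing |O_k| and each object's characters are inserted as a path into a trie; the matrix has a perfect phylogeny exactly when every character ends up on a single edge. The data layout (a dictionary of rows, a nested dictionary for the tree, 1-based character labels) and the name perfect_phylogeny are illustrative assumptions; object leaves are left implicit at the end of each path.

```python
def perfect_phylogeny(M):
    """Return the edge-labeled tree as nested dicts, or None if no perfect phylogeny exists."""
    n_chars = len(next(iter(M.values())))
    size = [sum(row[k] for row in M.values()) for k in range(n_chars)]
    # Larger object sets first, so that enclosing characters sit closer to the root.
    order = sorted((k for k in range(n_chars) if size[k] > 0),
                   key=lambda k: (-size[k], k))
    root, edges_per_char = {}, {}
    for row in M.values():
        node = root
        for k in order:
            if row[k] == 1:
                label = k + 1                      # 1-based, as in the slides
                if label not in node:
                    node[label] = {}
                    edges_per_char[label] = edges_per_char.get(label, 0) + 1
                node = node[label]
    if any(count > 1 for count in edges_per_char.values()):
        return None                                # some character would label two edges
    return root

SLIDE30 = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 0], "C": [1, 1, 0, 0, 1],
           "D": [0, 0, 1, 1, 0], "E": [1, 0, 0, 0, 0]}
NOT_LAMINAR = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 1, 0, 1], "C": [1, 1, 0, 0, 1],
               "D": [0, 0, 1, 1, 0], "E": [0, 1, 0, 0, 1]}
print(perfect_phylogeny(SLIDE30), perfect_phylogeny(NOT_LAMINAR))
```

For the matrix above the result has an edge labeled 1 leading to the subtree for S_1 = {A, C, E}, while the characters of S_2 = {B, D} hang from the root, which mirrors the split used in the induction step.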