Perfect Phylogeny; MLE for Phylogeny. Lecture 14. Based on: Setubal & Meidanis 6.2, Durbin et al. 8.1.



2 Final Exam Details
The final exam will take place on Thursday, , 0900, at Taub 4. Allowed material: course and tutorial slides, plus the course textbooks (Durbin et al., Setubal & Meidanis, Gusfield).

3 2. The perfect phylogeny problem
• A character is a property that distinguishes between species (e.g. dental structure).
• A character state is a value of the character (e.g. the human dental structure).
• Problem: Given a set of species, each specified by its character states, reconstruct their evolutionary tree.

4 Characters as Colorings
A coloring of a tree T=(V,E) is a mapping C: V → [set of colors].
A partial coloring of T is a mapping defined on a subset of the vertices U ⊆ V, that is, C: U → [set of colors].

5 Characters as Colorings (2)
Each character defines a (partial) coloring of the corresponding phylogenetic tree: Species ≡ Vertices, States ≡ Colors.

6 Convex Colorings (and Characters)
Let T=(V,E) be a partially colored tree, and let d be a color. The d-carrier is the minimal subtree of T containing all vertices colored d.
Definition: A (partial or total) coloring of a tree is convex iff its d-carriers, taken over all colors d, are mutually disjoint.
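The definition above can be checked mechanically. The following is a minimal sketch, not taken from the lecture: it assumes the tree is given as an adjacency dictionary and the partial coloring as a dictionary from vertex to color, and the function names `carrier` and `is_convex` are illustrative. The carrier is found by repeatedly pruning leaves that do not have the color in question.

```python
from collections import defaultdict

def carrier(adj, targets):
    """Vertices of the minimal subtree of the tree `adj` that contains `targets`.

    Repeatedly removes leaves that are not targets; what survives is the carrier.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] <= 1 and v not in targets]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] <= 1 and w not in targets:
                    stack.append(w)
    return alive

def is_convex(adj, coloring):
    """True iff the carriers of all colors are pairwise vertex-disjoint."""
    by_color = defaultdict(set)
    for vertex, color in coloring.items():
        by_color[color].add(vertex)
    seen = set()
    for vertices in by_color.values():
        sub = carrier(adj, vertices)
        if seen & sub:          # this carrier overlaps an earlier one
            return False
        seen |= sub
    return True

# Hypothetical example: the path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(is_convex(path, {"a": "R", "c": "R", "d": "B"}))  # True: carriers {a,b,c} and {d}
print(is_convex(path, {"a": "R", "b": "B", "c": "R"}))  # False: b lies inside the R-carrier
```

The second coloring is the smallest instance of homoplasy: color R appears on both sides of a vertex colored B.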

7 Convexity ⇔ Homoplasy Freedom
A character is homoplasy free (it avoids reversal and convergence transitions) ⇔ the corresponding (partial) coloring is convex.

8 The Perfect Phylogeny Problem
• Input: a set of species, and many characters.
• Question: is there a tree T containing the species as vertices, in which all the characters (colorings) are convex? (This is always possible for a single character.)

9 The Perfect Phylogeny Problem (pure graph-theoretic setting)
Input: Partial colorings (C_1,…,C_k) of a set of vertices U (in the slide's example: 3 total colorings, left, center and right, each using two colors).
Problem: Is there a tree T=(V,E) such that U ⊆ V and, for i=1,…,k, C_i is a convex (partial) coloring of T?
[Figure: example colorings of the vertices with the two colors R and B]
The problem is NP-hard in general, and in P for some special cases.

10 Perfect Phylogeny for a 0-1 Matrix
Rows correspond to objects, columns to characters. Each character has two states: 0 (absent) or 1 (present).
A tree T is a perfect phylogeny for the matrix iff it has the following properties:
A. Each of the n objects corresponds to a leaf of T.
B. Each of the m characters labels exactly one edge of T.
C. Object p has character i ⇔ i labels an edge on the path from p to the root.
Note: [B and C] ⇒ [each character is convex on T].

     C1  C2  C3  C4  C5
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   0   1
D     0   0   1   1   0
E     0   1   0   0   0

[Figure: tree with leaves A, E, D, C, B and edges labeled C4, C3, C2, C1, C5]

11 Perfect Phylogeny for a 0-1 Matrix
By the definition, for each character C there is exactly one edge on which it changes from 0 to 1. In the tree below, the edge on which character C2 changes to 1 is marked. The resulting tree is convex for this character.

     C2
A     1
B     0
C     1
D     0
E     1

[Figure: tree with leaves A, E, D, C, B; the edge labeled C2 is marked]

12 The (Binary) Perfect Phylogeny Problem
Problem: Given a 0-1 matrix M, determine whether it has a perfect phylogeny in which the root has state 0 for all characters, and construct one if it does. (Note: edges are labeled by characters; an edge labeled i represents a change of character i's state from 0 to 1.)
As we show below, the answer is yes for our matrix:

     C1  C2  C3  C4  C5
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   0   1
D     0   0   1   1   0
E     0   1   0   0   0

[Figure: tree with leaves A, E, D, C, B and edges labeled C4, C3, C2, C1, C5]

13 Efficient algorithm for the Binary Perfect Phylogeny Problem
Definition: Given a 0-1 matrix M, let O_k = {j : M_jk = 1}, i.e. O_k is the set of objects that have character C_k.
Theorem: M has a perfect phylogenetic tree iff the sets {O_i} are laminar, i.e. for all i, j, either O_i and O_j are disjoint, or one includes the other.

Laminar:
     C1  C2  C3  C4  C5
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   0   1
D     0   0   1   1   0
E     0   1   0   0   0

Not laminar:
     C1  C2  C3  C4  C5
A     1   1   0   0   0
B     0   0   1   0   1
C     1   1   0   0   1
D     0   0   1   1   0
E     0   1   0   0   1
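The laminarity condition of the theorem can be tested directly by comparing every pair of columns. This is only a straightforward sketch of the definition, taking O(m²n) time rather than the O(mn) of the efficient algorithm in the lecture; the function names are illustrative, and the two matrices are the laminar and non-laminar examples from the slide.

```python
def object_sets(matrix):
    """O_k for each column k: the set of row indices j with matrix[j][k] == 1."""
    n_cols = len(matrix[0])
    return [{j for j, row in enumerate(matrix) if row[k] == 1}
            for k in range(n_cols)]

def is_laminar(matrix):
    """True iff every two sets O_i, O_j are disjoint or one contains the other."""
    sets = object_sets(matrix)
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            a, b = sets[i], sets[j]
            if a & b and not (a <= b or b <= a):
                return False
    return True

# Rows A..E, columns C1..C5, as on the slide.
laminar = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
not_laminar = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
]
print(is_laminar(laminar))      # True
print(is_laminar(not_laminar))  # False: O_1 = {A, C} and O_5 = {B, C, E} overlap in C only
```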

14 Proof (⇒): Assume M has a perfect phylogeny, and let C_i, C_j be given. Consider the edges labeled C_i and C_j.
Case 1: There is a root-to-leaf path containing both edges. Then one of O_i, O_j is included in the other (C2 and C1 in the tree below).
Case 2: Not case 1. Then O_i and O_j are disjoint (C2 and C3).
[Figure: tree with leaves A, E, D, C, B and edges labeled C4, C3, C2, C1, C5]

15 Proof (cont.) (⇐): Assume that for all i, j, either O_i and O_j are disjoint, or one includes the other. We prove by induction on the number of characters that M has a perfect phylogenetic tree.
Basis: one character. Then there are at most two distinct objects, one with and one without this character.

     C1
A     1
B     0

[Figure: tree with the two leaves A and B]

16 Proof (cont.) (⇐), induction step: Assume correctness for n-1 characters, and consider a matrix with n characters (non-zero columns). WLOG assume that O_1 is not contained in O_j for any j > 1. Let S_1 be the set of objects j for which M_j1 = 1, and let S_2 be the remaining objects. Then each character belongs to objects in S_1 or in S_2, but not both (prove!). By induction there are trees T_1 and T_2 for S_1 and S_2. Combining them as in the figure gives the desired tree.

     C1  C2  C3  C4  C5
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   0   1
D     0   0   1   1   0
E     1   0   0   0   0

[Figure: the trees T_1 and T_2 combined into one tree, with an edge labeled 1]

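17 Efficient Implementation (1)
1. Sort the columns (characters) by decreasing value when considered as binary numbers. (Time complexity: O(mn), using radix sort.)
Claim: If the binary value of column i is larger than that of column j, then O_i is not a proper subset of O_j.
Proof: If O_i were a proper subset of O_j, column j would have a 1 wherever column i has one, plus at least one more, so its value would be larger. A positive difference between the values of columns i and j therefore means that the 1's in O_i are not all covered by the 1's in O_j.

Before sorting:
     C1  C2  C3  C4  C5
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   0   1
D     0   0   1   1   0
E     0   1   0   0   0

After sorting:
     C2  C1  C3  C5  C4
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   1   0
D     0   0   1   0   1
E     1   0   0   0   0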
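The sorting step can be sketched as follows. This is an illustration under stated assumptions rather than the lecture's implementation: each column is read top to bottom as a binary number (row A is the most significant bit, which matches the slide's example), and Python's built-in comparison sort is used instead of the radix sort the slide mentions, so the running time here is O(mn log m). The function name `sort_columns` is made up for the example.

```python
def sort_columns(matrix, col_names):
    """Reorder columns by decreasing binary value (top row = most significant bit)."""
    def column_bits(k):
        # Equal-length 0/1 tuples compare exactly like the binary numbers they spell.
        return tuple(row[k] for row in matrix)

    order = sorted(range(len(col_names)), key=column_bits, reverse=True)
    sorted_matrix = [[row[k] for k in order] for row in matrix]
    return sorted_matrix, [col_names[k] for k in order]

# Rows A..E of the matrix from the slide.
matrix = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
sorted_matrix, names = sort_columns(matrix, ["C1", "C2", "C3", "C4", "C5"])
print(names)  # ['C2', 'C1', 'C3', 'C5', 'C4']
```

The column values are 21 (C2), 20 (C1), 10 (C3), 4 (C5) and 2 (C4), which gives the order shown in the sorted matrix of the slide.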

18 Efficient Implementation (2)
2. Make a backwards linked list of the 1's in each row (the leftmost 1 in each row points at itself). Time complexity: O(mn).

     C2  C1  C3  C5  C4
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   1   0
D     0   0   1   0   1
E     1   0   0   0   0

Claim: If the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. This can be checked in O(mn) time.
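A minimal sketch of the link construction and the laminarity test from the claim, assuming the columns have already been sorted by decreasing binary value. Links are stored as column indices, with `None` where the matrix entry is 0; the function names are illustrative and the two matrices are the laminar and non-laminar examples from the slides.

```python
def backward_links(matrix):
    """links[r][c] = column of the previous 1 in row r, or c itself for the
    leftmost 1 of the row; None where matrix[r][c] == 0."""
    links = []
    for row in matrix:
        previous = None
        row_links = []
        for c, bit in enumerate(row):
            if bit == 1:
                row_links.append(c if previous is None else previous)
                previous = c
            else:
                row_links.append(None)
        links.append(row_links)
    return links

def is_laminar_sorted(matrix):
    """Laminarity test for a matrix whose columns are already sorted:
    every column's links must all point at one and the same column."""
    links = backward_links(matrix)
    for c in range(len(matrix[0])):
        targets = {row_links[c] for row_links in links if row_links[c] is not None}
        if len(targets) > 1:
            return False
    return True

# Columns in sorted order C2, C1, C3, C5, C4; rows A..E.
laminar_sorted = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
not_laminar_sorted = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
]
print(is_laminar_sorted(laminar_sorted))      # True
print(is_laminar_sorted(not_laminar_sorted))  # False
```

In the second matrix the links leaving the third column disagree: rows B and D point at the third column itself, while row E points at the first column.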

19 Examples

Laminar:
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   1   0
D     0   0   1   0   1
E     1   0   0   0   0

Not laminar:
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   1   0
D     0   0   1   0   1
E     1   0   1   1   0

20 Efficient Implementation (3)
3. When the matrix is laminar, the tree edges corresponding to characters are defined by the backwards links in the matrix. The remaining edges and leaves are determined by the characters of each object. This needs O(mn) time.

     C2  C1  C3  C5  C4
A     1   1   0   0   0
B     0   0   1   0   0
C     1   1   0   1   0
D     0   0   1   0   1
E     1   0   0   0   0

[Figure: the resulting tree with leaves A, E, D, C, B and edges labeled by characters]
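The construction can be sketched as below. It is a simplified illustration, assuming the matrix is laminar, has no all-zero columns, and has its columns sorted as in step 1. Each character gets one node at the lower end of its edge (named `below_<character>`, a made-up convention), the parent of that node is given by the backward link, and every object is hung as a leaf on an unlabeled edge below the node of its rightmost 1. The slide's figure may draw leaves directly at the end of a character edge; the extra unlabeled edges here do not affect properties A to C.

```python
def build_perfect_phylogeny(matrix, col_names, row_names):
    """Return a parent map: node -> (parent node, edge label or None).

    Assumes a laminar matrix with sorted columns, so every row that contains
    a character reaches it through the same predecessor character.
    """
    parent = {}
    for r, row in enumerate(matrix):
        previous = "root"
        for c, bit in enumerate(row):
            if bit == 1:
                node = "below_" + col_names[c]
                parent[node] = (previous, col_names[c])  # edge labeled by the character
                previous = node
        parent[row_names[r]] = (previous, None)          # unlabeled edge to the leaf
    return parent

def characters_of(parent, leaf):
    """Labels met on the path from `leaf` up to the root (property C)."""
    labels = set()
    node = leaf
    while node != "root":
        node, label = parent[node]
        if label is not None:
            labels.add(label)
    return labels

cols = ["C2", "C1", "C3", "C5", "C4"]
rows = ["A", "B", "C", "D", "E"]
sorted_matrix = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
tree = build_perfect_phylogeny(sorted_matrix, cols, rows)
print(sorted(characters_of(tree, "C")))  # ['C1', 'C2', 'C5']
```

Reading the labels on each root-to-leaf path reproduces the rows of the matrix, which is a quick way to confirm property C for the constructed tree.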

21 A scenario where Maximum Parsimony (and Perfect Phylogeny) are misleading
Consider a model with 4 letters (DNA), where the probability of a substitution is proportional to time. In the following topology, 2 and 3 are likely to be like the origin, but 4 and 5 can be different. In this case, Maximum Parsimony is misleading.
[Figure: the tree topology, with nodes labeled A]

22 Parsimony may be useless/misleading
For leaves 1,4 there are 4 combinations of substitution. In the first three, all three topologies obtain the same parsimony score. In the fourth, a wrong topology scores best.
[Figure: four panels of leaf states: I Uninformative, II Uninformative, III Uninformative, IV Misinformative]

23 Parsimony may be useless: Case I
[Figure: the candidate topologies with all leaves labeled A; Score=0]

24 Parsimony may be useless: Case II
[Figure: the candidate topologies with leaf states A and G; Score=1]

25 Parsimony may be useless: Case III
[Figure: the candidate topologies with leaf states A, G and C; Score=2]

26 Parsimony may be misleading: Case IV
[Figure: the candidate topologies with leaf states A and C; Score=2 and Score=1]
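The four cases can be reproduced with a small parsimony computation on the three possible four-leaf topologies. This is a sketch under assumptions that are not in the slides: the leaves are numbered 1 to 4, the ancestral letter is A, substitutions occur only on the branches to leaves 1 and 4, and the true topology is taken to be 12|34. The score of a topology is computed with Fitch's small-parsimony rule, rooting the quartet on its central edge.

```python
def fitch_pair(x, y):
    """Fitch step: intersect the two state sets if possible (cost 0),
    otherwise take their union (cost 1)."""
    common = x & y
    return (common, 0) if common else (x | y, 1)

def quartet_score(split, states):
    """Parsimony score of one site on the unrooted quartet (a,b)|(c,d)."""
    (a, b), (c, d) = split
    left, cost_left = fitch_pair({states[a]}, {states[b]})
    right, cost_right = fitch_pair({states[c]}, {states[d]})
    _, cost_middle = fitch_pair(left, right)
    return cost_left + cost_right + cost_middle

TOPOLOGIES = {
    "12|34": ((1, 2), (3, 4)),   # assumed true topology
    "13|24": ((1, 3), (2, 4)),
    "14|23": ((1, 4), (2, 3)),
}

def scores(pattern):
    """Scores of all three topologies for a pattern such as 'CAAC' (leaves 1..4)."""
    states = {leaf: pattern[leaf - 1] for leaf in (1, 2, 3, 4)}
    return {name: quartet_score(split, states) for name, split in TOPOLOGIES.items()}

print(scores("AAAA"))  # case I:   {'12|34': 0, '13|24': 0, '14|23': 0}
print(scores("GAAA"))  # case II:  {'12|34': 1, '13|24': 1, '14|23': 1}
print(scores("GAAC"))  # case III: {'12|34': 2, '13|24': 2, '14|23': 2}
print(scores("CAAC"))  # case IV:  {'12|34': 2, '13|24': 2, '14|23': 1}
```

Only in case IV do the scores differ, and the lowest score goes to 14|23, which groups the two changed leaves together instead of recovering the assumed true topology.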

27 Parsimony may be misleading
[Figure: the candidate topologies with leaf states A and C]
Parsimony will infer the tree correctly only in the rare case of a change on the central edge, or in the even rarer case of a parallel change from A to C on the pendant edges to 1 and 2.

28 3. Maximum Likelihood Approach
Consider the phylogenetic tree to be a stochastic process.
[Figure: tree whose nodes carry the sequences AGA, GGA, AAA, AAG, AAA, AGA, AAA]
The likelihood of a transition from character a to character b is given by the parameters θ_{b|a}. The likelihood of a letter a at the root is q_a. Given the complete tree, its probability is defined by the values of the θ_{b|a}'s and the q_a's.
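For a complete tree, where the sequence at every node is known, the probability is a product of one root term per site and one transition term per edge and site. The sketch below works in log space and makes several assumptions that the slide does not state: the same transition parameters are used on every edge, sites are independent, `theta[a][b]` stands for the probability of b in a child given a in its parent, and the tree and numbers are purely illustrative.

```python
import math

def complete_tree_log_prob(parent, seqs, q, theta):
    """Log-probability of a fully labeled tree.

    parent: node -> parent node (the root maps to None)
    seqs:   node -> sequence string, all of the same length
    q:      letter -> probability at the root
    theta:  theta[a][b] = probability of letter b in a child given a in the parent
    """
    log_p = 0.0
    for node, seq in seqs.items():
        par = parent[node]
        for pos, b in enumerate(seq):
            if par is None:
                log_p += math.log(q[b])            # root term q_b
            else:
                a = seqs[par][pos]
                log_p += math.log(theta[a][b])     # edge term for a -> b
    return log_p

# Hypothetical parameters: keep the letter with probability 0.7,
# change to each other letter with probability 0.1.
letters = "ACGT"
q = {a: 0.25 for a in letters}
theta = {a: {b: (0.7 if a == b else 0.1) for b in letters} for a in letters}

# Hypothetical three-node tree with one-letter sequences.
parent = {"root": None, "x": "root", "y": "root"}
seqs = {"root": "A", "x": "A", "y": "G"}
print(math.exp(complete_tree_log_prob(parent, seqs, q, theta)))  # 0.25 * 0.7 * 0.1
```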

29 Maximum Likelihood Approach (2)
When the data consists only of the leaf sequences (but the topology is fixed):
[Figure: tree with leaf sequences AGA, GGA, AAA, AAG]
• Write down the likelihood of the data (leaf sequences) given the tree.
• Use EM to estimate the θ_{b|a} parameters.
When the tree is not given: search for the tree that maximizes Prob(data | Tree, θ_EM).
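When only the leaves are observed, the likelihood of one site is the sum, over all assignments of letters to the internal nodes, of the complete-tree probability. The sketch below evaluates that sum with the standard bottom-up recursion usually called Felsenstein's pruning algorithm, which the slide does not name; it is not the EM step itself, only the likelihood that EM would try to increase. As assumptions for illustration, `theta[a][b]` is the probability of b in a child given a in its parent, the same parameters are used on every edge, and the tree and numbers are hypothetical.

```python
def site_likelihood(children, leaf_state, root, q, theta):
    """Likelihood of the observed leaf letters at one site.

    children:   internal node -> list of child nodes
    leaf_state: leaf -> observed letter
    """
    letters = list(q)

    def below(node):
        # below(node)[a] = P(observed leaves under node | node has letter a)
        if node in leaf_state:
            return {a: 1.0 if a == leaf_state[node] else 0.0 for a in letters}
        result = {a: 1.0 for a in letters}
        for child in children[node]:
            sub = below(child)
            for a in letters:
                result[a] *= sum(theta[a][b] * sub[b] for b in letters)
        return result

    at_root = below(root)
    return sum(q[a] * at_root[a] for a in letters)

# Hypothetical parameters and tree: root r with leaf children x and y.
letters = "ACGT"
q = {a: 0.25 for a in letters}
theta = {a: {b: (0.7 if a == b else 0.1) for b in letters} for a in letters}
children = {"r": ["x", "y"]}

print(site_likelihood(children, {"x": "A", "y": "G"}, "r", q, theta))
```

For independent sites the likelihood of whole sequences is the product of the per-site values, and comparing topologies amounts to comparing these products after the parameters have been estimated.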