Perfect Phylogeny Tutorial #10 © Ilan Gronau. Original slides by Shlomo Moran.

Presentation transcript:

Perfect Phylogeny Tutorial #10 © Ilan Gronau. Original slides by Shlomo Moran.

2 Perfect Phylogeny
The underlying model: a character vector is given for every species in S. Each character represents some observable trait and takes values from a finite set.
Basic underlying assumption: characters are homoplasy-free.

3 Homoplasy-Free Characters
Homoplasy-free means no reversals and no convergence. Homoplasy-free characters induce a convex coloring of the phylogenetic tree, i.e. the vertices sharing a state of a character form a connected subtree.
The Perfect Phylogeny Problem: given character vectors for S, find (if they exist):
- a phylogenetic tree T over S (S is the leaf set of T);
- convex character assignments to all vertices of T.
This problem is NP-hard in general.

4 Directed Binary Perfect Phylogeny
Directed binary characters: 1: property exists, 0: property does not exist. Initially (at the root) no property exists.
Input: a binary coloring (C_1, ..., C_m) of a set S (an n x m binary matrix M).
Problem: find a phylogenetic tree T over S (if one exists) such that:
1. For j = 1, ..., m, the partial coloring induced by C_j is convex in T.
2. The root has state 0 in all characters.
We will present a polynomial-time solution.

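5 Example
Input: an n x m binary matrix (n species A, B, C, D, E as rows; m characters C_1, ..., C_5 as columns).
Possible output: a tree rooted at the all-zero vertex (00000), with internal vertices (11000), (01000), (00100) and leaves (11000), (00100), (01000), (00110), (11001).
[Figure: the input matrix and the output tree with its zero root; the edges where C_2 and C_3 originate are labeled.]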

6 An Important Observation
A tree is a directed perfect phylogeny for a given 0/1 matrix iff we can map each character to an edge/vertex on which this character was "turned on".
Example: [Figure: the matrix over species A to E and characters C_1 to C_5, next to a tree with leaves A, E, D, C, B whose edges are labeled C_1, ..., C_5; the edge labeled C_2 is marked "origin of C_2".]
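The observation can be checked mechanically on the example. The sketch below is illustrative only: the tree shape, the internal vertex names u, v, w and the helper leaf_vector are our own assumptions, read off the example figure rather than stated in the slides. Each character is attached to the edge on which it is turned on, and a leaf's character vector is recovered by walking up to the root.

```python
# Hypothetical encoding of the example tree (internal names u, v, w are made up).
# parent[x] is the parent of vertex x; origin[x] lists the characters that are
# "turned on" along the edge entering x.
parent = {"u": "root", "v": "u", "w": "root",
          "E": "u", "A": "v", "C": "v", "B": "w", "D": "w"}
origin = {"u": [2], "v": [1], "w": [3], "C": [5], "D": [4]}

def leaf_vector(leaf, m=5):
    """Collect the characters turned on along the path from leaf up to the root."""
    bits = ["0"] * m
    node = leaf
    while node != "root":
        for c in origin.get(node, []):
            bits[c - 1] = "1"
        node = parent[node]
    return "".join(bits)

vectors = {leaf: leaf_vector(leaf) for leaf in "ABCDE"}
```

Under this reading of the figure, vectors reproduces the five leaf labels of the example slide, with E assigned (01000).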

7 Laminar Matrices
Definitions:
- O_j is the set of objects that have character C_j (O_j = {i : M_ij = 1}).
- A collection of sets {S_1, ..., S_k} is laminar if for all i, j, either S_i and S_j are disjoint, or one includes the other.
Theorem: a binary matrix M has a perfect phylogenetic tree iff the collection {O_1, ..., O_m} is laminar.
[Figure: two example matrices over species A to E and characters C_1 to C_5, one laminar and one not laminar.]
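A direct, unoptimized way to test the definition is to compare every pair of sets. The sketch below assumes the matrix is given as a dictionary from species name to a bit string; the function names and the "broken" matrix are hypothetical, and the rows of "example" are our reading of the leaf labels on the example slide.

```python
from itertools import combinations

def object_sets(matrix):
    """Return [O_1, ..., O_m], where O_j holds the species with a 1 in column j."""
    m = len(next(iter(matrix.values())))
    return [{s for s, row in matrix.items() if row[j] == "1"} for j in range(m)]

def is_laminar(sets):
    """Every pair of sets must be disjoint or nested."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))

# Assumed rows for the five-species example.
example = {"A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "01000"}
# Hypothetical variant: B also gets C_2, so O_2 and O_3 overlap without nesting.
broken = dict(example, B="01100")

print(is_laminar(object_sets(example)), is_laminar(object_sets(broken)))
```

This pairwise test costs O(m^2 n) set work, which is why the slides go on to describe a faster implementation.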

8 Proof of Theorem
(⇒) Assume M has a perfect phylogeny. Consider the edges labeled C_i and C_j:
- If there is a root-to-leaf path containing both edges (C_1 and C_2 in the figure), then O_i includes O_j or vice versa.
- Otherwise, O_i and O_j are disjoint (C_1 and C_3 in the figure).
[Figure: tree with leaves A, E, D, C, B and edges labeled C_1, ..., C_5.]

9 Proof of Theorem (cont)
(⇐) Assume that the collection {O_1, ..., O_k} is laminar. We prove by induction on the number of characters k that M has a perfect phylogenetic tree.
Basis: one character. There are at most two distinct objects, one with and one without this character.
[Figure: the one-column matrix C_1 with A = 1 and B = 0, and a tree with a root, leaves A and B, and an edge labeled C_1.]

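10 Proof of Theorem (cont)
(⇐) Assume that the collection {O_1, ..., O_k} is laminar.
Induction step: assume correctness for k-1 characters. Consider a matrix with k characters (non-zero columns), and assume WLOG that O_1 is not contained in O_j for any j > 1.
- S_1 is the set of objects i for which M_i1 = 1 (so S_1 = O_1); S_2 is the set of remaining objects.
- Claim: each character belongs to objects in S_1 or in S_2, but not to both. Why is this? By laminarity, every O_j with j > 1 is either disjoint from O_1, and so lies in S_2, or nested with O_1, in which case the choice of O_1 forces O_j to be contained in O_1 = S_1.
By induction there are trees T_1 and T_2 for S_1 and S_2.
    C_1 C_2 C_3 C_4 C_5
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   0   1
D    0   0   1   1   0
E    1   0   0   0   0
S_1 = {A, C, E}, S_2 = {B, D}
[Figure: the two subtrees T_1 and T_2, with C_1 labeling the edge above T_1.]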
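The induction step translates into a short recursive construction. The following is a sketch under our own assumptions, not the slides' implementation: the function name build_tree and the nested-list tree encoding are hypothetical, all columns are assumed non-zero and laminar, and a largest set is used as the column that is contained in no other. A tree is a list of children; a child is either a species name or a pair (character, subtree) for an edge on which that character is turned on.

```python
def build_tree(species, sets):
    """species: set of names; sets: dict character -> non-empty set of species."""
    if not sets:
        return sorted(species)              # remaining species hang off this vertex
    top = max(sets, key=lambda c: len(sets[c]))   # a largest set plays the role of O_1
    s1 = sets[top]
    s2 = species - s1
    inside = {c: o for c, o in sets.items() if c != top and o <= s1}
    outside = {c: o for c, o in sets.items() if c != top and o.isdisjoint(s1)}
    # The claim from the slide: every other character falls on exactly one side.
    assert len(inside) + len(outside) == len(sets) - 1, "columns are not laminar"
    t1 = build_tree(s1, inside)             # tree T_1 for S_1
    t2 = build_tree(s2, outside)            # tree T_2 for S_2
    return t2 + [(top, t1)]                 # hang T_1 below an edge labeled top

# Matrix from this slide (note that E = 10000 here).
matrix = {"A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "10000"}
sets = {"C%d" % (j + 1): {s for s, row in matrix.items() if row[j] == "1"}
        for j in range(5)}
tree = build_tree(set(matrix), sets)
print(tree)
```

On this matrix the first split is S_1 = {A, C, E} and S_2 = {B, D}, matching the slide.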

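11 Efficient Implementation
1. Sort the columns (characters) according to decreasing binary value, reading each column from top to bottom as a binary number.
Claim: if the binary value of column i is larger than that of column j, then O_i is not a proper subset of O_j.
Proof: in the first row where the two columns differ, column i has a 1 and column j has a 0, so some 1 of O_i is not covered by a 1 of O_j.
[Figure: the example matrix before sorting (columns C_1, C_2, C_3, C_4, C_5) and after sorting (columns C_2, C_1, C_3, C_5, C_4).]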
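A minimal sketch of step 1, using Python's built-in sort rather than the radix sort the slides recommend for the complexity bound. The rows of the matrix are our reading of the example (an assumption), and the names sort_columns and claim_holds are hypothetical. Since all column strings have the same length, comparing them lexicographically is the same as comparing their binary values.

```python
def sort_columns(names, matrix):
    """Return column indices ordered by decreasing binary value (top row = high bit)."""
    m = len(matrix[names[0]])
    def column_value(j):
        return "".join(matrix[s][j] for s in names)
    return sorted(range(m), key=column_value, reverse=True)

names = ["A", "B", "C", "D", "E"]
# Assumed rows for the five-species example.
matrix = {"A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "01000"}

order = sort_columns(names, matrix)
labels = ["C%d" % (j + 1) for j in order]

# Check the claim: no earlier column's set is a proper subset of a later one.
sets = [{s for s in names if matrix[s][j] == "1"} for j in order]
claim_holds = all(not sets[i] < sets[j]
                  for i in range(len(sets)) for j in range(i + 1, len(sets)))
print(labels, claim_holds)
```

With these rows the resulting order is C_2, C_1, C_3, C_5, C_4, the same order shown on the slide.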

12 Efficient Implementation (cont)
2. Make a backwards linked list of the 1's in each row: each 1 points to the previous 1 in the same row, or to the root if there is none.
Claim: if the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. (Why is this?)
If the matrix is laminar, then these pointers define the inclusion hierarchy.
[Figure: the sorted example matrix (columns C_2, C_1, C_3, C_5, C_4) with the backward links drawn between the 1's of each row.]
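The backward-link test can be sketched as follows. This is an illustration under assumptions: rows are bit strings whose columns are already sorted, -1 stands for the root, and the function name column_parents is hypothetical. The rows below are the example matrix rewritten in the sorted column order C_2, C_1, C_3, C_5, C_4, based on our reading of the example.

```python
def column_parents(rows):
    """Return parent[j] for each column, or None if some column's links disagree."""
    m = len(rows[0])
    targets = [set() for _ in range(m)]     # columns pointed at by the 1's of column j
    for row in rows:
        prev = -1                           # -1 means "no earlier 1 in this row" (root)
        for j, bit in enumerate(row):
            if bit == "1":
                targets[j].add(prev)
                prev = j
    if any(len(t) > 1 for t in targets):
        return None                         # links from one column disagree: not laminar
    return [t.pop() if t else None for t in targets]

# Assumed example rows A..E with columns ordered C_2, C_1, C_3, C_5, C_4.
sorted_rows = ["11000", "00100", "11010", "00101", "10000"]
parents = column_parents(sorted_rows)

# Hypothetical non-laminar matrix with sorted columns: O_1 = {r1, r2}, O_2 = {r1, r3}.
bad = column_parents(["11", "10", "01"])
print(parents, bad)
```

Each row is scanned once, so the whole pass takes O(mn) time.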

13 Efficient Implementation (cont)
3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.
[Figure: the sorted matrix (columns C_2, C_1, C_3, C_5, C_4), the inclusion hierarchy of the characters, and the resulting tree over leaves A, E, D, C, B with edges labeled by characters and vertices labeled by character vectors.]
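Steps 3 and 4 can be sketched as below, assuming the inclusion hierarchy is already available as a parent array over the sorted columns (with -1 for the root), as the backward pointers of step 2 would give. The function name reconstruct, the vertex numbering and the input rows are our own assumptions. Each column becomes a vertex reached by the edge on which that character is turned on, and each species hangs off the vertex of its rightmost 1.

```python
def reconstruct(names, rows, parent):
    """Return (children, leaves, states) keyed by vertex: -1 is the root, j is column j."""
    m = len(parent)
    children = {v: [] for v in range(-1, m)}
    for j, p in enumerate(parent):
        children[p].append(j)               # edge p -> j turns character j on
    leaves = {v: [] for v in range(-1, m)}
    for name, row in zip(names, rows):
        leaves[row.rfind("1")].append(name) # rfind gives -1 for an all-zero row
    def state(v):
        bits = ["0"] * m
        while v != -1:                      # switch on every character on the root path
            bits[v] = "1"
            v = parent[v]
        return "".join(bits)
    states = {v: state(v) for v in range(-1, m)}
    return children, leaves, states

names = ["A", "B", "C", "D", "E"]
# Assumed example rows with columns ordered C_2, C_1, C_3, C_5, C_4.
rows = ["11000", "00100", "11010", "00101", "10000"]
parent = [-1, 0, -1, 1, 2]                  # assumed inclusion hierarchy for these columns
children, leaves, states = reconstruct(names, rows, parent)
print(children, leaves, states)
```

In a laminar matrix every species row equals the ancestral state of the vertex it is attached to, which gives a simple consistency check on the output.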

14 Efficient Implementation - Summary
1. Sort the columns (characters) according to decreasing binary value.
2. Make a backwards linked list of the 1's in each row.
3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.
Complexity: O(mn), using radix (bucket) sort in stage 1.
[Figure: the example matrix before sorting (columns C_1, ..., C_5) and after sorting (columns C_2, C_1, C_3, C_5, C_4).]
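To illustrate how stage 1 can meet the O(mn) bound, here is a minimal least-significant-digit radix sort over the columns. It is a sketch under assumptions: the function name radix_sort_columns is hypothetical and the rows are our reading of the example matrix. The rows are processed from the bottom (lowest bit) to the top (highest bit), and each pass stably moves the columns with a 1 in that row ahead of those with a 0.

```python
def radix_sort_columns(rows):
    """Order column indices by decreasing binary value using n stable passes of O(m)."""
    order = list(range(len(rows[0])))
    for row in reversed(rows):              # last row is the least significant bit
        ones = [j for j in order if row[j] == "1"]
        zeros = [j for j in order if row[j] == "0"]
        order = ones + zeros                # stable: earlier passes break ties
    return order

# Assumed example rows A..E in the original column order C_1..C_5.
rows = ["11000", "00100", "11001", "00110", "01000"]
order = radix_sort_columns(rows)
print(["C%d" % (j + 1) for j in order])
```

There are n passes of O(m) work each, so sorting costs O(mn), the same as the later stages.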