Perfect Phylogeny. Tutorial #11 © Ilan Gronau. Original slides by Shlomo Moran.



2 Phylogenetic Reconstruction
The input: a species-characters matrix (n species, k characters).
The output: a tree with n leaves corresponding to the input species.
Each character represents some observable trait. Each character takes values from a finite set.

3 Perfect Phylogeny
Find a homoplasy-free tree explaining the input vector set (if such a tree exists).

4 Homoplasy-Free Characters
Homoplasy-free: no reversals, no convergence.
Homoplasy-free characters induce a convex coloring of the phylogenetic tree (each character state occupies a connected part of the tree).
The Perfect Phylogeny Problem: Given character-vectors for S, find (if they exist):
- a phylogenetic tree T over S (S is the leaf-set of T);
- convex character assignments to all vertices of T.
! This problem is generally NP-hard !

5 Directed Binary Perfect Phylogeny
Directed binary characters: 0 – property doesn't exist, 1 – property exists. Initially (at the root) all properties do not exist.
Input: k binary colorings C_1 ... C_k of the species set S.
Output (or a notification that such a tree doesn't exist):
1. A rooted phylogenetic tree T over S.
2. k binary colorings C'_1 ... C'_k of the vertices of T which:
a. are extensions of C_1 ... C_k;
b. induce a '0' coloring at the root of T.
We will present a polynomial-time solution.

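6 Example
Input (n = 5 species, k = 5 characters):
    C_1 C_2 C_3 C_4 C_5
A    1   1   0   0   0
B    0   0   1   0   0
C    1   1   0   0   1
D    0   0   1   1   0
E    0   1   0   0   0
Possible output: a tree rooted at the zero-root (00000), with internal vertices (01000), (11000) and (00100), and leaves A (11000), B (00100), C (11001), D (00110), E (01000). C_2 is turned on along the edge from the root to (01000), and C_3 along the edge from the root to (00100).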

7 An Important Observation
A tree is a directed perfect phylogeny for a given 0/1 matrix iff we can map each character to a vertex at which this character was "turned on".
[Figure: the example matrix and a tree over the leaves A, E, D, C, B, with each of the characters C_1, ..., C_5 marked where it is turned on.]
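The observation can be checked mechanically on the example: if every character is attached to the vertex where it is turned on, each leaf's vector is just the set of characters met on the way up to the root. The sketch below assumes a particular reading of the example tree (the child-to-parent map `PARENT`) and uses illustrative names (`PARENT`, `CHARACTERS`, `leaf_vector`) that do not come from the slides.

```python
# Assumed encoding of the example tree as child -> parent.
# An internal vertex named "Cj" is the vertex at which character Cj is turned on.
PARENT = {
    "C2": "root", "C3": "root",
    "C1": "C2", "E": "C2",
    "A": "C1", "C5": "C1",
    "C": "C5",
    "B": "C3", "C4": "C3",
    "D": "C4",
}
CHARACTERS = ["C1", "C2", "C3", "C4", "C5"]


def leaf_vector(leaf):
    """Return the 0/1 vector of a leaf: the characters turned on along its path to the root."""
    turned_on = set()
    vertex = leaf
    while vertex != "root":
        if vertex in CHARACTERS:
            turned_on.add(vertex)
        vertex = PARENT[vertex]
    return "".join("1" if c in turned_on else "0" for c in CHARACTERS)


vectors = {species: leaf_vector(species) for species in "ABCDE"}
print(vectors)
```

Under this reading, the recovered vectors are exactly the five leaf vectors listed in the example on slide 6.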

8 Laminar Matrices
Definitions:
- O_j – the set of species that have character C_j (O_j = {i : M_ij = 1}).
- A collection of sets {S_1, …, S_k} is laminar if for all i, j, either S_i and S_j are disjoint, or one includes the other.
Theorem: A binary matrix M has a perfect phylogenetic tree iff the collection {O_1, …, O_k} is laminar.
[Figure: two example matrices over species A to E and characters C_1 to C_5, one laminar and one not laminar.]
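The definition translates directly into a pairwise test. The following is a minimal sketch, not the efficient method of the later slides: it compares every pair of sets, so it costs roughly O(n k^2). The function names are illustrative, `M_EXAMPLE` is an assumed reading of the slide 6 matrix (rows A to E), and `M_NOT_LAMINAR` is a made-up counterexample, not the one from the figure.

```python
from itertools import combinations


def character_sets(matrix):
    """O_j = set of row indices i with matrix[i][j] == 1, one set per column j."""
    k = len(matrix[0]) if matrix else 0
    return [{i for i, row in enumerate(matrix) if row[j] == 1} for j in range(k)]


def is_laminar(sets):
    """True iff every two sets are either disjoint or nested."""
    for s, t in combinations(sets, 2):
        if s & t and not (s <= t or t <= s):
            return False
    return True


# Assumed reading of the example matrix (rows A, B, C, D, E; columns C1..C5).
M_EXAMPLE = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
# Made-up matrix: the two columns share row 0 but neither contains the other.
M_NOT_LAMINAR = [
    [1, 1],
    [1, 0],
    [0, 1],
]

print(is_laminar(character_sets(M_EXAMPLE)))
print(is_laminar(character_sets(M_NOT_LAMINAR)))
```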

9 Proof of Theorem
(⇒) Assume M has a perfect phylogeny. Consider the vertices labeled C_i and C_j:
- If C_i is an ancestor of C_j (C_2, C_1 in the example tree), then O_i includes O_j.
- If neither of them is an ancestor of the other (C_3, C_1 in the example tree), then O_i and O_j are disjoint.
[Figure: the example tree over the leaves A, E, D, C, B with the vertices labeled C_1, ..., C_5.]

10 Proof of Theorem (cont)
(⇐) Assume that the collection {O_1, …, O_k} is laminar. We (constructively) prove that M has a perfect phylogenetic tree.
Proof outline: Consider the inclusion graph of {O_1, …, O_k}. Removing "unnecessary" edges (those implied by transitivity) results in a directed forest. Add a root and connect it to all sources, and add edges from leaves of the inclusion tree to the singletons representing the input species.
[Figure: the example matrix and the resulting tree over the leaves A, E, D, C, B with the vertices labeled C_1, ..., C_5.]
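The constructive direction can be sketched as a naive procedure: the parent of each set is its smallest proper superset, and sets with no superset hang off a new root. This sketch assumes the family is already known to be laminar and that the columns are distinct and non-empty. It also attaches every species to the smallest set containing it (which may be an internal vertex), one way to realize the "add leaves" step. The function name and the vertex naming scheme (`"C1"`, `"root"`) are illustrative.

```python
def inclusion_tree(matrix, species):
    """Return a child -> parent map for a matrix whose column sets are laminar.

    Assumes distinct, non-empty columns. Vertex "Cj" stands for column j (1-based).
    """
    k = len(matrix[0])
    sets = [frozenset(i for i, row in enumerate(matrix) if row[j]) for j in range(k)]
    parent = {}
    for j, s in enumerate(sets):
        # Proper supersets of a set in a laminar family form a chain,
        # so the smallest one is the direct parent.
        supersets = [i for i, t in enumerate(sets) if s < t]
        if supersets:
            closest = min(supersets, key=lambda i: len(sets[i]))
            parent[f"C{j + 1}"] = f"C{closest + 1}"
        else:
            parent[f"C{j + 1}"] = "root"
    for i, name in enumerate(species):
        containing = [j for j, s in enumerate(sets) if i in s]
        if containing:
            smallest = min(containing, key=lambda j: len(sets[j]))
            parent[name] = f"C{smallest + 1}"
        else:
            parent[name] = "root"  # species with no characters hangs off the root
    return parent


# Assumed reading of the example matrix (rows A, B, C, D, E; columns C1..C5).
M_EXAMPLE = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
tree = inclusion_tree(M_EXAMPLE, "ABCDE")
print(tree)
```

This version scans all pairs of columns, so it is quadratic in k; the sorted-column approach on the following slides avoids that.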

11 Efficient Implementation
1. Sort the columns (characters) according to decreasing binary value.
Claim: If the binary value of column i is larger than that of column j, then O_i is not a proper subset of O_j.
Proof: C_i > C_j means the 1's in C_i are not covered by the 1's in C_j.
Corollary: the parent of C_i is the closest column C_j to its left in the sorted order such that O_i is a proper subset of O_j.
Example: the column order C_1 C_2 C_3 C_4 C_5 becomes C_2 C_1 C_3 C_5 C_4 after sorting.
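As a small illustration of step 1, each column can be read top to bottom as a binary number (first row = most significant bit) and the columns sorted by that value. This sketch uses Python's built-in comparison sort on integers for clarity, so it does not achieve the linear bound discussed later; the function names and the matrix values (an assumed reading of the slide 6 example) are illustrative.

```python
def column_value(matrix, j):
    """Binary value of column j, reading rows top to bottom (top row is most significant)."""
    return int("".join(str(row[j]) for row in matrix), 2)


def sort_columns(matrix):
    """Return the column indices ordered by decreasing binary value."""
    k = len(matrix[0])
    return sorted(range(k), key=lambda j: column_value(matrix, j), reverse=True)


# Assumed reading of the example matrix (rows A, B, C, D, E; columns C1..C5).
M_EXAMPLE = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
order = sort_columns(M_EXAMPLE)
print(["C%d" % (j + 1) for j in order])
```

With these values the columns come out as C2, C1, C3, C5, C4, the same order shown on the slide.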

12 Efficient Implementation (cont)
2. Make a backwards linked list of the 1's in each row.
Claim: If the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. (Why is this?)
If the matrix is laminar, reverse the pointers to get the inclusion tree. Add root and leaves, as stated in slide #10.
[Figure: the sorted matrix, with columns in the order C_2 C_1 C_3 C_5 C_4, and the backward links between the 1's of each row.]
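The link test can be sketched as a single pass over the matrix: walk each row in sorted column order, link every 1 to the previous 1 in that row, and fail as soon as two rows disagree about where a column points. The function name, the use of `None` for "points at the root", and the `None` return value for "not laminar" are choices made for this sketch. The `order` argument must already be sorted by decreasing binary value; the matrices are an assumed reading of the slide 6 example and a made-up counterexample.

```python
def parents_from_links(matrix, order):
    """Return {column: parent column or None for the root} if laminar, else None.

    `order` lists the column indices by decreasing binary value.
    All-zero columns never get a link and are absent from the result.
    """
    link = {}
    for row in matrix:
        previous = None  # None stands for the root
        for c in order:
            if row[c]:
                if c in link and link[c] != previous:
                    return None  # two rows link column c to different columns
                link[c] = previous
                previous = c
    return link


# Assumed reading of the example matrix (rows A, B, C, D, E; columns C1..C5).
M_EXAMPLE = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
links = parents_from_links(M_EXAMPLE, [1, 0, 2, 4, 3])  # order C2 C1 C3 C5 C4
print(links)

# Made-up non-laminar matrix; its columns are already in decreasing order.
bad = parents_from_links([[1, 1], [1, 0], [0, 1]], [0, 1])
print(bad)
```

Reversing the returned links gives the children of each character vertex, which is the inclusion tree before the root and the species leaves are added.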

13 Efficient Implementation - Summary
1. Sort the columns (characters) according to decreasing binary value.
2. Make a backwards linked list of the 1's in each row.
3. If the matrix is laminar, reverse the pointers to get the inclusion tree. Add root and leaves, as stated in slide #10.
Complexity: O(nk) – use radix (bucket) sort in stage 1.
[Figure: the example matrix before sorting (C_1 C_2 C_3 C_4 C_5) and after sorting (C_2 C_1 C_3 C_5 C_4).]
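To see where the O(nk) bound for stage 1 can come from, here is one possible radix sort of the columns: process the rows from the least significant (last) to the most significant (first), and in each pass stably move the columns holding a 1 in front of those holding a 0. Each of the n passes touches k columns. This is a sketch of a standard least-significant-digit radix sort, not necessarily the variant the slides have in mind; the function name and the matrix (an assumed reading of the slide 6 example) are illustrative.

```python
def radix_sort_columns(matrix):
    """Order column indices by decreasing binary value in O(n*k) time."""
    order = list(range(len(matrix[0])))
    for row in reversed(matrix):  # least significant row first
        ones = [c for c in order if row[c] == 1]
        zeros = [c for c in order if row[c] == 0]
        order = ones + zeros  # stable partition: 1's first gives decreasing order
    return order


# Assumed reading of the example matrix (rows A, B, C, D, E; columns C1..C5).
M_EXAMPLE = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
sorted_order = radix_sort_columns(M_EXAMPLE)
print(["C%d" % (j + 1) for j in sorted_order])
```

Because every pass is a stable partition, ties in the more significant rows preserve the order established by the less significant ones, which is what makes the final order correct.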