PHYLOGENETIC TREES Bulent Moller CSE 397 18 March 2004.

Slides:



Advertisements
Similar presentations
Chapter 11 Trees Graphs III (Trees, MSTs) Reading: Epp Chp 11.5, 11.6.
Advertisements

Introduction to Algorithms Graph Algorithms
WSPD Applications.
CSE115/ENGR160 Discrete Mathematics 04/26/12 Ming-Hsuan Yang UC Merced 1.
8.3 Representing Relations Connection Matrices Let R be a relation from A = {a 1, a 2,..., a m } to B = {b 1, b 2,..., b n }. Definition: A n m  n connection.
Review Binary Search Trees Operations on Binary Search Tree
Theory of Computing Lecture 18 MAS 714 Hartmut Klauck.
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
Graph Isomorphism Algorithms and networks. Graph Isomorphism 2 Today Graph isomorphism: definition Complexity: isomorphism completeness The refinement.
Graphs III (Trees, MSTs) (Chp 11.5, 11.6)
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
Tirgul 8 Graph algorithms: Strongly connected components.
Data Structures, Spring 2004 © L. Joskowicz 1 Data Structures – LECTURE 14 Strongly connected components Definition and motivation Algorithm Chapter 22.5.
Complexity 11-1 Complexity Andrei Bulatov NP-Completeness.
1 Discrete Structures & Algorithms Graphs and Trees: II EECE 320.
. Perfect Phylogeny Tutorial #11 © Ilan Gronau Original slides by Shlomo Moran.
NP-Complete Problems Reading Material: Chapter 10 Sections 1, 2, 3, and 4 only.
The (Supertree) of Life: Procedures, Problems, and Prospects Presented by Usman Roshan.
Greedy Algorithms Reading Material: Chapter 8 (Except Section 8.5)
Validating Streaming XML Documents Luc Segoufin & Victor Vianu Presented by Harel Paz.
UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures Chapter : Strings and.
CSE182-L17 Clustering Population Genetics: Basics.
Data Structures, Spring 2006 © L. Joskowicz 1 Data Structures – LECTURE 14 Strongly connected components Definition and motivation Algorithm Chapter 22.5.
UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures Chapter : Strings and.
Greedy Algorithms Like dynamic programming algorithms, greedy algorithms are usually designed to solve optimization problems Unlike dynamic programming.
Phylogenetic Networks of SNPs with Constrained Recombination D. Gusfield, S. Eddhu, C. Langley.
Perfect Phylogeny MLE for Phylogeny Lecture 14
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
Induction and recursion
1 Physical Mapping --An Algorithm and An Approximation for Hybridization Mapping Shi Chen CSE497 04Mar2004.
P HYLOGENETIC T REE. OVERVIEW Phylogenetic Tree Phylogeny Applications Types of phylogenetic tree Terminology Data used to build a tree Building phylogenetic.
1 The Theory of NP-Completeness 2012/11/6 P: the class of problems which can be solved by a deterministic polynomial algorithm. NP : the class of decision.
“On an Algorithm of Zemlyachenko for Subtree Isomorphism” Yefim Dinitz, Alon Itai, Michael Rodeh (1998) Presented by: Masha Igra, Merav Bukra.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Phylogenetics II.
Incomplete Directed Perfect Phylogeny Itsik Pe'er, Tal Pupko, Ron Shamir, and Roded Sharan SIAM Journal on Computing Volume 33, Number 3, pp
Discrete Structures Lecture 12: Trees Ji Yanyan United International College Thanks to Professor Michael Hvidsten.
Ch.6 Phylogenetic Trees 2 Contents Phylogenetic Trees Character State Matrix Perfect Phylogeny Binary Character States Two Characters Distance Matrix.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Mathematical Preliminaries
Problem Statement How do we represent relationship between two related elements ?
LIMITATIONS OF ALGORITHM POWER
. Perfect Phylogeny Tutorial #10 © Ilan Gronau Original slides by Shlomo Moran.
Chapter 13 Backtracking Introduction The 3-coloring problem
. Perfect Phylogeny MLE for Phylogeny Lecture 14 Based on: Setubal&Meidanis 6.2, Durbin et. Al. 8.1.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
An Algorithm for the Consecutive Ones Property Claudio Eccher.
Section Recursion 2  Recursion – defining an object (or function, algorithm, etc.) in terms of itself.  Recursion can be used to define sequences.
by d. gusfield v. bansal v. bafna y. song presented by vikas taliwal
NP-Completeness (2) NP-Completeness Graphs 4/13/2018 5:22 AM x x x x x
More NP-Complete and NP-hard Problems
Richard Anderson Lecture 26 NP-Completeness
Computing Connected Components on Parallel Computers
NP-Completeness (2) NP-Completeness Graphs 7/23/ :02 PM x x x x
NP-Completeness (2) NP-Completeness Graphs 7/23/ :02 PM x x x x
NP-Completeness Proofs
Richard Anderson Lecture 26 NP-Completeness
Character-Based Phylogeny Reconstruction
Chapter 2 Sets and Functions.
Planarity Testing.
ICS 353: Design and Analysis of Algorithms
NP-Completeness (2) NP-Completeness Graphs 11/23/2018 2:12 PM x x x x
Phylogenetic Tree 12/8/2018.
Lectures on Graph Algorithms: searching, testing and sorting
Speaker: Chuang-Chieh Lin National Chung Cheng University
Text Book: Introduction to algorithms By C L R S
Trevor Brown DC 2338, Office hour M3-4pm
NP-Completeness (2) NP-Completeness Graphs 7/9/2019 6:12 AM x x x x x
Perfect Phylogeny Tutorial #10
INTRODUCTION A graph G=(V,E) consists of a finite non empty set of vertices V , and a finite set of edges E which connect pairs of vertices .
Presentation transcript:

PHYLOGENETIC TREES Bulent Moller CSE March 2004

Outline Recall Phylogenetic trees Character states and the perfect Phylogeny problem Binary Character states Compatibility is NP Complete

Recall Motivation The problem of explaining the evolutionary history of today's species How do species relate to one another in terms of common ancestors Nucleic acids and Proteins also evolve Approaches Fossil Records, Phylogenetic Trees

Recall In Phylogenetic trees Leaves represent present day species Interior nodes represent hypothesized ancestors

Features of Phylogenetic Trees Shows how interior nodes connect to one another and to the leaves, What does it tell to the biologist? Shows the distance between pairs of nodes when the tree edges are weighted What does it tell to the biologist?

Input data for Phylogenetic Reconstruction Distance Matrix Character State Matrix

A character has a finite number of states Taxonomical units for which we want to create phylogeny are called Objects e.g. species, population Every object has a state vector & inherit the same characters but not the same states!

Character State Matrix M M has n rows (Objects) M has m columns (characters) M ij denotes the state object i has for character j

Problems while constructing Phylogenetic Trees Convergence or Parallel evolution e.g. Presence of Wings in Birds and Bats Reversals e.g. Snakes Unordered characters

Assumptions There is no Convergence There is no Reversal Characters will be ordered 0 to 1 Our Character state Matrix will be Binary

Perfect Phylogeny Tree Defn: A tree has perfect phylogeny if For each state s of each character c, the set of all nodes u for which the state is s with respect to c must form a sub tree of T. In Particular, the edge e leading to this sub tree is uniquely associated with a transition from some state w to state s OBEY OUR ASSUMPTIONS

c1 c2 c3 c4 c5 C6 BD EA C Ex: Perfect Phylogeny tree

Perfect Phylogeny Problem Instance: A set O with n objects, a set C of m characters, each character having at most r states (n, m, r positive integers) Question: Is there a perfect phylogeny for O? If the character state matrix admits a perfect phylogeny we say that the defining characters are compatible

Perfect Phylogeny Problem Can we determine for every problem (input) the root? No, we may not have enough information Tree will be unrooted !

Ex: Unrooted Binary Tree Unrooted Binary tree do not imply a known ancestral root. This Tree has 3 possible rooted binary Trees with one common ancestor

Ex: Unrooted Binary Tree

Binary Character States Defn: For each Column j of M, let O j be the set of objects whose state is 1 for j. Let O j be the set of objects whose state is 0 for j. O c1 =?

Binary Character States Defn: For each Column j of M, let O j be the set of objects whose state is 1 for j. Let O j be the set of objects whose state is 0 for j. O c1 ={B,D} O c1 =?

Binary Character States Defn: For each Column j of M, let O j be the set of objects whose state is 1 for j. Let O j be the set of objects whose state is 0 for j. O c1 ={B,D} O c1 ={A,C,E}

Lemma A binary Matrix M admits a perfect phylogeny if and only if for each pair of characters i and j the sets O i and O j are disjoint or one of them contains each other

Sketch We will show the only if part of lemma by inductively building a rooted perfect phylogeny. Assume we have only 1 character as shown in the matrix

Sketch cont. According to the given matrix Oc1 = {B,D} and Oc1 = { A, C, E} Create a root and nodes Oc1, Oc1 Link node Oc1 to the root by labeling the edge with c1 and Oc1 w/o labeling

Sketch cont. According to the given matrix Oc1 = {B,D} and Oc1 = { A, C, E} Create a root and nodes Oc1, Oc1 Link node Oc1 to the root by labeling the edge with c1 and Oc1 w/o labeling Split each child of the root into as many leaves as there are objects in the nodes

Sketch cont. Consider we have built a tree T for k characters There are no leaves, nodes still contain set of objects process character k + 1 case 1: character k + 1 partitions only object sets belonging to the same node We do not hurt our perfect phylogeny property

Ex: A, B, C, D, E, F A, C, D, F B, E A, C D, F c1 c2 Oc3 O c1 = { A, C, D, F } O c2 = { B, E } k = 2 O c3 = { A, C }

Sketch cont. case 2: character k + 1 partitions object sets belonging to different nodes THIS CANNOT HAPPEN Assume it did, it can only happen if there exist a character i such that leads the objects in node a and b in different nodes. This is the case that Oi and Ok+1 are whether disjoint nor one is contained by the other.

Ex: A, B, C, D, E, F A, C, E B, D, F OiOi A, C E Oi = { A, C, E } Ok+1 = { A, B } A, B Ok+1

Algorithms For Simplicity we assume that the Phylogenetic tree construction works in 2 phases Decision Construction

Algorithms for Decisions The very basic Algorithm: Check if the input Matrix obeys Lemma How would you do that?

Basic Decision Algorithm Check every column pair of being disjoint or if one is the subset of the other One of these checks costs us O (n) we have m² column pairs O(nm²)

Decision Algorithms Improvement Visit every column only once to have Complexity O(nm) Process first characters for which the maximum number of objects has state 1 All other characters are either subsets of it or are disjoint from it.

Algorithms Perfect Phylogeny Decision Input: Binary Matrix M Output: True if M admits perfect pylogeny false otherwise //Sort column based on #1's //Initialize auxiliary matrix L for each Lij do Lij ← 0

Algorithms Perfect Phylogeny Decision for i ← 1 to n do k ← -1 for j ← 1 to m do if M ij = 1 then L ij ← k k ← j

Algorithms Perfect Phylogeny Decision for each column j of L do If Lij ≠ Lmj for some i, m and both Lij and Lmj are both non zero then return false return true

Algorithms Perfect Phylogeny Construction Input: binary matrix M with Columns sorted in decreasing order Output: perfect pylogeny for M

Algorithms Perfect Phylogeny Construction Create root for each object i do curNode ← root For 1 to m do If Mij = 1 then If there already exits edge (curNode, u) labeled j then curNode ← u else Create node u, Create edge( curNode, u) labeled j, curNode ← u Place i in curNode for each node u except root do Create as many leaves linked to u as there are objects in u

Compatibility In Phylogenies Recall that we violate the evolution process by not allowing convergence and reversals One Approach is to insist on avoiding reversals and convergence and trying to exclude few characters that causes them.

Compatibility In Phylogenies Goal: Find a maximum set of characters such that we can find a perfect phylogeny Problem: Compatibility Instance: A character state Matrix M with n objects and m directed binary characters, and a positive integer B ≤ m Question: Is there a subset L of characters that satisfies for each pair of characters i and j that the sets O i and O j are disjoint or one of them contains each other and |L| ≥ B?

Compatibility In Phylogenies Problem: Clique Instance: Graph G = (V,E), and positive integer K≤ |V| Question: Does G contain a subset V' of V with |V'|≥K such that every pair of vertices in V' is linked by an edge in E? Clique is NP Complete

Ex: Clique Which nodes build a clique with k = 3? C2 C1 C3 C4

Compatibility is NP Complete Proof: Create an Instance for Compatibility from the Instance of Clique as follows: Given G =(V,E), let m = |V|, so we create for every vertex vi in V we create character i in M The number of objects of M is n=3m(m-1)/2 For every pair (vi, vj) such that it is not an edge in E we create three objects r,s,t in M such that Mri=0, Msi=1, Mti=1, Mrj=1, Msj=1, Mtj=0 The remaining elements of M should be zero

Example C3 C2 C4 C1

Compatibility is NP Complete cont. G contains a clique V', with |V'|≥K iff M contains a compatible character subset L with |L|≥K If such a clique exists, then to every edge of this clique there corresponds a pair of characters in M, such that whenever one of them has state 1 for an object, the other has state 0 or both have 0. If L exists, then to every pair of characters of L there corresponds a pair of vertices in V linked by an edge. All this pairs together form a clique ≥K