Descendent Subtrees Comparison of Phylogenetic Trees with Applications to Co-evolutionary Classifications in Bacterial Genome

Presentation transcript:

Slide 1: Descendent Subtrees Comparison of Phylogenetic Trees with Applications to Co-evolutionary Classifications in Bacterial Genome
Yaw-Ling Lin (1), Tsan-Sheng Hsu (2)
(1) Dept Computer Sci. & Info. Management, Providence University, Taichung, Taiwan
(2) Institute of Information Science, Academia Sinica, Taipei, Taiwan

Yaw-Ling Lin, Providence, Taiwan
Slide 2: Motivation – Where do the problems come from?

Slide 3: Two-Component System
Two-component systems (2CS):
– Sensor histidine kinase
– Response regulator
The major control machinery that lets bacteria cope with a diverse and often hostile environment.

Slide 4: 2CS in Pseudomonas aeruginosa PAO1
“Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.” Nature Aug 31;406(6799): by Stover CK, Pham XQ, Erwin AL, et al.
Genome: 6.3M bp
Predicted genes: …
… genes were classified as 2CSs.

Slide 5: 2CS in PAO1

Slide 6: 2CS in PAO1

Slide 7: 2CS in PAO1

Slide 8: 2CS in PAO1
There are 123 annotated 2CS genes in PAO1.
Use systematic analysis of the evolutionary relationships between the sensor kinase and the response regulator of a 2CS.
Construct phylogenetic trees using ClustalW for 54 sensor kinases and 59 response regulators.

Slide 9: 2CS in PAO1 -- Sensor Tree

Slide 10: 2CS: Regulator Tree

Slide 11: Subtrees Analysis of 2CS

Slide 12: Co-evolution Subtree Analysis
[Figure: Sensor Tree versus Regulator Tree]

Slide 13: Different Trees
Different phylogenetic tree inference methods:
- Maximum parsimony
- Maximum likelihood
- Distance matrix fitting
- Quartet-based methods
Comparing the same set of species with respect to different biological sequences or different genes yields different trees.
How do we find the largest set of items on which the trees agree?

Slide 14: Previous Results
Measuring the similarity / difference between trees:
- Symmetric difference [Robinson 1979]
- Robinson and Foulds (RF) metric [Robinson 1981]
- Nearest-neighbor interchange [Waterman 1978]
- Subtree transfer distance [Allen 2001]
- Quartet metric [Estabrook 1985]
Inferring the consensus tree: the maximum agreement subtree problem (MAST), a.k.a. the maximum homeomorphic agreement subtree.
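As a small illustration of the symmetric-difference idea listed above, the sketch below counts the clusters (leaf sets of internal vertices) that appear in exactly one of two rooted trees. It is one common rooted-tree reading of the RF metric, not code from the talk. The nested-tuple tree representation (an int is a leaf, a tuple is an internal vertex) and the function names are assumptions made for this example.

```python
def clusters(tree):
    """Leaf sets of all internal vertices of a nested-tuple tree (assumed format)."""
    found = set()

    def visit(node):
        if isinstance(node, int):          # a leaf carries its own label
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)                  # record the cluster of this vertex
        return leaves

    visit(tree)
    return found


def rf_distance(t1, t2):
    """Number of clusters present in exactly one of the two rooted trees."""
    return len(clusters(t1) ^ clusters(t2))


t1 = ((1, 2), 3)
t2 = (1, (2, 3))
print(rf_distance(t1, t2))
```

The two example trees differ in one cluster each, {1, 2} versus {2, 3}, while the root cluster is shared and cancels out.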

Slide 15: MAST: Maximum Agreement Subtree
Problem: given a set of rooted trees whose leaves are drawn from the same set of n items, find the largest subset of these items such that the portions of the trees restricted to the subset are isomorphic.
[Amir and Keselman 1997]: NP-hard even for 3 trees of unbounded degree.
[Hein 1995]: MAST for 3 trees of unbounded degree is hard to approximate.
[Amir et al 1997]: polynomial-time algorithms for three or more bounded-degree trees, but the time complexity is exponential in the degree bound.

Slide 16: MAST: Maximum Agreement Subtree
[Farach and Thorup 1997]: $O(n^{1.5} \log n)$ time algorithm for two arbitrary degree trees.
[Cole et al 2002]: MAST of two binary trees can be found in $O(n \log n)$ time; MAST of two degree d trees can be found in … time.

Slide 17: Problem Definition
A phylogenetic tree with n leaves is a (rooted) tree whose leaf nodes are uniquely labelled from 1 to n.
A descendent subtree of a phylogenetic tree T is the subtree consisting of all edges and nodes of T descending from a given vertex.
Given a set of n-leaf phylogenetic trees, we wish to explore the relationships among the descendent subtrees of these trees.
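To make the definition concrete, the sketch below lists the leaf set of every descendent subtree of a small tree. The nested-tuple representation (an int is a leaf label, a tuple is an internal vertex holding its children) and the name `descendent_leaf_sets` are assumptions chosen for this example, not notation from the talk.

```python
def descendent_leaf_sets(tree):
    """Return the leaf set of every descendent subtree, children before parents."""
    found = []

    def visit(node):
        if isinstance(node, int):
            leaves = frozenset([node])     # a leaf is its own descendent subtree
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.append(leaves)
        return leaves

    visit(tree)
    return found


example = ((1, 2), (3, (4, 5)))            # 5 leaves, 4 internal vertices
leaf_sets = descendent_leaf_sets(example)
for leaves in leaf_sets:
    print(sorted(leaves))
```

Every vertex contributes exactly one descendent subtree, so a tree with n leaves and no unary vertices has at most 2n - 1 of them.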

Slide 18: Normalized cluster distance between two sets
Symmetric set difference:
Normalized cluster distance:
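The formulas on this slide did not survive in the transcript. The sketch below shows one plausible reading: the symmetric set difference counts the items that belong to exactly one set, and the distance divides that count by |A| + |B| so that it lies between 0 and 1. The choice of normalizer is an assumption made for this example and may differ from the definition used in the talk.

```python
def symmetric_difference_size(a, b):
    """Number of items that belong to exactly one of the two sets."""
    return len(set(a) ^ set(b))


def normalized_cluster_distance(a, b):
    """ASSUMED normalization: |A symmetric-difference B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    return symmetric_difference_size(a, b) / (len(a) + len(b))


print(normalized_cluster_distance({1, 2, 3}, {2, 3, 4}))
```

Under this assumed normalization, identical non-empty sets are at distance 0 and disjoint sets are at distance 1.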

Slide 19: All Pairs Subtrees Comparison – A naïve $O(n^3)$ algorithm
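The slide body is not in the transcript, so the following is only a sketch of what a naïve approach could look like: collect the leaf set of every vertex in both trees, then compare every pair of leaf sets directly. With O(n) vertices per tree and O(n) work per set comparison this costs O(n^3). The nested-tuple trees and the |A| + |B| normalizer are assumptions of this example.

```python
def leaf_sets(tree):
    """Leaf set of every vertex of a nested-tuple tree, children before parents."""
    found = []

    def visit(node):
        if isinstance(node, int):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.append(leaves)
        return leaves

    visit(tree)
    return found


def naive_all_pairs(t1, t2):
    """Distance table indexed by (vertex of t1, vertex of t2); O(n) per entry."""
    sets1, sets2 = leaf_sets(t1), leaf_sets(t2)
    return [[len(a ^ b) / (len(a) + len(b)) for b in sets2] for a in sets1]


t1 = ((1, 2), 3)
t2 = (1, (2, 3))
table = naive_all_pairs(t1, t2)
print(table[2][3])   # subtree {1, 2} of t1 against subtree {2, 3} of t2
```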

Slide 20: All Pairs Subtrees Comparison – Property

Slide 21: All Pairs Subtrees Comparison – an $O(n^2)$ algorithm
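The algorithm itself is not reproduced in the transcript. One standard way to reach O(n^2), sketched here as an assumption rather than as the authors' method, uses the fact that the leaf sets of a vertex's children partition its own leaf set, so intersection sizes add up over children. Filling the table in post-order then takes O(n^2) time in total when no vertex has a single child. The nested-tuple trees, the helper names, and the |A| + |B| normalizer are hypothetical.

```python
def flatten(tree):
    """Post-order list of (child indices, leaf label or None, number of leaves)."""
    nodes = []

    def visit(node):
        if isinstance(node, int):
            nodes.append(((), node, 1))
        else:
            kids = tuple(visit(child) for child in node)
            nodes.append((kids, None, sum(nodes[k][2] for k in kids)))
        return len(nodes) - 1              # index of the vertex just added

    visit(tree)
    return nodes


def all_pairs_distances(t1, t2):
    n1, n2 = flatten(t1), flatten(t2)
    inter = [[0] * len(n2) for _ in n1]    # inter[i][j] = |L(i) & L(j)|
    for i, (kids1, label1, _) in enumerate(n1):
        for j, (kids2, label2, _) in enumerate(n2):
            if kids2:                      # split the t2 vertex over its children
                inter[i][j] = sum(inter[i][c] for c in kids2)
            elif kids1:                    # t2 leaf: split the t1 vertex instead
                inter[i][j] = sum(inter[c][j] for c in kids1)
            else:                          # leaf against leaf
                inter[i][j] = int(label1 == label2)
    # |A ^ B| = |A| + |B| - 2|A & B|, normalized by |A| + |B| (assumed).
    return [[(a[2] + b[2] - 2 * inter[i][j]) / (a[2] + b[2])
             for j, b in enumerate(n2)] for i, a in enumerate(n1)]


t1 = ((1, 2), 3)
t2 = (1, (2, 3))
dist = all_pairs_distances(t1, t2)
print(dist[2][3])
```

Post-order indexing guarantees that every child entry is filled before the parent entry that sums over it.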

Slide 22: Lowest Common Ancestor

Slide 23: Confluent subtree

Slide 24: Confluent subtree – Illustration

Slide 25: Constructing confluent subtree

Slide 26: Nearest subtree

Slide 27: Nearest subtree: reasoning

Slide 28: Nearest subtree: Algorithm

Slide 29: Leaf-agree / Isomorphic Subtrees

Slide 30: leaf-agreement – Two Trees

Slide 31: All-agreement: Illustration
[Figure: in T1, vertex z has children x and y with leaf sets X and Y; in T2, x' = Lca(X), y' = Lca(Y), and z' = Lca(x', y').]
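The illustration suggests a test based on lowest common ancestors: a subtree of T1 with leaf set X has a counterpart in T2 exactly when the vertex Lca(X) in T2 has no leaves other than those in X. The sketch below implements that test in a deliberately simple, quadratic way by treating Lca(X) as the smallest cluster of T2 that contains X. It is not the linear-time method mentioned in the conclusion. The nested-tuple trees and function names are assumptions, and both trees are assumed to share the same leaf labels.

```python
def internal_clusters(tree):
    """Leaf sets of the internal vertices of a nested-tuple tree, post-order."""
    found = []

    def visit(node):
        if isinstance(node, int):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.append(leaves)
        return leaves

    visit(tree)
    return found


def lca_cluster(clusters, leaves):
    """Leaf set of Lca(leaves): the smallest cluster containing all of them."""
    return min((c for c in clusters if leaves <= c), key=len)


def leaf_agree_subtrees(t1, t2):
    """Leaf sets X of t1 whose Lca(X) in t2 spans exactly X."""
    clusters2 = internal_clusters(t2)
    return [x for x in internal_clusters(t1)
            if len(lca_cluster(clusters2, x)) == len(x)]


t1 = ((1, 2), (3, (4, 5)))
t2 = (((1, 2), 3), (4, 5))
agree = leaf_agree_subtrees(t1, t2)
print([sorted(x) for x in agree])
```

In the example the subtree {3, 4, 5} of t1 is rejected because its lowest common ancestor in t2 is the root, which spans all five leaves.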

Slide 32: All-agreement Method

Slide 33: leaf-agreement – k Trees

Slide 34: Isomorphic Descendent Subtrees

Slide 35: Isomorphic Descendent Subtrees (2)
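The slides on isomorphic descendent subtrees are title-only in this transcript. As a rough illustration of the idea, the sketch below gives every descendent subtree a canonical string that ignores the order of children, so two subtrees are isomorphic (as leaf-labelled rooted trees) exactly when their strings match. Building these strings is not linear-time, so this is a simplification and not the algorithm from the talk. The nested-tuple trees and function names are assumptions.

```python
def canonical_forms(tree):
    """Canonical strings of all internal vertices of a nested-tuple tree."""
    forms = set()

    def visit(node):
        if isinstance(node, int):
            return str(node)
        # Sorting the children's forms makes the result independent of child order.
        form = "(" + ",".join(sorted(visit(child) for child in node)) + ")"
        forms.add(form)
        return form

    visit(tree)
    return forms


def isomorphic_subtrees(t1, t2):
    """Canonical forms of descendent subtrees that occur in both trees."""
    return canonical_forms(t1) & canonical_forms(t2)


t1 = ((2, 1), (3, (4, 5)))
t2 = (((1, 2), 3), (5, 4))
common = isomorphic_subtrees(t1, t2)
print(sorted(common))
```

The subtrees over {1, 2} and {4, 5} match even though their children are listed in a different order in the two trees.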

Slide 36: Conclusion
Computing the normalized cluster distances between all pairs of subtrees of two trees can be done optimally in $O(n^2)$ time.
Finding nearest subtrees for a collection of pairwise disjoint subsets of leaves can be done in $O(n)$ time.
Finding all descendent subtrees consisting of the same set of leaves in a set of (unbounded-degree) trees is solvable in time linear in the size of the input trees.
Finding all isomorphic descendent subtrees in a set of (unbounded-degree) trees is solvable in time linear in the size of the input trees.

Slide 37: Future Research
Clustering analysis of 2CS for functional prediction of uncharacterized genes.
Co-evolutionary analysis of 2CS.
(Rooted / unrooted) phylogenetic trees comparison: when edges are labeled with (likelihood, log-odds) distances.

Slide 38: The End

Slide 39: What Date is Today?
Magic Number:
– 4/4, 6/6, 8/8, 10/10, 12/12
– 7/11, 9/5 [also 11/7, 5/9]
– 3/0? [implying 2/28, 2/0 = 1/31]
Extension:
– 365 = 52 * …
– Leap Year? 2003: 5 ; 2004: 7 ; 2005: 1 ; 2006: 2
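The magic dates above all fall on the same weekday within a given year, which makes mental weekday calculation possible. The sketch below turns that rule into code. The century formula and the January anchor (1/3, which is 1/31 minus four weeks) are standard facts about the Gregorian calendar added for this example, not content from the slide. Weekdays are numbered 0 = Sunday to 6 = Saturday, so the slide's 7 for 2004 appears here as 0.

```python
# One magic day per month (index 1..12) for a common year.
# 3/0 on the slide means the last day of February, so March uses 3/7.
ANCHOR_DAY = [None, 3, 28, 7, 4, 9, 6, 11, 8, 5, 10, 7, 12]


def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def magic_weekday(year):
    """Weekday shared by all magic dates of the year (Gregorian calendar)."""
    century, y = divmod(year, 100)
    century_anchor = (5 * (century % 4) + 2) % 7   # 2000 -> Tuesday (2)
    # Each year shifts the weekday by one, each leap year by one more.
    return (century_anchor + y + y // 4) % 7


def day_of_week(year, month, day):
    anchor = ANCHOR_DAY[month]
    if is_leap(year) and month <= 2:
        anchor += 1                                # 1/4 and 2/29 in leap years
    return (magic_weekday(year) + day - anchor) % 7


print([magic_weekday(y) for y in (2003, 2004, 2005, 2006)])
```

The one-day shift per year follows from 365 = 52 * 7 + 1, and a leap year adds a second day of shift.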