Constructing evolutionary trees from rooted triples
Bang Ye Wu
Dept. of Computer Science and Information Engineering, Shu-Te University


An evolutionary tree
• A rooted tree.
• Each leaf represents one species.
• Internal nodes are unlabelled (inferred common ancestors).
[Figure: a rooted tree with leaves a, b, c, d, e, f]

A (rooted) triple (triplet)
• An evolutionary tree of 3 species.
• A constraint in an evolutionary tree construction problem.
• (c(ab)): lca(b,c) = lca(c,a), and this node is a proper ancestor of lca(a,b).
  – lca: lowest common ancestor.
• That is, a and b should be closer to each other than a and c, or b and c.
[Figure: the triple (c(ab)) over leaves a, b, c]
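The lca condition above can be checked directly on a given tree. The following is a minimal sketch, assuming the tree is stored as a child-to-parent dictionary and a triple (c(ab)) is encoded as the tuple (c, a, b); the names `ancestors`, `lca`, `satisfies` and the sample tree are hypothetical and not from the slides.

```python
def ancestors(parent, v):
    """Return the path from v up to the root, starting with v itself."""
    path = [v]
    while v in parent:          # the root is the only node without a parent entry
        v = parent[v]
        path.append(v)
    return path


def lca(parent, u, v):
    """Lowest common ancestor of u and v in the tree given by `parent`."""
    seen = set(ancestors(parent, u))
    for w in ancestors(parent, v):
        if w in seen:
            return w
    raise ValueError("nodes are not in the same tree")


def satisfies(parent, triple):
    """True if the tree satisfies the triple (c(ab)), encoded as (c, a, b)."""
    c, a, b = triple
    top = lca(parent, b, c)
    low = lca(parent, a, b)
    # lca(b,c) = lca(c,a), and that node lies strictly above lca(a,b)
    return top == lca(parent, c, a) and top != low and top in ancestors(parent, low)


# Hypothetical example tree ((a,b),c): root "r" has children "u" and "c".
EXAMPLE_PARENT = {"u": "r", "c": "r", "a": "u", "b": "u"}
```

Here `satisfies(EXAMPLE_PARENT, ("c", "a", "b"))` checks the triple (c(ab)) against the sample tree.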

A tree compatible with triples
• Given a set of triples, construct a tree satisfying all the triples.
• Deciding whether such a tree exists, and constructing one if it does, can be done in polynomial time. [Aho et al., 1981]
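A minimal sketch of the recursive idea behind the Aho et al. construction may help here: connect a and b for every triple (c(ab)) whose three species all lie in the current set, split the set into connected components, and recurse; a single component on two or more species means the triples are incompatible. The function name `build`, the (c, a, b) tuple encoding, and the nested-tuple tree output are assumptions made for illustration.

```python
def build(species, triples):
    """Return a tree as nested tuples, or None if the triples are incompatible.

    Each triple (c(ab)) is encoded as the tuple (c, a, b).
    """
    species = sorted(species)
    if len(species) == 1:
        return species[0]

    # Union-find over the current species set.
    comp = {s: s for s in species}

    def find(s):
        while comp[s] != s:
            comp[s] = comp[comp[s]]   # path halving
            s = comp[s]
        return s

    inside = set(species)
    for c, a, b in triples:
        # Only triples entirely inside the current set constrain this level.
        if c in inside and a in inside and b in inside:
            comp[find(a)] = find(b)

    groups = {}
    for s in species:
        groups.setdefault(find(s), []).append(s)

    if len(groups) == 1:
        return None                   # no valid split at this level

    children = []
    for group in groups.values():
        subtree = build(group, triples)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)
```

For instance, `build("abc", [("c", "a", "b")])` groups a with b and leaves c on its own, while adding the triple (a(bc)) merges everything into one component and yields None.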

Incompatible (conflicting) triples
[Figures: two conflicting triples; three conflicting triples that are pairwise compatible]

Two optimization problems
• The maximum consensus tree:
  – The tree satisfying the maximum number of triples.
  – NP-hard [Jansson, 2001] [Wu, to appear].
  – A new heuristic algorithm [this paper].
• The maximum compatible set:
  – The compatible species subset of maximum cardinality.
  – NP-hard [this paper].

Previous heuristic: Best-One-Split-First
• If a species x is split from a set V, all triples (x(v1 v2)) with v1 and v2 in V will be satisfied.
• Repeatedly split one species from the set. Choose the split species greedily.

[Figure: c is split first, leaving {a,b,d}; then b is split, leaving {a,d}. c is chosen, and the two triples are satisfied.]
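The greedy splitting can be sketched as follows. This is a minimal sketch under stated assumptions: the score of a candidate species x is taken to be the number of triples (x(v1 v2)) with v1 and v2 still in the set, ties are broken alphabetically, and the four sample triples are an illustrative input rather than data from the slides. The function name `best_one_split_first` and the (x, v1, v2) tuple encoding are also hypothetical.

```python
def best_one_split_first(species, triples):
    """Return the order in which species are split off (first = nearest the root).

    Each triple (x(v1 v2)) is encoded as the tuple (x, v1, v2).
    """
    remaining = set(species)
    order = []
    while len(remaining) > 2:
        def gain(x):
            # Triples satisfied by splitting x off the current set (assumed score).
            return sum(1 for c, a, b in triples
                       if c == x and a in remaining and b in remaining)

        best = max(sorted(remaining), key=gain)   # alphabetical tie-break
        order.append(best)
        remaining.remove(best)
    order.extend(sorted(remaining))               # the last two form a cherry
    return order


# Illustrative input (an assumption): (c(ad)), (c(bd)), (b(ad)), (a(bc)).
EXAMPLE_TRIPLES = [("c", "a", "d"), ("c", "b", "d"), ("b", "a", "d"), ("a", "b", "c")]
```

The returned order describes a caterpillar-shaped tree: each species in the list hangs off the path from the root, and the final two species share the deepest internal node.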

Previous heuristic: Min-Cut-Split-First
• Construct an auxiliary graph:
  – Vertex: species.
  – Each edge is labeled by a set: for each triple (x(yz)), x is in the label set of edge (y,z).

  – A bipartition corresponds to a split in the tree.
  – The labels on the edges in the cut of the bipartition correspond to the triples that conflict with the split.
• Repeatedly find the bipartition with minimum cut.
[Figure: a min-cut; the triple (c(bd)) conflicts with the split]
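To make the auxiliary graph concrete, here is a minimal sketch that builds the edge label sets and lists the triples that conflict with a given bipartition. It only evaluates one cut; finding the minimum cut is not shown. The names `edge_labels` and `cut_conflicts`, the (x, y, z) tuple encoding, and the sample triples are assumptions for illustration.

```python
from collections import defaultdict


def edge_labels(triples):
    """Map each edge {y, z} to the set of species x with a triple (x(yz))."""
    labels = defaultdict(set)
    for x, y, z in triples:
        labels[frozenset((y, z))].add(x)
    return labels


def cut_conflicts(labels, side_a, side_b):
    """List the triples (x(yz)) whose edge {y, z} crosses the bipartition."""
    side_a, side_b = set(side_a), set(side_b)
    current = side_a | side_b
    conflicts = []
    for edge, xs in labels.items():
        y, z = sorted(edge)
        crosses = (y in side_a and z in side_b) or (y in side_b and z in side_a)
        if crosses:
            # Separating y from z at this split makes (x(yz)) unsatisfiable.
            conflicts.extend((x, y, z) for x in sorted(xs) if x in current)
    return conflicts


# Illustrative input (an assumption): (c(ad)), (c(bd)), (b(ad)), (a(bc)).
EXAMPLE_TRIPLES = [("c", "a", "d"), ("c", "b", "d"), ("b", "a", "d"), ("a", "b", "c")]
```

With this input, the split {a,d} versus {b,c} cuts only the edge (b,d), so the single conflicting triple is (c(bd)).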

Previous heuristic: Best-Pair-Merge-First
• Instead of top-down splitting, BPMF uses a bottom-up merging strategy.
• Starting from singleton sets, we repeatedly merge pairs of sets step by step.
• Scoring functions are used to evaluate which pair should be merged in each step.

[Figure: merging sequence {a}{b}{c}{d}, then {a,d}{b}{c}, then {a,d}{b,c}, then {a,d,b,c}, with the corresponding partial trees]

An exact algorithm for MCTT
• Dynamic programming.
• F(V) = max{F(V1) + F(V2) + W(V1, V2)}, taken over all bipartitions (V1, V2) of V.
  – F(V): the number of satisfied triples over V.
  – W(V1, V2): the number of triples (x(v1 v2)) with x not in V, v1 in V1, and v2 in V2.
• Computed in order of increasing cardinality.

Example table of F values:
n=4   abcd: 3
n=3   abc: 1   abd: 3   bcd: 2
n=2   ab: 0   ac: 0   ad: 2   bc: 1   bd: 1   cd: 0
n=1   a: 0   b: 0   c: 0   d: 0
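The recurrence F(V) = max{F(V1) + F(V2) + W(V1, V2)} can be sketched as a direct subset dynamic program. This is a minimal sketch that is exponential in the number of species and meant only for small inputs. The function name `max_consensus` and the (x, y, z) tuple encoding are hypothetical, and the four input triples are an assumption chosen so that the computed F values agree with the example table; the slides do not list the triples explicitly.

```python
from itertools import combinations


def max_consensus(species, triples):
    """Return a dict mapping each subset V (a frozenset) to F(V).

    Each triple (x(yz)) is encoded as the tuple (x, y, z).
    """
    species = sorted(species)
    F = {frozenset([s]): 0 for s in species}

    def W(v1, v2):
        # Triples (x(yz)) with x outside V1 ∪ V2 and y, z on opposite sides.
        whole = v1 | v2
        return sum(1 for x, y, z in triples
                   if x not in whole
                   and ((y in v1 and z in v2) or (y in v2 and z in v1)))

    # Process subsets in order of increasing cardinality.
    for size in range(2, len(species) + 1):
        for subset in combinations(species, size):
            V = frozenset(subset)
            first, rest = subset[0], subset[1:]
            best = -1
            # Keep `first` in V1 so each bipartition is enumerated once;
            # r < len(rest) guarantees that V2 is non-empty.
            for r in range(len(rest)):
                for extra in combinations(rest, r):
                    V1 = frozenset((first,) + extra)
                    V2 = V - V1
                    best = max(best, F[V1] + F[V2] + W(V1, V2))
            F[V] = best
    return F


# Assumed input: (c(ad)), (c(bd)), (b(ad)), (a(bc)).
EXAMPLE_TRIPLES = [("c", "a", "d"), ("c", "b", "d"), ("b", "a", "d"), ("a", "b", "c")]
EXAMPLE_F = max_consensus("abcd", EXAMPLE_TRIPLES)
```

The value for the full species set, `EXAMPLE_F[frozenset("abcd")]`, is the maximum number of triples any tree can satisfy. Recovering the tree itself would additionally require storing the best bipartition for each subset.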

Our new heuristic algorithm (DPWP)
• Derived from the exact algorithm.
• The number of subsets kept for each cardinality is limited by a parameter K.
• When K = infinity, it is exactly the exact algorithm.
• Time-quality trade-off.
• The time complexity is O(n^2 k^2 (n^3 + k)).
  – Sorry, there is a mistake in the paper.

The experimental results (time)

The MCST problem
• Given triples over a species set S, find a subset U of S such that all given triples over U are compatible and |U| is maximum.
• We show the problem is NP-hard.
  – By a reduction from the Feedback Vertex Set problem.

The feedback vertex set problem
• Feedback vertex set: a vertex subset containing at least one vertex of each cycle of the given directed graph.
  – In other words, removing a feedback vertex set results in an acyclic digraph.
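The "removing it leaves an acyclic digraph" characterization gives a simple way to verify a candidate set. The following is a minimal sketch that deletes the candidate vertices and then tests acyclicity by repeatedly removing vertices of in-degree zero; the function name `is_feedback_vertex_set` and the edge-list representation are assumptions. It only verifies a given set and does not search for a minimum one, which is the NP-hard part.

```python
def is_feedback_vertex_set(vertices, edges, candidate):
    """True if deleting `candidate` from the digraph leaves no directed cycle."""
    keep = set(vertices) - set(candidate)
    indeg = {v: 0 for v in keep}
    out = {v: [] for v in keep}
    for u, v in edges:
        if u in keep and v in keep:
            out[u].append(v)
            indeg[v] += 1

    # Peel off vertices with no incoming edges (topological-sort style).
    stack = [v for v in keep if indeg[v] == 0]
    peeled = 0
    while stack:
        u = stack.pop()
        peeled += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)

    # Every remaining vertex was peeled exactly when the graph is acyclic.
    return peeled == len(keep)
```

For a directed 3-cycle on vertices 1, 2, 3, any single vertex is a feedback vertex set, while the empty set is not.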

The reduction

Concluding remarks
• What is the approximation ratio?
  – The Best-One-Split-First algorithm is a 3-approximation algorithm.
  – A larger K gives a better solution, but we do not know a theoretical bound on the ratio.