SuperTriplets: a triplet-based supertree approach to phylogenomics Vincent Ranwez, Alexis Criscuolo and Emmanuel J.P. Douzery.


SuperTriplets: ISBM Introduction: inferring phylogeny (1 gene)

Introduction: inferring phylogeny (3 genes)
[Figure: Gene 1, Gene 2 and Gene 3 datasets, combined either as a SuperTree or as a SuperMatrix]

Introduction: inferring phylogeny (more data)
[Figure: Gene 1 ... Gene 1000, plus SNP / Morpho / biblio data, combined either as a SuperTree or as a SuperMatrix]

Supertree overview: MRP
MRP [Baum 1992, Ragan 1992]:
- 1 binary sequence per taxon
- 1 site per clade (1 = in the clade; 0 = outside; ? = missing)
[Figure: source trees on taxa A-F and the corresponding MRP matrix]
MRP can yield a relation contradicted by all source trees [Goloboff and Pol, 2002]
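The clade encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: trees are assumed to be nested tuples of taxon names, the helper names (`leaf_set`, `clades`, `mrp_matrix`) are hypothetical, and the root clade and the usual all-zero outgroup row are left out for brevity.

```python
def leaf_set(tree):
    """Taxa below a node; a leaf is a plain string (assumed representation)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree, is_root=True):
    """Non-trivial clades of a rooted tree, skipping the root clade."""
    if isinstance(tree, str):
        return []
    found = [] if is_root else [leaf_set(tree)]
    for child in tree:
        found += clades(child, is_root=False)
    return found


def mrp_matrix(forest, taxa):
    """One binary sequence per taxon, one site per clade of each source tree."""
    rows = {taxon: "" for taxon in taxa}
    for tree in forest:
        present = leaf_set(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon] += "?"      # taxon missing from this source tree
                elif taxon in clade:
                    rows[taxon] += "1"      # inside the clade
                else:
                    rows[taxon] += "0"      # in the tree but outside the clade
    return rows


forest = [(("A", "B"), "C"), (("B", "D"), "C")]
for taxon, sequence in mrp_matrix(forest, "ABCD").items():
    print(taxon, sequence)
```

Each source tree contributes one column per clade, so taxa absent from a tree receive "?" in all of that tree's columns.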

Supertree overview: intuitive approach
The Supertree problem (intuitive formulation)
Input: a collection of overlapping trees (a forest)
Output: the tree that best represents this collection
A major question is: how to define "best represents"?
Visualizing supertree candidates within the tree space
Intuitive solution: the median supertree
- Generalization of the consensus tree
- Good theoretical properties [Steel and Rodrigo, 2008]

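Supertree overview: median tree
[Figure: the distance d between two trees, written with tree diagrams as operands; the operands are not recoverable from the transcript]
Tree decomposition as: split set, quartet set or triplet set
[Figure: initial trees and their restrictions]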
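A rooted tree can be decomposed into its triplet set along these lines. The sketch assumes trees given as nested tuples of taxon names and encodes a triplet xy|z as `(frozenset({x, y}), z)`; both choices and all function names are hypothetical. The distance used here, the number of triplets present in exactly one of two trees on the same taxa, is an assumption for illustration and may differ from the slide's definition.

```python
from itertools import combinations


def leaf_set(tree):
    """Taxa below a node; a leaf is a plain string (assumed representation)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def triplets(tree):
    """Set of rooted triplets xy|z displayed by the tree."""
    if isinstance(tree, str):
        return set()
    found = set()
    child_leaves = [leaf_set(child) for child in tree]
    for i, inside in enumerate(child_leaves):
        # z comes from a sibling subtree, x and y from the same child subtree
        outside = [z for j, other in enumerate(child_leaves) if j != i for z in other]
        for x, y in combinations(sorted(inside), 2):
            for z in outside:
                found.add((frozenset((x, y)), z))
    for child in tree:
        found |= triplets(child)
    return found


def triplet_distance(tree_1, tree_2):
    """Symmetric difference of triplet sets (assumed definition, same taxa)."""
    return len(triplets(tree_1) ^ triplets(tree_2))


tree = ((("A", "B"), "C"), ("D", "E"))
print(len(triplets(tree)))
print(triplet_distance((("A", "B"), "C"), (("A", "C"), "B")))
```

Each triplet is generated exactly once, at the lowest node that contains all three taxa.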

Supertree overview: MRP and median tree
[Figure: input forest of three trees, T1 on taxa A-E, T2 and T3 on taxa A, B, C, F, G, H. After rooting, Triplet MR recodes each tree with one site per triplet (AB|C, AB|D, ..., GH|F, ..., FH|G, ...), e.g. AB|C as 110?????0 and AB|D as 11?0????0. The standard MRP matrix of the same forest is shown for comparison.]

Supertree overview: MRP and median tree
The parsimony value is related to the triplet distance:
- 1 parsimony step for triplets within the supertree
- 2 parsimony steps for others
- parsimony score = nbSites + (triplet distance)/2
The MRP approach is ill-suited to triplet encoding:
- for 100 taxa: 97% of "?"
- for 1000 taxa: 99.7% of "?"
- unnecessarily huge matrices
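The percentages above follow from a simple count, sketched here under two assumptions: each triplet site scores only its three taxa (any extra outgroup row is ignored), and a fully resolved rooted source tree on n taxa displays one triplet per set of three taxa. The function names are hypothetical.

```python
from math import comb


def triplet_sites(n_taxa):
    """Sites contributed by one fully resolved rooted tree: one per 3-taxon subset."""
    return comb(n_taxa, 3)


def missing_fraction(n_taxa):
    """Share of '?' in a triplet site: all taxa except the three of the triplet."""
    return 1 - 3 / n_taxa


for n in (100, 1000):
    print(n, triplet_sites(n), f"{missing_fraction(n):.1%}")
# prints:
# 100 161700 97.0%
# 1000 166167000 99.7%
```

With 100 taxa a single source tree already yields 161,700 columns that are 97% "?", which is why a dedicated triplet representation is preferable to a matrix.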

Supertriplets: a few notations
Given a forest F of input trees:
- $N^+(xy|z)$: number of occurrences of xy|z in F
- $N^-(xy|z) = N^+(xz|y) + N^+(yz|x)$ (alternative resolutions in F)
Once these counts are computed, the input trees are no longer needed (little impact of forest size)
Searching for the (asymmetric) triplet median tree T
[Formulas for the median and asymmetric criteria are not recoverable from the transcript]
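The two counts can be tabulated as below. This is a sketch under assumptions: each input tree is supplied already decomposed into triplets written as `(x, y, z)` for xy|z, and the names `canon`, `count_triplets` and `n_minus` are hypothetical.

```python
from collections import Counter


def canon(x, y, z):
    """Order-independent key for the triplet xy|z."""
    return (frozenset((x, y)), z)


def count_triplets(forest_triplets):
    """N+ : number of occurrences of each triplet over the whole forest."""
    n_plus = Counter()
    for tree_triplets in forest_triplets:
        for x, y, z in tree_triplets:
            n_plus[canon(x, y, z)] += 1
    return n_plus


def n_minus(n_plus, x, y, z):
    """N-(xy|z) = N+(xz|y) + N+(yz|x): occurrences of the alternative resolutions."""
    return n_plus[canon(x, z, y)] + n_plus[canon(y, z, x)]


forest_triplets = [
    [("homo", "pan", "mus")],
    [("pan", "homo", "mus")],   # same triplet, taxa listed in another order
    [("homo", "mus", "pan")],   # a conflicting resolution
]
n_plus = count_triplets(forest_triplets)
print(n_plus[canon("homo", "pan", "mus")], n_minus(n_plus, "homo", "pan", "mus"))
```

After this pass the table alone carries the information used by the later steps, which matches the remark that forest size has little impact afterwards.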

Supertriplets: general overview
1. Triplet decomposition: $O(n^3 |F|)$
   [Figure: table of $N^+$ and $N^-$ counts for (homo pan|mus), (pan bos|mus), (homo pan|bos), (mus pan|bos), ...]
2. First sketch, NJ-like strategy: $O(n^3)$ + consistency
3. Improvement, NNI local search: $O(n^3)$ to test all branches once
4. Branch support and collapse: $O(n^3)$

Supertriplets: agglomerative process
[Figure: trees T0 to T3 on taxa A-E, built by successive agglomerations of a cluster pair (C1, C2), each resolving new triplets]
- C1 = {D}, C2 = {E}: DE|A, DE|B, DE|C
- C1 = {A}, C2 = {B}: AB|C, AB|D, AB|E
- C1 = {A,B}, C2 = {C}: AC|D, BC|D, AC|E, BC|E
Triplets(T3) is the union of these sets
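The triplets resolved by each agglomeration in this example can be enumerated with a short sketch. It assumes single-letter taxon names and that a merge of C1 and C2 resolves every triplet with one taxon from each cluster and a third taxon outside both; `resolved_by_merge` is a hypothetical name.

```python
def resolved_by_merge(c_1, c_2, taxa):
    """Triplets ab|x newly resolved when clusters c_1 and c_2 are agglomerated."""
    outside = set(taxa) - set(c_1) - set(c_2)
    return {
        "".join(sorted((a, b))) + "|" + x
        for a in c_1
        for b in c_2
        for x in outside
    }


taxa = "ABCDE"
print(sorted(resolved_by_merge({"D"}, {"E"}, taxa)))
print(sorted(resolved_by_merge({"A"}, {"B"}, taxa)))
print(sorted(resolved_by_merge({"A", "B"}, {"C"}, taxa)))
```

The three merges together yield 10 triplets, one for each set of three taxa among the five.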

Supertriplets: agglomerative process
Agglomeration of $(C_A, C_B)$ transforms T into T'
It resolves some new triplets (AB|X) with $A \in C_A$, $B \in C_B$, $X \notin C_A \cup C_B$:
$d_3(T', F) = d_3(T, F) - \left( \sum N^+(AB|X) - \sum N^-(AB|X) \right)$
We select the pair maximizing
$\mathrm{Score}(C_A, C_B) = \frac{\sum N^+(AB|X) - \sum N^-(AB|X)}{\sum N^+(AB|X) + \sum N^-(AB|X)}$
The whole process is $O(n^3)$: when $C_A$ and $C_B$ are agglomerated,
- $\mathrm{Score}(C_D, C_E)$ is unchanged
- $\mathrm{Score}(C_{\{AB\}}, C_D)$ is easily derived from $\mathrm{Score}(C_A, C_D)$ and $\mathrm{Score}(C_B, C_D)$
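The pair selection can be sketched directly from the score above. This naive version recomputes every score from scratch, so it does not reproduce the incremental update behind the stated complexity. The triplet key `(frozenset({a, b}), x)`, the zero score for pairs with no evidence, and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def score(c_a, c_b, taxa, n_plus):
    """(sum N+ - sum N-) / (sum N+ + sum N-) over triplets ab|x, x outside both."""
    support = conflict = 0
    for a in c_a:
        for b in c_b:
            for x in taxa - (c_a | c_b):
                support += n_plus[(frozenset((a, b)), x)]
                # N-(ab|x) = N+(ax|b) + N+(bx|a)
                conflict += n_plus[(frozenset((a, x)), b)] + n_plus[(frozenset((b, x)), a)]
    total = support + conflict
    return (support - conflict) / total if total else 0.0  # assumed value when no evidence


def best_pair(clusters, taxa, n_plus):
    """Cluster pair with the highest score (naive exhaustive search)."""
    return max(combinations(clusters, 2), key=lambda p: score(p[0], p[1], taxa, n_plus))


taxa = frozenset("ABC")
n_plus = Counter({(frozenset("AB"), "C"): 3, (frozenset("AC"), "B"): 1})
clusters = [frozenset("A"), frozenset("B"), frozenset("C")]
print(score(frozenset("A"), frozenset("B"), taxa, n_plus))   # (3 - 1) / (3 + 1) = 0.5
print(best_pair(clusters, taxa, n_plus))
```

The score lies between -1 (all evidence against the merge) and 1 (all evidence for it), so it normalizes for the amount of data available on each candidate pair.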

Supertriplets: NNI optimisation
[Figure: trees T and T' related by one NNI; 2 possible NNI per edge]
The variation $d_3(T', F) - d_3(T, F)$ depends on few triplets
All these variations are initially evaluated in $O(n^3)$
Once an NNI is done, few NNI have to be re-evaluated (4 adjacent edges)
NNI optimisation is therefore very fast

Supertriplets: edge supports
[Figure: tree T with the triplets attached to an edge]
Local support: $\sum N^+(\cdot) / [\sum N^+(\cdot) + \sum N^-(\cdot)]$, summed over the triplets shown for the edge
If < 0.5, collapsing the edge improves $d_3(T, F)$
Global support: also takes into account the $N^+(\cdot)$ and $N^-(\cdot)$ that impact two edges
Final edge support: min(local, global)
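The local support and the collapse rule can be sketched as follows. Which triplets belong to an edge is shown only graphically on the slide, so the caller supplies them here as `edge_triplets`; that input, the `(frozenset({x, y}), z)` key, the `None` result for an edge without evidence, and the function names are all assumptions. Global support is not covered.

```python
from collections import Counter


def n_minus(n_plus, pair, z):
    """N-(xy|z) = N+(xz|y) + N+(yz|x)."""
    x, y = tuple(pair)
    return n_plus[(frozenset((x, z)), y)] + n_plus[(frozenset((y, z)), x)]


def local_support(edge_triplets, n_plus):
    """sum N+ / (sum N+ + sum N-) over the triplets attributed to one edge."""
    plus = sum(n_plus[triplet] for triplet in edge_triplets)
    minus = sum(n_minus(n_plus, pair, z) for pair, z in edge_triplets)
    total = plus + minus
    return plus / total if total else None  # no evidence either way (assumed convention)


def should_collapse(edge_triplets, n_plus, threshold=0.5):
    """An edge with local support below 0.5 has more conflict than support."""
    support = local_support(edge_triplets, n_plus)
    return support is not None and support < threshold


n_plus = Counter({(frozenset("AB"), "C"): 1, (frozenset("AC"), "B"): 3})
edge = [(frozenset("AB"), "C")]
print(local_support(edge, n_plus), should_collapse(edge, n_plus))
```

A support below 0.5 means the sum of $N^-$ exceeds the sum of $N^+$ for the edge, which is the case where removing its triplets lowers the distance to the forest.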

Supertriplets: simulation protocol
[Figure: simulation pipeline ending with a comparison of trees: are they similar?]
Triplet/split measure
[Eulenstein et al. 2004] [Criscuolo et al. 2006]

Supertriplets: simulation results
[Figure: results measured on splits and on triplets. Panel annotations: "less resolved, very few errors"; "contain errors"; "lack of resolution"; "perfect"]

Supertriplets: phylogenomic case study
Supertree of 33 mammals
- Species: complete genomes (EnsEMBL v54)
- Sequences: orthologous CDS (OrthoMaM v5)
- Gene trees: ML trees (inferred using PAUP)
Output supertree
- Computed in 30 s
- Congruent with [Prasad et al. 2008]

Conclusion & prospects
(Asymmetric) median supertree
- Easy to understand
- Makes tree weighting natural
MRP, triplets and median supertree
- Understanding the criteria optimized by MRP
- Design a dedicated algorithm to optimize it
Supertrees & supermatrix are complementary
- Vertebrate genome project
- Divide and conquer approach: i) trees based on multiple CDSs (supermatrix); ii) assembling those trees (supertree)
