SuperTriplets: a triplet-based supertree approach to phylogenomics Vincent Ranwez, Alexis Criscuolo and Emmanuel J.P. Douzery.


1 SuperTriplets: a triplet-based supertree approach to phylogenomics Vincent Ranwez, Alexis Criscuolo and Emmanuel J.P. Douzery

2 SuperTriplets: ISBM 2010 2 Introduction: inferring phylogeny (1 gene)

3 Introduction: inferring phylogeny (3 genes)
[Figure: Gene 1, Gene 2 and Gene 3, combined through the SuperTree or the SuperMatrix approach]

4 Introduction: inferring phylogeny (more data)
[Figure: Gene 1 to Gene 1000, plus SNP / Morpho / biblio data, combined through the SuperTree or the SuperMatrix approach]

5 Supertree overview: MRP
MRP [Baum 1992, Ragan 1992]
  1 binary sequence per taxon
  1 site per clade (1 = in the clade; 0 = outside; ? = missing)
[Figure: example MRP matrix and source trees on taxa A to F]
MRP [Goloboff and Pol, 2002]
  Relation contradicted by all source trees
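The coding rule on this slide (one site per clade, with 1, 0 and ? states) can be sketched as follows. This is a minimal illustration, not the authors' code: trees are assumed to be nested Python tuples of taxon names, the helper names `leaves`, `clades` and `mrp_matrix` are hypothetical, and the all-zero outgroup row that MRP analyses usually add is left out.

```python
def leaves(tree):
    """Return the set of taxon names found in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree, is_root=True):
    """Yield the leaf set of every internal node below the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield leaves(tree)
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(forest):
    """Build one character string per taxon, one site per clade."""
    taxa = sorted(set().union(*(leaves(tree) for tree in forest)))
    rows = {taxon: [] for taxon in taxa}
    for tree in forest:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")  # taxon absent from this source tree
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
    return {taxon: "".join(sites) for taxon, sites in rows.items()}


# Two small overlapping source trees (made-up example data).
forest = [((("A", "B"), "C"), "D"), (("C", "D"), "E")]
matrix = mrp_matrix(forest)
for taxon, row in matrix.items():
    print(taxon, row)
```

Taxon E is missing from the first tree, so its first two sites are coded "?"; this is the mechanism that inflates the matrix when many small trees are encoded.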

6 Supertree overview: intuitive approach
The Supertree problem (intuitive formulation)
  Input: a collection of overlapping trees (a forest)
  Output: the tree that best represents this collection
A major question is: how to define "best represents"?
Visualizing supertree candidates within the tree space
Median supertree
  Intuitive solution
  Generalization of the consensus tree
  Good theoretical properties [Steel and Rodrigo, 2008]

7 Supertree overview: median tree
[Figure: distance d between two trees, computed from their decompositions]
Tree decomposition as: split set, quartet set, triplet set
[Figure: initial trees and tree restriction]

8 Supertree overview: MRP and median tree
[Figure: input forest of trees T1, T2, T3 on taxa A to H, with rooting; MRP matrix compared with the Triplet MR matrix, which has one site per triplet (AB|C: 110?????0, AB|D: 11?0????0, ..., GH|F, ..., FH|G, ...)]

9 Supertree overview: MRP and median tree
The parsimony value is related to the triplet distance:
  1 parsimony step for triplets within the supertree
  2 parsimony steps for others
  parsimony score = nbSites + (triplet distance)/2
The MRP approach is ill-suited to triplet encoding:
  for 100 taxa, 97% of "?"
  for 1000 taxa, 99.7% of "?"
  unnecessarily huge matrices
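As a rough check of these percentages, assume each triplet site scores only its three taxa and leaves every other taxon row as "?". Ignoring any rooting row or column is an assumption about how the slide counts.

```latex
\[
  \text{fraction of ``?''} = \frac{n-3}{n}, \qquad
  \frac{100-3}{100} = 97\%, \qquad
  \frac{1000-3}{1000} = 99.7\%.
\]
```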

10 Supertriplets: a few notations
Given a forest F of input trees:
  $N^{+}(xy|z)$: number of occurrences of xy|z in F
  $N^{-}(xy|z) = N^{+}(xz|y) + N^{+}(yz|x)$ (alternative resolutions in F)
Once these counts are computed, the input trees are no longer needed (little impact of forest size)
Searching for the (asymmetric) triplet median tree T
[Equations: definitions of the median and asymmetric median criteria]
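The counts defined on this slide can be gathered in one pass over the forest. The sketch below is only an illustration under stated assumptions: rooted trees are nested tuples of taxon names, a triplet xy|z is stored as the pair `(frozenset({x, y}), z)`, and the function names are hypothetical rather than taken from the SuperTriplets software.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Return the set of taxon names found in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def triplets(tree):
    """Return every rooted triplet xy|z displayed by the tree."""
    all_leaves = leaves(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        outside = all_leaves - leaves(node)
        # x and y sit in different children of this node, so the node is
        # their last common ancestor; any z outside the node gives xy|z.
        for left, right in combinations(node, 2):
            for x in leaves(left):
                for y in leaves(right):
                    for z in outside:
                        found.add((frozenset((x, y)), z))
        for child in node:
            walk(child)

    walk(tree)
    return found


def count_triplets(forest):
    """N+ : number of input trees displaying each triplet."""
    n_plus = Counter()
    for tree in forest:
        n_plus.update(triplets(tree))
    return n_plus


def n_minus(n_plus, x, y, z):
    """N-(xy|z) = N+(xz|y) + N+(yz|x), the two alternative resolutions."""
    return n_plus[(frozenset((x, z)), y)] + n_plus[(frozenset((y, z)), x)]


# Made-up forest: two trees support AB|C, one supports AC|B.
forest = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
n_plus = count_triplets(forest)
print(n_plus[(frozenset("AB"), "C")], n_minus(n_plus, "A", "B", "C"))
```

After this step only the `n_plus` table is consulted, which matches the slide's remark that the forest size has little impact on the rest of the method.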

11 Supertriplets: general overview
Triplet decomposition: $N^{+}$ and $N^{-}$ for each triplet, e.g. (homo pan|mus), (pan bos|mus), (homo pan|bos), (mus pan|bos), ...; $O(n^3 |F|)$
First sketch, NJ-like strategy: $O(n^3)$ + consistency
Improvement, NNI local search: $O(n^3)$ to test all branches once
Branch support and collapse: $O(n^3)$

12 Supertriplets: agglomerative process
T0: starting tree on A, B, C, D, E
C1={A}, C2={B} gives T1, resolving AB|C AB|D AB|E
C1={D}, C2={E} gives T2, resolving DE|A DE|B DE|C
C1={A,B}, C2={C} gives T3, resolving AC|D BC|D AC|E BC|E
Triplets(T3): AB|C AB|D AB|E DE|A DE|B DE|C AC|D BC|D AC|E BC|E
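The example on this slide can be replayed by listing, for each agglomeration, the triplets that pair one taxon from each cluster against every taxon outside both. This is a small sketch with a hypothetical helper name, `resolved_by_merge`; it only enumerates triplets and does not score them.

```python
def resolved_by_merge(cluster_1, cluster_2, taxa):
    """List the triplets ab|x newly resolved when two clusters are joined."""
    outside = sorted(set(taxa) - set(cluster_1) - set(cluster_2))
    return [
        f"{a}{b}|{x}"
        for a in sorted(cluster_1)
        for b in sorted(cluster_2)
        for x in outside
    ]


taxa = "ABCDE"
# The three agglomerations shown on the slide, in order.
merges = [({"A"}, {"B"}), ({"D"}, {"E"}), ({"A", "B"}, {"C"})]
steps = [resolved_by_merge(c1, c2, taxa) for c1, c2 in merges]
for step in steps:
    print(" ".join(step))

# All triplets of the final tree T3.
triplets_t3 = [t for step in steps for t in step]
```

The three steps together resolve 10 triplets, which is every triple of the 5 taxa, so T3 is fully resolved.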

13 Supertriplets: agglomerative process
Agglomeration of $(C_A, C_B)$
  Transforms T into T'
  Resolves some new triplets (AB|X) with $A \in C_A$, $B \in C_B$, $X \notin C_A \cup C_B$
  $d_3(T', F) = d_3(T, F) - \left( \sum N^{+}(AB|X) - \sum N^{-}(AB|X) \right)$
We select the pair maximizing
  $\mathrm{Score}(C_A, C_B) = \dfrac{\sum N^{+}(AB|X) - \sum N^{-}(AB|X)}{\sum N^{+}(AB|X) + \sum N^{-}(AB|X)}$
The whole process is $O(n^3)$: when $C_A$ and $C_B$ are agglomerated,
  $\mathrm{Score}(C_D, C_E)$ is unchanged
  $\mathrm{Score}(C_{AB}, C_D)$ is easily derived from $\mathrm{Score}(C_A, C_D)$ and $\mathrm{Score}(C_B, C_D)$

14 Supertriplets: NNI optimisation
2 possible NNI per edge
The variation $d_3(T', F) - d_3(T, F)$ depends on few triplets [highlighted in the slide figure of T and T']
All these variations are initially evaluated in $O(n^3)$
Once an NNI is done, few NNI have to be re-evaluated (4 adjacent edges)
NNI optimisation is therefore very fast

15 Supertriplets: edge supports
Local support
  $\sum N^{+}(\cdot) \,/\, \left[ \sum N^{+}(\cdot) + \sum N^{-}(\cdot) \right]$, over the triplets highlighted in the slide figure of T
  If < 0.5, collapsing the edge improves $d_3(T, F)$
Global support
  Also takes into account triplets whose $N^{+}(\cdot)$ and $N^{-}(\cdot)$ impact two edges
Final edge support: min(local, global)
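One way to see the 0.5 threshold is to reuse the update rule from the agglomeration slide in reverse. This assumes the sums run over the triplets resolved by that edge alone, written $S^{+}$ and $S^{-}$ below (names introduced here), and that $S^{+} + S^{-} > 0$.

```latex
\[
  d_3(T_{\text{collapsed}}, F) = d_3(T, F) + \left( S^{+} - S^{-} \right),
  \qquad S^{\pm} = \sum N^{\pm}(\cdot),
\]
\[
  \frac{S^{+}}{S^{+} + S^{-}} < \frac{1}{2}
  \iff S^{+} < S^{-}
  \iff d_3(T_{\text{collapsed}}, F) < d_3(T, F).
\]
```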

16 Supertriplets: simulation protocol
[Figure: simulation protocol]
Are they similar?
Triplet/split measure
[Eulenstein et al. 2004] [Criscuolo et al. 2006]

17 Supertriplets: simulation results
[Figure: simulation results measured on splits and on triplets; annotated regions: "perfect", "lack of resolution", "contain errors"]
Less resolved
Very few errors

18 Supertriplets: phylogenomic case study
Supertree of 33 mammals
  Species: complete genomes (EnsEMBL v54)
  Sequences: orthologous CDS (orthoMaM v5)
  Gene trees: 13 000 ML trees (inferred using PAUP)
Output supertree
  Computed in 30 s
  Congruent with [Prasad et al. 2008]

19 Conclusion & prospects
(Asymmetric) median supertree
  Easy to understand
  Makes tree weighting natural
MRP, triplets and median supertree
  Understanding the criterion optimized by MRP
  Designing a dedicated algorithm to optimize it
  http://www.supertriplets.univ-montp2.fr/
Supertrees & supermatrix are complementary
  1 000 vertebrate genome project
  Divide and conquer approach
    i) trees based on multiple CDSs (supermatrix)
    ii) assembling those trees (supertree)

20 Supertriplet: http://www.supertriplets.univ-montp2.fr/

