Construction of Genome Trees from Conservation Profiles of Proteins Fredj Tekaia Edouard Yeramian Institut Pasteur

1 Construction of Genome Trees from Conservation Profiles of Proteins Fredj Tekaia Edouard Yeramian Institut Pasteur

2 Outline: species tree construction and difficulties; post-genome-era species tree construction; conservation profiles; genome tree construction based on conservation profiles; conclusions; references.

3 Species tree: the Tree of Life. 16S/18S rRNA tree (Woese 1990). Woese and others have used rRNA comparisons to construct a Tree of Life showing the evolutionary relationships of a wide variety of organisms. The "Tree of Life" has long served as a useful tool for describing the history and relationships of organisms over evolutionary time. Each species is represented as a branching point, or node, on the tree, and the branches represent paths of descent from a parental node.

4 Views of the tree of life:
- The three-domain proposal based on the ribosomal RNA tree (Woese et al., PNAS 87, 1990).
- The two-empire proposal, separating eukaryotes from prokaryotes and eubacteria from archaebacteria (Mayr, PNAS 95, 1998).
- The three-domain proposal with continuous lateral gene transfer among domains (Doolittle, Science 284, 1999).
- The ring of life, incorporating lateral gene transfer but preserving the prokaryote-eukaryote divide (Rivera & Lake, Nature 431, 2004; Martin & Embley, Nature 431:152-5, 2004).

5 Science 306 (2004): Raoult et al., "The 1.2-Megabase Genome Sequence of Mimivirus"; Keith A. Crandall and Jennifer E. Buhay, "Genomic Databases and the Tree of Life"; Driskell et al., "Prospects for Building the Tree of Life from Large Sequence Databases".

6 Pennisi, E. (1998). Genome data shake tree of life. Science 280. New genome sequences are mystifying evolutionary biologists by revealing unexpected connections between microbes thought to have diverged hundreds of millions of years ago, which suggests constructing species trees from their whole gene content.

7 Genome phylogeny based on gene content (1999). Snel, Bork, Huynen. Nature Genetics 21. [Figure: gene-content genome tree; clusters labelled E, A and B.]

8 Tekaia, Lazcano & Dujon (1999). Genome Research 9. [Figure: genome tree from whole-proteome comparisons; clusters labelled E, A and B.]

9 Complete genomes: 2208 projects; 460 published; 1054 prokaryotes; 631 eukaryotes.

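10 Gene tree vs. species tree (adapted from T.A. Brown, Genomes, 2nd edition). [Figure: a species tree and a gene tree for species A, B and C, drawn along a time axis with duplication and speciation events; the two trees need not share the same branching order.]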

11 Problems with species tree construction. The main difficulties include extensive incongruence between alternative phylogenies generated from single-gene data sets:
- genes don't evolve at the same rate or in the same way;
- the evolutionary history inferred from one gene may differ from what another gene appears to show.

12 Alternative solutions: integrative methods (supertrees). The supertree approach estimates phylogenies for subsets of genes with good overlap, then combines these subtree estimates into a supertree (Bininda-Emonds et al.). It depends on the ability to distinguish between orthologs and paralogs. Supertree approaches are controversial, in part because the methodology results in a degree of disconnection between the underlying genetic data and the final tree produced.

13 Phylogenomic tree, based on the concatenation of a sample of genes common to the species considered. [Figure: concatenated gene sequences S1 ... Sn.] Limitations: genes don't evolve at the same rate or in the same way; only a limited number of genes are shared among all species (The tree of one percent; Dagan and Martin (2006), Genome Biology 7:118).
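
As a rough illustration of the concatenation (supermatrix) step, the Python sketch below keeps only the genes sampled in every species and joins their aligned sequences species by species. The input layout and the function name `concatenate_alignments` are hypothetical; real pipelines also handle missing data, partitioning and alignment quality.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    alignments: Dict[str, Dict[str, str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Concatenate per-gene alignments into a supermatrix.

    `alignments` maps a gene name to {species name: aligned sequence}.
    Only genes present in every species are kept, because the
    concatenated genes must be common to all species considered.
    Returns the supermatrix and the list of genes actually used.
    """
    species = set()
    for aln in alignments.values():
        species.update(aln)

    # Genes sampled in all species; every other gene is discarded.
    shared = sorted(g for g, aln in alignments.items() if set(aln) == species)

    supermatrix = {s: "" for s in sorted(species)}
    for gene in shared:
        aln = alignments[gene]
        # All sequences of one gene must have the same aligned length.
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"sequences for {gene!r} are not aligned")
        for s in supermatrix:
            supermatrix[s] += aln[s]
    return supermatrix, shared
```

The shared-gene filter is where the "tree of one percent" problem shows up: the more species are added, the fewer genes survive it.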

14 More generally, these methods suffer from difficulties related to phylogenetic tree construction itself: global sequence alignment (quality, gaps, ...); different evolutionary histories of genes; substitution saturation; ... and, more seriously, gene sampling.

15 Gene tree vs. species tree: the gene sampling problem (adapted from Linder, Moret, Nakhleh and Warnow). [Figure: true species tree for A, B and C, with two copies of a gene, red and blue.] Red is lost in C; blue is lost in A and B. The resulting gene tree differs from the true species tree.

16 Gene tree vs. species tree: the gene sampling problem. [Figure: species A, B and C.] All red orthologs have been lost in the three species. Luckily, sampling yields the blue orthologs, and the true species tree is reconstructed.

17 Gene tree vs. species tree: the gene sampling problem. All versions of the gene are present in the three species [Figure: copies A A, B B, C C]; the gene trees are the same as the species tree.

18 The genome tree is another alternative for constructing species trees. The concept of a genome tree is based on overall similarity in gene content, and therefore considers more than single-gene information.
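
A minimal sketch of a gene-content distance between genomes, assuming each genome is summarized as a set of gene-family identifiers. The normalization (shared families divided by the family count of the smaller genome) is an assumption chosen for illustration, and `gene_content_distances` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def gene_content_distances(
    families: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], float]:
    """Pairwise gene-content distances between genomes.

    `families` maps a genome name to the set of gene families it contains.
    Similarity = shared families / families in the smaller genome
    (assumed normalization); distance = 1 - similarity.
    """
    distances = {}
    for a, b in combinations(sorted(families), 2):
        shared = len(families[a] & families[b])
        smaller = min(len(families[a]), len(families[b]))
        similarity = shared / smaller if smaller else 0.0
        distances[(a, b)] = 1.0 - similarity
    return distances
```

Normalizing by the smaller genome keeps a small genome that is a subset of a large one from looking distant merely because of the size difference.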

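19 Methodology: the matrix $T = \{k_{ij} > 0;\ 1 \le i \le p,\ 1 \le j \le n\}$ (with supplementary elements, "sup") is submitted to correspondence analysis, which represents rows and columns in an orthogonal system of factors $F_1, \dots, F_p$; classification then uses the Euclidean distance in this factor space.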
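
The correspondence-analysis step can be sketched with NumPy's SVD, assuming a nonnegative matrix whose rows and columns all have positive sums. This is the textbook formulation (standardized residuals, then principal coordinates on factors F1 ... Fk), not necessarily the exact software used by the authors; the function names are hypothetical.

```python
import numpy as np


def correspondence_analysis(K: np.ndarray, n_factors: int = 2):
    """Correspondence analysis of a p x n matrix K (k_ij >= 0, positive margins).

    Returns row principal coordinates (p x n_factors), column principal
    coordinates (n x n_factors) and the principal inertias of those factors.
    """
    K = np.asarray(K, dtype=float)
    P = K / K.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_factors
    F = (U[:, :k] * sigma[:k]) / np.sqrt(r)[:, None]   # rows on F1..Fk
    G = (Vt[:k].T * sigma[:k]) / np.sqrt(c)[:, None]   # columns on F1..Fk
    return F, G, sigma[:k] ** 2


def euclidean_distances(X: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Distances between species computed on the first few factors can then feed a classification step; keeping fewer factors discards the axes with the least inertia.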

20 Systematic analysis of completely sequenced organisms: in silico species-specific comparisons (Tekaia & Dujon, J. Mol. Evol. 1999), covering 27 eukaryal, 19 archaeal and 33 bacterial species (proteins). Each proteome (Proteome 1 ... Proteome n) is compared with the others using blastp with the PAM250 matrix and the SEG low-complexity filter. 99 species (B: 33; A: 19; E: 27); total of proteins.
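
After the all-against-all blastp search, each query protein has to be mapped to the species in which it has significant hits. The sketch below reads BLAST+ tabular output (`-outfmt 6`, whose 11th column is the E-value). The E-value cutoff and the `species_of` lookup table are hypothetical, not values taken from the slides.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def hits_by_species(
    blast_lines: Iterable[str],
    species_of: Dict[str, str],
    max_evalue: float = 1e-10,  # hypothetical significance cutoff
) -> Dict[str, Set[str]]:
    """Map each query protein to the set of species where it has a significant hit.

    `blast_lines` are tab-separated rows in the default -outfmt 6 order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore.
    `species_of` maps a subject protein identifier to its species (hypothetical table).
    """
    hits = defaultdict(set)
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            hits[query].add(species_of[subject])
    return dict(hits)
```

If self-hits are present in the output, each protein's own species is included in its set.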

21 Systematic analysis of completely sequenced organisms: in silico species-specific comparisons (27 eukaryal, 19 archaeal and 33 bacterial species: proteins):
- degree of ancestral duplication and of ancestral conservation between pairs of species;
- families of paralogs (Partition-MCL);
- families of orthologs (Partition-MCL);
- distribution of orthologous families according to the three domains of life;
- determination of the protein dictionary (orthologs);
- determination of protein conservation profiles.
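
The families above are built with MCL; the details of the "Partition-MCL" procedure are not given in the slides. As a hedged illustration only, the sketch below is the plain Markov Cluster algorithm (alternating expansion and inflation of a column-stochastic matrix) on a small similarity matrix; the inflation value, iteration limit and thresholds are illustrative assumptions.

```python
import numpy as np


def mcl(adjacency: np.ndarray, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-8) -> list:
    """Plain Markov Cluster algorithm on a symmetric similarity matrix.

    Returns clusters as sorted lists of node indices.
    """
    M = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # self-loops
    M = M / M.sum(axis=0)                 # make columns stochastic
    for _ in range(max_iter):
        prev = M
        M = M @ M                         # expansion: random-walk flow
        M = M ** inflation                # inflation: boost strong flows
        M = M / M.sum(axis=0)             # renormalize columns
        if np.abs(M - prev).max() < tol:  # converged
            break
    clusters = []
    for row in M:
        # Non-zero entries of an attractor row give one cluster.
        members = frozenset(np.nonzero(row > 1e-6)[0].tolist())
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]
```

Higher inflation values give finer, smaller families; the BLAST-derived similarity scores would be the natural input matrix.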

22 Note on homologs, paralogs and orthologs. Homologs: A1, B1, A2, B2. Paralogs: A1 vs B1 and A2 vs B2. Orthologs: A1 vs A2 and B1 vs B2. [Figure: along a time axis, an ancestral gene duplicates into copies A and B; speciation then yields species 1 (S1: A1, B1) and species 2 (S2: A2, B2), whose sequences are compared by sequence analysis.]

23 Large-scale comparative analysis of predicted proteomes revealed significant evolutionary processes acting on an ancestral species genome: phylogeny (selection); duplication and gene genesis (expansion); horizontal gene transfer, HGT (exchange); gene loss (deletion). Expansion, exchange and deletion are noise; they should be eliminated or at least reduced.

24 To overcome some of these limitations, we consider genome tree construction from protein conservation profiles, in an attempt to reduce the noisy evolutionary processes.

25 Conservation profiles. A conservation profile is an n-component binary vector describing a protein's conservation pattern across n species; its components are 1 or 0 according to the presence or absence of homologs. Data: 99 species (B: 33; A: 19; E: 27); proteins. Main interesting properties of conservation profiles:
- a conservation profile is the trace of protein evolutionary histories jointly captured in a set of n species (a multidimensional feature);
- conservation profiles are signatures of evolutionary relationships.
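
Given, for each protein, the set of species in which it has homologs, the binary profile follows directly from the definition. A minimal sketch; the species order, identifiers and function name are hypothetical.

```python
from typing import Dict, List, Set


def conservation_profiles(
    hits: Dict[str, Set[str]], species: List[str]
) -> Dict[str, List[int]]:
    """Binary conservation profile of each protein across `species`.

    Component k is 1 if the protein has a homolog in species[k], else 0.
    """
    return {protein: [1 if s in found else 0 for s in species]
            for protein, found in hits.items()}
```

Fixing one species order for all proteins is what makes the profiles comparable component by component.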

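26 Protein conservation profiles. [Table: proteins × 99 species. Columns are species S1 ... Sn grouped by domain (E, A, B); rows are the proteins G1,1, G2,1, ... of species 1 through G1,n, G2,n, ..., Gnp,n of species n; each row is that protein's conservation profile.] Different conservation profiles represent different evolutionary histories.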

27 Distinct conservation profiles. From the original total proteins (99 species): non-specific proteins, i.e. conservation profiles (82%); distinct conservation profiles (42%). This set is indicative of the various observed evolutionary histories. The effect of the duplication process is reduced, since only one representative is kept from each set of identical conservation profiles.
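
The filtering on this slide can be sketched as follows, assuming that a profile includes the protein's own species and that a protein is "species-specific" when its profile has a single 1. Identical profiles are collapsed to one representative, which is what reduces the effect of duplication. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def distinct_profiles(profiles: Dict[str, List[int]]) -> List[Tuple[int, ...]]:
    """Drop species-specific profiles and keep one copy of each identical profile."""
    seen = set()
    distinct = []
    for protein in sorted(profiles):      # deterministic order
        profile = tuple(profiles[protein])
        if sum(profile) < 2:              # assumed: homologs in one species only
            continue
        if profile not in seen:           # one representative per profile
            seen.add(profile)
            distinct.append(profile)
    return distinct
```

Two paralogs from the same duplication usually share a profile, so they contribute only once to the downstream similarity scores.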


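29 Genome tree construction: data matrices. Jaccard similarity scores between species: $s_{ij} = N_{11}/(N_{11}+N_{01}+N_{10})$, where $N_{11}$, $N_{01}$ and $N_{10}$ are respectively the total numbers of occurrences of (1,1), (0,1) and (1,0) between species i and j, counted over the conservation profiles (the various evolutionary histories). $T = \{T_{ij} = 100\, s_{ij};\ i = 1, \dots, n;\ j = 1, \dots, n\}$, where n is the number of species.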

30 Genome trees: data matrices. $T = \{T_{ij};\ i = 1, \dots, n;\ j = 1, \dots, n\}$, where n is the number of surveyed species and $T_{ij}$ is the overall similarity score between species i and j. With the Jaccard similarity score $s_{ij} = N_{11}/(N_{11}+N_{01}+N_{10})$ between species i and j, $T_{ij} = 100\, s_{ij}$.
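
The Jaccard matrix can be computed in one pass from the 0/1 matrix of profiles (rows = profiles, columns = species). Since N11 + N01 + N10 is the number of profiles with a 1 in species i or species j, the denominator equals the two column sums minus N11. A NumPy sketch; the function name is hypothetical.

```python
import numpy as np


def jaccard_matrix(profiles: np.ndarray) -> np.ndarray:
    """T_ij = 100 * N11 / (N11 + N01 + N10) for every pair of species (columns)."""
    X = np.asarray(profiles, dtype=int)
    n11 = X.T @ X                                      # profiles with 1 in both i and j
    present = X.sum(axis=0)                            # profiles with 1 in each species
    union = present[:, None] + present[None, :] - n11  # N11 + N01 + N10
    s = np.divide(n11, union, out=np.zeros(n11.shape), where=union > 0)
    return 100.0 * s
```

Because $T_{ij}$ is a similarity on a 0 to 100 scale, $100 - T_{ij}$ can serve as a distance for tree building.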

31 Profiles tree [figure] (Tekaia F, Yeramian E. (2005). PLoS Comput Biol 1(7): e75).
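
The profiles tree itself comes from correspondence analysis followed by classification. For readers who want to experiment with a similarity matrix such as $T$, a simpler and clearly different stand-in is average-linkage clustering (UPGMA) on the distances $100 - T_{ij}$. The sketch below is not the authors' procedure; the function name is hypothetical, and it prints only the topology in Newick format.

```python
from typing import Dict, List, Tuple


def upgma_newick(labels: List[str], dist: List[List[float]]) -> str:
    """Average-linkage (UPGMA) clustering; returns the topology in Newick format.

    `dist` is a symmetric matrix of pairwise distances, e.g. 100 - T_ij.
    Branch lengths are omitted to keep the sketch short.
    """
    # cluster id -> (Newick string, number of leaves)
    clusters: Dict[int, Tuple[str, int]] = {i: (name, 1) for i, name in enumerate(labels)}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): float(dist[i][j])
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)          # closest pair of clusters
        name_a, size_a = clusters.pop(a)
        name_b, size_b = clusters.pop(b)
        del d[(a, b)]
        for k in clusters:                # size-weighted average distance to merged cluster
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        clusters[next_id] = (f"({name_a},{name_b})", size_a + size_b)
        next_id += 1
    newick, _ = next(iter(clusters.values()))
    return newick + ";"
```

For example, with distances A-B 2, C-D 4 and 10 between all other pairs, the sketch returns `((A,B),(C,D));`.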

32 Conclusions: methodology. Species classification is not an easy task! Methods that take whole-genome information into account are still needed; correspondence analysis might help reveal evolutionary trends embedded in the multidimensional relationships obtained from large-scale genome comparisons; species tree construction should take into account all the information contained in the genomes.

33 Conclusions (continued). Conservation profiles represent the most conserved and meaningful evolutionary signals jointly captured in a set of species; thus they should correspond to the most accurate type of marker for species classification. In principle, a profiles tree derived from distinct conservation profiles should considerably minimize genome acquisition effects and reflect less noisy phylogenetic signals. The profiles tree presents evidence of conservation of stable phylogenetic relationships and reveals unconventional species clustering. The profiles tree corresponds to the classification of the evolutionary scenarios.

34 Acknowledgments: The support of: The Institut Pasteur (Strategic Horizontal Programme on Anopheles gambiae) The Ministère de la Recherche Scientifique (France): ACI-IMPBIO-2004–98-GENEPHYS program. Bernard Dujon (Institut Pasteur).

35 References: systematic analysis of completely sequenced organisms
Tekaia, F. and Dujon, B. (1999). Pervasiveness of gene conservation and persistence of duplicates in cellular genomes. Journal of Molecular Evolution 49.
Tekaia, F., Lazcano, A. and Dujon, B. (1999). Genome tree as revealed from whole proteome comparisons. Genome Research 9.
Tekaia, F., Yeramian, E. and Dujon, B. (2002). Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis. Gene 297.
Tekaia, F. and Yeramian, E. (2005). Genome Trees from Conservation Profiles. PLoS Comput Biol 1(7): e75.
Tekaia, F. and Yeramian, E. (2006). Evolution of Proteomes: Fundamental signatures and global trends in amino acid composition. BMC Genomics 7: 307.
Tekaia, F. and Latgé, J.P. (2005). Aspergillus fumigatus: saprophyte or pathogen? Curr Opin Microbiol 8. Review.

36 References:
Bininda-Emonds, O.R.P. (2005). Supertree Construction in the Genomic Age. Methods in Enzymology 395.
Bininda-Emonds, O.R.P., Gittleman, J.L. and Steel, M.A. (2002). The (Super)Tree of Life: Procedures, Problems, and Prospects. Annual Review of Ecology and Systematics 33.
Dagan, T. and Martin, W. (2006). The tree of one percent. Genome Biology 7: 118.
Delsuc, F., Brinkmann, H. and Philippe, H. (2005). Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6. Review.
Doolittle (1999). Science 284.
Driskell et al. (2004). Science 306.
(list of genome projects)
Crandall, K.A. and Buhay, J.E. (2004). Science 306.
Linder, Moret, Nakhleh and Warnow.
Martin & Embley (2004). Nature 431.
MCL: a cluster algorithm for graphs.
Pennisi, E. (1998). Genome data shake tree of life. Science 280.
Rivera & Lake (2004). Nature 431.
Raoult et al. (2004). Science 306.
Snel, Bork and Huynen (1999). Genome phylogeny based on gene content. Nature Genetics 21.
Snel, B., Huynen, M.A. and Dutilh, B.E. (2005). Genome trees and the nature of genome evolution. Annu Rev Microbiol 59. Review.
Woese et al. (1990). PNAS 87.
