MultiParanoid: Automatic Clustering of Orthologs and Inparalogs Shared by Multiple Proteomes. Andrey Alexeyenko, Ivica Tamas, Gang Liu, Erik L.L. Sonnhammer.


MultiParanoid: Automatic Clustering of Orthologs and Inparalogs Shared by Multiple Proteomes
Andrey Alexeyenko, Ivica Tamas, Gang Liu, Erik L.L. Sonnhammer
Stockholm

Homologs: orthologs and paralogs
Homologs are genes that have descended from a common ancestral gene. Homology is manifested by sequence similarity (for example, a BLAST hit with a low E-value between Gene 1 and Gene 2). We do not believe in sequence similarity without shared ancestry.
- Orthologs are related via a speciation (S).
- Paralogs are related via a gene duplication (D). They may or may not be in the same species.

Homologs: orthologs and paralogs
- Inparalogs ~ co-orthologs: paralogs that were duplicated after the speciation and hence are orthologs to the other species' genes.
- Outparalogs = not co-orthologs: paralogs that were duplicated before the speciation.
Sonnhammer ELL and Koonin EV. Orthology, paralogy and proposed classification for paralog subtypes. Trends in Genetics, Volume 18, Issue 12, 1 December 2002.
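
The definitions above can be illustrated with a small sketch. It is not part of the slides: it assumes a gene tree whose internal nodes are already labelled "S" (speciation) or "D" (duplication), uses hypothetical gene names, and applies a simplified rule in which a duplication counts as "before the speciation" whenever a speciation node lies below it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Leaves carry a gene name; internal nodes carry "S" (speciation) or "D" (duplication).
    label: str
    children: list = field(default_factory=list)


def leaves(node):
    """Return the set of gene names below a node."""
    if not node.children:
        return {node.label}
    names = set()
    for child in node.children:
        names |= leaves(child)
    return names


def lca(node, a, b):
    """Return the lowest node whose subtree contains both genes."""
    for child in node.children:
        if {a, b} <= leaves(child):
            return lca(child, a, b)
    return node


def has_speciation(node):
    """True if the subtree rooted at this node contains a speciation node."""
    if not node.children:
        return False
    return node.label == "S" or any(has_speciation(c) for c in node.children)


def relation(root, a, b):
    """Classify a gene pair by the event at its last common ancestor."""
    ancestor = lca(root, a, b)
    if ancestor.label == "S":
        return "orthologs"
    # Duplication above a speciation: outparalogs; below it: inparalogs.
    return "outparalogs" if has_speciation(ancestor) else "inparalogs"


# Hypothetical family: an old duplication (A and B copies), a speciation in
# each copy, then a recent duplication of the human A gene.
tree = Node("D", [
    Node("S", [Node("D", [Node("humanA1"), Node("humanA2")]), Node("mouseA")]),
    Node("S", [Node("humanB"), Node("mouseB")]),
])

print(relation(tree, "humanA1", "mouseA"))   # orthologs
print(relation(tree, "humanA1", "humanA2"))  # inparalogs
print(relation(tree, "humanA1", "humanB"))   # outparalogs
```

In this toy tree both humanA1 and humanA2 are orthologs of mouseA, which is what "inparalogs ~ co-orthologs" expresses.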

Orthologs for functional genomics
- Orthologs are more likely than outparalogs to have identical or similar biochemical functions and biological roles.
- Orthologs are optimal for discovering gene function via model organism counterparts.
“…the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins.”
Hulsen T, Huynen MA, de Vlieg J, Groenen PM. Benchmarking ortholog identification methods using functional genomics data. Genome Biol. 2006;7(4):R31. Epub 2006 Apr 13.

Outline
1. InParanoid
2. The world of ortholog resources
3. Why MultiParanoid
4. Limitations
5. Future development

Homologs: orthologs and paralogs
[Figure: gene tree with duplication (D) and speciation (S) nodes, labelling orthologs, outparalogs and inparalogs.]

InParanoid
[Figure: Proteome A and Proteome B. Labels: reciprocally best hits ~ seed orthologs; inparalogs.]
Remm M, Storm CEV, Sonnhammer ELL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology 314(5), 14 December 2001.
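
The seed-ortholog step named on this slide (reciprocally best hits) can be sketched as follows. This is an illustration only, not the InParanoid implementation: the score dictionaries, protein names and function names are hypothetical, and the later step of attaching inparalogs to each seed pair is omitted.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring target.

    `scores` maps (query, target) pairs to a similarity score,
    e.g. a BLAST bit score (assumed input format).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs in which each protein is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between proteome A (a1, a2) and proteome B (b1, b2).
a_vs_b = {("a1", "b1"): 500, ("a1", "b2"): 300, ("a2", "b1"): 450, ("a2", "b2"): 100}
b_vs_a = {("b1", "a1"): 500, ("b1", "a2"): 450, ("b2", "a1"): 300}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2 also hits b1 best, but b1 prefers a1, so only a1 and b1 form a seed pair.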

Resources using InParanoid
[Figure: resources built on InParanoid. Surviving labels: Eukaryotic Ortholog Groups; 3409 diseases.]

Multi-species ortholog resources
- Clusters of Orthologous Groups
- HOVERGEN release 47
- “Massive download” friendly
- Tree-based, best for detailed analysis

[Figure: gene tree with speciation (S) and duplication (D) nodes.]
Any cluster containing genes from more than two species is controversial in terms of orthology, because a speciation gives rise to a pair of species.

MultiParanoid algorithm
1. Take more than two species with maximally close speciation points.
2. Generate two-species InParanoid clusters: A-B, B-C, A-C.
3. Find protein counterparts across the clusters.
[Figure: InParanoid cluster A-B, InParanoid cluster B-C and InParanoid cluster A-C.]
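
Step 3 can be pictured as merging pairwise clusters that share a protein. The sketch below does this with a union-find structure. It is a simplified assumption about how "finding counterparts across the clusters" might work, not the published MultiParanoid procedure, which also has to handle conflicting cluster memberships. All protein names are hypothetical.

```python
def merge_pairwise_clusters(pairwise_clusters):
    """Merge two-species clusters that share at least one protein."""
    parent = {}

    def find(x):
        # Register unseen proteins, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for cluster in pairwise_clusters:
        members = sorted(cluster)
        find(members[0])
        for member in members[1:]:
            union(members[0], member)

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical InParanoid output for human (h), fly (f) and worm (w).
clusters = [
    {"hA", "fA"},   # human-fly
    {"fA", "wA"},   # fly-worm
    {"hA", "wA"},   # human-worm
    {"hB", "fB"},   # human-fly, no worm counterpart
]

print(merge_pairwise_clusters(clusters))
# [['fA', 'hA', 'wA'], ['fB', 'hB']]
```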

MultiParanoid validation
The MultiParanoid output was benchmarked on a manually curated set of 221 human-fly-worm clusters:
- MultiParanoid clusters found (almost) identical
- The rest controversial, mainly due to:
  - differences between pairwise and multiple alignments
  - the curator's perception and InParanoid settings
However: tree conflicts.
[Figure: fly, worm and human genes; InParanoid cluster membership.]

[Figure: comparison of MultiParanoid with two other resources; their names are not preserved in the transcript.]

Current MultiParanoid release
Species: C. elegans, H. sapiens, C. intestinalis, D. melanogaster.
??? protein sequences classified into 7695 clusters.

A solution: expansion of MultiParanoid clusters
1. Process all the possible 3-species combinations.
2. Merge the respective cluster members across the clades.
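
A minimal sketch of the two steps, under stated assumptions: the species list is taken from the current-release slide, the cluster contents are invented, and "merging across the clades" is approximated by joining any clusters that share a member. The function names are hypothetical.

```python
from itertools import combinations


def species_triples(species):
    """Enumerate every 3-species combination."""
    return list(combinations(sorted(species), 3))


def merge_overlapping(clusters):
    """Join clusters (sets of protein names) that share at least one member."""
    merged = []
    for cluster in clusters:
        cluster = set(cluster)
        overlapping = [m for m in merged if m & cluster]
        for m in overlapping:
            cluster |= m
            merged.remove(m)
        merged.append(cluster)
    return merged


species = ["C. elegans", "H. sapiens", "C. intestinalis", "D. melanogaster"]
for triple in species_triples(species):
    print(triple)

# Hypothetical clusters coming from two different 3-species runs.
run_clusters = [
    {"h1", "f1", "w1"},   # human-fly-worm run
    {"h1", "f1", "c1"},   # human-fly-Ciona run, shares h1 and f1
    {"h2", "w2", "c2"},   # unrelated family
]
print(merge_overlapping(run_clusters))
```

With four species there are four triples. The number of combinations grows quickly as species are added, which is one reason the species set stays limited.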

But still, orthology is a pairwise concept! A speciation gives rise to a pair of species.

How do the ortholog resources cope with it?
- Post-processing (bootstrap, synteny, tree manual curation etc.)
[Figure: cluster size ~ outparalogs/orthologs ratio; HOVERGEN release 47.]

Overview and comparison of ortholog databases
Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Drug Discovery Today: Technologies (2006) 3(2).
Databases compared: EGO, COG/KOG, HomoloGene, InParanoid/MultiParanoid, HOPS, KEGG, OrthoMCL, ENSEMBL Compara, PhiGs, MGD, HOGENOM, HOVERGEN, INVHOGEN, TreeFam, OrthologID.

How to reconcile the demand for multi-species clusters with pairwise gene relations?
The common feature is a single ancestor gene at the root point.
[Figure: gene tree with speciation (S) and duplication (D) nodes.]

Two new terms:
- Pseudo-proteome: a union of proteomes of the same clade
- Cluster of pseudo-inparalogs: a within-clade gene family

[Figure: Pseudo-proteome A (reptiles) and Pseudo-proteome B (mammals).]
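
A pseudo-proteome, as defined above, is simply a union of proteomes. A minimal sketch, with hypothetical species and protein identifiers, and with each protein tagged by its species so that identical identifiers from different species do not collide:

```python
def build_pseudo_proteome(proteomes, clade_species):
    """Union the proteomes of all species in one clade, tagging each protein by species."""
    return {
        f"{species}|{protein}"
        for species in clade_species
        for protein in proteomes[species]
    }


# Hypothetical proteomes; real ones would hold thousands of sequences.
proteomes = {
    "lizard": ["p1", "p2"],
    "snake": ["p1"],
    "human": ["p1", "p2"],
    "mouse": ["p3"],
}

reptiles = build_pseudo_proteome(proteomes, ["lizard", "snake"])
mammals = build_pseudo_proteome(proteomes, ["human", "mouse"])
print(sorted(reptiles))  # ['lizard|p1', 'lizard|p2', 'snake|p1']
print(sorted(mammals))   # ['human|p1', 'human|p2', 'mouse|p3']
```

The two pseudo-proteomes could then be compared pairwise in the same way as two ordinary proteomes, which is the idea the following slides develop.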

Another view, “gene-family”-wise: all the members of the same cluster descend from a single gene in the last common ancestor (LCA) of the two major clades.
[Figure: gene tree with speciation (S) and duplication (D) nodes, rooted at the LCA.]

- Having more than one species in a pseudo-proteome reduces mis-assignments in case of gene loss.
- Closer pseudo-proteomes increase resolution.
- Lineage-specific (~ pseudo-proteome-specific) expansions should also be available.
The clustering can be done at different levels, for example:
- Fungi vs. animals
- Insects vs. mammals
- Rodents vs. primates
[Figure: gene tree with speciation (S) and duplication (D) nodes, with orthologs marked.]

Conclusions
- Most of the ortholog resources may build clusters in the form of gene trees, but only InParanoid seems to correctly delineate ortholog/inparalog groups.
- The MultiParanoid algorithm has relieved the problem of “hidden outparalogs”, but the number and choice of species remain limited.
- The “LCA-Paranoid” concept: the long-awaited solution?
  - Each of the two clade-specific cluster parts may be regarded as a multi-species cluster.
  - When (in future) all possible “clade-clade” clustering solutions are found, each gene would receive a complete set of orthologs at a desirable level of LCA.
  - With a sufficient number of complete proteomes, it would be possible to date each gene pair's point of divergence.