Macromolecular complexes – A new Online Portal (under construction!) Birgit Meldal (IntAct)

Overview
- Aims & Definitions
- Data Sources
- Issues and Challenges:
  - Nomenclature
  - Sets
  - Transient complexes
  - GO
  - Confidence scores
  - Inference
  - Visualisation
  - Search Parameters and Filters
- Status quo

Project Aim
- To design an Online Portal to search and visualise protein complexes
  - Including cross-referencing to source databases and beyond
  - Export to interested parties in a format of their choice
  - Incorporate the data into network analysis tools
- To curate a starter set of protein complexes for 4 major model organisms, chosen to span the taxonomic range: Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Escherichia coli
  - Which will be expanded to a second set of organisms: Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Schizosaccharomyces pombe
- IntAct provides the data structure

Long-term Strategy
- Create stable complex identifiers
- Joint curation effort
  - Benefit to all collaborating databases: resource sharing, elimination of redundancies
  - Benefit to user: one central resource that links to all source databases

Why curate complexes in IntAct?
- Many source databases containing information on complexes are at the EBI: UniProt, ChEMBL, Reactome, PDBe, (Enzyme Portal)…
- IntAct has the correct data structure and the experimental evidence

Definition: stable protein complexes
- A stable set (2 or more) of interacting protein molecules which can be co-purified and have been shown to exist as a functional unit in vivo.
- Non-protein molecules (e.g. small molecules, nucleic acids) may also be present in the complex.
- What is not a stable complex?
  - Enzyme/substrate or any similar transient interaction
  - Two proteins associated in a pulldown / coimmunoprecipitation with no functional link

Source Databases
- Reactome (EBI): human
- Gramene: Arabidopsis
- Microme (EBI): bacteria
- PDBe (EBI): mainly human
- ChEMBL (EBI)
- MatrixDB (Sylvie Ricard-Blum)
- Mining UniProt: yeast (Bernd Roechert, SIB; done manually)
- Unmaintained web resources: CYGD (yeast), CORUM (human), E. coli website, 3D Complexes (Sarah Teichmann, EBI)
- Manual curation from IMEx DBs & the literature (Sandra & Birgit)

Data currently captured for IntAct complexes
- Participants: proteins (UniProt), small molecules (ChEBI), nucleic acids (???)
- Stoichiometry, when known
- Topology, when known
- Structured annotation using GO
- Cross-references to experimental evidence, PDB, Reactome (human), Gramene, ChEMBL, PubMed (for further information), IntEnz (enzymes)
- Complex-specific free-text annotations:
  - Structure and function
  - Synonyms to provide consistent nomenclature
  - Physical properties, when known
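The fields listed on this slide could be modelled roughly as follows. This is only an illustrative sketch: the class names, field names and placeholder identifiers are assumptions made for the example and are not the actual IntAct data structure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Participant:
    identifier: str                      # e.g. a UniProt accession or a ChEBI ID
    kind: str                            # "protein", "small molecule" or "nucleic acid"
    stoichiometry: Optional[int] = None  # None when the stoichiometry is not known


@dataclass
class ComplexRecord:
    recommended_name: str
    participants: list[Participant] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)
    go_terms: list[str] = field(default_factory=list)
    xrefs: dict[str, list[str]] = field(default_factory=dict)  # database -> IDs
    free_text: dict[str, str] = field(default_factory=dict)    # topic -> annotation

    def participants_of_kind(self, kind: str) -> list[Participant]:
        """Return only the participants of one interactor type."""
        return [p for p in self.participants if p.kind == kind]


# Placeholder identifiers, not real accessions.
example = ComplexRecord(
    recommended_name="example complex",
    participants=[
        Participant("PROT_A", "protein", stoichiometry=2),
        Participant("PROT_B", "protein"),
        Participant("SMALL_MOL_1", "small molecule", stoichiometry=1),
    ],
    synonyms=["example heterotrimer"],
    xrefs={"pubmed": ["PMID_PLACEHOLDER"]},
)
```

Keeping stoichiometry optional mirrors the "when known" qualifier on the slide.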

- Needs to link to original interaction & PMID
- Will be list of aliases
- Function and structure as free-text
- If MW or Stokes radius are known, they will be parameters
- A complex cannot be a participant!
- Will be recommended and systematic name

Issues
- Currently, complexes are shoe-horned into an interaction which is part of a dummy publication and dummy experiment
- New, complex-specific functionality, parameters and tools are needed

Issues - Nomenclature
- Most complexes have no common name, or the common name is defined differently depending on authors or host organism.
- One name can describe multiple complexes (e.g. AP1 describes ~25 different homo/heterodimers)
- Reactome makes a string of all components by gene name, but this can become too long for our short-label.
- We will need both a recommended and a systematic name.
- List of synonyms already available as free-text.
- Collaboration with GO, Reactome, HGNC
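As a rough illustration of the naming problem, a systematic name can be built by joining component gene names, and then has to be cut down to fit a short-label. The separator, the sorting rule and the 20-character limit below are assumptions for the example, not the conventions used by Reactome or IntAct.

```python
def systematic_name(gene_names: list[str], separator: str = ":") -> str:
    """Join component gene names in a stable, case-insensitive sorted order."""
    return separator.join(sorted(gene_names, key=str.lower))


def short_label(name: str, max_length: int = 20) -> str:
    """Truncate a name to fit a short-label field, marking the cut with '...'."""
    if max_length < 4:
        raise ValueError("max_length must leave room for the '...' marker")
    if len(name) <= max_length:
        return name
    return name[: max_length - 3] + "..."


dimer_name = systematic_name(["JUN", "FOS"])
long_name = systematic_name([f"GENE{i}" for i in range(1, 11)])
long_label = short_label(long_name)
```

Sorting the gene names means the same set of components always yields the same string, whatever order the curator entered them in. The truncated label is no longer unique, which is one reason a separate recommended name is needed.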

Issues – open/fuzzy sets
- Complexes where the identity of one or more participants is unknown, i.e. participant(s) are only identified as belonging to a set of (related) proteins
- Stoichiometry: often not known, or only an average (e.g. ion channel pore proteins)
- Only a subset of a given complex is curated, because functional assays often focus on interactions between catalytic subunits
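One possible way to represent an open set is to store each participant slot as a set of candidate proteins rather than a single identifier. The sketch below is only an assumption about how this could be done; the function names and identifiers are hypothetical.

```python
from itertools import product


def is_fully_resolved(participants: list[set[str]]) -> bool:
    """True when every participant slot holds exactly one candidate."""
    return all(len(candidates) == 1 for candidates in participants)


def possible_compositions(participants: list[set[str]]) -> list[tuple[str, ...]]:
    """Enumerate every concrete complex consistent with the candidate sets."""
    return list(product(*(sorted(candidates) for candidates in participants)))


# Placeholder identifiers: the second slot is one of two related proteins.
open_complex = [{"PROT_A"}, {"PROT_B1", "PROT_B2"}]
compositions = possible_compositions(open_complex)
```

Here `compositions` contains two alternatives, one per candidate in the open slot, and `is_fully_resolved(open_complex)` is `False`.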

Issues – indirect activation & transient complexes
- Complexes that are activated without direct ligand interaction, e.g. through change of pH
- Transient interactions
- Kim van Roey, Heidelberg: cooperative interactions
- Different complex? Same participants!

GO: – protein complex (> 400)

Issues - Gene Ontology
- Currently, complexes are mostly children of GO: protein complex (> 400), lacking hierarchical structure
- Collaboration with GO to provide structured annotation
- New terms should capture all potential complexes from all species for which a parental term is appropriate
  - E.g. DNA Polymerase complex
- Needs to allow for (open) sets of proteins / protein families

Issues - Gene Ontology
- DNA polymerase III complex: The DNA polymerase III holoenzyme is a complex that contains 10 different types of subunits. These subunits are organized into 3 functionally essential sub-assemblies: the pol III core, the beta sliding clamp processivity factor and the clamp-loading complex. […]
- DNA polymerase III, core complex: The DNA polymerase III core complex consists of the alpha, epsilon and theta subunits and carries out the polymerase and the 3'-5' exonuclease proofreading activities.
- DNA polymerase III, proofreading complex: A subcomplex of DNA polymerase III composed of the epsilon subunit, which has proofreading activity, and the theta subunit, which enhances the epsilon subunit's proofreading activity.
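The nesting described in these definitions can be checked mechanically once each complex is stored as a set of subunits. The sketch below uses only the subunits named on the slide; the dictionary and the function name are assumptions for illustration, not part of GO or IntAct.

```python
# Subunit composition as stated in the term definitions on the slide.
SUBUNITS = {
    "DNA polymerase III, core complex": {"alpha", "epsilon", "theta"},
    "DNA polymerase III, proofreading complex": {"epsilon", "theta"},
}


def is_subcomplex(child: str, parent: str) -> bool:
    """True when every subunit of `child` is in `parent` and `parent` has more."""
    return SUBUNITS[child] < SUBUNITS[parent]  # proper-subset comparison


proofreading_in_core = is_subcomplex(
    "DNA polymerase III, proofreading complex",
    "DNA polymerase III, core complex",
)
```

A relation derived this way could suggest parent/child links between complex terms, although it breaks down for open sets where the subunit list is incomplete.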

Issues - Confidence
- We need to define confidence scores:
  - Do we know all participants of the complex?
  - Do we have (open) sets of participants?
  - How do we indicate the depth of data available, i.e. compare Reactome import vs. manual curation?
- E.g. using the Evidence Code Ontology (ECO): only a qualitative description
- Need a quantitative identifier
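To make the idea of a quantitative identifier concrete, the three questions above could be folded into a single number. Everything in this sketch is hypothetical: the evidence categories, the weights and the penalty factors are invented for illustration and do not come from ECO or from IntAct.

```python
from dataclasses import dataclass

# Hypothetical weights for the depth of data behind a complex.
EVIDENCE_WEIGHT = {
    "manual curation": 1.0,
    "database import": 0.5,
    "inferred": 0.25,
}


@dataclass
class CurationStatus:
    all_participants_known: bool
    has_open_sets: bool
    evidence: str  # one of the keys of EVIDENCE_WEIGHT


def confidence_score(status: CurationStatus) -> float:
    """Combine the curation questions into one score between 0 and 1."""
    score = EVIDENCE_WEIGHT[status.evidence]
    if not status.all_participants_known:
        score *= 0.5   # hypothetical penalty for missing participants
    if status.has_open_sets:
        score *= 0.75  # hypothetical penalty for open sets
    return score


curated = confidence_score(CurationStatus(True, False, "manual curation"))
imported = confidence_score(CurationStatus(False, True, "database import"))
```

With these made-up weights a fully resolved, manually curated complex scores 1.0, and an imported complex with missing participants and open sets scores 0.1875.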

Issues – Inference data
- Do we use inference/modelling data (e.g. Compara)?
- Where is the cut-off for model organisms?
  - e.g. function remains but participants change

Issues – Visualisation
- Flexible display of 2D and 3D options to capture complexity
- The majority of complexes have < 5 participants, average size 2.3
- For large complexes it needs to be dynamic:
  - use zoom-in/-out functionality on demand,
  - display only main participants or subcomplexes by default and expand on demand
  - This might be achieved by assigning confidence scores to different levels of the complex, by which it collapses/expands…
- Most biological network packages, e.g. Cytoscape, are not up to it
  - BioLayout 3D, ONDEX
- For crystal structures, link to PDB (e.g. BioJS widget)
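The collapse/expand idea can be sketched as a threshold over per-participant scores: the default view shows only participants above the threshold, and lowering it expands the complex. The scores, names and threshold values below are hypothetical.

```python
def visible_participants(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the participants to draw at a given level of detail, sorted by name."""
    return sorted(name for name, score in scores.items() if score >= threshold)


# Hypothetical per-participant scores for one large complex.
scores = {
    "core subunit 1": 0.9,
    "core subunit 2": 0.8,
    "peripheral subunit": 0.3,
}

collapsed_view = visible_participants(scores, threshold=0.5)
expanded_view = visible_participants(scores, threshold=0.0)
```

In this example the collapsed view lists the two core subunits and the expanded view adds the peripheral one.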

Issues – Visualisation
- Cytoscape (Web)
- Christine's widget
- bag of participants

Bubble diagram (figure; labels and notes from the slide)
- Participants shown: Protein A, Protein B, Protein C, Protein D, Small Molecule
- Gene name in each bubble, with hyperlink to UniProtKB
- Edges: weak evidence of Ix, strong evidence of Ix
- Hyperlink to IMEx Ix AC *
- Hyperlink to binding site (IMEx/InterPro)
- "?": unknown which participant is the direct interactor
- Search for all Ix or Cx containing one or more of these participants
- * Need to query hyperlinks from the whole database on the fly rather than having a static link to just one Ix
- Ix = Interaction, Cx = Complex

Issues – Visualisation
- Could incorporate multiple views using something like the PDB slideshow viewer

Issues – Visualisation
- Very big complexes, like the proteasome, may have to be displayed statically.
- We may be able to get permission from the authors/journals to share figures with us.

Issues – Search Parameters
- Simple Search:
  - UniProtKB ID / protein name
  - Gene ID / name
  - Small molecule ID / name
  - InterPro Domain
  - GO term
  - PMID
  - Complex ID / name
  - Drug
- Advanced Search Filters:
  - Stoichiometry
  - Binding sites
  - Biological role
  - Source DB
  - Host organism
  - Interactor type (protein, small mol., NA)
  - ECO
  - Process/Pathway
  - Stable vs. transient
  - Confidence score
  - Orthology
  - Disease
  - No. of participants
- Legend: already searchable / new search parameters / most important new search parameter!
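A few of the advanced filters can be illustrated with a plain filter function over in-memory records. This is only a sketch: the record layout, field names and sample data are hypothetical, and a real portal would run such queries against an index or database.

```python
from typing import Optional

# Hypothetical records; the field names are illustrative only.
COMPLEXES = [
    {"name": "complex 1", "host_organism": "Homo sapiens", "stable": True,
     "participants": ["protein", "protein", "small molecule"]},
    {"name": "complex 2", "host_organism": "Escherichia coli", "stable": True,
     "participants": ["protein", "protein"]},
    {"name": "complex 3", "host_organism": "Homo sapiens", "stable": False,
     "participants": ["protein", "nucleic acid"]},
]


def search(complexes: list[dict],
           host_organism: Optional[str] = None,
           interactor_type: Optional[str] = None,
           min_participants: Optional[int] = None,
           stable: Optional[bool] = None) -> list[dict]:
    """Return the complexes that pass every filter that was supplied."""
    results = []
    for record in complexes:
        if host_organism is not None and record["host_organism"] != host_organism:
            continue
        if interactor_type is not None and interactor_type not in record["participants"]:
            continue
        if min_participants is not None and len(record["participants"]) < min_participants:
            continue
        if stable is not None and record["stable"] != stable:
            continue
        results.append(record)
    return results


human_stable = search(COMPLEXES, host_organism="Homo sapiens", stable=True)
```

Filters left as `None` are ignored, so the same function covers both a single-criterion search and a combination of several filters.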

Status quo?
- > 550 complexes already curated (Sandra, Bernd, Birgit), many imported (e.g. MatrixDB from Sylvie)
- Exporter for Reactome working (David Croft)
- PDB export under construction (Jose Dana)
- ChEMBL xref list available (Yvonne Light)
- Not all necessary features are incorporated into the Editor yet (breaks release!), e.g. complexes can't be participants
- JAMI under construction (Marine!)
- It's a complex project which needs collaboration!!!

Acknowledgements
- Proteomics Services: Henning Hermjakob
- IntAct: Sandra Orchard, Marine Dumousseau, Noemi del Toro Ayllón, Rafael Jimenez, Pablo Porras, Margaret Duesbury
- SIB: Bernd Roechert
- MatrixDB: Sylvie Ricard-Blum
- Reactome: Steve Jupe, David Croft
- ChEMBL: Anna Gaulton, Yvonne Light
- PDBe: Sameer Velankar, Jose Dana
- GO: Jane Lomax, Rachel Huntley, Heiko Dietze