Macromolecular complexes – A new Online Portal (under construction!) Birgit Meldal (IntAct)
Overview: Aims & Definitions; Data Sources; Issues and Challenges (Nomenclature, Sets, Transient complexes, GO, Confidence scores, Inference, Visualisation, Search Parameters and Filters); Status quo
Project Aim: To design an Online Portal to search and visualise protein complexes, including cross-referencing to source databases and beyond; export to interested parties in a format of their choice; incorporate the data into network analysis tools. To curate a starter set of protein complexes for 4 major model organisms, chosen to span the taxonomic range (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Escherichia coli), which will be expanded to a second set of organisms (Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Schizosaccharomyces pombe). IntAct provides the data structure.
Long-term Strategy: Create stable complex identifiers. Joint curation effort. Benefit to all collaborating databases: resource sharing, elimination of redundancies. Benefit to user: one central resource that links to all source databases.
Why curate complexes in IntAct? Many source databases containing information on complexes are at the EBI: UniProt, ChEMBL, Reactome, PDBe, (Enzyme Portal)… IntAct has the correct data structure and the experimental evidence.
Definition: stable protein complexes. A stable set (2 or more) of interacting protein molecules which can be co-purified and have been shown to exist as a functional unit in vivo. Non-protein molecules (e.g. small molecules, nucleic acids) may also be present in the complex. What is not a stable complex? Enzyme/substrate or any similar transient interaction; two proteins associated in a pulldown / coimmunoprecipitation with no functional link.
Source Databases: Reactome – human (EBI); Gramene – Arabidopsis; Microme – bacteria (EBI); PDBe (EBI) – mainly human; ChEMBL (EBI); MatrixDB (Sylvie Ricard-Blum); Mining UniProt – yeast (Bernd Roechert, SIB – manually); Unmaintained web resources – CYGD (yeast), CORUM (human), E. coli website, 3D Complexes (Sarah Teichmann, EBI); Manual curation from IMEx DBs & the literature (Sandra & Birgit)
Data captured currently for IntAct complexes: Participants – proteins (UniProt), small molecules (ChEBI), nucleic acids (???). Stoichiometry – when known. Topology – when known. Structured annotation using GO. Cross references to experimental evidence, PDB, Reactome (human), Gramene, ChEMBL, PubMed (for further information), IntEnz (enzymes). Complex-specific free-text annotations: structure and function; synonyms to provide consistent nomenclature; physical properties, when known.
Needs to link to original Interaction & PMID. Will be list of aliases. Function and structure as free-text. If MW or Stokes radius known, will be parameters. A complex cannot be a participant! Will be recommended and systematic name.
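The fields listed on the two slides above could be held in a record along these lines. This is only a minimal sketch: the class names, field names and type labels are assumptions for illustration and do not reflect the actual IntAct data model, and the accessions are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Participant types named on the slides; the string labels are illustrative.
ALLOWED_TYPES = {"protein", "small molecule", "nucleic acid"}


@dataclass
class Participant:
    accession: str                       # e.g. a UniProtKB or ChEBI accession
    interactor_type: str
    stoichiometry: Optional[int] = None  # None when not known

    def __post_init__(self) -> None:
        # Mirrors the slide's constraint: a complex cannot itself be a participant.
        if self.interactor_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported participant type: {self.interactor_type!r}")


@dataclass
class ComplexRecord:
    recommended_name: str
    systematic_name: str
    participants: List[Participant]
    aliases: List[str] = field(default_factory=list)
    go_terms: List[str] = field(default_factory=list)
    xrefs: Dict[str, List[str]] = field(default_factory=dict)  # database -> accessions
    function: str = ""                                         # free-text annotation
    molecular_weight_kda: Optional[float] = None               # when known


# Placeholder accessions, not real database entries.
record = ComplexRecord(
    recommended_name="Example complex",
    systematic_name="geneA:geneB",
    participants=[
        Participant("PROT_A", "protein", stoichiometry=2),
        Participant("PROT_B", "protein"),
    ],
    xrefs={"pubmed": ["PMID_PLACEHOLDER"]},
)
```

Unknown stoichiometry is stored as `None` rather than a default of 1, so "not known" stays distinguishable from "one copy".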
Issues: Currently, complexes are shoe-horned into an interaction which is part of a dummy publication and dummy experiment. New, complex-specific functionality, parameters and tools are needed.
Issues - Nomenclature: Most complexes have no common name, or the common name is defined differently depending on authors or host organism. One name can describe multiple complexes (e.g. AP1 describes ~25 different homo/heterodimers). Reactome makes a string of all components by gene name, but this can become too long for our short-label. We will need both a recommended and a systematic name. List of synonyms already available as free-text. Collaboration with GO, Reactome, HGNC.
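To illustrate the short-label problem, here is a minimal sketch of a systematic name built by joining gene names, with truncation when it exceeds a length limit. The `:` separator, the alphabetical ordering and the 20-character limit are assumptions for illustration; neither Reactome's nor IntAct's actual rules are given on the slide.

```python
def systematic_name(gene_names, max_length=20):
    """Join gene names into one label, truncating if it exceeds max_length.

    The separator, ordering and limit are illustrative assumptions.
    """
    full = ":".join(sorted(gene_names, key=str.lower))
    if len(full) <= max_length:
        return full
    # Keep the result within the limit and mark the truncation.
    return full[: max_length - 3] + "..."


short = systematic_name(["holE", "dnaE", "dnaQ"])
truncated = systematic_name(["holE", "dnaE", "dnaQ"], max_length=10)
```

Truncated labels are no longer unique, which is one reason a separate recommended name and a list of synonyms are needed alongside the systematic one.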
Issues – open/fuzzy sets: Complexes where the identity of one or more participants is unknown, i.e. participant(s) are only identified to a set of (related) proteins. Stoichiometry: often not known or average (e.g. ion channel pore proteins). Only a sub-set of a given complex curated because functional assays often focus on interactions between catalytic subunits.
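One way to picture an open set is a participant "slot" that holds several candidate proteins when the exact occupant is unknown. The sketch below assumes that representation; `ParticipantSlot` and the helper functions are hypothetical names, and the identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ParticipantSlot:
    # One position in the complex; the real occupant is one of the candidates.
    candidates: FrozenSet[str]

    @property
    def is_open(self) -> bool:
        # More than one candidate means the participant is only known as a set.
        return len(self.candidates) > 1


def could_contain(slots: Iterable[ParticipantSlot], protein: str) -> bool:
    """True if the protein is a possible member of the complex."""
    return any(protein in slot.candidates for slot in slots)


def fully_resolved(slots: Iterable[ParticipantSlot]) -> bool:
    """True only if every participant is identified unambiguously."""
    return all(not slot.is_open for slot in slots)


slots = [
    ParticipantSlot(frozenset({"PROT_A"})),
    ParticipantSlot(frozenset({"PROT_B1", "PROT_B2", "PROT_B3"})),  # related proteins
]
```

A search for `PROT_B2` would then find this complex as a possible match, while `fully_resolved(slots)` flags that the membership is not certain.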
Issues – indirect activation & transient complexes: Complexes that are activated without direct ligand interaction, e.g. through change of pH. Transient interactions (Kim van Roey, Heidelberg: cooperative interactions). Different complex? Same participants!
Issues - Gene Ontology: Currently, complexes are mostly children of GO:0043234 protein complex (> 400), lacking hierarchical structure. Collaboration with GO to provide structured annotation. New terms should capture all potential complexes from all species for which a parental term is appropriate, e.g. DNA Polymerase complex. Needs to allow for (open) sets of proteins / protein families.
Issues - Gene Ontology. DNA polymerase III complex: The DNA polymerase III holoenzyme is a complex that contains 10 different types of subunits. These subunits are organized into 3 functionally essential sub-assemblies: the pol III core, the beta sliding clamp processivity factor and the clamp-loading complex. […] DNA polymerase III, core complex: The DNA polymerase III core complex consists of the alpha, epsilon and theta subunits and carries out the polymerase and the 3'-5' exonuclease proofreading activities. DNA polymerase III, proofreading complex: A subcomplex of DNA polymerase III composed of the epsilon subunit, which has proofreading activity, and the theta subunit, which enhances the epsilon subunit's proofreading activity.
Issues - Confidence: We need to define confidence scores. Do we know all participants of the complex? Do we have (open) sets of participants? How do we indicate the depth of data available, i.e. compare Reactome import vs. manual curation? E.g. using the Evidence Code Ontology (ECO) gives only a qualitative description. Need a quantitative identifier.
Issues – Inference data: Do we use inference/modelling data (e.g. Compara)? Where is the cut-off for model organisms, e.g. when function remains but participants change?
Issues – Visualisation: Flexible display of 2D and 3D options to capture complexity. The majority of complexes has 5 participants, average size 2.3. For large complexes it needs to be dynamic: use zoom-in/-out functionality on demand, display only main participants or subcomplexes by default and expand on demand. This might be achieved by assigning confidence scores to different levels of the complex by which it collapses/expands… Most biological network packages, e.g. Cytoscape, are not up to it. BioLayout 3D, ONDEX. For crystal structures link to PDB (e.g. BioJS widget).
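The collapse/expand idea can be sketched as a threshold on per-participant confidence scores: a high threshold shows only the main participants, a low one shows everything. The function name, the 0 to 1 score scale and the example values are assumptions, since the slides leave the scoring scheme undefined.

```python
def visible_participants(scores, min_score):
    """Return the participants to draw at a given confidence threshold.

    scores: mapping of participant name -> confidence score (assumed 0 to 1).
    """
    return sorted(name for name, score in scores.items() if score >= min_score)


# Hypothetical scores for illustration only.
scores = {"core subunit A": 0.9, "core subunit B": 0.8, "accessory factor": 0.4}

collapsed = visible_participants(scores, min_score=0.7)  # default view
expanded = visible_participants(scores, min_score=0.0)   # expand on demand
```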
Issues – Visualisation [Figure: Cytoscape (Web); Christine's widget; bag of participants]
Bubble diagram [Figure: Proteins A, B, C and D and a small molecule drawn as bubbles. Legend: weak evidence of Ix; strong evidence of Ix; hyperlink to IMEx Ix AC; hyperlink to binding site (IMEx/InterPro); "?" = unknown which participant is direct interactor. Gene name in bubble with hyperlink to UniProtKB. Search for all Ix or Cx containing one or more of these participants. Ix = Interaction, Cx = Complex.] * Need to query hyperlinks from whole database on the fly rather than having a static link to just one Ix.
Issues – Visualisation Could incorporate multiple views using something like the PDB slideshow viewer
Issues – Visualisation: Very big complexes, like the proteasome, may have to be displayed statically. We may be able to get permission from the authors/journals to share figures with us.
Issues – Search Parameters. Simple Search: UniProtKB ID / protein name; Gene ID / name; Small molecule ID / name; InterPro Domain; GO term; PMID; Complex ID / name; Drug. Advanced Search Filters: Stoichiometry; Binding sites; Biological role; Source DB; Host organism; Interactor type (protein, small mol., NA); ECO; Process/Pathway; Stable vs. transient; Confidence score; Orthology; Disease; No. of participants. Legend: already searchable; new search parameters; most important new search parameter!
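A few of these filters can be sketched as optional arguments that each narrow the result list. The dictionary keys, the function name and the example records are assumptions for illustration and do not describe the portal's actual query interface.

```python
def search_complexes(complexes, participant=None, host_organism=None,
                     source_db=None, min_participants=None):
    """Return names of complexes matching every filter that was supplied."""
    results = []
    for cpx in complexes:
        if participant is not None and participant not in cpx["participants"]:
            continue
        if host_organism is not None and cpx["host_organism"] != host_organism:
            continue
        if source_db is not None and cpx["source_db"] != source_db:
            continue
        if min_participants is not None and len(cpx["participants"]) < min_participants:
            continue
        results.append(cpx["name"])
    return results


# Placeholder records, not real curated complexes.
complexes = [
    {"name": "complex 1", "participants": ["PROT_A", "PROT_B"],
     "host_organism": "Homo sapiens", "source_db": "Reactome"},
    {"name": "complex 2", "participants": ["PROT_A", "PROT_C", "PROT_D"],
     "host_organism": "Escherichia coli", "source_db": "manual"},
]

hits = search_complexes(complexes, participant="PROT_A", min_participants=3)
```

Filters left as `None` are ignored, so a simple search by one identifier and an advanced search with several filters go through the same function.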
Status quo? > 550 complexes already curated (Sandra, Bernd, Birgit), many imported (e.g. MatrixDB from Sylvie). Exporter for Reactome working (David Croft). PDB export under construction (Jose Dana). ChEMBL xref list available (Yvonne Light). Not all necessary features incorporated into Editor: breaks release! E.g. complexes can't be participants. JAMI under construction (Marine!). It's a complex project which needs collaboration!!!
Acknowledgements. Proteomics Services: Henning Hermjakob. IntAct: Sandra Orchard, Marine Dumousseau, Noemi del Toro Ayllón, Rafael Jimenez, Pablo Porras, Margaret Duesbury. SIB: Bernd Roechert. MatrixDB: Sylvie Ricard-Blum. Reactome: Steve Jupe, David Croft. ChEMBL: Anna Gaulton, Yvonne Light. PDBe: Sameer Velankar, Jose Dana. GO: Jane Lomax, Rachel Huntley, Heiko Dietze.