251st ACS National Meeting, 15th March 2016. The ChEBI Database and Ontology: a key resource for chemical biology and metabolomics. Gareth Owen, EMBL-EBI

Presentation transcript:

251st ACS National Meeting, 15th March 2016. The ChEBI Database and Ontology: a key resource for chemical biology and metabolomics. Gareth Owen, EMBL-EBI, Cambridge, UK

Outline: A brief history of ChEBI; The database; The ChEBI ontology; Relationships used; Classifying entities; Automatic classification; The future…

Why create a chemistry resource at a Bioinformatics Institute? By 2002, the diversity of names in the literature for even quite common biological compounds was leading to duplication of effort when curating the Gene Ontology (GO). Michael Ashburner, together with Janet Thornton, realised that they needed an authoritative small-molecule database based on accurate chemical nomenclature. GO already contained an implicit chemical ontology, which Michael was only too pleased to hand over to 'real' chemists who would probably do a better job than him.

What is ChEBI? Chemical Entities of Biological Interest. A freely available, manually curated chemistry database. Focused on correct structure and IUPAC-recommended nomenclature of ‘small’ chemical entities (no proteins or nucleic acids). High quality, manually annotated. Provides a chemical ontology. Access ChEBI at

ChEBI release #1 - July 2004: 2,783 compounds

ChEBI (Chemical Entities of Biological Interest): First release (2,783 compounds) in July 2004. Updated monthly until December 2014. Now live: entries are visible as soon as they are submitted. (Download files are updated monthly, on the 1st of each month.) Currently: >48,000 fully curated entries and 140,000 immediate ontological relationships.

Entity of the Month

Natural products focus: Over the last few years, ChEBI has concentrated on increasing its coverage of natural products. New data tables have been introduced to store species information. ChEBI is now the primary resource for storing molecule data for the MetaboLights resource. ChEBI is also currently curating metabolites from 4 key model species (human, mouse, yeast, E. coli).

The ChEBI ontology: Used by numerous biomedical ontologies to handle their chemistry terms, most notably the Gene Ontology (GO).

The ChEBI ontology: Organised into three sub-ontologies under the ChEBI ontology root: the molecular structure ontology, the subatomic particle ontology, and the role ontology.

Molecular structure ontology (diagram). Terms shown: molecular structure; inorganic molecular entity; group; carboxylic acid; organic molecular entity; aldehydes; organophosphorus compounds; sodium chloride; acetylsalicylic acid (aspirin); carboxy group; chlorfenvinfos; pyridoxal (vitamin B6).

Role ontology (diagram). Terms shown: role; biological role; chemical role; application; vitamin; acid; drug; pesticide; analgesic; insecticide. Entities shown: pyridoxal (vitamin B6); sulfuric acid; acetylsalicylic acid (aspirin); chlorfenvinfos. Relationships shown: is_a, has_role.
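To make the two relationship types on the role ontology slide concrete, here is a minimal Python sketch. It is not ChEBI code: the edges are an assumed reading of the flattened slide diagram (for example, that analgesic is_a drug and that acetylsalicylic acid has_role analgesic), and the names IS_A, HAS_ROLE, ancestors and roles_of are hypothetical.

```python
# Assumed edges, read off the slide's term list; not an export from ChEBI.
IS_A = {
    "biological role": ["role"],
    "chemical role": ["role"],
    "application": ["role"],
    "vitamin": ["biological role"],
    "acid": ["chemical role"],
    "drug": ["application"],
    "pesticide": ["application"],
    "analgesic": ["drug"],
    "insecticide": ["pesticide"],
}

# has_role links a molecular entity to a term in the role ontology (assumed pairs).
HAS_ROLE = {
    "pyridoxal": ["vitamin"],
    "sulfuric acid": ["acid"],
    "acetylsalicylic acid": ["analgesic"],
    "chlorfenvinfos": ["insecticide"],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def roles_of(entity):
    """Direct has_role targets plus all of their is_a ancestors."""
    result = set()
    for role in HAS_ROLE.get(entity, []):
        result.add(role)
        result |= ancestors(role)
    return result


print(sorted(roles_of("acetylsalicylic acid")))
# ['analgesic', 'application', 'drug', 'role']
```

The point of the sketch is that has_role is asserted once, at the most specific role, and the broader roles follow from is_a.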

ChEBI ontology relationships: generic ontology relationships and chemistry-specific relationships.

Searching the ontology (1)

Searching the ontology (2)

User Submissions and the ChEBI Ontology: As soon as a user submits an entry, it is given a permanent ChEBI ID, it is visible on the ChEBI website, and it must be classified within the ChEBI ontology. Submitters can assign classification terms if they wish, but most choose not to. In that case a default, broad term (e.g. is_a organic molecular entity) is assigned until the entry is checked and updated by a ChEBI curator.

Bulk Submissions and the ChEBI Ontology: Assigning a default classification to every entry in a bulk submission causes serious problems with subsequent curation. Complex validation checks (aimed at avoiding loops, etc., in the ontology tree) slow down dramatically when there are hundreds (or thousands) of child terms attached to an individual ontology term. ClassyFire (David Wishart & Yannick Djoumbou Feunang) is now used for the initial classification of bulk submissions, overcoming the problem.
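As an illustration of the kind of loop check mentioned above, the sketch below runs a depth-first search over is_a edges and reports one cycle if any exists. It is a generic technique, not ChEBI's actual validation code; the function name find_cycle and the example edges are hypothetical.

```python
def find_cycle(is_a):
    """Return one is_a cycle as a list of terms, or None if the graph is acyclic.

    `is_a` maps each child term to a list of its parent terms.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(term):
        colour[term] = GREY
        path.append(term)
        for parent in is_a.get(term, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # Parent is already on the current path: a loop has closed.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        colour[term] = BLACK
        return None

    for term in list(is_a):
        if colour.get(term, WHITE) == WHITE:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


# Illustrative edges only.
ok = {
    "aldehydes": ["organic molecular entity"],
    "organic molecular entity": ["molecular structure"],
}
bad = {**ok, "molecular structure": ["aldehydes"]}

print(find_cycle(ok))   # None
print(find_cycle(bad))
```

The recursive form keeps the sketch short. A very deep hierarchy would call for an iterative version to stay within Python's recursion limit.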

ClassyFire – ChEBI mapping: ClassyFire terms, definitions, and notes; ChEBI mappings and notes.

Some thoughts: Automatic classification is essential for the ontological assignment of large numbers of structures. Different communities consider different features to be important, and often use the same term to mean different things (e.g. ‘glycan’), so any classification system is likely to require tweaking to best fit its user community. Unification or alignment of ontologies where possible (e.g. ChEBI-GO) is hard work, but the benefits repay it. Systematic incorporation of MeSH terms into ChEBI (talked about for years) is now a real possibility. Text mining tools are getting much better and can result in significant economies.

What about the structures being classified? Chemical structures are often ambiguous. Wedged and dashed bonds may mean absolute (R), absolute but unknown (+), relative (cis), or a mixture of two or more of these types of stereochemistry. A wavy bond may mean ‘a mixture’ or ‘not determined’. A ‘normal’ bond at a stereocentre may be a racemate, or it may be unspecified/unknown. Is a trans double bond really trans, or has it been drawn trans for aesthetic reasons? Is a ‘crossed’ double bond a mixture or unknown? No sign of this changing anytime soon. Time to replace the molfile?
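The ambiguity is visible in the file format itself. The sketch below reads the bond-stereo field from V2000 molfile bond lines (fixed-width, 3-character fields: first atom, second atom, bond type, bond stereo). The numeric codes follow the V2000 connection-table convention; the function names and the sample bond lines are hypothetical, and only the first four fields are handled.

```python
# Bond-stereo codes used in the V2000 bond block.
SINGLE_BOND_STEREO = {0: "plain", 1: "up (wedge)", 4: "either (wavy)", 6: "down (dash)"}
DOUBLE_BOND_STEREO = {0: "cis/trans from coordinates", 3: "either (crossed)"}


def parse_bond_line(line):
    """Split one fixed-width V2000 bond line into (atom1, atom2, bond_type, stereo)."""
    fields = [line[i:i + 3].strip() for i in range(0, 12, 3)]
    atom1, atom2, bond_type, stereo = (int(f) if f else 0 for f in fields)
    return atom1, atom2, bond_type, stereo


def describe_stereo(line):
    """Describe what the bond line records about how the bond was drawn."""
    _, _, bond_type, stereo = parse_bond_line(line)
    # Bond type 2 is a double bond; everything else is looked up as a single bond here.
    table = DOUBLE_BOND_STEREO if bond_type == 2 else SINGLE_BOND_STEREO
    return table.get(stereo, "unrecognised code")


# Hypothetical bond lines, not taken from a real ChEBI entry.
for bond in ["  1  2  1  1", "  2  3  1  4", "  3  4  2  3", "  4  5  1  0"]:
    print(repr(bond), "->", describe_stereo(bond))
```

Each bond carries a single code for how it was drawn. Nothing in that field says whether a wedge is absolute or relative, or whether a wavy bond means a mixture or an undetermined centre, which is the gap the slide points to.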

Acknowledgements. Curators: Marcus Ennis (2003-), Kirill Degtyarenko (2003-9), Inma Spiteri (2008-9), Zara Josephs (2010), Nico Adams ( ), Steve Turner (2009-), Gareth Owen (2010-), Namrata Kale ( ). Developers: Paula De Matos ( ), Janna Hastings ( ), Duncan Hull ( ), Adriano Dekker (2009-), Ken Haug ( ), Venkatesh Muthukrishnan (2011-). Team Leaders: Rolf Apweiler, Henning Hermjakob, Christoph Steinbeck. Grant agreement No. BB/K019783/1

251st ACS National Meeting, 15th March 2016. Thank you