251st ACS National Meeting, 15th March 2016. The ChEBI Database and Ontology: a key resource for chemical biology and metabolomics. Gareth Owen, EMBL-EBI


1 251st ACS National Meeting, 15th March 2016. The ChEBI Database and Ontology: a key resource for chemical biology and metabolomics. Gareth Owen, EMBL-EBI, Cambridge, UK. E-mail: gowen@ebi.ac.uk

2 Outline: A brief history of ChEBI; The database; The ChEBI ontology; Relationships used; Classifying entities; Automatic classification; The future…

3 Why create a chemistry resource at a Bioinformatics Institute? By 2002, the diversity of names in the literature for even quite common biological compounds was leading to duplication of effort when curating the Gene Ontology (GO). Michael Ashburner, together with Janet Thornton, realised that they needed an authoritative small-molecule database based on accurate chemical nomenclature. GO already had within itself an implicit chemical ontology, which Michael was only too pleased to hand over to 'real' chemists who would probably do a better job than him.

4 What is ChEBI? Chemical Entities of Biological Interest. A freely available, manually curated chemistry database. Focused on correct structure and IUPAC-recommended nomenclature of ‘small’ chemical entities (no proteins or nucleic acids). High quality, manually annotated. Provides a chemical ontology. Access ChEBI at http://www.ebi.ac.uk/chebi/

5 ChEBI release #1, July 21 2004: 2,783 compounds

6 ChEBI (Chemical Entities of Biological Interest). First release (2,783 compounds): July 2004. Updated monthly until December 2014. Now live: entries are visible as soon as they are submitted. (Download files are updated monthly, on the 1st of each month.) Currently: >48,000 fully curated entries; 140,000 immediate ontological relationships.

7 http://www.ebi.ac.uk/chebi/statisticsForward.do

8 http://www.ebi.ac.uk/chebi

9 Entity of the Month

10

11 http://www.ebi.ac.uk/chebi/statisticsForward.do

12 Natural products focus. Over the last few years, ChEBI has concentrated on increasing its coverage of natural products. New data tables have been introduced to store species information. ChEBI is now the primary resource for storing molecule data for the MetaboLights resource (http://www.ebi.ac.uk/metabolights). Also currently curating metabolites from 4 key model species (human, mouse, yeast, E. coli).

13 The ChEBI ontology. Used by numerous biomedical ontologies to handle their chemistry terms, most notably the Gene Ontology (GO).

14 The ChEBI ontology. Organised into three sub-ontologies: Molecular structure ontology; Subatomic particle ontology; Role ontology.

15 Molecular structure ontology. Diagram labels: molecular structure; inorganic molecular entity; group; carboxylic acid; organic molecular entity; aldehydes; organophosphorus compounds; sodium chloride; acetylsalicylic acid (aspirin); carboxy group; chlorfenvinfos; pyridoxal (vitamin B6).

16 Role ontology. Diagram labels: Role; Biological role; Chemical role; Application; vitamin; acid; drug; pesticide; analgesic; insecticide; pyridoxal (vitamin B6); sulfuric acid; acetylsalicylic acid (aspirin); chlorfenvinfos. Relationships shown: is_a, has_role.
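
The is_a and has_role relationships named on the role ontology slide can be sketched as a small graph. This is a minimal illustration in Python, not ChEBI's own implementation. The individual edges (for example, that analgesic is_a drug, or that aspirin has_role analgesic) are read off the flattened slide layout and are assumptions, and the function names ancestors and roles_of are hypothetical.

```python
# Toy ontology using the labels from the slide; every edge is an assumption.
IS_A = {
    "biological role": ["role"],
    "chemical role": ["role"],
    "application": ["role"],
    "vitamin": ["biological role"],
    "acid": ["chemical role"],
    "drug": ["application"],
    "pesticide": ["application"],
    "analgesic": ["drug"],
    "insecticide": ["pesticide"],
}

HAS_ROLE = {
    "pyridoxal": ["vitamin"],
    "sulfuric acid": ["acid"],
    "acetylsalicylic acid": ["analgesic"],
    "chlorfenvinfos": ["insecticide"],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def roles_of(entity):
    """Return the asserted roles plus all roles implied through is_a."""
    result = set()
    for role in HAS_ROLE.get(entity, []):
        result.add(role)
        result |= ancestors(role)
    return result


print(sorted(roles_of("acetylsalicylic acid")))
```

Under these assumed edges, asserting a single has_role link to a specific role is enough to place an entity under every broader role above it.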

17 ChEBI ontology relationships: Generic ontology relationships; Chemistry-specific relationships

18 Searching the ontology (1)

19 Searching the ontology (2)

20 User Submissions and the ChEBI Ontology. As soon as a user submits an entry: it is given a permanent ChEBI ID; it is visible on the ChEBI website; and it must be classified within the ChEBI ontology. Submitters can assign classification terms if they wish, but most choose not to. In that case, a default, broad term (e.g. is_a organic molecular entity) is assigned until the entry is checked and updated by a ChEBI curator.

21 Bulk Submissions and the ChEBI Ontology. Assigning a default classification to every entry in a bulk submission causes serious problems with subsequent curation. Complex validation checks (aimed at avoiding loops, etc., in the ontology tree) slow down dramatically when there are hundreds (or thousands) of child terms attached to an individual ontology term. ClassyFire (David Wishart & Yannick Djoumbou Feunang) is now used for the initial classification of bulk submissions, overcoming the problem.
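
One of the validation checks mentioned above, avoiding loops in the ontology tree, can be sketched as a reachability test run before a new is_a edge is accepted. This is a generic illustration in Python, not a description of how ChEBI's curation tool performs its checks; the function names reaches and add_is_a are hypothetical, and the example edges are assumptions built from terms on the slides.

```python
def reaches(graph, start, target):
    """True if `target` can be reached from `start` via parent (is_a) edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def add_is_a(graph, child, parent):
    """Add `child is_a parent`, refusing any edge that would close a loop."""
    # If the proposed parent already has the child among its ancestors,
    # adding the edge would make the child its own ancestor.
    if reaches(graph, parent, child):
        raise ValueError(f"{child} is_a {parent} would create a loop")
    graph.setdefault(child, set()).add(parent)


ontology = {}  # maps each term to the set of its direct parents
add_is_a(ontology, "carboxylic acid", "organic molecular entity")
add_is_a(ontology, "acetylsalicylic acid", "carboxylic acid")

try:
    add_is_a(ontology, "organic molecular entity", "acetylsalicylic acid")
    loop_rejected = False
except ValueError:
    loop_rejected = True

print(loop_rejected)
```

The sketch walks upwards through parents only, so its cost depends on the number of ancestors; checks that must also walk downwards through child terms would grow with the number of children attached to a term.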

22 ClassyFire – ChEBI mapping. Columns shown: ClassyFire terms, definitions, and notes; ChEBI mappings and notes
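
A term-to-term mapping of this kind could be applied as a simple lookup, falling back to a broad default term when no mapping exists. The sketch below is in Python and rests on stated assumptions: the two mapping entries are placeholders for illustration and are not taken from the actual ClassyFire to ChEBI mapping, and the names CLASSYFIRE_TO_CHEBI, DEFAULT_TERM, and initial_classification are hypothetical.

```python
# Broad fallback term, taken from the example on the user submissions slide.
DEFAULT_TERM = "organic molecular entity"

# Placeholder entries for illustration only; not the real mapping table.
CLASSYFIRE_TO_CHEBI = {
    "Carboxylic acids": "carboxylic acid",
    "Aldehydes": "aldehydes",
}


def initial_classification(classyfire_terms, mapping=CLASSYFIRE_TO_CHEBI):
    """Return ChEBI parent terms for an entry, given its ClassyFire terms.

    Terms with no mapping are skipped; if nothing maps at all, the broad
    default is returned so the entry is still classified.
    """
    mapped = [mapping[term] for term in classyfire_terms if term in mapping]
    return mapped or [DEFAULT_TERM]


print(initial_classification(["Carboxylic acids", "Unmapped term"]))
print(initial_classification(["Unmapped term"]))
```

Spreading bulk entries across specific parent terms in this way avoids piling hundreds of children onto one default term.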

23 Some thoughts. Automatic classification is essential for the ontological assignment of large numbers of structures. Different communities consider different features to be important, and often use the same term to mean different things (e.g. ‘glycan’), so any classification system is likely to require tweaking to best fit the user community. Unification or alignment of ontologies where possible is hard work (e.g. ChEBI-GO), but the benefits pay off. Systematic incorporation of MeSH terms (talked about for years) into ChEBI is now a real possibility. Text mining tools are getting much better and can result in significant economies.

24 What about the structures being classified? Chemical structures are often ambiguous. Wedged & dashed bonds may mean absolute (R), absolute but unknown (+), relative (cis), or a mixture of two or more of these types of stereochemistry. A wavy bond may mean ‘a mixture’ or ‘not determined’. A ‘normal’ bond at a stereocentre may be a racemate or it may be unspecified/unknown. Is a trans double bond really trans, or has it been drawn trans for aesthetic reasons? Is a ‘crossed’ double bond a mixture or unknown? No sign of this changing anytime soon. Time to replace the molfile?

25 Acknowledgements. Curators: Marcus Ennis (2003-), Kirill Degtyarenko (2003-9), Inma Spiteri (2008-9), Zara Josephs (2010), Nico Adams (2009-11), Steve Turner (2009-), Gareth Owen (2010-), Namrata Kale (2012-15). Developers: Paula De Matos (2003-12), Janna Hastings (2005-15), Duncan Hull (2009-11), Adriano Dekker (2009-), Ken Haug (2009-10), Venkatesh Muthukrishnan (2011-). Team Leaders: Rolf Apweiler, Henning Hermjakob, Christoph Steinbeck. Grant agreement No. BB/K019783/1

26 251st ACS National Meeting, 15th March 2016. Thank you. E-mail: gowen@ebi.ac.uk

