EBI is an Outstation of the European Molecular Biology Laboratory. ChEBI: The story so far
Private Data / Public Data
The state of affairs of bioinformatics in 2002: bioinformatics is booming; Human Genome sequence rough draft published June 2000; free resources and free data.
A different story for chemoinformatics: private data and private software.
Too hard to solve… let's put our heads in the sand.
Bioinformatics data too large to keep track of chemical compounds: 100,000 protein entries in SwissProt (2002); 20 million entries in EMBL Database (2002). Small databases unable to keep track: ENZYME resources, ~3,500 enzymatic reactions.
New initiatives start up. PubChem: chemical repository, millions of entries, focus on screening assays. ChEBI: manually annotated database, nomenclature reference and compound database, tens of thousands of entries.
Principles of foundation: December 2002 email exchanges within the EBI to address the issue of chemistry; three principles outlined.
“Nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution/availability to anyone.”
“Every data item in the database should be fully traceable and explicitly referenced to the original source/version.”
“Although the EBI will provide a web interface, the entirety of the data should be available to all without constraint as, for example, SQL table dumps, ASCII tables, and XML (e.g. DAML+OIL)”
We make a start using existing resources. Integrate three resources: KEGG Compound, IntEnz, Chemical Ontology. Annotation starts summer 2003; focus on nomenclature.
Our first release was modest but it was a start: 21 July 2004, 2783 annotated entities. Data: ChEBI Name, ChEBI Id, IUPAC Names, Synonyms, Formula, Cross-references.
We introduce structures – Sep 2005. Molfiles; InChI (IUPAC International Chemical Identifier); SMILES (Simplified Molecular Input Line Entry System); image (PNG).
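To make the line notations named on this slide concrete, here is a minimal Python sketch; it is not ChEBI code. It uses ethanol purely as an illustration, and the helper name `formula_layer` is made up for this example. The only fact it relies on is that InChI layers are separated by "/", with the version prefix first and, for ordinary compounds, the chemical formula second. The string shown is a present-day standard InChI (prefix `InChI=1S`); identifiers generated in 2005 would have carried a different version prefix.

```python
# Illustrative only: ethanol in two of the line notations named on the slide.
ETHANOL = {
    "smiles": "CCO",
    "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
}


def formula_layer(inchi: str) -> str:
    """Return the second '/'-separated segment of an InChI string.

    The first segment is the version prefix (e.g. 'InChI=1S'); for ordinary
    compounds the second segment is the chemical formula.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi.split("/")
    if len(parts) < 2 or not parts[1]:
        raise ValueError("InChI string has no layer after the version prefix")
    return parts[1]


print(formula_layer(ETHANOL["inchi"]))  # C2H6O
```

Because an InChI is derived from the structure itself, the same string can serve as a lookup key across databases, which a free-form name or synonym cannot.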
Marvin in ChEBI
We start editing the chemical ontology – Dec 2005
Web Services – Oct 2006. Programmatic access to a ChEBI entry; SOAP-based Java implementation; clients currently available in Java and Perl. Four methods with which to access data: getLiteEntity, getCompleteEntity, getOntologyParents, getOntologyChildren.
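As a rough idea of what a SOAP call to one of these methods looks like on the wire, the sketch below builds a SOAP 1.1 request envelope in Python without sending it. Only the method names come from the slide. The service namespace, the `chebiId` parameter name, and the helper `build_request` are assumptions made for illustration and should be checked against the service's WSDL. `getLiteEntity` is left out because it is a search method and is not assumed here to take a ChEBI ID.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CHEBI_NS = "https://www.ebi.ac.uk/webservices/chebi"   # ASSUMED service namespace

# Methods from the slide that are assumed here to take a single ChEBI ID.
ID_METHODS = {"getCompleteEntity", "getOntologyParents", "getOntologyChildren"}


def build_request(method: str, chebi_id: str) -> str:
    """Return a SOAP 1.1 envelope (as a string) calling `method` for `chebi_id`."""
    if method not in ID_METHODS:
        raise ValueError(f"unsupported method: {method}")
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{CHEBI_NS}}}{method}")
    # 'chebiId' is an assumed parameter name, not taken from the slides.
    ET.SubElement(call, f"{{{CHEBI_NS}}}chebiId").text = chebi_id
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("getCompleteEntity", "CHEBI:15377"))
```

In practice a WSDL-aware client library would generate this envelope automatically; writing it out by hand only shows what "SOAP based" means for a caller.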
Automated Cross References – Aug 2007. Current databases: UniProtKB, Reactome, BioModels, IntAct, SABIO-RK, PubChem and ArrayExpress.
Chemical Structure Searching – May 2008
After all this, where are we?
Annotation is linear
Diversity of users: constant challenge of balancing our users' varied interests.
Our positives: nomenclature database; manually annotated data; attention to detail; free and accessible; loyal users.
Our not so positives: size (for some people); not well integrated into other bioinformatics resources; community interaction; no software publicly available to manipulate the database.
Involve the community: create a web-based submission tool. Users can easily submit their entities on a one-to-one basis; bulk submission from other resources is also allowed.
Improvements to data depth. Addition of more Xrefs: PDB, MACIE ??? Addition of more chemical attributes? What chemical attributes? Text mining projects to extract relevant chemical information from patents, journals; European Patent Office.
Going Open Source: commercial software packages will be replaced with Open Source. Long-term goal: allow people to create a free local installation of ChEBI. Distribution of data in useful formats: CML, SDF.
Acknowledgements. ChEBI Team: Paula de Matos, Kirill Degtyarenko, Marcus Ennis, Janna Hastings, Christoph Steinbeck. Alumni: Michael Darsow, Mickael Guedj, Alan McNaught, Martin Zbinden. ChEBI supporters: Rolf Apweiler, Michael Ashburner, Henning Hermjakob, Janet Thornton. IntEnz Team: Rafael Alcántara, Volker Ast, Kristian Axelsen, Anne Morgat. EPO Collaborators: Hélène Courrier, Stephane Nauche, Jeremy Parsons. Database supporters: ArrayExpress, IntAct, Reactome, SABIO-RK, RSC, GO, RESID etc…
Requirement for submitting data to ChEBI. Disclaimer: this is only the summary of a chat I had with the ChEBI coordinator last night, so no promises! Information needed to submit a compound: Structure; Name, synonyms; Registry; Database accession(s); Mapping to ChEBI Ontology. ChEBI is currently quite busy with ongoing projects, but would consider taking submissions.
What could be done within APO-SYS. From Pekka’s talk, I gathered that there are about 5,000 to 10,000 compounds in these siRNA libraries. Question: who else is dealing with compounds in APO-SYS? One could use the ChEBI web service with InChI to identify what is already in the database. ChEBI can do targeted curation, provided there is funding for the curation team.
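The InChI-based check suggested on this slide could be sketched roughly as follows. This is only an outline under stated assumptions: `lookup` stands in for whatever call actually queries ChEBI by InChI (it is injected as a plain function, so no real service API is assumed), and the function name `split_by_chebi_coverage` and the stub data are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def split_by_chebi_coverage(
    inchis: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Partition InChI strings into (already in ChEBI, not found).

    `lookup` is a hypothetical callable returning a ChEBI ID for an InChI,
    or None when the compound is not in the database.
    """
    known: Dict[str, str] = {}
    missing: List[str] = []
    for inchi in inchis:
        chebi_id = lookup(inchi)
        if chebi_id is None:
            missing.append(inchi)  # candidate for submission or targeted curation
        else:
            known[inchi] = chebi_id
    return known, missing


# Stub standing in for the real web-service call (illustrative data only).
_STUB_INDEX = {"InChI=1S/H2O/h1H2": "CHEBI:15377"}


def stub_lookup(inchi: str) -> Optional[str]:
    return _STUB_INDEX.get(inchi)


library = ["InChI=1S/H2O/h1H2", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]
known, missing = split_by_chebi_coverage(library, stub_lookup)
print(len(known), "already in ChEBI;", len(missing), "to consider for submission")
```

Run against a library of 5,000 to 10,000 compounds, the `missing` list would give a first estimate of how much curation would actually be needed.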