Services | Research | Training | Industry Small Molecules Resources at the EBI Dr. Louisa Bellis Chemical Content Curator, ChEMBL Group EMBL-EBI, UK Bioinformatics.

Small Molecules Resources at the EBI
Dr. Louisa Bellis, Chemical Content Curator, ChEMBL Group, EMBL-EBI, UK
Bioinformatics Resources for Immunologists, 6th September 2013

Agenda
- Introduction
- Small molecule resources: ChEBI and ChEMBL
- Searching and browsing
- Hands-on exercises

Small Molecules within Bioinformatics
Data types covered by bioinformatics resources: literature, nucleotide sequences, genomes, expression, protein sequences, protein domains and families, 3D structures, enzymes, small molecules, pathways, systems.

Annotation of bioinformatics data
- Essential for capturing the understanding and knowledge associated with core data
- Often captured in free text, which is easier to read and better for conveying understanding to a human audience, but:
  - it is difficult for computers to parse
  - quality varies from database to database
  - terminology varies from annotator to annotator
- The trend is towards annotation using standard vocabularies: ontologies within bioinformatics

Small molecule databases can be used to:
- Investigate historical compounds and associated bioactivity data
- Create structure-activity relationships (SARs)
- Direct synthesis
- Direct end-product testing

ChEBI and ChEMBL

What is ChEBI?
- Chemical Entities of Biological Interest
- Freely available
- Focused on ‘small’ chemical entities (no proteins or nucleic acids)
- Illustrated dictionary of chemical nomenclature
- High quality, manually annotated
- Provides a chemical ontology
- ~39,000 three-star (3*) ChEBI compounds
- Access ChEBI at http://www.ebi.ac.uk/chebi/

ChEBI Data Overview (example entry: caffeine)
- Visualisation: structure diagram
- Nomenclature: caffeine; 1,3,7-trimethylxanthine; methyltheobromine
- Chemical data: Formula C8H10N4O2; Charge 0; Mass 194.19
- Ontology: metabolite; CNS stimulant; trimethylxanthines
- Database Xrefs: MSDchem: CFF; KEGG DRUG: D00528
- Chemical Informatics:
  - InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
  - SMILES: CN1C(=O)N(C)c2ncn(C)c2C1=O
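As a cross-check of the chemical data shown for caffeine, the mass can be recomputed from the formula C8H10N4O2. The following is a minimal sketch, not ChEBI's own calculation: the atomic weights are rounded standard values and the helper names `parse_formula` and `average_mass` are hypothetical.

```python
import re

# Rounded standard atomic weights for the elements in the example (assumption:
# ChEBI may use slightly different values or more decimal places).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C8H10N4O2'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def average_mass(formula):
    """Sum atomic weights over all atoms in the formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


caffeine = "C8H10N4O2"
print(parse_formula(caffeine))
print(round(average_mass(caffeine), 2))  # 194.19, matching the slide
```

The parser only handles flat formulas without brackets or charges, which is enough for this example.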

CHEBI COMPOUND PAGE

ChEBI Chemical Structures
- The chemical structure may be explored interactively using the MarvinView applet
- Available formats: image, Molfile, InChI and InChIKey, SMILES

Automatic Cross-references

The ChEBI ontology
Organised into three sub-ontologies:
- Molecular structure ontology
- Subatomic particle ontology
- Role ontology
Example entry: (R)-adrenaline

Molecular structure ontology

Role ontology

ChEBI ontology relationships Generic ontology relationships Chemistry-specific relationships

Viewing ChEBI ontology

What is ChEMBL?
- Database of bioactive, drug-like small molecules
- Stores 2D structures and calculated properties (logP, molecular weight, Lipinski parameters etc.)
- Contains abstracted bioactivity data, e.g. binding data and IC50, from multiple primary scientific journals
- Covers about 33 years of compound synthesis and testing
- Annotates FDA-approved drugs
- Access ChEMBL at https://www.ebi.ac.uk/chembldb/

Data Statistics
- Focused on compounds with drug-like properties, by extraction from medicinal chemistry journals
- Includes small molecules (~92%) and peptides (~7%)
- Abstracted from 50,095 papers across 47 journals
- 1,487,579 compound records (~450,000 directly from PubChem)
- 1,295,510 distinct compound structures
- 11,420,351 activities (>6.0 million directly from PubChem): binding measurements, functional assays and ADMET
- 9,844 targets, with over 5,400 protein targets and over 2,440 human targets
- Deposition of PubChem Substances and BioAssay assays

ChEMBL Data Overview: SAR Data
- Compound: the chemical structure tested
- Assay: links the compound to the target, e.g. Ki = 4.5 nM; APTT = 11 min
- Target: e.g. thrombin, stored with its protein sequence
- Bioactivity: the measured result that connects compound, assay and target

>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSYEEAFEALE
SSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGA
DLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRL
AVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETG
DGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMS
PWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIH
PRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQV
VNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKY
GFYTHVFRLKKWIQKVIDQFGE

ChEMBL database: coverage across the drug discovery pipeline
- Data volume: > 10,000,000 bioactivities; > 1,300,000 compounds; ~30,000 distinct lead series; ~15,000 candidates; ~2,400 drugs
- Pipeline stages: Target Discovery, Lead Discovery, Lead Optimisation, Preclinical Development, Phase 1, Phase 2, Phase 3, Launch
- Activities along the pipeline: target identification, microarray profiling, target validation, assay development, biochemistry, clinical/animal disease models, high-throughput screening (HTS), fragment-based screening, focused libraries, screening collection, medicinal chemistry, structure-based drug design, selectivity screens, ADMET screens, cellular/animal disease models, pharmacokinetics, toxicology, in vivo safety pharmacology, formulation, dose prediction, PK tolerability, efficacy, safety & efficacy, indication discovery & expansion
- Data types: Med. Chem. SAR, Clinical Candidates, Drugs
- Phases: Discovery, Development (Clinical Trials), Use

ChEMBL Target Types
- Molecular
  - Nucleic acid (e.g. DNA)
  - Protein
    - Single Protein (e.g. PDE5)
    - Protein Complex (e.g. Nicotinic acetylcholine receptor)
    - Protein Family (e.g. Muscarinic receptors)
- Non-molecular
  - Cell-line (e.g. HEK293 cells)
  - Tissue (e.g. Nervous)
  - Subcellular-fraction (e.g. Mitochondria)
  - Organism (e.g. Drosophila)

CHEMBL COMPOUND PAGE

Clickable structure Structural Representations Drug Information

ChEMBL --> ChEBI Link:

ChemSpider Links: The links work both ways: ChEMBL links to ChemSpider and ChemSpider links back to ChEMBL. Compounds are matched on the Standard InChI.

Wikipedia Links: ChEMBL is also linked from Wikipedia. These links use the Standard InChI as the common identifier and point to the Compound Report Card in ChEMBL.

Searching and Browsing

Chemical names
- Common or trivial names are those that are widely used.
- Advantages of common names: they are simple, easy to pronounce and universally recognised.
- The main disadvantage is ambiguity: the same common name may refer to more than one type of chemical.
- Examples shown: Fluorene, Fluorine

Systematic names
- A systematic name is one which corresponds to the chemical structure such that the structure can be determined from the name, e.g. 1,2-dimethyl-naphthalene.
- Software packages exist which can generate structures from systematic names (e.g. ACD/Name, ChemOffice, MarvinSketch).
- More than one correct systematic name can be assigned to the same molecular structure, depending on the manner in which naming rules are applied (e.g. IUPAC names).

Examples of common and systematic names
- Common names: caffeine; guaranine; theine
- Systematic names: 1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione; 7-methyltheophylline; 1,3,7-trimethyl-2,6-dioxopurine

The ChEBI web service
- Programmatic access to a ChEBI entry
- SOAP-based Java implementation
- Clients currently available in Java and Perl
- Methods include:
  - getLiteEntity
  - getCompleteEntity and getCompleteEntityByList
  - getOntologyParents
  - getOntologyChildren and getAllOntologyChildrenInPath
  - getStructureSearch
- Documented at http://www.ebi.ac.uk/chebi/webServices.do

Web services Allow users to create their own applications to query data User application

Web service client object model getLiteEntity getCompleteEntity getOntology (Parents and Children)

ChEMBL Web Services
- Programmatic access to the ChEMBL database
- Java, Perl and Python scripts are provided to help you get started with the ChEMBL RESTful Web Service API
- Can be used to bring back compounds, lists of compounds, images, targets and assays
- https://www.ebi.ac.uk/chembldb/index.php/ws
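To give a flavour of what a RESTful lookup looks like from Python, here is a small sketch. It is an assumption-laden illustration rather than the scripts distributed by the ChEMBL group: the base URL below is the present-day public ChEMBL data API, not the 2013 service referenced on the slide, and the helper names `resource_url` and `fetch` are hypothetical. Check the current ChEMBL web service documentation before relying on it.

```python
import json
from urllib.request import urlopen

# Assumption: root of the current public ChEMBL data API (not taken from the slides).
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def resource_url(resource, chembl_id):
    """Build the URL for one record, e.g. a molecule, target or assay, as JSON."""
    return f"{BASE_URL}/{resource}/{chembl_id}.json"


def fetch(resource, chembl_id, timeout=30):
    """Download and decode one record. Requires network access."""
    with urlopen(resource_url(resource, chembl_id), timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here; call fetch("molecule", "CHEMBL25") to retrieve the record.
print(resource_url("molecule", "CHEMBL25"))
```

The same pattern applies to the other record types mentioned above by changing the `resource` argument.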

Examples of Web Services

INTERFACE SEARCHING

ChEBI simple and advanced text search
- Narrow the search to a category
- Combine terms with AND, OR and BUT NOT

Search options Structure drawing tools

Search Results Hover-over for a larger structure Click to go to entry page

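Types of structure search
- Identity: based on InChI (e.g. InChI=1/H2O/h1H2)
- Substructure: uses fingerprints to narrow the search range, then performs the full substructure search algorithm (example fingerprint: 10101101110010110010)
- Similarity: based on the Tanimoto coefficient calculated between the fingerprints

Tanimoto example for the fingerprints 1010110111 and 0010110010:
Tanimoto(a,b) = c / (a + b - c) = 4 / (4 + 7 - 4) = 0.57
where a and b are the numbers of bits set in the two fingerprints (here 4 and 7) and c is the number of bits set in both (here 4).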
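The Tanimoto arithmetic can be reproduced in a few lines. This is a sketch on plain bit strings, assuming fingerprints of equal length. It is not the fingerprint implementation used by ChEBI, and the function name `tanimoto` is chosen here for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")                      # bits set in the first fingerprint
    b = fp_b.count("1")                      # bits set in the second fingerprint
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == y == "1")  # bits set in both
    if a + b - c == 0:
        return 0.0                           # neither fingerprint has any bit set
    return c / (a + b - c)


# The two 10-bit fingerprints from the slide: 7 and 4 bits set, 4 in common.
print(round(tanimoto("1010110111", "0010110010"), 2))  # 0.57
```

Identical fingerprints score 1.0 and fingerprints with no bits in common score 0.0, which is why a similarity search is usually run with a cut-off between the two.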

Browse via Periodic Table Molecular entities / Elements

Navigate via links in ontology Click to follow ontology links

ChEMBL Interface Searching:
- Keywords: compound name, trade name, synonym
- Structure: exact match, substructure
- SMILES: a single SMILES or a list of SMILES

ChEMBL search options
- Run substructure and similarity searches
- Keyword searches can use * as a wildcard
- Can search with a list of ChEMBL IDs, keywords or SMILES

Types of Compound Names To Use
- ChEMBL captures all compound names, compound keys and synonyms from the papers.
- Synonyms can be taken from the publications or are curated from other sources (e.g. NCBI website).
- Curated and extracted synonyms in ChEMBL_16: > 660,000
- Types of synonyms captured include:
  - Research codes
  - FDA alternative names
  - Trade names (not for combinations of drugs)
  - INN, BAN, JAN, USAN

Protein Sequence Search
- More precise method for identifying targets
- Input is a protein sequence of interest
- Uses the BLAST* algorithm to perform pair-wise comparisons between the input sequence and all proteins in the Target Dictionary, to find the most closely related matches
- Results are scored according to similarity to the input sequence (determined by the number of amino acids that are identical or have similar properties)
*Altschul SF et al., J Mol Biol. 215(3), p403-10 (1990).
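To illustrate the idea of scoring by the number of identical amino acids, here is a deliberately simplified sketch. It is not BLAST: it assumes the two sequences are already aligned to the same length (with "-" for gaps), counts only identical residues rather than residues with similar properties, and uses short made-up sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over the aligned, non-gap positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore any column where either sequence has a gap character.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


query = "MAHVRGLQLP"   # illustrative fragment
hit = "MAHVKGLQ-P"     # hypothetical database hit with one mismatch and one gap
print(round(percent_identity(query, hit), 1))  # 88.9
```

A real BLAST search also finds the alignment itself and weights substitutions with a scoring matrix, so its scores are not directly comparable to this percentage.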

Find a protein sequence of interest
- Go to http://www.uniprot.org
- Select the entry of interest
- Retrieve the sequence in FASTA format

Paste in a FASTA file and run a search to fetch matching targets

Can also browse using the Taxonomy

Family Tree browser Search box for keyword searching

Browse Drugs Tab Able to search the approved drugs using keywords

WHY USE ChEBI AND ChEMBL?

I want to find data and information on the target, IRAK4. I also want to find out about the compounds that have been tested against this target. But where would I start?....

Identifying Chemical Tools
- Search ChEMBL for the protein of interest
  - Simple text search against protein names/synonyms, OR
  - Browse the protein family tree, OR
  - Sequence search using BLAST (can find related proteins)
- Identify compounds active against this protein
  - Sort/filter by relevant activity types and potency
  - E.g. retrieve compounds with IC50/Ki < 100 nM
- Retrieve other data for these compounds
  - Structures, chemical properties, other activities
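The potency filter in the workflow above (IC50/Ki < 100 nM) can be sketched on a handful of in-memory records. The compound identifiers, values and field names below are invented for illustration and do not come from ChEMBL. The unit table is also a simplifying assumption.

```python
# Conversion factors to nM for the units used in the toy data (assumption).
UNIT_TO_NM = {"nM": 1.0, "uM": 1000.0}

# Hypothetical activity records for one target.
activities = [
    {"compound": "CPD-A", "type": "IC50", "value": 12.0, "units": "nM"},
    {"compound": "CPD-B", "type": "Ki", "value": 0.5, "units": "uM"},
    {"compound": "CPD-C", "type": "IC50", "value": 85.0, "units": "nM"},
    {"compound": "CPD-D", "type": "EC50", "value": 3.0, "units": "nM"},
    {"compound": "CPD-E", "type": "Ki", "value": 40.0, "units": "nM"},
]


def potent_compounds(records, types=("IC50", "Ki"), cutoff_nm=100.0):
    """Keep records of the given activity types below the cut-off, most potent first."""
    hits = []
    for record in records:
        if record["type"] not in types:
            continue
        value_nm = record["value"] * UNIT_TO_NM[record["units"]]
        if value_nm < cutoff_nm:
            hits.append((record["compound"], record["type"], value_nm))
    return sorted(hits, key=lambda hit: hit[2])


for compound, activity_type, value_nm in potent_compounds(activities):
    print(compound, activity_type, value_nm)
```

Converting everything to one unit before comparing matters: CPD-B looks potent at "0.5" until the uM unit is taken into account.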

Compound Properties and Selectivity
- ChEMBL stores a wide range of calculated compound properties (e.g. mol wt, logP, RO5 violations)
  - Can be used to identify compounds most likely to have good in vivo properties (Absorption, Distribution, Metabolism, Excretion)
- Contains activity information against liability targets (e.g. cytochrome P450s, hERG K+ channel)
  - If compounds have been tested in these assays, those with potential toxicity issues can be avoided
- Contains data on a wide range of targets
  - If compounds have been tested against multiple targets, their selectivity can be assessed (important for validation studies)
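"RO5 violations" refers to Lipinski's rule of five. As a rough sketch of how such a count is derived from calculated properties, assuming the commonly quoted thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors): ChEMBL's own calculation may differ in detail, and the property values below are illustrative only.

```python
def ro5_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four rule-of-five thresholds a compound exceeds."""
    checks = [
        mol_weight > 500,   # molecular weight above 500
        logp > 5,           # calculated logP above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# Illustrative property values (not taken from ChEMBL).
print(ro5_violations(mol_weight=194.19, logp=-0.1, h_donors=0, h_acceptors=6))  # 0
print(ro5_violations(mol_weight=720.0, logp=6.2, h_donors=3, h_acceptors=12))   # 3
```

A count of zero or one is the usual starting filter when looking for compounds likely to behave well in vivo.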

DOWNLOAD AND ANALYSIS OF RESULTS

The compound results can be downloaded as an SD file (*.sdf).
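An SD file is plain text: each record is a molfile block ending in `M  END`, followed by optional `> <field>` data items, and records are separated by a line containing `$$$$`. The sketch below splits such a file into records using only the standard library. The two-record sample and its field names are made up, and a cheminformatics toolkit would normally be used for real work.

```python
import re

SDF_TEXT = """water
  example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <compound_id>
CPD-1

> <name>
water

$$$$
ammonia
  example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <compound_id>
CPD-2

$$$$
"""


def parse_record(lines):
    """Split one record into its molfile block and a dict of data items."""
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    molblock = "\n".join(lines[: end + 1])
    data, field = {}, None
    for line in lines[end + 1:]:
        header = re.match(r">.*<([^>]+)>", line)
        if header:                      # start of a new data item
            field = header.group(1)
            data[field] = []
        elif line.strip() and field is not None:
            data[field].append(line)    # value line for the current item
        else:
            field = None                # blank line ends the current item
    return molblock, {name: "\n".join(value) for name, value in data.items()}


def read_sdf(text):
    """Yield (molblock, data) for each '$$$$'-terminated record."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield parse_record(record)
            record = []
        else:
            record.append(line)


for molblock, data in read_sdf(SDF_TEXT):
    print(data)
```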

The bioactivity data can be downloaded as an *.XLS file or a TAB (tab-delimited) file, including:
- Activity types and values
- Assay details
- Literature references

You can use the standard Excel filtering options to filter the results

Downloads and programmatic access

Downloading ChEBI flavours
All downloads come in two flavours:
- 3 star only entries (manually annotated ChEBI entries)
- 2 and 3 star entries (manually annotated ChEBI, ChEMBL and user submissions)

Downloading ChEBI
- OBO file
  - Use in OBO-Edit
- SDF file
  - Compatible with chemistry software such as Bioclipse
- Flat file, tab-delimited
  - Import all the data into Excel
  - Parse it into your own database structure
- Oracle binary dumps
  - Import into an Oracle database
- Generic SQL insert statements
  - Import into a MySQL or PostgreSQL database
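Parsing a tab-delimited flat file into your own database structure can be sketched with the standard library alone. This example uses an in-memory SQLite database as a stand-in for MySQL, PostgreSQL or Oracle. The column names (`ID`, `NAME`, `STAR`) and rows are hypothetical and do not reflect the actual layout of the ChEBI flat files.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited content; a real file would be opened with open(path).
FLAT_FILE = (
    "ID\tNAME\tSTAR\n"
    "1\tcompound one\t3\n"
    "2\tcompound two\t2\n"
    "3\tcompound three\t3\n"
)


def load_flat_file(handle, connection):
    """Create a table and insert one row per line of the tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    connection.execute(
        "CREATE TABLE compounds (id INTEGER PRIMARY KEY, name TEXT, star INTEGER)"
    )
    # Named placeholders are filled from the header names of each row.
    connection.executemany(
        "INSERT INTO compounds (id, name, star) VALUES (:ID, :NAME, :STAR)", reader
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
load_flat_file(io.StringIO(FLAT_FILE), connection)

# Example query: keep only the three-star entries.
three_star = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM compounds WHERE star = 3 ORDER BY id"
    )
]
print(three_star)
```

Once the data is in a database, the star rating filter becomes a simple `WHERE` clause instead of a separate download.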

Downloading ChEMBL

Help and Feedback
Email addresses for support queries and feedback:
- General questions and feedback on the ChEMBL interface: chembl-help@ebi.ac.uk
- Reporting of data errors: chembl-data@ebi.ac.uk
- General questions, support and feedback on ChEBI: chebi-help@ebi.ac.uk

Thank you
