EBI is an Outstation of the European Molecular Biology Laboratory. Small Molecules in Bioinformatics EBI Bioinformatics Roadshow 16th March 2011 Dusseldorf.


1 EBI is an Outstation of the European Molecular Biology Laboratory. Small Molecules in Bioinformatics EBI Bioinformatics Roadshow 16th March 2011 Dusseldorf

2 Small molecule resources at the EBI 03.01.2016 2 Agenda Introduction Small molecule resources ChEMBL ChEBI Searching and browsing Hands-on Exercises

3 Annotation of bioinformatics data Essential for capturing and understanding the knowledge associated with core data. Often captured in free text, which is easier to read and better for conveying understanding to a human audience, but: difficult for computers to parse; quality varies from database to database; terminology varies from annotator to annotator. Towards annotation using standard vocabularies: ontologies within bioinformatics.

4 Small molecules participate in all processes of life

5 What are Small Molecules? A small molecule is defined as a low molecular weight organic compound. Most drugs are small molecules to allow passage over cell membranes and oral bioavailability. They are also able to bind to proteins and enzymes, thereby altering function, which can lead to a therapeutic effect. Small molecules are used in everyday life.

6 Some common small molecules: Amino Acids

7 Signaling γ-aminobutyric acid (GABA): chief inhibitory neurotransmitter in the mammalian central nervous system. In humans, it also regulates muscle tone. Synthesized by neurons. Found mostly as a zwitterion, that is, with the carboxyl group deprotonated and the amino group protonated. The conformational flexibility of GABA is important for its biological function, as it has been found to bind to different receptors with different conformations. GABA deficiency is linked to anxiety disorder, depression, alcoholism, multiple sclerosis, action tremors and tardive dyskinesia.

8 Metabolism Adenosine 5'-triphosphate (ATP): the "molecular unit of currency" of intracellular energy transfer. Its synthesis in the cell consumes energy, and its breakdown releases energy. Many proteins that bind ATP do so in a characteristic protein fold known as the Rossmann fold, a general nucleotide-binding structural domain that can also bind the cofactor NAD.

9 Enzymes Enzyme inhibitors are molecules that bind to enzymes and decrease their activity. Many drugs are enzyme inhibitors. They are also used as herbicides and pesticides. Enzyme activators bind to enzymes and increase their enzymatic activity. Enzyme activators are often involved in the allosteric regulation of enzymes in the control of metabolism. Example: clavulanic acid acts as a suicide inhibitor of bacterial β-lactamase enzymes.

10 Pathways http://www.genome.jp/kegg-bin/highlight_pathway?scale=1.0&map=map00231&keyword=tryptophan

11 Systems biology BioModels: quantitative models of biochemical and cellular systems. Tryptophan: D-enantiomer sweet, L-enantiomer bitter.

12 Drug types 2003 - 2009 'Small molecules' in various shades of blue (http://chembl.blogspot.com/)

13 Small Molecule Databases Small molecule databases can be used to: Investigate historical compounds and associated bioactivity data, giving fresh insight into previously rejected drugs. Create Structure-Activity Relationships (SARs): look at how changing a functional group can change the biological activity of a compound before you start your own synthesis.

14 Direct synthesis: could reduce the number of compounds made. If any similar compounds have significant toxicity or unfavourable binding data, you can save time by not making analogues. Direct end product testing: suggests what testing could be carried out. The database can give you an idea of what testing has given 'good' (i.e. clear) results. Reduce the number of compounds put through High Throughput Screening (HTS).

15 ChEBI and ChEMBL

16 What is ChEBI? Chemical Entities of Biological Interest. Freely available. Focused on 'small' chemical entities (no proteins or nucleic acids). Illustrated dictionary of chemical nomenclature. High quality, manually annotated. Provides a chemical ontology. Access ChEBI at http://www.ebi.ac.uk/chebi/

17 ChEBI home page

18 ChEBI data overview (example entry: caffeine)
Visualisation
Nomenclature: caffeine; 1,3,7-trimethylxanthine; methyltheobromine
Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19
Ontology: metabolite; CNS stimulant; trimethylxanthines
Database Xrefs: MSDchem: CFF; KEGG DRUG: D00528
Chemical Informatics: InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3; SMILES: CN1C(=O)N(C)c2ncn(C)c2C1=O
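
As a rough check on the chemical data for caffeine, the mass can be recomputed from the formula. This is a minimal sketch, not how ChEBI itself computes the value: the atomic masses are rounded standard values, and the helper name `average_mass` is hypothetical.

```python
import re

# Rounded standard atomic masses for the elements that occur in caffeine.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def average_mass(formula: str) -> float:
    """Sum atomic masses for a simple formula such as 'C8H10N4O2'.

    Handles only element symbols followed by optional counts
    (no brackets, charges or isotopes).
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


caffeine_mass = average_mass("C8H10N4O2")
print(round(caffeine_mass, 2))
```

The result agrees with the mass of 194.19 quoted for caffeine to two decimal places.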

19 ChEBI: Chemical Entities of Biological Interest. ChEBI entry view

20 Chemical Structures Chemical structure may be interactively explored using the MarvinView applet. Available in formats: Image, Molfile, InChI and InChIKey, SMILES

21 Automatic Cross-references

22 What is ChEMBL? Database of bioactive, drug-like small molecules. Contains 2D structures and calculated properties (logP, molecular weight, Lipinski parameters, etc.). Contains abstracted bioactivity data, e.g. binding data and IC50 values, from multiple primary scientific journals. Covers about 30 years of compound synthesis and testing. Annotated FDA-approved drugs. Access ChEMBL at https://www.ebi.ac.uk/chembldb/

23 ChEMBL Main Search Page

24 Calc. properties; Drug Information; Clickable structure

25 Structural Representations

26

27 Parent and Salt Forms; Database links

28 ChEBI Link:

29 This will take you back to ChEMBL

30 ChemSpider Links: The link works both ways: TO ChemSpider and FROM ChemSpider. The links are matched on Standard_Inchi.

31 Wikipedia Links: We also have links with Wikipedia. These also use the Standard_Inchi as the common identifier. These links point to the Compound Report Card in ChEMBL. The links are added by a ChemoBot and can be updated with each release, if required.

32 STRUCTURAL REPRESENTATION

33 Stereoisomers Compounds that have the same molecular formula and connectivity, but differ in the 3-dimensional orientation of their atoms. A tetrahedral carbon with 4 different groups or atoms attached is known as a chiral centre.

34 Stereoisomerism Example: Thalidomide Caused thousands of deformities in babies across 46 countries between 1957 and 1961. The R isomer was effective against morning sickness, but the S isomer was teratogenic. Sparked more tightly controlled laboratory practices across the world.

35 Stereoisomers Where known, the stereochemistry of the compound is noted in the structure and in the name. If a stereoisomer of an existing compound is submitted, it is given a separate ID number. If data are submitted for a mixture of two stereoisomers, the mixture is also given a separate ID number if the activities of the individual compounds cannot be separated. If you draw a flat structure (no stereochemistry specified) in the structure search, you will receive data on all stereoisomers.

36 Ofloxacin, Levofloxacin and Dextrofloxacin Fluoroquinolone antibiotics. Ofloxacin is a racemic (equal) mixture of the levo and dextro isomers. Levofloxacin is the more active stereoisomer. Dextrofloxacin is the less active stereoisomer. ChEMBL has data on each, with separate bioactivities.

37 Tautomers (keto-enol form) The two forms readily interconvert via the migration of a hydrogen to the adjacent oxygen and the swapping of a single bond for a double bond, and vice versa. ChEMBL does not differentiate between tautomers: only the preferred tautomeric structure is retained. ChEBI does differentiate and will store the separate tautomers.

38 Salts About 50% of marketed drugs are formulated as salts to aid their activity. Some salts prevent the drug from being absorbed in the mouth. Some salts help the drug to be activated in the intestines rather than the stomach. There are approx. 40,200 ChEMBL compounds with salts. Bioactivity data is recorded against the parent drug and against the salt, so it is important to give these compounds different ChEMBL IDs.

39 Salt Example: Morphine Morphine can be administered as many different salts: Hydrochloride (HCl), Sulphate (SO4), Tartrate, Acetate, Citrate, Methobromide (MeBr), Hydrobromide (HBr), Hydroiodide (HI), Lactate, Chloride (Cl), Bitartrate

40 Dealing with Salts in ChEMBL Each compound in a salt form is analysed and matched to a 'parent', i.e. the base form of the compound (this does not apply to inorganic compounds). For example, morphine hydrochloride (CHEMBL556578), morphine sulfate (CHEMBL422878) and morphine sulfate hydrate (CHEMBL1200603) are matched to their parent morphine (CHEMBL70). This relationship is shown on the compound page. Additionally, when you run a search for a compound, only the parent form is returned in the results grid.
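
The parent/salt relationship can be pictured as a simple lookup from salt ID to parent ID. The sketch below uses only the morphine IDs quoted on this slide; the function name `group_by_parent` is hypothetical, and this is an illustration of the idea rather than ChEMBL's actual matching procedure.

```python
from collections import defaultdict

# Salt-to-parent mapping taken from the morphine example on the slide.
PARENT_OF = {
    "CHEMBL556578": "CHEMBL70",   # morphine hydrochloride
    "CHEMBL422878": "CHEMBL70",   # morphine sulfate
    "CHEMBL1200603": "CHEMBL70",  # morphine sulfate hydrate
}


def group_by_parent(compound_ids):
    """Group compound IDs under their parent; a parent maps to itself."""
    groups = defaultdict(list)
    for cid in compound_ids:
        groups[PARENT_OF.get(cid, cid)].append(cid)
    return dict(groups)


hits = ["CHEMBL70", "CHEMBL556578", "CHEMBL1200603"]
grouped = group_by_parent(hits)
print(grouped)
```

Collapsing hits this way mirrors the results grid, which returns only the parent form.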

41 Parents and Salts on the Compound Page PARENT (compound report page); SALTS (with hyperlinks beneath)

42 Clicking on the Bioactivity Summary pie chart will give you the bioactivity data for ALL forms of the compound. To get salt-specific bioactivity data, click on the hyperlink beneath the salt form of interest to be taken to its compound page. Panels: Morphine, all data; Morphine HCl, specific data.

43 Naming and Classification

44 Chemical names Common or trivial names are those in widespread use. Advantages of common names include simplicity, pronounceability and universal recognition. The main disadvantage is ambiguity: the same common name may refer to more than one type of chemical.

45 Systematic names A systematic name is one which corresponds to the chemical structure such that the structure can be determined from the name, e.g. 1,2-dimethyl-naphthalene. Software packages exist which can generate structures from systematic names (e.g. ACD/Name, ChemOffice, MarvinSketch). More than one correct systematic name can be assigned to the same molecular structure, depending on the manner in which naming rules are applied.

46 Examples of common and systematic names Common names: caffeine; guaranine; theine. Systematic names: 1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione; 7-methyltheophylline; 1,3,7-trimethyl-2,6-dioxopurine.

47 SEARCHING IN CHEBI

48 Why? Ontological data and structure classification: chemical entity, e.g. hydrocarbon; role, e.g. ligand; subatomic particle, e.g. electron. Links to other databases: KEGG, DrugBank, PDBeChem. Citations.

49 How? Text-based Drawing

50 The ChEBI ontology Organised into three sub-ontologies, namely: Molecular structure ontology; Subatomic particle ontology; Role ontology. Example shown: (R)-adrenaline

51 Molecular structure ontology

52 Role ontology

53 ChEBI ontology relationships: Generic ontology relationships; Chemistry-specific relationships
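
To make the idea of ontology relationships concrete, the sketch below walks a tiny hand-written graph. The edges are illustrative assumptions based on the caffeine entry (a trimethylxanthine with a CNS stimulant role), not an extract of the real ChEBI ontology, and the function name `ancestors` is hypothetical.

```python
# Tiny illustrative graph: (child, relationship, parent).
# These edges are assumptions for the example, not real ChEBI data.
EDGES = [
    ("caffeine", "is_a", "trimethylxanthine"),
    ("trimethylxanthine", "is_a", "methylxanthine"),
    ("caffeine", "has_role", "central nervous system stimulant"),
]


def ancestors(term, relationship="is_a"):
    """Return every term reachable from `term` via `relationship` edges."""
    found, stack = [], [term]
    while stack:
        current = stack.pop()
        for child, rel, parent in EDGES:
            if child == current and rel == relationship and parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


print(ancestors("caffeine"))
print(ancestors("caffeine", "has_role"))
```

Following one relationship type at a time keeps structural classification and roles separate, which is the point of having distinct sub-ontologies.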

54 Viewing ChEBI ontology

55 Simple and advanced text search Narrow by category; AND, OR and BUT NOT

56 Structure search Search options; Structure drawing tools

57 Search Results Click to go to compound page; Hover-over for search menu

58 Types of structure search Identity: based on InChI, e.g. InChI=1/H2O/h1H2. Substructure: uses fingerprints (e.g. 10101101110010110010) to narrow the search range, then performs a full substructure search algorithm. Similarity: based on the Tanimoto coefficient calculated between the fingerprints: Tanimoto(a,b) = c / (a+b-c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. Example: for fingerprints 1010110111 (7 bits set) and 0010110010 (4 bits set), with 4 bits in common, Tanimoto = 4 / (7+4-4) = 0.57.
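
The Tanimoto calculation on this slide can be reproduced in a few lines. This is a minimal sketch that assumes fingerprints are given as equal-length strings of '0' and '1'; real search engines use much longer, packed fingerprints, and the function name `tanimoto` is only illustrative.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient c / (a + b - c) for two bit-string fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")  # bits set in the first fingerprint
    b = fp_b.count("1")  # bits set in the second fingerprint
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == y == "1")  # bits set in both
    if a + b - c == 0:
        return 0.0  # neither fingerprint has any bit set
    return c / (a + b - c)


score = tanimoto("1010110111", "0010110010")
print(round(score, 2))
```

With the two 10-bit fingerprints from the slide, the function gives 4/7, which rounds to 0.57.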

59 Browse via Periodic Table Molecular entities / Elements

60 Navigate via links in ontology Click to follow links

61 CHEBI SEARCH EXAMPLE

62 ChEBI example Search for 'Glycine'.
What is the ChEBI ID for this? 15428
Is it available as a KEGG compound? Yes
What are the IUPAC names? Glycine, aminoacetic acid
What is 'glycine zwitterion'? It is a tautomer of glycine

63 SEARCHING IN CHEMBL

64 How to search in ChEMBL: Keywords: compound name (dopamine, haloperidol); assay name (cytotoxicity, liver hepatotoxicity); target (RAF-1, IRAK-4). Structure. BLAST search: FASTA sequence from UniProt. Protein or taxonomy hierarchy.

65 Where to search:

66 Using the search field (found on main page): Best for single words, e.g. 'dopamine', 'Muscarinic'. Looks for matching text in compound name, key or synonym, e.g. 3-o-methyl-alpha-methyldopamine, Muscarinic receptor 4. Needs an exact match. Can't use wildcards, e.g. '%', '?'…

67 Using the Protein Sequence Search Useful for searching for a specific protein or a protein from the same family. The results will show a percentage similarity to the input sequence. An exact match will give 100%. The same target from a different organism will give ~90%.

68 Compound Drawing You can draw the full structure of interest or a partial structure. Using the Substructure Search, you can find compounds containing your partial structure. Using the Similarity Search, you can find similar compounds based on a percentage score (70-100%).

69 DOWNLOAD AND ANALYSIS OF CHEMBL RESULTS

70

71 The compounds can be downloaded as an SDFile (*.sdf).

72 The bioactivity data can be downloaded as an Excel spreadsheet (*.xls).

73

74 CHEMBL WORKED EXAMPLE

75 STRUCTURE ACTIVITY RELATIONSHIPS

76 Drug design Ligand-based: relies on knowledge of other molecules that bind to the biological target of interest. Structure-based: relies on knowledge of the 3D structure of the biological target. A good target has: evidence that modulation of the target will have therapeutic value, e.g. disease linkage studies showing associations between mutations in the biological target and certain disease states; and evidence that the target is druggable, i.e. capable of binding to a small molecule that modulates its activity. The target is cloned and expressed, then libraries of potential drug compounds are screened using screening assays.

77 Drug Discovery Process (figure; data source: ChEMBL database)
Stages: Target Discovery; Lead Discovery; Lead Optimisation; Preclinical Development; Phase 1; Phase 2; Phase 3; Launch
Funnel: > 2,900,000 bioactivities; > 600,000 compounds; ~30,000 distinct lead series; ~12,000 candidates; ~2000 drugs
Activities shown: Target identification; Microarray profiling; Target validation; Assay development; Biochemistry; Clinical/Animal disease models; High-throughput Screening (HTS); Fragment-based screening; Focused libraries; Screening collection; Medicinal Chemistry; Structure-based drug design; Selectivity screens; ADMET screens; Cellular/Animal disease models; Pharmacokinetics; Toxicology; In vivo safety pharmacology; Formulation; Dose prediction; PK tolerability; Efficacy; Safety & Efficacy; Indication Discovery & expansion
Outputs: Med. Chem. SAR; Clinical Candidates; Drugs
Phases: Discovery; Development (Clinical Trials); Use

78 SAR Data: Compound, Assay, Target, Bioactivity. Example bioactivities: Ki = 4.5 nM; APTT 11 min. Example target sequence (FASTA):
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSY
EEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRS
RYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEG
SSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGD
EEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEAD
CGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVL
TAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLK
KPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVC
KDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFY
THVFRLKKWIQKVIDQFGE

79 Current Data Content (ChEMBL_09) Abstracted from 39,094 papers from 16 journals. Ongoing curation and clean-up of all data. 759,220 compound records. 623,012 distinct compound structures. 8,091 targets (4,912 protein molecular targets). 3,030,317 experimental bioactivities: binding measurements, functional assays and ADMET.

80 ChEMBL Assay Data ChEMBL contains >3 million data points relating compounds to targets or effects. These activities come from ~490K assays reported in the medicinal chemistry literature. Assays can be classified as: binding measurements, e.g. IC50; functional assay endpoints, e.g. vasodilation; ADME/toxicity data, e.g. LD50.

81 Compound Properties and Selectivity Stores a wide range of calculated compound properties (e.g. mol wt, logP, RO5 violations): can be used to identify compounds most likely to have good in vivo properties (Absorption, Distribution, Metabolism, Excretion). Contains activity information against liability targets (e.g. cytochrome P450s, hERG K+ channel): if compounds have been tested in these assays, you can avoid those with potential toxicity issues. Contains data on a wide range of targets: if compounds have been tested against multiple targets, you can get an idea of their selectivity (important for validation studies).
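
"RO5 violations" refers to Lipinski's rule of five. The sketch below counts violations from already-calculated properties; it does not compute the properties themselves. The thresholds are the standard rule-of-five limits, while the example property values other than caffeine's molecular weight are illustrative assumptions, and `ro5_violations` is a hypothetical name.

```python
def ro5_violations(mol_wt, logp, h_donors, h_acceptors):
    """Count Lipinski rule-of-five violations for one compound.

    A violation is: molecular weight > 500, logP > 5,
    more than 5 H-bond donors, or more than 10 H-bond acceptors.
    """
    checks = [mol_wt > 500, logp > 5, h_donors > 5, h_acceptors > 10]
    return sum(checks)


# Caffeine-like values: mol wt from the ChEBI entry, the rest are illustrative.
small = ro5_violations(mol_wt=194.19, logp=-0.1, h_donors=0, h_acceptors=6)

# An invented large, lipophilic compound.
large = ro5_violations(mol_wt=650.0, logp=6.2, h_donors=2, h_acceptors=11)

print(small, large)
```

Sorting or filtering a result set on this count is one simple way to favour compounds more likely to be orally bioavailable.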

82 Identifying Chemical Tools Search ChEMBL for the protein of interest: simple text search against protein names/synonyms; browse the protein family tree; sequence search using BLAST (can find related proteins). Identify compounds active against this protein: sort/filter by relevant activity types and potency, e.g. retrieve compounds with IC50/Ki < 100 nM. Retrieve other data for these compounds: structures, chemical properties, other activities.

83 Example SAR 1. Run a search on RAF-1. 2. Filter on all IC50 values less than 100 nM. 3. Run the structures through an external tool, such as Pipeline Pilot, to show the most common substructures. This will give you an idea of what types of compound have a good IC50 for the target RAF-1. You can then design similar compounds based on these substructures.
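
Step 2 of this example (filtering on IC50 below 100 nM) could look like the sketch below once the bioactivity data has been downloaded and parsed. The rows, compound names and field names are all invented for illustration and do not reflect the real ChEMBL export layout.

```python
# Hypothetical activity rows; compound names and values are invented.
ACTIVITIES = [
    {"compound": "CPD-1", "target": "RAF-1", "type": "IC50", "value_nM": 12.0},
    {"compound": "CPD-2", "target": "RAF-1", "type": "IC50", "value_nM": 450.0},
    {"compound": "CPD-3", "target": "RAF-1", "type": "Ki", "value_nM": 8.0},
    {"compound": "CPD-4", "target": "RAF-1", "type": "IC50", "value_nM": 95.0},
    {"compound": "CPD-4", "target": "IRAK-4", "type": "IC50", "value_nM": 30.0},
]


def potent_compounds(rows, target, activity_type="IC50", cutoff_nM=100.0):
    """Return sorted compound names with activity below the cutoff on target."""
    return sorted({
        row["compound"]
        for row in rows
        if row["target"] == target
        and row["type"] == activity_type
        and row["value_nM"] < cutoff_nM
    })


raf1_hits = potent_compounds(ACTIVITIES, "RAF-1")
print(raf1_hits)
```

The structures of the surviving compounds would then go to the substructure analysis described in step 3.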

84 Assessing selectivity So far we have only identified compounds that may be active against a target of interest. Often the aim is to find compounds that are selective for that target (i.e. not active against other targets). We need to consider all of the available activity data for each compound to see if it is known to be active against any other targets.

85 1. Extract the list of SMILES from the XLS spreadsheet. 2. Run this through the ChEMBL SMILES list search tool. 3. Filter the bioactivity for IC50 > 100nM. 4. Download the filtered bioactivity as another XLS spreadsheet. 5. Run a filter on the spreadsheet: Not RAF-1. 6. Collect the subset of compounds that showed specificity for RAF-1.
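
One possible reading of these steps is: keep the compounds that are potent on RAF-1 and not potent on any other tested target. The sketch below implements that reading only; the 100 nM cutoff is reused from the earlier example, and the rows, compound names and the function name `selective_for` are invented for illustration.

```python
# Hypothetical IC50 rows across several targets; all values are invented.
ACTIVITIES = [
    {"compound": "CPD-1", "target": "RAF-1", "value_nM": 12.0},
    {"compound": "CPD-1", "target": "IRAK-4", "value_nM": 5200.0},
    {"compound": "CPD-4", "target": "RAF-1", "value_nM": 95.0},
    {"compound": "CPD-4", "target": "IRAK-4", "value_nM": 30.0},
    {"compound": "CPD-5", "target": "IRAK-4", "value_nM": 10.0},
]


def selective_for(rows, target, cutoff_nM=100.0):
    """Return compounds potent on `target` and on no other tested target."""
    potent_targets = {}  # compound -> set of targets with IC50 below the cutoff
    for row in rows:
        if row["value_nM"] < cutoff_nM:
            potent_targets.setdefault(row["compound"], set()).add(row["target"])
    return sorted(c for c, hit in potent_targets.items() if hit == {target})


selective = selective_for(ACTIVITIES, "RAF-1")
print(selective)
```

A compound never tested against other targets would also pass this filter, so absence of data should not be mistaken for evidence of selectivity.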

86 Selective for RAF-1 Selective for RAF-1 and inactive for other targets

87 You can use external programs like Pipeline Pilot™ and Spotfire Decision Site™ to analyse the results.

88 Pipeline Pilot protocol to extract all data with an IC50 of 0-100nM

89 IC50 vs Molecular Weight - Spotfire™

90 Downloads and programmatic access

91 Downloading ChEBI flavours All downloads come in two flavours: 3 star only entries (manually annotated ChEBI entries); 2 and 3 star entries (manually annotated ChEBI, ChEMBL and user submissions)

92 Downloading ChEBI OBO file: use in OBO-Edit. SDF file: compatible with chemistry software such as Bioclipse. Flat file, tab delimited: import all the data into Excel, or parse it into your own database structure. Oracle binary dumps: import into an Oracle database. Generic SQL insert statements: import into a MySQL or PostgreSQL database.
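
Parsing a tab-delimited flat file into your own structure might look like the sketch below. The column names (`CHEBI_ACCESSION`, `NAME`, `STAR`) and the sample rows are assumptions for illustration, so check the header of the file you actually download; the function name `three_star_entries` is hypothetical.

```python
import csv
import io

# In-memory stand-in for a downloaded flat file. Column names are assumed,
# and the last row is an invented placeholder entry.
SAMPLE = (
    "CHEBI_ACCESSION\tNAME\tSTAR\n"
    "CHEBI:15428\tglycine\t3\n"
    "CHEBI:27732\tcaffeine\t3\n"
    "CHEBI:0000000\tplaceholder submission\t2\n"
)


def three_star_entries(handle):
    """Return (accession, name) pairs for manually annotated 3-star rows."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["CHEBI_ACCESSION"], row["NAME"])
        for row in reader
        if row["STAR"] == "3"
    ]


entries = three_star_entries(io.StringIO(SAMPLE))
print(entries)
```

For a real download, pass an open file handle instead of the `io.StringIO` wrapper.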

93 The ChEBI web service Programmatic access to a ChEBI entry. SOAP-based Java implementation. Clients currently available in Java and Perl. Methods: getLiteEntity; getCompleteEntity and getCompleteEntityByList; getOntologyParents; getOntologyChildren and getAllOntologyChildrenInPath; getStructureSearch. Documented at http://www.ebi.ac.uk/chebi/webServices.do

94 Downloading ChEMBL Frequent releases (approx. monthly). Formats: SDFile, Text, MySQL, Oracle

95 Downloading ChEMBL

96 Help and Feedback Email addresses for support queries and feedback. General questions and feedback on the ChEMBL interface: chembl-help@ebi.ac.uk. Reporting of data errors: chembl-data@ebi.ac.uk. General questions, support and feedback on ChEBI: chebi-help@ebi.ac.uk

97 Thank you

