Use of Machine Learning in Chemoinformatics
Irene Kouskoumvekaki, Associate Professor
December 12th, 2012
Biological Sequence Analysis course


Slide 2 (CBS, Department of Systems Biology): Major Aspects of Chemoinformatics
– Databases: Development of databases for storage and retrieval of small-molecule structures and their properties.
– Machine learning: Training of decision trees, neural networks, self-organizing maps, etc. on molecular data.
– Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…

Slide 3: Machine Learning

Slide 18: Machine learning classifiers

Slide 19: Clustering: Self Organizing Maps
Distinguishing molecules of different biological activities and finding a new lead structure.
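
A minimal sketch of how a self-organizing map can separate two activity classes, assuming a tiny 1-D map of three nodes and made-up two-dimensional descriptor vectors. The data, the map size, and the parameter values (`epochs`, `lr`, `sigma`) are illustrative assumptions, not values from the slides.

```python
import math


def best_matching_unit(x, weights):
    """Return the index of the node whose weight vector is closest to x."""
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    return dists.index(min(dists))


def train_som(data, weights, epochs=20, lr=0.3, sigma=0.5):
    """Train a 1-D self-organizing map in place and return its node weights."""
    for _ in range(epochs):
        for x in data:
            bmu = best_matching_unit(x, weights)
            for j, w in enumerate(weights):
                # Gaussian neighborhood: nodes near the winner move the most.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical descriptor vectors for two activity classes.
actives = [[0.1, 0.2], [0.2, 0.1]]
inactives = [[0.9, 0.8], [0.8, 0.9]]

# Deterministic initial node weights spread along the diagonal.
som = train_som(actives + inactives, [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])

print([best_matching_unit(x, som) for x in actives])
print([best_matching_unit(x, som) for x in inactives])
```

Molecules that land on the same node are treated as one cluster. A new compound mapped to a node dominated by actives would be a candidate lead under this toy setup.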

Slide 24: Machine Learning
Diagram labels: Molecular Structures, Properties, Molecular Descriptors, QSAR, Virtual Screening, Clustering, Classification

Slide 25: Different descriptor types
– Simple feature counts (such as number of rotatable bonds or molecular weight)
– Fragmental descriptors, which indicate the presence or absence (or count) of groups of atoms and substructures
– Physicochemical properties (density, solubility, van der Waals volume)
– Topological indices (size, branching, overall shape)
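
To make the first descriptor type concrete, here is a small sketch that derives a few simple feature counts from a molecular formula. The function name `simple_descriptors`, the restriction to a handful of elements, and the flat formula format without brackets or charges are assumptions made for illustration. A real toolkit would compute descriptors from the full structure.

```python
import re

# Standard atomic weights (abridged) for the few elements this sketch handles.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "Cl": 35.45, "Br": 79.904}
HALOGENS = {"F", "Cl", "Br", "I"}


def simple_descriptors(formula):
    """Compute simple count-style descriptors from a formula such as 'C2H6O'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return {
        "molecular_weight": sum(ATOMIC_MASS[e] * n for e, n in counts.items()),
        "heavy_atoms": sum(n for e, n in counts.items() if e != "H"),
        "halogens": sum(n for e, n in counts.items() if e in HALOGENS),
    }


ethanol = simple_descriptors("C2H6O")
chlorobenzene = simple_descriptors("C6H5Cl")
print(ethanol)
print(chlorobenzene)
```

Each molecule becomes a fixed-length vector of numbers, which is the form that machine learning methods expect as input.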

Slide 27: Quantitative Structure-Activity Relationships (QSAR)
In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
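
The fitting step can be sketched with the simplest possible case: one descriptor and an ordinary least-squares line. The descriptor and activity values below are invented for illustration, and a straight line is an assumed model form. Real QSAR models typically use many descriptors and may be nonlinear.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: one descriptor value per molecule and its
# measured property P (for example, an activity value).
descriptor = [1.0, 2.0, 3.0, 4.0]
measured_p = [2.1, 4.1, 6.1, 8.1]

slope, intercept = fit_line(descriptor, measured_p)

# Predict P for a new molecule from its descriptor value alone.
predicted_p = slope * 5.0 + intercept
print(slope, intercept, predicted_p)
```

Once fitted, the model predicts the property of an untested molecule from its descriptors, which is what makes QSAR useful for prioritizing compounds.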

Slide 28: Prediction of Solubility, ADME & Toxicity

Slide 29: hERG Classification with SVM

Slide 30: Evaluation of the data set

Slide 31: Performance of SVM

Slide 33: Virtual screening
– Computational techniques for a rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates.

Slide 34: Similarity Search
Similar Property Principle: Molecules having similar structures and properties are expected to exhibit similar biological activity. Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.

Slide 35: Fingerprints-based Similarity Search
– Widely used similarity search tool
– Consists of descriptors encoded as bit strings
– Bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient
MACCS fingerprints: 166 structural keys that answer questions of the type:
– Is there a ring of size 4?
– Is at least one F, Br, Cl, or I present?
where the answer is either TRUE (1) or FALSE (0).
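
The structural-key idea can be sketched as a list of yes/no questions whose answers form the bit string. This toy version uses only three keys (the two questions quoted above plus one invented key) and assumes each molecule is supplied as a dictionary with precomputed element counts and ring sizes. It is not the real 166-key MACCS definition.

```python
HALOGENS = {"F", "Cl", "Br", "I"}

# Each key is a (question, test) pair; the third key is hypothetical.
STRUCTURAL_KEYS = [
    ("Is there a ring of size 4?",
     lambda mol: 4 in mol["ring_sizes"]),
    ("Is at least one F, Br, Cl, or I present?",
     lambda mol: any(e in HALOGENS for e in mol["elements"])),
    ("Is at least one N present?",
     lambda mol: "N" in mol["elements"]),
]


def fingerprint(mol):
    """Encode a molecule as a list of bits, one bit per structural key."""
    return [1 if test(mol) else 0 for _question, test in STRUCTURAL_KEYS]


# Hypothetical input records with precomputed element counts and ring sizes.
chlorobenzene = {"elements": {"C": 6, "H": 5, "Cl": 1}, "ring_sizes": [6]}
cyclobutylamine = {"elements": {"C": 4, "H": 9, "N": 1}, "ring_sizes": [4]}

fp_chlorobenzene = fingerprint(chlorobenzene)
fp_cyclobutylamine = fingerprint(cyclobutylamine)
print(fp_chlorobenzene)
print(fp_cyclobutylamine)
```

Because every molecule is answered against the same ordered list of keys, bit position i always means the same thing, so two fingerprints can be compared bit by bit.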

Slide 36: Tanimoto Similarity
The worked example on the slide evaluates to 90% similarity.
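
For reference, the Tanimoto coefficient of two bit strings is c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The sketch below represents each fingerprint as the set of its on-bit positions. The two example fingerprints are invented so that the result comes out at 0.9, and they are not the molecules shown on the slide. Returning 0.0 when both fingerprints are empty is an assumed convention.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a = len(bits_a)
    b = len(bits_b)
    c = len(bits_a & bits_b)  # bits set in both fingerprints
    if a + b - c == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return c / (a + b - c)


# Hypothetical fingerprints: the query has 10 bits set, the database
# molecule has 9 bits set, and all 9 are shared with the query.
query = set(range(10))
candidate = set(range(9))

similarity = tanimoto(query, candidate)
print(similarity)
```

In a similarity search, this value is computed between the query and every database molecule, and the database is then ranked or filtered by it.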

Slide 37: Similarity Search

Slide 38: Questions?

Slide 39: Molecular editors and viewers

Slide 41: Format conversion