Use of Machine Learning in Chemoinformatics

Use of Machine Learning in Chemoinformatics Irene Kouskoumvekaki Associate Professor February 15th, 2013

Major Aspects of Chemoinformatics
• Databases: development of databases for storage and retrieval of small-molecule structures and their properties.
• Machine learning: training of Decision Trees, Neural Networks, Self-Organizing Maps, etc. on molecular data.
• Predictions: molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…

Machine Learning: tries to teach the computer to draw conclusions based on previous experience.

Akinator, the Web Genius: Akinator the Genius can "read your mind" and tell you who you're thinking of after you answer a few questions.

Machine learning classifiers

Clustering: Self Organizing Maps Distinguishing molecules of different biological activities and finding a new lead structure

Machine Learning

Machine Learning (diagram): Molecular Structures, Molecular Descriptors, Properties. Tasks shown: QSAR, Virtual Screening, Clustering, Classification.

Different descriptor types
• Simple feature counts (such as number of rotatable bonds or molecular weight)
• Fragmental descriptors, which indicate the presence or absence (or count) of groups of atoms and substructures
• Physicochemical properties (density, solubility, van der Waals volume)
• Topological indices (size, branching, overall shape)
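
As a rough illustration of the first descriptor type (simple feature counts), the sketch below derives a few count-style descriptors from an element-count dictionary. The `ATOMIC_MASS` table (rounded standard atomic masses), the `simple_descriptors` helper, the input format and the ethanol example are assumptions made for this illustration; they do not come from the slides.

```python
# Rounded standard atomic masses in g/mol; only a few elements for this sketch.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def simple_descriptors(element_counts):
    """Return a few count-style descriptors for one molecule.

    element_counts maps element symbol -> number of atoms
    (hypothetical input format), e.g. ethanol C2H6O is
    {"C": 2, "H": 6, "O": 1}.
    """
    mol_weight = sum(ATOMIC_MASS[el] * n for el, n in element_counts.items())
    heavy_atoms = sum(n for el, n in element_counts.items() if el != "H")
    hetero_atoms = sum(
        n for el, n in element_counts.items() if el not in ("C", "H")
    )
    return {
        "mol_weight": round(mol_weight, 3),
        "heavy_atoms": heavy_atoms,
        "hetero_atoms": hetero_atoms,
    }


ethanol = simple_descriptors({"C": 2, "H": 6, "O": 1})
print(ethanol)
```

Descriptors such as rotatable-bond counts or topological indices need the full molecular graph and are normally computed with a cheminformatics toolkit rather than by hand.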

Quantitative Structure-Activity Relationships (QSAR): In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
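
To make "fitting descriptors to experimental data" concrete, here is a minimal sketch that fits a straight line P = slope × descriptor + intercept by ordinary least squares. It assumes a single descriptor and a linear model, which is only one of many possible QSAR model forms; the `fit_line` helper and the toy numbers are made up for the example and are not measured data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy data: one descriptor value and one "activity" per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(descriptor, activity)

# Use the fitted model to predict the property of a new molecule.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

Real QSAR studies typically use many descriptors and validate the model on molecules that were held out of the fit.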

Prediction of Solubility, ADME & Toxicity
• Give guidelines
• Filter compounds that enter the lab
• Speed up the drug discovery process

Virtual screening: computational techniques for rapidly assessing large libraries of chemical structures in order to guide the selection of likely drug candidates.

Similarity Search Similar Property Principle – Molecules having similar structures and properties are expected to exhibit similar biological activity. Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.

Fingerprint-based Similarity Search
• A widely used similarity search tool
• Consists of descriptors encoded as bit strings
• Bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient
• MACCS fingerprints: 166 structural keys that answer questions of the type "Is there a ring of size 4?" or "Is at least one F, Br, Cl, or I present?", where the answer is either TRUE (1) or FALSE (0)
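
The idea of structural keys can be sketched with just the two example questions from the slide. This is a toy, not the real 166-key MACCS definition: the molecule representation (a dictionary with `elements` and `ring_sizes`) and all function names are hypothetical, chosen only to show how yes/no answers become a bit string.

```python
def has_ring_of_size_4(mol):
    """Key 1: is there a ring of size 4?"""
    return 4 in mol["ring_sizes"]


def halogen_present(mol):
    """Key 2: is at least one F, Br, Cl, or I present?"""
    return any(mol["elements"].get(el, 0) > 0 for el in ("F", "Br", "Cl", "I"))


# Each key is a yes/no question; the fingerprint holds one bit per key.
STRUCTURAL_KEYS = [has_ring_of_size_4, halogen_present]


def fingerprint(mol):
    """Encode the answers as bits: TRUE -> 1, FALSE -> 0."""
    return [1 if key(mol) else 0 for key in STRUCTURAL_KEYS]


# Hypothetical hand-written molecule records (no structure parsing here).
chlorobenzene = {"elements": {"C": 6, "H": 5, "Cl": 1}, "ring_sizes": [6]}
cyclobutane = {"elements": {"C": 4, "H": 8}, "ring_sizes": [4]}

fp_chlorobenzene = fingerprint(chlorobenzene)
fp_cyclobutane = fingerprint(cyclobutane)
print(fp_chlorobenzene, fp_cyclobutane)
```

In practice the keys are evaluated on the molecular graph by a cheminformatics toolkit; the hand-written records here only stand in for that step.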

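Tanimoto Similarity: Tc = c / (a + b - c), where a and b are the numbers of bits set in the two fingerprints and c is the number of bits set in both. Tc ranges from 0 (no shared bits) to 1 (identical bit strings); Tc = 0.9, for example, is read as 90% similarity.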
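
A minimal sketch of the Tanimoto coefficient and a threshold-based similarity search follows. It uses the standard definition Tc = c / (a + b - c), with a and b the numbers of bits set in each fingerprint and c the number set in both. Fingerprints are represented as Python sets of "on" bit positions; that representation, the function names, the toy database and the convention of returning 0.0 for two empty fingerprints are assumptions made for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a = len(fp_a)
    b = len(fp_b)
    c = len(fp_a & fp_b)  # bits set in both fingerprints
    if a + b - c == 0:
        return 0.0  # assumed convention when both fingerprints are empty
    return c / (a + b - c)


def similarity_search(query, database, threshold):
    """Return (name, Tc) pairs with Tc >= threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, tc) for name, tc in scored if tc >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints: each set lists the positions of the bits that are 1.
query = set(range(1, 10))            # 9 bits set
database = {
    "mol_A": set(range(1, 11)),      # 10 bits set, 9 shared with the query
    "mol_B": {1, 2, 3},              # 3 bits set, all shared
    "mol_C": {20, 21},               # nothing shared
}

hits = similarity_search(query, database, threshold=0.8)
print(hits)
```

For `mol_A` the counts are a = 9, b = 10 and c = 9, so Tc = 9 / (9 + 10 - 9) = 0.9, i.e. 90% similarity; it is the only entry above the 0.8 threshold.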

Similarity Search

Example: Virtual Screening of PubChem
• Go to http://pubchem.ncbi.nlm.nih.gov/search/search.cgi
• Search PubChem for compounds that are similar to this structure with Tc > 0.95. How many similar compounds do you find?
• Click on BioActivity Analysis. How many of them are biologically active? On how many bioassays have they been tested?
• Click on Structure-Activity. Which compound is your query? Which compounds are most similar to your query? Are they active on the same bioassays?

Questions?