Bioinformatics IV Quantitative Structure-Activity Relationships (QSAR) and Comparative Molecular Field Analysis (CoMFA) Martin Ott.


Outline
- Introduction
- Structures and activities
- Regression techniques: PCA, PLS
- Analysis techniques: Free-Wilson, Hansch
- Comparative Molecular Field Analysis

QSAR: The Setting
- Quantitative structure-activity relationships are used when there is little or no receptor information, but measured activities are available for (many) compounds.
- They are also a useful supplement to docking studies, which take much more CPU time.

From Structure to Property EC50

From Structure to Property LD50

From Structure to Property

QSAR: Which Relationship?
Quantitative structure-activity relationships correlate chemical/biological activities with structural features or with atomic, group or molecular properties, within a range of structurally similar compounds.

Free Energy of Binding
$\Delta G_{binding} = \Delta G_0 + \Delta G_{hb} + \Delta G_{ionic} + \Delta G_{lipo} + \Delta G_{rot}$
Term                   Contribution                         Value (kJ/mol per unit feature)
$\Delta G_0$           entropy loss (translat. + rotat.)    +5.4
$\Delta G_{hb}$        ideal hydrogen bond                  -4.7
$\Delta G_{ionic}$     ideal ionic interaction              -8.3
$\Delta G_{lipo}$      lipophilic contact                   -0.17
$\Delta G_{rot}$       entropy loss (rotat. bonds)          +1.4
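
The additive scheme above can be sketched in a few lines of Python. This is only an illustration: the function name, the feature counts of the example ligand, and the reading of "unit feature" for the lipophilic term as one unit of contact are assumptions, not part of the slides.

```python
# Per-feature contributions from the slide, in kJ/mol.
DG_0 = 5.4       # entropy loss (translation + rotation), counted once
DG_HB = -4.7     # per ideal hydrogen bond
DG_IONIC = -8.3  # per ideal ionic interaction
DG_LIPO = -0.17  # per unit of lipophilic contact (unit assumed)
DG_ROT = 1.4     # per rotatable bond


def binding_free_energy(n_hbond, n_ionic, lipo_contact, n_rot):
    """Sum the additive terms to estimate the free energy of binding (kJ/mol)."""
    return (DG_0
            + DG_HB * n_hbond
            + DG_IONIC * n_ionic
            + DG_LIPO * lipo_contact
            + DG_ROT * n_rot)


# Hypothetical ligand: 3 H-bonds, 1 ionic contact, 100 units of
# lipophilic contact, 4 rotatable bonds.
example_dg = binding_free_energy(n_hbond=3, n_ionic=1, lipo_contact=100.0, n_rot=4)
print(f"Estimated binding free energy: {example_dg:.1f} kJ/mol")
```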

Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the reaction constants of ligand-receptor complex formation:
$\Delta G_{binding} = -2.303\,RT \log K = -2.303\,RT \log (k_{on} / k_{off})$
- $K$: equilibrium constant
- $k_{on}$ (association) and $k_{off}$ (dissociation): rate constants
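
A small numerical sketch of this relation, assuming K is the association constant expressed relative to a 1 M standard state and T = 298.15 K by default. The function names are illustrative only.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)


def dg_from_k(K, T=298.15):
    """Free energy of binding (kJ/mol) from the equilibrium constant K."""
    return -2.303 * R_GAS * T * math.log10(K)


def k_from_dg(dg, T=298.15):
    """Equilibrium constant K from the free energy of binding (kJ/mol)."""
    return 10 ** (-dg / (2.303 * R_GAS * T))


# A nanomolar binder (K = 1e9) corresponds to roughly -51 kJ/mol at 25 °C.
dg_nanomolar = dg_from_k(1e9)
print(f"{dg_nanomolar:.1f} kJ/mol")
```

Each factor of 10 in K changes the free energy by about 5.7 kJ/mol at room temperature, which puts the per-feature contributions listed earlier into perspective.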

Concentration as Activity Measure
- A critical molar concentration C that produces the biological effect is related to the equilibrium constant K.
- Usually log (1/C) is used (cf. pH).
- For meaningful QSARs, activities need to be spread out over at least 3 log units.

Molecules Are Not Numbers! Where are the numbers? Numerical descriptors

An Example: Capsaicin Analogs
Substituent   EC50 (µM)   log(1/EC50)
H             11.80       4.93
Cl            1.24        5.91
NO2           4.58        5.34
CN            26.50       4.58
C6H5          0.24        6.62
NMe2          4.39        5.36
I             0.35        6.46
NHCHO         ?           ?
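
The conversion from concentration to activity can be checked directly. The sketch below assumes the EC50 values are micromolar, since that is the unit that reproduces the tabulated log(1/EC50) column. The dictionary and function names are illustrative.

```python
import math

# EC50 values from the table, assumed to be in micromolar.
ec50_uM = {
    "H": 11.80, "Cl": 1.24, "NO2": 4.58, "CN": 26.50,
    "C6H5": 0.24, "NMe2": 4.39, "I": 0.35,
}


def log_inv_conc(c_uM):
    """log(1/C) with C converted from micromolar to molar."""
    return math.log10(1.0 / (c_uM * 1e-6))


activities = {name: round(log_inv_conc(c), 2) for name, c in ec50_uM.items()}
for name, value in activities.items():
    print(f"{name:5s} {value:.2f}")
```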

An Example: Capsaicin Analogs
Substituent   log(1/EC50)   MR      $\pi$    $\sigma$   $E_s$
H             4.93          1.03    0.00     --         --
Cl            5.91          6.03    0.71     0.23       -0.97
NO2           5.34          7.36    -0.28    0.78       -2.52
CN            4.58          6.33    -0.57    0.66       -0.51
C6H5          6.62          25.36   1.96     -0.01      -3.82
NMe2          5.36          15.55   0.18     -0.83      -2.90
I             6.46          13.94   1.12     --         -1.40
NHCHO         ?             10.31   -0.98    --         --
MR = molar refractivity (polarizability) parameter; $\pi$ = hydrophobicity parameter; $\sigma$ = electronic sigma constant (para position); $E_s$ = Taft size parameter

An Example: Capsaicin Analogs
$\log(1/\mathrm{EC}_{50}) = -0.89 + 0.019\,MR + 0.23\,\pi - 0.31\,\sigma - 0.14\,E_s$

Basic Assumption in QSAR
The structural properties of a compound contribute in a linearly additive way to its biological activity, provided there are no non-linear dependencies of transport or binding on some properties.

Molecular Descriptors
- Simple counts of features, e.g. of atoms, rings, H-bond donors, molecular weight
- Physicochemical properties, e.g. polarisability, hydrophobicity (logP), water-solubility
- Group properties, e.g. Hammett and Taft constants, volume
- 2D fingerprints based on fragments
- 3D screens based on fragments

2D Fingerprints
Fragment keys (one bit each, 1 = fragment present): C N O P S X F Cl Br I Ph CO NH OH Me Et Py CHO SO C=C Am Im
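
A minimal sketch of how such a fragment-key fingerprint could be represented. The key order follows the slide, while the fragment set of the example molecule is made up for illustration and fragment detection itself is not shown.

```python
# Fragment keys in the order shown on the slide.
KEYS = ["C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph",
        "CO", "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "Am", "Im"]


def fingerprint(fragments):
    """Return a bit list with 1 where the fragment key is present."""
    present = set(fragments)
    unknown = present - set(KEYS)
    if unknown:
        raise ValueError(f"fragments without a key: {sorted(unknown)}")
    return [1 if key in present else 0 for key in KEYS]


# Hypothetical molecule containing these fragments.
fp = fingerprint({"C", "N", "O", "Ph", "OH", "NH"})
print("".join(str(bit) for bit in fp))
```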

Principal Component Analysis (PCA)
- Many (>3) variables to describe objects = high dimensionality of descriptor data
- PCA is used to reduce dimensionality
- PCA extracts the most important factors (principal components or PCs) from the data
- Useful when correlations exist between descriptors
- The result is a new, small set of variables (PCs) which explain most of the data variation

PCA – From 2D to 1D

PCA – From 3D to 3D-

Different Views on PCA
- Statistically, PCA is a multivariate analysis technique closely related to eigenvector analysis
- In matrix terms, PCA is a decomposition of matrix $X$ into two smaller matrices plus a set of residuals: $X = TP^T + R$
- Geometrically, PCA is a projection technique in which $X$ is projected onto a subspace of reduced dimensions
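
The matrix view can be sketched with NumPy's singular value decomposition. This is one common way to compute the scores T, loadings P and residuals R, applied here to the mean-centered data matrix; the synthetic data and the function name are assumptions for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Decompose mean-centered X into scores T, loadings P and residuals R."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components].T                      # loadings
    R = Xc - T @ P.T                             # residuals
    explained = (s ** 2)[:n_components] / np.sum(s ** 2)
    return T, P, R, explained


# Three strongly correlated synthetic descriptors: one PC should suffice.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
X = np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(50, 3))

T, P, R, explained = pca(X, n_components=1)
print(f"Variance explained by PC1: {explained[0]:.3f}")
```

Because the three descriptors are nearly collinear, a single principal component captures almost all of the variation and the residual matrix R is small.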

Partial Least Squares (PLS)
$y_1 = a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + \dots + e_1$   (compound 1)
$y_2 = a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + \dots + e_2$   (compound 2)
$y_3 = a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + \dots + e_3$   (compound 3)
…
$y_n = a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + \dots + e_n$   (compound n)
In matrix form: $Y = XA + E$
$X$ = independent variables; $Y$ = dependent variables
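
The linear model Y = XA + E can be illustrated with an ordinary least-squares fit. Note that this is plain least squares, not PLS itself: PLS fits the same kind of model through a small number of latent variables, which matters when descriptors are numerous or correlated. The data below are synthetic and the function name is illustrative.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares estimate of [a0, a1, ..., am] and the residuals e."""
    X1 = np.column_stack([np.ones(len(X)), X])  # leading column for a0
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ coef
    return coef, residuals


# Synthetic data: 20 compounds, 3 descriptors, known coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
true_coef = np.array([0.5, 1.0, -2.0, 0.3])  # a0, a1, a2, a3
y = true_coef[0] + X @ true_coef[1:] + 0.01 * rng.normal(size=20)

coef, residuals = fit_linear(X, y)
print(np.round(coef, 2))
```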

PLS – Cross-validation
- Squared correlation coefficient $R^2$: value between 0 and 1 (> 0.9 desirable), indicating the explanatory power of the regression equation
- With cross-validation, squared correlation coefficient $Q^2$: value of at most 1, which can become negative for poor models (> 0.5 desirable), indicating the predictive power of the regression equation
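
The difference between the two statistics can be sketched with leave-one-out cross-validation. For brevity the regression below is ordinary least squares rather than PLS, and the data are synthetic; Q² is computed as 1 − PRESS/SS, with SS taken around the overall mean of y (one of several conventions in use).

```python
import numpy as np


def _fit(X, y):
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def _predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r_squared(X, y):
    """Explanatory power: fit and evaluate on the same compounds."""
    pred = _predict(_fit(X, y), X)
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def q_squared_loo(X, y):
    """Predictive power: each compound is predicted by a model built without it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = _fit(X[keep], y[keep])
        press += (y[i] - _predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(2)
X = rng.normal(size=(15, 2))
y = 1.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.3 * rng.normal(size=15)

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

For a least-squares model the leave-one-out prediction errors are never smaller than the fitted residuals, so Q² cannot exceed R² under this definition.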

Free-Wilson Analysis
$\log (1/C) = \sum_i a_i x_i + \mu$
- $x_i$: presence of group $i$ (0 or 1)
- $a_i$: activity contribution of group $i$
- $\mu$: activity value of the unsubstituted compound
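
A Free-Wilson fit amounts to least squares on an indicator matrix. In the sketch below the group names, contributions and activities are invented for illustration; the activities are generated noise-free so that the fit recovers the contributions exactly.

```python
import numpy as np

groups = ["Cl", "Me", "OH"]  # hypothetical substituent groups

# Rows = compounds, columns = presence (1) or absence (0) of each group.
X = np.array([
    [0, 0, 0],   # unsubstituted compound
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
])

# Invented "true" values used to generate the activities.
a_true = np.array([0.8, 0.3, -0.4])
mu_true = 5.0
y = mu_true + X @ a_true  # log(1/C) for each compound

design = np.column_stack([np.ones(len(X)), X])
params, *_ = np.linalg.lstsq(design, y, rcond=None)
mu, a = params[0], params[1:]


def predict(x):
    """Predicted log(1/C) for a compound with indicator vector x."""
    return mu + np.dot(a, x)


# A compound not in the training set but built from known groups.
print(f"Predicted activity for Cl+Me+OH: {predict([1, 1, 1]):.2f}")
```

The prediction only works because every group of the new compound already occurs in the training set.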

Free-Wilson Analysis
- Computationally straightforward
- Predictions only for substituents already included
- Requires large number of compounds

Hansch Analysis
Drug transport and binding affinity depend nonlinearly on lipophilicity:
$\log (1/C) = a (\log P)^2 + b \log P + c \sum \sigma + k$
- $P$: n-octanol/water partition coefficient
- $\sigma$: Hammett electronic parameter
- $a, b, c$: regression coefficients
- $k$: constant term
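
The parabolic dependence on log P implies an optimum lipophilicity at log P = −b/(2a) when a is negative. The sketch below evaluates the equation for invented coefficients; none of the numbers come from the slides.

```python
def hansch_activity(log_p, sigma_sum, a, b, c, k):
    """log(1/C) from the Hansch equation."""
    return a * log_p ** 2 + b * log_p + c * sigma_sum + k


def optimal_log_p(a, b):
    """log P at which the parabola peaks; requires a < 0."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2 * a)


# Hypothetical regression coefficients.
a, b, c, k = -0.2, 1.0, 0.5, 3.0

best = optimal_log_p(a, b)
print(f"Optimal log P: {best:.2f}")
print(f"Activity at optimum (sum of sigma = 0): {hansch_activity(best, 0.0, a, b, c, k):.2f}")
```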

Hansch Analysis
- Fewer regression coefficients needed for correlation
- Interpretation in physicochemical terms
- Predictions for other substituents possible

Pharmacophore
Set of structural features in a drug molecule recognized by a receptor
Sample features:
- H-bond donor
- charge
- hydrophobic center
Distances, 3D relationship

Pharmacophore Selection Dopamine L = lipophilic site; A = H-bond acceptor; D = H-bond donor; PD = protonated H-bond donor

Comparative Molecular Field Analysis (CoMFA)
- Set of chemically related compounds
- Common pharmacophore or substructure required
- 3D structures needed (e.g., Corina-generated)
- Flexible molecules are “folded” into pharmacophore constraints and aligned

CoMFA Alignment

CoMFA Grid and Field Probe (Only one molecule shown for clarity)

Electrostatic Potential Contour Lines

CoMFA Model Derivation
- Molecules are positioned in a regular grid according to alignment
- Probes are used to determine the molecular field:
  - Electrostatic field (probe is charged atom): $E_c = \sum q_i q_j / (D r_{ij})$
  - Van der Waals field (probe is neutral carbon): $E_{vdw} = \sum (A_i r_{ij}^{-12} - B_i r_{ij}^{-6})$
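
The two field expressions can be evaluated on a regular grid as sketched below. Everything numerical here (coordinates, charges, the A and B parameters, grid extent and spacing, arbitrary units) is a placeholder chosen for illustration, and the small distance clamp only guards against division by zero when a grid point coincides with an atom.

```python
import numpy as np


def regular_grid(lo, hi, spacing):
    """Cubic grid of probe positions from lo to hi along each axis."""
    axis = np.arange(lo, hi + 0.5 * spacing, spacing)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])


def comfa_fields(coords, charges, A, B, grid,
                 probe_charge=1.0, dielectric=1.0, min_dist=1e-3):
    """Electrostatic and van der Waals probe energies at every grid point."""
    # r[g, i] = distance between grid point g and atom i
    r = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    r = np.maximum(r, min_dist)
    e_coul = np.sum(probe_charge * charges[None, :] / (dielectric * r), axis=1)
    e_vdw = np.sum(A[None, :] * r ** -12 - B[None, :] * r ** -6, axis=1)
    return e_coul, e_vdw


# Placeholder two-atom "molecule".
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
charges = np.array([0.4, -0.4])
A = np.array([1000.0, 1000.0])
B = np.array([30.0, 30.0])

grid = regular_grid(-4.0, 4.0, 2.0)
e_coul, e_vdw = comfa_fields(coords, charges, A, B, grid)
print(grid.shape, e_coul.shape, e_vdw.shape)
```

Each aligned molecule yields one such pair of vectors, and the values at the grid points then serve as the descriptor columns for the regression.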

3D Contour Map for Electronegativity

CoMFA Pros and Cons
Pros:
- Suitable to describe receptor-ligand interactions
- 3D visualization of important features
- Good correlation within related set
- Predictive power within scanned space
Cons:
- Alignment is often difficult
- Training required