Bioinformatics IV Quantitative Structure-Activity Relationships (QSAR) and Comparative Molecular Field Analysis (CoMFA) Martin Ott.


1 Bioinformatics IV Quantitative Structure-Activity Relationships (QSAR) and Comparative Molecular Field Analysis (CoMFA) Martin Ott

2 Outline
Introduction
Structures and activities
Regression techniques: PCA, PLS
Analysis techniques: Free-Wilson, Hansch
Comparative Molecular Field Analysis

3 QSAR: The Setting
Quantitative structure-activity relationships are used when there is little or no receptor information, but measured activities are available for (many) compounds.
They are also useful as a supplement to docking studies, which take much more CPU time.

4 From Structure to Property
EC50

5 From Structure to Property
LD50

6 From Structure to Property

7 QSAR: Which Relationship?
Quantitative structure-activity relationships correlate chemical/biological activities with structural features or atomic, group or molecular properties within a range of structurally similar compounds

8 Free Energy of Binding
ΔG_binding = ΔG_0 + ΔG_hb + ΔG_ionic + ΔG_lipo + ΔG_rot
ΔG_0: entropy loss (translational + rotational), +5.4
ΔG_hb: ideal hydrogen bond, -4.7
ΔG_ionic: ideal ionic interaction, -8.3
ΔG_lipo: lipophilic contact, -0.17
ΔG_rot: entropy loss (rotatable bonds), +1.4
(Energies in kJ/mol per unit feature)
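The additive scheme on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes each term scales linearly with the number of features (or the amount of lipophilic contact), and the function name, its parameters and the example ligand are hypothetical.

```python
# Per-feature contributions in kJ/mol, taken from the slide.
DG_0 = 5.4       # entropy loss (translation + rotation), counted once
DG_HB = -4.7     # per ideal hydrogen bond
DG_IONIC = -8.3  # per ideal ionic interaction
DG_LIPO = -0.17  # per unit of lipophilic contact
DG_ROT = 1.4     # per rotatable bond


def binding_free_energy(n_hbonds, n_ionic, lipo_contact, n_rot_bonds):
    """Additive estimate of the binding free energy in kJ/mol."""
    return (DG_0
            + DG_HB * n_hbonds
            + DG_IONIC * n_ionic
            + DG_LIPO * lipo_contact
            + DG_ROT * n_rot_bonds)


# Hypothetical ligand: 2 hydrogen bonds, 1 ionic contact,
# 100 units of lipophilic contact, 3 rotatable bonds.
dg = binding_free_energy(2, 1, 100.0, 3)
print(f"Estimated dG_binding: {dg:.1f} kJ/mol")
```

A more negative result means tighter predicted binding; the constant term and the rotatable-bond term are the only ones that work against it.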

9 Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the equilibrium and rate constants of ligand-receptor complex formation:
ΔG_binding = -2.303 RT log K = -2.303 RT log (k_on / k_off)
K: equilibrium constant
k_on, k_off: rate constants of association and dissociation
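As a worked example of this relation, the sketch below converts an association constant, or a pair of rate constants, into a binding free energy. The temperature of 298.15 K, the function names and the example rate constants are assumptions made for illustration.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def dg_from_k(k_assoc, temperature=298.15):
    """Binding free energy in kJ/mol from an association constant K."""
    return -2.303 * R * temperature * math.log10(k_assoc) / 1000.0


def dg_from_rates(k_on, k_off, temperature=298.15):
    """Same quantity from rate constants, using K = k_on / k_off."""
    return dg_from_k(k_on / k_off, temperature)


# Hypothetical ligand: k_on = 1e6 /(M*s), k_off = 1e-3 /s, so K = 1e9 /M.
print(f"{dg_from_rates(1e6, 1e-3):.1f} kJ/mol")
```

At this temperature each factor of ten in K corresponds to roughly 5.7 kJ/mol of binding free energy.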

10 Concentration as Activity Measure
A critical molar concentration C that produces the biological effect is related to the equilibrium constant K.
Usually log(1/C) is used (cf. pH).
For meaningful QSARs, activities need to be spread out over at least 3 log units.

11 Molecules Are Not Numbers!
Where are the numbers? Numerical descriptors

12 An Example: Capsaicin Analogs
Substituent  EC50 (μM)  log(1/EC50)
H            11.80      4.93
Cl           1.24       5.91
NO2          4.58       5.34
CN           26.50      4.58
C6H5         0.24       6.62
NMe2         4.39       5.36
I            0.35       6.46
NHCHO        ?          ?
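The activity column can be reproduced from the concentrations. The sketch below assumes the EC50 values are in micromolar, which is the unit that gives back the tabulated log(1/EC50) values; the dictionary and function names are illustrative.

```python
import math

# EC50 values as listed in the table, taken to be micromolar.
ec50_um = {"H": 11.80, "Cl": 1.24, "NO2": 4.58, "CN": 26.50,
           "C6H5": 0.24, "NMe2": 4.39, "I": 0.35}


def log_inv_conc(ec50_micromolar):
    """log10(1/C) with C converted from micromolar to mol/L."""
    return -math.log10(ec50_micromolar * 1e-6)


activity = {x: round(log_inv_conc(c), 2) for x, c in ec50_um.items()}
for x, value in activity.items():
    print(f"{x:5s} {value:.2f}")
```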

13 An Example: Capsaicin Analogs
Substituent  log(1/EC50)  MR     π      σ      Es
H            4.93         1.03   0.00   0.00   0.00
Cl           5.91         6.03   0.71   0.23   -0.97
NO2          5.34         7.36   -0.28  0.78   -2.52
CN           4.58         6.33   -0.57  0.66   -0.51
C6H5         6.62         25.36  1.96   -0.01  -3.82
NMe2         5.36         15.55  0.18   -0.83  -2.90
I            6.46         13.94  1.12   0.18   -1.40
NHCHO        ?            10.31  -0.98
MR = molar refractivity (polarizability) parameter; π = hydrophobicity parameter; σ = electronic sigma constant (para position); Es = Taft size parameter

14 An Example: Capsaicin Analogs
log(1/EC50) = c0 + c1 * MR + c2 * π + c3 * σ + c4 * Es
(c0 to c4: fitted regression coefficients)
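The numerical coefficients of this equation are not preserved here. As a rough sketch of how such coefficients could be obtained, the code below fits an ordinary least-squares model to the seven compounds with known activity, using only the standard library. The helper names are hypothetical, and the zeros in the H row and σ = 0.18 for I are assumptions taken from standard substituent-constant tables rather than values legible in the transcript.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares coefficients [intercept, c1, ..., ck] via normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)


# Columns: MR, pi, sigma, Es. The zeros in the H row and sigma = 0.18 for I
# are assumptions from standard tables, not values legible in the transcript.
descriptors = [
    (1.03, 0.00, 0.00, 0.00),     # H
    (6.03, 0.71, 0.23, -0.97),    # Cl
    (7.36, -0.28, 0.78, -2.52),   # NO2
    (6.33, -0.57, 0.66, -0.51),   # CN
    (25.36, 1.96, -0.01, -3.82),  # C6H5
    (15.55, 0.18, -0.83, -2.90),  # NMe2
    (13.94, 1.12, 0.18, -1.40),   # I
]
activity = [4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46]

coeffs = fit_mlr(descriptors, activity)
fitted = [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))
          for row in descriptors]
mean_y = sum(activity) / len(activity)
ss_res = sum((y - f) ** 2 for y, f in zip(activity, fitted))
ss_tot = sum((y - mean_y) ** 2 for y in activity)
r2 = 1.0 - ss_res / ss_tot

print("coefficients (c0..c4):", [round(c, 3) for c in coeffs])
print(f"R2 = {r2:.3f}")
```

With five coefficients and only seven compounds the fit is close to saturated, so a high R2 from this sketch says little about predictive power.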

15 Basic Assumption in QSAR
The structural properties of a compound contribute in a linearly additive way to its biological activity provided there are no non-linear dependencies of transport or binding on some properties

16 Molecular Descriptors
Simple counts of features, e.g. of atoms, rings, H-bond donors, molecular weight
Physicochemical properties, e.g. polarisability, hydrophobicity (logP), water-solubility
Group properties, e.g. Hammett and Taft constants, volume
2D fingerprints based on fragments
3D screens based on fragments

17 2D Fingerprints
Fragment keys: C N O P S X F Cl Br I Ph CO NH OH Me Et Py CHO SO C=C Am Im
A bit is set to 1 when the fragment is present in the molecule
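A fragment-based fingerprint of this kind can be sketched as a bit vector over the keys listed on the slide. The fragment lists for the two molecules are hypothetical, and the Tanimoto coefficient used to compare them is not on the slide; it is shown only because it is a common way to compare such bit vectors.

```python
# Fragment keys in the order shown on the slide.
KEYS = ["C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph",
        "CO", "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "Am", "Im"]


def fingerprint(fragments):
    """Bit vector with a 1 for every key present in the molecule."""
    present = set(fragments)
    return [1 if key in present else 0 for key in KEYS]


def tanimoto(fp_a, fp_b):
    """Shared bits divided by bits set in either fingerprint."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 0.0


# Hypothetical fragment lists; a real program would derive them from
# the 2D structure by substructure matching.
mol_a = fingerprint(["C", "N", "O", "Ph", "CO", "NH", "OH", "Me"])
mol_b = fingerprint(["C", "N", "O", "Ph", "CO", "NH", "Cl"])
print(mol_a)
print(f"Tanimoto similarity: {tanimoto(mol_a, mol_b):.2f}")
```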

18 Principal Component Analysis (PCA)
Many (>3) variables to describe objects = high dimensionality of descriptor data
PCA is used to reduce dimensionality
PCA extracts the most important factors (principal components or PCs) from the data
Useful when correlations exist between descriptors
The result is a new, small set of variables (PCs) which explain most of the data variation

19 PCA – From 2D to 1D

20 PCA – From 3D to 3D-

21 Different Views on PCA
Statistically, PCA is a multivariate analysis technique closely related to eigenvector analysis.
In matrix terms, PCA is a decomposition of matrix X into two smaller matrices plus a set of residuals: X = T P^T + R
Geometrically, PCA is a projection technique in which X is projected onto a subspace of reduced dimensions.
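The three views can be tied together in a small sketch that reduces two correlated descriptors to one principal component. It assumes T holds the scores and P the loadings, finds the leading eigenvector of the covariance matrix by power iteration (one of several possible methods), and uses a hypothetical data set.

```python
import math


def first_pc(data, iterations=200):
    """Scores t, loading p and residuals r with x = t p^T + r (one PC)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    x = [[row[j] - means[j] for j in range(k)] for row in data]  # mean-centre
    # Covariance matrix of the centred data.
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    p = [1.0] * k
    for _ in range(iterations):
        p = [sum(cov[a][b] * p[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
    t = [sum(r[j] * p[j] for j in range(k)) for r in x]  # projection = scores
    resid = [[r[j] - ti * p[j] for j in range(k)] for r, ti in zip(x, t)]
    total = sum(v * v for r in x for v in r)
    explained = sum(ti * ti for ti in t) / total
    return t, p, resid, explained


# Hypothetical, strongly correlated two-descriptor data set.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
scores, loading, residuals, explained = first_pc(data)
print("loading:", [round(v, 3) for v in loading])
print(f"fraction of variance explained by PC1: {explained:.3f}")
```

The scores are the geometric projection onto the new axis, the loading is the eigenvector, and the residuals are what the one-component decomposition leaves unexplained.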

22 Partial Least Squares (PLS)
y_1 = a_0 + a_1 x_11 + a_2 x_12 + a_3 x_13 + … + e_1   (compound 1)
y_2 = a_0 + a_1 x_21 + a_2 x_22 + a_3 x_23 + … + e_2   (compound 2)
y_3 = a_0 + a_1 x_31 + a_2 x_32 + a_3 x_33 + … + e_3   (compound 3)
y_n = a_0 + a_1 x_n1 + a_2 x_n2 + a_3 x_n3 + … + e_n   (compound n)
In matrix form: Y = XA + E
X = independent variables
Y = dependent variables

23 PLS – Cross-validation
Squared correlation coefficient R2
Value between 0 and 1 (target: > 0.9)
Indicates the explanatory power of the regression equation
With cross-validation: squared correlation coefficient Q2
Value of at most 1, negative when the model predicts worse than the mean (target: > 0.5)
Indicates the predictive power of the regression equation
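The difference between R2 and Q2 can be illustrated with leave-one-out cross-validation. To keep the sketch short it uses a one-descriptor least-squares line instead of PLS, fitted to the hydrophobicity parameter and log(1/EC50) values of the capsaicin example; the value 0.00 for H is assumed (H is the reference substituent), and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """R2 = 1 - SS_residual / SS_total, using all compounds for the fit."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q_squared(x, y):
    """Leave-one-out Q2 = 1 - PRESS / SS_total."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hydrophobicity parameter and log(1/EC50) for H, Cl, NO2, CN, C6H5, NMe2, I.
pi_values = [0.00, 0.71, -0.28, -0.57, 1.96, 0.18, 1.12]
log_inv_ec50 = [4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46]

r2 = r_squared(pi_values, log_inv_ec50)
q2 = q_squared(pi_values, log_inv_ec50)
print(f"R2 = {r2:.2f}, Q2 = {q2:.2f}")
```

Q2 is never larger than R2 for a least-squares fit, because each compound is predicted by a model that has not seen it.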

24 Free-Wilson Analysis
log(1/C) = Σ a_i x_i + μ
x_i: presence of group i (0 or 1)
a_i: activity contribution of group i
μ: activity value of unsubstituted compound
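Once the group contributions are known, a Free-Wilson prediction is just a sum. The sketch below assumes contributions keyed by position and substituent; every number in it is hypothetical and only illustrates the arithmetic.

```python
def free_wilson_activity(mu, contributions, groups_present):
    """log(1/C) = mu + sum of a_i over the groups present (x_i = 1)."""
    return mu + sum(contributions[g] for g in groups_present)


# Hypothetical values: mu for the unsubstituted parent and
# group contributions a_i, keyed by (position, substituent).
mu = 5.00
a = {("R1", "Cl"): 0.45, ("R1", "Me"): 0.20,
     ("R2", "OH"): -0.30, ("R2", "OMe"): 0.10}

pred = free_wilson_activity(mu, a, [("R1", "Cl"), ("R2", "OH")])
print(f"Predicted log(1/C): {pred:.2f}")
```

A group that was never part of the training set has no entry in the table, so the lookup fails with a KeyError; the model cannot extrapolate to new substituents.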

25 Free-Wilson Analysis
Computationally straightforward
Predictions only for substituents already included
Requires large number of compounds

26 Hansch Analysis
Drug transport and binding affinity depend nonlinearly on lipophilicity:
log(1/C) = a (log P)^2 + b log P + c Σσ + k
P: n-octanol/water partition coefficient
σ: Hammett electronic parameter
a, b, c: regression coefficients
k: constant term
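Because the equation is a parabola in log P, it has an optimum lipophilicity at log P = -b / (2a) when a is negative. The sketch below evaluates the equation and that optimum; the regression coefficients are hypothetical and not taken from any real data set.

```python
def hansch_activity(log_p, sigma_sum, a, b, c, k):
    """log(1/C) = a (log P)^2 + b log P + c * sum(sigma) + k."""
    return a * log_p ** 2 + b * log_p + c * sigma_sum + k


def optimal_log_p(a, b):
    """log P at the maximum of the parabola; requires a < 0."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2.0 * a)


# Hypothetical regression coefficients, not taken from any real data set.
a, b, c, k = -0.25, 1.00, 0.80, 3.00
best = optimal_log_p(a, b)
at_best = hansch_activity(best, 0.23, a, b, c, k)
print(f"optimal log P: {best:.2f}")
print(f"predicted log(1/C) there (sum of sigma = 0.23): {at_best:.2f}")
```

Compounds that are either too hydrophilic or too lipophilic fall on the descending arms of the parabola, which is how the model captures the nonlinear dependence on transport.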

27 Hansch Analysis
Fewer regression coefficients needed for correlation
Interpretation in physicochemical terms
Predictions for other substituents possible

28 Pharmacophore
Set of structural features in a drug molecule recognized by a receptor
Sample features:
- H-bond donor
- charge
- hydrophobic center
Distances, 3D relationship

29 Pharmacophore Selection
Dopamine L = lipophilic site; A = H-bond acceptor; D = H-bond donor; PD = protonated H-bond donor

30 Pharmacophore Selection
Dopamine L = lipophilic site; A = H-bond acceptor; D = H-bond donor; PD = protonated H-bond donor

31 Comparative Molecular Field Analysis (CoMFA)
Set of chemically related compounds
Common pharmacophore or substructure required
3D structures needed (e.g., Corina-generated)
Flexible molecules are "folded" into pharmacophore constraints and aligned

32 CoMFA Alignment

33 CoMFA Grid and Field Probe
(Only one molecule shown for clarity)

34 Electrostatic Potential Contour Lines

35 CoMFA Model Derivation
Molecules are positioned in a regular grid according to alignment
Probes are used to determine the molecular field:
Electrostatic field (probe is charged atom): E_c = Σ q_i q_j / (D r_ij)
Van der Waals field (probe is neutral carbon): E_vdw = Σ (A_i r_ij^-12 - B_i r_ij^-6)
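The field calculation can be sketched as a loop over grid points and atoms. Everything numerical here is an assumption for illustration: the two-atom "molecule", the single A and B used for all atoms, the minimum-distance clamp that avoids division by zero at atom positions, and the grid spacing. Units are arbitrary because the Coulomb prefactor is omitted, as in the formula on the slide.

```python
import math


def field_at_point(point, atoms, probe_charge=1.0, dielectric=1.0,
                   a_coef=1.0e5, b_coef=1.0e2, min_dist=0.5):
    """Return (E_c, E_vdw) felt by a probe at one grid point.

    atoms is a list of (x, y, z, partial_charge). The same A and B are
    used for every atom, which is a simplification.
    """
    e_c = 0.0
    e_vdw = 0.0
    for x, y, z, q in atoms:
        r = max(math.dist(point, (x, y, z)), min_dist)  # avoid division by ~0
        e_c += probe_charge * q / (dielectric * r)
        e_vdw += a_coef * r ** -12 - b_coef * r ** -6
    return e_c, e_vdw


def grid_points(lo, hi, step):
    """Regular 3D lattice from lo to hi (inclusive) along each axis."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return [(x, y, z) for x in axis for y in axis for z in axis]


# Hypothetical two-atom molecule with opposite partial charges.
atoms = [(0.0, 0.0, 0.0, -0.4), (1.2, 0.0, 0.0, 0.4)]
grid = grid_points(-4.0, 4.0, 2.0)
fields = [field_at_point(p, atoms) for p in grid]
print(f"{len(grid)} grid points, {2 * len(fields)} field values per molecule")
```

Flattening the field values of each aligned molecule into one long row gives the descriptor matrix that a regression method such as PLS is then run on.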

36 3D Contour Map for Electronegativity

37 CoMFA Pros and Cons
Suitable to describe receptor-ligand interactions
3D visualization of important features
Good correlation within related set
Predictive power within scanned space
Alignment is often difficult
Training required

