
Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.


1

2 Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria decision making environmetrics experimental design artificial neural networks statistical process control Milano Chemometrics and QSAR Research Group Department of Environmental Sciences University of Milano - Bicocca P.za della Scienza, 1 - 20126 Milano (Italy) Website: michem.unimib.it/chm/

3 Roberto Todeschini Milano Chemometrics and QSAR Research Group An introduction to molecular descriptors and QSAR Iran - February 2009

4 The chemical data • synthesis: chemistry produces the objects of its own study • chemical composition: a unifying concept for all the experimental sciences • molecular structure: one of the most fruitful scientific concepts of this century

5 Molecular structure The concept of molecular structure is one of the richest of the last 140 years.

6 Molecular structure, congenericity principle: The basic assumptions are that different molecular structures have different chemical properties and that similar molecular structures have similar chemical properties.

7 Each molecular representation represents a different way to look at the molecular structure and its chemical meaning is strongly immersed in the framework of the chemical theories. Molecular structure

8 Some historical notes

9 Some historical notes 1874 Wilhelm KÖRNER: Studi sull'isomeria delle così dette sostanze aromatiche a sei atomi di carbonio (Studies on the isomerism of the so-called aromatic substances with six carbon atoms). Gazzetta Chimica Italiana, vol. IV, p. 305

10 Some historical notes 1874 Wilhelm KÖRNER: To tell apart the different di-substituted benzenes that had been observed, he proposed classifying them as ortho-, meta-, and para-. These can be considered the first 3 molecular descriptors.

11 Based on these descriptors, 90 years later, Corwin Hansch proposed the first QSAR approach. Some historical notes Lipophilic, electronic and steric descriptors for ortho-, meta-, and para-substituents 1964 Corwin HANSCH

12 “The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.” R. Todeschini and V. Consonni Definition of molecular descriptor Molecular descriptors

13  3300 molecular descriptors Molecular descriptors

14 lion forefeet eagle hind legs scorpion tail dragon head bull body unicorn snake neck Molecular descriptors

15 size symmetry branching steric shape cyclicity hydrophobicity H - bonding electronic aspects reactivity Molecular descriptors

16 size symmetry branching steric shape cyclicity hydrophobicity H - bonding electronic aspects several meanings in just one number reactivity Molecular descriptors

17

18 Molecular descriptors
derived from …: graph theory, discrete mathematics, physical chemistry, information theory, quantum chemistry, organic chemistry, differential topology, algebraic topology
processed by …: statistics, chemometrics, chemoinformatics
applied in …: QSAR/QSPR, medicinal chemistry, pharmacology, genomics, drug design, toxicology, proteomics, analytical chemistry, environmetrics, virtual screening, library searching

19 Molecular descriptors [diagram linking: molecule, molecular descriptors, physico-chemical properties, biological activities]

20 Historical note: fragment approach The biological activity of a molecule is the sum of its fragment properties: a common reference skeleton, with molecule properties gradually modified by substituents. Congenericity principle: QSAR strategies can be applied ONLY to classes of similar compounds

21 Historical note: Hansch approach (Corwin Hansch, 1964) $\text{Biological response} = f_1(L) + f_2(E) + f_3(S) + f_4(M)$, where 1: $L$ = lipophilic properties, 2: $E$ = electronic properties, 3: $S$ = steric properties, 4: $M$ = other molecular properties
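
The additive scheme on this slide can be illustrated with its simplest special case, in which each f is linear, so that the response becomes b0 + b1*L + b2*E + b3*S. The following is only a minimal sketch under that assumption: the descriptor values, the coefficients `true_b`, and the helper names `solve` and `fit_linear` are hypothetical and do not come from the presentation. The responses are generated from `true_b` without noise, so the fit should simply recover those coefficients.

```python
# Sketch of a linear, additive Hansch-type model:
#   response = b0 + b1*L + b2*E + b3*S
# All numbers below are illustrative placeholders, not measured data.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical substituent descriptors: (lipophilic L, electronic E, steric S)
X = [
    (0.00, 0.00, 0.00),
    (0.56, -0.17, -1.24),
    (0.71, 0.23, -0.97),
    (0.86, 0.23, -1.16),
    (-0.28, 0.78, -2.52),
    (-0.02, -0.27, -0.55),
    (1.02, -0.15, -1.31),
]
true_b = [1.5, 0.8, -1.1, 0.3]  # assumed coefficients b0..b3
y = [true_b[0] + sum(b * x for b, x in zip(true_b[1:], row)) for row in X]

coeffs = fit_linear(X, y)
print([round(c, 6) for c in coeffs])
```

With real, noisy activities the recovered coefficients would only approximate the underlying trend, and the model would still be restricted to a congeneric series.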

22 1 Congenericity approach 2 Linear additive scheme 3 Limited representation of global molecular properties 4 No 3D and conformational information Historical note: Hansch approach

23 The role of the molecular descriptors Physico-chemical properties: boiling point, melting point, dipole moment, molar refractivity, parachor, octanol/water partition coefficient, vapor pressure, density, solubility, ...

24 The role of the molecular descriptors Biological activities: binding affinity, lethal dose, inhibition concentration, mutagenicity, carcinogenicity, ...

25 The role of the molecular descriptors Environmental properties: biodegradation, bioconcentration, BOD, COD, half-life time, mobility, atmospheric persistence, ...

26 The role of the molecular descriptors .... and more: conductivity, retention time, rheological behaviours, ...

27 Representations of a molecular structure: molecule (a real object) → molecular structure representation → molecular descriptors (numbers)

28

29 Representations of a molecular structure [figure: the same molecule drawn as 0D - counts, 1D - fragment counts, 2D - topostructural, 2D - topochemical, 3D - geometrical]

30 Representations of a molecular structure 4D: probes (steric, electronic, hydrophobic); interaction energy value at each point for each probe

31 0D: atom list (counting, summing)
1D: substructure list (counting, structural keys)
2D: molecular graph, graph invariants (topostructural descriptors, topochemical descriptors, topographic descriptors, topological information indices)
3D: molecular geometry, x, y, z coordinates (geometrical descriptors, quantum-chemical descriptors, bulk descriptors, molecular surface descriptors)
4D: interaction energy values (grid-based QSAR techniques)
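
As an illustration of the 0D level (atom list, counting and summing), the sketch below derives atom counts and a molecular weight from a simple molecular formula. It is a minimal example under stated assumptions: the function names, the small `ATOMIC_MASS` table (rounded standard atomic masses) and the formula `C6H5Cl` are chosen here for illustration, and the parser handles only flat formulas without brackets or charges.

```python
import re

# Rounded standard atomic masses; only the elements needed for the example.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}

def atom_counts(formula):
    """0D descriptor by counting: number of atoms of each element."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """0D descriptor by summing: atomic masses added over the atom list."""
    return sum(ATOMIC_MASS[s] * n for s, n in atom_counts(formula).items())

counts = atom_counts("C6H5Cl")
mw = molecular_weight("C6H5Cl")
print(counts, round(mw, 3))
```

Such descriptors carry no information about connectivity, so all isomers of a given formula receive the same values.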

32 molecular graph, graph invariants
topostructural descriptors: Wiener index, Hosoya Z index, Zagreb indices, Mohar indices, Randic connectivity index, Balaban distance connectivity index, Schultz molecular topological index, Kier shape descriptors, eigenvalues of the adjacency matrix, eigenvalues of the distance matrix, Kirchhoff number, detour index, topological charge indices, ...
topological information indices: total information content on ..., mean information content on ...
topochemical descriptors: Kier-Hall valence connectivity indices, Burden eigenvalues, BCUT descriptors, Kier alpha-modified shape descriptors, 2D autocorrelation descriptors, ...
topographic descriptors (molecular geometry, x, y, z coordinates): 3D-Wiener index, 3D-Balaban index, D/D index, ...
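
Two of the topostructural descriptors named on this slide are easy to compute from a hydrogen-depleted molecular graph. The sketch below assumes the standard definitions: the Wiener index as the sum of topological distances over all atom pairs, and the first Zagreb index as the sum of squared vertex degrees. The adjacency-dictionary representation, the function names and the two example graphs are choices made here for illustration.

```python
from collections import deque

def wiener_index(adj):
    """Sum of shortest-path (topological) distances over all atom pairs."""
    total = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was visited from both ends

def first_zagreb_index(adj):
    """Sum of squared vertex degrees."""
    return sum(len(neighbours) ** 2 for neighbours in adj.values())

# Hydrogen-depleted graphs of two C4H10 isomers (carbon atoms only).
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}

for name, graph in (("n-butane", n_butane), ("isobutane", isobutane)):
    print(name, wiener_index(graph), first_zagreb_index(graph))
```

Both indices take different values for the two isomers, which a 0D count of atoms cannot do.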

33 molecular geometry, x, y, z coordinates
geometrical descriptors: gravitational indices, 3D-Morse descriptors, EVA descriptors, EEVA descriptors, WHIM descriptors, GETAWAY descriptors, ...
grid-based QSAR techniques (interaction energy values): CoMFA, GRID, G-WHIM descriptors, ...
volume descriptors: van der Waals volume, geometric volume, ...
quantum-chemical descriptors: charges, electronegativities, superdelocalizability, hardness, softness, E LUMO, E HOMO, ...
molecular surface: solvent-accessible surface area, CPSA descriptors, molecular shape analysis, Mezey 3D shape analysis, ...

34 Properties of a molecular descriptor Several scientists are searching for new molecular descriptors able to capture new aspects of the molecular structure. This kind of research involves creativity and imagination, together with a solid theoretical basis that allows one to obtain numbers with some structural chemical meaning. "There are no restrictions on the design of structural invariants, the limiting factor is one's own imagination." [1] [1] M. Randic (1996), Molecular bonding profiles, J. Math. Chem., 19, 375-392

35 Properties of a molecular descriptor a descriptor MUST have... • invariance with respect to labeling and numbering of atoms • invariance with respect to roto-translation • an unambiguous algorithmically computable definition • values in a suitable numerical range for the set of molecules to which it is applicable
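
The two invariance requirements can be checked numerically for any candidate descriptor. The sketch below uses a simple geometric quantity, the sum of all interatomic distances, chosen here only as an example, and compares it before and after a rotation plus translation and after renumbering the atoms. The coordinates and function names are hypothetical. A naive quantity (the sum of the x coordinates) is included as a counter-example that depends on the reference frame.

```python
import math
import random

def sum_of_distances(coords):
    """Sum of Euclidean distances over all atom pairs."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += math.dist(coords[i], coords[j])
    return total

def sum_of_x(coords):
    """Counter-example: depends on where the molecule sits in space."""
    return sum(x for x, _, _ in coords)

def rotate_z_and_translate(coords, angle, shift):
    """Rotate about the z axis by `angle` (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

# Hypothetical coordinates for a four-atom skeleton.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0), (3.7, 1.3, 0.4)]

moved = rotate_z_and_translate(coords, 0.7, (5.0, -2.0, 1.0))
relabeled = coords[:]
random.Random(0).shuffle(relabeled)  # a different atom numbering

print(sum_of_distances(coords), sum_of_distances(moved),
      sum_of_distances(relabeled))
print(sum_of_x(coords), sum_of_x(moved))
```

The first line prints three values that agree up to rounding, while the second shows that the naive quantity changes with the frame and would therefore not qualify as a descriptor.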

36 Properties of a molecular descriptor a descriptor should have... • a structural interpretation • a good correlation with at least one property • no trivial correlation with other molecular descriptors • a gradual change in its values with gradual changes in the molecular structure • no experimental properties included in its definition • applicability not restricted to too small a class of molecular structures • preferably, some discrimination power among isomers • preferably, no other molecular descriptors trivially included in its definition • preferably, reversible decoding (back from the descriptor value to the structure)

37 QSAR strategy models... • regression models (quantitative response) • classification models (qualitative response) • ranking models (ordered response)

38 QSAR strategy - Regression

39 QSAR strategy - Classification

40 QSAR strategy - Ranking

41 QSAR strategy
training set: set of molecules → molecular descriptors + experimental responses → fitting → MODEL, SRC (QSAR, QSPR, ...)
new molecules → molecular descriptors → MODEL → predicted new responses
reversible decoding
test set: molecular descriptors + experimental responses → prediction power

42 QSAR strategy The true interest is in predictive power of the model Model validation Chemometrics
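
The difference between fitting and predictive power can be made concrete with a small numerical sketch: a one-descriptor linear model is fitted on a training set and then judged on molecules it has never seen. Everything in the example is an assumption made for illustration: the data are synthetic (a linear trend plus Gaussian noise), the split takes every third molecule as the test set, and the external statistic is computed as 1 - PRESS/SS with the training-set mean as reference, which is one common convention among several.

```python
import random

def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def explained_fraction(y_obs, y_pred, reference_mean):
    """1 - (sum of squared residuals) / (sum of squares about the reference)."""
    residual = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    total = sum((o - reference_mean) ** 2 for o in y_obs)
    return 1.0 - residual / total

# Synthetic data: response = 2*descriptor + 1 + noise (illustration only).
rng = random.Random(42)
descriptor = [0.25 * i for i in range(30)]
response = [2.0 * d + 1.0 + rng.gauss(0.0, 0.3) for d in descriptor]

pairs = list(zip(descriptor, response))
train = [p for i, p in enumerate(pairs) if i % 3 != 0]
test = [p for i, p in enumerate(pairs) if i % 3 == 0]

x_tr, y_tr = zip(*train)
x_te, y_te = zip(*test)
slope, intercept = fit_line(x_tr, y_tr)
train_mean = sum(y_tr) / len(y_tr)

# Fitting: how well the model reproduces the data it was built on.
r2_fit = explained_fraction(
    y_tr, [slope * x + intercept for x in x_tr], train_mean)
# Prediction: how well it reproduces responses it has never seen.
q2_ext = explained_fraction(
    y_te, [slope * x + intercept for x in x_te], train_mean)

print(round(r2_fit, 3), round(q2_ext, 3))
```

Here the two numbers are close because the model is simple and the trend is real; with many descriptors and few molecules, the fitted value can remain high while the external one collapses, which is why validation is stressed on this slide.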

43 … towards conclusions …

44 FAQ - Frequently Asked Questions 1. What is the meaning of that descriptor ? 2. Why are there some models with the same prediction power but different molecular descriptors ? 3. Why use a huge number of molecular descriptors ?

45 FGA - our Frequently Given Answers 1. What is the meaning of that descriptor ? A molecular descriptor is a number extracted by a well defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that often our difficulties to attribute a meaning to this number ultimately flow from the lacking of deeper chemical theories and higher level languages and not from exoteric approaches to the descriptor definition. R. Todeschini and V. Consonni

46 FGA - our Frequently Given Answers 2. Why are there some models with the same prediction power but different molecular descriptors ? Molecular descriptors are often intercorrelated, therefore different molecular descriptors can, in turn, take part in a model. "Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view." Hans Primas
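
Intercorrelation between descriptors is easy to see in a homologous series. The sketch below compares two descriptors of the unbranched alkanes from ethane to octane: the number of carbon atoms n and the Wiener index, which for a chain of n atoms equals (n - 1)n(n + 1)/6. The choice of series and the helper name `pearson` are assumptions made for this illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two descriptor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Unbranched alkanes, ethane (n = 2) to octane (n = 8).
carbon_count = list(range(2, 9))
wiener = [(n - 1) * n * (n + 1) // 6 for n in carbon_count]

r = pearson(carbon_count, wiener)
print(wiener, round(r, 3))
```

The two descriptors are strongly but not perfectly correlated, so a model built on either one can reach a similar predictive power on such a series while appearing to be a different model.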

47 FGA - our Frequently Given Answers 3. Why use a huge number of molecular descriptors ? Complexity is not an intrinsic property of systems, but rather arises from the number of ways in which we are able (or desire) to interact with a system. A molecule is undoubtedly a complex system.

48 www.moleculardescriptors.eu

49 Department of Environmental Sciences University of Milano - Bicocca P.za della Scienza, 1 - 20126 Milano (Italy) Website: michem.disat.unimib.it/chm/ THANK YOU Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria decision making environmetrics experimental design artificial neural networks statistical process control Milano Chemometrics and QSAR Research Group

50

51 coffee break

52 ... since December 2006 www.moleculardescriptors.eu • news • software • books • tutorials and a forum

53

54 FGA - our Frequently Given Answers 4. Is a model explaining the known facts of a system better than a model predicting the future events of that system ? Fitting versus prediction: an understanding of the behavior of a system does not always coincide with the prediction of the system's future behavior! Don't forget your goal!

55 QSAR strategy - Regression

56 "GENTLEMEN, One might ask what is the most profitable way to draw from a hypothesis the greatest benefit for the development of a given doctrine. To many it may perhaps seem that in this regard one should proceed with great caution, so as not to introduce into science hypothetical conceptions that are too bold and that later turn out not to agree with the reality of the facts. I believe instead that the progress of science has been delayed by excessive caution rather than by excessive boldness. In science, as in matters of love, one must know when to dare: to dare at once and to go all the way; complaints and regrets afterwards serve no purpose." Giacomo Ciamician, from the inaugural address on the scientific work of Wilhelm KÖRNER, Milano, 15 May 1910.

57 Fragment approach The biological activity of a molecule is the sum of its fragment properties Congeneric molecules, i.e. a common reference skeleton Substituent properties

58 Fragment approach: Parametric approach (Hammett - Hansch, 1964); Group approach (Free-Wilson and Fujita-Ban, 1976); DARC-PELCO approach (Dubois, 1966); Sterimol approach (Verloop, 1976)

59 Hansch approach: Hansch molecular descriptors
lipophilic properties: partition coefficients (logP, logKow), chromatographic parameters (Rf, RT), solubility, …
electronic properties: Hammett constants, molar refraction, dipole moment, HOMO, LUMO, ionization potential, …
steric properties: molecular weight, VDW volume, molar volume, surface area, …

60 The role of the molecular descriptors

61 Introduction

62 Conclusions A molecular descriptor is a number extracted by a well defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that often our difficulties to attribute a meaning to this number ultimately flow from the lacking of deeper chemical theories and higher level languages and not from exoteric approaches to the descriptor definition. R. Todeschini and V. Consonni

63 Properties of a molecular descriptor

64 Conclusions "Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view." Hans Primas

65 X

66 [diagram linking: molecule, molecular descriptors, physico-chemical properties, biological activities]

67 Representations of a molecular structure [figure: 0D, 1D, 2D and 3D representations of the same molecule]

68 molecular structure ? Just a question …

69 Some historical notes "...: although relationships between the chemical constitution (composition and structure) and their physical properties can certainly already be glimpsed, the number of facts is still certainly far too limited to draw from them conclusions that, beyond the character of a simple hypothesis, can also claim that of probability. In any case, such relationships are not of so simple a nature as one might perhaps have expected a priori. Certainly the physical properties of bodies are in the first place a function of their composition and structure, about whose form nothing is yet known; a function that is probably very complex, and whose study will require an unforeseeable number of facts in order to narrow sufficiently the circle of possible representations."

