Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.


1 Milano Chemometrics and QSAR Research Group
Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro
chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control
Department of Environmental Sciences, University of Milano - Bicocca, P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.unimib.it/chm/

2 Molecular descriptors: constitutional descriptors and graph invariants
Roberto Todeschini, Milano Chemometrics and QSAR Research Group
Iran - February 2009

3 Content
- Counting descriptors
- Empirical descriptors
- Fragment descriptors
- Molecular graphs
- Topological descriptors

4 Counting descriptors
Each descriptor represents the number of elements of some defined chemical quantity. For example:
- the number of atoms or bonds
- the number of carbon or chlorine atoms
- the number of OH or C=O functional groups
- the number of benzene rings
- the number of defined molecular fragments

5 Counting descriptors
The sum of some atomic or bond property is also considered a count descriptor, as is its average. For example:
- molecular weight and average molecular weight
- sum of the atomic electronegativities
- sum of the atomic polarizabilities
- sum of the bond orders
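As a rough illustration of the counting descriptors listed on slides 4 and 5, the sketch below counts atoms and sums an atomic property (the atomic weight) for ethanol. The choice of ethanol, the rounded atomic weights and all variable names are assumptions made for this example only.

```python
from collections import Counter

# Rounded standard atomic weights; only the elements needed here.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.45}

# Ethanol, C2H6O, written as a flat list of atoms (illustrative molecule).
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]

counts = Counter(atoms)
n_atoms = len(atoms)            # number of atoms
n_carbon = counts["C"]          # number of carbon atoms
n_chlorine = counts["Cl"]       # Counter gives 0 for absent elements

# A sum of an atomic property and its average are also count descriptors.
mol_weight = sum(ATOMIC_WEIGHT[a] for a in atoms)
avg_mol_weight = mol_weight / n_atoms

print(n_atoms, n_carbon, n_chlorine, round(mol_weight, 3), round(avg_mol_weight, 3))
```

Two different molecules with the same composition give identical values here, which is the high degeneracy that the slides attribute to this class of descriptors.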

6 Counting descriptors
A counting descriptor n is a non-negative (semi-positive) variable, i.e. n ≥ 0. Its statistical distribution is usually a Poisson distribution.
Main characteristics:
- simple
- the most used
- local information
- high degeneracy
- discriminant modelling power

7 Empirical descriptors Descriptors based on specific structural aspects present in sets of congeneric compounds and usually not applicable (or giving a single default value) to compounds of different classes.

8 Empirical descriptors: index of Taillandier (Taillandier et al., 1983)
It is a descriptor dedicated to the modelling of benzene rings and is defined as the sum of the six lengths joining the adjacent substituent groups.
[Figure: benzene ring with substituents H, H, H, H, CH3 and Cl]

9 Empirical descriptors: hydrophilicity index (Hy) (Todeschini et al., 1999)
It is a descriptor dedicated to the modelling of hydrophilicity and is based on a function of the counts of hydrophilic groups (OH-, SH-, NH-, ...) and carbon atoms.
- nHy: number of hydrophilic groups
- nC: number of carbon atoms
- n: total number of non-hydrogen atoms
Range: -1 ≤ Hy ≤ 3.64

10 Empirical descriptors
Compound                  nHy    nC     n      Hy
hydrogen peroxide         2      0      2      3.64
carbonic acid             2      1      3      3.48
water                     2      0      1      3.44
butanetetraol             4      4      8      3.30
propanetriol              3      3      6      2.54
ethanediol                2      2      4      1.84
methanol                  1      1      2      1.40
ethanol                   1      2      3      0.71
decanediol                2      10     12     0.52
propanol                  1      3      4      0.37
butanol                   1      4      5      0.17
pentanol                  1      5      6      0.03
methane                   0      1      1      0.00
nHy = 0 and nC = 0        0      0      N      0.00
decanol                   1      10     11     -0.28
ethane                    0      2      2      -0.63
pentane                   0      5      5      -0.90
decane                    0      10     10     -0.96
alkane with nC = 1000     0      1000   1000   -1.00

11 Fragment approach
- Parametric approach (Hammett - Hansch, 1964)
- Substituent approach (Free-Wilson, Fujita-Ban, 1976)
- DARC-PELCO approach (Dubois, 1966)
- Sterimol approach (Verloop, 1976)

12 Fragment approach
The biological activity of a molecule is the sum of its fragment properties:
- common reference skeleton
- molecule properties gradually modified by substituents
Congenericity principle: QSAR strategies can be applied ONLY to classes of similar compounds.

13 Hansch approach (Corwin Hansch, 1964)
$\text{Biological response} = f_1(L) + f_2(E) + f_3(S) + f_4(M)$
1. L: lipophilic properties
2. E: electronic properties
3. S: steric properties
4. M: other molecular properties

14 Hansch approach
1. Congenericity approach
2. Linear additive scheme
3. Limited representation of global molecular properties
4. No 3D and conformational information

15 Free-Wilson approach
[Figure: reference structure with substituent positions 1 and 2]

17 Free-Wilson, 1964
[Figure: substituents F, Br and I at Pos. 1 and Pos. 2]
$I_{ks}$: absence/presence of the k-th substituent in the s-th site

18 Fragment approach: fingerprints
A fingerprint is a binary vector, for example
1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
where 1 marks the presence of a fragment and 0 the absence of a fragment. Typical use: similarity searching.
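A minimal sketch of similarity searching on binary fingerprints follows. The slide does not name a similarity measure; the Tanimoto coefficient used here is an assumption (a common choice for binary vectors), and the fingerprints and the names `query`, `library`, `mol_A` and `mol_B` are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)     # fragments in both
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)    # fragments in at least one
    return both / either if either else 1.0


# Hypothetical 8-bit fingerprints: 1 = fragment present, 0 = fragment absent.
query = [1, 0, 0, 0, 1, 0, 1, 0]
library = {
    "mol_A": [1, 0, 0, 0, 1, 0, 0, 0],
    "mol_B": [0, 1, 0, 0, 0, 0, 1, 1],
}

# Rank library molecules from most to least similar to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
print(ranked)
```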

19 Molecular graph
[Figure: molecular graph with vertices numbered 1 to 7]

20 Mathematical object defined as $G = (V, E)$, where the set $V$ of vertices represents the atoms and the set $E$ of edges represents the bonds.
[Figure: molecular graph with vertices numbered 1 to 7]

21 Molecular graph
Usually hydrogen atoms are not considered in the molecular graph (H-depleted molecular graph).

22 Molecular graph
A walk in G is a sequence of vertices $w = (v_1, v_2, v_3, \ldots, v_k)$ such that $\{v_j, v_{j+1}\} \in E$. The length of a walk is the number of edges traversed by the walk.
A path in G is a walk without any repeated vertices. The length of a path $(v_1, v_2, v_3, \ldots, v_{k+1})$ is $k$.
Examples on a graph with vertices numbered 1 to 6:
- $v_1 v_2 v_3 v_2 v_5$: walk of length 4
- $v_1 v_2 v_3 v_4 v_5$: path of length 4
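The definitions of walk and path can be checked mechanically. The sketch below assumes a six-vertex graph whose edges are chosen only to be consistent with the two examples on the slide; the figure itself is not available, so the edge 4-6 in particular is a guess, and the function names are illustrative.

```python
# Assumed edge set: a four-membered ring 2-3-4-5 with vertices 1 and 6 attached.
EDGES = {(1, 2), (2, 3), (3, 4), (4, 5), (2, 5), (4, 6)}


def adjacent(u, v):
    """True if {u, v} is an edge of the graph."""
    return (u, v) in EDGES or (v, u) in EDGES


def is_walk(seq):
    """Every pair of consecutive vertices must be joined by an edge."""
    return all(adjacent(a, b) for a, b in zip(seq, seq[1:]))


def is_path(seq):
    """A path is a walk without repeated vertices."""
    return is_walk(seq) and len(set(seq)) == len(seq)


def length(seq):
    """Number of edges traversed."""
    return len(seq) - 1


print(is_walk([1, 2, 3, 2, 5]), is_path([1, 2, 3, 2, 5]), length([1, 2, 3, 2, 5]))
print(is_path([1, 2, 3, 4, 5]), length([1, 2, 3, 4, 5]))
```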

23 The topological distance $d_{ij}$ is the length of the shortest path between the vertices $v_i$ and $v_j$. The detour distance $\Delta_{ij}$ is the length of the longest path between the vertices $v_i$ and $v_j$.
Example on the graph with vertices numbered 1 to 6: $d_{15} = 2$ and $\Delta_{15} = 4$.
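A small sketch of both distances, assuming a six-vertex graph made of a ring 2-3-4-5 with vertices 1 and 6 attached. This edge list is a guess that reproduces the two values quoted on the slide; it is not taken from the original figure. The shortest path uses breadth-first search and the longest path uses an exhaustive search, which is only practical for small graphs.

```python
from collections import deque

EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 5), (4, 6)]  # assumed graph
NEIGHBOURS = {}
for u, v in EDGES:
    NEIGHBOURS.setdefault(u, set()).add(v)
    NEIGHBOURS.setdefault(v, set()).add(u)


def topological_distance(start, goal):
    """Length of the shortest path, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return dist[u]
        for w in NEIGHBOURS[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    raise ValueError("vertices are not connected")


def detour_distance(start, goal, visited=None):
    """Length of the longest path (no repeated vertices); -1 if none exists."""
    visited = (visited or set()) | {start}
    if start == goal:
        return 0
    best = -1
    for w in NEIGHBOURS[start]:
        if w not in visited:
            sub = detour_distance(w, goal, visited)
            if sub >= 0:
                best = max(best, sub + 1)
    return best


print(topological_distance(1, 5), detour_distance(1, 5))
```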

24 Molecular graph
A self-returning walk is a walk closed in itself, i.e. a walk starting and ending on the same vertex. A cycle is a walk with no repeated vertices other than its first and last ones ($v_1 = v_k$).
Examples on the graph with vertices numbered 1 to 6:
- $v_1 v_2 v_3 v_2 v_1$: self-returning walk of length 4
- $v_2 v_3 v_4 v_5 v_2$: cycle

25 Molecular graph
The molecular walk (path) count MWCk (MPCk) of order k is the total number of walks (paths) of length k in the molecular graph.
MWC0 = nSK (no. of atoms)
MWC1 = nBO (no. of bonds)
- Molecular size
- Branching
- Graph complexity
DRAGON: MWC1, MWC2, …, MWC10
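Walk counts can be obtained from powers of the adjacency matrix, since entry (i, j) of the k-th power counts the walks of length k from i to j. The sketch below assumes a seven-atom H-depleted graph with bonds 1-2, 2-3, 3-4, 2-5, 3-6 and 2-7, and it assumes the convention of halving the sum of the matrix entries for k ≥ 1, chosen because it reproduces MWC1 = nBO as stated on the slide. Other counting conventions exist, so the numbers for k > 1 should be read with that caveat.

```python
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]  # assumed bonds
N = 7

# Adjacency matrix of the H-depleted graph.
A = [[0] * N for _ in range(N)]
for u, v in EDGES:
    A[u - 1][v - 1] = A[v - 1][u - 1] = 1


def matmul(X, Y):
    """Plain-Python product of two N x N matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def walk_counts(max_order):
    """Return [MWC0, MWC1, ..., MWC(max_order)] under the assumed convention."""
    counts = [N]                                  # MWC0 = number of atoms
    power = [[int(i == j) for j in range(N)] for i in range(N)]
    for _ in range(max_order):
        power = matmul(power, A)
        counts.append(sum(map(sum, power)) // 2)  # half the sum of entries
    return counts


print(walk_counts(3))
```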

26 Molecular graph
The self-returning walk count SRWk of order k is the total number of self-returning walks of length k in the graph. These counts are the spectral moments of the adjacency matrix, i.e. linear combinations of counts of certain fragments contained in the molecular graph (embedding frequencies).
SRW1 = nSK
SRW2 = nBO
DRAGON: SRW1, SRW2, …, SRW10
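The link with spectral moments can be sketched by taking the trace of successive powers of the adjacency matrix. The seven-atom graph below (bonds 1-2, 2-3, 3-4, 2-5, 3-6, 2-7) is an assumption. Note that the raw traces do not coincide with the SRW1 and SRW2 values quoted on the slide: the trace of the first power is 0 and the trace of the second power is twice the number of bonds, so the slide presumably follows a software-specific convention that is not reproduced here.

```python
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]  # assumed bonds
N = 7

A = [[0] * N for _ in range(N)]
for u, v in EDGES:
    A[u - 1][v - 1] = A[v - 1][u - 1] = 1


def matmul(X, Y):
    """Plain-Python product of two N x N matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def spectral_moments(max_order):
    """Traces of A^1 ... A^max_order, i.e. closed walks counted per start vertex."""
    moments = []
    power = [[int(i == j) for j in range(N)] for i in range(N)]
    for _ in range(max_order):
        power = matmul(power, A)
        moments.append(sum(power[i][i] for i in range(N)))
    return moments


print(spectral_moments(4))
```

Odd moments vanish for this graph because an acyclic graph has no closed walks of odd length.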

27 Molecular graph
Local vertex invariants (LOVIs) are quantities associated with each vertex of a molecular graph.
Graph invariants are molecular descriptors representing graph properties that are preserved by isomorphism:
- characteristic polynomial
- derived from local vertex invariants

28 Molecular graph and more
Molecular graph → Topological matrix → Algebraic operator → Local Vertex Invariants → Graph invariants → Molecular descriptors

29 Molecular graph: graph invariants
Topostructural descriptors:
- Wiener index, Hosoya Z index
- Zagreb indices, Mohar indices
- Randic connectivity index
- Balaban distance connectivity index
- Schultz molecular topological index
- Kier shape descriptors
- eigenvalues of the adjacency matrix
- eigenvalues of the distance matrix
- Kirchhoff number
- detour index
- topological charge indices
- ...
Topological information indices:
- total information content on ...
- mean information content on ...
Topochemical descriptors:
- Kier-Hall valence connectivity indices
- Burden eigenvalues
- BCUT descriptors
- Kier alpha-modified shape descriptors
- 2D autocorrelation descriptors
- ...
Topographic descriptors (molecular geometry, x, y, z coordinates):
- 3D-Wiener index
- 3D-Balaban index
- D/D index
- ...

30 Molecule graph invariants
Numerical chemical information extracted from molecular graphs. The mathematical representation of a molecular graph is made by the topological matrices:
- adjacency matrix
- atom connectivity matrix
- distance matrix
- edge distance matrix
- incidence matrix
- ... more than 60 matrix representations of the molecular structure

31 Local vertex invariants
Local vertex invariants (LOVIs) are quantities associated with each vertex of a molecular graph. Examples:
- atom vertex degree
- valence vertex degree
- sum of the vertex distance degree
- maximum vertex distance degree

32 Topological matrices: adjacency matrix
Derived from a molecular graph, it represents the whole set of connections between adjacent pairs of atoms.
$a_{ij} = \begin{cases} 1 & \text{if atoms } i \text{ and } j \text{ are bonded} \\ 0 & \text{otherwise} \end{cases}$

33 Topological matrices: bond number B
It is the simplest graph invariant obtained from the adjacency matrix. It is the number of bonds in the molecular graph, calculated as
$B = \frac{1}{2} \sum_{i} \sum_{j} a_{ij}$
where $a_{ij}$ is the entry of the adjacency matrix and both sums run over all atoms.

34 Local vertex invariants: atom vertex degree
It is the row sum of the vertex adjacency matrix.
[Figure: molecular graph with vertices numbered 1 to 7]
       1  2  3  4  5  6  7    $\delta_i$
  1    0  1  0  0  0  0  0    1
  2    1  0  1  0  1  0  1    4
  3    0  1  0  1  0  1  0    3
  4    0  0  1  0  0  0  0    1
  5    0  1  0  0  0  0  0    1
  6    0  0  1  0  0  0  0    1
  7    0  1  0  0  0  0  0    1
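The adjacency matrix, the atom vertex degrees and the bond number can be sketched in a few lines. The bond list below (1-2, 2-3, 3-4, 2-5, 3-6, 2-7) is an assumption: it is one seven-atom graph consistent with the vertex degrees 1, 4, 3, 1, 1, 1, 1 shown on the slide, since the original figure is not available.

```python
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]  # assumed bonds
N = 7

# a_ij = 1 if atoms i and j are bonded, 0 otherwise.
A = [[0] * N for _ in range(N)]
for u, v in EDGES:
    A[u - 1][v - 1] = 1
    A[v - 1][u - 1] = 1

# Atom vertex degree: row sum of the adjacency matrix.
degrees = [sum(row) for row in A]

# Bond number: half the sum of all entries, because each bond appears twice.
bond_number = sum(degrees) // 2

for row, d in zip(A, degrees):
    print(row, d)
print("B =", bond_number)
```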

35 Valence vertex degree
For atoms of the 2nd principal quantum number (C, N, O, F):
$\delta_i^v = Z_i^v - h_i$
where $Z_i^v$ is the number of valence electrons of the i-th atom and $h_i$ is the number of hydrogens bonded to the i-th atom.

36 Local vertex invariants
The vertex degree of the i-th atom is the count of edges incident with the i-th atom, i.e. the count of σ bonds or σ electrons.
valence vertex degree

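37 Local vertex invariants
For atoms with principal quantum number > 2, the valence vertex degree is
$\delta_i^v = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$
where $Z_i$ is the total number of electrons of the i-th atom (atomic number), $Z_i^v$ is its number of valence electrons and $h_i$ is the number of hydrogens bonded to it.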
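A short sketch of the valence vertex degree follows. The formulas are not legible in the transcript, so the code assumes the usual Kier-Hall definitions: valence electrons minus bonded hydrogens for second-row atoms, and the same quantity divided by (atomic number minus valence electrons minus 1) for heavier atoms. The function name and its parameters are illustrative.

```python
def valence_vertex_degree(z_valence, n_hydrogens, atomic_number, period):
    """Valence vertex degree under the assumed Kier-Hall definitions.

    z_valence     -- number of valence electrons of the atom
    n_hydrogens   -- number of hydrogens bonded to the atom
    atomic_number -- total number of electrons of the atom
    period        -- principal quantum number of the valence shell
    """
    numerator = z_valence - n_hydrogens
    if period <= 2:
        return float(numerator)
    return numerator / (atomic_number - z_valence - 1)


methyl_carbon = valence_vertex_degree(4, 3, 6, 2)      # C in -CH3
hydroxyl_oxygen = valence_vertex_degree(6, 1, 8, 2)    # O in -OH
chlorine = valence_vertex_degree(7, 0, 17, 3)          # Cl substituent

print(methyl_carbon, hydroxyl_oxygen, round(chlorine, 3))
```

Unlike the plain vertex degree, these values distinguish a methyl carbon from a hydroxyl oxygen or a chlorine, even though all three are terminal vertices of degree 1.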

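38 Topological descriptors: Zagreb indices (Gutman, 1975)
$M_1 = \sum_{i} \delta_i^2 \qquad M_2 = \sum_{b} (\delta_i \delta_j)_b$
where $\delta_i$ is the vertex degree of the i-th atom; the first sum runs over all atoms and the second over all bonds b, with i and j the two atoms forming the bond.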
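As an illustration of the Zagreb indices, the sketch below assumes the usual definitions (first index: sum of squared vertex degrees over atoms; second index: sum of degree products over bonds) and an assumed seven-atom graph with bonds 1-2, 2-3, 3-4, 2-5, 3-6 and 2-7.

```python
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]  # assumed bonds

# Vertex degree of each atom, accumulated from the bond list.
degree = {}
for u, v in EDGES:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

# First Zagreb index: sum over atoms of the squared vertex degree.
M1 = sum(d * d for d in degree.values())

# Second Zagreb index: sum over bonds of the product of the end-atom degrees.
M2 = sum(degree[u] * degree[v] for u, v in EDGES)

print(M1, M2)
```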

39 Topological descriptors
Kier-Hall connectivity indices (1986), Randic branching index (1975)
$\chi = \sum_{b} (\delta_i \delta_j)_b^{-1/2}$
where the sum runs over all bonds b and the term $(\delta_i \delta_j)_b^{-1/2}$ is called edge connectivity.
They are based on molecular graph decomposition into fragments (subgraphs) of different size and complexity and use atom vertex degrees as subgraph weight.
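The Randic branching index can be sketched as a sum of edge connectivities. The code assumes the standard definition (the inverse square root of the product of the two vertex degrees, summed over bonds) and an assumed seven-atom graph with bonds 1-2, 2-3, 3-4, 2-5, 3-6 and 2-7.

```python
import math

EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]  # assumed bonds

degree = {}
for u, v in EDGES:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

# Edge connectivity of each bond: (delta_i * delta_j) ** -0.5
edge_connectivity = {(u, v): 1.0 / math.sqrt(degree[u] * degree[v]) for u, v in EDGES}

# Randic branching index: sum of the edge connectivities.
chi = sum(edge_connectivity.values())

print(round(chi, 4))
```

Bonds between highly branched atoms contribute small terms, so for a fixed number of atoms the index tends to decrease as branching increases.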

40 Topological descriptors mean Randic branching index

41 Topological descriptors: atom connectivity indices of m-th order
$^m\chi_q = \sum_{k=1}^{K} \left( \prod_{a=1}^{n} \delta_a \right)_k^{-1/2}$
- $K$: number of m-th order subgraphs of type q; for paths it is $^mP$, the number of m-th order paths
- q: subgraph type (Path, Cluster, Path/Cluster, Chain)
- n = m for Chain (Ring) subgraph type, n = m + 1 otherwise
The immediate bonding environment of each atom is encoded by the subgraph weight. The number of terms in the sum depends on the molecular structure. The connectivity indices show a good capability of isomer discrimination and reflect some features of molecular branching.

42 Topological descriptors: valence connectivity indices of m-th order
They encode atom identities as well as the connectivities in the molecular graph.

43 Kier-Hall electronegativity
- Kier-Hall relative electronegativity: the electronegativity of sp3 carbon is taken as zero
- the definition involves the principal quantum number
- correlation with the Mulliken-Jaffe electronegativity

44 Distance matrix
The distance $d_{ij}$ between two vertices is the smallest number of edges between them. The vertex distance matrix degree $s_i$ is the row sum of the vertex distance matrix; $\eta_i$ is the largest entry of the row.
[Figure: molecular graph with vertices numbered 1 to 7]
       1  2  3  4  5  6  7    $s_i$   $\eta_i$
  1    0  1  2  3  2  3  2    13      3
  2    1  0  1  2  1  2  1    8       2
  3    2  1  0  1  2  1  2    9       2
  4    3  2  1  0  3  2  3    14      3
  5    2  1  2  3  0  3  2    13      3
  6    3  2  1  2  3  0  3    14      3
  7    2  1  2  3  2  3  0    13      3
$s_i$ is high for terminal vertices and low for central vertices.

45 Local vertex invariants
The eccentricity $\eta_i$ of the i-th atom is the maximum distance $d_{ij}$ between the atom i and the other atoms j: $\eta_i = \max_j d_{ij}$.
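The distance matrix, the vertex distance degrees and the eccentricities can be computed with a breadth-first search from every vertex. The sketch below assumes a seven-atom graph with bonds 1-2, 2-3, 3-4, 2-5, 3-6 and 2-7, which is one graph consistent with the row sums 13, 8, 9, 14, 13, 14, 13 shown on the distance matrix slide; the names are illustrative.

```python
from collections import deque

EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]  # assumed bonds
N = 7

neighbours = {v: [] for v in range(1, N + 1)}
for u, v in EDGES:
    neighbours[u].append(v)
    neighbours[v].append(u)


def distances_from(source):
    """Breadth-first search: topological distance from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in neighbours[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return [dist[v] for v in range(1, N + 1)]   # assumes a connected graph


D = [distances_from(v) for v in range(1, N + 1)]   # vertex distance matrix
distance_degree = [sum(row) for row in D]          # row sums
eccentricity = [max(row) for row in D]             # largest distance per row

print(distance_degree)
print(eccentricity)
```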

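46 Topological descriptors: Petitjean shape index (1992)
A simple shape descriptor:
$I_{PJ} = \frac{D - R}{R}$
where the diameter $D$ and the radius $R$ are the largest and the smallest atom eccentricity in the graph.
- $I_{PJ} = 0$ for a strictly cyclic structure
- $I_{PJ} = 1$ for a strictly acyclic structure with an even diameter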
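The two limiting cases of the Petitjean shape index can be checked numerically. The sketch assumes the usual definition, (diameter minus radius) divided by radius, with diameter and radius taken as the largest and smallest vertex eccentricity. The three test graphs (a six-membered ring, a three-atom chain and a branched seven-atom tree) are illustrative choices.

```python
from collections import deque


def eccentricities(n, edges):
    """Eccentricity of each vertex 1..n of a connected graph, via BFS."""
    neighbours = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    result = []
    for source in range(1, n + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in neighbours[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result.append(max(dist.values()))
    return result


def petitjean_index(n, edges):
    """(D - R) / R, with D the diameter and R the radius of the graph."""
    ecc = eccentricities(n, edges)
    radius, diameter = min(ecc), max(ecc)
    return (diameter - radius) / radius


ring = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]   # strictly cyclic
chain = [(1, 2), (2, 3)]                                   # acyclic, diameter 2
tree = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]   # acyclic, diameter 3

print(petitjean_index(6, ring), petitjean_index(3, chain), petitjean_index(7, tree))
```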

47 Topological descriptors: Wiener index (1947)
$W = \frac{1}{2} \sum_{i} \sum_{j} d_{ij}$
where $d_{ij}$ are the topological distances.
- high values for big molecules and for linear molecules
- low values for small molecules and for branched or cyclic molecules
The average Wiener index is independent of the molecular size.
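The behaviour described for the Wiener index (higher for linear, lower for branched molecules) can be seen by comparing two seven-atom graphs. The sketch assumes the standard half-sum of the distance matrix; the normalisation used for the average Wiener index, 2W / (n(n - 1)), is also an assumption, since the slide does not give the formula. Both bond lists are illustrative.

```python
from collections import deque


def wiener_index(n, edges):
    """Half the sum of all topological distances of a connected graph."""
    neighbours = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    total = 0
    for source in range(1, n + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in neighbours[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2          # each pair of atoms was counted twice


def average_wiener_index(n, edges):
    """Mean distance over all atom pairs (assumed normalisation)."""
    return 2 * wiener_index(n, edges) / (n * (n - 1))


linear = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
branched = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]

print(wiener_index(7, linear), wiener_index(7, branched))
print(round(average_wiener_index(7, linear), 3), average_wiener_index(7, branched))
```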

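48 Topological descriptors: Balaban distance connectivity index (1982)
$J = \frac{B}{C + 1} \sum_{b} (s_i s_j)_b^{-1/2}$
where the sum runs over all bonds b, with i and j the two atoms forming the bond.
- B: number of bonds
- C: number of cycles
- $s_i$: sum of the i-th row distances
- average sum of the i-th row distances
- number of atoms
One of the most discriminant indices.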
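A sketch of the Balaban index, assuming the standard form: B / (C + 1) times the sum over bonds of the inverse square root of the product of the two distance sums. The number of cycles is taken as C = B - n + 1, which holds for a connected graph. The seven-atom bond list and the function name are illustrative assumptions.

```python
import math
from collections import deque


def balaban_index(n, edges):
    """Balaban distance connectivity index J of a connected graph (assumed formula)."""
    neighbours = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    # s[v]: sum of the distances from vertex v to all other vertices (BFS).
    s = {}
    for source in range(1, n + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in neighbours[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        s[source] = sum(dist.values())

    n_bonds = len(edges)
    n_cycles = n_bonds - n + 1           # cyclomatic number of a connected graph
    bond_sum = sum(1.0 / math.sqrt(s[u] * s[v]) for u, v in edges)
    return n_bonds / (n_cycles + 1) * bond_sum


branched = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (2, 7)]
J = balaban_index(7, branched)
print(round(J, 4))
```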

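49 Edge descriptors
[Figure: molecular graph with atoms numbered 1 to 7 and bonds labelled a to f]
Edge distance matrix, with the edge distance degree $^E s_i$ (row sum) and the edge eccentricity $^E\eta_i$ (largest entry of the row):
       a  b  c  d  e  f    $^E s_i$   $^E\eta_i$
  a    0  1  2  1  2  1    7          2
  b    1  0  1  1  1  1    5          1
  c    2  1  0  2  1  2    8          2
  d    1  1  2  0  2  1    7          2
  e    2  1  1  2  0  2    8          2
  f    1  1  2  1  2  0    7          2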
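Edge descriptors can be obtained by treating bonds as the vertices of a new graph in which two bonds are adjacent when they share an atom. The sketch below assumes the bond labelling a = 1-2, b = 2-3, c = 3-4, d = 2-5, e = 3-6, f = 2-7, which is a guess consistent with the row sums 7, 5, 8 visible on the slide, and it assumes that adjacent bonds are at edge distance 1.

```python
from collections import deque

# Assumed bond labelling of the seven-atom graph.
BONDS = {"a": (1, 2), "b": (2, 3), "c": (3, 4), "d": (2, 5), "e": (3, 6), "f": (2, 7)}
labels = sorted(BONDS)

# Two bonds are adjacent when they share an atom.
adjacent = {
    p: [q for q in labels if q != p and set(BONDS[p]) & set(BONDS[q])]
    for p in labels
}


def edge_distances_from(source):
    """Breadth-first search over bonds: edge distance from source to every bond."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        p = queue.popleft()
        for q in adjacent[p]:
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


E = {p: edge_distances_from(p) for p in labels}
edge_distance_degree = {p: sum(E[p].values()) for p in labels}   # row sums
edge_eccentricity = {p: max(E[p].values()) for p in labels}      # row maxima

print(edge_distance_degree)
print(edge_eccentricity)
```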

50 Topographic descriptors
Some geometrical descriptors are derived from the corresponding topological descriptors by substituting the topological distances $d_{st}$ with the geometrical distances $r_{st}$. They are called topographic descriptors.
For example, the 3D-Wiener index:
$^{3D}W = \frac{1}{2} \sum_{s} \sum_{t} r_{st}$

51 Molecular geometry
The geometry matrix G (or geometric distance matrix) is a square symmetric matrix whose entry $r_{st}$ is the geometric distance calculated as the Euclidean distance between the atoms s and t:
$r_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2 + (z_s - z_t)^2}$
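The geometry matrix and a 3D-Wiener index can be sketched directly from atomic coordinates. The coordinates below are invented for illustration and do not describe a real molecule. The index is computed as the sum of Euclidean distances over unordered atom pairs, which equals half the sum of all entries of the geometry matrix; this mirrors the topological Wiener index and is an assumption about the exact form used in the slides.

```python
import math
from itertools import combinations

# Hypothetical x, y, z coordinates of three atoms (arbitrary units).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

# Geometry matrix: r_st is the Euclidean distance between atoms s and t.
G = [[math.dist(p, q) for q in coords] for p in coords]


def wiener_3d(points):
    """Sum of geometric distances over all unordered atom pairs."""
    return sum(math.dist(p, q) for p, q in combinations(points, 2))


print([[round(r, 3) for r in row] for row in G])
print(round(wiener_3d(coords), 5))
```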

52 THANK YOU
Milano Chemometrics and QSAR Research Group
Department of Environmental Sciences, University of Milano - Bicocca, P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.disat.unimib.it/chm/

53 coffee break

54 Goal

56 Molecular graph

58 Molecule graph invariants

59 Molecular graph

65 Hansch approach: Hansch molecular descriptors
Lipophilic properties:
- partition coefficients: logP, logKow
- chromatographic parameters: Rf, RT
- solubility
- ...
Electronic properties:
- Hammett constants
- molar refraction
- dipole moment
- HOMO, LUMO
- ionization potential
- ...
Steric properties:
- molecular weight
- VDW volume
- molar volume
- surface area
- ...

66 Molecular graph
