Substructures and Patterns in 2-D Chemical Space Danail Bonchev Department of Mathematics and Applied Mathematics and Center for the Study of Biological Complexity Virginia Commonwealth University Workshop CCSWS2: Optimization, Search and Graph-Theoretical Algorithms for Chemical Compound Space, IPAM, UCLA, 11-15 April, 2011
Molecular Properties and Graph Theory. H. Wiener, JACS 69 (1947) 17; JPC 52 (1948) 1082: empirical equations, the “path number”. H. Hosoya, Bull. Chem. Soc. Japan 44 (1971) 2332: reformulation in graph-theory terms. E. Smolenski, Zh. Fiz. Khim. 38 (1964) 700; M. Gordon and J. W. Kennedy, J. Chem. Soc. Faraday Trans. II 69 (1973) 484. $H_i$: the number of subgraphs of k nodes; $TI_i$: topological invariant.
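Molecular Connectivity Concept. Randić, 1975: $\chi = \sum_{\text{edges } ij} (a_i a_j)^{-1/2}$. Kier and Hall, 1976: ${}^m\chi_{SG} = \sum_{\text{subgraphs}} \prod_{i} a_i^{-1/2}$, where $a_i$ is the degree of vertex i, m is the number of edges in the subgraph, and SG = p (path), c (cluster), pc (path-cluster), etc. (Figure: examples of path, cluster, and path-cluster subgraphs.)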
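A minimal sketch of the first-order (bond-level) connectivity index of Randić, which sums $(a_i a_j)^{-1/2}$ over all bonds of the hydrogen-suppressed graph. The function name `randic_index` and the edge-list input format are illustrative assumptions, not part of the talk.

```python
from math import sqrt

def randic_index(edges):
    """First-order connectivity index: sum of (a_i * a_j)^(-1/2) over all edges."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(1.0 / sqrt(degree[u] * degree[v]) for u, v in edges)

# Hydrogen-suppressed carbon skeletons (vertex labels are arbitrary).
n_butane = [(1, 2), (2, 3), (3, 4)]
isobutane = [(1, 2), (1, 3), (1, 4)]

chi_butane = randic_index(n_butane)      # 1/sqrt(2) + 1/2 + 1/sqrt(2)
chi_isobutane = randic_index(isobutane)  # 3/sqrt(3)
```

The branched isomer gets the smaller value, which is the behaviour that made the index useful as a branching descriptor.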
Kier and Hall Valence Connectivity Indices. Definition: the valence connectivity for the k-th atom in the molecular graph is $a_k^v = (Z_k^v - H_k)/(Z_k - Z_k^v - 1)$, where $Z_k$ is the total number of electrons in the k-th atom, $Z_k^v$ is the number of valence electrons in the k-th atom, and $H_k$ is the number of hydrogen atoms directly attached to the k-th non-hydrogen atom. m = 0: atomic valence connectivity indices; m = 1: one-bond path valence connectivity indices; m = 2: two-bond fragment valence connectivity indices; m = 3: three-contiguous-bond fragment valence connectivity indices; etc. L. B. Kier, L. H. Hall, Eur. J. Med. Chem., 1977, 12, 307. The success of molecular connectivity indices: why do they work so well?
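As an illustration, the Kier and Hall atomic valence term can be evaluated directly from the three atomic quantities named on the slide. The sketch below assumes the standard form $(Z^v - H)/(Z - Z^v - 1)$; the function name `valence_delta` is hypothetical.

```python
def valence_delta(Z, Zv, H):
    """Kier-Hall valence term for one non-hydrogen atom.

    Z  : total number of electrons in the atom
    Zv : number of valence electrons
    H  : number of hydrogens attached to the atom
    """
    return (Zv - H) / (Z - Zv - 1)

methyl_carbon = valence_delta(Z=6, Zv=4, H=3)     # CH3 group
hydroxyl_oxygen = valence_delta(Z=8, Zv=6, H=1)   # OH group
chlorine = valence_delta(Z=17, Zv=7, H=0)         # Cl substituent
```

For second-row atoms the denominator is 1, so the term reduces to the number of valence electrons not used for bonding hydrogen; for heavier atoms such as chlorine the core electrons shrink the value.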
From Molecular Connectivity to Overall Topological Indices.
1986, Bertz & Herndon: the idea of using the total subgraph count as a similarity measure.
1995-1997, Bonchev/Bertz: a subgraph count-based measure of structural complexity.
1995-2005, Bonchev: overall topological indices.
References:
Bertz, S.; Herndon, W. C. In Artificial Intelligence Applications in Chemistry; ACS: Washington, D.C., 1986, pp. 169-175.
D. Bonchev, Bulg. Chem. Commun. 28, 567-582 (1995).
D. Bonchev, SAR QSAR Environ. Res. 7, 23-43 (1997).
Bertz, S. H. and Sommer, T. J. Chem. Commun. 2409-2410 (1997).
S. H. Bertz and W. F. Wright, Graph Theory Notes New York Acad. Sci. 32-48 (1998).
D. Bonchev, In: Topological Indices and Related Descriptors, J. Devillers and A. T. Balaban, Eds., Gordon and Breach, Reading, U.K., 1999, p. 361-401.
D. Bonchev, J. Chem. Inf. Comput. Sci. 40, 934-941 (2000).
D. Bonchev, J. Mol. Graphics Model., 5271 (2001) 1-11.
D. Bonchev, N. Trinajstić, SAR QSAR Environ. Res. 12 (2001) 213-235.
D. Bonchev, J. Chem. Inf. Comput. Sci., 41 (2001) 582-592.
D. Bonchev, Lect. Ser. Computer and Computational Sciences, 4, 1554-1557 (2005).
From Subgraph Count To Overall Topological Indices. The idea: Weight all subgraphs with graph-invariant values and sum them up to characterize the structure as a whole. Sum up the weighted subgraphs having the same number of edges to capture different levels of graph complexity. Motivation: The more complete the molecular structure representation, the better it captures the patterns of structural complexity, the more distinctive the topological descriptor, and the more accurate the structure-property relationship.
The Overall Topological Complexity Indices. Definition 1: The Overall Topological Index OTI(G) of any graph G is defined as the sum of the topological index values $TI_i(G_i)$ of all K subgraphs $G_i$ of G: $OTI(G) = \sum_{i=1}^{K} TI_i(G_i)$. Definition 2: The e-th order Overall Topological Index ${}^eOTI(G)$ of any graph G is defined as the sum of the topological index values $TI_j({}^eG_j)$ of all ${}^eK$ subgraphs ${}^eG_j$ of G that have e edges: ${}^eOTI(G) = \sum_{j=1}^{{}^eK} TI_j({}^eG_j)$.
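The two definitions can be sketched by brute-force enumeration for small molecular graphs. The example below computes the overall connectivity, order by order, under two assumptions that are not stated on this slide: single vertices count as the e = 0 subgraphs, and vertex degrees are taken in the parent graph (this choice reproduces the OC values 32 and 39 quoted elsewhere in the talk for n-butane and isobutane). The function names are hypothetical.

```python
from itertools import combinations

def is_connected(edge_subset):
    """True if the edges in edge_subset form one connected subgraph."""
    adj = {}
    for u, v in edge_subset:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(adj)

def overall_connectivity_by_order(edges):
    """Return [0OC, 1OC, ..., EOC]; exponential cost, small graphs only."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    orders = [0] * (len(edges) + 1)
    orders[0] = sum(degree.values())  # single vertices are the e = 0 subgraphs
    for e in range(1, len(edges) + 1):
        for subset in combinations(edges, e):
            if is_connected(subset):
                vertices = {x for edge in subset for x in edge}
                orders[e] += sum(degree[x] for x in vertices)
    return orders

butane_orders = overall_connectivity_by_order([(1, 2), (2, 3), (3, 4)])
isobutane_orders = overall_connectivity_by_order([(1, 2), (1, 3), (1, 4)])
```

Summing each list gives the overall index; the list itself is the vector of e-th order terms.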
Some More Definitions. Corollary 1: The Overall Topological Index OTI(G) of any graph G can be presented as a sum over all e-orders of this index, ${}^eOTI(G)$: $OTI(G) = \sum_{e=0}^{E} {}^eOTI(G)$, where E is the number of edges in G. Definition 3: The Overall Topological Index Vector OTIV(G) of any graph G is the ordered sequence of all ${}^eOTI$s: $OTIV(G) = OTI({}^1OTI, {}^2OTI, \ldots, {}^EOTI)$. Corollary 2: The E-th order overall topological index, ${}^EOTI(G)$, is the index TI(G) itself: ${}^EOTI(G) = TI(G)$.
Even More Definitions. Definition 4a: The average overall topological index $OTI_a(G)$ and its e-th order term ${}^eOTI_a(G)$ are obtained by dividing OTI(G) or ${}^eOTI(G)$ by the number of vertices V: $OTI_a(G) = OTI(G)/V$; ${}^eOTI_a(G) = {}^eOTI(G)/V$. Definition 4b: The normalized overall topological index $OTI_n(G)$ and its e-th order term ${}^eOTI_n(G)$ are obtained by dividing by the value that the index has for the complete graph $K_V$ with the same number of vertices V: $OTI_n(G) = OTI(G)/OTI(K_V)$; ${}^eOTI_n(G) = {}^eOTI(G)/{}^eOTI(K_V)$.
Aren’t You Tired of Definitions? The overall topological indices work well for molecules, but what about networks? Computational disaster! The solution: use the first several orders of the OTIs. Definition 5: The cumulative p-th order overall index ${}^pOTI(G)$ is defined as the sum over the first e = 0, 1, 2, …, p orders ${}^eOTI(G)$: ${}^pOTI(G) = \sum_{e=0}^{p} {}^eOTI(G)$.
How to Apply the Overall Topological Indices Approach to Molecules with Heteroatoms? For OTI(G) ≡ OC, OM1, OM2 (overall connectivity and the overall first and second Zagreb indices), substitute the vertex degree $a_i$ with the Kier and Hall atomic valence term $a_i^v$. Example for the overall connectivity index, OC:
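Topological Indices Used in Realizing the Overall Indices Program. Total Adjacency: $A(G) = \sum_{i=1}^{N} a_i$, where $a_i$ is the degree of vertex i and N is the number of vertices in G. First Zagreb Index: $M1(G) = \sum_{i=1}^{N} a_i^2$. Second Zagreb Index: $M2(G) = \sum_{\text{edges } ij} a_i a_j$. Wiener Number: $W(G) = \frac{1}{2}\sum_{i=1}^{N} d_i = \sum_{i<j} d_{ij}$, where $d_i$ is the distance of vertex i (the sum of its distances to all other vertices) and $d_{ij}$ is the distance between vertices i and j.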
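The four base indices are straightforward to compute for a small graph. The sketch below assumes the standard definitions (total adjacency as the degree sum, the Zagreb indices as the sum of squared degrees and the sum of degree products over edges, and the Wiener number as the sum of shortest-path distances over vertex pairs); the function names and edge-list input are illustrative.

```python
from collections import deque

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def distance_sum(adj, source):
    """Sum of shortest-path distances from source to every other vertex (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist.values())

def basic_indices(edges):
    adj = build_adjacency(edges)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    return {
        "A": sum(degree.values()),
        "M1": sum(a * a for a in degree.values()),
        "M2": sum(degree[u] * degree[v] for u, v in edges),
        # every pair is counted from both ends, hence the division by 2
        "W": sum(distance_sum(adj, v) for v in adj) // 2,
    }

butane = basic_indices([(1, 2), (2, 3), (3, 4)])
isobutane = basic_indices([(1, 2), (1, 3), (1, 4)])
```

Branching raises the Zagreb indices and lowers the Wiener number for these two isomers, while the total adjacency cannot tell them apart.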
The Overall Hosoya Index OZ(G). The Hosoya Index Z(G): $Z(G) = \sum_{k \ge 0} p(G,k)$, where p(G,k) is the number of ways of choosing k mutually non-adjacent edges in G, p(G,0) being unity and p(G,1) the number of edges. H. Hosoya, Bull. Chem. Soc. Japan 44, 2332-2339 (1971).
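A short sketch of the Hosoya index as a count of all sets of mutually non-adjacent edges, the empty set included. It uses the usual edge-deletion recursion (either the first edge is left out, or it is chosen and every edge touching its endpoints is dropped); this recursion is a standard technique swapped in for illustration, and the function name is hypothetical.

```python
def hosoya_index(edges):
    """Number of sets of mutually non-adjacent edges, including the empty set."""
    if not edges:
        return 1
    (u, v), rest = edges[0], edges[1:]
    # Case 1: edge (u, v) is not chosen.
    without_edge = hosoya_index(rest)
    # Case 2: edge (u, v) is chosen, so no other edge may touch u or v.
    with_edge = hosoya_index([e for e in rest if u not in e and v not in e])
    return without_edge + with_edge

z_butane = hosoya_index([(1, 2), (2, 3), (3, 4)])
z_isobutane = hosoya_index([(1, 2), (1, 3), (1, 4)])
z_six_ring = hosoya_index([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)])
```

The recursion is exponential in the worst case, which is acceptable for molecule-sized graphs.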
Formulae for the Overall Indices for Some Classes of Graphs. Notation: n, total number of vertices; q, total number of edges; e, number of edges in a subgraph; ${}^eSC$, number of subgraphs having e edges each.
Linear (Path) Graphs: ${}^eSC(P_n) = n - e$; $SC(P_n) = n(n+1)/2$. ${}^eOC(P_n) = 2[q(e+1) - e^2]$; $OC(P_n) = n(n-1)(n+4)/3$. ${}^eOW(P_n) = e(e+1)(e+2)(n-e)/6$; $OW(P_n) = (n+3)(n+2)(n+1)n(n-1)/120$.
Monocyclic Graphs: ${}^eSC(C_n) = n$; ${}^qSC(C_n) = 1$; $SC(C_n) = n^2 + 1$. ${}^eOC(C_n) = 2n(e+1)$; ${}^qOC(C_n) = 2n$; $OC(C_n) = n(n^2+n+2)$. ${}^eOW(C_n) = e(e+1)(e+2)n/6$ (for e = 1, 2, …, n-1); ${}^qOW(C_n) = W(C_n)$; $OW(C_n) = (n^5+2n^4+2n^3-2n^2-an)/24$, with a = 0 for even n and a = 3 for odd n.
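The closed forms for path graphs can be checked against each other: summing each e-th order formula over e = 0, …, q should reproduce the corresponding total. A small sketch of that consistency check follows; the function names are illustrative.

```python
def path_orders(n):
    """e-th order terms of SC, OC and OW for the path graph P_n, e = 0..q."""
    q = n - 1
    sc = [n - e for e in range(q + 1)]
    oc = [2 * (q * (e + 1) - e * e) for e in range(q + 1)]
    ow = [e * (e + 1) * (e + 2) * (n - e) // 6 for e in range(q + 1)]
    return sc, oc, ow

def path_totals(n):
    """Closed-form totals SC, OC and OW for the path graph P_n."""
    return (
        n * (n + 1) // 2,
        n * (n - 1) * (n + 4) // 3,
        (n + 3) * (n + 2) * (n + 1) * n * (n - 1) // 120,
    )

def consistent(n):
    sc, oc, ow = path_orders(n)
    return (sum(sc), sum(oc), sum(ow)) == path_totals(n)

pentane_totals = path_totals(5)
all_consistent = all(consistent(n) for n in range(2, 30))
```

For n = 5 the totals agree with the n-pentane values of SC, OC and OW listed in Table 1.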
Total Walk Count, twc. Defined quantities: the number of walks of length l that start in vertex i, and the total number of walks of length l. Example: a five-vertex graph (vertices 1-5) with WC = 106 (8, 16, 28, 54). (Figure: walks of length l = 1, 2, 3 starting from vertex 1.) Rücker, G. & Rücker, C., J. Chem. Inf. Comput. Sci. (2000), 40, 99-106. Rücker, G. & Rücker, C., J. Chem. Inf. Comput. Sci. (2001), 41, 1457-1462.
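The walk counts in the example can be reproduced with a simple recurrence: the number of walks of length l starting at a vertex equals the sum, over its neighbours, of the number of walks of length l - 1 starting there. The edge list below is an assumption, since the figure was lost: a 2-methylbutane skeleton with vertex 3 as the branch point, which is consistent with the walk fragments and the counts on the slide.

```python
def walk_counts(edges, max_length):
    """Total number of walks of each length 1..max_length."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    starting_at = {v: 1 for v in adj}  # one walk of length 0 per vertex
    totals = []
    for _ in range(max_length):
        # extend every walk by one step from its starting vertex
        starting_at = {v: sum(starting_at[w] for w in adj[v]) for v in adj}
        totals.append(sum(starting_at.values()))
    return totals

# Assumed labelling: bonds 1-3, 2-3, 3-4, 4-5.
skeleton = [(1, 3), (2, 3), (3, 4), (4, 5)]
per_length = walk_counts(skeleton, 4)
total_walk_count = sum(per_length)
```

Walk lengths run from 1 to N - 1, where N is the number of vertices, which is why four terms appear for this five-vertex graph.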
The Six Overall Topological Indices Order Structures According to Patterns of Increasing Complexity. (Figure: structures 1-21, labeled as # (OC): 1 (4), 2 (14), 3 (32), 4 (39), 5 (60), 6 (76), 7 (100), 8 (100), 9 (127), 10 (136), 11 (164), 12 (181), 13 (154), 14 (194), 15 (214), 16 (234), 17 (246), 18 (276), 19 (284), 20 (314), 21 (369).)
Table 1. Quantitative Comparison of the Six Overall Topological Indices in C2-C7 Alkanes
Graph   SC   OC    OW    OM1    OM2   OZ
1       3    4     1     1      1     4
2       6    14    6     22     …     10
3       …    32    21    56     26    21
4       11   39    24    87     27    23
5       15   60    56    110    60    40
6       17   76    67    168    67    46
7       20   100   80    292    68    52
8       21   100   126   188    130   72
9       24   127   154   277    149   84
10      25   136   161   300    161   89
11      28   164   188   404    172   100
12      30   181   197   505    168   103
13      28   154   252   294    272   125
14      32   194   311   418    315   147
15      34   214   333   468    351   159
16      36   234   354   516    390   172
17      37   246   384   584    366   173
18      40   276   411   668    410   191
19      41   284   414   762    370   185
20      44   314   440   850    412   202
21      49   369   510   1075   433   225
Table 2. Standard deviations of the best C3-C8 alkane property models with five parameters produced by the six overall topological indices versus those obtained by the set of molecular connectivity indices
Property                        Correlation coefficient, R   Standard deviation, SD   SD of the best molecular connectivity models
Boiling Point, °C               0.9993                       1.60                     3.31
Heat of Formation, kJ/mol       0.9995                       1.02                     1.37
Heat of Vaporization, kJ/mol    0.9950                       0.67                     0.79
Heat of Atomization, kcal/mol   1.0000                       0.30                     5.78
Surface Tension, dyn/cm         0.9963                       0.17                     0.22
Molar Volume*, cm3/mol          0.9999                       0.23                     0.36
Molar Refraction, cm3/mol       1.0000                       0.041                    0.044
Critical Volume, L/mol          0.9948                       0.0079                   0.0087
Critical Pressure, atm          0.9955                       0.37                     0.50
Critical Temperature*, °C       0.9983                       3.23                     4.76
The Overall Complexity Measures Can Discriminate Very Subtle Complexity Features.
Index     Structure 1                Structure 2
SC        28 (5, 8, 9, 5, 1)         30 (5, 9, 10, 5, 1)
OC (in)   111 (12, 28, 41, 25, 5)    135 (16, 40, 49, 25, 5)
TWC       15 (5, 5, 5)               21 (5, 7, 9)
The complexity of structure 2 is higher because it has a more complex cycle. Cyclicity contributes more to complexity than branching.
Some Conclusions. While the six topological indices used show degeneracy and order the isomeric molecules differently, the overall indices are non-degenerate and order the molecules similarly in series of increasing complexity. The sets of overall topological indices produce QSPR models with (sometimes considerably) smaller standard deviations than the corresponding models with molecular connectivity indices. The best model statistics are shown by the overall connectivity indices, followed by the overall Wiener indices. The patterns of structural complexity deserve considerable attention due to their generality.
Molecular Branching. Wiener, 1947: first analyzed some aspects of the branching of the molecular skeleton by fitting experimental data for several properties of alkanes to the deviation of his “path number” W in branched alkanes from that of the linear isomer. Graph invariants tested early as “branching indices” of acyclic molecules, correlating with their properties: graph non-adjacency number, Hosoya, 1971; graph largest eigenvalue, Lovasz and Pelikan, 1973; first and second Zagreb indices, Gutman et al., 1975; molecular branching index, Randić, 1975. References: Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17-20. Wiener, H. Relation of the Physical Properties of the Isomeric Alkanes to Molecular Structure. J. Phys. Chem. 1948, 52, 1082-1089. Rouvray, D. H. and King, R. B., Eds., Topology in Chemistry. Discrete Mathematics of Molecules. Horwood, Chichester, U.K. 2002.
The Branching Patterns of Molecular Structures. The Goal: To go beyond inventing new graph invariants and fitting experimental data, and to try to understand the topological basis of molecular properties. The Hypothesis: The increase in branching complexity is associated with a decrease in the Wiener number W. D. Bonchev and N. Trinajstić, Information Theory, Distance Matrix, and Molecular Branching, J. Chem. Phys. 67 (1977) 4517-4533. D. Bonchev and N. Trinajstić, On Topological Characterization of Molecular Branching, Intern. J. Quantum Chem. Symp. 12 (1978) 293-303. D. Bonchev, Topological Order in Molecules. 1. Molecular Branching Revisited, Theochem 336 (1995) 137-156.
The Rules of Branching. Rule 1: (N: number of vertices in the main chain; j: branch position) Rule 2: Rule 3: ($N_1$: number of vertices in the branch) Rule 4:
Generalization of the Branching Rules. (Figure: vertices u, v, $u_1$, $v_1$.) Five more general rules were derived: three describe mechanisms of formation of new branches, one describes branch transformations related to a vertex degree redistribution, and one shows the topological identity of branch elongation and branch shifting toward a more central position. Conclusions: The number of branches and the number of vertices of higher degree are considerably stronger complexity factors than branch length and branch centrality; however, the role of centrality increases with the size of the system and becomes dominant in polymeric macromolecules. D. Bonchev, Topological Order in Molecules. 1. Molecular Branching Revisited, Theochem 336 (1995) 137-156. O. E. Polansky and D. Bonchev, Commun. Math. Comput. Chem. (MATCH) 1986, 21, 133-186; 1990, 25, 3-40.
Molecular Cyclicity. Similar conjecture: all structural patterns that increase the cyclic complexity of molecules are associated with a decrease in the Wiener number (Bonchev, Mekenyan, Trinajstić, 1979-1983). Cyclic complexity increases by: A) reduction in the cycle size for the creation of more cycles of smaller size (figure labels: 26; 2 x 14; 3 x 10; 4 x 8; 6 x 6); B) a stronger link between the cycles;
C) transforming a linear chain of cycles into a zigzag-like one; D) increasing the number of cycles fused to a common edge (propellerity). (Figure: HOMO and LUMO levels; labels “ΔW 0” and “ΔW < 0, ΔE < 0”; Rules 3, 5-7, 9, 10, 12-15; Rule 1.) E) With a single exception, the 15 rules derived for benzenoid hydrocarbons identify structural transformations that increase their stability. Papers for cyclic complexity: Intern. J. Quantum Chem. 1980, 17, 845-89; 1981, 19, 929-955. Math. Comput. Chem. (MATCH) 1979, 6, 93-115; 1981, 11, 145-168. Croat. Chem. Acta 1983, 56, 237-261.
Topology of Polymers Wiener “infinite” index: the limit for the Wiener number of a polymer having N non-H atoms, normalized per unit distance and unit bond: (N – number of atoms, C – number of cycles) For structure 9: (Bonchev, Mekenyan et al., 1980-1983)
Improved Method for Calculating Wiener Infinite Index. A simple equation incorporating only topological invariants of the monomer unit was derived 10 years later. These are the numbers of atoms $N_1$ and cycles $C_1$ in the monomer unit, as well as the number of bonds D (or the graph distance) between two neighboring monomer units. Examples: D = 2, $N_1$ = 4, $C_1$ = 1, Wiener infinite index = 2/15; D = 4, $N_1$ = 6, $C_1$ = 1, Wiener infinite index = 4/21. T.-S. Balaban, A. T. Balaban, and D. Bonchev, J. Mol. Structure (Theochem) 2001, 535, 81-92.
Equations linking the Wiener number to the radius of gyration and viscosity of polymer melts and solutions. The equations involve the friction coefficient and the number c of polymer chains in a unit volume; g is the Zimm-Stockmayer branching ratio of a branched macromolecule. $R_g^2$ and g are measured by laser light scattering. Example: g (3-arm star) = (3x1 + 3x2) / (3x1 + 2x2 + 1x3) = 9/10 = 0.9. Kirchhoff-number-based generalization of the equations for polymers containing atomic rings. D. Bonchev, E. Markel, and A. Dekmezian, J. Chem. Inf. Comput. Sci. 2001, 41, 1274-1285. D. Bonchev, E. Markel, and A. Dekmezian, Polymer 2002, 43, 203-222.
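The 3-arm star arithmetic can be reproduced by computing Wiener numbers directly. The sketch below assumes, as the worked example suggests, that g is estimated as the ratio of the Wiener number of the branched structure to that of the linear chain with the same number of atoms; the helper names `star` and `linear_chain` are hypothetical.

```python
from collections import deque

def wiener_number(edges):
    """Sum of shortest-path distances over all vertex pairs (BFS from each vertex)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
    return total // 2

def linear_chain(n_atoms):
    return [(i, i + 1) for i in range(n_atoms - 1)]

def star(arms, arm_length):
    """Star polymer graph: vertex 0 is the branch point."""
    edges, label = [], 1
    for _ in range(arms):
        previous = 0
        for _ in range(arm_length):
            edges.append((previous, label))
            previous, label = label, label + 1
    return edges

def branching_ratio(arms, arm_length):
    n_atoms = arms * arm_length + 1
    return wiener_number(star(arms, arm_length)) / wiener_number(linear_chain(n_atoms))

g_small_star = branching_ratio(arms=3, arm_length=1)
```

With longer arms the same ratio decreases toward 7/9, the Zimm-Stockmayer value for a three-arm star, so the four-atom example is best read as an illustration of the counting rather than as a realistic polymer.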
Topology of Crystals. Basic criterion used: Wiener number minimum.
D. Bonchev, O. Mekenyan, and H. Fritsche, An Approach to the Topological Modeling of Crystal Growth, J. Cryst. Growth 1980, 49, 90-96.
D. Bonchev, O. Mekenyan, and H. Fritsche, A Topological Approach to Crystal Vacancy Studies. I. Model Crystallites with a Single Vacancy, Phys. Stat. Sol. (a) 1979, 55, 181-187.
O. Mekenyan, D. Bonchev, and H. Fritsche, A Topological Approach to Crystal Vacancy Studies. II. Model Crystallites with Two and Three Vacancies, Phys. Stat. Sol. (a) 1979, 56, 607-614.
O. Mekenyan, D. Bonchev, and H. Fritsche, A Topological Approach to Crystal Defect Studies, Z. Phys. Chem. (Leipzig) 1984, 265, 959-967.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, Deutung der Magischen Zahlen von Argonclustern als Extremwerte Topologischer Indizes, Z. Chem. 1987, 27, 234.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, On the Topologies of (M13)13 Superclusters of Ruthenium, Rhodium and Gold, J. Less-Common Metals 1988, 141, 137-143.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, Are Small Clusters of Inert-Gas Atoms Polyhedra of Minimum Surfaces? Phys. Stat. Sol. (b) 1988, 148K, 101-104.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, The Optimum Topology of Small Clusters, Z. Phys. Chem. (Leipzig) 1989, 270, 467-476.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, A Topological Approach to Studies of Ordered Structures of Absorbed Gases in Host Lattices (I). The Structure of -PdD0.5, Crystal Res. Technol. 1983, 18, 1075-1081.
Crystal Growth. The detailed sequences of crystal growth were constructed by adding one atom at each step and selecting, from a number of candidate structures, the one with the minimum Wiener number. The reproduced shape is maximally close to the spherical shape typical of free nucleation in the vapor phase and of crystallization under zero-gravity conditions. (Figure: growth stages with W = 1, 8, 48, 369, 972, 5536.)
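The growth procedure described above can be sketched as a greedy search on a lattice. The example assumes a simple cubic lattice with nearest-neighbour bonds, distances measured inside the growing cluster, and a lexicographic tie-break among equally good candidates; none of these details is given on the slide, and the function names are hypothetical.

```python
from collections import deque
from itertools import product

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def neighbours(site):
    x, y, z = site
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in STEPS]

def wiener(sites):
    """Wiener number of the lattice graph induced by the occupied sites."""
    sites = set(sites)
    total = 0
    for source in sites:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            s = queue.popleft()
            for t in neighbours(s):
                if t in sites and t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
        total += sum(dist.values())
    return total // 2

def grow(n_atoms):
    """Add one atom per step, keeping the candidate with the minimum Wiener number."""
    cluster = {(0, 0, 0)}
    while len(cluster) < n_atoms:
        candidates = sorted({t for s in cluster for t in neighbours(s)} - cluster)
        best = min(candidates, key=lambda t: wiener(cluster | {t}))
        cluster.add(best)
    return cluster

def box(a, b, c):
    """Fully occupied a x b x c block of lattice sites."""
    return set(product(range(a), range(b), range(c)))

block_values = [wiener(box(*dims)) for dims in [(2, 1, 1), (2, 2, 1), (2, 2, 2), (3, 3, 2), (3, 3, 3)]]
small_cluster = grow(8)
```

The compact blocks give 1, 8, 48, 369 and 972, which coincide with five of the W values shown for the growth stages. The outcome of the greedy search itself depends on how ties between candidates are broken.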
Crystallization on a substrate with a low surface energy. The crystallization on a substrate with a high surface energy also reproduced the experimentally observed monolayer shape.
Prediction of the most probable locations of crystal vacancies and defect atoms Criterion used: Equations derived for a series of two- and three-dimensional models of crystal lattice with variable vacancy locations. For a simple cubic crystallite having N = 3x3x3 atoms, the variation in the Wiener number is expressed as: where i, j, k are the lattice nodes along the x, y, and z coordinate axes, respectively. ΔW increases when going from volume to face to edge to corner in agreement with thermodynamic theory and quantum chemical calculations.
Modeling of Atomic Clusters. The Wiener number minimum was again used as the criterion. Adding one atom at a time over a certain crystal face and connecting this atom to all face atoms produced cluster genetic lines. Two of the genetic lines resulted in icosahedra, two others yielded a cubo-octahedron, and another line generated an anticubo-octahedron, in agreement with the experimental data. The minimum of the Wiener number in the icosahedron cluster also explained the “magic” number 13, for which a maximum intensity of cluster mass spectra has been observed. The approach correctly predicted the doubly magic metal superclusters [(M13)13]n, where M = ruthenium, rhodium or gold, as well as the stable argon clusters at the magic numbers 13, 19, 23, 26, 29, and 32.
“Small-World” Connectivity. Complex network properties: high connectivity and small diameter. They can be integrated into a single parameter: $B1 = A/D = \langle a_i \rangle / \langle d_i \rangle$, where $\langle a_i \rangle$ is the average vertex degree and $\langle d_i \rangle$ is the average node distance. B1: a quick estimate of network complexity. B2: a much more precise complexity measure. $b_i$: a measure of node centrality.
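A minimal sketch of the B1 ratio, assuming that A is the total adjacency (the sum of vertex degrees) and D is the total distance (the sum, over all vertices, of the distances to every other vertex), so that dividing both by the number of vertices gives the ratio of the two averages. The function name `b1_index` is hypothetical.

```python
from collections import deque

def b1_index(edges):
    """B1 = A / D for a connected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total_adjacency = sum(len(nbrs) for nbrs in adj.values())
    total_distance = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total_distance += sum(dist.values())
    return total_adjacency / total_distance

path4 = b1_index([(1, 2), (2, 3), (3, 4)])
star4 = b1_index([(1, 2), (1, 3), (1, 4)])
complete4 = b1_index([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])
```

On four vertices the value rises from the path to the star to the complete graph, in line with the idea that more links and shorter distances mean higher complexity.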