Published by Bethanie Cummings. Modified over 8 years ago.
1
1 Chemical Structure Representation and Search Systems Lecture 2. Oct 30, 2003 John Barnard Barnard Chemical Information Ltd Chemical Informatics Software & Consultancy Services Sheffield, UK
2
2 Lecture 2: Topics to be Covered Problems for chemical structure representation aromaticity tautomerism multi-centre bonds stereochemistry organometallics and inorganics macromolecules and polymers incompletely-defined substances o Markush Structures
3
3 Structure diagrams and topological graphs
4
4 useful analogy, but not a perfect one identical graphs identical molecules different graphs different molecules realities of chemical structures cause problems
5
5 Aromaticity electronic property of certain ring systems, giving enhanced chemical stability bonds in aromatic rings have properties that are distinct from single and double bonds generally accepted definition is Hückel rule 4n+2 pi-electrons (n is a small integer) there are borderline cases aromaticity causes problems for computer representation different systems deal with it in different ways
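The Hückel count on this slide can be checked mechanically. The following is a minimal sketch, not part of the original lecture; the function name `satisfies_huckel` is an assumption made for illustration, and it tests only the electron count, not planarity or conjugation.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """Return True if the count has the form 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Pi-electron counts for a few ring sizes (illustrative values only).
for count in (4, 6, 8, 10):
    print(count, satisfies_huckel(count))
```

A count of 6 or 10 passes the test, while 4 and 8 do not, which is why an eight-membered ring of alternating bonds is treated as a non-aromatic case later in the lecture.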
6
6 Aromaticity problems using single and double bonds can give different topological graphs for the same compound one solution is to use an aromatic bond type
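A small sketch of the problem, using an ortho-disubstituted benzene ring as an assumed example (the slide's own diagram is not reproduced in this transcript). The two Kekulé forms put a different bond order between the two substituted ring atoms, so the edge-labelled graphs differ; relabelling every ring bond with one aromatic type makes them identical. All names here are hypothetical.

```python
def ring_bonds(orders):
    """Map each ring bond {i, i+1 mod n} to its bond order."""
    n = len(orders)
    return {frozenset((i, (i + 1) % n)): order for i, order in enumerate(orders)}


def with_aromatic_type(bonds):
    """Replace every ring bond order with a single 'aromatic' label."""
    return {edge: "aromatic" for edge in bonds}


# Ring atoms are numbered 0..5; assume the two substituents sit on atoms 0 and 1.
kekule_a = ring_bonds([2, 1, 2, 1, 2, 1])  # bond 0-1 drawn as double
kekule_b = ring_bonds([1, 2, 1, 2, 1, 2])  # bond 0-1 drawn as single
substituted_bond = frozenset((0, 1))

print(kekule_a[substituted_bond], kekule_b[substituted_bond])
print(kekule_a == kekule_b)
print(with_aromatic_type(kekule_a) == with_aromatic_type(kekule_b))
```

Because the bond between the substituted atoms is double in one drawing and single in the other, no renumbering of atoms can make the two single/double-bond graphs match.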
7
7 Alternating bonds and aromaticity Chemical Abstracts Registry System uses a “normalised” bond type for all rings with alternating single and double bonds this includes some systems that are not aromatic (8 ≠ 4n+2) and omits some that are
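The mismatch between "alternating bonds" and "aromatic" can be illustrated with a short sketch. This is not the Registry System's actual algorithm, only an assumed toy test for strict alternation around a ring, compared against the 4n+2 count; the ring examples and function names are chosen for illustration.

```python
def has_alternating_bonds(orders):
    """True if bond orders listed around the ring strictly alternate 1, 2, 1, 2, ..."""
    n = len(orders)
    if n == 0 or n % 2 != 0 or any(o not in (1, 2) for o in orders):
        return False
    return all(orders[i] != orders[(i + 1) % n] for i in range(n))


def satisfies_huckel(pi_electrons):
    """True if the count has the form 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def pi_electron_count(orders, lone_pair_electrons=0):
    """Two electrons per double bond, plus any lone-pair electrons donated to the ring."""
    return 2 * orders.count(2) + lone_pair_electrons


# (bond orders around the ring, lone-pair electrons contributed by a heteroatom)
rings = {
    "benzene": ([2, 1, 2, 1, 2, 1], 0),
    "cyclooctatetraene": ([2, 1, 2, 1, 2, 1, 2, 1], 0),
    "furan": ([2, 1, 2, 1, 1], 2),
}

for name, (orders, lone_pair) in rings.items():
    alternating = has_alternating_bonds(orders)
    huckel = satisfies_huckel(pi_electron_count(orders, lone_pair))
    print(name, "alternating:", alternating, "4n+2:", huckel)
```

Under these assumptions cyclooctatetraene alternates without meeting the 4n+2 count, while furan meets the count without alternating, which matches the two kinds of discrepancy the slide describes.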
8
8 Representing aromaticity some systems represent aromaticity as an atom property SMILES allows use of lower-case atomic symbols for aromatic atoms (adjacent aromatic atoms are assumed to be joined by aromatic bonds) problem is that aromaticity is really a ring property
9
9 Aromaticity: problem areas Aromaticity is sometimes a matter of degree or opinion Aromatic envelope rings Outer ring has 10 = 4n+2 pi electrons fusion bond is not aromatic Exocyclic bonds: right ring has 6 pi electrons 2 from usp, 2 from bond in ring, 2 from bonds in left ring and 0 from exocyclic bond to O)
10
10 Tautomerism dynamic equilibrium between positional isomers (labile H) are they different compounds? answer depends on what you want to do with them can use normalised bonds to represent them by a single graph gets mixed up with ring alternating bonds some tautomers may be aromatic, when others are not
11
11 Tautomerism tautomerism is a matter of degree tautomers can be defined in different ways HQ–X=R Q=X–RH only certain elements can be Q, X or R o keto-enol tautmers are not recognised by Chemical Abstracts o mono-unsaturated carbon chains are not distinguished by Daylight
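The HQ–X=R to Q=X–RH shift can be written as a small transformation on a three-atom record. This is a minimal sketch under assumed names (`Triad`, `shift_hydrogen`, `mirror`); it does not reproduce the element restrictions used by Chemical Abstracts or Daylight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triad:
    """Three connected atoms Q-X-R with hydrogen counts on the end atoms."""
    q: str
    x: str
    r: str
    h_on_q: int
    h_on_r: int
    order_qx: int
    order_xr: int


def shift_hydrogen(t: Triad) -> Triad:
    """Apply HQ-X=R -> Q=X-RH: move one H from Q to R and swap the bond orders."""
    if t.h_on_q < 1 or t.order_qx != 1 or t.order_xr != 2:
        raise ValueError("triad does not match the HQ-X=R pattern")
    return Triad(t.q, t.x, t.r, t.h_on_q - 1, t.h_on_r + 1, 2, 1)


def mirror(t: Triad) -> Triad:
    """Read the same triad from the other end (R-X-Q)."""
    return Triad(t.r, t.x, t.q, t.h_on_r, t.h_on_q, t.order_xr, t.order_qx)


# Keto form H3C-C=O (Q = C, X = C, R = O) and its enol H2C=C-OH.
keto = Triad("C", "C", "O", h_on_q=3, h_on_r=0, order_qx=1, order_xr=2)
enol = shift_hydrogen(keto)
print(enol)
# The reverse shift is the same rule applied from the oxygen end.
print(mirror(shift_hydrogen(mirror(enol))) == keto)
```

Restricting which elements may play Q, X and R, as the slide notes, would amount to an extra check at the top of `shift_hydrogen`.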
12
12 Structure conventions sometimes called “business rules” some chemical groups can be shown in different but equally valid ways conventions are needed to determine which is preferred software may be needed to convert to preferred form
13
13 Structure conventions Getting the structure representation “right” can be very important automatic property prediction o wrong tautomeric form can give poor prediction of solubility, acid dissociation constant etc. receptor site docking o molecular modelling programs “dock” small molecules into protein receptor sites, and calculate score based on hydrogen- bond interactions, charges etc. o wrong ionisation state / tautomer can give misleading results
14
14 Multi-centre bonds sometimes bonds involve more than 2 atoms graph edges always involve exactly 2 e.g. ferrocene most systems fudge this sort of structure bond to arbitrary carbon bonds to all 5 carbons bond to dummy atom placed in ring o which itself has dummy bonds to ring atoms
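The three workarounds listed above can be compared as plain edge lists. This is an illustrative sketch only; the atom labels, the strategy names and the `ferrocene` helper are assumptions, and hydrogens are left out.

```python
def cp_ring(prefix):
    """Five ring carbons and the five ring bonds of one cyclopentadienyl ring."""
    atoms = [f"{prefix}{i}" for i in range(1, 6)]
    bonds = [(atoms[i], atoms[(i + 1) % 5]) for i in range(5)]
    return atoms, bonds


def metal_bonds(ring_atoms, strategy, prefix):
    """Edges joining Fe to one ring under each workaround."""
    if strategy == "arbitrary_carbon":
        return [("Fe", ring_atoms[0])]
    if strategy == "all_carbons":
        return [("Fe", atom) for atom in ring_atoms]
    if strategy == "dummy_atom":
        dummy = f"{prefix}_dummy"  # pseudo-atom placed in the ring
        return [("Fe", dummy)] + [(dummy, atom) for atom in ring_atoms]
    raise ValueError(f"unknown strategy: {strategy}")


def ferrocene(strategy):
    """Edge list for Fe plus two rings; every edge still joins exactly two nodes."""
    bonds = []
    for prefix in ("A", "B"):
        atoms, ring = cp_ring(prefix)
        bonds += ring + metal_bonds(atoms, strategy, prefix)
    return bonds


def degree(bonds, atom):
    return sum(atom in bond for bond in bonds)


for strategy in ("arbitrary_carbon", "all_carbons", "dummy_atom"):
    bonds = ferrocene(strategy)
    print(strategy, "edges:", len(bonds), "Fe degree:", degree(bonds, "Fe"))
```

The same compound ends up with a different number of edges and a different degree at the metal under each convention, which is why a database has to pick one and apply it consistently.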
15
15 Stereochemistry different compounds with identical connectivity same topology, different topography S-tyrosine R-tyrosine
16
16 Stereochemistry configuration is often unknown or partially known (relative stereochemistry) or you may have a mixture of stereoisomers o in which one isomer may occur in enantiomeric excess many different descriptors used by chemists wedge (up) and hatched (down) bonds in structure diagrams Cahn, Ingold, Prelog (CIP) designators (R, S, E, Z) text-based descriptors (stereoparent, or optical rotation)
17
17 Stereochemistry: up/down bonds can be used as additional “colours” for graph edges many connection table formats have special codes for up and down bonds need to know which end of bond is which useful for re-generating diagrams for display can be used to calculate other stereo descriptors
18
18 Up/down bond problems different patterns of up/down bonds can show the same stereo- isomer different graphs, same molecule some patterns of up and down bonds actually convey no useful information about configuration
19
19 Stereochemistry: CIP designators R.S. Cahn, C. Ingold, and V. Prelog, Angewandte Chemie Intl. Ed. in English 1966, 5, 385-551 one-letter designator for stereocentres based on rules assigning priorities to groups around it tetrahedral carbons (R, S) double bonds (E, Z) additional colours for graph nodes or edges useful for distinguishing stereoisomers when absolute configuration is known less useful for matching parts of structures (substructure search) as priority rules can cause designator to change when remote part of structure is changed
20
20 Stereochemistry: ordered “stereovertex” lists define order of neighbours around stereocentre there are two sets of equivalent orders, corresponding to the two configurations of a tetrahedral carbon atom A B C D A D B C A C D B B C A D B D C A B A D C C A B D C D A B C B D A D A C B D B A C D C B A A D C B A C B D A B D C B A C D B D A C B C D A C B A D C D B A C A D B D A B C D C A B D B C A neighbours are listed around a right-handed spiral
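The two sets of twelve orders are the even and odd permutations of the four neighbours, so they can be regenerated rather than stored. A minimal sketch, with `is_even` as an assumed helper name:

```python
from itertools import permutations


def is_even(order, reference):
    """True if `order` is an even permutation of `reference` (inversion count is even)."""
    idx = [reference.index(item) for item in order]
    inversions = sum(
        1
        for i in range(len(idx))
        for j in range(i + 1, len(idx))
        if idx[i] > idx[j]
    )
    return inversions % 2 == 0


reference = ("A", "B", "C", "D")
same_configuration = [p for p in permutations(reference) if is_even(p, reference)]
other_configuration = [p for p in permutations(reference) if not is_even(p, reference)]

print(len(same_configuration), len(other_configuration))
print(" ".join(same_configuration[1]))
```

Two stereovertex lists therefore describe the same tetrahedral configuration exactly when one is an even permutation of the other; swapping any two neighbours moves the list into the other set.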
21
21 Stereochemistry: stereovertex lists Two alternative approaches: 1.Geometric ordering List neighbours of stereo centre in a predefined order for the geometry (e.g. right-handed spiral) Advantages: ordering is locally-defined (rest of molecule is irrelevant) stereocentre need not be a single atom Disadvantage: equivalent orderings need to be defined
22
22 Stereochemistry: stereovertex lists 2.Parity value most common used approach in practice list neighbours according to an ordering rule atom numbers in connection table CIP priority rules decide which geometry they conform to right-handed (clockwise) or left-handed (anti- clockwise) spiral record this as parity value on stereocentre CIP R and S designators are an example of this potential disadvantage: ordering rule may be globally defined (rest of molecule is relevant)
23
23 Stereochemistry: parity values MDL formats: number atoms around stereo centre with 1, 2, 3, and 4 in order of increasing connection table atom number o “implicit” hydrogen atom is considered to be atom 4 view stereo centre so that the bond to atom 4 projects behind the plane formed by atoms 1, 2, and 3 if numbers increase: o clockwise: parity value is 1 o anti-clockwise, parity is 2 parity value stored at node for stereo centre atom o parity 0 = not stereo o parity 3 = unknown stereo
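The viewing rule above can be turned into a signed-volume test when 3D coordinates are available. This is a sketch derived from the rule as stated on the slide, not code taken from any MDL toolkit; the function name and the coordinate dictionary are assumptions, and an implicit hydrogen would need to be given an explicit position and the highest atom number before calling it.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def parity_from_coordinates(neighbours):
    """neighbours maps connection-table atom number -> (x, y, z) for the four neighbours.

    Returns 1 if atoms 1, 2, 3 (by increasing atom number) run clockwise when
    the fourth atom points away from the viewer, and 2 if they run anti-clockwise.
    """
    if len(neighbours) != 4:
        raise ValueError("exactly four neighbours are required")
    p1, p2, p3, p4 = (neighbours[number] for number in sorted(neighbours))
    volume = _dot(_sub(p1, p4), _cross(_sub(p2, p4), _sub(p3, p4)))
    if abs(volume) < 1e-9:
        raise ValueError("neighbours are coplanar; parity is undefined")
    # With atom 4 behind the plane, a clockwise 1 -> 2 -> 3 gives a negative volume.
    return 1 if volume < 0 else 2


# Viewer on +z; atom 9 (highest number) points away along -z.
clockwise = {2: (1.0, 0.0, 0.0), 5: (-0.5, -0.866, 0.0), 7: (-0.5, 0.866, 0.0), 9: (0.0, 0.0, -1.0)}
print(parity_from_coordinates(clockwise))
```

Exchanging the positions of any two neighbours flips the sign of the volume and therefore the parity value, which is the same even/odd behaviour seen in the stereovertex lists.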
24
24 Stereochemistry: parity value Stereochemistry in SMILES clockwise/anticlockwise approach, like MDL atoms are numbered according to sequence of atoms in SMILES view from first atom (instead of toward last atom as in MDL) if other three atoms are anticlockwise – use @ if other three atoms are clockwise – use @@ OC(=O)[C@H](N)CC1=CC=C(O)C=C1 OC(=O)[C@@H](N)CC1=CC=C(O)C=C1
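Because the tag depends on the order in which neighbours appear in the string, rewriting a SMILES with a different atom order may require switching between @ and @@. A minimal sketch of that bookkeeping follows; `retag` and the neighbour labels are assumptions, and the implicit hydrogen in `[C@H]` is taken to sit immediately after the preceding atom in the neighbour order.

```python
def _is_even(old_order, new_order):
    """True if new_order is an even permutation of old_order."""
    idx = [old_order.index(item) for item in new_order]
    inversions = sum(
        1
        for i in range(len(idx))
        for j in range(i + 1, len(idx))
        if idx[i] > idx[j]
    )
    return inversions % 2 == 0


def retag(tag, old_order, new_order):
    """Return the chirality tag that keeps the same configuration under a new neighbour order."""
    if tag not in ("@", "@@"):
        raise ValueError("tag must be '@' or '@@'")
    if sorted(old_order) != sorted(new_order):
        raise ValueError("both orders must list the same neighbours")
    if _is_even(old_order, new_order):
        return tag
    return "@@" if tag == "@" else "@"


# Neighbour order in OC(=O)[C@H](N)CC1=CC=C(O)C=C1: carboxyl C, H, N, CH2.
old = ["C_carboxyl", "H", "N", "CH2"]
# Rewriting the string to start from nitrogen: N, H, CH2, carboxyl C.
print(retag("@", old, ["N", "H", "CH2", "C_carboxyl"]))
# Swapping only the last two neighbours is an odd permutation.
print(retag("@", old, ["C_carboxyl", "H", "CH2", "N"]))
```

This is the reason two SMILES strings with different tags can still denote the same stereoisomer, and why canonicalisation has to fix the atom order before tags are compared.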
25
25 Double bond stereochemistry depiction of double bonds in a structure diagram usually implies either cis or trans configuration MDL files use bond type code to indicate 0: use 2D atom co-ordinates to determine cis/trans 3: double bond stereochemistry not specified (other code values are used for up/down/either single bonds)
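When the code is 0, the configuration has to be read off the drawing. One way to do that, sketched here under assumed names and not taken from any MDL software, is to check whether the two substituents lie on the same side of the line through the double-bond atoms.

```python
def _cross_z(u, v):
    """z-component of the 2D cross product."""
    return u[0] * v[1] - u[1] * v[0]


def cis_or_trans(substituent1, c1, c2, substituent2, tolerance=1e-6):
    """Classify substituent1-c1=c2-substituent2 from 2D coordinates.

    Returns 'cis' if both substituents lie on the same side of the c1-c2 line,
    'trans' if they lie on opposite sides, and 'undefined' if either one is
    drawn collinear with the double bond.
    """
    axis = (c2[0] - c1[0], c2[1] - c1[1])
    side1 = _cross_z(axis, (substituent1[0] - c1[0], substituent1[1] - c1[1]))
    side2 = _cross_z(axis, (substituent2[0] - c2[0], substituent2[1] - c2[1]))
    if abs(side1) < tolerance or abs(side2) < tolerance:
        return "undefined"
    return "cis" if side1 * side2 > 0 else "trans"


c1, c2 = (0.0, 0.0), (1.0, 0.0)
print(cis_or_trans((-0.5, 0.87), c1, c2, (1.5, 0.87)))
print(cis_or_trans((-0.5, 0.87), c1, c2, (1.5, -0.87)))
```

The 'undefined' branch corresponds to drawings where the depiction carries no usable cis/trans information, the situation that stereo code 3 records explicitly.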
26
26 Double bond stereo in SMILES / and \ used as “directional” single bonds only meaningful when used on both atoms of a double bond several ways of showing same configuration
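For the simplest case, a string of the form X/C=C/Y with one marked substituent on each side, the configuration follows from whether the two marks match. The sketch below handles only that restricted pattern; the function name and regular expression are assumptions and it is not a general SMILES parser.

```python
import re

# One substituent symbol, a directional bond, C=C, a directional bond, one substituent symbol.
_PATTERN = re.compile(r"[A-Z][a-z]?([/\\])C=C([/\\])[A-Z][a-z]?")


def simple_double_bond_configuration(smiles):
    """Return 'trans' or 'cis' for strings such as F/C=C/F; raise for anything else."""
    match = _PATTERN.fullmatch(smiles)
    if match is None:
        raise ValueError("only the pattern X/C=C/Y (with / or \\) is supported")
    first_mark, second_mark = match.groups()
    return "trans" if first_mark == second_mark else "cis"


for smiles in ("F/C=C/F", "F\\C=C\\F", "F/C=C\\F", "F\\C=C/F"):
    print(smiles, simple_double_bond_configuration(smiles))
```

Two different strings map to each configuration here, which is a small instance of the slide's point that the same configuration can be written in several ways.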
27
27 Stereovertex lists for double bonds neighbours of stereocentre have rectangular geometry A B C D B C D A C D A B D A B C A C B D B D A C C B D A D A C B neighbours are listed around a right-handed spiral (clockwise)
28
28 Other stereochemistry geometries Many coordination complexes have other stereochemical geometries e.g. there are special SMILES rules for these specification of equivalent geometric orderings defines symmetry properties of each geometry
29
29 Stereochemistry of biphenyls some stereoisomers occur because of sterically-hindered rotation of a single bond o stereocentre is C–C bond here geometric ordering of neighbours of stereocentre can specify configuration 3 1 4 2
30
30 Allene stereochemistry anti-rectangle geometry also applies to allene configuration stereocentre is C=C=C group
31
31 Stereochemistry: conclusions Many different systems in use Interconversions between different representations not always easy e.g. wedge bonds → CIP descriptors Several problems remain incomplete/partially-defined stereochemistry “knotted” structures, helices etc. B. Rohde, “Representation and manipulation of stereochemistry”, in J. Gasteiger (Ed.) Handbook of Chemoinformatics, Vol 1, pp. 206-230. Wiley, 2003
32
32 Other representation complications Organometallic and co-ordination compounds complex stereochemistry special bond types may be needed (dative bonds etc.) ambiguity over covalent/ionic character of bonds o “business rules” rules usually needed Inorganic compounds topological representation often not possible composition may not involve integral ratios between elements
33
33 Macromolecules in principle can represent all atoms, as for small molecules some systems use “shortcuts” or “superatoms” for subunits (e.g. amino acids)
34
34 Macromolecules Each shortcut is defined with appropriate attachment points ordinary atoms can be mixed with shortcuts system can expand shortcuts when needed
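Shortcut expansion can be sketched with a lookup table of residue fragments. This is an assumed toy scheme, not any particular vendor's format: each fragment is a SMILES piece whose attachment points are simply its two ends (N-terminal side first), and any token not found in the table is passed through as ordinary atoms.

```python
# Hypothetical shortcut table: amino-acid residues written N-terminus first,
# each ending at the carbonyl carbon so that the next residue can attach to it.
SHORTCUTS = {
    "Gly": "NCC(=O)",
    "Ala": "N[C@@H](C)C(=O)",
    "Ser": "N[C@@H](CO)C(=O)",
}


def expand(tokens):
    """Expand a sequence of shortcut names and raw SMILES fragments into one SMILES string."""
    return "".join(SHORTCUTS.get(token, token) for token in tokens)


# A dipeptide, with the C-terminal OH written as an ordinary atom.
print(expand(["Gly", "Ala", "O"]))
# Ordinary atoms mixed with shortcuts: an N-terminal acetyl group.
print(expand(["CC(=O)", "Gly", "O"]))
```

Keeping the sequence as shortcuts and expanding only on demand is what lets a system store large peptides compactly while still supporting atom-level operations.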
35
35 Polymers special problems are presented because properties of polymer can be affected by polymerisation conditions average number of subunits extent of cross-linking ratio between different subunits random / block sequences of subunits etc. Two main approaches monomer representation structural repeating unit (SRU) representation
36
36 Polymers Monomer-based representation show original monomer(s) and describe polymerisation conditions in text notes SRU-based representation show repeating units (as shortcuts), with details of length etc. generally more satisfactory for structure search complications when composition is incompletely defined
37
37 Incompletely-defined substances unknown stereochemistry unknown attachment position unknown repetition
38
38 Markush (“Generic”) structures structures with R-groups shorthand for describing sets of structures with common features
39
39 Markush structures also called “generic” structures very important in chemical patents o inventor claims whole class of related compounds can be used to describe combinatorial libraries can be used as queries in database searches will be discussed in more detail in lecture 5 (Nov 13)
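The link between a Markush structure and a combinatorial library can be shown by enumerating every combination of R-group choices. A minimal sketch, assuming a SMILES template with named placeholders; the template, the R-group lists and the function name are all invented for illustration.

```python
from itertools import product


def enumerate_markush(template, r_groups):
    """Yield one SMILES string per combination of R-group alternatives."""
    names = sorted(r_groups)
    for choice in product(*(r_groups[name] for name in names)):
        yield template.format(**dict(zip(names, choice)))


# A para-disubstituted benzene core with two variable positions.
template = "{R1}c1ccc({R2})cc1"
r_groups = {
    "R1": ["C", "CC"],       # methyl or ethyl
    "R2": ["O", "N", "Cl"],  # hydroxy, amino or chloro
}

library = list(enumerate_markush(template, r_groups))
print(len(library))
for smiles in library:
    print(smiles)
```

Even this tiny example shows how the number of specific structures grows as the product of the R-group list lengths, which is why real patent Markush structures are usually searched in generic form rather than enumerated.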
40
40 Conclusions from Lecture 2 analogy between chemical structures and topological graphs is not perfect and many problems arise in situations where it breaks down aromaticity and tautomerism stereochemistry additional complications arise in representing some classes of molecule inorganic and coordination compounds macromolecules and polymers incompletely-defined substances
41
41 Lecture 3: Topics to be Covered More Graph Theory Structure Analysis and Processing canonicalisation and symmetry perception ring perception functional group identification structure fingerprints and fragments structure depiction principles of structure searching