1 Chemical Structure Representation and Search Systems Lecture 2. Oct 30, 2003 John Barnard Barnard Chemical Information Ltd Chemical Informatics Software.


1 1 Chemical Structure Representation and Search Systems Lecture 2. Oct 30, 2003 John Barnard Barnard Chemical Information Ltd Chemical Informatics Software & Consultancy Services Sheffield, UK

2 2 Lecture 2: Topics to be Covered  Problems for chemical structure representation aromaticity tautomerism multi-centre bonds stereochemistry organometallics and inorganics macromolecules and polymers incompletely-defined substances o Markush Structures

3 3 Structure diagrams and topological graphs

4 4  useful analogy, but not a perfect one ideally, identical graphs mean identical molecules and different graphs mean different molecules  realities of chemical structures cause problems

5 5 Aromaticity  electronic property of certain ring systems, giving enhanced chemical stability  bonds in aromatic rings have properties that are distinct from single and double bonds  generally accepted definition is Hückel rule 4n+2 pi-electrons (n is a small integer)  there are borderline cases  aromaticity causes problems for computer representation different systems deal with it in different ways
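The 4n+2 electron count can be checked mechanically. The following is a minimal sketch; the function name is an assumption made for illustration, and it tests only the electron count, not planarity, conjugation, or any of the borderline cases mentioned on the slide.

```python
def is_huckel_count(pi_electrons: int) -> bool:
    """Return True if pi_electrons equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Print the verdict for a few ring electron counts.
for count in (4, 6, 8, 10):
    print(count, is_huckel_count(count))
```

A count of 6 (benzene) or 10 passes, while 8 does not, which is the "8 ≠ 4n+2" case that comes up with alternating-bond rings.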

6 6 Aromaticity problems  using single and double bonds can give different topological graphs for the same compound  one solution is to use an aromatic bond type
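A small sketch of the problem and of the aromatic-bond-type fix. It assumes a six-membered ring whose atom numbering is fixed (as it would be by substituents, for example in a 1,2-disubstituted benzene); the helper names and the label "ar" are hypothetical and do not come from any particular file format.

```python
def ring_bonds(first_double: int, size: int = 6) -> dict:
    """Bond orders for a ring of alternating bonds.

    first_double is 0 or 1 and selects which of the two alternating
    (Kekule) patterns is produced.
    """
    return {
        frozenset((i, (i + 1) % size)): 2 if i % 2 == first_double else 1
        for i in range(size)
    }


def normalise(bonds: dict) -> dict:
    """Relabel every bond as 'ar' (assumes the input is one alternating ring)."""
    return {pair: "ar" for pair in bonds}


kekule_a = ring_bonds(0)
kekule_b = ring_bonds(1)

# Same compound, but the edge labels differ between the two drawings.
print(kekule_a == kekule_b)
# With a single aromatic bond type the two drawings give the same graph.
print(normalise(kekule_a) == normalise(kekule_b))
```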

7 7 Alternating bonds and aromaticity  Chemical Abstracts Registry System uses a “normalised” bond type for all rings with alternating single and double bonds this includes some systems that are not aromatic (8 ≠ 4n+2) and omits some that are

8 8 Representing aromaticity  some systems represent aromaticity as an atom property SMILES allows use of lower-case atomic symbols for aromatic atoms (adjacent aromatic atoms are assumed to be joined by aromatic bonds)  problem is that aromaticity is really a ring property
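To illustrate the lower-case convention, the sketch below flags which atoms in a SMILES string are written as aromatic. It is a deliberately small tokeniser, not a SMILES parser: it assumes only bracket atoms, Cl, Br, and the one-letter organic-subset symbols occur, and the function name is made up for this example.

```python
import re

# Bracket atoms first, then two-letter halogens, then one-letter symbols.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def aromatic_flags(smiles: str) -> list:
    """Return one boolean per atom: True if the atom symbol is lower-case."""
    flags = []
    for token in ATOM.findall(smiles):
        symbol = token.lstrip("[0123456789")  # drop '[' and any isotope number
        flags.append(symbol[0].islower())
    return flags


print(aromatic_flags("c1ccccc1"))      # benzene written with aromatic atoms
print(aromatic_flags("C1=CC=CC=C1"))   # the same ring written with explicit bonds
print(aromatic_flags("Oc1ccccc1"))     # phenol: the oxygen is not aromatic
```

Note that the flag sits on each atom, so nothing in the string records which ring the aromaticity belongs to; that is the limitation the slide points out.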

9 9 Aromaticity: problem areas Aromaticity is sometimes a matter of degree or opinion Aromatic envelope rings  Outer ring has 10 = 4n+2 pi electrons  fusion bond is not aromatic Exocyclic bonds: right ring has 6 pi electrons (2 from usp, 2 from bond in ring, 2 from bonds in left ring and 0 from exocyclic bond to O)

10 10 Tautomerism  dynamic equilibrium between positional isomers (labile H)  are they different compounds? answer depends on what you want to do with them  can use normalised bonds to represent them by a single graph gets mixed up with ring alternating bonds some tautomers may be aromatic, when others are not

11 11 Tautomerism  tautomerism is a matter of degree  tautomers can be defined in different ways HQ–X=R ⇌ Q=X–RH only certain elements can be Q, X or R o keto-enol tautomers are not recognised by Chemical Abstracts o mono-unsaturated carbon chains are not distinguished by Daylight
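The HQ–X=R to Q=X–RH pattern can be written as a small graph transformation. This is a sketch on a hypothetical dictionary-based connection table (atom symbols, attached hydrogen counts, bond orders); it does not check which elements are allowed as Q, X, or R, which a real system would have to do.

```python
from copy import deepcopy


def bond(a, b):
    """Order-independent key for the bond between atoms a and b."""
    return frozenset((a, b))


def shift_hydrogen(mol, q, x, r):
    """Turn HQ-X=R into Q=X-RH and return the result as a new molecule."""
    bonds = mol["bonds"]
    if mol["h"][q] < 1 or bonds.get(bond(q, x)) != 1 or bonds.get(bond(x, r)) != 2:
        raise ValueError("atoms do not match the HQ-X=R pattern")
    new = deepcopy(mol)
    new["h"][q] -= 1          # hydrogen leaves Q ...
    new["h"][r] += 1          # ... and arrives on R
    new["bonds"][bond(q, x)] = 2
    new["bonds"][bond(x, r)] = 1
    return new


# Propen-2-ol (enol): atom 0 = CH3, 1 = C, 2 = CH2, 3 = OH
enol = {"atoms": ["C", "C", "C", "O"], "h": [3, 0, 2, 1],
        "bonds": {bond(0, 1): 1, bond(1, 2): 2, bond(1, 3): 1}}
# Acetone (keto form), same atom numbering
keto = {"atoms": ["C", "C", "C", "O"], "h": [3, 0, 3, 0],
        "bonds": {bond(0, 1): 1, bond(1, 2): 1, bond(1, 3): 2}}

print(shift_hydrogen(enol, q=3, x=1, r=2) == keto)
```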

12 12 Structure conventions sometimes called “business rules” some chemical groups can be shown in different but equally valid ways conventions are needed to determine which is preferred software may be needed to convert to preferred form

13 13 Structure conventions Getting the structure representation “right” can be very important automatic property prediction o wrong tautomeric form can give poor prediction of solubility, acid dissociation constant etc. receptor site docking o molecular modelling programs “dock” small molecules into protein receptor sites, and calculate score based on hydrogen-bond interactions, charges etc. o wrong ionisation state / tautomer can give misleading results

14 14 Multi-centre bonds  sometimes bonds involve more than 2 atoms graph edges always involve exactly 2  e.g. ferrocene  most systems fudge this sort of structure bond to arbitrary carbon bonds to all 5 carbons bond to dummy atom placed in ring o which itself has dummy bonds to ring atoms
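The three workarounds can be compared as edge lists. The sketch below covers one cyclopentadienyl ring and the metal; the atom labels, the dummy-atom label "D", and the convention names are all assumptions chosen for illustration.

```python
def ring_to_metal_edges(ring_atoms, metal, convention):
    """Return the graph edges that stand in for a ring-to-metal multi-centre bond."""
    if convention == "arbitrary":
        # bond to one arbitrarily chosen ring carbon
        return [(metal, ring_atoms[0])]
    if convention == "all":
        # one ordinary two-atom edge to every ring carbon
        return [(metal, atom) for atom in ring_atoms]
    if convention == "dummy":
        # metal bonded to a dummy atom, which has dummy bonds to the ring atoms
        dummy = "D"
        return [(metal, dummy)] + [(dummy, atom) for atom in ring_atoms]
    raise ValueError("unknown convention: " + convention)


ring = ["C1", "C2", "C3", "C4", "C5"]
for name in ("arbitrary", "all", "dummy"):
    print(name, ring_to_metal_edges(ring, "Fe", name))
```

Each convention gives a different graph for the same compound, so a database has to apply one of them consistently.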

15 15 Stereochemistry  different compounds with identical connectivity  same topology, different topography S-tyrosine R-tyrosine

16 16 Stereochemistry  configuration is often unknown or partially known (relative stereochemistry) or you may have a mixture of stereoisomers o in which one isomer may occur in enantiomeric excess  many different descriptors used by chemists wedge (up) and hatched (down) bonds in structure diagrams Cahn, Ingold, Prelog (CIP) designators (R, S, E, Z) text-based descriptors (stereoparent, or optical rotation)

17 17 Stereochemistry: up/down bonds  can be used as additional “colours” for graph edges many connection table formats have special codes for up and down bonds need to know which end of bond is which  useful for re-generating diagrams for display  can be used to calculate other stereo descriptors

18 18 Up/down bond problems  different patterns of up/down bonds can show the same stereoisomer different graphs, same molecule  some patterns of up and down bonds actually convey no useful information about configuration

19 19 Stereochemistry: CIP designators  R.S. Cahn, C. Ingold, and V. Prelog, Angewandte Chemie Intl. Ed. in English 1966, 5, 385-551  one-letter designator for stereocentres based on rules assigning priorities to groups around it tetrahedral carbons (R, S) double bonds (E, Z)  additional colours for graph nodes or edges useful for distinguishing stereoisomers when absolute configuration is known less useful for matching parts of structures (substructure search) as priority rules can cause designator to change when remote part of structure is changed

20 20 Stereochemistry: ordered “stereovertex” lists  define order of neighbours around stereocentre there are two sets of equivalent orders, corresponding to the two configurations of a tetrahedral carbon atom neighbours are listed around a right-handed spiral
First set: ABCD, ADBC, ACDB, BCAD, BDCA, BADC, CABD, CDAB, CBDA, DACB, DBAC, DCBA
Second set: ADCB, ACBD, ABDC, BACD, BDAC, BCDA, CBAD, CDBA, CADB, DABC, DCAB, DBCA
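The two sets of equivalent orders are the even and the odd permutations of the four neighbours: swapping any two neighbours inverts a tetrahedral centre, so an even number of swaps leaves the configuration unchanged. A minimal sketch that regenerates both sets (the function name is an assumption):

```python
from itertools import permutations


def is_even(order, reference="ABCD"):
    """True if order is an even permutation of reference (counted by inversions)."""
    positions = [reference.index(label) for label in order]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return inversions % 2 == 0


orders = ["".join(p) for p in permutations("ABCD")]
same_configuration = [o for o in orders if is_even(o)]
other_configuration = [o for o in orders if not is_even(o)]

print(len(same_configuration), same_configuration)
print(len(other_configuration), other_configuration)
```

Each list has 12 entries; ADBC falls in the same set as ABCD, while ADCB falls in the other, in agreement with the two groups of orderings listed on the slide.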

21 21 Stereochemistry: stereovertex lists Two alternative approaches: 1.Geometric ordering List neighbours of stereo centre in a predefined order for the geometry  (e.g. right-handed spiral) Advantages: ordering is locally-defined (rest of molecule is irrelevant) stereocentre need not be a single atom Disadvantage: equivalent orderings need to be defined

22 22 Stereochemistry: stereovertex lists 2. Parity value most commonly used approach in practice list neighbours according to an ordering rule (atom numbers in connection table, or CIP priority rules) decide which geometry they conform to: right-handed (clockwise) or left-handed (anti-clockwise) spiral record this as parity value on stereocentre CIP R and S designators are an example of this potential disadvantage: ordering rule may be globally defined (rest of molecule is relevant)

23 23 Stereochemistry: parity values MDL formats: number atoms around stereo centre with 1, 2, 3, and 4 in order of increasing connection table atom number o “implicit” hydrogen atom is considered to be atom 4 view stereo centre so that the bond to atom 4 projects behind the plane formed by atoms 1, 2, and 3 if numbers increase: o clockwise: parity value is 1 o anti-clockwise, parity is 2 parity value stored at node for stereo centre atom o parity 0 = not stereo o parity 3 = unknown stereo
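The viewing rule above can be turned into a signed-volume test. This is only a sketch under stated assumptions: all four neighbours are explicit and have 3D coordinates (the implicit-hydrogen case is left out), and the function name is hypothetical rather than part of any MDL toolkit.

```python
def mdl_parity(neighbours):
    """Parity (1 or 2) for four neighbours given as (atom_number, (x, y, z)).

    Atoms are taken in order of increasing atom number. Looking with the
    highest-numbered atom pointing away, 1 -> 2 -> 3 clockwise gives 1 and
    anti-clockwise gives 2.
    """
    if len(neighbours) != 4:
        raise ValueError("expected four explicit neighbours")
    p1, p2, p3, p4 = (xyz for _, xyz in sorted(neighbours))
    a = [p2[i] - p1[i] for i in range(3)]
    b = [p3[i] - p1[i] for i in range(3)]
    c = [p4[i] - p1[i] for i in range(3)]
    cross = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    volume = sum(cross[i] * c[i] for i in range(3))  # signed volume (a x b) . c
    if volume == 0:
        raise ValueError("neighbours are coplanar")
    return 1 if volume > 0 else 2


# Atoms 2, 5 and 7 lie in the z = 0 plane; atom 9 points away along -z.
centre = [(2, (1.0, 0.0, 0.0)), (5, (-0.5, 0.866, 0.0)),
          (7, (-0.5, -0.866, 0.0)), (9, (0.0, 0.0, -1.0))]
print(mdl_parity(centre))
```

In the example the three lowest-numbered atoms run anti-clockwise when viewed from +z, so the function reports parity 2; exchanging the coordinates of any two neighbours flips the result to 1.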

24 24 Stereochemistry: parity value Stereochemistry in SMILES  clockwise/anticlockwise approach, like MDL  atoms are numbered according to sequence of atoms in SMILES  view from first atom (instead of toward last atom as in MDL) if other three atoms are anticlockwise – use @ if other three atoms are clockwise – use @@ OC(=O)[C@H](N)CC1=CC=C(O)C=C1 OC(=O)[C@@H](N)CC1=CC=C(O)C=C1
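Because the @/@@ tag depends on the order in which the neighbours appear in the string, the same configuration can need either tag. The sketch below works out which tag preserves the configuration when the neighbours are re-listed; the function name and the neighbour labels are assumptions made for this example.

```python
def reorder_tag(tag, old_order, new_order):
    """Return the tag ('@' or '@@') that keeps the configuration unchanged
    when the neighbours are written in new_order instead of old_order."""
    positions = [old_order.index(n) for n in new_order]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    if inversions % 2 == 0:
        return tag                      # even permutation: tag is unchanged
    return "@" if tag == "@@" else "@@"  # odd permutation: tag flips


# L-alanine written as N[C@@H](C)C(=O)O lists the neighbours N, H, CH3, COOH.
old = ["N", "H", "CH3", "COOH"]
# Writing the carboxyl group before the methyl group swaps two neighbours.
new = ["N", "H", "COOH", "CH3"]
print(reorder_tag("@@", old, new))
```

The result is `@`, so N[C@@H](C)C(=O)O and N[C@H](C(=O)O)C describe the same enantiomer.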

25 25 Double bond stereochemistry  depiction of double bonds in a structure diagram usually implies either cis or trans configuration  MDL files use bond type code to indicate 0: use 2D atom co-ordinates to determine cis/trans 3: double bond stereochemistry not specified (other code values are used for up/down/either single bonds)

26 26 Double bond stereo in SMILES / and \ used as “directional” single bonds only meaningful when used on both atoms of a double bond several ways of showing same configuration
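For the simplest written form, X, bond symbol, C=C, bond symbol, Y, the configuration follows from whether the two directional symbols match. The sketch below covers only that linear form (no branches before the double bond), and the function name is an assumption.

```python
def double_bond_geometry(first, second):
    """Classify X<first>C=C<second>Y, where each symbol is '/' or '\\'."""
    allowed = {"/", "\\"}
    if first not in allowed or second not in allowed:
        raise ValueError("bond symbols must be '/' or '\\'")
    # Matching symbols put the substituents on opposite sides of the bond.
    return "trans" if first == second else "cis"


for a in ("/", "\\"):
    for b in ("/", "\\"):
        print("F" + a + "C=C" + b + "F", double_bond_geometry(a, b))
```

The four strings fall into two pairs: F/C=C/F and F\C=C\F are both trans, and F/C=C\F and F\C=C/F are both cis, which is one instance of there being several ways to show the same configuration.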

27 27 Stereovertex lists for double bonds  neighbours of stereocentre have rectangular geometry orderings shown: ABCD, BCDA, CDAB, DABC, ACBD, BDAC, CBDA, DACB neighbours are listed around a right-handed spiral (clockwise)

28 28 Other stereochemistry geometries  Many coordination complexes have other stereochemical geometries e.g.  there are special SMILES rules for these  specification of equivalent geometric orderings defines symmetry properties of each geometry

29 29 Stereochemistry of biphenyls  some stereoisomers occur because of sterically-hindered rotation of a single bond o stereocentre is C–C bond  here geometric ordering of neighbours of stereocentre can specify configuration 3 1 4 2

30 30 Allene stereochemistry  anti-rectangle geometry also applies to allene configuration stereocentre is C=C=C group

31 31 Stereochemistry: conclusions  Many different systems in use  Interconversions between different representations not always easy e.g. wedge bonds → CIP descriptors  Several problems remain incomplete/partially-defined stereochemistry “knotted” structures, helices etc. B. Rohde, “Representation and manipulation of stereochemistry”, in J. Gasteiger (Ed.) Handbook of Chemoinformatics, Vol 1, pp. 206-230. Wiley, 2003

32 32 Other representation complications  Organometallic and co-ordination compounds complex stereochemistry special bond types may be needed (dative bonds etc.) ambiguity over covalent/ionic character of bonds o “business rules” usually needed  Inorganic compounds topological representation often not possible composition may not involve integral ratios between elements

33 33 Macromolecules  in principle can represent all atoms, as for small molecules  some systems use “shortcuts” or “superatoms” for subunits (e.g. amino acids)

34 34 Macromolecules  Each shortcut is defined with appropriate attachment points  ordinary atoms can be mixed with shortcuts  system can expand shortcuts when needed
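Shortcut expansion can be sketched with a lookup table from superatom names to fragments. Here the fragments are SMILES pieces for three L-amino-acid residues, with the attachment points implied by the left (N) and right (C=O) ends of each piece; the table, the function name, and the choice of SMILES as the expanded form are assumptions for illustration.

```python
# Residue fragments written N-terminus to C-terminus; the peptide bond is
# formed simply by writing the next fragment after the carbonyl carbon.
RESIDUES = {
    "Gly": "NCC(=O)",
    "Ala": "N[C@@H](C)C(=O)",
    "Ser": "N[C@@H](CO)C(=O)",
}


def expand(sequence):
    """Expand a list of shortcuts (or literal SMILES pieces) to a full SMILES.

    Names not found in RESIDUES are passed through unchanged, so ordinary
    atoms can be mixed with shortcuts. A closing 'O' completes the
    C-terminal carboxylic acid.
    """
    return "".join(RESIDUES.get(name, name) for name in sequence) + "O"


print(expand(["Gly", "Ala"]))             # dipeptide Gly-Ala
print(expand(["CC(=O)", "Gly", "Ser"]))   # acetyl cap written as ordinary atoms
```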

35 35 Polymers  special problems are presented because properties of polymer can be affected by polymerisation conditions average number of subunits extent of cross-linking ratio between different subunits random / block sequences of subunits etc.  Two main approaches monomer representation structural repeating unit (SRU) representation

36 36 Polymers  Monomer-based representation show original monomer(s) and describe polymerisation conditions in text notes  SRU-based representation show repeating units (as shortcuts), with details of length etc. generally more satisfactory for structure search complications when composition is incompletely defined

37 37 Incompletely-defined substances  unknown stereochemistry  unknown attachment position  unknown repetition

38 38 Markush (“Generic”) structures structures with R-groups shorthand for describing sets of structures with common features

39 39 Markush structures also called “generic” structures very important in chemical patents o inventor claims whole class of related compounds can be used to describe combinatorial libraries can be used as queries in database searches will be discussed in more detail in lecture 5 (Nov 13)
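A Markush structure with fully enumerable R-groups can be expanded into the specific structures it covers. The sketch below uses a SMILES template with named placeholders; the scaffold, the R-group lists, and the function name are invented for this example, and real patent Markush structures usually also have variable positions and open-ended group definitions that cannot be enumerated this way.

```python
from itertools import product


def enumerate_markush(scaffold, r_groups):
    """Yield one SMILES per combination of R-group choices."""
    names = sorted(r_groups)
    for choice in product(*(r_groups[name] for name in names)):
        yield scaffold.format(**dict(zip(names, choice)))


# para-disubstituted benzene with two variable groups
scaffold = "{R1}c1ccc({R2})cc1"
r_groups = {
    "R1": ["C", "CC", "Cl"],   # methyl, ethyl, chloro
    "R2": ["O", "N"],          # hydroxy, amino
}

members = list(enumerate_markush(scaffold, r_groups))
print(len(members))
for smiles in members:
    print(smiles)
```

Three choices for R1 and two for R2 give six specific structures, which shows how quickly the set described by one generic structure grows as R-groups are added.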

40 40 Conclusions from Lecture 2  analogy between chemical structures and topological graphs is not perfect and many problems arise in situations where it breaks down aromaticity and tautomerism stereochemistry  additional complications arise in representing some classes of molecule inorganic and coordination compounds macromolecules and polymers incompletely-defined substances

41 41 Lecture 3: Topics to be Covered  More Graph Theory  Structure Analysis and Processing canonicalisation and symmetry perception ring perception functional group identification structure fingerprints and fragments structure depiction principles of structure searching

