SMILES. Simplified molecular input line entry specification The simplified molecular input line entry specification or SMILES is a specification for unambiguously.


1 SMILES

2 Simplified molecular input line entry specification The simplified molecular input line entry specification, or SMILES, is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.

3 SMILES Simplified Molecular Input Line Entry System (SMILES) Widely used AND computationally efficient Uses atomic symbols and a set of intuitive rules Uses hydrogen-suppressed molecular graphs (HSMG)

4 Canonical SMILES and Isomeric SMILES The term Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation. A common application of Canonical SMILES is indexing and ensuring uniqueness of molecules in a database. The term Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.

5 Graph-based definition In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
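The procedure on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not a canonical SMILES writer: it assumes a connected, hydrogen-suppressed graph with single bonds only and at most nine rings. The function name write_smiles and the atom-list/bond-list input format are invented here for illustration.

```python
def write_smiles(atoms, bonds, start=0):
    """atoms: list of element symbols; bonds: list of (i, j) single bonds."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    visited = set()
    children = {i: [] for i in neighbors}   # spanning-tree edges
    labels = {i: [] for i in neighbors}     # ring-closure digits per atom
    closed = set()                          # bonds broken to open a cycle

    def span(atom, parent):
        # Depth-first search: tree edges become children, every other
        # bond is a broken cycle bond that gets a numeric label.
        visited.add(atom)
        for nbr in neighbors[atom]:
            if nbr == parent:
                continue
            if nbr not in visited:
                children[atom].append(nbr)
                span(nbr, atom)
            elif frozenset((atom, nbr)) not in closed:
                closed.add(frozenset((atom, nbr)))
                digit = str(len(closed))
                labels[atom].append(digit)
                labels[nbr].append(digit)

    def emit(atom):
        # Print the atom, its ring labels, then its subtrees; all but
        # the last subtree are wrapped in parentheses as branches.
        text = atoms[atom] + "".join(labels[atom])
        kids = children[atom]
        for kid in kids[:-1]:
            text += "(" + emit(kid) + ")"
        if kids:
            text += emit(kids[-1])
        return text

    span(start, None)
    return emit(start)


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(write_smiles(["C"] * 6, ring))
print(write_smiles(["C", "C", "O", "C", "C"], [(0, 1), (1, 2), (1, 3), (3, 4)]))
```

The two calls use cyclohexane and 2-butanol as inputs. Starting the traversal at a different atom gives a different but equally valid string, which is why a separate canonicalization step is needed to obtain a unique SMILES.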

6 SMILES Bonds Single: - (can be omitted). Double: =. Triple: #. Aromatic: : (can be omitted).

7 SMILES Branches Represented by enclosure in parentheses. Can be nested or stacked. Examples: CC(O)CC is 2-butanol; OCC(C)C is isobutanol; OC(C)(C)C is tert-butanol.

8 SMILES Bonds Ethene: C=C. Chloroethene: ClC=C. 1,1-Dichloroethene: ClC(Cl)=C. cis-1,2-Dichloroethene: ClC=CCl (this string does not itself encode the cis geometry). Trichloroethene: ClC(Cl)=CCl. Perchloroethene: ClC(Cl)=C(Cl)Cl.

9 SMILES Symbols String of alphanumeric characters and certain punctuation symbols Terminates at the first space encountered when read left to right The ORGANIC SUBSET: B, C, N, O, P, S, F, Cl, Br, I
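As a rough sketch of how a reader might split such a string into symbols, the Python below stops at the first space and recognizes the organic-subset atoms. The tokenize function and the TOKEN pattern are illustrative assumptions, not part of any SMILES toolkit; the pattern also accepts bracket atoms, lowercase aromatic atoms, bond symbols, parentheses, dots, and ring digits so that the other strings in these slides pass through it.

```python
import re

# Two-letter symbols (Cl, Br) must be tried before one-letter ones.
TOKEN = re.compile(
    r"Cl|Br|[BCNOPSFI]"      # organic subset
    r"|[bcnops]"             # aromatic atoms (lowercase)
    r"|\[[^\]]+\]"           # bracket atom, kept as one token
    r"|[-=#:/\\().]"         # bonds, branches, disconnection
    r"|\d"                   # ring-closure digit
)


def tokenize(line):
    # The notation ends at the first space when read left to right.
    smiles = line.split(" ", 1)[0]
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("ClC(Cl)=CCl trichloroethene"))
```

Anything after the first space, here the name, is ignored.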

10 Other SMILES Atoms Aliphatic or nonaromatic carbon: C Atom in aromatic ring: lowercase letter Designate ring closure with pairs of matching digits, e.g. c1ccccc1 is Benzene, whereas C1CCCCC1 is Cyclohexane

11 SMILES Charges Specify attached hydrogens and charges in square brackets Number of attached hydrogens is the symbol H followed by optional digit

12 SMILES Charges [H+]: proton. [OH-]: hydroxyl anion. [OH3+]: hydronium cation. [Fe++]: iron(II) cation. [NH4+]: ammonium cation.
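The bracket rules on the two slides above can be illustrated with a small parser. This is a hedged sketch: parse_bracket_atom and the BRACKET pattern are invented for this example and cover only the forms shown here (element symbol, optional H count, optional charge), not isotopes or chirality.

```python
import re

# symbol, optional "H" with optional digit, optional charge (++, --, +2, -1 ...)
BRACKET = re.compile(r"\[([A-Z][a-z]?)(?:H(\d*))?(\++|-+|[+-]\d+)?\]")


def parse_bracket_atom(text):
    match = BRACKET.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")
    symbol, h_digits, charge_text = match.groups()

    # "H" with no digit means one attached hydrogen; no "H" means none.
    if h_digits is None:
        hydrogens = 0
    else:
        hydrogens = int(h_digits) if h_digits else 1

    # Charge is either repeated signs ("++") or a sign plus a number ("+2").
    if not charge_text:
        charge = 0
    else:
        sign = 1 if charge_text[0] == "+" else -1
        rest = charge_text[1:]
        charge = sign * (int(rest) if rest.isdigit() else len(charge_text))
    return symbol, hydrogens, charge


for atom in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(atom, parse_bracket_atom(atom))
```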

13 SMILES Cyclic Structures Break one single or one aromatic bond in each ring Number in any order – Designate ring-breaking atoms by the same digit following the atomic symbol

14 Cyclic Structures Numbers indicate the start and stop of a ring. The same number marks the start and end of the ring, entered immediately following the start/end atoms. Only numbers 1 – 9 are used. A number should appear only twice. An atom can be associated with 2 consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2
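To make the digit pairing concrete, the sketch below walks a string and reports which atoms each pair of matching digits joins. The function name ring_bonds and its zero-based atom numbering are assumptions made for this example; it handles only single-digit labels, as on the slide, and counts Cl, Br, and bracket atoms as one atom each.

```python
def ring_bonds(smiles):
    open_rings = {}   # digit -> index of the atom that opened the ring
    pairs = []        # (opening atom, closing atom) for each ring bond
    atom = -1         # index of the most recently read atom
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            i = smiles.index("]", i)      # bracket atom counts once
            atom += 1
        elif smiles[i:i + 2] in ("Cl", "Br"):
            atom += 1
            i += 1                        # skip the second letter
        elif ch.isalpha():
            atom += 1
        elif ch.isdigit():
            if ch in open_rings:
                pairs.append((open_rings.pop(ch), atom))
            else:
                open_rings[ch] = atom
        i += 1
    if open_rings:
        raise ValueError(f"unclosed ring label(s): {sorted(open_rings)}")
    return pairs


print(ring_bonds("c1ccccc1"))
print(ring_bonds("c12ccccc1cccc2"))
```

In the second call the first atom carries both labels 1 and 2, so it takes part in both ring-closing bonds.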

15 SMILES Conventions Avoid two consecutive left parentheses if possible Strive for the fewest number of possible branches Tautomeric bonds are not designated; enter the appropriate form

16 Further Restrictions A branch cannot begin a SMILES notation A branch cannot immediately follow a double- or triple-bond symbol Example: C=(CC)C is invalid, but C(=CC)C or C(CC)=C are valid SMILES
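These two restrictions are simple enough to check mechanically. The helper below is a minimal sketch (the name violates_branch_rules is hypothetical) that tests only the two rules on this slide and nothing else about validity.

```python
def violates_branch_rules(smiles):
    # Rule 1: a branch cannot begin a SMILES notation.
    if smiles.startswith("("):
        return True
    # Rule 2: a branch cannot immediately follow a double- or triple-bond symbol.
    for previous, current in zip(smiles, smiles[1:]):
        if current == "(" and previous in "=#":
            return True
    return False


for s in ["C=(CC)C", "C(=CC)C", "C(CC)=C"]:
    print(s, "invalid" if violates_branch_rules(s) else "ok")
```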

17 SMILES Fragments Nitro: N(=O)(=O). Nitrate: ON(=O)(=O). Nitrite: ON(=O). Sulfonic acid: S(=O)(=O)O. Cyanide/Nitrile: C#N. Azide: N=N#N. Azido: N+=N-

18 SMILES Metals [Al] [As] [Au] [Be] [Bi] [Cd] [Ca] [Fe] [Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb] [Sn] [Zn] [Zr]

19 Disconnected Structures Tetramethyl ammonium bromide: C[N+](C)(C)C.[Br-]
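The dot separates the parts of a disconnected structure, so splitting on it recovers the individual ions. The sketch below uses sodium chloride, [Na+].[Cl-], as an illustrative input that is not taken from the slides. The helper names components and net_charge are assumptions, and the charge count only understands repeated-sign charges such as + or ++ inside brackets.

```python
import re


def components(smiles):
    # Each dot-separated part is its own connected structure.
    return smiles.split(".")


def net_charge(component):
    # Count charge signs inside square brackets only; a "-" outside
    # brackets is a single bond, not a charge.
    charge = 0
    for atom in re.findall(r"\[[^\]]*\]", component):
        charge += atom.count("+") - atom.count("-")
    return charge


parts = components("[Na+].[Cl-]")
print(parts, [net_charge(p) for p in parts])
```

A salt written this way should have component charges that sum to zero, which makes a quick sanity check on hand-entered strings.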

20 Isomeric and Chiral SMILES Isomeric configuration indicated by forward and backward slashes: / \ Examples: – trans-1,2-dibromoethene: Br/C=C/Br – cis-1,2-dibromoethene: Br/C=C\Br Chirality indicated by the “@” symbol
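For the simple pattern used in the two examples, where one slash sits directly before and one directly after a C=C unit, the geometry can be read off by comparing the two slashes: matching directions give trans, opposite directions give cis. The function below is a sketch limited to that pattern; the name double_bond_geometry is made up for this example, and branched or ring-containing cases are not handled.

```python
import re

# One direction mark before the first carbon and one after the second.
PATTERN = re.compile(r"([/\\])C=C([/\\])")


def double_bond_geometry(smiles):
    match = PATTERN.search(smiles)
    if match is None:
        return "unspecified"
    first, second = match.groups()
    return "trans" if first == second else "cis"


print(double_bond_geometry("Br/C=C/Br"))
print(double_bond_geometry("Br/C=C\\Br"))
print(double_bond_geometry("BrC=CBr"))
```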

21 Another Application SMILESCAS Database http://esc.syrres.com/interkow/smilecas.htm Over 103,000 SMILES notations Input CAS Registry Number Leads to SMILES and thence to a structure search

22 Example 1 CC(C(C)(C)(Br))C

23 Example 2

24 Example 3

25 Example 4

26 Example 5

27 Example 6

28 Example 7

29 Example 8

30 Example 9

31 Example 10

32 Example 11

33 Example 12

34 Example 13

35 Example 14

36 Example 15

37 Example 16

