SMILES Simplified Molecular Input Line Entry System (SMILES) Widely used AND computationally efficient Uses atomic symbols and a set of intuitive rules.


1 SMILES Simplified Molecular Input Line Entry System (SMILES) Widely used AND computationally efficient Uses atomic symbols and a set of intuitive rules Uses hydrogen-suppressed molecular graphs (HSMG)

2 SMILES Bonds Single: - (can be omitted); Double: =; Triple: #; Aromatic: : (can be omitted)

3 Butanols 2-Butanol iso-Butanol tert-Butanol

4 SMILES Branches Represented by enclosure in parentheses Can be nested or stacked Examples: CC(O)CC is 2-Butanol OCC(C)C is iso-Butanol OC(C)(C)C is tert-Butanol
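The nesting and stacking of branches can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `branch_depths` is hypothetical, and it only looks at parentheses, ignoring everything else in the string.

```python
def branch_depths(smiles: str) -> list[int]:
    """Return the nesting depth of each branch, in the order the branches open."""
    depths = []
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
            depths.append(depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
    if depth != 0:
        raise ValueError("unmatched '('")
    return depths


print(branch_depths("CC(O)CC"))     # 2-butanol: one branch
print(branch_depths("OC(C)(C)C"))   # tert-butanol: two stacked branches
print(branch_depths("CC(C(C)C)C"))  # illustrative string with a nested branch
```

Stacked branches show up as repeated depth 1, while a nested branch reaches depth 2.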

5 SMILES Bonds Ethene: C=C; Chloroethene: ClC=C; 1,1-Dichloroethene: ClC(Cl)=C; cis-1,2-Dichloroethene: ClC=CCl (without / and \ the cis geometry is not encoded); Trichloroethene: ClC(Cl)=CCl; Perchloroethene: ClC(Cl)=C(Cl)Cl
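Because SMILES describes a hydrogen-suppressed graph, the hydrogens in strings like these are implied by valence. The sketch below shows one way to recover them. It assumes an acyclic string built only from organic-subset atoms, parentheses, and the bond symbols -, = and #. The `VALENCE` table uses the lowest normal valence of each element, and the name `implicit_hydrogens` is hypothetical. Ring digits, bracket atoms, and aromatic atoms are not handled.

```python
import re

# Lowest normal valence for each organic-subset atom (an assumption of this sketch).
VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
           "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]")


def implicit_hydrogens(smiles: str) -> list[tuple[str, int]]:
    """Return (symbol, implied hydrogen count) for each atom, in input order."""
    atoms = []   # element symbol of each atom
    used = []    # total bond order already used by each atom
    stack = []   # atoms to return to when a branch closes
    prev = None  # index of the atom the next atom bonds to
    order = 1    # an omitted bond symbol means a single bond
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        else:
            atoms.append(tok)
            used.append(0)
            cur = len(atoms) - 1
            if prev is not None:
                used[prev] += order
                used[cur] += order
            prev = cur
            order = 1
    return [(a, VALENCE[a] - u) for a, u in zip(atoms, used)]


print(implicit_hydrogens("C=C"))        # ethene
print(implicit_hydrogens("ClC(Cl)=C"))  # 1,1-dichloroethene
```

For ethene each carbon carries two implied hydrogens. For 1,1-dichloroethene only the terminal carbon has any.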

6 SMILES Atoms Use normal chemical symbols Add punctuation symbols if necessary No super- or subscripts

7 SMILES Symbols String of alphanumeric characters and certain punctuation symbols Terminates at the first space encountered when read left to right The ORGANIC SUBSET: B, C, N, O, P, S, F, Cl, Br, I
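The two rules on this slide (stop at the first space, recognize the organic subset) can be sketched as a small tokenizer. This is an illustration under stated assumptions, not a full SMILES reader: the name `organic_atoms` is hypothetical, and the input is assumed to contain no bracket atoms or lowercase aromatic atoms.

```python
import re

# Organic-subset atoms from the slide. Two-letter symbols come first so that
# "Cl" is not read as "C" followed by a stray "l".
ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]")


def organic_atoms(smiles: str) -> list[str]:
    """Return the organic-subset atom symbols in a SMILES string, in order."""
    # The notation terminates at the first space when read left to right.
    notation = smiles.split(" ", 1)[0]
    return ORGANIC_ATOM.findall(notation)


print(organic_atoms("ClC(Cl)=C(Cl)Cl"))
print(organic_atoms("CC(O)CC 2-butanol"))  # text after the space is ignored
```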

8 Other SMILES Atoms Aliphatic or nonaromatic carbon: C Atom in aromatic ring: lowercase letter Designate ring closure with pairs of matching digits, e.g. c1ccccc1 (or C1=CC=CC=C1) is Benzene, whereas C1CCCCC1 is Cyclohexane

9 SMILES Charges Specify attached hydrogens and charges in square brackets Number of attached hydrogens is the symbol H followed by optional digit

10 SMILES Charges [H+]: proton; [OH-]: hydroxyl anion; [OH3+]: hydronium cation; [Fe++]: iron(II) cation; [NH4+]: ammonium cation
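The bracket notation above can be unpacked with a short regular expression. The sketch below assumes the simple form used on these slides: an element symbol, an optional H with an optional digit, and a run of + or - signs. The name `parse_bracket_atom` is hypothetical, and numeric charges such as +2, isotopes, and chirality marks are not handled.

```python
import re

BRACKET_ATOM = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?)"  # element symbol, e.g. O or Fe
    r"(?:H(?P<hcount>\d?))?"      # optional H followed by an optional digit
    r"(?P<charge>\+*|-*)\]"       # run of + signs or run of - signs
)


def parse_bracket_atom(token: str) -> tuple[str, int, int]:
    """Return (symbol, attached hydrogens, charge) for one bracket atom."""
    m = BRACKET_ATOM.fullmatch(token)
    if m is None:
        raise ValueError(f"unsupported bracket atom: {token!r}")
    h = m.group("hcount")
    # No H at all means 0 hydrogens; H without a digit means 1.
    hydrogens = 0 if h is None else int(h or 1)
    signs = m.group("charge")
    charge = len(signs) if signs.startswith("+") else -len(signs)
    return m.group("symbol"), hydrogens, charge


for token in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(token, parse_bracket_atom(token))
```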

11 SMILES Cyclic Structures Break one single or one aromatic bond in each ring Number in any order Designate ring-breaking atoms by the same digit following the atomic symbol

12 Cyclic Structures Numbers indicate start and stop of ring Same number indicates start and end of the ring, entered immediately following the start/end atoms Only numbers 1 – 9 are used A number should appear only twice Atom can be associated with 2 consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2
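The pairing of ring-closure digits can be traced with a few lines of code. This is a sketch under assumptions: the input uses only organic-subset and lowercase aromatic atoms (no bracket atoms), each digit is a single-character ring label, and the name `ring_closures` is hypothetical. Atoms are numbered from 0 in the order they appear.

```python
import re

# Atoms (two-letter symbols first), ring digits, then any other single character.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFIbcnops]|\d|.")


def ring_closures(smiles: str) -> list[tuple[int, int]]:
    """Return (opening atom index, closing atom index) for each ring bond."""
    open_rings = {}   # digit -> index of the atom that opened it
    closures = []
    atom_index = -1
    for tok in TOKEN.findall(smiles):
        if tok.isdigit():
            if atom_index < 0:
                raise ValueError("ring digit before any atom")
            if tok in open_rings:
                closures.append((open_rings.pop(tok), atom_index))
            else:
                open_rings[tok] = atom_index
        elif tok[0].isalpha():
            atom_index += 1
    if open_rings:
        raise ValueError(f"unclosed ring digit(s): {sorted(open_rings)}")
    return closures


print(ring_closures("c1ccccc1"))        # benzene: one ring bond
print(ring_closures("c12ccccc1cccc2"))  # naphthalene: atom 0 opens both rings
```

For naphthalene the first atom carries the two consecutive digits 1 and 2, so both ring bonds start at atom 0.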

13 Naphthalene c12ccccc1cccc2

14 SMILES Conventions Avoid two consecutive left parentheses if possible Strive for the fewest possible branches Tautomeric bonds are not designated; enter the appropriate form

15 Further Restrictions A branch cannot begin a SMILES notation A branch cannot immediately follow a double- or triple-bond symbol Example: C=(CC)C is invalid, but C(=CC)C or C(CC)=C are valid SMILES
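The two restrictions on this slide are simple enough to check character by character. The following sketch only tests these two rules and says nothing else about validity; the function name `check_branch_rules` and the wording of its messages are hypothetical.

```python
def check_branch_rules(smiles: str) -> list[str]:
    """Return a list of violations of the two branch restrictions."""
    problems = []
    # Rule 1: a branch cannot begin a SMILES notation.
    if smiles.startswith("("):
        problems.append("a branch cannot begin a SMILES notation")
    # Rule 2: a branch cannot immediately follow a double- or triple-bond symbol.
    for i in range(1, len(smiles)):
        if smiles[i] == "(" and smiles[i - 1] in "=#":
            problems.append(
                f"branch directly after '{smiles[i - 1]}' at position {i}"
            )
    return problems


print(check_branch_rules("C=(CC)C"))  # the invalid example from the slide
print(check_branch_rules("C(=CC)C"))  # valid
print(check_branch_rules("C(CC)=C"))  # valid
```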

16 SMILES Fragments Nitro: N(=O)(=O); Nitrate: ON(=O)(=O); Nitrite: ON(=O); Sulfonic acid: S(=O)(=O)O; Cyanide/Nitrile: C#N; Azide: N=N#N; Azido: N+=N-

17 SMILES Metals [Al] [As] [Au] [Be] [Bi] [Cd] [Ca] [Fe] [Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb] [Sn] [Zn] [Zr]

18 Disconnected Structures Indicated by a dot Tetramethylammonium bromide: C[N+](C)(C)C.[Br-]
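Splitting a dot-disconnected notation into its components, and summing the bracketed charges, can be sketched as follows. The example string `[Na+].[Cl-]` (sodium chloride) is an illustration chosen here rather than taken from the slides. The function names are hypothetical, and charges are assumed to be written as repeated signs inside brackets, as on the charges slide.

```python
import re


def split_components(smiles: str) -> list[str]:
    """Split a SMILES notation into its disconnected components."""
    return smiles.split(".")


def net_charge(smiles: str) -> int:
    """Sum the + and - signs found inside square brackets."""
    total = 0
    # Only bracket contents are inspected, so a '-' used as a single-bond
    # symbol outside brackets is not counted as a charge.
    for body in re.findall(r"\[([^\]]*)\]", smiles):
        total += body.count("+") - body.count("-")
    return total


salt = "[Na+].[Cl-]"
print(split_components(salt))
print(net_charge(salt))
```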

19 Isomeric and Chiral SMILES Isomeric configuration indicated by forward and backward slashes: / \ Examples: trans-1,2-dibromoethene: Br/C=C/Br (direction of the slash continues); cis-1,2-dibromoethene: Br/C=C\Br (direction of the slash reverses) Chirality indicated by the symbol @
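The "continues or reverses" rule for slashes can be expressed in a few lines. This sketch only handles the simple pattern shown on the slide, with one slash directly before the first carbon of a C=C bond and one directly after the second carbon. The name `double_bond_geometry` is hypothetical.

```python
import re

# One slash before C=C and one slash after it, as in Br/C=C/Br.
SLASHED_DOUBLE_BOND = re.compile(r"([/\\])C=C([/\\])")


def double_bond_geometry(smiles: str) -> str:
    """Return 'trans' if the slash direction continues, 'cis' if it reverses."""
    m = SLASHED_DOUBLE_BOND.search(smiles)
    if m is None:
        raise ValueError("no slash-marked C=C bond found")
    return "trans" if m.group(1) == m.group(2) else "cis"


print(double_bond_geometry("Br/C=C/Br"))
print(double_bond_geometry("Br/C=C\\Br"))  # backslash escaped in Python source
```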

20 Some Applications JMDraw/SMILESViewer (Christoph Steinbeck) JME Molecular Editor (Peter Ertl) STN Express (SMILES as output) Tripos (dbtranslate: SMILES to MOL) Marvin (Ferenc Csizmadia) CACTVS

21 Another Application SMILESCAS Database Over 103,000 SMILES notations Input CAS Registry Number Leads to SMILES and thence to a structure search

