SMILES 2 C371 Lecture Based on Dr. David Wild’s C571 Presentations Fall 2004.


1 SMILES 2 C371 Lecture Based on Dr. David Wild’s C571 Presentations Fall 2004

2 Linear Notations Represent the atoms, bonds, and connectivity as a linear text string SMILES –Concise –Originally designed for manual command-line entry into text-only systems –Now widely used Can be input to a spreadsheet cell, on one line of a text file, or in an Oracle database text field System to generate canonical form of SMILES

3 Review of SMILES Atoms represented by normal chemical symbols (uppercase for aliphatics, lowercase for aromatic) Adjacent atoms imply single bonds Use = for double, # for triple bonds Hydrogens usually implicit Parentheses imply branching Ring closure indicated by numbers

4 SMILES Review (cont’d) Can make Hydrogens explicit Non-organic atoms are put in square brackets, e.g., [Xe] Charged species also in square brackets with a + or -, e.g., [Na+] or [O-] Unknown atoms indicated by a * Stereochemistry represented by

5 SMILES for Tyrosine NC(Cc1ccc(O)cc1)C(=O)O
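The review rules can be checked mechanically on the tyrosine string. The following is a minimal sketch, not a full SMILES parser: the token pattern and the helper names `tokenize` and `summarize` are assumptions made for illustration, and it ignores isotopes, charges inside brackets, and validity of ring closures.

```python
import re

# Hypothetical token pattern covering only the features reviewed on the slides.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]"     # bracket atom, e.g. [Na+] or [nH]
    r"|Br|Cl"          # two-letter organic-subset atoms
    r"|[BCNOPSFI]"     # one-letter aliphatic atoms (uppercase)
    r"|[bcnops]"       # aromatic atoms (lowercase)
    r"|\*"             # unknown atom
    r"|[=#\-:/\\]"     # bond symbols
    r"|[()]"           # branch open / close
    r"|%\d\d|\d"       # ring-closure labels
    r"|\.)"            # disconnection between components
)


def tokenize(smiles):
    """Split a SMILES string into tokens; reject characters the pattern misses."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unrecognised characters in SMILES: " + smiles)
    return tokens


def is_atom(token):
    return token.startswith("[") or token == "*" or token[0].isalpha()


def summarize(smiles):
    """Count atoms, aromatic atoms, ring closures, and branches in one string."""
    tokens = tokenize(smiles)
    atoms = [t for t in tokens if is_atom(t)]
    aromatic = [t for t in atoms if t.lstrip("[")[0].islower()]
    ring_labels = [t for t in tokens if t.isdigit() or t.startswith("%")]
    return {
        "atoms": len(atoms),
        "aromatic_atoms": len(aromatic),
        "ring_closures": len(ring_labels) // 2,  # each ring uses its label twice
        "branches": tokens.count("("),
    }


print(summarize("NC(Cc1ccc(O)cc1)C(=O)O"))
```

For tyrosine this counts 13 non-hydrogen atoms (C9, N, O3), 6 of them aromatic, one ring closure, and three branches; the hydrogens stay implicit.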

6 SMILES for Acetaminophen (Tylenol) CC(=O)Nc1ccc(O)cc1

7 SMILES for Isatin O=c2[nH]c1ccccc1c2=O

8 Canonicalizing SMILES – Morgan Algorithm Each atom has a connectivity value: how many atoms it is connected to That value is replaced by the sum of the connectivity values of its neighbors Continues iteratively, until the number of different values is maximized Atoms are numbered in decreasing order of connectivity value –In case of a tie, other properties are used (e.g. atomic number, bond order, etc.).
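The iteration described on this slide can be sketched in a few lines. This is a minimal illustration, assuming the molecule is given as a hypothetical adjacency dictionary of non-hydrogen atoms and that "maximized" means "stop as soon as a further round no longer increases the number of distinct values". Tie-breaking by atomic number or bond order is left out.

```python
def morgan_values(neighbors):
    """Iterate connectivity values until the count of distinct values stops growing.

    neighbors: dict mapping each atom index to a list of neighbouring atom indices.
    """
    values = {atom: len(nbrs) for atom, nbrs in neighbors.items()}
    n_classes = len(set(values.values()))
    while True:
        # Replace each value by the sum of the neighbours' current values.
        new_values = {
            atom: sum(values[n] for n in nbrs) for atom, nbrs in neighbors.items()
        }
        new_classes = len(set(new_values.values()))
        if new_classes <= n_classes:
            return values  # the previous round was the most discriminating one
        values, n_classes = new_values, new_classes


def morgan_order(neighbors):
    """Atoms in decreasing order of final value (ties keep input order)."""
    values = morgan_values(neighbors)
    return sorted(values, key=lambda atom: -values[atom])


# n-pentane, CCCCC, as a chain of five carbons numbered 0..4
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(morgan_values(pentane))  # {0: 2, 1: 3, 2: 4, 3: 3, 4: 2}
print(morgan_order(pentane))   # [2, 1, 3, 0, 4]
```

For pentane the initial degrees give only two distinct values; one round of summing separates the central carbon from its neighbours, and a second round would merge them again, so the first-round values are kept.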

9 Canonicalizing SMILES – CANGEN Two-stage procedure used by Daylight First stage, CANON, generates a canonical connection table using a modified version of the Morgan Algorithm that produces a tree structure Second stage, GENES, creates a unique SMILES using a depth-first search of the molecular graph tree output by CANON More information – JCICS 29, 1989, 97-101
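To show why a canonical ranking followed by a depth-first walk yields one string regardless of input order, here is a simplified sketch. It is not Daylight's GENES: it handles only acyclic, single-bonded graphs, and the `rank` dictionary is assumed to come from some CANON-like first stage. The function name and data layout are hypothetical.

```python
def write_smiles(symbols, neighbors, rank):
    """Depth-first SMILES for an acyclic, single-bonded molecule.

    symbols:   dict atom index -> element symbol
    neighbors: dict atom index -> list of neighbouring atom indices
    rank:      dict atom index -> canonical rank (lower is visited first)
    """
    visited = set()

    def visit(atom):
        visited.add(atom)
        children = sorted(
            (n for n in neighbors[atom] if n not in visited),
            key=lambda n: rank[n],
        )
        out = symbols[atom]
        for i, child in enumerate(children):
            sub = visit(child)
            # Every child except the last becomes a parenthesised branch.
            out += sub if i == len(children) - 1 else "(" + sub + ")"
        return out

    start = min(symbols, key=lambda a: rank[a])
    return visit(start)


# Isopropanol entered with two different atom numberings.
# Ranks are by chemical role: methyl carbons 1 and 2, central carbon 3, oxygen 4.
first = write_smiles(
    {0: "C", 1: "C", 2: "C", 3: "O"},
    {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]},
    {0: 1, 1: 3, 2: 2, 3: 4},
)
second = write_smiles(
    {0: "O", 1: "C", 2: "C", 3: "C"},
    {0: [3], 1: [3], 2: [3], 3: [0, 1, 2]},
    {0: 4, 1: 2, 2: 1, 3: 3},
)
print(first, second)  # CC(C)O CC(C)O
```

Because the walk consults only the ranks, never the input numbering, both inputs produce the same string. The two methyl groups are symmetric, so the arbitrary tie between them does not change the output.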

10 Representing reactions Need to identify the 2D arrangement of products and reagents (and distinguish them) –Possibly map which starting material atoms map to which product atoms. Other information (e.g., yield, equilibrium constants, conditions) generally stored separately Not all reactions specified stoichiometrically CH4 + 2O2 → CO2 + 2H2O

11 Simple Reaction SMILES Each reagent and product represented as SMILES Reagents on the left of a “>>”; products on the right Individual reagents and products are separated by a “.” CH4 + 2O2 → CO2 + 2H2O Reaction SMILES: C.O=O>>O=C=O.O

12 Reaction SMILES example Agents specified between the two “>” symbols Reaction SMILES: C.O=O>O=[O+]-[O-]>O=C=O.O
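The reactant > agent > product layout can be taken apart with plain string operations. This is a small sketch with a hypothetical helper name; it only splits the text and does not validate the individual SMILES.

```python
def parse_reaction_smiles(rxn):
    """Split a reaction SMILES into its reactant, agent, and product components."""
    fields = rxn.split(">")
    if len(fields) != 3:
        raise ValueError("a reaction SMILES has exactly two '>' separators")

    def components(field):
        # Components within one field are separated by "."; an empty field has none.
        return [part for part in field.split(".") if part]

    reactants, agents, products = (components(f) for f in fields)
    return {"reactants": reactants, "agents": agents, "products": products}


with_agent = parse_reaction_smiles("C.O=O>O=[O+]-[O-]>O=C=O.O")
no_agent = parse_reaction_smiles("C.O=O>>O=C=O.O")
print(with_agent)
print(no_agent)
```

The “>>” form is simply the three-field form with an empty agent field, so one parser covers both.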

13 Reaction SMILES example Note implicit hydrogens Reaction SMILES: C(=O)Cl.NC>>C(=O)NC.Cl

14 Atom-mapping SMIRKS representation Each reactant atom gets a tag (e.g., “C” becomes “[C:1]”) which maps to the same tag in the product. Hydrogens are explicit SMIRKS: [C:1](=[O:2])[Cl:3].[H:99][N:4]([H:100])[C:0]>>[C:1](=[O:2])[N:4]([H:100])[C:0].[Cl:3][H:99]
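A quick consistency check on the atom map is that every tag on the left reappears on the right attached to the same element. The sketch below does this with a regular expression. The pattern and function names are assumptions for illustration, and it handles only simple bracket atoms of the form `[Symbol:n]`, not full SMIRKS syntax.

```python
import re

# Captures the element symbol and the map number from atoms such as [Cl:3].
MAPPED_ATOM = re.compile(r"\[([A-Z][a-z]?|[a-z])[^\]:]*:(\d+)\]")


def atom_map(side):
    """Return {map number: element symbol} for one side of the reaction."""
    return {int(num): symbol for symbol, num in MAPPED_ATOM.findall(side)}


def maps_consistent(smirks):
    """True if both sides carry the same map numbers on the same elements."""
    fields = smirks.split(">")
    if len(fields) != 3:
        raise ValueError("expected reactants>agents>products")
    return atom_map(fields[0]) == atom_map(fields[2])


slide_smirks = (
    "[C:1](=[O:2])[Cl:3].[H:99][N:4]([H:100])[C:0]"
    ">>[C:1](=[O:2])[N:4]([H:100])[C:0].[Cl:3][H:99]"
)
print(maps_consistent(slide_smirks))                 # True
print(maps_consistent("[C:1][Cl:2]>>[C:1][Br:2]"))   # False
```

The second call fails because tag 2 sits on chlorine before the arrow and on bromine after it.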

15 Daylight RS/SMIRKS Sites
Basic reaction representation (Reaction SMILES) –http://www.daylight.com/dayhtml_tutorials/languages/smiles/index.html
SMIRKS introduction –http://www.daylight.com/dayhtml_tutorials/languages/smirks/index.html
SMIRKS theory –http://www.daylight.com/dayhtml/doc/theory/theory.rxn.html
SMIRKS depicter –http://www.daylight.com/daycgi_tutorials/react.cgi

16 Representing generic structures A generic structure is one which, by ambiguity, represents a (possibly infinite) set of possible structures Ambiguity usually takes the form of “R” groups Originally used for representing patents Now used for representing combinatorial libraries too Also known as Markush Structures

17 Specifying a substructure query with SMARTS SMARTS: a superset of SMILES extended to allow partial structures (substructures) and optional parts of molecules to be represented Simple example *C(=O)O where the * marks the attachment point and matches any single atom More information:
–http://www.daylight.com/meetings/summerschool01/course/basics/smarts.html
–http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

18 SMARTS special characters (examples)
  *    Any atom                           ~   Any bond
  a    Aromatic atom                      :   Aromatic bond
  A    Aliphatic atom                     @   Ring bond
  R    Ring atom                          &   Logical AND
  rn   Atom in smallest ring of size n    ;   Logical AND (low prec.)
  Hn   n attached hydrogens               ,   Logical OR
  Xn   n total connections                !   Logical NOT

19 SMARTS examples
  [!C;R]     Any atom in a ring that is not an aliphatic carbon
  [O;H1]     Hydroxyl group (-OH)
  c:c        Two aromatic carbons joined by an aromatic bond
  C~N        Carbon and nitrogen attached by any bond
  *C(=O)O    Carboxyl group

20 Try out a SMARTS search DepictMatch: –http://www.daylight.com/cgi-bin/contrib/depictmatch.cgi Enter a set of SMILES and a SMARTS, and any part of each structure that matches the SMARTS is highlighted As an example, we’ll use the sample dataset described on the following two slides, with *C(=O)O (carboxyl group) and [R]C(=O)O (carboxyl attached to a ring atom) as our SMARTS

21 Sample dataset Acetaminophen, Alprenolol, Amphetamine, Captopril, Chlorpromazine, Diclofenac, Gabapentin, Salicylate

22 Sample Dataset SMILES file
  CC(=O)Nc1ccc(O)cc1 Acetaminophen
  CC(C)NCC(O)COc1ccccc1CC=C Alprenolol
  CC(N)Cc1ccccc1 Amphetamine
  CC(CS)C(=O)N1CCCC1C(=O)O Captopril
  CN(C)CCCN1c2ccccc2Sc3ccc(Cl)cc13 Chlorpromazine
  OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl Diclofenac
  NCC1(CC(=O)O)CCCCC1 Gabapentin
  COC(=O)c1ccccc1O Salicylate

23 Web / Oracle Systems Advantages –Single database for structures and data –No software to install on client machines (except maybe plug-ins like Chime) –Not dependent on (expensive) contract with MDL –Highly customizable Disadvantages –Requires extensive web-based interface software to be written, for registration, searching, etc –Company will have to maintain system internally –Requires current ISIS system to be abandoned

24 Chemistry Cartridges
Daylight DayCart –http://www.daylight.com/products/daycart.html
Tripos Auspyx –http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/auspyx.html
Accelrys Accord for Oracle –http://www.accelrys.com/accord/oracle.html
MDL Direct –http://www.mdl.com/products/framework/rel_chemistry_server/index.jsp
IDBS ActivityBase –http://www.id-bs.com/products/abase/
JChem Cartridge –http://www.jchem.com

25 Example - DayCart Store SMILES as string (VARCHAR2) in Oracle database Cartridge provides extra functions and extensions to functions for searching based on chemical structures Structure search implemented by EXACT function Substructure search implemented by MATCHES function Similarity search implemented by TANIMOTO and EUCLID functions

26 Measuring similarity between molecules Similar Property Principle: “Molecules with similar structure are likely to have similar biological activity” Generally the Tanimoto Coefficient or Euclidean Distance between fingerprints is used

27 Fingerprint Similarity – Tanimoto Also known as Jaccard Coefficient ‘1’s in common / ‘1’s set in either fingerprint 0’s are treated as not significant Similarity is between 0 (dissimilar) and 1 (same) Good cutoff for likely biologically similar molecules is 0.7 or 0.8
Tanimoto Similarity = c / (#a + #b - c), where c = ‘1’s in common, #a = ‘1’s in fingerprint A, #b = ‘1’s in fingerprint B
Example: c = 4, #a = 6, #b = 6, so Tanimoto Similarity = 4 / (6 + 6 - 4) = 0.5
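The formula is easy to verify on a pair of bit strings. The two 12-bit fingerprints below are made up for illustration (the slide's own bit patterns did not survive in this transcript); they were chosen so that #a = 6, #b = 6 and c = 4, matching the worked example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length fingerprints given as '0'/'1' strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")                                       # '1's in A
    b = fp_b.count("1")                                       # '1's in B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == y == "1")   # '1's in common
    if a + b - c == 0:
        raise ValueError("no bits set in either fingerprint")
    return c / (a + b - c)


# Hypothetical fingerprints: six bits set in each, four of them shared.
FP_A = "110110100100"
FP_B = "111110000010"
print(tanimoto(FP_A, FP_B))  # 0.5
```

Positions where both fingerprints are 0 never enter the calculation, which is what "0's are treated as not significant" means in practice.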

28 Fingerprint similarity – Euclidean Pythagorean distance For binary dimensions, equivalent to the square root of the Hamming distance (i.e. square root of the number of bits that are different) 0’s are treated as significant Smaller values mean more similar Example: 4 bits differ between the two fingerprints, so Euclidean distance = sqrt(4) = 2.0
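For binary fingerprints the Euclidean distance reduces to counting mismatched positions. The sketch below uses two made-up 12-bit strings that differ in four positions, to mirror the sqrt(4) example; the fingerprints and function names are illustrative assumptions.

```python
import math


def hamming(fp_a, fp_b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(1 for x, y in zip(fp_a, fp_b) if x != y)


def euclidean(fp_a, fp_b):
    """Euclidean distance between binary fingerprints: sqrt of the Hamming distance."""
    return math.sqrt(hamming(fp_a, fp_b))


# Hypothetical fingerprints differing at four positions.
FP_X = "110110100100"
FP_Y = "111110000010"
print(hamming(FP_X, FP_Y))    # 4
print(euclidean(FP_X, FP_Y))  # 2.0
```

Each differing bit contributes (1 - 0)^2 = 1 to the sum of squares, which is why the sum equals the Hamming distance.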

29 Sample dataset Acetaminophen, Alprenolol, Amphetamine, Captopril, Chlorpromazine, Diclofenac, Gabapentin, Salicylate

30 Sample Dataset SMILES file
  CC(=O)Nc1ccc(O)cc1 Acetaminophen
  CC(C)NCC(O)COc1ccccc1CC=C Alprenolol
  CC(N)Cc1ccccc1 Amphetamine
  CC(CS)C(=O)N1CCCC1C(=O)O Captopril
  CN(C)CCCN1c2ccccc2Sc3ccc(Cl)cc13 Chlorpromazine
  OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl Diclofenac
  NCC1(CC(=O)O)CCCCC1 Gabapentin
  COC(=O)c1ccccc1O Salicylate

31 Oracle table Test for sample dataset
  Smiles                            Name            LogP
  CC(=O)Nc1ccc(O)cc1                Acetaminophen   0.27
  CC(C)NCC(O)COc1ccccc1CC=C         Alprenolol      2.81
  CC(N)Cc1ccccc1                    Amphetamine     1.76
  CC(CS)C(=O)N1CCCC1C(=O)O          Captopril       0.84
  CN(C)CCCN1c2ccccc2Sc3ccc(Cl)cc13  Chlorpromazine  5.20
  OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl    Diclofenac      4.02
  NCC1(CC(=O)O)CCCCC1               Gabapentin
  COC(=O)c1ccccc1O                  Salicylate      2.60

32 DayCart structure search using SQL
  select * from Test where exact(Smiles, 'CC(N)Cc1ccccc1') = 1;
  Smiles                            Name            LogP
  CC(N)Cc1ccccc1                    Amphetamine     1.76

33 DayCart substructure search
  select * from Test where matches(Smiles, '*C(=O)O') = 1;
  Smiles                            Name            LogP
  CC(CS)C(=O)N1CCCC1C(=O)O          Captopril       0.84
  OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl    Diclofenac      4.02
  NCC1(CC(=O)O)CCCCC1               Gabapentin
  COC(=O)c1ccccc1O                  Salicylate      2.60

34 Substructure search for carboxylic acid Acetaminophen, Alprenolol, Amphetamine, Captopril, Chlorpromazine, Diclofenac, Gabapentin, Salicylate

35 DayCart substructure / value search
  select * from Test where (matches(Smiles, '*C(=O)O') = 1) AND (LogP > 1.0);
  Smiles                            Name            LogP
  OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl    Diclofenac      4.02
  COC(=O)c1ccccc1O                  Salicylate      2.60

36 DayCart similarity search Query structure: Aspirin
  select * from TEST where tanimoto(SMILES, 'CC(=O)Oc1ccccc1C(=O)O') > 0.6;
  SMILES                            NAME            LOGP
  COC(=O)c1ccccc1O                  Salicylate      2.60
  CC(=O)Nc1ccc(O)cc1                Acetaminophen   0.27
  CC(N)Cc1ccccc1                    Amphetamine     1.76

37 Similarity search for carboxylic acid Acetaminophen, Alprenolol, Amphetamine, Captopril, Chlorpromazine, Diclofenac, Gabapentin, Salicylate

38 More examples of DayCart hool02/course/admin/daycart_hints.html

