1 PharmID: A New Algorithm for Pharmacophore Identification. Stan Young, Jun Feng, and Ashish Sanil. NISSMPDM, 3 June 2005
2 X-ray Structure. [Figure: protein surface with bound drug; labels: zinc, H2O's hiding around. Note the zinc ion.]
3 Outline: Background; Computational Procedure and Algorithm; Examples; Conclusions
4 Conformation Generation. OMEGA® generates thousands of conformers in a few seconds. It is able to reproduce bioactive conformations. Boström, Greenwood, and Gottfries. J. Mol. Graph. Mod., 2003, 21,
5 Many feature combinations. Exhaustive enumeration of pharmacophore hypotheses. [Table columns: No. of Features | Possible combinations]
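To give a sense of how quickly exhaustive enumeration grows, here is a minimal sketch that counts unordered k-feature subsets of n candidate features. It assumes a hypothesis is simply a subset of features (distances are ignored), and the function names are illustrative. The numbers it produces are not the values from the slide's table.

```python
from math import comb


def count_hypotheses(n_features: int, k: int) -> int:
    """Number of unordered k-feature subsets drawn from n_features candidates."""
    return comb(n_features, k)


def count_all(n_features: int, k_min: int = 2, k_max: int = 4) -> int:
    """Total number of subsets with between k_min and k_max features (inclusive)."""
    return sum(comb(n_features, k) for k in range(k_min, k_max + 1))


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, count_all(n))
```

Adding distance bins to each feature pair multiplies these counts further, which is the motivation for avoiding full enumeration.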
6 Pharmacophore Identification. Active molecules are known, receptor unknown. Assume that all molecules bind in a common manner to the biological target. Difficulties: conformational flexibility; many different combinations of pharmacophoric groups. Two very large search spaces: conformations and feature combinations.
7 Work Flow for Pharmacophore Identification: single-conformer SDF or SMILES -> external conformation generation program -> PharmID -> different pharmacophore hypotheses
8 Our Strategy. To superimpose the molecules in 3D, we first align the bit string for each conformer in 1D. Ideally, the important features and best conformers will be picked out at the same time. Our search is many-to-one, not many-to-many!
9 Computation Procedure 1. Pharmacophore bit string generation 2. Bit string alignment/assessment 3. Hypothesis generation 4. Refinement
10 Feature Definition. Predefined pharmacophore features: HD: Hydrogen Bond Donor; HA: Hydrogen Bond Acceptor; POS: Positive Charge Center; NEG: Negative Charge Center; ARC: Aromatic Center; HYP: Hydrophobic Center. User-defined groups: any functional group can be defined using Daylight® SMARTS strings.
11 Bit String Generation. For each conformer (Conf. 1, Conf. 2, ..., Conf. N), the 3D Atom (group) – Distance – Atom (group) features are encoded as bits F1, ..., Fm.
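The encoding can be sketched as follows: each bit stands for one (feature type, feature type, distance bin) triple, and a conformer sets the bit for every feature pair it contains. This is a minimal illustration, not PharmID's actual code. The bin edges, the half-open bin convention, and the names `BINS`, `BIT_INDEX`, and `conformer_bits` are assumptions made for the example.

```python
from itertools import combinations, combinations_with_replacement
from math import dist

# Feature types from the "Feature Definition" slide.
FEATURES = ["HD", "HA", "POS", "NEG", "ARC", "HYP"]

# Hypothetical non-overlapping distance bins in Å, as half-open [lower, upper).
BINS = [(0, 2), (2, 5), (5, 8), (8, 12), (12, float("inf"))]

# One bit per (unordered feature-type pair, bin index).
BIT_INDEX = {}
for pair in combinations_with_replacement(FEATURES, 2):
    for b in range(len(BINS)):
        BIT_INDEX[(pair, b)] = len(BIT_INDEX)


def bin_of(d):
    """Index of the bin containing distance d."""
    for b, (lo, hi) in enumerate(BINS):
        if lo <= d < hi:
            return b
    raise ValueError(f"distance {d} is outside all bins")


def conformer_bits(features):
    """features: list of (type, (x, y, z)) for one conformer -> list of 0/1."""
    bits = [0] * len(BIT_INDEX)
    for (ta, pa), (tb, pb) in combinations(features, 2):
        # Order the pair the same way BIT_INDEX keys are ordered.
        pair = tuple(sorted((ta, tb), key=FEATURES.index))
        bits[BIT_INDEX[(pair, bin_of(dist(pa, pb)))]] = 1
    return bits
```

For example, a conformer with a donor at the origin, an acceptor 3 Å away, and an aromatic center 6 Å away sets three bits, one per feature pair.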
12 Definition of Distance Bins. Homogeneous, non-overlapping: 0-1, 1-2, 2-3, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 11-12, 12 Å and above. Heterogeneous, non-overlapping: 1-2, 2-5, 5-8, 8-12, 13 Å and above. Overlapping: 1-3, 2-4, 3-5, 4-6, 5-7, 6-8, 7-9, 8-10, 9-11, Å.
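With overlapping bins, one distance can fall into two bins, so small geometric differences between conformers do not flip every bit. The sketch below builds the overlapping scheme listed above (width 2 Å, step 1 Å) and looks up the bins for a distance. Treating bins as half-open intervals is an assumption, and the function names are illustrative.

```python
def make_overlapping_bins(start, stop, width, step):
    """Bins [lo, lo + width) starting at `start`, advancing by `step`, ending by `stop`."""
    bins = []
    lo = start
    while lo + width <= stop:
        bins.append((lo, lo + width))
        lo += step
    return bins


def bins_for_distance(d, bins):
    """Indices of all bins that contain distance d (half-open intervals)."""
    return [i for i, (lo, hi) in enumerate(bins) if lo <= d < hi]


if __name__ == "__main__":
    overlapped = make_overlapping_bins(1, 11, 2, 1)  # 1-3, 2-4, ..., 9-11
    print(overlapped)
    print(bins_for_distance(3.5, overlapped))
```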
13 Data Structure for Input. [Figure: rows of bit strings, one per molecule-conformer pair: M1C1, M1C2, M1C…; M2C1, M2C2, M2C…]
14 The Trick If you know the correct conformation for each molecule, then it is relatively easy to identify the key features. If you know the correct features and distances, then it is easy to identify the correct conformation. Guess one, predict the other, iterate.
15 Given the features, it is easy to find the conformations. [Figure: bit-string rows M1C1, M1C2, M1C…; M2C1, M2C2, M2C…]
16 Given the conformations, it is easy to find the features. [Figure: bit-string rows M1C1, M1C2, M1C…; M2C1, M2C2, M2C…]
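The two "easy" directions can be sketched in a few lines. Given one chosen conformer per molecule, the key features are the bits that are set in nearly all of them; given the key bits, the best conformer of a molecule is the one with the most of those bits set. Both functions and the simple scoring rules are illustrative assumptions, not the scoring PharmID uses.

```python
def feature_frequencies(selected):
    """selected: one bit string (list of 0/1) per molecule -> per-bit frequency."""
    n = len(selected)
    return [sum(column) / n for column in zip(*selected)]


def best_conformer(conformers, key_bits):
    """Index of the conformer (list of 0/1) with the most key bits set."""
    scores = [sum(bits[i] for i in key_bits) for bits in conformers]
    return scores.index(max(scores))


if __name__ == "__main__":
    # Conformations known -> features: bits 0 and 2 are shared by all molecules.
    chosen = [[1, 0, 1, 0], [1, 1, 1, 0], [1, 0, 1, 1]]
    freqs = feature_frequencies(chosen)
    key_bits = [i for i, f in enumerate(freqs) if f >= 0.9]
    # Features known -> conformation: pick the conformer matching the key bits.
    new_molecule = [[0, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1]]
    print(key_bits, best_conformer(new_molecule, key_bits))
```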
17 Bioinformatics Motif Finding using Gibbs Sampling. 1. Remove one sequence. 2. Randomly select one position for each sequence. 3. Calculate probabilities for all positions for the motif “window”. 4. Using the “window” compute probabilities for removed sequence motif position. 5. Repeat the above steps for all sequences until converged. This will be easier to see with pictures.
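The steps above can be sketched as a small site sampler. This is a simplified illustration in the spirit of Gibbs motif sampling, assuming a DNA alphabet, a fixed motif width `w`, a fixed number of sweeps instead of a convergence test, and pseudocounts to avoid zero probabilities. The function name and parameters are hypothetical.

```python
import random


def gibbs_motif(seqs, w, iters=200, seed=0, alphabet="ACGT", pseudo=1.0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    # Randomly select one starting position for each sequence.
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    # Background residue frequencies over all sequences.
    total = sum(len(s) for s in seqs)
    bg = {a: (sum(s.count(a) for s in seqs) + pseudo) / (total + pseudo * len(alphabet))
          for a in alphabet}
    for _ in range(iters):
        for k, s in enumerate(seqs):
            # Remove sequence k and build the window profile from the others.
            counts = [{a: pseudo for a in alphabet} for _ in range(w)]
            for m, (t, p) in enumerate(zip(seqs, pos)):
                if m == k:
                    continue
                for i in range(w):
                    counts[i][t[p + i]] += 1
            denom = len(seqs) - 1 + pseudo * len(alphabet)
            # Weight every candidate position in the removed sequence by
            # the ratio of window probability to background probability.
            weights = []
            for start in range(len(s) - w + 1):
                ratio = 1.0
                for i in range(w):
                    a = s[start + i]
                    ratio *= (counts[i][a] / denom) / bg[a]
                weights.append(ratio)
            # Sample the new position in proportion to its weight.
            pos[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

Sampling the new position rather than always taking the maximum is what lets the search move away from a poor starting guess.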
18 Objective Function. W: bit string length; c_{i,j}: count of residue j in position i; q_{i,j}: residue frequency at position i for residue j; p_j: background frequency of residue j; J: number of residue types (20 for protein, 4 for DNA or RNA). The window is W x 20.
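The formula itself is not present in this transcript. A form consistent with the symbols defined on this slide, and with the log-likelihood ratio score commonly used in Gibbs motif sampling, is shown below. Treat it as a reconstruction under that assumption rather than the slide's exact expression.

```latex
F = \sum_{i=1}^{W} \sum_{j=1}^{J} c_{i,j} \, \log \frac{q_{i,j}}{p_{j}}
```

F is large when the residues observed in the window differ strongly from what the background frequencies alone would predict.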
19 Alignment Algorithm. Mostly used in sequence alignment to find the common motif (W x 20 window): ... TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT GCCTCAGGATCCAGCACACATTATCACAAACTTAGTGTCCA CATTATCACAAACTTAGTGTCCATCCATCACTGCTGACCCT ... Fast and sensitive, less likely to fall into a local minimum. Lawrence, et al. (1993) Science, 262,
20 PharmID Algorithm using Gibbs. 1. Remove one compound. 2. Start with a random conformer for other compounds. 3. Calculate probabilities for feature importance. 4. Compute conformation probabilities for omitted compound. 5. Repeat steps 1-4 until convergence. Again, pictures will make this clear.
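A minimal sketch of these steps on bit strings follows. It is not the authors' implementation: the per-bit independence model, the pseudocount, the fixed number of sweeps in place of a convergence test, and the name `gibbs_select` are all assumptions made for illustration.

```python
import random
from math import exp, log


def gibbs_select(mols, iters=100, seed=0, pseudo=0.5):
    """mols: list of molecules, each a list of conformer bit strings (lists of 0/1).
    Returns one sampled conformer index per molecule."""
    rng = random.Random(seed)
    n_bits = len(mols[0][0])
    # Start with a random conformer for every compound.
    sel = [rng.randrange(len(m)) for m in mols]
    # Background frequency of each bit over all conformers of all molecules.
    all_confs = [c for m in mols for c in m]
    bg = [(sum(c[i] for c in all_confs) + pseudo) / (len(all_confs) + 2 * pseudo)
          for i in range(n_bits)]
    for _ in range(iters):
        for k, confs in enumerate(mols):
            # Remove compound k; estimate bit frequencies from the others.
            others = [mols[m][sel[m]] for m in range(len(mols)) if m != k]
            q = [(sum(o[i] for o in others) + pseudo) / (len(others) + 2 * pseudo)
                 for i in range(n_bits)]
            # Log-odds score of each conformer of the removed compound.
            logw = []
            for c in confs:
                score = 0.0
                for i, bit in enumerate(c):
                    if bit:
                        score += log(q[i] / bg[i])
                    else:
                        score += log((1 - q[i]) / (1 - bg[i]))
                logw.append(score)
            # Convert to sampling weights; subtracting the max avoids overflow.
            top = max(logw)
            weights = [exp(v - top) for v in logw]
            sel[k] = rng.choices(range(len(confs)), weights=weights)[0]
    return sel
```

Bits with a high frequency in the final selection relative to the background are the candidate pharmacophore features, and `sel` gives the conformer picked for each molecule.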
21 Gibbs Sampling: Fingerprints Movement. [Figure: possible conformers Conf_1, Conf_2, Conf_3 for each molecule; rows Mol_… with bit positions …, 9, 18; bit string 1_ _ _ _]
22 Bit String Alignment. Only 2 residue types (0, 1). Rigid molecules that have only one or a few conformers can speed up the alignment and help to determine the best set of features.
23 Hypothesis Generation. Why? Features may not be part of the same pharmacophore. How? Clique detection (Bron-Kerbosch algorithm). A clique is a set of points in which every pair is connected.
24 Hypothesis Generation in Selected Conformers: Clique Detection. [Figure: pharmacophore features as nodes; two-point pharmacophores identified by Gibbs sampling as edges; discarded two-point pharmacophores marked.] A pharmacophore hypothesis should be a fully connected graph.
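Clique detection can be sketched with the basic Bron-Kerbosch recursion (no pivoting). In this illustration the nodes are pharmacophore features and each edge is a two-point pharmacophore kept by the Gibbs sampling step. The edge list is made up for the example, and `maximal_cliques` is a hypothetical helper name.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Collect all maximal cliques extending r, using candidates p and excluded x."""
    if not p and not x:
        out.append(sorted(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def maximal_cliques(edges):
    """edges: iterable of (node, node) pairs -> sorted list of maximal cliques."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return sorted(out)


if __name__ == "__main__":
    # Hypothetical two-point pharmacophores: HD, HA, and ARC are all mutually
    # connected, while HYP is only connected to ARC.
    edges = [("HD", "HA"), ("HA", "ARC"), ("HD", "ARC"), ("ARC", "HYP")]
    print(maximal_cliques(edges))
```

Here the three-feature clique would be a candidate hypothesis, and the leftover ARC-HYP pair could be discarded by filtering on a minimum clique size.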
25 Hypothesis Generation: Output. Pharmacophore 1: Members: …(Mol. ID); Features: Hydrogen Bond Donor, Hydrogen Bond Acceptor, … Pharmacophore 2: Members: …; Features: … …
26 Refinement
   For all molecules
     For all conformers
       For all hypotheses generated
         Test each qualified conformer against each hypothesis
       End For
       If new hypothesis found
         Insert the new hypothesis into the list
       End If
     End For
   End For
27 Benchmarking: Test Datasets. 1. Bit string alignment: bit strings. 2. Single binding mode: Angiotensin-Converting Enzyme (ACE) inhibitors. 3. Multiple binding modes/mechanisms: dopamine receptor inhibitors (D2/D4).
28 Example 1: A Toy Dataset (Gibbs Sampling Only). 20 x 20 bit strings, mimicking 20 molecules, each with 20 conformers. Each bit string is 20 bits long. Computation time: <1 sec. Result: 1_ _ _ _ _ _ _ _ …
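A dataset of this shape can be mocked up as follows. The slide does not say how the toy bit strings were built, so the random background density, the planted bit positions in `motif`, and the function name `make_toy` are all hypothetical choices for illustration.

```python
import random


def make_toy(n_mols=20, n_confs=20, n_bits=20, motif=(3, 9, 18), density=0.2, seed=0):
    """Random bit strings with the motif bits planted in one conformer per molecule.

    Returns (mols, truth): mols[m][c] is a list of 0/1, and truth[m] is the
    index of the conformer of molecule m that carries the planted motif.
    """
    rng = random.Random(seed)
    mols, truth = [], []
    for _ in range(n_mols):
        confs = [[1 if rng.random() < density else 0 for _ in range(n_bits)]
                 for _ in range(n_confs)]
        k = rng.randrange(n_confs)
        for i in motif:
            confs[k][i] = 1  # plant the shared features in the "bioactive" conformer
        mols.append(confs)
        truth.append(k)
    return mols, truth


if __name__ == "__main__":
    mols, truth = make_toy()
    print(len(mols), len(mols[0]), len(mols[0][0]))
```

Because `truth` records which conformer holds the planted bits, a conformer-selection method can be scored by how many molecules it recovers.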
29 Example 2: ACE Inhibitors. 78 active compounds. OMEGA® from OpenEye® is used to generate multiple conformers. Two RMSD cutoffs used: 2.0 Å: 4,613 conformers generated; 1.0 Å: 46,268 conformers generated.
30 ACE Inhibitors: Results. Using 4,613 conformers, 55/78 molecules (about 71%) contain the expected pharmacophore. Using 46,268 conformers, 65/78 molecules (about 83%) contain the expected pharmacophore.
31 Example 2: ACE inhibitors: Best Identified Pharmacophore. [Figure: inter-feature distance ranges 2.84 ~ 4.50 Å, 4.51 ~ 5.70 Å, 4.99 ~ 6.77 Å]
32 Example 2: ACE inhibitors Other possible pharmacophore
33 Example 3: Testing on Multiple Binding Modes (D2, D4 ligands)
34 Example 3: Dopamine antagonists Two pharmacophores were extracted from one data set!
35 Conclusion. Traditional methods: exhaustive enumeration of pharmacophores, limited coverage of conformational space. "Many to many" limits search. PharmID: selective enumeration of pharmacophores, better coverage of conformational space. Each search is "many to one".
36 Acknowledgements. Coworkers: Stan Young, Jun Feng, Ashish Sanil. OMEGA is a product from OpenEye Scientific Software Inc. Support from Hereditary Disease Foundation. Become a NISS affiliate!