1 PharmID: A New Algorithm for Pharmacophore Identification. Stan Young, Jun Feng, and Ashish Sanil. NISS MPDM, 3 June 2005.



2 X-ray Structure. Figure labels: protein surface; bound drug; zinc ion; H2O's hiding around.

3 Outline. Background; Computational Procedure and Algorithm; Examples; Conclusions.

4 Conformation Generation. OMEGA® generates thousands of conformers in a few seconds. It is able to reproduce bioactive conformations. Boström, Greenwood, and Gottfries. J. Mol. Graph. Mod., 2003, 21,

5 Many Feature Combinations. Exhaustive enumeration of pharmacophore hypotheses. Table columns: No. of Features; Possible Combinations.
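To give a feel for why exhaustive enumeration grows quickly, the sketch below counts the k-feature subsets of n candidate features with the binomial coefficient. The function name and the sizes used are illustrative assumptions; they are not the values from the slide's table.

```python
from math import comb


def n_hypotheses(n_features: int, k: int) -> int:
    """Number of distinct k-point feature subsets among n_features candidates."""
    return comb(n_features, k)


# Illustrative sizes only (assumed, not taken from the slide).
counts = {n: n_hypotheses(n, 4) for n in (5, 10, 20)}
print(counts)
```

Each subset would still have to be checked against every conformer of every molecule, which is where the second search space comes in.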

6 Pharmacophore Identification. Active molecules are known, receptor unknown. Assume that all molecules bind in a common manner to the biological target. Difficulties: conformational flexibility; many different combinations of pharmacophoric groups. Two very large search spaces: conformations and feature combinations.

7 Work Flow for Pharmacophore Identification. Single-conformer SDF or SMILES -> external conformation generation program -> PharmID -> different pharmacophore hypotheses.

8 Our Strategy. To superimpose the molecules in 3D, we first align the bit string for each conformer in 1D. Ideally, the important features and best conformers will be picked out at the same time. Our search is many to one, not many to many!

9 Computation Procedure 1. Pharmacophore bit string generation 2. Bit string alignment/assessment 3. Hypothesis generation 4. Refinement

10 Feature Definition. Predefined pharmacophore features: HD: Hydrogen Bond Donor; HA: Hydrogen Bond Acceptor; POS: Positive Charge Center; NEG: Negative Charge Center; ARC: Aromatic Center; HYP: Hydrophobic Center. User-defined groups: any functional group can be defined using Daylight® SMARTS strings.

11 Bit String Generation. Each conformer (Conf. 1, Conf. 2, ..., Conf. N) is encoded as a bit string F1 ... Fm, with one bit per 3D atom (group) – distance – atom (group) feature.
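A minimal sketch of how one conformer could be turned into such a bit string, assuming each pharmacophore feature is given as a (type, xyz) pair and that distances fall into 1 Å bins with an open-ended last bin. The names `BIT_INDEX`, `distance_bin`, and `bit_string` are hypothetical and are not PharmID's own API.

```python
from itertools import combinations, combinations_with_replacement
from math import dist

FEATURE_TYPES = ("HD", "HA", "POS", "NEG", "ARC", "HYP")  # predefined feature types
N_BINS = 13  # assumed: 0-1, 1-2, ..., 11-12 Å, plus "12 Å and above"

# One bit per (type, distance bin, type) triple; type pairs are unordered.
TYPE_PAIRS = list(combinations_with_replacement(FEATURE_TYPES, 2))
BIT_INDEX = {
    (a, k, b): pair_no * N_BINS + k
    for pair_no, (a, b) in enumerate(TYPE_PAIRS)
    for k in range(N_BINS)
}


def distance_bin(d: float) -> int:
    """Map a distance in Å to its 1 Å bin; the last bin is open-ended."""
    return min(int(d), N_BINS - 1)


def bit_string(features):
    """Set one bit for every pair of features found in a single conformer."""
    bits = [0] * len(BIT_INDEX)
    for (type_a, pos_a), (type_b, pos_b) in combinations(features, 2):
        a, b = sorted((type_a, type_b), key=FEATURE_TYPES.index)
        bits[BIT_INDEX[(a, distance_bin(dist(pos_a, pos_b)), b)]] = 1
    return bits


# Made-up coordinates for illustration only.
conformer = [("HD", (0.0, 0.0, 0.0)), ("HA", (3.5, 0.0, 0.0)), ("ARC", (0.0, 6.2, 0.0))]
bits = bit_string(conformer)
```

Because every conformer maps onto the same fixed vocabulary of bits, conformers of different molecules can be compared position by position.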

12 Definition of Distance Bins. Homogeneous, non-overlapped: 0-1, 1-2, 2-3, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 11-12, 12 Å and above. Heterogeneous, non-overlapped: 1-2, 2-5, 5-8, 8-12, 13 Å and above. Overlapped: 1-3, 2-4, 3-5, 4-6, 5-7, 6-8, 7-9, 8-10, 9-11, Å.
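The difference between the binning schemes can be sketched as follows. The bin edges are assumptions loosely based on the slide (the homogeneous bins are taken to be contiguous 1 Å intervals, and distances below the first heterogeneous edge share bin 0); the function names are hypothetical.

```python
from bisect import bisect_right


def nonoverlapping_bin(d, edges):
    """Index of the single bin containing d.

    `edges` are ascending upper bounds; anything past the last edge
    falls into one final open-ended bin.
    """
    return bisect_right(edges, d)


def overlapping_bins(d, intervals):
    """Indices of every [lo, hi) interval containing d (possibly several)."""
    return [i for i, (lo, hi) in enumerate(intervals) if lo <= d < hi]


HOMOGENEOUS_EDGES = list(range(1, 13))              # assumed: 0-1, 1-2, ..., 11-12, 12+
HETEROGENEOUS_EDGES = [2, 5, 8, 12]                 # assumed: <2, 2-5, 5-8, 8-12, 12+
OVERLAPPED = [(lo, lo + 2) for lo in range(1, 10)]  # 1-3, 2-4, ..., 9-11

print(nonoverlapping_bin(3.5, HOMOGENEOUS_EDGES))
print(nonoverlapping_bin(3.5, HETEROGENEOUS_EDGES))
print(overlapping_bins(3.5, OVERLAPPED))
```

With overlapped bins a distance sets more than one bit, so two conformers whose distances sit on either side of a bin boundary can still share a bit.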

13 Data Structure for Input. Figure: bit strings stacked by molecule and conformer (M1C1, M1C2, M1C...; M2C1, M2C2, M2C...).

14 The Trick If you know the correct conformation for each molecule, then it is relatively easy to identify the key features. If you know the correct features and distances, then it is easy to identify the correct conformation. Guess one, predict the other, iterate.

15 Given the features, easy to find the conformations. Figure: bit strings stacked by molecule and conformer (M1C1, M1C2, M1C...; M2C1, M2C2, M2C...).

16 Given the conformations, easy to find the features. Figure: bit strings stacked by molecule and conformer (M1C1, M1C2, M1C...; M2C1, M2C2, M2C...).

17 Bioinformatics Motif Finding using Gibbs Sampling. 1. Remove one sequence. 2. Randomly select one position for each sequence. 3. Calculate probabilities for all positions for the motif “window”. 4. Using the “window” compute probabilities for removed sequence motif position. 5. Repeat the above steps for all sequences until converged. This will be easier to see with pictures.

18 Objective Function. W: bit string length; c_{i,j}: count of residue j in position i; q_{i,j}: residue frequencies, position i, residue j; p_j: residue background frequencies; J: residue types, 20 for protein, 4 for DNA or RNA. Figure label: W x 20 window.
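The formula on this slide was an image and is not carried by the transcript. For reference, the objective used by the Gibbs motif sampler of Lawrence et al. (1993), written with the symbols defined on this slide, has the following form; treating it as the slide's exact formula is an assumption.

```latex
F \;=\; \sum_{i=1}^{W} \sum_{j=1}^{J} c_{i,j} \, \log \frac{q_{i,j}}{p_{j}}
```

F grows when the window frequencies q_{i,j} differ strongly from the background p_j, so maximizing F favors windows that share a non-random pattern.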

19 Alignment Algorithm. Mostly used in sequence alignment to find the common motif. ... TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT GCCTCAGGATCCAGCACACATTATCACAAACTTAGTGTCCA CATTATCACAAACTTAGTGTCCATCCATCACTGCTGACCCT ... Fast and sensitive, less likely to fall into a local minimum. Lawrence, et al. (1993) Science, 262, Figure label: W x 20.

20 PharmID Algorithm using Gibbs. 1. Remove one compound. 2. Start with a random conformer for each of the other compounds. 3. Calculate probabilities for feature importance. 4. Compute conformation probabilities for the omitted compound. 5. Repeat steps 1-4 until convergence. Again, pictures will make this clear.
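The five steps above can be sketched as a small Gibbs sampler over conformer bit strings. This is an illustration under stated assumptions, not PharmID's implementation: conformers are scored by a log-odds ratio over both set and unset bits, frequencies are smoothed with a pseudocount of 0.5, and a fixed number of sweeps stands in for a convergence test. All function names are hypothetical.

```python
import math
import random


def bit_frequencies(bit_strings, pseudocount=0.5):
    """Per-position frequency of a set bit, smoothed to stay strictly inside (0, 1)."""
    n = len(bit_strings)
    width = len(bit_strings[0])
    return [
        (sum(b[i] for b in bit_strings) + pseudocount) / (n + 2 * pseudocount)
        for i in range(width)
    ]


def log_odds(bits, q, p):
    """Log-likelihood ratio of a bit string under the model q versus background p."""
    return sum(
        math.log(qi / pi) if b else math.log((1 - qi) / (1 - pi))
        for b, qi, pi in zip(bits, q, p)
    )


def gibbs_select(molecules, n_sweeps=50, seed=0):
    """molecules[m][c] is the bit string of conformer c of molecule m.

    Returns one selected conformer index per molecule (needs >= 2 molecules).
    """
    rng = random.Random(seed)
    background = bit_frequencies([c for mol in molecules for c in mol])
    chosen = [rng.randrange(len(mol)) for mol in molecules]     # step 2
    for _ in range(n_sweeps):                                   # step 5 (fixed sweeps, assumed)
        for m, mol in enumerate(molecules):                     # step 1: leave molecule m out
            others = [molecules[k][chosen[k]] for k in range(len(molecules)) if k != m]
            q = bit_frequencies(others)                         # step 3
            scores = [log_odds(c, q, background) for c in mol]  # step 4
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]       # shift by max for stability
            chosen[m] = rng.choices(range(len(mol)), weights=weights)[0]
    return chosen
```

Sampling the conformer in proportion to its weight, rather than always taking the best one, is what lets the search leave a poor starting guess.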

21 Gibbs Sampling: Fingerprints Movement. Table: conformer columns (Conf_1, Conf_2, Conf_3) against molecule rows (Mol_...); the cell values are not recoverable.

22 Bit String Alignment. Only 2 residue types (0, 1). Rigid molecules that have only 1 or a few conformers can speed up the alignment and help to determine the best set of features.

23 Hypothesis Generation. Why? Features may not be part of the same pharmacophore. How? Clique detection (Bron-Kerbosch algorithm). A clique is a set of points in which every pair is connected.

24 Hypothesis Generation in Selected Conformers: Clique Detection. Figure labels: pharmacophore features; two-point pharmacophores identified by Gibbs sampling; discarded two-point pharmacophores. A pharmacophore hypothesis should be an all-connected graph.
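A minimal sketch of clique detection on two-point pharmacophores, using the basic Bron-Kerbosch recursion without pivoting. Treating each feature as a graph node and each retained two-point pharmacophore as an edge is an assumption about how the graph is built; the feature labels and `graph_from_pairs` are made up for illustration.

```python
def graph_from_pairs(pairs):
    """Build an undirected adjacency map {node: set(neighbours)} from edges."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bron_kerbosch(graph, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of the graph as a frozenset of nodes."""
    if p is None:
        p = frozenset(graph)
    if not p and not x:
        yield r  # r cannot be extended, so it is a maximal clique
        return
    for v in list(p):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}  # v has been fully explored
        x = x | {v}  # exclude v from later cliques on this branch


# Hypothetical two-point pharmacophores kept after Gibbs sampling.
pairs = [("HD1", "HA1"), ("HA1", "ARC1"), ("HD1", "ARC1"), ("ARC1", "HYP1")]
cliques = sorted(sorted(c) for c in bron_kerbosch(graph_from_pairs(pairs)))
print(cliques)
```

Here HD1, HA1, and ARC1 are mutually connected and form a three-point hypothesis, while HYP1 is connected only to ARC1 and does not join it.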

25 Hypothesis Generation: Output. Pharmacophore 1. Members: ... (Mol. ID). Features: Hydrogen Bond Donor, Hydrogen Bond Acceptor, ... Pharmacophore 2. Members: ... Features: ... ...

26 Refinement
  For all molecules
    For all conformers
      For all hypotheses generated
        Test each qualified conformer against each hypothesis
      End For
      If new hypothesis found
        Insert the new hypothesis into the list
    End For
  End For

27 Benchmarking: Test Datasets. 1. Bit string alignment: bit strings. 2. Single binding mode: Angiotensin-Converting Enzyme (ACE) inhibitors. 3. Multiple binding modes/mechanisms: dopamine receptor inhibitors (D2/D4).

28 Example 1: A Toy Dataset (Gibbs Sampling Only). 20 x 20 bit strings, mimicking 20 molecules, each with 20 conformers. Each bit string is 20 bits long. Computation time: <1 sec. Result: 1_ _ _ _ _ _ _ _ …

29 Example 2: ACE Inhibitors. 78 active compounds. OMEGA® from OpenEye® is used to generate multiple conformers. Two RMSD cutoffs used: 2.0 Å: 4,613 conformers generated; 1.0 Å: 46,268 conformers generated.

30 ACE Inhibitors: Results. Using 4,613 conformers, 55/78 molecules contain the expected pharmacophore. Using 46,268 conformers, 65/78 molecules contain the expected pharmacophore.

31 Example 2: ACE Inhibitors: Best Identified Pharmacophore. Figure labels (distance ranges): 2.84 ~ 4.50 Å; 4.51 ~ 5.70 Å; 4.99 ~ 6.77.

32 Example 2: ACE inhibitors Other possible pharmacophore

33 Example 3: Testing on Multiple Binding Modes (D2, D4 ligands)

34 Example 3: Dopamine antagonists Two pharmacophores were extracted from one data set!

35 Conclusion. Traditional methods: exhaustive enumeration of pharmacophores, limited coverage of conformational space. “Many to many” limits search. PharmID: selective enumeration of pharmacophores, better coverage of conformational space. Each search is “many to one”.

36 Acknowledgements. Coworkers: Stan Young, Jun Feng, Ashish Sanil. OMEGA is a product from OpenEye Scientific Software Inc. Support from Hereditary Disease Foundation. Become a NISS affiliate!