Improving enrichment rates
A practical solution to an impractical problem
Noel O’Boyle
Cambridge Crystallographic Data Centre

Overview
Docking – an impractical problem?
A practical solution
Incorporation of burial depth into the ChemScore scoring function
–Training using negative data
–Results
Conclusions

Docking – an impractical problem?
Protein-ligand docking software
–Predicts the binding affinity of small-molecule ligands to a protein target
Virtual screen
–Goal is to identify true ligands in a large dataset of molecules
–Enrichment: the relative ranking of actives with respect to a set of inactives
If only…

Docking – an impractical problem?
Warren et al., J. Med. Chem., 2006, 49, 5912
–Large-scale evaluation of 10 docking programs (37 scoring functions) against 8 proteins with ~200 actives each
–No statistically significant correlation between measured affinity and any of the scoring functions
“At its simplest level, this is a problem of subtraction of large numbers, inaccurately calculated, to arrive at a small number.”
Leach, AR; Shoichet, BK; Peishoff, CE. J. Med. Chem. 2006, 49, 5851

A practical solution
Pham, T. A.; Jain, A. N. J. Med. Chem. 2006, 49,
Many scoring functions are trained using known binding affinities for a wide variety of protein-ligand complexes
–Only positive data is used
…do we really need to calculate the binding affinity?
If we are just interested in performance in a virtual screen…
–Why not directly optimize the enrichment?
–Use both positive and negative data – poses of active molecules and inactive molecules

ChemScore scoring function in GOLD
The ΔG coefficients are constants derived from fitting to binding affinity values
S_lipo and S_hbond are sums over the individual lipophilic and hydrogen bond interactions, respectively
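The equation shown on this slide is not reproduced in the transcript. As a reference point only, the ChemScore binding term as originally published (Eldridge et al., 1997) has the form below. Whether the slide used exactly this notation is an assumption, and the clash and internal-energy penalty terms that GOLD adds are omitted here.

```latex
\Delta G_{\text{binding}} = \Delta G_{0}
  + \Delta G_{\text{hbond}}\, S_{\text{hbond}}
  + \Delta G_{\text{metal}}\, S_{\text{metal}}
  + \Delta G_{\text{lipo}}\, S_{\text{lipo}}
  + \Delta G_{\text{rot}}\, H_{\text{rot}}
```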

Burial depth scaling (BDS)
Neither s_hbond nor s_lipo explicitly takes into account the location in the active site where an interaction occurs
–…but ligands tend to bind deep in the active site
If we scale s_hbond and s_lipo based on burial depth, we may be able to improve the discrimination between actives and inactives
Burial depth, ρ, is measured as the number of protein heavy atoms within 8 Å of an interaction
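The burial depth count can be sketched in a few lines of NumPy. The counting function follows the definition on the slide. The scaling function is a guess: the deck later reports parameters named ρ1, ρ2 and f_max but does not give the functional form, so the piecewise-linear ramp below, and both function names, are assumptions for illustration only.

```python
import numpy as np

def burial_depth(point, protein_heavy_atoms, cutoff=8.0):
    """Count protein heavy atoms within `cutoff` angstroms of an interaction point (rho)."""
    point = np.asarray(point, dtype=float)
    coords = np.asarray(protein_heavy_atoms, dtype=float)
    distances = np.linalg.norm(coords - point, axis=1)
    return int(np.count_nonzero(distances <= cutoff))

def ramp_scale(rho, rho1, rho2, f_max):
    """ASSUMED form: factor 1.0 at or below rho1, f_max at or above rho2, linear in between."""
    t = np.clip((rho - rho1) / (rho2 - rho1), 0.0, 1.0)
    return 1.0 + (f_max - 1.0) * t

# Toy coordinates: two atoms within 8 angstroms of the origin, one outside.
atoms = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [9.0, 0.0, 0.0]]
rho = burial_depth([0.0, 0.0, 0.0], atoms)
factor = ramp_scale(rho, rho1=13, rho2=105, f_max=1.80)
```

Under this assumed form, an interaction with few surrounding protein atoms keeps its original weight, and a deeply buried one is multiplied by up to f_max.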

Dataset
Astex Diverse Set (Hartshorn et al. J. Med. Chem. 2007, 50, 726)
–85 high-quality protein-ligand complexes
Positive data
–Highest scoring docked pose of active (where a pose was found within 2.0 Å of crystal structure)
–Otherwise locally-optimized crystal structure (6 out of 85)
Negative data
–For each active, chose 99 inactives from Astex in-house database of compounds available for purchase
–Inactives chosen to be physicochemically similar to active, but topologically distinct
–Docked each inactive into corresponding protein

Optimization procedure
Brute force optimization over a grid (SciPy)
–Set parameter values (3 for f_hbond, 3 for f_lipo)
–Calculate the scores of the active and inactive poses
–Calculate the rank of each of the 85 actives with respect to its 99 inactives (top rank is 1)
–The objective function is the mean of these ranks
End result
–a minimized objective function
–optimized parameter values
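The procedure above can be sketched with `scipy.optimize.brute`. This is a toy under stated assumptions, not the code used in the talk: the data are synthetic, the score is a plain weighted sum of two terms (two constant weights rather than the six ramp parameters), higher scores are taken as better, and ties are counted against the active so that an all-zero weighting cannot look perfect.

```python
import numpy as np
from scipy import optimize

def mean_rank(active_scores, inactive_scores):
    """Mean rank of each active among its own inactives (rank 1 is best).

    active_scores: shape (n_targets,); inactive_scores: shape (n_targets, n_inactives).
    Higher score is better; an inactive that ties the active is ranked ahead of it.
    """
    active = np.asarray(active_scores, dtype=float)[:, None]
    inactive = np.asarray(inactive_scores, dtype=float)
    ranks = 1 + np.sum(inactive >= active, axis=1)
    return float(ranks.mean())

def objective(params, active_terms, inactive_terms):
    """Score every pose as a weighted sum of its terms, then return the mean rank."""
    weights = np.asarray(params, dtype=float)
    return mean_rank(active_terms @ weights, inactive_terms @ weights)

# Synthetic (hbond, lipo) terms: 85 actives, each with 99 inactives.
rng = np.random.default_rng(0)
active_terms = rng.normal(loc=[6.0, 5.0], scale=1.0, size=(85, 2))
inactive_terms = rng.normal(loc=[4.0, 5.0], scale=[1.0, 2.0], size=(85, 99, 2))

# Evaluate the objective at every grid point; finish=None skips local polishing.
grid = (slice(0.0, 2.01, 0.25), slice(0.0, 2.01, 0.25))
best = optimize.brute(objective, grid, args=(active_terms, inactive_terms), finish=None)
best_value = objective(best, active_terms, inactive_terms)
unscaled_value = objective([1.0, 1.0], active_terms, inactive_terms)
```

Because the mean rank is a step function of the parameters, a gradient-free grid search is a natural fit; the grid contains the unscaled point (1, 1), so the optimum found can never be worse than the starting scoring function on the training data.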

Optimization results
Mean rank of actives:
–Without BDS: 18.6
–Optimizing c_hbond and c_lipo: 14.0 (2 params)
–Optimizing c_hbond and f_lipo: 13.9 (4 params)
–Optimizing f_hbond and c_lipo: 12.5 (4 params)
–Optimizing f_hbond and f_lipo: 11.5 (6 params)
2 out of the 5 worst performers involved metal-ligand interactions
–Applying f_hbond to the metal term improved the mean ranks of those actives from 8.9 to 7.0
Final BDS equation involved c_lipo and f_hbond (= f_metal)

Testing of final equation
Without BDS: 18.6
After training BDS: 12.5
–f_hbond params: ρ1 = 13, ρ2 = 105, f_max = 1.80
–c_lipo = 0.52
Brute force optimization after swapping the active with an inactive
–Without BDS: 18.8
–After training BDS: 18.6
Applied to test set
–Without BDS: 18.8
–After BDS: 12.6
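For scale, with one active ranked among its 99 inactives as described on the Dataset slide, a scoring function with no discriminating power would give an expected mean rank of 50.5. A small check of that arithmetic (the function name is illustrative, not from the talk):

```python
def expected_random_rank(n_inactives):
    """Expected rank of one active placed uniformly at random among n_inactives."""
    n_total = n_inactives + 1  # the active plus its inactives
    return (1 + n_total) / 2   # mean of the ranks 1..n_total

baseline = expected_random_rank(99)
```

Against that baseline, the swapped-label control barely moves (18.8 to 18.6), while training on the true actives brings the mean rank to 12.5.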

Comparison of HB and lipophilic interactions
[Figure panels: s_hbond, s_lipo]

Performance of BDS

1w2g – thymidylate kinase

1p62 – deoxycytidine kinase

Performance of BDS

1xm6 – phosphodiesterase 4B

1hnn – phenylethanolamine N-methyltransferase

Conclusions
Rewarding deeply-buried hydrogen bonds improves the discrimination between actives and inactives
Negative data can be used to identify and address deficiencies in scoring functions

Acknowledgements
Cambridge Crystallographic Data Centre
–Robin Taylor, John Liebeschuetz, Jason Cole, Simon Bowden, Richard Sykes
Astex Therapeutics
–Suzanne Brewerton, Chris Murray, Marcel Verdonk
Martin Harrison (AstraZeneca)
BDS will be available in the forthcoming GOLD 4.0 release

Columns: Receptor density functions used | Hydrogen bond function term(s): ρ1, ρ2, S | Lipophilic function term(s): ρ1, ρ2, S | Optimized mean rank of actives
Training Set
–None
–f_HB and f_L
–f_L
–f_HB
–g_HB and g_L
–f_HB and g_L
Test Set A
–None: 18.8
–f_HB and g_L

Molecular weight effect
Columns: Dataset | Mean rank of actives: Before scaling, After scaling
–Training set
–Test Set B
–Test Set C

Docking – an impractical problem?
“Why does docking remain so primitive that it is unable to even rank-order a hit list? Accurate prediction of binding affinities for a diverse set of molecules turns out to be genuinely difficult. At its simplest level, this is a problem of subtraction of large numbers, inaccurately calculated, to arrive at a small number. The large numbers are the interaction energy between the ligand and protein on one hand and the cost of bringing the two molecules out of the solvent and into an intimate complex on the other hand. The result of this subtraction is the free energy of binding, the small number we most want to know.”
Leach, AR; Shoichet, BK; Peishoff, CE. J. Med. Chem. 2006, 49, 5851

Astex Diverse Set
“Diverse, high-quality test set for the validation of protein-ligand docking performance”
–Hartshorn et al. J. Med. Chem. 2007, 50, 726
–85 protein-ligand complexes with high-quality crystal structures
–Pharmaceutically relevant targets
–Drug-like ligands
–Diverse ligands, proteins
In general, all waters have been removed