A MULTIBODY ATOMIC STATISTICAL POTENTIAL FOR PREDICTING ENZYME-INHIBITOR BINDING ENERGY

Majid Masso
Laboratory for Structural Bioinformatics, School of Systems Biology, George Mason University, University Blvd. MS 5B3, Manassas, Virginia 20110, USA

I. Abstract

Accurate prediction of enzyme-inhibitor binding energy can speed drug design and chemical genomics efforts by helping to narrow the focus of experiments. Here, a non-redundant set of three hundred high-resolution crystallographic enzyme-inhibitor structures was compiled for analysis; each complex has a known binding energy (ΔG) derived from an experimentally determined inhibition constant (K_i). Additionally, a separate set of over 1400 diverse high-resolution macromolecular crystal structures was collected to create an all-atom knowledge-based statistical potential by applying the Delaunay tessellation computational geometry technique. Next, two hundred of the enzyme-inhibitor complexes were randomly selected to develop a model for predicting binding energy, first by tessellating the structures of the complexes as well as of the enzymes without their bound inhibitors, then by using the statistical potential to calculate a topological score for each structure tessellation. As a predictor of binding energy, we derived an empirical linear function of the difference between the topological scores of a complex and its isolated enzyme. A correlation coefficient of r = 0.79 was obtained between the experimental and calculated ΔG values, with a standard error of 2.34 kcal/mol. Lastly, the model was evaluated on the held-out set of one hundred complexes: their structures were tessellated to calculate topological score differences, and binding energy predictions were generated from the derived linear function. Calculated binding energies for the test data also compared well with their experimental counterparts, with a correlation coefficient of r = 0.77 and a standard error of 2.50 kcal/mol.

II. Protein Data Bank

PDB – repository of solved (X-ray, NMR, ...) structures
Each structure file contains atomic 3D coordinate data (columns: Atom, X, Y, Z)

III. Macromolecular Modeling

Native structure is the conformation having the lowest energy
Physics-based energy calculations using quantum mechanics are computationally impractical
Same for molecular mechanics-based potential energy functions (i.e., force fields):
  E(total) = E(bond) + E(angle) + E(dihedral) + E(electrostatic) + E(van der Waals)
Alternative (our approach): knowledge-based potentials of mean force (i.e., generated from known protein structures)

IV. Knowledge-Based Potentials of Mean Force

Assumptions:
  – At equilibrium, the native state has the global free energy minimum
  – Microscopic states (i.e., features) follow a Boltzmann distribution
Examples:
  – Well documented in the literature: distance-dependent pairwise interactions at the atomic or amino acid level
  – This study: inclusion of higher-order contributions by developing all-atom four-body statistical potentials
Motivation (our prior work):
  – Four-body protein potential at the amino acid level

V. Motivational Example: Pairwise Amino Acid Potential

A 20-letter protein alphabet yields 210 distinct residue pairs
Obtain a large, diverse PDB dataset of single protein chains
For each residue pair (i, j), calculate the relative frequency f_ij with which the two residues appear within a given distance (e.g., 12 angstroms) of each other in all the protein structures
Calculate the rate p_ij expected by chance alone from a background or reference distribution (more later…)
Apply the inverted Boltzmann principle: s_ij = log(f_ij / p_ij) quantifies interaction propensity and is proportional to the energy of interaction (by a factor of –RT)

VI. All-Atom Four-Body Statistical Potential

Obtain a diverse PDB dataset of 1417 single-chain and multimeric proteins, many complexed to ligands (see XV. References)
Six-letter atomic alphabet: C, N, O, S, M (metals), X (other)
Apply Delaunay tessellation to the atomic point coordinates of each PDB file; this objectively identifies all nearest-neighbor quadruplets of atoms in the structure (8 angstrom cutoff)

VII. All-Atom Four-Body Statistical Potential

A six-letter atomic alphabet yields 126 distinct quadruplets
For each quad (i, j, k, l), calculate the observed rate of occurrence f_ijkl among all tetrahedra from the 1417 structure tessellations
Compute the rate p_ijkl expected by chance from a multinomial reference distribution:
  p_ijkl = [4! / Π_n (t_n!)] × Π_n (a_n ^ t_n), with both products taken over the six atom types n and Σ_n t_n = 4
  a_n = proportion of atoms from all structures that are of type n
  t_n = number of occurrences of atom type n in the quad
Apply the inverted Boltzmann principle: s_ijkl = log(f_ijkl / p_ijkl) quantifies the interaction propensity and is proportional to the energy of atomic quadruplet interaction

VIII. Summary Data for the 1417 Structure Files and their Delaunay Tessellations

IX. All-Atom Four-Body Statistical Potential

X. Topological Score (TS)

Delaunay tessellation of any macromolecular structure yields an aggregate of tetrahedral simplices
Each simplex can be scored using the all-atom four-body potential, based on the quad present at its four vertices
Topological score (or 'total potential') of the structure: the sum of the scores s_ijkl of all constituent simplices in the tessellation
  TS = Σ s_ijkl

XI. Topological Score Difference (ΔTS)

XII. Application of ΔTS: Predicting Enzyme–Inhibitor Binding Energy

MOAD – repository of experimental inhibition constants (K_i) for protein–ligand complexes whose structures are in the PDB
Collected K_i values for 300 complexes reflecting diverse protein structures
Obtained the experimental binding energy from K_i via ΔG_exp = –RT ln(K_i)
Calculated ΔTS for the complexes

XIII. Predicting Enzyme–Inhibitor Binding Energy

Randomly selected 200 complexes to train a model
Correlation coefficient r = 0.79 between ΔTS and ΔG_exp
Empirical linear transform of ΔTS to reflect energy values: ΔG_calc = (1 / ) × ΔTS – 6.24
Because the transform is linear, the same value r = 0.79 holds between ΔG_calc and ΔG_exp
Also, standard error SE = 2.34 kcal/mol and fitted regression line y = 0.98x – 0.41 (y = ΔG_calc, x = ΔG_exp)

XIV. Predicting Enzyme–Inhibitor Binding Energy

For the test set of the 100 remaining complexes:
  – r = 0.77 between ΔG_calc and ΔG_exp
  – SE = 2.50 kcal/mol
  – Fitted regression line is y = 1.07x
All training/test data are available online as a text file (see XV. References)

XV. References and Acknowledgments

PDB dataset:
Train/test dataset:
PDB (structure DB):
MOAD (ligand binding DB):
Qhull (Delaunay tessellation):
UCSF Chimera (ribbon/ball-stick structure visualization):
Matlab (tessellation visualization):
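
The scoring steps of Sections VII and X can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: the function names are hypothetical, the multinomial rate is written in its standard form, the logarithm is taken to base 10 (the slides say only "log"), and quad types never seen in the training tessellations are assumed to contribute a score of zero.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, log10, prod

ALPHABET = "CNOSMX"  # six-letter atomic alphabet: C, N, O, S, M (metals), X (other)


def quad_key(atoms):
    """Order-independent label for a quadruplet of atom types, e.g. 'CCNO'."""
    return "".join(sorted(atoms))


def multinomial_rate(quad, atom_freq):
    """Chance rate p = 4! / prod(t_n!) * prod(a_n ** t_n) for one quad type.

    atom_freq maps atom type n to its overall proportion a_n;
    t_n is the number of times type n occurs in the quad.
    """
    counts = Counter(quad)
    coeff = factorial(4) // prod(factorial(t) for t in counts.values())
    return coeff * prod(atom_freq[n] ** t for n, t in counts.items())


def four_body_potential(quads, atom_freq):
    """Return s = log10(f / p) for every quad type observed in `quads`.

    `quads` is an iterable of 4-tuples of atom types, one per tetrahedron.
    Base-10 log is an assumption made for this sketch.
    """
    observed = Counter(quad_key(q) for q in quads)
    total = sum(observed.values())
    return {
        q: log10((n / total) / multinomial_rate(q, atom_freq))
        for q, n in observed.items()
    }


def topological_score(quads, potential):
    """TS = sum of s over all tetrahedra; unseen quad types score 0 (assumption)."""
    return sum(potential.get(quad_key(q), 0.0) for q in quads)


# The six-letter alphabet gives 126 distinct quad types (multisets of size 4).
ALL_QUADS = list(combinations_with_replacement(ALPHABET, 4))
```

Under these assumptions, ΔTS for a complex would be `topological_score` evaluated on the tessellation of the complex minus the same function evaluated on the tessellation of the enzyme alone, both using the potential trained on the 1417 reference structures.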
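
The tessellation step of Section VI can be approximated with SciPy, whose `scipy.spatial.Delaunay` class is built on Qhull. The sketch below is an illustration under stated assumptions, not the author's pipeline: the function name `tessellate` is hypothetical, and the "8 angstrom cutoff" is interpreted as discarding any tetrahedron that has an edge longer than 8 angstroms.

```python
import numpy as np
from scipy.spatial import Delaunay


def tessellate(coords, cutoff=8.0):
    """Return vertex-index quadruplets of the Delaunay tetrahedra of `coords`.

    coords: (n_atoms, 3) array-like of atomic coordinates in angstroms.
    Tetrahedra with any edge longer than `cutoff` are dropped; this reading
    of the cutoff is an assumption made for this sketch.
    """
    coords = np.asarray(coords, dtype=float)
    simplices = Delaunay(coords).simplices  # shape (n_tetrahedra, 4)
    kept = []
    for simplex in simplices:
        pts = coords[simplex]
        # All pairwise vertex distances within this tetrahedron.
        dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        if dists.max() <= cutoff:
            kept.append(tuple(int(i) for i in simplex))
    return kept
```

Mapping each returned index to the atom type at that position (C, N, O, S, M, or X) gives the quads that the four-body potential scores.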
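
The evaluation statistics of Sections XIII and XIV can be illustrated as follows. The source describes its ΔTS-to-ΔG transform only as "empirical", so the ordinary least-squares fit here is a stand-in, and the standard error is assumed to be the usual standard error of the estimate with n − 2 degrees of freedom. All function names and the toy numbers in the comments are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept of y on x (stand-in for the empirical fit)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def standard_error(observed, predicted):
    """Standard error of the estimate: sqrt(SSE / (n - 2)) (assumed definition)."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(sse / (len(observed) - 2))


# Toy check that r is unchanged by a linear transform with positive slope:
# delta_ts = [-3.0, -1.0, 0.5, 2.0, 4.0]
# dg_exp = [-11.0, -9.5, -8.0, -8.5, -6.0]
# dg_calc = [0.5 * v - 6.24 for v in delta_ts]
# pearson_r(delta_ts, dg_exp) and pearson_r(dg_calc, dg_exp) agree.
```

Because correlation is invariant under any linear map with positive slope, the value of r between ΔTS and ΔG_exp carries over unchanged to ΔG_calc, which is the point made in Section XIII.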