Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps Frank DiMaio Jude Shavlik

Presentation transcript:

Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps
Frank DiMaio, Jude Shavlik, George N. Phillips, Jr.
ICML Bioinformatics Workshop, 21 August 2003

Task Overview
Given:
- Electron density for a region in a protein
- Protein's topology
Find:
- Positions of the individual atoms in the density map

Pictorial Structures
A pictorial structure is a collection of image parts together with a deformable conformation of these parts.

Pictorial Structures
Formally, a model consists of:
- Set of parts $V = \{v_1, \dots, v_n\}$
- Configuration $L = (l_1, \dots, l_n)$
- Edges $e_{ij} \in E$ connecting neighboring parts $v_i$, $v_j$
  - Explicit dependency between $l_i$ and $l_j$
  - $G = (V, E)$ forms a Markov random field
- Appearance parameters $A_i$ for each part
- Connection parameters $C_{ij}$ for each edge
[Figure: example model graph with parts $v_1$ through $v_6$ joined by edges $e_{13}$, $e_{23}$, $e_{34}$, $e_{35}$, $e_{46}$]

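Matching Algorithm Overview
Want the configuration $L$ of model $\Theta$ maximizing
$P(L \mid I, \Theta) \propto P(I \mid L, \Theta) \cdot P(L \mid \Theta)$
where
$P(I \mid L, \Theta) = \prod_i P(I \mid l_i, \Theta) = \frac{1}{Z_1} e^{-\sum_i \mathrm{match}_i(l_i)}$
$P(L \mid \Theta) = \prod_{(v_i, v_j) \in E} P(l_i, l_j \mid C_{ij}) = \frac{1}{Z_2} e^{-\sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)}$
Equivalent to minimizing
$\sum_i \mathrm{match}_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$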
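The minimization on this slide can be sketched as a dynamic program over a tree of parts. This is only an illustration, not the authors' implementation: the function name `match_tree`, the toy tree, the per-location costs, and the use of integer location indices are all assumptions made for the example. The inner loop over child locations is what makes the plain version quadratic in the number of candidate locations.

```python
def match_tree(children, match, pair_cost, root=0):
    """Minimise sum_i match[i][l_i] + sum over edges of pair_cost on a tree.

    children:  dict mapping a part to the list of its child parts
    match:     dict mapping a part to a list of per-location match costs
    pair_cost: function (parent, child, l_parent, l_child) -> deformation cost
    """
    best = {}     # best[v][l]: min cost of the subtree under v, with v at location l
    argbest = {}  # argbest[(v, c)][l]: best location of child c when v is at l

    def solve(v):
        costs = list(match[v])
        for c in children.get(v, []):
            solve(c)
            choice = []
            for l in range(len(costs)):
                # Quadratic step: try every child location for every parent location.
                options = [best[c][lc] + pair_cost(v, c, l, lc)
                           for lc in range(len(best[c]))]
                lc_best = min(range(len(options)), key=options.__getitem__)
                costs[l] += options[lc_best]
                choice.append(lc_best)
            argbest[(v, c)] = choice
        best[v] = costs

    def backtrack(v, l, config):
        config[v] = l
        for c in children.get(v, []):
            backtrack(c, argbest[(v, c)][l], config)

    solve(root)
    l_root = min(range(len(best[root])), key=best[root].__getitem__)
    config = {}
    backtrack(root, l_root, config)
    return best[root][l_root], config


# Toy instance (invented numbers): 4 parts, 3 candidate locations each.
children = {0: [1, 2], 1: [3]}
match = {0: [3.0, 1.0, 4.0], 1: [1.0, 5.0, 0.5],
         2: [2.0, 6.0, 0.0], 3: [9.0, 2.0, 6.0]}
cost, config = match_tree(children, match,
                          lambda p, c, lp, lc: abs(lp - lc))
```

The returned `config` maps each part to its chosen location index, and `cost` is the value of the minimized sum for that configuration.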

Linear-Time Matching Algorithm
- A dynamic programming implementation runs in quadratic time (in the number of candidate part locations)
  - Requires a tree configuration of parts
- Felzenszwalb & Huttenlocher (2000) developed a linear-time matching algorithm
  - Additional constraint on the part-to-part cost function $d_{ij}$
  - Basic "trick": parallelize the minimization computation over the entire grid using a generalized distance transform
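To make the distance-transform idea concrete, here is a minimal one-dimensional sketch. It assumes the deformation cost is a constant `k` times the absolute grid distance between locations, which is one simple cost that satisfies this kind of constraint; the slide does not state which form $d_{ij}$ takes, and the function names are invented for the example. Under that assumption, $D[p] = \min_q (f[q] + k|p - q|)$ can be computed for every $p$ in two linear passes instead of a quadratic double loop.

```python
def min_convolution_l1(f, k=1.0):
    """Return D with D[p] = min over q of f[q] + k * |p - q|, in linear time."""
    d = list(f)
    # Forward pass: propagate the best value coming from the left.
    for p in range(1, len(d)):
        d[p] = min(d[p], d[p - 1] + k)
    # Backward pass: propagate the best value coming from the right.
    for p in range(len(d) - 2, -1, -1):
        d[p] = min(d[p], d[p + 1] + k)
    return d


def min_convolution_brute(f, k=1.0):
    """Quadratic reference version of the same quantity, for comparison."""
    n = len(f)
    return [min(f[q] + k * abs(p - q) for q in range(n)) for p in range(n)]


# Invented child costs along a 1-D grid of six locations.
f = [5.0, 1.0, 7.0, 7.0, 0.0, 9.0]
fast = min_convolution_l1(f, k=2.0)
slow = min_convolution_brute(f, k=2.0)
```

Both functions return the same list; only the running time differs, which is the point of replacing the inner minimization of the dynamic program with a transform over the whole grid.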

Pictorial Structures for Map Interpretation
Basic idea: build a pictorial structure that can model all configurations of a molecule
- Each part in the "collection of parts" corresponds to an atom
- The model has a low-cost conformation for each low-energy state of the molecule

The Screw-Joint Model
- Ideally, the cost function would equal the atomic energy
- Problem: the atomic energy function cannot be represented with pairwise potentials while maintaining a tree structure
- Solution: the screw-joint model
  - Ignore non-bonded interactions
  - Edges correspond to covalent bonds
  - Allow free rotation around bonds

Screw-Joint Model Details
Each part's configuration has six parameters $(x, y, z, \alpha, \beta, \gamma)$, where
- $(x, y, z)$ is the part's position
- $\alpha$ is the part's rotation (about the bond connecting $v_i$ and $v_j$)
- $(\beta, \gamma)$ is the part's orientation
[Figure: two bonded parts $v_i$ and $v_j$ with positions $(x_i, y_i, z_i)$ and $(x_j, y_j, z_j)$, orientations $(\beta_i, \gamma_i)$ and $(\beta_j, \gamma_j)$, rotations $\alpha_i$ and $\alpha_j$, and offset $(x_{ij}, y_{ij}, z_{ij})$]
- Part-to-part cost function $d_{ij}$ is based on the child's deviation from ideal
- Matching cost function $\mathrm{match}_i$ is based on a 3x3x3 template match

Pictorial Structures for Map Interpretation
- Ideally, we would
  - Build a pictorial structure for the entire protein
  - Run the matching algorithm to get the best layout
- However, this is computationally infeasible
- Instead, we use a two-phase algorithm that
  a) computes the best backbone trace
  b) computes the best sidechain conformation (current focus)

Sidechain Refinement
- Assume we have a rough Cα trace of the protein
- Next, use pictorial structure matching to place sidechains
- Walk along the chain one residue at a time, placing individual atoms
[Figure: Cα trace with residues MET_80, ARG_81, ALA_82, PRO_83]

Sidechain Refinement
Given:
- residue type
- approximate Cα locations
Find: most likely location for the sidechain atoms in the residue
[Figure: alanine example with labeled atoms (N, Cα, C, O, Cβ, plus atoms of the neighboring residues marked -1 and +1), passed through the matching algorithm]

Learning Model Parameters
[Figure: alanine in canonic orientation (atoms N, C-1, Cα, C, Cβ, O, N+1), the averaged 3D template for the alanine Cα, and the averaged bond geometry, with values r = 1.53, θ = 0.0°, φ = -19.3° and r = 1.51, θ = 118.4°, φ = -19.7°]

Soft Maximums
- Sometimes we may get an optimal match like the one in the figure
- When this occurs, explore the space of non-optimal solutions via soft maximums in the DP
- Basic idea: take a path with probability inversely proportional to its cost
[Figure: ACTUAL structure vs. PREDICTED 1]
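A minimal sketch of the "probability inversely proportional to cost" idea follows. It takes the slide's wording literally, weighting each candidate path by 1/cost; the function names are invented, costs are assumed strictly positive, and how the choice is wired into the DP itself is not shown on the slide.

```python
import random


def inverse_cost_probabilities(costs):
    """Turn positive path costs into probabilities proportional to 1 / cost."""
    if any(c <= 0 for c in costs):
        raise ValueError("costs must be strictly positive")
    weights = [1.0 / c for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_path(costs, rng):
    """Pick a path index at random, favouring (but not forcing) low cost."""
    probs = inverse_cost_probabilities(costs)
    return rng.choices(range(len(costs)), weights=probs, k=1)[0]


# Invented costs for three alternative paths.
costs = [1.0, 2.0, 4.0]
probs = inverse_cost_probabilities(costs)
choice = sample_path(costs, random.Random(0))
```

With these costs the cheapest path is the most likely pick, but the other two still receive non-zero probability, which is what lets the search reach non-optimal solutions.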

Soft Maximums
- The figure shows soft maximums
- Red molecule eventually found
- Annealing increases "softness" until a legal structure is found
- The legal structure may not be the "right" one
[Figure: ACTUAL structure vs. PREDICTED 1 and PREDICTED 2]

Results
Only sidechain refinement implemented and tested
Experimental methodology:
- Assume Cα's known to within 2 Å
- Trained on a 1.7 Å resolution protein, tested on a 1.9 Å resolution protein
- Templates built for ALA, VAL, TYR, LYS
Model parameters:
- Grid spacing of 0.5 Å within a sphere of diameter 10 Å
- Rotational discretization: 12 rotational steps, 84 orientations
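To get a feel for the size of the search space these parameters imply, the sketch below counts the grid points inside the sphere and multiplies by the rotational discretization. It assumes the sphere is centred on a grid point and that every position is paired with every rotation step and every orientation; neither detail is stated on the slide, and the function name is invented.

```python
def count_grid_points(diameter, spacing):
    """Count cubic-grid points inside a sphere centred on a grid point."""
    radius = diameter / 2.0
    steps = int(radius / spacing) + 1  # safe upper bound on steps per axis
    count = 0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                d2 = (i * spacing) ** 2 + (j * spacing) ** 2 + (k * spacing) ** 2
                if d2 <= radius ** 2 + 1e-9:  # small tolerance for float error
                    count += 1
    return count


# Slide parameters: 0.5 Å spacing, 10 Å diameter, 12 rotations, 84 orientations.
positions = count_grid_points(10.0, 0.5)
states_per_part = positions * 12 * 84
print(positions, states_per_part)
```

The position count should land close to the sphere's volume divided by the volume of one grid cell, so each part has on the order of a few million candidate states under these assumptions.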

Sidechain Placement
Compared predicted vs. actual location for 599 atoms on the test-set protein:
- 29.9% of atoms within 0.5 Å
- 72.3% of atoms within 1.0 Å
- 93.0% of atoms within 2.0 Å
Recall the 0.5 Å grid spacing
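The placement statistics above are fractions of atoms whose predicted position falls within a distance threshold of the actual position. A minimal sketch of that measurement is given here; the function name and the three toy atom coordinates are invented for illustration and are not data from the talk.

```python
import math


def fraction_within(predicted, actual, threshold):
    """Fraction of atoms whose predicted position is within threshold (Å)."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if math.dist(p, a) <= threshold)
    return hits / len(predicted)


# Toy coordinates in Å (invented): errors of 0.4, 0.8 and 3.0 Å.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
actual = [(0.0, 0.0, 0.4), (1.0, 0.8, 0.0), (0.0, 0.0, 0.0)]

for threshold in (0.5, 1.0, 2.0):
    print(threshold, fraction_within(predicted, actual, threshold))
```

Because positions are searched on a 0.5 Å grid, the tightest threshold is about as fine as the discretization itself, which is presumably why the slide recalls the grid spacing.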

Predictive Accuracy Task
- We used the DP matching score as a predictor of amino acid type
- Tested 49 ALA, LYS, TYR, VAL residues
- Highest-scoring normalized template determined the type
- 61.2% accuracy (majority classification = 33%)
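The classification rule on this slide amounts to an argmax over per-template scores. The sketch below assumes the scores have already been normalized (the slide does not say how) and that higher means a better match; the function names and all score values are invented.

```python
def predict_type(scores):
    """Return the residue type whose normalized template score is highest."""
    return max(scores, key=scores.get)


def accuracy(predictions, truths):
    """Fraction of residues whose predicted type equals the true type."""
    correct = sum(1 for p, t in zip(predictions, truths) if p == t)
    return correct / len(truths)


# Invented normalized scores for three residues.
residue_scores = [
    {"ALA": 0.91, "LYS": 0.40, "TYR": 0.35, "VAL": 0.72},
    {"ALA": 0.30, "LYS": 0.55, "TYR": 0.81, "VAL": 0.42},
    {"ALA": 0.66, "LYS": 0.20, "TYR": 0.25, "VAL": 0.61},
]
truths = ["ALA", "TYR", "VAL"]
predictions = [predict_type(s) for s in residue_scores]
toy_accuracy = accuracy(predictions, truths)

# The reported 61.2% would be consistent with 30 of 49 residues correct.
reported_check = round(100 * 30 / 49, 1)
```

In the toy data the third residue is misclassified as ALA because its ALA template edges out VAL, the kind of near-tie that makes type prediction from matching scores error-prone.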

The Good…
[Figure: predicted, actual, and predicted vs. actual structures for lysine, valine, and tyrosine]

… and the Bad
[Figure: predicted, actual, and predicted vs. actual structures for lysine, alanine, tyrosine, and valine]

Future Work
- Implement and integrate the backbone tracing algorithm to create a complete two-tiered solution
- Better strategies to handle illegal molecule configurations
  - perturbation of branches involved in collisions
  - more accurate representation of the atomic energy function, e.g. torsion angle
- Better match function (make use of previous work?)
- More tests (larger training set, higher resolution)

Acknowledgements
- NLM grant 1T15 LM
- NLM grant 1R01 LM
- NIH grant P50 GM64598