
A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps
Frank DiMaio, Jude Shavlik (Computer Sciences Department) and George Phillips (Biochemistry Department), University of Wisconsin – Madison, USA
Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006), Fortaleza, Brazil, August 7, 2006

X-ray Crystallography
[Diagram: an X-ray beam is diffracted by a protein crystal onto a collection plate; an FFT turns the diffraction data into the electron density map, a "3D picture" of the molecule]

Given: Sequence + Electron Density Map

Find: Each Atom’s Coordinates

Our Subtask: Backbone Trace
[Diagram: the backbone trace connects successive Cα atoms through the density]

The Unit Cell
• 3D density function ρ(x,y,z) provided over the unit cell
• Unit cell may contain multiple copies of the protein

Density Map Resolution
[Diagram: a resolution axis marked at 2Å, 3Å, and 4Å, locating ARP/wARP (Perrakis et al. 1997), TEXTAL (Ioerger et al. 1999), and Resolve (Terwilliger 2002); our focus is the poor-resolution (3–4Å) end]

Overview of ACMI (our method)
• Local Match: search for sequence-specific 5-mers centered at each amino acid; yields many false positives
• Global Consistency: use a probabilistic model to filter the false positives and find the most probable backbone trace

5-mer Lookup and Cluster
[Diagram: the 5-mer centered on one residue of the sequence …VKHVLVSPEKIEELIKGY… is looked up in the PDB; the matching fragments are clustered into weighted representatives, e.g. Cluster 1 (wt = 0.67) and Cluster 2 (wt = 0.33)]
NOTE: can be done in a precompute step
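As a purely illustrative sketch of this lookup-and-cluster step: gather the PDB 5-mers matching the query, superpose them, cluster their coordinates, and keep weighted representatives. The function name, the choice of k-means over flattened Cα coordinates, and k = 2 are all assumptions here, not ACMI's documented procedure.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def cluster_5mers(fragments, k=2):
    """Cluster superposed 5-mer fragments into weighted representatives.

    fragments: (M, 5, 3) array of C-alpha coordinates for M PDB 5-mers,
               already superposed onto a common reference frame
    Returns k representative fragments plus their cluster-fraction weights.
    """
    flat = fragments.reshape(len(fragments), -1)        # (M, 15) feature rows
    centroids, labels = kmeans2(flat, k, minit='points')  # cluster fragments
    reps = centroids.reshape(k, 5, 3)                   # back to coordinates
    weights = np.bincount(labels, minlength=k) / len(labels)
    return reps, weights                                # e.g. wt = 0.67 / 0.33
```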

5-mer Search
• 6D search (rotation + translation) for the representative structures in the density map
• Compute "similarity" by Fourier convolution (Cowtan 2001); a sketch follows
• Use a tune set to convert the similarity score to a probability
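A minimal sketch of the Fourier-convolution scoring referenced above, assuming a single template orientation and treating the match score as a plain cross-correlation; Cowtan's actual FFFear-style search uses a masked, variance-normalized score and repeats this over many rotations. All names below are illustrative.

```python
import numpy as np

def correlation_scores(density, template):
    """Score one template orientation at every translation in the unit
    cell via the FFT cross-correlation theorem: O(X log X), not O(X^2).

    density:  3D numpy array, the electron density map over the unit cell
    template: 3D numpy array, the 5-mer representative's expected density,
              zero-padded to density.shape
    """
    scores = np.fft.ifftn(np.fft.fftn(density) *
                          np.conj(np.fft.fftn(template))).real
    return scores

# The full 6D search evaluates many rotated copies of the template and
# keeps, at each position u_i, the best-scoring orientation.
```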

Convert Scores to Probabilities
[Diagram: searching the density map with a 5-mer representative yields scores t_i(u_i); these are matched against tune-set score distributions for true (POS) and false (NEG) matches, and Bayes' rule converts them into a probability distribution over the unit cell, P(5-mer at u_i | Map)]
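A hedged sketch of the Bayes'-rule conversion, assuming the tune-set POS/NEG score distributions are summarized as Gaussians and that a scalar prior is supplied; the slide does not specify ACMI's actual distribution model or prior.

```python
from scipy.stats import norm

def score_to_probability(score, pos_mu, pos_sd, neg_mu, neg_sd, prior):
    """Turn a template-match score t_i(u_i) into P(5-mer at u_i | Map).

    pos_mu, pos_sd: Gaussian fit to tune-set scores of true matches (POS)
    neg_mu, neg_sd: Gaussian fit to tune-set scores of false matches (NEG)
    prior:          P(5-mer at u_i) before seeing the score
    """
    p_pos = norm.pdf(score, pos_mu, pos_sd) * prior
    p_neg = norm.pdf(score, neg_mu, neg_sd) * (1.0 - prior)
    return p_pos / (p_pos + p_neg)
```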

In This Talk…
• Where we are now: for each amino acid in the protein, we have a probability distribution over the unit cell
• Where we are headed: find the backbone layout maximizing P(u_1, …, u_N | Map)

Pairwise Markov Field Models
• A type of undirected graphical model
• Represent joint probabilities as a product of vertex and edge potentials
• Similar to (but more general than) Bayesian networks
[Diagram: nodes u_1, u_2, u_3 connected to each other and to evidence y]
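In the standard pairwise-Markov-field notation the slides use, the joint probability of a layout u = (u_1, …, u_N) given the density map y factorizes into the vertex and edge potentials (Z is the usual normalizing constant):

\[
P(u_1, \dots, u_N \mid y) \;=\; \frac{1}{Z}\, \prod_{i} \psi_i(u_i, y) \prod_{i \neq j} \psi_{ij}(u_i, u_j)
\]

Here ψ_i are the vertex (observational) potentials and ψ_ij the edge (structural) potentials defined on the next slides.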

Protein Backbone Model
[Diagram: chain of vertices ALA–GLY–LYS–LEU]
• Each vertex is an amino acid
• Each label is a location + orientation
• Evidence y is the electron density map
• Each vertex (or observational) potential comes from the 5-mer matching

Protein Backbone Model
• Two types of structural (edge) potentials:
  • Adjacency constraints ensure adjacent amino acids are ~3.8 Å apart and in the proper orientation
  • Occupancy constraints ensure nonadjacent amino acids do not occupy the same 3D space
[Diagram: the ALA–GLY–LYS–LEU chain with adjacency edges between sequence neighbors and occupancy edges between all other pairs]

Backbone Model Potential
Constraints between adjacent amino acids:
[Equation: the adjacency potential, the product of a distance term peaked at ~3.8 Å and a term enforcing the proper relative orientation]

Backbone Model Potential
Constraints between nonadjacent amino acids:
[Equation: the occupancy potential, which assigns zero probability to layouts where two nonadjacent amino acids occupy the same 3D space]

Backbone Model Potential
Observational ("amino-acid-finder") probabilities:
[Equation: the vertex potential, the probability distribution over the unit cell produced by the 5-mer matching step]
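To make the structural potentials of the preceding slides concrete, here is a minimal Python sketch, assuming a Gaussian distance term around the ideal 3.8 Å Cα–Cα spacing and a hard-core occupancy cutoff; the constants SIGMA and CLASH are illustrative assumptions, and the real adjacency potential's orientation term is omitted.

```python
import numpy as np

CA_CA = 3.8   # ideal adjacent C-alpha separation (Angstroms)
SIGMA = 0.5   # assumed spread of the distance term (hypothetical)
CLASH = 3.0   # assumed hard-core radius for nonadjacent residues (hypothetical)

def psi_adjacent(x_i, x_j):
    """Adjacency potential: high when sequence neighbors sit ~3.8 A apart.
    (The orientation term of the real potential is omitted.)"""
    d = np.linalg.norm(x_i - x_j)
    return np.exp(-0.5 * ((d - CA_CA) / SIGMA) ** 2)

def psi_occupancy(x_i, x_j):
    """Occupancy potential: forbid nonadjacent residues from overlapping."""
    return 1.0 if np.linalg.norm(x_i - x_j) > CLASH else 0.0
```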

Probabilistic Inference
• Exact methods are intractable
• Use belief propagation (BP) to approximate marginal distributions
• Want to find the backbone layout that maximizes P(u_1, …, u_N | Map)

Belief Propagation (BP)
• Iterative, message-passing method (Pearl 1988)
• A message, m_i→j(u_j), from amino acid i to amino acid j indicates where i expects to find j
• An approximation to the marginal (or belief), b_i(u_i), is given as the product of incoming messages (see the sketch below)
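Below is a generic sum-product sketch of the message and belief computations over a discretized label set; names are illustrative, and ACMI itself stores these quantities as Fourier coefficients over the unit cell rather than as dense arrays.

```python
import numpy as np

def send_message(psi_edge, psi_i, incoming, j):
    """Sum-product message from node i to node j:
      m_ij(u_j) = sum_{u_i} psi_ij(u_i, u_j) * psi_i(u_i) * prod_{k != j} m_ki(u_i)

    psi_edge: (U, U) array of psi_ij over the discretized labels
    psi_i:    (U,) vertex potential of node i
    incoming: dict mapping neighbor k -> (U,) message m_ki sent to i
    """
    prod = psi_i.copy()
    for k, m_ki in incoming.items():
        if k != j:                      # exclude the recipient's own message
            prod *= m_ki
    m_ij = psi_edge.T @ prod            # marginalize out u_i
    return m_ij / m_ij.sum()            # normalize for numerical stability

def belief(psi_i, incoming):
    """Approximate marginal: vertex potential times all incoming messages."""
    b = psi_i.copy()
    for m in incoming.values():
        b *= m
    return b / b.sum()
```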

Belief Propagation Example
[Animation: message passing between the adjacent ALA and GLY vertices over the density map]

Technical Challenges
• Representation of potentials
  • Store Fourier coefficients in Cartesian space
  • At each location x, store a single orientation r
• Speeding up the O(N²X²) naïve implementation
  • X = the unit cell size (# of Fourier coefficients)
  • N = the number of residues in the protein

Speeding Up the O(N²X²) Implementation
• O(X²) computation for each occupancy message
  • Each message must integrate over the unit cell
  • O(X log X) as a multiplication in Fourier space
• O(N²) messages computed & stored
  • Approximate the N−3 occupancy messages into each node with a single message
  • O(N) messages using a message product accumulator (sketched below)
• Improved implementation: O(N·X log X)
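A hedged sketch of the message-product-accumulator idea: if each node broadcasts (approximately) the same occupancy message to every other node, one accumulated product plus a leave-one-out division replaces the O(N²) explicit messages with O(N) work per grid point. The log-space formulation and clipping constant are implementation assumptions.

```python
import numpy as np

def leave_one_out_products(messages):
    """For every node i, the product over k != i of the occupancy
    messages, computed in O(N X) rather than O(N^2 X).

    messages: (N, X) array; messages[i] is the occupancy message that
              node i broadcasts over the X grid points of the unit cell.
    """
    log_m = np.log(np.clip(messages, 1e-300, None))  # guard against log(0)
    total = log_m.sum(axis=0)                        # one shared accumulator
    return np.exp(total - log_m)                     # divide node i back out
```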

1XMT at 3Å Resolution
1.12Å RMSd, 100% coverage
[Figure: the traced model colored by prob(AA at location), from high to low]

1VMO at 4Å Resolution
3.63Å RMSd, 72% coverage
[Figure: the traced model colored by prob(AA at location), from high to low]

1YDH at 3.5Å Resolution
1.47Å RMSd, 90% coverage
[Figure: the traced model colored by prob(AA at location), from high to low]

Experiments
• Tested ACMI against other map-interpretation algorithms: TEXTAL and Resolve
• Used ten model-phased maps
• Smoothly diminished reflection intensities, yielding 2.5, 3.0, 3.5, and 4.0 Å resolution maps

RMS Deviation
[Plot: Cα RMS deviation vs. density map resolution for ACMI, TEXTAL, and Resolve]

Model Completeness
[Plots: % of chain traced and % of residues identified vs. density map resolution for ACMI, TEXTAL, and Resolve]

Per-protein RMS Deviation
[Scatter plots: per-protein RMS error of ACMI vs. TEXTAL and of ACMI vs. Resolve]

Conclusions
• ACMI effectively combines weakly matching templates to construct a full model
• Produces an accurate trace even with poor-quality density-map data
• Reduces computational complexity from O(N²X²) to O(N·X log X)
• Inference is possible for even large unit cells

Future Work
• Improve the "amino-acid-finding" algorithm
• Incorporate sidechain placement / refinement
• Manage missing data
  • Disordered regions
  • Only the exterior visible (e.g., in cryo-EM)

Acknowledgements
• Ameet Soni
• Craig Bingman
• NLM grants 1R01 LM and 1T15 LM007359