Probabilistic Methods for Interpreting Electron-Density Maps
Frank DiMaio
University of Wisconsin – Madison, Computer Sciences Department


3D Protein Structure [figure labels: backbone, sidechain, C-alpha]

3D Protein Structure [figure: amino-acid sequence ALA LEU PRO VAL ARG …, structure marked "???"]

High-Throughput Structure Determination. Protein-structure determination is important for understanding the function of a protein, understanding mechanisms, and finding targets for drug design. Some proteins produce poor density maps. Interpreting poor electron-density maps by hand is very laborious. I aim to automatically interpret poor-quality electron-density maps.

Electron-Density Map Interpretation. GIVEN: a 3D electron-density map and the (linear) amino-acid sequence.

Electron-Density Map Interpretation. FIND: an all-atom protein model.

Density Map Resolution [figure: resolution axis marked at 1.0 Å, 2.0 Å, 3.0 Å, and 4.0 Å, showing the ranges addressed by Morris et al. (2003), Ioerger et al. (2002), Terwilliger (2003), and "My focus"]

Thesis Contributions
A probabilistic approach to protein-backbone tracing. DiMaio et al., Intelligent Systems for Molecular Biology (2006)
Improved template matching in electron-density maps. DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)
Creating all-atom protein models using particle filtering. DiMaio et al. (under review)
Pictorial structures for atom-level molecular modeling. DiMaio et al., Advances in Neural Information Processing Systems (2004)
Improving the efficiency of belief propagation. DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)
Iterative phase improvement in ACMI

ACMI Overview
Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007). Independent amino-acid search; templates model the 5-mer conformational space.
Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006). Protein structural constraints refine the local search; a Markov random field (MRF) models pairwise constraints.
Phase 3: Sample all-atom models. Particle filtering samples high-probability structures; probabilities from the MRF guide particle trajectories.


5-mer Lookup [figure: sequence …SAW C VKFEKPADKNGKTE… looked up in a protein DB]. ACMI searches the map for each template independently. Spherical-harmonic decomposition allows rapid search of all template rotations.

Spherical-Harmonic Decomposition [figure: a function f(θ, φ) on the sphere]
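For reference, the standard spherical-harmonic expansion that this slide title refers to can be written as below. The coefficient symbol a_{lm} is an assumption made for illustration; the slide itself names only f(θ, φ).

```latex
f(\theta,\phi) \;=\; \sum_{l=0}^{\infty} \sum_{m=-l}^{l} a_{lm}\, Y_l^m(\theta,\phi),
\qquad
a_{lm} \;=\; \int_0^{2\pi}\!\!\int_0^{\pi} f(\theta,\phi)\,
\overline{Y_l^m(\theta,\phi)}\, \sin\theta \; d\theta \, d\phi
```

In practice the sum is truncated at some maximum degree l, so each spherical shell of density is summarized by a finite set of coefficients.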

5-mer Fast Rotation Search. A pentapeptide fragment from the PDB (the "template") gives a calculated (expected) density in a 5 Å sphere; the electron-density map gives a sampled region of density in a 5 Å sphere. Both the template density and the map region are sampled in spherical shells.

5-mer Fast Rotation Search. The template density and the map region, each sampled in spherical shells, yield template spherical-harmonic coefficients and map-region spherical-harmonic coefficients. The fast-rotation function (Navaza 2006, Risbo 1996) then gives the correlation coefficient as a function of rotation.

Convert Scores to Probabilities. Scanning the density map for the fragment gives correlation coefficients t_i(u_i) over the density map. Bayes' rule converts these into a probability distribution over the density map, P(5-mer at u_i | EDM).
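A minimal sketch of how Bayes' rule can turn a match score into a probability. The slide does not say which likelihood models are used, so everything here is an assumption for illustration: the Gaussian score distributions, their parameters, the prior, and the function names are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def match_probability(score, prior, match_params, background_params):
    """P(match | score) by Bayes' rule.

    match_params and background_params are (mean, std) of the assumed
    score distributions for true matches and for background locations.
    """
    like_match = gaussian_pdf(score, *match_params)
    like_background = gaussian_pdf(score, *background_params)
    evidence = like_match * prior + like_background * (1.0 - prior)
    return like_match * prior / evidence

# Hypothetical correlation coefficients at three map locations.
scores = [0.10, 0.35, 0.60]
probs = [
    match_probability(s, prior=0.01, match_params=(0.6, 0.1), background_params=(0.2, 0.1))
    for s in scores
]
```

With a small prior, only locations whose score is much more typical of a true match than of background end up with a high posterior.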


Probabilistic Backbone Model. A trace assigns a position and orientation u_i = {x_i, q_i} to each amino acid i. The full joint probability of a trace U = {u_i} is intractable to compute, so it is approximated using a pairwise Markov random field.

Pairwise Markov-Field Model. Joint probabilities are defined on a graph as a product of vertex and edge potentials. [figure: graph over the amino acids GLY, LYS, LEU, SER, ALA]
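Written out, a pairwise Markov field of this kind factorizes as below. The symbols ψ_i, ψ_adj, and ψ_occ are labels chosen here for the vertex potential and for the two kinds of edge potential (adjacency and occupancy) named in the slides; the slides do not give this notation.

```latex
P(U \mid \mathrm{EDM}) \;\propto\; \prod_{i} \psi_i(u_i) \prod_{i<j} \psi_{ij}(u_i, u_j),
\qquad
\psi_{ij}(u_i, u_j) =
\begin{cases}
\psi_{\mathrm{adj}}(u_i, u_j) & \text{if } |i-j| = 1 \\
\psi_{\mathrm{occ}}(u_i, u_j) & \text{otherwise}
\end{cases}
```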

ACMI's Backbone Model. Observational potentials tie the map to the model. [figure: graph over LEU, SER, GLY, LYS, ALA]

ACMI's Backbone Model. Adjacency constraints ensure that adjacent amino acids are ~3.8 Å apart and in the proper orientation. Occupancy constraints ensure that nonadjacent amino acids do not occupy the same 3D space. [figure: graph over GLY, LYS, LEU, SER, ALA]

Backbone Model Potential. The potential is a product of three kinds of terms: constraints between adjacent amino acids, constraints between all other amino-acid pairs, and observational ("template-matching") probabilities.

Inferring Backbone Locations. We want to find the backbone layout that maximizes the probability of the trace under this model. Exact methods are intractable, so belief propagation (Pearl 1988) is used to approximate the marginal distributions.

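Belief Propagation Example [figure: nodes LYS 31 and LEU 32 with estimated marginals p̂_LYS31 and p̂_LEU32; message m_LYS31→LEU32]

Belief Propagation Example [figure: nodes LYS 31 and LEU 32 with estimated marginals p̂_LYS31 and p̂_LEU32; message m_LEU32→LYS31]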
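The two-node exchange in this example can be sketched in a few lines. This is a toy illustration, not ACMI's implementation: the three-point grid, the observation values, and the adjacency table are hypothetical, and orientations are ignored.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def pass_message(edge_potential, sender_obs, incoming_to_sender):
    """Sum-product message from sender i to receiver j.

    m_{i->j}(u_j) = sum over u_i of
        edge_potential[u_i][u_j] * sender_obs[u_i] * incoming_to_sender[u_i]
    where incoming_to_sender is the product of messages into i from every
    neighbor except j.
    """
    n_receiver_states = len(edge_potential[0])
    message = []
    for uj in range(n_receiver_states):
        total = 0.0
        for ui, obs in enumerate(sender_obs):
            total += edge_potential[ui][uj] * obs * incoming_to_sender[ui]
        message.append(total)
    return normalize(message)

# Hypothetical toy problem: each residue sits at one of 3 grid points.
obs_lys31 = [0.7, 0.2, 0.1]
obs_leu32 = [0.3, 0.3, 0.4]
# adjacency[ui][uj]: neighboring grid points are compatible, others are not.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
no_other_messages = [1.0, 1.0, 1.0]

m_lys_to_leu = pass_message(adjacency, obs_lys31, no_other_messages)
# The reverse direction indexes the potential by (sender, receiver), so transpose.
adjacency_t = [list(column) for column in zip(*adjacency)]
m_leu_to_lys = pass_message(adjacency_t, obs_leu32, no_other_messages)

# Estimated marginals: local observation times incoming message.
belief_leu32 = normalize([o * m for o, m in zip(obs_leu32, m_lys_to_leu)])
belief_lys31 = normalize([o * m for o, m in zip(obs_lys31, m_leu_to_lys)])
```

On a two-node graph a single exchange gives the exact marginals; on ACMI's highly connected graph, belief propagation is iterated and the result is only approximate.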

Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006). The naïve implementation is O(N²G²), where N is the number of amino acids in the protein and G is the number of points in the discretized density map. Each message passed costs O(G²) computation, which drops to O(G log G) when done as a Fourier-space multiplication. O(N²) messages are computed and stored; approximating the (N-3) occupancy messages with 1 message, and using a message accumulator, reduces this to O(N) messages. The improved implementation is O(NG log G).
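To illustrate the O(G²) versus O(G log G) point: when an edge potential depends only on the offset between two grid points, a message is a convolution, and a convolution is a pointwise product in Fourier space. The sketch below assumes a periodic 1D grid whose length is a power of two and a hypothetical offset-only potential; ACMI's real potentials also involve 3D position and orientation.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def message_naive(potential, belief):
    """O(G^2): m(u_j) = sum over u_i of potential[(u_j - u_i) mod G] * belief[u_i]."""
    g = len(belief)
    return [sum(potential[(j - i) % g] * belief[i] for i in range(g))
            for j in range(g)]

def message_fft(potential, belief):
    """O(G log G): the same circular convolution as a Fourier-space product."""
    g = len(belief)
    product = [p * b for p, b in zip(fft(potential), fft(belief))]
    # The inverse transform above is unnormalized, so divide by G here.
    return [x.real / g for x in fft(product, inverse=True)]

# Toy periodic "map" with G = 8 grid points (hypothetical values).
belief = [0.0, 0.1, 0.6, 0.2, 0.1, 0.0, 0.0, 0.0]
# Potential indexed by offset: favors a neighbor exactly one grid point away.
potential = [0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]

naive = message_naive(potential, belief)
fast = message_fft(potential, belief)
```

Both functions return the same message up to floating-point error; only the cost differs as G grows.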


Occupancy Message Approximation. To pass an exact message from i to j, combine the occupancy edge potential with the product of incoming messages to i, except the one from j.

Occupancy Message Approximation. The "weak" potentials between nonadjacent amino acids let us approximate this by combining the occupancy edge potential with the product of all incoming messages to i, so the outgoing occupancy message no longer depends on j.

Occupancy Message Approximation. Each node sends its outgoing occupancy message product to a central accumulator (ACC). Then each node's incoming message product is computed in constant time.
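One way to picture the accumulator, as a sketch under stated assumptions rather than ACMI's actual code: every node contributes one outgoing occupancy message, the accumulator holds their product (kept as a sum of logs here), and each node recovers its incoming product by removing a fixed number of contributions. The hard "one residue per grid point" potential, the log-space bookkeeping, and all names are hypothetical.

```python
import math

def outgoing_occupancy_message(belief):
    """With a hard 'no two residues on one grid point' potential and a
    normalized belief, the message at point u is 1 - belief[u]."""
    return [1.0 - b for b in belief]

def chain_neighbors(j, n):
    """Indices adjacent to j in the chain (they use adjacency, not occupancy, edges)."""
    return [k for k in (j - 1, j + 1) if 0 <= k < n]

def incoming_products(beliefs):
    """For each node j, the product of occupancy messages from every node
    that is neither j itself nor adjacent to j."""
    n_nodes = len(beliefs)
    n_points = len(beliefs[0])
    log_msgs = [[math.log(m) for m in outgoing_occupancy_message(b)] for b in beliefs]
    # Central accumulator: one pass over all nodes, built once.
    acc = [sum(lm[u] for lm in log_msgs) for u in range(n_points)]
    products = []
    for j in range(n_nodes):
        excluded = [j] + chain_neighbors(j, n_nodes)
        # At most three subtractions per grid point, independent of N.
        log_prod = [acc[u] - sum(log_msgs[k][u] for k in excluded)
                    for u in range(n_points)]
        products.append([math.exp(v) for v in log_prod])
    return products

# Hypothetical beliefs for 4 residues over 3 grid points (all entries below 1).
beliefs = [[0.5, 0.3, 0.2],
           [0.2, 0.5, 0.3],
           [0.1, 0.1, 0.8],
           [0.6, 0.2, 0.2]]
products = incoming_products(beliefs)
```

The sketch requires every belief entry to be strictly below 1 so the logarithm is defined; a real implementation would need to handle that case.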

BP Output. After some number of iterations, BP gives probability distributions over Cα locations. [figure: one distribution per amino acid, ALA LEU PRO VAL ARG …]

ACMI's Backbone Trace. Independently choose the Cα locations that maximize each approximate marginal distribution.

Example: 1XRI. 3.3 Å resolution density map, 39° mean phase error. [figure labels: Å RMSd; 93% complete; trace colored by prob(AA at location), from LOW to HIGH]

Testset Density Maps (raw data) [plot: density-map resolution (Å) versus density-map mean phase error (deg.)]

Experimental Accuracy [chart: ACMI, ARP/wARP, Textal, and Resolve compared on % Cα's located within 2 Å of some Cα / correct Cα, % backbone correctly placed, and % amino acids correctly identified]

Experimental Accuracy on a Per-Protein Basis [plots: ACMI % Cα's located versus ARP/wARP, Resolve, and Textal % Cα's located]


Problems with ACMI. Biologists want the location of all atoms. All Cα's lie on a discrete grid. The maximum-marginal backbone model may be physically unrealistic. A single trace ignores a lot of information: multiple models may better represent conformational variation within the crystal. [figure: three structures with probability 0.4, 0.35, and 0.25, alongside the maximum-marginal structure]

ACMI with Particle Filtering (ACMI-PF). Idea: represent the protein using a set of static 3D all-atom protein models.

Particle Filtering Overview (Doucet et al. 2000). Given some Markov process x_{1:K} ∈ X with observations y_{1:K} ∈ Y, particle filtering approximates some posterior probability distribution over X using a set of N weighted point estimates.

Particle Filtering Overview. The Markov process gives a recursive formulation. Use an importance function q(x_k | x_{0:k-1}, y_k) to grow particles, with a recursive weight update.
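The slide's weight-update equation did not survive in the transcript. For orientation, the standard sequential importance sampling update for particle i, written with the slide's importance function q, has the following form; treat it as the textbook version rather than a reconstruction of the exact slide.

```latex
w_k^{(i)} \;\propto\; w_{k-1}^{(i)} \,
\frac{p\!\left(y_k \mid x_k^{(i)}\right)\, p\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}\right)}
     {q\!\left(x_k^{(i)} \mid x_{0:k-1}^{(i)},\, y_k\right)}
```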

Particle Filtering for Protein Structures. A particle refers to one specific 3D layout of some subsequence of the protein. At each iteration, advance the particle's trajectory by placing an additional amino acid's atoms.

Particle Filtering for Protein Structures. Alternate extending the chain left and right. An iteration alternately places the Cα position b_{k+1} given b_k, and all sidechain atoms s_k given b_{k-1:k+1}.

Particle Filtering for Protein Structures. Key idea: use the conditional distribution p(b_k | b^i_{k-1}, Map) to advance particle trajectories. Construct this conditional distribution from BP's marginal distributions.

Particle Filtering for Protein Structures. Algorithm:
  place "seeds" b^i_k for each particle i = 1…N
  while amino acids remain
    place b^i_{k+1} / b^i_{j-1} given b^i_{j:k} for each i = 1…N
    place s^i_k given b^i_{k-1:k+1} for each i = 1…N
    optionally resample N particles
  end while

Backbone Step (for particle i): place b^i_{k+1} given b^i_k for each i = 1…N.
(1) Sample L candidate b_{k+1}'s from the b_{k-1}–b_k–b_{k+1} pseudoangle distribution.
(2) Weight each sample by its ACMI-computed approximate marginal: p_{k+1}(b^1_{k+1}), p_{k+1}(b^2_{k+1}), …, p_{k+1}(b^L_{k+1}).
(3) Select b_{k+1} with probability proportional to sample weight.
(4) Update the particle weight as the sum of sample weights.
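Steps (2) through (4) of the backbone step can be sketched as follows. Step (1), sampling candidates from the pseudoangle distribution, is left to the caller. The function name, the dictionary used as a stand-in for ACMI's marginal, and the choice to multiply the previous particle weight by the sum of sample weights (the usual recursive form) are assumptions, not details given in the slides.

```python
import random

def backbone_step(particle_weight, candidates, marginal, rng):
    """Extend one particle by one C-alpha position.

    candidates: the L sampled positions for b_{k+1} (step 1, done by the caller)
    marginal:   maps a candidate position to its approximate marginal probability
    Returns (chosen position, updated particle weight).
    """
    sample_weights = [marginal(c) for c in candidates]                    # step 2
    total = sum(sample_weights)
    if total == 0.0:
        # Every candidate is impossible under the marginal: the particle dies.
        return None, 0.0
    chosen = rng.choices(candidates, weights=sample_weights, k=1)[0]      # step 3
    # Step 4, assuming the update multiplies the old weight by the sum.
    return chosen, particle_weight * total

# Hypothetical 1D stand-in: positions are integers, the marginal is a lookup table.
toy_marginal = {1: 0.05, 2: 0.30, 3: 0.10}
rng = random.Random(0)
position, weight = backbone_step(1.0, [1, 2, 3], lambda c: toy_marginal.get(c, 0.0), rng)
```

A particle whose candidates all fall in low-probability regions ends up with a small weight, which is what lets resampling later discard it.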

Sidechain Step (for particle i): place s^i_k given b^i_{k-1:k+1} for each i = 1…N.
(1) Sample s_k from a database of sidechain conformations (Protein Data Bank).
(2) For each sidechain conformation, compute the probability of the density map given the sidechain: p_k(EDM | s^1_k), p_k(EDM | s^2_k), p_k(EDM | s^3_k).
(3) Select a sidechain conformation from this weighted distribution.
(4) Update the particle weight as the sum of sample weights.

Particle Resampling [figure: particles labeled with weights, wt = 0.1, 0.4, 0.3, 0.1, 0.2, 0.1, 0.4, 0.3, 0.1]
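A minimal sketch of one common resampling scheme, multinomial resampling, in which N particles are redrawn with probability proportional to weight and then given equal weights. The slides do not say which scheme ACMI-PF uses, so the scheme, the function name, and the toy particles are assumptions.

```python
import random

def resample(particles, weights, rng):
    """Multinomial resampling: draw N particles with replacement,
    each with probability proportional to its weight."""
    n = len(particles)
    survivors = rng.choices(particles, weights=weights, k=n)
    # After resampling, every survivor carries the same weight.
    return survivors, [1.0 / n] * n

rng = random.Random(0)
particles = ["model_a", "model_b", "model_c", "model_d"]   # hypothetical partial models
weights = [0.1, 0.4, 0.3, 0.2]
new_particles, new_weights = resample(particles, weights, rng)
```

High-weight particles tend to be duplicated and low-weight ones dropped, which concentrates effort on the more probable partial structures.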

Amino-Acid Sampling Order. Begin at some amino acid k with probability [equation missing]. At each step, move left to right with probability [equation missing]. [figure labels: j, k]

Experimental Methodology. Run ACMI-PF 10 times with 100 particles each, and return the highest-weight particle from each run. Each run samples amino acids in a different order. Refine each structure for 10 iterations in Refmac5. Compare the 10-structure model to others using R_free.

ACMI-PF Versus ACMI-Naïve [plot: refined R_free versus number of ACMI-PF runs]. Additionally, ACMI-PF's models have fewer gaps (10 vs. 28) and lower sidechain RMS error (2.1 Å vs. 2.3 Å).

ACMI-PF Versus Others [plots: ACMI-PF R_free versus ARP/wARP, Resolve, and Textal R_free]

ACMI-PF Example: 2A3Q. 2.3 Å resolution, 66° phase error. Result: 1.79 Å RMSd, 92% complete.

ACMI Overview
Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007). Independent amino-acid search; templates model the 5-mer conformational space.
Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006). Protein structural constraints refine the local search; a Markov random field (MRF) models pairwise constraints.
Phase 3: Sample all-atom models. Particle filtering samples high-probability structures; probabilities from the MRF guide particle trajectories.
Phase 4: Iterative phase improvement. Use the particle-filtering models to improve density-map quality, rerun the entire pipeline on the improved density map, and repeat until convergence.

Phase Problem. Intensities: measured by X-ray crystallography. Phases: experimentally estimated (e.g. MAD, MIR).
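As background for why both quantities are needed, the electron density is the Fourier synthesis of the structure factors, whose amplitudes come from the measured intensities and whose phases must be estimated. The notation below (V for unit-cell volume, F_hkl for structure factors, φ_hkl for phases) is the standard crystallographic form and is not taken from the slides.

```latex
\rho(x,y,z) \;=\; \frac{1}{V} \sum_{h,k,l} \left|F_{hkl}\right|\, e^{\,i\phi_{hkl}}\,
e^{-2\pi i\,(hx + ky + lz)},
\qquad
I_{hkl} \;\propto\; \left|F_{hkl}\right|^{2}
```

Errors in the phases φ_hkl therefore distort the map directly, which is why mean phase error is used in these slides as a measure of map quality.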

Density-Map Phasing [figure: density maps at 0°, 30°, 60°, and 75° mean phase error]

Iterative Phase Improvement [diagram labels: initial density map, predicted 3D model, revised density map]

ACMI-PF's Phase Improvement [plot: error in initial phases versus error in ACMI-PF's phases, both in degrees of mean phase error]

Two-Iteration ACMI [plot axis labels: "% backbone located, Iteration 1" and "% backbone located, Iteration"]

Future Work: Many-iteration ACMI [plot: average % uninterpreted AAs and average mean phase error versus number of ACMI iterations]

Conclusions. ACMI's three steps construct a set of all-atom protein models from a density map. A novel message approximation allows inference on large, highly connected models. The resulting protein models are more accurate than those produced by other methods.

Ongoing and Future Work. Incorporate additional structural-biology background knowledge. Incorporate more complex potential functions. Continue work on iterative phase improvement. Generalize my algorithms to other 3D image data.

Acknowledgements
Advisor: Jude Shavlik
Committee: George Phillips, Charles Dyer, David Page, Mark Craven
Collaborators: Ameet Soni, Dmitry Kondrashov, Eduard Bitto, Craig Bingman
6th floor MSCers
Center for Eukaryotic Structural Genomics
Funding: UW-Madison Graduate School, NLM 1T15 LM, NLM 1R01 LM008796