Free Energy Estimates of All-atom Protein Structures Using Generalized Belief Propagation. Kamisetty, H., Xing, E.P., and Langmead, C.J. Presented by Raluca Gordan, February 12, 2008.

Papers
• Free Energy Estimates of All-atom Protein Structures Using Generalized Belief Propagation. Kamisetty, H., Xing, E.P., and Langmead, C.J.
• Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms. Yedidia, J.S., Freeman, W.T., and Weiss, Y.
• Understanding Belief Propagation and its Generalizations. Yedidia, J.S., Freeman, W.T., and Weiss, Y.
• Bethe free energy, Kikuchi approximations, and belief propagation algorithms. Yedidia, J.S., Freeman, W.T., and Weiss, Y.
• Effective energy functions for protein structure prediction. Lazaridis, T., and Karplus, M.

Key terms: free energy, entropy, internal energy, Markov random field, probabilistic graphical models, potential function, pair-wise MRF, factor graphs, region-based free energy, region graph, belief propagation, generalized belief propagation, marginal probabilities, Gibbs free energy, inference, Bayes nets, enthalpy.

Free energy
• Free energy = the amount of energy in a system which can be converted into work.
• Gibbs free energy = the amount of thermodynamic energy which can be converted into work at constant temperature and pressure.
• Enthalpy = the "heat content" of a system.
• Entropy = a measure of the degree of randomness or disorder of a system.
G = H – T·S = (E + P·V) – T·S
where G = Gibbs free energy, H = enthalpy, S = entropy, E = internal energy, T = temperature, P = pressure, V = volume.
(Stryer, L., Biochemistry, 4th Edition)

Thermodynamics: changes in free energy, entropy, …
Gibbs free energy (G): ΔG = ΔH – T·ΔS = (ΔE + P·ΔV) – T·ΔS
For nearly all biochemical reactions, ΔV is small and ΔH is almost equal to ΔE. Hence, we can write: ΔG = ΔE – T·ΔS
(Stryer, L., Biochemistry, 4th Edition)
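As a quick illustration of this relation, a minimal sketch in Python; the numeric values of ΔH, ΔS, and T below are made up:

```python
# Minimal sketch: computing a Gibbs free energy change from hypothetical
# values of ΔH and ΔS (the numbers below are made up for illustration).

def delta_g(delta_h: float, delta_s: float, temperature: float) -> float:
    """ΔG = ΔH - T·ΔS (per the slide, ΔH ≈ ΔE for most biochemical reactions)."""
    return delta_h - temperature * delta_s

# Example: ΔH = -10.0 kcal/mol, ΔS = -0.02 kcal/(mol·K), T = 298 K
print(delta_g(-10.0, -0.02, 298.0))  # -4.04 kcal/mol: still favorable (ΔG < 0)
```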

Free energy functions: G = E – T·S
• Energy functions are used in protein structure prediction, fold recognition, homology modeling, and protein design.
• E.g., approaches to protein structure prediction are based on the thermodynamic hypothesis, which postulates that the native state of a protein is the state of lowest free energy under physiological conditions (Lazaridis, T., and Karplus, M., Effective energy functions for protein structure prediction).
• The contribution of Kamisetty, H., Xing, E.P., and Langmead, C.J.:
  – the entropy component of their free energy estimate can be used to distinguish native protein structures from decoys (structures with internal energy similar to that of the native structure, but otherwise incorrect);
  – their estimates of ΔΔG upon mutation correlate well with experimental values.

Free energy functions: G = E – T·S
• Internal energy functions E model inter- and intramolecular interactions (e.g., van der Waals, electrostatic, solvent).
• Entropy functions S are harder to compute because they involve sums over an exponential number of terms.

The entropy term: G = E – T·S
Options for handling the entropy term S:
• Ignore it: simple, but limits the accuracy.
• Use statistical potentials derived from known protein structures (PDB): these statistics encode both the entropy S and the internal energy E, but the interactions are not independent*.
• Model the protein structure as a probabilistic graphical model and use inference-based approaches to estimate the free energy (Kamisetty et al.): fast and accurate.
* Thomas, P.D., and Dill, K.A., Statistical Potentials Extracted From Protein Structures: How Accurate Are They?

Probabilistic Graphical Models
• Graphs that represent the dependencies among random variables: usually each random variable is a node, and the edges between the nodes represent conditional dependencies.
• Examples: Bayesian networks, (pair-wise) Markov random fields, factor graphs.

Bayes Nets
• X_1, …, X_N – random variables; x_i – values for the random variables. Each variable can be in a discrete number of states.
• Arrows encode conditional probabilities: each variable is independent of the other variables, given its parents.
• Joint probability: p(x_1, …, x_N) = Π_i p(x_i | Parents(x_i))
• Marginal probability: p(x_i) = Σ_{all x_j, j ≠ i} p(x_1, …, x_N)
• Belief: a marginal probability computed approximately.
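To make the factorization and the cost of marginalization concrete, a minimal brute-force sketch for a hypothetical three-variable chain X1 → X2 → X3; all conditional probability values are made up:

```python
import itertools

# A minimal brute-force sketch for a hypothetical Bayes net X1 -> X2 -> X3
# with binary variables: the joint is the product of conditionals, and a
# marginal sums the joint over all other variables. CPT numbers are made up.

p_x1 = {0: 0.6, 1: 0.4}                                         # p(x1)
p_x2_x1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # p(x2 | x1), keyed (x2, x1)
p_x3_x2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # p(x3 | x2), keyed (x3, x2)

def joint(x1, x2, x3):
    """p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x2)"""
    return p_x1[x1] * p_x2_x1[(x2, x1)] * p_x3_x2[(x3, x2)]

# Marginal p(x3): the sum whose number of terms grows exponentially with
# the number of variables (2^2 terms here, 2^(N-1) in general).
p_x3 = {v: sum(joint(x1, x2, v) for x1, x2 in itertools.product((0, 1), repeat=2))
        for v in (0, 1)}
print(p_x3)  # the two values sum to 1
```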

Markov Random Fields
• x_i – hidden variables (and their values); y_i – observed variables.
• φ_i(x_i, y_i) – compatibility functions (potentials) linking each hidden variable to its observation, often called the evidence.
• ψ_ij(x_i, x_j) – compatibility functions for connected hidden variables x_i and x_j.
• Overall joint probability: p(x, y) = (1/Z) · Π_{(i,j)} ψ_ij(x_i, x_j) · Π_i φ_i(x_i, y_i), where Z is a normalization constant (also called the partition function).
• This is a pair-wise MRF because the potentials are pair-wise.
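A minimal sketch of this joint probability on a small hypothetical chain MRF, computing the partition function Z by brute force; the potential values are made up:

```python
import itertools
import math

# A minimal sketch of the pair-wise MRF joint probability on a 3-node chain
# with binary hidden variables (made-up potentials). For simplicity the
# evidence phi_i(x_i, y_i) is folded into a per-node table phi_i(x_i).

states = (0, 1)
phi = [[1.0, 0.5], [1.0, 1.0], [0.5, 1.0]]   # evidence phi_i(x_i)
psi = lambda a, b: 2.0 if a == b else 1.0    # compatibility psi_ij(x_i, x_j)
edges = [(0, 1), (1, 2)]

def unnormalized(x):
    return (math.prod(phi[i][x[i]] for i in range(3))
            * math.prod(psi(x[i], x[j]) for i, j in edges))

# Z, the partition function, sums over all 2^3 joint states
Z = sum(unnormalized(x) for x in itertools.product(states, repeat=3))
p = {x: unnormalized(x) / Z for x in itertools.product(states, repeat=3)}
print(Z, sum(p.values()))  # the normalized probabilities sum to 1
```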

Factor Graphs
• Bipartite graph: variable nodes x_i (x_i – values for the vars) and function (factor) nodes f_a, which represent the interactions between variables.
• The joint probability factors into a product of functions: p(x) = (1/Z) · Π_a f_a(x_a), where x_a is the set of variables attached to factor a.
• E.g., a distribution over four variables might factor as p(x_1, x_2, x_3, x_4) = (1/Z) · f_A(x_1, x_2) · f_B(x_2, x_3, x_4) · f_C(x_4).
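A minimal sketch of that illustrative factorization, with made-up factor tables:

```python
import itertools
import math

# A minimal sketch of a factor graph joint: variable nodes x1..x4 and factor
# nodes f_A(x1, x2), f_B(x2, x3, x4), f_C(x4), with
# p(x) = (1/Z) f_A(x1, x2) f_B(x2, x3, x4) f_C(x4). Factor tables are made up.

f_A = lambda x1, x2: 1.0 + (x1 == x2)
f_B = lambda x2, x3, x4: math.exp(-(x2 + x3 + x4) / 3.0)
f_C = lambda x4: 2.0 if x4 == 0 else 1.0

def unnormalized(x1, x2, x3, x4):
    return f_A(x1, x2) * f_B(x2, x3, x4) * f_C(x4)

Z = sum(unnormalized(*x) for x in itertools.product((0, 1), repeat=4))
p = {x: unnormalized(*x) / Z for x in itertools.product((0, 1), repeat=4)}
print(Z, sum(p.values()))  # probabilities sum to 1
```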

Graphical Models
Bayes nets, pair-wise MRFs, and factor graphs can represent the same distributions and can be converted into one another (figure from Understanding Belief Propagation and its Generalizations, Yedidia, J.S., Freeman, W.T., and Weiss, Y., 2002).

Belief Propagation (BP)
• Beliefs = marginal probabilities that we compute approximately.
• Marginal probability: p(x_i) = Σ_{all x_j, j ≠ i} p(x_1, …, x_N); the number of terms in these sums grows exponentially with the number of variables.
• BP is a method for approximating the marginal probabilities in time that grows only linearly with the number of variables (nodes).
• BP for pair-wise MRFs, Bayes nets, and factor graphs is precisely mathematically equivalent, at every iteration of the BP algorithm.

Belief Propagation (BP)
• Notation: x_i – hidden variables, y_i – observed variables, φ_i and ψ_ij – compatibility functions (potentials), b_i – beliefs (approximate marginal probabilities), N(i) – the neighbors of node i.
• m_ij(x_j) – the message from node i to node j about the state node j should be in. E.g., if x_j has 3 possible values {1, 2, 3}, then m_ij is a vector of 3 non-negative numbers, one per state.
• The belief at each node: b_i(x_i) = k · φ_i(x_i) · Π_{j ∈ N(i)} m_ji(x_i), where k is a normalization constant.
• The message update rule: m_ij(x_j) = Σ_{x_i} φ_i(x_i) · ψ_ij(x_i, x_j) · Π_{k ∈ N(i)\{j}} m_ki(x_i)
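Putting the two equations together, a minimal sketch of sum-product BP on the same kind of made-up chain MRF as above; since the chain has no cycles, the resulting beliefs are the exact marginals:

```python
import math

# A minimal sketch of sum-product BP on a small chain MRF (x0 - x1 - x2),
# following the slide's update and belief equations. Potentials are made up;
# on a cycle-free graph like this chain, the beliefs are the exact marginals.

states = (0, 1)
phi = [[1.0, 0.5], [1.0, 1.0], [0.5, 1.0]]   # evidence phi_i(x_i)
psi = lambda a, b: 2.0 if a == b else 1.0    # pair potential psi_ij(x_i, x_j)
neighbors = {0: [1], 1: [0, 2], 2: [1]}

# messages m[(i, j)][x_j], initialized uniformly
m = {(i, j): {s: 1.0 for s in states} for i in neighbors for j in neighbors[i]}

for _ in range(10):  # parallel sweeps; converges quickly on a tree
    m = {(i, j): {xj: sum(phi[i][xi] * psi(xi, xj) *
                          math.prod(m[(k, i)][xi] for k in neighbors[i] if k != j)
                          for xi in states)
                  for xj in states}
         for (i, j) in m}

def belief(i):
    """b_i(x_i) = k * phi_i(x_i) * prod_{j in N(i)} m_ji(x_i), normalized."""
    b = {xi: phi[i][xi] * math.prod(m[(j, i)][xi] for j in neighbors[i])
         for xi in states}
    z = sum(b.values())
    return {xi: v / z for xi, v in b.items()}

print([belief(i) for i in neighbors])  # exact marginals for this tree
```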

Belief Propagation (BP)
• BP is an iterative method.
• When the MRF has no cycles, the beliefs computed using BP are exact!
• Even when the MRF has cycles, the BP algorithm is still well defined and empirically often gives good approximate answers.

Graphical Models and Free Energy
• Statistical physics (Boltzmann's law, taking T = 1): p(x) = (1/Z) · e^{–E(x)}
• Kullback–Leibler distance between approximate beliefs b and the true distribution p: D(b‖p) = Σ_x b(x) · ln( b(x) / p(x) )
• The Gibbs free energy of the beliefs, G(b) = U(b) – S(b) = Σ_x b(x)·E(x) + Σ_x b(x)·ln b(x), satisfies G(b) = D(b‖p) – ln Z.
• Hence D = 0 iff the beliefs are exact, and in this case the Gibbs free energy achieves its minimal value (–ln Z, also called the "Helmholtz free energy").
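A minimal sketch verifying these statements numerically on a tiny made-up energy function (taking T = 1):

```python
import itertools
import math

# A minimal numerical sketch (T = 1, made-up energies): Boltzmann's law
# p(x) = exp(-E(x)) / Z, the Gibbs free energy of trial beliefs
# G(b) = U(b) - S(b), and the identity G(b) = KL(b || p) - ln Z, which shows
# G is minimized, at -ln Z, exactly when the beliefs equal the true distribution.

E = {x: x[0] + x[1] + 0.5 * x[0] * x[1] for x in itertools.product((0, 1), repeat=2)}
Z = sum(math.exp(-e) for e in E.values())
p = {x: math.exp(-E[x]) / Z for x in E}

def gibbs_free_energy(b):
    U = sum(b[x] * E[x] for x in E)             # average (internal) energy
    S = -sum(b[x] * math.log(b[x]) for x in E)  # entropy
    return U - S

uniform = {x: 0.25 for x in E}
print(gibbs_free_energy(uniform))   # larger than -ln Z
print(gibbs_free_energy(p))         # equals -ln Z when the beliefs are exact
print(-math.log(Z))
```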

Approximating the Free Energy
The exact free energy involves summations over an exponential number of terms, so we approximate it:
• Mean-field free energy approximation: uses one-node beliefs and assumes that the joint belief factorizes, b(x) = Π_i b_i(x_i).
• Bethe free energy approximation: uses one-node beliefs b_i and two-node beliefs b_ij.
• Region-based free energy approximations: break up the graph into a set of regions, compute the free energy over each region, and then approximate the total free energy by the sum of the free energies over the regions.
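For instance, a minimal sketch of the mean-field free energy on a hypothetical two-node pair-wise model, where the factorized beliefs reduce both U and S to low-dimensional sums; all numbers are made up:

```python
import math

# A minimal sketch of the mean-field free energy on a hypothetical two-node
# pair-wise model: the joint belief is assumed to factorize as
# b(x1, x2) = b1(x1) * b2(x2), so the average energy U and the entropy S
# reduce to one- and two-node sums instead of a sum over all joint states.
# All energies and beliefs below are made up.

states = (0, 1)
E1 = [0.0, 1.0]                             # node energies E_1(x_1)
E2 = [0.5, 0.0]                             # node energies E_2(x_2)
E12 = lambda a, b: -0.8 if a == b else 0.0  # edge energy E_12(x_1, x_2)
b1, b2 = [0.7, 0.3], [0.6, 0.4]             # one-node beliefs

U = (sum(b1[a] * E1[a] for a in states)
     + sum(b2[a] * E2[a] for a in states)
     + sum(b1[a] * b2[c] * E12(a, c) for a in states for c in states))
S = (-sum(b1[a] * math.log(b1[a]) for a in states)
     - sum(b2[a] * math.log(b2[a]) for a in states))
print(U - S)  # mean-field estimate of the free energy G = U - S (with T = 1)
```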

Generalized Belief Propagation (GBP)
• Region-based free energy approximations: break up the graph into a set of regions, compute the free energy over each region, and then approximate the total free energy by the sum of the free energies over the regions.
• GBP is a message-passing algorithm similar to BP, but with messages between regions instead of messages between single nodes.
• The regions of nodes that communicate can be visualized in terms of a region graph (Yedidia, Freeman, Weiss).
• The region-graph approximation method generalizes the Bethe method, the junction graph method, and the cluster variation method.
• Different choices of region graphs give different GBP algorithms, trading off complexity against accuracy; how to optimally choose the regions is more art than science.

Generalized Belief Propagation (GBP)
• GBP usually improves on simple BP (when the graph contains cycles).
• Good advice: when constructing the regions, try to include at least the shortest cycles inside regions.
• For region graphs with no cycles, GBP is guaranteed to work.
• Even when the region graph has cycles, GBP usually gives good results.
(Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms. Yedidia, J.S., Freeman, W.T., and Weiss, Y.)

Free Energy Estimates of All-atom Protein Structures Using Generalized Belief Propagation. Kamisetty, H., Xing, E.P., and Langmead, C.J.

Model
Model the protein structure as a complex probability distribution, using a pair-wise MRF:
• observed variables: backbone atom positions (continuous);
• hidden variables: side-chain atom positions, represented using rotamers (discrete);
• interactions (edges): two variables share an edge if they are closer than a threshold distance (C_α–C_α distance < 8 Å);
• potential functions: ψ_ij(x_i, x_j) = exp( –E_ij(x_i, x_j) / (k_B·T) ), where E_ij(x_i, x_j) is the energy of interaction between rotamer state x_i of residue i and rotamer state x_j of residue j.
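A minimal sketch of this potential function, assuming the Boltzmann-weighted form ψ_ij = exp(–E_ij/(k_B·T)) as reconstructed above; the rotamer-pair energy table is made up:

```python
import math

# Minimal sketch of the slide's potential function: low-energy rotamer pairs
# get high potential values. The energy table E_ij below is hypothetical.

K_B = 0.0019872  # Boltzmann constant in kcal/(mol·K)
T = 300.0        # temperature in K

E_ij = {(0, 0): -1.2, (0, 1): 0.3, (1, 0): 0.0, (1, 1): -0.5}  # made-up energies

# psi_ij(x_i, x_j) = exp(-E_ij(x_i, x_j) / (k_B * T))
psi_ij = {pair: math.exp(-e / (K_B * T)) for pair, e in E_ij.items()}
print(psi_ij)
```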

Model (figure omitted).

MRF to Factor Graph (figure omitted).

Building the Region Graph
• Big regions: 3 or 2 variables; small regions: one variable.
• To form the region graph, add edges from each big region to all small regions that contain a strict subset of the big region's nodes.
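A minimal sketch of this construction, with hypothetical region contents:

```python
import itertools

# Minimal sketch of the slide's region-graph construction: big regions of two
# or three variables, small regions of one variable, and an edge from each big
# region to every small region whose variable it contains. Region contents
# below are hypothetical.

big_regions = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({4, 5})]
small_regions = [frozenset({v}) for v in (1, 2, 3, 4, 5)]

edges = [(big, small)
         for big, small in itertools.product(big_regions, small_regions)
         if small < big]  # strict subset, as the slide requires

for big, small in edges:
    print(sorted(big), "->", sorted(small))
```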

Generalized Belief Propagation
• Choice of regions (following Aji and McEliece): place residues that are closely coupled together in the same big regions; balance accuracy against complexity.
• Message passing: the "two-way algorithm" (Yedidia, Freeman, Weiss).
• Initialize the GBP messages to random starting points and run the algorithm until the beliefs converge, or for at most 100 iterations.
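A minimal skeleton of that stopping rule; init_messages, update_messages, and compute_beliefs are hypothetical stand-ins for the actual GBP steps:

```python
# A minimal sketch of the slide's stopping rule: start from random messages
# and iterate until the beliefs stop changing, or for at most 100 iterations.
# The three callables are hypothetical placeholders, not the paper's API.

def run_gbp(init_messages, update_messages, compute_beliefs,
            tol=1e-6, max_iters=100):
    messages = init_messages()          # random starting point
    beliefs = compute_beliefs(messages)
    for _ in range(max_iters):
        messages = update_messages(messages)
        new_beliefs = compute_beliefs(messages)
        # converged when no belief entry changes by more than tol
        if max(abs(a - b) for a, b in zip(new_beliefs, beliefs)) < tol:
            break
        beliefs = new_beliefs
    return beliefs
```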

Results on the Decoy Datasets
• 48 datasets. Each dataset contains multiple decoys and the native structure of a protein; all decoys had backbones similar to the native structure (C_α RMSD < 2.0 Å).
• When the structures are ranked in decreasing order of entropy (recall G = E – T·S), the native structure is ranked the highest in 87.5% of the datasets.
• PROCHECK (protein structure validation): for the datasets in which the native structure was ranked 3rd or 4th, this structure had a very high number of "bad" bond angles.
• For decoys with dissimilar backbones, the corresponding figure is 84%.

Results on the Decoy Datasets: comparison to other energy functions (figure omitted).

Predicting ΔΔG upon mutation (figure omitted).

Summary
• Model protein structures as complex probability distributions, using probabilistic graphical models (MRFs and factor graphs).
• Use generalized belief propagation (the two-way algorithm) to approximate the free energy.
• The method was successfully used to distinguish native structures from decoys and to predict changes in free energy after mutation.
• Other applications: side-chain placement (Yanover and Weiss), other inference problems over the graphical model.

Questions?