Presentation transcript:

Guiding Belief Propagation using Domain Knowledge for Protein-Structure Determination
Ameet Soni* and Jude Shavlik, Dept. of Computer Sciences; Dept. of Biostatistics and Medical Informatics
Craig Bingman, Dept. of Biochemistry; Center for Eukaryotic Structural Genomics
Presented at the ACM International Conference on Bioinformatics and Computational Biology 2010

2 Protein Structure Determination
• Proteins are essential to most cellular functions:
  - Structural support
  - Catalysis/enzymatic activity
  - Cell signaling
• Protein structures determine function
• X-ray crystallography is the main technique for determining structures

3 X-ray Crystallography: Background
[Figure: protein crystal + X-ray beam → collect diffraction pattern → FFT → electron-density map (3D image) → interpret → protein structure]
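The FFT step on this slide computes the density map as a Fourier synthesis over the measured reflections. The standard textbook form is shown below (our addition; sign conventions vary between texts). V is the unit-cell volume, x, y, z are fractional coordinates, |F_hkl| are structure-factor amplitudes derived from the diffraction intensities, and φ_hkl are the phases, which are not measured directly and must be estimated:

```latex
\rho(x, y, z) = \frac{1}{V} \sum_{h,k,l} \lvert F_{hkl} \rvert \, e^{i\varphi_{hkl}} \, e^{-2\pi i (hx + ky + lz)}
```

Errors in the estimated phases φ_hkl are what a density map's "mean phase error" quantifies.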

4 Task Overview
• Given:
  - A protein sequence (e.g., SAVRVGLAIM...)
  - An electron-density map (EDM) of the protein
• Do:
  - Automatically produce a protein structure (or trace) that is all-atom and physically feasible

5 Challenges & Related Work
[Figure: map-resolution scale from 1 Å to 4 Å, with ARP/wARP, TEXTAL & RESOLVE, and our method, ACMI, placed along it]

6 ACMI Overview
- Background
- Inference in ACMI-BP
- Guiding Belief Propagation
- Experiments & Results

7 Our Technique: ACMI
• Phase 1, ACMI-SH: perform local match → a priori probability of each amino acid's location
• Phase 2, ACMI-BP: apply global constraints → marginal probability of each amino acid's location
• Phase 3, ACMI-PF: sample structure → all-atom protein structures
[Figure: M candidate positions b_{k+1}^{*1…M} for the next residue, given b_{k-1} and b_k, each scored by p_{k+1}]

8 Previous Work [DiMaio et al, 2007]

9 ACMI Framework

10 Inference in ACMI-BP
- Background
- ACMI Overview
- Guiding Belief Propagation
- Experiments & Results

11 ACMI-BP
• ACMI models the probability of all possible traces using a pairwise Markov random field (MRF)
[Figure: chain of residues LEU, SER, GLY, LYS, ALA]

12 ACMI-BP: Pairwise Markov Random Field
[Figure: MRF over residues LEU, SER, GLY, LYS, ALA]
• The model ties together adjacency constraints, occupancy constraints, and Phase 1 priors
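One way to write such a pairwise MRF in the slide's P(U|M) notation is as a product of per-residue prior potentials and pairwise potentials. This is a sketch: U = {u_1, ..., u_N} denotes the locations of the N residues, M the density map, and the potential names ψ_i, ψ_adj, ψ_occ are our own labels for the Phase 1 priors, adjacency constraints, and occupancy constraints:

```latex
P(U \mid M) \;\propto\; \prod_{i=1}^{N} \psi_i(u_i \mid M)
  \prod_{i=1}^{N-1} \psi_{\mathrm{adj}}(u_i, u_{i+1})
  \prod_{\substack{i<j \\ j-i>1}} \psi_{\mathrm{occ}}(u_i, u_j)
```

The occupancy terms connect every pair of nonadjacent residues, which is what gives the graph its cycles.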

13 Approximate Inference
• P(U|M) is intractable to calculate or maximize exactly
• ACMI-BP uses loopy belief propagation (BP)
  - A local, message-passing scheme
  - Distributes evidence between nodes
  - Approximates marginal probabilities if the graph has cycles

14 ACMI-BP: Loopy Belief Propagation
[Figure: message m_{LYS31→LEU32} passed from node LYS 31 to node LEU 32; beliefs p_{LYS31} and p_{LEU32}]

15 ACMI-BP: Loopy Belief Propagation
[Figure: message m_{LEU32→LYS31} passed from node LEU 32 back to node LYS 31; beliefs p_{LYS31} and p_{LEU32}]
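The message and belief updates on these two slides can be sketched as generic sum-product loopy BP on a small discrete pairwise MRF, using a naive round-robin schedule. This is not ACMI-BP's implementation: ACMI's variables are 3D locations, while here each node has a few discrete states. The function and argument names are hypothetical.

```python
import numpy as np


def loopy_bp(unary, pairwise, edges, n_iters=100, tol=1e-10):
    """Sum-product loopy BP with a naive round-robin message schedule.

    unary:    list of 1D arrays; unary[i][x] plays the role of a per-residue prior
    pairwise: dict mapping an edge (i, j) to a 2D array psi[x_i, x_j]
    edges:    list of undirected (i, j) pairs
    Returns the normalized beliefs (approximate marginals) of every node.
    """
    unary = [np.asarray(u, dtype=float) for u in unary]
    nbrs = {i: [] for i in range(len(unary))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j):
        # Potential indexed [x_i, x_j], whichever orientation was stored.
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    # msgs[(i, j)] is the message from node i to node j, a vector over x_j.
    msgs = {(i, j): np.full(len(unary[j]), 1.0 / len(unary[j]))
            for i in nbrs for j in nbrs[i]}

    for _ in range(n_iters):
        max_change = 0.0
        for i, j in list(msgs):  # fixed order: every directed message, every pass
            # m_{i->j}(x_j) = sum_{x_i} psi_i(x_i) psi_ij(x_i, x_j) prod_{k != j} m_{k->i}(x_i)
            incoming = unary[i].copy()
            for k in nbrs[i]:
                if k != j:
                    incoming *= msgs[(k, i)]
            new = incoming @ psi(i, j)
            new /= new.sum()
            max_change = max(max_change, float(np.abs(new - msgs[(i, j)]).max()))
            msgs[(i, j)] = new
        if max_change < tol:
            break

    # Belief: prior times all incoming messages, normalized.
    beliefs = []
    for i in range(len(unary)):
        b = unary[i].copy()
        for k in nbrs[i]:
            b *= msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs
```

On a tree-structured graph, such as a three-residue chain with only adjacency edges, the fixed point gives the exact marginals; once cycles appear (for example, edges between nonadjacent residues) the beliefs are only approximate.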

16 Guiding Belief Propagation
- Background
- ACMI Overview
- Inference in ACMI-BP
- Experiments & Results

17 Message Scheduling
• Key design choice: the message-passing schedule
• When BP is approximate, the ordering affects the solution [Elidan et al, 2006]
• ACMI-BP uses a naïve, round-robin schedule
  - Best case: wasted resources
  - Worst case: poor information is given more influence
[Figure: residues SER, LYS, ALA]

18 Using Domain Knowledge
• Idea: use an expert to assign the importance of messages
• Biochemist insight: well-structured regions of a protein correlate with strong features in the density map
  - e.g., helices/strands have stable conformations
• Protein disorder: regions of a structure that are unstable or hard to define
  - ACMI-BP can use disorder to decide importance
  - Accurate predictors exist based on sequence alone

19 Guided ACMI-BP
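The disorder-guided idea can be sketched as a message ordering in which residues predicted to be stable (low disorder) send their messages first. The slides do not give DOBP's exact prioritization rule, so this is only an illustration. The function name is hypothetical, and the per-residue disorder scores are assumed to be precomputed values in [0, 1] from some sequence-based predictor.

```python
def disorder_guided_order(edges, disorder):
    """Order directed messages so that low-disorder (stable) residues send first.

    edges:    undirected (i, j) residue pairs in the MRF, each listed once
    disorder: per-residue disorder scores in [0, 1] (hypothetical inputs)
    Returns a list of directed messages (source, target).
    """
    directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    # Most stable source residue first; ties broken by source, then target index.
    return sorted(directed, key=lambda m: (disorder[m[0]], m[0], m[1]))


# A round-robin scheduler would visit messages in a fixed, structure-blind order;
# here residue 1 (disorder 0.1) speaks first and residue 0 (disorder 0.9) last.
order = disorder_guided_order([(0, 1), (1, 2)], [0.9, 0.1, 0.5])
```

Such an ordering replaces the fixed loop over messages in a round-robin scheduler, so evidence from well-structured regions propagates before evidence from disordered ones.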

20 Related Work
• Assumption: messages with the largest change in value are more useful
• Residual Belief Propagation [Elidan et al, UAI 2006]
  - Calculates a residual factor for each node
  - Each iteration, the node with the highest residual passes messages
  - A general BP technique
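The node-level residual schedule described on this slide can be sketched as follows. This is a simplified, hypothetical variant: Elidan et al.'s residual BP scores individual messages and keeps them in a priority queue, whereas this sketch follows the slide's per-node description and recomputes every residual at each step for clarity.

```python
import numpy as np


def residual_bp(unary, pairwise, edges, max_updates=10_000, tol=1e-10):
    """At each step, the node whose outgoing messages would change the most
    (its residual) sends all of its messages. Returns normalized beliefs."""
    unary = [np.asarray(u, dtype=float) for u in unary]
    nbrs = {i: [] for i in range(len(unary))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j):
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    msgs = {(i, j): np.full(len(unary[j]), 1.0 / len(unary[j]))
            for i in nbrs for j in nbrs[i]}

    def candidate(i, j):
        # Message i -> j recomputed from the current messages into i.
        incoming = unary[i].copy()
        for k in nbrs[i]:
            if k != j:
                incoming *= msgs[(k, i)]
        new = incoming @ psi(i, j)
        return new / new.sum()

    def node_residual(i):
        # Largest change any outgoing message of node i would undergo.
        return max((float(np.abs(candidate(i, j) - msgs[(i, j)]).max())
                    for j in nbrs[i]), default=0.0)

    for _ in range(max_updates):
        residuals = [node_residual(i) for i in range(len(unary))]
        top = int(np.argmax(residuals))
        if residuals[top] < tol:
            break  # every message is (numerically) at a fixed point
        for j in nbrs[top]:
            msgs[(top, j)] = candidate(top, j)

    beliefs = []
    for i in range(len(unary)):
        b = unary[i].copy()
        for k in nbrs[i]:
            b *= msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs
```

The schedule is driven purely by how much messages change, with no knowledge of which regions of the protein are well structured.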

21 Experiments & Results
- Background
- ACMI Overview
- Inference in ACMI-BP
- Guiding Belief Propagation

22 Message Schedulers Tested
• Our previous technique: naïve, round-robin (BP)
• Our proposed technique: guidance using disorder prediction (DOBP)
  - Disorder prediction using DisEMBL [Linding et al, 2003]
  - Prioritize residues with high stability (i.e., low disorder)
• Residual factor (RBP) [Elidan et al, 2006]

23 Experimental Methodology
• Run the whole ACMI pipeline:
  - Phase 1: local amino-acid finder (prior probabilities)
  - Phase 2: either BP, DOBP, or RBP
  - Phase 3: sample all-atom structures from Phase 2 results
• Test set of 10 poor-resolution electron-density maps
  - From the UW Center for Eukaryotic Structural Genomics
  - Deemed the most difficult of a large set of proteins

24 ACMI-BP Marginal Accuracy

25 ACMI-BP Marginal Accuracy

26 ACMI-BP Marginal Accuracy

27 Protein Structure Results
• Do these better marginals produce more accurate protein structures?
• RBP fails to produce structures in ACMI-PF:
  - Its marginals are high in entropy (28.48 vs. 5.31)
  - Correct locations are insufficiently sampled
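Why high-entropy marginals starve the sampler can be illustrated with a toy calculation. The slide does not state the units of 28.48 and 5.31, so this is only qualitative: entropy here is measured in bits, and the 1,000-cell grid with the correct location at cell 0 is a hypothetical stand-in for a residue's candidate locations.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy (bits) of a discretized marginal over candidate locations."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())


def expected_hits(p, true_index, n_samples):
    """Expected number of n_samples draws from p that land on the true location."""
    p = np.asarray(p, dtype=float)
    return n_samples * p[true_index] / p.sum()


# Hypothetical 1,000-cell grid with the correct location at cell 0.
diffuse = np.ones(1000)                         # mass spread evenly over the grid
peaked = np.full(1000, 1e-4)
peaked[0] = 1.0                                 # mass concentrated on the right cell

for name, p in [("diffuse", diffuse), ("peaked", peaked)]:
    print(name, round(entropy_bits(p), 2), round(expected_hits(p, 0, 100), 1))
```

With 100 samples, the diffuse marginal is expected to place only a fraction of one sample on the correct cell, while the peaked marginal places most of them there.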

28 Conclusions
• Our contribution: a framework for using domain knowledge in BP message scheduling
  - A general technique for belief propagation
  - An alternative to information-based techniques
• Our technique improves inference in ACMI
  - Disorder prediction is used in our framework
  - The residual-based technique fails
• Future directions

29 Acknowledgements
• Phillips Laboratory at UW-Madison
• UW Center for Eukaryotic Structural Genomics (CESG)
• NLM R01-LM
• NLM Training Grant T15-LM
• NIH Protein Structure Initiative Grant GM
Thank you!

30 Protein Structures: Background
• Building blocks are amino acids (AKA residues)
• A chain of amino acids forms the primary sequence
[Figure: amino acid with alpha carbon, sidechain, and backbone labeled]

31 Related Work
• ARP/wARP [Morris, Perrakis, and Lamzin, 1999]
• TEXTAL [Ioerger and Sacchettini, 2003]
• RESOLVE [Terwilliger, 2003]
• BUCCANEER [Cowtan, 2006]

32 Previous Work
[Figure: R_free of ACMI-PF compared with the R_free of ARP/wARP, Resolve, and Textal]

33 ACMI-SH: Templates
[Figure: sequence …SAW C VKFEKPADKNGKTE… and templates from a protein DB]

34 ACMI-SH: Fast Rotation Search
[Figure: a pentapeptide fragment from the PDB (the "template") and its calculated (expected) density in a 5 Å sphere; the sampled region of the electron-density map in a 5 Å sphere; both the map region and the template density are sampled in spherical shells]

35 ACMI-SH: Fast Rotation Search
[Figure: map region and template density sampled in spherical shells → map-region and template spherical-harmonic coefficients → correlation coefficient as a function of rotation, via the fast-rotation function (Navaza 2006, Risbo 1996)]
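The quantity the fast-rotation function maximizes, the correlation between the map region and a rotated template as a function of rotation, can be illustrated with a much slower stand-in. This sketch swaps in a plain brute-force search over rotations about a single axis, scoring each by the Pearson correlation of densities sampled on spherical shells. It does not use spherical-harmonic coefficients, which is what makes the fast-rotation function fast. The density functions and all names are hypothetical.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix about the z axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def shell_points(radii, n_per_shell=300, seed=0):
    """Random sample points on concentric spherical shells around the origin."""
    rng = np.random.default_rng(seed)
    shells = []
    for r in radii:
        v = rng.normal(size=(n_per_shell, 3))
        shells.append(r * v / np.linalg.norm(v, axis=1, keepdims=True))
    return np.vstack(shells)


def rotation_search(map_density, template_density, radii, angles):
    """Brute-force search: correlation of map and rotated template on shell samples.

    map_density, template_density: functions mapping an (n, 3) array of points to
    n density values (hypothetical stand-ins for interpolated map densities).
    """
    pts = shell_points(radii)
    map_vals = map_density(pts)
    scores = []
    for theta in angles:
        # Template rotated by R is rho_T(R^-1 x); for row vectors, x @ R equals R^T x = R^-1 x.
        rotated_vals = template_density(pts @ rot_z(theta))
        scores.append(np.corrcoef(map_vals, rotated_vals)[0, 1])
    best = int(np.argmax(scores))
    return angles[best], scores[best]
```

Each candidate rotation costs a full resampling here; the spherical-harmonic formulation evaluates the correlation over many rotations at once.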

36 Backbone Model Potential
• Constraints between adjacent amino acids: [equation]
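An adjacency constraint can be sketched as a potential that peaks when consecutive residues sit at the standard Cα-Cα spacing of roughly 3.8 Å. This is an assumption-laden illustration, not the ACMI potential: only a distance term is modeled, and the Gaussian shape and its width SIGMA are hypothetical choices.

```python
import numpy as np

CA_CA_DIST = 3.8  # approximate distance (Å) between consecutive C-alpha atoms
SIGMA = 0.2       # hypothetical tolerance (Å), for illustration only


def adjacency_potential(b_i, b_j, mean=CA_CA_DIST, sigma=SIGMA):
    """Unnormalized potential that peaks when adjacent C-alphas are ~3.8 Å apart.

    Only a distance term is modeled; any orientation terms are omitted.
    """
    d = np.linalg.norm(np.asarray(b_i, dtype=float) - np.asarray(b_j, dtype=float))
    return float(np.exp(-0.5 * ((d - mean) / sigma) ** 2))
```

Evaluated over all pairs of candidate locations for residues i and i+1, such a function yields the pairwise table a discrete BP implementation would consume.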

37 Backbone Model Potential
• Constraints between nonadjacent amino acids
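An occupancy constraint between nonadjacent residues can be sketched as a potential that discourages two residues from occupying the same space. The threshold MIN_SEP and the clash value PENALTY are hypothetical numbers chosen for illustration; a small nonzero penalty, rather than zero, means no configuration is ruled out entirely.

```python
import numpy as np

MIN_SEP = 3.0   # hypothetical minimum C-alpha separation (Å)
PENALTY = 1e-3  # hypothetical potential value for a clash


def occupancy_potential(b_i, b_j, min_sep=MIN_SEP, penalty=PENALTY):
    """Potential between two nonadjacent residues: discourage sharing space."""
    d = np.linalg.norm(np.asarray(b_i, dtype=float) - np.asarray(b_j, dtype=float))
    return penalty if d < min_sep else 1.0
```

Because this applies to every nonadjacent pair, these are the edges that make the MRF loopy.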

38 Backbone Model Potential
• Observational ("amino-acid-finder") probabilities

39 Publications
• F. DiMaio, A. Soni, and J. Shavlik, “Machine learning in structural biology: Interpreting 3D protein images,” in Introduction to Machine Learning and Bioinformatics,
• F. DiMaio, A. Soni, G. N. Phillips, and J. Shavlik, “Improved methods for template matching in electron-density maps using spherical harmonics,” in Proceedings of the 2007 IEEE International Conference on Bioinformatics and Biomedicine
• F. DiMaio, A. Soni, G. N. Phillips, and J. Shavlik, “Spherical-harmonic decomposition for molecular recognition in electron-density maps,” International Journal of Data Mining and Bioinformatics,
• F. DiMaio, D. Kondrashov, E. Bitto, A. Soni, C. Bingman, G. Phillips, and J. Shavlik, “Creating protein models from electron-density maps using particle-filtering methods,” Bioinformatics,
• E. S. Burgie, C. A. Bingman, S. L. Grundhoefer, A. Soni, and G. N. Phillips, Jr., “Structural characterization of Uch37 reveals the basis of its auto-inhibitory mechanism.” In preparation, PDB ID: 3IHR.

40 Test-Set Density Maps (raw data)
[Figure: axes are density-map resolution (Å) and density-map mean phase error (deg.)]

41 ACMI-PF Overview
• A particle refers to one layout of a protein subsequence
• An iteration alternately places, for each of N particles:
  - the Cα position b_{k+1} given b_k
  - all sidechain atoms s_k given b_{k-1:k+1}

42 ACMI-PF: Backbone Step (1)
• Sample M b_{k+1}'s (candidates b_{k+1}^{*1…M}) from the empirical Cα-Cα-Cα pseudoangle distribution
• Place b^i_{k+1} given b^i_k and b^i_{k-1} (for particle i)
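The backbone step can be sketched as one sequential-importance-sampling update per particle: propose M candidate Cα positions at the ~3.8 Å Cα-Cα spacing with pseudoangles drawn from an empirical sample, score them, keep one, and update the particle's weight. Beyond the 3.8 Å spacing, everything here is an assumption for illustration: the score function (for example, a lookup into Phase 2 marginals), the pseudoangle samples, the default M, and weighting each particle by the mean score of its candidates. Sidechain placement is not shown.

```python
import numpy as np

CA_CA = 3.8  # approximate distance (Å) between consecutive C-alpha atoms


def place_next(b_prev, b_cur, angle, azimuth):
    """Point CA_CA from b_cur with pseudoangle b_prev-b_cur-b_next equal to `angle`."""
    u = (b_prev - b_cur) / np.linalg.norm(b_prev - b_cur)
    # Orthonormal frame (u, v, w); any helper vector not parallel to u works.
    helper = np.array([1.0, 0.0, 0.0]) if abs(u[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    v = np.cross(u, helper)
    v /= np.linalg.norm(v)
    w = np.cross(u, v)
    perp = np.cos(azimuth) * v + np.sin(azimuth) * w
    return b_cur + CA_CA * (np.cos(angle) * u + np.sin(angle) * perp)


def backbone_step(particles, weights, score, angle_samples, M=10, rng=None):
    """Extend every particle (a C-alpha trace with >= 2 atoms) by one residue.

    score:         hypothetical function giving a nonnegative weight to a 3D position
    angle_samples: hypothetical empirical pseudoangles (radians) to draw from
    """
    rng = np.random.default_rng() if rng is None else rng
    new_particles, new_weights = [], []
    for trace, w in zip(particles, weights):
        b_prev, b_cur = trace[-2], trace[-1]
        # Propose M candidate positions b_{k+1}^{*1..M} for this particle.
        cands = [place_next(b_prev, b_cur, rng.choice(angle_samples),
                            rng.uniform(0.0, 2.0 * np.pi)) for _ in range(M)]
        s = np.array([score(c) for c in cands], dtype=float)
        probs = s / s.sum() if s.sum() > 0 else np.full(M, 1.0 / M)
        pick = rng.choice(M, p=probs)
        new_particles.append(np.vstack([trace, cands[pick]]))
        new_weights.append(w * s.mean())  # importance weight for this proposal
    new_weights = np.asarray(new_weights, dtype=float)
    total = new_weights.sum()
    if total > 0:
        new_weights = new_weights / total
    return new_particles, new_weights
```

Sampling among M candidates in proportion to their scores concentrates each particle on well-supported positions, while the weight update keeps track of how well the particle's proposals were supported overall.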