Bayesian Refinement of Protein Functional Site Matching

Presentation transcript:

Bayesian Refinement of Protein Functional Site Matching. Authors: Kanti V Mardia, Vysaul B Nyirongo*, Peter J Green, Nicola D Gold, David R Westhead. Presented by Deephan, Mohan.

Presentation Flow: Background; Conventional Methods; Bayesian Refinement; Results; Conclusion. Disclaimer: Contrary to the assumption made by the authors, the paper presenter does have a thorough understanding of all the concepts related to the advanced statistics, graph theory and structural genomics discussed in the paper.

Motivation: structural genomics; structural site comparison; functional site comparison; knowledge-based methods; similarity search algorithms.

Protein Functional Site Matching: modeled as a graph-theoretic problem; shape analysis of proteins; crucial for predicting molecular interactions; used to infer functional relationships between proteins; classification of binding patterns. Resource: the SITESDB database, which contains protein structural data, with entries formed from the PDB (Protein Data Bank).

The Methodology: two stages, the graph similarity problem followed by refinement of the graph match. Objective: matching functional sites by comparing amino acid configurations (Cα and Cβ atoms). Each functional site is represented as a graph whose vertices are amino acid positions. Refining the graph match: application of a Bayesian strategy using a Markov chain Monte Carlo (MCMC) procedure.

Why Bayesian Refinement? Bayesian inference gives a complete distribution of matches over the solution space; it adapts to noise; it is flexible. These properties give it an edge over combinatorial methods.

Bayesian Model: a common tool in statistical inference, based on the joint posterior distribution, which is proportional to the product of the prior density and the likelihood. Biologically speaking: the prior density is the distribution of the transformation parameters; the likelihood is related to the matches between functional sites.

Representation and Matching
Functional sites X and Y are represented as graphs G1 and G2.
Vertex sets: V1 = {Xj, j = 1, 2, ..., m}, V2 = {Yk, k = 1, 2, ..., n}
Xj, Yk: coordinates of the amino acids in the jth position of X and the kth position of Y
x1j, y1k: Cα coordinates for X, Y
x2j, y2k: Cβ coordinates for X, Y
x1 = {x1j : j = 1, ..., m}, x2 = {x2j : j = 1, ..., m}
y1 = {y1k : k = 1, ..., n}, y2 = {y2k : k = 1, ..., n}

Graph Theoretic Approach
Objective: creation of the vertex product graph Hv = G1 ○v G2, with vertex set VH = V1 x V2.
An edge between two vertices vh = (Xj, Yk) and vh' = (Xj', Yk') ∈ VH exists for j ≠ j' and k ≠ k' when both of the following are less than 1.5 Å (the matching distance threshold):
1. the absolute difference between the Cα distances, | |x1j - x1j'| - |y1k - y1k'| |
2. the absolute difference between the Cβ distances, | |x2j - x2j'| - |y2k - y2k'| |
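The edge rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `product_graph_edges`, the NumPy array layout (one row per amino acid), and the brute-force loop over vertex pairs are all assumptions made for the example.

```python
import itertools
import numpy as np

def product_graph_edges(x1, x2, y1, y2, threshold=1.5):
    """Build the vertex product graph for sites X and Y (hypothetical helper).

    x1, x2: (m, 3) arrays of C-alpha and C-beta coordinates for site X.
    y1, y2: (n, 3) arrays of C-alpha and C-beta coordinates for site Y.
    Returns the vertex list [(j, k), ...] and the edge list.
    """
    m, n = len(x1), len(y1)
    vertices = [(j, k) for j in range(m) for k in range(n)]
    edges = []
    for (j, k), (j2, k2) in itertools.combinations(vertices, 2):
        if j == j2 or k == k2:
            continue  # edges require j != j' and k != k'
        # difference between the within-site C-alpha distances
        d_ca = abs(np.linalg.norm(x1[j] - x1[j2]) - np.linalg.norm(y1[k] - y1[k2]))
        # difference between the within-site C-beta distances
        d_cb = abs(np.linalg.norm(x2[j] - x2[j2]) - np.linalg.norm(y2[k] - y2[k2]))
        if d_ca < threshold and d_cb < threshold:
            edges.append(((j, k), (j2, k2)))
    return vertices, edges
```

Because the rule only compares distances within each site, it is unaffected by how the two sites are positioned relative to each other. Candidate matchings then correspond to sets of mutually connected vertices in this product graph.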

Bayesian Alignment
The matching between the amino acids of X and Y is represented by a matrix M:
Mjk = 1 if the jth amino acid of X corresponds to the kth amino acid of Y, 0 otherwise.
The transformation that brings the configurations into alignment is given by
xij = A yik + τ for Mjk = 1, i = 1, 2
where A is the rotation matrix and τ is the translation vector.
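To make the roles of M, A and τ concrete, the sketch below builds a match matrix from a list of index pairs and measures how far each matched pair is from satisfying xij = A yik + τ. The helper names `match_matrix` and `residuals` are hypothetical, and 0-based indices are used in place of the slide's 1-based j and k.

```python
import numpy as np

def match_matrix(pairs, m, n):
    """Return the m-by-n 0/1 matrix M with M[j, k] = 1 for each matched pair."""
    M = np.zeros((m, n), dtype=int)
    for j, k in pairs:
        M[j, k] = 1
    return M

def residuals(x, y, M, A, tau):
    """Distance |x_j - (A y_k + tau)| for every pair with M[j, k] = 1.

    x: (m, 3) coordinates for site X; y: (n, 3) coordinates for site Y.
    A: (3, 3) rotation matrix; tau: length-3 translation vector.
    """
    j_idx, k_idx = np.nonzero(M)
    y_moved = y[k_idx] @ A.T + tau  # apply A y + tau row by row
    return np.linalg.norm(x[j_idx] - y_moved, axis=1)
```

The same function applies to either atom type: pass the Cα arrays (x1, y1) or the Cβ arrays (x2, y2).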

Bayesian Modeling (contd) Joint Posterior Distribution: p(A), p(τ) and p(σ) denote the prior distributions for A, τ and σ; |A| is the Jacobian of the transformation. The model assumes Gaussian noise N(0, σ²) in the atomic positions x1j and y1k.

Bayesian Modeling (contd) Side-chain orientation: the model is extended by taking into account the relative orientation of Cα and Cβ in matching amino acids.

MCMC Refinement Step: Markov chain Monte Carlo (MCMC) is used to sample the full joint distribution function p(M, A, τ, σ, x1, y1, x2, y2). This distribution is a function of the RMSD and of the angle for the orientation difference between amino acids.
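As a rough illustration of the sampling idea only, the toy Metropolis sampler below updates the match matrix M while holding A, τ and σ fixed. It is not the sampler from the paper: the target used here (a Gaussian residual term plus a constant per-match reward `match_bonus`) is an assumed stand-in for the joint distribution, the function name `sample_matches` is hypothetical, and the side-chain orientation term is omitted.

```python
import numpy as np

def sample_matches(x, y, A, tau, sigma=1.0, match_bonus=4.0, n_iter=5000, seed=0):
    """Toy Metropolis sampler over one-to-one match matrices (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, n = len(x), len(y)
    y_moved = y @ A.T + tau
    # squared residual for every candidate pair (j, k)
    sq = ((x[:, None, :] - y_moved[None, :, :]) ** 2).sum(axis=2)
    # change in the assumed log-target when pair (j, k) is added
    gain = match_bonus - sq / (2.0 * sigma ** 2)
    M = np.zeros((m, n), dtype=int)
    for _ in range(n_iter):
        j, k = rng.integers(m), rng.integers(n)
        if M[j, k] == 1:
            delta = -gain[j, k]          # propose removing the match
        elif M[j].sum() == 0 and M[:, k].sum() == 0:
            delta = gain[j, k]           # propose adding the match
        else:
            continue                     # would break one-to-one matching
        # symmetric proposal, so accept with probability min(1, exp(delta))
        if np.log(rng.random()) < delta:
            M[j, k] ^= 1
    return M
```

A fuller sampler would also update A, τ and σ between match updates, so that the transformation and the matching are refined together.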

Significance of RMSD: RMSD stands for root mean square deviation. Matches with a lower RMSD over a larger number of matching residues are more statistically significant. MCMC refinement reduced the RMSD and increased the number of matching residues.
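For reference, RMSD between two matched coordinate sets is normally reported after an optimal rigid superposition. The sketch below uses the standard Kabsch least-squares method to obtain a rotation and translation; this is a swapped-in, non-Bayesian technique shown only to make the quantity concrete, and the function names `kabsch` and `superposed_rmsd` are assumptions.

```python
import numpy as np

def kabsch(x, y):
    """Least-squares rotation A and translation tau such that x ~ A y + tau.

    x, y: (L, 3) arrays of matched coordinates, row i of x paired with row i of y.
    """
    xc, yc = x.mean(axis=0), y.mean(axis=0)
    H = (y - yc).T @ (x - xc)                   # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    A = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    tau = xc - A @ yc
    return A, tau

def superposed_rmsd(x, y):
    """RMSD over matched pairs after optimal superposition of y onto x."""
    A, tau = kabsch(x, y)
    diff = x - (y @ A.T + tau)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Here L, the number of rows, plays the role of the number of matching residues, so a match is judged by the pair (RMSD, L) rather than by RMSD alone.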

Decision tree for refining the graph solution by the MCMC method. Boxes with curved corners show processes and their output, while boxes with sharp corners are for branching conditions. The procedure starts with the graph solution MG. The graph solution's RMSD and number of matches are denoted by RMSDG and LG respectively. MCMC is re-iterated until the MCMC solution MB is better. The RMSD and number of matches for MB are denoted by RMSDB and LB respectively. MB and MG are compared using 1) RMSDs and the number of matches or 2) P-values for MG and MB, denoted by PG and PB respectively.

Results: Two binding sites were used: alcohol dehydrogenase (60 amino acids) and 17β-hydroxysteroid dehydrogenase (63 amino acids). Four matching studies were performed, each with and without considering the physico-chemical properties of the amino acids.

Case 1: Site 1hdx_1 matched against its own SCOP family. 125 of 145 sites produced significant matches, increasing to 131 of 145 after refinement. RMSD improved from more than 1.5 Å to less than 1 Å. The number of matching residues increased.

Case 2: 17β-hydroxysteroid dehydrogenase and family. After the MCMC refinement step, significant matches increased from 248 to 318 of 326 sites. The number of matching residues increased at a similar RMSD. RMSD improved in a minority of the sites.

Case 3: Alcohol dehydrogenase and superfamily. Matching sites increased from 200 to 324.
Case 4: Alcohol dehydrogenase and FAD/NAD(P)-binding domain. 12 sites improved after MCMC refinement.

Discussion of Results: The MCMC refinement step provides a significant improvement over graph matching techniques. Its success comes from the lack of dependence on strict distance matching criteria. It is computationally expensive. The refinement adapts to shape variations in binding sites.

Thank You!!!!