MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance Andrew I. Jewett, Conrad C. Huang and Thomas.

Slides:



Advertisements
Similar presentations
Multiple Sequence Alignment Dynamic Programming. Multiple Sequence Alignment VTISCTGSSSNIGAG  NHVKWYQQLPG VTISCTGTSSNIGS  ITVNWYQQLPG LRLSCSSSGFIFSS.
Advertisements

Parallel BioInformatics Sathish Vadhiyar. Parallel Bioinformatics  Many large scale applications in bioinformatics – sequence search, alignment, construction.
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Optimal Sum of Pairs Multiple Sequence Alignment David Kelley.
Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically.
Structural bioinformatics
Tertiary protein structure viewing and prediction July 1, 2009 Learning objectives- Learn how to manipulate protein structures with Deep View software.
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
©CMBI 2005 Sequence Alignment In phylogeny one wants to line up residues that came from a common ancestor. For information transfer one wants to line up.
Heuristic alignment algorithms and cost matrices
Drug Docking Computer Science Department Brigham Young University Andrew Harris Mark Clement Kenneth Sundberg.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2005.
Tertiary protein structure viewing and prediction July 5, 2006 Learning objectives- Learn how to manipulate protein structures with Deep View software.
Finding Compact Structural Motifs Presented By: Xin Gao Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu University of Waterloo,
Protein Structure Space Patrice Koehl Computer Science and Genome Center
Appendix: Automated Methods for Structure Comparison Basic problem: how are any two given structures to be automatically compared in a meaningful way?
The Protein Data Bank (PDB)
Tertiary protein structure modelling May 31, 2005 Graded papers will handed back Thursday Quiz#4 today Learning objectives- Continue to learn how to manipulate.
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment III CIS 667 February 10, 2004.
BNFO 602 Multiple sequence alignment Usman Roshan.
Optimal binary search trees
Sequence similarity. Motivation Same gene, or similar gene Suffix of A similar to prefix of B? Suffix of A similar to prefix of B..Z? Longest similar.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Model Database. Scene Recognition Lamdan, Schwartz, Wolfson, “Geometric Hashing”,1988.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Protein Structures.
Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path Ilya N. Shindyalov, Philip E. Bourne.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Protein Tertiary Structure Prediction
Chapter 9 Superposition and Dynamic Programming 1 Chapter 9 Superposition and dynamic programming Most methods for comparing structures use some sorts.
Protein Sequence Alignment and Database Searching.
Content of the previous class Introduction The evolutionary basis of sequence alignment The Modular Nature of proteins.
RNA Secondary Structure Prediction Spring Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Construction of Substitution Matrices
Protein Structure Comparison. Sequence versus Structure The protein sequence is a string of letters: there is an optimal solution (DP) to the problem.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Chapter 3 Computational Molecular Biology Michael Smith
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Conformational Space.  Conformation of a molecule: specification of the relative positions of all atoms in 3D-space,  Typical parameterizations:  List.
Protein Strucure Comparison Chapter 6,7 Orengo. Helices α-helix4-turn helix, min. 4 residues helix3-turn helix, min. 3 residues π-helix5-turn helix,
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Protein motif extraction with neuro-fuzzy optimization Bill C. H. Chang and Author : Bill C. H. Chang and Saman K. Halgamuge Saman K. Halgamuge Adviser.
COT 6930 HPC and Bioinformatics Multiple Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
Pair-wise Structural Comparison using DALILite Software of DALI Rajalekshmy Usha.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Construction of Substitution matrices
Structural alignment methods Like in sequence alignment, try to find best correspondence: –Look at atoms –A 3-dimensional problem –No a priori knowledge.
Jürgen Sühnel Supplementary Material: 3D Structures of Biological Macromolecules Exercise 1:
Lecture 11 CS5661 Structural Bioinformatics – Structure Comparison Motivation Concepts Structure Comparison.
Lab Meeting 10/08/20041 SuperPose: A Web Server for Automated Protein Structure Superposition Gary Van Domselaar October.
An Efficient Index-based Protein Structure Database Searching Method 陳冠宇.
Protein Tertiary Structure Prediction Structural Bioinformatics.
The Big Picture Chapter 3. A decision problem is simply a problem for which the answer is yes or no (True or False). A decision procedure answers a decision.
Find the optimal alignment ? +. Optimal Alignment Find the highest number of atoms aligned with the lowest RMSD (Root Mean Squared Deviation) Find a balance.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
Lab Lab 10.2: Homology Modeling Lab Boris Steipe Departments of Biochemistry and.
Protein Structures.
Sequence Alignment Algorithms Morten Nielsen BioSys, DTU
Volume 26, Issue 1, Pages e3 (January 2018)
Presentation transcript:

MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance Andrew I. Jewett, Conrad C. Huang and Thomas E. Ferrin BIOINFORMATICS Vol. 19 no. 5 2003, pages 625–634

ABSTRACT Motivation: Existing algorithms for automated protein structure alignment generate contradictory results and are difficult to interpret. An algorithm which can provide a context for interpreting the alignment and uses a simple method to characterize protein structure similarity is needed. Results: We describe a heuristic for limiting the search space for structure alignment comparisons between two proteins, and an algorithm for finding minimal root-mean-squared- distance (RMSD) alignments as a function of the number of matching residue pairs within this limited search space. Our alignment algorithm uses coordinates of alphacarbon atoms to represent each amino acid residue and requires a total computation time of , where m and n denote the lengths of the protein sequences.

ABSTRACT This makes our method fast enough for comparisons of moderatesize proteins (fewer than ∼800 residues) on current workstation-class computers and therefore addresses the need for a systematic analysis of multiple plausible shape similarities between two proteins using a widely accepted comparison metric.

RMSD (Root Mean Square Deviation)

RMSD (Root Mean Square Distance) A B

Basic Algorithm (1) generate the set of initial superpositions to evaluate; (2) for each candidate superposition of the two molecules, identify the minimal RMSD alignments; (3) optimize the superposition of the molecules based on the best alignments obtained in step two.

Sampling Superposition Space To generate the set of candidate superpositions, we superpose all fragments of four consecutive residues from one structure onto all fragments of four consecutive residues from the second structure. To superpose the two fragments, we use Diamond’s method (Diamond, 1988) to align the alpha-carbon atoms from each of the four pairs of residues. The number of superpositions that need to be evaluated is (n − 4) × (m − 4)

Identifying Matching Residue Pairs To compute the residue equivalences, we use a dynamic programming algorithm similar to the algorithm by Needleman and Wunsch (1970). Let denote the minimum possible sum-squared-distance between N pairs of matched residues, considering only the first i residues from structure A, and the first j residues from structure B.

The lowest layer represents the score matrix for matching only one pair of residues; each layer above represents the score matrix for matching an additional residue pair and is computed using the data from the layer below

Let denote the minimum possible sum-squared-distance between N pairs of matched residues, considering only the first i residues from structure A, and the first j residues from structure B. The minimum RMSD of any alignment containing N pairs of matched residues at this superposition is then

Complexity analysis two structures : n and m residues(m ≤ n) match pairs : N ≤ m total complexity of our algorithm is