Protein Structure Alignment


Protein Structure Alignment
Example: Human Hemoglobin alpha-chain (pdb:1jebA) vs. Human Myoglobin (pdb:2mm1). Sequence id: 27%. Structural id: 90%.
Another example: G-Proteins 1c1y:A vs. 1kk1:A6-200. Sequence id: 18%. Structural id: 72%.

Transformations
Translation
Translation and Rotation: Rigid Motion (Euclidean Trans.)
Translation, Rotation + Scaling

Inexact Alignment. Simple case – two closely related proteins with the same number of amino acids. Question: how to measure an alignment error?

Distance Functions
Two point sets: $A=\{a_i\},\ i=1,\dots,n$ and $B=\{b_j\},\ j=1,\dots,m$.
Pairwise correspondence: $(a_{k_1},b_{t_1}), (a_{k_2},b_{t_2}), \dots, (a_{k_N},b_{t_N})$.
(1) Exact Matching: $\|a_{k_i} - b_{t_i}\| = 0$
(2) Bottleneck: $\max_i \|a_{k_i} - b_{t_i}\|$
(3) RMSD (Root Mean Square Distance): $\sqrt{\sum_{i=1}^{N} \|a_{k_i} - b_{t_i}\|^2 / N}$
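As a small illustration of the bottleneck and RMSD measures above, here is a minimal Python sketch. The function names (`pair_distances`, `bottleneck`, `rmsd`) and the toy coordinates are assumptions made for this example; the correspondence is given as a list of index pairs (k, t).

```python
import math

def pair_distances(a, b, pairs):
    """Euclidean distance ||a[k] - b[t]|| for each corresponding pair (k, t)."""
    return [math.dist(a[k], b[t]) for k, t in pairs]

def bottleneck(a, b, pairs):
    """Largest distance over all corresponding pairs."""
    return max(pair_distances(a, b, pairs))

def rmsd(a, b, pairs):
    """Root mean square distance over the N corresponding pairs."""
    d = pair_distances(a, b, pairs)
    return math.sqrt(sum(x * x for x in d) / len(d))

# Toy data: A has 3 points, B has 2; only two pairs are in correspondence.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
pairs = [(0, 0), (1, 1)]

print(bottleneck(A, B, pairs))  # largest pair distance
print(rmsd(A, B, pairs))        # root mean square of the pair distances
```

The bottleneck measure is dominated by the single worst pair, while RMSD averages the error over all N pairs.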

Superposition - best least squares (RMSD – Root Mean Square Deviation)
Given two sets of 3-D points: $P=\{p_i\}$, $Q=\{q_i\}$, $i=1,\dots,n$;
$\mathrm{rmsd}(P,Q) = \sqrt{\sum_i |p_i - q_i|^2 / n}$
Find a 3-D rigid transformation $T^*$ such that:
$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\sum_i |T(p_i) - q_i|^2 / n}$
A closed form solution exists for this task. It can be computed in O(n) time.
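The slides do not say which closed-form solution is meant. One well-known option is the SVD-based Kabsch method, sketched below with NumPy as an assumed dependency. The names `kabsch` and `rmsd_after_fit` and the test data are illustrative. The O(n) cost comes from accumulating the 3x3 covariance matrix; the SVD itself works on a fixed-size matrix.

```python
import numpy as np

def kabsch(P, Q):
    """Return (R, t) minimizing sum_i |R @ p_i + t - q_i|^2 over rigid motions."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)   # centroids
    H = (P - cp).T @ (Q - cq)                 # 3x3 covariance, O(n) to build
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

def rmsd_after_fit(P, Q):
    """rmsd(T*(P), Q) for the optimal rigid transformation T*."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    R, t = kabsch(P, Q)
    diff = P @ R.T + t - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Toy check: Q is a rotated and translated copy of P.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
c, s = np.cos(0.7), np.sin(0.7)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ R_true.T + t_true
R, t = kabsch(P, Q)
```

On noise-free data such as this, the recovered `R` and `t` should match the transformation used to build `Q` up to floating-point error.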

Correspondence is Unknown
Given two configurations of points in three-dimensional space, find those rotations and translations of one of the point sets which produce “large” superimpositions of corresponding 3-D points.

A 3-D reference frame can be uniquely defined by the ordered vertices of a non-degenerate triangle $p_1, p_2, p_3$.
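One possible construction of such a frame is sketched below, assuming NumPy. The convention chosen here (origin at the first vertex, x-axis along the first edge, z-axis along the triangle normal) is an assumption for illustration; other conventions work equally well as long as they are applied consistently. The helper names `triangle_frame` and `to_local` are hypothetical.

```python
import numpy as np

def triangle_frame(p1, p2, p3, eps=1e-9):
    """Origin and orthonormal axes (as matrix columns) from an ordered triangle."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    e1 = p2 - p1
    normal = np.cross(e1, p3 - p1)
    if np.linalg.norm(normal) < eps:
        raise ValueError("degenerate triangle: vertices are collinear")
    x = e1 / np.linalg.norm(e1)          # along the first edge
    z = normal / np.linalg.norm(normal)  # perpendicular to the triangle plane
    y = np.cross(z, x)                   # completes a right-handed frame
    return p1, np.column_stack([x, y, z])

def to_local(point, origin, axes):
    """Coordinates of a point expressed in the triangle's frame."""
    return axes.T @ (np.asarray(point, dtype=float) - origin)

origin, axes = triangle_frame([0, 0, 0], [1, 0, 0], [0, 1, 0])
print(to_local([2.0, 3.0, 4.0], origin, axes))
```

Because the vertices are ordered, swapping two of them yields a different frame. Coordinates expressed in such a frame do not change when the whole point set is rotated and translated, which is what makes triangle frames useful for matching.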

Sequence Based Structure Alignment
(1) Run pairwise sequence alignment.
(2) Based on the sequence correspondence, compute a 3D transformation (a least-squares fit can be applied).
(3) Iteratively improve the structural superposition.
Not a good approach: sequence alignment can be incorrect.

Structure Alignment (Straightforward Algorithm)
For each pair of triplets, one from each molecule, that define ‘almost’ congruent triangles, compute the rigid transformation that superimposes them.
Count the number of aligned point pairs and sort the hypotheses by this number.

Complexity: $O(n^3 m^3) \cdot O(nm)$.
For the highest-ranking hypotheses, improve the transformation by replacing it with the best RMSD transformation for all the matching pairs.
Applying a 3D grid gives practically $O(n^3 m^3) \cdot O(n)$.
If one exploits protein backbone geometry + 3D grid: $O(nm) \cdot O(n)$.
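The brute-force triplet search described above can be sketched as follows, assuming NumPy. This is only an illustration: the tolerance `tol`, the helper names (`frame`, `sides`, `hypotheses`) and the scoring rule (number of points of A with some point of B within `tol`) are assumptions. The final least-squares refinement and the 3D grid speed-up are left out.

```python
import itertools
import numpy as np

def frame(tri, eps=1e-9):
    """Origin and axes of the frame of an ordered triangle, or None if degenerate."""
    p1, p2, p3 = tri
    e1 = p2 - p1
    n = np.cross(e1, p3 - p1)
    if np.linalg.norm(n) < eps:
        return None
    x = e1 / np.linalg.norm(e1)
    z = n / np.linalg.norm(n)
    return p1, np.column_stack([x, np.cross(z, x), z])

def sides(tri):
    """Side lengths |p1p2|, |p2p3|, |p3p1| in vertex order."""
    return np.linalg.norm(tri - np.roll(tri, -1, axis=0), axis=1)

def hypotheses(A, B, tol=1.0):
    """All (score, R, t) from almost congruent triangle pairs, best score first."""
    result = []
    for ia in itertools.combinations(range(len(A)), 3):
        ta = A[list(ia)]
        fa = frame(ta)
        if fa is None:
            continue
        sa = sides(ta)
        for ib in itertools.permutations(range(len(B)), 3):
            tb = B[list(ib)]
            if np.any(np.abs(sa - sides(tb)) > tol):
                continue                      # triangles are not almost congruent
            fb = frame(tb)
            if fb is None:
                continue
            R = fb[1] @ fa[1].T               # rotation taking frame A to frame B
            t = fb[0] - R @ fa[0]
            moved = A @ R.T + t
            dist = np.linalg.norm(moved[:, None, :] - B[None, :, :], axis=2)
            score = int((dist.min(axis=1) <= tol).sum())
            result.append((score, R, t))
    result.sort(key=lambda h: h[0], reverse=True)
    return result

# Toy data: B is a moved, shuffled copy of A plus one unrelated point.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3)) * 10.0
c, s = np.cos(0.9), np.sin(0.9)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
B = np.vstack([A @ R_true.T + np.array([5.0, -3.0, 2.0]), [[40.0, 40.0, 40.0]]])
B = B[rng.permutation(len(B))]
best_score, R_best, t_best = hypotheses(A, B)[0]
```

The nested loops over triplets make the cost of this sketch grow as described above, so it is only practical for very small point sets.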

Structural Alignment Approaches
Two interrelated problems: 3D transformation and point correspondence (matching, alignment).
Some methods:
(1) Generate a set of 3D transformations. Cluster similar transformations. Compute 3D alignment for each cluster representative.
(2) Generate a set of 3D transformations. Compute 3D alignment for each transformation.
(3) Geometric Hashing: combines transformation and correspondence detection in one scheme.

Accuracy improvement during detection of 3D transformation.
Instead of 3 points use more. How many?
Align any possible pair of fragments $F_{ij}(k)$: residues $i,\dots,i+k-1$ of one protein against residues $j,\dots,j+k-1$ of the other.

Accept $F_{ij}(k)$ if $\mathrm{rmsd}(F_{ij}(k)) < \varepsilon$.
Complexity: $O(n^3 \cdot n) \cdot O(n)$ (assume $n \sim m$; for each $F_{ij}(k)$ we need to compute its rmsd). This can be reduced to $O(n^3) \cdot O(n)$.

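Improvement: BLAST idea. Detect short similar fragments, then extend as much as possible.
[Figure: a seed fragment pair ($k,\dots,k+l-1$ against $t,\dots,t+l-1$) is extended one residue at a time, pairing $a_{i-1}, a_i, a_{i+1}$ with $b_{j-1}, b_j, b_{j+1}$.]
Extend while: $\mathrm{rmsd}(F_{ij}(k)) < \varepsilon$.
Complexity: $O(n^2) \cdot O(n)$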
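A rough sketch of the seed-and-extend idea, assuming NumPy and taking rmsd to mean the value after optimal rigid superposition of the two fragments. The seed length, the threshold `eps` and the function names (`fit_rmsd`, `extend_seed`, `find_fragments`) are illustrative assumptions. For simplicity the rmsd is recomputed from scratch at every extension step, so this sketch does not reach the complexity quoted above.

```python
import numpy as np

def fit_rmsd(P, Q):
    """rmsd of two equal-length fragments after optimal rigid superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    if np.linalg.det(Vt.T @ U.T) < 0:
        S[-1] = -S[-1]                 # exclude reflections
    e = (P ** 2).sum() + (Q ** 2).sum() - 2.0 * S.sum()
    return float(np.sqrt(max(e, 0.0) / len(P)))

def extend_seed(A, B, i, j, k, eps):
    """Grow the fragment pair F_ij(k) in both directions while rmsd < eps."""
    grew = True
    while grew:
        grew = False
        if i > 0 and j > 0 and fit_rmsd(A[i - 1:i + k], B[j - 1:j + k]) < eps:
            i, j, k = i - 1, j - 1, k + 1
            grew = True
        if i + k < len(A) and j + k < len(B) and \
                fit_rmsd(A[i:i + k + 1], B[j:j + k + 1]) < eps:
            k += 1
            grew = True
    return i, j, k

def find_fragments(A, B, seed_len=4, eps=0.5):
    """Maximal fragment pairs (i, j, k) grown from all accepted seeds, longest first."""
    found = set()
    for i in range(len(A) - seed_len + 1):
        for j in range(len(B) - seed_len + 1):
            if fit_rmsd(A[i:i + seed_len], B[j:j + seed_len]) < eps:
                found.add(extend_seed(A, B, i, j, seed_len, eps))
    return sorted(found, key=lambda f: f[2], reverse=True)

# Toy data: B contains a rigidly moved copy of A[2:10] between unrelated points.
rng = np.random.default_rng(2)
A = np.cumsum(rng.normal(size=(12, 3)) * 3.0, axis=0)
c, s = np.cos(1.1), np.sin(1.1)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
core = A[2:10] @ R.T + np.array([4.0, 0.0, -7.0])
B = np.vstack([rng.normal(size=(3, 3)) * 3.0 + 50.0, core,
               rng.normal(size=(2, 3)) * 3.0 - 50.0])
frags = find_fragments(A, B)
```

In this toy setup the longest fragment pair is expected to start at residue 2 of A and residue 3 of B and span 8 residues, since extension stops as soon as an unrelated point is included.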

Sequence-order Independent Alignment

4-helix bundle: 2cbl:A, 1f4n:A, 1rhg:A, 1b3q

Sequence Order Independent Alignment: 2cbl:A, 1f4n, 1rhg:A, 1b3q
[Figure: the aligned helices of each structure, labeled with residue numbers (chains A and B).]

The C2 domain calcium-binding motif
Reference: E. A. Nalefski and J. J. Falke, "The C2 domain calcium-binding motif: Structural and functional diversity," Protein Sci 1996 5: 2375-2390.

TRAF-Immunoglobulin Ensemble
Ensemble: 8 proteins from 2 folds.
Core: sandwich of 6 strands.
Runtime: 21 seconds.
[Figure: secondary-structure diagram with helices and strands marked; E = strand.]

Some Links
Rasmol (molecular visualization)
SCOP (Structural Classification of Proteins)
FlexProt (pairwise flexible alignment)
MultiProt (multiple structural alignment)
MASS (multiple structural alignment by secondary structures)
PatchDock (molecule docking)
SiteEngine (recognition of functional sites in protein structures)
3D-Jury (protein structure prediction)