
Protein Structure Comparison

Sequence versus Structure
- The protein sequence is a string of letters: given a scoring scheme, dynamic programming (DP) finds an optimal solution to the string-matching problem.
- The protein structure is a 3D shape: the goal is to find algorithms, similar to DP, that find the optimal match between two shapes.
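To make the DP remark concrete, here is a minimal sketch of global sequence alignment scoring (Needleman-Wunsch). The scoring scheme (match, mismatch and linear gap values) and the function name are illustrative assumptions, not taken from the slides.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings s and t.

    The scoring values are assumed defaults chosen for illustration.
    """
    n, m = len(s), len(t)
    # dp[i][j] = best score for aligning the prefixes s[:i] and t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # s[:i] aligned against gaps only
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # t[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # align s[i-1] with t[j-1]
                           dp[i - 1][j] + gap,       # gap in t
                           dp[i][j - 1] + gap)       # gap in s
    return dp[n][m]
```

The table is filled in O(nm) time, and every cell depends only on three neighbours. That optimal-substructure property is what the 3D shape-matching problem lacks.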

Protein Structure Comparison
- Global versus local alignment
- Measuring protein shape similarity
- Protein structure superposition
- Protein structure alignment

Global versus Local Global alignment

Global versus Local (2) Local alignment motif

Measuring protein structure similarity
- Visual comparison
- Dihedral angle comparison
- Distance matrix
- RMSD (root mean square distance)
Given two “shapes” or structures A and B, we are interested in defining a distance, or similarity measure, between A and B. Is the resulting distance (similarity measure) D a metric? In particular, does it satisfy the triangle inequality D(A,B) ≤ D(A,C) + D(C,B) for every third structure C?

Comparing dihedral angles
Torsion angles (φ, ψ) are:
- local by nature
- invariant upon rotation and translation of the molecule
- compact (O(n) angles for a protein of n residues)
But… add 1 degree to all φ, ψ: the small local changes accumulate along the chain, so the overall 3D shape can change substantially.
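A minimal sketch of how a torsion angle can be computed from four consecutive atom positions, and a check that it is unchanged by a rigid motion. The function name and the test coordinates are assumptions made for illustration; NumPy is assumed to be available.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle, in degrees, around the bond p1 -> p2."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Four hypothetical points, then the same points after a rotation about x
# and a translation: the angle should be identical.
pts = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
Rx = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
moved_pts = pts @ Rx.T + np.array([5.0, -1.0, 2.0])
print(dihedral(*pts), dihedral(*moved_pts))
```

Because only differences between points enter the formula, a translation cancels out, and the dot and cross products are preserved by any proper rotation.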

Distance matrix

Distance matrix (2)
Advantages
- invariant with respect to rotation and translation
- can be used to compare proteins
Disadvantages
- the distance matrix is O(n^2) in size for a protein with n residues
- comparing distance matrices is a hard problem
- insensitive to chirality
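The invariance and the chirality problem can both be checked numerically. The sketch below assumes NumPy and uses random points in place of real atom coordinates; `distance_matrix` is an illustrative helper name.

```python
import numpy as np

def distance_matrix(X):
    """N x N matrix of pairwise Euclidean distances for an N x 3 array."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                      # stand-in for C-alpha coordinates

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
moved = X @ R.T + np.array([1.0, -2.0, 3.0])     # rotated and translated copy
mirror = X * np.array([1.0, 1.0, -1.0])          # mirror image through the xy-plane

same_after_motion = np.allclose(distance_matrix(X), distance_matrix(moved))
same_after_mirror = np.allclose(distance_matrix(X), distance_matrix(mirror))
print(same_after_motion, same_after_mirror)
```

Both comparisons come out equal: the matrix cannot tell a structure from its mirror image, which is the chirality disadvantage listed above.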

Root Mean Square Distance (RMSD)
To compare two sets of points (atoms) A = {a_1, a_2, …, a_N} and B = {b_1, b_2, …, b_N}:
- Define a 1-to-1 correspondence between A and B; for example, a_i corresponds to b_i for all i in [1, N]
- Compute the RMSD as:
$$\mathrm{RMSD}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d(a_i, b_i)^2}$$
where d(a_i, b_i) is the Euclidean distance between a_i and b_i.
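A direct transcription of this definition, as a sketch that assumes NumPy and that row i of one array corresponds to row i of the other (the function name is illustrative):

```python
import numpy as np

def rmsd(A, B):
    """Root mean square distance between paired N x 3 coordinate arrays."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("A and B must contain the same number of points")
    squared = ((A - B) ** 2).sum(axis=1)   # d(a_i, b_i)^2 for every pair
    return float(np.sqrt(squared.mean()))
```

Note that this value depends on how the two structures are placed in space; no superposition is performed here.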

Protein Structure Superposition
Simplified problem: we know the correspondence between set A and set B. We wish to compute the rigid transformation T that best aligns a_1 with b_1, a_2 with b_2, …, a_N with b_N. The error to minimize is defined as:
$$E = \sum_{i=1}^{N} \lVert T(a_i) - b_i \rVert^2$$
This is an old problem, solved in Statistics, Robotics, Medical Image Analysis, …

Protein Structure Superposition (2)
A rigid-body transformation T is a combination of a translation t and a rotation R: T(x) = Rx + t. The quantity to be minimized is:
$$E = \sum_{i=1}^{N} \lVert R a_i + t - b_i \rVert^2$$

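The translation part
E is minimum with respect to t when:
$$\frac{\partial E}{\partial t} = 2 \sum_{i=1}^{N} (R a_i + t - b_i) = 0$$
Then:
$$t = \frac{1}{N}\sum_{i=1}^{N} b_i - R\left(\frac{1}{N}\sum_{i=1}^{N} a_i\right)$$
If both data sets A and B have been centered on 0, then t = 0!
Step 1: Translate point sets A and B such that their centroids coincide at the origin of the coordinate frame.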

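The rotation part (1)
Let μ_A and μ_B be the barycenters of A and B, and A' and B' the 3×N matrices containing the coordinates of the points of A and B centered on the origin O:
$$A' = [a_1 - \mu_A, \ldots, a_N - \mu_A], \qquad B' = [b_1 - \mu_B, \ldots, b_N - \mu_B]$$
Build the covariance matrix:
$$C = A' B'^{T}$$
where C is 3×3, A' is 3×N and B'^T is N×3.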

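The rotation part (2)
Compute the SVD (Singular Value Decomposition) of C:
$$C = U D V^{T}$$
U and V are orthogonal matrices, and D is a diagonal matrix containing the singular values. U, V and D are 3×3 matrices.
Define S by:
$$S = \begin{cases} I & \text{if } \det(C) > 0 \\ \mathrm{diag}(1, 1, -1) & \text{otherwise} \end{cases}$$
Then:
$$R = V S U^{T}$$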

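The algorithm
1. Center the two point sets A and B
2. Build the covariance matrix: $C = A' B'^{T}$
3. Compute the SVD (Singular Value Decomposition) of C: $C = U D V^{T}$
4. Define S: $S = I$ if $\det(C) > 0$, $S = \mathrm{diag}(1, 1, -1)$ otherwise
5. Compute the rotation matrix: $R = V S U^{T}$
6. Compute the RMSD: $\mathrm{RMSD} = \sqrt{\frac{1}{N}\left(\sum_{i=1}^{N} \lVert a'_i \rVert^2 + \sum_{i=1}^{N} \lVert b'_i \rVert^2 - 2\sum_{k=1}^{3} d_k s_k\right)}$, where a'_i and b'_i are the centered points (the columns of A' and B') and d_k, s_k are the diagonal entries of D and S
O(N) in time!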
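The six steps can be sketched with NumPy as follows. This is an illustrative implementation under stated assumptions, not the presenter's code: points are stored as rows (so the centered arrays are the transposes of A' and B'), the function name `superpose` is hypothetical, and the sign in S is taken from det(V U^T), which agrees with the sign of det(C) whenever C is non-singular.

```python
import numpy as np

def superpose(A, B):
    """Least-squares rigid superposition of A onto B (paired N x 3 arrays).

    Returns (R, t, rmsd) such that A @ R.T + t is as close as possible to B.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    mu_A, mu_B = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - mu_A, B - mu_B                  # step 1: center both sets
    C = A0.T @ B0                                # step 2: 3x3 covariance matrix
    U, d, Vt = np.linalg.svd(C)                  # step 3: C = U diag(d) Vt
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    S = np.diag([1.0, 1.0, sign])                # step 4: forbid reflections
    R = Vt.T @ S @ U.T                           # step 5: R = V S U^T
    t = mu_B - R @ mu_A                          # translation for uncentered data
    e = (A0 ** 2).sum() + (B0 ** 2).sum() - 2.0 * (d * np.diag(S)).sum()
    rmsd = float(np.sqrt(max(e, 0.0) / len(A)))  # step 6, guarded against round-off
    return R, t, rmsd

# Usage on synthetic data: B is a rotated, translated copy of A.
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 3))
th = 0.5
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
t_true = np.array([1.0, 2.0, 3.0])
B = A @ R_true.T + t_true
R, t, r = superpose(A, B)
print(np.allclose(R, R_true), r < 1e-6)
```

Centering and building C both take one pass over the N points, and the SVD is always of a 3×3 matrix, which is where the O(N) running time comes from.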

Example 1: NMR structures Superposition of NMR Models 1AW6

Example 2: Calmodulin
Two forms of calcium-bound Calmodulin:
- Ligand free
- Complexed with trifluoperazine

Example 2: Calmodulin
- Global alignment: RMSD = 15 Å / 143 residues
- Local alignment: RMSD = 0.9 Å / 62 residues

RMSD is not a Metric
[Figure: two structure comparisons, labeled cRMS = 2.8 Å and cRMS = 2.85 Å]

Protein Structure Alignment
Protein Structure Superposition Problem: Given two sets of points A = (a_1, a_2, …, a_n) and B = (b_1, b_2, …, b_m) in 3D space, find the optimal subsets A(P) and B(Q) with |A(P)| = |B(Q)|, and find the optimal rigid-body transformation G_opt between the two subsets A(P) and B(Q) that minimizes a given distance metric D over all possible rigid-body transformations G, i.e.
$$G_{opt} = \arg\min_{G} D\big(A(P), G(B(Q))\big)$$
The two subsets A(P) and B(Q) define a “correspondence”, and p = |A(P)| = |B(Q)| is called the correspondence length.

Two Subproblems
1. Find the correspondence set
2. Find the alignment transform (protein superposition problem)

Existing Software
- DALI (Holm and Sander, 1993)
- SSAP (Taylor and Orengo, 1989)
- STRUCTAL (Levitt et al., 1993)
- VAST (Gibrat et al., 1996)
- LOCK (Singh and Brutlag, 1996)
- CE (Shindyalov and Bourne, 1998)
- SSM (Krissinel and Henrick, 2004)
- …

Trial-and-Error Approach to Protein Structure Alignment
Iterate N times:
1. Set the correspondence C to a seed correspondence set (a small set, sufficient to generate an alignment transform)
2. Compute the alignment transform G for C and apply G to the second protein B
3. Update C to include all pairs of features that are close together
4. If C has changed, then return to Step 2
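Steps 2 to 4 for a single seed can be sketched as below. Several choices here are assumptions that the slide does not specify: "close together" is taken to mean mutual nearest neighbours within a distance cutoff, the transform is fitted by the SVD method, and the names `fit` and `refine`, the cutoff value and the synthetic data are all hypothetical. The outer "iterate N times" loop would call `refine` with different seeds and keep the best result.

```python
import numpy as np

def fit(P, Q):
    """Rotation R and translation t minimising sum ||R p_i + t - q_i||^2."""
    mu_P, mu_Q = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - mu_P).T @ (Q - mu_Q))
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    return R, mu_Q - R @ mu_P

def refine(A, B, seed, cutoff=3.0, max_iter=20):
    """Grow a seed correspondence [(i, j), ...] between rows of A and B."""
    corr = sorted(set(seed))
    for _ in range(max_iter):
        ia, jb = (list(x) for x in zip(*corr))
        R, t = fit(B[jb], A[ia])                 # step 2: transform for current C
        moved = B @ R.T + t                      # ... applied to the second protein
        dist = np.linalg.norm(A[:, None, :] - moved[None, :, :], axis=-1)
        nearest_b = dist.argmin(axis=1)          # closest B point for each A point
        nearest_a = dist.argmin(axis=0)          # closest A point for each B point
        new = [(i, int(j)) for i, j in enumerate(nearest_b)
               if dist[i, j] < cutoff and nearest_a[j] == i]   # step 3
        if len(new) < 3 or new == corr:          # step 4: stop when C is stable
            break
        corr = new
    ia, jb = (list(x) for x in zip(*corr))
    R, t = fit(B[jb], A[ia])                     # final transform for the final C
    return corr, R, t

# Synthetic test case: B holds a moved copy of A[5:] plus three unrelated points.
rng = np.random.default_rng(0)
A = rng.normal(scale=10.0, size=(20, 3))
th = 1.0
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th),  np.cos(th), 0.0],
               [0.0,         0.0,        1.0]])
B = np.vstack([A[5:] @ Rz.T + np.array([3.0, -2.0, 7.0]),
               [[500.0, 500.0, 500.0], [600.0, 500.0, 500.0], [500.0, 600.0, 500.0]]])
corr, R, t = refine(A, B, seed=[(5, 0), (6, 1), (7, 2)])
print(len(corr))
```

The seed needs at least three non-collinear pairs for the transform to be well defined. The result depends on the seed, which is why the approach is repeated from several starting points.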

Protein Structure Classification

Why Classify?
Standard in biology:
- Aristotle: Plants and Animals
- Linnaeus: binomial system
- Darwin: systematic classification that reveals phylogeny
It is easier to think about a representative than to embrace the information of all individuals.

Protein Structure Classification
Domain Definition
3 Major classifications
- SCOP
- CATH
- DDD

Protein Structural Domains

Protein Domain: Definitions
1) Regions that display significant levels of sequence similarity
2) The minimal part of a gene that is capable of performing a function
3) A region of a protein with an experimentally assigned function
4) Region of a protein structure that recurs in different contexts and proteins
5) A compact, spatially distinct region of a protein

Web services for domain identification
Program        Web access
DIAL           http://
DomainParser
DOMAK          http://
PDP            http://123d.ncifcrf.gov/pdp.html

Protein Structure Space
PDB ID   Size
1CTF     68 AA
1TIM     247 AA
1K3R     268 AA
1A1O     384 AA
1NIK     4504 AA
1AON     8337 AA

Current state of the PDB

Classification of Protein Structure: SCOP

Classification of Protein Structure: SCOP SCOP is organized into 4 hierarchical layers: (1) Classes:

Classification of Protein Structure: SCOP
(2) Folds: Major structural similarity. Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections.
(3) Superfamily: Probable common evolutionary origin. Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies.
(4) Family: Clear evolutionary relationship. Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater.

Classification of Protein Structure: SCOP

Classification of Protein Structure: CATH

Classification of Protein Structure: CATH
[Figure: the CATH hierarchy. C (Class): Alpha, Mixed Alpha Beta. A (Architecture): Sandwich, Barrel, Super Roll. T (Topology): TIM Barrel, Other Barrel.]

Classification of Protein Structure: CATH

The DALI Database

The DALI Domain Dictionary
- All-against-all comparison of PDB90 using DALI
- Define the score of each pair as a Z-score
- Regroup proteins based on pairwise score:
  - Z-score > 2: “Folds”
  - Z-score > 4, 6, 8, 10: sub-groups of “folds” (different from Families and sub-families!)
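The regrouping step can be illustrated with a toy example. The sketch below links two domains whenever their pairwise Z-score exceeds a threshold and reports the connected groups. Single-linkage grouping is an assumption made for simplicity and is not necessarily the clustering used by the DALI Domain Dictionary. The domain names and Z-scores are invented.

```python
def group_by_zscore(names, z, threshold):
    """Groups of domains connected by pairs whose Z-score exceeds the threshold."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in z.items():
        if score > threshold:
            parent[find(a)] = find(b)            # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical domains and pairwise Z-scores (unlisted pairs count as unrelated)
names = ["d1", "d2", "d3", "d4", "d5"]
z = {("d1", "d2"): 12.0, ("d2", "d3"): 5.0, ("d4", "d5"): 3.0, ("d1", "d4"): 1.0}
for threshold in (2, 4, 10):
    print(threshold, group_by_zscore(names, z, threshold))
```

Raising the threshold can only split groups, never merge them, so the levels 2, 4, 6, 8 and 10 produce a nested hierarchy.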

Summary
- Classification is an important part of biology; protein structures are not exempt
- Prior to being classified, proteins are cut into domains
- While all structural biologists agree that proteins are usually a collection of domains, there is no consensus on how to delineate the domains
- There are three main protein structure classifications:
  - SCOP (manual): source of evolutionary information
  - CATH (semi-automatic): source of geometric information
  - Dali (automatic): source of raw data