Structure Alignment in Polynomial Time Rachel Kolodny Stanford University Nati Linial The Hebrew University of Jerusalem.


Problem Statement
Two structures in $\mathbb{R}^3$: $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_m\}$
Find subsequences $s_a$ and $s_b$ such that the substructures $\{a_{s_a(1)}, a_{s_a(2)}, \ldots, a_{s_a(l)}\}$ and $\{b_{s_b(1)}, b_{s_b(2)}, \ldots, b_{s_b(l)}\}$ are similar

Motivation
Structure is better conserved than amino acid sequence
– Structure similarity can give hints to common functionality/origin
Allows automatic classification of protein structure

Correspondence → Position
Given a correspondence, the rotation and translation that minimize the cRMS distance can be calculated [Kabsch, W. (1978)]
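For reference, the quantity being minimized can be written out as below. The notation ($R$, $t$, $l$, $s_a$, $s_b$) follows the problem statement; the symbols $\bar a$, $\bar b$, $H$, $U$, $V$ are introduced here for illustration, and the closed form is the standard Kabsch construction, given as a sketch rather than as the slides' own derivation.

```latex
\mathrm{cRMS}(s_a, s_b)
  = \min_{R \in SO(3),\; t \in \mathbb{R}^3}
    \sqrt{\frac{1}{l} \sum_{i=1}^{l}
      \bigl\| R\, a_{s_a(i)} + t - b_{s_b(i)} \bigr\|^2 }

% Optimal translation: map the centroid of the a's onto the centroid of the b's.
t^{*} = \bar b - R^{*} \bar a

% Optimal rotation: from the SVD of the 3x3 covariance matrix of the centered points.
H = \sum_{i=1}^{l} (a_{s_a(i)} - \bar a)(b_{s_b(i)} - \bar b)^{T} = U \Sigma V^{T},
\qquad
R^{*} = V \,\mathrm{diag}\bigl(1,\, 1,\, \operatorname{sign}\det(V U^{T})\bigr)\, U^{T}
```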

Position → Correspondence
Given a rotation and translation, one can calculate the alignment that optimizes a (separable) score
– Using dynamic programming
– Essentially similar to sequence alignment
[Example score formula]
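The dynamic program can be sketched as follows for two structures that have already been placed in a common frame. The transcript does not preserve the slide's example score, so `pair_score` (a STRUCTAL-like term, 20 / (1 + d²/5)) and the per-skipped-atom `gap` penalty are assumptions chosen for illustration, as are all function names.

```python
def pair_score(p, q):
    """Assumed separable score for matching atoms p and q (STRUCTAL-like)."""
    d2 = sum((x - y) ** 2 for x, y in zip(p, q))
    return 20.0 / (1.0 + d2 / 5.0)


def align(A, B, gap=1.0):
    """Best order-preserving correspondence between point lists A and B.

    Returns (score, pairs), where pairs lists matched (index in A, index in B).
    Skipping an atom in either structure costs `gap` (an assumed penalty).
    """
    n, m = len(A), len(B)
    # S[i][j] = best score for the prefixes A[:i] and B[:j]
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -gap * i
    for j in range(1, m + 1):
        S[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + pair_score(A[i - 1], B[j - 1]),  # match
                S[i - 1][j] - gap,                                 # skip A[i-1]
                S[i][j - 1] - gap,                                 # skip B[j-1]
            )
    # Trace back from the corner to recover the correspondence.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j - 1] + pair_score(A[i - 1], B[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return S[n][m], pairs


A = [(0, 0, 0), (10, 0, 0), (20, 0, 0)]
B = [(0, 0, 0), (100, 100, 100), (10, 0, 0), (20, 0, 0)]  # one extra, distant atom
score, pairs = align(A, B)
```

As in sequence alignment, the table has (n+1)(m+1) cells and each cell is filled in constant time; the only change is that the substitution score depends on the distance between the two atoms under the current rotation and translation.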

Score ≠ cRMS
We want to give "bonus points" for longer correspondences
– e.g. a correspondence of ONE atom from each structure has 0 cRMS
Even better scores?
– Vary gap penalty depending on position in structure
– Incorporate sequence information

Score ≠ cRMS
A specific correspondence

Previous Work
Distance matrices: DALI [Holm and Sander 93], CONGENEAL [Yee & Dill 93], SSAP [Taylor & Orengo 89], Nussinov-Wolfson [89, 93], Godzik [93], …
Heuristics in rotation and translation space: STRUCTAL [Subbiah et al 93], COMPARER [Sali & Blundell 90], LOCK [Singh & Brutlag 97], CE [Shindyalov & Bourne 98], Taylor (??) [93], Zu-Kang & Sippl 96 (?), …
*Most data taken from Orengo 94

“…It can be proved that, for these reasons, finding an optimal structural alignment between two protein structures is an NP hard problem and thus there are no fast structural alignment algorithms that are guaranteed to be optimal within any given similarity measure…”
Adam Godzik, ‘The structural alignment between two proteins: Is there a unique answer?’, 1996

“There is no exact solution to the protein structure alignment problem, only the best solution for the heuristics used in the calculation.”
Shindyalov & Bourne, ‘Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path’, 1998

Exponentially many Focus on Scoring Functions

Exponentially many All Maxima are interesting Noisy data !!

Good scoring functions
Each of the functions is well-behaved
– Satisfies a Lipschitz condition
Thus, the maximum over a finite set is well-behaved
In each dimension, two points at distance $\varepsilon$ have function values that vary by $O(n\varepsilon)$
Need $O(n)$ samples in every dimension
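One way to spell out the step from "each function is Lipschitz" to "the maximum is well-behaved" is sketched below. The symbols are assumptions made for illustration: $F_c(x)$ is the score of a fixed correspondence $c$ at rigid transformation $x$, $L$ is a Lipschitz constant shared by all $F_c$ (of order $n$, per the slide), and $\varepsilon$ is the spacing between neighbouring sample points. The final remark assumes each dimension has bounded range.

```latex
% Each fixed-correspondence score is L-Lipschitz in the transformation:
|F_c(x) - F_c(y)| \le L\,\|x - y\| \quad \text{for every correspondence } c

% The pointwise maximum over a finite set of L-Lipschitz functions is L-Lipschitz:
F(x) = \max_c F_c(x)
\;\Longrightarrow\;
|F(x) - F(y)| \le L\,\|x - y\|

% With L = O(n) and grid spacing \varepsilon, the nearest sample x_s to the optimum x^* satisfies
F(x^{*}) - F(x_s) = O(n\,\varepsilon)

% so \varepsilon = O(1/n), i.e. O(n) samples per dimension, keeps the loss O(1).
```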

Sampling is Sufficient

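Polynomial Algorithm
Sample in rotation and translation space
– Compute best score (and alignment) for each sample point
Return maximum score
Need $O(n^6 \cdot n^2) = O(n^8)$ time and $O(n^2)$ space: $O(n)$ samples in each of the 6 dimensions (3 rotation, 3 translation), with an $O(n^2)$ dynamic program per sample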
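A minimal sketch of the sampling loop follows. Several choices are assumptions rather than the authors' method: rotations are sampled on a plain Euler-angle grid (which is not uniform over rotation space), translations on a user-supplied grid `shifts`, and the per-sample score is the same STRUCTAL-like dynamic program assumed earlier. Only the score is kept here, so the table is stored row by row; recovering the alignment itself would need the full $O(n^2)$ table.

```python
import itertools
import math


def rotation(ax, ay, az):
    """Rotation matrix Rz(az) * Ry(ay) * Rx(ax) as nested lists."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(P, R, t):
    """Apply p -> R p + t to every point in P."""
    return [
        tuple(sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3))
        for p in P
    ]


def dp_score(A, B, gap=1.0):
    """Best alignment score for fixed positions (assumed STRUCTAL-like score)."""
    prev = [-gap * j for j in range(len(B) + 1)]
    for i, a in enumerate(A, start=1):
        cur = [-gap * i]
        for j, b in enumerate(B, start=1):
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            cur.append(max(prev[j - 1] + 20.0 / (1.0 + d2 / 5.0),
                           prev[j] - gap,
                           cur[j - 1] - gap))
        prev = cur
    return prev[-1]


def sampled_search(A, B, n_angles, shifts):
    """Try every sampled (rotation, translation); return the best (score, angles, t)."""
    grid = [2.0 * math.pi * k / n_angles for k in range(n_angles)]
    best = (float("-inf"), None, None)
    for angles in itertools.product(grid, repeat=3):      # 3 rotation dimensions
        R = rotation(*angles)
        for t in itertools.product(shifts, repeat=3):     # 3 translation dimensions
            score = dp_score(transform(A, R, t), B)
            if score > best[0]:
                best = (score, angles, t)
    return best


A = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 5)]
B = [(x + 1, y, z) for x, y, z in A]                      # A shifted by (1, 0, 0)
best_score, best_angles, best_t = sampled_search(A, B, n_angles=4, shifts=[-1, 0, 1])
```

The two nested `product` loops make the six-dimensional sample count explicit: with $O(n)$ values per dimension they run $O(n^6)$ times, and each iteration pays for one dynamic program.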

Internal Distance Matrices
Invariant to position and rotation of structures ⇒ can be compared directly
Find largest common sub-matrices (LCM) whose distances are roughly the same
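The invariance claim is easy to check numerically. The sketch below builds the internal distance matrix of a small point set, moves the set rigidly, and compares the two matrices; the helper names and the example coordinates are made up for illustration.

```python
import math


def distance_matrix(P):
    """D[i][j] = Euclidean distance between points i and j of P."""
    return [[math.dist(p, q) for q in P] for p in P]


def rotate_z_and_shift(P, angle, t):
    """Rigid motion: rotate about the z axis by `angle`, then translate by t."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + t[0], s * x + c * y + t[1], z + t[2]) for x, y, z in P]


A = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (1.0, 1.0, 5.0)]
moved = rotate_z_and_shift(A, 0.7, (2.0, -1.0, 3.5))

D_original = distance_matrix(A)
D_moved = distance_matrix(moved)

# Largest entrywise difference between the two matrices (floating-point noise only).
max_diff = max(
    abs(D_original[i][j] - D_moved[i][j])
    for i in range(len(A))
    for j in range(len(A))
)
```

Because the matrices agree up to rounding, two structures can be compared through their distance matrices without first choosing a rotation and translation.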

LCM is NP-complete
Harder than MAX-CLIQUE
Matrices encode distances that are positive, symmetric and obey the triangle inequality

Example
1dme: 28 amino acids
1jjd: 51 amino acids
Best STRUCTAL score: 149
Best score found by exhaustive search: 197

Heuristic
Consider only translations that position an atom from protein A on an atom of protein B
$O(mn)$ instead of $O((n+m)^3)$
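The restricted candidate set can be enumerated directly, as in the sketch below. The function name is hypothetical, and the sketch covers only the translation component: the slide does not say how rotations are chosen under this heuristic, so the rotation is assumed to have been applied to A beforehand.

```python
def candidate_translations(A, B):
    """All translations t = b - a that place some atom a of A on some atom b of B.

    Yields len(A) * len(B) candidates, instead of a dense 3-D translation grid.
    """
    return [
        tuple(b_k - a_k for a_k, b_k in zip(a, b))
        for a in A
        for b in B
    ]


A = [(0, 0, 0), (1, 0, 0)]
B = [(2, 0, 0), (2, 1, 0), (5, 5, 5)]
translations = candidate_translations(A, B)
```

Each candidate would then be scored with the same dynamic program used for grid samples, so the heuristic changes only which translations are tried, not how each one is evaluated.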