Protein Structure Prediction: Threading and Rosetta BMI/CS 576 Colin Dewey Fall 2008.

Slides:



Advertisements
Similar presentations
Bioinformatics Tutorial I BLAST and Sequence Alignment.
Advertisements

Protein Threading Zhanggroup Overview Background protein structure protein folding and designability Protein threading Current limitations.
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
Structural bioinformatics
Chapter 9 Structure Prediction. Motivation Given a protein, can you predict molecular structure Want to avoid repeated x-ray crystallography, but want.
Heuristic alignment algorithms and cost matrices
CISC667, F05, Lec21, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) Protein Structure Prediction 3-Dimensional Structure.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Summary Protein design seeks to find amino acid sequences which stably fold into specific 3-D structures. Modeling the inherent flexibility of the protein.
Discovery of RNA Structural Elements Using Evolutionary Computation Authors: G. Fogel, V. Porto, D. Weekes, D. Fogel, R. Griffey, J. McNeil, E. Lesnik,
. Protein Structure Prediction [Based on Structural Bioinformatics, section VII]
Tertiary protein structure modelling May 31, 2005 Graded papers will handed back Thursday Quiz#4 today Learning objectives- Continue to learn how to manipulate.
Solving the Protein Threading Problem in Parallel Nocola Yanev, Rumen Andonov Indrajit Bhattacharya CMSC 838T Presentation.
Protein Tertiary Structure. Primary: amino acid linear sequence. Secondary:  -helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded.
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment III CIS 667 February 10, 2004.
Blast heuristics Morten Nielsen Department of Systems Biology, DTU.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Protein Structures.
Heuristic methods for sequence alignment in practice Sushmita Roy BMI/CS 576 Sushmita Roy Sep 27 th,
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path Ilya N. Shindyalov, Philip E. Bourne.
Protein modelling ● Protein structure is the key to understanding protein function ● Protein structure ● Topics in modelling and computational methods.
Protein Tertiary Structure Prediction
Construyendo modelos 3D de proteinas ‘fold recognition / threading’
Practical session 2b Introduction to 3D Modelling and threading 9:30am-10:00am 3D modeling and threading 10:00am-10:30am Analysis of mutations in MYH6.
COMPARATIVE or HOMOLOGY MODELING
BLAST: A Case Study Lecture 25. BLAST: Introduction The Basic Local Alignment Search Tool, BLAST, is a fast approach to finding similar strings of characters.
Representations of Molecular Structure: Bonds Only.
RNA Secondary Structure Prediction Spring Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
Lecture 12 CS5661 Structural Bioinformatics Motivation Concepts Structure Prediction Summary.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Rotamer Packing Problem: The algorithms Hugo Willy 26 May 2010.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Protein Folding Programs By Asım OKUR CSE 549 November 14, 2002.
Chapter 3 Computational Molecular Biology Michael Smith
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Protein secondary structure Prediction Why 2 nd Structure prediction? The problem Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC.
DALI Method Distance mAtrix aLIgnment
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
BLAST Slides adapted & edited from a set by Cheryl A. Kerfeld (UC Berkeley/JGI) & Kathleen M. Scott (U South Florida) Kerfeld CA, Scott KM (2011) Using.
JM - 1 Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction Jarek Meller Jarek Meller Division.
Protein Structure Prediction: Homology Modeling & Threading/Fold Recognition D. Mohanty NII, New Delhi.
Introduction to Protein Structure Prediction BMI/CS 576 Colin Dewey Fall 2008.
Pairwise Local Alignment and Database Search Csc 487/687 Computing for Bioinformatics.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson.
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Protein Structure Prediction Graham Wood Charlotte Deane.
Database Similarity Search. 2 Sequences that are similar probably have the same function Why do we care to align sequences?
Heuristic Methods for Sequence Database Searching BMI/CS 576 Colin Dewey Fall 2015.
Step 3: Tools Database Searching
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
Heuristic Methods for Sequence Database Searching BMI/CS 576 Colin Dewey Fall 2010.
Dynamic programming with more complex models When gaps do occur, they are often longer than one residue.(biology) We can still use all the dynamic programming.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
CS-ROSETTA Yang Shen et al. Presented by Jonathan Jou.
Protein Tertiary Structure Prediction Structural Bioinformatics.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
Heuristic Methods for Sequence Database Searching BMI/CS 776 Mark Craven February 2002.
3.3b1 Protein Structure Threading (Fold recognition) Boris Steipe University of Toronto (Slides evolved from original material.
Using the Fisher kernel method to detect remote protein homologies Tommi Jaakkola, Mark Diekhams, David Haussler ISMB’ 99 Talk by O, Jangmin (2001/01/16)
Sequence Alignment 11/24/2018.
Protein Folding and Protein Threading
Protein Structures.
3-Dimensional Structure
Protein structure prediction.
Sequence alignment, E-value & Extreme value distribution
Protein structure prediction
Presentation transcript:

Protein Structure Prediction: Threading and Rosetta BMI/CS Colin Dewey Fall 2008

Protein Threading generalization of homology modeling –homology modeling: align sequence to sequence –threading: align sequence to structure (templates) key ideas –limited number of basic folds found in nature –amino acid preferences for different structural environments provides sufficient information to choose among folds

Components of a Threading Approach library of core fold templates objective function to evaluate any particular placement of a sequence in a core template method for searching over space of alignments between sequence and each core template method for choosing the best template given alignments

A Core Template core secondary structure segments loops protein A threaded on template protein B threaded on template Figure from R. Lathrop et al. Analysis and Algorithms for Protein Sequence-Structure Alignment, in Computational Methods in Molecular Biology, Salzberg et al. editors, template

Task Definition: Prediction Via Threading given: –a protein sequence –a library of core templates return: the best alignment of the sequence to a template

Prediction Via Threading do this by finding the best alignment of the sequence to each template

Task Definition: Threading Search given: –a protein sequence –a single template return: the best alignment of the sequence to the template

Threading Objective Functions possible sequence/template alignments are scored using a specified objective function the objective function scores the sequence/structure compatibility between –sequence amino acids –their corresponding positions in a core template it takes into account factors such as –a.a. preferences for solvent accessibility –a.a. preferences for particular secondary structures –interactions among spatially neighboring a.a.’s

Core Template with Interactions small circles represent amino acid positions thin lines indicate interactions represented in model Figure from R. Lathrop et al. Analysis and Algorithms for Protein Sequence-Structure Alignment. first amino acid in L interacts with last amino acid in L last amino acid in K interacts with first amino acid in L

One Threading a threading can be represented as a vector, where each element indicates the index of the amino acid placed in the first position of each core segment Figure from R. Lathrop et al. Analysis and Algorithms for Protein Sequence-Structure Alignment.

Threading Sets a set of potential threadings can be represented by bounds on the first position of each core segment

Possible Threadings finding the optimal alignment is NP-hard in the general case where –there are variable length gaps between the core segments, and –the objective function includes interactions between neighboring amino acids Figure from R. Lathrop et al. Analysis and Algorithms for Protein Sequence-Structure Alignment.

A General Pairwise Objective Function the general objective function with pairwise interactions is: scores for individual segments scores for segment interactions

Searching the Space of Alignments if interaction terms between amino acids not allowed –dynamic programming will find optimal alignment efficiently if interaction terms allowed –heuristic methods fast might not find the optimal alignment –exact methods (e.g. branch & bound) will find the optimal alignment might take exponential time

Branch and Bound Search

Branch and Bound Illustrated Figure from R. Lathrop and T. Smith. Global Optimum Protein Threading with Gapped Alignment and Empirical Pair Score Functions. Journal of Molecular Biology 255: , a hypothetical branch and bound search –each circle illustrates the space of possible threadings –solid lines indicate splits made in previous steps –dashed lines indicate splits made in current step –numbers indicate lower bounds for newly created subsets –arrows show the set that has been split

Branch and Bound Search there are two key issues to address in instantiating this approach –how to compute the lower bound for a set of threadings –how to split a threading set into subsets these aspects determine the expected efficiency of the branch and bound search

A Simple Lower Bound objective function lower bound in a nutshell: calculate minimum over each term separately

A Better Lower Bound [Lathrop & Smith, JMB ’96] interaction with preceding segment best case interaction with other segments

Splitting a Threading Set a threading set is split by choosing –a single core segment –a split point in the segment a simple method –split the segment having the widest interval, i.e. –choose the split point as the value that results in the lower bound for the set

Branch and Bound: Splitting a Set Figure from R. Lathrop et al. Analysis and Algorithms for Protein Sequence-Structure Alignment splitting into three sets

Branch and Bound Efficiency 58 proteins threaded against their “native” (i.e. correct) models Table from R. Lathrop and T. Smith, Journal of Molecular Biology 255: , 1996.

Threading Example Suppose we have three segments (i, j, k), each of which includes three amino acids. For a given sequence there are three possible starting positions for each segment. Suppose that you are given the following values for the scores of the individual segments and the scores for segment interactions. g1(i,2) = 5 g1(j,8) = 9 g1(k,13) = 3 g1(i,3) = 2 g1(j,9) = 7 g1(k,14) = 4 g1(i,4) = 8 g1(j,10) = 6 g1(k,15) = 1 g2(i,j,2,8) = 1 g2(i,j,2,9) = 2 g2(i,j,2,10) = 2 g2(i,j,3,8) = 5 g2(i,j,3,9) = 6 g2(i,j,3,10) = 4 g2(i,j,4,8) = 7 g2(i,j,4,9) = 3 g2(i,j,4,10) = 4 We’ll find the optimal threading using the "simple lower bound" and splitting a set on the segment with the minimal g1 value. When splitting the selected segment, we’ll divide it into three intervals of length one. g2(j,k,8,13) = 7 g2(j,k,8,14) = 8 g2(j,k,8,15) = 7 g2(j,k,9,13) = 1 g2(j,k,9,14) = 6 g2(j,k,9,15) = 8 g2(j,k,10,13) = 11 g2(j,k,10,14) = 12 g2(j,k,10,15) = 13 g2(i,k,2,13) = 1 g2(i,k,2,14) = 2 g2(i,k,2,15) = 5 g2(i,k,3,13) = 5 g2(i,k,3,14) = 6 g2(i,k,3,15) = 4 g2(i,k,4,13) = 1 g2(i,k,4,14) = 2 g2(i,k,4,15) = 4

Threading Example T=[2,4], [8,10], [13,15] T=[2,4], [8,10], [13] LB = g1(i,3) + g2(i,j,2,8) + g2(i,k,2,13) + g1(j,10) + g2(j,k,9,13) + g1(k,13) = 14 T=[2,4], [8,10], [14] LB = g1(i,3) + g2(i,j,2,8) + g2(i,k,2,14) + g1(j,10) + g2(j,k,9,14) + g1(k,14) = 21 T=[2,4], [8,10], [15] LB = g1(i,3) + g2(i,j,2,8) + g2(i,k,3,15) + g1(j,10) + g2(j,k,8,15) + g1(k,15) = 21 T=[2], [8,10], [13] LB = g1(i,2) + g2(i,j,2,8) + g2(i,k,2,13) + g1(j,10) + g2(j,k,9,13) + g1(k,13) = 17 T=[3], [8,10], [13] LB = g1(i,3) + g2(i,j,3,10) + g2(i,k,3,13) + g1(j,10) + g2(j,k,9,13) + g1(k,13) = 21 T=[4], [8,10], [13] LB = g1(i,4) + g2(i,j,4,9) + g2(i,k,4,13) + g1(j,10) + g2(j,k,9,13) + g1(k,13) = 22 T=[2], [8], [13] LB = g1(i,2) + g2(i,j,2,8) + g2(i,k,2,13) + g1(j,8) + g2(j,k,8,13) + g1(k,13) = 26 T=[2], [9], [13] LB = g1(i,2) + g2(i,j,2,9) + g2(i,k,2,13) + g1(j,9) + g2(j,k,9,13) + g1(k,13) = 19 T=[2], [10], [13] LB = g1(i,2) + g2(i,j,2,10) + g2(i,k,2,13) + g1(j,10) + g2(j,k,10,13) + g1(k,13) = 28

Overview of the Rosetta Approach (David Baker lab, Univ. of Washington) Given: protein sequence P for each window of length 3 and 9 in P assemble a set of structure fragments M = initial structure model of P (fully extended conformation) S = score(M) while stopping criteria not met randomly select a fixed width “window” of amino acids from P randomly select a fragment from the list for this window M’ = M with torsion angles in window replaced by angles from fragment S’ = score(M’) if Metropolis criterion(S, S’) satisfied M = M’ S = S’ Return: predicted structure M

Fragment Selection fragments are selected from known structures the window-fragment matches are calculated using –PSI-BLAST to build a profile model of the sequence –the predicted secondary structure of the sequence

Fragment Selection the key information that is exploited from the fragments are the torsion angles of the polypeptide backbone

Metropolis Criterion given the previous structure model with score S and the new one with score S’, accept the new one with probability “temperature” parameter that is varied during the search

Some Rosetta-Predicted Structures Native indicates the real structure Model indicates the predicted structure the rightmost structures in cases B. and C. show similar structures identified by searching a structure database with the model

Want to Help Predict Structures? go to (