Lecture 12, CS566 Structural Bioinformatics: Motivation, Concepts, Structure Prediction, Summary

Motivation
- Holy Grail: the mapping between sequence and structure. Structure = F(Sequence). What is F?
- Why?
  - Structure dictates chemistry and thermodynamics, and therefore function
  - Not all structures can be (need be?) determined experimentally
    - Cost
    - Experimental limitations

Concepts: Prediction spectrum
Methods in order of decreasing reliance on known structures:
  1. Homology modeling
  2. Threading
  3. ab initio
  4. Quantum mechanics

Concepts: Common principles
- Constraints to reduce the search space
- Consideration of many alternate conformations
  - Protein backbone dihedral angles ('twists along the axis of the protein')
  - Amino-acid geometry ('amino acids can have more than one shape')
- Method for local optimization
- Scoring function to compare conformations

Evaluation of quality of prediction
- RMSD comparison with the experimentally known structure
- Comparison with crystal structure quality criteria
  - Ramachandran plot: residue-specific dihedral angle distribution
- CASP (Critical Assessment of Structure Prediction) and CAFASP (Critical Assessment of Fully Automated Structure Prediction) competitions
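
The RMSD comparison listed above can be sketched as follows. This is a minimal illustration, assuming the predicted and experimental structures are already superimposed and that atoms are paired by index; the coordinates are hypothetical and no optimal superposition step is included.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired by
    index; no rotation or translation is optimized here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha coordinates in angstroms, for illustration only.
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]

print(round(rmsd(predicted, experimental), 3))
```

In practice an optimal superposition is usually computed before the deviation is measured; that step is omitted here to keep the sketch short.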

Methods
- Knowledge-based constraints of the search space
  - Homology modeling
  - Threading
  - ab initio (based on knowledge primitives: not true ab initio)
- Approaches to refinement
  - Quantum mechanics (ab initio)
    - Based on a quantum mechanical model of elementary particles
    - Unscalable
  - Molecular mechanics
    - Uses parametric force fields (Newton's laws, Hooke's law, ...)
    - Typically used for local or constrained global optimization
    - Molecular dynamics or Monte Carlo-based
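
To make the molecular mechanics idea concrete, here is a minimal sketch of one Hooke's-law term of a force field together with a steepest-descent local optimization of a single bond length. The function names and the parameter values (R0, K, step size) are illustrative assumptions, not values from any published force field.

```python
def harmonic_energy(r, r0, k):
    """Hooke's-law bond energy: E = 0.5 * k * (r - r0)**2."""
    return 0.5 * k * (r - r0) ** 2

def harmonic_gradient(r, r0, k):
    """Derivative dE/dr of the harmonic bond term."""
    return k * (r - r0)

def minimize_bond(r, r0, k, step=0.001, iterations=200):
    """Steepest-descent local optimization of one bond length."""
    for _ in range(iterations):
        r -= step * harmonic_gradient(r, r0, k)
    return r

# Illustrative parameters only (hypothetical, not from a real force field).
R0 = 1.5    # equilibrium bond length, angstroms
K = 300.0   # force constant, energy units per square angstrom

relaxed = minimize_bond(1.8, R0, K)
print(relaxed, harmonic_energy(relaxed, R0, K))
```

A real force field sums many such terms (bonds, angles, torsions, non-bonded interactions), and the minimizer then acts on all atomic coordinates at once.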

Homology modeling
- Based on sequence-sequence similarity (> ~25%; the higher, the better)
- Steps
  1. Use pairwise local sequence similarity to identify related structures (possible templates)
  2. Refine the alignment by global pairwise sequence similarity and multiple sequence alignment (MSA)
  3. Overlay the sequence backbone (N-Cα-C) on the template
  4. Model loops based on
     - Statistical knowledge from databases of known structures
     - Molecular mechanics
  5. Model side chains (approach similar to that used for loops)
  6. Molecular mechanical unconstrained local optimization
  7. Pray for a good solution!
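
The template-selection threshold above can be illustrated with a short sketch. It assumes the ~25% figure is applied as percent identity over the aligned (non-gap) columns of a pairwise alignment; the slide only says "similarity", so this choice, the function names, and the toy alignment are all assumptions.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over aligned (non-gap) columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0

def usable_template(aligned_query, aligned_template, threshold=25.0):
    """True if the alignment clears the (assumed) identity threshold."""
    return percent_identity(aligned_query, aligned_template) > threshold

# Hypothetical toy alignment, for illustration only.
query = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"

print(round(percent_identity(query, template), 1), usable_template(query, template))
```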

Threading
- Based on sequence-structure similarity
- Concept
  - Residues in the core adopt fewer conformations than those on the surface
- Approach
  - Thread the sequence through all known structures
  - Score the match with the core of each structure based on
    - Environmental scoring matrices and/or
    - Amino acid neighborhood matrices (à la dot matrix)
  - Refine the structure using molecular mechanics based on the best template(s)
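
A toy version of the environment-based scoring described above might look like this. It is a sketch under strong assumptions: only two environment classes, gapless placement of the query on each template, and invented score values. None of the names or numbers come from a real threading program.

```python
# Hypothetical environment preference scores (invented for illustration):
# positive means the residue type is favorable in that environment.
ENV_SCORES = {
    "buried": {"L": 2, "V": 2, "I": 2, "F": 2, "A": 1,
               "K": -2, "E": -2, "D": -2, "S": -1, "G": 0},
    "exposed": {"L": -2, "V": -2, "I": -2, "F": -2, "A": 0,
                "K": 2, "E": 2, "D": 2, "S": 1, "G": 0},
}

def thread_score(sequence, environments):
    """Sum environment scores for a gapless placement of sequence on a template."""
    return sum(ENV_SCORES[env].get(res, 0)
               for res, env in zip(sequence, environments))

def best_template(sequence, templates):
    """Return (name, score) of the highest-scoring template of matching length."""
    scored = [(thread_score(sequence, envs), name)
              for name, envs in templates.items()
              if len(envs) == len(sequence)]
    score, name = max(scored)
    return name, score

# Hypothetical templates described only by per-position environment.
templates = {
    "fold_A": ["exposed", "buried", "buried", "exposed", "buried", "exposed"],
    "fold_B": ["buried", "exposed", "exposed", "buried", "exposed", "buried"],
}

print(best_template("KLVEIS", templates))
```

Real threading methods also allow gaps and use much richer environment descriptions; the point here is only the "score the sequence against each known structure, keep the best" loop.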

Rosetta ("ab initio")
- Approach pioneered by David Baker's group in the late 1990s
- Remarkable success in CASP and CAFASP experiments
- Recently made publicly available on an automated server by Christopher Bystroff's group
- Potpourri of many different approaches
- Key components
  - 'Divide and conquer' strategy with respect to the length of the sequence to be modeled
  - Use of a knowledge-based energy function

'Divide and conquer'
- Mimics the natural process of protein folding
- Compromise between the extremes of
  - Looking for homologous sequences with known structure
  - Modeling a priori (one amino acid at a time)
- Use a library of 3D structures of fragments of length 3 and 9 derived from the crystal structure database (a priori estimates = 8K and ~ ).
- Break up the query sequence into a set of 3-mers and 9-mers to find matches with the above library, using a sequence profile approach
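
Breaking a query into 3-mers and 9-mers can be sketched as below. The sketch assumes overlapping windows (one fragment starting at every position), which the slide does not specify; the query sequence is hypothetical. The profile-based matching against the fragment library is not shown.

```python
def fragments(sequence, k):
    """All overlapping windows of length k, paired with their start index."""
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]

# Hypothetical 20-residue query, for illustration only.
query = "MKTAYIAKQRQISFVKSHFS"

three_mers = fragments(query, 3)
nine_mers = fragments(query, 9)

print(len(three_mers), three_mers[0])
print(len(nine_mers), nine_mers[0])

# With 20 amino acid types there are 20**3 possible 3-mer sequences,
# which is consistent with the 8K figure quoted above.
print(20 ** 3)
```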

'Divide and conquer' (continued)
- Once matches are found, the problem reduces to a combinatorial one: selecting the set of fragments that gives the most energetically favorable structure
- In practice, a Monte Carlo-based search of the possible combinations is carried out
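
The Monte Carlo search over fragment combinations can be sketched as follows. This is not Rosetta's actual procedure: the energy function is a toy (squared mismatch between neighboring fragment values), the acceptance rule is the standard Metropolis criterion chosen here as an assumption, and the library values, kT, and step count are hypothetical.

```python
import math
import random

def total_energy(choice, library):
    """Toy energy: squared mismatch between neighboring fragment values."""
    values = [library[pos][idx] for pos, idx in enumerate(choice)]
    return sum((a - b) ** 2 for a, b in zip(values, values[1:]))

def monte_carlo_search(library, steps=5000, kT=1.0, seed=0):
    """Search fragment combinations; returns (best choice, best energy, start energy)."""
    rng = random.Random(seed)
    current = [rng.randrange(len(options)) for options in library]
    current_e = total_energy(current, library)
    start_e = current_e
    best, best_e = list(current), current_e
    for _ in range(steps):
        # Propose swapping in a random fragment at a random position.
        pos = rng.randrange(len(library))
        trial = list(current)
        trial[pos] = rng.randrange(len(library[pos]))
        trial_e = total_energy(trial, library)
        delta = trial_e - current_e
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e, start_e

# Hypothetical library: candidate fragment "values" for each position.
library = [[0.0, 1.0, 2.0], [0.5, 1.5, 3.0], [1.0, 2.5, 0.2], [2.0, 0.1, 1.2]]

best, best_e, start_e = monte_carlo_search(library)
print(best, best_e, start_e)
```

Occasionally accepting uphill moves is what lets the search escape local minima, which a purely greedy fragment selection could not do.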

Knowledge-based energy function
- Fundamentally, ∆G = ∆H - T∆S
- Free energy is the enthalpy less an entropic term that is proportional to temperature
- Entropy is proportional to the natural log of the number of conformations/possible states: S = k ln W

Knowledge-based energy function (continued)
- Hence it makes sense to use the existing distribution of structures to derive the energy function
- The energy function takes the statistical distribution of 3D shapes in the database of known structures as the underlying probability distribution
- For a given structure, deviations from this probability distribution are subject to proportional energetic penalties
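
One common way to turn observed frequencies into energies is the inverse Boltzmann relation, E = -kT ln(p_obs / p_ref). The sketch below uses it as an assumption; the slides do not state which functional form Rosetta uses. The state labels, counts, uniform reference distribution, and pseudocount are all hypothetical.

```python
import math
from collections import Counter

def knowledge_based_energies(observations, kT=1.0, pseudocount=1.0):
    """Energy per state from observed frequencies: E = -kT * ln(p_obs / p_ref).

    Assumes a uniform reference distribution over the observed states.
    """
    counts = Counter(observations)
    states = sorted(counts)
    total = sum(counts.values()) + pseudocount * len(states)
    p_ref = 1.0 / len(states)
    return {
        state: -kT * math.log(((counts[state] + pseudocount) / total) / p_ref)
        for state in states
    }

# Hypothetical counts of backbone conformational states in a structure database.
observed = ["helix"] * 60 + ["strand"] * 30 + ["other"] * 10

energies = knowledge_based_energies(observed)
for state, energy in energies.items():
    print(state, round(energy, 3))
```

States seen more often than the reference expects get negative (favorable) energies, and rarely seen states get positive penalties, which matches the idea that deviations from the database distribution are penalized.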

Rosetta: Steps used in CASP4
1. If possible, use PSI-BLAST to find similar sequences
   A. If found, use the multiple sequence alignment to break the sequence down into domains to be modeled independently
   B. For domains with similarity to known structures, use a homology-based approach
   C. For the remaining domains, carry out Rosetta

Rosetta: Steps (continued)
2. For domains with similarity to other sequences, apply the following steps to the homologs as well (consensus modeling)
3. Generate a fragment library for each query
   A. Collect 3-mer and 9-mer sub-structures from the PDB with similarity to the 3-mer and 9-mer subsequences
4. Use a Monte Carlo approach for backbone fragment substitution into the query
   A. Pick a fragment at random from the library (~40,000 fragment substitutions for each structure)
   B. Repeat A several times
   C. Between 10K and 100K conformations ('decoys') are generated for each target

Rosetta: Steps (continued)
5. Filter the set of conformations to remove unlikely structures
   A. Remove structures with minimal long-range interactions (low contact order)
   B. Remove structures with unrealistic strands
6. Add side chains as statistically predicted by the backbone conformation
7. Cluster the set of conformations (including, when available, the generated structures of homologues)
8. Representative structures from the 5 most populous clusters are the candidate structures
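
The contact-order filter in step 5A can be sketched as below, using relative contact order: the average sequence separation of contacting residue pairs divided by chain length. The decoy names, contact lists, chain length, and threshold are hypothetical, and how contacts are detected from coordinates is left out.

```python
def relative_contact_order(contacts, length):
    """Average sequence separation of contacting pairs, divided by chain length."""
    if not contacts:
        return 0.0
    return sum(abs(i - j) for i, j in contacts) / (len(contacts) * length)

def filter_decoys(decoys, length, min_rco):
    """Keep only decoys whose relative contact order is at least min_rco."""
    return [name for name, contacts in decoys.items()
            if relative_contact_order(contacts, length) >= min_rco]

# Hypothetical decoys, each given as a list of contacting residue index pairs.
decoys = {
    "local_only": [(1, 4), (2, 5), (10, 13)],
    "long_range": [(1, 40), (5, 35), (10, 30)],
}

# The threshold of 0.2 is illustrative, not a value from the slides.
kept = filter_decoys(decoys, length=50, min_rco=0.2)
print(kept)
```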

Summary
- Methods like Rosetta represent a breakthrough in the ab initio prediction of protein 3D structure and are very useful in cases where no homology can be detected
- For CASP4, at least one subsequence longer than 50 residues could be predicted 'correctly' (< 6.5 Å RMSD) in 17 of 21 cases
- A combination of various approaches works best

Summary (continued)
- However, both completeness and accuracy of prediction leave ample room for improvement
  - RMS error is frequently too high to be useful
  - Even in homology modeling, the template per se is often a better match!
  - Often only subsequences are accurately modeled, not the whole structure
  - The Nobel Prize is still up for grabs!