CS-ROSETTA Yang Shen et al. Presented by Jonathan Jou.


An Analogy
Fill this puzzle with words from a set of 130 words (or their anagrams, to make the analogy more precise), as close to the author's solution (not given) as possible. (ROSETTA)
Image: puzzles.about.com/library/graphics/blank19.gif

An Easier (NP-Complete) Problem?
Knowing what the words mean tells you a little more about which anagram you should use. (CS-ROSETTA)

The ROSETTA Procedure
- Monte Carlo fragment replacement
- Monte Carlo side chain packing
- Monte Carlo minimization
As t goes to infinity (cubed? more?), it converges to the answer!

Monte Carlo (Random Sampling)
- Randomly (or pseudorandomly) pick a configuration and evaluate its energy.
- If the energy is acceptably low, store the result.
- If not, perturb the configuration and evaluate again. Under the Metropolis criterion, a move that lowers the energy is always accepted, and a move that raises it by ΔE is accepted with probability exp(-ΔE/kT). Simulated annealing applies the same rule while gradually lowering the temperature T.
- When some convergence threshold or time limit is met, stop and return the stored results.
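The sampling loop on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the one-dimensional `toy_energy` function, the step size, the fixed temperature, and the function names. It is not ROSETTA's energy function or move set; it only shows the Metropolis acceptance rule at work.

```python
import math
import random


def metropolis_search(energy, start, steps=5000, step_size=0.5,
                      temperature=1.0, seed=0):
    """Random-walk sampler with Metropolis acceptance; returns the best point seen."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(steps):
        # Propose a small random perturbation of the current configuration.
        candidate = current + rng.uniform(-step_size, step_size)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Metropolis criterion: downhill moves are always accepted,
        # uphill moves are accepted with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


def toy_energy(x):
    """Hypothetical double-well energy; the well near x = -1 is the deeper one."""
    return (x * x - 1.0) ** 2 + 0.3 * x


best_x, best_e = metropolis_search(toy_energy, start=2.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

Lowering `temperature` over the course of the loop would turn this into simulated annealing; with a fixed temperature it is plain Metropolis sampling.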

Advantages of Monte Carlo
- Individual computations are cheap
  - Exponential search spaces are slow to search exhaustively
  - Probabilistic worst case is identical to simple brute force
- Can be done as an empirical black box
  - Can approximate molecular dynamics with empirical energy functions

When Should Monte Carlo Be Used?
- No provable bounds on running time
  - Monte Carlo linear algebra?
  - Monte Carlo comparison sort? (Bozo Sort)
- No provable bounds on accuracy
  - Convergence != global minimum
- Only sample what you can't reasonably predict deterministically

Application to ROSETTA
- Monte Carlo fragment replacement
  - Randomly select a position and the 8 residues following it
  - Randomly select a 9-residue fragment from the database, and set the backbone torsion angles of the selected residues to match the fragment's
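A single fragment-replacement move might look like the sketch below. The data layout is an assumption chosen for illustration: the backbone is a list of (phi, psi) angle pairs, and `fragment_library` is a hypothetical dictionary mapping each start position to a list of candidate 9-residue fragments. Real ROSETTA fragment files and scoring are not modeled here.

```python
import random

FRAGMENT_LENGTH = 9  # one selected position plus the 8 residues following it


def insert_random_fragment(torsions, fragment_library, rng):
    """Return (start, new_torsions) with one random 9-residue window overwritten."""
    # Pick a start position that leaves room for a full fragment.
    start = rng.randrange(len(torsions) - FRAGMENT_LENGTH + 1)
    # Pick one candidate fragment stored for that position.
    fragment = rng.choice(fragment_library[start])
    updated = list(torsions)  # copy, so the input stays untouched
    updated[start:start + FRAGMENT_LENGTH] = fragment
    return start, updated


# Toy input: a 12-residue extended chain and an all-helix fragment library.
extended = [(-120.0, 120.0)] * 12
helix_fragment = [(-60.0, -45.0)] * FRAGMENT_LENGTH
library = {pos: [helix_fragment] for pos in range(len(extended) - FRAGMENT_LENGTH + 1)}

start, moved = insert_random_fragment(extended, library, random.Random(1))
print(start, moved[start], moved[start + FRAGMENT_LENGTH - 1])
```

In a full sampler this move would be followed by an energy evaluation and an accept/reject decision; the sketch stops at the move itself.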

Application to ROSETTA
- Monte Carlo side chain packing
  - Randomly pick a residue
  - Randomly pick a rotamer, and replace the residue configuration with the rotamer
- Monte Carlo minimization
  - Randomly pick a residue
  - Randomly perturb it, then minimize the protein

Chemical-Shift Rosetta
Use NMR data as an additional criterion in the fragment selection phase.
[Diagram, from fig. 3: PDB, fragment database, experimental NMR data, MFR, ROSETTA]

Molecular Fragment Replacement (MFR)
- Given the amino acid sequence (from genomic data or otherwise), search the PDB for the best possible matches.
- Find fragments of known proteins that best match the sequence and whose predicted chemical shifts best fit the experimental data.
  - Chemical shifts are predicted via SPARTA, which was trained on 200 proteins and is 10% more accurate than SHIFTX
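To make the selection idea concrete, here is a minimal sketch of ranking candidate fragments by a combined sequence and chemical-shift mismatch. The scoring formula, the `seq_weight` parameter, the candidate records, and the shift values are all made up for illustration; the actual MFR scoring function is not given in these slides and is not reproduced here.

```python
def shift_mismatch(experimental, predicted):
    """Mean squared shift difference over residues that have an experimental value."""
    pairs = [(e, p) for e, p in zip(experimental, predicted) if e is not None]
    if not pairs:
        return 0.0  # no assigned shifts: the shift term contributes nothing
    return sum((e - p) ** 2 for e, p in pairs) / len(pairs)


def sequence_mismatch(target, candidate):
    """Number of positions where the two sequences differ."""
    return sum(1 for a, b in zip(target, candidate) if a != b)


def rank_fragments(target_seq, target_shifts, candidates, seq_weight=1.0, top_n=2):
    """Return the top_n (score, name) pairs, lowest combined mismatch first."""
    scored = []
    for cand in candidates:
        score = (shift_mismatch(target_shifts, cand["shifts"])
                 + seq_weight * sequence_mismatch(target_seq, cand["sequence"]))
        scored.append((score, cand["name"]))
    scored.sort()
    return scored[:top_n]


# Hypothetical 3-residue window; None marks an unassigned shift.
target_seq = "AVL"
target_shifts = [52.5, None, 55.0]
candidates = [
    {"name": "frag_a", "sequence": "AVL", "shifts": [52.0, 62.0, 55.5]},
    {"name": "frag_b", "sequence": "AIL", "shifts": [52.5, 61.0, 55.0]},
    {"name": "frag_c", "sequence": "GGG", "shifts": [45.0, 45.0, 45.0]},
]
ranked = rank_fragments(target_seq, target_shifts, candidates)
print(ranked)
```

Skipping unassigned residues in `shift_mismatch` means a fragment is judged on sequence alone wherever shift data is absent.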

Results
- MFR-selected fragments generate lower-energy structures than standard ROSETTA fragments
- Cα coordinates of the lowest-energy conformations deviated 1 to 2 Å from the reference structure
- Some exceptions, but ROSETTA doesn't consider the chemical shifts, and adding them to the empirical energy function improved results

Robustness
- When backbone chemical shift assignments are incomplete, CS-ROSETTA is still better at picking fragments than ROSETTA
- If a whole section of the protein's chemical shift data is missing, that part is effectively being run with vanilla ROSETTA

Convergence
- Convergence is concluded when the newly derived structure is within approx. 2 Å rmsd of the lowest-energy structure so far.
- Baker et al. suggest identifying a "funneling phenomenon"
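The convergence test can be sketched as a simple rmsd threshold check. This sketch assumes the two structures are lists of Cα coordinates of equal length that have already been superimposed; the optimal superposition step normally needed before computing rmsd is left out, and the function names and coordinates are illustrative.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def has_converged(new_structure, lowest_energy_structure, threshold=2.0):
    """True when the new structure lies within `threshold` angstroms rmsd."""
    return ca_rmsd(new_structure, lowest_energy_structure) <= threshold


# Toy coordinates: three Cα atoms spaced 3.8 Å apart along the x axis.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
near = [(x, y + 1.0, z) for x, y, z in reference]  # every atom displaced by 1 Å
far = [(x, y + 3.0, z) for x, y, z in reference]   # every atom displaced by 3 Å

print(has_converged(near, reference), has_converged(far, reference))
```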

Convergence
- Convergence rapidly decreases with increasing protein size, and CS-ROSETTA begins to fail at around 130 residues.
- Convergence is also adversely affected by long, disordered loops in the reference structure
(From fig. 5)

Blind Prediction
The ordered portions have remarkably good rmsd values: below 1 Å for 6 of the structures and below approx. 2 Å for the other 3.

Blind Prediction
Structures are strikingly similar:
- ROSETTA's energy model favors hydrogen bonds, which extends secondary structure elements by a few residues
- Disordered sections can be detected from chemical shifts with the Random Coil Index and thus prohibited from contributing to secondary structure
- Core side-chain packing was also less accurate

Conclusions
- CS-ROSETTA is faster and thus able to handle bigger problems than traditional ROSETTA.
- CS-ROSETTA is 50% faster than traditional triple-NMR structure determination
- CS-ROSETTA is perhaps better able to determine the structure of systems not stable enough for conventional NMR…?

CS-ROSETTA?
- Is there a mathematically derived limit on how big a protein can be?
  - ROSETTA runs 28,000 iterations, so if the search space of a protein exceeds 28000^n for some n, is it most likely going to fail?
- Each additional sample gives us more information. Is it possible to identify the "statistically significant global minimum"?
- Given assignments, chemical shifts should also tell us more about secondary structure (guided side chain packing and minimization?)