5. Ab initio modeling.

And today…
Introduction to ab initio modeling: the basic principles
Rosetta ab initio modeling protocol
Grid-based large-scale modeling & FOLDIT
I-Tasser
CASP

Types of structure prediction
Comparative modeling: structural template detected from sequence similarity
Fold recognition: structural template detected from fitness to fold (threading)
Ab initio modeling (free modeling): no obvious structural template; model the whole folding process (Rosetta, I-Tasser)
The three approaches are ordered by decreasing similarity to a known structure.

Basic Ab Initio Rosetta protocol
1. Select fragments consistent with local sequence preferences
2. Assemble fragments into models with native-like global properties
3. Identify the best model from the population of decoys
Figures adapted from Charlie Strauss; Rohl et al (2004) Protein structure prediction using ROSETTA. Methods in Enzymology 383:66

1. Select fragments: local sampling
Fragment libraries: fragments for each trimer and nonamer
Recent improvement was obtained by using fragments of additional sizes:
  For α helix: length 5-19 & 3-12
  For β sheet: length 4-10 & 3-7
Selected from PDB structures with < 2.5 Å resolution and < 50% sequence identity
Ranked by sequence similarity and by similarity of predicted and known secondary structure
Discard improbable conformations
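
The ranking step above can be illustrated with a minimal Python sketch. The scoring scheme (count of identical residues plus weighted count of matching secondary-structure states) and all function names are assumptions made for illustration; this is not the scoring used by the Rosetta fragment picker.

```python
def fragment_score(query_seq, query_ss, frag_seq, frag_ss, ss_weight=1.0):
    """Toy score: sequence identities plus weighted secondary-structure matches."""
    if not (len(query_seq) == len(query_ss) == len(frag_seq) == len(frag_ss)):
        raise ValueError("window and fragment must have the same length")
    seq_matches = sum(a == b for a, b in zip(query_seq, frag_seq))
    ss_matches = sum(a == b for a, b in zip(query_ss, frag_ss))
    return seq_matches + ss_weight * ss_matches


def rank_fragments(query_seq, query_ss, candidates, top_n=25):
    """Return the top_n candidates, best score first.

    candidates: list of (sequence, secondary_structure) tuples from known
    structures; query_ss is the predicted secondary structure of the window.
    """
    ranked = sorted(
        candidates,
        key=lambda c: fragment_score(query_seq, query_ss, c[0], c[1]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical trimer window and candidates (H = helix, E = strand, C = coil)
candidates = [("AVL", "EEE"), ("ALK", "HHH"), ("GLK", "HHC")]
best = rank_fragments("ALK", "HHH", candidates, top_n=2)
```

Here the exact match is ranked first and the fragment with a different sequence and secondary structure is dropped.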

2. Create compact decoys using fragment assembly
Advantages of approach:
  Fragment library approximates Gibbs sampling
  Fragments allow an accurate, but implicit, representation of the potential energy surface for local interactions
  Computer power can be invested in optimization of global features (e.g. compactness)

Low-resolution step: structure representation
Equilibrium bonds and angles (Engh & Huber 1991)
Centroid: average location of the center of mass of the side chain (centroid | aa, φ, ψ)
No modeling of side chains
Fast

Low-resolution parameters
Senv: burial preference (number of neighbors)
Spair: preferred amino acid pairs (e.g. cys-cys, glu-arg, etc.)
Sss + SHS: sheet and helix-sheet geometries
Scb: compactness of structure
Svdw: no clashes
Srgyr: globular structure; small vs. large radius of gyration (Rgyr)
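
As a concrete example of one of these terms, the radius of gyration can be computed as follows. This is a minimal sketch that treats all points equally (no mass weighting) and uses made-up coordinates; how Rosetta turns Rgyr into the Srgyr score is not shown.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their geometric center."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((c[i] - center[i]) ** 2 for i in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


# Eight hypothetical C-alpha positions: a straight chain vs. a compact cube
extended = [(3.8 * i, 0.0, 0.0) for i in range(8)]
compact = [(x, y, z) for x in (0.0, 3.8) for y in (0.0, 3.8) for z in (0.0, 3.8)]

rg_extended = radius_of_gyration(extended)
rg_compact = radius_of_gyration(compact)
```

The compact arrangement gives the smaller Rgyr, which is what a compactness term rewards.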

2. Create compact decoys using fragment assembly
MC search with simulated annealing, starting with an extended conformation
28K-36K random 9-mer fragment insertions (from top 25 fragments):
  XK: only vdW score (until all φ, ψ have changed)
  2K: add strand pairing score (0.3 weight)
  20K: compactness: increase pairing score + add Cβ and Rgyr; ± local strand pairing weight
  6K/4K: full strand pairing; full centroid function
3-mer fragment insertions:
  8K: Gunn-type (select among least perturbing fragments)
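
The search loop itself can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the linear cooling schedule, the starting torsions, the toy energy function and the two-fragment library. The staged score function described above is not reproduced.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def fragment_assembly(n_res, fragments, energy, n_steps,
                      t_start=2.0, t_end=0.1, seed=0):
    """Toy Monte Carlo fragment insertion with linear cooling.

    fragments: dict mapping a start position to a list of fragments,
    each a list of (phi, psi) tuples. energy: callable on a torsion list.
    """
    rng = random.Random(seed)
    torsions = [(-150.0, 150.0)] * n_res  # extended start (assumed values)
    current_e = energy(torsions)
    best, best_e = list(torsions), current_e
    positions = sorted(fragments)
    for step in range(n_steps):
        temperature = t_start + (t_end - t_start) * step / max(1, n_steps - 1)
        start = rng.choice(positions)
        frag = rng.choice(fragments[start])
        if start + len(frag) > n_res:
            continue  # fragment would run past the chain end
        trial = list(torsions)
        trial[start:start + len(frag)] = frag
        trial_e = energy(trial)
        if metropolis_accept(trial_e - current_e, temperature, rng):
            torsions, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(torsions), current_e
    return best, best_e


def toy_energy(torsions):
    """Hypothetical score that simply favors helical torsions."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in torsions) / 1000.0


HELIX = [(-60.0, -45.0)] * 3
STRAND = [(-120.0, 130.0)] * 3
library = {pos: [HELIX, STRAND] for pos in range(7)}
start_e = toy_energy([(-150.0, 150.0)] * 9)
model, model_e = fragment_assembly(9, library, toy_energy, n_steps=500)
```

Because whole fragments are swapped in, each move changes several torsions at once while keeping them locally plausible.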

Further local refinement strategies
Local moves: how to perturb the backbone with minimal effect on remote regions
1. Random torsion angle perturbation (helix: 0°, strand < 2°, rest < 3°)
  Small move: random φi, ψi pair
  Shear move: Δψi-1, -Δφi compensatory movements; moves the peptide plane
2. Selection of globally non-perturbing fragments
  Chuck move: fragments that minimize atom MSD
  Gunn move: fragments that minimize Δψ, Δφ
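
A shear move as described above can be sketched as follows. The representation of the backbone as a list of (φ, ψ) tuples in degrees and the function name are assumptions; angle wrapping to the range -180° to 180° is left out for brevity.

```python
import random


def shear_move(torsions, i, max_angle, rng):
    """Change psi of residue i-1 by +delta and phi of residue i by -delta.

    The two changes compensate each other, so the peptide plane between
    residues i-1 and i moves while the rest of the chain is perturbed little.
    """
    if not 1 <= i < len(torsions):
        raise ValueError("i must index a residue that has a predecessor")
    delta = rng.uniform(-max_angle, max_angle)
    moved = list(torsions)
    phi_prev, psi_prev = moved[i - 1]
    phi_i, psi_i = moved[i]
    moved[i - 1] = (phi_prev, psi_prev + delta)
    moved[i] = (phi_i - delta, psi_i)
    return moved


backbone = [(-60.0, -45.0)] * 5
sheared = shear_move(backbone, 2, max_angle=3.0, rng=random.Random(0))
```

The sum ψ(i-1) + φ(i) is conserved by construction, which is what limits the effect on remote regions.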

Further local refinement strategies
Local moves: how to perturb the backbone with minimal effect on remote regions
3. Adjacent φ-ψ variation to offset the global effect of fragment insertion
  Wobble move: fast analytical gradient calculation
  Crankshaft: combination of several wobble moves
Smaller moves are accepted with higher frequency.
Fig. 2. Modified "crank" fragment insertion into 1dan. (A) Superposition of the protein conformations preceding (black) and following (blue) insertion of a nine-residue fragment. The fragment insertion window is shown in red. The portion of the chain unperturbed by insertion is shown in gray. (B) Superposition of the protein conformations preceding (blue) and following (green) optimization of angles at a wobble site (cyan) adjacent to the insertion window. (C) Superposition of the protein conformations preceding (green) and following (magenta) optimization of angles at a second wobble site (orange) nonadjacent to the insertion window. (D) Superposition of the original (black) and final (magenta) conformations.

Global sampling
Fragment exchange: initial global changes
Local moves: further refinement
Movie by Jens Meiler

3. Identify best structure
Generate decoy population
Filter to correct sampling biases
Cluster analysis identifies broadest minimum
Full-atom refinement will identify lowest energy minimum
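
The idea that the largest cluster marks the broadest minimum can be sketched with a simple neighbor count. This is an illustrative simplification, assuming a precomputed matrix of pairwise decoy distances (for example Cα RMSD values) and a hypothetical cluster radius; it is not the clustering code used in Rosetta.

```python
def largest_cluster(dist, radius):
    """Return (center_index, member_indices) of the most populated cluster.

    dist: symmetric matrix (list of lists) of pairwise decoy distances.
    The center is the decoy with the most neighbors within `radius`;
    ties go to the lowest index.
    """
    best_center, best_members = None, []
    for i, row in enumerate(dist):
        members = [j for j, d in enumerate(row) if d <= radius]
        if len(members) > len(best_members):
            best_center, best_members = i, members
    return best_center, best_members


# Hypothetical distances: decoys 0-2 are similar, decoy 3 is an outlier
dist = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.0, 9.0],
    [2.0, 1.0, 0.0, 9.0],
    [9.0, 9.0, 9.0, 0.0],
]
center, members = largest_cluster(dist, radius=1.5)
```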

High-resolution step: parameters
VdW: 12-6 Lennard-Jones with linear repulsion; distance cutoff (Å)
Solvation (Lazaridis-Karplus)
Hydrogen bonds
Weak pair potential: electrostatic interactions (p-p, p-+)
Backbone torsions (rama score)
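
The phrase "12-6 Lennard-Jones with linear repulsion" can be made concrete with a short sketch: below a switch distance the steep r^-12 wall is replaced by its tangent line, so clashes give large but finite energies. The switch point (0.6 of the minimum-energy distance), the parameter values and the function names are assumptions for illustration, not the Rosetta parameters.

```python
def lj_12_6(r, r_min, epsilon):
    """12-6 Lennard-Jones energy with minimum -epsilon at r = r_min."""
    x6 = (r_min / r) ** 6
    return epsilon * (x6 * x6 - 2.0 * x6)


def lj_slope(r, r_min, epsilon):
    """Derivative dE/dr of the 12-6 Lennard-Jones energy."""
    x6 = (r_min / r) ** 6
    return 12.0 * epsilon / r * (x6 - x6 * x6)


def lj_linear_repulsion(r, r_min, epsilon, switch=0.6):
    """12-6 Lennard-Jones, continued linearly below r = switch * r_min."""
    r_switch = switch * r_min
    if r >= r_switch:
        return lj_12_6(r, r_min, epsilon)
    # Tangent line at the switch point keeps energy and slope continuous
    return (lj_12_6(r_switch, r_min, epsilon)
            + lj_slope(r_switch, r_min, epsilon) * (r - r_switch))


e_min = lj_linear_repulsion(4.0, r_min=4.0, epsilon=0.1)
e_clash = lj_linear_repulsion(0.0, r_min=4.0, epsilon=0.1)
```

The linearized wall stays finite even at r = 0, which makes it easier to ramp the repulsive term up gradually during refinement.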

High-resolution refinement of models
MCM (Monte Carlo with minimization) protocol: 120 steps of small & shear moves
  Random perturbation of 5/10 torsion angles (2-3°)
  Side-chain optimization: rotamer trials (full repacking every 10 steps)
  Minimization
Steps 1-60: gradually ramp up the vdW repulsive term
Remaining steps: add side-chain minimization
Figure: vdW repulsive weight over the course of the protocol, with alternating backbone optimization (small backbone moves and MCM) and side-chain optimization (+ minimization)
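
The ramping of the repulsive term over the first 60 of 120 steps can be written as a simple schedule. The linear shape of the ramp and the starting weight of 0.02 are assumptions for illustration; the slide only states that the term is ramped up gradually.

```python
def repulsive_weight(step, ramp_steps=60, start=0.02, full=1.0):
    """Weight of the vdW repulsive term at a 1-based MCM step (linear ramp assumed)."""
    if step < 1:
        raise ValueError("steps are numbered from 1")
    if step >= ramp_steps:
        return full
    return start + (full - start) * (step - 1) / (ramp_steps - 1)


# Weight used at each of the 120 steps
schedule = [repulsive_weight(s) for s in range(1, 121)]
```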

First atom-resolution model
Target 0281, CASP6
Topology sampled by ab initio trajectory of homolog sequence (rmsd = 2.2 Å)
Full-atom refinement reduces rmsd to 1.5 Å
Side-chain packing accurately recovered

Atom-resolution Ab Initio (I)
Challenge: sample near-native conformations (< ~2.5 Å)
Approach: model a set of homologs → diverse population samples the basin of attraction
Example: exposed leucine
Figure panels: models starting from extended conformation; models starting from native conformation
Bradley et al (2005) Toward high-resolution de novo structure prediction. Science 309:1868

Low-resolution homolog folding improves prediction
Collect 50 homologs (psi-blast, 2 rounds; 60% non-redundant)
For each, create 2000 low-resolution models
Cluster, retain large clusters (n > 5), and select 500 models
Thread query sequence back onto ~20-30K models
Proceed to full-atom refinement: evaluate also homolog sequences (2 rounds of MCM protocol)

Atom-resolution Ab Initio (II)
Examples: Hox-B1, ubiquitin
Step 1: low-resolution models of homologs
Step 2: atom-resolution models
Energy-based model selection
Results: 11/16 proteins of length < 88 residues are modeled within < 5 Å

Sampling of β-sheet topologies
Fold-tree representation of protein allows tailored optimization

How can we improve? (1) More computer time
BOINC: donate idle time of many home computers for Rosetta runs
Tera = 10^12; a strong desktop: ~1 gigaflop (10^9)

More computer time – is sampling an issue?
Perform very long runs on the grid (> 10^6 decoys)
3 categories:
  (a) Near-native lowest energy model (< 3.5 Å) ✔
  (b) Problem with sampling (E near-native structures << E decoys)
  (c) Problem with energy function (E near-native structures > E decoys)
Why don't we sample these conformations in (b)?
Kim et al (2007) Sampling bottlenecks in de novo protein structure prediction. JMB 393:249
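
The three categories can be expressed as a small decision function. This is a sketch under stated assumptions: decoys are given as (energy, RMSD to native) pairs, and the energy of near-native structures is supplied separately (for example from refining the native structure); the function name and labels are hypothetical.

```python
def classify_run(decoys, near_native_energy, rmsd_cutoff=3.5):
    """Classify a large-scale run into category a, b or c.

    decoys: list of (energy, rmsd_to_native) tuples from de novo sampling.
    near_native_energy: energy reached by near-native structures.
    """
    lowest_energy, rmsd_of_lowest = min(decoys)  # compares energy first
    if rmsd_of_lowest < rmsd_cutoff:
        return "a: near-native lowest energy model"
    if near_native_energy < lowest_energy:
        return "b: sampling problem"
    return "c: energy function problem"


success = classify_run([(-100.0, 2.0), (-90.0, 8.0)], near_native_energy=-105.0)
sampling = classify_run([(-100.0, 8.0), (-90.0, 2.0)], near_native_energy=-120.0)
scoring = classify_run([(-100.0, 8.0)], near_native_energy=-80.0)
```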

“linchpin features” are rarely sampled
Describe models as feature vectors
Identify native features not sampled in low-energy models
Torsion bins (per residue position):
  A: right-handed helix
  B: right-handed strand
  G: left-handed helix
  E: left-handed strand
  O: ω = cis
Position 23 never samples the native helix conformation -> simulations never succeed
Enforcement of a native-like value for the feature -> some simulations now succeed
Figure labels: native torsion bin frequently sampled vs. never sampled
Kim et al (2007) Sampling bottlenecks in de novo protein structure prediction. JMB 393:249
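
A minimal sketch of this analysis: assign each residue to a torsion bin, then list the positions whose native bin never appears in any model. The bin boundaries below are assumptions chosen for illustration (they follow one common convention for A/B/E/G/O bins), and the bin strings in the example are made up.

```python
def torsion_bin(phi, psi, omega=180.0):
    """Assign a residue to bin A, B, E, G or O from its torsions in degrees."""
    if abs(omega) < 90.0:
        return "O"  # cis peptide bond
    if phi < 0.0:
        return "A" if -75.0 <= psi < 50.0 else "B"
    return "G" if -100.0 <= psi < 100.0 else "E"


def unsampled_native_features(native_bins, model_bins):
    """Return positions whose native bin is absent from every model."""
    missing = []
    for pos, native in enumerate(native_bins):
        if all(model[pos] != native for model in model_bins):
            missing.append(pos)
    return missing


# Hypothetical bin strings: one native and two low-energy models
native = "AABB"
models = ["AABA", "ABBE"]
never_sampled = unsampled_native_features(native, models)
```

Positions returned by the second function are candidates for linchpin features: enforcing the native bin there and rerunning the simulation tests whether the feature was the bottleneck.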

Examples for “linchpin features”
Near active site
Regions that form late in folding
Irregular β-strand pairing (mostly in edge strands)

How can we improve? More brains
FOLDIT: folding game; donate idle time of many brains to improve structure prediction
Now as Android application!
“Win the Nobel prize by just playing a game”
Look also for “black belt” Foldit lessons

Foldit Players: human spatial reasoning
Explore also strategy space: new search algorithms
Excel in solving problems where substantial backbone rearrangement is needed to bury hydrophobic residues
Challenge: formulate the problem as a game
  Easy to understand for non-scientists
  Competition/collaboration
Figure panels: native structure, starting structure, Foldit model
Cooper et al (2010) Predicting protein structures with a multiplayer online game. Nature 466:756

Example1: help in structure determination
Figure panels: solved structure, starting model
Figure 2. M-PMV retroviral protease structure improvement by the Foldit Contenders Group. (a) Progress of structure refinement over the first 16 d of game play. The x axis shows progression in time, and the y axis shows the Phaser log-likelihood (LLG) of each model in a near-native orientation. To identify a solution as correct by molecular replacement using Phaser, the model must have an LLG better than the best random models. The distribution of these best random predictions is indicated by the intensity of the pale blue band. (Because almost all the models are too poor to allow correct placement in the unit cell, Phaser LLGs are calculated after optimal superposition of each model onto the solved crystal structure and rigid-body optimization.) (b) Starting from a quite inaccurate NMR model (red), Foldit player spvincent generated a model (yellow) considerably more similar to the later determined crystal structure (blue) in the β-strand region. (c) Starting from spvincent’s model, Foldit player grabhorn generated a model (magenta) considerably closer to the crystal structure with notable improvement of side-chain conformations in the hydrophobic core. (d) Foldit player mimi made additional improvements (in the loop region at the top left) and generated a model (green) of sufficient accuracy to provide an unambiguous molecular replacement solution which allowed rapid determination of the ultimate crystal structure (blue).
LLG: log-likelihood of a model; useful models must have a better LLG than the best random models (shaded band)
Nature Structural & Molecular Biology 2011

Example 2: Foldit Puzzle #986875
Foldit detects better structures, using trajectories that visit high-energy structures on the way
Figure panels: native structure, starting structure, Foldit model
Cooper et al (2010) Predicting protein structures with a multiplayer online game. Nature 466:756

Algorithm discovery by Foldit players
Added ability to create, edit, share and rate “recipes” (each player can create their own “cookbook”)
Evaluated which strategies evolve and how they spread among players
Figure panels: main strategies used at different stages of modeling, for top players and for all players
Khatib et al (2012) Algorithm discovery by protein folding game players. PNAS 108:18949

Algorithm discovery by Foldit players
Many new recipes evolve from “Blue Fuse”
“Blue Fuse” and “Quake” are most popular
Khatib et al (2012) Algorithm discovery by protein folding game players. PNAS 108:18949

Foldit players detect algorithms that are similar to those used in Rosetta
Foldit “Blue Fuse”: very similar to the new Rosetta protocol “Fast Relax” (repeated decrease/increase of repulsive term)
Comparable efficiency for short runs
Khatib et al (2012) Algorithm discovery by protein folding game players. PNAS 108:18949

ab initio modeling – summary:
Rosetta:
  Highly accurate
  Computationally expensive (~150 CPU hours/protein)
  Server of Rosetta
Good alternative: I-Tasser
  Protocol developed by Zhang and Skolnick
  Based on threading of parts of sequence onto parts of known structures
  Very efficient and accurate (~5 CPU hours/protein)
  Server of iTasser
Roy, Kucukural, Zhang (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 5:725

I-Tasser: Iterative Threading Assembly Refinement (Zhang & Skolnick)
Separate training of protocol for easy/medium/hard targets

i-Tasser (Zhang & Skolnick)
Threading:
  Create profile: Psiblast -> sequence profile; Psipred -> secondary structure profile
  LOMETS: metaserver for threading (FUGUE, HHSEARCH, MUSTER, PROSPECT, PPA, SP3 & SPARKS)
  Excise aligned structure elements from top-scoring templates for next step (20/30/50, depending on difficulty of target)
Wu, Skolnick, Zhang (2007) Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology 5:17
Roy, Kucukural, Zhang (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 5:725

Tasser: Schematic representation of polypeptide chain in on- and off-lattice Cα model
Structure assembly, efficient modeling:
  2 points/residue (Cα + SG)
  On-lattice: ab initio for unaligned regions
  Off-lattice: for aligned regions
Figure caption: Schematic representation of a piece of polypeptide chain in the on- and off-lattice CAS model. Each residue is described by its Cα and side chain center of mass (SG). Whereas Cα values (white) of unaligned residues are confined to the underlying cubic lattice system with a lattice space of 0.87 Å, Cα values (yellow) of aligned residues are excised from templates and traced off-lattice. SG values (red) are always off-lattice and determined by using a two-rotamer approximation (9).
Zhang Y., Skolnick J. PNAS 2004;101: ©2004 by National Academy of Sciences

Monte Carlo Search by replica exchange
Exchange between simulations at different temperatures: better sampling
Scoring function: separately trained for easy, medium and hard targets
  Secondary structure (PSIPRED & SAM)
  Statistical terms: backbone hydrogen bonds; hydrophobicity and Cα/side chain correlations
  Spatial restraints from threading templates
  Sequence-based contact predictions (SVM) (and accessible surface area prediction; NN)
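
The exchange step between two temperatures follows the standard replica-exchange Metropolis criterion, sketched below. Units with the Boltzmann constant set to 1 are assumed, and the energies and temperatures in the example are made up; the specific temperature ladder used in I-Tasser is not shown.

```python
import math


def swap_probability(e_i, t_i, e_j, t_j):
    """Probability of exchanging the conformations of replicas i and j.

    Standard criterion: min(1, exp((1/T_i - 1/T_j) * (E_i - E_j))), with k_B = 1.
    """
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0
    return math.exp(delta)


# The cold replica (T = 1) holds the higher energy: the swap is always accepted
p_favorable = swap_probability(-50.0, 1.0, -80.0, 2.0)
# The cold replica already holds the lower energy: the swap is rarely accepted
p_unfavorable = swap_probability(-80.0, 1.0, -50.0, 2.0)
```

Swaps let a conformation trapped at low temperature escape through a high-temperature replica, which is the source of the better sampling noted above.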

i-Tasser (Zhang) Example for improvement over template
Constraints from threading and from contact prediction are located at different sites and complement each other

i-Tasser (Zhang) Clustering
Additional iteration of MC simulation starting from cluster centers
Final model created by optimizing hydrogen bonds

Contact-assisted structure prediction
Ab initio modeling is restricted to small (100 aa), single-domain proteins
+ information about contacts -> dramatic increase of scope (… 500 aa)
Info from:
  Contact prediction (bioinformatics)
  Experiments: e.g. NMR chemical shifts, mutagenesis, etc.
Contacts may assist in:
  Determination of topology: filter fragments; find fragment pairs
  Refinement of topology: refine structure by imposing constraints
Assessment on CASP10 of Rosetta ab initio modeling: on average one reliable non-local contact every 12 aa is needed for reliable modeling
Kim et al. (2013) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 82:208
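
Two small helpers illustrate how contact information might be used in practice: estimating how many contacts the "one per twelve residues" rule implies, and checking what fraction of a contact list a model satisfies. The 8 Å Cα-Cα cutoff, the function names and the coordinates are assumptions for illustration, not part of the published protocol.

```python
import math


def contacts_needed(n_residues, residues_per_contact=12):
    """Number of reliable non-local contacts implied by one contact per 12 residues."""
    return math.ceil(n_residues / residues_per_contact)


def satisfied_fraction(ca_coords, contacts, cutoff=8.0):
    """Fraction of contacts (i, j) whose C-alpha atoms lie within `cutoff` Å."""
    if not contacts:
        return 0.0
    hits = sum(
        math.dist(ca_coords[i], ca_coords[j]) <= cutoff for i, j in contacts
    )
    return hits / len(contacts)


# Hypothetical three-residue model and two predicted contacts
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
fraction = satisfied_fraction(coords, [(0, 1), (0, 2)])
needed_for_500 = contacts_needed(500)
```

A score like `satisfied_fraction` could serve either as a filter on decoys or as a restraint term during refinement.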

Flowchart of protocol
Topology determination: from partial threading (SPARKS, Rosetta)
Topology refinement: RosettaCM recombination protocol (next week)
Kim et al. (2013) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 82:208

Improved models for large structures using contacts
Figure panels: native, ab initio, assisted ab initio
Kim et al. (2013) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 82:208

CASP Identification of “winner strategies”:
Rosetta in CASP4-6
iTasser in CASP7 & CASP8
Servers improved combination of multiple templates in CASP9
CASP10: refinement with MD
CASP11: contact prediction methods & contact-assisted modeling
Double-blind structure prediction experiment allows assessment of different approaches every 2 years; summer 2014: CASP11
Steady improvement of methodology
Categories:
  Template-based modeling (TBM)
  Free modeling (FM)
  Refinement of initial models
  New: prediction of contacts, unstructured regions, ligand binding sites
Proteins special issue vol:82, S2

CASP Around 130 targets in last rounds
Until CASP9: target difficulty decreases
CASP10: 131 domains (20 free modeling)
Targets now more difficult than in previous CASPs
Proteins special issue vol:79, S10
Kryshtafovych et al. (2011) Proteins 79:S196–207

Measure of performance
Compare predicted to solved structure:
  Superimpose short fragments (length n = 3, 5, 7 residues; iteratively)
  Find maximal superimposed part N, where N Cα atom pairs are within x Å
  4 thresholds: x = 1.0, 2.0, 4.0, 8.0
GDT_TS = ¼ (N1 + N2 + N4 + N8)
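
The final averaging step of GDT_TS can be sketched as follows. This is a simplification: it assumes the per-residue Cα deviations come from a single given superposition, whereas the full procedure searches for the best superposition separately at each threshold. N is expressed as a percentage of target residues; the function name and distances are made up.

```python
def gdt_ts(distances, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within each distance threshold.

    distances: C-alpha deviation (in Å) for every residue of the target
    after superposition of the model onto the solved structure.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one residue")
    fractions = [sum(d <= t for d in distances) / n for t in thresholds]
    return 100.0 * sum(fractions) / len(thresholds)


# Hypothetical deviations for a five-residue target
score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
perfect = gdt_ts([0.0, 0.0, 0.0])
```

In this example one more residue falls within each successive threshold (20%, 40%, 60%, 80%), so the score is their average.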

CASP4 ab initio summary
18 newly solved structures, predicted prior to publication of the structure
None recognized by sequence similarity
None with close structural homologs
Rosetta predictions, independently assessed; scoring: 2 = “Well Above Average”, 1 = “okay”, 0 = “lousy”

Improvement over the years
Improvement in each round:
  CASP7: in difficult region
  CASP8: accuracy in template-based modeling (few difficult cases)
  CASP9: intermediate difficulty targets
  CASP10: refinement using MD (Michael Feig)

Free Modeling with Rosetta in CASP8

Server model: predicts kinked helix
Only model with 4 beta strands (most predictions: all-helical protein)
Figure panels: model, best template
Kinch et al. (2011) Proteins 79:S59–73

Rosetta in CASP8: modification of fragment size improves prediction
Longer fragments for alpha-helical proteins (5-19; 3-12)
Shorter fragments for beta sheets (4-10; 3-7)

FM with ITasser

Improved automatic servers
Increased contribution of automatic servers
Predictions of mostly similar quality
Improve now also difficult targets
Figure legend: human, server

CASP10: Foldit platform joins the game for coopetition
Start from Foldit models; proceed with different approaches
Joint forces produce best model

Summary: steady improvement of structure prediction over the years
Impressive quality of current ab initio modeling
Efficient combination of appropriate sampling strategies and a tailored energy function
Models now often better than template
Automatic servers now outperform also in FM
