2. Introduction to Rosetta and structural modeling (From Ora Schueler-Furman) Approaches for structural modeling of proteins The Rosetta framework and.


1 2. Introduction to Rosetta and structural modeling (From Ora Schueler-Furman)
– Approaches for structural modeling of proteins
– The Rosetta framework and its prediction modes
– Cartesian and polar coordinates
– Sampling (finding the structure) and scoring (selecting the structure)

2 Structural Modeling of Proteins - Approaches

3 Prediction of Structure from Sequence (flowchart)
– Comparison of query sequence to nr database
– Similar to a sequence of known structure? Yes: Homology Modeling (Comparative Modeling)
– No: Fold Recognition (Threading). Fits a known fold? Yes: threading model
– No: Ab initio prediction

4 The Rosetta framework and its prediction modes

5 A short history of Rosetta
– In the beginning: ab initio modeling of protein structure starting from sequence (e.g. ATCSFFGRKLL…..)
– Short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations
– Reliable fold identification for short proteins
– Recently improved to high-resolution models (within 2 Å RMSD)

6 A short history of Rosetta
Success of the ab initio protocol led to its extension to:
– Protein design
– Design of a new fold: TOP7
– Protein loop modeling; homology modeling
– Protein-protein docking; protein interface design
– Protein-ligand docking
– Protein-DNA interactions; RNA modeling
– Many more, e.g. solving the phase problem in X-ray crystallography

7 The Rosetta Strategy
Observation: local sequence preferences bias, but do not uniquely define, the local structure of a protein
Goal: mimic the interplay of local and global interactions that determine protein structure
– Local interactions: fragments derived from known structures (sampled for similar sequences/secondary structure propensity)
– Global (non-local) interactions: buried hydrophobic residues, paired β strands, specific side chain interactions, etc.

8 The Rosetta Strategy Local interactions – fragments – Fragment library representing accessible local structures for all short sequences in a protein chain, derived from known structures Global (non-local) interactions – scoring function – Derived from conformational statistics of known structures

9 Scoring and Sampling

10 The basic assumption in structure prediction
Native structure is located in the global minimum (free) energy conformation (GMEC)
➜ A good energy function can select the correct model among decoys
➜ A good sampling technique can find the GMEC in the rugged landscape
[Figure: energy E plotted over conformation space, with the GMEC marked]

11 Two-Step Procedure
1. Low-resolution step locates potential minima (fast)
2. Cluster analysis identifies broadest basins in landscape
3. High-resolution step can identify lowest energy minimum in the basins (slow)
[Figure: energy E plotted over conformation space, with the GMEC marked]

12 Low-Resolution Step (fast)
Structure representation:
– Equilibrium bonds and angles (Engh & Huber 1991)
– Centroid: average location of the center of mass of the side chain (Centroid | aa, φ, ψ)
– No modeling of side chains

13 Low-Resolution Scoring Function
Bayes Theorem: independent components prevent over-counting
P(str | seq) = P(str) * P(seq | str) / P(seq)
– P(seq): constant
– P(seq | str): sequence-dependent features
– P(str): structure-dependent features
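Taking the negative logarithm of the Bayes expression turns the product into a sum, which fits the way the score is written as a sum of terms on the following slides. A short derivation, assuming each score term is read as a negative log-probability (a convention assumed here, not stated on the slide):

```latex
\begin{align}
P(\mathrm{str}\mid\mathrm{seq}) &= \frac{P(\mathrm{str})\,P(\mathrm{seq}\mid\mathrm{str})}{P(\mathrm{seq})} \\
-\ln P(\mathrm{str}\mid\mathrm{seq}) &= -\ln P(\mathrm{str}) \;-\; \ln P(\mathrm{seq}\mid\mathrm{str}) \;+\; \ln P(\mathrm{seq})
\end{align}
```

For a fixed sequence, ln P(seq) is the same for every candidate structure, so it does not affect the ranking of models and can be dropped.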

14 Sequence-Dependent Components: P(seq | str)
Bayes Theorem: P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = S_env + S_pair + …
neighbors: Cβ-Cβ < 10 Å
Rohl et al. (2004) Methods in Enzymology 383:66. Origin: Simons et al., JMB 1997; Simons et al., Proteins 1999

15 Structure-Dependent Components: P(str)
P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = … + S_rg + S_cβ + S_vdw + …

16 Structure-Dependent Components: P(str)
P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = … + S_ss + …

17 Structure-Dependent Components: P(str)
P(str | seq) = P(str) * P(seq | str) / P(seq)
Score = … + S_sheet + S_hs + … + S_rama

18 High-Resolution Step
Slow, exact step; locates global energy minimum
Structure representation:
– All-atom (including polar and non-polar hydrogens, but no water)
– Side chains as rotamers from backbone-dependent library (Dunbrack 1997)
– Side chain conformation adjusted frequently

19 High-Resolution Step: Rotamer Libraries
– Side chains have preferred conformations
– They are summarized in rotamer libraries
– Select one rotamer for each position
– Best conformation: lowest-energy combination of rotamers
Serine χ1 preferences: t = 180°, g− = −60°, g+ = +60°
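"Lowest-energy combination of rotamers" can be illustrated with a brute-force search over a toy problem. This is only a sketch: the two positions, the energy tables and the clash rule are hypothetical values invented for illustration, and exhaustive enumeration is practical only at toy sizes.

```python
from itertools import product

# Hypothetical chi1 rotamers (degrees) and self-energies per position.
SELF_ENERGY = {
    "pos1": {180.0: 0.5, -60.0: 0.0, 60.0: 1.0},
    "pos2": {180.0: 0.2, -60.0: 0.0, 60.0: 0.8},
}

def pair_energy(chi_a, chi_b):
    """Hypothetical pair term: two g- rotamers clash with each other."""
    return 2.0 if chi_a == -60.0 and chi_b == -60.0 else 0.0

def best_rotamer_combination(self_energy, pair_term):
    """Enumerate one rotamer per position and keep the lowest total energy."""
    positions = sorted(self_energy)
    best_combo, best_energy = None, float("inf")
    for combo in product(*(self_energy[p] for p in positions)):
        energy = sum(self_energy[p][chi] for p, chi in zip(positions, combo))
        for i in range(len(combo)):
            for j in range(i + 1, len(combo)):
                energy += pair_term(combo[i], combo[j])
        if energy < best_energy:
            best_combo, best_energy = dict(zip(positions, combo)), energy
    return best_combo, best_energy

combo, energy = best_rotamer_combination(SELF_ENERGY, pair_energy)
print(combo, energy)
```

Note that the best combination is not simply the best rotamer at each position taken separately: the pair term rules out choosing g− at both positions.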

20 High-Resolution Scoring Function Major contributions: – Burial of hydrophobic groups away from water – Void-free packing of buried groups and atoms – Buried polar atoms form intra-molecular hydrogen bonds

21 High-Resolution Scoring Function: Packing interactions
Score = S_LJ(atr + rep) + ….
– Linearized repulsive part
– e: well depth from CHARMm19
[Figure: Lennard-Jones energy as a function of the interatomic distance r_ij]
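A minimal sketch of a Lennard-Jones term with a linearized repulsive part. The 12-6 form, the parameter names and the switch point (0.6 of the optimal distance) are assumptions chosen for illustration; the slide only states that the repulsive part is linearized.

```python
def lj_linearized(r, sigma, well_depth, switch=0.6):
    """12-6 Lennard-Jones energy with its minimum (-well_depth) at r = sigma.

    Below r = switch * sigma the steep repulsive wall is replaced by a
    straight line with the same value and slope at the switch point.
    `switch` is an illustrative assumption, not a value from the slides.
    """
    def lj(x):
        return well_depth * (x ** -12 - 2.0 * x ** -6)

    x = r / sigma
    if x >= switch:
        return lj(x)
    # Slope of lj(x) with respect to x, evaluated at the switch point.
    slope = well_depth * (-12.0 * switch ** -13 + 12.0 * switch ** -7)
    return lj(switch) + slope * (x - switch)

for r in (1.5, 2.4, 4.0, 6.0):
    print(r, lj_linearized(r, sigma=4.0, well_depth=0.1))
```

The linear ramp keeps close contacts unfavorable without letting the energy explode, so small clashes in an otherwise good model do not dominate the total score.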

22 High-Resolution Scoring Function: Implicit solvation
Score = … + S_solvation + ….
Lazaridis & Karplus, Proteins 1999
[Equation: solvation free energy density of atom i, written with Gaussian terms in x_ij² and x_ji², where x_ij = (r_ij − R_i)/λ_i]

23 High-Resolution Scoring Function: Hydrogen Bonds (original function)
Score = …. + S_hb(srbb+lrbb+sc) + ….
– sr bb: short range, backbone HB
– lr bb: long range, backbone HB
– sc: HB with side chain atom
[Figure: N-H···O=C hydrogen-bond geometry, described by the distance d and angular parameters] (Kortemme, 2003; Morozov 2004)

24 Hydrogen Bonding Energy
Based on statistics from high-resolution structures in the Protein Data Bank (rcsb.org) (Kortemme, Morozov & Baker 2003 JMB)
Slide from Jeff Gray

25 High-Resolution Scoring Function: Rotamer preference
Score = … + S_dunbrack + …. (Dunbrack, 1997)

26 Scoring Function: Summary
One long, generic function ….
Low resolution: Score = S env + S pair + S rg + S cβ + S vdw + S ss + S sheet + S hs + S rama + S hb (srbb + lrbb) + docking_score + S disulf_cent + S r  + S co + S contact_prediction + S dipolar + S projection + S pc + S tether + S  + S  + S symmetry + S splicemsd + …..
docking_score = S d env + S d pair + S d contact + S d vdw + S d site constr + S d + S fab score
High resolution: Score = S LJ(atr + rep) + S solvation + S hb(srbb+lrbb+sc) + S dunbrack + S pair – S ref + S prob1b + S intrares + S gb_elec + S gsolt + S h2o (solv + hb) + S _plane

27 Representations of protein structure: Cartesian and polar coordinates
Polar (torsion angles):
Position   PHI     PSI      OMEGA     CHI1     CHI2   CHI3   CHI4
1          0.00    -60.00   -180.00   -60.00   0.00   0.00   0.00
2
3
….
Cartesian (PDB; x y z):
ATOM  490  N   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  CA  GLN A  31   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  C   GLN A  31   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLN A  31   51.015  -89.601  -11.275  1.00   9.63  O
…..

28 2 ways to represent the protein structure
Cartesian coordinates (x,y,z; pdb format)
+ Intuitive: look at molecules in space
+ Easy calculation of energy score (based on atom-atom distances)
– Difficult to change conformation of structure (while keeping bond length and bond angle unchanged)
Polar coordinates (φ, ψ, ω; equilibrium angles and bond lengths)
+ Compact (3 values/residue)
+ Easy changes of protein structure (turn around one or more dihedral angles)
– Non-intuitive
– Difficult to evaluate energy score (calculation of neighboring matrix complicated)

29 A snake in the 2D world
Cartesian representation:
– points: (0,0),(1,1),(1,2),(2,2),(3,3)
– connections (predefined): 1-2,2-3,3-4,4-5
[Figure: the snake drawn in the x-y plane, points numbered 1 to 5]

30 A snake in the 2D world
Internal coordinates:
– bond lengths (predefined): √2,1,1,√2
– angles: 45°,90°,0°,45°
[Figure: the snake in the x-y plane with bond lengths and angles marked; polar-coordinate illustration from Wikipedia]

31 A snake wiggling in the 2D world
Constraint: keep bond length fixed
Move in Cartesian representation: (0,0),(1,1),(1,2),(2,2),(3,3) → (0,0),(1,1),(1,2),(2,2),(3,0)
Bond length changed! The last bond grows from √2 to √5
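The effect of the Cartesian move can be checked numerically. A small sketch (the helper name `bond_lengths` is introduced here for illustration): moving the last point from (3,3) to (3,0) changes the last bond from √2 to √5, because (2,2) and (3,0) are 1 apart in x and 2 apart in y.

```python
import math

def bond_lengths(points):
    """Distances between consecutive points of the chain."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]

before = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)]
after = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 0)]

print(bond_lengths(before))  # last bond is sqrt(2)
print(bond_lengths(after))   # last bond is sqrt(5)
```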

32 A snake wiggling in the 2D world
Constraint: keep bond length fixed
Move in polar coordinates: 45°,90°,0°,45° → 45°,90°,45°,45°
Bond length unchanged! Large impact on structure

33 Polar → Cartesian coordinates
Convert r and θ to x and y
√2,1,1,√2 and 45°,90°,0°,45° → (0,0),(1,1),(1,2),(2,2),(3,3)
(figure from Wikipedia)
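The conversion from bond lengths and angles to points can be sketched as follows. The function name is hypothetical, and the angles are read as each bond's direction measured from the x axis, which is the reading that reproduces the snake's coordinates.

```python
import math

def build_chain(lengths, angles_deg, origin=(0.0, 0.0)):
    """Rebuild 2D points from bond lengths and bond directions (degrees)."""
    points = [origin]
    for length, angle in zip(lengths, angles_deg):
        x, y = points[-1]
        t = math.radians(angle)
        points.append((x + length * math.cos(t), y + length * math.sin(t)))
    return points

lengths = [math.sqrt(2), 1.0, 1.0, math.sqrt(2)]
snake = build_chain(lengths, [45, 90, 0, 45])
moved = build_chain(lengths, [45, 90, 45, 45])  # change one angle only
print(snake)
print(moved)
```

Because the chain is rebuilt bond by bond, changing one angle moves every point downstream of it while all bond lengths stay as given.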

34 Cartesian → polar coordinates
Convert x and y to r and θ
(0,0),(1,1),(1,2),(2,2),(3,3) → √2,1,1,√2 and 45°,90°,0°,45°
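The reverse conversion, from points to bond lengths and angles, can be sketched in the same way. Again the function name is hypothetical and each angle is taken as the bond's direction measured from the x axis.

```python
import math

def to_internal(points):
    """Return (bond lengths, bond directions in degrees) for a 2D chain."""
    lengths, angles = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        lengths.append(math.hypot(dx, dy))
        angles.append(math.degrees(math.atan2(dy, dx)))
    return lengths, angles

lengths, angles = to_internal([(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)])
print(lengths)
print(angles)
```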

35 Moving the snake to the 3D world
Cartesian representation:
– points: additional z-axis (0,0,0),(1,1,0),(1,2,0),(2,2,0),(3,3,0)
– connections (predefined): 1-2,2-3,3-4,4-5
Internal coordinates:
– bond lengths (predefined): √2,1,1,√2
– angles: 45°,90°,0°,45°
– dihedral angles: 180°,180°
Proteins: bond lengths and angles fixed. Only dihedral angles are varied

36 Dihedral angles
Dihedral angle: defines geometry of 4 consecutive atoms (given bond lengths and angles)
Dihedral angles χ1-χ4 define side chain
(figure from Wikipedia)

37 What we learned from our snake
Cartesian representation: easy to look at, difficult to move
– Moves do not preserve bond length (and angles in 3D)
Internal coordinates: easy to move, difficult to see
– Calculation of distances between points not trivial
Proteins: bond lengths and angles fixed. Only dihedral angles are varied

38 Solution: toggle
CALCULATE ENERGY - Cartesian coordinates: derive distance matrix (neighbor list) for energy score calculation
ATOM  490  N   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  CA  GLN A  31   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  C   GLN A  31   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLN A  31   51.015  -89.601  -11.275  1.00   9.63  O
…..
Transform: calculate dihedral angles from coordinates
MOVE STRUCTURE - Polar coordinates: introduce changes in structure by rotating around dihedral angle(s) (change the dihedral values)
Position   PHI     PSI      OMEGA     CHI1     CHI2   CHI3   CHI4
1          0.00    -60.00   -180.00   -60.00   0.00   0.00   0.00
2
3
….
Transform: build positions in space according to dihedral angles
Snake example: (0,0),(1,1),(1,2),(2,2),(3,3) ↔ 45°,90°,0°,45°

39 Cartesian → polar coordinates
ATOM  490  C   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  N   GLY A  32   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  CA  GLY A  32   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLY A  32   51.015  -89.601  -11.275  1.00   9.63  O
…..
Position   PHI      PSI      OMEGA     CHI1   CHI2   CHI3   CHI4
32         -59.00   -60.00   -180.00   0.00   0.00   0.00   0.00
33
34
….
How to calculate polar from Cartesian coordinates: example φ: C’-N-Ca-C
– define plane perpendicular to N-Ca (b2) vector
– calculate projection of Ca-C (b3) and C’-N (b1) onto plane
– calculate angle between projections
Snake example: (0,0),(1,1),(1,2),(2,2),(3,3) → 45°,90°,0°,45°
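The three steps for φ can be sketched in plain Python. This is an illustrative sketch, not Rosetta code: the function and helper names are hypothetical, and the sign convention (trans = 180°, cis = 0°, positive for a clockwise turn when looking from the second atom to the third) is the usual one, assumed here.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _project(v, axis):
    """Component of v lying in the plane perpendicular to the unit axis."""
    d = _dot(v, axis)
    return tuple(x - d * a for x, a in zip(v, axis))

def dihedral(p1, p2, p3, p4):
    """Dihedral angle in degrees for atoms p1-p2-p3-p4 (e.g. C', N, Ca, C)."""
    b2 = _sub(p3, p2)                      # central bond, N -> Ca for phi
    norm = math.sqrt(_dot(b2, b2))
    axis = tuple(x / norm for x in b2)
    v = _project(_sub(p1, p2), axis)       # projection of N -> C'
    w = _project(_sub(p4, p3), axis)       # projection of Ca -> C
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))

# First four points of the planar 3D snake: a trans arrangement.
print(dihedral((0, 0, 0), (1, 1, 0), (1, 2, 0), (2, 2, 0)))
```

Using `atan2` on the two projections gives a signed angle over the full −180° to 180° range, which a plain arccosine of the dot product would not.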

40 Polar → Cartesian coordinates
Position   PHI      PSI      OMEGA     CHI1   CHI2   CHI3   CHI4
32         -59.00   -60.00   -180.00   0.00   0.00   0.00   0.00
33
34
….
ATOM  490  C   GLN A  31   52.013  -87.359   -8.797  1.00   7.06  N
ATOM  491  N   GLY A  32   52.134  -87.762  -10.201  1.00   8.67  C
ATOM  492  CA  GLY A  32   51.726  -89.222  -10.343  1.00  10.90  C
ATOM  493  O   GLY A  32   51.015  -89.601  -11.275  1.00   9.63  O
…..
Find x,y,z coordinates of C, based on atom positions of C’, N and Ca, and a given φ value (φ: C’-N-Ca-C)
Create Ca-C vector:
– size Ca-C = 1.51 Å (equilibrium bond length)
– angle N-Ca-C = 111° (equilibrium value for N-Ca-C angle)
Rotate vector around N-Ca axis to obtain projections of Ca-C and N-C’ with wanted φ
Snake example: 45°,90°,0°,45° → (0,0),(1,1),(1,2),(2,2),(3,3)

41 Representation of protein structure
Rosetta folding: 3 backbone dihedral angles per residue
– Sampling and minimization in TORSIONAL space: change angle and rebuild, starting from changed angle
– Build coordinates of structure starting from first atom, according to dihedral angles (and equilibrium bond length and angle)
[Figure: chain of eight numbered residues]
Based on slides by Chu Wang

42 Representation of protein structure
Rosetta folding: 3 backbone dihedral angles per residue; sampling and minimization in TORSIONAL space
Rosetta docking: backbone dihedral angles fixed (rigid-body); 6 rigid-body DOFs (3 translational, 3 rotational); sampling and minimization in RIGID-BODY space
How can those two types of degrees of freedom be combined?
[Figure: two chains of eight residues each, numbered 1-8 and 1’-8’]

43 Fold tree representation
Fold tree: Bradley and Baker, Proteins (2006)
– “peptide” edge: 3 backbone dihedral angles
– “long-range” edge: 6 rigid-body DOFs
– Originally developed to improve sampling of strand registers in β-sheet proteins.
– Allows simultaneous optimization of rigid-body and backbone/sidechain torsional degrees of freedom.
– Construct fold-trees to treat a variety of protein folding and docking problems.
Example: fold-tree based docking
[Figure: two chains, residues 1-8 and 1’-8’, connected by one long-range edge]
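A fold tree for two-chain docking can be sketched as a list of edges. Everything here is a hypothetical illustration rather than the Rosetta API: the `Edge` class, the use of -1 as the peptide-edge label, and the residue numbering (chain B numbered directly after chain A) are assumptions.

```python
from dataclasses import dataclass

PEPTIDE = -1  # label assumed here for "peptide" edges; jumps get 1, 2, ...

@dataclass(frozen=True)
class Edge:
    start: int
    stop: int
    label: int

def docking_fold_tree(n_a, n_b, anchor_a, anchor_b):
    """Edges for two chains joined by one long-range edge (jump).

    Chain A is residues 1..n_a, chain B is n_a+1..n_a+n_b. Peptide edges
    run outward from each anchor residue to the chain ends.
    """
    edges = []
    for stop in (1, n_a):
        if stop != anchor_a:
            edges.append(Edge(anchor_a, stop, PEPTIDE))
    edges.append(Edge(anchor_a, anchor_b, 1))
    for stop in (n_a + 1, n_a + n_b):
        if stop != anchor_b:
            edges.append(Edge(anchor_b, stop, PEPTIDE))
    return edges

def count_dofs(edges, n_residues):
    """3 backbone dihedrals per residue plus 6 rigid-body DOFs per jump."""
    jumps = sum(1 for e in edges if e.label > 0)
    return 3 * n_residues + 6 * jumps

tree = docking_fold_tree(8, 8, anchor_a=4, anchor_b=12)
print(tree)
print(count_dofs(tree, 16))
```

Keeping both edge types in one structure is what lets a single minimization treat backbone torsions and the rigid-body placement of the partner together.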

44 Fold-trees for different modeling tasks: protein folding
Legend: N: N-terminal; C: C-terminal; X: chain break; O: root of the tree; flexible “peptide” edge; rigid “peptide” edge; rigid “jump”; flexible “jump”; Color – flexible bb; Gray – fixed bb

45 Fold-trees for different modeling tasks: loop modeling
Legend as on slide 44

46 Fold-trees for different modeling tasks: fully flexible docking; docking w/ hinge motion; docking w/ loop modeling
Legend as on slide 44

47 Fold-trees for different modeling tasks
Color – flexible bb; Gray – fixed bb; Pale – symmetry operation

48 Fold-trees for different modeling tasks
Color – flexible bb; Gray – fixed bb; Filled colored circles – flexible sc

49 Fold-trees for different modeling tasks
Color – flexible bb; Gray – fixed bb; Filled colored circles – flexible sc; Empty colored circles – flexible amino acid: design

50 Fold-trees for different modeling tasks
Color – flexible bb; Gray – fixed bb; Filled colored circles – flexible sc; Empty colored circles – flexible amino acid: design

51 The Rosetta sampling strategy: a general overview
– 9 residue fragments
– 3 residue fragments
– Gradual addition of parameters to scoring function
– Quick quenching
– Fragment Sampling
– Strategies to keep fragment insertion/perturbation local
– Monte Carlo (MC) Sampling
– MC sampling with minimization
– Local optimization
– Repacking and refinement
– Side chain rearrangement
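Fragment insertion combined with Monte Carlo sampling can be sketched on a toy problem. This is an illustrative sketch under loud assumptions, not the Rosetta protocol: one torsion value per residue, a made-up three-entry fragment library of 3-residue fragments, a made-up score (squared deviation from −60°), a fixed temperature, and the standard Metropolis acceptance rule.

```python
import math
import random

def metropolis_accept(delta, temperature, rng):
    """Always accept downhill moves; accept uphill ones with exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)

def toy_score(torsions, target=-60.0):
    """Hypothetical score: squared deviation from a target torsion value."""
    return sum((t - target) ** 2 for t in torsions)

def assemble(n_res, fragments, score, steps, temperature, rng):
    """Start extended (180 degrees) and insert random fragments at random sites."""
    current = [180.0] * n_res
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        frag = rng.choice(fragments)
        start = rng.randrange(n_res - len(frag) + 1)
        trial = current[:start] + list(frag) + current[start + len(frag):]
        trial_score = score(trial)
        if metropolis_accept(trial_score - current_score, temperature, rng):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score

# Hypothetical 3-residue fragments (one torsion value per residue).
FRAGMENTS = [(-60.0, -60.0, -60.0), (-120.0, -120.0, -120.0), (60.0, 60.0, 60.0)]
best, best_score = assemble(9, FRAGMENTS, toy_score, 200, 100.0, random.Random(0))
print(best, best_score)
```

Each move replaces a whole window of torsions with values that occur together in a fragment, so local structure stays plausible while the score decides which combinations survive.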

