Chain Growing Using Statistical Energy Functions
David A. O'Brien, Balasubramanian Krishnamoorthy, Jack Snoeyink, Alex Tropsha
UNC Chapel Hill


1 Chain Growing Using Statistical Energy Functions
David A. O'Brien, Balasubramanian Krishnamoorthy, Jack Snoeyink, Alex Tropsha, Andrew Leaver-Fey, Shuquan Zong
UNC Chapel Hill

2 Overview
• Lattice Chain Growth Algorithm
• Statistical Energy Functions
  - 2-body Miyazawa-Jernigan Potential
  - 4-body Potential
  - Local Shape Potential
• Results
  - Chains
  - Identifying Good Decoys
• Current Work
  - New Scoring Functions
  - Incremental Tetrahedralization
• Future Work

3 Chain Growing - Introduction
• Lattice chain growing goals:
  - Test measures of proteins.
  - Build protein chains that maximize a given measure.
  - If these chains appear native-like, this confirms that the measure is valid.
• Predict protein structures from sequence information alone (ab initio).
  - Develop an algorithm that builds 3D folded protein decoys from the sequence that are similar to the native structure.
  - Evaluate these decoys and determine which are native-like. In short, pick the most native-like structure from the large set of generated decoys.

4 Lattice Chain Growth Algorithm
• Cubic lattice (311) with 24 possible moves {(3,1,1),(3,1,-1),…,(-3,1,1)}.
• Generate a chain configuration by sequential addition of links until the full length of the chain is reached.
• New links cannot be placed in the zone of exclusion of other links and must satisfy angle constraints.
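The 24 moves are the signed permutations of (3,1,1): three positions for the 3, times eight sign patterns. A minimal sketch that enumerates them; the function name `lattice_moves` is illustrative and not from the slides.

```python
from itertools import permutations, product


def lattice_moves(base=(3, 1, 1)):
    """Return every distinct signed permutation of `base` as a move vector."""
    moves = set()
    for perm in set(permutations(base)):          # (3,1,1), (1,3,1), (1,1,3)
        for signs in product((1, -1), repeat=3):  # 8 sign patterns
            moves.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(moves)


MOVES = lattice_moves()
print(len(MOVES))  # 24
```

Every move has the same squared length (3² + 1² + 1² = 11), so all links in a chain built from these moves are equally long.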

5 Lattice Chain Growth Algorithm: Adding a New Link
• Generate a set of possible open lattice nodes.
• For each, calculate a temperature-dependent transition probability.
• Choose one of these open lattice nodes with a Monte Carlo step.
• Variations include looking 2 steps ahead or building from the middle of the chain.
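The growth loop can be sketched as follows. This is a simplified illustration, not the authors' implementation: the zone of exclusion is reduced to plain self-avoidance (no node is used twice), angle constraints are omitted, and the `choose` callback stands in for the Monte Carlo step. All function names are hypothetical.

```python
import random
from itertools import permutations, product

# The 24 signed permutations of (3,1,1).
MOVES = sorted({tuple(s * c for s, c in zip(signs, perm))
                for perm in permutations((3, 1, 1))
                for signs in product((1, -1), repeat=3)})


def open_nodes(chain, occupied):
    """Lattice nodes reachable from the chain end that are not yet occupied."""
    x, y, z = chain[-1]
    candidates = [(x + dx, y + dy, z + dz) for dx, dy, dz in MOVES]
    return [node for node in candidates if node not in occupied]


def grow_chain(length, choose, max_restarts=100):
    """Grow a self-avoiding chain by sequential addition of links.

    `choose(chain, candidates)` picks the next node; restart on a dead end.
    """
    for _ in range(max_restarts):
        chain = [(0, 0, 0)]
        occupied = {chain[0]}
        while len(chain) < length:
            candidates = open_nodes(chain, occupied)
            if not candidates:
                break  # dead end: abandon this chain and restart
            node = choose(chain, candidates)
            chain.append(node)
            occupied.add(node)
        if len(chain) == length:
            return chain
    raise RuntimeError("no complete chain found")


# Uniform random choice stands in for the energy-based Monte Carlo step.
chain = grow_chain(30, lambda chain, candidates: random.choice(candidates))
```

Replacing the uniform `choose` with an energy-weighted choice gives the temperature-dependent variant described on the slide.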

6 Temperature-Dependent Transition Probability
• Probability at step i of picking configuration x' from the candidates x_1 … x_C, in terms of:
  - T = temperature
  - k_B = Boltzmann constant
  - E = energy (lower is better)
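A minimal sketch of such a probability, assuming the standard Boltzmann form p(x') = exp(-E(x')/k_B T) / Σ_c exp(-E(x_c)/k_B T) over the C candidates. The exact expression used by the authors is an assumption here, as are the function names and the reduced units (k_B = 1).

```python
import math
import random

K_B = 1.0  # assumption: reduced units, so temperature is in energy units


def transition_probabilities(energies, temperature, k_b=K_B):
    """Boltzmann weights over candidate energies, normalized to sum to 1."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / (k_b * temperature)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def pick_candidate(energies, temperature, rng=random):
    """Monte Carlo step: draw one candidate index with Boltzmann probability."""
    probs = transition_probabilities(energies, temperature)
    return rng.choices(range(len(energies)), weights=probs, k=1)[0]
```

At low temperature the lowest-energy candidate dominates; at high temperature the choice approaches uniform, which keeps the search from committing too early.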

8 Statistical Energy Functions
• Statistical energy functions assume that "contact" energies between amino acid residues in native proteins are related to their observed frequency in a representative structural database.
• If a potential configuration (decoy) has a set of nearby residues that is common in nature, give it a good score.
• The score for the entire protein is the sum of all contact energies.
• We use three statistical energy functions:
  - 2-body Miyazawa-Jernigan Potential
  - 4-body Potential
  - Local Shape Potential

9 Statistical Energy Functions Overview
• Global vs. local
  - Global: measures the entire protein (or a partial fragment).
  - Local: measures just a short sequence of consecutive residues.
• 2-body Miyazawa-Jernigan
  - Easy to calculate
  - Can be global or local
• 4-body Potential
  - Expensive to calculate
  - Works better as a global measure
  - Good for determining native-like folded structures
• Local Shape Potential
  - Easy to calculate
  - Defined as a local measure
  - Global measure?

11 Two-Body Statistical Energy Function
• For two-body potentials, the actual ε_ij values are taken from the Miyazawa-Jernigan matrix as reevaluated in 1996.
Miyazawa S, Jernigan RL. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 1996;256:623-644.
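A two-body contact energy of this kind can be sketched as a sum of pair energies over residues in contact. Everything specific below is an assumption for illustration: the contact rule (distance below `cutoff`, at least `min_separation` apart in sequence), the cutoff value, and the three-entry `EPS` table, whose numbers are made up and are not the published Miyazawa-Jernigan values.

```python
import math

# Hypothetical pair energies for illustration only; NOT the published MJ matrix.
EPS = {("A", "A"): -0.1, ("A", "L"): -0.5, ("L", "L"): -1.0}


def two_body_energy(sequence, coords, eps, cutoff=6.5, min_separation=2):
    """Sum pair energies over residue pairs whose distance is within `cutoff`."""
    total = 0.0
    n = len(sequence)
    for i in range(n):
        for j in range(i + min_separation, n):  # skip chain neighbors
            if math.dist(coords[i], coords[j]) <= cutoff:
                key = tuple(sorted((sequence[i], sequence[j])))
                total += eps[key]
    return total


coords = [(0, 0, 0), (3, 1, 1), (4, 0, 0)]
energy = two_body_energy("ALA", coords, EPS)  # only the A-A pair (0, 2) counts
```

Because the score is a plain sum over pairs, it can be evaluated on a partial chain during growth as well as on a finished decoy.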

13 Four-Body Statistical Energy Function
• Calculates the energy based on sets of four nearby residues (quads).
• Quads are calculated from the Delaunay tessellation.
• The four vertices of each tetrahedron define a quad.
• Each quad is given a statistical score.
[Figure: convex hull formed by the tetrahedral edges; each tetrahedron corresponds to a cluster of four residues]

14 Four-Body Statistical Energy Function - Overview
• Four-body potential is written.
• A training set of 1166 proteins was tessellated.
• The frequency of each quad type is counted.
• Each quad is typed in two ways:
  - by the combination of the four residue types {i,j,k,l}
  - by the number of consecutively appearing residues (α)
[Figure: the five α types, labeled 25.5%, 35.6%, 11.4%, 22.1%, 5.4%]

15 Four-Body Statistical Energy Function - Classifying Quadruplets
• Denote each quad by {i,j,k,l}.
• i, j, k, and l can be any of the 20 amino acids (L20).
  - e.g. AALV, TLKM, TTLK, YYYY
  - 8855 possible combinations
• Or the 20 amino acids can be grouped into just 6 types (L6).
  - Groups are defined by chemical properties of the amino acids.
  - 126 possible combinations
  - c = {cysteine}
  - f = {phenylalanine, tyrosine, tryptophan}
  - h = {histidine, arginine, lysine}
  - n = {asparagine, aspartic acid, glutamine, glutamic acid}
  - s = {serine, threonine, proline, alanine, glycine}
  - v = {methionine, isoleucine, leucine, valine}

16 Four-Body Statistical Energy Function - Classifying Quadruplets
• L20 case:
  - 5 α-types x 8855 combinations = 44,275 quad types
  - Not all quad types are observed in the training set.
  - The potential of unobserved types is set to some fraction of the lowest score for a represented quad type.
• L6 case:
  - 5 α-types x 126 combinations = 630 quad types
  - All but a few quad types are observed in the training set.
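The counts follow from treating a quad as an unordered multiset of four residue types: C(23,4) = 8855 for 20 types and C(9,4) = 126 for 6 types. The sketch below checks these numbers and shows one way to type a quad. The `alpha_type` numbering (0 = no consecutive residues, 1 = one consecutive pair, 2 = two separate pairs, 3 = three consecutive, 4 = four consecutive) is an assumed convention; the slides only state that there are five types and that type 0 is the non-bonded one.

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 one-letter codes
L6_GROUPS = {"c": "C", "f": "FYW", "h": "HRK",
             "n": "NDQE", "s": "STPAG", "v": "MILV"}
TO_L6 = {aa: group for group, members in L6_GROUPS.items() for aa in members}


def quad_key(residues, alphabet_map=None):
    """Order-independent key for a quad, optionally reduced to the L6 alphabet."""
    if alphabet_map is not None:
        residues = [alphabet_map[r] for r in residues]
    return "".join(sorted(residues))


def alpha_type(indices):
    """Type a quad by sequence adjacency of its four residue indices (assumed numbering)."""
    a, b, c, d = sorted(indices)
    bonds = [b - a == 1, c - b == 1, d - c == 1]
    n_bonds = sum(bonds)
    if n_bonds == 0:
        return 0                      # no consecutive residues (non-bonded)
    if n_bonds == 1:
        return 1                      # one consecutive pair
    if n_bonds == 3:
        return 4                      # four consecutive residues
    return 2 if bonds == [True, False, True] else 3  # two pairs vs. one triple


n_l20 = sum(1 for _ in combinations_with_replacement(AMINO_ACIDS, 4))
n_l6 = sum(1 for _ in combinations_with_replacement(L6_GROUPS, 4))
print(n_l20, 5 * n_l20)  # 8855 44275
print(n_l6, 5 * n_l6)    # 126 630
```

For example, `quad_key("TLKM", TO_L6)` reduces the L20 quad TLKM to the L6 key `hsvv`.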

17 Four-Body Statistical Energy Function - Formulation
• The formulation is an extension of the previous 2-body formula.
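As a rough illustration of how quad frequencies can be turned into scores, the sketch below uses a log-likelihood ratio of observed to expected frequency, a form commonly used for Delaunay-based four-body scores. It is an assumption, not necessarily the formula on this slide: the expected frequency is taken as the product of the individual residue-type frequencies times the number of distinct orderings of the quad, and all names are hypothetical.

```python
import math
from collections import Counter


def multiplicity(quad):
    """Number of distinct orderings of a quad, 4! / (t_1! * t_2! * ...)."""
    denom = 1
    for count in Counter(quad).values():
        denom *= math.factorial(count)
    return math.factorial(len(quad)) // denom


def log_likelihood(quad, observed_freq, residue_freq):
    """log10 of observed quad frequency over the frequency expected by chance."""
    expected = multiplicity(quad)
    for residue in quad:
        expected *= residue_freq[residue]
    return math.log10(observed_freq / expected)


# Toy numbers: a quad seen 100 times more often than chance scores about +2.
score = log_likelihood("vvvv", 0.01, {"v": 0.1})
```

Under this convention a positive score marks a quad that is over-represented in native structures; an energy where lower is better would be the negative of it.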

19 Local Shape Statistical Energy Function
• Motivation:
  - Fragment libraries model protein structures accurately.
  - Use the frequency of common fragments to construct a statistical function that supplements the 2-body and 4-body energy functions to grow better decoys.
  - Good fragment libraries exist, but lattice chain building needs fragments that fit in the 311 lattice.
• Main idea:
  - For each possible consecutive sequence of four residues i, j, k, and l, calculate in which shape these residues most often occur.
  - If Shape A is found more often in nature than Shape B, try to build the chain accordingly.
[Figure: Shape A and Shape B]

20 Local Shape Statistical Energy Function
• Create a set of canonical lattice shapes of length 4 (and 5).
  - Calculate the ways to embed a chain of length 4 (or 5) in the 311 lattice.
  - 155 canonical shapes for length 4 (2789 for length 5).
  - For L6, there are 6^4 = 1,296 sequences.
  - 155 x 1,296 = 200,880 combinations.
• Parse a representative set of 971 proteins into segments.
  - For each length-4 segment, calculate the RMSD against each canonical shape.
[Figure: a sample protein segment compared against Shape 1, Shape 2, …, Shape 155]

21 Local Shape Statistical Energy Function
• Turning RMSD values into frequencies:
  - If only the canonical shape with the best RMSD is counted, not all 200,880 combinations are found in the training set.
  - If two canonical shapes both have low RMSD, give each some credit.
  - Each RMSD value is indexed by the residue types i,j,k,l and by the shape.
  - Normalize the 155 RMSD values.
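One way to give partial credit is to convert the RMSD values of a segment into weights that sum to 1, so that every shape with a low RMSD contributes to the frequency table. The inverse-RMSD weighting below is an assumption chosen for illustration; the slide does not preserve the actual normalization, and the names are hypothetical.

```python
from collections import defaultdict


def shape_credits(rmsds, eps=1e-6):
    """Turn per-shape RMSD values into credits that sum to 1 (lower RMSD, more credit)."""
    inverse = [1.0 / (r + eps) for r in rmsds]  # eps guards against RMSD == 0
    total = sum(inverse)
    return [w / total for w in inverse]


def accumulate(freq, seq_key, rmsds):
    """Add one segment's credits to the (sequence, shape) frequency table."""
    for shape_id, credit in enumerate(shape_credits(rmsds)):
        freq[(seq_key, shape_id)] += credit


freq = defaultdict(float)
accumulate(freq, "svvh", [0.5, 1.0, 2.0])  # toy example with 3 shapes instead of 155
```

Each segment adds a total credit of 1 to the table, so the accumulated values can be read as smoothed counts.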

23 Results - Building Decoys
• Decoys produced by chain growing are still not good enough.
• Relatively good correlation between RMSD and four-body energy.
[Figure: 2mhu native state and decoys built with the MJ potential and the local shape potential; axis: four-body energy per residue]

25 Identifying Good Decoys
• L20 or L6 Non-bonded
  - Sum only the contribution of α-type 0 tetrahedra.

26 Discriminating Native & Non-Native
• Non-bonded L20 scoring function applied to a set of folded and unfolded decoys.

28 Adjustments to Scoring Functions
• L20 or L6 Non-bonded
  - Sum only the contribution of α-type 0 tetrahedra.
• L20 or L6 5T
  - Sum the contribution of all tetrahedra.
• L20 Ratio All
  - As above, but define:
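The first two variants differ only in which tetrahedra they sum over. A minimal sketch, assuming each tetrahedron is represented as an `(alpha, score)` pair; this representation and the function names are illustrative, and the "Ratio All" variant is left out because its definition is not given here.

```python
def non_bonded_score(quads):
    """Sum scores of tetrahedra with no consecutive residues (alpha-type 0)."""
    return sum(score for alpha, score in quads if alpha == 0)


def five_type_score(quads):
    """Sum scores of all tetrahedra, regardless of alpha-type (the 5T variant)."""
    return sum(score for _, score in quads)


quads = [(0, -1.5), (1, 0.5), (0, -0.5), (4, 2.0)]  # toy (alpha, score) pairs
print(non_bonded_score(quads))  # -2.0
print(five_type_score(quads))   # 0.5
```

Restricting the sum to type 0 removes contributions that are dictated by chain connectivity rather than by the fold.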

29 Incremental Tetrahedralization
• Maintain a constant tetrahedralization and only add and remove single vertices.
• When evaluating a new candidate, update the total energy by tagging new quadruplets as well as any that have been removed.
• Add the effect of the new quadruplets and subtract the effect of those removed.
[Figure panels: Add candidate and evaluate. Add next candidate and reevaluate. Remove candidate and reset state.]
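The energy bookkeeping can be sketched independently of the tessellation itself. The example assumes the tetrahedra before and after inserting a candidate vertex are available as sets of quads (each a `frozenset` of four vertex indices) and that `score` maps a quad to its energy contribution; both are hypothetical stand-ins, and computing the updated tessellation is not shown.

```python
def incremental_energy(total, old_quads, new_quads, score):
    """Update a running energy after the tessellation changes.

    Only quads that appeared or disappeared are rescored.
    """
    added = new_quads - old_quads
    removed = old_quads - new_quads
    return (total
            + sum(score(q) for q in added)
            - sum(score(q) for q in removed))


def toy_score(quad):
    """Toy score for illustration only: the sum of the vertex indices."""
    return float(sum(quad))


old = {frozenset({0, 1, 2, 3}), frozenset({1, 2, 3, 4})}
new = {frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5}), frozenset({1, 2, 4, 5})}
total_old = sum(toy_score(q) for q in old)
total_new = incremental_energy(total_old, old, new, toy_score)
```

The result matches a full rescoring of `new`, but the work is proportional to the number of changed tetrahedra rather than to the size of the whole tessellation.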

30 References
• Gan HH, Tropsha A, Schlick T. Generating folded protein structures with a lattice chain-growth algorithm. J Chem Phys 2000;113:5511-5524.
• Gan HH, Tropsha A, Schlick T. Lattice protein folding with two and four-body statistical potentials. Proteins: Structure, Function, and Genetics 2001;43:161-174.
• Miyazawa S, Jernigan RL. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol 1996;256:623-644.
• Tropsha A, Singh RK, Vaisman II. Delaunay tessellation of proteins: four body nearest neighbor propensities of amino acid residues. J Comput Biol 1996;3(2):213-222.
• Kolodny R, Koehl P, Guibas L, Levitt M. Small libraries of protein fragments model native protein structures accurately. J Mol Biol 2002;323:297-307.

