Efficient Energy Computation for Monte Carlo Simulation of Proteins

Itay Lotan, Fabian Schwarzer, Jean-Claude Latombe (Stanford University)

Monte Carlo Simulation (MCS)
Popular method for studying the conformation space of proteins: estimation of thermodynamic quantities over the space; search for low-energy conformations, in particular the native (folded) state.

Monte Carlo simulation is a common technique for studying the space of conformations that a protein molecule can assume. It has two main uses: to estimate some thermodynamic quantity over the entire space, and to search the space for low-energy conformations, in particular the native state.

Preview of What’s to Come
Method for speeding up MCS of proteins; exploits the fact that a protein backbone is a kinematic chain; avoids the combinatorial explosion of atomic interactions; gives as much as a 12X speed-up for the proteins we tested.

In my talk I will present a new method for speeding up Monte Carlo simulation of proteins. Our method exploits the fact that a protein backbone can be represented as a kinematic chain. Our approach overcomes the combinatorial explosion inherent in calculating pairwise atomic interactions. Our tests show that it speeds up the simulation by as much as a factor of 12 over the method currently used in the research community.

MCS: What It Is
Random walk through the conformation space of a protein that samples conformations on its path; converges to the underlying distribution of conformations after enough time.

Let me first briefly explain what a Monte Carlo simulation is. It is a random walk through the space of conformations of a protein that collects a sample of the conformations it encounters along its path. This sample is guaranteed to converge to the underlying distribution of conformations if the simulation runs for a sufficient amount of time.

MCS: How It Works
Propose a random change in conformation; compute the energy E of the new conformation; accept the new conformation with probability $P_{\text{accept}} = \min\{1, e^{-\Delta E / k_B T}\}$, where $\Delta E = E_{\text{new}} - E_{\text{old}}$.

Each step of the simulation consists of: (1) proposing a random change to the current conformation, usually by perturbing a small number of DOFs; (2) computing the energy of the new conformation; and (3) accepting the move based on the Metropolis criterion, which uses the difference in energy to determine an acceptance probability.
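A minimal Python sketch of one such step, assuming kT is the Boltzmann constant times the temperature, in the same units as the energy. The callbacks propose and energy are hypothetical stand-ins for a move generator and an energy function.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

C = TypeVar("C")  # conformation type (placeholder)


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Accept with probability min(1, exp(-delta_e / kT))."""
    if delta_e <= 0.0:
        return True  # downhill moves are always accepted
    return rng.random() < math.exp(-delta_e / kT)


def mc_step(conf: C, e_conf: float,
            propose: Callable[[C, random.Random], C],
            energy: Callable[[C], float],
            kT: float, rng: random.Random) -> Tuple[C, float]:
    """One MCS step: perturb, evaluate the new energy, then accept or reject."""
    candidate = propose(conf, rng)
    e_new = energy(candidate)  # the expensive call this talk aims to speed up
    if metropolis_accept(e_new - e_conf, kT, rng):
        return candidate, e_new
    return conf, e_conf
```

Passing the current energy in avoids recomputing it, so each step evaluates only the candidate's energy; that single call dominates the cost of a step.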

Energy Function
Bonded terms: bond length, bond angle, etc. Non-bonded terms: van der Waals, electrostatic, and heuristic. Non-bonded terms depend on distances between pairs of atoms, so there are O(n²) of them and they are expensive to compute.

Most energy functions sum two kinds of terms. Bonded terms, such as bond lengths, bond angles, and dihedral angles, number only linearly in the size of the protein. Non-bonded terms include the van der Waals potential, the electrostatic potential, and many heuristic potentials such as attraction between hydrophobic residues or between native contacts. Since non-bonded terms depend on distances between pairs of atoms, there is a quadratic number of them, so computing all of them is computationally expensive.

Pairwise Interactions
Use a cutoff distance (6 to 12 Å): only O(n) interactions, with O(1) interactions per atom (Halperin & Overmars '98). Find interacting pairs without enumerating all pairs!

Therefore, a common practice among biologists is to use a cutoff distance. Since all the pairwise potentials in use become negligible at long distances, the error introduced by cutoffs is small. Using a cutoff dramatically reduces the number of atom pairs that actually contribute to the energy of a conformation; we call these pairs the "interacting pairs". Because atoms cannot come arbitrarily close to one another, only a bounded number of them fit within the cutoff distance of any given atom, so each atom has a constant number of interactions, which adds up to a linear number for the whole molecule. The key to computing the energy efficiently is to find all interacting pairs without going through all possible pairs.
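To make the distinction concrete, here is a minimal brute-force sketch (the function name is hypothetical): it returns only the pairs within the cutoff, which number O(n), yet it still examines all n(n-1)/2 pairs to find them. This is exactly the enumeration that an efficient method must avoid.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def interacting_pairs_bruteforce(atoms: Sequence[Point], cutoff: float) -> List[Tuple[int, int]]:
    """Return all index pairs (i, j), i < j, whose atom centers are closer
    than `cutoff`. The output is O(n), but the loop visits every pair: O(n^2)."""
    pairs = []
    for i, j in itertools.combinations(range(len(atoms)), 2):
        if math.dist(atoms[i], atoms[j]) < cutoff:
            pairs.append((i, j))
    return pairs
```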

Reusing Energy Terms
Only a few DOFs are changed at each step: large sub-chains remain rigid between steps, so many energy terms are unaffected by the change.

It is also important to note that since only a few DOFs are changed at each step, large sub-chains remain rigid between steps. Therefore, many energy terms that depend on pairs of atoms from the same rigid sub-chain are unaffected by the change. Significant savings can be made by reusing previously computed partial sums that were not affected by the last change.
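A small sketch of which link pairs a change actually affects, under an assumed indexing convention: DOF k joins link k and link k + 1, so changing it moves every link to its right rigidly relative to every link to its left. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def affected_link_pairs(n_links: int, changed_dofs: Iterable[int]) -> List[Tuple[int, int]]:
    """Link pairs (i, j), i < j, whose relative placement changes.
    Assumed convention: DOF k joins link k and link k + 1, so pair (i, j)
    is affected iff some changed DOF k satisfies i <= k < j."""
    changed = set(changed_dofs)
    return [(i, j) for i in range(n_links) for j in range(i + 1, n_links)
            if any(i <= k < j for k in changed)]
```

With 4 links and DOF 1 changed, only the pairs straddling that DOF, (0, 2), (0, 3), (1, 2), and (1, 3), need new energy terms; the pairs inside each rigid half, (0, 1) and (2, 3), can reuse cached values.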

Our Goal
Improve the computational efficiency of MCS by reducing the average time to accept or reject a new conformation, independent of the energy function, the step generator, and the acceptance criterion, by exploiting the fact that the protein backbone is a kinematic chain.

We set out to improve the computational efficiency of MCS by reducing the average time it takes to accept or reject a new conformation, which is dominated by the time it takes to compute the energy of the new conformation. Our algorithm does not depend on the energy function one uses, on which steps are taken and how they are selected, or on the acceptance criterion. Its efficiency stems from treating the protein backbone as a kinematic chain.

Outline: related work; the ChainTree; energy maintenance; tests; conclusion.

Grid Method
Subdivide space into cubic cells of side d_cutoff; compute the cell that contains each atom center; store the results in a hash table.

The prevailing algorithm currently used to compute energy in MC simulation is the grid method. Space is divided into cubic cells, and for each atom center, the cell that contains it is computed. A hash table can be used to index the grid cells, so there is no need to allocate all cells a priori. The atoms that interact with a given atom are found by looking in that atom's grid cell and in its immediate neighbors.
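A minimal sketch of the grid method as described, assuming the cell side equals the cutoff and using a Python dict as the hash table; the function names are hypothetical. Because two atoms closer than the cutoff differ by less than one cell side in every coordinate, their cell indices differ by at most one, so scanning the 3x3x3 block around each cell finds every interacting pair.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def build_grid(atoms: Sequence[Point], cell: float) -> Dict[Cell, List[int]]:
    """Hash each atom index into the cubic cell containing its center.
    Only occupied cells are stored, so no grid is allocated up front."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x, y, z) in enumerate(atoms):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(idx)
    return grid


def interacting_pairs_grid(atoms: Sequence[Point], cutoff: float) -> List[Tuple[int, int]]:
    """All pairs (i, j), i < j, closer than `cutoff`, found by scanning each
    occupied cell against itself and its 26 neighbours."""
    grid = build_grid(atoms, cutoff)
    pairs = []
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get on a defaultdict does not create empty cells
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(atoms[i], atoms[j]) < cutoff:
                        pairs.append((i, j))
    return sorted(pairs)
```

Note that the whole grid is rebuilt from all atom positions, which is the per-step linear cost this talk goes on to discuss.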

Grid Method (continued)
Θ(n) time to recompute the grid; O(1) time to find the interactions of each atom; Θ(n) to find all interactions, in all cases; no way of detecting unchanged interactions. Asymptotically optimal in the worst case!

It takes linear time to recompute the grid after each step, since the positions of all atoms may need to be recomputed. Once the grid is recomputed, it takes constant time to find the interactions of each atom, so altogether the grid requires linear time to compute all interactions. This bound is reached at every step, regardless of how many interactions were actually affected by the step. The grid does not differentiate between new interactions and unchanged interactions, which makes reusing partial energy sums complicated. Since there can be as many as O(n) interactions, the grid is asymptotically optimal in the worst case.

The ChainTree
Figure: a binary tree of transforms and bounding volumes over the chain. The shortcut transform is $T_{NO} = T_{JK}\,T_{KL}$; node J holds BV(A,B) and node K holds BV(C,D).

The ChainTree is a binary tree of bounding volumes (BVs) and coordinate transforms superimposed on the chain. The leaves of the tree correspond to the links of the chain, which are rigid pieces of the protein backbone together with their attached side-chains. Each leaf node holds the bounding volume of its link and the transform to the reference frame of the next link. Each internal node is associated with the frame of its left child. It holds a bounding volume enclosing its two children's bounding volumes, and the transform to the frame of the next node at its level. For example, the transform from the frame of node J to that of node K is stored at node J, and the transform from node K to node L is held by node K. Since J and N share the same coordinate frame, and L and O share the same coordinate frame, the transform from N to O is the product of the transform from J to K and the transform from K to L. We say that T_NO shortcuts the transforms T_JK and T_KL. Likewise, the BV enclosing the BVs held at nodes A and B is stored at node J, and the BV enclosing the BVs held at nodes C and D is stored at node K.
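The slides do not specify the shape of the bounding volumes. As an assumption for illustration, this sketch uses spheres: merge_spheres computes the smallest sphere enclosing two child spheres (a "BV of BVs" such as the one stored at node J), and sphere_distance gives the gap that a search would compare against the cutoff. Both function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]
Sphere = Tuple[Vec, float]  # (center, radius)


def merge_spheres(a: Sphere, b: Sphere) -> Sphere:
    """Smallest sphere enclosing spheres a and b (the parent's BV)."""
    (ca, ra), (cb, rb) = a, b
    d = math.dist(ca, cb)
    if d + rb <= ra:  # b already lies inside a
        return a
    if d + ra <= rb:  # a already lies inside b
        return b
    r = (d + ra + rb) / 2.0
    t = (r - ra) / d  # d > 0 here, otherwise a branch above returned
    center = tuple(pa + t * (pb - pa) for pa, pb in zip(ca, cb))
    return center, r


def sphere_distance(a: Sphere, b: Sphere) -> float:
    """Gap between two spheres, or 0 if they overlap."""
    (ca, ra), (cb, rb) = a, b
    return max(0.0, math.dist(ca, cb) - ra - rb)
```

Since sphere_distance is a lower bound on the distance between any two atoms inside the spheres, a gap of at least the cutoff proves that no interacting pair lies beneath the two nodes.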

Updating the ChainTree
Update the path to the root: recompute the transforms that shortcut the change, and recompute the BVs that contain the change.

When a DOF of the chain is changed, all transforms that shortcut this DOF are updated, as are all BVs that enclose the two links connected by this DOF. This is done by tracing the path from the leaf node immediately to the left of the changed DOF up to the root of the tree. When multiple DOFs are changed simultaneously, the ChainTree is updated one level at a time to ensure that each node is updated only once. All updated nodes are marked with a time stamp, which is used later when searching for interacting pairs. The complexity of the update is O(log N) per DOF. For example, when the DOF between links F and G is changed, the transforms held at F and L need to be recomputed, as well as the BVs held at nodes O and P.
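A minimal sketch of the update in an array-based tree, under simplifying assumptions: a planar chain with 3x3 homogeneous transforms, leaf k holding the transform across DOF k, and each inner node holding the product of its leaves in chain order, which is the shortcut relation T_NO = T_JK T_KL. Only transforms are updated here; BV refits would follow the same path. TransformTree, planar_link, and the integer clock used as a time stamp are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Mat = List[List[float]]  # 3x3 homogeneous transform (planar chain, an assumption)
IDENTITY: Mat = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def planar_link(theta: float, length: float = 1.0) -> Mat:
    """Hypothetical link transform: translate by `length` along x, then rotate by `theta`."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, length], [s, c, 0.0], [0.0, 0.0, 1.0]]


class TransformTree:
    """Leaf k holds the transform across DOF k; each inner node holds the
    product of its leaves in chain order (the shortcut transform)."""

    def __init__(self, link_transforms: Sequence[Mat]):
        self.size = 1
        while self.size < len(link_transforms):
            self.size *= 2
        # Padding leaves stay identity; matrices are replaced, never mutated.
        self.node: List[Mat] = [IDENTITY] * (2 * self.size)
        self.stamp = [0] * (2 * self.size)
        self.clock = 0
        for k, t in enumerate(link_transforms):
            self.node[self.size + k] = t
        for i in range(self.size - 1, 0, -1):
            self.node[i] = matmul(self.node[2 * i], self.node[2 * i + 1])

    def update(self, new_transforms: Dict[int, Mat]) -> int:
        """Set changed DOFs, then recompute ancestors one level at a time so
        each node is recomputed at most once. Returns the number of nodes touched."""
        self.clock += 1
        level = set()
        for k, t in new_transforms.items():
            self.node[self.size + k] = t
            self.stamp[self.size + k] = self.clock
            level.add(self.size + k)
        touched = len(level)
        while level and 1 not in level:
            level = {i // 2 for i in level}
            for i in level:
                self.node[i] = matmul(self.node[2 * i], self.node[2 * i + 1])
                self.stamp[i] = self.clock
            touched += len(level)
        return touched
```

With 6 links padded to 8 leaves, changing DOF 3 recomputes 4 nodes (the leaf and its 3 ancestors) instead of all 15, and only those nodes carry the new time stamp.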

Finding Interacting Pairs
Test the ChainTree against itself.

We need to find all interacting pairs. To be efficient, we search only for the interactions that were affected by the last change and, at this stage, ignore the interactions that did not change. We conduct the search by testing the ChainTree against itself, as follows.

Do not search inside rigid sub-chains (unmarked nodes); do not test two nodes with no marked node in between.

The search begins at the root and proceeds downward. In this example, we start by computing the distance between the BV at node P and itself. Since this distance is trivially 0, we proceed to test all pairs of children of P, namely N vs. N, N vs. O, and O vs. O. The general rule is that whenever two BVs are found to be closer than the cutoff distance, all pairs of their children are tested. When two leaf nodes are tested, we examine all pairs of atoms, one from each leaf. Two rules speed up the search and avoid finding interacting pairs that were not affected by the last change: we do not search inside rigid sub-chains, which have a completely unmarked sub-hierarchy above them; and we do not test two nodes that do not have a marked node between them.

We can better understand the search procedure by examining a tree-like diagram of all the tests that could possibly be performed during the search. If the DOF between links F and G is changed, as in the earlier example, the two rules just mentioned prune the search at the places marked on the diagram. For example, the search is pruned when testing node N against itself, since node N was not affected by the latest change. The part of the diagram actually visited during the search is highlighted in green. The relative size of this part decreases as the size of the protein grows, and increases as the number of DOFs changed simultaneously increases.
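The self-test can be sketched as a short recursion. Several simplifications here are assumptions, not the authors' implementation: nodes are named by half-open link ranges [lo, hi) split at the midpoint; BVs are axis-aligned boxes computed from current atom coordinates rather than maintained in local frames; and an explicit set of changed DOF indices stands in for the time-stamp marks. With DOF k joining links k and k + 1, two nodes a and b (a to the left of, or equal to, b) have "a marked node between them" exactly when some changed k satisfies a.lo <= k < b.hi - 1, which also covers the rigid self-test rule. All names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Range = Tuple[int, int]                            # half-open range [lo, hi) of links
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (min corner, max corner)
AtomId = Tuple[int, int]                           # (link index, atom index in link)


def box_of(points: Sequence[Point]) -> Box:
    return tuple(map(min, zip(*points))), tuple(map(max, zip(*points)))


def box_distance(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes (0 if they overlap)."""
    gaps = [max(0.0, blo - ahi, alo - bhi)
            for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1])]
    return math.sqrt(sum(g * g for g in gaps))


def split(r: Range) -> List[Range]:
    lo, hi = r
    if hi - lo == 1:
        return [r]
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def find_changed_pairs(links: Sequence[Sequence[Point]], changed: Set[int],
                       cutoff: float) -> List[Tuple[AtomId, AtomId]]:
    """Atom pairs closer than `cutoff` whose distance may have changed.
    Every link is assumed to contain at least one atom."""
    boxes: Dict[Range, Box] = {}
    found: List[Tuple[AtomId, AtomId]] = []

    def box(r: Range) -> Box:
        if r not in boxes:
            boxes[r] = box_of([p for k in range(*r) for p in links[k]])
        return boxes[r]

    def test(a: Range, b: Range) -> None:
        # Pruning rules: skip node pairs whose relative placement is unchanged.
        if not any(a[0] <= k < b[1] - 1 for k in changed):
            return
        if box_distance(box(a), box(b)) >= cutoff:
            return  # BVs too far apart to contain an interacting pair
        if a[1] - a[0] == 1 and b[1] - b[0] == 1:
            i, j = a[0], b[0]
            for (p, u), (q, v) in product(enumerate(links[i]), enumerate(links[j])):
                if math.dist(u, v) < cutoff:
                    found.append(((i, p), (j, q)))
        elif a == b:  # self-test: children vs. themselves and each other
            kids = split(a)
            for x in range(len(kids)):
                for y in range(x, len(kids)):
                    test(kids[x], kids[y])
        else:
            for x, y in product(split(a), split(b)):
                test(x, y)

    root = (0, len(links))
    test(root, root)
    return found
```

Any descendant pair of an unchanged node pair spans a sub-range of links, so pruning a pair can never hide a changed interaction below it.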

Summing the Interactions
At each step, we need to sum the contributions of: (1) new interactions, (2) changed interactions, and (3) unchanged interactions. (1) and (2) are found by the ChainTree search; how do we retrieve (3) efficiently?

The energy of the new conformation is the sum of the energetic contributions of all interactions. Three types of interactions contribute to this sum: (1) new interactions brought about by the last change; (2) interactions that existed before the last change and still exist after it, but with a changed interaction distance; and (3) interactions that were not affected by the last change. The first two kinds are discovered by the ChainTree search, and we must sum them from scratch. The third kind is deliberately ignored by the ChainTree search to make it more efficient. We have computed the energetic contributions of these interactions before, and have even summed them up; the challenge is to retrieve those sums efficiently.

The EnergyTree
A caching scheme for partial energy sums that is efficient to update and efficient to query.

To do this, we use a caching scheme for partial energy sums that has the structure of the tree-like diagram we saw before. We call this tree the EnergyTree. Each leaf stores the total energy contribution of the interactions between the two links it represents. Each internal node holds the sum of the values stored at its immediate children. This scheme is efficient to update and query because we traverse the EnergyTree simultaneously with the ChainTree search, updating changed sums where necessary and retrieving unchanged sums at no extra cost.

Using the EnergyTree
Figure: EnergyTree nodes E(N,N), E(J,L), E(L,L), E(K,L), and E(M,M).

Let's illustrate this with our running example. We traverse the EnergyTree together with the ChainTree search. Recall that the search was pruned at certain places in the search diagram. When the search is pruned while testing node N against itself, the traversal of the EnergyTree is at node (N,N), where the sum of all interactions inside sub-chain N is ready to be retrieved. The same holds for all other places where the search is pruned. As the search recursion unwinds, the values stored in the EnergyTree are updated, and finally the total energy of the protein can be retrieved from the root of the EnergyTree.
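A minimal sketch of the EnergyTree caching scheme, assuming half-open link ranges as node names and the convention that DOF k joins links k and k + 1. The callback link_pair_energy is hypothetical: it returns the summed interaction energy between links i and j (or within link i when i == j). For brevity this sketch traverses on its own rather than riding along with the ChainTree search, so it prunes only by the "relative placement unchanged" rule, not by BV distance.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Range = Tuple[int, int]  # half-open range [lo, hi) of link indices


def split(r: Range) -> List[Range]:
    lo, hi = r
    if hi - lo == 1:
        return [r]
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


class EnergyTree:
    """Cache of partial energy sums keyed by pairs of chain-tree nodes.
    A leaf pair (i, j) stores the energy between links i and j; an inner
    pair stores the sum over its child pairs."""

    def __init__(self, n_links: int, link_pair_energy: Callable[[int, int], float]):
        self.root: Range = (0, n_links)
        self.pair_energy = link_pair_energy
        self.cache: Dict[Tuple[Range, Range], float] = {}
        self.evaluations = 0  # number of leaf energies computed so far
        self.total = self._visit(self.root, self.root, frozenset())

    def _visit(self, a: Range, b: Range, changed: frozenset) -> float:
        key = (a, b)
        if key in self.cache and not any(a[0] <= k < b[1] - 1 for k in changed):
            return self.cache[key]  # pruned: reuse the cached partial sum
        if a[1] - a[0] == 1 and b[1] - b[0] == 1:
            self.evaluations += 1
            value = self.pair_energy(a[0], b[0])
        elif a == b:
            kids = split(a)
            value = sum(self._visit(kids[x], kids[y], changed)
                        for x in range(len(kids)) for y in range(x, len(kids)))
        else:
            value = sum(self._visit(x, y, changed) for x in split(a) for y in split(b))
        self.cache[key] = value  # refreshed as the recursion unwinds
        return value

    def update(self, changed_dofs: Iterable[int]) -> float:
        self.total = self._visit(self.root, self.root, frozenset(changed_dofs))
        return self.total
```

With 8 links and DOF 3 changed, only the 16 link pairs straddling that DOF are re-evaluated; the remaining 20 of the 36 leaf sums come from the cache, and the new total is read from the root.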

Test Setup
Energy function: van der Waals, electrostatic, and attraction between native contacts; cutoff at 12 Å; 300,000-step MCS; early rejection for large vdW terms.

We compared the performance of our algorithm, which we call ChainTree, to that of the grid method. We used an energy function with van der Waals and electrostatic potentials, as well as a quadratic-well attractive potential between native contacts. We ran exactly the same simulation for 300,000 steps using both algorithms. For each algorithm, whenever a very large van der Waals term was detected in the middle of the energy computation, the computation was stopped and the conformation rejected.
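The early-rejection rule can be sketched as an energy sum that aborts on the first oversized van der Waals term. The term labels and the vdw_limit threshold are hypothetical; the slides do not give the threshold value.

```python
from typing import Iterable, Optional, Tuple


def energy_with_early_reject(terms: Iterable[Tuple[str, float]],
                             vdw_limit: float) -> Optional[float]:
    """Sum (kind, value) energy terms, stopping as soon as one van der Waals
    term exceeds the hypothetical threshold `vdw_limit`. Returns None to
    signal 'reject this conformation' without finishing the sum."""
    total = 0.0
    for kind, value in terms:
        if kind == "vdw" and value > vdw_limit:
            return None
        total += value
    return total
```

Because terms is any iterable, a generator that computes terms lazily stops being consumed at the first clash, so the remaining terms are never computed.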

Results: 1-DOF change
Chart comparing ChainTree with the grid method for four proteins, labeled (68), (144), (374), and (755).

We performed the simulation on four proteins of various sizes, changing only one DOF at each step. The simulation started from an unfolded conformation. Our algorithm was 3.5 times faster for 1CTF and 12 times faster for 1JB0.

Results: 5-DOF change
Chart comparing ChainTree with the grid method for the same four proteins, labeled (68), (144), (374), and (755).

We performed the same simulation, but this time changed 5 DOFs at every move. Our algorithm was 1.7 times faster for 1CTF and almost 6 times faster for 1JB0. The speed-up of our algorithm decreases as the number of simultaneously changed DOFs increases.

Conclusion
Novel method to reduce the average time per step in MCS of proteins; exploits the kinematic-chain nature of proteins; significant speed-up for a small number of simultaneous DOF changes; better for larger proteins.

We have presented a new algorithm that reduces the average time it takes to perform a step of MCS of proteins. This reduction is achieved by exploiting the kinematic-chain nature of proteins. Our experimental results show significant speed-ups over the current grid method when the number of DOF changes per step is small, and our algorithm performs better for larger proteins.

MCS Software: http://robotics.stanford.edu/~itayl/mcs
EEF1 force field (Lazaridis & Karplus '99); backbone DOFs (Φ, Ψ) and fixed rotamers for side-chains (Dunbrack & Cohen '97); classical MCS with a simple move set; download and customize.

Finally, I would like to encourage you to try our implementation of the methods I just described to speed up your MC simulations. We have implemented the EEF1 force field of Lazaridis and Karplus, which is based on the CHARMM19 force field with the addition of an implicit solvent term. Our protein model allows changes to the Phi and Psi backbone angles, as well as changes to side-chain rotamers drawn from a fixed set taken from Dunbrack's backbone-independent rotamer library. The acceptance criterion is the Metropolis criterion, and the changes at each step are drawn from a simple set of moves. You may freely download this software and customize it for your own needs. It is very easy to implement a different acceptance criterion or move set and, with some more work, even a different force field.
