# Basics of molecular dynamics simulations

## Presentation on theme: "Basics of molecular dynamics simulations"— Presentation transcript:

Basics of molecular dynamics simulations
Reduction of the quantum problem to a classical one Parameterization of force fields Boundaries, energy minimization, integration algorithms Analysis of trajectory data Lecture notes are at :

Quantum, classical and stochastic description of molecular systems
Quantum mechanics (Schroedinger equation) The most fundamental approach but feasible only for few atoms (~10). Approximate methods (e.g. density functional theory) allows treatment of larger systems (~1000) and dynamic simulations for picoseconds. Classical mechanics (Newton’s equation of motion) Most atoms are heavy enough to justify a classical treatment (except H). The main problem is finding accurate potential functions (force fields). MD simulation of 100,000 atoms for many nanoseconds is now feasible. Stochastic mechanics (Langevin equation) Most biological processes occur in the range of microsec to millisec. Thus to describe such processes, a simpler (coarse-grained) representation of atomic system is essential (e.g. Brownian dynamics).

Many-body Schroedinger eq. for a molecular system
(1) nuclear hamiltonian electronic hamiltonian elect-nucl. interact. Here m and Mi are the mass of the electrons and nuclei, ra and Ri denote the electronic and nuclear coordinates, and a and i denote the respective gradients.

Separation of the electronic wave function
Nuclei are much heavier and hence move much slower than electrons. This allows decoupling of their motion from those of electrons. Introduce the product wave function: Substituting this ansatz in the Schroedinger eq. gives For fixed nuclei, the electronic part gives (2)

Substitute the electronic part back in the Schroedinger eq.
Born-Oppenheimer (adiabatic) approximation consists of neglecting the cross terms arising from (which are of order m/M), so that the nuclear part becomes (3) Eqs. (2, 3) need to be solved simultaneously, which is a formidable problem for most systems. For two nuclei, there is only one coordinate for R (the distance), so it is feasible. But for three-nuclei, there are 4 coordinates (in general for N nuclei, 3N-5 coordinates are required), which makes numerical solution very difficult.

Classical approximation for nuclear motion
Nuclei are heavy so their motion can be described classically, that is, instead of solving the Schroedinger Eq. (3), we solve the corresponding Newton’s eq. of motion (4) At zero temperature, the potential can be minimized with respect to the nuclear coordinates to find the equilibrium conformation of molecules. At finite temperature, Eqs. (2) and (4) form the basis of ab initio MD (ignores quantum effects in nuclear motion and electronic exc. at finite T)

Methods of solution for the electronic equation
Electronic part of the Schroedinger eq. (2) has the form (2) Two basic methods of solution: Hartree-Fock (HF) based methods: HF is a mean field theory. One finds the average, self-consistent potential in which electrons move. Electron correlations are taken into account using various methods. 2. Density functional theory: Solves for the density of electrons. Better scaling than HF (which is limited to ~10 atoms); 100’s of atoms. Car-Parrinello MD (DFT+MD) has become popular in recent years.

Density functional theory (DFT)
In DFT one uses the density function of electrons, n(r) instead of their wavefunction Expectation value of the electronic energy in the ground state becomes

Classical mechanics Molecular dynamics (MD) is the most popular method for simulation studies of biomolecules. It is based on Newton’s equation of motion. For N interacting atoms, one needs to solve N coupled Diff. Eq’s: Force fields are determined from experiments and ab initio methods. Analytically this is an intractable problem for N>2, but we can solve it easily on a computer using numerical methods. Current computers can handle N ~ particles, which is large enough for description of most biomolecules. Integration time, however is still a bottleneck (106 1 fs = 1 ns)

Stochastic mechanics In order to deal with the time bottle-neck in MD, one has to simplify the simulation system (coarse graining). This is achieved by describing parts of the system as continuum with dielectric constants. Examples: transport of ions in electrolyte solutions (water → continuum) protein folding (water → continuum) function of transmembrane proteins (lipid, water → continuum) To include the effect of the atoms in the continuum, modify Newton’s eq. of motion by adding frictional and random forces: Langevin equation

Molecular dynamics (MD) simulations
MD programs consist of over 100,000 lines of coding. To understand what is going on in an MD simulation, we need to consider several topics: Force fields (potential functions among atoms) Boundaries and computation of long-range interactions Molecular mechanics (T=0, energy minimization) MD simulations (integration algorithms, ensembles) Analysis of trajectory data to characterize structure and function of proteins

Empirical force fields (potential functions)
Problem: Reduce the ab initio potential energy surface among the atoms To a classical many-body interaction in the form Such a program has been tried for water but so far it has failed to produce a classical potential that works. In strongly interacting systems, it is difficult to describe the collective effects by summing up the many-body terms. Practical solution: Truncate the series at the two-body level and assume (hope!) that the effects of higher order terms can be absorbed in U2

Non-bonded interactions
Interaction of two atoms at a distance R = |Ri - Rj| can be decomposed into 4 pieces Coulomb potential Induced polarization Attractive dispersion (van der Waals) (1/R6) Short range repulsion (e-R/a) The last two terms are combined into a 6-12 Lennard-Jones potential Combination rules for different atoms:

12-6 Lennard-Jones potential (U is in kT, r in Å)

Because the polarization interaction is many-body and requires iterations,
it has been neglected in current generations of force fields. The non-bonded interactions are thus represented by the Coulomb and 12-6 LJ potentials. Popular water models (rigid) Model RO-H (Å) qHOH qH (e) e (kT) s (Å) m (D) eps (T=298) SPC ±5 TIP3P ±7 Exp SPC: simple point charge TIP3P: transferable intermolecular potential with 3 points

Covalent bonds In molecules, atoms are bonded together with covalent bonds, which arise from sharing of electrons in partially occupied orbitals. If the bonds are very strong, the molecule can be treated as rigid as in water molecule. In most large molecules, however, the structure is quite flexible and this must be taken into account for a proper description of the molecules. This is literally done by replacing the bonds by springs. The nearest neighbour interactions involving 2, 3 and 4 atoms are described by harmonic and periodic potentials bond stretching bending torsion (not very good)

Interaction of two H atoms in the ground (1s) state can be described
using Linear Combinations of Atomic Orbitals (LCAO) From symmetry, two solutions with lowest and highest energies are: + Symmetric - Anti-symmetric

The R dependence of the potential energy is approximately given by the Morse potential
where De: dissociation energy R0: equilibrium bond distance Controls the width of the potential Classical representation: k

An in-depth look at the force fields
Three force fields, constructed in the eighties, have come to dominate the MD simulations CHARMM Harvard) Optimized for proteins, works well also for lipids and nucleic acids AMBER (Kollman & UCSF) Optimized for nucleic acids, otherwise quite similar to CHARMM 3. GROMOS (Berendsen & van Groningen) Optimized for lipids and does not work very well for proteins (due to smaller partial charges in the carbonyl and amide groups) The first two use the TIP3P water model and the last one, SPC model. They all ignore the polarization interaction (polarizable versions are under construction)

Charm parameters for alanine
Partial charge (e) ATOM N | ATOM HN HN—N ATOM CA | HB1 ATOM HA | / GROUP HA—CA—CB—HB2 ATOM CB | \ ATOM HB | HB3 ATOM HB O==C ATOM HB | ATOM C ATOM O Total charge:

Bond lengths : kr (kcal/mol/Å2) r0 (Å) N CA (1) CA C (2) C N (2) (peptide bond) O C (2) (double bond) N H (2) HA CA (2) CB CA (3) HB CB (3) NMA (N methyl acetamide) vibrational spectra Alanine dipeptide ab initio calculations From alkanes

Bond angles : kq (kcal/mol/rad2) q0 (deg) C N CA (1) C N H (1) H N CA (1) N CA C (2) N CA CB (2) N CA HA (2) HA CA CB (2) HA CA C (2) CB CA C (2) N C CA (1) O C CA (2) O C N (1) Total 360 deg. Total 360 deg.

Dihedrals: Basic dihedral configurations trans cis Definition of the dihedral angle for 4 atoms A-B-C-D

Dihedral parameters: Vn (kcal/mol) n f0 (deg) Name C N CA C f N CA C N y CA C N CA w H N CA CB H N CA HA C N CA CB CA C N H O C N H O C N CA

Boundaries In macroscopic systems, the effect of boundaries on the
dynamics of biomolecules is minimal. In MD simulations, however, the system size is much smaller and one has to worry about the boundary effects. Using nothing is not realistic for bulk simulations. Minimum requirement: water beyond the simulation box must be treated using a continuum representation Most common solution: periodic boundary conditions. The simulation box is replicated in all directions just like in a crystal. The cube and rectangular prism are the obvious choices for a box shape.

Periodic boundary conditions in two dimensions: 8 nearest neighbours
Particles in the box freely move to the next box, which means when one moves out another appears from the opposite side of the same box. In 3-D, there are 26 nearest neighbours.

Treatment of long-range interactions
Problem: the number of non-bonded interactions grows as N2 where N is the number of particles. This is the main computational bottle neck that limits the system size. Because the Coulomb potential between two unit charges, has long range [U=560 kT/r (Å)], use of any cutoff is problematic and should be avoided. All the modern MD codes use the Ewald sum method to treat the long-range interactions without cutoffs. Here one replaces the point charges with Gaussian distributions, which leads to a much faster convergence of the sum in the reciprocal (Fourier) space. The remaining part (point particle – Gaussian) falls exponentially in real space, hence can be computed easily.

Molecular mechanics Molecular mechanics deals with the static features of biomolecular systems at T=0 K, that is, the energy is given solely by the potential energy of the system. Two important applications of molecular mechanics are: Energy minimization (geometry optimization): Find the coordinates for the configuration that minimizes the potential energy. Normal mode analysis: Find the vibrational excitations of the system built on the absolute minimum using the harmonic approximation.

Molecular dynamics In MD simulations, one follows the trajectories of N particles according to Newton’s equation of motion: where U(r1,…, rN) is the potential energy function consisting of bonded and non-bonded interactions. We need to consider: Integration algorithms, e.g., Verlet, Leap-frog Initial conditions and choice of the time step MD simulations at constant temperature and pressure Constrained dynamics for rigid molecules (SHAKE)

Integration algorithms
Given the position and velocities of N particles at time t, integration of Newton’s equation of motion yields at t+Dt (*) In the popular Verlet algorithm, one eliminates velocities using the positions at t-Dt, Adding eq. (*) yields:

In the Leap-frog algorithm, the positions and velocities are calculated at different times separated by Dt/2 To show its equivalence to the Verlet algorithm, consider Subtracting the two equations yields the Verlet result. If required, velocity at t is obtained from:

To iterate these equations, we need to specify the initial conditions.
The initial configuration of biomolecules can be taken from the Protein Data Bank (PDB) (if available). In addition, membrane proteins need to be embedded in a lipid bilayer. All the MD codes have facilities to hydrate a biomolecule, i.e., fill the void in the simulation box with water molecules at the correct density. Ions can be added at random positions. After energy minimization, these coordinates provide the positions at t=0. Initial velocities are sampled from a Maxwell-Boltzmann distribution:

MD simulations at constant temperature and pressure
MD simulations can be performed in the NVE ensemble, where all 3 quantities are constant. Due to truncation errors, keeping the energy constant in long simulations can be problematic. To avoid this problem, the alternative NVT and NPT ensembles are employed. The temperature of the system can be obtained from the average K. E. Thus an obvious way to keep the temperature constant at T0 is to scale the velocities as: Because K. E. has considerable fluctuations, this is a rather crude method.

A better method which achieves the same result more smoothly is the Berendsen method, where the atoms are weakly coupled an external heat bath with the desired temperature T0 If T(t) > T0 , the coupling term is negative, which invokes a viscous force slowing the velocity, and vice-versa for T(t) < T0 Similarly in the NPT ensemble, the pressure can be kept constant by simply scaling the volume. Again a better method (Langevin piston), is to weakly couple the pressure difference to atoms using a force as in above, which will maintain the pressure at the desired value (~1 atm).

Analysis of trajectory data
1. Fundamental quantities Total energy, temperature, pressure, volume (or density) 2. Structural quantities Root Mean Square Deviation (RMSD) Distribution functions (e.g., pair and radial) Conformational analysis (e.g., Ramachandran plots, rotamers) 3. Dynamical quantities Time correlation functions & transport coefficients Free energy calculations using perturbation, umbrella sampling and steered MD methods

Fundamental quantities
Total energy: strictly conserved but due to numerical errors it may drift. Temperature: would be constant in a macroscopic system (fluctuations are proportional to 1/N, hence negligible). But in a small system, they will fluctuate around a base line. Temperature coupling ensures that the base line does not drift from the set temperature. Pressure: similar to temperature, though fluctuations are much larger compared to the set value of 1 atm. Volume (or density): fixed in the NVT ensemble but varies in NPT. Therefore, it needs to be monitored during the initial equilibration to make sure that the system has converged to the correct volume. Relatively quick for globular proteins but may take a long time (~1 ns) for systems involving lipids (membrane proteins).

Root Mean Square Deviation
For an N-atom system, variance and RMSD at time t are defined as Where ri(0) are the reference coordinates. Usually they are taken from the first frame in an MD simulation, though they can also be taken from the PDB or a target structure. Because side chains in proteins are different, it is common to use a restricted set of atoms such as backbone or Ca. RMSD is very useful in monitoring the approach to equilibrium (typically, signalled by the appearance of a broad shoulder). It is a good practice to keep monitoring RMSD during production runs to ensure that the system stays near equilibrium.

Evolution of the RMSD for the backbone atoms of ubiquitin

Distribution functions
Pair distribution function gij(r) gives the probability of finding a pair of atoms (i, j) at a distance r. It is obtained by sampling the distance rij in MD simulations and placing each distance in an appropriate bin, [r, r+Dr]. Pair distribution functions are used in characterizing correlations between pair of atoms, e.g., hydrogen bonds, cation-carbonyl oxygen. The peak gives the average distance and the width, the strength of the interaction. When the distribution is isotropic (e.g., ion-water in an electrolyte), one samples all the atoms in the spherical shell, [r, r+Dr]. Thus to obtain the radial distribution function (RDF), the sampled number needs to be normalized by the volume ~4pr2Dr (as well as density)

Conformational analysis
In proteins, the bond lengths and bond angles are more or less fixed. Thus we are mainly interested in conformations of the torsional angles. As the shape of a protein is determined by the backbone atoms, the torsional angles, f (N−Ca) and y (C−Ca), are of particular interest. These are conveniently analyzed using the Ramachandran plots. Conformational changes in a protein during MD simulations can be most easily revealed by plotting these torsional angles as a function of time. Also of interest are the torsional angles of the side chains, c1, c2, etc. Each side chain has several energy minima (called rotamers), which are separated by low-energy barriers (~ kT). Thus transitions between different rotamers is readily achievable, and they could play functional roles, especially for charged side chains.

Time correlation functions & transport coefficients
At room temperature, all the atoms in the simulation system are in a constant motion characterized by their average kinetic energy: (3/2)kT. Free atoms or molecules diffuse according to the equation While this relationship can be used to determine the diffusion coefficient, more robust results can be obtained using correlation functions, e.g., D is related to the velocity autocorrelation function as Similarly, conductance of charged particles is given by the Kubo formula (current)

Bound atoms or groups of atoms fluctuate around a mean value.
Most of the high-frequency fluctuations (e.g. H atoms), do not directly contribute to the protein function. Nevertheless they serve as “lubricant” that enables large scale motions in proteins (e.g. domain motions) that do play a significant role in their function. As mentioned earlier, large scale motions occur in a ~ ms to s time scale, hence are beyond the present MD simulations. (Normal mode analysis provides an alternative.) But torsional fluctuations occur in the ns time domain, and can be studied in MD simulations using time correlation functions, e.g. These typically decay exponentially and such fluctuations can be described using the Langevin equation.

Free energy calculations
Free energy is the most important quantity that characterizes a dynamical process. Calculation of the absolute free energies is difficult in MD simulations. However free energy differences can be estimated more easily and several methods have been developed for this purpose. The starting point for most approaches is Zwanzig’s perturbation formula for the free energy difference between two states A and B: If the two states are not similar enough, there is a large hysteresis effect and the forward and backward results are not equal.

Alchemical transformation and free energy perturbation
To obtain accurate results with the perturbation formula, the energy difference between the states should be < 2 kT, which is not satisfied for most biomolecular processes. To deal with this problem, one introduces a hybrid Hamiltonian and performs the transformation from A to B gradually by changing the parameter l from 0 to 1 in small steps. That is, one divides [0,1] into n subintervals with {li, i = 0, n}, and for each li value, calculates the free energy difference from the ensemble average

The total free energy change is then obtained by summing the contributions from each subinterval
The number of subintervals is chosen such that the free energy change at each step is < 2 kT, otherwise the method may lose its validity. Thermodynamic integration : Another way to obtain the free energy difference is to integrate the derivative of the hybrid Hamiltonian: This integral is evaluated most efficiently using a Gaussian quadrature.

Absolute free energies from umbrella sampling
Here one samples the densities along a reaction coordinate and determines the potential of mean force (PMF) from the Boltzmann eq. Here z0 is a reference point, e.g. a point in bulk where W vanishes. In general, a particle cannot be adequately sampled at high-energy points. To counter that, one introduces harmonic potentials, which restrain the particle at desired points, and then unbias its effect. For convenience, one introduces umbrella potential at regular intervals along the reaction coordinate (e.g. ~0.5 Å). The PMF’s obtained in each interval are unbiased and optimally combined using the Weighted Histogram Analysis Method (WHAM).

Steered MD (SMD) simulations and Jarzynski’s equation
Steered MD is a more recent method where a harmonic force is applied to an atom on a peptide and the reference point of this force is pulled with a constant velocity. It has been used to study unfolding of proteins and binding of ligands. The discovery of Jarzynski’s equation in 1997 enabled determination of PMF from SMD, which has boosted its applications. Jarzynski’s equation: Work done by the harmonic force This method seems to work well in simple systems and when DF is large but beware of its applications in complex systems!

A critical look at simulation methods
Molecular dynamics is the “standard model” of biomolecules. A higher level (quantum) description is simply not feasible for any biomolecular system. A lower level description sacrifices the atomic detail and cannot be expected to succeed without guidance from MD. Nevertheless, MD simulations alone cannot provide a complete description of biomolecules, and hence we need to appeal to both higher and lower level theories to make progress. Quantum description (in a mixed QM/MM scheme) is required for Development and testing of force fields Enzyme reactions that involve making or breaking of bonds Transport of light particles (e.g., protons and electrons)

Stochastic description and coarse-graining are required for any process that is beyond the current capabilities of MD simulations, that is, more than atoms and longer than microseconds. Protein-protein interactions, protein aggregation (water in continuum, protein may be rigid, coarse-grained, semi-flexible or flexible) Aggregation of membrane proteins (water and lipid in continuum, protein as above) Ion channels and transporters: modelling of ion permeation (water and lipid in continuum, protein may be rigid or semi-flexible) Protein folding (water in continuum, protein coarse-grained) Self assembly of lipid bilayers and micelles (water and lipid molecules coarse-grained) DNA condensation (water in continuum, DNA coarse-grained)

The guiding principle: Occam’s razor
Use the simplest method that yields the answer, provided: The method is justified (domain of validity) The parameters employed can be derived from a more fundamental theory. Most of the people who use molecular dynamics or other simulation methods treat them as black boxes and apply them to biomolecules without worrying too much whether they are valid or relevant. You also need to watch out for experts who publish only the good results! A cautionary tale: Continuum description of ion channels using the Poisson-Nernst-Planck (drift-diffusion) equations.

Poisson-Nernst-Planck (PNP) Equations
Poisson equation Nernst-Planck equation Here, f: potential, r: charge density, J: flux, n: concentration of ions PNP equations are solved simultaneously, yielding potential and flux Assumption: Ions are represented by an average charge density, zvenv which invokes a mean-field approximation. This is alright in a bulk electrolyte, but in ion channels with radius smaller than the Debye length (8 Å for 0.15 M), one should worry about the validity of PNP equations.

The problem: Ions induce like-charges at a water-protein interface, which
repel them. When ions are represented as a continuous charge density, this image force (or dielectric self-energy) is severely underestimated. Induced charges at the water-protein interface e = 80 e = 2 + + + + + + + Image force on an ion water protein In application of PNP to Ca channels, diffusion coeff. is reduced by 10-4 !

A simple test: PNP vs Brownian dynamics (BD)
Control study: Set artificially ε = 80 in the protein. No induced charges on the boundary, hence no discrepancy between the two methods r = 4 Å Cl - Na +

Realistic case: In the realistic case (ε = 2 in the protein), ions do not enter the channel in BD due to the dielectric self-energy barrier. Only in large pores (r > 10 Å), validity of PNP is restored. r = 4 Å PNP BD