Statistical Mechanics for Free Energy Calculations WARNING: MATH AHEAD !!!
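Statistical mechanics: some definitions
- Hamiltonian: H(r,p) = K(p) + V(r), the kinetic plus the potential energy
- Microscopic state: a point in phase space {r,p}
- Canonical partition function: Z = ∫∫ exp(−H(r,p)/k_B T) dr dp (up to a constant prefactor)
- Z is the object at the heart of statistical mechanics: Z and its derivatives give access to any thermodynamic property
- Evaluating Z requires sampling of all configurations; fortunately the major contributions are due to low-energy states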
Statistical mechanics
- Probability to find a particular microstate (Boltzmann distribution): P(r,p) = exp(−H(r,p)/k_B T) / Z
- The average of any property A can be written as an ensemble (thermodynamic) average: <A> = ∫∫ A(r,p) P(r,p) dr dp
- For example, the internal energy: U = <H>
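As a small illustration of the Boltzmann distribution and the ensemble average, the sketch below replaces the phase-space integral by a sum over a handful of discrete states. The four energies are made-up numbers chosen only for illustration, and the units (kcal/mol) are an assumption.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_probabilities(energies, temperature):
    """Return p_i = exp(-E_i/kT) / Z for a list of energies in kcal/mol."""
    kt = KB_KCAL * temperature
    e_min = min(energies)  # shifting by the minimum avoids overflow; it cancels in p_i
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)  # discrete partition function (for the shifted energies)
    return [w / z for w in weights]

def ensemble_average(values, probabilities):
    """Return <A> = sum_i A_i * p_i."""
    return sum(v * p for v, p in zip(values, probabilities))

# Hypothetical state energies in kcal/mol (illustration only).
energies = [0.0, 0.5, 1.0, 3.0]
probs = boltzmann_probabilities(energies, 300.0)
internal_energy = ensemble_average(energies, probs)
print(probs, internal_energy)
```

The state at 3 kcal/mol carries very little weight at 300 K, which is the sense in which low-energy states dominate the averages.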
Statistical mechanics
- Molecular dynamics samples phase space by following a trajectory in time, {r(t),p(t)}
- This gives access to dynamic (time) averages
- Ergodic hypothesis: for a sufficiently long trajectory, the time average equals the ensemble average
- This is used as justification to calculate thermodynamic averages from MD trajectories
- However, proper sampling can be problematic
Statistical mechanics Molecule with two conformations Relative population of two states at room temperature
Statistical mechanics Relative populations at different temperatures
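The two-state picture can be made concrete with a few lines of code. This is a minimal sketch that assumes the free energy difference ΔG = G_B − G_A between the two conformations is given in kcal/mol and does not itself change with temperature; the value of 1.0 kcal/mol is a made-up example.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def population_ratio(delta_g, temperature):
    """Return P_B / P_A = exp(-delta_g / kT) for delta_g = G_B - G_A."""
    return math.exp(-delta_g / (KB_KCAL * temperature))

def fraction_in_b(delta_g, temperature):
    """Return the fraction of molecules found in state B."""
    ratio = population_ratio(delta_g, temperature)
    return ratio / (1.0 + ratio)

# Hypothetical free energy difference of 1.0 kcal/mol at three temperatures.
for temperature in (200.0, 300.0, 400.0):
    print(temperature, round(fraction_in_b(1.0, temperature), 3))
```

Raising the temperature moves population into the higher free energy state, and the two states become equally populated only when ΔG is zero.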
Can we get free energy differences just by running MD long enough? Only if the barriers are very low and we visit each state many times. Important: if you have 10000 total frames in your simulation, the MAXIMUM possible value of ΔG you can observe at 300 K corresponds to a 1:9999 split of the frames between the two states: -0.6*ln(1/9999) ≈ 5.5 kcal/mol (using k_B T ≈ 0.6 kcal/mol).
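The frame-count limit can be checked numerically. The sketch below assumes the most lopsided observable case, one frame in the rare state and all others in the common state, and uses k_B T = 0.596 kcal/mol at 300 K rather than the rounded 0.6, so the result comes out slightly lower than with the rounded value.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def max_observable_delta_g(n_frames, temperature=300.0):
    """Largest delta G (kcal/mol) resolvable when one frame out of n_frames
    sits in the rare state and the remaining n_frames - 1 in the common state."""
    if n_frames < 2:
        raise ValueError("need at least two frames to compare two states")
    kt = KB_KCAL * temperature
    return -kt * math.log(1.0 / (n_frames - 1))

for n in (1000, 10000, 100000):
    print(n, round(max_observable_delta_g(n), 2))
```

Because the limit grows only logarithmically, ten times more frames buys roughly 1.4 kcal/mol of additional range at 300 K.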
Statistical mechanics: the absolute free energy
- A: Helmholtz free energy, canonical (N,V,T) ensemble, A = −k_B T ln Z
- G: Gibbs free energy, isothermal-isobaric (N,P,T) ensemble
- Difficult to compute because high-energy states contribute, while MD (or MC) samples the low-energy regions of phase space
Phase space
- Phase space is the idea of describing a system in terms of its position and momentum coordinates, which together define all possible states.
- The concept of phase space is only about 100 years old and lies at the root of statistical thermodynamics.
- A phase space description of a system is what we should work with but cannot, for various reasons:
  - Can't be computed
  - Can't be visualized/understood
  - Can't be measured
So we use reaction coordinates
- Typically one- or two-dimensional
- Be aware of the massively reduced dimensionality (effectively integrating over 10^20 orthogonal DOF)
- This allows us to define comprehensible states (e.g. folded vs. unfolded)
- Note: the whole notion of phase space is at odds with quantum mechanics; we use a classical approximation here because we lack equivalent QM statistics tools
MD Simulations
- The purpose of MD simulations is to explore phase space
- A single structure is only one microstate!
- A trajectory approximates a (small) region of phase space
- We are often interested in transitions and equilibria between states
- Sadly, MD does not always give the desired results (in finite time)
- At this point, free energy calculations can be employed
Motivation for Free Energy Calculations
Free energy of binding. Binding could mean:
- Activation
- Deactivation
- Destruction
- Competitive inhibition
- …
"Corpora non agunt nisi fixata" (No compound is active unless it is bound by a receptor), Paul Ehrlich, 1913
Free energy calculations
- Direct calculation of free energies is not feasible
- The quantity of interest is usually a free energy difference
- Several approaches are in use:
  - Non-equilibrium: Jarzynski's equation
  - Equilibrium, real coordinates: umbrella sampling
  - Equilibrium, abstract coordinates: FEP / thermodynamic integration
Thermodynamic Cycles Useful for calculating relative free energies Relative free energy of solvation For instance, solvation of benzene versus toluene.
Thermodynamic Cycles Relative free energy of binding Two different drugs binding to the same target.
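The bookkeeping behind such a cycle is short enough to write out. Because free energy is a state function, the sum around the closed cycle is zero, so the difference of the two (hard to compute) binding free energies equals the difference of the two (easier) alchemical legs. The sketch below assumes drug A is mutated into drug B once in the bound complex and once free in solution; all numbers are made up for illustration.

```python
def relative_binding_free_energy(dg_mutation_bound, dg_mutation_free):
    """Return ddG_bind = dG_bind(B) - dG_bind(A).

    dg_mutation_bound: free energy of the A -> B transformation in the complex
    dg_mutation_free:  free energy of the A -> B transformation free in solution
    """
    return dg_mutation_bound - dg_mutation_free

# Hypothetical values in kcal/mol (illustration only).
dg_bind_a = -8.0         # binding free energy of drug A
dg_mutation_bound = 1.5  # A -> B in the binding site
dg_mutation_free = 0.5   # A -> B in water

ddg = relative_binding_free_energy(dg_mutation_bound, dg_mutation_free)
dg_bind_b = dg_bind_a + ddg
print(ddg, dg_bind_b)
```

The same subtraction gives a relative solvation free energy if the two legs are the transformation in water and in vacuum.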
Free Energy Perturbation
Free energy perturbation
Free energy perturbation
- Called FEP although the result is formally exact; the name reflects that it connects the perturbed system B to the reference system A
- Popular and quick to implement
- Potential statistics / sampling problems:
  - A and B need to overlap in phase space
  - The perturbation V_AB must be small
  - The transformation must be divided into small steps
FEP: Severe sampling problems
- ΔF: taking the logarithm of the average of the exponential of a noisy function is a bad idea!
- Often done in multiple small steps, mixing the potential functions
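To see why the exponential average is fragile, here is a minimal sketch of the standard FEP (Zwanzig) estimator, ΔF = −k_B T ln <exp(−(V_B − V_A)/k_B T)>_A, where the average runs over configurations sampled with potential A. The energy differences are drawn from a Gaussian purely as an assumption for the demonstration; for that special case the exact answer is known, ΔF = μ − σ²/(2 k_B T).

```python
import math
import random

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def fep_delta_f(delta_v_samples, temperature=300.0):
    """Zwanzig estimator: -kT * ln( mean( exp(-dV/kT) ) ), dV in kcal/mol."""
    kt = KB_KCAL * temperature
    exponents = [-dv / kt for dv in delta_v_samples]
    m = max(exponents)  # log-sum-exp shift keeps exp() from overflowing
    mean_exp = sum(math.exp(x - m) for x in exponents) / len(exponents)
    return -kt * (m + math.log(mean_exp))

# Hypothetical Gaussian energy differences (mean 2.0, width 1.0 kcal/mol).
random.seed(0)
mu, sigma = 2.0, 1.0
samples = [random.gauss(mu, sigma) for _ in range(1000)]
estimate = fep_delta_f(samples)
analytic = mu - sigma ** 2 / (2.0 * KB_KCAL * 300.0)
print(estimate, analytic)
```

The estimate is dominated by the few samples with the lowest V_B − V_A, so rerunning with a different seed or a larger σ shows how strongly the result depends on rarely sampled configurations.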
Free energy perturbation Subdivide into N steps (“Computational Alchemy”)
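Subdividing the transformation means applying the FEP formula between neighbouring intermediate states and adding up the pieces, ΔF = Σ_i ΔF(λ_i → λ_i+1). The sketch below assumes each window already provides samples of V(λ_i+1) − V(λ_i) collected in the ensemble at λ_i; the function names and the sample values are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def fep_delta_f(delta_v_samples, temperature=300.0):
    """Single-step FEP estimate from energy differences in kcal/mol."""
    kt = KB_KCAL * temperature
    exponents = [-dv / kt for dv in delta_v_samples]
    m = max(exponents)  # log-sum-exp shift for numerical stability
    mean_exp = sum(math.exp(x - m) for x in exponents) / len(exponents)
    return -kt * (m + math.log(mean_exp))

def staged_fep(windows, temperature=300.0):
    """Sum the per-window estimates.

    windows[i] holds samples of V(lambda_{i+1}) - V(lambda_i),
    evaluated on configurations sampled at lambda_i.
    """
    steps = [fep_delta_f(w, temperature) for w in windows]
    return sum(steps), steps

# Hypothetical, noise-free windows (illustration only).
total, steps = staged_fep([[0.2] * 5, [0.3] * 5, [0.5] * 5])
print(total, steps)
```

Each individual step now involves a small perturbation between strongly overlapping ensembles, which is what keeps the exponential average well behaved.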
Thermodynamic Integration and Softcore Potentials
Thermodynamic Integration Often linear mixing is used: f(λ)=λ
Thermodynamic Integration
- The integral has to be solved by numerical quadrature
- Perform a series of simulations corresponding to discrete values of λ and form the average of the derivative of the Hamiltonian, <∂H/∂λ>, at each λ
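In thermodynamic integration the free energy difference is ΔF = ∫ <∂H/∂λ>_λ dλ, taken from λ=0 to λ=1. A minimal sketch of the quadrature step follows, assuming the per-window averages of ∂H/∂λ have already been extracted from the simulations. The trapezoidal rule is one simple choice among several, and the λ grid and averages below are made up.

```python
def ti_trapezoid(lambdas, dvdl_means):
    """Integrate <dH/dlambda> over lambda with the trapezoidal rule."""
    if len(lambdas) != len(dvdl_means):
        raise ValueError("need one average per lambda value")
    if len(lambdas) < 2:
        raise ValueError("need at least two lambda points")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (dvdl_means[i] + dvdl_means[i + 1])
    return total

# Hypothetical lambda grid and per-window averages in kcal/mol.
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
dvdl_means = [2.0 * lam for lam in lambdas]  # a straight line, integral = 1
delta_f = ti_trapezoid(lambdas, dvdl_means)
print(delta_f)
```

For a strongly curved <∂H/∂λ> profile the trapezoidal rule needs more closely spaced windows where the curvature is largest, which is one reason window placement matters.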
Practical Issues for TI/FEP
- How many windows are needed?
- Where are the windows placed?
- How long should each window be run?
- Should windows be run simultaneously or consecutively?
- How to pick the region that changes?
Ready for ΔΔGbind via FEP/TI? Relative free energy of binding
Drug Design Applications Potential Drug Design applications
Adding and Removing Atoms
Adding and removing atoms
For example via "dummy atoms":
- Start state (λ=0): the molecule exists
- End state (λ=1): the molecule is made of "ghost" atoms
Adding and removing atoms The interaction between the molecule and the solvent at λ=0 is the full Lennard-Jones potential, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], and at λ=1 is zero.
Adding and removing atoms But as λ → 1, the repulsive wall is still infinite! This means that water cannot occupy the empty space when it tries to. This leads to serious convergence problems.
The origin singularity effect (plot: red, LJ potential; green, LJ at λ=0.99) It is very hard to converge and fluctuations are very large.
A modified vdW Potential: Softcore With this change, the calculation does not blow up at the end points.
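The slide's own formula is not reproduced here, so the sketch below uses one commonly used softcore Lennard-Jones form as an assumption: V_SC(r) = 4ε(1−λ)[1/(αλ + (r/σ)^6)^2 − 1/(αλ + (r/σ)^6)]. The value α = 0.5 and the ε, σ numbers are illustrative choices, with λ=0 as the fully interacting state and λ=1 as the ghost state.

```python
def lj(r, epsilon, sigma):
    """Plain Lennard-Jones potential; diverges as r -> 0."""
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)

def softcore_lj(r, lam, epsilon, sigma, alpha=0.5):
    """Softcore LJ (assumed form): finite at r = 0 for any lam > 0."""
    s = alpha * lam + (r / sigma) ** 6
    return 4.0 * epsilon * (1.0 - lam) * (1.0 / (s * s) - 1.0 / s)

# Hypothetical parameters: epsilon in kcal/mol, sigma and r in Angstrom.
epsilon, sigma = 0.1, 3.4
for lam in (0.0, 0.5, 0.99):
    print(lam, [round(softcore_lj(r, lam, epsilon, sigma), 4) for r in (2.0, 3.0, 4.0)])
print("value at r = 0 for lam = 0.5:", softcore_lj(0.0, 0.5, epsilon, sigma))
```

At λ=0 the expression reduces to the ordinary Lennard-Jones potential, and for any λ > 0 the αλ term in the denominator caps the repulsive wall, so solvent can gradually move into the space the disappearing atoms occupied.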
Dual Topology Approach
Softcore TI: Mixing of Forces
Softcore TI: Modified mdout Style

      A V E R A G E S   O V E R  100000 S T E P S

 NSTEP =   100000   TIME(PS) =     220.000  TEMP(K) =   300.03  PRESS =     3.9
 Etot   =     -6966.1798  EKtot   =      1623.1799  EPtot      =     -8589.3596
 BOND   =         0.0000  ANGLE   =         0.0000  DIHED      =         0.0000
 1-4 NB =         0.0000  1-4 EEL =         0.0000  VDWAALS    =      1280.5874
 EELEC  =     -9869.9470  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 DV/DL  =         3.1777
 EKCMT  =       807.0253  VIRIAL  =       804.6984  VOLUME     =     27514.2972
                                                    Density    =         0.9842
 Ewald error estimate:   0.2081E-03
 ------------------------------------------------------------------------------
  Softcore part of the system:   15 atoms,         TEMP(K)    =         299.13
 SC_BOND=         5.7267  SC_ANGLE=         4.3317  SC_DIHED   =         2.7563
 SC_14NB=         3.9414  SC_14EEL=         0.0000  SC_EKIN    =        13.3748
Softcore TI example: Solvated Toluene
Softcore TI example: Solvated Toluene
Resulting free energy curve (plotted against λ)
- 69 λ-points
- 20 ps equilibration
- 2 ns data collection
- ΔG(Solv) = 2.19 kcal/mol
- Published: 2.45 kcal/mol
Softcore TI: data collection (plotted against time) Data collection is still very noisy; be cautious.
Softcore TI: AMBER input See the chapter on free energy calculations in the AMBER 14 manual.
Free energies along a defined reaction coordinate via Umbrella Sampling
Umbrella Sampling
- How to obtain free energy changes associated with conformational changes?
- How to force barrier crossings without compromising thermodynamic properties?
- Problem: spontaneous transitions are very slow
Umbrella Sampling
- What is the free energy profile along the indicated dihedral angle?
- Define a reaction coordinate
- The free energy as a function of that coordinate is called a potential of mean force (PMF); it requires a partition function integrated over all coordinates except the one we want to look at
Umbrella Sampling
- One could just run dynamics and wait until all space has been sampled. If one then extracts the probability density ρ(x_k) from the trajectory, the PMF can be written as W(x_k) = −k_B T ln ρ(x_k) + constant
- This is called unbiased sampling
- However, it takes forever to properly sample all conformations and to jump over the barrier
- The solution is to bias the system towards whatever value of the coordinate we want
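The unbiased estimate can be sketched in a few lines: histogram the reaction coordinate along the trajectory and take W(x_k) = −k_B T ln ρ(x_k), shifted so that the lowest bin is zero. The bin layout and the toy "trajectory" below are assumptions made only for illustration.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def pmf_from_samples(samples, x_min, x_max, n_bins, temperature=300.0):
    """Unbiased PMF in kcal/mol from samples of a reaction coordinate.

    Bins that were never visited get an infinite free energy; samples
    outside [x_min, x_max) are ignored.
    """
    kt = KB_KCAL * temperature
    width = (x_max - x_min) / n_bins
    counts = [0] * n_bins
    for x in samples:
        k = math.floor((x - x_min) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the histogram range")
    pmf = [(-kt * math.log(c / total)) if c > 0 else math.inf for c in counts]
    lowest = min(pmf)
    return [w - lowest for w in pmf]

# Toy data: 90 frames in one basin, 10 frames in the other.
samples = [0.5] * 90 + [1.5] * 10
pmf = pmf_from_samples(samples, 0.0, 2.0, 2)
print(pmf)
```

Bins near a barrier top are visited rarely or never in an unbiased run, so their entries are either very noisy or infinite, which is exactly the problem that biasing the system addresses.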