3. Optimization Methods for Molecular Modeling by Barak Raveh.


Outline Introduction Local Minimization Methods (derivative-based) – Gradient (first order) methods – Newton (second order) methods Monte-Carlo Sampling (MC) – Introduction to MC methods – Markov-chain MC methods (MCMC) – Escaping local-minima

Prerequisites for Tracing the Minimal Energy Conformation. I. The energy function: the in-silico energy function should correlate with the (intractable) physical free energy; in particular, they should share the same global energy minimum. II. The sampling strategy: our sampling strategy should efficiently scan the (enormous) space of protein conformations.

The Problem: Find Global Minimum on a Rough One Dimensional Surface. Rough = has a multitude of local minima at a multitude of scales. *Adapted from slides by Chen Keasar, Ben-Gurion University

The Problem: Find Global Minimum on a Rough Two Dimensional Surface. The landscape is rough because both small pits and the Sea of Galilee are local minima. *Adapted from slides by Chen Keasar, Ben-Gurion University

The Problem: Find Global Minimum on a Rough Multi-Dimensional Surface. A protein conformation is defined by the set of Cartesian atom coordinates (x, y, z), or by internal coordinates (φ/ψ/χ torsion angles; bond angles; bond lengths). The conformation space of a protein with 100 residues has ≈ 3000 dimensions. The X-ray structure of a protein is a point in this space. A 3000-dimensional space cannot be systematically sampled, visualized or comprehended. *Adapted from slides by Chen Keasar, Ben-Gurion University

Characteristics of the Protein Energetic Landscape: smooth? rugged? (Figure axes: space of conformations vs. energy.) Images by Ken Dill

Outline Introduction Local Minimization Methods (derivative-based) – Gradient (first order) methods – Newton (second order) methods Monte-Carlo Sampling (MC) – Introduction to MC methods – Markov-chain MC methods (MCMC) – Escaping local-minima

Local Minimization Allows the Correction of Minor Local Errors in Structural Models. Example: removing clashes from X-ray models. *Adapted from slides by Chen Keasar, Ben-Gurion University

What kind of minima do we want? The path to the closest local minimum = local minimization. *Adapted from slides by Chen Keasar, Ben-Gurion University


A Little Math – Gradients and Hessians. Gradients and Hessians generalize the first and second derivatives (respectively) of multi-variate scalar functions (= functions from vectors to scalars). For Energy = f(x1, y1, z1, …, xn, yn, zn), the gradient ∇f is the vector of first partial derivatives (∂f/∂x1, ∂f/∂y1, …, ∂f/∂zn), and the Hessian H is the matrix of all second partial derivatives, Hij = ∂²f/∂xi∂xj.

Analytical Energy Gradient (i) Cartesian Coordinates. E = f(x1, y1, z1, …, xn, yn, zn). Energy, work and force: recall that energy (= work) is defined as force integrated over distance. The energy gradient in Cartesian coordinates is therefore (minus) the vector of forces that act upon the atoms (but this is not exactly so for statistical energy functions, which aim at the free energy ΔG). Example: Van der-Waals energy between pairs of atoms – O(n²) pairs.

Analytical Energy Gradient (ii) Internal Coordinates (torsions, etc.). E = f(φ1, ψ1, …, χ11, χ12, …). Note: for simplicity, bond lengths and bond angles are often ignored. Enrichment: transforming a gradient between Cartesian and internal coordinates (see Abe, Braun, Noguti and Gō, 1984; Wedemeyer and Baker, 2003). Consider an infinitesimal rotation δθ of a vector r around a unit vector n; from classical mechanics it can be shown that the resulting displacement is δθ·(n × r) (cross product, right-hand rule). Using the fold-tree (previous lesson), we can recursively propagate changes in internal coordinates to the whole structure (see Wedemeyer and Baker, 2003). Adapted from an image by Sunil Singh.

Gradient Calculations – Cartesian vs. Internal Coordinates. For some terms, gradient computation is simpler and more natural in Cartesian coordinates; for others it is harder: Distance / Cartesian dependent: Van der-Waals term; electrostatics; solvation. Internal-coordinates dependent: bond lengths and angles; Ramachandran and Dunbrack terms (in Rosetta). Combination: hydrogen bonds (in some force fields). Reminder: internal coordinates provide a natural distinction between soft constraints (flexibility of φ/ψ torsion angles) and hard constraints with steep gradients (fixed lengths of covalent bonds) → the energy landscape in Cartesian coordinates is more rugged.

Analytical vs. Numerical Gradient Calculations. Analytical solutions require a closed-form algebraic formulation of the energy score. Numerical solutions try to approximate the gradient (or Hessian): – simple example: f'(x) ≈ (f(x+h) − f(x)) / h – another example: the Secant method (soon).
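The finite-difference idea above can be sketched in Python (an illustrative helper, not from the slides; the central difference (f(x+h) − f(x−h)) / 2h is a bit more accurate than the one-sided formula):

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a
    multi-variate scalar function f at the point x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# f(x, y) = x^2 + 3y has analytical gradient (2x, 3) -> (4, 3) at (2, 1)
g = numerical_gradient(lambda v: v[0] ** 2 + 3 * v[1], [2.0, 1.0])
```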

Outline Introduction Local Minimization Methods (derivative-based) – Gradient (first order) methods – Newton (second order) methods Monte-Carlo Sampling (MC) – Introduction to MC methods – Markov-chain MC methods (MCMC) – Escaping local-minima

Gradient Descent Minimization Algorithm – sliding down an energy gradient. (Figure labels: "good (= global) minimum" vs. "local minimum".) Image by Ken Dill

Gradient Descent – System Description. 1. Coordinates vector (Cartesian or internal coordinates): X = (x1, x2, …, xn). 2. Differentiable energy function: E(X). 3. Gradient vector: ∇E(X) = (∂E/∂x1, …, ∂E/∂xn). *Adapted from slides by Chen Keasar, Ben-Gurion University

Gradient Descent Minimization Algorithm: Parameters: λ = step size; ε = convergence threshold. x = random starting point. While ‖∇E(x)‖ > ε: – compute ∇E(x) – x_new = x − λ∇E(x) – line search: find the best step size λ, in order to minimize E(x_new) (discussion later). Note on the convergence condition: at a local minimum the gradient must be zero (but not always the other way around).
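The loop above can be sketched as follows (a minimal sketch with a fixed step size; the toy quadratic "energy" is an assumption for illustration):

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, eps=1e-6, max_iter=10_000):
    """Minimize E by stepping against its gradient until the
    gradient norm falls below the convergence threshold eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        x = x - step * g               # descend: minus the gradient
    return x

# Toy "energy" E(x) = (x0 - 1)^2 + (x1 + 2)^2 with minimum at (1, -2)
x_min = gradient_descent(lambda x: 2.0 * (x - np.array([1.0, -2.0])),
                         [0.0, 0.0])
```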

Line Search Methods – Solving argmin_λ E[x − λ∇E(x)]: (1) this is also an optimization problem, but in one dimension… (2) inexact solutions are probably sufficient. Interval bracketing (e.g., golden section, parabolic interpolation, Brent's search): bracket the local minimum by intervals of decreasing length; always finds a local minimum. Backtracking (e.g., with Armijo / Wolfe conditions): multiply the step size λ by c < 1 until some condition is met; variations allow λ to increase as well. 1-D Newton and Secant methods: we will talk about these soon…
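Backtracking with the Armijo sufficient-decrease condition can be sketched as follows (hypothetical helper; the constants c and rho are illustrative choices):

```python
def backtracking(f, grad, x, d, step=1.0, c=0.5, rho=0.5):
    """Shrink the step until the Armijo sufficient-decrease condition
    f(x + step*d) <= f(x) + c*step*<grad(x), d> holds."""
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(grad(x), d))
    while f([xi + step * di for xi, di in zip(x, d)]) > fx + c * step * slope:
        step *= rho
    return step

# f(x) = x^2 at x = 3, searching along d = -grad = [-6]
lam = backtracking(lambda v: v[0] ** 2, lambda v: [2.0 * v[0]], [3.0], [-6.0])
```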

Pathologically Slow Convergence of Gradient Descent – 2-D Rosenbrock's Function: a Banana-Shaped Valley. The (very common) problem: a narrow, winding "valley" in the energy landscape → the narrow valley results in minuscule, zigzag steps. (Figure: progress after 0, 10, 100 and 1000 iterations.)

(One) Solution: Conjugate Gradient Descent. Use a (smart) linear combination of gradients from previous iterations to prevent zigzag motion. Parameters: λ = step size; ε = convergence threshold. x0 = random starting point; Λ0 = −∇E(x0). While ‖Λi‖ > ε: – x_{i+1} = x_i + λ·Λ_i (line search: adjust the step size λ to minimize E(x_{i+1})) – Λ_{i+1} = −∇E(x_{i+1}) + β_i·Λ_i – the choice of β_i is important. The new search direction is "A-orthogonal" to all previous search directions, for exact line search. Works best when the surface is approximately quadratic near the minimum (convergence in N iterations); otherwise the search must be reset every N steps (N = dimension of the space).
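A minimal sketch of nonlinear conjugate gradient descent with the Fletcher–Reeves choice of β, assuming an exact line search is supplied (here derived analytically for a toy quadratic, which is an assumption for illustration):

```python
import numpy as np

def conjugate_gradient(grad, x0, line_search, eps=1e-8, max_iter=100):
    """Nonlinear conjugate gradient descent with the Fletcher-Reeves
    choice of beta; old directions are reused to avoid zigzag steps."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps:
            break
        lam = line_search(x, d)
        x = x + lam * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves
        d = -g_new + beta * d
        g = g_new
    return x

# Toy quadratic E(x) = 0.5 x^T A x - b^T x, minimum at A^{-1} b = (2, -2)
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
grad = lambda x: A @ x - b
exact_step = lambda x, d: -(grad(x) @ d) / (d @ A @ d)  # exact line search
x_min = conjugate_gradient(grad, [0.0, 0.0], exact_step)
```

On a quadratic with exact line search this is linear CG, which converges in at most N = 2 iterations here.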

A-orthogonality. We would like to "stretch" the narrow valleys, and then search only in orthogonal directions, to prevent a waste of time. Image taken from: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Shewchuk 1994

Outline Introduction Local Minimization Methods (derivative-based) – Gradient (first order) methods – Newton (second order) methods Monte-Carlo Sampling (MC) – Introduction to MC methods – Markov-chain MC methods (MCMC) – Escaping local-minima

Root Finding – when is f(x) = 0?

Taylor's Series. The full series: f(x) = Σ_{n=0}^{∞} f⁽ⁿ⁾(a)/n! · (x−a)ⁿ. First order approximation: f(x) ≈ f(a) + f'(a)(x−a). Second order approximation: f(x) ≈ f(a) + f'(a)(x−a) + ½ f''(a)(x−a)². Example: e^x = Σ_{n=0}^{∞} xⁿ/n! (around a = 0).

Taylor's Approximation of f(x) = e^x.
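The quality of the first- and second-order approximations can be checked numerically (illustrative snippet):

```python
import math

def taylor_exp(x, order):
    """Partial sum of the Taylor series of e**x around a = 0."""
    return sum(x ** n / math.factorial(n) for n in range(order + 1))

approx1 = taylor_exp(0.5, 1)   # first order:  1 + x
approx2 = taylor_exp(0.5, 2)   # second order: 1 + x + x**2 / 2
exact = math.exp(0.5)          # each extra order tightens the fit
```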

Taylor’s Approximation of f(x) = sin(x) 2x at x=1.5

From Taylor's Series to Root Finding (one dimension). First order approximation: f(x) ≈ f(x_n) + f'(x_n)(x − x_n). Root finding by Taylor's approximation: setting the right-hand side to zero gives x = x_n − f(x_n)/f'(x_n).

Newton-Raphson Method for Root Finding (one dimension): 1. Start from a random x0. 2. While not converged, update x with Taylor's series: x_{n+1} = x_n − f(x_n)/f'(x_n).
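The update rule above can be sketched directly (illustrative code; the example root is an assumption):

```python
def newton_raphson(f, fprime, x0, tol=1e-12, max_iter=50):
    """Root finding: iterate x <- x - f(x)/f'(x) until the update
    becomes negligible."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Root of f(x) = x^2 - 2, starting near 1: converges to sqrt(2)
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```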

Newton-Raphson: Quadratic Convergence Rate. THEOREM: Let x_root be a "nice" root of f(x). There exists a "neighborhood" of some size Δ around x_root, in which Newton's method will converge towards x_root quadratically (= the error decreases quadratically in each round).

The Secant Method (one dimension). Just like Newton-Raphson, but approximate the derivative by drawing a secant line through the two previous points: f'(x_n) ≈ (f(x_n) − f(x_{n−1})) / (x_n − x_{n−1}). Secant algorithm: 1. Start from two random points: x0, x1. 2. While not converged: x_{n+1} = x_n − f(x_n)·(x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})). Theoretical convergence rate: the golden ratio (~1.62). Often faster in practice: no gradient calculations.
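In code, the derivative in Newton's update is simply replaced by the secant slope (illustrative sketch):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Root finding without derivatives: replace f'(x) by the slope
    of the secant line through the two most recent points."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:                   # flat secant line: cannot divide
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
        if abs(x1 - x0) < tol:
            break
    return x1

# Same toy root as before: f(x) = x^2 - 2, bracketed-ish by 1 and 2
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```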

Newton's Method: from Root Finding to Minimization. Second order approximation of f(x): f(x) ≈ f(x_n) + f'(x_n)(x − x_n) + ½ f''(x_n)(x − x_n)². The minimum of the approximation is reached when its derivative (by x) is zero: f'(x_n) + f''(x_n)(x − x_n) = 0, i.e., x = x_n − f'(x_n)/f''(x_n). So… this is just root finding over the derivative (which makes sense, since at a local minimum the gradient is zero).

Newton's Method for Minimization: 1. Start from a random x = x0. 2. While not converged, update x with Taylor's series: x_{n+1} = x_n − f'(x_n)/f''(x_n). Notes: if f'(x) = 0 and f''(x) > 0, then x is surely a local minimum point. We can choose a step size other than one.

Newton's Method for Minimization: Higher Dimensions. 1. Start from a random vector x = x0. 2. While not converged, update x with Taylor's series: x_{n+1} = x_n − H⁻¹(x_n)·∇f(x_n). Notes: H is the Hessian matrix (the generalization of the second derivative to higher dimensions). We can choose a different step size using line search (see previous slides).
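In practice the update solves the linear system H·p = ∇f instead of inverting H explicitly (a minimal sketch on a toy quadratic, where a single Newton step lands exactly on the minimum):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimization: solve H p = grad and step
    x <- x - p (equivalent to x - H^{-1} grad, but cheaper and stabler)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)
    return x

# Quadratic bowl E(x) = (x0 - 1)^2 + 10*(x1 + 3)^2: one step suffices
x_min = newton_minimize(
    lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 3.0)]),
    lambda x: np.diag([2.0, 20.0]),
    [5.0, 5.0])
```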

Generalizing the Secant Method to Higher Dimensions: Quasi-Newton Methods. Calculating the Hessian (2nd derivative) is expensive → approximate the Hessian numerically from successive gradients. Popular methods: – DFP (Davidon–Fletcher–Powell) – BFGS (Broyden–Fletcher–Goldfarb–Shanno) – combinations. Timeline: Newton-Raphson (17th century) → Secant method → DFP (1959, 1963) → Broyden's method for roots (1965) → BFGS (1970).
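A compact sketch of the BFGS idea: maintain an inverse-Hessian estimate and update it from gradient differences only, with no Hessian evaluations (illustrative code with a crude Armijo backtracking line search; not any particular package's implementation):

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS sketch: B approximates the inverse Hessian and is
    refreshed from the step s and the gradient difference y."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    B = np.eye(n)                      # inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -B @ g                     # quasi-Newton direction
        lam = 1.0                      # crude Armijo backtracking
        while f(x + lam * d) > f(x) + 1e-4 * lam * (g @ d):
            lam *= 0.5
        s = lam * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                 # curvature condition keeps B pos. def.
            rho = 1.0 / sy
            I = np.eye(n)
            B = (I - rho * np.outer(s, y)) @ B @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Toy energy: an ill-conditioned quadratic with minimum at (1, -3)
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 3.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 3.0)])
x_min = bfgs(f, grad, [0.0, 0.0])
```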

Comparison of Derivative Methods. Quasi-Newton methods take fewer iterations to converge than conjugate gradient descent, but: – the computation in each iteration is more complex – they take more memory (L-BFGS is a limited-memory variant). Among quasi-Newton methods, BFGS is generally considered superior to the older DFP; the two can also be combined. Rosetta currently uses DFP, with Brent or Armijo line search.

Some More Resources on Gradient and Newton Methods. Conjugate gradient descent. Quasi-Newton methods: HUJI course on non-linear optimization by Benjamin Yakir. Line search: Wikipedia…

Outline Introduction Local Minimization Methods (derivative-based) – Gradient (first order) methods – Newton (second order) methods Monte-Carlo Sampling (MC) – Introduction to MC methods – Markov-chain MC methods (MCMC) – Escaping local-minima

Harder Goal: Move from an Arbitrary Model to a Correct One. Example: predict a protein structure from its AA sequence, starting from an arbitrary starting point. *Adapted from slides by Chen Keasar, Ben-Gurion University

Snapshots after 10, 100, 200, 400, 800, 1000, 1200, 1400, 1600, 1800, 2000, 4000 and 7000 iterations. This time the search succeeded; in many cases it does not. *Adapted from slides by Chen Keasar, Ben-Gurion University

What kind of paths do we want? The path to the global minimum. *Adapted from slides by Chen Keasar, Ben-Gurion University


Monte-Carlo Methods (a.k.a. MC simulations, MC sampling or MC search). Monte-Carlo ("casino") methods are a very general term for estimations that are based on a series of random samples. – Samples can be dependent or independent. – MC physical simulations are most famous for their role in the Manhattan Project. (The uncle of the Polish mathematician Stanisław Marcin Ulam was said to be a heavy gambler – hence the name.)

Example: Estimating π by Independent Monte-Carlo Samples (I). Suppose we throw darts randomly (and uniformly) at the square enclosing a quarter circle of radius r. Algorithm: For i = [1..ntrials]: x = (random # in [0..r]); y = (random # in [0..r]); distance = sqrt(x² + y²); if distance ≤ r: hits++. Output: π ≈ 4 · hits / ntrials. Adapted from course slides by Craig Douglas ng/joy/mclab/mcintro.html
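The dart-throwing loop translates directly to Python (runnable sketch; the sample count and seed are illustrative):

```python
import random

def estimate_pi(ntrials, seed=0):
    """Darts at the unit square: the fraction landing inside the
    quarter circle of radius 1 estimates pi / 4."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(ntrials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / ntrials

pi_hat = estimate_pi(100_000)
```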

Georges Louis Leclerc, Comte de Buffon (1707–1788). Buffon's original form was to drop a needle of length L at random on a grid of parallel lines of spacing D. For L ≤ D we obtain: P(needle intersects the grid) = 2L / (πD). If we drop the needle N times and count R intersections we obtain P ≈ R/N, hence π ≈ 2LN / (RD). Slide from course CS521 by Craig Douglas
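Buffon's experiment is easy to simulate: draw the needle's center-to-line distance and its angle, then count crossings (illustrative sketch; L, D, trial count and seed are assumptions):

```python
import math
import random

def buffon_pi(ntrials, L=1.0, D=2.0, seed=0):
    """Needle of length L dropped on lines spaced D apart (L <= D);
    the crossing frequency R/N estimates 2L/(pi*D), so
    pi ~ 2*L*N / (R*D)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(ntrials):
        center = rng.random() * D / 2.0        # distance to nearest line
        theta = rng.random() * math.pi / 2.0   # needle angle
        if center <= (L / 2.0) * math.sin(theta):
            crossings += 1
    return 2.0 * L * ntrials / (crossings * D)

pi_hat = buffon_pi(200_000)
```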

Slide from course CS521 by Craig Douglas

Outline Introduction Local Minimization Methods (derivative-based) – Gradient (first order) methods – Newton (second order) methods Monte-Carlo Sampling (MC) – Introduction to MC methods – Markov-chain MC methods (MCMC) – Escaping local-minima

Drunk Sailor's Random Walk. What is the probability that the sailor will leave through each exit? (Figure: a street grid; at each corner the sailor picks each of the four directions with probability 0.25.)

Markov-Chain Monte Carlo (MCMC). Markov chain: the future state depends only on the present state. Markov-chain Monte-Carlo on graphs: we randomly walk from node to node, with a probability that depends only on our current location.

Analysis of a Two-Node Walk. Two nodes, A and B: from A we stay at A with probability 0.75 and move to B with probability 0.25; from B we stay or move to A with probability 0.5 each. After n rounds, what is the probability of being in node A? Pr_{n+1}(A) = Pr_n(A) × 0.75 + Pr_n(B) × 0.5. Assume Pr_{n+1}(A) ≈ Pr_n(A) for a large n: ⇒ 0.25 × Pr_n(A) = Pr_n(B) × 0.5 ⇒ Pr_n(A) = 2 × Pr_n(B). So: Pr_∞(A) = ⅔, Pr_∞(B) = ⅓.
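The stationary distribution can be checked numerically by power iteration on the transition matrix implied by the slide's equations (0.75/0.25 out of A, 0.5/0.5 out of B); it converges to (⅔, ⅓):

```python
import numpy as np

# Row-stochastic transition matrix: rows = current node (A, B)
P = np.array([[0.75, 0.25],
              [0.50, 0.50]])

pi_vec = np.array([0.5, 0.5])          # any starting distribution works
for _ in range(200):
    pi_vec = pi_vec @ P                # one more round of the walk
```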

Sampling Protein Conformations with MCMC. After a long run, we want to find low-energy conformations with high probability. Markov-Chain Monte-Carlo (MCMC) with "proposals": 1. Perturb the structure to create a "proposal". 2. Accept or reject the new conformation with a "certain" probability. But with what probability? A (physically) natural* choice is the Boltzmann distribution, proportional to exp(−E_i / k_B·T), where E_i = energy of state i, k_B = Boltzmann constant, T = temperature, and the normalization constant Z is the "partition function". *In theory, the Boltzmann distribution is a bit problematic in the non-gas phase, but never mind that for now… Protein image taken from Chemical Biology, 2006.

The Metropolis-Hastings Criterion. Boltzmann distribution: Pr(i) = exp(−E_i / k_B·T) / Z. The energy score and temperature are computed (quite) easily; the "only" problem is calculating Z (the "partition function") – this requires summing over all states. Metropolis showed that MCMC will converge to the true Boltzmann distribution if we accept a new proposal with probability min(1, exp(−ΔE / k_B·T)) (for symmetric proposals; Hastings generalized the criterion). "Equation of State Calculations by Fast Computing Machines" – Metropolis, N. et al., Journal of Chemical Physics (1953).
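The criterion itself is a one-liner; a toy sampler over five discrete states on a ring shows that low-energy states are visited most often (illustrative example; the five energies and the temperature are made up):

```python
import math
import random

def metropolis_accept(delta_E, kT, rng):
    """Metropolis criterion: always accept downhill moves; accept
    uphill moves with probability exp(-delta_E / kT)."""
    return delta_E <= 0.0 or rng.random() < math.exp(-delta_E / kT)

# Toy landscape: five states on a ring with made-up energies
E = [0.0, 1.0, 0.2, 2.0, 0.5]
rng = random.Random(0)
state = 0
counts = [0] * 5
for _ in range(50_000):
    proposal = (state + rng.choice([-1, 1])) % 5   # symmetric proposal
    if metropolis_accept(E[proposal] - E[state], 1.0, rng):
        state = proposal
    counts[state] += 1
```

After many steps, the visit counts approximate the Boltzmann weights exp(−E_i/kT) without ever computing Z.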

Sampling Protein Conformations with Metropolis-Hastings MCMC. Markov-Chain Monte-Carlo (MCMC) with "proposals": 1. Perturb the structure to create a "proposal". 2. Accept or reject the new conformation by the Metropolis criterion. 3. Repeat for many iterations. If we run till infinity, with good perturbations, we will visit every conformation according to the Boltzmann distribution. But we just want to find the energy minimum: if we do our perturbations in a smart manner, we can still cover the relevant (realistic, low-energy) parts of the search space. Protein image taken from Chemical Biology, 2006.

Outline Introduction Local Minimization Methods (derivative-based) – Gradient (first order) methods – Newton (second order) methods Monte-Carlo Sampling (MC) – Introduction to MC methods – Markov-chain MC methods (MCMC) – Escaping local-minima

Getting stuck in a local minimum. *Adapted from slides by Chen Keasar, Ben-Gurion University


Trick 1: Simulated Annealing. The Boltzmann acceptance probability depends on the in-silico temperature T: at low temperatures we get stuck in local minima (the acceptance probability is nearly zero whenever the energy rises even slightly); at high temperatures it is nearly always 1 (we jump wildly between conformations). In simulated annealing, we gradually decrease ("cool down") the virtual temperature factor, until we converge to a minimum point.
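A 1-D sketch of the idea (illustrative code; the double-well "energy", cooling schedule and step size are assumptions): starting in the wrong well, the hot early phase lets the walker climb over the barrier, and cooling then locks it into the deeper basin.

```python
import math
import random

def simulated_annealing(E, x0, T0=2.0, cooling=0.9995, n_steps=20_000, seed=1):
    """Metropolis acceptance with a gradually cooled temperature:
    early (hot) moves can escape local minima, late (cold) moves
    refine the best basin found."""
    rng = random.Random(seed)
    x, T = x0, T0
    best_x, best_E = x0, E(x0)
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, 0.5)        # random perturbation
        dE = E(x_new) - E(x)
        if dE <= 0.0 or rng.random() < math.exp(-dE / T):
            x = x_new
            if E(x) < best_E:
                best_x, best_E = x, E(x)
        T *= cooling                            # cool down
    return best_x

# Double-well landscape: local minimum near x = +1, global near x = -1
energy = lambda x: 0.5 * (x * x - 1.0) ** 2 + 0.3 * x
best = simulated_annealing(energy, x0=1.0)     # start in the wrong well
```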

Trick 2: Monte-Carlo with Energy Minimization (MCM) (Scheraga et al., 1987). Derivative-based methods (gradient descent, Newton's method, DFP) are excellent at finding nearby local minima. In Rosetta, Monte-Carlo is used for the bigger jumps between nearby local minima.

Trick 3: Switching between Low-Resolution (smooth) and High-Resolution (rugged) Energy Functions. In Rosetta, the Centroid energy function is used to quickly sample large perturbations; the Full-Atom energy function is used for fine tuning. (Figure: energy vs. conformations for the smooth low-res and the rugged high-res landscapes, with the START point marked.)

Trick 4: Repulsive Energy Ramping. The repulsive VdW energy is the main reason for getting stuck. Start simulations with a lowered repulsive energy term, and gradually ramp it up during the simulation. Similar rationale to simulated annealing. Trick 5: Modulating the Perturbation Step Size. A too-small perturbation size leads to a very slow simulation → we remain stuck in the local minimum. A too-large perturbation size leads to clashes and a very high rejection rate → we remain stuck in the same local minimum. We can increase or decrease the step size until a fixed rejection rate (for example, 50%) is achieved.
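Trick 5 amounts to a tiny feedback rule (hypothetical helper; the target rate and scaling factor are illustrative):

```python
def tune_step(step, accept_rate, target=0.5, factor=1.2):
    """Steer the perturbation size toward a target acceptance rate:
    too many accepts -> steps are too timid, enlarge them;
    too many rejects -> steps are too bold, shrink them."""
    return step * factor if accept_rate > target else step / factor

bigger = tune_step(1.0, accept_rate=0.9)    # mostly accepted moves
smaller = tune_step(1.0, accept_rate=0.1)   # mostly rejected moves
```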

Monte-Carlo in Rosetta. In Rosetta it is common to use any of the above tricks, MCM in particular. In general, a single simulation is pretty short (no more than a few minutes), but it is repeated k independent times, yielding k sampled "decoys". – We use energy scoring to decide which decoy structure is best – hopefully this is the near-native solution. – Low-resolution sampling is often used to create a very large number of initial decoys, and only the best ones are moved on to high-resolution minimization.

Summary. Derivative-based methods can effectively reach nearby energy minima. Metropolis-Hastings MCMC can recover the Boltzmann distribution in some applications, but for protein folding we cannot hope to cover the huge conformational space or recover the Boltzmann distribution. Still, useful tricks help us find good low-energy, near-native conformations (simulated annealing, Monte-Carlo with minimization, Centroid mode, ramping, step-size modulation, other smart sampling steps, etc.). We did not cover some very popular non-linear optimization methods: – linear and convex programming; the expectation-maximization algorithm; branch-and-bound algorithms; dead-end elimination (Lesson 4); the mean-field approach; and more…