Optimization methods. Morten Nielsen, Department of Systems Biology, DTU.


1 Optimization methods. Morten Nielsen, Department of Systems Biology, DTU

2 Minimization. The path to the closest local minimum = local minimization. *Adapted from slides by Chen Keasar, Ben-Gurion University

3 Minimization. The path to the closest local minimum = local minimization. *Adapted from slides by Chen Keasar, Ben-Gurion University

4 Minimization. The path to the global minimum. *Adapted from slides by Chen Keasar, Ben-Gurion University

5 Outline. Optimization procedures (gradient descent, Monte Carlo); overfitting (cross-validation); method evaluation.

6 Linear methods. Error estimate. [Figure: a linear unit with inputs I1 and I2, weights w1 and w2, and output o.]

7 Gradient descent (from Wikipedia). Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a - γ∇F(a) for γ > 0 a small enough number, then F(b) < F(a).
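
To make the rule concrete, here is a minimal sketch of gradient descent on a toy one-dimensional function (the function, step size gamma, and iteration count are illustrative choices, not from the slides):

    def gradient_descent(grad, x0, gamma=0.1, n_steps=100):
        """Repeatedly step against the gradient: b = a - gamma * grad F(a)."""
        x = x0
        for _ in range(n_steps):
            x = x - gamma * grad(x)
        return x

    # Minimize F(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # converges towards 3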

8 Gradient descent (example)

9 Gradient descent

10 Weights are changed in the opposite direction of the gradient of the error

11 Gradient descent (linear function). Weights are changed in the opposite direction of the gradient of the error. [Figure: a linear unit with inputs I1 and I2, weights w1 and w2, and output o.]

12 Gradient descent. Weights are changed in the opposite direction of the gradient of the error. [Figure: a linear unit with inputs I1 and I2, weights w1 and w2, and output o.]
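
Written out for the linear unit above, and assuming the usual squared-error measure (the form the worked exercise on the following slides is consistent with), the update rule becomes:

    o = w_1 I_1 + w_2 I_2, \qquad E = \tfrac{1}{2}(o - t)^2
    \frac{\partial E}{\partial w_i} = (o - t)\, I_i, \qquad
    \Delta w_i = -\eta \frac{\partial E}{\partial w_i} = -\eta\,(o - t)\, I_i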

13 Gradient descent. Example. Weights are changed in the opposite direction of the gradient of the error. [Figure: a linear unit with inputs I1 and I2, weights w1 and w2, and output o.]

14 Gradient descent. Example. Weights are changed in the opposite direction of the gradient of the error. [Figure: a linear unit with inputs I1 and I2, weights w1 and w2, and output o.]

15 Gradient descent. Doing it yourself. Weights are changed in the opposite direction of the gradient of the error. [Figure: a linear unit with inputs I1 = 1 and I2 = 0, weights w1 = 0.1 and w2 = 0.1, and output o.] What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use η = 0.1, and t = 1)?

16 Fill out the table.
    itr   W1     W2     O
    0     0.1    0.1
    1
    2
What are the weights after 2 forward/backward iterations with the given input, and has the error decreased (use η = 0.1, t = 1)? [Figure: a linear unit with inputs I1 = 1 and I2 = 0, weights w1 = 0.1 and w2 = 0.1, and output o.]

17 Fill out the table.
    itr   W1     W2     O
    0     0.1    0.1
    1     0.19   0.1    0.19
    2     0.27   0.1    0.27
What are the weights after 2 forward/backward iterations with the given input, and has the error decreased (use η = 0.1, t = 1)? [Figure: a linear unit with inputs I1 = 1 and I2 = 0, weights w1 = 0.1 and w2 = 0.1, and output o.]
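
A few lines of Python reproduce the numbers in the table (the squared-error form E = 1/2 (o - t)^2 is assumed; the inputs I1 = 1, I2 = 0, the learning rate η = 0.1, and the target t = 1 come from the exercise):

    # Two forward/backward iterations of gradient descent on a linear unit.
    I1, I2 = 1.0, 0.0
    w1, w2 = 0.1, 0.1
    eta, t = 0.1, 1.0

    for itr in range(1, 3):
        o = w1 * I1 + w2 * I2          # forward pass: prediction
        dE_do = o - t                  # derivative of E = 1/2 (o - t)^2 w.r.t. o
        w1 -= eta * dE_do * I1         # backward pass: step against the gradient
        w2 -= eta * dE_do * I2
        print(itr, round(w1, 3), round(w2, 3), round(w1 * I1 + w2 * I2, 3))
    # -> 1 0.19 0.1 0.19 and 2 0.271 0.1 0.271 (the slide rounds to 0.27); the error shrinks each step.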

18 Monte Carlo. Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm. Or when you are too stupid to do the math yourself?

19 Example: Estimating π by independent Monte Carlo samples. Suppose we throw darts randomly (and uniformly) at the square:
    Algorithm:
    for i = 1..ntrials
        x = random number in [0..r]
        y = random number in [0..r]
        distance = sqrt(x^2 + y^2)
        if distance <= r: hits++
    Output: π ≈ 4 * hits / ntrials
Adapted from course slides by Craig Douglas, http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html
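
The same dart-throwing algorithm in a few lines of Python (r = 1 and the trial count are illustrative choices):

    import random

    ntrials, hits = 100_000, 0
    for _ in range(ntrials):
        x, y = random.random(), random.random()   # uniform point in the unit square
        if x * x + y * y <= 1.0:                  # inside the quarter circle of radius 1
            hits += 1
    print(4 * hits / ntrials)                     # approximates pi (about 3.14)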

20 Estimating π

21 Sampling protein conformations with MCMC (Markov Chain Monte Carlo). After a long run, we want to find low-energy conformations with high probability. Markov Chain Monte Carlo with "proposals": 1. Perturb the structure to create a "proposal". 2. Accept or reject the new conformation with a "certain" probability. But how? A (physically) natural* choice is the Boltzmann distribution, proportional to exp(-E_i / (k_B T)), where E_i = energy of state i, k_B = Boltzmann constant, T = temperature, and Z = the "partition function" (the normalizing constant). *In theory, the Boltzmann distribution is a bit problematic in the non-gas phase, but never mind that for now. Protein image taken from Chemical Biology, 2006. Slides adapted from Barak Raveh.

22 The Metropolis-Hastings criterion. For the Boltzmann distribution, the energy score and temperature are computed (quite) easily; the "only" problem is calculating Z (the "partition function"), since this requires summing over all states. Metropolis showed that MCMC will converge to the true Boltzmann distribution if we accept a new proposal with probability min(1, exp(-dE / (k_B T))), where dE is the energy change of the proposed move. "Equations of State Calculations by Fast Computing Machines", Metropolis, N. et al., Journal of Chemical Physics (1953). Slides adapted from Barak Raveh.
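
A minimal sketch of that acceptance step (the Boltzmann constant is folded into a single kT parameter; its value here is a placeholder):

    import math
    import random

    def metropolis_accept(dE, kT=1.0):
        """Accept a move that lowers the energy; otherwise accept with Boltzmann probability."""
        if dE <= 0:
            return True
        return random.random() < math.exp(-dE / kT)

For example, at kT = 1 an uphill move with dE = 0.5 is accepted with probability exp(-0.5) ≈ 0.61.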

23 Sampling protein conformations with Metropolis-Hastings MCMC. If we run till infinity, with good perturbations, we will visit every conformation according to the Boltzmann distribution. Markov Chain Monte Carlo with "proposals": 1. Perturb the structure to create a "proposal". 2. Accept or reject the new conformation by the Metropolis criterion. 3. Repeat for many iterations. But we just want to find the energy minimum. If we do our perturbations in a smart manner, we can still cover the relevant (realistic, low-energy) parts of the search space. Protein image taken from Chemical Biology, 2006. Slides adapted from Barak Raveh.

24 Monte Carlo (minimization). [Figure labels: dE < 0, dE > 0.]
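
Putting the pieces together, a generic Monte Carlo minimization loop might look like the sketch below; energy and perturb are problem-specific placeholders, and the best state seen so far is remembered:

    import math
    import random

    def monte_carlo_minimize(state, energy, perturb, kT=1.0, n_steps=10_000):
        """Propose random perturbations, apply the Metropolis rule, keep the best state."""
        e = energy(state)
        best, best_e = state, e
        for _ in range(n_steps):
            candidate = perturb(state)
            dE = energy(candidate) - e
            # dE < 0: always accept; dE > 0: accept with probability exp(-dE / kT).
            if dE <= 0 or random.random() < math.exp(-dE / kT):
                state, e = candidate, e + dE
                if e < best_e:
                    best, best_e = state, e
        return best, best_e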

25 The Traveling Salesman. Adapted from www.mpp.mpg.de/~caldwell/ss11/ExtraTS.pdf
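
To connect this to the Monte Carlo recipe above, one common proposal move for the traveling salesman is to reverse a random segment of the tour (a 2-opt style move); the helpers below are an illustrative sketch, not taken from the slides:

    import random

    def tour_length(tour, dist):
        """Total length of the closed tour under a distance matrix dist."""
        return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

    def reverse_segment(tour):
        """Proposal move: reverse a randomly chosen segment of the tour."""
        i, j = sorted(random.sample(range(len(tour)), 2))
        return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

These plug directly into the monte_carlo_minimize sketch above, e.g. with energy=lambda t: tour_length(t, dist) and perturb=reverse_segment.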

26-31 [Image-only slides; figures not reproduced in the transcript.]

32 Gibbs sampler. Monte Carlo simulations. [Example: a set of peptide sequences (RFFGGDRGAPKRG, YLDPLIRGLLARPAKLQV, KPGQPPRLLIYDASNRATGIPA, GSLFVYNITTNKYKAFLDKQ, SALLSSDITASVNCAK, GFKGEQGPKGEP, DVFKELKVHHANENI, SRYWAIRTRSGGI, TYSTNEIDLQLSQEDGQTIE) with scores E1 = 5.4, E2 = 5.7, E2 = 5.2.] dE > 0: P_accept = 1; dE < 0: 0 < P_accept < 1. Note the sign: here we are maximizing.
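
When the score should be maximized rather than minimized (as here), the sign in the acceptance rule flips; a minimal sketch, with T an assumed sampling temperature:

    import math
    import random

    def accept_maximization(dE, T=1.0):
        """dE = new_score - old_score; improvements are always kept."""
        if dE >= 0:
            return True                              # dE > 0: P_accept = 1
        return random.random() < math.exp(dE / T)    # dE < 0: 0 < P_accept < 1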

33 Monte Carlo temperature. What is the Monte Carlo temperature? Say dE = -0.2; compare T = 1 with T = 0.001.
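
With the maximization-style acceptance rule above, an unfavorable move (dE < 0) is kept with probability exp(dE/T), so the two temperatures behave very differently; a quick check (a sketch, assuming that rule):

    import math
    print(math.exp(-0.2 / 1))       # ~0.82: at T = 1 this unfavorable move is accepted most of the time
    print(math.exp(-0.2 / 0.001))   # exp(-200), effectively 0: at T = 0.001 the search is pure hill-climbing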

34 MC minimization

35 Monte Carlo - Examples Why a temperature?

36 Local minima

