Lecture 18, CS567 (Slide 1)
Multidimensional space, "The Last Frontier"
Optimization; Expectation; Exhaustive search; Random sampling; "Probabilistic random" sampling

Lecture 18, CS567 (Slide 2)
Optimization – "I want the bestest there is"
Problem: Given F(x_1, x_2, ..., x_n), find argmin_{x} F(x_1, x_2, ..., x_n), where x_i = data and/or model and/or parameters, and F = P or some other function, e.g., energy
Ideal goal: Finding the global minimum (xor maximum)
– "Finding the happiest person in the whole wide world"
– "Finding the deepest trench in the ocean"
Ideal approach: Exhaustive search (a brute-force sketch follows)
– Guaranteed to find the global minimum
– Brute force: subject to resource limits ("Will take a lot of diving!")
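To make the brute-force idea concrete, here is a minimal exhaustive search sketch (not from the lecture) over an invented two-variable, bowl-shaped cost function on a small integer grid:

```python
import itertools

# Toy stand-in for F(x1, ..., xn): a simple bowl with its minimum at (2, -1).
def f(x):
    return (x[0] - 2) ** 2 + (x[1] + 1) ** 2

# Exhaustive search: evaluate f at every grid point and keep the argmin.
# Guaranteed to find the global minimum on the grid, but the number of
# evaluations grows exponentially with the number of dimensions.
grid = range(-5, 6)
best = min(itertools.product(grid, repeat=2), key=f)
print(best, f(best))  # (2, -1) 0
```

Even at 11 grid values per dimension, n dimensions cost 11^n evaluations, which is exactly the resource limit the slide warns about.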

Lecture 18, CS567 (Slide 3)
Optimization – Random sampling
Spectrum of approaches: Exhaustive ………………… Random sampling
Random sampling:
– Find several local minima starting from random points
– Pick the lowest minimum as the approximation of the global minimum (a multi-start sketch follows)
– Examples:
  "Pick a few air travel websites at random, and buy your ticket from the one giving the lowest fare" (finding the cheapest website)
  "Pick a few casinos at random, and try your luck at each" (finding the luckiest casino in the world)
– Problems:
  If the number of websites/casinos is very large, a small sample may be far from the global minimum
  If a large sample is taken, the method begins to approximate exhaustive search
– Example: the Monte Carlo algorithm
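A minimal multi-start sketch of this idea, assuming an invented bumpy 1-D cost function and a deliberately crude neighbor-stepping local descent:

```python
import math
import random

def f(x):
    # Invented bumpy 1-D cost with several local minima.
    return math.sin(5 * x) + 0.1 * (x - 1) ** 2

def local_descent(x, step=0.01, iters=2000):
    # Crude local minimization: move to a neighboring point if it lowers f.
    for _ in range(iters):
        for dx in (-step, step):
            if f(x + dx) < f(x):
                x += dx
                break
    return x

random.seed(0)
starts = [random.uniform(-5, 5) for _ in range(10)]  # random starting points
minima = [local_descent(x0) for x0 in starts]        # one local minimum each
best = min(minima, key=f)                            # keep the lowest
print(best, f(best))
```

With too few starts, every run may land in the same shallow valley; with very many starts, the work approaches that of exhaustive search, as the slide notes.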

Lecture 18, CS567 (Slide 4)
Optimization
Dynamic programming:
– Equivalent to exhaustive search without having to use brute force
– However, applicable only to problems that satisfy the principle of optimality
Gradient descent: needs to be combined with some random/iterative element to find the global minimum (a sketch follows)
– Backpropagation in neural network training: classic gradient descent
– Line search: update the direction only when the minimum along the current direction is reached ("Go down the mid-line of the saddle before turning towards the flank")
– Second-derivative-based methods: step size based on the second derivative of the minimized function and on the history of past steps; examples: Newton-Raphson, conjugate gradient (which scales better)
Evolutionary/genetic algorithms:
– Simulation based on the principles of biological evolution, with a cost function of choice
– Benefit from no-holds-barred multiple inheritance
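A hedged sketch of plain gradient descent with a backtracking line search, on an invented quadratic objective; it shows only the generic descent-direction-plus-step-size structure, not any specific method named above:

```python
# Invented smooth toy objective and its gradient.
def f(x, y):
    return (x - 3) ** 2 + 10 * (y + 1) ** 2

def grad(x, y):
    return 2 * (x - 3), 20 * (y + 1)

x, y = 0.0, 0.0
for _ in range(100):
    gx, gy = grad(x, y)
    t = 1.0
    # Backtracking line search: halve the step until it actually decreases f.
    while t > 1e-12 and f(x - t * gx, y - t * gy) >= f(x, y):
        t *= 0.5
    x, y = x - t * gx, y - t * gy
print(round(x, 4), round(y, 4))  # approaches the minimum at (3, -1)
```

On a convex bowl like this one, pure descent suffices; on a multimodal surface it stalls at the nearest local minimum, which is why the slide pairs it with a random element.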

Lecture 18, CS567 (Slide 5)
Optimization – Simulated annealing
Analogies:
– Annealing steel to a stable form
– Crystallization works best with slow evaporation
– Using a pogo stick, with a bounce that decreases with time, to find the lowest valley
The cost function has an additional term that is directly proportional to temperature
– The relative importance of this term is progressively decreased
Heat the system to a high temperature, i.e., give it a lot of energy (all states become roughly equally probable)
Cool the system slowly (the probability distribution gradually approaches the underlying normal-temperature one); a minimal sketch follows
Example:
– Producing a 3D structural model of a molecule, given diffraction/NMR data (constraints)
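A minimal simulated annealing sketch, again on an invented bumpy 1-D cost; the temperature T sets how readily uphill moves are accepted and is lowered slowly:

```python
import math
import random

def cost(x):
    # Invented bumpy 1-D cost with several local minima.
    return math.sin(5 * x) + 0.1 * (x - 1) ** 2

random.seed(0)
x = random.uniform(-5, 5)
T = 5.0
while T > 1e-3:
    x_new = x + random.gauss(0, 0.5)  # propose a random move
    dE = cost(x_new) - cost(x)
    # Always accept downhill moves; accept uphill moves with
    # probability exp(-dE / T), which shrinks as the system cools.
    if dE < 0 or random.random() < math.exp(-dE / T):
        x = x_new
    T *= 0.999                        # slow geometric cooling
print(x, cost(x))
```

At high T nearly every move is accepted (all states roughly equally probable); as T falls, the walker settles into a deep valley rather than the first one it meets.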

Lecture 18, CS567 (Slide 6)
Computing expectations – "How great am I?"
Problem: Given F(x_1, x_2, ..., x_n), find E[F(x_1, x_2, ..., x_n)], where x_i = data and/or model and/or parameters, and F = P or some other function, e.g., energy
Ideal goal: Finding the global average
– "What is the average happiness in the whole wide world?"
– "What is the average depth of the ocean?"
Ideal approach: Exhaustive search (an enumeration sketch follows)
– Guaranteed to find the global average
– Brute force: subject to resource limits ("Will take a lot of divers and a lot of diving!")
Typical application, once you know E[F]:
– I scored 3. How great am I? If this is soccer or baseball, GREAT! If this is basketball, time for practice....
– What is the statistical significance of a particular value of F?
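For contrast with the sampling approaches on the next slides, a tiny sketch of the exhaustive route: computing E[F] exactly by enumerating every state of a small, invented discrete space:

```python
import itertools

# Small invented discrete space: all 8 states of three binary variables,
# under a uniform distribution P, with F(x) = number of ones.
states = list(itertools.product([0, 1], repeat=3))

def F(x):
    return sum(x)

def P(x):
    return 1 / len(states)

# Exact E[F]: sum F(x) * P(x) over every state. Feasible here,
# hopeless when the number of states explodes combinatorially.
expectation = sum(F(x) * P(x) for x in states)
print(expectation)  # 1.5
```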

Lecture 18, CS567 (Slide 7)
Expectation – Random sampling
Spectrum of approaches: Exhaustive ………………… Random sampling
Random sampling:
– Take the average of the values of the function at the random points (a sketch follows)
– Examples:
  "Pick a few air travel websites at random, and average their fares" (get the average price for a market survey)
  "Pick a few casinos at random, and try your luck at each" (compute the general expectation of winning at a casino)
– Problems:
  If the number of websites/casinos is very large and/or highly variable in odds of winning (a complex space), an estimate based on a small sample may be far from the global average
  If a large sample is taken, the method begins to approximate exhaustive search
– Example: the Monte Carlo algorithm
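A minimal Monte Carlo sketch: estimate E[F(x)] by averaging F over random draws from a distribution that is easy to sample directly; both F and the distribution are invented for illustration:

```python
import random

random.seed(0)

def F(x):
    return x * x  # invented function whose mean we want

# Monte Carlo: average F over random draws instead of enumerating states.
n = 100_000
estimate = sum(F(random.gauss(0, 1)) for _ in range(n)) / n
print(estimate)   # close to E[x^2] = 1 for x ~ N(0, 1)
```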

Lecture 18, CS567 (Slide 8)
Best of both worlds – MCMC
Pragmatic goal: to approximate the expectation
– "What is the average happiness in the world, give or take a laugh or two?"
– "What is the average depth of the ocean, rounded off to miles?"
Pragmatic approach: Markov Chain Monte Carlo ("probabilistic random" sampling)
Principles:
– Monte Carlo: the exact expectation
  E[F(x_1, x_2, ..., x_n)] = Σ_{x} F(x_1, x_2, ..., x_n) P(x_1, x_2, ..., x_n)
  is approximated by the average over T sampled states,
  E[F(x_1, x_2, ..., x_n)] ≈ (1/T) Σ_{t=1}^{T} F(x_1^(t), x_2^(t), ..., x_n^(t)),
  where t indexes transitions from one state of the multidimensional variables to another
– Markov chain approximation:
  State = a particular set of values for {x}
  The transition to the next state depends only on the current values of the variables
  Stationary Markov chain: constant transition probabilities
  Ergodic distribution: the equilibrium flow between any two states, as represented in the Markov chain transition matrix, is equal in either direction (detailed balance); thus, following a series of Markov steps in this regime will not alter the distribution. (Ergodicity: the chain converges to this distribution irrespective of the starting distribution; a sketch follows.)
  Represent the distribution as a Markov chain at equilibrium (ergodic) and sample from it
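A small sketch of the Markov chain half of the story: repeatedly applying a fixed, invented 3-state transition matrix drives two very different starting distributions to the same equilibrium, which is the convergence the slide calls ergodicity:

```python
# Invented 3-state transition matrix: T[i][j] = P(next = j | current = i).
# Rows sum to 1; the chain is irreducible and aperiodic, hence ergodic.
T = [[0.9, 0.1, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.3, 0.7]]

def step(p):
    # One Markov step on a distribution: new_p[j] = sum_i p[i] * T[i][j].
    return [sum(p[i] * T[i][j] for i in range(3)) for j in range(3)]

# Two very different starting distributions...
p1, p2 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
for _ in range(200):
    p1, p2 = step(p1), step(p2)

# ...end up at the same stationary distribution.
print([round(v, 4) for v in p1])
print([round(v, 4) for v in p2])
```

Once the chain has reached this equilibrium, averaging F along its trajectory approximates the expectation under the stationary distribution.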

Lecture 18, CS567 (Slide 9)
MCMC – The Metropolis algorithm
Goal: to sample states (and compute F for each of them) based on MCMC
Metropolis: separate the probability of a transition from state k to state i into two factors
– q_ik: probability of selecting state i as the candidate next step, while in state k (Clark Kent [while dancing with Lois Lane]: "May I have the next dance, Lana?")
– r_ik: having selected state i as a candidate next step, the probability of i actually becoming the next step (Lana Lang: "I'd love to, Clark! See you tomorrow, Lois")
– Thus, t_ik = q_ik * r_ik
– r_ik is given by the relative probabilities P(s_i) and P(s_k): if P(s_i) < P(s_k), then r_ik = P(s_i)/P(s_k); else r_ik = 1. (If Clark usually dances with Lois Lane, then the probability of him switching partners follows the relative probabilities; if Clark usually dances with Lana, then he should switch!)
Algorithm (for x iterations; a sketch follows):
1. Start in some state s_k
2. Pick a candidate next state s_i based on q_ik
3. Evaluate the candidate for acceptance based on r_ik
4. If accepted, calculate F(s_i) and continue from step 2 with s_i as the current state
5. If not accepted, go back to step 2
Then calculate E[F(s)]
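A minimal Metropolis sketch for an unnormalized 1-D target p(x) proportional to exp(-x^2/2), with a symmetric Gaussian proposal playing the role of q and the ratio test playing the role of r; all specifics are invented for illustration:

```python
import math
import random

def p(x):
    # Unnormalized target density, proportional to a standard normal.
    return math.exp(-0.5 * x * x)

random.seed(0)
x, samples = 0.0, []
for _ in range(50_000):
    x_cand = x + random.gauss(0, 1.0)  # q: symmetric Gaussian proposal
    r = min(1.0, p(x_cand) / p(x))     # r: acceptance probability
    if random.random() < r:
        x = x_cand                     # accept: candidate becomes the state
    samples.append(x)                  # a rejection repeats the old state

# E[F(s)] as a plain average over the chain, with F(x) = x^2 here.
print(sum(v * v for v in samples) / len(samples))  # close to 1
```

Note that only the ratio p(x_cand)/p(x) is ever needed, so the target may be known only up to a multiplicative constant.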

Lecture 18, CS567 (Slide 10)
MCMC – The Gibbs sampling algorithm
Goal: to sample states (and compute F for each of them) based on MCMC
Gibbs sampling:
– At every step, split the variables into free and fixed, and sample the free variables from their conditional distribution given the fixed ones; if a transition is made, the next state differs from the previous one only in the values of the free variables
Algorithm (for x iterations; a sketch follows):
1. Start in some state S_k = (s_1, s_2, s_3, ..., s_n)
2. Make the transition to S_i by drawing the free variables from P(s_free | s_fixed), the conditional distribution of the free variables given all the remaining ones
3. Repeat step 2, but with a different variable (or set of variables) freed; the choice of free variable(s) can be based on cycling or on probabilistic sampling
Then calculate E[F(s)]
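A minimal Gibbs sketch for a bivariate normal with correlation rho = 0.8, chosen because each coordinate's conditional given the other is known in closed form; everything here is invented for illustration:

```python
import math
import random

# Bivariate normal with correlation rho: each coordinate's conditional
# given the other is N(rho * other, 1 - rho^2).
rho = 0.8
sd = math.sqrt(1 - rho ** 2)

random.seed(0)
x, y = 0.0, 0.0
samples = []
for _ in range(50_000):
    x = random.gauss(rho * y, sd)  # free x, fixed y: sample x | y
    y = random.gauss(rho * x, sd)  # free y, fixed x: sample y | x
    samples.append((x, y))

# The sample E[x*y] should approach rho (means 0, variances 1).
print(sum(a * b for a, b in samples) / len(samples))
```

Here the free variable cycles deterministically (x, y, x, y, ...), the simpler of the two choices listed in step 3.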

Lecture 18, CS567 (Slide 11)
Optimizing expectation – Expectation Maximization
– Example: the Baum-Welch algorithm (déjà vu)
– Given training data {s}, maximize E[P(s | w, M)]: "The average probability of a sequence that is part of the model should be as high as possible," or, "On average, the probability of a sequence that is part of the model should be high"
– Most useful when both parameters and data are to be optimized; more generally, when two subclasses of parameters need to be optimized together (a sketch follows)
– The general form (GEM) just looks for a higher/lower value, not necessarily the maximum/minimum
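A compact EM sketch on invented data: a two-component Gaussian mixture with known unit variances and equal weights, fitting only the two means; the E-step computes soft memberships and the M-step re-estimates each mean as a responsibility-weighted average:

```python
import math
import random

# Invented data: a mixture of two unit-variance Gaussians at -2 and 3.
random.seed(0)
data = [random.gauss(-2, 1) for _ in range(200)] + \
       [random.gauss(3, 1) for _ in range(200)]

def g(x, mu):
    # Unnormalized unit-variance Gaussian density (constants cancel below).
    return math.exp(-0.5 * (x - mu) ** 2)

mu1, mu2 = -1.0, 1.0                 # crude initial guesses
for _ in range(50):
    # E-step: responsibility of component 1 for each data point.
    r = [g(x, mu1) / (g(x, mu1) + g(x, mu2)) for x in data]
    # M-step: each mean becomes a responsibility-weighted average.
    mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - sum(r))
print(round(mu1, 2), round(mu2, 2))  # should approach -2 and 3
```

The two subclasses being optimized together are the hidden memberships (E-step) and the model parameters (M-step), mirroring the states-versus-parameters split in Baum-Welch.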