Markov Chain Monte Carlo and Gibbs Sampling Vasileios Hatzivassiloglou University of Texas at Dallas.



2 Markov chains A subtype of random walk (not necessarily uniform) in which the entire memory of the system is contained in the current state. Described by a transition matrix P, where p_ij is the probability of moving from state i to state j. Very useful for describing stochastic discrete systems.
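The transition-matrix idea can be sketched in a few lines. This is a minimal illustration, not from the slides: the two-state "weather" chain and its probabilities are assumed values, chosen only to show how p_ij governs each step.

```python
import random

# Hypothetical 2-state chain; P[i][j] plays the role of p_ij from the slide.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state):
    """Sample the next state; it depends only on the current state (Markov property)."""
    r = random.random()
    total = 0.0
    for nxt, p in P[state].items():
        total += p
        if r < total:
            return nxt
    return nxt  # guard against floating-point round-off

def simulate(start, n):
    """Run the chain for n steps and return the visited states."""
    states = [start]
    for _ in range(n):
        states.append(step(states[-1]))
    return states
```

Because each row of P sums to 1, `step` is a draw from the conditional distribution of the next state given the current one.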

3 Markov chain example [Figure: a fully connected four-state chain over the DNA letters A, C, G, T, with a transition probability p_XY on every edge (p_AC, p_CA, p_AG, p_GA, p_AT, p_TA, p_CG, p_GC, p_CT, p_TC, p_GT, p_TG) and self-loops p_AA, p_CC, p_GG, p_TT]

4 Example application of Markov chains Markov chains model dependencies across time or position. The model assigns a probability to every sequence of observed data and can therefore be used to measure how likely an observed sequence is under the model. The GeneMark algorithm uses 5th-order Markov chains (why?) to find genes (distinguishing them from other regions of the DNA).

5 Markov Chains – Stationary Distribution Under general assumptions (irreducibility and aperiodicity), a Markov chain has a stationary distribution π, the limit of the rows of P^k as k goes to infinity. An irreducible MC can reach any state from any other state. An aperiodic MC has no state that is revisited only at fixed multiples of a period greater than one. Such a Markov chain is called ergodic.
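The "limit of P^k" statement can be checked numerically by repeated matrix multiplication. A pure-Python sketch, using an assumed 2x2 transition matrix whose exact stationary distribution is (2/3, 1/3):

```python
def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def stationary(P, iters=50):
    """Approximate pi as a row of P^k for large k (valid for ergodic chains)."""
    Pk = P
    for _ in range(iters):
        Pk = matmul(Pk, P)
    return Pk[0]  # every row converges to the same distribution pi

P = [[0.8, 0.2],
     [0.4, 0.6]]
pi = stationary(P)
# pi approaches [2/3, 1/3], the solution of pi = pi P for this matrix
```

The second eigenvalue of this P is 0.4, so the rows converge geometrically fast; after ~50 multiplications the approximation error is negligible.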

6 Markov Chain Monte Carlo A Markov chain is used as a means to guide the selection of samples: we design the chain so that its stationary distribution is the distribution we want to sample from. In many cases this is relatively easy to do.

7 Gibbs sampling A special case of MCMC for multivariate distributions in which the conditional probability of each variable given the others can be calculated and sampled easily, while the joint distribution is hard to sample from directly.
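This idea can be sketched on a standard textbook case (not from the slides): a bivariate normal with correlation rho, where each full conditional is a one-dimensional normal that is trivial to sample. The function name and rho are illustrative assumptions.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each conditional is 1-D: x | y ~ N(rho*y, 1-rho^2) and symmetrically for y,
    so we alternate easy conditional draws instead of sampling the joint directly.
    """
    rng = random.Random(seed)
    sd = (1 - rho * rho) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)   # sample x given current y
        y = rng.gauss(rho * x, sd)   # sample y given updated x
        samples.append((x, y))
    return samples
```

After enough iterations the (x, y) pairs are draws from the joint distribution; the empirical means approach 0 and the empirical correlation approaches rho.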

8 Gibbs sampling in our problem Start with a single candidate S={S 1,...,S k }, where each S i is chosen randomly and uniformly from input sequence i. Calculate A and D(A||B) for S. Choose one member of S at random to remove. Choose a replacement (from the corresponding sequence) with probability proportional to the corresponding D(A||B). Repeat until D(A||B) converges.

9 Exploring alternative strings When we replace a string from sequence i –We examine in turn each of the m-n+1 strings that that sequence can offer –For each such string, we add it temporarily to S and calculate the new A and D(A||B) –Then we assign to each string S ij (j ranges over these candidates) a probability proportional to its score, p j = D j (A||B) / Σ j' D j' (A||B) May pick a “worse” string, or the same string we just removed
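The selection step above amounts to weighted sampling: each of the m-n+1 candidates is chosen with probability proportional to its D(A||B) score. A minimal sketch of that one step, with made-up scores standing in for the relative-entropy values:

```python
import random

def pick_proportional(scores, rng=random):
    """Return index j with probability scores[j] / sum(scores).

    This mirrors the slide's selection rule: better-scoring candidates are
    more likely, but any candidate with positive score can be picked,
    including a "worse" one or the string just removed.
    """
    total = sum(scores)
    r = rng.random() * total
    acc = 0.0
    for j, s in enumerate(scores):
        acc += s
        if r < acc:
            return j
    return len(scores) - 1  # guard against floating-point round-off
```

Over many draws the selection frequencies match the normalized scores, which is exactly what lets the sampler occasionally accept locally worse moves.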

10 Gibbs sampler convergence Return best S seen across all iterations (may not be the last one) Stop after a fixed number of iterations, or when D(A||B) does not change very much Solution is sensitive to the starting S, so we typically run the algorithm several (thousand) times from different starting points

11 Complexity of Gibbs sampler Construct initial S and calculate A and D(A||B) in O(kn) time Each iterative step takes O(n) time to remove a string and recalculate D(A||B), O(mn) time to calculate the probabilities of the m-n+1 alternatives Total time is O(mnd) where d is the number of iteration steps (dm>>k), multiplied by the number of random restarts

12 Why Gibbs sampling works Retains elements of the greedy approach –weighting by relative entropy makes the sampler likely to move towards locally better solutions Allows locally bad moves with a small probability, to escape local maxima

13 Variations in Gibbs sampling Discard substrings non-uniformly (weighted by relative entropy, analogous to the subsequent selection of the new string) Use simulated annealing to reduce the chance of making a bad move (and gradually ensure convergence)

14 Annealing Annealing is a process in metallurgy for improving metals by increasing crystal size and reducing defects. The process works by heating the metal and then cooling it in a controlled way, which lets the atoms pass through a series of states with gradually lower internal energy. The goal is to have the metal settle into a configuration with lower internal energy than the original.

15 Simulated Annealing Simulated annealing (SA) adopts an energy function equal to the function we want to minimize. Transitions between neighboring states are accepted with probability given by a function f of the energy change ΔE=E new -E old and the temperature T. f(ΔE,T)>0 even if ΔE>0, but as T→0, f(ΔE,T)→1 if ΔE<0 and f(ΔE,T)→0 if ΔE>0.

16 Simulated Annealing The original f (from the Metropolis–Hastings algorithm) is f(ΔE,T) = min(1, e^(−ΔE/T)). T controls the acceptance of locally bad solutions. The annealing schedule is a process for gradually reducing T so that eventually only good moves are accepted.
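The Metropolis acceptance rule and an annealing schedule fit together as follows. This is a sketch under assumed choices: the objective f(x)=x^2, the geometric cooling factor 0.99, and the starting temperature are illustrative, not from the slides.

```python
import math
import random

def anneal(f, x0, t0=10.0, cooling=0.99, steps=2000, seed=0):
    """Minimize f over the integers by simulated annealing.

    Accepts a neighboring state with probability min(1, exp(-dE/T)):
    downhill moves always, uphill moves with probability that shrinks
    as T is lowered by the (assumed) geometric annealing schedule.
    """
    rng = random.Random(seed)
    x, t = x0, t0
    best = x
    for _ in range(steps):
        cand = x + rng.choice([-1, 1])     # random neighboring state
        d_e = f(cand) - f(x)               # energy change of the move
        if d_e <= 0 or rng.random() < math.exp(-d_e / t):
            x = cand                       # Metropolis acceptance
        if f(x) < f(best):
            best = x                       # track best state seen
        t *= cooling                       # gradually reduce temperature
    return best

best = anneal(lambda x: x * x, x0=40)
```

With T effectively zero at the end of the schedule, only improving moves are still accepted, matching the two limiting cases on the next slide.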

17 Special cases If T is always zero, –simulated annealing reduces to greedy local optimization If T is constant but non-zero, –simulated annealing reduces to the process we described for Gibbs sampling (solutions chosen randomly with probability proportional to their improvement)