
Kevin Stevenson AST 4762/5765

What is MCMC?
- A random sampling algorithm
- Estimates model parameters and their uncertainties
- Samples only regions of high probability rather than sampling uniformly
  - Faster
  - More efficient
- The sampled region is called "phase space"

Phase Space
- The space in which all possible states of a system are represented
- Each state corresponds to one unique point
- Every parameter (or degree of freedom) is represented by an axis
- E.g., three position coordinates (x, y, z) require a 3-dimensional phase space
- E.g., adding time produces a 4-D phase space
- Can be represented very easily in Python using arrays (see the sketch below)
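As a minimal sketch (the array names are illustrative, not from the slides), a chain of states in a 3-D phase space can be stored as a NumPy array with one row per state:

```python
import numpy as np

nsteps, ndim = 10000, 3            # chain length and phase-space dimension
chain = np.zeros((nsteps, ndim))   # each row is one point (x, y, z)

chain[0] = [1.0, -0.5, 2.3]        # the initial state
print(chain.shape)                 # (10000, 3)
```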

Markov Chain
- A stochastic (or random) process having the Markov property
- The future is indeterminate; the evolution is described by probability distributions
- "Given the present state, future states are independent of the past states"
- In other words…
  - At a given step, the system has a set of parameters that define its state
  - At the next step, the system may change state or remain in the same state, according to a certain probability
  - Each prospective step is determined ONLY by the current state (no memory of the past)

Example: Random Walk
- Consider a drunk standing under a lamppost, trying to get home
- He takes a step in a random direction (N, E, S, W), each direction having equal probability
- Having forgotten his previous step, he again takes a step in a random direction
- This forms a Markov chain (simulated in the sketch below)
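A minimal simulation of the drunkard's walk, assuming unit steps on a grid (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
directions = np.array([[0, 1], [1, 0], [0, -1], [-1, 0]])  # N, E, S, W

# Each step is drawn independently of all previous steps (Markov property).
idx = rng.integers(0, 4, size=1000)
path = np.cumsum(directions[idx], axis=0)  # positions after each step

print("final position:", path[-1])
```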

Random Walk Methods
- Metropolis-Hastings algorithm
  - Vary all parameters simultaneously
  - Accept each step with a certain probability
- Gibbs sampling (see the sketch below)
  - A special (usually faster) case of M-H
  - Hold all parameters constant except one
  - Vary that parameter to find the best fit
  - Choose the next parameter and repeat
- Slice sampling
- Multi-try Metropolis
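As an illustrative sketch (not code from the lecture), Gibbs sampling of a bivariate Gaussian with correlation rho draws each parameter in turn from its exact 1-D conditional while the other is held fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, nsteps = 0.8, 5000            # target: standard bivariate normal, corr rho
x = np.zeros(nsteps)
y = np.zeros(nsteps)

for i in range(1, nsteps):
    # Update x with y held fixed:  x | y ~ N(rho*y, 1 - rho**2)
    x[i] = rng.normal(rho * y[i - 1], np.sqrt(1 - rho**2))
    # Update y with x held fixed:  y | x ~ N(rho*x, 1 - rho**2)
    y[i] = rng.normal(rho * x[i], np.sqrt(1 - rho**2))

print("sample correlation:", np.corrcoef(x, y)[0, 1])  # should approach rho
```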

Avoiding Random Walk
- May want the stepper to avoid doubling back on itself
  - Faster convergence
  - Harder to implement
- Methods:
  - Successive over-relaxation
    - A variation on Gibbs sampling
  - Hybrid (Hamiltonian) Monte Carlo
    - Introduces momentum variables

Metropolis-Hastings Algorithm
- Goal: estimate model parameters and their uncertainties
- The M-H algorithm generates a sequence of samples from a probability distribution that is difficult to sample from directly
  - The distribution may not be Gaussian
  - We may not know the distribution at all
- How does it generate this set?

Preferential Probability
- Want to visit a point x with probability proportional to some given distribution function, π(x)
  - Called the "probability distribution" or "target density"
- Preferentially samples where π(x) is large (an example target is sketched below)
- Probability distribution:
  - The probability of x falling within a particular interval
- Ergodic:
  - Must, in principle, be able to reach every point in the region of interest
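To make the later sketches concrete, here is a simple assumed example target, an unnormalized 1-D Gaussian. MCMC never needs the normalizing constant, so the unnormalized form suffices:

```python
import numpy as np

def target_pi(x, mu=3.0, sigma=1.5):
    """Unnormalized example target density pi(x). The missing
    normalization cancels in the M-H acceptance ratio."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)
```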

Let Me Propose…
- Proposal distribution/density, Q(x_2; x_1):
  - Depends on the current state, x_1
  - Generates a new proposed sample, x_2
  - Must also be ergodic
- Can be approximated by a Gaussian centered on x_1
- May be symmetric: Q(x_2; x_1) = Q(x_1; x_2)
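A minimal sketch of a symmetric Gaussian proposal (the function name and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def propose(x1, sigma_step=0.5):
    """Draw a proposed state x2 from a Gaussian centered on x1.
    This proposal is symmetric: Q(x2; x1) == Q(x1; x2)."""
    return rng.normal(x1, sigma_step)
```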

Target & Proposal Densities
- P(x) = target density
- Q(x, x_t) = proposal density

Don’t We All Want To Feel Accepted?
- Acceptance probability: α = [π(x_2) Q(x_1; x_2)] / [π(x_1) Q(x_2; x_1)]
- If α ≥ 1:
  - Accept the proposed step
  - The current state becomes x_2
- If α < 1:
  - Accept the step with probability α
  - Reject the step with probability 1 − α
    - The state remains at x_1
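A sketch of one accept/reject decision, restating the illustrative target and symmetric proposal from the earlier sketches so the block is self-contained (with a symmetric Q the proposal terms cancel):

```python
import numpy as np

rng = np.random.default_rng()

def target_pi(x):                        # unnormalized example target (as above)
    return np.exp(-0.5 * ((x - 3.0) / 1.5) ** 2)

def propose(x1, sigma_step=0.5):         # symmetric Gaussian proposal (as above)
    return rng.normal(x1, sigma_step)

def mh_step(x1):
    """One Metropolis-Hastings accept/reject decision."""
    x2 = propose(x1)
    alpha = target_pi(x2) / target_pi(x1)   # Q terms cancel for symmetric Q
    if alpha >= 1 or rng.random() < alpha:  # accept with probability min(1, alpha)
        return x2   # accept: the current state becomes x2
    return x1       # reject: the state remains at x1
```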

Not Too Hot, Not Too Cold
- Acceptance rate: the fraction of proposed steps that are accepted
- Want an acceptance rate of 30–70%
  - Too high => slow convergence
  - Too low => small sample size
- Must tune the proposal density, Q, to obtain an acceptable acceptance rate
  - If Q is Gaussian, then we tune the standard deviation, σ
  - Think of σ as a step size (a full sampler that tracks the acceptance rate is sketched below)
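Putting the pieces together, an illustrative full sampler that records the acceptance rate so that σ (sigma_step below) can be tuned:

```python
import numpy as np

rng = np.random.default_rng(1)

def target_pi(x):
    return np.exp(-0.5 * ((x - 3.0) / 1.5) ** 2)

def run_mh(x0, nsteps=20000, sigma_step=1.0):
    chain = np.empty(nsteps)
    chain[0] = x0
    naccept = 0
    for i in range(1, nsteps):
        x2 = rng.normal(chain[i - 1], sigma_step)       # Gaussian proposal
        alpha = target_pi(x2) / target_pi(chain[i - 1])
        if rng.random() < alpha:                        # always true when alpha >= 1
            chain[i] = x2
            naccept += 1
        else:
            chain[i] = chain[i - 1]
    return chain, naccept / (nsteps - 1)

chain, rate = run_mh(x0=0.0)
print(f"acceptance rate: {rate:.0%}")   # retune sigma_step if outside ~30-70%
```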

What Is π?

Where to start
- Some starting positions are better than others
- The equilibrium distribution is rapidly approached from any starting position, x_0
  - Proof: due to ergodicity, choosing any point as the starting point is equivalent to jumping into the equilibrium-distribution chain at that particular point in time
- Suggestions for choosing a starting point:
  - Best or mean parameters from a previous run
  - A least-squares fit (scipy.optimize; see the sketch below)
  - Several starting locations from the corners of phase space
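A hedged sketch of finding a starting point with SciPy; the linear model, synthetic data, and initial guess below are placeholders, not from the slides:

```python
import numpy as np
from scipy.optimize import least_squares

def model(params, t):
    a, b = params
    return a * t + b                       # placeholder model

t = np.linspace(0, 10, 50)                 # placeholder data
y = 2.0 * t + 1.0 + np.random.default_rng(2).normal(0, 0.5, t.size)

def residuals(params):
    return model(params, t) - y

fit = least_squares(residuals, x0=[1.0, 0.0])
print("MCMC starting point:", fit.x)       # use as x0 for the chain
```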

Infinite Iterations
- How long do you run MCMC?
- As the number of iterations -> ∞, the algorithm converges to a precise value
  - This is NOT the true value, but the best apparent value for the dataset
- Run MCMC long enough to:
  - Forget the initial conditions (burn-in)
  - Characterize your distribution
  - Make the error in your parameter mean smaller than the observed dispersion in your Markov chain (a crude check is sketched below)
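As a rough illustrative check (it assumes independent samples and ignores autocorrelation, so it is optimistic for real MCMC output), compare the standard error of the chain mean to the chain's dispersion:

```python
import numpy as np

def convergence_check(chain):
    """Crude check: the uncertainty on the mean should be much smaller
    than the spread of the chain itself."""
    dispersion = chain.std()
    sem = dispersion / np.sqrt(chain.size)  # standard error of the mean
    return sem, dispersion

# Stand-in for real MCMC output; replace with the chain from run_mh.
chain = np.random.default_rng(3).normal(3.0, 1.5, 20000)
sem, disp = convergence_check(chain)
print(f"error on mean = {sem:.4f}, dispersion = {disp:.4f}")
```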

Burn-in
- Need burn-in to "forget" the starting position
- Remove AT LEAST the first 2% of the total run length
- Better yet, look at your data! (a trace-plot sketch follows below)
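A minimal sketch of trimming burn-in and eyeballing the trace; the 2% figure follows the slide, and matplotlib is assumed to be available:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for real MCMC output; replace with the chain from run_mh.
rng = np.random.default_rng(4)
chain = rng.normal(3.0, 1.5, 20000)

nburn = max(1, int(0.02 * chain.size))   # at least the first 2% of the run
trimmed = chain[nburn:]

# Trace plot: the eye is a good judge of when burn-in has ended.
plt.plot(chain, lw=0.5)
plt.axvline(nburn, color="r", ls="--", label="burn-in cutoff")
plt.xlabel("iteration"); plt.ylabel("parameter value"); plt.legend()
plt.show()
```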

Through The Fire And Flames
- The remaining set of states represents a sample from the distribution π(x)
- Compute the mean (or median) and error of each parameter in your set
- Use every m-th step for computations and histograms, where m should be longer than the correlation timescale between steps
  - m ~ 10–100
- The relation between the apparent and true values is indicated by the width of the distribution
  - Plot a histogram to see its shape
  - Fit a Gaussian to determine the width and, hence, the error (see the sketch below)
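An illustrative sketch of thinning, histogramming, and fitting a Gaussian to estimate the parameter value and its error (norm.fit is from scipy.stats; the chain is a stand-in):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
trimmed = rng.normal(3.0, 1.5, 20000)    # stand-in for the post-burn-in chain

m = 50                                   # thinning interval, ~ correlation time
thinned = trimmed[::m]                   # keep every m-th step

mu, sigma = norm.fit(thinned)            # Gaussian fit: center and width
print(f"parameter = {mu:.3f} +/- {sigma:.3f}")

plt.hist(thinned, bins=30, density=True)
xs = np.linspace(thinned.min(), thinned.max(), 200)
plt.plot(xs, norm.pdf(xs, mu, sigma))
plt.xlabel("parameter value"); plt.ylabel("density")
plt.show()
```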

Now Here We Stand
- Recap:
  - Chose our proposal distribution, initial parameters, and number of iterations
  - Ran MCMC and removed the burn-in portion
  - Determined the mean/median of the apparent values
  - Computed their errors
- What's next?
  - Plug those parameters into the model
  - Analyze your results (do science!!!)