Machine Learning CUNY Graduate Center Lecture 7b: Sampling.

Today

Sampling
– A technique for approximating the expected value of a distribution
Gibbs Sampling
– Sampling the latent variables in a graphical model

Expected Values

We often want to know the expected value of a distribution.
– For example, E[p(t | x)] in a classification problem.
We can evaluate p(x) pointwise, but integrating it is difficult.
Given a graphical model describing the relationships between variables, we would like to compute expectations such as E[p(x)] when x is only partially observed.

Sampling

We have a representation of p(x) and f(x), but the integral is intractable.
E[f] is difficult as an integral, but easy as a sum:
E[f] = ∫ f(x) p(x) dx ≈ (1/L) Σ_{l=1..L} f(x^(l))
Randomly draw points x^(l) from the distribution p(x) and use them as representative of the distribution of f(x).
It turns out that if correctly sampled, surprisingly few points (on the order of ten or twenty independent samples) can be sufficient to estimate the mean and variance of a distribution.
– Samples must be drawn independently.
– The expectation may be dominated by regions of high probability or by high function values, which a small sample can miss.
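A minimal sketch of this estimator in Python, assuming we can draw directly from p(x); the standard-normal target, f(x) = x², and the sample count are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expectation(f, sampler, n_samples=1_000):
    # Approximate E[f(x)] by averaging f over independent draws from p(x).
    samples = sampler(n_samples)
    return np.mean(f(samples))

# Illustrative check: under a standard normal, E[x^2] is the variance, 1.
print(mc_expectation(lambda x: x**2, lambda n: rng.standard_normal(n)))
```

Even a modest number of independent samples gives a usable estimate, though the error shrinks only as 1/√L.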

Monte Carlo Example

Sampling techniques can solve difficult integration problems.
What is the area of a circle with radius 1?
– What if you don't know trigonometry (or the formula πr²)?

Monte Carlo Estimation

How can we approximate the area of a circle if we have no trigonometry?
Take a random x and a random y, each uniform between −1 and 1.
Determine whether x² + y² ≤ 1, i.e., whether the point falls inside the circle.
Repeat many times, counting the number of times the inequality is true.
The fraction of points inside, multiplied by the area of the square (4), approximates the area of the circle, π.
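A sketch of this estimate in Python; the sample count is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Draw points uniformly from the square [-1, 1] x [-1, 1].
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)

# A point falls inside the unit circle when x^2 + y^2 <= 1.
inside = x**2 + y**2 <= 1

# Fraction inside times the square's area (4) estimates the circle's area.
print(inside.mean() * 4)  # close to pi
```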

Rejection Sampling

The distribution p(x) is easy to evaluate
– as in a graphical model representation
but difficult to integrate or sample from directly.
Identify a simpler distribution q(x) and a constant k such that kq(x) bounds p(x), and draw a sample x₀ from q.
– q is called the proposal distribution.
Generate another sample u from a uniform distribution between 0 and kq(x₀).
– If u ≤ p(x₀), accept the sample, e.g., use it in the calculation of an expectation of f.
– Otherwise, reject the sample, e.g., omit it from the calculation of an expectation of f.
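A minimal sketch of rejection sampling, assuming an illustrative target (the Beta(2, 2) density) and a Uniform(0, 1) proposal; neither choice comes from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):
    # Target density: Beta(2, 2), i.e. 6x(1 - x) on [0, 1].
    # Its maximum is 1.5, so k = 1.5 makes k*q(x) an envelope
    # when q is Uniform(0, 1) (q(x) = 1 on [0, 1]).
    return 6 * x * (1 - x)

k = 1.5
samples = []
while len(samples) < 10_000:
    x0 = rng.uniform(0, 1)       # sample from the proposal q
    u = rng.uniform(0, k)        # uniform on [0, k*q(x0)]; here q(x0) = 1
    if u <= p(x0):               # accept with probability p(x0) / (k*q(x0))
        samples.append(x0)

print(np.mean(samples))  # close to the Beta(2, 2) mean, 0.5
```

The acceptance rate here is 1/k ≈ 67%; the tighter the envelope, the fewer samples are wasted.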

Rejection Sampling Example [figure]

Importance Sampling

One problem with rejection sampling is that information is lost when samples are thrown out.
If we are only looking for the expected value of f(x), we can keep every sample and let unlikely values of x contribute less.
Again, use a proposal distribution q(x) to approximate the expected value:
E[f] ≈ (1/L) Σ_{l=1..L} (p(x^(l)) / q(x^(l))) f(x^(l))
– Each sample from q is weighted by the ratio p(x^(l)) / q(x^(l)), the relative likelihood that it would also have been drawn from p.
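A sketch in Python under assumed distributions: target p = N(0, 1), a broader proposal q = N(0, 2), and f(x) = x², all illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 50_000

# Draw from the proposal q = N(0, 2); being broader, it covers p = N(0, 1).
xs = rng.normal(0, 2, n)

# Importance weights: the likelihood ratio p(x) / q(x).
w = norm.pdf(xs, loc=0, scale=1) / norm.pdf(xs, loc=0, scale=2)

# Weighted average estimates E_p[x^2] = 1; no sample is discarded.
print(np.mean(w * xs**2))
```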

Graphical Example of Importance Sampling [figure]

Markov Chain Monte Carlo

Markov chain: each state depends only on the immediately preceding one,
– p(x₁ | x₂, x₃, x₄, x₅, …) = p(x₁ | x₂)
For MCMC sampling, start in a state z^(0).
At each step, draw a candidate z* based on the previous state z^(m), and accept it with some probability based on a proposal distribution.
– If the step is accepted: z^(m+1) = z*
– Else: z^(m+1) = z^(m)
Alternatively, accept only samples that are consistent with an observed value.

Markov Chain Monte Carlo

Goal: p(z^(m)) → p*(z) as m → ∞.
– MCMC chains that have this property are ergodic.
– Ergodicity implies that the sampled distribution converges to the true distribution.
We need to define a transition function to move from one state to the next.
– How do we draw a sample at state m+1 given state m?
– Often, z^(m+1) is drawn from a Gaussian with mean z^(m) and a constant variance.

Markov Chain Monte Carlo

Goal: p(z^(m)) → p*(z) as m → ∞.
– MCMC chains that have this property are ergodic.
Transition distributions T that satisfy detailed balance, p*(z) T(z, z′) = p*(z′) T(z′, z), leave p*(z) invariant and, together with ergodicity, guarantee convergence of the chain.
– Such chains are also called reversible.
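A one-line check that detailed balance makes p* stationary: summing both sides of the balance condition over z,

\[
\sum_{z} p^*(z)\, T(z, z') \;=\; \sum_{z} p^*(z')\, T(z', z) \;=\; p^*(z') \sum_{z} T(z', z) \;=\; p^*(z'),
\]

so one application of the transition operator leaves p*(z) unchanged.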

Metropolis-Hastings Algorithm

Assume the current state is z^(m).
Draw a candidate z* from the proposal q(z | z^(m)).
Accept it with probability
A(z*, z^(m)) = min(1, p̃(z*) q(z^(m) | z*) / (p̃(z^(m)) q(z* | z^(m))))
where p̃ is the (possibly unnormalized) target density.
A normal distribution is often used for q.
– Its variance trades off convergence speed against acceptance rate.
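A compact sketch of the algorithm, using an assumed unnormalized target (a two-mode Gaussian mixture) and a symmetric Gaussian proposal; with a symmetric proposal the q terms cancel and the rule reduces to the original Metropolis algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # Unnormalized target: equal-weight Gaussian modes at -2 and +2.
    return np.exp(-0.5 * (z - 2)**2) + np.exp(-0.5 * (z + 2)**2)

def metropolis(n_steps=50_000, step_size=1.0, z0=0.0):
    z = z0
    samples = np.empty(n_steps)
    for m in range(n_steps):
        z_star = rng.normal(z, step_size)   # candidate from q(z | z^(m))
        # Symmetric proposal: A = min(1, p~(z*) / p~(z^(m))).
        if rng.uniform() < min(1.0, p_tilde(z_star) / p_tilde(z)):
            z = z_star                      # accept the candidate
        samples[m] = z                      # on rejection, z^(m) repeats
    return samples

print(metropolis().mean())  # near 0 for this symmetric target
```

Too small a step size explores slowly; too large a step size rejects almost every candidate, which is the variance tradeoff the slide mentions.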

Gibbs Sampling We’ve been treating z as a vector to be sampled as a whole However, in high dimensions, the accept probability becomes vanishingly small. Gibbs sampling allows us to sample one variable at a time, based on the other variables in z. 14

Gibbs Sampling

Assume a distribution p(z₁, z₂, z₃) over three variables.
In turn, generate a new sample for each variable conditioned on the current values of all of the others:
z₁^(t+1) ~ p(z₁ | z₂^(t), z₃^(t))
z₂^(t+1) ~ p(z₂ | z₁^(t+1), z₃^(t))
z₃^(t+1) ~ p(z₃ | z₁^(t+1), z₂^(t+1))
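A sketch for two variables with Gaussian full conditionals, assuming an illustrative unit-variance bivariate Gaussian with correlation ρ = 0.8:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_steps = 0.8, 20_000
z1 = z2 = 0.0                        # initial state
samples = np.empty((n_steps, 2))

for t in range(n_steps):
    # For a unit-variance bivariate Gaussian, each full conditional is
    # itself Gaussian: z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically.
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho**2))
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho**2))
    samples[t] = z1, z2

print(np.corrcoef(samples.T)[0, 1])  # close to rho
```

Every Gibbs proposal is accepted; the price is that strongly correlated variables make successive samples highly dependent, so the chain mixes slowly.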

Gibbs Sampling in a Graphical Model

The appeal of Gibbs sampling in a graphical model is that the conditional distribution of a variable given all of the others depends only on its Markov blanket (its parents, children, and co-parents).
Gibbs sampling fixes n − 1 variables and generates a sample for the n-th.
If each variable's conditional distribution is easy to sample from, we can simply sample from the conditionals given by the graphical model, starting from some initial state.

Next Time

Perceptrons
Neural Networks