Bayesian Methods with Monte Carlo Markov Chains III

Bayesian Methods with Monte Carlo Markov Chains III
Henry Horng-Shing Lu
Institute of Statistics, National Chiao Tung University
hslu@stat.nctu.edu.tw
http://tigpbp.iis.sinica.edu.tw/courses.htm

Part 8 More Examples of Gibbs Sampling

An Example with Three Random Variables (1) The goal is to sample (X, Y, N) from the joint distribution specified as follows:

An Example with Three Random Variables (2) One can see that

An Example with Three Random Variables (3) Gibbs sampling algorithm:
1. Initial setting: t = 0, choose starting values.
2. Sample a value (x_{t+1}, y_{t+1}) from the full conditional distributions.
3. Set t = t + 1 and repeat step 2 until convergence.

An Example with Three Random Variables by R 10000 samples with α=2, β=7 and λ=16

An Example with Three Random Variables by C (1) 10000 samples with α=2, β=7 and λ=16

An Example with Three Random Variables by C (2)

An Example with Three Random Variables by C (3)

Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b.
A, B: dominant; a, b: recessive.
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
[Diagram: gamete formation in the female (F), with recombination fraction r′, and in the male (M), with recombination fraction r.]

Example 1 in Genetics (2)
r and r′ are the recombination rates for male and female, respectively.
Suppose the parental origin of this heterozygote is from the mating of …
The problem is to estimate r and r′ from the offspring of selfed heterozygotes.
Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf

Example 1 in Genetics (3)
[Table: the 4×4 cross of female gametes (AB and ab with probability (1−r′)/2 each; aB and Ab with probability r′/2 each) against male gametes (AB and ab with probability (1−r)/2 each; aB and Ab with probability r/2 each). Each offspring genotype has probability equal to the product of the gamete probabilities, e.g. AABB with probability (1−r)(1−r′)/4 and aaBB with probability r·r′/4.]

Example 1 in Genetics (4)
Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.

Example 1 in Genetics (5)

Example 1 in Genetics (6) Hence, a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:

Example 1 in Genetics (7) Suppose that we observe the data y = (y1, y2, y3, y4) = (125, 18, 20, 24), which is a random sample from this multinomial distribution. Then the probability mass function is
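The slide's formula was embedded as an image and lost. Under the standard Fisher–Balmukand parametrization of this example (an assumption, but consistent with the ¼-to-1 parameter range and the numerical results quoted later), the cell probabilities and probability mass function are:

```latex
p(y \mid \theta) = \frac{n!}{y_1!\,y_2!\,y_3!\,y_4!}
\left(\frac{2+\theta}{4}\right)^{y_1}
\left(\frac{1-\theta}{4}\right)^{y_2}
\left(\frac{1-\theta}{4}\right)^{y_3}
\left(\frac{\theta}{4}\right)^{y_4},
\qquad \theta = (1-r)(1-r'),
```

with n = y_1 + y_2 + y_3 + y_4 = 187 for the observed data.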

Example 1 in Genetics (8)
How can the parameter be estimated?
MME (shown last week): http://en.wikipedia.org/wiki/Method_of_moments_%28statistics%29
MLE (shown last week): http://en.wikipedia.org/wiki/Maximum_likelihood
Bayesian method: http://en.wikipedia.org/wiki/Bayesian_method

Example 1 in Genetics (9) As the value of the parameter is between ¼ and 1, we can assume that its prior distribution is Uniform(¼, 1). The posterior distribution follows from Bayes' rule. The integration in the denominator does not have a closed form.

Example 1 in Genetics (10) We will consider the mean of the posterior distribution (the posterior mean). The Monte Carlo Markov Chain method is a good way to estimate the posterior mean even when the posterior distribution and its mean do not have closed forms.

Example 1 by R
Direct numerical integration of the posterior mean.
We can assume other prior distributions to compare the resulting posterior means: Beta(1,1), Beta(2,2), Beta(2,3), Beta(3,2), Beta(0.5,0.5), Beta(10^-5,10^-5).

Example 1 by C/C++ Replace the prior distribution with others, such as Beta(1,1), …, Beta(10^-5,10^-5).

Beta Prior

Comparison for Example 1 (1)
Method: Estimate
MME: 0.683616
MLE: 0.663165
Bayesian, U(¼,1): 0.573931
Bayesian, Beta(1,1): 0.573918
Bayesian, Beta(2,2): 0.572103
Bayesian, Beta(2,3): 0.564731
Bayesian, Beta(3,2): 0.577575
Bayesian, Beta(½,½): 0.574928
Bayesian, Beta(10^-5,10^-5): 0.588925
Bayesian, Beta(10^-7,10^-7): shown below

Comparison for Example 1 (2)
Method: Estimate
Bayesian, Beta(10,10): 0.559905
Bayesian, Beta(10^2,10^2): 0.520366
Bayesian, Beta(10^4,10^4): 0.500273
Bayesian, Beta(10^5,10^5): 0.500027
Bayesian, Beta(10^-7,10^-7): 0.193891, 0.400567, 0.737646, 0.641388 (not stationary)

Part 9 Gibbs Sampling Strategy

Sampling Strategy (1) Strategy I: Run one chain for a long time. After some “burn-in” period, sample points every fixed number of steps. The code examples of Gibbs sampling in the previous lecture use sampling strategy I. http://www.cs.technion.ac.il/~cs236372/tirgul09.ps
[Figure: a burn-in period followed by N samples taken from one chain.]

Sampling Strategy (2) Strategy II: Run the chain N times, each run for M steps. Each run starts from a different state point. Return the last state of each run.
[Figure: N chains, each with its own burn-in; one sample is taken from the last state of each chain.]

Sampling Strategy (3) Strategy II by R:

Sampling Strategy (4) Strategy II by C/C++:

Strategy Comparison
Strategy I: Performs “burn-in” only once, which saves time; but samples might be correlated (although only weakly).
Strategy II: Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity; but it must perform the “burn-in” steps for each chain, which takes more time.

Hybrid Strategies (1) Run several chains and sample a few samples from each. This combines the benefits of both strategies.
[Figure: several chains, each with a burn-in period followed by N samples.]

Hybrid Strategies (2) Hybrid Strategy by R:

Hybrid Strategies (3) Hybrid Strategy by C/C++:

Part 10 Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm (1) Another kind of MCMC method. The Metropolis-Hastings algorithm can draw samples from any probability distribution π(x), requiring only that a function proportional to the density can be calculated at x. The process has three steps:
1. Set up a Markov chain;
2. Run the chain until stationarity;
3. Estimate with Monte Carlo methods.
http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm

Metropolis-Hastings Algorithm (2) Let π be a probability density (or mass) function (pdf or pmf). f(·) is any function, and we want to estimate θ = E_π[f(X)] = Σ_i f(i)π_i. Construct P = {P_ij}, the transition matrix of an irreducible Markov chain with states 1, 2, …, n, where P_ij = Pr(X_{t+1} = j | X_t = i) and π is its unique stationary distribution.

Metropolis-Hastings Algorithm (3) Run this Markov chain for times t = 1, …, N and calculate the Monte Carlo sum θ̂ = (1/N) Σ_{t=1}^{N} f(X_t); then θ̂ → θ as N → ∞. Sheldon M. Ross (1997). Proposition 4.3, Introduction to Probability Models, 7th ed. http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt

Metropolis-Hastings Algorithm (4) In order to perform this method for a given distribution π, we must construct a Markov chain transition matrix P with π as its stationary distribution, i.e. πP = π. Consider a matrix P made to satisfy the reversibility condition that for all i and j, π_i P_ij = π_j P_ji. This property ensures that Σ_i π_i P_ij = π_j for all j, and hence π is a stationary distribution for P.

Metropolis-Hastings Algorithm (5) Let a proposal Q = {Q_ij} be irreducible, where Q_ij = Pr(X_{t+1} = j | X_t = i), and the range of Q equals the range of π. However, π does not have to be a stationary distribution of Q. Process: tweak Q_ij to yield π.
[Figure: states drawn from Q_ij (not π) are tweaked into states drawn from P_ij, whose stationary distribution is π.]

Metropolis-Hastings Algorithm (6) We assume that P_ij has the form P_ij = Q_ij α(i,j) for j ≠ i, where α(i,j) is called the acceptance probability; i.e., given X_t = i, the proposed state j is accepted with probability α(i,j) and rejected otherwise.

Metropolis-Hastings Algorithm (7) WLOG, for some (i,j), π_i Q_ij > π_j Q_ji. In order to achieve the equality (*) π_i Q_ij α(i,j) = π_j Q_ji α(j,i), one can introduce a probability α(i,j) < 1 on the left-hand side and set α(j,i) = 1 on the right-hand side.

Metropolis-Hastings Algorithm (8) Then π_i Q_ij α(i,j) = π_j Q_ji. These arguments imply that the acceptance probability must be α(i,j) = min(1, (π_j Q_ji)/(π_i Q_ij)).

Metropolis-Hastings Algorithm (9) M-H Algorithm:
Step 1: Choose an irreducible Markov chain transition matrix Q with transition probabilities Q_ij.
Step 2: Let t = 0 and initialize X_0 from the states of Q.
Step 3 (Proposal Step): Given X_t = i, sample Y = j from Q_{i·}.

Metropolis-Hastings Algorithm (10) M-H Algorithm (cont.):
Step 4 (Acceptance Step): Generate a random number U from Uniform(0,1). If U ≤ α(i,j), set X_{t+1} = j; otherwise, set X_{t+1} = i.
Step 5: Set t = t + 1 and repeat Steps 3–5 until convergence.

Metropolis-Hastings Algorithm (11) An Example of Steps 3–5:
[Figure: proposals Y_1, Y_2, Y_3, …, Y_N drawn from Q_ij are tweaked into the chain X_1 = Y_1, X_2 = Y_1, X_3 = Y_3, …, X_N governed by P_ij.]

Metropolis-Hastings Algorithm (12) We may define a “rejection rate” as the proportion of times t for which X_{t+1} = X_t. Clearly, in choosing Q, high rejection rates are to be avoided.
[Figure: an example with the current state X_t and a proposed Y under π.]

Example (1) Simulate a bivariate normal distribution:

Example (2) Metropolis-Hastings Algorithm:

Example of M-H Algorithm by R

Example of M-H Algorithm by C (1)

Example of M-H Algorithm by C (2)

Example of M-H Algorithm by C (3)

A Figure to Check Simulation Results Black points are simulated samples; colored points indicate the probability density.

Exercises
Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
Write programs for the examples mentioned at the reference web pages.
Write programs for other examples that you know.