
Markov Chain Monte Carlo Methods -- The Final Project of STAT 6213


1 Markov Chain Monte Carlo Methods -- The Final Project of STAT 6213
Jian Sun and Zichen Zhang, The George Washington University

2 Content
1. Definition of MCMC
2. The process of MCMC: the Hastings-Metropolis algorithm and Gibbs sampling
3. Discrete and continuous cases of the Hastings-Metropolis algorithm
4. Burn-in

3 Problems
$\theta = E[h(X)] = \sum_{i=1}^{n} h(x_i)\, P\{X = x_i\}$
How do we get $\theta$? Sometimes we cannot calculate $\theta$ directly, since h(x) is hard to evaluate.

4 Background: A Brief History
1940s: Real use of MC started during WWII --- study of the atomic bomb (neutron diffusion in fissile material).
1948: Fermi, Metropolis, and Ulam obtained MC estimates for the eigenvalues of the Schrödinger equation.
1950s: Formation of the basic construction of MCMC, e.g. the Metropolis method --- applications to statistical physics models, such as the Ising model.
1960s-70s: Using MCMC to study phase transitions; material growth/defects, macromolecules (polymers), etc.
1980s: Gibbs samplers, simulated annealing, data augmentation, Swendsen-Wang, etc. --- global optimization; image and speech; quantum field theory.
1990s: Applications in genetics; computational biology.

5 Definition 1
Markov Chain: Let {Xn, n = 0, 1, 2, ...} be a stochastic process that takes on a finite or countable number of possible values. Unless otherwise mentioned, this set of possible values will be denoted by the set of nonnegative integers {0, 1, 2, ...}. If Xn = i, the process is said to be in state i at time n. We suppose that whenever the process is in state i, there is a fixed probability Pij that it will next be in state j. That is, we suppose that
$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij}$
for all states $i_0, i_1, \ldots, i_{n-1}, i, j$ and all $n \ge 0$. Such a stochastic process is known as a Markov chain.
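To make the definition concrete, here is a minimal MATLAB sketch (an assumed example, not from the slides; the 3-state transition matrix P is hypothetical) that simulates such a chain and records the long-run fraction of time spent in each state:

% A minimal sketch: simulate a Markov chain with a hypothetical 3-state
% transition matrix P, where P(i,j) = P{X_{n+1} = j | X_n = i}.
P = [0.5 0.3 0.2;
     0.2 0.6 0.2;
     0.3 0.3 0.4];
n = 50000;
x = zeros(1, n);
x(1) = 1;                                         % start in state 1
for t = 2:n
    x(t) = find(rand <= cumsum(P(x(t-1), :)), 1); % sample the next state
end
% Long-run fractions of time in each state; these approximate the
% stationary distribution pi solving pi = pi * P.
empirical = [sum(x == 1), sum(x == 2), sum(x == 3)] / n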

6 Definition 2 Monte Carlo Method: Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. In principle, Monte Carlo methods can be used to solve any problem having a probabilistic interpretation. By the law of large numbers, integrals described by the expected value of some random variable can be approximated by taking the empirical mean (a.k.a. the sample mean) of independent samples of the variable.
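For instance (an illustrative sketch, not part of the slides; the choice h(x) = x^2 with X ~ Exp(1) is an assumption), the quantity θ = E[h(X)] from slide 3 can be approximated by a sample mean:

% Illustrative sketch: estimate theta = E[h(X)] for h(x) = x^2 with
% X ~ Exp(1); the true value is E[X^2] = 2.
h = @(x) x.^2;
n = 100000;
x = -log(rand(1, n));      % Exp(1) draws via the inverse transform
theta_hat = mean(h(x))     % close to 2 by the law of large numbers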

7 When the probability distribution of the variable is parameterized, mathematicians often use a Markov Chain Monte Carlo (MCMC) sampler.

8 Definition of MCMC
The central idea is to design a judicious Markov chain model with a prescribed stationary probability distribution. By the ergodic theorem, the stationary distribution is approximated by the empirical measures of the random states of the MCMC sampler.

9 In statistics, Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability distribution based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a number of steps is then used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps.

10 Markov Chain We use the stationary probabilities of a Markov chain to draw random samples from the sample space.

11 Monte Carlo Algorithm We use the Monte Carlo algorithm to simulate random samples, and then approximate the integral using the samples we obtain.

12 Generating a random number
1. Inverse transform method (sketched below)
2. Transformation methods
3. Acceptance-rejection method (sketched below)
4. Built-in generators in R
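Below is a minimal MATLAB sketch of methods 1 and 3 (the Exp(1) and Beta(2,2) targets are assumed examples, not from the slides):

% 1. Inverse transform: if U ~ Uni(0,1) and F is a cdf, then F^{-1}(U)
%    has cdf F. For Exp(1), F^{-1}(u) = -log(1 - u).
u = rand(1, 10000);
x_exp = -log(1 - u);                 % samples from Exp(1)

% 3. Acceptance-rejection: target f(x) = 6x(1-x) on (0,1) (the Beta(2,2)
%    density), proposal g = Uni(0,1), envelope constant c = 1.5 = max f.
f = @(x) 6 .* x .* (1 - x);
c = 1.5;
n = 10000; samples = zeros(1, n); k = 0;
while k < n
    y = rand;                        % draw from the proposal g
    if rand <= f(y) / c              % accept with probability f(y)/(c*g(y))
        k = k + 1;
        samples(k) = y;
    end
end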

13 The process of MCMC
1. Construct a Markov chain with stationary probabilities π(x) and transition probabilities Pi,j on state space D.
2. Following the Markov chain of step 1, generate the point sequence X(1), ..., X(n) starting from a selected point X(0) in D.
3. For some m and very large n, estimate any function f(x) by
$E_n f = \frac{1}{n-m} \sum_{i=m+1}^{n} f(X^{(i)})$
The target density should meet two conditions: (1) it is non-negative; (2) its integral is 1.
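A short sketch of the estimator in step 3 (the chain X here is only a placeholder standing in for MCMC output; m and n are assumed values):

% Ergodic average with burn-in m: E_n f = (1/(n-m)) * sum_{i=m+1}^n f(X(i)).
n = 40000; m = 5000;          % assumed chain length and burn-in
X = randn(1, n);              % placeholder chain, standing in for MCMC output
f = @(x) x.^2;                % function whose expectation is estimated
Enf = mean(f(X(m+1:n)))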

14 Sample output

15 The advantages of MCMC
1. It can be used to solve wide-ranging and difficult problems.
2. Increasing the dimension of the problem does not slow convergence or make the problem more complex.
3. It can compute high-dimensional integrals.
4. It can solve algebraic equations.
5. It can compute matrix inverses.

16 Hastings-Metropolis algorithm
In statistics and in statistical physics, the Metropolis-Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution (i.e., to generate a histogram) or to compute an integral (such as an expected value).

17 The purpose of the HM algorithm
It can be used to generate a time-reversible Markov chain whose stationary probabilities are
$\pi(j) = b(j)/B, \quad j = 1, 2, \ldots$
where the b(j) are given positive numbers and $B = \sum_j b(j)$ is their (possibly unknown) sum.

18 The principle of the Hastings-Metropolis algorithm
1. Construct a proposal distribution g(.|Xt) (irreducible, aperiodic, and positive recurrent) such that the stationary distribution of the resulting Markov chain is the sampling distribution f.
2. Draw X0 from g.
3. Repeat:
(a) generate Y from g(.|Xt);
(b) generate U from Uni(0,1);
(c) if $U \le \frac{f(Y)\, g(X_t \mid Y)}{f(X_t)\, g(Y \mid X_t)}$, accept Y and let Xt+1 = Y; otherwise Xt+1 = Xt;

19 (d) increase t and return to (a).
The acceptance probability in the above algorithm is
$\alpha(X_t, Y) = \min\left(1, \frac{f(Y)\, g(X_t \mid Y)}{f(X_t)\, g(Y \mid X_t)}\right)$

20 A General Case
select x                                   (choose the initial state)
for t = 1 to N                             (loop N times)
    y ~ g(.|x_{t-1})                       (generate a candidate state)
    h(x_{t-1}, y) = min{1, f(y) g(x_{t-1}|y) / [f(x_{t-1}) g(y|x_{t-1})]}   (acceptance probability)
    if r ~ U(0,1) <= h(x_{t-1}, y) then
        x_t <- y
    else
        x_t <- x_{t-1}
    end if
end for
draw a histogram of x_1, ..., x_N

21 A Discrete Case: Simulate the game of rolling 2 dice
% Target: the distribution of the sum of two fair dice, f(k) = (6-|k-7|)/36
% on states {2,...,12}; random-walk proposal with reflection at the ends.
% (The loop bound, the 0.5 thresholds, and the definition of f were garbled
% in the transcript and are filled in here.)
f = @(k) (6 - abs(k - 7)) / 36;     % pmf of the sum of two dice
d = zeros(1, 20000);
x = 5;                              % initial state
for i = 1:20000
    U = rand;
    if x == 2                       % left edge: propose 3, or stay at 2
        if U < 0.5, y = 3; else, y = 2; end
    elseif x == 12                  % right edge: propose 11, or stay at 12
        if U < 0.5, y = 11; else, y = 12; end
    else                            % interior: step down or up
        if U < 0.5, y = x - 1; else, y = x + 1; end
    end
    h = min(1, f(y) / f(x));        % Metropolis acceptance probability
    U = rand;
    if U < h, x = y; end            % accept or keep the current state
    d(i) = x;
end
a = 1:1:12;
hist(d, a)                          % histogram approximates f

22 A Continuous Case
% Target: f(x) = 0.5*x^2*exp(-x) (a Gamma(3,1) density); symmetric
% random-walk proposal Uni(x-1, x+1). (The loop bound, the 'y < 0' guard,
% and the ratio f(y)/f(x) were garbled in the transcript and are restored.)
f = @(x) 0.5 .* x.^2 .* exp(-x);
d = zeros(1, 40000);
x = 2;                              % initial state
for i = 1:40000
    y = x - 1 + 2*rand;             % propose y ~ Uni(x-1, x+1)
    if y < 0, y = x; end            % reject proposals outside the support
    h = min(1, f(y) / f(x));        % Metropolis acceptance probability
    U = rand;
    if U < h, x = y; end
    d(i) = x;
end
a = 0:0.08:20;
hist(d(20001:40000), a)             % discard the first half as burn-in

23 Some versions of the HM algorithm
1. Random-walk Metropolis
2. Independence sampler
3. Gibbs sampling (used when the space is high-dimensional: rather than updating the whole Xt at once, we update it component by component)

24 Gibbs sampler Gibbs sampling, in its basic incarnation, is a special case of the Metropolis-Hastings algorithm. The point of Gibbs sampling is that, given a multivariate distribution, it is simpler to sample from a conditional distribution than to marginalize by integrating over a joint distribution.

25 Application scenarios
1. Approximate the joint distribution (e.g., to generate a histogram of the distribution);
2. Approximate the marginal distribution of one of the variables, or of some subset of the variables (for example, the unknown parameters or latent variables);
3. Compute an integral (such as the expected value of one of the variables).

26 The principle of Gibbs sampling
1. Let $X_t = (X_{t,1}, \ldots, X_{t,k})$ denote the t-th state of the Markov chain, and let $X_{t,-i} = (X_{t,1}, \ldots, X_{t,i-1}, X_{t,i+1}, \ldots, X_{t,k})$ denote all components of the t-th state except the i-th.
2. $f(x) = f(x_1, \ldots, x_k)$ is the target distribution, and $f(x_i \mid x_{-i}) = \frac{f(x)}{\int f(x_1, \ldots, x_k)\, dx_i}$ is the conditional distribution of $x_i$ given $x_{-i}$.
3. $X_{t,i}$ is the i-th component of $X_t$ after t iterations. We update $X_{t,i}$ in the (t+1)-th iteration using the HM algorithm, with $X^*_{t,-i} = (X_{t,1}, \ldots, X_{t,i-1}, X_{t,i+1}, \ldots, X_{t,k})$ and acceptance probability
$\alpha(X^*_{t,-i}, X_{t,i}, Y_i) = \min\left\{1, \frac{f(Y_i \mid X^*_{t,-i})\, q_i(X_{t,i} \mid Y_i, X^*_{t,-i})}{f(X_{t,i} \mid X^*_{t,-i})\, q_i(Y_i \mid X_{t,i}, X^*_{t,-i})}\right\}$

27 If we accept $Y_i$, then $X_{t+1,i} = Y_i$; otherwise $X_{t+1,i} = X_{t,i}$.
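As a concrete illustration (an assumed example, not from the slides), consider a standard bivariate normal target with correlation ρ. Its full conditionals are x1 | x2 ~ N(ρ x2, 1-ρ²) and x2 | x1 ~ N(ρ x1, 1-ρ²), so each component can be drawn exactly and the acceptance probability above always equals 1, meaning no accept/reject step is needed:

% Gibbs sampler sketch for a standard bivariate normal with correlation
% rho (assumed example). Each full-conditional draw is exact, so every
% proposal is accepted.
rho = 0.8;
n = 20000;
X = zeros(n, 2);                           % start at (0, 0)
s = sqrt(1 - rho^2);                       % conditional standard deviation
for t = 2:n
    X(t, 1) = rho * X(t-1, 2) + s * randn; % draw x1 given the current x2
    X(t, 2) = rho * X(t, 1)   + s * randn; % draw x2 given the new x1
end
C = corrcoef(X(5001:end, :));              % discard burn-in
C(1, 2)                                    % sample correlation, approx rho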

28 Burn-in MCMC relies on the convergence of the simulation, but it is hard to find a fully satisfying method for judging convergence.
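One informal check (a sketch; the chain d is assumed to come from one of the samplers above, and the segment boundaries are arbitrary choices): inspect the trace plot and compare estimates computed from different segments of the chain; substantial disagreement suggests the burn-in period is too short.

% Informal convergence check (heuristic, not a formal diagnostic).
% 'd' is assumed to be a 1-by-40000 chain from the samplers above.
plot(d)                          % trace plot: look for a settled band
m1 = mean(d(10001:25000));       % estimate from a middle segment
m2 = mean(d(25001:40000));       % estimate from the final segment
abs(m1 - m2)                     % a large gap suggests non-convergence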

29 Thank You

