Presentation is loading. Please wait.

Presentation is loading. Please wait.

Suggested readings Historical notes Markov chains MCMC details

Similar presentations


Presentation on theme: "Suggested readings Historical notes Markov chains MCMC details"— Presentation transcript:

1 Suggested readings Historical notes Markov chains MCMC details
Chap1, Andrieu, C., et al. (2003). An Introduction to MCMC for Machine Learning. Machine Learning, 50, 5–43. Section by Gelman. Markov chains Chap11, David J.C. MacKay. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. Chap11, Grinstead & Snell, Introduction to Probability, found on internet. Section 7.1, Tan & Fox, (2007) Lecture Notes on Inverse Problems MCMC details Andrieu, C., et al. (2003). An Introduction to MCMC for Machine Learning. Machine Learning, 50, 5–43. Section 11.2~11.6 by Gelman. Section 7.2, Tan & Fox, (2007) Lecture Notes on Inverse Problems

2 Markov Chain Weather in the land of Oz
Land of Oz has unique weather system. Note that next day state is not affected by the previous states. Case study It is R today. What is the probability of S tomorrow ? What is the probability of S, day after tomorrow ? For i’th state, what is the probability of j’th state, day after tomorrow ? After many days like 100 days ? Let the initial state is always R. Calculate states after 3, 10, 100 days. Let the initial state has equal chance. Calculate states after 3, 10, 100 days. If N, likely to S and R the next day, not N again. If R or S, 50% of the same the next day. If changed, only half of this change to N.

3 Markov Chain Remarks For a set of states S={s1,s2,…,sr}, Markov chain is a process that a new state sj is determined by current state si with probability pij. Note that the new state is not affected by the previous states. pij is called transition probabilities, denoted as P(j|i) = pij. The row vector, called probability vector (or distribution), sums to 1. pij(n) of the matrix Pn gives the probability that we have state sj, starting from state si, after n steps. As n→∞, Pn approaches limiting matrix W with all rows the same vector w. w is obtained by using wP=w or w(P-I)=0. in fact, w is left eigenvector of matrix P. Let u be arbitrary starting dist. Then probability distribution after n →∞ steps is uPn = w, which means that regardless of initial u, one gets always the same distribution w after many steps.

4 Markov Chain Monte Carlo
Introduction MCMC is a strategy for generating samples x(i) by exploring the states using a Markov chain mechanism, such that the samples x(i) mimic samples drawn from the target distribution p(x). MCMC is used when it is not possible (or not computationally efficient) to sample q directly from p(q|y). Instead, sample iteratively such that at each step of the process we expect to draw from a distribution that becomes closer and closer to p(q|y). MCMC doesn’t require that p(q|y) be normalized. For a wide class of problems, this appears to be the easiest way to get reliable results.

5 Markov Chain Monte Carlo
Forward process: Markov process Stochastic process x(i) is called Markov chain if Given a transition matrix T, find equilibrium distribution p. Example: Inverse process: MCMC process Given a distribution p, construct transition matrix T of Markov chain. Then we obtain after Markov process, the target distribution p.

6 Df In Markov simulation, sequences of simulation draws are created; each sequence is produced by starting at an arbitrary point , and drawing from a transition distribution . The transition probability distribution should be constructed so that the Markov chain converges to a stationary distribution which is the target of our concern. Recent survey places the Metropolis algorithm as one of the greatest ten algorithms of science and engineering development of 20th century. The algorithm is playing key role in sampling method known as MCMC. These algorithms have played a significant role in statistics, econometrics, physics and computing science over the last two decades.

7 Metropolis-Hastings algorithm
Introduction Most popular MCMC method. Other MCMC algorithms are interpreted as special cases or extensions of this algorithm. MH Algorithm A proposal distribution q(x*|x) is introduced. The MH step is to sample a new candidate x* given the current x based on q(x*|x). Then the point is accepted with probability Otherwise it remains at x. By repeating this steps, one obtains in the end the samples of p(x).

8 Metropolis-Hastings algorithm
Metropolis algorithm A proposal distribution q(x*|x) is symmetric w.r.t x* and x. Then the ratio is simplified. Example: normal pdf at x* with mean at x equals to the vice versa. Practice with matlab Generate samples of this distribution using a proposal pdf As the random walk progresses, the number of samples are increased, and the distribution converges to the target distribution.

9 Metropolis-Hastings algorithm
Notes MH requires careful design of the proposal distribution Allowing asymmetric jump rules can increase the speed of random walk. There is matlab function for MH algorithm. mhsample(1,N,'pdf',p,'proprnd',qrnd,'proppdf',q); Variants of MCMC are Hybrid Monte Carlo Slice sampling Reversible jump MCMC

10 Simulated annealing for global optimization
Algorithm Target distribution is objective function in optimization problem. After sampling via MH algorithm, the optimum solution is obtained as the mode of the simulated distribution, i.e., Using just original function is inefficient. Simulated annealing was developed to enhance speed of finding this, in which pdf value at iteration i is replaced by where Ti is decreasing cooling schedule Thus SA is just a minor modification of standard MCMC algorithms.

11 Simulated annealing for global optimization
Algorithm

12 Gibbs sampler Algorithm
Particular Markov chain useful for multidimensional problems, also called alternating conditional sampling. Let x=(x1,x2,…,xn). Each iteration of the Gibbs sampler cycles through the components, drawing each parameter conditional on the values of all the others. There are thus n steps in an iteration. Each parameter xj is updated conditional on the latest values of the other components which are the iteration i values for the already updated components and the iteration i-1 values for the others not updated yet.

13 Gibbs sampler Practice with matlab
Generate samples of bivariate normal distribution with correlation r. To apply Gibbs, we need conditional distributions, using the properties of multivariate normal distribution ((A.1) or (A.2) on page 579) Gibbs sampling where sample for conditional pdf drawn by matlab function. Gibbs sampling where sample for conditional pdf drawn by MH algorithm.

14 MH Algorithm for two parameters
Remark Gibbs sampler samples alternatively for each component of the parameter vector while MH algorithm samples altogether for the parameter vector. Algorithm is outlined via example.

15 Inference and convergence
Remarks If iterations not proceeded long enough, result can be erroneous. Recommended simulation is to employ multiple sequences with starting points dispersed over the domain. Then monitor the variation between and within the sequence and proceed until the within variation equals the between variation. More detail is addressed in the textbook p296. Early iterations should be discarded. Usually the first half of each sequence is discarded. This discarding activity is referred as ‘burn-in’. One can apply several graphical and statistical tests to assess, roughly, if the chain has stabilized. In general, none of these tests provide entirely satisfactory diagnostics. See another comment on the convergence at Chap. 29 Monte Carlo Methods in David J.C. MacKay. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.

16 Homework Problem For the 3.7 Example: Bioassay experiment,
Perform Metropolis algorithm of two parameters to obtain samples of the posterior distribution. Draw scatter plot. Draw traces of each parameters and sequence of points in 2-D domain. Compute mean, 95% conf. intervals, variances and correlation of the two parameters. Try different initial point, different proposal density and compare the results 4.


Download ppt "Suggested readings Historical notes Markov chains MCMC details"

Similar presentations


Ads by Google