1 MCMC (Part II) By Marc Sobel

2 Monte Carlo Exploration: Suppose we want to optimize a complicated distribution f(·). We assume f is known only up to a multiplicative constant of proportionality. Newton-Raphson says that we can pick a point nearer a mode by using the transformation:
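Presumably the transformation intended is the gradient step x_{t+1} = x_t + (ε²/2) ∇log f(x_t), which moves x_t uphill toward a mode; a full Newton-Raphson step would replace the step size ε²/2 by the inverse curvature −[∇² log f(x_t)]⁻¹.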

3 Langevin Algorithms: Monte Carlo demands that we explore the distribution rather than simply moving toward a mode. Therefore, we introduce a noise factor via the update below. (Note that we have replaced ε by σ.) We can use this update as is, or combine it with a Metropolis-Hastings step:
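In its standard form, this noisy update is x_{t+1} = x_t + (σ²/2) ∇log f(x_t) + σ ε_t, where ε_t ~ N(0, I) is independent noise.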

4 Langevin Algorithm with Metropolis-Hastings: The move probability is:
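For the standard Metropolis-adjusted Langevin algorithm, the probability of moving from x_t to the proposed point y is min{ 1, [f(y) q(x_t | y)] / [f(x_t) q(y | x_t)] }, where q(y | x) is the density of the N(x + (σ²/2) ∇log f(x), σ² I) Langevin proposal.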

5 Extending the Langevin to a Hybrid Monte Carlo Algorithm: Instead of moving based entirely on the gradient (with noise added on), we could add a 'kinetic energy' term via the dynamics sketched below, and iterate this algorithm.
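In standard hybrid (Hamiltonian) Monte Carlo, x is augmented with a momentum p drawn afresh from N(0, I) at each iteration, the total energy is H(x, p) = E(x) + p'p/2 with potential energy E(x) = −log f(x), and (x, p) is moved along an (approximately) energy-conserving leapfrog path whose endpoint is accepted with probability min{1, exp(−ΔH)}.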

6 Matlab Code for Hybrid MC: a total of Tau leapfrog steps along the (approximately) constant-energy path
g = gradE(x);                    % gradient of the energy at the initial x (gradE supplied by the user)
E = energy(x);                   % potential energy E(x) = -log f(x), up to a constant
for i = 1:L
    p = randn(size(x));          % draw a fresh momentum
    H = p'*p/2 + E;              % current total energy H(x,p)
    gnew = g; xnew = x;
    for tau = 1:Tau              % leapfrog steps
        p    = p - epsilon*gnew/2;    % make a half step in p
        xnew = xnew + epsilon*p;      % make an x step
        gnew = gradE(xnew);           % update the gradient
        p    = p - epsilon*gnew/2;    % make another half step in p
    end
    Enew = energy(xnew);         % find the new energy
    Hnew = p'*p/2 + Enew;        % find the new H
    dH   = Hnew - H;
    if rand < exp(-dH), Accept = 1; else Accept = 0; end
    if Accept, x = xnew; g = gnew; E = Enew; end   % keep the move only on acceptance
end

7 Example: log f(x) = x^2 + a^2 - log(cosh(ax)); k(p) = p^2
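A hedged sketch of how this example plugs into the slide-6 code, reading the stated expression as the potential energy E(x) = -log f(x) (as written it grows with |x|, so it cannot itself be a normalizable log-density); all numerical settings below are illustrative choices only:

a = 2; epsilon = 0.1; Tau = 20; L = 5000;     % illustrative settings (a^2 > 2 makes the target bimodal)
energy = @(x) x.^2 + a^2 - log(cosh(a*x));    % potential energy E(x) = -log f(x), up to a constant
gradE  = @(x) 2*x - a*tanh(a*x);              % its gradient dE/dx
x = 0.5;                                      % starting point

Running the slide-6 loop with these definitions and recording x after each iteration should give draws concentrated around the two modes near x ≈ ±a/2, since this f is proportional to exp(-x^2) cosh(ax), an equal mixture of two Gaussians.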

8 Project: Use hybrid MC to sample from a multimodal multivariate density. Does it improve the simulation?

9 Monte Carlo Optimization: Feedback, random updates, and maximization. Can Monte Carlo help us search for the optimum value of a function? We have already talked about simulated annealing. There are other methods as well.

10 Random Updates to Get to the Optimum: Suppose we return to the problem of finding modes. Let ζ denote a uniform random variable on the unit sphere; the step sizes α_t and β_t are determined by numerical-analytic considerations (see Duflo, 1998). The random direction keeps the search from getting stuck the way a deterministic gradient ascent can.
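A standard finite-difference form of such a random-direction update (a hedged reconstruction, with h the objective being maximized and α_t, β_t decreasing step sizes) is θ_{t+1} = θ_t + (α_t / 2β_t) [h(θ_t + β_t ζ_t) − h(θ_t − β_t ζ_t)] ζ_t.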

11 Optimization of a Function Depending on the Data: Minimize the (two-way) KLD between a density q(x) and a Gaussian mixture f = ∑ α_i φ(x − θ_i) using samples. The two-way KLD is given below. We can minimize it by first sampling X_1,…,X_n from q, then sampling Y_1,…,Y_n from s_0(x) (assuming it contains the support of the f's), and minimizing the resulting Monte Carlo estimate.
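Presumably the objective is the symmetrized divergence KL(q‖f) + KL(f‖q) = ∫ q(x) log[q(x)/f(x)] dx + ∫ f(x) log[f(x)/q(x)] dx, estimated by (1/n) ∑_i log[q(X_i)/f(X_i)] + (1/n) ∑_i [f(Y_i)/s_0(Y_i)] log[f(Y_i)/q(Y_i)], the second term being importance-sampled from s_0 as described.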

12 Example: (two-way) KLD. Monte Carlo rules dictate that we cannot sample from a distribution which depends on the parameters we want to optimize. Hence we importance-sample the second KLD term using s_0. We also employ an EM-type step involving latent variables Z.
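A minimal Matlab sketch of this importance-sampled objective, with a stand-in density for q, an illustrative proposal s_0, and a direct search over the component means in place of the EM-type step (every name and setting below is an illustrative assumption):

npdf = @(z,m,s) exp(-(z-m).^2./(2*s.^2))./(s*sqrt(2*pi));   % normal density (avoids toolbox dependence)
n    = 5000;
q    = @(x) 0.5*npdf(x,-2,1) + 0.5*npdf(x,3,1);             % stand-in for the target density q
s0   = @(x) npdf(x,0,5);                                    % heavy proposal covering the mixture
X    = [-2 + randn(n/2,1); 3 + randn(n/2,1)];               % X_i ~ q (even split, for simplicity)
Y    = 5*randn(n,1);                                        % Y_i ~ s0
f    = @(x,alpha,theta) alpha(1)*npdf(x,theta(1),1) + alpha(2)*npdf(x,theta(2),1);
kld2 = @(alpha,theta) mean(log(q(X)./f(X,alpha,theta))) ...                     % KL(q||f) term
     + mean((f(Y,alpha,theta)./s0(Y)).*log(f(Y,alpha,theta)./q(Y)));            % KL(f||q) term, IS from s0
theta_hat = fminsearch(@(theta) kld2([0.5 0.5], theta), [-1 1])   % fit the two component means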

13 Prior Research: We (Dr. Latecki, Dr. Lakaemper, and I) minimized the one-way KLD between a nonparametric density q and a Gaussian mixture (paper pending). But note that for mixture models which put large weight on places where the nonparametric density is not well supported, minimizing the one-way KLD may not give the best possible result.

14 Project: Use this formulation to minimize the KLD between q (e.g., a nonparametric density based on a data set) and a Gaussian mixture.

15 General Theorem in Monte Carlo Optimization: One way of finding an optimal value of a function f(θ), defined on a closed bounded set, is as follows. Define a distribution h(θ), given below, depending on a parameter λ which we let tend to infinity. If we then simulate θ_1,…,θ_n ~ h(θ), then:
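Presumably h here is the Gibbs-type distribution h_λ(θ) ∝ exp{λ f(θ)}; as λ → ∞ this distribution concentrates on the set of maximizers of f, so max_{1≤i≤n} f(θ_i) (and the θ_i achieving it) approaches the maximum of f.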

16 Monte Carlo Optimization: Observe (X_1,…,X_n | θ) ~ L(X|θ). Simulate θ_1,…,θ_n from the prior distribution π(θ). Define the posterior (up to a constant of proportionality) by l(θ|X). It follows that the estimator below converges to the MLE; the proof uses a Laplace approximation (see Robert (1993)).
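Presumably the estimator is the one built from the prior draws weighted by the tempered likelihood, δ_λ(X) = ∑_i θ_i L(X|θ_i)^λ / ∑_i L(X|θ_i)^λ, which approaches the maximum-likelihood estimate as n and λ grow (the prior-feedback construction of Robert (1993)).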

17 Exponential Family Example: Let X ~ exp{λθx − λψ(θ)}, and θ ~ π.
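A hedged Matlab sketch of this construction for the simplest such family, the normal-mean model with ψ(θ) = θ^2/2 (so the MLE is the sample mean); the prior, λ, and sample sizes below are illustrative choices only:

X      = 1.3 + randn(20,1);          % data from N(theta, 1) with true theta = 1.3; the MLE is mean(X)
lambda = 10;                         % tempering power; larger lambda needs many more prior draws
m      = 2e5;
theta  = 10*randn(m,1);              % theta_i ~ pi, here a vague N(0, 10^2) prior
loglik = -0.5*sum((X' - theta).^2, 2);           % log L(X|theta_i) up to a constant (implicit expansion, m-by-20)
w      = exp(lambda*(loglik - max(loglik)));     % tempered, numerically stabilized weights
theta_hat = sum(w.*theta)/sum(w);    % weighted posterior mean, which should sit close to mean(X)
[theta_hat, mean(X)]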

18 Possible Example: It is known that maximum likelihood estimators for the parameters of a k-component mixture model are hard to compute. If, instead of maximizing the likelihood directly, we treat the mixture as a Bayesian model together with a scale parameter λ and an indifference prior, we can (typically) use Gibbs sampling to sample from this model. Letting λ tend to infinity lets us construct MLEs.
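One plausible reading: sample from the tempered posterior π_λ(α, θ | X) ∝ [∏_i ∑_j α_j φ(x_i − θ_j)]^λ π(α, θ), which for integer λ is equivalent to replicating each observation λ times, using the usual data-augmentation Gibbs sampler, and track the posterior means as λ is increased.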

19 Project: Implement an algorithm to find the MLE for a simple 3-component mixture model (use Robert (1993)).

