
1 The horseshoe estimator for sparse signals CARLOS M. CARVALHO NICHOLAS G. POLSON JAMES G. SCOTT Biometrika (2010) Presented by Eric Wang 10/14/2010

2 Overview This paper proposes the horseshoe estimator, which is analytically tractable and more robust and adaptive to different sparsity patterns than existing approaches. Two theorems are proved, characterizing the proposed estimator's tail robustness and demonstrating a super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. The proposed estimator's performance is demonstrated on both real and simulated data, and the authors show that its answers correspond quite closely to those obtained by Bayesian model averaging.

3 The horseshoe estimator Consider a p-dimensional vector y | β ~ N(β, σ²I) where β is sparse. The authors propose the following model for estimation and prediction: β_i | λ_i ~ N(0, λ_i²τ²), λ_i ~ C⁺(0, 1), where C⁺(0, a) denotes a half-Cauchy distribution with location 0 and scale parameter a. The name horseshoe prior arises from the observation that, for fixed values σ² = τ² = 1, the shrinkage coefficient κ_i = 1/(1 + λ_i²), which measures the amount of shrinkage toward zero a posteriori, has a horseshoe-shaped Beta(1/2, 1/2) prior.
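To make the hierarchy concrete, here is a minimal sketch (an illustration, not code from the paper or the slides) that draws from the horseshoe prior with σ = τ = 1; the helper name draw_horseshoe is ours:

```python
# Sketch: sampling from the horseshoe prior hierarchy (sigma = tau = 1).
import numpy as np

rng = np.random.default_rng(0)

def draw_horseshoe(p, tau=1.0):
    """Draw a p-vector: lambda_i ~ C+(0, 1), beta_i | lambda_i ~ N(0, (lambda_i * tau)^2)."""
    lam = np.abs(rng.standard_cauchy(p))  # half-Cauchy local scales
    return rng.normal(0.0, lam * tau)     # conditionally Gaussian draws

beta = draw_horseshoe(10_000)
print("share of draws with |beta| < 0.1:", np.mean(np.abs(beta) < 0.1))
print("largest |beta|:", np.abs(beta).max())  # heavy tails: occasional huge draws
```

The half-Cauchy local scales put substantial mass both near zero and far out in the tails, so the draws mix many near-zero values with occasional very large ones.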

4 The meaning of κ_i is as follows: κ_i ≈ 0 yields virtually no shrinkage and describes signals, while κ_i ≈ 1 yields near-total shrinkage and (hopefully) describes noise. At right is the prior on the shrinkage coefficient, κ_i ~ Beta(1/2, 1/2), which is unbounded at both 0 and 1 and gives the horseshoe its name.
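A quick Monte Carlo check of this claim, again assuming λ_i ~ C⁺(0, 1) and τ = 1: since κ_i = 1/(1 + λ_i²), the sampled shrinkage coefficients should match Beta(1/2, 1/2) quantiles.

```python
# Sketch: verify that kappa = 1/(1 + lambda^2) is Beta(1/2, 1/2) distributed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lam = np.abs(rng.standard_cauchy(200_000))  # lambda_i ~ C+(0, 1)
kappa = 1.0 / (1.0 + lam**2)                # shrinkage coefficients

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(kappa, qs))               # empirical quantiles
print(stats.beta(0.5, 0.5).ppf(qs))         # Beta(1/2, 1/2) quantiles: should match
```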

5 The horseshoe density function The horseshoe prior density π(β) lacks an analytic form, but very tight bounds are available. Theorem 1. The univariate horseshoe density π(β) satisfies the following: (a) lim_{β→0} π(β) = ∞; (b) for β ≠ 0, (K/2) log(1 + 4/β²) < π(β) < K log(1 + 2/β²), where K = 1/√(2π³). Alternatively, it is possible to integrate over τ first, yielding an exchangeable joint prior on (β_1, ..., β_p), though the dependence among the β_i causes more issues; therefore the authors do not take this approach.
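The Theorem 1 bounds are easy to check numerically; this sketch computes π(β) by quadrature over the half-Cauchy mixing density and compares it against both bounds.

```python
# Sketch: numerical check of the Theorem 1 bounds on the horseshoe density.
import numpy as np
from scipy import integrate, stats

K = 1.0 / np.sqrt(2.0 * np.pi**3)

def horseshoe_pdf(beta):
    # pi(beta) = integral over lam of N(beta | 0, lam^2) * (2/pi) / (1 + lam^2)
    f = lambda lam: stats.norm.pdf(beta, scale=lam) * (2.0 / np.pi) / (1.0 + lam**2)
    return integrate.quad(f, 0.0, np.inf, limit=200)[0]

for b in [0.1, 0.5, 1.0, 2.0, 5.0]:
    lower = (K / 2.0) * np.log1p(4.0 / b**2)
    upper = K * np.log1p(2.0 / b**2)
    print(f"beta={b}: {lower:.5f} < {horseshoe_pdf(b):.5f} < {upper:.5f}")
```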

6 Horseshoe estimator for sparse signals

7 Review of similar methods Scott & Berger (2006) studied the discrete mixture β_i ~ w·g(β_i) + (1 − w)·δ₀, where w is the prior inclusion probability and δ₀ is a point mass at zero. Tipping (2001) studied the Student-t prior, which is defined by an inverse-gamma mixing density, λ_i² ~ IG(a, b). The double-exponential prior (Bayesian lasso) has an exponential mixing density on λ_i².

8 Review of similar methods The normal-Jeffreys prior is an improper prior induced by placing Jeffreys' prior π(λ_i²) ∝ 1/λ_i² on each variance term, leading to π(β_i) ∝ 1/|β_i|. This choice is commonly used in the absence of a global scale parameter. The Strawderman-Berger prior does not have an analytic form, but arises from assuming κ_i ~ Beta(1/2, 1), with κ_i = 1/(1 + λ_i²). The normal-exponential-gamma family of priors generalizes the lasso specification by using a gamma distribution to mix over the exponential rate parameter, leading to κ_i ~ Beta(1, c).
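One way to see how these mixing densities differ is to sample the shrinkage coefficients κ_i = 1/(1 + λ_i²) that each one implies. The sketch below uses illustrative parameter choices (Cauchy tails for the Student-t, a unit-rate double exponential), not settings taken from the paper; the improper normal-Jeffreys prior cannot be sampled and is omitted.

```python
# Sketch: shrinkage profiles kappa = 1/(1 + lambda^2) implied by each mixing density.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

lam2 = {
    "Student-t, IG(1/2, 1/2) (i.e. Cauchy)": 0.5 / rng.gamma(0.5, size=n),
    "lasso, Exp(rate 1/2) on lambda^2":      rng.exponential(2.0, size=n),
    "horseshoe, C+(0, 1) on lambda":         rng.standard_cauchy(n) ** 2,
}
for name, l2 in lam2.items():
    kappa = 1.0 / (1.0 + l2)
    # Mass near 1 = strong shrinkage of noise; mass near 0 = tail robustness.
    print(f"{name:40s} P(k<0.05)={np.mean(kappa < 0.05):.3f}  "
          f"P(k>0.95)={np.mean(kappa > 0.95):.3f}")
```

Only the horseshoe places heavy mass at both ends of [0, 1], which previews the comparison on the next slide.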

9 Review of similar methods [Figure: left panel, shrinkage of noise; right panel, tail robustness of prior.]

10 Robustness to large signals Theorem 2. Let p(y | β) be the likelihood, and suppose that π(β) is a zero-mean scale mixture of normals, β | λ² ~ N(0, λ²), with λ² having a proper prior π(λ²). Assume further that the likelihood and π(β) are such that the marginal density m(y) = ∫ p(y | β) π(β) dβ is finite for all y. The theorem defines three pseudo-densities, which may be improper, and expresses the posterior mean E(β | y) in terms of them.

11 Robustness to large signals If p(y | β) is a Gaussian likelihood, then the result of Theorem 2 reduces to E(β | y) = y + (d/dy) log m(y). A key consequence is that if the prior on β is chosen so that the derivative of the log prior, (d/dβ) log π(β), is bounded, then the derivative of the log predictive density, (d/dy) log m(y), is bounded and tends to 0 for large |y|. This happens for heavy-tailed priors, including the proposed horseshoe prior, and yields E(β | y) ≈ y for large |y|: large signals are left essentially unshrunk.
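This is straightforward to see numerically. The sketch below computes m(y) by quadrature and the score (d/dy) log m(y) by central differences, comparing the horseshoe with a unit-rate double-exponential prior (our illustrative choice):

```python
# Sketch: the score d/dy log m(y) vanishes for the horseshoe but not the lasso.
import numpy as np
from scipy import integrate, stats

def m_horseshoe(y):
    # beta | lam ~ N(0, lam^2) implies y | lam ~ N(0, 1 + lam^2); mix over lam.
    f = lambda lam: stats.norm.pdf(y, scale=np.sqrt(1 + lam**2)) * (2 / np.pi) / (1 + lam**2)
    return integrate.quad(f, 0, np.inf, limit=200)[0]

def m_lasso(y):
    # Double-exponential prior with unit rate: pi(beta) = 0.5 * exp(-|beta|).
    f = lambda b: stats.norm.pdf(y - b) * 0.5 * np.exp(-np.abs(b))
    return integrate.quad(f, -np.inf, np.inf, limit=200)[0]

def score(m, y, h=1e-3):
    return (np.log(m(y + h)) - np.log(m(y - h))) / (2 * h)

for y in [1.0, 3.0, 6.0, 10.0]:
    print(f"y={y:5.1f}  horseshoe score={score(m_horseshoe, y):+.4f}  "
          f"lasso score={score(m_lasso, y):+.4f}")
# The horseshoe score decays toward 0 (E(beta | y) approaches y), while the
# lasso score approaches -1, a shrinkage bias that never vanishes.
```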

12 The horseshoe score function Theorem 3. Suppose y ~ N(β, 1). Let m(y) denote the predictive density under the horseshoe prior for a known scale parameter τ, i.e. m(y) = ∫ p(y | β) π(β | τ) dβ. Then the score (d/dy) log m(y) is bounded by a constant that depends upon τ, and it tends to 0 as |y| → ∞. Corollary: although the horseshoe prior has no analytic form, it does lead to a closed-form expression for the posterior mean E(β | y) in terms of Φ₁, a degenerate hypergeometric function of two variables.

13 Estimating τ When the dimensionality p is large, the conditional posterior distribution of τ is approximately available in closed form, which yields an approximate distribution for the global shrinkage level. If most observations are shrunk toward 0, then τ will be small with high probability, so the global scale adapts to the overall sparsity of the signal.

14 Comparison to double exponential

15 Super-efficient convergence Theorem 4. Suppose the true sampling model is y ~ N(β₀, 1). Then: (1) For the predictive density under the horseshoe prior, the optimal rate of convergence of the Kullback-Leibler risk when β₀ = 0 is super-efficient, improving on the standard rate by a factor involving a constant b. When β₀ ≠ 0, the optimal rate is the standard one. (2) Suppose π is any other prior density that is continuous, bounded above, and strictly positive on a neighborhood of the true value. For the predictive density under π, the optimal rate of convergence, regardless of β₀, is the standard rate.

16 Example: simulated data Data generated from a sparse normal-means model y ~ N(β, I), with most entries of β equal to zero.
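A minimal version of such a simulation, with dimensions and signal sizes that are illustrative assumptions rather than the slide's own configuration:

```python
# Sketch: sparse normal-means data in the spirit of the paper's simulations.
import numpy as np

rng = np.random.default_rng(3)
p, n_signals = 1000, 10
beta = np.zeros(p)
beta[:n_signals] = rng.normal(0.0, 5.0, size=n_signals)  # a few large signals
y = rng.normal(beta, 1.0)                                # y_i ~ N(beta_i, 1)
```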

17 Example: Vanguard mutual-fund data Here, the authors show how the horseshoe can provide a regularized estimate of a large covariance matrix whose inverse may be sparse. The Vanguard mutual-fund dataset contains n = 86 weekly returns for p = 59 funds. Suppose the observation matrix is Y (n × p), with each p-dimensional row drawn from a zero-mean Gaussian with covariance matrix Σ. We will model the Cholesky decomposition of Σ⁻¹.

18 Example: Vanguard mutual-fund data The goal is to estimate the ensemble of regression models in the implied triangular system, regressing y_j, the jth column of Y, on the preceding columns y_1, ..., y_{j−1}. The regression coefficients are assumed to have horseshoe priors, and posterior means were computed using MCMC.
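A sketch of the triangular system appears below. For brevity it substitutes ridge shrinkage for the horseshoe-prior MCMC the authors actually ran, so it illustrates only the structure of the computation, not their estimator:

```python
# Sketch: triangular system of regressions for a Cholesky-based covariance model.
import numpy as np

def triangular_fit(Y, alpha=1.0):
    """Y: (n, p) data matrix. Returns a lower-triangular B whose row j holds
    the coefficients from regressing column j of Y on columns 0..j-1."""
    n, p = Y.shape
    B = np.zeros((p, p))
    for j in range(1, p):
        X = Y[:, :j]
        # Ridge estimate (X'X + alpha I)^{-1} X'y_j as a shrinkage stand-in.
        B[j, :j] = np.linalg.solve(X.T @ X + alpha * np.eye(j), X.T @ Y[:, j])
    return B

# Synthetic data of the same shape as the Vanguard example (n = 86, p = 59):
rng = np.random.default_rng(4)
B = triangular_fit(rng.normal(size=(86, 59)))
```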

19 Conclusions This paper introduces the horseshoe prior as a good default prior for sparse problems. Empirically, the model performs similarly to Bayesian model averaging, the current standard. The model exhibits strong global shrinkage together with robust local adaptation to individual signals.

