RSC 2003: Priors. Trevor Sweeting, Department of Statistical Science, University College London.

1 RSC Priors Trevor Sweeting Department of Statistical Science University College London

2 RSC Structure of talk
- Bayesian inference: the basics
- Specification of the prior
- Examples
- Subjective priors
- Nonsubjective priors
- Examples
- Methods of prior construction
- Coverage probability bias
- Relative entropy loss
- Wrap-up

3 RSC Bayesian inference: the basics
- X: the experimental or observational data to be observed
- Y: the future observations to be predicted
- Data model
- (Possibly improper) prior distribution
- The posterior density of θ: posterior density ∝ prior density × likelihood function
- From the posterior: posterior probabilities, moments, marginal densities, expected losses, predictive densities...
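The proportionality on this slide can be written out in full. The notation below is an assumption, since the transcript does not preserve the slide's own symbols: θ is the parameter, f(x | θ) the data model and π(θ) the prior density.

```latex
\pi(\theta \mid x)
  \;=\; \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta')\, f(x \mid \theta')\, d\theta'}
  \;\propto\; \pi(\theta)\, f(x \mid \theta)
```

With an improper prior the expression is only meaningful when the integral in the denominator is finite.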

4 RSC Bayesian inference
- The predictive density of Y given X is the data model for Y averaged over the posterior density of the parameter
- Where are we?...
  - Philosophical basis
  - Practical implementation
  - Prior construction...
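The formula for the predictive density is missing from the transcript. A standard way of writing it, with assumed notation (θ for the parameter, π(θ | x) for the posterior density and f(y | x, θ) for the model density of the future observations), is:

```latex
p(y \mid x) \;=\; \int f(y \mid x, \theta)\, \pi(\theta \mid x)\, d\theta
```

When Y is conditionally independent of X given θ, the first factor reduces to f(y | θ).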

5 RSC Specification of the prior
- Approaches vary
  - from fully Bayesian analyses based on fully elicited subjective priors
  - to fully frequentist analyses based on nonsubjective (‘objective’) priors
- Table: prior type against the scale from Fully Bayesian to Fully Frequentist (entries listed in slide order)
  - Subjective: Elicited prior
  - Mixed: Performance; Penalty fn
  - Nonsubjective: Default prior; Dual verification; Performance

6 RSC Examples
- Four examples
- All taken from Applied Statistics, 52 (2003)
  - Competing risks
  - Image analysis
  - Diagnostic testing
  - Geostatistical modelling

7 RSC Competing risks (Basu and Sen)
- System failure data; cause of failure not identified
- n systems, R competing risks
- Datum for each system is (T, S, C)
  - T is the failure time, S is the set of possible causes of failure, C is a censoring indicator
- Parameters in the model are of location and scale type
- Use (i) informative conjugate priors
  - Source: historical data
- or (ii) ‘noninformative’ priors
  - Chosen to have a ‘minimal effect’ on the analysis
- Implementation: via Gibbs sampling

8 RSC Image analysis (Dryden, Scarr and Taylor)
- Segmentation of weed and crop textures
- Automatic identification of weeds in images of row crops
- Parameters: k, the number of texture components; C, the texture labels; and the parameters associated with the distribution of pixel intensities
- Highly structured prior for all of these parameters
  - Markov random field for C, truncated conjugate priors for the pixel-intensity parameters
  - Hyperparameters set in context, e.g. to ‘encourage relatively few textures’
- Implementation: via Markov chain Monte Carlo

9 RSC Diagnostic testing (Georgiadis, Johnson, Gardner and Singh)
- Multiple-test screening data models are unidentifiable
- A Bayesian analysis therefore depends critically on prior information
- Parameters consist of various (at least 8) joint sensitivity and specificity probabilities
- Independent beta priors; two informative, the rest noninformative
- Investigate coverage performance and sensitivities for various choices of prior
- Implementation: via Gibbs sampling

10 RSC Geostatistical modelling (Kammann and Wand)
- Geostatistical mapping to study geographical variability of reproductive health outcomes (disease mapping)
- Geoadditive models
- Universal kriging model involves a stationary zero-mean stochastic process over sites
  - leads to ‘borrowing strength’
- Non-Bayesian analysis, but the model could be formulated in a Bayesian way, with the mean responses at the given sites having a multivariate normal prior
- Implementation: residual ML and splines

11 RSC Table for examples
- Columns: Fully Bayesian to Fully Frequentist
- Rows: Subjective to Nonsubjective
- Examples placed in the table: Competing risks, Image analysis, Diagnostic testing, Geostatistical modelling

12 RSC Subjective priors
- To some extent, all the previous examples included subjective prior specification
- Methods of elicitation
  - Industrial and medical contexts
- Scientific reporting
  - Range of prior specifications; conduct sensitivity analyses

13 RSC Subjective priors
- Psychological research should be taken into account when devising methods for prior elicitation
  - Construction of questions
  - Anchors
  - Probability assessment by frequency
  - Availability; inverse expertise effect
  - Priors are often ‘too narrow’
- Relevant fields: Experimental Psychology, Behavioural Decision Making, Management Science, Cognitive Psychology

14 RSC Nonsubjective priors
- Nonsubjective (‘objective’) priors: why?
- Sensible default priors for non-experts (and experts!)
  - Recognise that the basis is often weak
  - Possible nasty surprises!
- Reference priors for regulatory bodies
  - Clinical trials, industrial standards, official statistics
- Safe default priors for high-dimensional problems
  - Priors are more difficult to specify and may have a more severe effect

15 RSC Nonsubjective priors
Some general problems
- Improper priors
  - Improper posteriors (e.g. hierarchical models)
  - Marginalisation and sampling theory paradoxes
  - Dutch books
- Inconsistency
  - Posterior doesn’t concentrate around the true value asymptotically
- Inadmissibility
  - of Bayes decision rules/estimators

16 RSC Nonsubjective priors
- Proper ‘diffuse’ priors
  - Near-impropriety of posterior
  - Unintended large impact on posterior (example to follow...)
  - Arbitrary choice of hyperparameters
  - Non-objectivity
  - Lack of invariance
- Egg on face...
- Two examples...

17 RSC WinBUGS - the Movie!  Data: 529.0, 530.0, 532.0, 533.1, 533.4, 533.6, 533.7, 534.1, 534.8,  Prior parameters: a = b = c =  Relatively diffuse prior  Results... (  is the precision)

18 RSC WinBUGS - the Movie! Just another few iterations to make sure...

19 RSC WinBUGS - the Movie! Oops!

20 RSC WinBUGS - the Movie! Effect of choice of c (the prior precision of  )  c = WinBUGS eventually gets the ‘right’ answer  but presumably not the answer we wanted!  The ‘noninformative’ prior dominates the likelihood.

21 RSC WinBUGS - the Movie!  c = WinBUGS gives the ‘right’ answer with the likelihood dominating  However, it's the ‘wrong’ answer as the true marginal posterior of  is still dominated by the prior

22 RSC WinBUGS - the Movie!  c = WinBUGS again gives the ‘right’ answer with the likelihood dominating  But it's still the ‘wrong’ answer  The true marginal posterior distribution of  is bimodal

23 RSC WinBUGS - the Movie!  c = WinBUGS gives the right answer ... and presumably the one we wanted! Care needed in the choice of prior parameters in diffuse but proper priors
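The model and prior settings behind these slides are not preserved in the transcript, so the following is only a rough sketch of the kind of sensitivity being described, not a reconstruction of the WinBUGS run. It assumes normal data with unknown mean mu and precision tau, a Normal(prior_mean, 1/c) prior on mu and a Gamma(a, b) prior on tau. The values a = b = 0.001 and prior_mean = 0 are hypothetical, and the data are the nine values that survive in the transcript (the original list appears truncated). Integrating tau out analytically gives the marginal posterior of mu up to a constant, which is evaluated on a grid.

```python
import math

# The nine data values that survive in the transcript (original list truncated).
DATA = [529.0, 530.0, 532.0, 533.1, 533.4, 533.6, 533.7, 534.1, 534.8]
N = len(DATA)
XBAR = sum(DATA) / N
SS = sum((x - XBAR) ** 2 for x in DATA)  # sum of squares about the sample mean


def log_marginal_posterior(mu, c, a=0.001, b=0.001, prior_mean=0.0):
    """Log marginal posterior of mu, up to a constant, with tau integrated out.

    Assumed model: x_i ~ N(mu, 1/tau), mu ~ N(prior_mean, 1/c),
    tau ~ Gamma(shape=a, rate=b). All hyperparameter values are hypothetical.
    """
    s = SS + N * (mu - XBAR) ** 2  # sum of squares about mu
    return -0.5 * c * (mu - prior_mean) ** 2 - (a + N / 2) * math.log(b + s / 2)


def mass_near_data(c, half_width=10.0, lo=-2000.0, hi=2000.0, step=0.25):
    """Fraction of posterior mass for mu within half_width of the sample mean."""
    npts = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(npts)]
    logp = [log_marginal_posterior(mu, c) for mu in grid]
    top = max(logp)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in logp]
    near = sum(w for mu, w in zip(grid, weights) if abs(mu - XBAR) <= half_width)
    return near / sum(weights)


if __name__ == "__main__":
    for c in (1e-2, 1e-3, 1e-4, 1e-6):
        print(f"c = {c:g}: mass within 10 of the sample mean = {mass_near_data(c):.3f}")
```

Under these assumptions the likelihood for mu has polynomial tails while the normal prior has exponential tails, so a prior that looks diffuse can still dominate when the data lie many prior standard deviations from the prior mean. The mass shifts to the data only once c is small enough.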

24 RSC Normal regression  Conjugate prior:  Limit as is   Jeffreys' prior  Here gives exact matching in both posterior and predictive distributions (  is the precision)

25 RSC Normal regression  Data: n = 25, R = residual sum of squares =

26 RSC Normal regression  Prediction. Let Y be a future observation and let denote the ‘usual’ predictive pivotal quantity. Then Prediction less sensitive to prior than estimation

27 RSC Methods of prior construction
- Limits of proper priors
- Uniform priors/choice of scale
- Data-translated likelihood
- Constant asymptotic precision
- Canonical parameterisation
- Coverage probability bias
- Decision-theoretic

28 RSC Coverage probability bias
- Sometimes investigated in papers via simulation (cf. the diagnostic testing example)
Parametric CPB
- When do Bayesian credible intervals have the correct frequentist coverage?
- In regular one-parameter problems, ‘matching’ is asymptotically achieved by Jeffreys' prior (Welch and Peers, 1963)
- In multiparameter families, matching cannot in general be achieved for all marginals using the same prior
- Usually contravenes the likelihood principle (see Sweeting, 2001 for a discussion)
- Avoid infinite confidence sets! (e.g. ratios of parameters)
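As a small illustration of coverage probability bias, the sketch below uses a toy model that is not taken from the talk: a single observation X from an exponential distribution with rate lam, and priors proportional to lam**(c - 1) for integer c >= 0. Here c = 0 is Jeffreys' prior (1/lam) and c = 1 is the flat prior. The posterior is Gamma(c + 1, rate x), and because lam * X is a standard exponential variable, the frequentist coverage of the upper credible limit has a closed form. All function names are hypothetical.

```python
import math


def gamma_cdf_int_shape(t, k):
    """CDF at t of a Gamma(shape=k, rate=1) variable, for integer k >= 1."""
    return 1.0 - math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(k))


def gamma_quantile_int_shape(alpha, k):
    """alpha-quantile of Gamma(shape=k, rate=1), found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int_shape(mid, k) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coverage_upper_limit(alpha, c):
    """Frequentist coverage of the level-alpha upper credible limit for lam.

    Toy model: X ~ Exponential(rate=lam), prior proportional to lam**(c - 1).
    The posterior is Gamma(c + 1, rate=x), so the limit is t / x with t the
    alpha-quantile of Gamma(c + 1, 1). Coverage is P(lam * X <= t), and
    lam * X ~ Exponential(1), so the result does not depend on lam.
    """
    t = gamma_quantile_int_shape(alpha, c + 1)
    return 1.0 - math.exp(-t)


if __name__ == "__main__":
    alpha = 0.95
    for c, label in ((0, "Jeffreys (1/lam)"), (1, "flat")):
        cov = coverage_upper_limit(alpha, c)
        print(f"{label}: coverage = {cov:.4f}, CPB = {cov - alpha:+.4f}")
```

In this scale family the matching under Jeffreys' prior happens to be exact rather than merely asymptotic, while the flat prior gives limits that over-cover.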

29 RSC Coverage probability bias
Predictive CPB
- When do Bayesian predictive intervals have the correct frequentist coverage?
- In regular one-parameter problems, there exists a unique prior for which there's no asymptotic CPB...
- ... but in general this depends on the probability level α!
- If there does exist a matching prior that is free from α then it is Jeffreys' prior (Datta, Mukerjee, Ghosh and Sweeting, 2000)
- In the multiparameter case, if there exists a matching prior then it is usually not Jeffreys' prior

30 RSC Relative entropy loss
- The ‘reference prior’ (Bernardo, 1979) maximises the Shannon mutual information between θ and X
- Maximises the ‘distance’ between the prior and posterior; minimal effect of the prior
- Also arises as an asymptotically minimax solution under relative entropy loss (Clarke and Barron, 1994; Barron, 1998)
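For reference, the Shannon mutual information mentioned on this slide can be written as the expected Kullback-Leibler divergence from the prior to the posterior. The notation is assumed rather than taken from the slide: θ is the parameter, π(θ) the prior, f(x | θ) the data model and π(θ | x) the posterior.

```latex
I(\theta; X)
  \;=\; \int \pi(\theta) \int f(x \mid \theta)\,
        \log \frac{\pi(\theta \mid x)}{\pi(\theta)} \, dx \, d\theta
```

This form makes the second bullet concrete: a prior that maximises I(θ; X) is the one from which the data are expected to move the posterior furthest.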

31 RSC Relative entropy loss
- Define the prior-predictive regret
- Minimax/reference prior solution for the full parameter is usually Jeffreys' prior
- Bernardo argues that when nuisance parameters are present the reference prior should depend on which parameter(s) are considered to be of primary interest

32 RSC Relative entropy loss
A predictive relative entropy approach
- Geisser (1979) suggested a predictive information criterion introduced by Aitchison (1975)
- Standard argument for using log q(Y) as an operational/default utility function for q as a predictive density for a future observation Y (cf. Good, 1968)
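With log q(Y) as the utility, the loss from using q instead of the true density of Y is a relative entropy. The talk's own definitions are not preserved in the transcript, so the expression below is the standard form under assumed notation, with f(y | θ) the density from which Y arises:

```latex
\int f(y \mid \theta)\, \log \frac{f(y \mid \theta)}{q(y)}\, dy
  \;=\; E_\theta\!\left[\log f(Y \mid \theta)\right] - E_\theta\!\left[\log q(Y)\right]
  \;\ge\; 0
```

The quantity is zero only when q agrees with f(· | θ) almost everywhere, which is why it serves as a regret for a predictive density.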

33 RSC Relative entropy loss  is the expected regret under the loss function associated with using the predictive density when Y arises from  Appropriate object to study for constructing objective prior distributions when we are interested in predictive performance of  under repeated use or under alternative subjective priors   Define

34 RSC Relative entropy loss  Now define the predictive relative entropy loss (PREL)  where J is Jeffreys’ prior  Studying the behaviour of the regret over  in sets of constant 'predictive information' is equivalent to studying the behaviour of the PREL

35 RSC Relative entropy loss

36 RSC Relative entropy loss  Under suitable regularity conditions we get  Although the defined loss functions cover an infinite variety of possibilities for (a) amount of data to be observed and (b) predictions to be made, they are all approximately equivalent to provided that a sufficient amount of data is to be observed.  Call the (asymptotic) predictive loss

37 RSC Relative entropy loss  More generally define  represents the asymptotically worst-case loss  Investigate its behaviour  Let  The prior is minimax if

38 RSC Relative entropy loss  Example 1.  Consider the class of improper priors  These all deliver constant risk, with  All the priors with c nonzero are therefore inadmissible  Jeffreys' prior (c = 0) is minimax

39 RSC Relative entropy loss  Example 2.  Consider the class of improper priors  These all deliver constant risk, with  L attains its minimum value when a = 1, which corresponds to  Jeffreys' independence prior  The minimum value -½ < 0 so that Jeffreys' prior is inadmissible

40 RSC Relative entropy loss  Example 3.  Consider again the class of improper priors  These all deliver constant risk, with  L attains its minimum value when a = 1, which again corresponds to Jeffreys' independence prior  The drop in predictive loss increases as the square of the number q of regressors in the model

41 RSC Relative entropy loss
- The above predictive minimax priors also give rise to minimum predictive coverage probability bias (Datta, Mukerjee, Ghosh and Sweeting, 2000)
- Final note: an inappropriately elicited subjective prior may lead to very high predictive risk!

42 RSC Wrap-up
- We have reviewed some common approaches to prior construction, from full elicitation to using default recipes
- Need to be aware of dangers, whatever the approach
- As model complexity increases it becomes more difficult to make sensible prior assignments. At the same time, the effect of the prior specification can become more pronounced
- Important to have a sound methodology for the construction of priors in the multiparameter case
- Data-dependent priors may be justifiable (e.g. Box-Cox transformation model)

43 RSC Wrap-up
- More extensive analysis of the predictive risk approach needed
  - Developing general methods of finding exact and approximate solutions for practical implementation
  - Investigating connections with predictive coverage probability bias
  - Analysing dependent and non-regular problems
  - Investigating problems involving mixed subjective/nonsubjective priors
- Priors for model choice or model averaging...
- ... another talk!

44 RSC Wrap-up  And finally Have a great conference!
