# Topic Discussion: Bayesian Analysis of Informative Hypotheses Quantitative Methods Forum 20 January 2014.


2 Outline Articles: 1. A gentle introduction to Bayesian analysis: Applications to developmental research 2. Moving beyond traditional null hypothesis testing: Evaluating expectations directly 3. A prior predictive loss function for the evaluation of inequality constrained hypotheses Note: The author, Rens van de Schoot, was awarded the APA Division 5 dissertation award in 2013.

3 A gentle introduction to Bayesian analysis: Applications to developmental research.

4 Probability Frequentist Paradigm –R. A. Fisher, Jerzy Neyman, Egon Pearson –Long-run frequency Subjective Probability Paradigm –Bayes’ theorem –Probability as the subjective experience of uncertainty

6 Ingredients of Bayesian Statistics Prior distribution –Encompasses background knowledge on parameters tested –Parameters of prior distribution called hyperparameters Likelihood function –Information in the data Posterior Inference –Combination of prior and likelihood via Bayes’ theorem –Reflects updated knowledge, balancing prior and observed information
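As a minimal sketch of how these three ingredients fit together, consider a proportion with a Beta prior and binomial data, where Bayes' theorem (posterior ∝ likelihood × prior) has a closed form. The hyperparameters and data below are hypothetical values chosen for illustration, not taken from the articles.

```python
def update_beta_prior(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical hyperparameters of the prior distribution
prior_alpha, prior_beta = 2.0, 2.0
# Hypothetical data entering through the likelihood
successes, failures = 7, 3

post_alpha, post_beta = update_beta_prior(prior_alpha, prior_beta, successes, failures)

prior_mean = beta_mean(prior_alpha, prior_beta)
sample_proportion = successes / (successes + failures)
post_mean = beta_mean(post_alpha, post_beta)

# The posterior mean lies between the prior mean and the sample proportion.
print(prior_mean, sample_proportion, post_mean)
```

The posterior mean is a compromise between prior and data, which is the "balancing" the slide refers to.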

7 Defining Prior Knowledge Lack of information –Still important to quantify ignorance –Noninformative: Uniform Distribution Akin to a Frequentist analysis Considerable information –Meta-analyses, previous studies Sensitivity analyses may be conducted to quantify effect of different prior specifications Priors reflect knowledge about model parameters before observing data.

8 Effect of Priors Horse and donkey analogy Benefits of priors: –Incorporate findings from previous studies –Smaller Bayesian credible intervals (cf. confidence intervals) Credible intervals are also known as posterior probability intervals (PPI) PPI gives the probability that a certain parameter lies within the interval. The more prior information available, the smaller the credible intervals. When priors are misspecified, posterior results are affected.
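The effect of prior information on interval width can be illustrated with a normal mean and known standard deviation, where the posterior is again available in closed form. This is a sketch under assumed numbers (sample mean, sample size, and the three priors are all hypothetical), not a reproduction of the article's analysis.

```python
from statistics import NormalDist


def posterior_normal(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior mean and sd of a normal mean with known data_sd and a normal prior."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var ** 0.5


def central_interval(mean, sd, level=0.95):
    """Central posterior probability interval of a normal posterior."""
    tail = (1.0 - level) / 2.0
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical data summary
data_mean, data_sd, n = 0.30, 1.0, 25

# Hypothetical priors as (mean, sd)
priors = {
    "vague": (0.0, 100.0),
    "informative": (0.25, 0.1),
    "misspecified": (-1.0, 0.1),
}

results = {}
for name, (m, s) in priors.items():
    post_mean, post_sd = posterior_normal(m, s, data_mean, data_sd, n)
    low, high = central_interval(post_mean, post_sd)
    results[name] = (post_mean, high - low)
    print(name, round(post_mean, 3), round(high - low, 3))
```

With these numbers the informative prior gives a narrower interval than the vague one, while the confident but misspecified prior pulls the posterior mean far from the sample mean.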

10 Empirical Example Theory of dynamic interactionism: Individuals are believed to develop through a dynamic and reciprocal transaction between personality and environment. 3 studies: Neyer & Asendorpf (2001); Sturaro et al. (2010); Asendorpf & van Aken (2003). Note: N&A involved young adults; S and A&vA involved adolescents.

11 Analytic Strategy Prior Specification –Used frequentist estimates from one study to another Assessment of convergence –Gelman-Rubin criterion and other ‘tuning’ variables Cutoff value Minimum number of iterations Start values Examination of trace plots Model fit assessed with posterior predictive checking
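The Gelman-Rubin criterion compares between-chain and within-chain variance; values close to 1 suggest the chains have mixed. The sketch below uses the basic form of the statistic on simulated chains. The chains are hypothetical, and software packages may apply refinements (such as splitting chains) that are not shown here.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])   # W: average within-chain variance
    between = n * variance(chain_means)                    # B: between-chain variance
    pooled = (n - 1) / n * within + between / n            # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(0)

# Four hypothetical chains sampling the same distribution
mixed_chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four hypothetical chains stuck around different values
stuck_chains = [[rng.gauss(center, 1.0) for _ in range(1000)]
                for center in (0.0, 0.0, 3.0, 3.0)]

r_hat_mixed = gelman_rubin(mixed_chains)
r_hat_stuck = gelman_rubin(stuck_chains)
print(r_hat_mixed, r_hat_stuck)
```

In practice the statistic is read alongside trace plots, since a value near 1 does not by itself guarantee convergence.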

12 Results 1

13 Results 2

14 Observations Point estimates do not differ between Frequentist and Bayesian approaches. Credible intervals are smaller than confidence intervals. Using prior knowledge in the analyses led to more certainty about outcomes of the analyses; i.e., more confidence (precision) in conclusions.

15 Theoretical Advantages of Bayesian Approach Interpretation –More intuitive because the focus is on predictive accuracy –Bayesian framework eliminates contradictions in traditional NHST –Offers a more direct expression of uncertainty Updating knowledge –Incorporate prior information into estimates instead of conducting NHST repeatedly. NHST = Null Hypothesis Significance Testing

16 Practical Advantages of Bayesian Approach Smaller sample sizes are required for Bayesian estimation than for Frequentist approaches. With small samples, Bayesian methods produce slowly increasing confidence regarding coefficients, compared to Frequentist approaches. Bayesian methods can handle non-normal parameters better than Frequentist approaches. Protection against overinterpreting unlikely results. Elimination of inadmissible parameters.

17 Limitations Influence of prior specification. Prior distribution specification. –Assumption that every parameter has a distribution. Computational time DIAGNOSTICS?

19 Moving beyond traditional null hypothesis testing: Evaluating expectations directly.

20 What is “wrong” with the traditional $H_0$? Example: Shape of the earth. $H_0$: The shape of the earth is a flat disk. $H_1$: The shape of the earth is not a flat disk. Evidence gathered against $H_0$. Conclusion: The earth is not a flat disk → modification of testable hypotheses.

21 $H_A$: The shape of the earth is a flat disk. $H_B$: The shape of the earth is a sphere. $H_A$ is no longer the complement of $H_B$. Instead, $H_A$ and $H_B$ are competing models regarding the shape of the earth. Testing such competing hypotheses will result in a more informative conclusion.

22 What does this example teach us? The evaluation of informative hypotheses presupposes that prior information is available. Prior knowledge is available in the form of specific expectations about the ordering of statistical parameters. Example: Mean comparisons. $H_{I1}: \mu_3 < \mu_1 < \mu_5 < \mu_2 < \mu_4$; $H_{I2}: \mu_3 < \{\mu_1, \mu_5, \mu_2\} < \mu_4$, where “,” denotes no ordering; versus the traditional setup $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ and $H_u: \mu_1, \mu_2, \mu_3, \mu_4, \mu_5$.

23 Evaluating Informative Hypotheses Hypothesis Testing Approaches –F-bar test for ANOVA (Silvapulle et al., 2002; Silvapulle & Sen, 2004) –Constraints on variance terms in SEM (Stoel et al., 2006; Gonzalez & Griffin, 2001) Model Selection Approaches –Evaluate competing models for model fit and model complexity: Akaike Information Criterion (AIC; Akaike, 1973); Bayesian Information Criterion (BIC; Schwarz, 1978); Deviance Information Criterion (DIC; Spiegelhalter et al., 2002) –These cannot deal with inequality constraints. Alternatives: Paired Comparison Information Criterion (PCIC; Dayton, 1998 & 2003); Order Restricted Information Criterion (ORIC; Anraku, 1999; Kuiper et al., in press); Bayes Factor
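For reference, the "fit plus complexity" structure of the first two criteria can be written out in their standard textbook forms. The symbols are introduced here for illustration and do not appear on the slide: $\hat{L}$ is the maximized likelihood, $k$ the number of free parameters, and $n$ the sample size.

$$
\mathrm{AIC} = -2 \log \hat{L} + 2k, \qquad \mathrm{BIC} = -2 \log \hat{L} + k \log n
$$

In both cases the penalty depends only on a parameter count, and an inequality constraint such as $\mu_1 < \mu_2$ leaves that count unchanged, which is one way to see why these criteria do not separate an inequality constrained hypothesis from its unconstrained counterpart.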

25 A prior predictive loss function for the evaluation of inequality constrained hypotheses.

26 Inequality Constrained Hypotheses Example 1 General linear model with two group means: $y_i = \mu_1 d_{i1} + \mu_2 d_{i2} + \varepsilon_i$, with $\varepsilon_i \sim N(0, \sigma^2)$; $d_{ig}$ takes on 0 or 1 to indicate group. $H_0: \mu_1, \mu_2$ (unconstrained hypothesis); $H_1: \mu_1 < \mu_2$ (inequality constraint imposed)
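One common way to compare an inequality constrained hypothesis with the unconstrained one, often called the encompassing prior approach, is to estimate the Bayes factor as the proportion of posterior draws that satisfy the constraint (fit) divided by the proportion of prior draws that do (complexity). The sketch below applies this to the two-group example under strong simplifying assumptions that are not from the slides: a known residual standard deviation, independent normal priors centered at zero, and hypothetical group means and sample sizes.

```python
import random


def proportion_in_agreement(draws):
    """Share of (mu1, mu2) draws that satisfy the constraint mu1 < mu2."""
    return sum(1 for mu1, mu2 in draws if mu1 < mu2) / len(draws)


def posterior_params(prior_sd, ybar, sigma, n):
    """Normal posterior (mean, sd) of one group mean; prior N(0, prior_sd^2), known sigma."""
    precision = 1.0 / prior_sd ** 2 + n / sigma ** 2
    return (n / sigma ** 2) * ybar / precision, precision ** -0.5


rng = random.Random(1)
n_draws = 20000

# Hypothetical settings: prior sd, known residual sd, per-group sample size
prior_sd, sigma, n = 10.0, 1.0, 20
# Hypothetical observed group means, in agreement with mu1 < mu2
ybar1, ybar2 = 0.0, 0.5

prior_draws = [(rng.gauss(0.0, prior_sd), rng.gauss(0.0, prior_sd))
               for _ in range(n_draws)]

m1, s1 = posterior_params(prior_sd, ybar1, sigma, n)
m2, s2 = posterior_params(prior_sd, ybar2, sigma, n)
post_draws = [(rng.gauss(m1, s1), rng.gauss(m2, s2)) for _ in range(n_draws)]

complexity = proportion_in_agreement(prior_draws)  # close to 0.5 by symmetry
fit = proportion_in_agreement(post_draws)
bf_constrained_vs_unconstrained = fit / complexity
print(complexity, fit, bf_constrained_vs_unconstrained)
```

Because the constraint covers half of the symmetric prior, the Bayes factor in this setup cannot rise much above 2, however strongly the data support the ordering.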

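27 Deviance Information Criteria $y$: data; $\theta$: unknown parameter; $p(\cdot)$: likelihood; $C$: constant. Deviance: $D(\theta) = -2 \log p(y \mid \theta) + C$. Taking the expectation: $\bar{D} = E_{\theta \mid y}[D(\theta)]$, a measure of how well the model fits the data. Effective number of parameters: $p_D = \bar{D} - D(\bar{\theta})$, where $\bar{\theta}$ is the expectation of $\theta$.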
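As a rough illustration, the DIC of Spiegelhalter et al. (2002) is the posterior mean deviance plus the effective number of parameters, and both can be estimated from posterior draws. The sketch below assumes a deliberately simple model that is not taken from the article: normal data with known standard deviation, a flat prior on the mean, and simulated (hypothetical) data.

```python
import math
import random


def deviance(mu, y, sigma):
    """-2 * log-likelihood of i.i.d. N(mu, sigma^2) data, with the constant C set to 0."""
    n = len(y)
    squared_error = sum((yi - mu) ** 2 for yi in y)
    return n * math.log(2 * math.pi * sigma ** 2) + squared_error / sigma ** 2


rng = random.Random(42)

# Hypothetical data: 30 observations with known sd
sigma, n = 1.0, 30
y = [rng.gauss(0.3, sigma) for _ in range(n)]
ybar = sum(y) / n

# Under a flat prior the posterior of mu is N(ybar, sigma^2 / n); draw from it directly
mu_draws = [rng.gauss(ybar, sigma / math.sqrt(n)) for _ in range(20000)]

mean_deviance = sum(deviance(mu, y, sigma) for mu in mu_draws) / len(mu_draws)
mu_bar = sum(mu_draws) / len(mu_draws)            # posterior expectation of the parameter
p_d = mean_deviance - deviance(mu_bar, y, sigma)  # effective number of parameters
dic = mean_deviance + p_d
print(p_d, dic)
```

With a single free parameter and a flat prior, the estimated effective number of parameters comes out close to 1, which is a useful sanity check on the calculation.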

28 More on DIC model fit + penalty for model complexity. Smaller is better. Only valid when the posterior distribution approximates a multivariate normal. Assumes that the specified parametric family of pdfs that generate future observations encompasses the true model. (Can be violated.) Data y are used to construct the posterior distribution AND to evaluate the estimated model → DIC selects overfitting models. Solution: Bayesian predictive information criterion.

29 Bayesian Predictive Information Criterion Developed by Ando (2007) to avoid overfitting problems associated with DIC. … looks like the posterior DIC presented in van de Schoot et al. (2012).

30 Posterior (Predictive?) DIC Are these the same? Note:

31 Performance of postDIC $H_0: \mu_1, \mu_2$; $H_1: \mu_1 < \mu_2$; $H_2: \mu_1 = \mu_2$. Data generated to be consistent (cases 1 to 4) or reversed in direction (cases 5 to 7) with $H_2$. postDIC does not distinguish $H_0$ from $H_1$. Recall: Smaller is better.

32 Prior DIC Specification of the prior distribution has more importance for prDIC than postDIC. What’s the intuitive difference between a posterior predictive vs. prior predictive approach?

33 Performance of prDIC $H_0: \mu_1, \mu_2$; $H_1: \mu_1 < \mu_2$; $H_2: \mu_1 = \mu_2$. Data generated to be consistent (cases 1 to 4) or reversed in direction (cases 5 to 7) with $H_2$. prDIC distinguishes $H_0$ from $H_1$ when data are in agreement with $H_1$, but chooses a badly fitting model for cases 5 to 7. Recall: Smaller is better.

34 Prior (Predictive?) Information Criterion Omits from prDIC. New loss function accounts for agreement between and y. It quantifies how well replicated data x fits a certain hypothesis, and how well the hypothesis fits the data y.

35 Performance of PIC $H_0: \mu_1, \mu_2$; $H_1: \mu_1 < \mu_2$; $H_2: \mu_1 = \mu_2$. Data generated to be consistent (cases 1 to 4) or reversed in direction (cases 5 to 7) with $H_2$. PIC selects the hypothesis that is most consistent with the data – outperforming postDIC and prDIC. (?) Recall: Smaller is better.

36 Paper Conclusions The (posterior?) DIC performs poorly when evaluating inequality constrained hypotheses. The prior DIC can be useful for model selection when the population from which the data are generated is in agreement with the constrained hypotheses. The PIC, which is related to the marginal likelihood, is better suited to selecting the best of a set of inequality constrained hypotheses.