PSY 626: Bayesian Statistics for Psychological Science

1 PSY 626: Bayesian Statistics for Psychological Science
2/17/2019 Bayes Factors
Greg Francis
Fall 2018, Purdue University
PSY200 Cognitive Psychology

2 Hypothesis testing
Suppose the null is true and check to see if a rare event has occurred
e.g., does our random sample produce a t value that is in the tails of the null sampling distribution?
If a rare event occurred, reject the null hypothesis
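This logic can be sketched in a few lines of Python (an illustration, not from the slides): draw two samples from the same population, so the null is true by construction, then check whether the observed t lands in the tails of the null sampling distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# The null is true by construction: both groups come from the same population.
g1 = rng.normal(loc=0.0, scale=1.0, size=15)
g2 = rng.normal(loc=0.0, scale=1.0, size=15)

t, p = stats.ttest_ind(g1, g2)
# Reject the null only if t falls in the tails of the null sampling
# distribution, i.e., if the two-sided p-value is below alpha.
reject = p < 0.05
print(t, p, reject)
```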

3 Hypothesis testing But what is the alternative?
Typically: “anything goes”
But that seems kind of unreasonable
Maybe the “rare event” would be even less common if the null were not true!

4 Bayes Theorem Conditional probabilities

5 Ratio
The ratio of posteriors conveniently cancels out P(D):
P(H1 | D) / P(H0 | D) = [P(D | H1) / P(D | H0)] x [P(H1) / P(H0)]
Posterior odds = Bayes Factor x Prior odds
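The relationship on this slide can be sketched directly in Python (the numbers are assumptions for illustration):

```python
# posterior odds = Bayes factor x prior odds; P(D) has cancelled in the ratio
prior_odds = 1.0       # P(H1) / P(H0): both hypotheses equally plausible a priori
bayes_factor = 4.0     # BF10 = P(D | H1) / P(D | H0), assumed for illustration

posterior_odds = bayes_factor * prior_odds
# Convert posterior odds to a posterior probability for H1
posterior_p_h1 = posterior_odds / (1.0 + posterior_odds)
print(posterior_odds, posterior_p_h1)  # 4.0 0.8
```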

6 Bayesian Model Selection
It’s not really about hypotheses, but hypotheses suggest models
The Bayes Factor is often presented as BF12; you could also compute BF21 = 1/BF12
Posterior odds = Bayes Factor x Prior odds (with models in place of hypotheses)

7 Bayes Factor
Evidence for the alternative hypothesis (or the null) is computed with the Bayes Factor (BF)
BF > 1 indicates that the data are evidence for the alternative, compared to the null
BF < 1 indicates that the data are evidence for the null, compared to the alternative

8 Bayes Factor
When BF10 = 2, the data are twice as likely under H1 as under H0
When BF01 = 2, the data are twice as likely under H0 as under H1
These interpretations do not require you to believe that one model is better than the other
You can still have priors that favor one model, regardless of the Bayes Factor
You would want to make important decisions based on the posterior
Still, if you consider both models to be plausible, then the priors should not be too different from each other

9 Rules of thumb
Evidence for the alternative hypothesis (or the null) is computed with the Bayes Factor (BF)
BF10: Interpretation
< 0.01: Decisive evidence for null
0.01 to 0.1: Strong evidence for null
0.1 to 0.3: Substantial evidence for null
0.3 to 1: Anecdotal evidence for null
1 to 3: Anecdotal evidence for alternative
3 to 10: Substantial evidence for alternative
10 to 100: Strong evidence for alternative
> 100: Decisive evidence for alternative

10 Similar to AIC
For a two-sample t-test, the null hypothesis (reduced model) is that score i from group s (1 or 2) is defined as
X_is = mu + e_is
with the same mean mu for each group s

11 AIC
For a two-sample t-test, the alternative hypothesis (full model) is that score i from group s (1 or 2) is defined as
X_is = mu_s + e_is
with a different mean mu_s for each group s

12 AIC AIC and its variants are a way of comparing model structures
One mean or two means?
AIC always uses maximum likelihood estimates of the parameters
Bayesian approaches identify a posterior distribution of parameter values
We should use that information!
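A minimal Python sketch of the AIC comparison the slides describe, fitting the one-mean (reduced) and two-mean (full) normal models by maximum likelihood (the simulated data and effect size are assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(0.0, 1.0, size=20)
g2 = rng.normal(0.8, 1.0, size=20)   # assumed true effect, for illustration

def gaussian_aic(groups, separate_means):
    # Residuals from the MLE mean(s); one shared MLE variance.
    if separate_means:
        resid = np.concatenate([g - g.mean() for g in groups])
        k = len(groups) + 1              # one mean per group + variance
    else:
        allx = np.concatenate(groups)
        resid = allx - allx.mean()
        k = 2                            # grand mean + variance
    sigma = np.sqrt(np.mean(resid**2))   # MLE of the shared standard deviation
    loglik = stats.norm.logpdf(resid, 0.0, sigma).sum()
    return 2*k - 2*loglik                # AIC = 2k - 2 ln L

aic_null = gaussian_aic([g1, g2], separate_means=False)  # reduced model
aic_full = gaussian_aic([g1, g2], separate_means=True)   # full model
print(aic_null, aic_full)   # the lower AIC is preferred
```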

13 Models of what? We have been building models of trial-level scores
# Model without intercept (more natural)
model2 = brm(Leniency ~ 0 + SmileType, data = SLdata,
             iter = 2000, warmup = 200, chains = 3)
print(summary(model2))

# Pass the prior's center and spread to Stan; GrandMean is assumed to be
# computed earlier (e.g., the grand mean of the scores)
GrandSE = 10
stanvars <- stanvar(GrandMean, name = 'GrandMean') +
  stanvar(GrandSE, name = 'GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model6 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID),
             data = ATdata, iter = 2000, warmup = 200, chains = 3,
             thin = 2, prior = prs, stanvars = stanvars)
print(summary(model6))

14 Models of what? We have been building models of trial-level scores
That is not the only option
In traditional hypothesis testing, we care more about effect sizes (a signal-to-noise ratio) than about individual scores
Of course, the effect size is derived from the individual scores
In many cases, it is enough to model a “sufficient” statistic rather than the individual scores: Cohen’s d, the t-statistic, the p-value, the correlation r

15 Models of means It’s not really going to be practical, but let’s consider a case where we assume that the population variance is known (and equals 1) and we want to compare null and alternative hypotheses of fixed values

16 Models of means The likelihood of any given observed mean value is derived from the sampling distribution Suppose n=100 (one sample)

17 Models of means The likelihood of any given observed mean value is derived from the sampling distribution Suppose n=100 (one sample) Suppose we observe Data are more likely under null than under alternative

18 Models of means The likelihood of any given observed mean value is derived from the sampling distribution Suppose n=100 (one sample) Suppose we observe Data are more likely under alternative than under null

20 Bayes Factor
The ratio of the likelihood for the data under the null compared to the alternative (or the other way around)
Suppose we observe
Data are more likely under alternative than under null
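For the known-variance setup in these slides (sigma = 1, n = 100, so the standard error of the mean is 0.1), the point-vs-point Bayes factor is just a ratio of two normal densities. A Python sketch; the observed mean and the alternative's mean are assumptions, since the figure values did not survive in the transcript:

```python
from scipy.stats import norm

n, sigma = 100, 1.0
se = sigma / n**0.5        # sampling SE of the mean: 0.1
xbar = 0.25                # assumed observed sample mean
mu0, mu1 = 0.0, 0.3        # null mean and assumed point-alternative mean

like_null = norm.pdf(xbar, mu0, se)   # likelihood of xbar under H0
like_alt = norm.pdf(xbar, mu1, se)    # likelihood of xbar under H1
bf10 = like_alt / like_null           # > 1 here: data favor the alternative
print(bf10)
```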

21 Decision depends on alternative
The likelihood of any given observed mean value is derived from the sampling distribution Suppose n=100 (one sample) Suppose we observe Data are more likely under null than under alternative

22 Decision depends on alternative
The likelihood of any given observed mean value is derived from the sampling distribution Suppose n=100 (one sample)

24 Decision depends on alternative
For a fixed sample mean, evidence for the alternative only occurs for alternative population mean values within a certain range
For big alternative values, the observed sample mean is less likely than for a null population value
The sample mean may be unlikely under both models
(Figure from Rouder et al., 2009: evidence for null vs. evidence for alternative as a function of the mean of the alternative)

25 Models of means Typically, we do not hypothesize a specific value for the alternative, but a range of plausible values

26 Likelihoods For the null, we compute likelihood in the same way
Suppose n=100 (one sample)

27 Likelihoods For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values Suppose n=100 (one sample)


31 Average Likelihood
For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values
Suppose n=100 (one sample)
P(D | H1) = integral over mu of [likelihood for a given value of mu (from the sampling distribution)] x [prior for that value of mu]

32 Bayes Factor
Ratio of the likelihood for the null compared to the (average) likelihood for the alternative:
BF01 = P(D | H0) / P(D | H1)
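The averaging over the prior can be done by numerical integration. A Python sketch with assumed values (sigma = 1 and n = 100 as in the slides; the observed mean and the normal prior's width are assumptions):

```python
from scipy.stats import norm
from scipy.integrate import quad

se = 0.1          # SE of the mean for sigma = 1, n = 100
xbar = 0.25       # assumed observed sample mean
prior_sd = 0.5    # assumed width of the prior on mu under H1

like_null = norm.pdf(xbar, 0.0, se)   # H0 is a spike at mu = 0
# P(D | H1): average the likelihood over the prior on mu
like_alt, _ = quad(lambda mu: norm.pdf(xbar, mu, se) * norm.pdf(mu, 0.0, prior_sd),
                   -5.0, 5.0)
bf01 = like_null / like_alt
print(bf01)   # < 1 here: the data favor the alternative
```

With a normal prior this integral also has a closed form, N(xbar; 0, sqrt(se^2 + prior_sd^2)), which is a handy check on the numerical result.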

33 Uncertainty
The prior standard deviation for mu establishes a range of plausible values for mu
A narrow prior is less flexible; a broad prior is more flexible

34 Uncertainty With a very narrow prior, you may not fit the data

35 Uncertainty
With a very broad prior, you will fit well for some values of mu and poorly for other values of mu

36 Uncertainty
Uncertainty in the prior functions similarly to the penalty for extra parameters in AIC

37 Penalty Averaging acts like a penalty for extra parameters
(Figure from Rouder et al., 2009: evidence for null vs. evidence for alternative as a function of the width of the alternative prior)
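This penalty can be seen numerically: with a normal prior on mu, the average likelihood has the closed form N(xbar; 0, sqrt(se^2 + prior_sd^2)), and widening the prior dilutes the alternative's average likelihood, pushing the Bayes factor toward the null. A Python sketch with assumed values:

```python
from scipy.stats import norm

se, xbar = 0.1, 0.25           # assumed SE of the mean and observed mean
like_null = norm.pdf(xbar, 0.0, se)

bfs = []
for prior_sd in (0.5, 2.0, 10.0, 50.0):
    # Closed-form average likelihood for a normal prior on mu
    like_alt = norm.pdf(xbar, 0.0, (se**2 + prior_sd**2) ** 0.5)
    bfs.append(like_null / like_alt)
print(bfs)   # BF01 grows with the prior width: flexibility is penalized
```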

38 Models of effect size
Consider the case of a two-sample t-test
We often care about the standardized effect size delta = (mu_1 - mu_2) / sigma
which we can estimate from data as Cohen’s d = (M_1 - M_2) / s

39 Models of effect size
If we were doing traditional hypothesis testing, we would compare a null model H0: mu_1 = mu_2 against an alternative H1: mu_1 ≠ mu_2
Equivalent statements can be made using the standardized effect size: H0: delta = 0 versus H1: delta ≠ 0
As long as the standard deviation is not zero

40 Priors on effect size For the null, the prior is (again) a spike at zero

41 JZS Priors on effect size
For the alternative, a good choice is a Cauchy distribution (a t-distribution with df = 1), known as the JZS prior after Jeffreys, Zellner, and Siow (Rouder et al., 2009)

42 JZS Priors on effect size
It is a good choice because the integration for the alternative hypothesis can be done numerically
t is the t-value you use in a hypothesis test (from the data)
v is the degrees of freedom (from the data)
This might not look easy, but it is simple to calculate with a computer
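One way to carry out this numerical integration in Python (a sketch under stated assumptions, not the BayesFactor package's exact implementation): average a noncentral-t likelihood for the observed t over a Cauchy(0, r) prior on the effect size delta, and divide by the central-t likelihood under the null.

```python
from scipy.stats import t as t_dist, nct, cauchy
from scipy.integrate import quad

def jzs_bf10(t_obs, n1, n2, r=2**0.5 / 2):
    """Approximate JZS Bayes factor BF10 for a two-sample t-test."""
    v = n1 + n2 - 2                      # degrees of freedom (from the data)
    n_eff = n1 * n2 / (n1 + n2)          # effective sample size (two-sample)
    like_null = t_dist.pdf(t_obs, v)     # central t: effect size delta = 0
    # P(t | H1): average the noncentral-t likelihood over the Cauchy prior on
    # delta; the tails beyond +/-8 contribute essentially nothing here.
    like_alt, _ = quad(
        lambda d: nct.pdf(t_obs, v, d * n_eff**0.5) * cauchy.pdf(d, 0.0, r),
        -8.0, 8.0)
    return like_alt / like_null

print(jzs_bf10(2.2, 15, 15))
```

The default scale r = sqrt(2)/2 matches the "medium" setting discussed on the next slides.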

43 Variations of JZS Priors
Scale parameter “r” Bigger values make for a broader prior More flexibility! More penalty!

44 Variations of JZS Priors
Medium: r = sqrt(2)/2
Wide: r = 1
Ultrawide: r = sqrt(2)

45 How do we use it?
Super easy
Rouder’s web site, or in R: library(BayesFactor)

46 How do we use it?

47 How do we use it?
library(BayesFactor)
ttest.tstat(t=2.2, n1=15, n2=15, simple=TRUE)
# returns the Bayes factor B10

48 What does it mean? Guidelines
BF: Evidence
1 to 3: Anecdotal
3 to 10: Substantial
10 to 30: Strong
30 to 100: Very strong
> 100: Decisive

49 Conclusions
JZS Bayes Factors
Easy to calculate
Pretty easy to understand the results
A bit arbitrary in the setup: Why not other priors? How do you pick the scale factor?
Criteria for interpretation are arbitrary
A fairly painless introduction to Bayesian methods

