Let’s do a Bayesian analysis


1 Let’s do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2018, Purdue University

2 Visual Search
A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green circle) or not
Log in with ID=GregTest-145, password=

3 Visual Search
A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green circle) or not
Vary number of distractors: 4, 16, 32, 64
Vary type of distractors: feature (different color), conjunctive (different color or shape)

4 Visual Search
A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green circle) or not
Vary number of distractors: 4, 16, 32, 64
Vary type of distractors: feature (different color), conjunctive (different color or shape)

5 Visual Search
A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green circle) or not
Vary number of distractors: 4, 16, 32, 64
Vary type of distractors: feature (different color), conjunctive (different color or shape)
Measure reaction time: time between onset of the image and the participant’s response
5 trials for each of 16 conditions: 4 numbers of distractors x 2 target conditions (present or absent) x 2 distractor types (feature or conjunctive) = 80 trials

6 Visual Search
Typical results: For conjunctive distractors, response time increases with the number of distractors

7 Linear model
Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation
Let’s do it for a single participant
We are basically going through Section 4.4 of the text, but using a new data set and using brms instead of rethinking
Download files from the class web site and follow along in class

8 Read in data
rm(list=ls(all=TRUE))   # clear all variables
Clears old variables out of R’s memory
graphics.off()
Removes old graphics from the display
VSdata<-read.csv(file="VisualSearch.csv", header=TRUE, stringsAsFactors=FALSE)
Reads in the contents of the file VisualSearch.csv
Uses the contents of the first row as a header, and creates variables by the names of those headers (no spaces in the header names!)
VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$Target=="Absent" & VSdata$DistractorType=="Conjunction")
Creates a new variable that contains just the data for a particular Participant
Only includes data from the Target absent trials
Only includes data for the Conjunction distractors
Check the contents of VSdata2 in the R console
head(VSdata2)
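Before fitting anything, it can help to confirm the subset looks right. A minimal sanity-check sketch using base R (str, nrow, and plot are standard functions; the column names follow the formula used below):
str(VSdata2)    # check variable names and types
nrow(VSdata2)   # number of trials in the subset (20 for this participant and condition, per the summary later)
plot(VSdata2$NumberDistractors, VSdata2$RT_ms,
     xlab="Number of distractors", ylab="RT (ms)")   # quick look at the raw trend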

9 Define the model
library(brms)
This loads up the R library we want to use
It has some nice functions for plotting and displaying tables
model1 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2)
Defines the linear model
Assigns default prior distributions to each parameter (b1, b2, sigma)
“Improper” uniform priors
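To see exactly which default priors brms assigns, you can query the formula before fitting or the fitted object afterward. A short sketch (get_prior() and prior_summary() are standard brms helpers):
get_prior(RT_ms ~ NumberDistractors, data = VSdata2)   # priors brms would use for this formula and data
prior_summary(model1)                                  # priors actually used in the fitted model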

10 brms
Here, defining the model basically does all the work
This function takes the priors and the data, and computes posterior distributions
The main output is the set of b1, b2, and sigma that have the highest probabilities
Often this is actually probability density rather than probability

11 Posterior Distribution
Remember, these are distributions of the population parameters!

12 Posterior Distribution
Remember, these are distributions of the population parameters!

13 Posterior Distribution
Remember, these are distributions of the population parameters!

14 brms
The main output of the brm( ) calculation is the set of b1, b2, and sigma that have the highest probabilities
Often this is actually probability density rather than probability

15 Posterior Distributions
It is more complicated because we have three parameters with a joint posterior probability density function
Finding the peak of a multidimensional surface is not a trivial problem
brm calls an algorithm called Stan that does the computations
Involves lots of random sampling to estimate distributions
Things can go wrong sometimes, and the model may not “converge”
Sometimes it takes a long time
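When convergence is in doubt, a few standard checks are worth running. A hedged sketch using brms helpers (rhat() and plot() work on a fitted brms model; adapt_delta is a standard Stan control option):
rhat(model1)   # potential scale reduction factors; values near 1 suggest convergence
plot(model1)   # trace plots should look like fuzzy, overlapping chains
# If there are warnings about divergent transitions, refitting with more careful sampling can help:
# model1b <- update(model1, control = list(adapt_delta = 0.99))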

16 Estimates
print(summary(model1))
Prints out a table of estimates of b1, b2, and sigma
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept
NumberDistractors
Family Specific Parameters:
sigma
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
There were 16 warnings (use warnings() to see them)
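If you only want the numerical table (for example, to save or post-process it), the estimates can be pulled out directly. A sketch using standard brms accessors (the $fixed element comes from the summary object shown above):
fixef(model1)           # matrix of estimates and 95% intervals for Intercept and NumberDistractors
summary(model1)$fixed   # the Population-Level Effects table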

17 Estimates (2nd run)
print(summary(model1))
Prints out a table of estimates of b1, b2, and sigma
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept
NumberDistractors
Family Specific Parameters:
sigma
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
There were 16 warnings (use warnings() to see them)

18 Posteriors
plot(model1)
Fuzzy graphs on the right indicate convergence of the sampling (these look good)

19 Working with posteriors
post <- posterior_samples(model1)
Pulls out the sampled values for each parameter
> head(post)
  b_Intercept b_NumberDistractors sigma lp__

20 Working with posteriors
Plot the posterior of b_NumberDistractors
plot(density(post$b_NumberDistractors))
This is the same plot we got from plot(model1)

21 Posterior
You can ask all kinds of questions about predictions and so forth by just using probability
For example, what is the posterior probability that b_NumberDistractors is less than 42?
> length(post$b_NumberDistractors[post$b_NumberDistractors < 42])/length(post$b_NumberDistractors)
[1]
> mean(post$b_NumberDistractors)
[1]
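Since the mean of a logical vector is the proportion of TRUE values, the same posterior probability can be computed more compactly. A minimal sketch using the post data frame defined above:
mean(post$b_NumberDistractors < 42)   # proportion of posterior samples with slope below 42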

22 Posterior
You can ask all kinds of questions about predictions and so forth by just using probability
For example, what is the posterior distribution of RT_ms for 35 distractors?
RT_at_35 <- post$b_Intercept + post$b_NumberDistractors * 35
dev.new()
plot(density(RT_at_35))

23 Posterior
You can ask all kinds of questions about predictions and so forth by just using probability
What is the probability that RT_ms for 35 distractors is greater than 2400?
probGT2400 <- length(RT_at_35[RT_at_35 > 2400])/length(RT_at_35)
cat("Probability RT_ms for 35 distractors is greater than 2400 = ", probGT2400, "\n")
Probability RT_ms for 35 distractors is greater than 2400 =

24 Questions
What is the probability that predicted RT_ms is greater than 3000 for 42 distractors?
What is the probability that b_Intercept is more than 900?
What is the probability that b_NumberDistractors is less than 35?
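One way to approach these, following the same pattern as the slides above (a sketch; like the 35-distractor example, it uses the posterior of the mean RT and ignores trial-level noise from sigma):
RT_at_42 <- post$b_Intercept + post$b_NumberDistractors * 42
mean(RT_at_42 > 3000)                 # P(predicted RT_ms > 3000 with 42 distractors)
mean(post$b_Intercept > 900)          # P(b_Intercept > 900)
mean(post$b_NumberDistractors < 35)   # P(b_NumberDistractors < 35)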

25 Predictions
Automatically predict RT_ms for different values of NumberDistractors
plot(marginal_effects(model1), points=TRUE)
Gray shading indicates 95% of the posterior distribution
Called “credible intervals”
This is not the same thing as a 95% confidence interval!
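For numeric predictions at specific distractor counts, rather than the plot, brms also provides fitted() and predict() methods for the fitted model. A short sketch (the newdata values here are just illustrative):
newd <- data.frame(NumberDistractors = c(4, 16, 32, 64))
fitted(model1, newdata = newd)    # posterior mean RT and credible interval for the regression line
predict(model1, newdata = newd)   # prediction interval for new trials (includes sigma)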

26 Compare to tradition
Is this any different than traditional linear regression?
The “traditional” best fit line is
b_Intercept = 832.94
b_NumberDistractors = 41.38
Compare to
> mean(post$b_Intercept)
[1]
> mean(post$b_NumberDistractors)
[1]
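For reference, the traditional fit on this slide is ordinary least squares, which in R is a one-liner. A sketch for comparing the two approaches:
fit_lm <- lm(RT_ms ~ NumberDistractors, data = VSdata2)
coef(fit_lm)      # compare to the posterior means of b_Intercept and b_NumberDistractors
confint(fit_lm)   # compare to the 95% credible intervals from summary(model1)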

27 Standard linear regression
What parameter values b1, b2 minimize the prediction error?
The answer is necessarily just a pair of numbers b1, b2

28 Bayesian estimates
The mean of the posterior distributions is just one possible characterization of the population values
But the posterior distribution contains a lot more information
Uncertainty about the estimates
Uncertainty about the prediction
Compare to confidence intervals
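That extra information is easy to extract from the posterior samples. A minimal sketch (posterior_summary() is a standard brms helper; quantile() is base R):
posterior_summary(model1)                                   # estimate, error, and 95% interval for every parameter
quantile(post$b_NumberDistractors, probs = c(0.025, 0.975)) # 95% credible interval for the slope, by hand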

29 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 30
We can set a prior
model2 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) )
print(summary(model2))
> summary(model2)
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept
NumberDistractors
Family Specific Parameters:
sigma
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

30 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 30
We can set a prior
model2 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) )
plot(marginal_effects(model2), points = TRUE)

31 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 30
We can set a prior
model2 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) )
plot(model2)

32 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 55
We can set a prior
model3 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 1), class = "b")) )
print(summary(model3))
> print(summary(model3))
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept
NumberDistractors
Family Specific Parameters:
sigma
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

33 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 55
We can set a prior
model3 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 1), class = "b")) )
plot(marginal_effects(model3), points = TRUE)

34 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 55
We can set a prior
model3 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 1), class = "b")) )
plot(model3)

35 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller
We can set a prior
model4 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(50, 10), class = "b")) )
print(summary(model4))
> print(summary(model4))
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept
NumberDistractors
Family Specific Parameters:
sigma
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

36 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller
We can set a prior
model4 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(50, 10), class = "b")) )
plot(marginal_effects(model4), points = TRUE)

37 Changing priors
Before running the model, we may have useful information about the model parameters
For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller
We can set a prior
model4 = brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(50, 10), class = "b")) )
plot(model4)
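To see how much the prior matters here, the slope estimates from the four models can be lined up side by side. A sketch (fixef() is a standard brms helper; the row labels are just illustrative names for the priors used above):
rbind(
  flat_default = fixef(model1)["NumberDistractors", ],
  normal_30_1  = fixef(model2)["NumberDistractors", ],
  normal_55_1  = fixef(model3)["NumberDistractors", ],
  normal_50_10 = fixef(model4)["NumberDistractors", ]
)   # compare estimates and 95% intervals for the slope under the different priors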

38 Conclusions
That wasn’t so bad
An uninformative prior gives nearly the same result as standard linear regression
But you get a distribution of population values rather than just point estimates
You can compute and consider probabilities of different ranges of population values
You can compute and consider probabilities of different outcomes
If you have an informed prior, you can get quite different results from standard linear regression
How different depends on the differences between the priors and the data
Better information leads to better inference!

