PSY 626: Bayesian Statistics for Psychological Science


1 PSY 626: Bayesian Statistics for Psychological Science
Prediction. Greg Francis. PSY 626: Bayesian Statistics for Psychological Science, Fall 2016, Purdue University. 3/24/2018. PSY200 Cognitive Psychology.

2 Hypothesis tests
Hypothesis tests are commonly used as part of a method to establish scientific “truth”: Is there an effect? What should I believe? An alternative approach is to give up on “truth” and instead focus on “prediction”. The question is not “Is there an effect?” or “What should I believe?” Rather: “How should I behave?” Follow the data, but do not follow it blindly. Build a quantitative model, but test the model.

3 Model building
Suppose you have two samples and you are interested in the means. Further suppose that the population properties are: μ1=0, μ2=0.3, σ1=σ2=1. Typically, we would draw random samples from each group and run a t-test to determine if we should treat the means as being different.
[Diagram labels: Treatment, Theory, Future work, Prediction]

4 Model building
We typically build the following kind of model: the score for subject k is related to the grand mean, to deviations from the grand mean due to being in group 1 or group 2, and to random noise. This model gives mean values for each group.
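The model equation itself did not survive in this transcript. One common way to write the model described in words here is sketched below; the symbols (X, μ, α, ε) and the normal-noise assumption are illustrative choices, not necessarily the notation used on the slide.

```latex
X_{ik} = \mu + \alpha_i + \varepsilon_{ik},
\qquad \varepsilon_{ik} \sim N(0, \sigma^2),
\qquad i \in \{1, 2\}
% mean value for group i:
\mu_i = \mu + \alpha_i
```

Here X_{ik} is the score of subject k in group i, μ is the grand mean, α_i is the deviation for group i, and ε_{ik} is random noise.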

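5 Model building
Draw samples (n1=n2) from populations having μ1=0, μ2=0.3, σ1=σ2=1. Construct different models that vary in the estimate of the mean values:
Null model: both groups are assigned the same mean.
Full model: each group is assigned its own sample mean.
Hypothesis testing model: use the null model if the test does not reject H0 (p≥.05); use the full model if it does reject H0 (p<.05).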
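The three models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fit_models`, the pooled-variance t statistic, and passing the critical value in as `t_crit` are choices made here, not details taken from the slides.

```python
import math
import random
import statistics


def fit_models(x1, x2, t_crit):
    """Return group-mean estimates under the null, full, and hypothesis-test models."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    grand = statistics.fmean(list(x1) + list(x2))

    null_model = (grand, grand)  # one shared mean for both groups
    full_model = (m1, m2)        # a separate sample mean for each group

    # Pooled-variance two-sample t statistic (assumes equal population variances).
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    t = (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Hypothesis-testing model: adopt the full model only when H0 is rejected.
    ht_model = full_model if abs(t) > t_crit else null_model
    return {"null": null_model, "full": full_model, "ht": ht_model}


# Usage with n1 = n2 = 10; 2.101 is the two-tailed .05 critical value for df = 18.
rng = random.Random(1)
x1 = [rng.gauss(0.0, 1.0) for _ in range(10)]
x2 = [rng.gauss(0.3, 1.0) for _ in range(10)]
models = fit_models(x1, x2, t_crit=2.101)
```

The critical value is supplied by the caller because it depends on the degrees of freedom, and the standard library has no t distribution.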

6 Small samples 20 experiments n1=n2=10

7 Bigger samples 20 experiments n1=n2=50

8 Big samples 20 experiments n1=n2=100

9 Model fit/error
A standard way of judging the quality of a model is by its fit to a data set. One fit measure is root mean squared error (RMSE). We want a model with low RMSE.
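The RMSE formula from the slide is not preserved in this transcript. For the two-group case, one plausible form compares each model's estimated group mean with a target mean; the hat notation and the averaging over the two groups are assumptions made for this sketch.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{2}\sum_{i=1}^{2}\left(\hat{\mu}_i - \mu_i\right)^2}
```

Here \hat{\mu}_i is the mean the model assigns to group i and \mu_i is the value it is scored against.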

10 Checking model approaches
Draw samples (n1=n2) from populations having μ1=0, μ2=0.3, σ1=σ2=1. Repeat for 10,000 simulated experiments. Compute RMSE for each model and average across experiments. Vary sample size n1=n2.
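The simulation procedure can be sketched as follows. This is an illustrative reconstruction, not the code behind the slides: the names `simulate` and `rmse`, the pooled-variance t-test, and the caller-supplied critical value `t_crit` are all assumptions, and RMSE is taken over the two group means relative to the true means.

```python
import math
import random


def rmse(model, target):
    """RMSE between a model's group means and the target means."""
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(model, target)) / len(target))


def sample_var(values, mean):
    """Unbiased sample variance around a precomputed mean."""
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def simulate(n, mu=(0.0, 0.3), sigma=1.0, t_crit=1.96, n_exp=10_000, seed=0):
    """Average RMSE against the true means for the null, full, and HT models."""
    rng = random.Random(seed)
    totals = {"null": 0.0, "full": 0.0, "ht": 0.0}
    for _ in range(n_exp):
        x1 = [rng.gauss(mu[0], sigma) for _ in range(n)]
        x2 = [rng.gauss(mu[1], sigma) for _ in range(n)]
        m1, m2 = sum(x1) / n, sum(x2) / n
        grand = (m1 + m2) / 2  # equal n, so the grand mean is the midpoint
        null_model, full_model = (grand, grand), (m1, m2)
        # Pooled variance for equal n is the average of the two sample variances.
        pooled_var = (sample_var(x1, m1) + sample_var(x2, m2)) / 2
        t = (m2 - m1) / math.sqrt(pooled_var * 2 / n)
        ht_model = full_model if abs(t) > t_crit else null_model
        for name, model in (("null", null_model), ("full", full_model), ("ht", ht_model)):
            totals[name] += rmse(model, mu)
    return {name: total / n_exp for name, total in totals.items()}


# Usage: vary the sample size; t_crit is the two-tailed .05 value for df = 2n - 2.
for n, t_crit in ((10, 2.101), (100, 1.972)):
    print(n, simulate(n, t_crit=t_crit, n_exp=1000))
```

The default `t_crit=1.96` is a normal approximation; passing the exact t critical value for each sample size, as in the usage loop, is closer to an actual t-test.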

11 Comparing models μ2 - μ1=0.3

12 Comparing models
For small samples, the null model provides the smallest average RMSE. For large samples, the full model provides the smallest average RMSE.

13 Comparing models
There is always a better model (on average) than what is derived by hypothesis testing. Hypothesis testing (on average) leads to overfitting for some small samples (when it rejects). Hypothesis testing (on average) leads to underfitting for some large samples (when it does not reject).

14 Bigger effects Similar for other effect sizes: μ2 - μ1=0.8

15 Null effects Similar for other effect sizes: μ2 - μ1=0

16 Known unknowns
But these simulations are all theoretical. To compute RMSE we need to know the true means. However, we can do something similar if we do not compute RMSE relative to the true means, but relative to test data.

17 Prediction / validation
Suppose I build my models from one set of data, x1i and x2i, and then test them with another set of data, y1i and y2i. Here, we compute RMSE relative to means from the test data set. You could also compute RMSE relative to individual data points.
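Both ways of scoring a model against test data can be sketched briefly. The function names are hypothetical, and `model` is assumed to be a pair of predicted group means built from the first data set.

```python
import math
import statistics


def rmse_vs_test_means(model, y1, y2):
    """RMSE of the predicted group means relative to the test-set group means."""
    errors = (model[0] - statistics.fmean(y1), model[1] - statistics.fmean(y2))
    return math.sqrt(sum(e ** 2 for e in errors) / 2)


def rmse_vs_test_points(model, y1, y2):
    """RMSE of the predicted group means relative to every individual test score."""
    squared = [(y - model[0]) ** 2 for y in y1] + [(y - model[1]) ** 2 for y in y2]
    return math.sqrt(statistics.fmean(squared))


# Usage: a full model predicting means (1, 3), scored on a small test set.
model = (1.0, 3.0)
y1, y2 = [1.0, 1.0], [3.0, 5.0]
by_means = rmse_vs_test_means(model, y1, y2)
by_points = rmse_vs_test_points(model, y1, y2)
```

Scoring against individual points adds the within-group noise to every model's RMSE, so the values are larger, but the models are still compared on the same test data.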

18 Small effect When μ2 - μ1=0.3

19 Bigger effect When μ2 - μ1=0.8

20 Null effect When μ2 - μ1=0

21 Prediction / validation
We can better see differences by subtracting the full model RMSE from the other models’ RMSE. The smallest number (biggest negative) indicates the best model (the one with the smallest RMSE).
[Figure panels: μ2 - μ1=0.8, μ2 - μ1=0.3, μ2 - μ1=0]

22 Prediction / validation
This looks good. At least on average, the RMSE patterns for testing means of new data are similar to those for RMSE for testing against the true means. If we want to deduce which model best predicts values, we can pick the model that minimizes the test RMSE value. Cost: we have to run the experiment twice. Testing does not require equal sample sizes, but you trade off model development against model testing.

23 Cross validation
We partly avoid that cost by using cross-validation to approximate RMSE_Test. Divide the data set x1i and x2i into multiple subsets (a common choice is 10 subsets). Build your model using all but one of the subsets. Compute RMSE for the left-out subset. Repeat so that each subset is left out once (10 build-and-test “folds”). Compute mean RMSE across the subsets.
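The cross-validation steps can be sketched as below for the null and full models. Several details are assumptions rather than facts from the slides: the function names, shuffling each group separately before dealing scores into folds, and scoring each left-out fold against its individual data points.

```python
import math
import random
import statistics


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_rmse(x1, x2, k=10, seed=0):
    """Mean left-out-fold RMSE for the null and full models."""
    if k < 2 or min(len(x1), len(x2)) < k:
        raise ValueError("need k >= 2 and at least k scores in each group")
    rng = random.Random(seed)
    folds1 = k_fold_indices(len(x1), k, rng)
    folds2 = k_fold_indices(len(x2), k, rng)
    per_fold = {"null": [], "full": []}
    for held1, held2 in zip(folds1, folds2):
        # Build the models from every score that is not in the left-out fold.
        train1 = [x1[i] for i in range(len(x1)) if i not in held1]
        train2 = [x2[i] for i in range(len(x2)) if i not in held2]
        test1 = [x1[i] for i in held1]
        test2 = [x2[i] for i in held2]
        m1, m2 = statistics.fmean(train1), statistics.fmean(train2)
        grand = statistics.fmean(train1 + train2)
        models = {"null": (grand, grand), "full": (m1, m2)}
        # Score each model on the left-out scores.
        for name, (a, b) in models.items():
            squared = [(y - a) ** 2 for y in test1] + [(y - b) ** 2 for y in test2]
            per_fold[name].append(math.sqrt(statistics.fmean(squared)))
    return {name: statistics.fmean(values) for name, values in per_fold.items()}


# Usage: one simulated experiment with n1 = n2 = 50 and a true difference of 0.3.
rng = random.Random(2)
x1 = [rng.gauss(0.0, 1.0) for _ in range(50)]
x2 = [rng.gauss(0.3, 1.0) for _ in range(50)]
scores = cross_validated_rmse(x1, x2, k=10)
best = min(scores, key=scores.get)
```

Folding each group separately keeps both groups represented in every training and test split.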

24 Cross validation When μ2 - μ1=0.3, 5-fold cross validation

25 Cross validation When μ2 - μ1=0.8, 5-fold cross validation

26 Cross validation When μ2 - μ1=0.0, 5-fold cross validation

27 Optional stopping
Actual use: μ2 - μ1=0.3, 10-fold cross validation. Start with n1=n2=10 and compute cross-validated RMSE. Add 10 scores and repeat until n1=n2=200.

28 Optional stopping
Actual use: μ2 - μ1=0.8, 10-fold cross validation. Start with n1=n2=10 and compute cross-validated RMSE. Add 10 scores and repeat until n1=n2=200.

29 Optional stopping
Actual use: μ2 - μ1=0.0, 10-fold cross validation. Start with n1=n2=10 and compute cross-validated RMSE. Add 10 scores and repeat until n1=n2=200.
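The accumulate-and-recheck loop can be sketched as follows. This is a minimal illustration under stated assumptions: the names `cv_rmse` and `optional_stopping` are hypothetical, folds are formed by position (score i goes to fold i mod k), which is reasonable here only because the simulated scores arrive in random order, and only the null and full models are compared.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def cv_rmse(x1, x2, k=10):
    """k-fold cross-validated RMSE for the null and full models."""
    scores = {"null": 0.0, "full": 0.0}
    for f in range(k):
        # Fold f holds out the scores at positions f, f + k, f + 2k, ...
        train1 = [v for i, v in enumerate(x1) if i % k != f]
        train2 = [v for i, v in enumerate(x2) if i % k != f]
        test1, test2 = x1[f::k], x2[f::k]
        m1, m2 = mean(train1), mean(train2)
        grand = mean(train1 + train2)
        for name, (a, b) in (("null", (grand, grand)), ("full", (m1, m2))):
            squared = [(y - a) ** 2 for y in test1] + [(y - b) ** 2 for y in test2]
            scores[name] += math.sqrt(mean(squared)) / k
    return scores


def optional_stopping(delta, step=10, stop=200, k=10, seed=0):
    """Add `step` scores per group at a time and record the best model so far."""
    rng = random.Random(seed)
    x1, x2, history = [], [], []
    while len(x1) < stop:
        x1 += [rng.gauss(0.0, 1.0) for _ in range(step)]
        x2 += [rng.gauss(delta, 1.0) for _ in range(step)]
        scores = cv_rmse(x1, x2, k)
        best = min(scores, key=scores.get)  # model with the smallest RMSE
        history.append((len(x1), best, scores))
    return history


# Usage: follow the preferred model as data accumulate for a true difference of 0.3.
for n, best, _ in optional_stopping(delta=0.3):
    print(n, best)
```

Each entry in the history is an intermediate decision; the preferred model can change as more scores are added.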

30 Cross validation
At each step, you should follow the data and use the best model for minimizing RMSE. As the data change, so does your model. You can have an intermediate decision, but still expect it to change. If you have to make a decision with the current data, it makes sense to choose the best model. Note that the best model is not necessarily a good model. You have to judge whether the RMSE is small enough for whatever purpose you have in mind.

31 Prediction / validation
Cross validation and test validation naturally generalize to more complicated models and experimental designs (interactions, nonlinear models). Details of how to generate validation “folds” can get complicated. It’s mostly a matter of being careful about generating representative folds and not inputting your own bias. There is no need to use RMSE; other “cost” functions work in a similar way.

32 Conclusions
Prediction / validation seems like a viable approach. It encourages data accumulation. But it gives up on the idea of establishing “truth” from data. Instead, it focuses on practical uses of data. There are Bayesian methods that have the same goal. They are better if you have useful prior knowledge.

