PSY 626: Bayesian Statistics for Psychological Science

Prediction
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2018, Purdue University
11/16/2018

Hypothesis tests
- Hypothesis tests are commonly used as part of a method to establish scientific “truth”: Is there an effect? What should I believe?
- An alternative approach is to give up on “truth” and instead focus on “prediction”. The question is not “Is there an effect?” or “What should I believe?” but rather “How should I behave?”
- Follow the data, but do not follow it blindly.
- Build a quantitative model, but test the model.

Model building
- Suppose you have two samples and you are interested in the means.
- Further suppose that the population properties are: μ1 = 0, μ2 = 0.3, σ1 = σ2 = 1.
- Typically, we would draw random samples from each group and run a t-test to determine whether we should treat the means as being different.
[Slide diagram relating Prediction to Treatment, Theory, and Future work]
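A minimal sketch of this setup, assuming Python with NumPy and SciPy (the slides show no code, so everything here is illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 10
x1 = rng.normal(0.0, 1.0, n)  # group 1: mu1 = 0, sigma1 = 1
x2 = rng.normal(0.3, 1.0, n)  # group 2: mu2 = 0.3, sigma2 = 1

# Treat the means as different if the two-sample t-test rejects at p < .05
t, p = stats.ttest_ind(x1, x2)
print(f"t = {t:.2f}, p = {p:.3f}")
```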

Model building
- We typically build the following kind of model: the score for subject k in group i is related to the grand mean, to a deviation from the grand mean due to being in group 1 or group 2, and to random noise.
- This model gives mean values for each group.
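In symbols (the slide's equation did not survive the transcript; this is the standard two-group reconstruction of the verbal description above):

```latex
X_{ik} = \mu + \alpha_i + \epsilon_{ik}, \qquad \epsilon_{ik} \sim N(0, \sigma^2)
```

so the model's mean for group i is μi = μ + αi.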

Model building
- Draw samples (n1 = n2) from populations having μ1 = 0, μ2 = 0.3, σ1 = σ2 = 1.
- Construct different models that vary in their estimates of the mean values:
  - Null model: both groups share the grand mean.
  - Full model: each group gets its own sample mean.
  - Hypothesis testing model: the null model if we do not reject H0 (p ≥ .05), the full model if we do reject H0 (p < .05).
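Concretely, with group sample means x̄1 and x̄2 and grand mean x̄ (notation mine, consistent with the setup above):

```latex
\text{Null: } \hat{\mu}_1 = \hat{\mu}_2 = \bar{x}
\qquad
\text{Full: } \hat{\mu}_1 = \bar{x}_1,\ \hat{\mu}_2 = \bar{x}_2
\qquad
\text{HT: } \begin{cases} \text{Full model} & \text{if } p < .05 \\ \text{Null model} & \text{if } p \ge .05 \end{cases}
```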

Small samples: 20 experiments, n1 = n2 = 10.

Bigger samples: 20 experiments, n1 = n2 = 50.

Big samples: 20 experiments, n1 = n2 = 100.

Model fit/error
- A standard way of judging the quality of a model is by its fit to a data set.
- One fit measure is root mean squared error (RMSE).
- We want a model with low RMSE.
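The slide's formula is not in the transcript; one form consistent with the comparisons that follow measures the model's group means against the true population means (an assumption on my part):

```latex
\mathrm{RMSE} = \sqrt{\tfrac{1}{2}\left[(\hat{\mu}_1 - \mu_1)^2 + (\hat{\mu}_2 - \mu_2)^2\right]}
```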

Checking model approaches
- Draw samples (n1 = n2) from populations having μ1 = 0, μ2 = 0.3, σ1 = σ2 = 1.
- Repeat for 10,000 simulated experiments.
- Compute RMSE for each model and average across experiments.
- Vary the sample size n1 = n2.
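A runnable sketch of this simulation, assuming Python with NumPy/SciPy and the RMSE definition above (all function names are mine, not the slides'):

```python
import numpy as np
from scipy import stats

MU = (0.0, 0.3)  # true population means
SIGMA = 1.0

def rmse(mu_hat):
    """RMSE of a model's two group means against the true means."""
    return np.sqrt(np.mean((np.asarray(mu_hat) - np.asarray(MU)) ** 2))

def one_experiment(n, rng):
    x1 = rng.normal(MU[0], SIGMA, n)
    x2 = rng.normal(MU[1], SIGMA, n)
    grand = np.mean(np.concatenate([x1, x2]))
    null_model = (grand, grand)            # one shared mean
    full_model = (x1.mean(), x2.mean())    # separate group means
    _, p = stats.ttest_ind(x1, x2)
    ht_model = full_model if p < .05 else null_model
    return [rmse(m) for m in (null_model, full_model, ht_model)]

rng = np.random.default_rng(0)
for n in (10, 50, 100):
    avg = np.mean([one_experiment(n, rng) for _ in range(10_000)], axis=0)
    print(f"n={n:3d}  null={avg[0]:.3f}  full={avg[1]:.3f}  HT={avg[2]:.3f}")
```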

Comparing models: μ2 - μ1 = 0.3.

Comparing models
- For small samples, the null model provides the smallest average RMSE.
- For large samples, the full model provides the smallest average RMSE.

Comparing models
- There is always a better model (on average) than the one derived by hypothesis testing.
- Hypothesis testing (on average) leads to overfitting for some small samples (when it rejects).
- Hypothesis testing (on average) leads to underfitting for some large samples (when it does not reject).

Bigger effects: the pattern is similar for other effect sizes, here μ2 - μ1 = 0.8.

Null effects: the pattern is similar when there is no effect, μ2 - μ1 = 0.

Known unknowns
- But these simulations are all theoretical: to compute RMSE we need to know the true means.
- However, we can do something similar if we compute RMSE relative not to the true means but to test data.

Prediction / validation
- Suppose I build my models from one set of data, x1i and x2i, and then test them with another set of data, y1i and y2i.
- Here, we compute RMSE relative to the means of the test data set.
- You could also compute RMSE relative to individual data points.
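In symbols (my notation), with test-set group means ȳ1 and ȳ2:

```latex
\mathrm{RMSE}_{\text{test}} = \sqrt{\tfrac{1}{2}\left[(\hat{\mu}_1 - \bar{y}_1)^2 + (\hat{\mu}_2 - \bar{y}_2)^2\right]}
```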

Small effect: μ2 - μ1 = 0.3.

Bigger effect: μ2 - μ1 = 0.8.

Null effect: μ2 - μ1 = 0.

Prediction / validation
- Differences are easier to see by subtracting the full model's RMSE from each other model's RMSE.
- [Figure panels: μ2 - μ1 = 0.8, μ2 - μ1 = 0.3, μ2 - μ1 = 0]
- The smallest number (the biggest negative) indicates the best model, the one with the smallest RMSE.

Prediction / validation
- This looks good: at least on average, the RMSE patterns for testing against the means of new data are similar to those for testing against the true means.
- If we want to deduce which model best predicts values, we can pick the model that minimizes the test RMSE.
- Cost: we have to run the experiment twice.
- Testing does not require equal sample sizes, but you trade off model development against model testing.

Cross validation
- We partly avoid that cost by using cross-validation to approximate RMSE_test.
- Divide the data sets x1i and x2i into multiple subsets (a common choice is 10 subsets).
- Build your model using all but one of the subsets.
- Compute RMSE for the left-out subset.
- Repeat for all possible combinations: 10 build-and-test “folds”.
- Compute the mean RMSE across the subsets.
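A sketch of this procedure, scoring each fold against the held-out individual scores (one of the two options mentioned earlier; function names are mine):

```python
import numpy as np

def cv_rmse(x1, x2, k=10, rng=None):
    """k-fold cross-validated RMSE for the null and full models,
    scored against the held-out individual scores. Folds are drawn
    within each group so both groups appear in every fold."""
    if rng is None:
        rng = np.random.default_rng()
    folds1 = np.array_split(rng.permutation(x1), k)
    folds2 = np.array_split(rng.permutation(x2), k)
    errs = {"null": [], "full": []}
    for i in range(k):
        # Build the models from all folds except fold i
        train1 = np.concatenate([f for j, f in enumerate(folds1) if j != i])
        train2 = np.concatenate([f for j, f in enumerate(folds2) if j != i])
        test = np.concatenate([folds1[i], folds2[i]])
        grand = np.mean(np.concatenate([train1, train2]))
        # Predicted value for each held-out score under each model
        pred_null = np.full_like(test, grand)
        pred_full = np.concatenate([
            np.full_like(folds1[i], train1.mean()),
            np.full_like(folds2[i], train2.mean()),
        ])
        errs["null"].append(np.sqrt(np.mean((test - pred_null) ** 2)))
        errs["full"].append(np.sqrt(np.mean((test - pred_full) ** 2)))
    # Mean RMSE across the k held-out folds
    return {m: float(np.mean(v)) for m, v in errs.items()}
```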

Cross validation: μ2 - μ1 = 0.3, 5-fold cross-validation.

Cross validation: μ2 - μ1 = 0.8, 5-fold cross-validation.

Cross validation: μ2 - μ1 = 0.0, 5-fold cross-validation.

Optional stopping
- Actual use: μ2 - μ1 = 0.3, 10-fold cross-validation.
- Start with n1 = n2 = 10 and compute the cross-validated RMSE.
- Add 10 scores and repeat until n1 = n2 = 200.
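A sketch of this accumulation loop, reusing the cv_rmse helper sketched above (my construction, not the slides' code):

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 1.0, 10)
x2 = rng.normal(0.3, 1.0, 10)
while len(x1) <= 200:
    scores = cv_rmse(x1, x2, k=10, rng=rng)
    best = min(scores, key=scores.get)
    print(f"n={len(x1):3d}  best={best}  (RMSE={scores[best]:.3f})")
    # Add 10 scores per group and re-evaluate
    x1 = np.concatenate([x1, rng.normal(0.0, 1.0, 10)])
    x2 = np.concatenate([x2, rng.normal(0.3, 1.0, 10)])
```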

Optional stopping
- Actual use: μ2 - μ1 = 0.8, 10-fold cross-validation.
- Start with n1 = n2 = 10 and compute the cross-validated RMSE.
- Add 10 scores and repeat until n1 = n2 = 200.

Optional stopping
- Actual use: μ2 - μ1 = 0.0, 10-fold cross-validation.
- Start with n1 = n2 = 10 and compute the cross-validated RMSE.
- Add 10 scores and repeat until n1 = n2 = 200.

Cross validation
- At each step, you should follow the data and use the best model for minimizing RMSE.
- As the data change, so does your model.
- You can make an intermediate decision, but still expect it to change.
- If you have to make a decision with the current data, it makes sense to choose the best model.
- Note that the best model is not necessarily a good model: you have to judge whether the RMSE is small enough for whatever purpose you have in mind.

Prediction / validation
- Cross-validation and test validation naturally generalize to more complicated models and experimental designs (interactions, nonlinear models).
- Details of how to generate validation “folds” can get complicated; it is mostly a matter of being careful to generate representative folds and not to inject your own bias.
- There is no need to use RMSE: other “cost” functions work in a similar way.

Conclusions
- Prediction / validation seems like a viable approach, and it encourages data accumulation.
- But it gives up on the idea of establishing “truth” from data; instead, it focuses on practical uses of data.
- There are Bayesian methods that have the same goal; they are better if you have useful prior knowledge.