Bayesian data analysis


Bayesian data analysis
R101: A practical guide to making R your everyday statistical tool (PSY532), Lecture 5

Programme
- Bayes' Rule
- Simple "real-world" Bayesian problems: a demonstration of the rule's rationality
- Bayesian data analysis and its rationality
- Bayesian ANOVA example, described in detail in the readings from the Kruschke textbook
- Demonstration: a Bayesian approach to our most recent analyses
The logistic regression and multilevel modelling demonstrations will make use of a different data set.

Bayes' Rule
Reading: Navarro lecture 4; Kruschke Ch 4
Bayes' Rule is derivable from the laws of probability and is premised on the idea that probability is a degree of belief inside a learner's head:

P(h | d) = P(d | h) P(h) / P(d)

- The posterior, P(h | d): the probability that hypothesis h is true given that we (the learners) have observed data d; i.e., our degree of belief in h after seeing the data. The bar before d marks the fact that we are evaluating this expression after having seen the data.
- The likelihood, P(d | h): our degree of belief in seeing the data if h is true (given our assumptions about the generating mechanism behind all possible datasets, i.e., the sample space).
- The prior, P(h): our degree of belief in h before seeing the data.
- The marginal likelihood, P(d): the likelihood of the data under hypothesis h and all other possible hypotheses in the hypothesis space H; it can also be expressed as the sum of P(d | h') P(h') over all h' in H.
Lecture example: cough and cancer (alternative hypotheses: lung cancer and stomach flu).
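The arithmetic of the rule can be sketched with a toy cough-and-cancer style calculation. This is written in Python for portability (the course itself uses R), and every number below is invented for illustration, not taken from the lecture:

```python
# Toy Bayes' Rule calculation with two hypotheses (all numbers invented).
priors = {"lung cancer": 0.01, "stomach flu": 0.99}      # P(h)
likelihoods = {"lung cancer": 0.90, "stomach flu": 0.30}  # P(d | h), d = "cough"

# Marginal likelihood: sum of P(d | h') P(h') over the hypothesis space H.
marginal = sum(likelihoods[h] * priors[h] for h in priors)

# Posterior for each hypothesis: P(h | d) = P(d | h) P(h) / P(d).
posteriors = {h: likelihoods[h] * priors[h] / marginal for h in priors}

print(posteriors)  # the posteriors sum to 1
```

Note how the rare-but-alarming hypothesis stays improbable after one cough: the strong prior for stomach flu dominates the moderately higher likelihood under cancer.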

Bayesian inference: rational in the real world
Reading: Navarro lecture 4
The taxicab problem illustrates that, in the real world, the prior matters. The game show problem illustrates that, in the real world, the likelihood matters. (The following slides describing the problems and their solutions are from Daniel Navarro's 2011 lectures on Computational Cognitive Science at The University of Adelaide; the original slide linked to the full course and to the specific lecture used, Lecture 4.)

The taxicab problem
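The slide image is not reproduced in the transcript, but in the classic formulation of the taxicab problem 85% of a city's cabs are Green and 15% are Blue, and a witness who identifies colours correctly 80% of the time reports that the cab in an accident was Blue. A minimal sketch of the solution, assuming those classic numbers:

```python
# Taxicab problem, assuming the classic numbers: 15% of cabs are Blue,
# 85% Green; the witness reports a cab's colour correctly 80% of the time.
prior_blue, prior_green = 0.15, 0.85
p_says_blue_if_blue = 0.80    # witness correct
p_says_blue_if_green = 0.20   # witness mistaken

# Marginal probability of the report "Blue".
p_says_blue = (p_says_blue_if_blue * prior_blue
               + p_says_blue_if_green * prior_green)

# Posterior probability the cab really was Blue.
posterior_blue = p_says_blue_if_blue * prior_blue / p_says_blue
print(round(posterior_blue, 3))  # ≈ 0.414 — still under 50%: the prior matters
```

Despite a fairly reliable witness, the low base rate of Blue cabs keeps the posterior below one half, which is exactly the point the slide makes about the prior.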

The game show problem
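The game show problem is presumably the Monty Hall problem. A quick simulation (a sketch of my own, not from the slides) shows why the likelihood of the host's door-opening behaviour makes switching the better strategy:

```python
import random

def play(switch, rng):
    """One round of the Monty Hall game; returns True if the player wins."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    choice = rng.choice(doors)
    # The host opens a non-chosen, non-prize door: this behaviour is the
    # likelihood that makes the two remaining doors unequal.
    opened = rng.choice([d for d in doors if d != choice and d != prize])
    if switch:
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == prize

rng = random.Random(1)
n = 20000
wins_switch = sum(play(True, rng) for _ in range(n)) / n
wins_stay = sum(play(False, rng) for _ in range(n)) / n
print(wins_switch, wins_stay)  # roughly 2/3 vs 1/3
```

Switching wins about two thirds of the time because the host's choice of door carries information; a learner who ignores that likelihood concludes, wrongly, that the two remaining doors are equally good.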

Argued to be more rational in data analysis, too
Reading: Kruschke Ch 11 (not provided)
In the Bayesian version of the analyses we've been doing, the linear model and its assumptions are expressed in the likelihood function, which takes the form of a normal distribution (or a t-distribution in some cases). For each predictor there is a prior distribution (usually of gamma form) expressing the learner's beliefs about the strength of the relationship between the predictor and the outcome variable. The marginal likelihood is usually not computed.
All four components of Bayes' Rule are probability distributions; that is, each has a certain shape, expressed as a function (i.e., a formula). For probability density functions (continuous variables), the area under the shape (i.e., the integral of the function) equals 1. For probability mass functions (categorical variables), the categories' probabilities sum to 1.
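The "area equals 1" and "probabilities sum to 1" properties can be checked numerically. A small sketch in Python (none of this is from the slides), using a grid approximation for a density and a fair die for a mass function:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Standard formula for the normal probability density function."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Density: approximate the integral of the standard normal pdf over
# [-8, 8] with a fine grid (a Riemann sum); the tails beyond 8 are negligible.
step = 0.001
area = sum(normal_pdf(i * step) * step for i in range(-8000, 8001))
print(round(area, 6))  # ≈ 1.0

# Mass function: the probabilities of a fair die's six faces sum to 1 exactly.
pmf = {face: 1 / 6 for face in range(1, 7)}
print(sum(pmf.values()))
```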

Bayesian data analysis logic, continued
The marginal likelihood is the probability of the data averaged over all parameter values the model allows. The reason we usually need specialised techniques to estimate this value is that it takes the form of a complex integral. For example, for a model with three parameters theta-1, theta-2, theta-3:

p(d) = ∫∫∫ p(d | θ1, θ2, θ3) p(θ1, θ2, θ3) dθ1 dθ2 dθ3

It follows from Bayes' Rule that the posterior is a weighted average of the prior mean and the data, with the weighting corresponding to the relative precisions of the prior and the likelihood. Precisions are related to standard deviations and reflect the learner's (i.e., the analyst's) uncertainty in prior beliefs or, alternatively, in model predictions. The precisions can themselves be estimated from the data through the use of hyperpriors. Consequently, if the prior is fairly vague and the data are numerous, the posterior will be near the parameter values that maximise the likelihood of the data; "natural shrinkage" occurs, similar to the shrinkage in multilevel modelling.
Possible parameters: phi – probability of being human; theta-one – number of agreed-to survey statements by humans; theta-two – number of agreed-to survey statements by robots.
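The precision-weighting claim above can be checked directly in the conjugate normal-normal case with known data variance, a textbook result rather than anything from the slides. All numbers below are invented for illustration:

```python
# Normal likelihood with known variance, normal prior on the mean.
prior_mean, prior_sd = 0.0, 2.0
data = [4.1, 3.8, 5.0, 4.4]   # invented observations
data_sd = 1.0                  # assumed known

n = len(data)
sample_mean = sum(data) / n
prior_precision = 1 / prior_sd**2
data_precision = n / data_sd**2   # precision of the sample mean

# Posterior mean: precision-weighted average of prior mean and sample mean.
post_precision = prior_precision + data_precision
post_mean = (prior_precision * prior_mean
             + data_precision * sample_mean) / post_precision
print(round(post_mean, 3))  # pulled slightly from the sample mean toward the prior
```

With a vague prior (large prior_sd, so tiny prior precision) or plentiful data (large data precision), the posterior mean sits essentially at the sample mean, which is the shrinkage behaviour described above.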

Bayesian data analysis: typical steps
1. Determine the priors, likelihood, and hyperpriors (or have R do this for you).
2. Estimate the posterior (and possibly the marginal likelihood) through a Markov chain Monte Carlo (MCMC) process such as Metropolis-Hastings or Gibbs sampling. Since there is a prior for each predictor, there is a posterior for each parameter.
3. Determine whether the parameter value predicted by the null hypothesis (e.g., a slope value of zero) falls within the 95% Highest Density Interval (HDI) of the posterior for that parameter. It does in the case shown on the slide: a posterior distribution over a regression slope value.
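Step 3 can be sketched as a small function that finds the 95% HDI, the shortest interval containing 95% of the posterior samples, and checks whether the null value lies inside it. This is a generic Python sketch with invented samples standing in for real MCMC output (the lecture's own demonstrations use R):

```python
import random

def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (the HDI)."""
    xs = sorted(samples)
    n_in = int(round(mass * len(xs)))
    # Slide a window of n_in points along the sorted samples and
    # keep the narrowest one.
    width, i = min((xs[i + n_in - 1] - xs[i], i)
                   for i in range(len(xs) - n_in + 1))
    return xs[i], xs[i + n_in - 1]

# Fake MCMC output for a regression slope: draws centred near 0.1
# (invented numbers standing in for a real posterior).
rng = random.Random(42)
slope_samples = [rng.gauss(0.1, 0.2) for _ in range(10000)]

low, high = hdi(slope_samples)
print((round(low, 2), round(high, 2)))
print("null value 0 inside 95% HDI:", low <= 0 <= high)
```

Here zero falls inside the HDI, so, as on the slide, the null value remains among the believable parameter values.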

Bayesian data analysis: advantages over the standard "frequentist" approach
- The use of priors allows us to incorporate findings from previous studies into the data analysis. This is particularly useful when there is a large number of preceding studies but the current study's data set is small.
- The HDI is more interpretable than the frequentist "confidence interval": if a parameter value falls within the 95% HDI, it is among the most believable parameter values. A frequentist 95% confidence interval, by contrast, only says that it was produced by a procedure that captures the true parameter value in 95% of repeated samples; it licenses no direct probability statement about the parameter itself.
- A subtle but philosophically important point discussed at length by Kruschke: the Bayesian analyst's model assumptions are transparent and not dependent on sample size (degrees of freedom).

Bayesian ANOVA example
Reading: Kruschke Ch 18, 19
Likelihood function: Normal
Prior over slope parameters: Normal
(Hyper)prior over the prior precisions: Folded-t
Prior over error (the likelihood function's precision): Uniform

A Bayesian approach to our most recent analyses
Reading: Gelman et al. 2008
Demonstration: the bayesglm function in the arm package, for generalized linear modelling using the Bayesian approach. For mixture models (discussed in Lecture 4), you could explore the package MCMCglmm and the function of the same name within it.
Likelihood function: Normal
Prior over slope parameters: Cauchy, t
(Hyper)prior over the prior precisions: none (instead, predictors are standardized and the prior SD, which is related to the precision, is fixed at 0.5 or 1)
Prior over error (the likelihood function's precision): Uniform

Reading
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383. Available online: http://www.stat.columbia.edu/~gelman/research/published/priors11.pdf
Kruschke, J. K. (2011). Doing Bayesian Data Analysis. Oxford: Elsevier. Chapter 4 "Bayes' Rule", Chapter 18 "Bayesian Oneway ANOVA", and Chapter 19 "Metric Predicted Variable with Multiple Nominal Predictors" will be provided online.