Introduction

- Bayesian methods are becoming very important in the cognitive sciences
- Bayesian statistics is a framework for doing inference in a principled way, based on probability theory
- Three types of application:
  - Bayes in the head: use Bayes as a theoretical metaphor, assuming that when people make inferences they apply (at some level) Bayesian methods (Tenenbaum, Griffiths, Yuille, Chater, Kemp, ...)
  - Bayes for data analysis: instead of using frequentist estimation, null hypothesis testing, and so on, use Bayesian inference to analyze data (Kruschke)
  - Bayes for modeling: use Bayesian inference to relate models of psychological processes to behavioral data

Psychological Models in Bayesian Framework

- Psychological models can be thought of as generative statistical processes, mapping latent parameters to observed data
  [Figure: parameter space mapped to data space by the data generating function]

Psychological Models in Bayesian Framework

- The data generating function (primarily) and the prior distribution on parameters (under-used) formalize the model
  [Figure: prior over the parameter space, linked to the data space by the data generating function]

Psychological Models in Bayesian Framework

- This model, the prior plus the data generating function (aka the likelihood function), predicts the nature of observed data
  [Figure: the prior and data generating function yield a prior predictive distribution over the data space]

Psychological Models in Bayesian Framework

- Once data are observed, probability theory (via Bayes' theorem) allows the prior over parameters to be updated to a posterior
  [Figure: observed data update the prior to a posterior over the parameter space]
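
For reference, the update the slide describes is Bayes' rule: with parameters θ, data y, prior p(θ), and data generating function p(y | θ),

    p(θ | y) = p(y | θ) p(θ) / p(y),   where   p(y) = ∫ p(y | θ) p(θ) dθ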

Psychological Models in Bayesian Framework

- The posterior distribution over parameters quantifies uncertainty about what is known and unknown, and makes predictions
  [Figure: the posterior over parameters generates a posterior predictive distribution over the data space]

Psychological Models in Bayesian Framework

- Bayesian inference is a complete framework for representing and incorporating information in the context of psychological modeling
  [Figure: prior, posterior, prior predictive, and posterior predictive distributions linked by the data generating function]
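
To make the four distributions concrete, here is a minimal sketch (not from the slides) for a toy binomial data generating function with a single hypothetical "rate" parameter theta, computed by grid approximation with NumPy/SciPy:

    import numpy as np
    from scipy.stats import binom

    # Toy data generating function: k correct responses out of n trials,
    # governed by a latent rate parameter theta (a stand-in for any
    # psychological parameter).
    n, k_obs = 20, 15                        # hypothetical observed data
    theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space

    # Prior over the parameter space (uniform here).
    prior = np.ones_like(theta)
    prior /= prior.sum()

    # Data generating function (likelihood) evaluated on the grid.
    like = binom.pmf(k_obs, n, theta)

    # Posterior: the prior updated by the observed data (Bayes' rule).
    posterior = prior * like
    posterior /= posterior.sum()

    # Prior predictive and posterior predictive over the data space.
    k = np.arange(n + 1)
    prior_pred = np.array([np.sum(binom.pmf(ki, n, theta) * prior) for ki in k])
    post_pred = np.array([np.sum(binom.pmf(ki, n, theta) * posterior) for ki in k])

    print("posterior mean of theta:", np.sum(theta * posterior))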

4.3. Repeated Measurement of IQ

An example of the role of information, in the prior, in the data, or both, in influencing estimation

Repeated IQ Measures

- Three people each have their IQ assessed 3 times by repeated versions of the same test
- The goals are:
  - To infer each person's IQ
  - To infer the accuracy or reliability of the testing instrument

Four Scenarios

- We do the inference four times
- Their scores are either:
  - Imprecise test: (90, 95, 100), (105, 110, 115), and (150, 155, 160)
  - Precise test: (94, 95, 96), (109, 110, 111), and (154, 155, 156)
- The prior placed on each person's IQ is either:
  - Vague prior: a flat prior from 0 to 300
  - Informed prior: a Gaussian prior with a mean of 100 and a standard deviation of 15

Results Summary

- The expectations of the posterior IQ distributions in each case are approximately:

  Data               Vague Prior    Informed Prior
  (90, 95, 100)      95
  (105, 110, 115)    110
  (150, 155, 160)    155
  (94, 95, 96)       95
  (109, 110, 111)    110
  (154, 155, 156)    155
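
A minimal grid-approximation sketch of this kind of inference (not the WinBUGS model used in the course) for a single person under the imprecise test, comparing the vague and informed priors on IQ; the measurement standard deviation sigma is inferred jointly, here with an assumed flat prior over the grid:

    import numpy as np
    from scipy.stats import norm

    scores = np.array([150.0, 155.0, 160.0])   # the extreme imprecise-test case

    # Grids over the person's IQ (mu) and the test's error sd (sigma).
    mu = np.linspace(0.0, 300.0, 601)
    sigma = np.linspace(0.5, 60.0, 120)        # flat prior on sigma (assumption)
    MU, SIG = np.meshgrid(mu, sigma, indexing="ij")

    # Log-likelihood of the three repeated measurements.
    loglike = norm.logpdf(scores[None, None, :],
                          loc=MU[..., None], scale=SIG[..., None]).sum(axis=-1)

    def posterior_mean_iq(log_prior_mu):
        # Combine likelihood and prior on mu, normalize on the grid,
        # and return the marginal posterior mean of mu.
        logpost = loglike + log_prior_mu[:, None]
        post = np.exp(logpost - logpost.max())
        post /= post.sum()
        return np.sum(mu * post.sum(axis=1))

    vague = np.zeros_like(mu)                          # flat prior from 0 to 300
    informed = norm.logpdf(mu, loc=100.0, scale=15.0)  # Gaussian(100, 15) prior

    print("vague prior:   ", round(posterior_mean_iq(vague), 1))
    print("informed prior:", round(posterior_mean_iq(informed), 1))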

Imprecise Test

- The informed prior changes the estimate of the extreme case

Precise Test

- The data provide information that overwhelms the priors

Main Messages

- Bayesian methods are naturally able to incorporate relevant prior information
  - This must improve inference, because additional information is being used
- The IQ example shows how inferences from an imprecise test can be influenced by prior knowledge about IQ distributions
- There is a familiar catch-cry that "with enough data, the influence of the prior will disappear"
  - This is often true, but it is not the best way to think about things
  - Irrelevant data will not update knowledge of a psychological parameter
  - The same amount of data, if it provides more information, will lessen the influence of the prior
- Bayesian statistics is about analyzing the available information

6.1. Exams and Quizzes

An example of using latent mixture models to explain data as coming from more than one type of cognitive process

Exam Scores

- 16 people take a 40-item true-or-false test, and score 17, 18, 21, 21, 22, 28, 31, 31, 34, 34, 35, 35, 35, 36, 36, 39
- Model the scores as a latent mixture of guessing and knowledge groups
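
A minimal sketch of one way to fit such a latent mixture (not the course's WinBUGS code): guessers answer each item correctly with probability 0.5, knowers with an unknown rate phi, given an assumed Uniform(0.5, 1) prior and an assumed 0.5 prior probability of being a knower; a small Gibbs sampler alternates between the group assignments and phi.

    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(0)
    scores = np.array([17, 18, 21, 21, 22, 28, 31, 31, 34, 34, 35, 35, 35, 36, 36, 39])
    n_items = 40

    phi_grid = np.linspace(0.5, 1.0, 201)   # grid for the knowledge-group rate
    phi = 0.75
    z_samples = []

    for it in range(4000):
        # 1. Sample each person's group given phi (prior P(knower) = 0.5 assumed).
        p_guess = binom.pmf(scores, n_items, 0.5)
        p_know = binom.pmf(scores, n_items, phi)
        p_z1 = 0.5 * p_know / (0.5 * p_know + 0.5 * p_guess)
        z = (rng.random(len(scores)) < p_z1).astype(int)

        # 2. Sample phi given the current knowers, via a discretized conditional.
        k_know = scores[z == 1].sum()
        n_know = n_items * (z == 1).sum()
        logp = k_know * np.log(phi_grid) + (n_know - k_know) * np.log(1 - phi_grid + 1e-12)
        w = np.exp(logp - logp.max())
        phi = rng.choice(phi_grid, p=w / w.sum())

        if it >= 1000:                       # discard burn-in
            z_samples.append(z.copy())

    # Posterior probability that each person belongs to the knowledge group.
    p_knower = np.mean(z_samples, axis=0)
    for s, p in zip(scores, p_knower):
        print(s, round(float(p), 2))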

Latent Assignment Results

- The people who scored 22 or below are all classified as "guessers" with certainty
- The people who scored 31+ are all classified as "knowing" with certainty
- There is uncertainty about the classification of the person who scored 28

Extensions and Main Messages

- The following exercises in the chapter extend this basic latent mixture model to make it more psychologically interesting and plausible:
  - Allow for individual differences in the "knowledge" group
  - Allow for the base rate of guessers vs. knowers to be inferred (which in turn influences inference)
- Latent mixtures are a basic but probably under-used tool for cognitive science
  - They account for data as hierarchical mixtures of quantitatively and qualitatively different processes

Motivating Example: Memory Retention (from Chapter 8)

An example of some ways to model individual differences in cognitive processes, and of the use of joint posterior and posterior predictive distributions

Cognitive Model and Data

- This example uses a "cognitive process model" for memory retention
- The probability of remembering an item is an exponential decay function of time
  - With a parameter controlling the rate of forgetting, and another controlling the baseline (see the formula below)
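
The slide does not give the formula, but one common parameterization consistent with this description (assumed here) is

    θ(t) = min(1, exp(−α·t) + β)

where α controls the rate of forgetting and β the baseline level of recall; the number of items recalled out of 18 at lag t can then be modeled as Binomial(18, θ(t)).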

Fabricated Data

- Our case study uses fabricated data in the table:
  - showing the number of items recalled out of 18
  - by 3 subjects
  - at 9 time intervals
- The data also have a missing subject and a missing interval, to look at prediction

No Individual Differences

- Assumes every subject has the same parameterization

Joint and Marginal Posterior Distributions

- Note that the joint posterior distribution has more information than the marginal distributions

Posterior Predictive Distributions and Data

- Posterior predictives are easy to generate in WinBUGS
- These show clear evidence of individual differences
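
Outside WinBUGS the same idea is easy to express: take posterior draws of the parameters, push each through the data generating function, and compare the simulated counts with the observed data. A small sketch under the assumed retention model above, with placeholder posterior draws and hypothetical lags:

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.array([1, 2, 4, 7, 12, 21, 35, 59, 99])   # hypothetical lags
    n_items = 18

    def theta(alpha, beta, t):
        # Assumed retention model: exponential decay plus a baseline.
        return np.minimum(1.0, np.exp(-alpha * t) + beta)

    # Placeholders standing in for MCMC draws of the parameters; in practice
    # these come from the fitted model.
    alpha_post = rng.normal(0.4, 0.05, size=2000)
    beta_post = rng.normal(0.1, 0.02, size=2000)

    # Posterior predictive: one simulated data set per posterior draw.
    post_pred = np.array([
        rng.binomial(n_items, theta(a, b, t))
        for a, b in zip(alpha_post, beta_post)
    ])
    print(post_pred.mean(axis=0))   # predicted recall counts at each lag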

Complete Individual Differences Model

- Each subject has their own independent parameters for the exponential decay model

Joint and Marginal Posterior Distributions

- The individual differences show in the posterior, but the final (data-less) subject is still modeled by the prior

Posterior Predictive

- Now the first three subjects are well described, but the prediction for the fourth is terrible

Structured Individual Differences

- Now we use hierarchical individual differences, so that the individual subjects' parameters are drawn from distributions, and we also infer the parameters of those "group" distributions (see the sketch below)
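
A generative sketch of this hierarchical structure (simulation only, under the assumed retention model; inference would add an MCMC step over both levels): each subject's parameters are drawn from group-level distributions, and in the full model those group-level parameters are themselves given priors and inferred.

    import numpy as np

    rng = np.random.default_rng(2)
    t = np.array([1, 2, 4, 7, 12, 21, 35, 59, 99])   # hypothetical lags
    n_items, n_subjects = 18, 4

    # Group-level parameters: inferred in the full model, fixed here
    # to illustrative values.
    mu_alpha, sd_alpha = 0.4, 0.1    # forgetting rate: group mean and spread
    mu_beta, sd_beta = 0.1, 0.03     # baseline: group mean and spread

    # Subject-level parameters are drawn from the group distributions ...
    alpha = np.abs(rng.normal(mu_alpha, sd_alpha, size=n_subjects))
    beta = np.abs(rng.normal(mu_beta, sd_beta, size=n_subjects))

    # ... and each subject's recall counts from the retention model.
    theta = np.minimum(1.0, np.exp(-alpha[:, None] * t[None, :]) + beta[:, None])
    recalled = rng.binomial(n_items, theta)
    print(recalled)

This is why the data-less fourth subject still gets sensible predictions: even without data of their own, their parameters are tied to the group-level distributions learned from the other subjects.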

Joint and Marginal Posterior Distributions

- The first three subjects give almost the same parameter inferences as before, but now the fourth subject borrows from what is learned about the others ("sharing statistical strength")

Posterior Predictive

- The posterior predictive now seems reasonable for the fourth subject, showing some regularities, but also some uncertainty, consistent with the individual differences