A Hierarchical Bayesian Look at Some Debates in Category Learning
Michael Lee (UCI) and Wolf Vanpaemel (KU Leuven)

Bayesian Statistics
There are (at least) two ways Bayesian statistics can be applied to understand a cognitive modeling problem like the nature of categories:
- Use Bayes as a statistician would, to analyze models and data.
- Use Bayes as a theoretician would, as a working hypothesis about how the mind solves inference problems.
This talk takes the first approach: how can hierarchical Bayesian methods help us relate models to data in better ways, and further our understanding of category representations?

Debates and Models
We are going to focus on two specific debates in the cognitive modeling literature:
- Exemplar vs prototype representations: what is the role and extent of abstraction in forming category representations?
- Similarity vs rules: what is the role and extent of similarity in constraining category representations?
We are going to focus on two existing models:
- The Generalized Context Model (GCM: Nosofsky, 1986), as an account of how a representation produces category learning behavior.
- The Varying Abstraction Model (VAM: Vanpaemel et al., 2005), as an account of the types of possible category representations.

Hierarchical Bayesian Contributions
Our basic goals are to show how hierarchical Bayesian analysis:
- Encourages theorizing at different levels of psychological abstraction.
- Can convert (hard and limited) model selection problems into (easier and more informative) parameter estimation problems.
- Yields useful additional information, especially when analyzing data from multiple experiments or tasks simultaneously.
- Gives one approach to developing theoretically satisfying priors for different models.

VAM Representations
The VAM assumes categories are represented by merging all, some, or none of the original stimuli. Each candidate representation therefore corresponds to a partition of the exemplars, with each merged group replaced by its average point (see the sketch below).
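
To make the merging idea concrete, here is a minimal sketch, assuming two-dimensional stimulus coordinates and describing a representation as a partition of exemplar indices (the function and variable names are ours, for illustration only):

```python
import numpy as np

def vam_representation(exemplars, partition):
    """Replace each merged group of exemplars with its average point."""
    exemplars = np.asarray(exemplars, dtype=float)
    return np.array([exemplars[cell].mean(axis=0) for cell in partition])

# Four exemplars of one category in a two-dimensional stimulus space
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

exemplar_rep  = vam_representation(X, [[0], [1], [2], [3]])  # merge none
prototype_rep = vam_representation(X, [[0, 1, 2, 3]])        # merge all
partial_rep   = vam_representation(X, [[0, 1], [2, 3]])      # merge some
```

The exemplar model (merge none) and the prototype model (merge all) fall out as the two extreme cases, with every other partition lying in between.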

Generalized Context Model
1. Calculate the distances between the point representing the presented stimulus and the points representing the two categories.
2. Use a generalization function to calculate similarities from these distances.
3. Determine the probability of each category response from the similarities.
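
A minimal sketch of these three steps, omitting the attention weights and response bias parameters that full GCM applications include, and assuming a Euclidean metric with an exponential generalization function:

```python
import numpy as np

def gcm_choice_prob(stimulus, reps_a, reps_b, c=1.0):
    """P(respond 'category A') for one presented stimulus.

    reps_a, reps_b : representing points of the two categories
                     (exemplars, prototypes, or any VAM merging)
    c : steepness of the exponential generalization gradient
    """
    stimulus = np.asarray(stimulus, dtype=float)

    def summed_similarity(reps):
        dists = np.linalg.norm(np.asarray(reps) - stimulus, axis=1)  # step 1
        return np.exp(-c * dists).sum()                              # step 2

    s_a = summed_similarity(reps_a)
    s_b = summed_similarity(reps_b)
    return s_a / (s_a + s_b)                                         # step 3
```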

Combining the VAM and GCM
The first uses of the VAM combined it with the GCM, and made inferences about which representation people were using from their categorization behavior. This approach has several shortcomings:
- It implicitly assumes a uniform prior over all the representations.
- The model selection problem of choosing between representations involves significant computation (see the sketch below).
- There is no formal notion of the relationships between different representations.
- There is no account of where the representations came from.
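
As a sketch of what that model selection amounts to (assuming a data log-likelihood has already been computed for each candidate representation, for example by integrating over the GCM parameters):

```python
import numpy as np

def representation_posterior(log_likelihoods):
    """Posterior over candidate representations under a uniform prior.

    With a flat prior the posterior is just the normalized likelihood,
    so every candidate representation must be scored against the data.
    """
    ll = np.asarray(log_likelihoods, dtype=float)
    w = np.exp(ll - ll.max())   # subtract the max for numerical stability
    return w / w.sum()
```

The number of candidate representations for a category grows as the Bell number of the number of stimuli (4,140 partitions for eight stimuli), which is why evaluating them all quickly becomes expensive.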

Interpreting and Relating VAM Representations
Referring to the color-coded example representations on the slide: blue and yellow are more similar than blue and red, and green is a more sensible representation than gray.

Hierarchical Bayesian Extension
Our hierarchical Bayesian extension adds an account, called the Merge Process, of how the VAM category representations are generated. The Merge Process is driven by two parameters:
- Theta controls the degree of abstraction.
- Gamma controls the emphasis on similarity.

Merge Process
1. Start with the exemplar representation.
2. Do another merge with probability theta; otherwise finish.
3. Calculate the similarity between the current representing points.
4. Calculate the probability that each pair of representing points will be the pair merged, and merge accordingly.
5. Return to step 2.
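
A minimal simulation of this process, under two assumptions we are making explicit: the pair to merge is chosen with probability proportional to its similarity raised to the power gamma, and a merged pair is replaced by the centroid of all the exemplars it covers (the published model's exact functional forms may differ in detail):

```python
import numpy as np

def merge_process(exemplars, theta, gamma, rng):
    """Sample one category representation from the merge process.

    theta : probability of performing another merge at each step
    gamma : emphasis on similarity when choosing which pair to merge
            (gamma = 0 makes every pair equally likely)
    """
    points = [np.asarray(p, dtype=float) for p in exemplars]
    counts = [1] * len(points)              # exemplars behind each point
    while len(points) > 1 and rng.random() < theta:        # steps 1-2
        pairs, weights = [], []
        for i in range(len(points)):                       # steps 3-4
            for j in range(i + 1, len(points)):
                sim = np.exp(-np.linalg.norm(points[i] - points[j]))
                pairs.append((i, j))
                weights.append(sim ** gamma)
        probs = np.asarray(weights) / np.sum(weights)
        i, j = pairs[rng.choice(len(pairs), p=probs)]
        # replace the chosen pair by the centroid of the exemplars it covers
        n = counts[i] + counts[j]
        merged = (counts[i] * points[i] + counts[j] * points[j]) / n
        points = [p for k, p in enumerate(points) if k not in (i, j)] + [merged]
        counts = [c for k, c in enumerate(counts) if k not in (i, j)] + [n]
    return np.array(points)                 # step 5 is the loop itself
```

For example, with the four exemplars X from the earlier sketch, merge_process(X, theta=0.95, gamma=5.0, rng) will usually return something close to the prototype, while theta near 0 almost always returns the untouched exemplar representation.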

Indexing Representations
- High theta values encourage merging, and result in more abstracted (prototype-like) representations.
- High gamma values emphasize similarity in merging, so nearby stimuli are joined.

Priors for VAM Representations
The hierarchical model automatically makes a prior prediction (an "inductive bias") over each of the VAM representations. The inductive bias shown on the slide follows from the priors placed on theta and gamma.
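
That inductive bias can be read off by simulation: draw (theta, gamma) from their priors, run the merge process, and tally how often each representation appears. A Monte Carlo sketch, reusing merge_process from above; the priors here are placeholders, not necessarily the ones used in our analyses:

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(2025)
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

tally = Counter()
n_samples = 20_000
for _ in range(n_samples):
    theta = rng.uniform(0.0, 1.0)   # placeholder prior on theta
    gamma = rng.gamma(2.0, 2.0)     # placeholder prior on gamma
    rep = merge_process(X, theta, gamma, rng)
    # the order of the representing points is irrelevant, so sort the key
    key = tuple(sorted(map(tuple, np.round(rep, 6))))
    tally[key] += 1

for rep, n in tally.most_common(3):
    print(f"{n / n_samples:.3f}", rep)
```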

Hierarchical Bayesian Solutions
The previous shortcomings are all addressed in some way by the hierarchical extension:
- "Implicitly assumes a uniform prior over all the representations": we now have a sensible prior, coming from the Merge Process and the priors on its parameters.
- "The model selection problem of choosing between representations involves significant computation": this is now a problem of parameter estimation for theta and gamma, at a higher level of abstraction.
- "No formal notion of the relationships between different representations": similar values of (theta, gamma) index similar representations.
- "No account of where the representations came from": the Merge Process provides one.

Thirty Previously Studied Data Sets

Exemplar vs Prototype Representation
The data sets show a range of inferred representations, spanning the spectrum from exemplar (5, 12, 16, 23, 24) to prototype (4), and theta captures this spectrum.

Uncertainty About Exemplar vs Prototype
For a few data sets, more than one VAM representation had significant posterior mass. The model is uncertain about the degree of abstraction, and this is reflected in the posterior for theta.

Time Course of Representation
There are two groups of three related data sets, measuring the beginning, middle, and end of categorization behavior on the same task. In both cases there is a shift from more abstract to less abstract category representations. This shift is captured by the specific representations inferred in each case, but also (commensurately across experiments) by the change in theta.

Role of Similarity
Some of the data sets come from category learning tasks where subsets of subjects were asked or encouraged to use rules to form categories. These rules did not group similar stimuli, and so the gamma parameter detects the lack of emphasis on similarity in the abstraction.

Other Issues
- Data sets 3 and 4 relate to two different subjects doing the same task, and suggest individual differences. These would be expressed naturally by including an additional level in our hierarchical model.
- Data set 13 suggests a "prototype plus outlier" representation, which the Merge Process indexes, but only in the joint posterior for (theta, gamma), because high values of both are needed.
- Data sets 1 and 2 suggest an alternative or extension to the Merge Process that allows for the deletion of stimulus points in representing categories.

Conclusions
We demonstrated one way of doing a hierarchical Bayesian analysis of an existing model of category representation (the VAM) and of category learning (the GCM). Hierarchical Bayesian analysis:
- Can convert (hard and limited) model selection problems into (easier and more informative) parameter estimation problems.
- Gives one approach to developing theoretically satisfying priors for different models.
- Yields useful additional information, especially when analyzing data from multiple experiments or tasks simultaneously.
- Encourages theorizing at different levels of psychological abstraction.

Thanks! Questions?