Revealing priors on category structures through iterated learning

Tom Griffiths (University of California, Berkeley), Brian Christian (Brown University), Mike Kalish (University of Louisiana, Lafayette)

Inductive biases

Many of the questions studied in cognitive science involve inductive problems, in which people must evaluate underdetermined hypotheses using sparse data. Examples include learning languages from utterances, learning functions from (x, y) pairs, and learning categories from instances of their members. Solving inductive problems requires inductive biases: a priori preferences that make it possible to choose among hypotheses. These biases limit the hypotheses that learners entertain and determine how much evidence is needed to accept a particular hypothesis; they are what favor, for example, compositional over holistic languages (e.g. a grammar such as S → X Y, X → {blicket, dax}, Y → {toma, wug}), linear over non-linear functions, or categories defined by one-dimensional rather than multidimensional rules. Understanding how people solve inductive problems therefore requires understanding their inductive biases.

Bayesian inference provides a framework for stating rational solutions to inductive problems in which inductive biases are made explicit: the biases are encoded in the prior distribution. How can we discover the priors of human learners? In this work we develop a novel method for revealing the priors of human learners, and test it using stimuli for which people's inductive biases are well understood: simple category structures.

Iterated learning

Much of human knowledge is not learned from the world directly, but from other people (language being the clearest example). Kirby (2001) calls this process iterated learning: each learner generates the data from which the next learner forms a hypothesis. When the learners are Bayesian agents who choose hypotheses by sampling from their posterior distributions, the probability that a learner chooses a particular hypothesis converges to the prior probability of that hypothesis as iterated learning proceeds (Griffiths & Kalish, 2005). By reproducing iterated learning in the laboratory, we may be able to discover the nature of human inductive biases.

Category structures (Shepard, Hovland, & Jenkins, 1961)

Stimuli vary on three binary features (color, size, and shape), and each category contains four of the eight possible objects, so there are C(8, 4) = 70 possible category structures. Collapsing over negations and relabelings of the features reduces these to six types of structure, Types I-VI.
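This reduction can be checked directly. The sketch below is a minimal enumeration in Python, assuming only what is stated above: a structure is a four-object subset of the eight feature combinations, and two structures count as the same type when one can be turned into the other by reordering the three features or flipping their values.

```python
from itertools import combinations, permutations, product

objects = list(product([0, 1], repeat=3))      # 8 objects: three binary features

# Each category contains four of the eight objects: C(8, 4) = 70 structures.
structures = [frozenset(c) for c in combinations(objects, 4)]
print(len(structures))                         # 70

def transform(structure, perm, flips):
    """Reorder the feature dimensions and flip feature values of a structure."""
    return frozenset(tuple(obj[perm[i]] ^ flips[i] for i in range(3))
                     for obj in structure)

# 3! feature orderings times 2^3 value flips = 48 equivalence transformations.
symmetries = [(perm, flips)
              for perm in permutations(range(3))
              for flips in product([0, 1], repeat=3)]

def canonical(structure):
    """Canonical representative: the lexicographically smallest image."""
    return min(tuple(sorted(transform(structure, p, f))) for p, f in symmetries)

types = {}
for s in structures:
    types.setdefault(canonical(s), []).append(s)

print(len(types))                              # six types of structure
print(sorted(len(v) for v in types.values()))  # how many structures fall in each type
```

Grouping the 70 structures by their canonical representatives recovers the six types used throughout the experiments below.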
Bayesian inference

Bayes' rule makes the role of inductive biases explicit. For a hypothesis h and data d,

    P(h | d) = P(d | h) P(h) / Σ_{h'} P(d | h') P(h')

where the posterior probability P(h | d) combines the likelihood P(d | h) with the prior P(h), and the sum in the denominator ranges over the space of hypotheses.

Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001)

In the concept learning task the data d are m amoebae sampled from a species, and each hypothesis h is a species containing |h| amoebae. Under the size principle, P(d | h) = 1 / |h|^m if every observed amoeba belongs to h, and 0 otherwise. Since every species here contains the same number of amoebae, the posterior is simply the prior renormalized over the hypotheses consistent with the data. What, then, is the prior?

Iterated concept learning

Each learner sees examples from a species of amoebae and identifies the other members of that species, with a total of four amoebae per species. Iterated learning is run within subjects, since the predictions are the same as for a between-subjects design: the hypothesis chosen on one trial is used to generate the data seen on the next trial, with the new amoebae selected randomly from the chosen species.

[Figure: example of the data (observed amoebae) and hypotheses (candidate species) in the task.]

Design and Analysis

Six iterated learning chains were run, each started with a category structure of one of the six types, with subsequent structures determined by the responses of the participants. As a control, six "independent" chains were run at the same time, with a structure of the appropriate type generated randomly at each generation. With a total of 10 iterations per chain, trials were divided into 10 blocks of 12, and the order of the chains was randomized within each block.

For each experiment, the prior probability assigned to each of the six types of structure was estimated while simultaneously classifying participants into two groups: those who responded in a way consistent with the estimated prior, and those who selected randomly among the possible structures (i.e. consistent with a uniform prior). This was done using the Expectation-Maximization (EM) algorithm, and the responses of the participants classified as non-random were analyzed further. The prior itself was estimated from the choices of hypotheses in both the iterated learning and the independent trials.

Two experiments examined convergence to the prior and how well the dynamics of iterated learning were predicted by the Bayesian model.

Experiment 1: Two examples

A total of 117 participants performed an iterated concept learning task in which they saw two examples from a category and had to guess the remaining members. Results (n = 64): convergence to the prior occurred rapidly, as emphasized by the results for the iterated learning chains started with different structures.

[Figure: probability of selecting each structure type (Types I-VI) across iterations, for people and for the Bayesian model, in the 6 iterated learning chains and the 6 independent learning "chains".]

Experiment 2: Three examples

A total of 73 participants performed an iterated concept learning task in which they saw three examples from a category and had to guess the remaining members. Results (n = 69): convergence to the prior was slower, as predicted by the Bayesian model, and the iterated learning chains started with different structures now exhibited distinctive dynamics that were mirrored in the human data.

[Figure: as for Experiment 1, probability of selecting each structure type (Types I-VI) across iterations, for people and for the Bayesian model.]

Estimating the prior

The estimated prior probabilities of the six types of structure were:

Type I: 0.69
Type II: 0.14
Type III: 0.05
Type IV: 0.01
Type V: 0.08
Type VI: 0.04

Conclusions

Iterated learning may provide a valuable experimental method for investigating human inductive biases. With stimuli for which inductive biases are well understood (simple category structures), iterated learning converges to a distribution consistent with those biases, and the dynamics of iterated learning correspond closely to the predictions of a Bayesian model. Future work will explore what this method can reveal about inductive biases for other kinds of hypotheses, such as languages and functions.
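The convergence result that motivates the method can be illustrated with a short simulation. The sketch below is not the experimental analysis: the prior over the 70 structures is invented (a random draw), the number of examples is fixed at m = 2, and the chain length is arbitrary. It only demonstrates the property exploited above, that a chain of Bayesian learners who sample from the size-principle posterior visits hypotheses with frequencies approaching the prior.

```python
import numpy as np
from itertools import combinations, product

rng = np.random.default_rng(0)

objects = list(product([0, 1], repeat=3))                      # 8 possible amoebae
hypotheses = [frozenset(c) for c in combinations(objects, 4)]  # 70 four-member species

# Invented prior over the 70 structures: in the experiments this is the unknown
# quantity being estimated; here it is fixed so convergence can be checked.
prior = rng.dirichlet(np.ones(len(hypotheses)))

def iterated_learning(n_generations=50_000, m=2):
    """One chain: each learner samples a hypothesis from its posterior and
    generates the m examples seen by the next learner."""
    counts = np.zeros(len(hypotheses))
    h = int(rng.integers(len(hypotheses)))         # arbitrary starting structure
    for _ in range(n_generations):
        members = list(hypotheses[h])
        picks = rng.choice(len(members), size=m, replace=False)
        data = [members[i] for i in picks]
        # Size-principle likelihood: only hypotheses containing every example
        # have nonzero probability, so the posterior is the prior renormalized
        # over the consistent hypotheses (all species have four members).
        consistent = [j for j, hyp in enumerate(hypotheses)
                      if all(x in hyp for x in data)]
        weights = prior[consistent]
        h = consistent[rng.choice(len(consistent), p=weights / weights.sum())]
        counts[h] += 1
    return counts / n_generations

frequencies = iterated_learning()
# Visit frequencies should line up with the prior (correlation close to 1).
print(np.corrcoef(frequencies, prior)[0, 1])
```

In the experiments this logic runs in reverse: the chain of human responses is observed, and the prior is what the chain's long-run behavior (together with the EM analysis described under Design and Analysis) reveals.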