Nonparametric Bayes and human cognition Tom Griffiths Department of Psychology Program in Cognitive Science University of California, Berkeley.

Slides:

Advertisements

Similar presentations

Sinead Williamson, Chong Wang, Katherine A. Heller, David M. Blei

Advertisements

Teg Grenager NLP Group Lunch February 24, 2005

Xiaolong Wang and Daniel Khashabi

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

Hierarchical Dirichlet Processes

DEPARTMENT OF ENGINEERING SCIENCE Information, Control, and Vision Engineering Bayesian Nonparametrics via Probabilistic Programming Frank Wood

Maximum Likelihood And Expectation Maximization Lecture Notes for CMPUT 466/551 Nilanjan Ray.

HW 4. Nonparametric Bayesian Models Parametric Model Fixed number of parameters that is independent of the data we’re fitting.

Yazd University, Electrical and Computer Engineering Department Course Title: Machine Learning By: Mohammad Ali Zare Chahooki Bayesian Decision Theory.

Learning Scalable Discriminative Dictionaries with Sample Relatedness a.k.a. “Infinite Attributes” Jiashi Feng, Stefanie Jegelka, Shuicheng Yan, Trevor.

Bayesian Nonparametric Matrix Factorization for Recorded Music Reading Group Presenter: Shujie Hou Cognitive Radio Institute Friday, October 15, 2010 Authors:

Relational Learning with Gaussian Processes By Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S.Sathiya Keerthi (Columbia, Chicago, Cambridge, Yahoo!) Presented.

Part IV: Monte Carlo and nonparametric Bayes. Outline Monte Carlo methods Nonparametric Bayesian models.

Concepts and Categories. Functions of Concepts By dividing the world into classes of things to decrease the amount of information we need to learn, perceive,

Latent Dirichlet Allocation a generative model for text

Exploring subjective probability distributions using Bayesian statistics Tom Griffiths Department of Psychology Cognitive Science Program University of.

Basics of Statistical Estimation. Learning Probabilities: Classical Approach Simplest case: Flipping a thumbtack tails heads True probability  is unknown.

MSRC Summer School - 30/06/2009 Cambridge – UK Hybrids of generative and discriminative methods for machine learning.

Physical Symbol System Hypothesis

Amos Storkey, School of Informatics. Density Traversal Clustering and Generative Kernels a generative framework for spectral clustering Amos Storkey, Tom.

Exemplar-based accounts of “multiple system” phenomena in perceptual categorization R. M. Nosofsky and M. K. Johansen Presented by Chris Fagan.

Markov chain Monte Carlo with people Tom Griffiths Department of Psychology Cognitive Science Program UC Berkeley with Mike Kalish, Stephan Lewandowsky,

Pattern Recognition. Introduction. Definitions.. Recognition process. Recognition process relates input signal to the stored concepts about the object.

Normative models of human inductive inference Tom Griffiths Department of Psychology Cognitive Science Program University of California, Berkeley.

Latent Variable Models Christopher M. Bishop. 1. Density Modeling A standard approach: parametric models  a number of adaptive parameters  Gaussian.

Topic Models in Text Processing IR Group Meeting Presented by Qiaozhu Mei.

Optimal predictions in everyday cognition Tom Griffiths Josh Tenenbaum Brown University MIT Predicting the future Optimality and Bayesian inference Results.

Hierarchical Dirichelet Processes Y. W. Tech, M. I. Jordan, M. J. Beal & D. M. Blei NIPS 2004 Presented by Yuting Qi ECE Dept., Duke Univ. 08/26/05 Sharing.

Adaptor Grammars Ehsan Khoddammohammadi Recent Advances in Parsing Technology WS 2012/13 Saarland University 1.

(Infinitely) Deep Learning in Vision Max Welling (UCI) collaborators: Ian Porteous (UCI) Evgeniy Bart UCI/Caltech) Pietro Perona (Caltech)

Hierarchical Dirichlet Process (HDP) A Dirichlet process (DP) is a discrete distribution that is composed of a weighted sum of impulse functions. Weights.

Topic Modelling: Beyond Bag of Words By Hanna M. Wallach ICML 2006 Presented by Eric Wang, April 25 th 2008.

Inferring structure from data Tom Griffiths Department of Psychology Program in Cognitive Science University of California, Berkeley.

Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models Mike West Computing Science and Statistics, Vol. 24, pp , 1993.

Ch 2. Probability Distributions (1/2) Pattern Recognition and Machine Learning, C. M. Bishop, Summarized by Yung-Kyun Noh and Joo-kyung Kim Biointelligence.

Eric Xing © Eric CMU, Machine Learning Latent Aspect Models Eric Xing Lecture 14, August 15, 2010 Reading: see class homepage.

Infinite block models for belief networks, social networks, and cultural knowledge Josh Tenenbaum, MIT 2007 MURI Review Meeting Work of Charles Kemp, Chris.

Summary We propose a framework for jointly modeling networks and text associated with them, such as networks or user review websites. The proposed.

Randomized Algorithms for Bayesian Hierarchical Clustering

MACHINE LEARNING 8. Clustering. Motivation Based on E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2  Classification problem:

Latent Dirichlet Allocation D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3: , January Jonathan Huang

Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.

Hierarchical Dirichlet Process and Infinite Hidden Markov Model Duke University Machine Learning Group Presented by Kai Ni February 17, 2006 Paper by Y.

Stick-Breaking Constructions

Topic Models Presented by Iulian Pruteanu Friday, July 28 th, 2006.

1 Clustering in Generalized Linear Mixed Model Using Dirichlet Process Mixtures Ya Xue Xuejun Liao April 1, 2005.

The Infinite Hierarchical Factor Regression Model Piyush Rai and Hal Daume III NIPS 2008 Presented by Bo Chen March 26, 2009.

Latent Dirichlet Allocation

Categorization and density estimation Tom Griffiths UC Berkeley.

Bayesian decision theory: A framework for making decisions when uncertainty exit 1 Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e.

1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.

Nonparametric Bayesian Models. HW 4 x x Parametric Model Fixed number of parameters that is independent of the data we’re fitting.

Multi-label Prediction via Sparse Infinite CCA Piyush Rai and Hal Daume III NIPS 2009 Presented by Lingbo Li ECE, Duke University July 16th, 2010 Note:

Ch 2. Probability Distributions (1/2) Pattern Recognition and Machine Learning, C. M. Bishop, Summarized by Joo-kyung Kim Biointelligence Laboratory,

Hierarchical Beta Process and the Indian Buffet Process by R. Thibaux and M. I. Jordan Discussion led by Qi An.

The Nested Dirichlet Process Duke University Machine Learning Group Presented by Kai Ni Nov. 10, 2006 Paper by Abel Rodriguez, David B. Dunson, and Alan.

Computational Intelligence: Methods and Applications Lecture 26 Density estimation, Expectation Maximization. Włodzisław Duch Dept. of Informatics, UMK.

SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.

An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

Nonparametric Bayesian Learning of Switching Dynamical Processes

Bayesian data analysis

Markov chain Monte Carlo with people

A Non-Parametric Bayesian Method for Inferring Hidden Causes

Revealing priors on category structures through iterated learning

A Hierarchical Bayesian Look at Some Debates in Category Learning

Michal Rosen-Zvi University of California, Irvine

Latent Dirichlet Allocation

Topic Models in Text Processing

Rational models of categorization

Mixture Models with Adaptive Spatial Priors

Presentation transcript:

Nonparametric Bayes and human cognition Tom Griffiths Department of Psychology Program in Cognitive Science University of California, Berkeley

data hypothesis Statistics about the mind

Analyzing psychological data Dirichlet process mixture models for capturing individual differences (Navarro, Griffiths, Steyvers, & Lee, 2006) Infinite latent feature models… –…for features influencing similarity (Navarro & Griffiths, 2007; 2008) –…for features influencing decisions ()

Statistics in the mind data hypothesis data hypothesis Statistics about the mind

Flexible mental representations Dirichlet

Categorization How do people represent categories? cat dog ?

cat Prototypes (Posner & Keele, 1968; Reed, 1972) Prototype cat

Exemplars (Medin & Schaffer, 1978; Nosofsky, 1986) Store every instance (exemplar) in memory cat

Something in between (Love et al., 2004; Vanpaemel et al., 2005) cat

A computational problem Categorization is a classic inductive problem –data: stimulus x –hypotheses: category c We can apply Bayes’ rule: and choose c such that P(c|x) is maximized

Density estimation We need to estimate some probability distributions –what is P(c)? –what is p(x|c)? Two approaches: –parametric –nonparametric These approaches correspond to prototype and exemplar models respectively (Ashby & Alfonso-Reese, 1995)

Parametric density estimation Assume that p(x|c) has a simple form, characterized by parameters  (indicating the prototype) Probability density x

Nonparametric density estimation Approximate a probability distribution as a sum of many “kernels” (one per data point) estimated function individual kernels true function n = 10 Probability density x

Something in between mixture distribution mixture components Probability x Use a “mixture” distribution, with more than one component per data point (Rosseel, 2002)

Anderson’s rational model (Anderson, 1990, 1991) Treat category labels like any other feature Define a joint distribution p(x,c) on features using a mixture model, breaking objects into clusters Allow the number of clusters to vary… a Dirichlet process mixture model (Neal, 1998; Sanborn et al., 2006)

A unifying rational model Density estimation is a unifying framework –a way of viewing models of categorization We can go beyond this to define a unifying model –one model, of which all others are special cases Learners can adopt different representations by adaptively selecting between these cases Basic tool: two interacting levels of clusters –results from the hierarchical Dirichlet process (Teh, Jordan, Beal, & Blei, 2004)

The hierarchical Dirichlet process

A unifying rational model cluster exemplar category exemplar prototype Anderson

HDP +,  and Smith & Minda (1998) HDP +,  will automatically infer a representation using exemplars, prototypes, or something in between (with  being learned from the data) Test on Smith & Minda (1998, Experiment 2) Category A:Category B: exceptions

HDP +,  and Smith & Minda (1998) Probability of A prototype exemplarHDP Log-likelihood exemplar prototype HDP

The promise of HDP +,+ In HDP +,+, clusters are shared between categories –a property of hierarchical Bayesian models Learning one category has a direct effect on the prior on probability densities for the next category

Learning the features of objects Most models of human cognition assume objects are represented in terms of abstract features What are the features of this object? What determines what features we identify? (Austerweil & Griffiths, submitted)

Binary matrix factorization 

 How should we infer the number of features?

The nonparametric approach  Use the Indian buffet process as a prior on Z Assume that the total number of features is unbounded, but only a finite number will be expressed in any finite dataset (Griffiths & Ghahramani, 2006)

(Austerweil & Griffiths, submitted)

An experiment… (Austerweil & Griffiths, submitted) Training Testing Correlated Factorial Seen Unseen Shuffled

(Austerweil & Griffiths, submitted) Results

Conclusions Approaching cognitive problems as computational problems allows cognitive science and machine learning to be mutually informative Machine

Credits Categorization Adam Sanborn Kevin Canini Dan Navarro Learning features Joe Austerweil MCMC with people Adam Sanborn Computational Cognitive Science Lab