Graphical Models for Psychological Categorization David Danks Carnegie Mellon University; and Institute for Human & Machine Cognition.

A Puzzle Concepts & causation are intertwined  Concepts and categorization depend (in part) on causal beliefs and inferences  Causal learning and reasoning depend (in part) on the particular concepts we have But the most prevalent theories in the two fields use quite different formalisms Q: Can categorization and causal inference be represented in a common “language”?

Central Theoretical Claim Many psychological theories of categorization are equivalent to (special cases of) Bayesian categorization of probabilistic graphical models (and so the answer to the previous question is “Yes” – they can share the language of graphical models)

Overview Bayesian Categorization of Probabilistic Graphical Models (PGMs) Psychological Theories of Categorization Theoretical & Experimental Implications

Bayesian Categorization Set of exclusive, exhaustive models M  For each model m, a prior probability P(m) and a likelihood P(X | m) (perhaps a generative model) Given X, update the model probabilities via Bayes' rule: P(m | X) = P(X | m) P(m) / Σ_m′ P(X | m′) P(m′) (and use the updated probabilities for choices)

Probabilistic Graphical Models PGMs were developed to provide compact representations of probability distributions All PGMs are defined for a set of variables V, and composed of:  Graph over (nodes corresponding to) V  Probability distribution/density over V

Probabilistic Graphical Models Markov assumption: Graph entails certain (conditional and unconditional) independence constraints on the probability distribution  Markov assumptions imply a decomposition of the probability distribution into a product of simpler terms (i.e., fewer parameters) Different PGM-types have different graph-types and/or Markov assumptions
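To make the "fewer parameters" point concrete, here is a small counting sketch for four binary variables, using the DAG from the Bayes-net example later in the talk (F1 → F4 ← F2, F4 → F3); the arithmetic is the only content here:

```python
# Parameter counting for the Markov decomposition of four binary variables.

# The full joint distribution needs 2^4 - 1 free parameters.
full_joint_params = 2 ** 4 - 1

# The factorization P(F1) P(F2) P(F4 | F1, F2) P(F3 | F4) needs one
# Bernoulli parameter per parent configuration of each variable:
# 1 (F1) + 1 (F2) + 4 (F4, two binary parents) + 2 (F3 given F4).
factored_params = 1 + 1 + 2 ** 2 + 2
```

The gap widens quickly as the number of variables grows, which is the practical motivation for PGMs.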

Probabilistic Graphical Models Also assume Faithfulness/Stability:  The only probabilistic independencies are those implied by Markov If we do not assume this, then every probability distribution can be represented by every PGM-type Faithfulness is assumed explicitly or implicitly by all PGM learning algorithms Def’n: A graph is a perfect map iff it is Markov & Faithful to the probability distribution

Probabilistic Graphical Models For a particular PGM-type, the set of probability distributions with a perfect map in that PGM-type form a natural group  This set will almost always be non-exhaustive  Shorthand: “Probability distribution for PGM” will mean “Probability distribution for which there is a perfect map in the PGM-type”

Bayesian Networks Directed acyclic graph Markov: each variable is independent of its non-descendants conditional on its parents Example (graph: F1 → F4 ← F2, F4 → F3): P(F1, F2, F3, F4) = P(F1) × P(F2) × P(F4 | F1, F2) × P(F3 | F4)
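A numeric sanity check of this factorization; the CPT values below are invented for illustration, not taken from the talk:

```python
from itertools import product

# Made-up conditional probability tables for the DAG F1 -> F4 <- F2, F4 -> F3.
p_f1 = {1: 0.3, 0: 0.7}
p_f2 = {1: 0.6, 0: 0.4}
p_f4_1 = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.7, (0, 0): 0.1}  # P(F4=1 | F1, F2)
p_f3_1 = {1: 0.8, 0: 0.2}                                       # P(F3=1 | F4)

def joint(f1, f2, f3, f4):
    """P(F1) * P(F2) * P(F4 | F1, F2) * P(F3 | F4)."""
    pf4 = p_f4_1[(f1, f2)] if f4 else 1 - p_f4_1[(f1, f2)]
    pf3 = p_f3_1[f4] if f3 else 1 - p_f3_1[f4]
    return p_f1[f1] * p_f2[f2] * pf4 * pf3

# The factored terms multiply out to a proper joint distribution.
total = sum(joint(*s) for s in product([0, 1], repeat=4))
```

The Markov constraint is visible in the numbers: conditioning F3 on F4 screens off F1 and F2, whatever CPT values are chosen.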

Markov Random Fields Undirected graph (i.e., no arrowheads) Markov: each variable is independent of its non-neighbors conditional on its neighbors Example (graph: F1 – F4, F2 – F4, F3 – F4): P(F1, F2, F3, F4) ∝ φ(F1, F4) × φ(F2, F4) × φ(F3, F4) for pairwise potentials φ
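A matching sketch for the undirected star graph F1 – F4, F2 – F4, F3 – F4, written with explicit pairwise potentials and a normalizing constant Z; all potential values are invented:

```python
from itertools import product

# One pairwise potential per undirected edge Fi - F4; values are arbitrary.
phi = {
    1: {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0},
    2: {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 1.5},
    3: {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 1.0},
}

def unnorm(f1, f2, f3, f4):
    """Product of edge potentials; proportional to the joint probability."""
    return phi[1][(f1, f4)] * phi[2][(f2, f4)] * phi[3][(f3, f4)]

states = list(product([0, 1], repeat=4))
z = sum(unnorm(*s) for s in states)          # partition function Z
p = {s: unnorm(*s) / z for s in states}      # normalized joint
```

Whatever the potentials, the factorization makes F1 and F2 independent given their shared neighbor F4, as the Markov condition requires.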

Bayesian Categorization of PGMs Use the standard Bayesian updating equation: P(m | X) = P(X | m) P(m) / Σ_m′ P(X | m′) P(m′) And require the P(X | m) distributions to be distributions for that PGM-type  I.e., the PGM-type supplies the generative model

Simple Example Suppose we have two equiprobable models:  Left (F1 and F2 independent, no edge): P(F1 = 1) = 0.1, P(F2 = 1) = 0.2  Right (F1 → F2): P(F1 = 1) = 0.8, P(F2 = 1 | F1 = 1) = 0.8, P(F2 = 1 | F1 = 0) = 0.6 Observe 11, and conclude Right  P(Left | 11) = 0.03 << P(Right | 11) = 0.97 Observe 00, and conclude Left  P(Left | 00) = 0.90 >> P(Right | 00) = 0.10 and so on…
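The slide's numbers can be recomputed directly from the two models' parameters; a minimal sketch:

```python
# Left model: F1 and F2 independent. Right model: F1 -> F2.
# Parameters are the ones given on the slide; priors are 0.5 each.

def p_left(f1, f2):
    return (0.1 if f1 else 0.9) * (0.2 if f2 else 0.8)

def p_right(f1, f2):
    p2 = 0.8 if f1 else 0.6                 # P(F2=1 | F1)
    return (0.8 if f1 else 0.2) * (p2 if f2 else 1 - p2)

def posterior_left(f1, f2):
    """P(Left | observation) under equal priors."""
    l, r = 0.5 * p_left(f1, f2), 0.5 * p_right(f1, f2)
    return l / (l + r)

# posterior_left(1, 1) ≈ 0.03, so P(Right | 11) ≈ 0.97
# posterior_left(0, 0) ≈ 0.90, so P(Right | 00) ≈ 0.10
```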

Overview Bayesian Categorization of Probabilistic Graphical Models (PGMs) Psychological Theories of Categorization Theoretical & Experimental Implications

Psychological Theories All assume a fixed set of input features  Usually binary-, sometimes continuous-valued For the purposes of this talk, I will focus on static theories of categorization  That is, focus on the categories that are learned, as opposed to the learning process itself  The learning processes can also be captured/explained in this framework

Shared Theoretical Structure For many psychological theories, categorization of a novel instance involves:  For each category under consideration, determine the similarity (according to a specific metric) between the category and the novel instance  Then use the category similarities to generate a response probability for each category Alternately, use a deterministic choice rule but assume noise in the perceptual system (e.g., Ashby)

Shared Theoretical Structure In this high-level picture,  We get different categorization theories by having (i) different classes of similarity metrics, and/or (ii) different response rules  Within a particular theory, different particular categories result from different actual similarity metrics (i.e., different parameter values)

Unconsidered Theories Not every categorization theory has this particular high-level structure  In particular, arbitrary neural network models don’t For practical reasons, I will focus on models with analytically defined similarity metrics  Excludes models such as RULEX & SUSTAIN that can only be investigated with simulations Finally, I won’t explore obvious connections with Anderson’s rational analysis model

Returning to the High-Level Picture… Step 2: “Use the category similarities to generate a response probability” Most common second-stage rule is the weighted Luce-Shepard rule: P(a | X) = b_a · Sim(a, X) / Σ_a′ b_a′ · Sim(a′, X)

Luce-Shepard & Bayesian Updating L-S is equivalent to Bayesian updating if, for each category a, Sim(a, X) is a probability distribution  Sim(a, X) represents P(X | a)  (normalized) b_a weights represent base rates Note: Unweighted L-S ⇒ equal base rates for the categories
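A minimal sketch of that equivalence: with similarities read as likelihoods P(X | a) and the b_a weights read as base rates P(a), the weighted Luce-Shepard rule is exactly Bayes' rule (the numbers are invented):

```python
def luce_shepard(sims, weights):
    """Weighted Luce-Shepard: P(a | X) = b_a Sim(a, X) / sum_b b_b Sim(b, X)."""
    unnorm = {a: weights[a] * sims[a] for a in sims}
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}

sims = {"A": 0.30, "B": 0.05}        # Sim(a, X), read as P(X | a)
base_rates = {"A": 0.25, "B": 0.75}  # b_a, read as P(a)

# Numerator and normalization are those of Bayes' rule, so this is a posterior.
response = luce_shepard(sims, base_rates)
```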

Similarities as Probabilities When do similarities represent probabilities? The answer turns out to be “Always”  Similarity metrics are defined for arbitrary combinations of category features  So from the point-of-view of response probabilities, we can renormalize any similarity metric to produce a probability distribution (see also Myung, 1994; Ashby & Alfonso-Reese, 1995; and Rosseel, 2002)
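The renormalization is mechanical: sum the similarity metric over every feature combination and divide. A sketch with an invented metric over three binary features:

```python
from itertools import product

def sim(x):
    """Invented similarity: halve the score for each mismatch with (1, 1, 0)."""
    target = (1, 1, 0)
    return 2.0 ** -sum(a != b for a, b in zip(x, target))

# Summing over the whole feature space turns sim into P(X | category).
space = list(product([0, 1], repeat=3))
z = sum(sim(x) for x in space)
p_x_given_cat = {x: sim(x) / z for x in space}
```

Nothing about the metric matters for this step; any non-negative function over the feature space can be normalized the same way.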

Categorization as Bayesian Updating All psychological theories of categorization with this high-level structure are special cases of Bayesian updating  “Special cases” because they restrict the possible similarities (and so probability distributions)  Note: I focused on weighted L-S, but similar conclusions can be drawn for other response probability rules Common thread: treat similarities as probabilities (perhaps because of noise in the perceptual system)

Psychological Categorization & PGMs Claim: For each psychological theory,  [Class of similarity metrics] is equivalent to  [Probability distributions for (sub-classes of) a PGM-type] Three examples:  Causal Model Theory  Exemplar-based models (specifically, GCM)  Prototype-based models (first- and second-order)

Causal Model Theory Causal Model Theory:  Categories are defined by causal structures, represented as arbitrary causal Bayes nets  Similarity of an instance to a category is explicitly: Sim(m, X) = P(X | m) (where m is a Bayesian network)

Causal Model Theory CMT categorization (with weighted L-S) is equivalent to Bayesian updating with arbitrary Bayes nets as the generating PGMs  Varying weights in the L-S rule correspond to different category base rates

Exemplar-Based Models Generalized Context Model  Categories defined by a set of exemplars E_j Exemplars are actually observed category instances  Similarity is the (weighted) average (exponential of) distance between the instance and exemplars Multiple distance metrics used (e.g., weighted city-block)
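A sketch of the GCM similarity computation; the exemplars, exemplar weights, attention weights, and sensitivity parameter c below are all illustrative choices, not values from the talk:

```python
import math

def gcm_similarity(x, exemplars, exemplar_weights, attention, c=1.0):
    """Weighted average of exp(-c * weighted city-block distance) to each exemplar."""
    total = 0.0
    for e, w in zip(exemplars, exemplar_weights):
        d = sum(a * abs(xi - ei) for a, xi, ei in zip(attention, x, e))
        total += w * math.exp(-c * d)
    return total

exemplars = [(1, 1, 0), (1, 0, 0)]   # stored category instances
weights = [0.5, 0.5]                 # exemplar weights
attention = [1.0, 1.0, 1.0]          # per-feature attention
s = gcm_similarity((1, 1, 1), exemplars, weights, attention)
```

Instances closer to the stored exemplars score higher, and renormalizing this function over the feature space gives the mixture-style likelihood discussed above.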

Exemplar-Based Models There is an equivalence between:  GCM-similarity functions; and  Probability distributions for Bayes nets with graph E → F1, E → F2, …, E → Fn (E unobserved), and a regularity constraint on the distribution terms

Exemplar-Based Models GCM categorization (with weighted L-S) is equivalent to Bayesian updating with fixed-structure Bayes nets (+constraint) as the generating PGMs

Prototype-Based Models First-order Multiplicative Prototype Model:  Categories defined by a prototypical instance Q Prototype need not be actually observed  Similarity is the (weighted exponential of the) distance between the instance and the prototype Again, different distance metrics can be used
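Because the first-order similarity is an exponential of a sum, it factorizes into independent per-feature terms, which is the link to empty-graph PGMs. A sketch with an invented prototype and weights:

```python
import math

def fomp_similarity(x, prototype, weights, c=1.0):
    """Exponential of the (negated) weighted city-block distance to the prototype."""
    d = sum(w * abs(xi - qi) for w, xi, qi in zip(weights, x, prototype))
    return math.exp(-c * d)

q = (1, 1, 0, 0)          # prototype (need not be an observed instance)
w = (2.0, 1.0, 1.0, 0.5)  # per-feature weights
s = fomp_similarity((0, 1, 1, 0), q, w)
```

The exp-of-sum form means the whole similarity equals the product of per-feature factors exp(-c · w_i · |x_i − q_i|), i.e., features contribute independently.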

Prototype-Based Models There is an equivalence between:  FOMPM-similarity functions; and  Probability distributions for empty-graph Markov random fields (and a regularity constraint on the distribution terms) Note: The “no-edge Markov random field” probability distributions are identical with the “no-edge Bayes net” probability distributions

Prototype-Based Models First-order models fail to capture the intuition of “prototype as summary of observations”  Inter-feature correlations cannot be captured Second-order models with interaction terms  Define features F_ij whose value depends on the state of F_i and F_j  Assume the similarity function is still factorizable into feature-based terms Non-trivial assumption, but not particularly restrictive

Prototype-Based Models There is an equivalence between:  SOMPM-similarity functions; and  Probability distributions for arbitrary-graph Markov random fields (and a regularity constraint on the distribution terms) Constraint details highly dependent on the exact second-order feature definition and the similarity metric

Prototype-Based Models First-order prototype-based categorization (with weighted L-S) is equivalent to Bayesian updating with no-edge Markov random fields (+constraint) as the generating PGMs  And second-order prototypes are equivalent to Bayesian updating with arbitrary-graph Markov random fields

Summary of Theoretical Results Many psychological theories of categorization are equivalent to Bayesian updating, assuming a particular generative model-type Significant instances:  CMT ↔ Arbitrary-graph Bayes nets  GCM ↔ Fixed-graph Bayes net (+constraint)  Prototype ↔ Empty- or Arbitrary-graph Markov random field (+constraint)

Overview Bayesian Categorization of Probabilistic Graphical Models (PGMs) Psychological Theories of Categorization Theoretical & Experimental Implications

Common Representational Language Common representational language for:  Many psychological theories of concepts and categorization; and  Psychological theories of causal inference and belief based on Bayes nets This shared language arguably facilitates the development of a unified theory of the psychological domains  Unfortunately, just a promissory note right now

Multiple Categorization Systems Several recent papers have argued (roughly):  Each psychological theory is empirically superior for some problems in some domains  ⇒ There must be multiple categorization systems (corresponding to the different theories)

Multiple Categorization Systems Bayes nets and Markov random fields are special cases of chain graphs – PGMs with directed and undirected edges  We can model each categorization theory as a special case of Bayesian updating on a chain graph

Multiple Categorization Systems If all categorization is Bayesian updating on chain graphs, then we have one cognitive system with many different possible “parameters” (i.e., generative models)  Note: This possibility does not show that the “multiple systems” view is wrong, but does blunt the inference from multiple confirmed theories

Concepts as Chain Graphs How can we test “concepts as chain graphs”?  Use a probability distribution for chain graphs with no Bayes net or Markov random field perfect map  Example: [chain graph over F1–F4 with both directed and undirected edges; original figure not recoverable] Experimental question: How accurately can people learn categories based on this graph?

Expanded Equivalence Results These results extend known equivalencies to include (i) Causal model theory; and (ii) Second-order prototype models These various theoretical equivalencies can guide experimental design  Use them to determine whether a particular category structure can be equally well-modeled by multiple psychological theories

Expanded Equivalence Results Bayes nets and Markov random fields represent overlapping sets of distributions  Specifically, Bayes nets with no colliders are equivalent to Markov random fields with no cycles  [Figures: a collider-free graph over F1–F4, for which CMT & SOMPM model fits are equal; and a graph with a collider, for which CMT & SOMPM model fits differ]

Novel Suggested Theories Recall that the PGMs for both the GCM and SOMPM have additional constraints  These constraints have a relatively natural computational motivation Idea: Investigate generalized versions of the psychological theories  E.g., do we get significantly better model fits? how accurately do people learn concepts that violate the regularity constraints? and so on…

Conclusion Many psychological theories of categorization are equivalent to (special cases of) Bayesian categorization of probabilistic graphical models (and those equivalencies have implications for both (a) theory development & testing, and (b) experimental design & practice)

Appendix: GCM & Bayes Nets Example of the regularity constraint:  City-block distance metric, continuous features: For each F_i, each P(F_i | E = j) is a Laplace (double exponential) distribution with the same scale parameter, and possibly distinct means E (in the Bayes net) has as many values as there are exemplars (in the category)  P(E = j) is the exemplar weight  In the limit of infinite exemplars, we can represent arbitrary probability distributions
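This appendix claim can be checked numerically: with a shared scale parameter b, a mixture of per-exemplar Laplace products is proportional to the exponential city-block GCM similarity, the constant being (2b)^(-n) for n features. The exemplar locations, weights, and b below are invented:

```python
import math

def laplace_pdf(x, mu, b):
    """Laplace (double exponential) density with location mu, scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

def mixture_density(x, exemplars, weights, b):
    """P(X | category) = sum_j P(E=j) * prod_i Laplace(x_i; mu_ij, b)."""
    return sum(w * math.prod(laplace_pdf(xi, ei, b) for xi, ei in zip(x, e))
               for e, w in zip(exemplars, weights))

def gcm_city_block(x, exemplars, weights, b):
    """Exemplar-weighted exp(-city_block / b): the GCM similarity form."""
    return sum(w * math.exp(-sum(abs(xi - ei) for xi, ei in zip(x, e)) / b)
               for e, w in zip(exemplars, weights))

exemplars = [(0.0, 1.0), (2.0, 2.0)]  # exemplar means mu_j (invented)
weights = [0.3, 0.7]                  # P(E = j)
b = 1.0                               # shared scale parameter
```

The ratio of the two functions is the same at every point, so renormalizing the GCM similarity recovers exactly this Laplace mixture.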