Presentation is loading. Please wait.

Presentation is loading. Please wait.

Revealing inductive biases with Bayesian models Tom Griffiths UC Berkeley with Mike Kalish, Brian Christian, and Steve Lewandowsky.

Similar presentations


Presentation on theme: "Revealing inductive biases with Bayesian models Tom Griffiths UC Berkeley with Mike Kalish, Brian Christian, and Steve Lewandowsky."— Presentation transcript:

1 Revealing inductive biases with Bayesian models Tom Griffiths UC Berkeley with Mike Kalish, Brian Christian, and Steve Lewandowsky

2 Inductive problems blicket toma dax wug blicket wug S  X Y X  {blicket,dax} Y  {toma, wug} Learning languages from utterances Learning functions from (x,y) pairs Learning categories from instances of their members

3 Generalization requires induction Generalization: predicting the properties of an entity from observed properties of others y x

4 What makes a good inductive learner? Hypothesis 1: more representational power –more hypotheses, more complexity –spirit of many accounts of learning and development

5 Some hypothesis spaces Linear functions Quadratic functions 8th degree polynomials

6 Minimizing squared error

7

8

9

10 Measuring prediction error

11 What makes a good inductive learner? Hypothesis 1: more representational power –more hypotheses, more complexity –spirit of many accounts of learning and development Hypothesis 2: good inductive biases –constraints on hypotheses that match the environment

12 Outline The bias-variance tradeoff Bayesian inference and inductive biases Revealing inductive biases Conclusions

13 Outline The bias-variance tradeoff Bayesian inference and inductive biases Revealing inductive biases Conclusions

14 A simple schema for induction Data D are n pairs (x,y) generated from function f Hypothesis space of functions, y = g(x) Error is E = (y - g(x)) 2 Pick function g that minimizes error on D Measure prediction error, averaging over x and y y x

15 Bias and variance A good learner makes (f(x) - g(x)) 2 small g is chosen on the basis of the data D Evaluate learners by the average of (f(x) - g(x)) 2 over data D generated from f biasvariance (Geman, Bienenstock, & Doursat, 1992)

16 Making things more intuitive… The next few slides were generated by: –choosing a true function f(x) –generating a number of datasets D from p(x,y) defined by uniform p(x), p(y|x) = f(x) plus noise –finding the function g(x) in the hypothesis space that minimized the error on D Comparing average of g(x) to f(x) reveals bias Spread of g(x) around average is the variance

17 Linear functions (n = 10)

18 } bias pink is g(x) for each dataset red is average g(x) black is f(x) y x } variance

19 Quadratic functions ( n = 10) pink is g(x) for each dataset red is average g(x) black is f(x) y x

20 8-th degree polynomials ( n = 10) pink is g(x) for each dataset red is average g(x) black is f(x) y x

21 Bias and variance (for our (quadratic) f(x), with n = 10) Linear functions high bias, medium variance Quadratic functions low bias, low variance 8-th order polynomials low bias, super-high variance

22 In general… Larger hypothesis spaces result in higher variance, but low bias across several f(x) The bias-variance tradeoff: –if we want a learner that has low bias on a range of problems, we pay a price in variance This is mainly an issue when n is small –the regime of much of human learning

23 Quadratic functions ( n = 100) pink is g(x) for each dataset red is average g(x) black is f(x) y x

24 8-th degree polynomials ( n = 100) pink is g(x) for each dataset red is average g(x) black is f(x) y x

25 The moral General-purpose learning mechanisms do not work well with small amounts of data –more representational power isn’t always better To make good predictions from small amounts of data, you need a bias that matches the problem –these biases are the key to successful induction, and characterize the nature of an inductive learner So… how can we identify human inductive biases?

26 Outline The bias-variance tradeoff Bayesian inference and inductive biases Revealing inductive biases Conclusions

27 Bayesian inference Reverend Thomas Bayes Rational procedure for updating beliefs Foundation of many learning algorithms Lets us make the inductive biases of learners precise

28 Bayes’ theorem Posterior probability LikelihoodPrior probability Sum over space of hypotheses h: hypothesis d: data

29 Priors and biases Priors indicate the kind of world a learner expects to encounter, guiding their conclusions In our function learning example… –likelihood gives probability to data that decrease with sum squared errors (i.e. a Gaussian) –priors are uniform over all functions in hypothesis spaces of different kinds of polynomials –having more functions corresponds to a belief in a more complex world…

30 Outline The bias-variance tradeoff Bayesian inference and inductive biases Revealing inductive biases Conclusions

31 Two ways of using Bayesian models Specify models that make different assumptions about priors, and compare their fit to human data (Anderson & Schooler, 1991; Oaksford & Chater, 1994; Griffiths & Tenenbaum, 2006) Design experiments explicitly intended to reveal the priors of Bayesian learners

32 Iterated learning (Kirby, 2001) What are the consequences of learners learning from other learners?

33 Objects of iterated learning Knowledge communicated across generations through provision of data by learners Examples: –religious concepts –social norms –myths and legends –causal theories –language

34 Analyzing iterated learning P L (h|d): probability of inferring hypothesis h from data d P P (d|h): probability of generating data d from hypothesis h PL(h|d)PL(h|d) P P (d|h) PL(h|d)PL(h|d)

35 Variables x (t+1) independent of history given x (t) Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic) xx x xx x x x Transition matrix T = P(x (t+1) |x (t) ) Markov chains

36 Analyzing iterated learning d0d0 h1h1 d1d1 h2h2 PL(h|d)PL(h|d) PP(d|h)PP(d|h) PL(h|d)PL(h|d) d2d2 h3h3 PP(d|h)PP(d|h) PL(h|d)PL(h|d)  d P P (d|h)P L (h|d) h1h1 h2h2 h3h3 A Markov chain on hypotheses d0d0 d1d1  h P L (h|d) P P (d|h) d2d2 A Markov chain on data

37 Iterated Bayesian learning PL(h|d)PL(h|d) P P (d|h) PL(h|d)PL(h|d) Assume learners sample from their posterior distribution:

38 Stationary distributions Markov chain on h converges to the prior, P(h) Markov chain on d converges to the “prior predictive distribution” (Griffiths & Kalish, 2005)

39 Explaining convergence to the prior PL(h|d)PL(h|d) P P (d|h) PL(h|d)PL(h|d) Intuitively: data acts once, prior many times Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d,h) (Griffiths & Kalish, in press)

40 Revealing inductive biases If iterated learning converges to the prior, it might provide a tool for determining the inductive biases of human learners We can test this by reproducing iterated learning in the lab, with stimuli for which human biases are well understood

41 Iterated function learning Each learner sees a set of (x,y) pairs Makes predictions of y for new x values Predictions are data for the next learner datahypotheses (Kalish, Griffiths, & Lewandowsky, in press)

42 Function learning experiments Stimulus Response Slider Feedback Examine iterated learning with different initial data

43 1 2 3 4 5 6 7 8 9 Iteration Initial data

44 Identifying inductive biases Formal analysis suggests that iterated learning provides a way to determine inductive biases Experiments with human learners support this idea –when stimuli for which biases are well understood are used, those biases are revealed by iterated learning What do inductive biases look like in other cases? –continuous categories –causal structure –word learning –language learning

45 Outline The bias-variance tradeoff Bayesian inference and inductive biases Revealing inductive biases Conclusions

46 Solving inductive problems and forming good generalizations requires good inductive biases Bayesian inference provides a way to make assumptions about the biases of learners explicit Two ways to identify human inductive biases: –compare Bayesian models assuming different priors –design tasks to extract biases from Bayesian learners Iterated learning provides a lens for magnifying the inductive biases of learners –small effects for individuals are big effects for groups

47

48 Iterated concept learning Each learner sees examples from a species Identifies species of four amoebae Iterated learning is run within-subjects data hypotheses (Griffiths, Christian, & Kalish, in press)

49 Two positive examples data (d) hypotheses (h)

50 Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001) d: 2 amoebae h: set of 4 amoebae m: # of amoebae in the set d (= 2) |h|: # of amoebae in the set h (= 4) Posterior is renormalized prior What is the prior?

51 Classes of concepts (Shepard, Hovland, & Jenkins, 1961) Class 1 Class 2 Class 3 Class 4 Class 5 Class 6 shape size color

52 Experiment design (for each subject) Class 1 Class 2 Class 3 Class 4 Class 5 Class 6 Class 1 Class 2 Class 3 Class 4 Class 5 Class 6 6 iterated learning chains 6 independent learning “chains”

53 Estimating the prior data (d) hypotheses (h)

54 Estimating the prior Class 1 Class 2 Class 3 Class 4 Class 5 Class 6 0.861 0.087 0.009 0.002 0.013 0.028 Prior r = 0.952 Bayesian model Human subjects

55 Two positive examples (n = 20) Probability Iteration Probability Iteration Human learners Bayesian model

56 Two positive examples (n = 20) Probability Bayesian model Human learners

57 Three positive examples data (d) hypotheses (h)

58 Three positive examples (n = 20) Probability Iteration Probability Iteration Human learners Bayesian model

59 Three positive examples (n = 20) Bayesian model Human learners

60

61 Serial reproduction (Bartlett, 1932) Participants see stimuli, then reproduce them from memory Reproductions of one participant are stimuli for the next Stimuli were interesting, rather than controlled –e.g., “War of the Ghosts”

62

63 Discovering the biases of models Generic neural network:

64 Discovering the biases of models EXAM (Delosh, Busemeyer, & McDaniel, 1997):

65 Discovering the biases of models POLE (Kalish, Lewandowsky, & Kruschke, 2004):

66


Download ppt "Revealing inductive biases with Bayesian models Tom Griffiths UC Berkeley with Mike Kalish, Brian Christian, and Steve Lewandowsky."

Similar presentations


Ads by Google