
1 Priors and predictions in everyday cognition Tom Griffiths Cognitive and Linguistic Sciences

2 What computational problem is the brain solving? Does human behavior correspond to an optimal solution to that problem? [Diagram: data → behavior]

3 Inductive problems Inferring structure from data Perception –e.g. structure of 3D world from 2D visual data [Figure: data (shaded hexagon image) → hypotheses (3D cube)]

4 Inductive problems Inferring structure from data Perception –e.g. structure of 3D world from 2D data Cognition –e.g. relationship between variables from samples [Diagram: data → hypotheses]

5 Reverend Thomas Bayes

6 Bayes’ theorem p(h|d) = p(d|h) p(h) / Σ_h′ p(d|h′) p(h′), i.e. posterior probability ∝ likelihood × prior probability, normalized by a sum over the space of hypotheses. h: hypothesis, d: data

7 Bayes’ theorem h: hypothesis d: data
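The theorem on these two slides can be made concrete with a short numeric sketch over a discrete hypothesis space; the prior and likelihood values below are invented purely for illustration.

```python
import numpy as np

# Bayes' theorem over three hypotheses (illustrative numbers only).
prior = np.array([0.7, 0.2, 0.1])        # p(h) for hypotheses h1, h2, h3
likelihood = np.array([0.1, 0.5, 0.9])   # p(d|h) for the observed data d

posterior = likelihood * prior           # numerator: likelihood x prior
posterior /= posterior.sum()             # denominator: sum over the space of hypotheses
print(posterior)                         # p(h|d) for each hypothesis (sums to 1)
```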

8 Perception is optimal Körding & Wolpert (2004)

9 Cognition is not

10 Do people use priors? Standard answer: no (Tversky & Kahneman, 1974) This talk: yes What are people’s priors?

11 Explaining inductive leaps How do people –infer causal relationships –identify the work of chance –predict the future –assess similarity and make generalizations –learn functions, languages, and concepts... from such limited data?

12 Explaining inductive leaps How do people –infer causal relationships –identify the work of chance –predict the future –assess similarity and make generalizations –learn functions, languages, and concepts... from such limited data? What knowledge guides human inferences?

13 Prior knowledge matters when… …using a single datapoint –predicting the future –joint work with Josh Tenenbaum (MIT) …using secondhand data –effects of priors on cultural transmission

14 Outline …using a single datapoint –predicting the future –joint work with Josh Tenenbaum (MIT) …using secondhand data –effects of priors on cultural transmission –joint work with Mike Kalish (Louisiana) Conclusions

15 Outline …using a single datapoint –predicting the future –joint work with Josh Tenenbaum (MIT) …using secondhand data –effects of priors on cultural transmission –joint work with Mike Kalish (Louisiana) Conclusions

16 Predicting the future How often is Google News updated? t = time since last update, t_total = time between updates. What should we guess for t_total given t?

17 Making predictions You encounter a phenomenon that has existed for t units of time. How long will it continue into the future? (i.e. what’s t_total?) We could replace “time” with any other variable that ranges from 0 to some unknown upper limit

18 Everyday prediction problems You read about a movie that has made $60 million to date. How much money will it make in total? You see that something has been baking in the oven for 34 minutes. How long until it’s ready? You meet someone who is 78 years old. How long will they live? Your friend quotes to you from line 17 of his favorite poem. How long is the poem? You see taxicab #107 pull up to the curb in front of the train station. How many cabs in this city?

19 Bayesian inference p(t_total|t) ∝ p(t|t_total) p(t_total) (posterior probability ∝ likelihood × prior)

20 Bayesian inference p(t_total|t) ∝ p(t|t_total) p(t_total) (posterior probability ∝ likelihood × prior) Assume a random sample (0 < t < t_total), so the likelihood is p(t|t_total) = 1/t_total and p(t_total|t) ∝ (1/t_total) p(t_total)

21 Bayesian inference p(t_total|t) ∝ p(t|t_total) p(t_total) (posterior probability ∝ likelihood × prior) Assume a random sample (0 < t < t_total) and the “uninformative” prior p(t_total) ∝ 1/t_total, giving p(t_total|t) ∝ (1/t_total)(1/t_total)

22 Bayesian inference p(t_total|t) ∝ (1/t_total)(1/t_total) (random sampling, “uninformative” prior) What is the best guess for t_total? How about the maximal value of p(t_total|t)? [Plot of p(t_total|t) against t_total: the posterior is maximal at t_total = t]

23 Bayesian inference p(t_total|t) ∝ (1/t_total)(1/t_total) (random sampling, “uninformative” prior) What is the best guess for t_total? Instead, compute t* such that p(t_total > t*|t) = 0.5 [Plot of p(t_total|t) against t_total]

24 Bayesian inference p(t_total|t) ∝ (1/t_total)(1/t_total) (random sampling, “uninformative” prior) What is the best guess for t_total? Compute t* such that p(t_total > t*|t) = 0.5. This yields Gott’s Rule: p(t_total > t*|t) = 0.5 when t* = 2t, i.e. the best guess for t_total is 2t
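A quick numerical check of Gott’s Rule, a sketch under the slide’s assumptions rather than code from the talk: with the 1/t_total prior and the random-sampling likelihood, the posterior median comes out at 2t.

```python
import numpy as np

# Posterior under random sampling and the "uninformative" prior:
# p(t_total | t) proportional to (1/t_total) * (1/t_total) for t_total > t.
t = 10.0                                     # observed time so far (arbitrary units)
grid = np.linspace(t, 1000 * t, 1_000_000)   # t_total values above the observation
posterior = 1.0 / grid**2                    # unnormalized posterior

cdf = np.cumsum(posterior)
cdf /= cdf[-1]                               # normalize to get the posterior CDF
t_star = grid[np.searchsorted(cdf, 0.5)]     # posterior median t*
print(t_star / t)                            # close to 2, i.e. t* = 2t (Gott's Rule)
```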

25 Applying Gott’s rule t ≈ 4000 years, t* ≈ 8000 years

26 Applying Gott’s rule t ≈ 130,000 years, t* ≈ 260,000 years

27 Predicting everyday events You meet someone who is 35 years old. How long will they live? –“70 years” seems reasonable Not so simple: –You meet someone who is 78 years old. How long will they live? –You meet someone who is 6 years old. How long will they live?

28 The effects of priors Different kinds of priors p(t_total) are appropriate in different domains. Uninformative: p(t_total) ∝ 1/t_total

29 The effects of priors Different kinds of priors p(t_total) are appropriate in different domains. [Plots of two example priors: e.g. wealth, e.g. height]

30 The effects of priors

31 Evaluating human predictions Different domains with different priors: –a movie has made $60 million [power-law] –your friend quotes from line 17 of a poem [power-law] –you meet a 78-year-old man [Gaussian] –a movie has been running for 55 minutes [Gaussian] –a U.S. congressman has served for 11 years [Erlang] Prior distributions derived from actual data Use 5 values of t for each People predict t_total
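The logic of the model predictions can be sketched in code: compute the posterior median t* for a single observed t under priors from the three families named above. The parameter values and grid limits below are assumptions for illustration, not the distributions estimated from real data in the study.

```python
import numpy as np

def median_prediction(t, prior_pdf, upper):
    """Posterior median of t_total given one observation t and likelihood 1/t_total."""
    grid = np.linspace(t, upper, 200_000)   # t_total must be at least t
    post = prior_pdf(grid) / grid           # prior x likelihood (unnormalized)
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, 0.5)]

# Hypothetical priors standing in for the three families on the slide.
power_law = lambda x: x ** -1.3                           # e.g. movie grosses
gaussian = lambda x: np.exp(-0.5 * ((x - 75) / 15) ** 2)  # e.g. life spans
erlang = lambda x: x * np.exp(-x / 5.0)                   # e.g. terms served

print(median_prediction(60, power_law, upper=100_000))    # multiplicative-style prediction
print(median_prediction(18, gaussian, upper=150))         # prediction pulled toward the mean
print(median_prediction(11, erlang, upper=200))           # roughly "observed plus a constant"
```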

32 [Plots comparing people’s predictions against predictions from the parametric prior, the empirical prior, and Gott’s rule]

33 Nonparametric priors You arrive at a friend’s house, and see that a cake has been in the oven for 34 minutes. How long will it be in the oven?

34 You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign? No direct experience

35 You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign? How long did the typical pharaoh reign in ancient Egypt? No direct experience

36 …using a single datapoint People produce accurate predictions for the duration and extent of everyday events Strong prior knowledge –form of the prior (power-law or exponential) –distribution given that form (parameters) –non-parametric distribution when necessary Reveals a surprising correspondence between probabilities in the mind and in the world

37 Outline …using a single datapoint –predicting the future –joint work with Josh Tenenbaum (MIT) …using secondhand data –effects of priors on cultural transmission –joint work with Mike Kalish (Louisiana) Conclusions

38 Cultural transmission Most knowledge is based on secondhand data Some things can only be learned from others –cultural objects transmitted across generations Cultural transmission provides an opportunity for priors to influence cultural objects

39 Iterated learning (Briscoe, 1998; Kirby, 2001) Each learner sees data, forms a hypothesis, produces the data given to the next learner cf. the playground game “telephone”

40 Objects of iterated learning Languages Religious concepts Social norms Myths and legends Causal theories

41 Explaining linguistic universals Human languages are a subset of all logically possible communication schemes –universal properties common to all languages (Comrie, 1981; Greenberg, 1963; Hawkins, 1988) Two questions: –why do linguistic universals exist? –why are particular properties universal?

42 Explaining linguistic universals Traditional answer: –linguistic universals reflect innate constraints specific to a system for acquiring language Alternative answer: –iterated learning imposes “information bottleneck” –universal properties survive this bottleneck (Briscoe, 1998; Kirby, 2001)

43 Analyzing iterated learning What are the consequences of iterated learning? [2×2 grid of approaches: simulations with complex algorithms (Kirby, 2001; Brighton, 2002; Smith, Kirby, & Brighton, 2003); analytic results with simple algorithms (Komarova, Niyogi, & Nowak, 2002); analytic results with complex algorithms marked “?”]

44 Iterated Bayesian learning Learners are rational Bayesian agents (covers a wide range of learning algorithms) [Diagram: each learner infers a hypothesis via p(h|d), then generates data via p(d|h) for the next learner]

45 Markov chains Transition matrix P(x^(t+1)|x^(t)) Variables: x^(t+1) is independent of the history given x^(t) Converges to a stationary distribution under easily checked conditions [Diagram: chain of states x → x → x → …]
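As a concrete illustration (the transition probabilities are invented, not from the talk), a small chain run forward converges to the same stationary distribution obtained from the transition matrix’s eigenvector:

```python
import numpy as np

# A 3-state Markov chain: the next state depends only on the current state.
P = np.array([[0.5, 0.4, 0.1],   # transition matrix P(x(t+1) | x(t)); rows sum to 1
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

rng = np.random.default_rng(0)
x, counts = 0, np.zeros(3)
for _ in range(100_000):
    x = rng.choice(3, p=P[x])    # sample the next state given only the current one
    counts[x] += 1
print(counts / counts.sum())     # empirical distribution over visited states

# Stationary distribution: left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
print(pi / pi.sum())             # matches the empirical distribution above
```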

46 Markov chain Monte Carlo A strategy for sampling from complex probability distributions Key idea: construct a Markov chain which converges to target distribution –e.g. Metropolis algorithm –e.g. Gibbs sampling

47 Gibbs sampling For variables x = x_1, x_2, …, x_n, draw x_i^(t+1) from P(x_i|x_-i), where x_-i = x_1^(t+1), x_2^(t+1), …, x_{i-1}^(t+1), x_{i+1}^(t), …, x_n^(t) (a.k.a. the heat bath algorithm in statistical physics) (Geman & Geman, 1984)
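A minimal sketch of the update rule above, not code from the talk: for a bivariate Gaussian with correlation rho, each full conditional P(x_i|x_-i) is itself Gaussian, so the draws can be made directly.

```python
import numpy as np

# Gibbs sampling for a bivariate standard Gaussian with correlation rho:
# x1 | x2 ~ Normal(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
rho = 0.8
rng = np.random.default_rng(0)
x1, x2 = 0.0, 0.0
samples = []
for _ in range(50_000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))  # draw x1 from P(x1 | x2)
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))  # draw x2 from P(x2 | x1)
    samples.append((x1, x2))

samples = np.array(samples[1_000:])                 # discard burn-in
print(np.corrcoef(samples.T)[0, 1])                 # close to rho: samples follow the target
```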

48 Gibbs sampling (MacKay, 2002)

49 Iterated Bayesian learning Defines a Markov chain on (h,d)

50 Iterated Bayesian learning Defines a Markov chain on (h,d) This Markov chain is a Gibbs sampler for the joint distribution p(d,h) = p(d|h) p(h)

51 Iterated Bayesian learning Defines a Markov chain on (h,d) This Markov chain is a Gibbs sampler for the joint distribution p(d,h) = p(d|h) p(h) Rate of convergence is geometric –Gibbs sampler converges geometrically (Liu, Wong, & Kong, 1995)
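The Gibbs-sampler claim can be checked in a toy simulation; the two-hypothesis prior and likelihood below are made-up numbers for illustration.

```python
import numpy as np

# Iterated Bayesian learning with two hypotheses and binary data.
prior = np.array([0.8, 0.2])          # p(h)
likelihood = np.array([[0.9, 0.1],    # p(d|h): row = hypothesis, column = datum
                       [0.3, 0.7]])

rng = np.random.default_rng(0)
h, visits = 0, np.zeros(2)
for _ in range(200_000):
    d = rng.choice(2, p=likelihood[h])     # current learner produces data from p(d|h)
    posterior = likelihood[:, d] * prior
    posterior /= posterior.sum()           # next learner applies Bayes' rule
    h = rng.choice(2, p=posterior)         # and samples a hypothesis from p(h|d)
    visits[h] += 1
print(visits / visits.sum())               # close to the prior [0.8, 0.2]
```

Alternating these two conditional draws is exactly the Gibbs update for the joint p(d|h) p(h), which is why the hypotheses drift back to the prior regardless of the starting point.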

52 Analytic results Iterated Bayesian learning converges to the joint distribution p(d,h) = p(d|h) p(h) Corollaries: –distribution over hypotheses converges to p(h) –distribution over data converges to p(d) –the proportion of a population of iterated learners with hypothesis h converges to p(h)

53 Implications for linguistic universals Two questions: –why do linguistic universals exist? –why are particular properties universal? Different answers: –existence explained through iterated learning –universal properties depend on the prior Focuses inquiry on the priors of the learners –cultural objects reflect the human mind

54 A method for discovering priors Iterated learning converges to the prior… …so evaluate the prior by reproducing iterated learning in the lab

55 Iterated function learning Each learner sees a set of (x,y) pairs Makes predictions of y for new x values Predictions are data for the next learner [Diagram: data = (x,y) pairs, hypotheses = functions]

56 Function learning in the lab [Screenshot of the task: stimulus, response slider, feedback] Examine iterated learning with different initial data

57 [Plots of learners’ responses across iterations 1–9, for each set of initial data]

58 …using secondhand data Iterated Bayesian learning converges to the prior Constrains explanations of linguistic universals Open questions in Bayesian language evolution –variation in priors –other selective pressures Provides a method for evaluating priors –concepts, causal relationships, languages, …

59 Outline …using a single datapoint –predicting the future …using secondhand data –effects of priors on cultural transmission Conclusions

60 Bayes’ theorem A unifying principle for explaining inductive inferences

61 Bayes’ theorem behavior = f(data, knowledge) [Diagram: data → behavior]

62 Bayes’ theorem behavior = f(data, knowledge) [Diagram: data + knowledge → behavior]

63 Explaining inductive leaps How do people –infer causal relationships –identify the work of chance –predict the future –assess similarity and make generalizations –learn functions, languages, and concepts... from such limited data? What knowledge guides human inferences?

64

65 HHTHT

66 HHHHT

67 What’s the computational problem? Not p(HHTHT|random) but p(random|HHTHT): an inference about the structure of the world
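The distinction on this slide can be made concrete with a toy comparison; the biased-coin alternative and the 50/50 prior below are assumptions for illustration. Both sequences are equally probable under a fair coin, but the posterior probability that they were randomly generated differs once an alternative hypothesis is in play.

```python
import numpy as np

def p_sequence(seq, p_heads):
    """Probability of a specific H/T sequence given the heads probability."""
    return np.prod([p_heads if c == "H" else 1 - p_heads for c in seq])

prior_random = 0.5                        # hypothetical prior on "random" vs. "biased"
for seq in ["HHTHT", "HHHHT"]:
    like_random = p_sequence(seq, 0.5)    # p(seq | random): 1/32 for every sequence
    like_biased = p_sequence(seq, 0.9)    # p(seq | biased toward heads), an assumed alternative
    post_random = like_random * prior_random / (
        like_random * prior_random + like_biased * (1 - prior_random))
    print(seq, like_random, round(post_random, 3))
```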

68

69 An example: Gaussians If we assume… –data, d, is a single real number, x –hypotheses, h, are means of a Gaussian, μ –prior, p(μ), is Gaussian(μ_0, σ_0^2) …then p(x_{n+1}|x_n) is Gaussian(μ_n, σ_x^2 + σ_n^2)

70 μ_0 = 0, σ_0^2 = 1, x_0 = 20 Iterated learning results in rapid convergence to the prior
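A sketch of the Gaussian example in code; σ_x^2 is not given on the slide, so the value below is an assumption, and the same loop (with a slope parameter in place of μ) covers the linear-regression example on the next slides.

```python
import numpy as np

# Iterated learning with Gaussian hypotheses: each learner sees one number x,
# computes the posterior over the mean mu, samples mu, and generates the next x.
mu0, var0 = 0.0, 1.0        # prior on mu: Gaussian(mu_0 = 0, sigma_0^2 = 1)
var_x = 1.0                 # assumed data noise sigma_x^2 (not specified on the slide)
x = 20.0                    # initial data x_0 = 20, far out in the prior's tail

rng = np.random.default_rng(0)
mus = []
for _ in range(20_000):
    var_n = 1.0 / (1.0 / var0 + 1.0 / var_x)   # posterior variance after one observation
    mu_n = var_n * (mu0 / var0 + x / var_x)    # posterior mean
    mu = rng.normal(mu_n, np.sqrt(var_n))      # learner samples a hypothesis
    x = rng.normal(mu, np.sqrt(var_x))         # and produces data for the next learner
    mus.append(mu)

print(np.mean(mus), np.var(mus))               # close to (0, 1): hypotheses match the prior
```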

71 An example: Linear regression Assume –data, d, are pairs of real numbers (x, y) –hypotheses, h, are functions An example: linear regression –hypotheses have slope θ and pass through the origin –p(θ) is Gaussian(θ_0, σ_0^2) [Plot: at x = 1, the line’s height y equals the slope θ]

72 θ_0 = 1, σ_0^2 = 0.1, y_0 = -1 [Plots of iterated learning for the linear regression example]

73

