1 The dynamics of incremental sentence comprehension: a situation-space model. Stefan Frank, Department of Cognitive, Perceptual and Brain Sciences, University College London

2 sentence comprehension · cognitive modelling · information theory

3 Sentence comprehension as mental simulation
The mental representation of a sentence's meaning is not a symbolic structure, but an analogical, modal simulation of the described state of affairs (e.g., Barsalou, 1999; Zwaan, 2004)
It is comparable to the result of directly experiencing the described situation
Central property of analogical representations: direct inference

4 Sentence comprehension as mental simulation
Stanfield & Zwaan (2001): participants read "John put the pen in the cup" or "John put the pen in the drawer", then saw a picture of a pen and answered "Was this object mentioned in the sentence?"
Responses were fast (fast RT) when the pictured orientation matched the orientation implied by the sentence
Direct inference results from the analogical nature of mental representation

5 A model of sentence comprehension (Frank, Haselager & Van Rooij, 2009)
A formalization of analogical representations and direct inference
Any state of the world corresponds to a vector in situation space
These representations are analogical: relations between the vectors mirror probabilistic relations between the represented situations
In practice, the model is restricted to a microworld

6 The microworld: concepts and atomic situations
22 concepts, e.g.,
–people: charlie, heidi, sophia
–games: chess, hide&seek, soccer
–toys: puzzle, doll, ball
–places: bathroom, bedroom, street, playground
–predicates: play, place, win, lose
44 atomic situations, e.g.,
–play(charlie, chess)
–win(sophia)
–place(heidi, bedroom)

7 The microworld: states of the world
Atomic situations and Boolean combinations thereof refer to states of the world:
–play(sophia, hide&seek) ∧ place(sophia, playground): "sophia plays hide-and-seek in the playground"
–lose(charlie) ∨ lose(heidi) ∨ lose(sophia): "someone loses"
Interdependencies among states of the world affect the probabilities of microworld states:
–sophia and heidi are usually at the same place
–the person who wins must play a game

8 Representing microworld situations
Automatic generation of 25,000 observations of microworld states
An unsupervised competitive layer yields a situation vector μ(p) ∈ [0,1]¹⁵⁰ for each atomic situation p
Any state of the world can be represented by Boolean operations on vectors: μ(¬p), μ(p ∧ q), μ(p ∨ q)
The probability of a situation can be estimated from its representation: P(z) ≈ Σᵢ μᵢ(z) / 150
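As an illustration only (the slide does not specify the vector-combination rules), here is a minimal Python sketch assuming standard element-wise fuzzy-Boolean operators and random vectors in place of the competitive layer's learned output:

```python
import numpy as np

# Hypothetical situation vectors standing in for the competitive-layer output;
# each element is a membership value in [0, 1] over 150 situation-space dimensions.
rng = np.random.default_rng(0)
mu_p = rng.uniform(0.0, 1.0, 150)   # e.g. mu(play(charlie, chess))
mu_q = rng.uniform(0.0, 1.0, 150)   # e.g. mu(place(heidi, bedroom))

# Assumed element-wise fuzzy-Boolean combination rules (not given on the slide).
def neg(a):          # mu(not p)
    return 1.0 - a

def conj(a, b):      # mu(p and q)
    return a * b

def disj(a, b):      # mu(p or q)
    return a + b - a * b

# Probability estimate from the slide: P(z) ≈ sum_i mu_i(z) / 150,
# i.e. the mean of the vector elements.
def prob(mu_z):
    return mu_z.mean()

print(prob(mu_p), prob(conj(mu_p, mu_q)), prob(disj(mu_p, mu_q)))
```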

9 Representing microworld situations: direct inference
The conditional probability of one situation given another can be estimated from the two vectors: P(p|z) = P(p ∧ z) / P(z)
From the representations μ(play(sophia, soccer)), μ(play(sophia, ball)), μ(play(sophia, puzzle)) it follows that
–P(play(sophia, ball) | play(sophia, soccer)) ≈ .99
–P(play(sophia, puzzle) | play(sophia, soccer)) ≈ 0
Representing sophia playing soccer is also representing her playing with the ball, not the puzzle
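A small self-contained sketch of this direct-inference computation, using made-up toy vectors rather than the model's learned representations:

```python
import numpy as np

def prob(mu):                 # P(z) ≈ mean of the situation-vector elements
    return mu.mean()

def conj(a, b):               # assumed element-wise conjunction
    return a * b

def cond_prob(mu_p, mu_z):    # P(p | z) = P(p ∧ z) / P(z)
    return prob(conj(mu_p, mu_z)) / prob(mu_z)

# Toy vectors: 'soccer' and 'ball' overlap strongly, 'puzzle' does not.
mu_soccer = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
mu_ball   = np.array([1.0, 1.0, 0.9, 0.0, 0.0])
mu_puzzle = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

print(cond_prob(mu_ball, mu_soccer))    # high: playing soccer implies playing with the ball
print(cond_prob(mu_puzzle, mu_soccer))  # ~0: playing soccer excludes playing with the puzzle
```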

10 The microlanguage
40 words; 13,556 possible sentences, e.g.,
–girl plays chess
–ball is played with by charlie
–heidi loses to sophia at hide-and-seek
–someone wins
Each sentence has
–a unique semantics (represented by a situation vector)
–a probability of occurrence (higher for shorter sentences)

11 A model of the comprehension process
A simple recurrent network (SRN) maps microlanguage sentences onto the vectors of the corresponding situations
Displays semantic systematicity (in the sense of Fodor & Pylyshyn, 1988; Hadley, 1994)
Architecture: input layer (40 units: words) → hidden layer (120 units: word sequences) → output layer (150 units: situation vectors)
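A minimal sketch of the forward pass of an SRN with these layer sizes; the random weights, sigmoid activations, and one-hot word coding are illustrative assumptions, not details taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
N_IN, N_HID, N_OUT = 40, 120, 150          # words, word sequences, situation vectors

W_ih = rng.normal(0, 0.1, (N_HID, N_IN))   # input -> hidden
W_hh = rng.normal(0, 0.1, (N_HID, N_HID))  # context (previous hidden state) -> hidden
W_ho = rng.normal(0, 0.1, (N_OUT, N_HID))  # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def comprehend(word_indices):
    """Process a sentence word by word; return the output (situation) vector after each word."""
    h = np.zeros(N_HID)                    # initial context
    outputs = []
    for w in word_indices:
        x = np.zeros(N_IN)
        x[w] = 1.0                         # one-hot coding of the current word
        h = sigmoid(W_ih @ x + W_hh @ h)   # hidden state combines current word and context
        outputs.append(sigmoid(W_ho @ h))  # current estimate of the situation vector
    return outputs

# e.g. a three-word sentence such as "heidi plays chess", coded as word indices
print(len(comprehend([3, 17, 8])), "output vectors, each of length", N_OUT)
```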

12 Simulated word-reading time
There is no sense of processing a word over time in the standard SRN
Addition: the update of the output vector is a dynamical process, expressed by a differential equation (Frank, in press)
This yields a processing time for each word: simulated reading times
These word-processing times are compared to formal measures of the amount of information conveyed by each word
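The differential equation itself is not given on the slide (it is in Frank, in press); the sketch below only illustrates the general idea under an assumed settling dynamics, where the simulated reading time is the number of steps the output vector needs to approach its new target:

```python
import numpy as np

def settle(a_prev, a_target, rate=0.2, tol=0.05, max_steps=1000):
    """Euler integration of an assumed da/dt = rate * (target - a) dynamics.
    Returns the settled vector and the number of steps, taken as simulated reading time."""
    a = a_prev.copy()
    for step in range(1, max_steps + 1):
        a += rate * (a_target - a)
        if np.max(np.abs(a_target - a)) < tol:
            return a, step
    return a, max_steps

rng = np.random.default_rng(2)
a_before = rng.uniform(0, 1, 150)   # output vector before the current word
a_after  = rng.uniform(0, 1, 150)   # SRN's target output after the current word

_, reading_time = settle(a_before, a_after)
print("simulated reading time (steps):", reading_time)
```

Under this assumption, a word that changes the situation representation more takes more steps to settle, and so receives a longer simulated reading time.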

13 Word information and reading time
Assumption: human linguistic competence is captured by probabilistic language models
Such models give rise to formal measures of the amount of information a word conveys
The more information a word conveys, the more cognitive effort is involved in processing it
This leads to longer reading times on that word

14 Word information and expectation
1a) It is raining cats and … – the upcoming word ("dogs") is highly expected
1b) She is training cats and dogs – here the word "dogs" is less expected
These expectations arise from knowledge of linguistic forms

15 Word information and expectation
Syntactic surprisal (Hale, 2001; Levy, 2008)
–formalization of a word's unexpectedness
–a measure of word information that follows from the word's probability given the sentence so far: −log P(wᵢ₊₁ | w₁,…,wᵢ), under a particular probabilistic language model
Any reasonably accurate language model estimates surprisal values that predict word-reading times (Demberg & Keller, 2008; Smith & Levy, 2008; Frank, 2009; Wu et al., 2010)
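A toy illustration of the surprisal computation; the bigram model and the miniature corpus below are stand-ins for whatever probabilistic language model is assumed:

```python
import math
from collections import Counter, defaultdict

# Tiny toy corpus standing in for the training data of a language model (illustrative only).
corpus = [
    "it is raining cats and dogs",
    "it is raining hard",
    "she is training cats and dogs",
]

bigrams = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for w1, w2 in zip(words, words[1:]):
        bigrams[w1][w2] += 1

def surprisal(prev_word, word):
    """-log2 P(word | prev_word) under the bigram model (no smoothing; assumes the bigram was observed)."""
    counts = bigrams[prev_word]
    p = counts[word] / sum(counts.values())
    return -math.log2(p)

print(surprisal("raining", "cats"))  # fairly expected after "raining" in this corpus
print(surprisal("is", "training"))   # less expected than "raining" after "is"
```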

16 Word information and uncertainty about the rest of the sentence
2a) It is raining … – high uncertainty about how the sentence continues
After the word cats, uncertainty is low: a high uncertainty reduction

17 Word information and uncertainty about the rest of the sentence
2a) It is raining … – after cats, uncertainty is low: high uncertainty reduction
2b) She is training … – high uncertainty; after cats, uncertainty remains high: low uncertainty reduction
These uncertainties arise from knowledge of linguistic forms

18 Word information and uncertainty about the rest of the sentence
Syntactic entropy
–formalization of the amount of uncertainty about the rest of the sentence
–can be computed from a probabilistic language model
Entropy reduction is an alternative measure of the amount of information a word conveys (Hale, 2003, 2006)
Predicts word-reading times independently of surprisal (Frank, 2010)
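A sketch of entropy and entropy reduction over sentence completions, assuming a small explicit distribution over complete sentences (in the model, these probabilities come from the microlanguage; the values below are made up):

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a distribution, normalized over the given completions."""
    total = sum(probs)
    return -sum(p / total * math.log2(p / total) for p in probs if p > 0)

# Assumed occurrence probabilities of complete sentences (illustrative values).
sentences = {
    "it is raining cats and dogs":   0.4,
    "it is raining hard":            0.4,
    "she is training cats and dogs": 0.2,
}

def completions(prefix):
    """Probabilities of all complete sentences compatible with the sentence so far."""
    return [p for s, p in sentences.items() if s.startswith(prefix)]

h_before = entropy(completions("it is raining"))
h_after  = entropy(completions("it is raining cats"))
print("entropy reduction for 'cats':", h_before - h_after)
```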

19 World knowledge and word expectation
3a) The brilliant paper was immediately accepted – "accepted" has low semantic surprisal
3b) The terrible paper was immediately accepted – "accepted" has high semantic surprisal
Traxler et al. (2000): words take longer to read if they are less expected given the situation described so far
These expectations arise from knowledge of the world

20 World knowledge and uncertainty about the rest of the sentence
4a) The brilliant paper was immediately … – low semantic entropy (acceptance is nearly certain)
4b) The mediocre paper was immediately … – high semantic entropy (accepted and rejected are both plausible)

21 World knowledge and uncertainty about the rest of the sentence
Once the final word (accepted or rejected) is read:
4a) The brilliant paper was immediately … – low semantic entropy reduction
4b) The mediocre paper was immediately … – high semantic entropy reduction
These uncertainties arise from knowledge of the world

22 Syntactic versus semantic word information
–Source of knowledge: language (syntactic) vs. the world (semantic)
–Probabilities of: word sequences (syntactic) vs. states of the world (semantic)
–Cognitive task: sentence recognition (syntactic) vs. simulation of the described situation (semantic)

23 Word-information measures in the sentence-comprehension model
For each word of each microlanguage sentence, four information values can be computed
Syntactic surprisal and syntactic entropy reduction follow directly from the microlanguage sentences' occurrence probabilities
Semantic surprisal and semantic entropy reduction follow from the probabilities of the situations described by the sentences (estimated from their situation vectors)

24 Computing semantic surprisal
The sentence so far, w₁,…,wᵢ, is compatible with a set of complete sentences w₁,…,wᵢ,…
Each of these complete sentences describes a situation (e.g., sit₁, sit₂, sit₃, sit₄), each with its own situation vector
These vectors combine into the vector for the disjunction of situations: sit₁ ∨ sit₂ ∨ sit₃ ∨ sit₄

25 Computing semantic surprisal
After the next word, the compatible complete sentences w₁,…,wᵢ₊₁,… form a subset, describing only some of the situations (e.g., sit₂ and sit₄)
Their vectors combine into the vector for the disjunction sit₂ ∨ sit₄

26 Computing semantic surprisal
From the two disjunction vectors, estimate the conditional probability P(sit₂ ∨ sit₄ | sit₁ ∨ sit₂ ∨ sit₃ ∨ sit₄)
Semantic surprisal of word wᵢ₊₁: −log P(sit₂ ∨ sit₄ | sit₁ ∨ sit₂ ∨ sit₃ ∨ sit₄)
Computing semantic entropy reduction is trickier, but also possible
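Putting the pieces together, a sketch of the semantic-surprisal computation; the element-wise fuzzy operators and the random stand-in vectors are assumptions, as above:

```python
import math
from functools import reduce
import numpy as np

def disj(a, b):          # assumed element-wise disjunction
    return a + b - a * b

def conj(a, b):          # assumed element-wise conjunction
    return a * b

def prob(mu):            # P(z) ≈ mean of the situation-vector elements
    return mu.mean()

rng = np.random.default_rng(3)
sits = {name: rng.uniform(0, 1, 150) for name in ["sit1", "sit2", "sit3", "sit4"]}

# Situations compatible with the sentence so far, and with the sentence after the next word.
before = reduce(disj, [sits[s] for s in ["sit1", "sit2", "sit3", "sit4"]])
after  = reduce(disj, [sits[s] for s in ["sit2", "sit4"]])

# Semantic surprisal of the next word: -log P(sit2 ∨ sit4 | sit1 ∨ sit2 ∨ sit3 ∨ sit4)
p_cond = prob(conj(after, before)) / prob(before)
print("semantic surprisal:", -math.log2(p_cond))
```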

27 Results
Nested linear regression; predictors: semantic surprisal, semantic entropy reduction, syntactic surprisal, word position, syntactic entropy reduction
For each predictor: coefficient and R², all p < 10⁻⁸

28 Conclusions: mental simulation, word information, and processing time
Semantic word information, formalized with respect to world knowledge, provides a formal basis for the notion of mental simulation
The sentence-comprehension model correctly predicts slower processing of more informative words
This holds irrespective of the information source (syntax/semantics) and the information measure (surprisal/entropy reduction)

29 More conclusions: learning syntax
Words that convey more syntactic information take longer to process: the SRN is sensitive to sentence probabilities
But sentence probabilities are irrelevant to the network's task of mapping sentences to situations
No part of the model is meant to learn anything about syntax; it is not a probabilistic language model
Merely learning the sentence–situation mapping can result in the acquisition of useful syntactic knowledge

