Evaluating Models of Computation and Storage in Human Sentence Processing. Thang Luong, Tim J. O’Donnell & Noah D. Goodman. CogACLL 2015.


1 Evaluating Models of Computation and Storage in Human Sentence Processing. Thang Luong, Tim J. O’Donnell & Noah D. Goodman. CogACLL 2015

2 What is computed and what is stored?
A basic question for theories of language representation, processing, and acquisition.
At the sub-word level (O’Donnell, 2015):
– “ness” in pine-scentedness vs. “th” in warmth.
Much empirical and theoretical work exists, e.g. on whether “kick the bucket” is stored as a unit or computed from “kick”, “the”, “bucket”.
Little of this work applies to cognitive datasets.

3 Human Sentence Processing
Probabilistic syntax models, combined with incremental parsing algorithms, predict human reading difficulty:
– Reading times (Roark et al., 2009).
– Eye fixation times (Demberg & Keller, 2008).
No work has examined the influence of storage and computation in syntax.

4 This work
Propose a framework to evaluate C&S models.
Study the influence of storage units in predicting reading difficulty.
Pipeline: C&S models (from maximal computation to maximal storage) → incremental parser → surprisals → mixed-effects analysis → reading difficulty.

5 Models of computation & storage
Three models of computation & storage (C&S), spanning maximal computation to maximal storage:
– Dirichlet-multinomial PCFGs (maximal computation).
– Fragment Grammars (inference-based).
– MAP Adaptor Grammars (maximal storage).
Gold parse trees are assumed to be known, so MAP estimation is possible.

6 C&S Models – Maximal Computation
Dirichlet-Multinomial PCFG (Johnson et al., 2007):
– Storage: minimal abstract units – PCFG rules.
– Computation: maximal.
Puts too little probability mass on frequent structures.
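With gold trees and a symmetric Dirichlet prior, the MAP rule probabilities of a Dirichlet-multinomial PCFG have a closed form. A minimal sketch, assuming a prior alpha > 1 and a toy count mapping; the function name and interface are illustrative, not the paper's code:

```python
from collections import defaultdict

def map_rule_probs(rule_counts, alpha=2.0):
    """MAP estimate of PCFG rule probabilities under a symmetric
    Dirichlet(alpha) prior (sketch; assumes alpha > 1).
    rule_counts: mapping (lhs, rhs) -> count from gold parse trees."""
    lhs_totals = defaultdict(float)  # total rule uses per left-hand side
    lhs_nrules = defaultdict(int)    # number of distinct rules per LHS
    for (lhs, _), c in rule_counts.items():
        lhs_totals[lhs] += c
        lhs_nrules[lhs] += 1
    return {
        (lhs, rhs): (c + alpha - 1.0)
        / (lhs_totals[lhs] + lhs_nrules[lhs] * (alpha - 1.0))
        for (lhs, rhs), c in rule_counts.items()
    }
```

For counts {NP -> dogs: 3, NP -> cats: 1} and alpha = 2, this gives 4/6 and 2/6: frequent rules are smoothed toward uniform, which is why the model under-weights frequent stored structures.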

7 C&S Models – Maximal Storage
MAP Adaptor Grammar (Johnson et al., 2007):
– Storage: DMPCFG rules + maximally specific units.
– Computation: minimal.
Puts probability mass on too many infrequent structures.

8 C&S Models – Inference-based
Fragment grammars (O’Donnell et al., 2009):
– Storage: inference over which rules best explain the data.
  Rules in MAG + rules that rewrite to mixtures of non-terminals and terminals.
– Computation: optimal.
Makes the right trade-off between storage and computation.

9 Human reading time prediction
Pipeline: C&S models → incremental parser → surprisals → mixed-effects analysis → reading difficulty.
Improve our parser to handle different grammars.

10 Surprisal Theory
Lexical predictability of words given contexts (Hale, 2001; Levy, 2008).
– Surprisal value: surprisal(w_i) = −log P(w_i | w_1 … w_{i−1}).
Strong correlation with:
– Eye-tracking time (Demberg and Keller, 2008).
– Self-paced reading time (Roark et al., 2009).
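Given log prefix probabilities from an incremental parser, per-word surprisals are simple differences of consecutive entries. A minimal sketch; the list-based interface is an assumption:

```python
def surprisals(prefix_logprobs):
    """Per-word surprisals (in bits) from log2 prefix probabilities.
    prefix_logprobs[i] = log2 P(w_1 ... w_i); entry 0 is the empty
    prefix, log2(1.0) = 0.0.  Sketch assuming an incremental parser
    supplies these values."""
    return [prefix_logprobs[i - 1] - prefix_logprobs[i]
            for i in range(1, len(prefix_logprobs))]

# surprisals([0.0, -2.0, -5.0]) -> [2.0, 3.0]
```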

11 Incremental Parser
Top-down approach for CFGs (Earley, 1970).
Earley algorithm for PCFGs (Stolcke, 1995):
– Prefix probabilities P(w_1 … w_i).
– Needed to compute surprisal values: surprisal(w_i) = −log [ P(w_1 … w_i) / P(w_1 … w_{i−1}) ].
Our parser: based on Levy (2008)’s parser.
– Additional features to handle different grammars.
– Publicly available.

12 Incremental parser – Features
Handle arbitrary PCFG rewrite rules:
– MAP Adaptor Grammars: VP -> kick the bucket
– Fragment Grammars: VP -> kick NP
Handle large grammars:
Grammar   # rules
DM-PCFG   75K
FG        146K
MAG       778K
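One way such rules could be encoded so that right-hand sides mixing terminals and non-terminals (as in MAG and FG rules) parse uniformly. The lowercase-means-terminal convention and the helper name are toy assumptions, not the parser's actual grammar format:

```python
from collections import namedtuple

# Each RHS symbol is tagged as terminal or non-terminal, so stored
# fragments like "VP -> kick the bucket" and partially abstract rules
# like "VP -> kick NP" share one representation.
Sym = namedtuple("Sym", "label is_terminal")

def parse_rule(line):
    """Parse a rule like 'VP -> kick NP', assuming (toy convention)
    that lowercase tokens are terminals."""
    lhs, rhs = line.split("->")
    syms = tuple(Sym(tok, tok.islower()) for tok in rhs.split())
    return lhs.strip(), syms
```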

13 Human reading time prediction
Pipeline: C&S models → incremental parser → surprisals → mixed-effects analysis → reading data.
Show consistent results in two different corpora.

14 Experiments
Grammars: DMPCFG, MAG, FG – trained on WSJ (sentences of length < 40 words).
Corpora:
– Eye-tracking: Dundee corpus (Kennedy & Pynte, 2005).
– Self-paced reading: MIT corpus (Bachrach et al., 2009).
Corpus   Sent    Word   Subj   Orig   Filtered
Dundee   2,370   58K    10     586K   229K
MIT      199     3.5K   23     81K    70K

15 Model Prediction Evaluation
How well do models predict words in the test data?
– Average surprisal values (lower is better).
Ranking: FG ≻ DMPCFG ≻ MAG
          Dundee   MIT
DMPCFG    6.82     6.80
MAG       6.91     6.95
FG        6.35

16 Evaluation on Cognitive Data
How well do models explain reading times?
– Mixed-effects analysis.
– Surprisal values for DMPCFG, MAG, FG as predictors.
Settings: similar to (Fossum and Levy, 2012).
– Random effects: by-word and by-subject intercepts.
– Eye fixation and reading times: log-transformed.
Nested model comparisons with χ² tests.
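A nested-model comparison is a likelihood-ratio χ² test; for one extra predictor (df = 1) the p-value has a closed form via the complementary error function. A sketch with hypothetical log-likelihood values (real ones would come from fitted mixed-effects models):

```python
import math

def lr_test_df1(loglik_base, loglik_full):
    """Likelihood-ratio chi-squared test for one extra predictor
    (df = 1): the chi2 survival function with one degree of freedom
    equals erfc(sqrt(stat / 2)).  Sketch; the log-likelihoods are
    the maximized values of two nested fitted models."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_base))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, a gain of 5 log-likelihood units gives χ² = 10, well past the 99% significance threshold used in the tables that follow.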

17 Additive tests
Effect of each grammar predictor.
Ranking: FG ≻ DMPCFG ≻ MAG
χ²              Dundee    MIT
Base + DMPCFG   70.9**    38.5**
Base + MAG      10.9*     0.1
Base + FG       118.3**   62.5**
(**: 99% significant, *: 95% significant)

18 Subtractive tests
Effect each grammar predictor explains above and beyond the others.
Ranking: FG ≻ MAG ≻ DMPCFG
– DMPCFG doesn’t explain above and beyond FG.
χ²              Dundee   MIT
Full - DMPCFG   4.0*     3.5*
Full - MAG      14.3**   23.6**
Full - FG       62.5**   42.9**
(**: 99% significant, *: 95% significant)

19 Mixed-effects coefficients
Full setting: with predictors from all models.
MAG is negatively correlated with reading time:
– Syntax is still mostly compositional.
– Only a small fraction of structures are stored.
          Dundee     MIT
DMPCFG    0.00195    0.00324
MAG       -0.00141   -0.00282
FG        0.00549    0.00697

20 Conclusion
Study the effect of computation & storage in predicting reading difficulty, with models spanning maximal computation (Dirichlet-multinomial PCFGs), inference-based storage (Fragment Grammars), and maximal storage (MAP Adaptor Grammars).
Provide a framework for future research in human sentence processing.
Thank you!

21 Earley parsing algorithm
Top-down approach developed by Earley (1970):
– States – pending derivations: [l, r] X ↦ Y . Z
– Operations – state transitions: predict, scan, complete.
Example: grammar S ↦ NP VP, VP ↦ V NP, NP ↦ dogs, NP ↦ cats, V ↦ chase; input “dogs chase cats” (positions 0–3).
Position 0 (predict): Root ↦ . S; S ↦ . NP VP; NP ↦ . dogs
Position 1 (scan, complete, predict): NP ↦ dogs .; S ↦ NP . VP; VP ↦ . V NP; V ↦ . chase
Position 2 (scan, complete, predict): V ↦ chase .; VP ↦ V . NP; NP ↦ . cats
Position 3 (scan, complete): NP ↦ cats .; VP ↦ V NP .; S ↦ NP VP .; Root ↦ S .
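The predict/scan/complete loop can be sketched as a minimal, non-probabilistic Earley recognizer. A sketch under assumptions: no ε-rules, and a dict-based grammar encoding chosen for illustration:

```python
def earley_recognize(words, grammar, root="S"):
    """Minimal Earley recognizer (predict/scan/complete only; no
    probabilities, no epsilon-rules).  grammar: dict mapping a
    non-terminal to a list of RHS tuples; symbols without a grammar
    entry are terminals.  A state is (lhs, rhs, dot, origin)."""
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("ROOT", (root,), 0, 0))
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:  # predict
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < len(words) and words[i] == sym:  # scan
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: advance states waiting on lhs at origin
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return ("ROOT", (root,), 1, 0) in chart[len(words)]
```

On the slide's toy grammar this accepts "dogs chase cats" and rejects "dogs cats".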

22 Earley algorithm for PCFGs (Stolcke, 1995)
Earley path: a sequence of states linked by Earley operations (predict, scan, complete).
– Partial derivations correspond to Earley paths.
– P(d) = product of rule probabilities used in predicted states.
Prefix probability: sum of derivation probabilities across all Earley paths d_1, d_2, …, d_n yielding a prefix w_0 w_1 … w_i:
P(w_0 w_1 … w_i) = P(d_1) + P(d_2) + … + P(d_n).
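For a small finite PCFG, prefix probabilities can be checked by brute-force enumeration rather than Stolcke's dynamic program. A sketch using a toy dogs/cats grammar of the kind shown on the previous slide; the depth bound and grammar encoding are illustrative assumptions:

```python
import itertools
import math

def string_probs(grammar, root="S", max_depth=6):
    """Enumerate derivations of a toy PCFG up to a depth bound and
    return summed string probabilities (sketch; real parsers use
    Stolcke's dynamic program, not enumeration).
    grammar: dict mapping non-terminal -> list of (rhs_tuple, prob)."""
    def expand(sym, depth):
        if sym not in grammar:          # terminal symbol
            yield (sym,), 1.0
            return
        if depth == 0:                  # truncate deep recursion
            return
        for rhs, p in grammar[sym]:
            parts = [list(expand(s, depth - 1)) for s in rhs]
            for combo in itertools.product(*parts):
                words = tuple(w for ws, _ in combo for w in ws)
                yield words, p * math.prod(q for _, q in combo)
    totals = {}
    for words, p in expand(root, max_depth):
        totals[words] = totals.get(words, 0.0) + p
    return totals

def prefix_prob(totals, prefix):
    """P(prefix): sum of string probabilities over all completions
    (exact for a finite language, as here)."""
    return sum(p for s, p in totals.items() if s[:len(prefix)] == prefix)
```

With NP ↦ dogs | cats (0.5 each), "dogs chase cats" gets probability 0.25 and the prefix "dogs" gets 0.5, so the surprisal of "chase" after "dogs" is −log(0.5/0.5) = 0, as expected for a deterministic continuation.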

