Evaluating Models of Computation and Storage in Human Sentence Processing. Thang Luong, Tim J. O’Donnell & Noah D. Goodman. CogACLL 2015.

Presentation transcript:

Evaluating Models of Computation and Storage in Human Sentence Processing. Thang Luong, Tim J. O’Donnell & Noah D. Goodman. CogACLL 2015.

What is computed and what is stored?
A basic question for theories of language representation, processing, and acquisition.
At the sub-word level (O’Donnell, 2015):
– “ness” in pine-scentedness vs. “th” in warmth.
Much empirical and theoretical work exists, but little of it has been applied to cognitive datasets.
[Slide figure: computation (“kick” + “the” + “bucket”) vs. storage (“kick the bucket” as a single stored unit).]

Human Sentence Processing
Probabilistic syntax models have been used to predict human reading difficulty:
– Reading times: (Roark et al., 2009).
– Eye fixation times: (Demberg & Keller, 2008).
[Slide diagram: probabilistic syntax models + incremental parsing algorithms → human reading difficulty.]
No work has examined the influence of storage and computation in syntax.

This work
Propose a framework to evaluate C&S models.
Study the influence of storage units in predicting reading difficulty.
[Pipeline: C&S models (maximal computation … maximal storage) → incremental parser → surprisals → mixed-effects analysis → reading difficulty.]

Models of computation & storage
Three models of computation & storage (C&S).
Gold parse trees are assumed to be known:
– Can do MAP estimation.
[Spectrum: maximal computation (Dirichlet-multinomial PCFGs) … Fragment Grammars … maximal storage (MAP Adaptor Grammars).]

C&S Models – Maximal Computation
Dirichlet-Multinomial PCFG (Johnson et al., 2007):
– Storage: minimal abstract units – PCFG rules.
– Computation: maximal.
Problem: puts too little probability mass on frequent structures.
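To make the MAP-estimation step concrete, here is a minimal sketch of estimating PCFG rule probabilities from gold parse trees with Dirichlet pseudocount smoothing. The toy trees, the value of alpha, and the use of a posterior-mean (smoothed relative-frequency) estimate in place of the paper's exact MAP procedure are all assumptions for illustration.

```python
from collections import defaultdict

# Toy gold "treebank": each tree is (label, children...); leaves are strings.
# These example trees and the pseudocount alpha are illustrative, not from the paper.
TREES = [
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked"))),
    ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "slept"))),
]

def count_rules(tree, counts):
    """Recursively count CFG rules (lhs -> child labels) in a gold tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1          # preterminal -> word
        return
    counts[(label, tuple(c[0] for c in children))] += 1
    for c in children:
        count_rules(c, counts)

counts = defaultdict(int)
for t in TREES:
    count_rules(t, counts)

# Smoothed relative-frequency estimate under a symmetric Dirichlet(alpha) prior
# over the expansions of each left-hand side.
alpha = 1.0
lhs_totals = defaultdict(float)
lhs_rules = defaultdict(list)
for (lhs, rhs), c in counts.items():
    lhs_totals[lhs] += c
    lhs_rules[lhs].append((rhs, c))

probs = {}
for lhs, rules in lhs_rules.items():
    k = len(rules)  # number of observed expansions of this lhs
    for rhs, c in rules:
        probs[(lhs, rhs)] = (c + alpha) / (lhs_totals[lhs] + alpha * k)

for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```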

C&S Models – Maximal Storage
MAP Adaptor Grammar (Johnson et al., 2007):
– Storage: DMPCFG rules + maximally specific units.
– Computation: minimal.
Problem: puts probability mass on too many infrequent structures.

C&S Models – Inference-based
Fragment Grammars (O’Donnell et al., 2009):
– Storage: inferred – the set of rules that best explains the data; includes the rules in MAG plus rules that rewrite to mixtures of non-terminals and terminals.
– Computation: optimal – makes the right trade-off between storage and computation.

Human reading time prediction
[Pipeline: C&S models → incremental parser → surprisals → mixed-effects analysis → reading difficulty.]
This step: improve our parser to handle the different grammars.

Surprisal Theory
Lexical predictability of words given contexts (Hale, 2001; Levy, 2008).
– Surprisal value: surprisal(w_i) = −log P(w_i | w_1 … w_{i−1}).
Strong correlation with:
– Eye-tracking time: (Demberg and Keller, 2008).
– Self-paced reading time: (Roark et al., 2009).
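A minimal sketch of the surprisal definition; the words and conditional probabilities below are made up for illustration.

```python
import math

# Toy per-word conditional probabilities P(w_i | w_1 ... w_{i-1}) for one sentence.
cond_probs = [("time", 0.05), ("flies", 0.01), ("like", 0.20), ("an", 0.30), ("arrow", 0.02)]

for word, p in cond_probs:
    surprisal = -math.log2(p)   # surprisal in bits; natural log is also common
    print(f"{word:>6}: {surprisal:.2f} bits")
```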

Incremental Parser
Top-down approach for CFGs (Earley, 1970).
Earley algorithm for PCFGs (Stolcke, 1995):
– Prefix probabilities, needed to compute surprisal values:
  surprisal(w_i) = −log [ P(w_1 … w_i) / P(w_1 … w_{i−1}) ].
Our parser is based on Levy (2008)’s parser:
– Additional features to handle different grammars.
– Publicly available.
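Given the prefix probabilities that such a parser exposes, per-word surprisal is the negative log of the ratio of consecutive prefix probabilities. A minimal sketch with hypothetical prefix-probability values (not output from the actual parser):

```python
import math

# Hypothetical prefix probabilities P(w_1 ... w_i) for "dogs chase cats"; index 0
# holds the empty prefix. The values are made up for illustration.
prefix_probs = [1.0, 0.12, 0.03, 0.006]
words = ["dogs", "chase", "cats"]

for i, word in enumerate(words, start=1):
    # surprisal(w_i) = -log [ P(w_1...w_i) / P(w_1...w_{i-1}) ]
    s = -(math.log(prefix_probs[i]) - math.log(prefix_probs[i - 1]))
    print(f"{word:>6}: {s:.3f} nats")
```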

Incremental parser – Features
Handle arbitrary PCFG rewrite rules:
– MAP Adaptor Grammars: VP -> kick the bucket
– Fragment Grammars: VP -> kick NP
Handle large grammars:

  Grammar    # rules
  DM-PCFG    75K
  FG         146K
  MAG        778K
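One way such mixed rules might be represented uniformly is as (left-hand side, right-hand side, probability) triples whose right-hand sides freely mix non-terminals and terminal words. This representation and the probabilities are assumptions for illustration, not the released parser's internals.

```python
# Each rule is (lhs, rhs, probability); the rhs can mix non-terminals and words,
# covering plain PCFG rules, MAG's fully lexicalized stored units, and FG fragments.
rules = [
    ("VP", ("V", "NP"), 0.40),                 # ordinary PCFG rule
    ("VP", ("kick", "the", "bucket"), 0.02),   # MAG-style fully stored unit
    ("VP", ("kick", "NP"), 0.05),              # FG-style fragment
]

NONTERMINALS = {"VP", "V", "NP"}

def is_terminal(symbol):
    return symbol not in NONTERMINALS

for lhs, rhs, p in rules:
    kinds = ["w" if is_terminal(s) else "N" for s in rhs]
    print(f"{lhs} -> {' '.join(rhs)}   (p={p}, symbols: {''.join(kinds)})")
```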

Human reading time prediction
[Pipeline: C&S models → incremental parser → surprisals → mixed-effects analysis → reading data.]
This step: show consistent results on two different corpora.

Experiments
Grammars: DMPCFG, MAG, FG – trained on WSJ (sentence length < 40 words).
Corpora:
– Eye-tracking: Dundee corpus (Kennedy & Pynte, 2005).
– Self-paced reading: MIT corpus (Bachrach et al., 2009).

           Sent    Word   Subj   Orig   Filtered
  Dundee   2,370   58K    10     586K   229K
  MIT      199     3.5K   23     81K    70K

Model Prediction Evaluation
How well do the models predict words in the test data?
– Average surprisal per word on Dundee and MIT (FG: 6.35).
Ranking: FG ≻ DMPCFG ≻ MAG

Evaluation on Cognitive Data
How well do the models explain reading times?
– Mixed-effects analysis.
– Surprisal values from DMPCFG, MAG, and FG as predictors.
Settings similar to (Fossum and Levy, 2012):
– Random effects: by-word and by-subject intercepts.
– Eye fixation and reading times: log-transformed.
Nested model comparisons with χ² tests (see the sketch after this slide).
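A minimal sketch of this kind of nested mixed-effects comparison, using synthetic data and statsmodels. It keeps only a by-subject random intercept (no crossed by-word intercepts) and a single surprisal predictor, so it is a simplification of the paper's setup rather than its actual analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Synthetic stand-in for per-word reading-time data; in the paper this would be
# the Dundee or MIT corpus with surprisals from each grammar as predictors.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "subject": rng.integers(0, 10, n).astype(str),
    "fg_surprisal": rng.gamma(2.0, 2.0, n),
})
df["log_rt"] = 5.5 + 0.02 * df["fg_surprisal"] + rng.normal(0, 0.2, n)

# Nested mixed-effects models with a by-subject random intercept, fit by ML
# (reml=False) so the likelihood-ratio test on fixed effects is valid.
base = smf.mixedlm("log_rt ~ 1", df, groups=df["subject"]).fit(reml=False)
full = smf.mixedlm("log_rt ~ fg_surprisal", df, groups=df["subject"]).fit(reml=False)

# Likelihood-ratio ("chi-squared") test for the added surprisal predictor, 1 df.
lr = 2 * (full.llf - base.llf)
print(f"chi2 = {lr:.1f}, p = {stats.chi2.sf(lr, df=1):.3g}")
```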

Additive tests
Effect of each grammar predictor added to the baseline.
Ranking: FG ≻ DMPCFG ≻ MAG

  χ²              Dundee    MIT
  Base + DMPCFG   70.9**    38.5**
  Base + MAG      10.9*     0.1
  Base + FG       118.3**   62.5**

(**: significant at the 99% level, *: at the 95% level)

Subtractive tests
Whether each grammar predictor explains variance above and beyond the others.
Ranking: FG ≻ MAG ≻ DMPCFG
– DMPCFG doesn’t explain much above and beyond FG.

  χ²              Dundee    MIT
  Full − DMPCFG   4.0*      3.5*
  Full − MAG      14.3**    23.6**
  Full − FG       62.5**    42.9**

(**: significant at the 99% level, *: at the 95% level)

Mixed-effects coefficients
Full setting: predictors from all models.
MAG is negatively correlated with reading time:
– Syntax is still mostly compositional.
– Only a small fraction of structures are stored.
[Slide table: coefficients for DMPCFG, MAG, and FG on Dundee and MIT.]

Conclusion
Studied the effect of computation & storage in predicting reading difficulty, across models from maximal computation (Dirichlet-multinomial PCFGs) through Fragment Grammars to maximal storage (MAP Adaptor Grammars).
Provide a framework for future research in human sentence processing.
Thank you!

Earley parsing algorithm
Top-down approach developed by Earley (1970):
– States – pending derivations: [l, r] X ↦ Y . Z
– Operations – state transitions: predict, scan, complete.
Example: “dogs chase cats” with grammar S ↦ NP VP, VP ↦ V NP, NP ↦ dogs, NP ↦ cats, V ↦ chase.
– Position 0: Root ↦ . S; predict S ↦ . NP VP, NP ↦ . dogs.
– Scan “dogs” → position 1: NP ↦ dogs . ; complete S ↦ NP . VP; predict VP ↦ . V NP, V ↦ . chase.
– Scan “chase” → position 2: V ↦ chase . ; complete VP ↦ V . NP; predict NP ↦ . cats.
– Scan “cats” → position 3: NP ↦ cats . ; complete VP ↦ V NP . , S ↦ NP VP . , Root ↦ S .
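For concreteness, here is a minimal, non-probabilistic Earley recognizer for the slide's toy grammar; the probabilistic bookkeeping needed for prefix probabilities (Stolcke, 1995) is omitted.

```python
# Grammar from the slide: S -> NP VP, VP -> V NP, NP -> dogs | cats, V -> chase
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["dogs"], ["cats"]],
    "V": [["chase"]],
}

def is_nonterminal(sym):
    return sym in GRAMMAR

def earley_recognize(words, start="S"):
    # A state is (lhs, rhs, dot, origin), i.e. [origin, k] lhs -> rhs[:dot] . rhs[dot:]
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("ROOT", (start,), 0, 0))
    for k in range(len(words) + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if is_nonterminal(nxt):
                    # Predict: add a dotted state for every rule expanding nxt at position k.
                    for expansion in GRAMMAR[nxt]:
                        s = (nxt, tuple(expansion), 0, k)
                        if s not in chart[k]:
                            chart[k].add(s); agenda.append(s)
                elif k < len(words) and words[k] == nxt:
                    # Scan: the next input word matches the terminal after the dot.
                    s = (lhs, rhs, dot + 1, origin)
                    if s not in chart[k + 1]:
                        chart[k + 1].add(s)
            else:
                # Complete: advance every state in chart[origin] waiting for this lhs.
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        s = (l2, r2, d2 + 1, o2)
                        if s not in chart[k]:
                            chart[k].add(s); agenda.append(s)
    return ("ROOT", (start,), 1, 0) in chart[len(words)]

print(earley_recognize("dogs chase cats".split()))  # True
```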

Earley algorithm for PCFGs (Stolcke, 1995)
Earley path: a sequence of states linked by Earley operations (predict, scan, complete).
– Partial derivations ↔ Earley paths.
– P(d) = product of the rule probabilities used in the path’s predicted states.
Prefix probability: sum of derivation probabilities across all paths yielding a prefix:
  P(w_0 w_1 … w_i) = Σ_j P(d_j), over all Earley paths d_1, d_2, …, d_n from Root that yield the prefix.
[Slide figure: Earley paths d_1 … d_n for a prefix, each contributing P(d_j) to the prefix probability.]