Anaphora resolution in connectionist networks
Florian Niefind, University of Saarbrücken, Institute for Computational Linguistics
Helmut Weldle, University of Freiburg, Centre for Cognitive Science
Workshop "Representation and Processing of Language", University of Freiburg

Connectionism and Language
Connectionist approaches to language processing have been applied to various levels and processes (Christiansen & Chater, 1999, 2001)
Sequence processing: Simple Recurrent Networks (SRNs: Elman, 1990, 1991, 1993)
Linguistic representation in SRNs:
– Associationist, probabilistic, distribution-sensitive
– Constraint satisfaction
…Grammar? …Syntactic structures?
– Categorization by collocation
– Transitions in phase states

Sentence processing in SRNs
Feed-forward network performing word prediction
Context layer provides memory for syntactic context
Probability derivation: context-dependent word (transition) probabilities
Internal representations: syntactic word classes, context-specific features
[SRN diagram omitted; word labels: "the", "boy", "saw"]
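As a concrete illustration of the word-prediction setup on this slide, here is a minimal Elman-style SRN in Python/NumPy. The vocabulary, hidden-layer size, and toy sentence are illustrative assumptions, not the parameters of the simulations reported later; the network is untrained, so its output carries no learned structure.

```python
# Minimal sketch of an Elman-style Simple Recurrent Network (SRN) for
# next-word prediction. All sizes and the toy sentence are illustrative.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SRN:
    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab_size = vocab_size
        self.W_in  = rng.uniform(-0.5, 0.5, (hidden_size, vocab_size))
        self.W_ctx = rng.uniform(-0.5, 0.5, (hidden_size, hidden_size))
        self.W_out = rng.uniform(-0.5, 0.5, (vocab_size, hidden_size))
        self.context = np.zeros(hidden_size)     # copy of the previous hidden state

    def step(self, word_index):
        x = np.zeros(self.vocab_size)
        x[word_index] = 1.0                      # localist (one-hot) word encoding
        hidden = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = hidden.copy()             # context layer = memory of syntactic context
        return softmax(self.W_out @ hidden)      # distribution over the next word

# Feed "the boy saw" word by word and read off next-word probabilities.
vocab = ["the", "boy", "saw", "girl", "."]
net = SRN(vocab_size=len(vocab), hidden_size=10)
for w in ["the", "boy", "saw"]:
    probs = net.step(vocab.index(w))
print({v: round(float(p), 3) for v, p in zip(vocab, probs)})
```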

SRNs as models for language processing?
SRNs are merely semantics-free POS-taggers (Steedman, 1999, 2002)
Limited systematicity, but see (Frank, 2006; Brakel & Frank, 2009; Frank, Haselager & van Rooij, 2009)
Sensitive to irrelevant structural relations (Frank, Mathis & Badecker, 2005)
No extrapolation and variable binding (Marcus, 1998; on the eliminative view: Holyoak & Hummel, 2000)
Only structural relations
– Language grounding: acquisition in a situated fashion (Harnad, 1990; Glenberg, 1997; Barsalou, 1999)
– Connectionist approaches to grounded acquisition (e.g., Cangelosi, 2005; Plunkett et al., 1992; Coventry et al., 2004)

Anaphora Resolution
Anaphora resolution factors (constraints vs. preferences):
– Gender/number agreement
– Semantic consistency
– Salience
– Semantic/syntactic parallelism
– Global structural constraints: c-command
Structurally determined complementary binding domains for pronouns and reflexives (G&B theory):
a) Reflexives need a c-commanding NP as antecedent
b) Pronouns must not have a c-commanding NP as antecedent (inside the boundaries of one sentence)
(a) Ken_i, who likes John_j, saw himself_{i/*j}.
(b) Ken_i, who likes John_j, saw him_{j/*i}.
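To make the structural constraint concrete, here is a hedged sketch of the standard c-command relation (A c-commands B if neither dominates the other and the first branching node above A dominates B), checked on a schematic tree for the Ken/John example. The tree shape and node labels are illustrative assumptions, not material from the talk.

```python
# Sketch of c-command over a simple constituency tree.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def dominates(a, b):
    """True if a properly dominates b."""
    n = b.parent
    while n is not None:
        if n is a:
            return True
        n = n.parent
    return False

def c_commands(a, b):
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    n = a.parent
    while n is not None and len(n.children) < 2:   # climb to the first branching node
        n = n.parent
    return n is not None and dominates(n, b)

# "Ken, who likes John, saw himself": the matrix subject c-commands the reflexive,
# but the NP inside the relative clause does not.
john = Node("NP:John"); ken = Node("NP:Ken")
rc = Node("CP", [Node("C:who"), Node("VP", [Node("V:likes"), john])])
subj = Node("NP", [ken, rc])
refl = Node("NP:himself")
s = Node("S", [subj, Node("VP", [Node("V:saw"), refl])])
print(c_commands(subj, refl), c_commands(john, refl))   # True False
```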

Anaphora Resolution
Is online anaphora resolution globally structure-driven?
– Pro: sensitivity to structural binding constraints (Asudeh & Keller, 2001; Badecker & Straub, 2002; but with influences of gender marking by inaccessible antecedents)
– Contra I: against the dominance of exclusively structural principles
  Logophors (Kaiser et al., 2009; Runner, Sussman & Tanenhaus, 2003, 2006)
  Referential commitment (MacWhinney, 2008)
– Contra II: sensitivity to structural constraints, but within a local rather than a global frame

Anaphora Resolution in SRNs
Origin of our studies: investigation of the performance capacity of SRNs (Frank, Mathis & Badecker, 2005)
– How abstract are the grammatical generalizations derived by SRNs?
– Anaphora resolution (subsequently: variable binding)
Acquisition of binding constraints for pronouns and reflexives:
– Lexically complex (variable reference)
– Structurally complex (bridging irrelevant structures)
Architecture: stepwise cascading SRNs

Anaphora resolution in SRNs
[Architecture diagram: word prediction and reference assignment components]

Anaphora Resolution in SRNs
Results (Frank, Mathis & Badecker, 2005):
– Word prediction: good performance
– Reference assignment: good performance for simple sentences; bad performance for complex sentences that impose long-distance constraints
– Internal representations reveal the problem: assignment is based on irrelevant structural generalizations, e.g., the pronoun/reflexive position after SRCs vs. ORCs

New Approach
SRNs are capable of integrating multiple cues (e.g., Christiansen, Allen & Seidenberg, 1998)
SRNs are capable of processing anaphors (Weldle, Konieczny, Müller, Wolfer & Baumann, 2009), despite restrictions concerning variable binding
– Interesting behaviour and predictions of SRNs
– Behaviour and predictions for anaphora resolution!?
Error correspondence of performance: locality effects, false alarms, local syntactic coherences (Konieczny, Müller & Ruh, 2009)
Improved replication of Frank, Mathis & Badecker (2005): mature grammatical representations by means of complex stimuli; task-driven representations forced by integrated cascading SRNs

Architecture: cascading SRNs
Sentence Processing Component (SPC): 70 hidden/context units, 27 input/output units, localist lexical encoding
Reference Assignment Component (RAC): 35 hidden/context units, 9 output units (referents)
Learning rate: 0.2 – 0.02 (gradually decreasing)
Momentum: 0.6
Initial weight range: 0.5
Training: 10 epochs, backpropagation through time
Integrative training keeps the SRN sensitive to the structural information required to solve the reference assignment task
[Architecture diagram: word-by-word input (t0) → Sentence Processing Component → word prediction (t+1); Sentence Processing Component → Reference Assignment Component → reference assignment (t0)]
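A minimal forward-pass sketch of this cascade: the Sentence Processing Component is an SRN over localist word inputs, and the Reference Assignment Component is a second SRN that reads the SPC's hidden layer. The layer sizes follow the slide; the initialisation, activation functions, and omission of the joint training loop are assumed simplifications, not the original implementation.

```python
# Cascaded SRN sketch: SPC predicts the next word (t+1), RAC assigns the
# current referent (t0) from the SPC's hidden layer. Forward pass only.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
N_WORDS, SPC_HIDDEN, RAC_HIDDEN, N_REFERENTS = 27, 70, 35, 9

# Sentence Processing Component (an SRN over localist word inputs)
W_spc_in  = rng.uniform(-0.5, 0.5, (SPC_HIDDEN, N_WORDS))
W_spc_ctx = rng.uniform(-0.5, 0.5, (SPC_HIDDEN, SPC_HIDDEN))
W_spc_out = rng.uniform(-0.5, 0.5, (N_WORDS, SPC_HIDDEN))

# Reference Assignment Component (an SRN over the SPC's hidden layer)
W_rac_in  = rng.uniform(-0.5, 0.5, (RAC_HIDDEN, SPC_HIDDEN))
W_rac_ctx = rng.uniform(-0.5, 0.5, (RAC_HIDDEN, RAC_HIDDEN))
W_rac_out = rng.uniform(-0.5, 0.5, (N_REFERENTS, RAC_HIDDEN))

spc_context = np.zeros(SPC_HIDDEN)
rac_context = np.zeros(RAC_HIDDEN)

def process_word(word_index):
    """One word-by-word step: returns next-word and referent distributions."""
    global spc_context, rac_context
    x = np.zeros(N_WORDS); x[word_index] = 1.0
    spc_hidden = np.tanh(W_spc_in @ x + W_spc_ctx @ spc_context)
    rac_hidden = np.tanh(W_rac_in @ spc_hidden + W_rac_ctx @ rac_context)
    spc_context, rac_context = spc_hidden.copy(), rac_hidden.copy()
    word_probs     = softmax(W_spc_out @ spc_hidden)   # prediction for t+1
    referent_probs = softmax(W_rac_out @ rac_hidden)   # assignment for t0
    return word_probs, referent_probs
```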

Training corpus
Artificial training corpus, generated with a PCFG
– sentences, presented word-by-word
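By way of illustration of how such a corpus can be produced, here is a toy PCFG sampler. The rules, words, and probabilities below are stand-ins; the actual grammar used for the training corpus is not reproduced on the slide.

```python
# Toy PCFG sentence generator; not the grammar from the reported simulations.
import random

PCFG = {
    "S":  [(["NP", "V", "NP"], 0.7), (["NP", "RC", "V", "NP"], 0.3)],
    "NP": [(["the", "philologist"], 0.5), (["the", "biologist"], 0.5)],
    "RC": [(["who", "V", "NP"], 1.0)],
    "V":  [(["sees"], 0.5), (["scratches"], 0.5)],
}

def expand(symbol):
    if symbol not in PCFG:                        # terminal symbol
        return [symbol]
    rules, weights = zip(*PCFG[symbol])
    rhs = random.choices(rules, weights=weights)[0]
    return [word for s in rhs for word in expand(s)]

def generate_sentence():
    return " ".join(expand("S"))

random.seed(0)
for _ in range(3):
    print(generate_sentence())
```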

Test corpora
SRC: Während der Germanist, der den Biologen sieht, sich/ihn kratzte, …
     ("While the philologist, who saw the biologist, scratched himself/him, …")
ORC: Während der Germanist, den der Biologe sieht, sich/ihn kratzt, …
     ("While the philologist, who the biologist saw, scratched himself/him, …")
Test sets:
– Common test set
– Complex test set: anaphora resolution and N/V-agreement in complex syntactic embeddings

Results
Examination of:
– Output performance for word prediction
– Output performance for reference assignment
– Internal representations at the anaphoric expression
Grammatical Prediction Error (Christiansen & Chater, 1999)
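For reference, a hedged sketch of a Grammatical Prediction Error (GPE) style measure: the share of output activation that falls on continuations the grammar does not license at the current position. This is a common simplification in the spirit of Christiansen & Chater (1999), not their exact formula, and the indices in the usage example are hypothetical.

```python
# Simplified GPE-style measure over a network's output activation vector.
import numpy as np

def grammatical_prediction_error(output_activations, grammatical_indices):
    """1.0 = all activation on ungrammatical continuations, 0.0 = all on grammatical ones."""
    total = output_activations.sum()
    hits = output_activations[list(grammatical_indices)].sum()
    return 1.0 - hits / total

# Toy usage: after "scratched", only the anaphors (say indices 3 and 4) are grammatical.
probs = np.array([0.05, 0.05, 0.10, 0.45, 0.30, 0.05])
print(round(grammatical_prediction_error(probs, {3, 4}), 3))   # 0.25
```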

Word prediction
While the philologist, who saw the biologist, scratched him/himself …

Reference: pronouns
While the philologist, who saw the biologist, scratched him …

Reference: reflexives
While the philologist, who saw the biologist, scratched himself …

Local syntactic coherences
Analysis of probability vectors at the anaphor position:
– Activations are influenced by the antecedent directly preceding the anaphoric expression
– Locally coherent sub-sequence crossing the RC boundary (cf. converging previous simulation findings: Konieczny, Ruh & Müller, 2009)
  a. Enables access to normally inaccessible antecedents
  b. Inhibits access to normally accessible antecedents
Internal representations (multivariate statistics):
– Do not reflect dependence on the preceding phrase structure
– Categorization highlights gender and agreement marking of the MC subject
– The network develops trans-structural generalizations
Examples (locally coherent sub-sequence "the biologist, scratched himself/him"):
"While the philologist_i, who saw the biologist_j, scratched himself_{i/*j} …"
"While the philologist_i, who saw the biologist_j, scratched him_{j/*i} …"
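As an illustration of the kind of multivariate analysis mentioned above, the sketch below projects hidden-layer vectors collected at the anaphor position onto their first two principal components and compares the centroids for SRC vs. ORC contexts. The random placeholder data, item counts, and use of scikit-learn PCA are all assumptions; the slide does not specify the analysis beyond "multivariate statistics".

```python
# Illustrative sketch only: in the real analysis, 'hidden_at_anaphor' would hold
# the SPC hidden vectors recorded at the anaphoric expression for each test item;
# here it is random placeholder data of the right shape (70 SPC hidden units).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
hidden_at_anaphor = rng.normal(size=(40, 70))            # 40 hypothetical test items
structure_labels = ["SRC"] * 20 + ["ORC"] * 20           # hypothetical annotations

components = PCA(n_components=2).fit_transform(hidden_at_anaphor)
for label in ("SRC", "ORC"):
    rows = [i for i, l in enumerate(structure_labels) if l == label]
    print(label, components[rows].mean(axis=0).round(2))  # compare cluster centroids
# If the centroids coincide, the representations do not separate by RC structure,
# matching the finding that categorization tracks the matrix-clause subject instead.
```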

Conclusions
SRNs with proper prerequisites are in principle capable of anaphora resolution, within the limits of interpolation
Previous results (Frank, Mathis & Badecker, 2005) are most likely simulation artefacts of the architecture, the training procedure and the limited grammar
Interference from locally coherent sub-sequences should be seen in terms of error correspondence: a prediction of local coherence effects in anaphora resolution
– Local syntactic coherence effects (Konieczny, 2005; Konieczny et al., 2007, 2009)
– These effects also affect reference assignment (Weldle et al., 2009; Wolfer, previous talk)