E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 Coordinate Structures: On the Relationship between Parsing Preferences and Corpus Frequencies Ilona Steiner.

Slides:



Advertisements
Similar presentations
The Robert Gordon University School of Engineering Dr. Mohamed Amish
Advertisements

Another word on parsing relative clauses Eyetracking evidence from Spanish and English Manuel Carreiras & Charles Clifton, Jr.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 2 (06/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Part of Speech (PoS)
Tracking L2 Lexical and Syntactic Development Xiaofei Lu CALPER 2010 Summer Workshop July 14, 2010.
The prosody of ambiguous relative clauses in Spanish: a study of monolinguals and Basque- Spanish bilinguals Irene de la Cruz-Pavía & Gorka Elordieta UPV-EHU.
Eye Movements and Spoken Language Comprehension: effects of visual context on syntactic ambiguity resolution Spivey et al. (2002) Psych 526 Eun-Kyung Lee.
The Interaction of Lexical and Syntactic Ambiguity by Maryellen C. MacDonald presented by Joshua Johanson.
Coherence-Driven Effects in Relative Clause Processing Hannah Rohde, Roger Levy, & Andrew Kehler University of California, San Diego LSA 2008, Chicago,
Ambiguity ambiguity is the property of being ambiguous, where a word, notation, phrase, clause, sentence is called ambiguous if it can be interpreted in.
Using prosody to avoid ambiguity: Effects of speaker awareness and referential context Snedeker and Trueswell (2003) Psych 526 Eun-Kyung Lee.
Hagoort, P., Brown, C.M., Groothusen, J. (1993) The Syntactic Positive Shift (SPS) as an ERP measure of syntactic processing.
BioContrasts: Extracting and Exploiting Protein-protein Contrastive Relations from Biomedical Literature Jung-jae Kim 1, Zhuo Zhang 2, Jong C. Park 1 and.
Capturing linguistic interaction in a grammar A method for empirically evaluating the grammar of a parsed corpus Sean Wallis Survey of English Usage University.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
1 LIN 1310B Introduction to Linguistics Prof: Nikolay Slavkov TA: Qinghua Tang CLASS 18, March 13, 2007.
1 The role of structural prediction in rapid syntactic analysis Ellen Lau, Clare Sroud, Silke Plesch, Colin Phillips, 2006 PSYC Soondo Baek.
January 12, Statistical NLP: Lecture 2 Introduction to Statistical NLP.
Predicting Text Quality for Scientific Articles Annie Louis University of Pennsylvania Advisor: Ani Nenkova.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Spring 2006-Lecture 4.
Predicting the Semantic Orientation of Adjective Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi.
Language, Mind, and Brain by Ewa Dabrowska Chapter 2: Language processing: speed and flexibility.
Working Memory and Relative Clause Attachment under Increased Sentence Complexity Akira Omaki Department of Second Language Studies, University of Hawai‘i.
1 Human simulations of vocabulary learning Présentation Interface Syntaxe-Psycholinguistique Y-Lan BOUREAU Gillette, Gleitman, Gleitman, Lederer.
SYNTACTIC ANALYSIS OF NOUN PHRASE IN SHIRLEY JACKSON THE LOTTERY “SHORT STORY” BY: SITI MUNAWAROH.
Phonetics, Phonology, Morphology and Syntax
Albert Gatt LIN 3098 Corpus Linguistics. In this lecture Some more on corpora and grammar Construction Grammar as a theoretical framework Collostructional.
McEnery, T., Xiao, R. and Y.Tono Corpus-based language studies. Routledge. Unit A 2. Representativeness, balance and sampling (pp13-21)
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars Kewei TuVasant Honavar Departments of Statistics and Computer Science University.
1 Statistical NLP: Lecture 10 Lexical Acquisition.
1 A study on automatically extracted keywords in text categorization Authors:Anette Hulth and Be´ata B. Megyesi From:ACL 2006 Reporter: 陳永祥 Date:2007/10/16.
Ferreira and Henderson (1990)
Distributional Part-of-Speech Tagging Hinrich Schütze CSLI, Ventura Hall Stanford, CA , USA NLP Applications.
Incremental sentence production and ellipsis Incremental sentence production reduces the working memory capacity needed for advance planning: The planning.
Relative Clauses in Mandarin Chinese Conversation Na Wang.
Older Adults’ More Effective Use of Context: Evidence from Modification Ambiguities Robert Thornton Pomona College Method Participants: 32 young and 32.
Age of acquisition and frequency of occurrence: Implications for experience based models of word processing and sentence parsing Marc Brysbaert.
Adele E. Goldberg. How argument structure constructions are learned.
Avoiding the Garden Path: Eye Movements in Context
A Cascaded Finite-State Parser for German Michael Schiehlen Institut für Maschinelle Sprachverarbeitung Universität Stuttgart
Modelling Human Thematic Fit Judgments IGK Colloquium 3/2/2005 Ulrike Padó.
인공지능 연구실 황명진 FSNLP Introduction. 2 The beginning Linguistic science 의 4 부분 –Cognitive side of how human acquire, produce, and understand.
Capturing patterns of linguistic interaction in a parsed corpus A methodological case study Sean Wallis Survey of English Usage University College London.
Bootstrapping for Text Learning Tasks Ramya Nagarajan AIML Seminar March 6, 2001.
Using Surface Syntactic Parser & Deviation from Randomness Jean-Pierre Chevallet IPAL I2R Gilles Sérasset CLIPS IMAG.
1 Statistical NLP: Lecture 7 Collocations. 2 Introduction 4 Collocations are characterized by limited compositionality. 4 Large overlap between the concepts.
Background: Speakers use prosody to distinguish between the meanings of ambiguous syntactic structures (Snedeker & Trueswell, 2004). Discourse also has.
Workshop: Corpus (1) What might a corpus of spoken data tell us about language? OLINCO 2014 Olomouc, Czech Republic, June 7 Sean Wallis Survey of English.
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
Rules, Movement, Ambiguity
Supertagging CMSC Natural Language Processing January 31, 2006.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Evaluation issues in anaphora resolution and beyond Ruslan Mitkov University of Wolverhampton Faro, 27 June 2002.
Automatic acquisition for low frequency lexical items Nuria Bel, Sergio Espeja, Montserrat Marimon.
Building Sub-Corpora Suitable for Extraction of Lexico-Syntactic Information Ondřej Bojar, Institute of Formal and Applied Linguistics, ÚFAL.
December 2011CSA3202: PCFGs1 CSA3202: Human Language Technology Probabilistic Phrase Structure Grammars (PCFGs)
SIMS 296a-4 Text Data Mining Marti Hearst UC Berkeley SIMS.
Towards Semi-Automated Annotation for Prepositional Phrase Attachment Sara Rosenthal William J. Lipovsky Kathleen McKeown Kapil Thadani Jacob Andreas Columbia.
NLP. Parsing ( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (,,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (,,) ) (VP (MD will) (VP (VB join) (NP (DT.
Syntactic Priming in Sentence Comprehension (Tooley, Traxler & Swaab, 2009) Zhenghan Qi.
Using PaQu for language acquisition research Jan Odijk CLARIN 2015 Conference Wroclaw,
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 3 rd.
Chapter 3 Language Acquisition: A Linguistic Treatment Jang, HaYoung Biointelligence Laborotary Seoul National University.
Natural Language Processing Vasile Rus
Chapter 8 Introducing Inferential Statistics.
Statistical NLP: Lecture 7
An Introduction to the Government and Binding Theory
Clustering Algorithms for Noun Phrase Coreference Resolution
LING/C SC 581: Advanced Computational Linguistics
Presentation transcript:

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB 441 Coordinate Structures: On the Relationship between Parsing Preferences and Corpus Frequencies Ilona Steiner SFB 441, University of Tübingen Linguistic Evidence, 2-4 February 2006

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Overview Previous work on the relationship between parsing preferences and corpus frequencies Study by Gibson & Schütze (1999) Study by Mitchell & Brysbaert (1998) Reanalysis by Desmet et al. (2002) Corpus analysis I: Parallel-structure effect Corpus analysis II: Disambiguation preference NP vs. S coordination Discussion

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Tuning hypothesis (Cuetos et al., 1996) People prefer the most frequently occurring resolution of an ambiguity: Initial parsing preferences in syntactically ambigous sentences are determined by people's previous exposure to similar structures. → Parsing preferences and corpus frequencies should be correlated, i.e., the preferred construction should occur more frequently in corpora than the unpreferred construction

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Study by Gibson & Schütze (1999) Disambiguation preferences in English noun phrase conjunction: „NP 1 Prep NP 2 Prep NP 3 and NP 4 “ (1) The talkshow host told [ NP 1 : a joke] about [ NP 2 : a man] with [ NP 3 : an umbrella] and … Preference in experiments: NP 3 > NP 1 > NP 2 Preference in corpus data: NP 3 > NP 2 > NP 1 → No correlation Conclusion: „…the sentence comprehension mechanism is not using corpus frequencies in arriving at its preference in this ambiguity and hence the decision principles of sentence comprehension and sentence production must be partially distinct.“

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Study by Mitchell & Brysbaert (1998) Attachment preference in Dutch: „NP 1 Prep NP 2 Relative Clause“ (2) Someone shot [NP1: the servant] of [NP2: the actress] [RC: who was on the balcony] Preference in experiments: NP 1 > NP 2 Preference in corpus data:NP 2 > NP 1 → No correlation Were the experimental stimuli representative for the sentences in the corpus?

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Desmet et al. (2002): Reanalysis of Mitchell & Brysbaert's (1998) corpus „NP 1 of NP 2 RC“ Experimental stimuli: both NPs referring to humans Corpus data: very few sentences with both NPs referring to humans Preference depends on nature of NP 1 : NP 1 / human: NP 1 preference NP 1 / non-human: NP 2 preference Sentence completion studies with corpus sentences confirmed this pattern → correlation! → If corpus data are controlled carefully to match the experimental sentences, the discrepancy between sentence comprehension and production disappears

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Aims Relationship between two types of linguistic evidence with respect to coordinate structures in English: Parsing preferences Corpus frequencies Focus on two processing effects that have been found in reading-time studies: Parallel-structure effect Disambiguation preference in noun phrase vs. sentence coordination Comparison to corpus data in English Verbmobil treebank (TüBa-E, Hinrichs et al. (2000))

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB English Verbmobil Treebank (TüBa-E) Spoken dialogs in the domain of business appointments Annotated (manually) at the levels of Morpho-syntax (parts-of-speech categories) Syntactic phrase structure Function-argument structure Unedited naturally occurring dialogs: data reflect mechanisms of sentence production, and not factors that are due to editing processes

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Parallel-structure effect In a coordinate structure, the second conjunct is read faster when it is structurally similar to the first one. (Frazier, Munn & Clifton, 2000) (3) Terry wrote [ NP: a long novel] and [ NP: a short poem]. (4) Terry wrote [ NP: a novel] and [ NP: a short poem]. → [ NP: a short poem] faster in (3) than in (4) Preference for structural similarity in coordination also present in corpora?

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Coordination dataset Analysis of ca sentences (CD13) of TüBa-E Coordination within complete sentences, conjunction: and → 274 occurrences of coordinate structures For each coordinate structure extraction of Syntactic category of mother and daughters Grammatical function of mother Number of conjuncts Length of conjuncts (number of words) Degree of redundancy in conjuncts ( %) Syntactic annotation in the treebank used as basis for the extraction process

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Corpus analysis I: Degree of redundancy Manual inspection of degree of redundancy for each coordinate structure (with 2 conjuncts) 100%: both conjuncts have exactly the same structure including PoS-tags 0%: not a single node is redundant

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Example sentence with 36% redundancy

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Example sentence with 36% redundancy

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Example sentence with 100% redundancy

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Example sentence with 100% redundancy

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Redundancy in coordination dataset Distribution of redundancy from 0% to 100% 36% of all coordinate structures contained conjuncts with identical structure (100%) How often does structural similarity occur randomly? Degree of redundancy No. of occurrences

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Redundancy in coordination dataset Distribution of redundancy from 0% to 100% 36% of all coordinate structures contained conjuncts with identical structure (100%) How often does structural similarity occur randomly? Degree of redundancy No. of occurrences

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Random dataset For each first conjunct in coordination dataset: extraction of random second 'conjunct' (randomly chosen from the corpus, independent of coordination) (5)[ NP: Twenty second] and [ NP: twenty fourth] are pretty bad.“ Randomly chosen phrase: [ NP: the first] Random phrase matches original second conjunct in syntactic category (here: NP) grammatical function (here: SBJ) length (here: 2 words) How many occurrences in random dataset are structurally identical?

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Results: Parallel-structure effect Structural similarity: 13% in random dataset (pairs of original first and random second conjunct) 36% in coordination dataset Difference is highly significant ( χ ² (1) = 72.1; p < ) → Structural similarity within coordination is significantly more frequent in corpus data than structural similarity of two phrases independent of coordination. → Corpus data match preference for structural similarity during parsing

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Distribution of parallel occurrences Length effect: the shorter the conjuncts, the more frequently structural similarity occurs (F(1,3) = 6.63; p < 0.05 for length 1 vs. 2) Length of conjuncts % Parallel conjuncts

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Disambiguation preference: NP vs. S coordination Reading-time experiments (Frazier, 1979) : (6) a. Peter kissed [ NP: Mary] and [ NP: her sister] too b. [ S: Peter kissed Mary] and [ S: her sister laughed] Local ambiguity that cannot be resolved prior to the last word Garden-path effect at “laughed“ in (6b) compared to “too“ in (6a) Preference to interpret [her sister] as part of a conjoined NP, and not as the beginning of a new sentence Is the preference for NP coordination (compared to S coordination) also present in corpora?

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Corpus analysis II: NP vs. S coordination Analysis of CD6 (2554 sentences) and CD13 (2906 sentences) of TüBa-E Extraction of all occurrences with the form: “NP Subj …Verb…NP Compl and…“ Continuation with NP and S coordination should both be semantically possible (7) a. and I will bring [ NP: the doughnuts] and [ NP: coffee] I guess b. and [ S: Friday I have a nine to ten meeting] and [ S: I also have a meeting in the early afternoon] *c. I hate to mix [ NP: business] and [ NP: weekends]

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Results: NP vs. S coordination “NP Subj …Verb…NP Compl and…“ Coordination with NP: 69% (n = 27) Coordination with S:31% (n = 12) [ t(38) = 2.57; p < 0.05 ] → Corpus frequencies match preference for NP coordination (compared to S coordination) during parsing

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Discussion 1 Correlation between parsing preferences and corpus frequencies for Parallel-structure effect Disambiguation preference NP vs. S coordination Possible explanations: “Tuning hypothesis“: statistical pattern in natural language causes development of parsing preferences Common source for language production and comprehension

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Discussion 2 If there is a common source, structure-based accounts would explain parsing preferences Parallel-structure effect: Recycling mechanism (Steiner, 2005) Preference for NP coordination: “Minimal Attachment Principle“ (Frazier, 1979), “Recency Preference“ (Gibson et al., 1996) Preferred construction is more economical, and is easier and faster to process Constructions that are easier to understand would also be easier to produce, and are produced more often → Common mechanism for language production and comprehension leads to faster reading times during parsing and to higher frequencies during production

E BERHARD- K ARLS- U NIVERSITÄT T ÜBINGEN SFB Discussion 3 With the present study we are not able to differentiate between “tuning hypothesis“ and a possible common source. We showed that production and comprehension are closer to each other than expected. If correlation between parsing preferences and corpus frequencies holds in general: → corpus data can be used to evaluate (at least qualitatively) models of sentence processing