Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates Matthew Gerber and Joyce Y. Chai Department of Computer Science Michigan State University.


Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates Matthew Gerber and Joyce Y. Chai Department of Computer Science Michigan State University East Lansing, Michigan, USA Language & Interaction Research

A Motivating Example
Georgia-Pacific and Nekoosa produce market pulp, containerboard and white paper. The goods could be manufactured closer to customers, saving shipping costs.
What can traditional SRL systems tell us?
– Who is the producer?
– What is produced?
– What is manufactured?
But that’s not the whole story…
– Who is the manufacturer?
– Who ships what to whom? (implicit arguments)

Nominal SRL
Nominal predicates are not new (e.g., "...saving shipping costs.")
– NomBank (Meyers, 2007): 115k manual SRL analyses for 4,700 predicates
– Identifying NomBank arguments: Jiang and Ng (2006); Liu and Ng (2007)
Many predicates lack arguments in NomBank
– 2008 CoNLL Shared Task (Surdeanu et al., 2008)
– Gerber et al. (NAACL, 2009): a predicate filter improves performance
– These approaches do not address the recovery of implicit arguments

Implicit Argument Identification
Research questions
– Where are implicit arguments?
– Can we recover them?
Related work
– Japanese anaphora: indirect anaphora (Sasano et al., 2004); zero-anaphora (Imamura et al., 2009)
– Implicit arguments: fine-grained domain model (Palmer et al., 1986); SemEval Task 10 (Ruppenhofer et al., 2010)

Outline
– Implicit Argument Annotation and Analysis
– Model Formulation and Features
– Evaluation
– Conclusions and Future Work

Data Annotation
Ten most prominent NomBank predicates, selected by:
– Derivation from a verbal role set (ship → shipment)
– Frequency of the nominal predicate
– Difference between verbal and nominal argument counts
  [John] shipped [the package]. (argument count: 2)
  Shipping costs will decline. (argument count: 0)
Predicate instances annotated: 1,254
Independently annotated by two annotators
– Cohen’s Kappa: 67%
– Agreement: both annotators leave a position unfilled, or both fill it identically
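The Cohen’s Kappa figure can be reproduced from two annotators' per-position decisions; a minimal sketch with made-up labels (the paper's handling of partially matching fills may differ):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy "filled"/"unfilled" decisions from two hypothetical annotators.
a = ["filled", "filled", "unfilled", "filled", "unfilled", "unfilled"]
b = ["filled", "unfilled", "unfilled", "filled", "unfilled", "filled"]
print(round(cohens_kappa(a, b), 2))  # → 0.33
```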

Annotation Analysis
[Chart: average number of expressed arguments per predicate, comparing pre-annotation NomBank, post-annotation, and the verb form]

Annotation Analysis
Average arguments across all predicates
– Pre-annotation: 1.1
– Post-annotation: 1.8
– Verb form: 2.0
Overall percentage of possible roles filled
– Pre-annotation: 28.0%
– Post-annotation: 46.2% (a 65% relative increase)
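As a quick arithmetic check, the move from 28.0% to 46.2% of possible roles filled is a 65% relative increase:

```python
pre, post = 28.0, 46.2  # percent of possible argument positions filled
relative_increase = (post - pre) / pre
print(f"{relative_increase:.0%}")  # → 65%
```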

Annotation Analysis
– Only 55% of implicit arguments occur within the current sentence
– 90% occur within the current or previous three sentences

Model Formulation
Candidate selection
– Core PropBank/NomBank arguments
– Two-sentence candidate window
– Coreference chaining
Binary classification function over each candidate (c1, c2, c3 in the example below)
Georgia-Pacific and Nekoosa produce market pulp, containerboard and white paper. The goods could be manufactured closer to customers, saving shipping costs.
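The formulation above (gather core-argument candidates from a small sentence window, then score each with a binary classifier) might look like the following sketch; the document structure, `score` function, and threshold are hypothetical, not the paper's code:

```python
def select_candidates(doc, pred_sentence_idx, window=2):
    """Collect core PropBank/NomBank argument phrases from the
    predicate's sentence and the preceding `window` sentences."""
    lo = max(0, pred_sentence_idx - window)
    candidates = []
    for sent in doc[lo:pred_sentence_idx + 1]:
        candidates.extend(sent["core_args"])
    return candidates

def fill_implicit_position(doc, pred_sentence_idx, score, threshold=0.5):
    """Pick the best-scoring candidate for one missing argument
    position; return None if nothing clears the threshold."""
    scored = [(score(c), c) for c in select_candidates(doc, pred_sentence_idx)]
    if not scored:
        return None
    best_score, best = max(scored)
    return best if best_score >= threshold else None

# Toy document: sentence 0 has two core arguments, sentence 1 has one.
doc = [{"core_args": ["Georgia-Pacific and Nekoosa", "market pulp"]},
       {"core_args": ["the goods"]}]
score = lambda c: 0.9 if c == "Georgia-Pacific and Nekoosa" else 0.1
print(fill_implicit_position(doc, 1, score))  # → Georgia-Pacific and Nekoosa
```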

Model Features
– SRL structure
– Discourse structure
– Other

Model Features: SRL Structure
VerbNet role transition
– The candidate (c3) fills role "product" of VN class "create" (produce); the missing position is role "theme" of VN class "send" (shipping’s arg1)
– Feature value: create.product → send.theme
– Captures script-like properties of events
– Multiple values are possible
Georgia-Pacific and Nekoosa produce market pulp, containerboard and white paper. The goods could be manufactured closer to customers, saving shipping costs.
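The transition feature can be sketched as a string over (VerbNet class, role) pairs; the `PB_TO_VN` table below is a hypothetical stand-in for a real PropBank-to-VerbNet mapping resource:

```python
# Hypothetical PropBank-to-VerbNet mapping: (lemma, arg label) -> "class.role".
PB_TO_VN = {
    ("produce", "arg1"): "create.product",
    ("ship", "arg1"): "send.theme",
}

def role_transition(cand_lemma, cand_arg, missing_lemma, missing_arg):
    """Feature value linking the VerbNet role a candidate already fills
    to the implicit role under consideration."""
    src = PB_TO_VN.get((cand_lemma, cand_arg))
    dst = PB_TO_VN.get((missing_lemma, missing_arg))
    if src is None or dst is None:
        return None
    return f"{src}->{dst}"

# The candidate fills produce.arg1; is it also the implicit ship.arg1?
print(role_transition("produce", "arg1", "ship", "arg1"))  # → create.product->send.theme
```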

Model Features: SRL Structure
Narrative event chains (Chambers and Jurafsky, 2008)
– PMI(manufacture.arg1, ship.arg1)
– Computed from SRL output over Gigaword (Graff, 2003)
– Advantages: better coverage and explicit relationship strength
Georgia-Pacific and Nekoosa produce market pulp, containerboard and white paper. The goods could be manufactured closer to customers, saving shipping costs.
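The association score behind narrative event chains is pointwise mutual information over slot co-occurrence counts; a sketch with invented counts:

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log(p(x, y) / (p(x) * p(y)))."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))

# Invented counts: how often the same entity fills manufacture.arg1 and
# ship.arg1 in SRL output over a large corpus (here, 1M extractions).
score = pmi(count_xy=50, count_x=1000, count_y=2000, total=1_000_000)
print(round(score, 3))  # → 3.219
```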

Model Features: Discourse Structure
Penn Discourse TreeBank (Prasad et al., 2008)
– Feature value: Contingency.Cause.Result
– Might help identify salient discourse segments
Georgia-Pacific and Nekoosa produce market pulp, containerboard and white paper. The goods could be manufactured closer to customers, saving shipping costs.

Evaluation Setting
Data processing
– Gold SRL labels, OpenNLP coreference, Gigaword
Training (sections 2–21, 24)
– 816 annotated predicates
– 650 implicitly filled argument positions
– LibLinear logistic regression
Testing (section 23)
– 437 annotated predicates
– 246 implicitly filled argument positions
Baseline heuristic: matching argument positions
Armstrong agreed to sell its carpet operations to Shaw Industries. The sale could help Armstrong.
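LibLinear trains (among other models) L2-regularized logistic regression; as a self-contained stand-in, here is a tiny gradient-descent version on toy feature vectors (the features and data are placeholders, not the paper's):

```python
import math

def train_logreg(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Tiny L2-regularized logistic regression trained with stochastic
    gradient descent; a stand-in for the LibLinear solver."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of the log loss w.r.t. z
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Probability that the candidate fills the implicit position."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: each row is a feature vector for one (candidate, missing
# position) pair; label 1 means the candidate fills the position.
X = [[1, 0, 3], [0, 1, 0], [1, 1, 2], [0, 0, 1]]
y = [1, 0, 1, 0]
w, b = train_logreg(X, y)
print(predict(w, b, [1, 0, 3]) > 0.5)  # → True
```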

Evaluation Setting
Methodology (Ruppenhofer et al., 2010)
– Each predicted implicit argument receives a prediction score against the ground-truth implicit arguments
– P: total prediction score / prediction count
– R: total prediction score / true implicit positions
Georgia-Pacific and Nekoosa produce market pulp, containerboard and white paper. The goods could be manufactured closer to customers, saving shipping costs.
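The precision and recall definitions above reduce to sums of per-prediction scores; a sketch, assuming each prediction carries a partial-credit score in [0, 1]:

```python
def evaluate(prediction_scores, num_predictions, num_true_positions):
    """P = total prediction score / prediction count;
    R = total prediction score / number of true implicit positions."""
    total = sum(prediction_scores)
    p = total / num_predictions if num_predictions else 0.0
    r = total / num_true_positions if num_true_positions else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy run: 4 predictions with partial credit against 6 gold positions.
p, r, f1 = evaluate([1.0, 0.5, 1.0, 0.0], num_predictions=4, num_true_positions=6)
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.625 0.417 0.5
```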

Evaluation Results
Overall F1
– Baseline: 26.5%
– Discriminative: 42.3%
– Human annotator (two-sentence window): 58.4%
– Human annotator (unlimited window): 67.0%

Evaluation Results
Implicit argument positions filled per predicate: sale 60, price 53, investor 35, bid 26, plan 20, cost 17, loss 12, loan 9, investment 8, fund 6 (total: 246)
Per-predicate Baseline F1 (%), Discriminative F1 (%), p, and Oracle recall (%) were also reported

Feature Ablation
Ablation sets
– SRL structure (e.g., VerbNet role transition)
– Non-SRL information
– Discourse structure
Percent change (p-value):
                    P              R              F1
Remove SRL str.     … (<0.01)     -36.1 (<0.01)  -35.7 (<0.01)
Remove non-SRL      -26.3 (<0.01) -11.9 (0.05)   -19.2 (<0.01)
Remove discourse    0.2 (0.95)    1.0 (0.66)     0.7 (0.73)

Improvements Versus Baseline
Who is the seller?
Olivetti has denied that it violated the rules, asserting that the shipments were properly licensed. However, the legality of these sales is still an open question.
Two key pieces of information
– Coreference chain for Olivetti ("Olivetti exports...", "Olivetti supplies...")
– Relationships between the exporting, supplying, and sale events

Conclusions and Future Work
Implicit arguments are prevalent
– They add 65% to the argument coverage of NomBank
Most implicit arguments are near the predicate
– 55% in the current sentence
– 90% within the current or previous three sentences
Implicit arguments can be automatically extracted
– SRL structure is currently the most informative feature set
– This is a difficult task and much work remains
Ongoing investigations
– Global inference instead of local classification
– Unsupervised knowledge acquisition
Data: