GermaNet-WS II 03-2005
A WordNet “Detour” to FrameNet
Aljoscha Burchardt, Katrin Erk, Anette Frank*
Saarland University, DFKI* Saarbrücken



Motivation
Demand for semantic information access (IE, QA, …)
Available resources
  – Large-scale (statistical) parsing systems
  – WordNet(s)
      Modeling approximate lexical semantics
      High coverage
  – FrameNet, PropBank
      Modeling predicate-argument structure
      Limited coverage
Aim: Combining methods to arrive at a high-coverage, various-depth (lexical) semantic analysis

Outline
FrameNet
Using Frames for NLP applications
  – Current architecture
  – Coverage problems
A WordNet detour to FrameNet
  – First Evaluation
Conclusion and Outlook

FrameNet
Frame Semantics (Fillmore 1976, ...)
  – Frame: a conceptual structure or prototypical situation
  – Frame elements (roles): participants of the situation
  – Frame evoking elements (FEEs; verbs, nouns, …)
Example instances of Statement:
  1. “[He]_Speaker speaks [highly]_Manner [of you]_Topic,” she said.
  2. “Did [Dominic]_Speaker ever make any comments [regarding Toby]_Topic [to you]_Addressee?”
Berkeley FrameNet Project
  – Database of frames for core lexicon of English
  – Current release: 615 frames, ~8000 lexical units (LUs)

Saarbrücken SALSA (II) Project
Manual frame-annotation of part of TIGER corpus
Develop automatic methods for Frame/Role assignment
Study metaphors, multi-word expressions
Study frames in context
Work out logical representation for heuristic inferences
Funded by DFG

Using Frames for NLP applications
LFG-based parsing and syntax-semantics interface
  – ParGram grammars for German and English (Butt et al. 2002)
  – Interfaces to statistical frame and role assignment (Baldewein et al. 2004, Erk 2004)
  – Frame projection from f-structure (XLE transfer, Crouch 2005)
Enriching Semantic Representation
  – Rule-based refinement of semantic representation
  – Automatic assignment of SUMO/MILO classes (using WordNet WSD)
Logical Representation and Reasoning
  – FEF (frame exchange format)
  – Translation to logic programs (joint work with P. Baumgartner and F. Suchanek, MPI Saarbrücken)
  – First scenario: RTE Challenge (PASCAL Network)

FEFViewer (by Alexander Koller)

FEF Example

F-Structure
string('Jessica Litman is a law professor.').
xcomp(f(0),f(13)).
tense(f(0),pres).
stmt_type(f(0),declarative).
pred(f(0),be).
mood(f(0),indicative).
dsubj(f(0),f(1)).
proper(f(1),name).
pred(f(1),'Litman').
num(f(1),sg).
mod(f(1),f(4)).
proper(f(4),name).
pred(f(4),'Jessica').
num(f(4),sg).
subj(f(13),f(1)).
pred(f(13),professor).
num(f(13),sg).
mod(f(13),f(16)).
det_type(f(13),indef).
pred(f(16),law).
num(f(16),sg).

Semantics Projection
frame(s(93),'Education_teaching').
rel(s(93),professor).
ont(s(93),s(154)).
wn_syn(s(154),'professor#n#1').
sumo_sub(s(154),'Position').
milo_syn(s(154),'Professor').
rel(s(157),law).
ont(s(157),s(156)).
wn_syn(s(156),'law#n#3').
sumo_sub(s(156),'Proposition').
milo_sub(s(156),'Proposition').
rel(s(166),'Jessica').
ont(s(166),s(165)).
sumo_syn(s(165),'Human').
frame(s(168),'People').
person(s(168),s(168)).
person(s(168),s(166)).
rel(s(168),'Litman').
ont(s(168),s(167)).
sumo_syn(s(167),'Human').
sslink(f(1),s(168)).
sslink(f(4),s(166)).
sslink(f(13),s(93)).
sslink(f(16),s(157)).
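To make the link between the two halves of the FEF example concrete, here is a minimal Python sketch that reads `frame(...)` and `sslink(...)` facts and reports which f-structure node carries which frame. The regular expression and the function name `frames_by_f_node` are illustrative assumptions; the slides do not describe how FEF files are actually processed, and a real reader would more plausibly use a Prolog term parser.

```python
import re

# Assumed fact shape: pred(f(N), ARG). or pred(s(N), ARG). (hypothetical simplification)
FACT = re.compile(r"(\w+)\(([fs])\((\d+)\),\s*(.+?)\)\.")

def frames_by_f_node(fef_text):
    """Map f-structure node ids to frame names via the sslink facts."""
    frames = {}  # semantic node id -> frame name
    links = {}   # f-structure node id -> semantic node id
    for match in FACT.finditer(fef_text):
        pred, kind, node, arg = match.groups()
        if pred == "frame" and kind == "s":
            frames[int(node)] = arg.strip("'")
        elif pred == "sslink" and kind == "f":
            target = re.fullmatch(r"s\((\d+)\)", arg)
            if target:
                links[int(node)] = int(target.group(1))
    # Keep only f-nodes whose semantic node actually carries a frame.
    return {f: frames[s] for f, s in links.items() if s in frames}

# A few facts copied from the example above.
SAMPLE = (
    "frame(s(93),'Education_teaching'). frame(s(168),'People'). "
    "sslink(f(1),s(168)). sslink(f(4),s(166)). "
    "sslink(f(13),s(93)). sslink(f(16),s(157))."
)
result = frames_by_f_node(SAMPLE)
```

On these facts, only the nodes for 'Litman' (f(1)) and 'professor' (f(13)) end up with a frame, since s(166) and s(157) have no `frame` fact.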

Statistical Frame Assignment: Example
“The Royal Navy servicemen being held captive by Iran are expected to be freed today.”
statistical (79,83) Calendric_unit
statistical (58,65) Expectation

Statistical Frame Assignment: Issues
Learning statistical frame assignment from annotated FrameNet data
  – Coverage (often too few examples to learn)
  – Too little ambiguity
      Reason: frame-wise annotation
      E.g. “have” is listed only as an LU of Birth
      Only 0.7% of the current 8000 LUs are ambiguous at all
      Baseline of assigning each word its most frequent frame reaches 93% F-score
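The most-frequent-frame baseline mentioned above can be sketched in a few lines of Python. The training pairs below are toy data invented for illustration, not FrameNet counts, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_most_frequent_frame(annotated):
    """annotated: iterable of (lemma, frame) pairs from a frame-annotated corpus."""
    counts = defaultdict(Counter)
    for lemma, frame in annotated:
        counts[lemma][frame] += 1
    # For every lemma, keep the frame it was annotated with most often.
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}

def assign(baseline, lemma):
    """Return the baseline frame, or None for lemmas never seen in training."""
    return baseline.get(lemma)

# Toy annotations (hypothetical).
TRAINING = [
    ("expect", "Expectation"),
    ("expect", "Expectation"),
    ("hold", "Containing"),
    ("hold", "Detaining"),
    ("hold", "Containing"),
]
baseline = train_most_frequent_frame(TRAINING)
```

The sketch also shows the coverage issue: `assign(baseline, "serviceman")` returns `None`, because a lookup table built from annotated data has nothing to say about unseen words.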

Frame Assignment via WordNet “Detour”
Assign frame(s) on the basis of WordNet-related words
Addresses coverage problem
Requires WSD to WordNet
  – SenseRelate system by Ted Pedersen et al. available, alternatively
  – always take first (most frequent) synset

Frame Assignment via Detour: Example
“The Royal Navy servicemen being held captive by Iran are expected to be freed today.”
statistical (79,83) Calendric_unit
statistical (58,65) Expectation
serviceman#n#1 (16,26) People
hold#v#20 (33,37) Containing
captive#a#1 (38,45) Prison
expect#v#1 (58,66) Expectation
free#v#6 (73,78) Emitting, Use_firearm

FN-Detour Algorithm
Input: a target word (synset)
1. Use WordNet
   Search words = target word, synonyms, antonyms, hypernyms
2. Look up FrameNet
   Candidate frames = all frames that list any search word as an LU
3. Select and return best frame(s) from candidate frames
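The three steps can be sketched as follows in Python. Everything here is a stand-in: `RELATED_WORDS` and `FRAME_LUS` are small hand-written dictionaries replacing real WordNet and FrameNet lookups, and step 3 uses a placeholder criterion (number of supporting search words) rather than the weighting the system actually applies.

```python
from typing import Dict, List, Set

# Toy stand-ins for WordNet and FrameNet (hypothetical data, for illustration only).
RELATED_WORDS: Dict[str, List[str]] = {
    "serviceman#n#1": ["serviceman", "military man", "man", "worker",
                       "person", "individual", "object", "cause"],
}
FRAME_LUS: Dict[str, Set[str]] = {
    "People": {"man", "individual", "person"},
    "Causation": {"cause"},
    "Goal": {"object"},
}

def search_words(target: str) -> List[str]:
    """Step 1: target word plus related words (here a precomputed list)."""
    return RELATED_WORDS.get(target, [])

def candidate_frames(words: List[str]) -> Dict[str, List[str]]:
    """Step 2: every frame that lists at least one search word as a lexical unit."""
    hits: Dict[str, List[str]] = {}
    for frame, lus in FRAME_LUS.items():
        matched = [w for w in words if w in lus]
        if matched:
            hits[frame] = matched
    return hits

def best_frames(candidates: Dict[str, List[str]]) -> List[str]:
    """Step 3 (placeholder): prefer frames supported by the most search words."""
    if not candidates:
        return []
    top = max(len(ws) for ws in candidates.values())
    return sorted(f for f, ws in candidates.items() if len(ws) == top)

def fn_detour(target: str) -> List[str]:
    return best_frames(candidate_frames(search_words(target)))
```

With this toy data, `fn_detour("serviceman#n#1")` selects People, and a target with no related words yields an empty list.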

Detour Example Step 1: WordNet
Target: serviceman#n#1
serviceman, military man, man, military personnel
  => skilled worker, trained worker
    => worker
      => person, individual, someone, somebody, mortal, human, soul
        => organism, being
          => living thing, animate thing
            => object, physical object
              => entity
        => causal agent, cause, causal agency
          => entity

Detour Example Step 2: Candidate Frames
Word(s)                    Frame(s)
cause                      Causation
object                     Goal
man, individual, person    People

Detour Example Step 3: Weights
Word(s)                    Frame(s)     Weight
man, individual, person    People       1.68
cause                      Causation    0.06
object                     Goal         0.03

Weighting Factors
1. WordNet distance of FEE from target word (similarity)
2. “Spreading factor”, i.e. the number of frames a word evokes
3. Matching vs. LU lookup (boost)
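The slides name the three factors but not how they are combined, so the following Python sketch uses an invented formula purely for illustration: an exponential decay over WordNet distance, division by the number of frames a word evokes, and a reduced contribution for frame-name matches. The parameters `decay` and `match_penalty` are hypothetical, and the sketch is not expected to reproduce the weights shown in the example tables.

```python
from collections import defaultdict

def frame_weights(evidence, decay=0.5, match_penalty=0.5):
    """evidence: list of (word, distance, frames, kind); kind is 'LU' or 'Match'.

    Assumed combination (not from the slides):
    contribution = decay**distance * (1 / number of frames) * boost
    """
    weights = defaultdict(float)
    for word, distance, frames, kind in evidence:
        similarity = decay ** distance                   # factor 1: WordNet distance
        spreading = 1.0 / len(frames)                    # factor 2: spreading factor
        boost = 1.0 if kind == "LU" else match_penalty   # factor 3: LU lookup vs. match
        for frame in frames:
            weights[frame] += similarity * spreading * boost
    return dict(weights)

# Toy evidence for serviceman#n#1; the distances are made up for the example.
EVIDENCE = [
    ("man", 0, ["People"], "LU"),
    ("person", 3, ["People"], "LU"),
    ("cause", 4, ["Causation"], "LU"),
    ("object", 6, ["Goal"], "LU"),
]
weights = frame_weights(EVIDENCE)
ranking = sorted(weights, key=weights.get, reverse=True)
```

Even with this crude formula the ranking comes out as People, Causation, Goal, because near synonyms dominate distant hypernyms.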

Special: Matching Frame Names
E.g. Research does not (yet) list the noun researcher as an LU
If there is no LU for a given word, the Detour system looks for matching frame names
Lower weighting for match

Matching Example
Target: researcher#n#1
research worker, researcher, investigator
  => scientist, man of science
    => person, individual, someone, somebody, …
      => organism, being
        => living thing, animate thing
          => object, physical object
            => entity
      => causal agent, cause, causal agency
        => entity

Matching Example (ctd)
Target: researcher#n#1
Word(s)                        Type     Frame(s)              Weight
researcher, research worker    Match    Research              2
scientist                      LU       People_by_vocation    0.38
individual, person             LU       People
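A minimal Python sketch of the fallback: look the word up as a lexical unit first, and only if that fails compare it against frame names. The matching criterion used here (the word starts with the lowercased frame name) is an assumption chosen so that "researcher" and "research worker" both hit Research; the slides do not say how the system decides that a name matches. `FRAME_LUS` is toy data.

```python
# Toy FrameNet excerpt (hypothetical LU lists).
FRAME_LUS = {
    "Research": {"research", "investigate"},
    "People_by_vocation": {"scientist"},
    "People": {"person", "individual"},
}

def lookup_with_name_matching(word, frame_lus, min_len=4):
    """Return (frame, type) pairs; type is 'LU' or, as a fallback, 'Match'."""
    lu_hits = sorted(f for f, lus in frame_lus.items() if word in lus)
    if lu_hits:
        return [(f, "LU") for f in lu_hits]
    # Fallback: assumed prefix test against the frame name.
    w = word.lower()
    matches = sorted(
        f for f in frame_lus
        if len(f) >= min_len and w.startswith(f.lower().replace("_", " "))
    )
    return [(f, "Match") for f in matches]
```

The returned type tag is what a weighting step would use to give matches a lower weight than genuine LU lookups.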

Evaluation
Problem: no gold standard readily available
FrameNet data (annotated instances)
  – All annotated words are LUs of some frame
  – Detour not really necessary
Solution: a detour-only version of our system that must not look up the target word itself

First Evaluation Results (detour-only)
Frames assigned per synset       none    1      >1
Total instances                  13%     71%    16%
Gold standard frame contained    -       38%    7%
Table 1: Frame assignment of detour-only system (FrameNet corpus) frame instances (verb, noun, adj./adv.)

Inspection of Misses
Gold standard frame    Frames assigned by system    Instances
Manufacturing          Invention
                       Intentionally_create
                       Building
                       Cause_to_start
                       Getting
                       Transformation

Recent Evaluation Results (detour-only)
“Return best frame(s)” condition may be too strict (ambiguity is there)
Take first and second best result frame(s)
  – Gold standard contained: +10%
  – Number of returned frames rises from 1.3 to 3
Does the WSD system help?
  – “Always take first synset” slightly better: +4%
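The trade-off between returning only the best frame(s) and also the second best can be illustrated with a small Python sketch that measures how often the gold frame is among the returned ones. The first instance reuses the weights from the serviceman example; the second instance and its weights are invented, and the function names are hypothetical.

```python
def top_k_frames(weights, k):
    """Frames whose weight is among the k highest distinct weight values (ties kept)."""
    best_values = sorted(set(weights.values()), reverse=True)[:k]
    return {frame for frame, w in weights.items() if w in best_values}

def gold_contained_rate(instances, k):
    """Share of instances whose gold frame is among the returned frames."""
    if not instances:
        return 0.0
    hits = sum(1 for weights, gold in instances if gold in top_k_frames(weights, k))
    return hits / len(instances)

# (frame weights produced by the system, gold standard frame)
INSTANCES = [
    ({"People": 1.68, "Causation": 0.06, "Goal": 0.03}, "People"),
    ({"Invention": 0.9, "Manufacturing": 0.5, "Building": 0.2}, "Manufacturing"),  # made up
]
strict = gold_contained_rate(INSTANCES, k=1)
relaxed = gold_contained_rate(INSTANCES, k=2)
```

Relaxing from k=1 to k=2 recovers the second instance, at the cost of returning more frames per word, which mirrors the rise in returned frames reported above.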

Evaluation (full system)
Coverage: 96%
Gold standard in (best) result: 83%
  – WSD not always optimal
  – Ambiguity leads to a higher weighting of another frame

Issues
Just to mention: frames only (no roles)
Weighting hand-crafted, improvement possible?
Threshold needed (“Is there a frame that fits?”)
What about German?
  – Access to GermaNet
      Available Perl packages for WordNet 2.0
      WSD system as well
      Encoding problems (“Period of transition”)
  – German FrameNet data not (yet) in Berkeley format
  – Coverage?

Conclusion and Outlook
Detour via WordNet allows assignment of FrameNet frames in many “unknown” cases
Still: this is the beginning of a journey
Web interface (link on my homepage)
Student project to
  – Prepare release
  – More evaluation
  – Learning of weighting?
  – Transfer to German?