Kiril Simov1, Alexander Popov1, Iliana Simova2, Petya Osenova1

Presentation transcript:

Grammatical Role Embeddings for Enhancements of Relation Density in the Princeton WordNet
Kiril Simov (1), Alexander Popov (1), Iliana Simova (2), Petya Osenova (1)
(1) DemoSem Project, IICT-BAS, Bulgaria
(2) Saarland University, Saarbrücken, Germany
Workshop on Wordnets and Word Embeddings, The 9th Global WordNet Conference, 11 January 2018
WWE 2018 at GWC 2018, NTU, Singapore, 11.01.2018

Outline
Knowledge-based WSD
Motivation
Word Embeddings for Grammatical Roles
Experiments and Results
Conclusion
Future Work

UKB: Graph Based Word Sense Disambiguation and Similarity
Knowledge-based approach to word sense disambiguation; no supervision in the form of a manually annotated corpus is needed
Personalized PageRank algorithm
http://ixa2.si.ehu.es/ukb
Knowledge Graph over WordNet, one relation per line:
  u:03038685-n (classroom) v:04146050-n (school) s:30 d:0
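The idea of running Personalized PageRank over such a relation file can be illustrated with a minimal sketch. This is not the UKB implementation: the power iteration, the damping factor of 0.85, the reading of d:0 as "undirected", and the node toy-node-n are assumptions made only for illustration.

```python
from collections import defaultdict


def parse_relation(line):
    """Split a line such as 'u:A v:B s:30 d:0' into a field -> value dict."""
    return dict(field.split(":", 1) for field in line.split())


def build_graph(lines):
    """Build adjacency sets from relation lines.

    Assumption: d:0 marks an undirected relation, anything else a directed one.
    """
    graph = defaultdict(set)
    for line in lines:
        rel = parse_relation(line)
        graph[rel["u"]].add(rel["v"])
        graph[rel["v"]]  # make sure the target exists as a node
        if rel.get("d", "0") == "0":
            graph[rel["v"]].add(rel["u"])
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=30):
    """Power iteration in which the random surfer teleports only to the seeds."""
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Dangling node: hand its mass back to the seed nodes.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


relations = [
    "u:03038685-n v:04146050-n s:30 d:0",   # classroom - school, from the slide
    "u:04146050-n v:toy-node-n s:toy d:0",  # hypothetical extra arc
]
graph = build_graph(relations)
ranks = personalized_pagerank(graph, seeds={"03038685-n"})
```

In a WSD setting the seed set would hold the context words of the sentence, and the candidate senses of the target word would be compared by their final rank.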


Knowledge Graph Extension
We performed several extensions of the Knowledge Graph with additional arcs:
  Domain relations from WordNet
  Inferred hypernymy relations
  Syntactic relations from gold corpora
  Extended syntactic relations

Knowledge Graph Extensions: WordNet Hierarchies
[Figure: graph over the nodes professional, investigate, professor, doctor, research, surgeon, consult]

Knowledge Graph Extensions – Inheritance
[Figure: graph over the nodes professional, investigate, professor, doctor, research, surgeon, consult]

Knowledge Graph Extensions – Syntax
[Figure, shown in four animation steps: graph over the nodes professional, investigate, professor, doctor, research, surgeon, consult]

Results – Syntax and Inference
Graph      Accuracy
WN         0.517
WNG        0.538
WNI        0.535
WNGI       0.537
WNGIS      0.565
WNGISE     0.616
WNGISEU    0.656

Knowledge Graph Extensions – Motivation 1
The inheritance in WordNet is not monotonic
From “A doctor operates on a patient” the inheritance is not acceptable for all hyponyms
From “A surgeon cures a patient” the inheritance is acceptable for all hyponyms, but it is also acceptable for concepts that are not hyponyms of “surgeon”
Thus, a mechanism for evaluating a suggested relation is necessary
Our suggestion: use a vector representation of features for nouns and for the grammatical roles of verbs

Logical Form – Motivation 2
Example from MRS: “Every dog chases some white cat.”
<h0, {h1:every(x,h2,h3), h2:dog(x), h4:chase(e, x, y), h5:some(y,h6,h7), h6:white(y), h6:cat(y)}, {}>
The embeddings for the different words have to ‘agree’ on the corresponding variables for the different arguments

Using Word Embeddings for KWSD
How can word embeddings be used to add new knowledge to the Knowledge Graph?
  By adding new semantic relations
Generating candidate syntagmatic semantic relations:
  Noun-Verb relations where the Noun denotes a participant in the event represented by the Verb
  Generation is done by constructing combinations of a selected verb with all the nouns in WordNet

Word Embedding for Grammatical Roles: Steps
Select a syntactically annotated corpus, where Subject, Direct Object and Indirect Object represent the core participants in the event
Substitute the specific words with pseudo words:
  The man saw the boy with the telescope
  SUBJ_see see DOBJ_see with IOBJ_see
A pair N-subj_of-V is good if N and SUBJ_V are close to each other
Thus we need compatible embeddings for both the nouns and the pseudo words for the grammatical roles
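The "close to each other" criterion can be sketched as a ranking of candidate nouns against a role pseudo word. Cosine similarity is an assumption here (the slides do not name the distance measure), and the three-dimensional vectors are invented toy values, not trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(role_word, nouns, vectors):
    """Sort candidate nouns by similarity to the role pseudo word, best first."""
    scored = [
        (noun, cosine(vectors[noun], vectors[role_word]))
        for noun in nouns
        if noun in vectors
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy vectors, for illustration only.
toy_vectors = {
    "SUBJ_bark": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "man": [0.5, 0.5, 0.2],
    "telescope": [0.0, 0.1, 0.9],
}
ranking = rank_candidates("SUBJ_bark", ["telescope", "man", "dog"], toy_vectors)
```

The top of such a ranking would supply the candidate noun-verb arcs; where to cut the list off is a separate decision.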

Corpus Preparation (RTC)
WaCkypedia_EN corpus (Baroni et al., 2009)
It was reparsed with the Stanford CoreNLP dependency parser (collapsed-cc):
  the dog runs and barks
  nsubj(dog, runs) and nsubj(dog, barks)
Substitution with pseudo words:
  SUBJ_run run and SUBJ_bark bark
Dependency relations: ‘nsubj’, ‘nsubjpass’, ‘dobj’, and ‘iobj’
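The substitution step can be sketched on the slide's own example. The input layout (lemmas plus relation triples with token indices) and the handling of a noun shared by two verbs are assumptions chosen so that the output matches the slide. nsubjpass is left out because the slides do not say which pseudo word it maps to.

```python
from collections import defaultdict

# Relation -> pseudo-word prefix; only the mappings visible on the slides.
ROLE_PREFIX = {"nsubj": "SUBJ", "dobj": "DOBJ", "iobj": "IOBJ"}


def to_pseudo_sentence(lemmas, deps, drop=frozenset()):
    """Replace argument nouns with ROLE_verb pseudo words.

    lemmas: list of lemmatised tokens
    deps:   (relation, verb_index, noun_index) triples
    drop:   indices of tokens removed with the argument (e.g. determiners)
    A noun shared by several verbs is replaced in place for the first verb;
    for every further verb the pseudo word is inserted right before that verb.
    """
    out = list(lemmas)
    inserts = defaultdict(list)  # verb index -> pseudo words placed before it
    replaced = set()
    for rel, verb, noun in deps:
        if rel not in ROLE_PREFIX:
            continue
        pseudo = f"{ROLE_PREFIX[rel]}_{lemmas[verb]}"
        if noun not in replaced:
            out[noun] = pseudo
            replaced.add(noun)
        else:
            inserts[verb].append(pseudo)
    result = []
    for i, token in enumerate(out):
        if i in drop:
            continue
        result.extend(inserts[i])
        result.append(token)
    return " ".join(result)


lemmas = ["the", "dog", "run", "and", "bark"]
deps = [("nsubj", 2, 1), ("nsubj", 4, 1)]  # dog is the subject of both verbs
pseudo_sentence = to_pseudo_sentence(lemmas, deps, drop={0})
```

With the collapsed-cc analysis the shared subject yields one pseudo word per verb, which is how the slide's example ends up with both SUBJ_run and SUBJ_bark.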

Pseudo Corpus over WordNet (PCWN)
However, a lot of nouns are not in RTC
In order to have their embeddings, we add PCWN to RTC
PCWN is generated via UKB over WN with added syntagmatic relations from existing resources (Goikoetxea et al., 2015)
Example pseudo sentence: goldbrick dupery take_in gull dupe person laugh_at
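Goikoetxea et al. (2015) build pseudo corpora from random walks over the knowledge graph, so the generation step can be sketched roughly as below. The toy graph is hypothetical: it simply chains the words of the slide's example and does not reproduce real WordNet relations. The stopping probability and length limit are likewise assumed values, and the walk emits node labels directly rather than picking a lemma for each synset.

```python
import random

# Hypothetical toy graph; the edges are invented for illustration.
TOY_GRAPH = {
    "goldbrick": {"dupery"},
    "dupery": {"goldbrick", "take_in"},
    "take_in": {"dupery", "gull"},
    "gull": {"take_in", "dupe"},
    "dupe": {"gull", "person"},
    "person": {"dupe", "laugh_at"},
    "laugh_at": {"person"},
}


def random_walk(graph, start, rng, stop_prob=0.15, max_len=10):
    """Walk from start, stopping with probability stop_prob before each step."""
    walk = [start]
    while len(walk) < max_len and rng.random() >= stop_prob:
        neighbours = sorted(graph[walk[-1]])  # sorted for reproducibility
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


rng = random.Random(0)  # fixed seed so the toy corpus is repeatable
pseudo_corpus = [
    " ".join(random_walk(TOY_GRAPH, node, rng)) for node in sorted(TOY_GRAPH)
]
```

Each walk becomes one pseudo sentence, so nouns that never occur in RTC still receive contexts for embedding training.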

Synset Embeddings
In order to construct semantic relations we need to evaluate N-V where N and V are Synsets, not words
Direct embeddings of Synsets depend on the semantically annotated corpus
Or in other words, an embedding for a Synset is the average of the embeddings of the lemmas in the Synset
We performed three sets of experiments:
  Embeddings of lemmas
  Embeddings of lemma-POS pairs
  Embeddings of Synsets
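The averaging of lemma embeddings into a synset embedding can be written as a short sketch. The lemma names and two-dimensional vectors are toy assumptions; skipping lemmas without a vector is also an assumption, since the slides do not say how missing lemmas are handled.

```python
def synset_embedding(lemmas, lemma_vectors):
    """Average the vectors of the synset's lemmas; None if no lemma has a vector."""
    known = [lemma_vectors[lemma] for lemma in lemmas if lemma in lemma_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy vectors, for illustration only.
toy_lemma_vectors = {"doctor": [1.0, 0.0], "physician": [0.0, 1.0]}
doctor_synset = synset_embedding(["doctor", "physician", "doc"], toy_lemma_vectors)
```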

Embeddings Training
RTC was used several times with different combinations of pseudo words:
  The man saw the boy with the telescope
  SUBJ_see see DOBJ_see with IOBJ_see
  SUBJ_see see DOBJ_see with the telescope
  SUBJ_see see the boy with IOBJ_see
We used the Word2Vec tool (skip-gram model)
Settings: context window of 5 words; 7 iterations; negative examples set to 5; and frequency cut sampling set to 7
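Producing several versions of one sentence with different combinations of pseudo words can be sketched as follows. The slide shows three variants; enumerating every non-empty subset of roles is an assumption, as is the span-based input format.

```python
from itertools import combinations


def pseudo_variants(tokens, role_spans):
    """Return one sentence per non-empty subset of roles.

    tokens:     lemmatised tokens of the sentence
    role_spans: {pseudo_word: (start, end)} half-open token spans to replace
    """
    roles = sorted(role_spans, key=lambda role: role_spans[role][0])
    variants = []
    for size in range(len(roles), 0, -1):
        for chosen in combinations(roles, size):
            out, i = [], 0
            while i < len(tokens):
                hit = next((r for r in chosen if role_spans[r][0] == i), None)
                if hit is not None:
                    out.append(hit)
                    i = role_spans[hit][1]  # skip the replaced phrase
                else:
                    out.append(tokens[i])
                    i += 1
            variants.append(" ".join(out))
    return variants


tokens = ["the", "man", "see", "the", "boy", "with", "the", "telescope"]
spans = {"SUBJ_see": (0, 2), "DOBJ_see": (3, 5), "IOBJ_see": (6, 8)}
variants = pseudo_variants(tokens, spans)
```

The resulting sentences would be concatenated into the training corpus, so that each pseudo word is seen both next to other pseudo words and next to the real words filling the remaining roles.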

Results: Text-Only

Results: Text + Pseudo Corpus

Results: Text + Pseudo Corpus Reduced

Manual Evaluation
The first 100 top-ranked subject and direct object relations were manually evaluated
The scale: good, acceptable, and bad

Conclusions
We proposed a mechanism for learning embeddings for the grammatical roles of verbs in WordNet
These embeddings were used to rank potential relations between verb synsets and noun synsets
Two evaluations were done: KWSD and manual
The evaluations showed that highly ranked candidates are valuable
The approach might be applied to other relations

Future Work
We plan to:
  Learn representations of more types of arguments
  Experiment with different algorithms for learning the embeddings over different contexts
  Improve corpus annotation
  Evaluate the grammatical role embeddings in other tasks: “If the baby does not thrive on raw milk, boil it.” (Jespersen, 1949), NNWSD
  Use techniques similar to retrofitting to improve the results
  Apply the approach to WordNet relations