The WordNet Lexical Database Bernardo Magnini ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica Trento - Italy.



Outline
1. WordNet: introduction
2. Extending WordNet
   • Languages other than English
   • New information
   • WordNet as a (linguistic) ontology
3. Using WordNet
   • Word sense disambiguation
   • Information Retrieval / Question Answering
   • Semantic Web

WordNet
• Electronic lexical database for the English language, developed at Princeton University by George Miller's team
• Based on psycholinguistic theories
• Several releases: from version 1.0 in 1991 to the version released in 2001; WordNet 2 (??)
• WordNet is a public domain resource
• Reference: Fellbaum C. (Ed.): WordNet, an Electronic Lexical Database, MIT Press, 1998
• Global WordNet Association (GWA)
  ◦ Conference, workshops

Lexical Matrix

Word Meanings | Word Forms
              | F1     F2     F3     …     Fn
M1            | E1,1   E1,2
M2            |        E2,2
M3            |               E3,3
…             |                      …
Mm            |                            Em,n

• Mappings between word forms and meanings are many-to-many
• F1 and F2 are synonyms (both express M1)
• F2 is polysemous (it expresses M1 and M2)
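
The matrix can be read as a many-to-many mapping between forms and meanings. The following is a minimal Python sketch of that reading, not WordNet's actual storage format; the entries, and the names `ENTRIES`, `build_matrix`, `synonyms` and `is_polysemous`, are hypothetical and only mirror the F/M labels of the slide.

```python
from collections import defaultdict

# Toy lexical matrix (hypothetical data): one (form, meaning) pair per filled cell.
ENTRIES = [
    ("F1", "M1"),
    ("F2", "M1"),
    ("F2", "M2"),
    ("F3", "M3"),
]

def build_matrix(entries):
    """Index the matrix both ways: form -> meanings and meaning -> forms."""
    meanings_of = defaultdict(set)
    forms_of = defaultdict(set)
    for form, meaning in entries:
        meanings_of[form].add(meaning)
        forms_of[meaning].add(form)
    return dict(meanings_of), dict(forms_of)

def synonyms(form, meanings_of, forms_of):
    """Forms that share at least one meaning with `form`, excluding `form` itself."""
    result = set()
    for meaning in meanings_of.get(form, ()):
        result |= forms_of[meaning]
    result.discard(form)
    return result

def is_polysemous(form, meanings_of):
    """A form is polysemous when it expresses more than one meaning."""
    return len(meanings_of.get(form, ())) > 1

meanings_of, forms_of = build_matrix(ENTRIES)
print(synonyms("F1", meanings_of, forms_of))
print(is_polysemous("F2", meanings_of))
```

Reading a row of the matrix gives a synset (all forms of one meaning); reading a column gives the senses of one word form.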

Basic Primitives
• Word forms: lexical items in a language (i.e. no artificial concepts), including collocations
• Senses: a sense is one meaning of a word form
• Synsets: a synset is a set of synonymous senses
• Relations:
  ◦ Lexical: among senses
  ◦ Semantic: among synsets

Lexical Relations
• Synonymy
  ◦ Two expressions are synonymous if substituting one for the other does not alter the truth value of the sentence (Leibniz)
  ◦ Substitution is only possible within the same part of speech => need to partition WordNet into nouns, verbs, adjectives, and adverbs
• Antonymy, e.g. [rich/poor], [rise/fall]
  ◦ The antonym of a word x is sometimes not-x, but not always: "not rich" does not imply "poor"
  ◦ Main organizing principle for adjectives

Semantic relations (1)
• Hyponymy/Hyperonymy (the ISA relation)
  ◦ A synset {x1, x2, …} is a hyponym of the synset {y1, y2, …} if native speakers accept sentences such as "An x is a (kind of) y"
  ◦ Transitive and asymmetrical
  ◦ WordNet is a graph rather than a tree, even though synsets normally have a single hyperonym
  ◦ Main organizing principle for nouns
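
Because the ISA relation is transitive, all hyperonyms of a synset can be collected by walking the graph upward. The sketch below illustrates this on a toy graph; the synset names and the `HYPERNYMS` table are hypothetical, not taken from WordNet, and `all_hypernyms` / `is_a` are illustrative helper names.

```python
# Toy hyperonym graph (hypothetical data): synset -> list of direct hyperonyms.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],  # two hyperonyms: a graph, not a tree
    "canine": ["carnivore"],
    "carnivore": ["animal"],
    "domestic_animal": ["animal"],
    "animal": [],
}

def all_hypernyms(synset, graph):
    """Transitive closure of the ISA relation starting from `synset`."""
    seen = set()
    stack = list(graph.get(synset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

def is_a(x, y, graph):
    """True if x is a (direct or inherited) hyponym of y."""
    return y in all_hypernyms(x, graph)

print(sorted(all_hypernyms("dog", HYPERNYMS)))
print(is_a("dog", "animal", HYPERNYMS), is_a("animal", "dog", HYPERNYMS))
```

The second call shows the asymmetry: a dog is an animal, but an animal is not a dog.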

Semantic relations (2)
• Meronymy/Holonymy (the Part-Of relation)
  ◦ A synset {x1, x2, …} is a meronym of the synset {y1, y2, …} if native speakers accept sentences such as "An x is a part of y" or "A y has an x (as a part)"
  ◦ Meronymy is transitive and asymmetrical, and can be used to construct a part hierarchy

Semantic relations (3)
• Semantic relations specific to the verb hierarchy
• Troponymy: a troponym is a verb expressing a specific manner elaboration of another verb (e.g. walk → move)
  ◦ X is a troponym of Y if to X is to Y in some manner, i.e. X is a particular way to Y
• Entailment: a verb X entails Y if X cannot be done unless Y is or has been done (e.g. snore → sleep)

An Example

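WordNet (statistics table; most figures were lost in extraction)
Columns: NOUN, VERB, ADJ, ADV, TOTAL
Rows: Words, Synsets, Senses, Polysemy (Senses/Words)
Surviving values: 1.23/ / / /2.41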

SemCor
• English, part of the Brown Corpus
• 700,000 running words, annotated with part of speech
• 200,000 words annotated with WordNet senses (and lemmas)

WordNet Extensions
• Computational needs:
  ◦ WordNets for languages other than English
  ◦ New semantic relations
  ◦ WordNet as an ontology
  ◦ Domain-specific wordnets
  ◦ Automatic acquisition of information
  ◦ Interchange formats

Languages other than English
• EuroWordNet project: monolingual wordnets are connected through an Interlingual Index (ILI); distributed by ELDA/ELRA
  ◦ Italian, Spanish, Catalan, Basque, French, Estonian, Portuguese, Swedish, Dutch, German
• BalkaNet project: Bulgarian, Greek, Romanian, Slovenian
• Danish, Hebrew
• Chinese, some Indian languages
• Lexical gaps

New Relations (1)
• Derivation relations (Princeton, WordNet-2)
  ◦ invent → inventor (needs disambiguation)
• Gloss disambiguation (Extended WordNet, Moldovan 2000)
  ◦ Glosses are parsed, disambiguated and converted into a logical form
• WordNet Domains (Magnini, Cavaglia, 2000) (ITC-irst)
  ◦ Synsets are labeled with domains, such as Medicine, Architecture, Sport, …

WordNet Domains
• Integrate taxonomic and domain-oriented information
• Cross-hierarchy relations
  ◦ doctor#2 [Medicine] --> person#1
  ◦ hospital#1 [Medicine] --> location#1
• Cross-category relations: operate#3 [Medicine]
• Cross-language information

New Relations (2)
• Classes versus instances:
  ◦ Bush → person
• Role relations for verbs:
  ◦ singer → song
• Implicit knowledge (Peters, 2002)
  ◦ Discover regular polysemy relations in WordNet: bank#1 (an institution), bank#2 (a building)

Automatic Acquisition
• MEANING project (IST )
• Topic signatures (Agirre, 2001)
  ◦ Synset-related words automatically extracted from the Web
• Automatic collection of sense examples (Leacock et al. 98, Mihalcea and Moldovan 99)
• Synset selectional preferences (Carroll, 2001)
  ◦ From the BNC corpus
• WordNet annotated corpora
  ◦ Open Mind Word Expert (Mihalcea, 2002)

WordNet as an Ontology
• Some relations contradict ontological principles
• OntoClean approach (Guarino, 2002):
  ◦ Confusion between concepts and individuals (e.g. Palestine and Trust_Territories at the same level)
  ◦ Role/Type: a role cannot subsume a type (e.g. Person under Causal_agent)

Domain Specific WordNets
• Extension of WordNet hierarchies using domain-specific document collections (Vossen, 2001) (Buitelaar, 2001) (Velardi, 2001)
• Tuning of WordNet synsets (Turcato, 2000)
• Merging generic and specialized wordnets (Magnini et al. 2002):
  ◦ Overlaps and inconsistencies among synsets
  ◦ Precedence rules for inheritance

Interchange Formats
• XML:
  ◦ Implementation independent
  ◦ Easily extensible to new relations
  ◦ At least three different versions exist; none of them is widely used yet
• Mappings among different WordNet versions:
  ◦ 1.5 → 1.6
  ◦ 1.6 → 1.7
  ◦ May contain errors

Using WordNet
• Widely used within the Natural Language Processing community
• Suitable for open-domain, content-based tasks where interpretation based on lexical semantics is required
• Algorithms: take advantage of the WordNet semantic relations
• Issues: fine-grained sense distinctions
• Application areas: query expansion in IR, Word Sense Disambiguation, Question Answering

Distance/Similarity Algorithms
• Conceptual distance (Agirre-Rigau, 1995)
  ◦ Considers the density of the taxonomy
• Semantic similarity (Resnik, 1995)
  ◦ Based on the node with the highest information content connecting two nodes:
    sim(c1, c2) = max [-log p(c)]
    where c is a node on an is-a path connecting c1 and c2 (i.e. c subsumes both), and p(c) is a probability computed from the occurrences of c in a corpus
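
The information-content measure can be worked through on a small example. The sketch below assumes a tree-shaped toy taxonomy and made-up corpus counts (both hypothetical), and estimates p(c) by counting occurrences of c together with those of every node it subsumes, which is one common way to obtain the probabilities. The helper names are illustrative only.

```python
import math

# Toy is-a taxonomy (hypothetical): node -> parent (None marks the root).
PARENT = {
    "animal": None,
    "carnivore": "animal",
    "bird": "animal",
    "canine": "carnivore",
    "feline": "carnivore",
    "dog": "canine",
    "cat": "feline",
}

# Hypothetical corpus counts of direct mentions of each node.
COUNTS = {"animal": 10, "carnivore": 10, "bird": 20,
          "canine": 5, "feline": 5, "dog": 30, "cat": 20}

TOTAL = sum(COUNTS.values())

def subsumers(node):
    """Return the node and all its ancestors on the is-a path to the root."""
    result = []
    while node is not None:
        result.append(node)
        node = PARENT[node]
    return result

def cumulative_count(node):
    """Occurrences of the node plus those of every node it subsumes."""
    total = COUNTS[node]
    for child, parent in PARENT.items():
        if parent == node:
            total += cumulative_count(child)
    return total

def information_content(node):
    """-log p(c), with p(c) estimated from cumulative counts."""
    return -math.log(cumulative_count(node) / TOTAL)

def resnik_similarity(c1, c2):
    """Highest information content among the nodes subsuming both c1 and c2."""
    common = set(subsumers(c1)) & set(subsumers(c2))
    return max(information_content(c) for c in common)

print(resnik_similarity("dog", "cat"))   # most informative common subsumer: carnivore
print(resnik_similarity("dog", "bird"))  # only the root is shared
```

In this toy setting "dog" and "cat" score higher than "dog" and "bird", because their shared subsumer "carnivore" is rarer, and therefore more informative, than the root.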

Sense Distinctions
• WordNet contains sense distinctions that are difficult to understand
• Many applications would benefit from polysemy reduction
• Sense clustering methodologies:
  ◦ Based on domain information
  ◦ Based on aligned corpora in different languages
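
Domain-based sense clustering can be pictured as grouping the senses of a word that carry the same domain label. The sketch below is only an illustration of that idea: the sense identifiers and domain labels are hypothetical and are not the actual WordNet Domains annotation, and `cluster_by_domain` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical senses of one word, each tagged with a domain label.
SENSES = [
    ("bank#1", "Economy"),
    ("bank#2", "Geography"),
    ("bank#3", "Economy"),
    ("bank#4", "Architecture"),
]

def cluster_by_domain(senses):
    """Group sense identifiers that share a domain label."""
    clusters = defaultdict(list)
    for sense_id, domain in senses:
        clusters[domain].append(sense_id)
    return dict(clusters)

clusters = cluster_by_domain(SENSES)
for domain, members in clusters.items():
    print(domain, members)
print("senses:", len(SENSES), "clusters:", len(clusters))
```

Here four fine-grained senses collapse into three coarser clusters, which is the kind of polysemy reduction the slide refers to.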

WordNet and Word Sense Disambiguation
• As a sense repository
  ◦ For the SENSEVAL competition
  ◦ Manually annotated data are required for training systems based on machine learning algorithms
• As an information source for knowledge-based algorithms

IR: Query Expansion
• Open debate:
  ◦ Semantic information is not useful (Voorhees, 1994)
  ◦ WSD with accuracy below 90% degrades IR results (Sanderson, 1994); current WSD systems perform at less than 80%
  ◦ Semantic information significantly increases IR performance (up to 30%) (Gonzalo, 1998)
  ◦ Recent experiments (de Loupy, 2002) show that using synonyms and WSD (72% accuracy) in query expansion slightly (2-3%) improves performance
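
Synonym-based query expansion with a disambiguation step can be sketched as follows. This is an illustration under stated assumptions, not any of the cited systems: the `SYNSETS` inventory is hypothetical toy data, and the `chosen_senses` argument stands in for the output of a WSD component.

```python
# Hypothetical sense inventory: word -> {sense id: synonyms in that synset}.
SYNSETS = {
    "car": {
        "car#1": ["auto", "automobile"],
        "car#2": ["railcar"],
    },
    "bank": {
        "bank#1": ["depository"],
        "bank#2": ["riverbank"],
    },
}

def expand_query(terms, chosen_senses):
    """Append the synonyms of the selected sense after each query term.

    `chosen_senses` maps a term to the sense id picked by a (hypothetical)
    WSD step; terms without a selected sense are left unexpanded.
    """
    expanded = []
    for term in terms:
        if term not in expanded:
            expanded.append(term)
        sense = chosen_senses.get(term)
        for synonym in SYNSETS.get(term, {}).get(sense, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

print(expand_query(["car", "rental"], {"car": "car#1"}))
```

Expanding only the selected sense is what makes WSD accuracy matter: a wrong choice (e.g. `car#2`) would add "railcar" and pull in irrelevant documents.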

WordNet in Question Answering
• Answer type identification (Harabagiu, 2001: top score at TREC-QA-2000)
  ◦ Answer types defined on the WordNet taxonomy
• Answer extraction
  ◦ Named entity recognition based on WordNet
• Question/answer relation discovery in passage retrieval (Pasca, 2001)

Semantic Web
• Interpreting semi-structured knowledge sources
  ◦ Directories, file systems, catalogues
• Implicit knowledge
• Linguistic analysis of labels based on WordNet

Conclusions
• WordNet as a linguistic ontology
• Using WordNet as it is in application tasks is not easy: "The art of using WordNet"
• Extensions, such as domains, multilingual wordnets, etc., are required
• Results in IR, QA and WSD are still preliminary
• Good news: a growing community is using WordNet