WordNet WordNet, WSD.


WordNet
What is WordNet?
Miller 95: “WordNet is an online lexical database designed for use under program control. English nouns, verbs, adjectives, and adverbs are organized into sets of synonyms, each representing a lexicalized concept. Semantic relations link the synonym sets.”

WordNet
Go to the main WordNet site: http://wordnet.princeton.edu/
Open the wordnet folder on pongo: ~/dropbox/570/wordnet/dict

WordNet Vocabulary
See glossary at: http://wordnet.princeton.edu/gloss
synset: a synonym set; a set of words that are interchangeable in some context
lemma: lowercase ASCII text of a word as found in the WordNet database index files
lexical pointer: indicates a relation between words in synsets

Navigating WordNet files
data.* files: the actual network files (synsets)
index.* files: contain lowercase instances of all words in WordNet, with pointers to the synset entries in the network
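The navigation described above (find a word in an index.* file, then follow its pointer into the matching data.* file) can be sketched in Python. This is a minimal sketch, not an official API: it assumes each synset offset is the byte position of the synset's line in the data file, as documented in wndb. The function names and the tiny synthetic files built by `build_demo_files` are hypothetical, so the example runs without a WordNet installation.

```python
import os
import tempfile


def lookup_offsets(index_path, lemma):
    """Return the synset offsets that an index.* file lists for `lemma`."""
    with open(index_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.startswith("  "):
                continue  # header lines in the real files begin with two spaces
            fields = line.split()
            # Assumed layout: lemma pos synset_cnt ... offset [offset ...]
            if len(fields) > 3 and fields[0] == lemma and fields[2].isdigit():
                synset_cnt = int(fields[2])
                return [int(off) for off in fields[-synset_cnt:]]
    return []


def read_synset_line(data_path, offset):
    """Seek to a byte offset in a data.* file and return the line found there."""
    with open(data_path, "rb") as fh:
        fh.seek(offset)
        return fh.readline().decode("utf-8", errors="replace").rstrip("\n")


def build_demo_files(directory):
    """Write a two-synset data.noun / index.noun pair (synthetic, for the demo)."""
    entries = [
        ("performance", "any recognized accomplishment"),
        ("overachievement", "better than expected performance"),
    ]
    data_path = os.path.join(directory, "data.noun")
    index_path = os.path.join(directory, "index.noun")
    offset = 0
    index_lines = []
    with open(data_path, "wb") as data:
        for word, gloss in entries:
            # The first field of each line is that line's own byte offset.
            line = f"{offset:08d} 04 n 01 {word} 0 000 | {gloss}\n".encode("ascii")
            index_lines.append(f"{word} n 1 0 1 0 {offset:08d}\n")
            data.write(line)
            offset += len(line)
    with open(index_path, "w", encoding="ascii") as index:
        index.writelines(index_lines)
    return index_path, data_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_path, data_path = build_demo_files(tmp)
        for off in lookup_offsets(index_path, "overachievement"):
            print(read_synset_line(data_path, off))
```

Against a real installation, the same two functions would be pointed at the index.noun and data.noun files in the dict folder.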

WordNet data file
See: wndb
Two example lines from a noun data file:

00045430 04 n 01 performance 3 003 @ 00033580 n 0000 ~ 00045680 n 0000 ~ 00045874 n 0000 | any recognized accomplishment; "they admired his performance under stress"

00045680 04 n 01 overachievement 0 003 @ 00045430 n 0000 + 02537922 v 0101 ! 00045874 n 0101 | better than expected performance (better than might have been predicted from intelligence tests)

Fields of the first line, left to right:
  00045430             synset file offset
  04                   file number
  n                    synset type
  01                   # words in synset
  performance 3        word, followed by its lexical id
  003                  # pointers to other synsets
  @ 00033580 n 0000    one pointer: type of pointer, pointer (synset file offset of the target), POS, source/target word numbers
  | ...                gloss
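To make the field layout concrete, here is a minimal Python sketch that splits one data-file line into its parts. It assumes the layout documented in wndb (a two-digit hexadecimal word count, a three-digit decimal pointer count, four fields per pointer). The class and function names are hypothetical, and verb frames, which can follow the pointers in verb files, are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pointer:
    symbol: str          # type of pointer, e.g. "@" for hypernym
    offset: int          # synset file offset of the target synset
    pos: str             # part of speech of the target
    source_target: str   # four hex digits: source and target word numbers


@dataclass
class Synset:
    offset: int
    lex_filenum: int
    ss_type: str
    words: List[str]
    pointers: List[Pointer]
    gloss: str


def parse_data_line(line: str) -> Synset:
    """Parse one line of a data.* file (verb frames are not handled)."""
    body, _, gloss = line.partition("|")
    f = body.split()
    offset, lex_filenum, ss_type = int(f[0]), int(f[1]), f[2]
    w_cnt = int(f[3], 16)            # word count is hexadecimal
    i = 4
    words = []
    for _ in range(w_cnt):
        words.append(f[i])           # f[i + 1] is the lexical id, skipped here
        i += 2
    p_cnt = int(f[i])                # pointer count is decimal
    i += 1
    pointers = []
    for _ in range(p_cnt):
        pointers.append(Pointer(f[i], int(f[i + 1]), f[i + 2], f[i + 3]))
        i += 4
    return Synset(offset, lex_filenum, ss_type, words, pointers, gloss.strip())


EXAMPLE = (
    "00045680 04 n 01 overachievement 0 003 @ 00045430 n 0000 "
    "+ 02537922 v 0101 ! 00045874 n 0101 | better than expected performance "
    "(better than might have been predicted from intelligence tests)"
)

if __name__ == "__main__":
    synset = parse_data_line(EXAMPLE)
    print(synset.words, [p.symbol for p in synset.pointers])
```

The `@` pointer of this synset leads back to offset 00045430, the "performance" synset, which in turn lists 00045680 under `~`.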

Pointer symbols
See: wninput
For nouns:
  !     Antonym
  @     Hypernym
  ~     Hyponym
  #m    Member holonym
  #s    Substance holonym
  #p    Part holonym
  %m    Member meronym
  %s    Substance meronym
  %p    Part meronym
  =     Attribute
  +     Derivationally related form
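The symbol table above maps naturally onto a lookup dictionary. The sketch below is illustrative only: the names are hypothetical, the table covers just the noun symbols listed on the slide, and the inverse mapping reflects the reflexive pointer pairs as described in wninput (hypernym/hyponym, holonym/meronym, with antonym, attribute, and derivational links pointing back with the same symbol), which should be treated as an assumption here.

```python
# Noun pointer symbols as listed on the slide.
NOUN_POINTER_NAMES = {
    "!": "Antonym",
    "@": "Hypernym",
    "~": "Hyponym",
    "#m": "Member holonym",
    "#s": "Substance holonym",
    "#p": "Part holonym",
    "%m": "Member meronym",
    "%s": "Substance meronym",
    "%p": "Part meronym",
    "=": "Attribute",
    "+": "Derivationally related form",
}

# Assumed reflexive pairs: a pointer in one synset is mirrored in the target.
INVERSE_POINTER = {
    "@": "~", "~": "@",
    "#m": "%m", "%m": "#m",
    "#s": "%s", "%s": "#s",
    "#p": "%p", "%p": "#p",
    "!": "!", "=": "=", "+": "+",
}


def describe_pointer(symbol: str) -> str:
    """Return the relation name for a noun pointer symbol."""
    return NOUN_POINTER_NAMES.get(symbol, "Unknown pointer symbol")


def inverse_name(symbol: str) -> str:
    """Return the name of the relation expected on the target synset."""
    return describe_pointer(INVERSE_POINTER.get(symbol, ""))


if __name__ == "__main__":
    for sym in ("@", "~", "#p"):
        print(sym, describe_pointer(sym), "<->", inverse_name(sym))
```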

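WordNet index file
Example line:

abomination n 3 2 @ + 3 0 09613960 07401317 00734041

Fields, left to right:
  abomination                   lemma (word)
  n                             POS
  3                             # synsets
  2                             # pointers
  @ +                           pointers (the pointer symbols this lemma has)
  3                             sense count (repeats # synsets)
  0                             # senses ranked by frequency in tagged text
  09613960 07401317 00734041    synset file offsets, one per synset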
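The index-line layout can be read with a few lines of Python. This is a minimal sketch that assumes the field order documented in wndb (lemma, POS, synset count, pointer count, pointer symbols, sense count, tagged-sense count, offsets). The function name and the returned dictionary keys are hypothetical.

```python
def parse_index_line(line: str) -> dict:
    """Split one index.* line into named fields."""
    f = line.split()
    synset_cnt = int(f[2])
    p_cnt = int(f[3])
    symbols_end = 4 + p_cnt                  # pointer symbols occupy f[4:symbols_end]
    offsets_start = symbols_end + 2          # skip sense_cnt and tagsense_cnt
    return {
        "lemma": f[0],
        "pos": f[1],
        "synset_cnt": synset_cnt,
        "ptr_symbols": f[4:symbols_end],
        "sense_cnt": int(f[symbols_end]),
        "tagsense_cnt": int(f[symbols_end + 1]),
        "offsets": [int(x) for x in f[offsets_start:offsets_start + synset_cnt]],
    }


EXAMPLE = "abomination n 3 2 @ + 3 0 09613960 07401317 00734041"

if __name__ == "__main__":
    entry = parse_index_line(EXAMPLE)
    print(entry["lemma"], entry["ptr_symbols"], entry["offsets"])
```

Each offset in the result can then be used to locate the corresponding synset line in the data file for the same part of speech.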

WordNet tools
Many, many tools
General documentation: http://wordnet.princeton.edu/doc
Online query and lookup: http://wordnet.princeton.edu/perl/webwn
APIs and tools: http://wordnet.princeton.edu/links
WordNet::Similarity: http://wn-similarity.sourceforge.net/
WordNet::Similarity web interface: http://marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi

WordNet and WSD
Mihalcea 2002 describes a system for sense-tagging text using WordNet (and related tools and resources)

Mihalcea 2002
Some tools and resources described:
Senseval http://www.senseval.org/
  Evaluation exercises for Word Sense Disambiguation
  Senseval-1 through Senseval-3 held in the last several years, with workshops at ACL
  Senseval-4 coming up
  Data and materials from Senseval-3 can be downloaded
  Some useful materials for multiple languages
    Materials and test data for English, Italian, Basque, Catalan, Chinese, Romanian, and Spanish

Mihalcea 2002
Some tools and resources described:
SemCor
  Sense-tagged subset of the Brown corpus
  Created at Princeton
  Used for training WSD systems
  Can be downloaded from Mihalcea's web site: http://www.cs.unt.edu/~rada/downloads.html
  We're also planning on installing it on pongo

McCarthy et al 2004
Task: find the predominant word senses in untagged text
Unlike Mihalcea 2002, did not rely on a supervised method using SemCor
Built a thesaurus from raw text and combined it with WordNet
Intuition: the predominant sense of a word is affected by genre, domain, or text type, so it is better determined from the contexts in an untagged corpus
  Rather than relying on SemCor's 250,000 words, where the coverage of word senses is rather limited

McCarthy et al
Thesaurus development relies on dependencies between “neighbors”
Look at distributional similarities between a word and its neighbors

McCarthy et al
Experimented with several similarity measures available in WordNet::Similarity
First experiment used SemCor to see how well the unsupervised system worked
  2595 polysemous nouns in SemCor
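The slides do not spell out how the thesaurus neighbors and the WordNet similarity measure are combined, so the Python sketch below is an illustration under stated assumptions rather than the authors' implementation. It assumes that each neighbor's distributional similarity to the target word is shared out among the target's senses in proportion to each sense's WordNet similarity to that neighbor, and that the sense with the largest total is taken as predominant. All words, sense labels, and numbers are invented for the example.

```python
from typing import Callable, Dict, List, Tuple


def prevalence_scores(
    senses: List[str],
    neighbors: List[Tuple[str, float]],
    sense_sim: Callable[[str, str], float],
) -> Dict[str, float]:
    """Score each sense of a target word.

    senses:     sense labels of the target word
    neighbors:  (neighbor word, distributional similarity to the target)
    sense_sim:  WordNet-style similarity between a sense and a neighbor word
    """
    scores = {s: 0.0 for s in senses}
    for neighbor, dist_sim in neighbors:
        total = sum(sense_sim(s, neighbor) for s in senses)
        if total == 0:
            continue  # this neighbor says nothing about any sense
        for s in senses:
            # Each neighbor contributes its weight, split across the senses.
            scores[s] += dist_sim * sense_sim(s, neighbor) / total
    return scores


# Invented toy data: two senses of "star" and three thesaurus neighbors.
TOY_SIM = {
    ("star#celestial", "planet"): 0.8, ("star#celebrity", "planet"): 0.2,
    ("star#celestial", "actor"): 0.1, ("star#celebrity", "actor"): 0.9,
    ("star#celestial", "galaxy"): 0.9, ("star#celebrity", "galaxy"): 0.1,
}
TOY_NEIGHBORS = [("planet", 0.4), ("actor", 0.2), ("galaxy", 0.3)]


def toy_sense_sim(sense: str, neighbor: str) -> float:
    return TOY_SIM.get((sense, neighbor), 0.0)


if __name__ == "__main__":
    scores = prevalence_scores(
        ["star#celestial", "star#celebrity"], TOY_NEIGHBORS, toy_sense_sim
    )
    print(max(scores, key=scores.get), scores)
```

With a different corpus the neighbor list would change, and with it the ranking, which is the point of estimating the predominant sense from untagged domain text.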

McCarthy et al
Experiment #2: evaluated against the SENSEVAL-2 English All Words data
Comparison between the precision and recall for SemCor vs. their automatic data (and the SENSEVAL ceiling)
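Precision and recall can differ in WSD evaluation because a system may decline to tag some instances. The sketch below assumes the usual Senseval-style convention, which the slide does not define: precision is computed over the instances the system attempted, recall over all gold instances. The function name, instance ids, and sense labels are hypothetical.

```python
from typing import Dict, Optional, Tuple


def precision_recall(
    gold: Dict[str, str], predicted: Dict[str, Optional[str]]
) -> Tuple[float, float]:
    """Return (precision, recall) for sense labels keyed by instance id."""
    attempted = sum(1 for key in gold if predicted.get(key) is not None)
    correct = sum(1 for key, sense in gold.items() if predicted.get(key) == sense)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = {"d1.t1": "bank%1", "d1.t2": "plant%2", "d1.t3": "star%1", "d1.t4": "pipe%3"}
    # The system skips d1.t4 and gets d1.t2 wrong.
    predicted = {"d1.t1": "bank%1", "d1.t2": "plant%1", "d1.t3": "star%1"}
    print(precision_recall(gold, predicted))
```

A first-sense heuristic that always answers has equal precision and recall, while a system that abstains on unseen words trades recall for precision.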

McCarthy et al
Some experiments with domain-specific corpora gave these results: