WordNet WordNet, WSD.


1 WordNet WordNet, WSD

2 WordNet What is WordNet?
Miller 95: “WordNet is an online lexical database designed for use under program control. English nouns, verbs, adjectives, and adverbs are organized into sets of synonyms, each representing a lexicalized concept. Semantic relations link the synonym sets.”

3 WordNet Go to the main WordNet site: http://wordnet.princeton.edu/
Open the wordnet folder on pongo: ~/dropbox/570/wordnet/dict

4 WordNet Vocabulary See glossary at: http://wordnet.princeton.edu/gloss
synset: A synonym set; a set of words that are interchangeable in some context
lemma: lower case ASCII text of word as found in the WordNet database index files
lexical pointer: A lexical pointer indicates a relation between words in synsets

5 Navigating WordNet files
data.* files: the actual network files (synsets)
index.* files: contain lower case instances of all words in WordNet, with pointers to the synset entries in the network
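
The pointers from index.* into data.* are byte offsets: per the wndb documentation, each synset offset is the position in the data file where that synset's line begins. A minimal sketch of the lookup, using an in-memory toy file instead of the real data.noun (the helper names and the shortened line contents are made up for illustration):

```python
import io


def build_toy_data_file(bodies):
    """Build a fake data.* file: each line starts with its own 8-digit byte offset."""
    out = b""
    for body in bodies:
        out += f"{len(out):08d} {body}\n".encode("ascii")
    return out


def read_synset_line(data_file, offset):
    """Jump to a byte offset and return the synset line that starts there."""
    data_file.seek(offset)
    return data_file.readline().decode("ascii").rstrip("\n")


# Hypothetical, shortened synset lines (not real WordNet entries).
TOY = build_toy_data_file([
    "04 n 01 performance 0 000 | any recognized accomplishment",
    "04 n 01 overachievement 0 000 | better than expected performance",
])
# The second line starts right after the first newline.
SECOND_OFFSET = TOY.index(b"\n") + 1

if __name__ == "__main__":
    print(read_synset_line(io.BytesIO(TOY), SECOND_OFFSET))
```

With the real files, one would open the data file in binary mode and pass an offset taken from the matching index file.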

6 WordNet data file See: wndb
Fields labeled on the slide: synset file offset, file number, synset type, # words in synset, word, # pointers to other synsets, type of pointer, pointer, POS
Example entries:
n 01 performance n 0000 ~ n 0000 ~ n 0000 | any recognized accomplishment; "they admired his performance under stress"
n 01 overachievement n v 0101 ! n 0101 | better than expected performance (better than might have been predicted from intelligence tests)
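
To make the field order concrete, here is a rough Python sketch of a parser for one data.* line, following the layout in wndb (offset, file number, synset type, hexadecimal word count, word/lex_id pairs, decimal pointer count, four fields per pointer, then the gloss after `|`). The class and function names are hypothetical, and the offsets, file number, and leading `@` pointer in the sample line are invented for illustration; they are not real WordNet values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pointer:
    symbol: str         # e.g. "@" or "~"
    offset: int         # synset offset of the target synset
    pos: str            # part of speech of the target: n, v, a, or r
    source_target: str  # 4 hex digits; "0000" marks a synset-level pointer


@dataclass
class Synset:
    offset: int
    lex_filenum: int
    ss_type: str
    words: List[str]
    pointers: List[Pointer]
    gloss: str


def parse_data_line(line: str) -> Synset:
    """Parse one data.* line (verb frames, if any, are ignored)."""
    head, _, gloss = line.partition("|")
    f = head.split()
    w_cnt = int(f[3], 16)                         # word count is hexadecimal
    words = [f[4 + 2 * k] for k in range(w_cnt)]  # skip the lex_id after each word
    i = 4 + 2 * w_cnt
    p_cnt = int(f[i])                             # pointer count is decimal
    i += 1
    pointers = [
        Pointer(f[i + 4 * k], int(f[i + 4 * k + 1]), f[i + 4 * k + 2], f[i + 4 * k + 3])
        for k in range(p_cnt)
    ]
    return Synset(int(f[0]), int(f[1]), f[2], words, pointers, gloss.strip())


# Offsets, file number, and the leading "@" are invented for illustration.
SAMPLE = ('00001000 04 n 01 performance 0 003 @ 00002000 n 0000 '
          '~ 00003000 n 0000 ~ 00004000 n 0000 | any recognized accomplishment; '
          '"they admired his performance under stress"')

if __name__ == "__main__":
    print(parse_data_line(SAMPLE))
```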

7 Pointer symbols See: wninput
For nouns:
!    Antonym
@    Hypernym
~    Hyponym
#m    Member holonym
#s    Substance holonym
#p    Part holonym
%m    Member meronym
%s    Substance meronym
%p    Part meronym
=    Attribute
+    Derivationally related form
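
When reading data files programmatically, the noun pointer symbols on this slide can be kept in a small lookup table. A minimal sketch (the names `NOUN_POINTERS` and `describe_pointer` are made up here, and the table covers only the symbols listed on the slide):

```python
# Noun pointer symbols as listed on the slide (see wninput for the full set).
NOUN_POINTERS = {
    "!": "Antonym",
    "@": "Hypernym",
    "~": "Hyponym",
    "#m": "Member holonym",
    "#s": "Substance holonym",
    "#p": "Part holonym",
    "%m": "Member meronym",
    "%s": "Substance meronym",
    "%p": "Part meronym",
    "=": "Attribute",
    "+": "Derivationally related form",
}


def describe_pointer(symbol: str) -> str:
    """Return the relation name for a noun pointer symbol, if it is in the table."""
    return NOUN_POINTERS.get(symbol, "unknown (not in the table)")


if __name__ == "__main__":
    for sym in ["@", "~", "%p"]:
        print(sym, describe_pointer(sym))
```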

8 WordNet index file
Fields labeled on the slide: lemma (word), POS, # synsets, # pointers, pointers, synset file offset
Example: abomination n
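
A rough sketch of reading one index.* line in Python, assuming the field order given in wndb: lemma, POS, synset count, pointer count, that many pointer symbols, sense count, tagged-sense count, then one synset offset per synset. The names `IndexEntry` and `parse_index_line` are hypothetical, and the counts, symbols, and offsets in the sample line are invented for illustration rather than copied from index.noun.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    lemma: str
    pos: str
    pointer_symbols: List[str]  # pointer types the lemma has in any of its synsets
    tagsense_cnt: int           # number of senses that occur in tagged text
    synset_offsets: List[int]   # byte offsets into the matching data.* file


def parse_index_line(line: str) -> IndexEntry:
    """Parse one index.* line into its fields."""
    f = line.split()
    lemma, pos = f[0], f[1]
    synset_cnt = int(f[2])
    p_cnt = int(f[3])
    symbols = f[4:4 + p_cnt]
    i = 4 + p_cnt
    # f[i] is sense_cnt, which repeats synset_cnt; f[i + 1] is tagsense_cnt.
    tagsense_cnt = int(f[i + 1])
    offsets = [int(x) for x in f[i + 2:i + 2 + synset_cnt]]
    return IndexEntry(lemma, pos, symbols, tagsense_cnt, offsets)


# Invented values: only the lemma and POS come from the slide.
SAMPLE_INDEX = "abomination n 3 2 @ ~ 3 0 00001000 00002000 00003000"

if __name__ == "__main__":
    print(parse_index_line(SAMPLE_INDEX))
```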

9 WordNet tools Many, many tools
General documentation:
Online query and lookup:
APIs and tools:
WordNet::Similarity:
WordNet::Similarity web interface:

10 WordNet and WSD Mihalcea 2002 describes a system for sense-tagging text using WordNet (and related tools and resources)

11 Mihalcea 2002 Some tools and resources described: Senseval
Evaluation exercises for Word Sense Disambiguation
Senseval-1 through Senseval-3 held in the last several years; workshops at ACL
Senseval-4 coming up
Data and materials from Senseval-3 can be downloaded
Some useful materials for multiple languages
Materials and test data for English, Italian, Basque, Catalan, Chinese, Romanian, and Spanish

12 Mihalcea 2002 Some tools and resources described: SemCor
Sense-tagged Brown corpus
Created at Princeton
Used for training WSD systems
Can be downloaded from Mihalcea’s web site:
We’re also planning on installing it on Pongo

13 McCarthy et al 2004
Task: find the predominant word senses in untagged text
Unlike Mihalcea 2002, did not rely on a supervised method using SemCor
Built a thesaurus from raw text and WordNet
Intuition: a word's predominant sense is better determined from context in an untagged corpus, because it is affected by genre, domain, or text type
This avoids relying on SemCor’s 250,000 words, where the word senses are rather limited

14 McCarthy et al
Thesaurus development relies on dependencies between “neighbors”
Look at distributional similarities between a word and its neighbors

15 McCarthy et al
Experimented with several similarity measures available in WordNet::Similarity
First experiment used SemCor to see how well the unsupervised system worked
2595 polysemous nouns in SemCor
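
One way the two kinds of similarity can be combined to rank senses is sketched below. This is an assumption about the general shape of the scoring, not the authors' code: each neighbor contributes its distributional similarity to the word, shared out among the word's senses in proportion to a WordNet-based similarity between each sense and that neighbor. The function name, the sense labels, and all numbers are made up for illustration.

```python
def prevalence_scores(senses, neighbors, wn_sim):
    """Rank senses of a target word.

    senses:    list of sense labels for the target word
    neighbors: {neighbor word: distributional similarity to the target}
    wn_sim:    {(sense, neighbor): WordNet-based similarity between that sense
                and the neighbor (e.g. taken over the neighbor's closest sense)}
    """
    scores = {s: 0.0 for s in senses}
    for n, dss in neighbors.items():
        total = sum(wn_sim[(s, n)] for s in senses)
        if total == 0:
            continue  # this neighbor says nothing about any sense
        for s in senses:
            # Share the neighbor's weight among senses by normalized similarity.
            scores[s] += dss * wn_sim[(s, n)] / total
    return scores


# Toy inputs (hypothetical numbers, not from any corpus or from WordNet).
SENSES = ["star/celestial", "star/celebrity"]
NEIGHBORS = {"planet": 0.4, "actor": 0.2}
WN_SIM = {
    ("star/celestial", "planet"): 0.8,
    ("star/celebrity", "planet"): 0.2,
    ("star/celestial", "actor"): 0.1,
    ("star/celebrity", "actor"): 0.9,
}

if __name__ == "__main__":
    ranked = sorted(prevalence_scores(SENSES, NEIGHBORS, WN_SIM).items(),
                    key=lambda kv: kv[1], reverse=True)
    print(ranked)
```

In this toy setup the stronger neighbor ("planet") pulls the ranking toward the celestial sense; with a different corpus the neighbor weights, and so the predominant sense, could differ.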

16 McCarthy et al Experiment #2 against SENSEVAL-2 English All Words Data
Comparison between the precision and recall for SemCor vs. their automatic data (and the SENSEVAL ceiling)
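
For reference, precision and recall in this kind of WSD evaluation are usually computed from three counts: instances tagged correctly, instances the system attempted, and all instances in the test data. A small sketch with made-up counts (these are not the figures from the paper):

```python
def precision_recall(correct, attempted, total):
    """Precision = correct / attempted; recall = correct / total instances."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts: 100 test instances, 80 attempted, 60 correct.
    p, r = precision_recall(correct=60, attempted=80, total=100)
    print(f"precision={p:.2f} recall={r:.2f}")
```

A system that declines to tag some instances can therefore have higher precision than recall.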

17 McCarthy et al Some experiments with domain specific corpora gave these results:

