Published by Shyann Quilter. Modified over 9 years ago.
1
The WordNet Lexical Database
Bernardo Magnini
ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica, Trento, Italy
2
Outline
1. WordNet: introduction
2. Extending WordNet
   Languages other than English
   New information
   WordNet as a (linguistic) ontology
3. Using WordNet
   Word sense disambiguation
   Information Retrieval / Question Answering
   Semantic Web
3
WordNet
  Electronic lexical database for the English language, developed at Princeton University by George Miller’s team
  Based on psycholinguistic theories
  Several releases: from version 1.0 in 1991 to version 1.7.1 in 2001; WordNet 2 (??)
  WordNet is a public domain resource: http://www.cogsci.princeton.edu/~wn/
  Fellbaum C. (Ed.): WordNet, an Electronic Lexical Database, MIT Press, 1998
  Global WordNet Association (GWA): conference, workshops
4
Lexical Matrix
                 Word Forms
Word Meanings    F1     F2     F3     ...    Fn
M1               E1,1   E1,2
M2                      E2,2
M3                             E3,3
...
Mm                                           Em,n
Mappings between word forms and meanings are many-to-many.
F1 and F2 are synonyms (both express M1); F2 is polysemous (it expresses M1 and M2).
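The many-to-many mapping of the lexical matrix can be sketched with two small indexes. This is only an illustrative sketch: the entries below mirror the filled cells of the matrix, and the helper names (`build_matrix`, `synonyms`, `is_polysemous`) are hypothetical, not part of any WordNet API.

```python
from collections import defaultdict

# Toy lexical matrix: each (form, meaning) pair is one filled cell E[m, n].
# Forms and meanings are the abstract labels from the slide, not real data.
ENTRIES = [
    ("F1", "M1"),
    ("F2", "M1"),
    ("F2", "M2"),
    ("F3", "M3"),
]

def build_matrix(entries):
    """Index the matrix by meaning (row) and by form (column)."""
    forms_of = defaultdict(set)     # meaning -> word forms expressing it
    meanings_of = defaultdict(set)  # word form -> meanings it expresses
    for form, meaning in entries:
        forms_of[meaning].add(form)
        meanings_of[form].add(meaning)
    return forms_of, meanings_of

def synonyms(form, forms_of, meanings_of):
    """Forms that share at least one meaning with `form`."""
    result = set()
    for meaning in meanings_of[form]:
        result |= forms_of[meaning]
    result.discard(form)
    return result

def is_polysemous(form, meanings_of):
    """A form is polysemous if its column has more than one filled cell."""
    return len(meanings_of[form]) > 1

forms_of, meanings_of = build_matrix(ENTRIES)
print(synonyms("F1", forms_of, meanings_of))
print(is_polysemous("F2", meanings_of))
```

Reading a row of the matrix gives a synset; reading a column gives the senses of a word form.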
5
Basic Primitives
  Word forms: lexical items in a language (i.e. no artificial concepts), including collocations
  Senses: a sense is one meaning of a word form
  Synsets: a synset is a set of synonymous senses
  Relations:
    Lexical: among senses
    Semantic: among synsets
6
Lexical Relations
  Synonymy
    Two expressions are synonymous if the substitution of one for the other does not alter the truth value of the sentence (Leibniz)
    => need to partition WordNet into nouns, verbs, adjectives, and adverbs
  Antonymy
    ex. [rich/poor] [rise/fall]
    The antonym of a word x is sometimes not-x, but not always: "not rich" does not imply "poor"
    Main organization principle for adjectives
7
Semantic relations (1)
  Hyponymy/Hyperonymy (the ISA relation)
  A synset {x1, x2, …} is a hyponym of the synset {y1, y2, …} if native speakers accept sentences such as "An x is a (kind of) y"
  Transitive and asymmetrical
  WordNet is a graph, even if synsets normally have a single hyperonym
  Main organization principle of nouns
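Because the ISA relation is transitive and a synset may have more than one hyperonym, checking "x is a kind of y" amounts to a reachability query on a graph. The sketch below uses a small invented graph (the synset names and links are assumptions for illustration, not actual WordNet content), and the function names are hypothetical.

```python
# Toy ISA graph: synset -> list of direct hyperonyms (invented subset).
# "dog" has two hyperonyms, so the structure is a graph rather than a tree.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "carnivore": ["mammal"],
    "domestic_animal": ["animal"],
    "mammal": ["animal"],
    "animal": [],
}

def all_hypernyms(synset, graph):
    """Transitive closure of the hyperonym relation starting at `synset`."""
    seen = set()
    stack = list(graph.get(synset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

def is_a(x, y, graph):
    """True if x is a (direct or inherited) hyponym of y."""
    return y in all_hypernyms(x, graph)

print(is_a("dog", "animal", HYPERNYMS))
print(is_a("animal", "dog", HYPERNYMS))
```

The same traversal works for a part hierarchy if the graph stores holonym links instead of hyperonym links.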
8
Semantic relations (2)
  Meronymy/Holonymy (the Part-Of relation)
  A synset {x1, x2, …} is a meronym of the synset {y1, y2, …} if native speakers accept sentences such as "An x is a part of y" or "A y has an x (as a part)"
  Meronymy is transitive and asymmetrical and can be used to construct a part hierarchy
9
Semantic relations (3)
  Semantic relations specific to the verb hierarchy
  Troponymy: a troponym is a verb expressing a specific manner elaboration of another verb (e.g. walk → move)
    X is a troponym of Y if to X is to Y in some manner, i.e. X is a particular way to Y
  Entailment: a verb X entails Y if X cannot be done unless Y is or has been done (e.g. snore → sleep)
10
An Example
11
WordNet 1.7.1
                 NOUN        VERB        ADJ         ADV         TOTAL
Words            109195      11088       21460       4607        146350
Synsets          75804       13214       18576       3629        111223
Senses           134716      24169       31184       5748        195817
Polysemy         1.23/2.75   2.17/3.52   1.45/2.76   1.24/2.41
(Senses/Words)
12
SemCor
  English, part of the Brown Corpus
  700,000 running words, annotated with part of speech
  200,000 words annotated with WordNet senses (and lemmas)
13
WordNet Extensions
  Computational needs:
    WordNets for languages other than English
    New semantic relations
    WordNet as an ontology
    Domain-specific wordnets
    Automatic acquisition of information
    Interchange formats
14
Languages other than English
  EuroWordNet project: monolingual wordnets are connected through an Interlingual Index (ILI); distributed by ELDA/ELRA
  Italian, Spanish, Catalan, Basque, French, Estonian, Portuguese, Swedish, Dutch, German
  BalkaNet project: Bulgarian, Greek, Romanian, Slovenian
  Danish, Hebrew
  Chinese, some Indian languages
  Lexical gaps
15
New Relations (1)
  Derivation relations (Princeton, WordNet-2)
    invent → inventor (requires disambiguation)
  Gloss disambiguation (Extended WordNet, Moldovan 2000)
    Glosses are parsed, disambiguated and converted into a logical form
  WordNet Domains (Magnini, Cavaglia, 2000) (ITC-irst)
    Synsets are labeled with domains, such as Medicine, Architecture, Sport, …
16
WordNet Domains
  Integrates taxonomic and domain-oriented information
  Cross-hierarchy relations:
    doctor#2 [Medicine] --> person#1
    hospital#1 [Medicine] --> location#1
  Cross-category relations: operate#3 [Medicine]
  Cross-language information
17
New Relations (2)
  Classes versus instances: Bush → person
  Role relations for verbs: singer → song
  Implicit knowledge (Peters, 2002)
    Discover regular polysemy relations in WordNet: bank#1 (an institution) → bank#2 (a building)
18
Automatic Acquisition
  MEANING project (IST-2001-34460)
  Topic signatures (Agirre, 2001)
    Words related to a synset, automatically extracted from the Web
  Automatic collection of sense examples (Leacock et al. 98, Mihalcea and Moldovan 99)
  Selectional preferences for synsets (Carroll, 2001)
    From the BNC corpus
  WordNet-annotated corpora
    Open Mind Word Expert (Mihalcea, 2002)
19
WordNet as an Ontology
  Some relations contradict ontological principles
  OntoClean approach (Guarino, 2002):
    Confusion between concepts and individuals (e.g. Palestine and Trust_Territories at the same level)
    Role/Type: a role cannot subsume a type (e.g. Person → Causal_agent)
20
Domain Specific WordNets
  Extension of WordNet hierarchies using domain-specific document collections (Vossen, 2001) (Buitelaar, 2001) (Velardi, 2001)
  Tuning of WordNet synsets (Turcato, 2000)
  Merging generic and specialized wordnets (Magnini et al. 2002):
    Overlaps and inconsistencies among synsets
    Precedence rules for inheritance
21
Interchange Formats
  XML:
    Implementation independent
    Easily extensible to new relations
    There are at least three different versions; none of them is widely used yet
  Mappings among different wordnet versions: 1.5 → 1.6, 1.6 → 1.7
    May contain errors
22
Using WordNet
  Widely used within the Natural Language Processing community
  Suitable for open-domain, content-based tasks where interpretation based on lexical semantics is required
  Algorithms: take advantage of the wordnet semantic relations
  Issues: fine-grained sense distinctions
  Application areas: query expansion in IR, Word Sense Disambiguation, Question Answering
23
Distance/Similarity Algorithms
  Conceptual distance (Agirre-Rigau, 1995)
    Considers the density of the taxonomy
  Semantic similarity (Resnik, 1995)
    Uses the node with the highest information content connecting two nodes
    Sim(c1, c2) = max [-log p(c)]
    where c is a node on an ISA path connecting c1 and c2,
    and p(c) is a probability computed from the occurrences of c in a corpus.
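The Resnik measure can be sketched on a toy taxonomy. The sketch takes c to range over the common subsumers of c1 and c2, as in Resnik's formulation, and lets every occurrence of a concept also count for all of its ancestors when estimating p(c). The taxonomy, the corpus counts, and the function names are all invented for illustration.

```python
import math

# Toy ISA tree: concept -> parent (None marks the root). Invented data.
PARENT = {
    "dog": "canine",
    "cat": "feline",
    "canine": "carnivore",
    "feline": "carnivore",
    "carnivore": "animal",
    "bird": "animal",
    "animal": None,
}

# Hypothetical corpus counts of direct mentions of each concept.
COUNTS = {"dog": 30, "cat": 20, "canine": 5, "feline": 5,
          "carnivore": 10, "bird": 20, "animal": 10}

def ancestors(concept):
    """The concept itself followed by all its subsumers up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def cumulative_counts():
    """Each occurrence of a concept also counts for every subsumer."""
    cum = dict.fromkeys(PARENT, 0)
    for concept, n in COUNTS.items():
        for a in ancestors(concept):
            cum[a] += n
    return cum

def resnik(c1, c2):
    """max over common subsumers c of the information content -log p(c)."""
    cum = cumulative_counts()
    total = cum["animal"]  # the root subsumes every occurrence
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(-math.log(cum[c] / total) for c in common)

print(resnik("dog", "cat"))   # information content of "carnivore"
print(resnik("dog", "bird"))  # only the root is shared, so no information
```

Concepts that share only the root get similarity zero, since the root has probability 1 and therefore carries no information.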
24
Sense Distinctions
  WordNet contains sense distinctions that are difficult to understand
  Many applications would benefit from polysemy reduction
  Sense clustering methodologies:
    Based on domain information
    Based on aligned corpora in different languages
25
WordNet and Word Sense Disambiguation
  As a sense repository
    For the SENSEVAL competition
    Manually annotated data are required for training systems based on machine learning algorithms
  As an information source for knowledge-based algorithms
26
IR: Query Expansion
  Open debate:
    Semantic information is not useful (Voorhees, 1994)
    WSD with accuracy below 90% degrades IR results (Sanderson, 1994); current WSD systems perform below 80%
    Semantic information significantly increases IR performance (up to 30%) (Gonzalo, 1998)
    Recent experiments (de Loupy, 2002) show that using synonyms and WSD (72% accuracy) in query expansion slightly (2-3%) improves performance
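Synonym-based query expansion after word sense disambiguation can be sketched as follows. The synsets and sense labels are invented for illustration, and `expand_query` is a hypothetical helper; the sense assignment is passed in by hand and stands in for the output of a WSD system.

```python
# Toy synsets keyed by sense identifier (invented, not real WordNet data).
SYNSETS = {
    "car#1": {"car", "automobile", "auto"},
    "car#2": {"car", "railcar"},
    "buy#1": {"buy", "purchase"},
}

def expand_query(terms, chosen_senses):
    """Add the synonyms of each term's chosen sense to the query.

    `chosen_senses` maps a term to the sense picked by a WSD step;
    terms without a chosen sense are left unexpanded.
    """
    expanded = []
    for term in terms:
        expanded.append(term)
        sense = chosen_senses.get(term)
        if sense:
            # Sort for a deterministic order of the added synonyms.
            expanded.extend(sorted(SYNSETS[sense] - {term}))
    return expanded

print(expand_query(["buy", "car"], {"buy": "buy#1", "car": "car#1"}))
# -> ['buy', 'purchase', 'car', 'auto', 'automobile']
```

Choosing the wrong sense (here, `car#2`) would add "railcar" to the query, which illustrates why WSD accuracy matters for the retrieval results discussed above.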
27
WordNet in Question Answering
  Answer type identification (Harabagiu, 2001: top score at TREC-QA-2000)
    Answer types defined on the WordNet taxonomy
  Answer extraction
    Named entity recognition based on WordNet
  Question/answer relation discovery in passage retrieval (Pasca, 2001)
28
Semantic Web
  Interpreting semi-structured knowledge sources
    Directories, file systems, catalogues
  Implicit knowledge
  Linguistic analysis of labels based on WordNet
29
Conclusions
  WordNet as a linguistic ontology
  Using WordNet as it is in application tasks is not easy: “The art of using WordNet”
  Extensions, such as domains, multilingual wordnets, etc., are required
  Results in IR, QA, WSD are still preliminary
  Good news: an ever larger community is using WordNet