The WordNet Lexical Database Bernardo Magnini ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica Trento - Italy.

1 The WordNet Lexical Database Bernardo Magnini ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica Trento - Italy

2 Outline
1. WordNet: introduction
2. Extending WordNet
   - Languages other than English
   - New information
   - WordNet as a (linguistic) ontology
3. Using WordNet
   - Word sense disambiguation
   - Information Retrieval / Question Answering
   - Semantic Web

3 WordNet
- Electronic lexical database for the English language, developed at Princeton University by George Miller’s team
- Based on psycholinguistic theories
- Several releases: from version 1.0 in 1991 to version 1.7.1 in 2001; WordNet 2 (??)
- WordNet is a public domain resource
- Reference: Fellbaum C. (Ed.): WordNet, an Electronic Lexical Database, MIT Press, 1998
- Global WordNet Association (GWA)
  - Conference, workshops

4 Lexical Matrix

Word Meanings | Word Forms
              | F1    F2    F3    ...  Fn
M1            | E1,1  E1,2
M2            |       E2,2
M3            |             E3,3
...           |
Mm            |                        Em,n

- Mappings between word forms and meanings are many-to-many
- F1 and F2 are synonyms (both express M1)
- F2 is polysemous (it expresses both M1 and M2)
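The matrix can be sketched as a small data structure. This is only an illustration: the cell list mirrors the entries named on the slide (E1,1, E1,2, E2,2, E3,3), and the function names `synonyms` and `is_polysemous` are hypothetical helpers, not part of any WordNet API.

```python
from collections import defaultdict

# Toy lexical matrix: each (meaning, form) pair is one filled cell E(m, f).
# The cells mirror the slide: E1,1  E1,2  E2,2  E3,3.
entries = [("M1", "F1"), ("M1", "F2"), ("M2", "F2"), ("M3", "F3")]

forms_by_meaning = defaultdict(set)   # one row of the matrix
meanings_by_form = defaultdict(set)   # one column of the matrix
for meaning, form in entries:
    forms_by_meaning[meaning].add(form)
    meanings_by_form[form].add(meaning)


def synonyms(form):
    """Other forms that share at least one meaning with `form`."""
    result = set()
    for meaning in meanings_by_form[form]:
        result |= forms_by_meaning[meaning]
    result.discard(form)
    return result


def is_polysemous(form):
    """A form is polysemous when its column has more than one filled cell."""
    return len(meanings_by_form[form]) > 1
```

Reading a row gives the synonyms for a meaning; reading a column gives the senses of a form.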

5 Basic Primitives
- Word forms: lexical items in a language (i.e. no artificial concepts), including collocations
- Senses: a meaning of a word form
- Synsets: a set of synonymous senses
- Relations:
  - Lexical: among senses
  - Semantic: among synsets

6 Lexical Relations
- Synonymy
  - Two expressions are synonymous if the substitution of one for the other does not alter the truth value of the sentence (Leibniz)
  - Substitution only works within one syntactic category => need to partition WordNet into nouns, verbs, adjectives, and adverbs
- Antonymy, e.g. [rich/poor], [rise/fall]
  - The antonym of a word x is sometimes not-x, but not always: not rich ≠> poor
  - Main organization principle for adjectives

7 Semantic relations (1)
- Hyponymy/Hyperonymy (the ISA relation)
  - A synset {x1, x2, …} is a hyponym of the synset {y1, y2, …} if native speakers accept sentences such as "An x is a (kind of) y"
- Transitive and asymmetrical
- The hierarchy is a graph rather than a tree, even though synsets normally have a single hyperonym
- Main organization principle for nouns
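Transitivity and asymmetry of the ISA relation can be illustrated with a toy graph. The edges below are invented for illustration and are not copied from WordNet; each synset is named by a single word, and the value is a list so that a synset may have more than one hyperonym.

```python
# Toy hyperonym (is-a) edges; illustrative only, not taken from WordNet.
hypernyms = {
    "dog": ["canine"],
    "canine": ["carnivore"],
    "carnivore": ["mammal"],
    "mammal": ["animal"],
    "animal": [],
}


def all_hypernyms(synset):
    """Every ancestor reachable by following is-a links (transitive closure)."""
    seen = []
    stack = list(hypernyms.get(synset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(hypernyms.get(node, []))
    return seen


def is_hyponym_of(x, y):
    """True if y is reachable from x through one or more is-a links."""
    return y in all_hypernyms(x)
```

Here `is_hyponym_of("dog", "animal")` holds through the chain of intermediate links (transitivity), while `is_hyponym_of("animal", "dog")` does not (asymmetry).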

8 Semantic relations (2)
- Meronymy/Holonymy (the Part-Of relation)
  - A synset {x1, x2, …} is a meronym of the synset {y1, y2, …} if native speakers accept sentences such as "An x is a part of y" or "A y has an x (as a part)"
- Meronymy is transitive and asymmetrical, and can be used to construct a part hierarchy

9 Semantic relations (3)
- Semantic relations specific to the verb hierarchy
- Troponymy: a troponym is a verb expressing a specific manner elaboration of another verb (e.g. walk → move)
  - X is a troponym of Y if to X is to Y in some manner, i.e. X is a particular way to Y
- Entailment: a verb X entails Y if X cannot be done unless Y is or has been done (e.g. snore → sleep)

10 An Example

11 WordNet 1.7.1

                          NOUN        VERB        ADJ         ADV         TOTAL
Words                     109195      11088       21460       4607        146350
Synsets                   75804       13214       18576       3629        111223
Senses                    134716      24169       31184       5748        195817
Polysemy (Senses/Words)   1.23/2.75   2.17/3.52   1.45/2.76   1.24/2.41
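As a quick arithmetic check, the first figure in each Polysemy cell appears to be the number of senses divided by the number of words, truncated (not rounded) to two decimals. That reading is an assumption based on the numbers themselves; the second figure in each cell is not derived here. The helper name `truncate2` is hypothetical.

```python
import math

# Counts copied from the WordNet 1.7.1 table on the slide.
words = {"NOUN": 109195, "VERB": 11088, "ADJ": 21460, "ADV": 4607}
senses = {"NOUN": 134716, "VERB": 24169, "ADJ": 31184, "ADV": 5748}


def truncate2(x):
    """Cut a positive number to two decimals without rounding up."""
    return math.floor(x * 100) / 100


# Average number of senses per word for each part of speech.
polysemy = {pos: truncate2(senses[pos] / words[pos]) for pos in words}

# The TOTAL column is the sum over the four parts of speech.
total_words = sum(words.values())
total_senses = sum(senses.values())
```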

12 SemCor
- English, part of the Brown Corpus
- 700,000 running words, annotated with part of speech
- 200,000 words annotated with WordNet senses (and lemmas)

13 WordNet Extensions
- Computational needs:
  - WordNets for languages other than English
  - New semantic relations
  - WordNet as an ontology
  - Domain-specific wordnets
  - Automatic acquisition of information
  - Interchange formats

14 Languages other than English
- EuroWordNet project: monolingual wordnets are connected through an Interlingual Index (ILI); distributed by ELDA/ELRA
  - Italian, Spanish, Catalan, Basque, French, Estonian, Portuguese, Swedish, Dutch, German
- BalkaNet project: Bulgarian, Greek, Romanian, Slovenian
- Danish, Hebrew
- Chinese, some Indian languages
- Lexical gaps

15 New Relations (1)
- Derivation relations (Princeton, WordNet 2)
  - invent → inventor (requires disambiguation)
- Gloss disambiguation (Extended WordNet, Moldovan 2000)
  - Glosses are parsed, disambiguated and converted into a logical form
- WordNet Domains (Magnini, Cavaglia, 2000) (ITC-irst)
  - Synsets are labeled with domains, such as Medicine, Architecture, Sport, …

16 WordNet Domains
- Integrate taxonomic and domain-oriented information
- Cross-hierarchy relations
  - doctor#2 [Medicine] --> person#1
  - hospital#1 [Medicine] --> location#1
- Cross-category relations: operate#3 [Medicine]
- Cross-language information

17 New Relations (2)
- Classes versus instances:
  - Bush → person
- Role relations for verbs:
  - singer → song
- Implicit knowledge (Peters, 2002)
  - Discover regular polysemy relations in WordNet: bank#1 (an institution), bank#2 (a building)

18 Automatic Acquisition
- MEANING project (IST-2001-34460)
- Topic signatures (Agirre, 2001)
  - Synset-related words automatically extracted from the Web
- Automatic collection of sense examples (Leacock et al. 98, Mihalcea and Moldovan 99)
- Selectional preferences for synsets (Carroll, 2001)
  - From the BNC corpus
- WordNet-annotated corpora
  - Open Mind Word Expert (Mihalcea, 2002)

19 WordNet as an Ontology
- Some relations contradict ontological principles
- OntoClean approach (Guarino, 2002):
  - Confusion between concepts and individuals (e.g. Palestine and Trust_Territories at the same level)
  - Role/Type: a role cannot subsume a type (e.g. Person → Causal_agent)

20 Domain Specific WordNets
- Extension of WordNet hierarchies using domain-specific document collections (Vossen, 2001) (Buitelaar, 2001) (Velardi, 2001)
- Tuning of WordNet synsets (Turcato, 2000)
- Merging generic and specialized wordnets (Magnini et al. 2002):
  - Overlaps and inconsistencies among synsets
  - Precedence rules for inheritance

21 Interchange Formats
- XML:
  - Implementation independent
  - Easily extensible to new relations
  - There are at least three different versions; none of them is widely used yet
- Mappings among different WordNet versions:
  - 1.5 → 1.6
  - 1.6 → 1.7
  - May contain errors

22 Using WordNet
- Widely used within the Natural Language Processing community
- Suitable for open-domain, content-based tasks where interpretation based on lexical semantics is required
- Algorithms: take advantage of the WordNet semantic relations
- Issues: fine-grained sense distinctions
- Application areas: query expansion in IR, Word Sense Disambiguation, Question Answering

23 Distance/Similarity Algorithms
- Conceptual distance (Agirre-Rigau, 1995)
  - Considers the density of the taxonomy
- Semantic similarity (Resnik, 1995)
  - Uses the node with the highest information content connecting two nodes:
    Sim(c1, c2) = max [-log p(c)]
    where c is a node on an is-a path connecting c1 and c2, and p(c) is a probability computed from the occurrences of c in a corpus.
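A minimal sketch of the Resnik measure on a toy taxonomy follows. The taxonomy, the corpus counts, and the function names are all invented for illustration. It assumes that each node has a single parent, that an occurrence of a concept also counts as an occurrence of each of its ancestors, and that the candidate nodes c are the common ancestors of c1 and c2.

```python
import math

# Toy is-a taxonomy (child -> parent) and made-up corpus counts.
parent = {"dog": "mammal", "cat": "mammal", "mammal": "animal",
          "bird": "animal", "animal": None}
own_count = {"dog": 10, "cat": 10, "mammal": 5, "bird": 15, "animal": 10}


def ancestors(c):
    """The concept itself plus every node above it, up to the root."""
    chain = []
    while c is not None:
        chain.append(c)
        c = parent[c]
    return chain


# Propagate counts upward: a mention of "dog" is also a mention of "mammal".
freq = {c: 0 for c in parent}
for concept, n in own_count.items():
    for a in ancestors(concept):
        freq[a] += n
total = freq["animal"]  # the root accumulates every occurrence


def information_content(c):
    """-log p(c), with p(c) estimated as a relative frequency."""
    return -math.log(freq[c] / total)


def resnik_similarity(c1, c2):
    """Highest information content among the common ancestors of c1 and c2."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(c) for c in common)
```

With these counts, "dog" and "cat" meet at "mammal" (p = 25/50), so their similarity is log 2, while "dog" and "bird" meet only at the root (p = 1), which gives a similarity of zero.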

24 Sense Distinctions
- Some sense distinctions in WordNet are difficult to understand
- Many applications would benefit from polysemy reduction
- Sense clustering methodologies:
  - Based on domain information
  - Based on aligned corpora in different languages

25 WordNet and Word Sense Disambiguation
- As a sense repository
  - For the SENSEVAL competition
  - Manually annotated data are required for training systems based on machine learning algorithms
- As an information source for knowledge-based algorithms

26 IR: Query Expansion
- Open debate:
  - Semantic information is not useful (Voorhees, 1994)
  - WSD with accuracy below 90% degrades IR results (Sanderson, 1994); current WSD systems perform below 80%
  - Semantic information significantly increases IR performance (up to 30%) (Gonzalo, 1998)
  - Recent experiments (de Loupy, 2002) show that using synonyms and WSD (72% accuracy) in query expansion slightly (2-3%) improves performance
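The basic idea of synonym-based query expansion with sense selection can be sketched as follows. The synonym table and the sense labels are hypothetical and not real WordNet data; the `chosen_sense` argument stands in for the output of a WSD step, and a term without a chosen sense is left unexpanded.

```python
# Hypothetical synonym table keyed by (word, sense label); not WordNet data.
synsets = {
    ("bank", "finance"): {"bank", "depository"},
    ("bank", "river"): {"bank", "riverside"},
    ("car", "vehicle"): {"car", "automobile"},
}


def expand_query(terms, chosen_sense):
    """Append the synonyms of each term for the sense selected by WSD."""
    expanded = []
    for term in terms:
        expanded.append(term)
        sense = chosen_sense.get(term)
        # Terms with no selected sense get no synonyms added.
        for syn in sorted(synsets.get((term, sense), set()) - {term}):
            expanded.append(syn)
    return expanded
```

For example, `expand_query(["bank", "loan"], {"bank": "finance"})` adds "depository" but not "riverside", which is why the accuracy of the WSD step matters for the results debated above.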

27 WordNet in Question Answering
- Answer type identification (Harabagiu, 2001: top score at TREC-QA-2000)
  - Answer types defined on the WordNet taxonomy
- Answer extraction
  - Named entity recognition based on WordNet
  - Question/answer relation discovery in passage retrieval (Pasca, 2001)

28 Semantic Web
- Interpreting semi-structured knowledge sources
  - Directories, file systems, catalogues
- Implicit knowledge
- Linguistic analysis of labels based on WordNet

29 Conclusions
- WordNet as a linguistic ontology
- Using WordNet, as it is, in application tasks is not easy: “The art of using WordNet”
- Extensions, such as domains, multilingual wordnets, etc., are required
- Results in IR, QA and WSD are still preliminary
- Good news: an increasingly large community is using WordNet
