Extending Princeton WordNet with compositional semantics. Luchezar Jackov, Institute for Bulgarian Language, Bulgarian Academy of Sciences.


1 Extending Princeton WordNet with compositional semantics
Luchezar Jackov, Institute for Bulgarian Language, Bulgarian Academy of Sciences

2 Introduction
Princeton WordNet (PWN) is probably the most popular and widely used resource for semantic analysis. When dealing with multi-language WordNets, one of the greatest problems is the lexical and conceptual gaps that appear when the original PWN synsets are mapped to other languages. The approach presented here was motivated by the creation of a universal dictionary based on PWN synsets as part of a syntactic and semantic analysis system.

3 Motivation
Lack of appropriate synsets for a number of derivational phenomena, for instance prefixation in Slavic languages and modality expressed by suffixation in Turkish:

Bulgarian | English
za-peya | start singing
o-po-znavam | get to know

Turkish | English
yapabilmek | can do
yapmalımak | must do

4 Motivation (2)
Lack of synonym sets (synsets) for various words and collocations:

Bulgarian | English
komandirovka | business trip
ribarnik | hatchery pool
kola na staro | secondhand car
lyata dzhanta | alloy rim

5 A few words on WordNet data structure
WordNet is based on synonym sets (synsets). Each synset has a gloss (definition) and a synset identifier. For instance:
102958343 (a motor vehicle with four wheels; usually propelled by an internal combustion engine; "he needs a car to get to work")

6 A few words on WordNet data structure (2)
Every lexicalisation of a given synset carries the synset identifier, which “binds” the lexicalisation to the synset. For instance:
102958343 car
102958343 auto
102958343 automobile
102958343 machine
102958343 motorcar
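The binding between identifiers, glosses and lexicalisations can be sketched in Python as two small tables. This is only an illustration built from the values on the slides; the names GLOSSES, LEXICALISATIONS, literals_of and synsets_of are assumptions made for the example and are not part of PWN or of the system being presented.

```python
# Gloss table: synset identifier -> gloss, copied from the slide.
GLOSSES = {
    "102958343": "a motor vehicle with four wheels; usually propelled "
                 "by an internal combustion engine",
}

# Lexicalisation table: one (synset identifier, literal) row per
# lexicalisation, as listed on the slide.
LEXICALISATIONS = [
    ("102958343", "car"),
    ("102958343", "auto"),
    ("102958343", "automobile"),
    ("102958343", "machine"),
    ("102958343", "motorcar"),
]


def literals_of(synset_id):
    """Return every literal bound to the given synset identifier."""
    return [literal for sid, literal in LEXICALISATIONS if sid == synset_id]


def synsets_of(literal):
    """Return every synset identifier the given literal is bound to."""
    return [sid for sid, lit in LEXICALISATIONS if lit == literal]


print(literals_of("102958343"))
print(GLOSSES[synsets_of("auto")[0]])
```

A literal may belong to several synsets in a full WordNet, which is why synsets_of returns a list.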

7 A few words on WordNet data structure (3)
Various relations are defined over the synset identifiers, e.g. hyponymy/hyperonymy. For instance:
is_hyponym(102958343 [car], 103791235 [motor vehicle])
is_hyponym([motor vehicle], [self-propelled vehicle])
is_hyponym([self-propelled vehicle], [wheeled vehicle])
is_hyponym([wheeled vehicle], [vehicle])
The synset identifiers in the last three lines are omitted for brevity.
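The chain of hyponymy relations can be followed programmatically. The sketch below is a minimal illustration, not PWN's own storage format: synset labels stand in for identifiers because the slide omits most of them, and the names HYPERNYM, hypernym_chain and is_transitive_hyponym are hypothetical.

```python
# Direct hypernym of each synset, keyed by label. Labels replace the
# numeric identifiers, most of which are not given on the slide.
HYPERNYM = {
    "car": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
}


def hypernym_chain(label):
    """Return all hypernyms of a synset, nearest first."""
    chain = []
    while label in HYPERNYM:
        label = HYPERNYM[label]
        chain.append(label)
    return chain


def is_transitive_hyponym(child, ancestor):
    """True if `ancestor` is reachable from `child` via hypernym links."""
    return ancestor in hypernym_chain(child)


print(hypernym_chain("car"))
print(is_transitive_hyponym("car", "vehicle"))
```

Note that is_hyponym on the slide states the direct relation only; the transitive check is an addition made for the example.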

8 How WordNet lexical gaps are currently addressed
The current approach to addressing the lexical gaps in WordNet is to:
- define a new synonym set with an identifier that does not conflict with PWN identifiers, i.e. the identifiers must be unique;
- formulate a gloss (definition) for the synonym set;
- define lexicalisations for the synonym set;
- define at least a hyponymy/hyperonymy relation for the synonym set.

9 Issues with the current approach
- The gloss has to be formulated explicitly.
- The hyponymy/hyperonymy relation has to be defined explicitly.
- An inter-language index between the WordNets of different languages cannot be created easily.

10 What can be done
Concepts lacking synonym sets in the original PWN sense inventory can be constructed using compositional semantics over the original PWN synsets. This can be done using binary semantic relations such as attribution, modality, phase, etc. For instance:
- “alloy rim” is an attribution of “alloy” to “rim”;
- “business trip” is an attribution of “business” to “trip”;
- “start singing” is a relation between the phase verb “start” and “sing”;
- “must do” is a relation between the modal verb “must” and “do”.

11 The approach being proposed
Select a number of relations that can compositionally define synonym sets:
- attributive relation: attrib([alloy], [rim]), attrib([secondhand], [car])
- phase verb relation: phase([begin], [sing])
- modal verb relation: modal([must], [do]), modal([can], [do])

12 The approach being proposed (2)
Assign an identifier to each relation instance and use it as a synset identifier:
5000000001 attrib([alloy], [rim])
5000000002 attrib([secondhand], [car])
5000000003 phase([begin], [sing])
5000000004 modal([must], [do])
5000000005 modal([can], [do])
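One possible way to hand out such identifiers is a registry that gives each distinct relation instance the next free number. The sketch below assumes that identifiers are simply consecutive integers starting at 5000000001, as the listing suggests; the Relation and CompositionalRegistry names are hypothetical, and labels stand in for the PWN synset identifiers of the arguments.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    kind: str    # "attrib", "phase" or "modal"
    first: str   # first argument (attribute, phase verb or modal verb)
    second: str  # second argument (the head)


class CompositionalRegistry:
    """Assigns one stable identifier per distinct relation instance."""

    def __init__(self, first_id=5000000001):
        self._next_id = first_id
        self._ids = {}

    def register(self, relation):
        # A relation seen before keeps the identifier it already has.
        if relation not in self._ids:
            self._ids[relation] = self._next_id
            self._next_id += 1
        return self._ids[relation]


registry = CompositionalRegistry()
for rel in [
    Relation("attrib", "alloy", "rim"),
    Relation("attrib", "secondhand", "car"),
    Relation("phase", "begin", "sing"),
    Relation("modal", "must", "do"),
    Relation("modal", "can", "do"),
]:
    print(registry.register(rel), rel)
```

Starting the counter well above the PWN identifier range is one simple way to keep the new identifiers from clashing with the original ones.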

13 The approach being proposed (3)
Define lexicalisations for the new synsets, just as in the current approach:
5000000001 lyata dzhanta
5000000001 alloy rim
5000000002 kola na staro
5000000002 secondhand car
5000000003 zapyavam
5000000003 start singing
5000000004 yapmalımak
5000000004 must do
5000000005 yapabilmek
5000000005 can do

14 The approach being proposed (4)
The hyponymy/hyperonymy relation can easily be inferred from the relation used for defining the synset. For instance, the head component of the attributive relation is the second argument, so the compositional synset is a hyponym of that argument. The same goes for the modal and phase verb relations. For example:

Compositional synset | Hyponym of
attrib([alloy], [rim]) | [rim]
phase([start], [sing]) | [sing]
modal([must], [do]) | [do]
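The inference described above amounts to reading off the second argument. A minimal sketch follows, assuming the textual notation used on the slides; the parse_relation and inferred_hypernym functions and the regular expression are hypothetical helpers written for this example.

```python
import re

# Matches the slide notation, e.g. "attrib([alloy], [rim])".
_RELATION = re.compile(r"(\w+)\(\[([^\]]+)\],\s*\[([^\]]+)\]\)")

# Relations whose second argument is the head, per the slide.
HEAD_IS_SECOND = {"attrib", "phase", "modal"}


def parse_relation(text):
    """Split 'kind([first], [second])' into a (kind, first, second) tuple."""
    match = _RELATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a relation: {text!r}")
    return match.groups()


def inferred_hypernym(text):
    """Return the synset that the compositional synset is a hyponym of."""
    kind, _first, second = parse_relation(text)
    if kind not in HEAD_IS_SECOND:
        raise ValueError(f"unknown relation kind: {kind!r}")
    return second


for definition in ["attrib([alloy], [rim])",
                   "phase([start], [sing])",
                   "modal([must], [do])"]:
    print(definition, "->", inferred_hypernym(definition))
```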

15 Conclusion
The proposed approach defines new synsets by using semantic relations to compose specialised senses from the available PWN sense inventory. This leads to the following benefits:
- no explicit gloss definition is needed;
- no explicit hyponymy/hyperonymy relation needs to be defined for each new synset;
- inter-language indices between the WordNets of different languages can be created automatically.
If two WordNets are both created using the PWN sense inventory and extended in line with the proposed approach, then their extended synsets can easily be matched by the relation and the arguments used for the compositional definition.
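The matching step can be sketched as a lookup on the (relation, first argument, second argument) key. The two small tables below are hypothetical extracts built from the examples on the slides, with labels standing in for PWN synset identifiers; the align function is an illustration, not the author's implementation.

```python
# Extended synsets of two WordNets, keyed by their compositional
# definition. Each value lists the lexicalisations in that language.
bulgarian = {
    ("attrib", "alloy", "rim"): ["lyata dzhanta"],
    ("attrib", "secondhand", "car"): ["kola na staro"],
    ("phase", "begin", "sing"): ["zapyavam"],
}
english = {
    ("attrib", "alloy", "rim"): ["alloy rim"],
    ("phase", "begin", "sing"): ["start singing"],
    ("modal", "must", "do"): ["must do"],
}


def align(wordnet_a, wordnet_b):
    """Pair up extended synsets that share the same compositional key."""
    shared = wordnet_a.keys() & wordnet_b.keys()
    return {key: (wordnet_a[key], wordnet_b[key]) for key in shared}


for key, (bg, en) in sorted(align(bulgarian, english).items()):
    print(key, bg, en)
```

Synsets defined in only one of the two WordNets, such as the secondhand car entry here, simply stay unmatched until the other WordNet adds the same compositional definition.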

16 Thank you!

