Ewa Rudnicka, Marek Maziarz, Maciej Piasecki G4.19 Research Group Institute of Informatics, Wrocław University of Technology nlp.pwr.wroc.pl plwordnet.pwr.wroc.pl.


1 Ewa Rudnicka, Marek Maziarz, Maciej Piasecki G4.19 Research Group Institute of Informatics, Wrocław University of Technology nlp.pwr.wroc.pl plwordnet.pwr.wroc.pl

2 What is a wordnet?

3 WordNet: a lexico-semantic database
Princeton WordNet (Fellbaum 1998)
- a huge electronic lexical database: a kind of thesaurus, yet with a much more advanced structure
- words grouped into synonym sets called synsets
- synsets linked via different lexico-semantic relations (such as synonymy, near-synonymy, hypernymy/hyponymy, meronymy/holonymy, antonymy, fuzzynymy)
- the integration of lexical data gathered from existing resources such as traditional and electronic dictionaries, as well as from corpora
- psycholinguistic principles: the structure of human lexical memory (cf. Miller 1998)
- taxonomic hierarchies for nouns, entailment relations for verbs

4 Multi-lingual wordnets
Multi-lingual databases consisting of inter-linked 'national'/mono-lingual wordnets:
- EuroWordNet: transfer method (translation from Princeton WordNet); Dutch, Spanish, Italian, French, German, Czech and Estonian (cf. Vossen 2002)
- MultiWordNet: semi-automatic acquisition method from Princeton WordNet; Italian, Spanish, Portuguese, Romanian and Latin (Bentivogli et al.)
- IndoWordNet (Sinha et al. 2006, Bhattacharyya 2010): expansion approach from the Hindi wordnet; 16 out of 22 languages of India

5 plWordNet (Słowosieć)
- developed fairly independently of Princeton WordNet by applying a unique corpus-based method
- one of the biggest existing wordnets
- the emphasis on relations between lexical units, not between synsets
- many more relations, some of them specially designed to cover the peculiarities of the morphosyntactic structure of Polish (cf. Piasecki et al. 2009, Maziarz et al. 2012)

6 plWordNet vs. Princeton WordNet
Basic common concepts:
- lemma: base form representing different inflectional forms and different meanings
- lexical unit: lemma plus sense pair (in wordnets marked with a number)
- synset: a set of synonymous lexical units
Differences:
- plWN: synsets built of lexical units sharing the same constitutive relations (such as hyponymy, hypernymy, meronymy, holonymy)
- PWN: a synset represents a 'lexicalised concept' (cf. Miller 1998); synsets built of lexical units linked by the synonymy relation, understood as a conceptual relation established on the basis of linguist's intuitions and dictionary definitions
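The three shared concepts can be pictured as a small data model. The sketch below is only an illustration: the class names LexicalUnit and Synset and the helper lemmas are hypothetical and do not reflect how plWordNet or Princeton WordNet actually store their data. The example synset is the one used later in the mapping procedure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LexicalUnit:
    """A lemma paired with a sense number, e.g. 'zagranica 1'."""
    lemma: str
    sense: int

    def __str__(self) -> str:
        return f"{self.lemma} {self.sense}"


@dataclass(frozen=True)
class Synset:
    """A group of synonymous lexical units (hypothetical representation)."""
    units: Tuple[LexicalUnit, ...]

    def __str__(self) -> str:
        # Wordnet-style notation: {lemma sense, lemma sense, ...}
        return "{" + ", ".join(str(u) for u in self.units) + "}"


def lemmas(synset: Synset) -> set:
    """Return the bare lemmas of a synset, ignoring sense numbers."""
    return {u.lemma for u in synset.units}


zagranica = Synset((
    LexicalUnit("zagranica", 1),
    LexicalUnit("obczyzna", 1),
    LexicalUnit("obce terytorium", 1),
))
print(zagranica)  # {zagranica 1, obczyzna 1, obce terytorium 1}
```

Keeping the sense number inside the lexical unit, rather than on the lemma, mirrors the slide's point that one lemma can belong to several synsets through different senses.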

7 Mapping plWordNet on Princeton WordNet
- Linking plWordNet synsets with Princeton WordNet synsets
- Defining a set of inter-lingual relations
- Setting a hierarchy of inter-lingual relations
- Designing the mapping procedure
- Mapping direction: plWordNet > Princeton WordNet
- Domains selected for mapping: person, artefact, location, family relationships, food, time, vocabulary connected with thinking and communication
- A novel perspective: linking two independent systems
- The main challenge: different philosophical, theoretical and methodological assumptions

8 Inter-lingual relations hierarchy
A set of inter-lingual relations inspired by:
- inter-lingual relations from EuroWordNet (Vossen 2002)
- intra-lingual relations from plWordNet (Maziarz et al. 2011)
1. Synonymy
2. Partial synonymy
3. Inter-register synonymy
4. Hyponymy
5. Hypernymy
6. Meronymy
7. Holonymy

9 Inter-lingual relations (1)
- Synonymy (only one per synset): for large correspondence in sense and position in the source wordnet structure, combined with many indirect inter-lingual links between the source and target synsets
- Inter-register synonymy: for I-synonyms as defined above, but differing in stylistic register
- Partial synonymy: in the case of partial correspondence of meanings and/or structures
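The "only one per synset" constraint on synonymy lends itself to a simple consistency check. The sketch below assumes that the constraint means at most one inter-lingual synonymy link per source synset; the link records, the synset identifiers (pl:1, en:10, ...) and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical link records: (source synset id, relation, target synset id).
links = [
    ("pl:1", "synonymy", "en:10"),
    ("pl:1", "hyponymy", "en:11"),
    ("pl:2", "synonymy", "en:20"),
    ("pl:2", "synonymy", "en:21"),  # second synonymy link from pl:2
    ("pl:3", "partial synonymy", "en:30"),
]


def synonymy_violations(links):
    """Return source synsets that carry more than one synonymy link."""
    counts = Counter(src for src, rel, _tgt in links if rel == "synonymy")
    return sorted(src for src, n in counts.items() if n > 1)


violations = synonymy_violations(links)
print(violations)  # ['pl:2']
```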

10 Partial synonymy

11 Inter-lingual relations (2)
- Inter-lingual hyponymy: defined in terms of inclusion of set denotation; a hyponym refers to an object which is included in the denotation set of a hypernym
- Inter-lingual hypernymy: defined in terms of inclusion of set denotation; a hypernym refers to an object that includes hyponyms in its denotation set
- Inter-lingual meronymy: for parts, elements or materials of bigger wholes
- Inter-lingual holonymy: for a whole made of smaller parts, elements or materials
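The set-inclusion definitions of hyponymy and hypernymy can be illustrated with toy denotation sets. This is only a sketch: real denotations cannot be enumerated, the individuals below are invented, and treating equal sets as synonymy is a simplification made here (the slides define synonymy through correspondence of sense and structure, not through denotation). Meronymy and holonymy are part-whole relations and are not covered by set inclusion.

```python
def relation_by_denotation(source: frozenset, target: frozenset) -> str:
    """Classify source against target by inclusion of denotation sets."""
    if source == target:
        return "synonymy"      # simplification, see the note above
    if source < target:        # proper subset: source is a hyponym of target
        return "hyponymy"
    if source > target:        # proper superset: source is a hypernym of target
        return "hypernymy"
    return "no inclusion"


# Invented individuals standing in for denotations.
animals = frozenset({"Rex", "Felix", "Babe"})
pigs = frozenset({"Babe"})
cats = frozenset({"Felix"})

print(relation_by_denotation(pigs, animals))  # hyponymy
print(relation_by_denotation(animals, pigs))  # hypernymy
print(relation_by_denotation(pigs, cats))     # no inclusion
```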

12 Mapping procedure (1)
Recognizing the sense of a source synset:
- checking its position in the network structure (all existing relations, with an emphasis on hypernym(s) and hyponyms); definitions, commentaries; comparing other synsets containing the given lemma
Example: {zagranica 1, obczyzna 1, obce terytorium 1}:
- is a hyponym of {obszar 1, terytorium 1, obręb 1, strefa 1, zona 1, rejon 3}; commentary: 'ograniczona część przestrzeni, zwykle dużych rozmiarów, określona powierzchnia czegoś (np. obszar państwa)', i.e. 'a limited part of space, usually of large size; a defined surface of something (e.g. state territory)'
- is a meronym of {świat 3, nieznane 1}: 'world, unknown territory'
- is a fuzzynym of {granica państwa 1}: 'state border'

13 Mapping procedure (2)
Searching for a target synset:
- choosing candidates for a target synset with the help of intuitions, automatic prompts and dictionaries, e.g. {foreign country 1}, 'any state of which one is not a citizen', which is a hyponym of {state 1, nation 1, country 1, land 9, commonwealth 2, res publica 1, body politic 1}, 'a politically organized body of people under a single government'
- verifying candidates for a target synset: comparing hypernymic and hyponymic structures (and others, if such exist) with the source synset; checking the existing and/or potential inter-lingual relations; definitions, commentaries; dictionaries
{state 1,..} is an inter-lingual hyponym of {państwo 1, kraj 1}: 'zorganizowana politycznie społeczność, zamieszkująca określone terytorium, z niepodległą formą rządów', i.e. 'a politically organised community, inhabiting a certain territory, with an independent form of government'

14 Mapping procedure (3)
Choosing a target synset and an inter-lingual relation: {foreign country 1}
- Synonymy: no (different meaning, structures and relations)
- Hyponymy: no (meaning, structures and relations do not qualify as a subtype)
- Meronymy: yes (meaning, structures and relations qualify as a part)
Linking the source synset with the target synset.
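The last step, trying relations in the order of the hierarchy until one fits, can be sketched as follows. The function name and the dictionary of judgements are hypothetical; in the actual procedure each yes/no judgement is made by a lexicographer, not computed. Relations with no recorded judgement are treated as "no" here, which is an assumption of this sketch.

```python
from typing import Dict, Optional

# Order taken from the inter-lingual relations hierarchy (slide 8).
HIERARCHY = [
    "synonymy",
    "partial synonymy",
    "inter-register synonymy",
    "hyponymy",
    "hypernymy",
    "meronymy",
    "holonymy",
]


def choose_relation(judgements: Dict[str, bool]) -> Optional[str]:
    """Return the first relation in hierarchy order that was judged to hold."""
    for relation in HIERARCHY:
        if judgements.get(relation, False):  # missing judgement counts as "no"
            return relation
    return None


# Judgements for {zagranica 1, ...} against {foreign country 1}, as on the slide.
judgements = {"synonymy": False, "hyponymy": False, "meronymy": True}
chosen = choose_relation(judgements)
print(chosen)  # meronymy
```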

15 Results of inter-lingual mapping
About 46 500 inter-lingual links/relations between synsets, which amounts to about 50 000 relations between lexical units.
- Synonymy: 15268
- Partial synonymy: 971
- Inter-register synonymy: 676
- Hyponymy: 23677
- Hypernymy: 3526
- Meronymy: 1898
- Holonymy: 555
Mapped branches: people, artefacts, places, food, time units, communication (partly), states and processes (partly), body parts (partly), group names (partly)
Mapping direction: plWordNet – Princeton WordNet
Bottom-up approach: starting from the lowest levels in the hierarchy
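As a quick arithmetic check, the per-relation counts can be summed and compared with the "about 46 500" total. The figures are copied from the slide; the variable names are only illustrative.

```python
# Inter-lingual link counts per relation, as reported on the slide.
counts = {
    "synonymy": 15268,
    "partial synonymy": 971,
    "inter-register synonymy": 676,
    "hyponymy": 23677,
    "hypernymy": 3526,
    "meronymy": 1898,
    "holonymy": 555,
}

total = sum(counts.values())
print(total)  # 46571

# Share of each relation in the total, in percent.
shares = {rel: round(100 * n / total, 1) for rel, n in counts.items()}
print(shares["hyponymy"])  # 50.8
print(shares["synonymy"])  # 32.8
```

The sum, 46 571, agrees with the rounded total, and hyponymy alone accounts for roughly half of all links.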

16 Types of differences between plWN and PWN
Inter-lingual lexico-grammatical differences:
- marked forms (diminutives, augmentatives)
- lexicalised gender
- lexical gaps
Differences in the definition of synonymy and synset:
- 'mixed' PWN synsets: marked and unmarked forms, feminine and masculine, countable and uncountable, hypernym and hyponym
- hypernymy: and (plWN) vs. and/or (PWN)
Other differences:
- synset definitions incompatible with relations (PWN)
- different relations used for coding the same conceptual dependencies
- more fine-grained meaning differentiation
- differences boiling down to the content and size of resources

17 Marked forms
Lexico-grammatical differences:
(1) Markedness
- young being (prosiak 'piglet' -hypo → młodzik 'young creature')
- diminutive (prosiaczek 'piggy' ← prosiak + -ek)
- augmentative
(2) Lexicalised gender
- cousin ~ kuzyn (masc.) & kuzynka (fem.)
(3) Lexical gaps

18 Differences in lexicalisation

19 Hyponymy

20 Different relations for coding the same conceptual dependencies

21 References
Fellbaum, Ch. (ed). 1998. WordNet: An Electronic Lexical Database. MIT Press: Cambridge, Massachusetts.
Maziarz, M., Piasecki, M. and S. Szpakowicz. 2012. Approaching plWordNet 2.0. Proceedings of the 6th Global Wordnet Conference, Matsue. pp. 189-196. Accepted for publication.
Piasecki, M., Szpakowicz, S. and B. Broda. 2009. A Wordnet from the Ground Up. Oficyna Wydawnicza Politechniki Wrocławskiej: Wrocław.
Princeton WordNet: http://wordnet.princeton.edu/wordnet/
Słowosieć: http://plwordnet.pwr.wroc.pl/wordnet/
Vossen, P. (ed). 2002. EuroWordNet. General Document. Amsterdam.

