Laboratorio di Informatica Umanistica Università degli Studi di Verona

1 Latin WordNet project
Stefano Minozzi, Laboratorio di Informatica Umanistica, Università degli Studi di Verona

2 Latin WordNet project
Laboratorio di Informatica Umanistica, Università degli Studi di Verona
The Cognitive and Communication Technologies (TCC) division, Fondazione Bruno Kessler, Trento

3 Historical credits
The Latin WordNet project is indebted to:
Princeton WordNet: a lexical database for the English language, created and maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. Development began in 1985.
MultiWordNet: a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet v (developed since 1994 at Istituto Trentino di Cultura, now Fondazione Bruno Kessler).

4 MultiWordNet: multilingual lexical matrix
[Figure: lexical matrix with the dimensions language, meaning, and lemma]

5 Latin WordNet represents:
Semantic parts of speech: nouns, verbs, adjectives, adverbs
Lexical relations that connect words
Meanings are treated as constant across languages, while the lexicalization of a meaning is a language-specific variable.

6 Structure of the database

7 The synset (= group of synonyms) is the building block of WordNet
Gloss (synset v#): express an idea, etc. in words; "He said that he wanted to marry her"; "tell me what is bothering you"; "state your opinion"
English words (synset v#): state, say, tell
Latin lemmas (synset v#): adnuntio, dico, effor, enuntio, for, inquam, inseco, loquor, narro
Italian words (synset v#): dire, enunciare, enunziare, raccontare
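The idea that one meaning is shared across languages while its lemmas vary by language can be pictured with a small data structure. This is only an illustrative sketch: the `Synset` class, its field names, and the ID `v#EXAMPLE` are assumptions made for the example and do not reflect the actual MultiWordNet database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """One meaning, shared across languages, with its language-specific lemmas."""
    synset_id: str
    gloss: str
    lemmas: Dict[str, List[str]] = field(default_factory=dict)

    def lexicalizations(self, language: str) -> List[str]:
        """Lemmas that express this meaning in one language (empty if none)."""
        return self.lemmas.get(language, [])


# "v#EXAMPLE" is a placeholder ID, not a real MultiWordNet offset.
say = Synset(
    synset_id="v#EXAMPLE",
    gloss="express an idea, etc. in words",
    lemmas={
        "english": ["state", "say", "tell"],
        "latin": ["adnuntio", "dico", "effor", "enuntio", "for",
                  "inquam", "inseco", "loquor", "narro"],
        "italian": ["dire", "enunciare", "enunziare", "raccontare"],
    },
)

print(say.lexicalizations("latin"))
```

A language with no lemma for the meaning simply returns an empty list, which is one simple way to picture a lexical gap.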

8 The synsets are linked with relations

9 Relations for adjectives and adverbs

10 Moreover, the synsets are connected with semantic field labels in order to create domain-related dictionaries

11 Building the semantic network

12 Building a semantic network from scratch is very time-consuming
The available resources permit a different approach:
Automatic assignment of synsets
Manual correction of the results

13 Building blocks:
Latin-to-Italian MRD (machine-readable dictionary), mostly from G. B. Conte – E. Pianezzola
Latin-to-English MRD, mostly from OLD, via William Whitaker's Words
Italian and English branches of MultiWordNet

14 We developed a number of assignment strategies
Multilingual intersection method: exploits the multilingual nature of MultiWordNet
Generic probability: for very specialized words, where polysemy is very limited
Gloss correspondence: exploits the glosses present in the MRD
Intersection of synsets: assigns a lemma to a synset when a number of its translation equivalents point to the same synset

15 Intersection method
amor, is
English translations: love, affection; the beloved; Cupid; affair; desire, passion; sexual passion; illicit passion
Italian translations: amore; persona amata, amore; questioni amorose, amorazzi; storie d'amore; amore, desiderio; Amore; gli Amori, gli Amorini
[Figure: for amor, is, the synsets reached from Italian and the synsets reached from English are intersected; synset shown: n#04478900]
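The multilingual intersection can be sketched as a set operation: collect the synsets reachable through the Italian translations, collect those reachable through the English translations, and keep what both sides share. The function names, the toy indexes, and the synset IDs (`n#LOVE`, `n#DESIRE`, and so on) below are invented for illustration; they are not real MultiWordNet data, and the project's actual implementation may differ.

```python
def synsets_for(translations, index):
    """Union of the synsets listed in `index` for each translation word."""
    found = set()
    for word in translations:
        found |= index.get(word, set())
    return found


def multilingual_intersection(it_translations, en_translations, it_index, en_index):
    """Synsets reachable from both the Italian and the English translations."""
    return synsets_for(it_translations, it_index) & synsets_for(en_translations, en_index)


# Hypothetical word-to-synset indexes for the two MultiWordNet branches.
it_index = {
    "amore": {"n#LOVE", "n#BELOVED"},
    "desiderio": {"n#DESIRE"},
}
en_index = {
    "love": {"n#LOVE", "n#SCORE_ZERO"},
    "affection": {"n#AFFECTION"},
    "desire": {"n#DESIRE", "n#WISH"},
}

candidates = multilingual_intersection(
    ["amore", "desiderio"], ["love", "affection", "desire"], it_index, en_index
)
print(sorted(candidates))
```

Synsets supported by only one language (here `n#BELOVED` or `n#WISH`) are dropped, which is what makes the second language useful as a filter.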

16 Generic probability
abactor, oris → rustler, cattle_thief; one_who_drives_off
SYNSET n#
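The slides do not spell out how the generic probability strategy is computed. As a stand-in, the sketch below uses a simple threshold heuristic: a lemma is assigned automatically only when all of its translations together lead to very few candidate synsets. The function name, the `max_candidates` parameter, the toy index, and the synset IDs are all assumptions made for this example.

```python
def assign_if_unambiguous(translations, index, max_candidates=1):
    """Return the candidate synsets only when there are few enough of them.

    This is a stand-in heuristic, not the project's documented method.
    """
    candidates = set()
    for word in translations:
        candidates |= index.get(word, set())
    if 0 < len(candidates) <= max_candidates:
        return candidates
    return set()


# Hypothetical English index; the IDs are placeholders.
en_index = {
    "rustler": {"n#RUSTLER"},
    "cattle_thief": {"n#RUSTLER"},
    "point": {"n#POINT_1", "n#POINT_2", "n#POINT_3"},
}

abactor = assign_if_unambiguous(
    ["rustler", "cattle_thief", "one_who_drives_off"], en_index
)
punctum = assign_if_unambiguous(["point"], en_index)
print(abactor, punctum)
```

A highly specialized word such as abactor passes the filter, while a polysemous translation such as "point" is left for the other strategies or for manual correction.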

17 Gloss correspondence
punctum, i → point, dot; point, spot; small_hole, pin_prick; sting, small_puncture (of_insect); vote, tick; tiny_amount; full-stop, period (punctuation)
PERIOD: n# n# n# n# n# n# n# n# n# n#
Matching synset: period, point, full_stop, stop, full_point {a punctuation mark (.) placed at the end of a declarative sentence to indicate a full stop or after abbreviations}
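One way to picture gloss correspondence is as a word-overlap score between an MRD sense (including its parenthetical note) and each candidate synset's lemmas and gloss. The scoring function below is an assumed simplification, not the project's documented algorithm. The first candidate reuses the lemmas and gloss from the slide; the second candidate, its gloss, and both synset IDs are hypothetical and included only for contrast.

```python
import re


def tokens(text):
    """Lowercase alphabetic tokens; underscores and hyphens act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_gloss_match(mrd_sense, candidates):
    """Return the ID of the candidate sharing the most tokens with the MRD sense.

    `candidates` maps a synset ID to a (lemmas, gloss) pair.
    Returns None when no candidate shares any token.
    """
    sense_tokens = tokens(mrd_sense)
    best_id, best_score = None, 0
    for synset_id, (lemmas, gloss) in candidates.items():
        synset_tokens = tokens(" ".join(lemmas)) | tokens(gloss)
        score = len(sense_tokens & synset_tokens)
        if score > best_score:
            best_id, best_score = synset_id, score
    return best_id


candidates = {
    # Lemmas and gloss as shown on the slide; the ID is a placeholder.
    "n#PUNCTUATION": (
        ["period", "point", "full_stop", "stop", "full_point"],
        "a punctuation mark (.) placed at the end of a declarative sentence "
        "to indicate a full stop or after abbreviations",
    ),
    # Hypothetical competing sense of "period".
    "n#TIME": (["period", "period_of_time"], "an amount of time"),
}

match = best_gloss_match("full-stop, period (punctuation)", candidates)
print(match)
```

The parenthetical "(punctuation)" in the dictionary entry is what separates the two senses of "period" here, which is the kind of evidence this strategy relies on.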

18 Intersection of synsets
punctum, i → point, dot; point, spot; small_hole, pin_prick; sting, small_puncture (of_insect); vote, tick; tiny_amount; full-stop, period (punctuation)
POINT (24 synsets): n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#; n#
DOT (2 synsets): n#; n#
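The intersection of synsets can be sketched as a vote count: each translation equivalent votes for every synset it belongs to, and a synset is kept when enough distinct translations agree on it. The `min_votes` threshold, the toy index, and the synset IDs below are assumptions for illustration; the slides only say "a number of" translation equivalents.

```python
from collections import Counter


def synsets_by_votes(translations, index, min_votes=2):
    """Synsets that at least `min_votes` distinct translations point to."""
    votes = Counter()
    for word in set(translations):  # count each translation word once
        for synset_id in index.get(word, set()):
            votes[synset_id] += 1
    return {synset_id for synset_id, n in votes.items() if n >= min_votes}


# Hypothetical English index; the IDs are placeholders.
en_index = {
    "point": {"n#MARK", "n#LOCATION", "n#MOMENT"},
    "dot": {"n#MARK", "n#MORSE"},
    "spot": {"n#LOCATION", "n#STAIN"},
}

agreed = synsets_by_votes(["point", "dot", "point", "spot"], en_index)
print(sorted(agreed))
```

Raising `min_votes` trades coverage for precision: with a threshold of 3, none of the toy synsets above would be assigned.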

19 Lexical gaps
LEXICAL UNIT → FREE COMBINATION
abactor, oris → Latin-to-Italian gap: “ladro di bestiame” (cattle thief)

20 Size of the database
Latin         Noun   Verb    Adj   Adv   TOTAL
SYNSETS       5621   2283    775   294    8973
LEMMAS        4777   2609   1259   479    9124
WORD SENSES  13060  10062   2054   732   25908
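The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below verifies that each row's total equals the sum of its part-of-speech counts and derives the average number of word senses per lemma, a rough polysemy indicator that is not reported on the slide itself. The dictionary layout and variable names are assumptions for the example.

```python
counts = {
    "synsets":     {"noun": 5621,  "verb": 2283,  "adj": 775,  "adv": 294, "total": 8973},
    "lemmas":      {"noun": 4777,  "verb": 2609,  "adj": 1259, "adv": 479, "total": 9124},
    "word_senses": {"noun": 13060, "verb": 10062, "adj": 2054, "adv": 732, "total": 25908},
}
parts_of_speech = ["noun", "verb", "adj", "adv"]

# Each reported total should equal the sum over the four parts of speech.
totals_ok = all(
    sum(row[pos] for pos in parts_of_speech) == row["total"]
    for row in counts.values()
)

# Average number of word senses per lemma, per part of speech and overall.
senses_per_lemma = {
    key: counts["word_senses"][key] / counts["lemmas"][key]
    for key in parts_of_speech + ["total"]
}

print(totals_ok)
for key, value in senses_per_lemma.items():
    print(f"{key}: {value:.2f}")
```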

21 Latin WordNet can be browsed online
The Latin WordNet database will soon be available from the European Language Resources Association (ELRA)

