1 From WordNet, to EuroWordNet, to the Global Wordnet Grid: anchoring languages to universal meaning Piek Vossen VU University Amsterdam.

2 What kind of resource is WordNet?
- The most widely used database in language technology
- Enormous impact on the development of language technology
- Large
- Free and downloadable
- English

3 WordNet
- Developed by George Miller and his team at Princeton University as the implementation of a mental model of the lexicon
- Organized around the notion of a synset: a set of synonyms in a language that represent a single concept
- Semantic relations between concepts
- Covers over 117,000 concepts and over 150,000 English words

4 Relational model of meaning
[Figure: two networks of semantically related words over man, woman, boy, girl, cat, kitten, dog, puppy and animal; in the second network girl is replaced by Dutch meisje]

5 Wordnet: a network of semantically related words
[Figure: a hypernym chain from {conveyance; transport} through {vehicle} and {motor vehicle; automotive vehicle} to {car; auto; automobile; machine; motorcar}; parts such as {bumper}, {car door}, {car window}, {car mirror}, {armrest}, {doorlock} and {hinge; flexible joint}; and hyponyms of car such as {cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab}]

6 Wordnet semantic relations
WN 1.5 is the starting point. The synset is a weak notion of synonymy: two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value (Miller et al. 1993).
Relations between synsets (relation: POS combination, example):
- ANTONYMY: adjective-to-adjective, good/bad; verb-to-verb, open/close
- HYPONYMY: noun-to-noun, car/vehicle; verb-to-verb, walk/move
- MERONYMY: noun-to-noun, head/nose
- ENTAILMENT: verb-to-verb, buy/pay
- CAUSE: verb-to-verb, kill/die

7 Wordnet data model
[Figure: the vocabulary of a language (bank, fiddle, violin, violist, fiddler, string) linked to concept records (rec 12345: financial institute; rec 54321: side of a river; rec 9876: small string instrument; rec 65438: musician playing violin; rec 42654: musician; rec 25876: string instrument; rec 35576: string of instrument; rec 29551: underwear), with the concepts connected by type-of and part-of relations]
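The data model separates words from concept records: synonymy means several words share one record, and polysemy means one word points to several records. The Python sketch below shows this with the records from the slide. Which word links to which record, and which concepts are connected by type-of or part-of, is our reading of the figure and should be treated as an assumption.

```python
# Concept records from the slide (record id -> gloss).
CONCEPTS = {
    12345: "financial institute",
    54321: "side of a river",
    9876: "small string instrument",
    65438: "musician playing violin",
    42654: "musician",
    25876: "string instrument",
    35576: "string of instrument",
    29551: "underwear",
}

# Word -> concept links (assumed reading of the figure).
LEXICON = {
    "bank": {12345, 54321},
    "fiddle": {9876},
    "violin": {9876},
    "violist": {65438},
    "fiddler": {65438},
    "string": {35576, 29551},
}

# Concept -> concept relations (also assumed, for illustration).
RELATIONS = [
    (9876, "type-of", 25876),
    (65438, "type-of", 42654),
    (35576, "part-of", 25876),
]


def synonyms(concept_id):
    """Words that share a concept record form one synset."""
    return sorted(w for w, ids in LEXICON.items() if concept_id in ids)


def senses(word):
    """Glosses of all records a word is linked to (more than one = polysemy)."""
    return [CONCEPTS[c] for c in sorted(LEXICON.get(word, set()))]


def related(concept_id, relation):
    """Relations hold between concepts, not between word forms."""
    return [t for s, r, t in RELATIONS if s == concept_id and r == relation]


print(synonyms(9876))            # ['fiddle', 'violin']
print(senses("bank"))            # ['financial institute', 'side of a river']
print(related(9876, "type-of"))  # [25876]
```

Because relations are stored on records, fiddle and violin automatically share the type-of link to string instrument without it being repeated per word.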

8 Some observations on WordNet
- Synsets are more compact representations of concepts than word meanings in traditional lexicons.
- Synonyms and hypernyms are substitutional variants:
  - begin/commence
  - I once had a canary. The bird got sick. The poor animal died.
- Hyponymy and meronymy chains are important transitive relations for predicting properties and explaining textual properties: object -> artifact -> vehicle -> 4-wheeled vehicle -> car
- Parts of speech are strictly separated, although concepts can be closely related (bed, sleep) or similar (dead, death).
- Lexicalization patterns reveal important mental structures.
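The point that hyponymy chains are transitive and predict properties can be made concrete with a few lines of Python. This is a minimal sketch: the chain is the one on the slide, while the properties attached to each level are hypothetical placeholders chosen only to show inheritance.

```python
# Direct hypernym links (child -> parent), from the chain on the slide.
HYPERNYM = {
    "car": "4-wheeled vehicle",
    "4-wheeled vehicle": "vehicle",
    "vehicle": "artifact",
    "artifact": "object",
}

# Hypothetical properties attached at different levels (illustration only).
PROPERTIES = {
    "object": {"has a location"},
    "artifact": {"man-made"},
    "vehicle": {"used for transport"},
}


def hypernym_chain(concept):
    """All ancestors of `concept`, nearest first (transitive closure of hypernymy)."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def inherited_properties(concept):
    """Properties predicted for `concept` by inheriting them along the chain."""
    props = set(PROPERTIES.get(concept, set()))
    for ancestor in hypernym_chain(concept):
        props |= PROPERTIES.get(ancestor, set())
    return props


print(hypernym_chain("car"))
# ['4-wheeled vehicle', 'vehicle', 'artifact', 'object']
print(sorted(inherited_properties("car")))
# ['has a location', 'man-made', 'used for transport']
```

Nothing is stored on car itself; everything it "has" follows from its position in the chain.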

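9 Lexicalization patterns
[Figure: the 25 unique beginners, basic level concepts and more specific concepts, with examples such as entity, object, artifact, building, church, abbey, organism, animal, bird, canary, common canary, dog, crocodile, plant, tree, flower, rose, garbage, waste and threat]
Basic level concepts balance two principles: they predict the most features, and those features apply to most subclasses. This is the level where most concepts are created, where most parts are amalgamated, and the most abstract level at which a picture can still be drawn.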

10 WordNet top level

11 Meronymy & pictures
[Figure labels: beak, tail, leg]

12 Meronymy & pictures

13 Co-reference constraint in WordNet: cats cannot be a kind of cats
- S: (n) cat, true cat (feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats)
- S: (n) guy, cat, hombre, bozo (an informal term for a youth or man) "a nice guy"; "the guy's only doing it for some doll"
- S: (n) cat (a spiteful woman gossip) "what a cat she is!"
- S: (n) kat, khat, qat, quat, cat, Arabian tea, African tea (the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant) "in Yemen kat is used daily by 85% of adults"
- S: (n) cat-o'-nine-tails, cat (a whip with nine knotted cords) "British sailors feared the cat"
- S: (n) Caterpillar, cat (a large tracked vehicle that is propelled by two endless metal belts; frequently used for moving earth in construction and farm work)
- S: (n) big cat, cat (any of several large cats typically able to roar and living in the wild)
- S: (n) computerized tomography, computed tomography, CT, computerized axial tomography, computed axial tomography, CAT (a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis)
- S: (n) domestic cat, house cat, Felis domesticus, Felis catus (any domesticated member of the genus Felis)

14

15 WordNet 3.0 statistics (unique strings / synsets / total word-sense pairs)
- Noun: 117,798 / 82,115 / 146,312
- Verb: 11,529 / 13,767 / 25,047
- Adjective: 21,479 / 18,156 / 30,002
- Adverb: 4,481 / 3,621 / 5,580
- Totals: 155,287 / 117,659 / 206,941

16 WordNet 3.0 statistics (monosemous words / polysemous words / polysemous senses)
- Noun: 101,863 / 15,935 / 44,449
- Verb: 6,277 / 5,252 / 18,770
- Adjective: 16,503 / 4,976 / 14,399
- Adverb: 3,748 / 733 / 1,832
- Totals: 128,391 / 26,896 / 79,450

17 WordNet 3.0 statistics (average polysemy including / excluding monosemous words)
- Noun: 1.24 / 2.79
- Verb: 2.17 / 3.57
- Adjective: 1.4 / 2.71
- Adverb: 1.25 / 2.5
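The two polysemy averages appear to be simple ratios of the counts on the two previous slides: word-sense pairs divided by unique strings (including monosemous words), and polysemous senses divided by polysemous words (excluding them). The sketch below recomputes them under that assumption. It reproduces the noun, verb and adverb figures and the adjective "including" figure; for adjectives the "excluding" ratio comes out at about 2.89 rather than the 2.71 listed, so that cell may have been derived differently.

```python
# Per-POS counts from the WordNet 3.0 statistics slides:
# (unique strings, word-sense pairs, polysemous words, polysemous senses)
STATS = {
    "Noun": (117_798, 146_312, 15_935, 44_449),
    "Verb": (11_529, 25_047, 5_252, 18_770),
    "Adjective": (21_479, 30_002, 4_976, 14_399),
    "Adverb": (4_481, 5_580, 733, 1_832),
}


def average_polysemy(strings, pairs, poly_words, poly_senses):
    """Return (average including monosemous words, average excluding them)."""
    including = pairs / strings           # every word form counts
    excluding = poly_senses / poly_words  # only words with 2+ senses
    return including, excluding


for pos, counts in STATS.items():
    incl, excl = average_polysemy(*counts)
    print(f"{pos:<10}{incl:5.2f}{excl:6.2f}")
```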

18

19

20 Usage of WordNet
- Improve recall of text-based analysis (query -> index):
  - synonyms: commence -> begin
  - hypernyms: taxi -> car
  - hyponyms: car -> taxi
  - meronyms: trunk -> elephant
  - lexical entailments: gun -> shoot
- Inferencing: what things can burn?
- Expression in language generation and translation: alternative words and paraphrases
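A minimal sketch of query expansion in Python, using toy relation tables built from the examples on this slide rather than real WordNet data. The choice of which relations to follow is left as a parameter, since expanding through every relation raises recall but also pulls in noise.

```python
# Toy relation tables built from the slide's examples (not real WordNet data).
RELATIONS = {
    "synonym": {"commence": {"begin"}, "begin": {"commence"}},
    "hypernym": {"taxi": {"car"}},
    "hyponym": {"car": {"taxi"}},
    "meronym": {"trunk": {"elephant"}},
    "entailment": {"gun": {"shoot"}},
}


def expand_query(terms, use=("synonym", "hypernym", "hyponym")):
    """Add related words to the query terms, following only the relations in `use`."""
    expanded = set(terms)
    for term in terms:
        for name in use:
            expanded |= RELATIONS[name].get(term, set())
    return expanded


def matches(document, terms, use=("synonym", "hypernym", "hyponym")):
    """True if any (expanded) query term occurs in the whitespace-tokenized document."""
    return bool(expand_query(terms, use) & set(document.lower().split()))


doc = "The taxi stopped at the station"
print(matches(doc, ["car"], use=()))  # False: plain keyword match
print(matches(doc, ["car"]))          # True: car -> taxi via hyponymy
```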

21 Improve recall
- Information retrieval: small databases without redundancy, e.g. image captions, video text
- Text classification: small training sets
- Question-answering systems: query analysis (who, whom, where, what, when)

22 Improve recall
- Anaphora resolution:
  - The girl fell off the table. She...
  - The glass fell off the table. It...
- Coreference resolution: When he moved the furniture, the antique table got damaged.
- Information extraction (unstructured text to structured databases): generic forms or patterns ("vehicle") -> text with specific cases ("car")

23 Improve recall
- Summarizers:
  - sentence selection based on concept counts instead of word counts
  - avoiding repetition in the summary (language generation)
- Limited inferencing: detecting locations, organisations, etc.

24 Many others
- Data sparseness in machine learning: hapaxes can be replaced by semantic classes.
- Using redundancy for more robustness: spelling correction and speech recognition can build semantic expectations using WordNet and make better choices.
- Sentiment and opinion mining
- Natural language learning

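25 Recall & precision
[Figure: the sets of found and relevant documents for the query "cell phone"; relevant documents mention cell phones and mobile phones, while the search also finds documents about nerve cells (neurons) and police cells (jails); the overlap is the intersection]
- recall = intersection / relevant
- precision = intersection / found
- Recall < 20% for basic search engines! (Blair & Maron 1985)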
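The two formulas on the slide translate directly into set operations. In the Python sketch below the document identifiers are hypothetical; it shows how adding one synonym (mobile phone) to the query raises recall while the unrelated cell senses still hold precision down.

```python
def recall_precision(found, relevant):
    """recall = |found & relevant| / |relevant|; precision = |found & relevant| / |found|."""
    intersection = found & relevant
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision


# Hypothetical documents for the query "cell phone".
relevant = {"doc_cell_phone", "doc_mobile_phones"}
found_keyword = {"doc_cell_phone", "doc_nerve_cell", "doc_police_cell"}
# The same query after adding the synonym "mobile phone" (and nothing else).
found_expanded = found_keyword | {"doc_mobile_phones"}

print(recall_precision(found_keyword, relevant))   # recall 0.5, precision 1/3
print(recall_precision(found_expanded, relevant))  # (1.0, 0.5)
```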

26 EuroWordNet
The development of a multilingual database with wordnets for several European languages
- Funded by the European Commission, DG XIII, Luxembourg, as projects LE2-4003 and LE4-8328
- March 1996 - September 1999
- 2.5 million euro
ewn.html

27 EuroWordNet
- Languages covered:
  - EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
  - EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian
- Size of vocabulary:
  - EuroWordNet-1: 30,000 concepts, 50,000 word meanings
  - EuroWordNet-2: 15,000 concepts, 25,000 word meanings
- Type of vocabulary:
  - the most frequent words of the languages
  - all concepts needed to relate more specific concepts

28 EuroWordNet model
[Figure: language-specific wordnets, each with its own lexical items and language-dependent relations (III): Italian cavalcare, andare, muoversi, guidare; Dutch bewegen, gaan, rijden, berijden; English drive, ride, move, go; Spanish cabalgar, jinetear, conducir, mover, transitar. Their synsets are linked (II) to records in the Inter-Lingual-Index, such as {drive}, and the ILI records are linked by language-independent links (I) to the top ontology (2ndOrderEntity, Location, Dynamic) and to domains (Traffic, Air, Road).]
I = language-independent link; II = link from a language-specific wordnet to the Inter-Lingual-Index; III = language-dependent link
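The key design choice in the model is that wordnets never link to each other directly: each synset links to an ILI record, and translations are found indirectly through shared records, while ontology and domain labels are stored once on the ILI side. The Python sketch below assumes hypothetical record ids (ILI:drive, ILI:move) and our own choice of which word links to which record; the words themselves come from the figure.

```python
# Hypothetical links from language-specific synsets to ILI records.
ILI_LINKS = {
    ("en", "drive"): "ILI:drive",
    ("it", "guidare"): "ILI:drive",
    ("nl", "rijden"): "ILI:drive",
    ("es", "conducir"): "ILI:drive",
    ("en", "move"): "ILI:move",
    ("it", "muoversi"): "ILI:move",
    ("nl", "bewegen"): "ILI:move",
    ("es", "mover"): "ILI:move",
}

# Language-independent knowledge stored once per ILI record (assumed values).
ILI_RECORDS = {
    "ILI:drive": {"top_ontology": {"Dynamic", "Location"}, "domains": {"Traffic", "Road"}},
    "ILI:move": {"top_ontology": {"Dynamic", "Location"}, "domains": {"Traffic"}},
}


def equivalents(lang, synset, target_lang):
    """Indirect matching: synsets in target_lang linked to the same ILI record."""
    record = ILI_LINKS.get((lang, synset))
    if record is None:
        return []
    return sorted(s for (l, s), r in ILI_LINKS.items()
                  if l == target_lang and r == record)


def domains(lang, synset):
    """Domains are inherited from the ILI record, not encoded per language."""
    record = ILI_LINKS.get((lang, synset))
    return ILI_RECORDS.get(record, {}).get("domains", set())


print(equivalents("nl", "rijden", "es"))  # ['conducir']
print(sorted(domains("it", "guidare")))   # ['Road', 'Traffic']
```

Adding a new language only requires links to the ILI; it immediately becomes comparable with every other linked wordnet and inherits the shared labels.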

29 EuroWordNet design
[Figure: the Inter-Lingual-Index records for vehicle, car and train, linked to domains (Transport: Road, Air, Water), to upper ontologies (DOLCE, SUMO: Object, Device, TransportDevice), and to the corresponding words in each wordnet: English vehicle, car, train; Czech dopravní prostředek, auto, vlak; French véhicule, voiture, train; Estonian liiklusvahend, auto, killavoor; German Fahrzeug, Auto, Zug; Spanish vehículo, auto, tren; Italian veicolo, auto, treno; Dutch voertuig, auto, trein]

30 Differences in relations between EuroWordNet and WordNet
- Added features to relations
- Cross-part-of-speech relations
- New relations to differentiate shallow hierarchies
- New interpretations of relations

31 EWN relationship labels
Disjunction/conjunction of multiple relations of the same type. WordNet 1.5:
- door 1 (a swinging or sliding barrier that will close the entrance to a room or building; "he knocked on the door"; "he slammed the door as he left") PART OF: doorway, door, entree, entry, portal, room access
- door 6 (a swinging or sliding barrier that will close off access into a car; "she forgot to lock the doors of her car") PART OF: car, auto, automobile, machine, motorcar

32 EWN relationship labels
- {airplane} HAS_MERO_PART: conj1 {door}; HAS_MERO_PART: conj2 disj1 {jet engine}; HAS_MERO_PART: conj2 disj2 {propeller}
- {door} HAS_HOLO_PART: disj1 {car}; HAS_HOLO_PART: disj2 {room}; HAS_HOLO_PART: disj3 {entrance}
- {dog} HAS_HYPERONYM: conj1 {mammal}; HAS_HYPERONYM: conj2 {pet}
- {albino} HAS_HYPERONYM: disj1 {plant}; HAS_HYPERONYM: disj2 {animal}
Default interpretation: non-exclusive disjunction

33 EWN relationship labels: factive/non-factive CAUSES (Lyons 1977)
- Factive (default interpretation): to kill causes to die: {kill} CAUSES {die}
- Non-factive: E1 probably or likely causes event E2, or E1 is intended to cause some event E2: to search may cause to find: {search} CAUSES {find} (non-factive)

34 Cross-part-of-speech relations
WordNet 1.5: nouns and verbs are not interrelated by basic semantic relations such as hyponymy and synonymy:
- adornment 2 => change of state (the act of changing something)
- adorn 1 => change, alter (cause to change; make different)
EuroWordNet: words of different parts of speech can be interlinked with explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:
- {adorn V} XPOS_NEAR_SYNONYM {adornment N}
- {size N} XPOS_NEAR_HYPONYM {tall A}, {short A}

35 Role relations
For many verbs and nouns the most salient relation is not the hyperonym but the relation between the event and its participants. These relations are expressed as follows:
- {knife} ROLE_INSTRUMENT {to cut}; reversed: {to cut} INVOLVED_INSTRUMENT {knife}
- {school} ROLE_LOCATION {to teach}; reversed: {to teach} INVOLVED_LOCATION {school}
These relations are typically used when other relations, mainly hyponymy, do not clarify the position of a concept in the network, but the word is still closely related to another word.
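Each ROLE relation has a reverse INVOLVED relation, so a database only needs to encode one direction and can derive the other. A minimal Python sketch of that closure step, covering only the two relation pairs shown on the slide:

```python
# Relation label -> label of its reverse, as in the slide's examples.
REVERSE = {
    "ROLE_INSTRUMENT": "INVOLVED_INSTRUMENT",
    "INVOLVED_INSTRUMENT": "ROLE_INSTRUMENT",
    "ROLE_LOCATION": "INVOLVED_LOCATION",
    "INVOLVED_LOCATION": "ROLE_LOCATION",
}


def with_reverses(triples):
    """Return the (source, relation, target) triples plus the reverse of each known relation."""
    result = set(triples)
    for source, relation, target in triples:
        reverse = REVERSE.get(relation)
        if reverse is not None:
            result.add((target, reverse, source))
    return result


encoded = {("knife", "ROLE_INSTRUMENT", "to cut"), ("school", "ROLE_LOCATION", "to teach")}
for triple in sorted(with_reverses(encoded)):
    print(triple)
```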

36 Co-role relations
- guitar player HAS_HYPERONYM player; CO_AGENT_INSTRUMENT guitar
- player HAS_HYPERONYM person; ROLE_AGENT to play music; CO_AGENT_INSTRUMENT musical instrument
- to play music HAS_HYPERONYM to make; ROLE_INSTRUMENT musical instrument
- guitar HAS_HYPERONYM musical instrument; CO_INSTRUMENT_AGENT guitar player
- ice saw HAS_HYPERONYM saw; CO_INSTRUMENT_PATIENT ice
- saw ROLE_INSTRUMENT to saw
- ice CO_PATIENT_INSTRUMENT ice saw (reversed)

37 Co-role relations
Examples of the other relations are:
- criminal CO_AGENT_PATIENT victim
- novel writer/poet CO_AGENT_RESULT novel/poem
- dough CO_PATIENT_RESULT pastry/bread
- photographic camera CO_INSTRUMENT_RESULT photo

38 Overview of the language-internal relations in EuroWordNet
Same-part-of-speech relations:
- NEAR_SYNONYMY: apparatus, machine
- HYPERONYMY/HYPONYMY: car, vehicle
- ANTONYMY: open, close
- HOLONYMY/MERONYMY: head, nose
Cross-part-of-speech relations:
- XPOS_NEAR_SYNONYMY: dead, death; to adorn, adornment
- XPOS_HYPERONYMY/HYPONYMY: to love, emotion
- XPOS_ANTONYMY: to live, dead
- CAUSE: die, death
- SUBEVENT: buy, pay; sleep, snore
- ROLE/INVOLVED: write, pencil; hammer, hammer
- STATE: the poor, poor
- MANNER: to slurp, noisily
- BELONG_TO_CLASS: Rome, city

39 Horizontal & vertical semantic relations
[Figure: the concept patient, with hyponyms chronic patient and mental patient (vertical HYPONYM links), related horizontally through ρ-relations (ρ-PATIENT, ρ-AGENT, ρ-LOCATION, ρ-CAUSE, ρ-PROCEDURE, STATE, co-ρ-AGENT-PATIENT) to concepts such as cure, treat, doctor, hospital, disease/disorder (with stomach disease, kidney disorder), physiotherapy, medicine and child]

40 The multilingual design
- Inter-Lingual-Index: an unstructured fund of concepts that provides an efficient mapping across the languages
- Index records are mainly based on WordNet synsets and consist of synonyms, glosses and source references.
- Various types of complex equivalence relations are distinguished.
- Equivalence relations go from synsets to index records, not on a word-to-word basis.
- Synsets linked to the same index items are matched indirectly.

41 Equivalent near synonym
1. Multiple targets (1:many): Dutch schoonmaken (to clean) matches four senses of clean in WordNet 1.5:
  - make clean by removing dirt, filth, or unwanted substances from
  - remove unwanted substances from, such as feathers or pits, as of chickens or fruit
  - remove in making clean; "Clean the spots off the rug"
  - remove unwanted substances from (as in chemistry)
2. Multiple sources (many:1): Dutch versiersel NEAR_SYNONYM versiering; ILI record: decoration
3. Multiple targets and sources (many:many): Dutch toestel NEAR_SYNONYM apparaat; ILI records: machine; device; apparatus; tool

42 Equivalent hyperonymy
Typically used for gaps in English WordNet:
- genuine, cultural gaps for things not known in English culture, e.g. Dutch klunen: to walk on skates over land from one frozen body of water to the next
- pragmatic gaps, where the concept is known but is not expressed by a single lexicalized form in English, e.g. Dutch kunststof = artifact substance
[Figure labels: artifact, object]

43 Equivalent hyponymy
has_eq_hyponym is used when WordNet 1.5 only provides narrower terms. In this case there can only be a pragmatic difference, not a genuine cultural gap, e.g. Spanish dedo = either finger or toe.

44 Complex mappings across languages
[Figure: EN-Net {toe: part of foot}, {finger: part of hand}, {head: part of body}; ES-Net {dedo} and IT-Net {dito}, meaning finger or toe, linked to toe and finger by eq_has_hyponym; NL-Net {hoofd: human head} and {kop: animal head} linked to head by eq_has_hyperonym; the remaining links are normal equivalence]
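Complex equivalences still support indirect matching, but the relation labels matter. In the Python sketch below, which encodes only the links described in the figure, Spanish dedo and Italian dito match each other because both point to the same two narrower English records, even though neither has a direct English equivalent. Dutch hoofd and kop both reach head, but only as narrower concepts, so sharing a record alone does not make them synonyms of each other.

```python
# Equivalence links from local synsets to ILI records, following the figure:
# (language, synset, equivalence relation, ILI record)
EQ_LINKS = [
    ("es", "dedo", "eq_has_hyponym", "finger"),
    ("es", "dedo", "eq_has_hyponym", "toe"),
    ("it", "dito", "eq_has_hyponym", "finger"),
    ("it", "dito", "eq_has_hyponym", "toe"),
    ("nl", "hoofd", "eq_has_hyperonym", "head"),
    ("nl", "kop", "eq_has_hyperonym", "head"),
]


def ili_links(lang, synset):
    """(relation, ILI record) pairs for one local synset."""
    return {(rel, ili) for l, s, rel, ili in EQ_LINKS if (l, s) == (lang, synset)}


def candidates(lang, synset, target_lang):
    """Synsets in target_lang that share an ILI record with `synset`, with the shared records."""
    records = {ili for _, ili in ili_links(lang, synset)}
    found = {}
    for l, s, _, ili in EQ_LINKS:
        if l == target_lang and ili in records:
            found.setdefault(s, set()).add(ili)
    return found


print({s: sorted(r) for s, r in candidates("es", "dedo", "it").items()})
# {'dito': ['finger', 'toe']}
print(ili_links("nl", "kop"))  # {('eq_has_hyperonym', 'head')}
```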

45 Typical gaps in the (English) ILI
Dutch:
- doodschoppen (to kick to death): eq_hyperonym {kill} V and {kick} V
- aardig (adjective, to like): eq_near_synonym {like} V
- cassière (female cashier): eq_hyperonym {cashier}, {woman}
- kunstproduct (artifact substance): eq_hyperonym {artifact} and {product}
Spanish:
- alevín (young fish): eq_hyperonym {fish} and eq_be_in_state {young}
- cajera (female cashier): eq_hyperonym {cashier}, {woman}

46 Wordnets as semantic structures
Wordnets are unique, language-specific structures:
- different lexicalizations
- differences in synonymy and homonymy
- different relations between synsets
- the same organizational principles: synset structure and the same set of semantic relations
Language-independent knowledge is assigned to the ILI and can thus be shared by all languages linked to the ILI: both an ontology and a domain hierarchy.

47 Autonomous & language-specific
[Figure: the WordNet 1.5 hierarchy (object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; container; device; implement; tool; instrument; with bag, box, spoon, block and body) compared with the Dutch wordnet, where voorwerp {object} is related to lepel {spoon}, werktuig {tool}, tas {bag}, bak {box}, blok {block} and lichaam {body}]

48 Linguistic versus artificial ontologies
Artificial ontology:
- better control or performance, or a more compact and coherent structure
- introduces artificial levels for concepts that are not lexicalized in a language (e.g. instrumentality, hand tool)
- neglects levels that are lexicalized but not relevant for the purpose of the ontology (e.g. tableware, silverware, merchandise)
What properties can we infer for spoons? spoon -> container; artifact; hand tool; object; made of metal or plastic; for eating, pouring or cooking

49 Linguistic versus artificial ontologies
Linguistic ontology:
- exactly reflects the relations between all the lexicalized words and expressions in a language
- captures valuable information about the lexical capacity of languages: the available fund of words and expressions in a language
What words can be used to name spoons? spoon -> object, tableware, silverware, merchandise, cutlery

50 Wordnets versus ontologies
- Wordnets: autonomous, language-specific lexicalization patterns in a relational network. Usage: predicting substitution in text for information retrieval, text generation, machine translation and word-sense disambiguation.
- Ontologies: data structures with formally defined concepts. Usage: making semantic inferences.

51 Sharing world knowledge
- All wordnets in the world can be linked to the same ontology.
- All wordnets in the world can be linked to the same thesaurus.

52 Wordnet: domain information
[Figure: the vocabularies of languages (bank, violin, violist, string) linked to concept records (rec 12345: financial institute; rec 54321: river side; rec 9876: small string instrument; rec 65438: musician playing a violin; rec 42654: musician; rec 25876: string instrument; rec 35576: string of an instrument; rec 29551: underwear) connected by type-of and part-of relations, with the concepts linked to domains such as Music, Culture, Finance, Clothing, Sport, Ball sports and Winter sports]

53 How to harmonize wordnets?
Wordnets are unique, language-specific lexicalization patterns.
- Define universal sets of concepts that play a major role in many different wordnets: so-called Base Concepts.
- Define the base concepts in each language's wordnet:
  - high level in the hierarchy
  - many hyponyms
- Provide the closest equivalent in the English wordnet.
- Determine the intersection of the English equivalents.
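The procedure above can be sketched in two steps: select synsets that are high in a hierarchy and have many hyponyms, then intersect the English equivalents chosen by each language. In this Python sketch the hierarchy, the thresholds `max_depth` and `min_hyponyms`, and the per-language selections are all hypothetical; EuroWordNet's actual selection criteria and cut-offs are not given here.

```python
from collections import defaultdict


def base_concepts(parent, max_depth, min_hyponyms):
    """Synsets that sit high in the hierarchy and dominate many hyponyms.

    `parent` maps each synset to its hypernym (None for a top node).
    The two thresholds are free parameters, not EuroWordNet values.
    """
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)

    def depth(synset):
        d = 0
        while parent.get(synset) is not None:
            synset = parent[synset]
            d += 1
        return d

    def hyponym_count(synset):
        # All direct and indirect hyponyms.
        return sum(1 + hyponym_count(c) for c in children[synset])

    return {s for s in parent
            if depth(s) <= max_depth and hyponym_count(s) >= min_hyponyms}


def common_base_concepts(english_equivalents):
    """Intersect each language's base concepts after mapping them to English synsets."""
    if not english_equivalents:
        return set()
    return set.intersection(*english_equivalents.values())


# Toy hierarchy (English labels for readability).
hierarchy = {"entity": None, "object": "entity", "artifact": "object",
             "vehicle": "artifact", "car": "vehicle", "taxi": "car", "bus": "vehicle"}
print(sorted(base_concepts(hierarchy, max_depth=2, min_hyponyms=4)))
# ['artifact', 'entity', 'object']

# Hypothetical English equivalents of each language's base concepts.
selections = {"nl": {"object", "artifact", "animal"},
              "es": {"artifact", "animal", "plant"},
              "it": {"artifact", "animal", "food"}}
print(sorted(common_base_concepts(selections)))  # ['animal', 'artifact']
```

As the following slides show, the real intersections turned out small, so the thresholds and the translation step both strongly affect the result.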

54 Lexicalization patterns
[Figure: the hierarchy from slide 9 (25 unique beginners, basic level concepts, and more specific concepts such as canary, common canary, abbey, rose, crocodile), now marking the level of the 1024 base concepts]

55 Base Concept intersection (nouns / verbs)
- EN, NL, IT, ES: 24 / 6
- FR, DE, EE, CZ: 70 / 30
- All: 13 / 2
Examples: {cause#6; get#9; have#7; induce#2; make#12; stimulate#3}, {create#2; make#13}, {go#14; locomote#1; move#15; travel#4}, {be#4; have the quality of being#1}, {human#1; individual#1; mortal#1; person#1; someone#1; soul#1}, {animal#1; animate being#1; beast#1; brute#1; creature#1; fauna#1}, {flora#1; plant#1; plant life#1}, {matter#1; substance#1}, {food#1; nutrient#1}, {feeling#1}, {act#1; human action#1; human activity#1}

56 Explanations for the low intersection of Base Concepts
- The individual selections are not representative enough.
- There are major differences in the way meanings are classified, which affect the frequency of the relations.
- The translations of the selections into WordNet 1.5 synsets are not reliable.
- The resources cover very different vocabularies.

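57 Concepts selected by at least two languages: intersections of pairs
Nouns (NL / ES / IT / EN):
- NL: 1027 / 103 / 182 / 333
- ES: 103 / 523 / 45 / 284
- IT: 182 / 45 / 334 / 167
- EN: 333 / 284 / 167 / 1296
Verbs (NL / ES / IT / EN):
- NL: 323 / 36 / 42 / 86
- ES: 36 / 128 / 18 / 43
- IT: 42 / 18 / 104 / 39
- EN: 86 / 43 / 39 / 236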

58 Common Base Concepts
- Physical objects & substances: 491 nouns
- Processes and states: 272 nouns, 228 verbs (500 in total)
- Mental objects: 33 nouns
- Total: 796 nouns, 228 verbs (1024 in total)

59 Table 4: Number of Common BCs represented in the local wordnets (columns: related to CBCs, eq_synonym, eq_near, CBCs without direct equivalent): NL 99272526997; ES 10121009015; IT 8787591919
Table 5: BC4 gaps in at least two wordnets (10 synsets):
- body covering#1
- mental object#1; cognitive content#1; content#2
- body substance#1
- natural object#1
- social control#1
- place of business#1; business establishment#1
- change of magnitude#1
- plant organ#1
- contractile organ#1
- plant part#1
- psychological feature#1
- spatial property#1; spatiality#1

60 Table 6: Local senses with complex equivalence relations to CBCs (columns NL, ES, IT): Eq_has_hyperonym 61404; eq_has_hyponym 341420; Eq_has_holonym 20; Eq_has_meronym 32; Eq_involved 3; Eq_is_caused_by 3; Eq_is_state_of 1
Example of a complex relation:
- CBC: cause to feel unwell#1, verb
- Closest Dutch concept: {onwel#1}, adjective (sick)
- Equivalence relation: eq_is_caused_by

61 EuroWordNet data

62 From EuroWordNet to Global WordNet
Currently, wordnets exist for more than 50 languages, including Arabic, Bantu, Basque, Chinese, Bulgarian, Estonian, Hebrew, Icelandic, Japanese, Kannada, Korean, Latvian, Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, Zulu...
Many of these languages are genetically and typologically unrelated.

63 Global Wordnet Association
[Figure: languages with wordnets, including Danish, Norwegian, Swedish, Portuguese, Korean, Russian, Basque, Catalan, Thai, Arabic, Polish, Welsh, Chinese, 20 Indian languages, Brazilian Portuguese, Hebrew, Latvian, Persian, Kurdish, Avestan, Baluchi, Hungarian, English, German, Spanish, French, Italian, Dutch, Czech, Estonian, Romanian, Bulgarian, Turkish, Slovenian, Greek and Serbian, with the EuroWordNet and BalkaNet projects marked]

64 Some downsides of the EuroWordNet model
- Construction is not done uniformly.
- Coverage differs.
- Not all wordnets can communicate with one another.
- Proprietary rights restrict free access and usage.
- A lot of semantics is duplicated.
- Equivalence relations are complex and obscure because of linguistic differences between English and other languages.

65 Next step: Global WordNet Grid
[Figure: the words of each language linked to an Inter-Lingual Ontology (Object, Device, TransportDevice): English vehicle, car, train; Czech dopravní prostředek, auto, vlak; French véhicule, voiture, train; Estonian liiklusvahend, auto, killavoor; German Fahrzeug, Auto, Zug; Spanish vehículo, auto, tren; Italian veicolo, auto, treno; Dutch voertuig, auto, trein]

66 GWNG: main features
- Construct separate wordnets for each Grid language.
- Contributors from each language encode the same core set of concepts plus culture- and language-specific ones.
- Synsets (concepts) can be mapped cross-linguistically via an ontology.

67 The ontology: main features
- A formal ontology serves as the universal index of concepts.
- The list of concepts is not just based on the lexicon of a particular language (unlike in EuroWordNet) but uses ontological observations.
- The ontology contains only upper- and mid-level concepts.
- Concepts are related in a type hierarchy.
- Concepts are defined with axioms.

68 The ontology: main features
- In addition to high-level (primitive) concepts, the ontology needs to express low-level concepts lexicalized in the Grid languages.
- Additional concepts can be defined with expressions in Knowledge Interchange Format (KIF), based on first-order predicate calculus and atomic elements.

69 The ontology: main features
- Minimal set of concepts (reductionist view):
  - to express equivalence across languages
  - to support inferencing
- The ontology must be powerful enough to encode all concepts that are lexically expressed in any of the Grid languages.
- The ontology need not and cannot provide a linguistic encoding for all concepts found in the Grid languages:
  - lexicalization in a language is not sufficient to warrant inclusion in the ontology
  - lexicalization in all or many languages may be sufficient
- Ontological observations will be used to define the concepts in the ontology.

70 Ontological observations
Identity criteria as used in OntoClean (Guarino & Welty 2002):
- rigidity: to what extent are properties true for entities in all worlds? You are always a human, but you can be a student for a short while.
- essence: what properties are essential for an entity? Shape is essential for a statue but not for the clay it is made of.
- unicity: what represents a whole, and what entities are parts of these wholes? An ocean is a whole, but the water it contains is not.

71 Type-role distinction
Current WordNet treatment:
(1) a husky is a kind of dog (type)
(2) a husky is a kind of working dog (role)
What's wrong? (2) is defeasible, (1) is not:
*This husky is not a dog.
This husky is not a working dog.
Other roles: watchdog, sheepdog, herding dog, lapdog, etc.

72 Ontology and lexicon
Hierarchy of disjoint types: Canine: PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky
Lexicon:
- NAMES for TYPES: {poodle}EN, {poedel}NL, {pudoru}JP: (instance x Poodle)
- LABELS for ROLES: {watchdog}EN, {waakhond}NL, {banken}JP: ((instance x Canine) and (role x GuardingProcess))

73 Ontology and lexicon
Hierarchy of disjoint types: River; Clay; etc.
Lexicon:
- NAMES for TYPES: {river}EN, {rivier, stroom}NL: (instance x River)
- LABELS for dependent concepts:
  - {rivierwater}NL (water from a river => water is not a unit): ((instance x Water) and (instance y River) and (portion x y))
  - {kleibrok}NL (irregularly shaped piece of clay => non-essential): ((instance x Object) and (instance y Clay) and (portion x y) and (shape x Irregular))

74 Rigidity
- The primitive concepts represented in the ontology are rigid types.
- Entities with non-rigid properties will be represented with KIF statements.
- But: the ontology may include some universal, core concepts referring to roles, like father and mother.

75 Properties of the ontology
- Minimal: terms are distinguished by essential properties only.
- Comprehensive: includes all distinct concept types of all Grid languages.
- Allows definitions via KIF of all lexemes that express non-rigid, non-essential properties of types.
- Logically valid; allows inferencing.

76 Mapping Grid languages onto the ontology
- Explicit and precise equivalence relations among synsets in different languages:
  - the type hierarchy is minimal
  - subtle differences can be encoded in KIF expressions
- The Grid database contains wordnets with synsets that label either primitive types in the hierarchies, or words relating to these types in ways made explicit in KIF expressions.
- If two languages create the same KIF expression, this is a statement of equivalence!
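The last point, that identical KIF expressions state equivalence, can be illustrated with a simple Python sketch that treats a conjunctive definition as an unordered set of atoms. The watchdog/waakhond definitions follow the "Ontology and lexicon" slide and straathond follows the "Lexicalizations not mapped to WordNet" slide. This comparison is purely syntactic: it assumes both languages use the same variable names and does not detect logically equivalent but differently written expressions, which would require a reasoner.

```python
def kif(*atoms):
    """A conjunctive KIF-style definition as an order-insensitive set of atoms."""
    return frozenset(atoms)


# Definitions taken from the role-label and straathond examples in these slides.
DEFINITIONS = {
    ("en", "watchdog"): kif(("instance", "x", "Canine"), ("role", "x", "GuardingProcess")),
    ("nl", "waakhond"): kif(("role", "x", "GuardingProcess"), ("instance", "x", "Canine")),
    ("nl", "straathond"): kif(("instance", "x", "Canine"), ("habitat", "x", "Street")),
}


def equivalent(a, b):
    """Two lexicalizations are equivalent if they carry the same KIF expression."""
    return DEFINITIONS[a] == DEFINITIONS[b]


print(equivalent(("en", "watchdog"), ("nl", "waakhond")))    # True
print(equivalent(("en", "watchdog"), ("nl", "straathond")))  # False
```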

77 How to construct the GWNG
- Take an existing ontology as a starting point.
- Use English WordNet to maximize the number of disjoint types in the ontology.
- Link English WordNet synsets as names to the disjoint types.
- Provide KIF expressions for all other English words and synsets.
- Copy the relations to the ontology to other languages, including the KIF statements built for English.
- Revise the KIF statements to make the mapping more precise.
- Map all words and synsets that are not, and cannot be, mapped to English WordNet to the ontology:
  - propose extensions to the type hierarchy
  - create KIF expressions for all non-rigid concepts

78 Initial ontology: SUMO (Niles and Pease)
SUMO = Suggested Upper Merged Ontology
- consistent with good ontological practice
- fully mapped to WordNet(s): 1000 equivalence mappings, the rest through subsumption
- freely and publicly available
- allows data interoperability
- allows NLP
- allows reasoning/inferencing

79 SUMO
- 1,000 generic, abstract, high-level terms
- 4,000 definitional statements
- MILO (Mid-Level Ontology): closer to the lexicon and WordNet

80 Mapping Grid languages onto the ontology
- Check the existing SUMO mappings to Princeton WordNet and extend the ontology with rigid types for specific concepts.
- Extend it to many other WordNet synsets.
- Observe OntoClean principles! (Synsets referring to non-rigid, non-essential, non-unicitous concepts must be expressed in KIF.)

81 Lexicalizations not mapped to WordNet
Not added to the type hierarchy:
- {straathond}NL (a dog that lives in the streets): ((instance x Canine) and (habitat x Street))
Added to the type hierarchy:
- {klunen}NL (to walk on skates over land from one frozen body of water to the next): KluunProcess as a subtype of WalkProcess
- Axioms: (and (instance x Human) (instance y Walk) (instance z Skates) (wear x z) (instance s1 Skate) (instance s2 Skate) (before s1 y) (before y s2) etc.
Other examples: national dishes, customs, games, ...

82 Most mismatching concepts are not new types
They refer to sets of types in specific circumstances, or to concepts that are dependent on these types. Next to {rivierwater}NL there are many others:
- {theewater}NL (water used for making tea)
- {koffiewater}NL (water used for making coffee)
- {bluswater}NL (water used for extinguishing fire)
They relate to linguistic phenomena: gender, perspective, aspect, diminutives, politeness, pejoratives, part-of-speech constraints.

83 KIF expression for gender marking
- {teacher}EN: ((instance x Human) and (agent x TeachingProcess))
- {Lehrer}DE: ((instance x Man) and (agent x TeachingProcess))
- {Lehrerin}DE: ((instance x Woman) and (agent x TeachingProcess))

84 KIF expression for perspective
- sell: subj(x), direct obj(z), indirect obj(y)
- buy: subj(y), direct obj(z), indirect obj(x)
(and (instance x Human) (instance y Human) (instance z Entity) (instance e FinancialTransaction) (source x e) (destination y e) (patient e z))
The same process, but a different perspective through subject and object realization. Other examples: Russian has two verbs for marry; French apprendre can mean both teach and learn.

85 Aspectual variants
- Slavic languages: two members of a verb pair, for an ongoing event and a completed event.
- English: can mark perfectivity with particles, as in the phrasal verbs eat up and read through.
- Romance languages: mark aspect by verb conjugations on the same verb.
- Dutch: verbs with marked aspect can be created by prefixing a verb with door: doorademen, dooreten, doorfietsen, doorlezen, doorpraten (continue to breathe/eat/bike/read/talk).
These verbs are restrictions on phases of the same process. This does NOT warrant extending the ontology with separate processes for each aspectual variant.

86 Kinship relations in Arabic
- عَم (Eam~): father's brother, paternal uncle
- خَال (xaAl): mother's brother, maternal uncle
- عَمَّة (Eam~ap): father's sister, paternal aunt
- خَالَة (xaAlap): mother's sister, maternal aunt

87 Kinship relations in Arabic
- شَقِيقَة ($aqiyqap): full sister, a sister on both the paternal and the maternal side (as distinct from أُخْت (>uxot): 'sister', which may refer to a sister from the paternal side, the maternal side, or both)
- ثَكْلان (vakolAna): father bereaved of a child (as opposed to يَتِيم (yatiym), or يَتِيمَة (yatiymap) for the feminine: 'orphan', a person whose father or mother, or both, have died)
- ثَكْلَى (vakolaYa): mother bereaved of a child (as opposed to يَتِيم, or يَتِيمَة for the feminine: 'orphan', a person whose father or mother, or both, have died)

88 Complex kinship concepts
father's brother, paternal uncle
- WORDNET: paternal uncle => uncle => brother of ... ????
- ONTOLOGY: (=> (paternalUncle ?P ?UNC) (exists (?F) (and (father ?P ?F) (brother ?F ?UNC))))
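The axiom defines a kinship term by a chain of two basic relations, something a WordNet hierarchy cannot express. The Python sketch below reads the axiom as a definition (both directions) so that uncles can be computed from facts; the slide itself only states the implication from paternalUncle to the chain. The family facts are hypothetical, the argument order follows the slide ((father ?P ?F): F is the father of P; (brother ?F ?UNC): UNC is a brother of F), and the maternal variant (xaAl) is our own analogous rule.

```python
# Hypothetical facts, in the argument order used by the slide's axiom.
FATHER = {("Amal", "Karim")}    # (father Amal Karim): Karim is Amal's father
MOTHER = {("Amal", "Layla")}    # (mother Amal Layla): Layla is Amal's mother
BROTHER = {("Karim", "Samir"), ("Layla", "Omar")}


def uncles_via(person, parent_facts):
    """U such that some parent X of `person` (per parent_facts) has brother U."""
    parents = {x for (p, x) in parent_facts if p == person}
    return {u for (x, u) in BROTHER if x in parents}


def paternal_uncles(person):
    # paternalUncle(P, U) <=> exists F: father(P, F) and brother(F, U)
    return uncles_via(person, FATHER)


def maternal_uncles(person):
    # Analogous rule with mother instead of father (assumed, cf. xaAl).
    return uncles_via(person, MOTHER)


print(paternal_uncles("Amal"))  # {'Samir'}
print(maternal_uncles("Amal"))  # {'Omar'}
```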

89 Universality as evidence
- The English verb cut abstracts from the precise process, but there are troponyms that implicate the manner: snip and clip imply scissors; chop and hack imply a large knife or an axe.
- In Dutch there is no general verb, only specific verbs: knippen (clip, snip: cut with scissors or a scissor-like tool), snijden (cut with a knife or knife-like tool), hakken (chop, hack: cut with an axe or a similar tool).
- If lexicalization of the specific process is more universal, this can be seen as evidence that the specific processes, and not the generic verb, should be listed in the ontology.

90 Open questions/challenges
- What is a word, i.e., a lexical unit? What is the status of complex lexemes like English lightning rod, word of mouth, find out, kick the bucket?
- What is a semantic unit, i.e., a concept?

91 Open questions/challenges
- Is there a core inventory of concepts that are universally encoded? If so, what are these concepts?
- How can cross-linguistic equivalence be verified?
- Is there systematicity to the language-specific extensions? What are the lexicalization patterns of individual languages?
- Are lexical gaps accidental or systematic?

92 Coverage: what belongs in a universal lexical database?
- Formal, linguistic criteria for inclusion
- Informal, cultural criteria
Both are difficult to define and apply!

93 Advantages of the Global Wordnet Grid
- Shared and uniform world knowledge:
  - universal inferencing
  - uniform text analysis and interpretation
- More compact and less redundant databases
- A clearer notion of how languages map to the knowledge:
  - better criteria for expressing knowledge
  - better criteria for understanding variation

94 Expansion from a type to roles
[Figure: dog expanded with pure hyponymy relations to types (poodle; dachshund with short-haired and long-haired dachshund) and to roles such as watchdog, street dog, lapdog, hunting dog, puppy and bitch]

95 Expansion from a role to types and other roles
[Figure: the same network (dog, poodle, dachshund, short-haired and long-haired dachshund, watchdog, street dog, lapdog, hunting dog, puppy, bitch), expanded starting from a role]

96 Automotive ontology: (

97 Who uses ontologies?

98
