The Global Wordnet Grid: anchoring languages to universal meaning

1 The Global Wordnet Grid: anchoring languages to universal meaning
Piek Vossen, Irion Technologies / Vrije Universiteit Amsterdam
6th International Plain Language Conference, October 11-14th, 2007, Amsterdam

2 Overview
Problem: effective language and communication
  From human to human
  From human to machine
  From machine to machine
  From human to machine and back to human, maybe via other machines...
Solution: anchoring language to universal meaning
  Wordnets: network of words related through meaning
  The Global Wordnet Grid: wordnets for languages connected to each other through an ontology
Future:
  Equal access to the knowledge and information on the Internet to all people, regardless of language and background
  Systems that start to understand language
6th International PLAIN language Conference 11-14th October, Amsterdam

3 Problem

4 Language is inherently vague and ambiguous
Communication through language mediates between the expectations of the Speaker and the Hearer => half a word is enough
Language is not fully descriptive but minimally sufficient:
  Do not bother the Hearer with information that is already known => rely on background knowledge
  Use a minimal set of words and expressions to avoid memory overload => words and expressions have multiple meanings

5 Understanding is fundamentally impossible
[Illustration: what "gavagai" may evoke as the concept in our head]
  rabbit with carrots and rosemary
  sweet pet, wanna hug
  divine appearance announcing spring
Plato with beard
W.V.O. Quine (1964): inscrutability of reference

6 Full understanding is fundamentally impossible BUT?
People do communicate...
People even communicate with computers...
As long as language is effective: meaning = to have the desired effect!
Link language to useful content!

7 What is effective computer-mediated language?
Computers store information and knowledge in textual form:
  People search information and knowledge by 'querying' computers
  Effective Computer Mediated Communication (CMC) = find what you need and nothing else
Computers analyze information and knowledge:
  Collect data and send alerts, reports and facts
Computers connect people:
  Support communication across people by analyzing communication or translating languages

8 [Diagram: Information Seeker, Information Provider and the Index of Strings]
Information Provider: Concept > Expression in language (words) > Strings > Index of Strings (ape ... energy, mass ... zebra)
Information Seeker: Concept > Expression in language (words) > Strings > Query
The Query is matched against the Index of Strings to return Information

9 Conceptual match, linguistic mismatch
Information Seeker (Query): "my cell phone"
Information Provider: "mobile"
Index of Strings: ape ... mobile ... zebra

10 Conceptual mismatch, linguistic match
Information Seeker (Query): "my cell phone"
Information Provider: "nerve cells"
Index of Strings: ape ... cell ... zebra

11 Conceptual mismatch, linguistic match
Information Seeker (Query): "police cell"
Information Provider: "nerve cells"
Index of Strings: ape ... cell ... zebra

12 Conceptual match, linguistic mismatch
Information Seeker (Query): "neuron"
Information Provider: "nerve cells"
Index of Strings: ape ... cell ... zebra
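
The four seeker/provider cases on slides 9 to 12 can be replayed with a toy matcher. This is only a sketch: the `CONCEPTS` table, the concept labels and the crude plural stripping are assumptions made for illustration, not part of the talk or of any real wordnet.

```python
# Toy illustration of linguistic vs. conceptual matching.
# CONCEPTS is a hypothetical hand-made table of phrase -> concept label.
CONCEPTS = {
    "my cell phone": "mobile_phone",
    "mobile": "mobile_phone",
    "nerve cells": "neuron",
    "neuron": "neuron",
    "police cell": "jail",
}

STOP_WORDS = {"my"}


def tokens(text):
    """Lowercase, split on whitespace, strip a plural 's' (very crude)."""
    words = text.lower().split()
    stripped = {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}
    return stripped - STOP_WORDS


def linguistic_match(query, document):
    """True if query and document share at least one string."""
    return bool(tokens(query) & tokens(document))


def conceptual_match(query, document):
    """True if both phrases are anchored to the same concept."""
    return CONCEPTS[query] == CONCEPTS[document]


CASES = [
    ("my cell phone", "mobile"),       # slide 9
    ("my cell phone", "nerve cells"),  # slide 10
    ("police cell", "nerve cells"),    # slide 11
    ("neuron", "nerve cells"),         # slide 12
]

for query, document in CASES:
    print(query, "|", document,
          "| linguistic:", linguistic_match(query, document),
          "| conceptual:", conceptual_match(query, document))
```

A string index can only ever compute `linguistic_match`; the argument of the talk is that retrieval should approximate `conceptual_match` instead.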

13 Recall & Precision
Search engine for a database with all documents: "nerve cell", "police cell", "cell phone", "mobile phones"
Query: "cell"
[Diagram: the set of found documents, the set of relevant documents, and their intersection]
recall = intersection / relevant
precision = intersection / found
Recall < 20% for basic search engines! (Blair & Maron 1985)
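
The two ratios can be checked on the four example documents of this slide. A minimal sketch: the assumption that the user who typed "cell" is looking for phones (so that only the two phone documents are relevant) is made up for illustration, as is the whole-word matching rule.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for a set of found and a set of relevant documents."""
    found, relevant = set(found), set(relevant)
    intersection = found & relevant
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision


documents = ["nerve cell", "police cell", "cell phone", "mobile phones"]

# Assumption: the user who types "cell" wants documents about phones.
relevant = {"cell phone", "mobile phones"}

# A basic string search returns every document containing the word "cell".
found = {doc for doc in documents if "cell" in doc.split()}

recall, precision = recall_precision(found, relevant)
print("found:", sorted(found))
print("recall:", recall, "precision:", round(precision, 2))
```

Under these assumptions the string search misses "mobile phones" (lower recall) and returns two unwanted documents (lower precision).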

14 Useless dialogues with Alice-bot

15 It is useful to anchor meaning!
Anchoring already takes place all over the world through standardization:
  measures and units: meter, liter, kilo
  terminological databases, legal definitions, contracts
  international cooperation
  ontologies: definition of the meaning of concepts in a formal knowledge representation system (first-order logic), so that a computer can reason with it

16 Solution

17 How can we anchor the meaning of words?
We can anchor words to each other: semantic network or wordnet
We can anchor words to logical implications: a formal ontology

18 Relational model of meaning
[Diagram: words linked to each other by meaning relations: animal, cat, kitten, dog, puppy; man, woman, boy, girl]

19 Princeton WordNet
Developed by George Miller and his team at Princeton University, as the implementation of a mental model of the lexicon
Organized around the notion of a synset: a set of synonyms in a language that represent a single concept
Semantic relations between concepts
Covers over 100,000 concepts and over 120,000 English words

20 Wordnet: a network of semantically related words
[Diagram: synsets linked by hyponymy and part-whole relations]
Hyponymy chain:
  {conveyance; transport}
    {vehicle}
      {motor vehicle; automotive vehicle}
        {car; auto; automobile; machine; motorcar}
          {cruiser; squad car; patrol car; police car; prowl car}
          {cab; taxi; hack; taxicab}
Part synsets shown in the diagram: {bumper}, {car door}, {car window}, {car mirror}, {armrest}, {doorlock}, {hinge; flexible joint}
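
A synset network of this kind can be modeled with very little code. The sketch below uses a few of the synsets named on this slide; the `Synset` class and the helper names are hypothetical and do not reflect the data format of Princeton WordNet.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonyms that together stand for one concept."""
    words: frozenset
    hypernyms: list = field(default_factory=list)  # more general synsets


def synset(*words, hypernyms=()):
    """Convenience constructor (hypothetical helper)."""
    return Synset(frozenset(words), list(hypernyms))


vehicle = synset("vehicle")
motor_vehicle = synset("motor vehicle", "automotive vehicle", hypernyms=[vehicle])
car = synset("car", "auto", "automobile", "machine", "motorcar",
             hypernyms=[motor_vehicle])
taxi = synset("cab", "taxi", "hack", "taxicab", hypernyms=[car])
ALL_SYNSETS = [vehicle, motor_vehicle, car, taxi]


def is_a(node, ancestor):
    """Follow hypernym links upward to test whether node is a kind of ancestor."""
    if node is ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in node.hypernyms)


def synonyms(word, synsets):
    """All other words that share a synset with the given word."""
    result = set()
    for s in synsets:
        if word in s.words:
            result |= s.words - {word}
    return result


print(is_a(taxi, vehicle))
print(sorted(synonyms("taxi", ALL_SYNSETS)))
```

Meaning here is purely relational: a word is characterized by the synsets it belongs to and by the links those synsets have to other synsets.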

21 Wordnet: a network of semantically related words
[Diagram: synsets linked by ISA, STATE and role relations (ρ-AGENT, ρ-PATIENT, ρ-CAUSE, ρ-PROCEDURE, ρ-LOCATION, co-ρ-AGENT-PATIENT)]
Nodes: patient (ISA: chronic patient; mental patient); cure; treat; doctor (ISA: child doctor); disease; disorder (ISA: stomach disease, kidney disorder); physiotherapy, medicine, etc.; hospital, etc.; child

22 Wordnet family
Princeton WordNet (Fellbaum 1998): 115,000 concepts
EuroWordNet (Vossen 1998): 8 languages
BalkaNet (Tufis 2004): 6 languages
Global Wordnet Association: all languages
[Diagram: the wordnet of each language links to an Inter-Lingual-Index of English concepts (Vehicle, Car, Train), which links to Domains (Transport: Road, Air, Water) and to ontologies (DOLCE, SUMO: Object, Device, TransportDevice)]
Czech Words: dopravní prostředník, auto, vlak
French Words: véhicule, voiture, train
Estonian Words: liiklusvahend, killavoor
German Words: Fahrzeug, Auto, Zug
Spanish Words: vehículo, auto, tren
Italian Words: veicolo, treno
Dutch Words: voertuig, trein
English Words: vehicle, car, train

23 Wordnets as autonomous language-specific structures
[Diagram: the same objects are organized differently in two wordnets]
Nodes in the English hierarchy: object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; block; body; container; device; implement; tool; instrument; bag; spoon; box
Nodes in the Dutch Wordnet: voorwerp {object}; blok {block}; werktuig {tool}; lichaam {body}; bak {box}; lepel {spoon}; tas {bag}

24 Complex equivalence relations
1. Multiple Targets (1:many)
  Dutch wordnet: schoonmaken (to clean) matches with 4 senses of clean in WordNet1.5:
    make clean by removing dirt, filth, or unwanted substances from
    remove unwanted substances from, such as feathers or pits, as of chickens or fruit
    remove in making clean; "Clean the spots off the rug"
    remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
  Dutch wordnet: versiersel near_synonym versiering
  Target record: decoration
3. Multiple Targets and Sources (many:many)
  Dutch wordnet: toestel near_synonym apparaat
  Target records: machine; device; apparatus; tool

25 Complex equivalence relations
Gaps in the English WordNet:
  genuine, cultural gaps: the concept is unknown in English culture:
    Dutch: klunen, to walk on skates over land from one frozen water to the other
  pragmatic gaps: the concept is known but is not expressed by a single lexicalized form in English:
    Dutch: kunstproduct = artifact substance <=> artifact object

26 From EuroWordNet to Global WordNet
Global Wordnet Association:
  Conference every two years: India (2002), Czech Republic (2004), Korea (2006), Hungary (2008), ....
  Currently, wordnets exist for more than 40 languages, including: Arabic, Bantu, Basque, ...., Chinese, Bulgarian, Estonian, Hebrew, ...., Icelandic, Japanese, Kannada, Korean, Latvian, Latin, .... Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, .... Zulu
  Many languages are genetically and typologically unrelated

27 Some downsides
Construction is not done uniformly
Coverage differs
Not all Wordnets can communicate with one another:
  not linked
  linked to different versions: 1.5, 1.6, 1.7, 2.0 and now 3.0, 3.1
  linked with different relations
Proprietary rights restrict free access and usage
A lot of the semantics is duplicated
Complex and obscure equivalence relations due to linguistic differences between English and other languages

28 Next step: Global WordNet Grid
[Diagram: the wordnets of all languages, English included, link to a central Inter-Lingual Ontology with the concepts Object, Device and TransportDevice]
German Words: Fahrzeug, Auto, Zug
English Words: vehicle, car, train
Dutch Words: voertuig, auto, trein
Estonian Words: liiklusvahend, auto, killavoor
Spanish Words: vehículo, auto, tren
Italian Words: veicolo, auto, treno
French Words: véhicule, voiture, train
Czech Words: dopravní prostředník, auto, vlak
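
The practical effect of a shared index can be sketched as follows: once each wordnet anchors its words to the same concept identifiers, any language pair can be mapped without a bilingual dictionary. The identifiers `Vehicle`, `Car` and `Train` and the `translate` helper are illustrative assumptions; the diagram itself does not specify which word links to which ontology concept.

```python
# Each wordnet maps its own words to shared concept identifiers.
# The identifiers below are assumed for illustration only.
WORDNETS = {
    "en": {"vehicle": "Vehicle", "car": "Car", "train": "Train"},
    "nl": {"voertuig": "Vehicle", "auto": "Car", "trein": "Train"},
    "de": {"Fahrzeug": "Vehicle", "Auto": "Car", "Zug": "Train"},
    "fr": {"véhicule": "Vehicle", "voiture": "Car", "train": "Train"},
    "cs": {"dopravní prostředník": "Vehicle", "auto": "Car", "vlak": "Train"},
}


def translate(word, source, target):
    """Map a word to the target language through the shared concept."""
    concept = WORDNETS[source].get(word)
    if concept is None:
        return []  # the word is not anchored in the source wordnet
    return sorted(w for w, c in WORDNETS[target].items() if c == concept)


print(translate("voiture", "fr", "de"))
print(translate("trein", "nl", "cs"))
```

With N languages this needs N mappings to the shared index rather than a separate mapping for every language pair.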

29 The Ontology: main features
Formal, artificial ontology serves as universal index of concepts
List of concepts is not just based on the lexicon of a particular language (unlike in EuroWordNet) but uses ontological observations:
  Lexicalization in a language is not sufficient to warrant inclusion in the ontology
  Lexicalization in all or many languages may be sufficient
  Ontological observations will be used to define the concepts in the ontology
Concepts are related in a type hierarchy
Concepts are defined with axioms: Knowledge Interchange Format (KIF) based on first order predicate calculus and atomic elements

30 Concepts by ontological observations
Types and Roles among the hyponyms of dog in Wordnet:
  husky, lapdog; toy dog; hunting dog; working dog; dalmatian, coach dog, carriage dog; basenji; pug, pug-dog; Leonberg; Newfoundland; Great Pyrenees; spitz; griffon, Brussels griffon, Belgian griffon; corgi, Welsh corgi; poodle, poodle dog; Mexican hairless; pooch, doggie, doggy, barker, bow-wow; cur, mongrel, mutt
Current WordNet treatment:
  (1) a husky is a kind of dog
  (2) a husky is a kind of working dog
What's wrong? (2) is defeasible, (1) is not:
  *This husky is not a dog => RIGID TYPE
  This husky is not a working dog => ROLE, NON-RIGID

31 Ontology versus wordnet
Hierarchy of disjoint types:
  Canine → PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky
Wordnet:
  NAMES for TYPES: {poodle}EN, {poedel}NL, {pudoru}JP
    (instance x Poodle)
  LABELS for ROLES: {watchdog}EN, {waakhond}NL, {banken}JP
    ((instance x Canine) and (role x GuardingProcess))
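
The split between names for types and labels for roles can be illustrated in code. This is a sketch under assumptions: the `SUPERTYPE` table, the `Individual` class and the lambda-based `LEXICON` are invented for this example and only mimic the two KIF-style definitions on the slide.

```python
from dataclasses import dataclass, field

# Rigid types form a hierarchy; these entries are assumed for illustration.
SUPERTYPE = {"Poodle": "Canine", "Husky": "Canine", "Canine": "Animal"}


@dataclass
class Individual:
    type: str                                # rigid: never changes
    roles: set = field(default_factory=set)  # non-rigid: may come and go


def instance_of(x, type_name):
    """True if x's type equals type_name or is one of its subtypes."""
    t = x.type
    while t is not None:
        if t == type_name:
            return True
        t = SUPERTYPE.get(t)
    return False


def guards(x):
    """Mimics ((instance x Canine) and (role x GuardingProcess))."""
    return instance_of(x, "Canine") and "GuardingProcess" in x.roles


# Words in different languages point to the same definition.
LEXICON = {
    ("poodle", "EN"): lambda x: instance_of(x, "Poodle"),
    ("poedel", "NL"): lambda x: instance_of(x, "Poodle"),
    ("watchdog", "EN"): guards,
    ("waakhond", "NL"): guards,
}

rex = Individual("Husky", {"GuardingProcess"})
print(LEXICON[("watchdog", "EN")](rex))  # rex currently guards something

rex.roles.discard("GuardingProcess")     # the role ends ...
print(LEXICON[("watchdog", "EN")](rex))
print(instance_of(rex, "Canine"))        # ... but the type remains
```

Only `Poodle` and `Husky` need entries in the type hierarchy; "watchdog" and "waakhond" are defined by an expression over existing types and roles.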

32 Properties of the Ontology
Minimal: terms are distinguished by essential properties only
Comprehensive: includes all distinct concept types of all Grid languages
Allows definitions via KIF of all words that express non-rigid, non-essential properties of types
Logically valid, allows inferencing

33 Ontology versus Wordnet
Not added to the type hierarchy:
  {straathond}NL (a dog that lives in the streets)
  ((instance x Canine) and (habitat x Street))
Added to the type hierarchy:
  {klunen}NL (to walk on skates from one frozen body of water to the next over land)
  KluunProcess => WalkProcess
  Axioms: (and (instance x Human) (instance y Walk) (instance z Skates) (wear x z) (instance s1 Skate) (instance s2 Skate) (before s1 y) (before y s2) etc…
  National dishes, customs, games, ....

34 Ontology versus Wordnet
Words that refer to sets of types in specific circumstances, or to concepts that are dependent on these types. Next to {rivierwater}NL there are many others:
  {theewater}NL (water used for making tea)
  {koffiewater}NL (water used for making coffee)
  {bluswater}NL (water used for extinguishing fire)
Words that relate to linguistic phenomena: gender, perspective, aspect, diminutives, politeness, pejoratives, part-of-speech constraints

35 KIF expression for gender marking
{teacher}EN ((instance x Human) and (agent x TeachingProcess))
{Lehrer}DE ((instance x Man) and (agent x TeachingProcess))
{Lehrerin}DE ((instance x Woman) and (agent x TeachingProcess))

36 KIF expression for perspective
sell: subj(x), direct obj(z), indirect obj(y)
buy: subj(y), direct obj(z), indirect obj(x)
FinancialTransaction:
  (and (instance x Human) (instance y Human) (instance z Entity) (instance e FinancialTransaction) (source x e) (destination y e) (patient z e))
The same process but a different perspective through subject and object realization: "marry" corresponds to two verbs in Russian; apprendre in French can mean both teach and learn

37 Advantages of the Global Wordnet Grid
Shared and uniform world knowledge:
  universal inferencing
  uniform text analysis and interpretation
More compact and less redundant databases
A clearer notion of how languages map to the knowledge:
  better criteria for expressing knowledge
  better criteria for understanding variation

38 Future

39 Language technology: a hole in one!
[Diagram: the expressions "Tiger Woods", "golf club(s)", "clubs for golf", "Golf at the club" and "golf sticks" (thesaurus) are related to "golf clubs" through linguistic analysis, synonyms and a semantic network]

40 Index concepts rather than words
Meaning of a word in context:
  Domain of the document: Juventus => football
  Topic of the paragraph: transfer scandal => business, crime
  Phrase: linguistically-motivated combination of words: [wing player] => football; player in [police cell] => jail
Topic of the query: Can I order chicken wings? => food
  Phrase: [chicken wings] => dish

41 Expansion with clear hyponymy
[Diagram: hyponyms of dog: hunting dog, puppy, dachshund (short hair dachshund, long hair dachshund), lapdog, poodle, bitch, street dog, watchdog]
Expansion from a type to roles

42 Expansion with clear hyponymy
[Diagram: hyponyms of dog: hunting dog, puppy, dachshund (short hair dachshund, long hair dachshund), lapdog, poodle, bitch, street dog, watchdog]
Expansion from a role to types and other roles
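
One way to read these two slides is that query expansion should know which hyponym links lead to rigid types and which lead to roles. The sketch below is an assumption-laden illustration: the type/role split follows the earlier slides on types and roles, "puppy" and "bitch" are left out because the talk does not classify them, and `HYPONYMS` and `expand` are hypothetical names.

```python
# Hyponym links tagged as "type" (rigid subtype) or "role" (non-rigid label).
# The tagging is an assumption based on the type/role discussion in this talk.
HYPONYMS = {
    "dog": [
        ("dachshund", "type"),
        ("poodle", "type"),
        ("hunting dog", "role"),
        ("lapdog", "role"),
        ("street dog", "role"),
        ("watchdog", "role"),
    ],
    "dachshund": [
        ("short hair dachshund", "type"),
        ("long hair dachshund", "type"),
    ],
}


def expand(term, kinds=("type",)):
    """Return the term plus all hyponyms reachable over links of the given kinds."""
    result = [term]
    for child, kind in HYPONYMS.get(term, []):
        if kind in kinds:
            result.extend(expand(child, kinds))
    return result


print(expand("dog"))                    # types only
print(expand("dog", ("type", "role")))  # types and roles
```

Expanding over type links alone is the safe choice, since every dachshund is a dog; role links are defeasible, so a search system may want to treat them with lower confidence.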

43 [Diagram: Thought, Objects in reality, Ontology, Expression]
Expression: 携帯電話 (keitaidenwa)
Texts: Knowledge & information
Useful and effective behavior:
  reason over knowledge
  collect information and data
  deliver services and be helpful

44 Automotive ontology: (http://www.ontoprise.de)

45 Who uses ontologies?

46 Make word meanings effective!
Irion Technologies makes smart language technology solutions:
  Knowledge mining: automatic extraction of knowledge from text
  Cooperative dialogue systems:
    Access to information and services:
      regardless of choice of words
      regardless of the structuring of the information
      possibly using a given structuring
    Cooperates with the user:
      Ask the user for help, instructions, confirmation and explanations

47 [Diagram: from Strings to Phrases to Concepts to Facts]
Input: Text (example: cell phone accessories repair)
Processing steps: Domain Classifier, Grammar and Parsing, Concept Detection, Fact Extraction
Resources: Semantic Network (Words, Concepts), Ontology (Concepts, Relations)
Example domain: Telecommunication
Example facts: support, Model, Price, In stock, Docs

48 Dialogue system
[Diagram: components of the dialogue system: Question Analysis (Word: mobile, head phone; Concept), Topic detection (repair, information, accessories, products), Search Engine, Text Analysis, Website, Dialogue Manager, User Model (Intention, Satisfaction, Emotion), Information State (Positive and Negative Relations)]
Example dialogue:
  System: Can I help you?
  User: My head phone is broken.
  System: Would you like repair or products?
  User: I want to buy a new one.
  System: Can you say more about products?
  User: It is for my cell phone.
  System: Can you give more details?
  User: It is a Nokia 6110
  System: I got the following accessories for you. Please have a look.
  User: That is not what I want!

49 Communicative dialog system
Dialogue system that cooperates with user:
  Detect intention: complaint, buy, support, information
  Measure satisfaction: happy, emotions
  Create more context than simple key words and deliver more precision: answers instead of hits.

50 Communicative dialog system
Prevent deadlocks:
  Detects vagueness and ambiguity (what meaning of cell?)
  Detects topic changes
  Uses negative feedback: "No jails, I want cell phones!"
  Can handle out-of-domain questions (users do not know what the system knows):
    "We do not have hotel rooms but we do have electronic equipment".
    "No, we do not have portophones but we do have other electronic equipment such as cell phones"
[Diagram: concept hierarchy: space > room > hotel room; object > equipment > cell phone, portophone]
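
The two out-of-domain answers quoted on this slide can be approximated by walking up a concept hierarchy until something the system does know is found. A minimal sketch, assuming a hypothetical one-item catalog `IN_STOCK` and a `HYPERNYM` table copied from the small hierarchy on the slide; it is not a description of the actual Irion system.

```python
# Hypernym links taken from the small hierarchy shown on the slide.
HYPERNYM = {
    "hotel room": "room",
    "room": "space",
    "cell phone": "equipment",
    "portophone": "equipment",
    "equipment": "object",
}

# Hypothetical catalog; every item is assumed to have a hypernym.
IN_STOCK = {"cell phone"}


def ancestors(concept):
    """Return the chain of hypernyms from the nearest to the most general."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def respond(request):
    """Answer a request, falling back on related or general in-domain concepts."""
    if request in IN_STOCK:
        return f"Yes, we have {request}s."
    # Look for in-stock items that share an ancestor with the request.
    for parent in ancestors(request):
        related = sorted(i for i in IN_STOCK if parent in ancestors(i))
        if related:
            return (f"No, we do not have {request}s but we do have other "
                    f"{parent} such as {related[0]}s.")
    # Nothing related: tell the user what the system does cover.
    general = sorted({ancestors(i)[0] for i in IN_STOCK})
    return f"We do not have {request}s but we do have {general[0]}."


print(respond("portophone"))
print(respond("hotel room"))
```

The hierarchy lets the system say something useful instead of returning zero hits, which is how it avoids the deadlock described above.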

51 THANK YOU FOR YOUR ATTENTION

