Division of semantic labor in the Global WordNet Grid. Piek Vossen, VU University Amsterdam; German Rigau, University of the Basque Country. 5th Global Wordnet Conference.

1 Division of semantic labor in the Global WordNet Grid. Piek Vossen, VU University Amsterdam; German Rigau, University of the Basque Country. 5th Global Wordnet Conference, Mumbai, India, Jan 30 – Feb 5, 2010

2 Overview
– KYOTO as a domain implementation of the Global Wordnet Grid
– Scope of knowledge integration
– Division of linguistic labor
– How to integrate resources?
– How to make inferences?

3 KYOTO – some statistics
– European-Asian project, March 2008 – March …
– 7 countries (The Netherlands, Italy, Germany, Spain, Taiwan, Japan, Czech Republic)
– 12 sites:
  – Universities & research institutes: VUA, CNR-ILC, CNR-IIT, BBAW, EHU, AS, NICT, Masaryk
  – Companies: Synthema, Irion
  – User organizations: ECNC, WWF
– 7 languages (English, Italian, Japanese, Dutch, Spanish, Basque, Chinese)

4 KYOTO – Overall architecture Overview of the KYOTO process

5 GWC2010, Mumbai. Applying ontology mappings

6 Global Wordnet Grid
[Figure labels: Domain Ontology; DOLCE/SUMO; OntoWordnet; Base concepts; Wn; Domain V]

7 Available repositories in KYOTO
– Environment domain term database: 500,000 terms per 1,000 documents per language
– Open data project:
  – DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films and 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples)
  – GeoNames: 8 million geographical names, consisting of 6.5 million unique features (of which 2.2 million are populated places) and 1.8 million alternate names
– Domain thesauri and taxonomies: Species 2000, 2.1 million species
– Wordnets for 7 languages: about 50,000 to 120,000 synsets per language
– Ontologies: SUMO, DOLCE, SIMPLE

8 Kyoto Knowledge Base
[Figure: the Global Wordnet Grid (domain ontology, DOLCE/SUMO, OntoWordnet, base concepts, wordnets) combined with DBPedia, domain terms (500K) and species (2,100K)]

9 Species in the ontology: this implies storing the 2.1 million species twice in the ontology.

10 Should all knowledge be stored in the central ontology?
– Vocabularies are too large for full inferencing with current reasoners
– Vocabularies are linguistically too diverse to be represented in an ontology
– The inferencing capabilities of formal ontologies are not needed for all levels of knowledge

11 Modeling knowledge in a domain
Knowledge needs to be divided over different lexical and ontological layers:
– Precisely define the relations between the lexical and ontological layers
– Precisely define the inferencing based on the distributed knowledge layers

12 Division of linguistic labor principle
Putnam 1975:
– There is no need to know all the necessary and sufficient properties to determine if something is "gold"
– Assume that there is a way to determine these properties and that domain experts know how to recognize instances of these concepts
– Speakers can still use the word "gold" and communicate useful information

13 Division of semantic labor principle
Digital version of Putnam (1975):
– The computer does not need to have all the necessary and sufficient properties to determine if something is a "European tree frog"
– The computer assumes that there is a way to determine this and that domain experts (people) know how to recognize instances of these concepts
– Computers can still reason with semantics and do useful things with textual data

14 What does the computer need to know?
Distinction between rigid and non-rigid concepts (Welty & Guarino 2002):
– being a "cat" is essential to an individual's existence and therefore rigid
– being a "pet" is a temporary role and therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist
– Felix is born as a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while he continues to exist as a cat
All 2.1 million species are rigid concepts
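The rigid/non-rigid distinction can be pictured as a toy Python data model. This is a minimal sketch that assumes an individual has exactly one rigid type and any number of roles. The `Individual` class and its methods are hypothetical and are not part of KYOTO.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Individual:
    """Hypothetical toy individual: one rigid type plus a changeable set of roles."""
    name: str
    rigid_type: str                          # e.g. "cat": fixed for the individual's existence
    roles: set = field(default_factory=set)  # e.g. "pet": can be taken on and given up

    def take_role(self, role: str) -> None:
        self.roles.add(role)      # mutates the set in place; the frozen fields stay put

    def leave_role(self, role: str) -> None:
        self.roles.discard(role)  # leaving a role never affects the rigid type

    def is_a(self, concept: str) -> bool:
        return concept == self.rigid_type or concept in self.roles


felix = Individual("Felix", "cat")
felix.take_role("pet")
print(felix.is_a("pet"), felix.is_a("cat"))  # True True
felix.leave_role("pet")
print(felix.is_a("pet"), felix.is_a("cat"))  # False True

try:
    felix.rigid_type = "dog"  # frozen=True blocks reassignment of the rigid type
except FrozenInstanceError:
    print("rigid type cannot be reassigned")
```

Using `frozen=True` is one way to make rigidity structural. Roles live in a mutable set, so they can change without touching the type.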

15 What does the computer need to know?
Roles and processes in documents carry more information than the defining properties of species:
– Species are defined in terms of physical properties that are already known to experts
– Roles such as "invasive species", "migration species" and "threatened species" express THE important properties of instances of species
Roles, not species, are typically the terms we learn from the text!

16 Wordnet-ontology relations
Rigid synset relations to the ontology:
– Synset:Endurant(Object); Synset:Perdurant(Event); Synset:Quality
– sc_equivalenceOf (= relation in WN-SUMO) or sc_subclassOf (+ relation in WN-SUMO)
Non-rigid synset relations to the ontology:
– Synset:Role; Synset:Endurant(Object); Synset:Perdurant(Event)
– sc_domainOf: range of ontology types that restricts a role
– sc_playRole: role that is being played
– sc_participantOf: the process in which the role is played
Rigidity can be detected automatically (Rudify, 80% precision, IAG 80%) and is stored in wordnets as attributes of synsets
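The allowed relations depend on rigidity, so a wordnet-to-ontology mapping can be checked mechanically. The sketch below is hypothetical: the `OntologyMapping` class and the `invalid_mappings` helper are invented for illustration. It encodes only the grouping listed on this slide. Rigid synsets use `sc_equivalenceOf` or `sc_subclassOf`. Non-rigid synsets use `sc_domainOf`, `sc_playRole` and `sc_participantOf`.

```python
from dataclasses import dataclass

# Relation groups as listed on the slide; which group applies depends on rigidity.
RIGID_RELATIONS = {"sc_equivalenceOf", "sc_subclassOf"}
NON_RIGID_RELATIONS = {"sc_domainOf", "sc_playRole", "sc_participantOf"}


@dataclass(frozen=True)
class OntologyMapping:
    """Hypothetical record for one synset-to-ontology link."""
    synset: str    # wordnet synset identifier, e.g. "bird_1_N"
    relation: str  # one of the sc_* relations above
    target: str    # ontology class, role or process


def invalid_mappings(mappings, rigid: bool):
    """Return the mappings whose relation does not fit the synset's rigidity."""
    allowed = RIGID_RELATIONS if rigid else NON_RIGID_RELATIONS
    return [m for m in mappings if m.relation not in allowed]


bird = [OntologyMapping("bird_1_N", "sc_equivalenceOf", "bird")]
migration_bird = [
    OntologyMapping("migration_bird_1_N", "sc_domainOf", "bird"),
    OntologyMapping("migration_bird_1_N", "sc_playRole", "done-by"),
    OntologyMapping("migration_bird_1_N", "sc_participantOf", "migration"),
]
print(invalid_mappings(bird, rigid=True))             # []
print(invalid_mappings(migration_bird, rigid=False))  # []
print(len(invalid_mappings(migration_bird, rigid=True)))  # 3
```

With Rudify's rigidity attribute stored on each synset, a check like this could flag mappings that contradict it. That use is an assumption, not something the slide describes.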

17 Global Wordnet Grid Model
[Figure: the English Wordnet in WN-LMF linked to the KYOTO Ontology in OWL-DL (an extension of DOLCE LT). Ontology side: perdurant > change-of-location > migration, with the roles done-by, has-source, has-destination and has-path; endurant > object > organism > bird. Wordnet side: bird_1_N (rigid) sc_equivalentOf bird; duck_1_N (rigid), linked by hyponym and subclass relations; migration_4_N and migrate_1_V sc_equivalentOf migration; migration_bird_1_N (non-rigid) sc_domainOf bird, sc_playRole done-by, sc_participantOf migration]

18 Global Wordnet Grid Model
[Figure: the model of slide 17 extended with the Dutch Wordnet: migrerende dieren_1_N (migrating species, non-rigid) sc_domainOf organism, sc_playRole done-by, sc_participantOf migration; eend_1_N (duck); cross-lingual links equivalent_hypernym eng n (bird) and equivalent eng n (duck). Further wordnets: Spanish, Basque, Italian, Japanese, Chinese]
Cross-lingual equivalence mappings are expressed through wordnet mappings.

19 Wordnet to ontology mappings
{create, produce, make}Verb, English -> sc_equivalenceOf construction
{artifact, artefact}Noun, English -> sc_domainOf physical_object -> sc_playRole result-existence -> sc_participantOf construction
{kunststof}Noun, Dutch // lit. artifact substance -> sc_domainOf amount_of_matter -> sc_playRole result-existence -> sc_participantOf construction
{meat}Noun, English -> sc_domainOf cow, sheep, pig -> sc_playRole patient -> sc_participantOf eat
{,, }Noun, Chinese -> sc_domainOf animal -> sc_playRole patient -> sc_participantOf eat
{ غذاء, لحم, طعام}Noun, Arabic -> sc_domainOf cow, sheep -> sc_playRole patient -> sc_participantOf eat

20 Wordnet to ontology mappings
{teacher}Noun, English -> sc_domainOf human -> sc_playRole done-by -> sc_participantOf teach
{leraar}Noun, Dutch // lit. male teacher -> sc_domainOf man -> sc_playRole done-by -> sc_participantOf teach
{lerares}Noun, Dutch // lit. female teacher -> sc_domainOf woman -> sc_playRole done-by -> sc_participantOf teach
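Each of these lexicalizations reduces to a domain restriction, a role and a process, so words from different languages can be compared through those triples. The minimal sketch below assumes a hand-written fragment of the type hierarchy (`SUBCLASS_OF`). The helper names are hypothetical. In this sketch, `leraar` and `lerares` come out as narrower than `teacher`, because they share the role and the process but restrict the domain to a subclass of `human`.

```python
# Hypothetical fragment of the ontology type hierarchy (child -> parent).
SUBCLASS_OF = {"man": "human", "woman": "human"}

# (sc_domainOf, sc_playRole, sc_participantOf) triples taken from the slide.
MAPPINGS = {
    ("teacher", "en"): ("human", "done-by", "teach"),
    ("leraar", "nl"): ("man", "done-by", "teach"),
    ("lerares", "nl"): ("woman", "done-by", "teach"),
}


def subsumed_by(cls, ancestor):
    """True if cls equals ancestor or lies below it in SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def narrower_or_equal(a, b):
    """a is at least as specific as b: same role and process, domain subsumed."""
    dom_a, role_a, proc_a = MAPPINGS[a]
    dom_b, role_b, proc_b = MAPPINGS[b]
    return role_a == role_b and proc_a == proc_b and subsumed_by(dom_a, dom_b)


print(narrower_or_equal(("leraar", "nl"), ("teacher", "en")))  # True
print(narrower_or_equal(("teacher", "en"), ("leraar", "nl")))  # False
print(narrower_or_equal(("leraar", "nl"), ("lerares", "nl")))  # False
```

A relation of this kind is only partial equivalence. It gives a hypernym-like link across languages, not a translation.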

21 Wordnet-LMF

22 WN-LMF Synset relations

23 WN-LMF Synset relations

24 Division of labor in knowledge sources
[Figure: one example spread over four layers.
– SKOS database (Species 2000, 2.1 million species): Eleutherodactylus augusti, Eleutherodactylus atrabracus -> Eleutherodactylus -> Leptodactylidae -> Anura -> Amphibia -> Chordata -> Animalia
– Wordnet-LMF (100,000 synsets): barking frog -> frog:1, toad:1, toad frog:1, anuran:1, batrachian:1, salientian:1 -> amphibian:3 -> vertebrate:1, craniate:1 -> chordate:1 -> animal:1 (Base Concept)
– Ontology in OWL-DL (2,000 types): endurant -> physical-object -> organism; perdurant -> endanger
– Term database (500,000 terms): endemic frog, endangered frog, poisonous frog, alien frog]

25 How to make inferences?
– SPARQL queries to large Virtuoso databases: aligned Species 2000, DBPedia
– SQL queries to the term database
– Graph matching on wordnets stored in DEBVisDic
– Reasoning on a small ontology
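The sketch below gives a rough picture of the "SQL queries to the term database" item. It stores a few terms from the slide 39 hierarchy in an in-memory SQLite table. A recursive query then finds the nearest ancestor-or-self that has a wordnet sense. The table layout and the `first_mapped` helper are hypothetical, since KYOTO's actual term-database schema is not given here. The sketch also assumes that the parent links form a tree with no cycles.

```python
import sqlite3

# Hypothetical term table: each term has an optional parent and an optional wordnet sense.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (name TEXT PRIMARY KEY, parent TEXT, synset TEXT);
INSERT INTO term (name, parent, synset) VALUES
  ('land', NULL, 'land:5'),
  ('grassland', 'land', 'grassland:1'),
  ('woodland', 'land', 'woodland:1'),
  ('cropland', 'land', NULL),
  ('urban land', 'land', NULL);
""")

# Walk parent links upwards and keep the closest term that has a wordnet sense.
FIRST_MAPPED = """
WITH RECURSIVE up(name, parent, synset, depth) AS (
  SELECT name, parent, synset, 0 FROM term WHERE name = ?
  UNION ALL
  SELECT t.name, t.parent, t.synset, up.depth + 1
  FROM term AS t JOIN up ON t.name = up.parent
)
SELECT name, synset FROM up WHERE synset IS NOT NULL ORDER BY depth LIMIT 1
"""


def first_mapped(term):
    """Return (term, synset) for the nearest mapped ancestor-or-self, or None."""
    return conn.execute(FIRST_MAPPED, (term,)).fetchone()


print(first_mapped("cropland"))   # ('land', 'land:5')
print(first_mapped("grassland"))  # ('grassland', 'grassland:1')
print(first_mapped("desert"))     # None
```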

26 KYOTO Project meeting, Jan 13-14th 2010, PolyU Hong Kong. Ontotagger applied to KAF
Apply WSD to every term in the KAF representation of a text. For each term in the KAF representation of a text:
(a) If WSD assigns a wordnet synset, check for ontology mappings; if there are none, traverse the wordnet hierarchy up to the first mapping
(b) Else, check the SKOS database for a wordnet mapping; if necessary, traverse broader relations up to the first wordnet mapping and go to (a)
(c) Else, check the term database for wordnet mappings; if necessary, traverse parent relations up to the first wordnet mapping and go to (a)
Collect all mappings from the ontology and all (relevant) ontological implications and insert them into the KAF representation of the text.
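Here is a compact sketch of the cascade in steps (a) to (c), using toy dictionaries loosely based on the frog example of slide 24. All resource contents and function names are assumptions. The real Ontotagger works on KAF files and on the actual SKOS, term and wordnet databases.

```python
# Toy resources, loosely modeled on the slide 24 example; contents are hypothetical.
WN_HYPERNYM = {                      # wordnet: synset -> hypernym
    "barking frog:1": "frog:1", "frog:1": "amphibian:3",
    "amphibian:3": "vertebrate:1", "vertebrate:1": "chordate:1",
    "chordate:1": "animal:1",
}
WN_TO_ONTOLOGY = {"animal:1": "organism"}    # wordnet -> ontology mappings
SKOS_BROADER = {                     # SKOS: concept -> broader concept
    "Eleutherodactylus augusti": "Eleutherodactylus",
    "Eleutherodactylus atrabracus": "Eleutherodactylus",
    "Eleutherodactylus": "Leptodactylidae",
    "Leptodactylidae": "Anura",
}
SKOS_TO_WN = {"Eleutherodactylus augusti": "barking frog:1", "Anura": "frog:1"}
TERM_PARENT = {"endangered frog": "frog"}    # term database: term -> parent term
TERM_TO_WN = {"frog": "frog:1"}


def climb(start, parent_of, is_mapped):
    """Follow parent links from start; return the first node with is_mapped(node)."""
    node, seen = start, set()
    while node is not None and node not in seen:  # seen guards against cycles
        if is_mapped(node):
            return node
        seen.add(node)
        node = parent_of.get(node)
    return None


def ontology_type(term, synset=None):
    """Return an ontology type for a term, or None if no mapping is reachable."""
    if synset is None:
        # (b) SKOS database, then (c) term database, each up to the first wordnet mapping.
        for parent_of, to_wn in ((SKOS_BROADER, SKOS_TO_WN), (TERM_PARENT, TERM_TO_WN)):
            hit = climb(term, parent_of, to_wn.__contains__)
            if hit is not None:
                synset = to_wn[hit]
                break
    if synset is None:
        return None
    # (a) climb the wordnet hierarchy to the first synset with an ontology mapping.
    mapped = climb(synset, WN_HYPERNYM, WN_TO_ONTOLOGY.__contains__)
    return None if mapped is None else WN_TO_ONTOLOGY[mapped]


print(ontology_type("Eleutherodactylus atrabracus"))  # organism
print(ontology_type("endangered frog"))               # organism
print(ontology_type("urban land"))                    # None
```

The sketch stops at the first ontology type it reaches. The step described above goes further and collects all mappings and relevant ontological implications for insertion into KAF.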

27 Examples
1. Migration birds in the Humber Estuary.
2. The migration of birds to the Humber Estuary
3. Bird migration in the Humber Estuary
4. Birds that migrate to the Humber Estuary

28 Annotation of ontological implications in KAF

29 Annotation of ontological implications in KAF

30 Annotation of ontological implications in KAF

31 Kybot profiles
IF T1 + to + T2 & T1.impliedType="change_of_location" & T1.impliedRole="has-target" & T2.Type="location" THEN
IF T1 + from + T2 & T1.impliedType="change_of_location" & T1.impliedRole="has-source" & T2.Type="location" THEN
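The two profiles can be read as patterns over a window of three terms. The following Python sketch is an illustration only. The `Term` fields mirror `impliedType`, `impliedRole` and `Type` from the slide. Determiners such as "the" are assumed to have been removed beforehand. The THEN part is not shown on the slide, so the emitted triple is a hypothetical stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    lemma: str
    types: set = field(default_factory=set)          # explicit ontology types (Type)
    implied_types: set = field(default_factory=set)  # impliedType, e.g. from Ontotagger
    implied_roles: set = field(default_factory=set)  # impliedRole


# (preposition, impliedType on T1, impliedRole on T1, Type on T2), as on the slide.
PROFILES = [
    ("to", "change_of_location", "has-target", "location"),
    ("from", "change_of_location", "has-source", "location"),
]


def apply_profiles(terms):
    """Yield (T1, role, T2) for each T1 + preposition + T2 window that matches.

    The output triple is a hypothetical stand-in for the profile's THEN part.
    """
    for t1, prep, t2 in zip(terms, terms[1:], terms[2:]):
        for word, itype, role, t2_type in PROFILES:
            if (prep.lemma == word and itype in t1.implied_types
                    and role in t1.implied_roles and t2_type in t2.types):
                yield (t1.lemma, role, t2.lemma)


# "Birds that migrate to the Humber Estuary", with "that" and "the" dropped.
sentence = [
    Term("bird"),
    Term("migrate", implied_types={"change_of_location"},
         implied_roles={"has-source", "has-target"}),
    Term("to"),
    Term("Humber Estuary", types={"location"}),
]
print(list(apply_profiles(sentence)))  # [('migrate', 'has-target', 'Humber Estuary')]
```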

32 Kybot Knowledge Patterns

33 Conclusion: Should all knowledge be stored in the central ontology?
– Vocabularies are too large for full inferencing
– Vocabularies are linguistically too diverse to be represented in an ontology
– The inferencing capabilities of formal ontologies are not needed for all levels of knowledge
A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers:
– SKOS vocabularies and term databases
– wordnets (WN-LMF)
– ontology (OWL-DL)
Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning.
Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions.

34 Conclusions
– Ontologies are abstract and minimal; lexicons are large and rich
– Semantic relations in lexicons are complementary to ontological relations
– Semantic relations expressed in a language system should be compatible with ontologies
– Large vocabularies of types (rigid things in the world) can be mapped to the ontology through combinations of lexical relations and basic ontological mappings
– Lexicalizations of contextual and subjective concepts need to be expressed through more complex relations
– Equivalences across languages are expressed partially through ontological expressions and partially across lexicons

35 Applying WSD to terms

36 How to integrate the data?
Species 2000 vocabulary: 2,171,281 concepts in a MySQL database with parent relations:
– Kingdom -> Phylum -> Class -> Order -> Family -> Genus -> Species -> Infraspecies
– Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti
– Converted to SKOS format
– Aligned with DBPedia for language labels
– Aligned with Wordnet using vocabulary and relation mappings
– Published in Virtuoso, accessed with SPARQL queries
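The SKOS conversion step can be sketched by walking the parent relations up from a species. The walk emits one `skos:Concept` per rank as N-Triples, with a `skos:broader` link to the parent. The namespace `http://example.org/species2000/`, the `PARENT` dictionary and the function names are placeholders. Only the SKOS and RDF vocabulary URIs are the standard ones.

```python
# Placeholder namespace for the converted concepts (an assumption, not KYOTO's URI scheme).
BASE = "http://example.org/species2000/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Parent relations as they might come out of the MySQL table (child -> parent).
PARENT = {
    "Eleutherodactylus augusti": "Eleutherodactylus",
    "Eleutherodactylus": "Leptodactylidae",
    "Leptodactylidae": "Anura",
    "Anura": "Amphibia",
    "Amphibia": "Chordata",
    "Chordata": "Animalia",
}


def lineage(name):
    """Follow parent links up to the kingdom; return the chain from the top down."""
    chain = []
    while name is not None:
        chain.append(name)
        name = PARENT.get(name)
    return chain[::-1]


def concept_uri(name):
    return BASE + name.replace(" ", "_")


def literal(text):
    """Quote a plain string literal for N-Triples."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def lineage_to_ntriples(chain):
    """One skos:Concept per name, with its label and a skos:broader link to its parent."""
    lines, parent = [], None
    for name in chain:
        s = f"<{concept_uri(name)}>"
        lines.append(f"{s} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"{s} <{SKOS}prefLabel> {literal(name)} .")
        if parent is not None:
            lines.append(f"{s} <{SKOS}broader> <{concept_uri(parent)}> .")
        parent = name
    return lines


for line in lineage_to_ntriples(lineage("Eleutherodactylus augusti"))[-3:]:
    print(line)
```

Language labels from DBPedia could be added as further `skos:prefLabel` or `skos:altLabel` triples with language tags. How KYOTO actually attached them is not specified here.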

37 How to integrate data? Extending language labels using DBPedia
Language    Species 2000    DBPedia extension
English     69,045          834,821
Spanish     1,731           358,499
Italian     17,552          215,511
Dutch       5,397           185,437
Chinese     58,774          83,756
Japanese    4,625           139,754

38 How to integrate data? Alignment of Species 2000 with wordnet
Vocabulary match with Wordnet synsets; if a name is polysemous, its senses are weighted with SSI-Dijkstra based on the hyperonym chain.
Results still to be evaluated:
– Animalia (animal:1) -> Chordata (chordate:1) -> Amphibia (amphibian:3) -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti (barking frog:1)
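SSI-Dijkstra itself grows an interpretation iteratively, using shortest-path distances over a large wordnet-derived graph. The sketch below is a heavily simplified, single-step version of that idea, not the KYOTO implementation. It picks the candidate sense with the smallest Dijkstra distance to senses already fixed higher up the hierarchy. The graph fragment is invented for illustration, and so are all sense numbers other than `amphibian:3` and `chordate:1`.

```python
import heapq

# Hypothetical undirected fragment of a wordnet graph, unit edge weights.
EDGES = [
    ("amphibian:3", "vertebrate:1"), ("vertebrate:1", "chordate:1"),
    ("chordate:1", "animal:1"),
    ("amphibian:1", "motor_vehicle:1"), ("motor_vehicle:1", "vehicle:1"),
    ("amphibian:2", "airplane:1"), ("airplane:1", "aircraft:1"),
    ("vehicle:1", "artifact:1"), ("aircraft:1", "artifact:1"),
]
GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, []).append((b, 1))
    GRAPH.setdefault(b, []).append((a, 1))


def dijkstra(source):
    """Shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in GRAPH.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def pick_sense(candidates, context):
    """Choose the candidate with the smallest summed distance to the context senses.

    Unreachable context senses count as infinite; ties go to the first candidate.
    """
    def cost(sense):
        dist = dijkstra(sense)
        return sum(dist.get(c, float("inf")) for c in context)
    return min(candidates, key=cost)


# Chordata is already aligned to chordate:1, so it serves as context for Amphibia.
print(pick_sense(["amphibian:1", "amphibian:2", "amphibian:3"], ["chordate:1"]))
# amphibian:3
```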

39 How to integrate data? Alignment of terms with wordnet
Word-sense disambiguation is applied to terms in KAF (Kyoto Annotation Format).
A term hierarchy is extracted from KAF:
– land:5
  – grassland:1 -> biome:1
  – woodland:1 -> biome:1
  – cropland
  – urban land
Results still to be evaluated: SemEval2010

