KYOTO (ICT-211423) Yielding Ontologies for Transition-Based Organization FP7: Intelligent Content and Semantics Piek Vossen.


1 KYOTO (ICT-211423): Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
Piek Vossen
Tienjarig jubileum NL-TERM (tenth anniversary of NL-TERM), October 2008, Amsterdam

2 KYOTO (ICT-211423) Overview
Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam
Title: Yielding Ontologies for Transition-Based Organization
Funded:
– 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics
– Taiwan and Japan funded by national grants
Goal:
– Platform for knowledge sharing across languages and cultures
– Enables knowledge transition and information search across different target groups, crossing linguistic, cultural and geographic boundaries
– Open text mining and deep semantic search
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
URL:
Duration:
– March 2008 – March 2011
Effort:
– 364 person months of work

3 Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)

4 KYOTO (ICT-211423) Overview
Languages:
– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
Domain:
– Environmental domain, BUT usable in any domain
Global:
– Both European and non-European languages
Available:
– Free: as open source system and data (GPL)
Future perspective:
– Content standardization that supports worldwide communication
– Global Wordnet Grid -> database that interlinks all wordnets in the world to a shared ontology of meaning

5 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam zieke, patiënt chronisch zieke ; langdurig zieke psychisch/geestelijk zieke HYPONYM arts, dokter ziekte, stoornis genezen ρ-PATIENT behandelen ρ-PATIENT STATE maagaandoening, nieraandoening, keelpijn HYPONYM ρ-CAUSE ρ-AGENT ρ-PROCEDURE ρ-LOCATION fysiotherapie medicijnen etc. ziekenhuis, etc. kind co-ρ- AGENT-PATIENT kinderarts HYPONYM Wordnet = network of semantic relations between words in a language

6 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam Images Index Docs URLs Experts Search Dialogue CO2 emission water pollution Capture Citizens Governors Companies Domain Wikyoto Wordnets AbstractPhysical Top Middle waterCO2 Substance Universal Ontology Process Environmental organizations Environmental organizations Global Wordnet Grid Kybots Fact Mining Tybots Concept Mining Sudden increase of CO2 emissions in 2008 in Europe

7 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam qualifies Lexicon versus Ontology AbstractPhysical H20CO2 Element Ontology Process Physical Change Organism Ecosystem services -Nature as a resource -Nature for waste absorption -State of nature -Threats to nature rural products sustainable products green roof alien invasive species species migration ecosystem-based drinking water production Artifacts green house gas Spider Roof type

8 Concepts & Facts
Conceptual knowledge: general & generic knowledge about
– ClimateChange
  physical change affecting the climate => definition of climate
  in a region
  during a period of time
  caused by another change
  causing yet other changes

9 Concepts & Facts
Fact:
– A case of ClimateChange has been observed:
  factual and significant change in the climate (temperature, humidity, wind direction, rainfall, etc.)
  in a particular region, e.g. the Alps
  Time period
  Caused by CO2 emissions, North Atlantic gulf stream
  Causes decrease of biodiversity measured in specific populations: fish, birds, insects => counts of populations

10 System architecture

11 System components
Wikyoto = wiki environment for a social group:
– to model the terms and concepts of a domain and agree on their meaning, within a group, across languages and cultures
– to define the types of knowledge and facts of interest
Tybots = Term extracting robots, extract term data from a text corpus
Kybots = Knowledge yielding robots, extract facts from a text corpus
Linguistic processors:
– tokenizers, segmentizers, taggers, grammars
– named entity recognition
– word sense disambiguation
– generate a layered text annotation in Kyoto Annotation Format (KAF)

12 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam Capture Server Document Base Linear KAF Document Base Linear KAF Tybot server (Term Extraction) Tybot server (Term Extraction) Extracted Terms Generic K-TMF Extracted Terms Generic K-TMF Term Editor (Wikyoto) Term Editor (Wikyoto) Domain Ontology OWL_DL Domain Ontology OWL_DL Domain Wordnet K-LMF Domain Wordnet K-LMF Kybot Server (Fact Extraction) Kybot Server (Fact Extraction) Semantic Annotation Semantic Annotation Document Base Linear Generic KAF Document Base Linear Generic KAF Document Base Linear KAF Document Base Linear KAF Kybot Editor Kybot Profiles Kybot Profiles Concept User Fact User

13 What Tybots do...
Input: text documents
– Green house gases, such as CO2
– CO2 and other green house gases
Linguistic processors generate KAF annotation (sequential):
– morpho-syntactic analysis
– semantic roles
– named entities
– wordnet and ontology mappings
Output: term hierarchies in TMF (generic):
– structural parent relations: CO2 is a green house gas is a gas
– quantified structural and semantic relations
– statistical data
– generalized semantic mappings

14 Generic algorithm
Extraction of a structural term hierarchy
Advantage: conceptual coherence
Steps:
– extraction of potential terms using the morpho-syntactic structure
– statistical selection of salient terms
– conceptual selection of dominant terms
– contextual selection of terms

15 Terms from morpho-syntactic structure
– Words that are the syntactic head of an NP, e.g.: card, wing-player
– Word combinations (excluding determiners and adverbs) that include the syntactic head, e.g.: yellow card, yellow card for wing-player
– The head of a compound: player as the head of wing-player, name as the head of username
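The candidate-term rules on slide 15 can be illustrated with a minimal Python sketch. It is not the Tybot implementation: the function name `candidate_terms`, the `(word, pos)` token format and the tag names `DET` and `ADV` are assumptions made for illustration, and the sketch only covers modifiers that precede the head (it does not build post-head combinations such as "yellow card for wing-player").

```python
def candidate_terms(tokens, head_index, excluded_pos=("DET", "ADV")):
    """Return candidate terms for one NP.

    tokens      -- list of (word, pos) pairs for the NP (assumed format)
    head_index  -- position of the syntactic head in tokens
    The head alone is a term; each run of pre-head modifiers that ends
    at the head is a further candidate. Determiners and adverbs are skipped.
    """
    head_word = tokens[head_index][0]
    modifiers = [w for w, pos in tokens[:head_index] if pos not in excluded_pos]
    terms = [head_word]
    # Grow the term leftwards, one modifier at a time.
    for start in range(len(modifiers) - 1, -1, -1):
        terms.append(" ".join(modifiers[start:] + [head_word]))
    return terms


np_tokens = [("the", "DET"), ("yellow", "ADJ"), ("card", "NOUN")]
print(candidate_terms(np_tokens, head_index=2))
```

For the NP "the yellow card" this yields the head "card" and the combination "yellow card"; the determiner is dropped, as the slide prescribes.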

16 Statistical extraction of terms
Frequency of terms by distribution over reference corpus:
– Salience = normFreq * normRef
Where normFreq = normalized frequency of terms on the website and normRef = normalized count of website occurrence in the reference corpus:
– normFreq = nTermFrequencynWords / nPages
– normRef = 1 - ((nWebsitesnWords) / (referenceCorpusSize))
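As a rough illustration of the salience filter on slide 16, the sketch below multiplies two already-normalized scores and keeps the terms above a cut-off. Only the product Salience = normFreq * normRef comes from the slide. The function names, the threshold of 0.5 and the example scores are assumptions, and the normalization step itself is deliberately left out because the slide's formulas for normFreq and normRef did not survive extraction cleanly.

```python
def salience(norm_freq, norm_ref):
    """Salience as given on the slide: normFreq * normRef."""
    return norm_freq * norm_ref


def select_salient(terms, threshold=0.5):
    """Keep terms whose salience reaches the (assumed) threshold.

    terms -- dict mapping a term to its (normFreq, normRef) pair
    Returns (term, salience) pairs, most salient first.
    """
    scored = [(term, salience(nf, nr)) for term, (nf, nr) in terms.items()]
    kept = [(term, s) for term, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Invented scores for illustration only; they are not taken from Table 2.
example = {
    "pipette": (1.0, 1.0),
    "contact": (1.0, 0.25),
    "microscope": (0.5, 1.0),
}
print(select_salient(example))
```

A term that is frequent on the site but also common in the reference corpus (low normRef) falls below the cut-off, which is the behaviour the salience filter is meant to produce.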

17 Statistical extraction of terms
Table 2: Salience filtering of terms
Columns (for both halves of the table): Preferred term, Site freq., Nr. of pages, Salience

Excluded for Salience (preferred term):
ressource humaine; sécurité; sites; mobilité; qualité produits; produits; satisfaction client; contact; place; formation professionnelle; ligne; gestion; conception; groupes; démarche qualité; environnement; moyens

Selected for Salience (preferred term, site freq., nr. of pages, salience):
Oxy/Conductimètre portable, 2, 2, 1.0
Multiparamétre portable, 2, 2, 1.0
convection naturelle, 4, 2, 1.0
Conductimètre portable, 4, 2, 1.0
Universelle convection naturelle
Pipette graduée
Pipette
Photomètre
Perce-bouchon
Nettoyants autolaveurs
Mini-UniPrep
Microscope
Micro-pipettes capillaire
Micropipettes
Loupes binoculaire
l'enseignement primaire
Incubateurs réfrigérés

18 Conceptual analysis of terms
– Number of children and further descendants of a branch
– Cumulated frequency of the descendants
– Conceptual profile of the descendants compared to the profile of all the terms:
  – Domain classification of the complete tree
  – Domain classification of the branches
  – Proportion of overlap

19 Structure to relation table for terms
Term phrase | Structure | Role | Type
populations of terrestrial species | of | Part | Species
populations of vertebrate species: | of | Part | Species
populations of 1313 vertebrate species - fish, amphibians, reptiles, birds, mammals - from all around the world | of | Part | Species
the restoration of wild species populations and their habitats | of | Patient | Restore
The increase in the footprint is driven by modest rates of growth in both population and demand for biocapacity | in | Patient | Increase
at half the rate of population increase | of | Speed | Increase
the relative proportion of current biocapacity or world population in each region | in | Location | Region
the growth of the world population and consumption | of | Patient | Increase
trends in their populations | in | Patient | Trend?
The rapid rate of population decline in tropical species | of | Speed | Decline
all countries with populations greater than 1 million | with | Possess | Country
Increase in population | in | Patient | Increase
species populations | Modifier | Part | Species
MARINE SPECIES POPULATIONS | Modifier | Part | Species

20 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam Source Documents Linguistic Processors [[the emission] NP [of greenhouse gases] PP [in agricultural areas] PP ] NP Morpho-syntactic analysis TYBOT Concept Miners AbstractPhysical H20CO2 Substance CO2Emission WaterPollution Ontology Process Chemical Reaction GlobalWarming GreenhouseGas Ontologize Axiomatize (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1) Synthesize in of Term hierarchy emissiongas greenhouse gas area agricultural area CO2 naturalprocess:1 English Wordnet emission:2gas:1 area:1 greenhouse gas:1 rural area:1 geographical area:1 region:3 location:3substance:1 emission:3 farmland:2 CO2 Conceptual modeling

21 Wikyoto

22 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam Do populations always consist of marine species? A..... decline... population.....Z Are terrestrial species never marine species? Simplified Term Fragment population marine species terrestrial species Simplified Ontology Fragment ?Population Group Kyoto Server Hidden Shown.... populations declined.....terrestrial and marine species.. in forests.....declined Do populations consist of marine species? Interview Are terrestrial species a type of populations? Interview.... populations such as terrestrial and marine species..... Smart Kytext KAFD E -TN Tybots pdf FactAF KAF Kybots plugin D E -KOND E -WN Facts in RDF G-WN Wordnets in LMFOntologies in OWL-DL G-KON WIKIPEDIA SUMO DOLCE GEO FRAMENET

23 Editing the domain wordnet

24 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam A..... decline... population.....Z group terrestrial species population species population population of vertebrate species marine species population people population 1. Validate Term Hierarchy: -Defining phrases: - document - domain corpus - Google -Other phrases -Wiki classes -Generic-WN classes.... populations such as terrestrial and marine species..... Are terrestrial species a type of populations? Are terrestrial species never marine species? WN & DOC D E -WN G-WN: Synset: ENG n {population:2} a group of organisms of the same species populating a given area SUMO: +inhabits -> +Group Wiki: In sociology and biology a population is the collection of inter-breeding organisms of a particular species.sociologybiologyspecies Smart KyText

25 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam land grasslandcroplandwoodland country:1, state:6, land:5 domain:2, demesne:2, land:4 land:1 land:2, ground:7, soil:3 object:1, physical object:1 real property:1, real estate:1, realty:1 land:3, dry land:1, earth:3,ground:1, solid ground:1, terra firma:1 administrative district:1, administrative division:1, territorial division:1 region:3 biome:1 urban land mediterranean woodland Wordnet & Doc agricultural urban land Wordnet, Doc Difficult wordnet mapping

26 Editing the domain ontology

27 Ontologization of terms
– Case 1: a domain term is a disjoint hyponym in the domain wordnet. It is propagated to the domain ontology as a new Type.
– Case 2: a domain term is not a disjoint hyponym. No new ontology extension is proposed, but the term still needs to be mapped to the ontology, i.e. the ontological constraint must be made explicit.

28 Diagram labels: A..... decline... population.....Z grouppopulation terrestrial species population species population population of vertebrate species marine species population people + ?Population D E -WN D E -ON Group =
1. Validate Implied Ontological Constraints:
- Generalize semantic relations
- Interpret relation given ontology parent
- Formulate interview using highlighted text
Interview questions: Can populations decline? Do populations consist of marine species? Do populations always consist of marine species? Do populations always decline? Are populations located in forests? Are populations always located in forests?
Smart KyText: .... populations of marine species populations declined.....terrestrial and marine species.. in forests.....declined
2. Validate additional constraints
- Select dominant relations
- Formulate interviews using highlighted text
SUMO axiom for Group (Hidden Data):
(=> (and (instance ?GROUP Group) (member ?MEMB ?GROUP)) (instance ?MEMB Agent))

29 Derived hidden structures
New constraint Population in D E -ON:
(subclass Population Group)
(=> (and (instance ?POP Population) (member ?MEMB ?POP)
    (instance ?MEMB Species)))
Extended constraint Population in D E -ON:
(subclass Population Group)
(=> (and (instance ?POP Population) (member ?MEMB ?POP)
    (instance ?MEMB Species)
    (*instance ?REGION Region)
    (*inhabits ?MEMB ?REGION)
    (*location ?MEMB ?REGION)))
* indicates possible relations

30 Cross-lingual validation
– Population is added by Group-1, with constraints derived from language L1
– Group-2 uses language L2 and observes a domain Type in the domain ontology with an English gloss, description -> possibly proposed through WSD
– Select/confirm existing domain type as a candidate for validation
– Smart Ky-Text in language L2 and the Term hierarchy are used to generate questions in L2
– Group-2 can confirm or deny constraints for L2 and add new constraints
– Cross-lingual and cross-group validation is added to the constraints in the ontology

31 Cross validated structures
Population in D E -ON:
(subclass Population Group)
(=> (and (instance ?POP Population) (member ?MEMB ?POP)
    (instance ?MEMB Species (xval G1-ENG G2-NLD G3-NLD G4-ITA))
    (instance ?REGION Region (xval G1-ENG G2-NLD))
    (*inhabits ?MEMB ?REGION (xval G3-NLD))
    (*location ?MEMB ?REGION (xval G1-ENG G4-ITA)))))
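One way to picture the xval annotations on slide 31 is as a lookup from each constraint to the group-language pairs that confirmed it. The Python sketch below is a hypothetical data structure, not the KYOTO storage format: the dictionary `XVAL`, the helper names and the "at least two languages" criterion are all assumptions. The validator labels (G1-ENG, G2-NLD, G3-NLD, G4-ITA) are copied from the slide.

```python
# Hypothetical in-memory view of the xval annotations on slide 31.
XVAL = {
    "(instance ?MEMB Species)": ["G1-ENG", "G2-NLD", "G3-NLD", "G4-ITA"],
    "(instance ?REGION Region)": ["G1-ENG", "G2-NLD"],
    "(*inhabits ?MEMB ?REGION)": ["G3-NLD"],
    "(*location ?MEMB ?REGION)": ["G1-ENG", "G4-ITA"],
}


def languages(validators):
    """Extract the distinct language codes from labels such as 'G2-NLD'."""
    return sorted({label.split("-", 1)[1] for label in validators})


def cross_lingual(xval, min_languages=2):
    """Constraints confirmed in at least min_languages languages (assumed criterion)."""
    return [c for c, labels in xval.items() if len(languages(labels)) >= min_languages]


for constraint in cross_lingual(XVAL):
    print(constraint, languages(XVAL[constraint]))
```

Under this assumed criterion the inhabits relation, confirmed only by a Dutch-language group, would stay a "possible" relation, while the other three constraints count as cross-lingually validated.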

32 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam Capture Server Document Base Linear KAF Document Base Linear KAF Tybot server (Term Extraction) Tybot server (Term Extraction) Extracted Terms Generic K-TMF Extracted Terms Generic K-TMF Term Editor (Wikyoto) Term Editor (Wikyoto) Domain Ontology OWL_DL Domain Ontology OWL_DL Domain Wordnet K-LMF Domain Wordnet K-LMF Kybot Server (Fact Extraction) Kybot Server (Fact Extraction) Semantic Annotation Semantic Annotation Document Base Linear Generic KAF Document Base Linear Generic KAF Document Base Linear KAF Document Base Linear KAF Kybot Editor Kybot Profiles Kybot Profiles Concept User Fact User

33 What Kybots do
Input:
– KAF annotations of text: sequential & encoded by language
– Conceptual frame from the ontology
– Expression rules for frame to language mapping:
  Wordnet in a language
  Morpho-syntactic mapping rules
Output: a database of facts in KAF/FactAF (generic):
– aggregated facts
– inferred facts
– language neutral

34 Fact mining
KYBOT = Knowledge Yielding Robot
Logical expression
– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1,e1)
Expression rules per language:
– [N[s1]V[e1]] S
– [N[e1]N[s1] N
– [[N[e1]][prep][N[s2]] NP
Ontology * Wordnets
– Capabilities
– Conditions: WNT -> adjectives, WNT -> nouns
– Causes: WNT -> verbs, WNT -> nouns
– Process: DamageProcess, ProduceProcess
Kybot compiler
– kybots = logical pattern + ontology + WN[Lx] + ER[Lx]

35 ICT Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam Fact mining by Kybots Source Documents Linguistic Processors [[the emission] NP [of greenhouse gases] PP [in agricultural areas] PP ] NP Morpho-syntactic analysis (KAF) AbstractPhysical H2OCO2 Substance CO2 emission water pollution OntologyWordnets & Linguistic Expressions Process Chemical Reaction Generic Logical Expressions [[the emission] NP ] Process: e1 [of greenhouse gases] PP Patient: s2 [in agricultural areas] PP ] Location: a3 Fact analysis Patient Domain
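The fact analysis on slide 35 assigns a role to each chunk of "the emission of greenhouse gases in agricultural areas". The toy Python sketch below reproduces only that single example with a fixed preposition lookup. It is an assumption-laden illustration, not how Kybots work: real Kybots combine a logical pattern, the ontology, a wordnet and language-specific expression rules, and the table `PREP_ROLE` and the function `assign_roles` are invented here.

```python
# Assumed preposition-to-role lookup, taken from the single example on the slide.
PREP_ROLE = {"of": "Patient", "in": "Location"}


def assign_roles(chunks):
    """Map chunked phrases to roles.

    chunks -- list of (label, text) pairs from a morpho-syntactic analysis,
              e.g. ("NP", "the emission") or ("PP", "of greenhouse gases")
    The head NP is treated as the Process; each PP gets the role of its
    preposition, or "Unknown" when the preposition is not in the lookup.
    """
    roles = []
    for label, text in chunks:
        if label == "NP":
            roles.append(("Process", text))
        elif label == "PP":
            preposition = text.split()[0].lower()
            roles.append((PREP_ROLE.get(preposition, "Unknown"), text))
    return roles


phrase = [
    ("NP", "the emission"),
    ("PP", "of greenhouse gases"),
    ("PP", "in agricultural areas"),
]
for role, text in assign_roles(phrase):
    print(role, ":", text)
```

A fixed lookup cannot resolve ambiguous prepositions (slide 19 shows "in" marking both Patient and Location), which is why the real system relies on the ontology and wordnet rather than on the preposition alone.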

36 emission:2 gas:1 greenhouse gas:1 substance:1 emission:3 natural process:1 C02 Lexical database: wordnet AbstractPhysical H20CO2 Substance CO2 Emission Process ChemicalReaction Global Warming Greenhouse Gas Ontology Maximal abstraction& integrity Language neutral integrity gas green house gas -> gas -increase(AG) -in 2003 (TIME) CO2 -> green house gas -emission (PA) -in European countries (LO) Term database Generic text based Sudden increase of green house gases in C02 emission in European countries....Green house gases such as C02,.... Text corpus Linear text Concept Mining by Tybots Synthesize Text mining by Kybots Ontologize Axiomatize (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1)

37 Thank you for your attention

