Asian Language Resources Summit, Phuket, March 2009. KYOTO (ICT-211423): Knowledge Yielding Ontologies for Transition-Based Organization. FP7: Intelligent Content and Semantics.


1 Asian Language Resources Summit, Phuket, March 2009
KYOTO (ICT-211423): Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
Piek Vossen, VU University Amsterdam

2 Overview
Background information
Baseline for retrieval in environment domain
System architecture
Knowledge mining
Conclusions

3 KYOTO (ICT-211423) Overview
Title: Knowledge Yielding Ontologies for Transition-Based Organization
Funded:
– 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics
– Taiwan and Japan funded by national grants
Goal:
– Open and free platform for knowledge sharing across languages and cultures
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
– Bootstrap through open text mining & concept learning
– Enables knowledge transition and information search across different target groups, transgressing linguistic, cultural and geographic boundaries
– Enables deep semantic search for facts and knowledge
URL: http://www.kyoto-project.eu/
Duration:
– March 2008 – March 2011
Effort:
– 364 person months of work

4 Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)

5 KYOTO (ICT-211423) Overview
Languages:
– English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
Domain:
– Environmental domain, BUT usable in any domain
Global:
– Both European and non-European languages
Available:
– Free: as open source system and data (GPL)
Future perspective:
– Content standardization that supports worldwide communication

6 State of the art in the environment domain

7 Baseline for environment domain
Mainly use Google, first 10 hits, no advanced options
Textual search with linguistic enhancements but no real semantic search:
– polluted water…
– polluting water…
Growing time & information pressure:
– deliver current information from diverse & dynamic sources
– regional, local situations: no general source
– various subdomains: government, legal, biology, health, industry
– difficult access to scientific publications
– no time to read: too much information and work pressure
– dependent on trust: scientists, environmentalists, government, general public

8 High-level targets & low-level questions
High-level target (about 300 questions collected):
– Are there huge negative effects with regard to ecological networks and alien invasive species?
Low-level facts that support answering the high-level targets:
– cases of alien invasion
– number of species
– causal relations associated with these (increments of) invasions
– causes related to ecological networks
– limited to the same time and location boundary

10 Baseline retrieval results
6 persons, 30 high-level questions

Result rank   CONFIRMED   DISAPPROVED   UNDECIDED     Total
…             …           …             …             …
…             …           9 (6.77%)     9 (14.29%)    24 (9.23%)
…             …           13 (9.77%)    7 (11.11%)    …
…             …           6 (4.51%)     3 (4.76%)     14 (5.38%)
…             …           6 (4.51%)     2 (3.17%)     16 (6.15%)
…             …           7 (5.26%)     3 (4.76%)     12 (4.62%)
…             …           6 (4.51%)     4 (6.35%)     12 (4.62%)
…             …           2 (1.50%)     1 (1.59%)     5 (1.92%)
…             …           3 (2.26%)     1 (1.59%)     8 (3.08%)
…             …           5 (3.76%)     0 (0.00%)     6 (2.31%)
…             …           …             …             …
Total         …           …             …             260

11 KYOTO's Solution
Text mining:
– Massive and accurate indexing of facts from vast amounts of text;
– In any language/culture from scattered sources;
– Again and again to detect trends and changes;
– Direct relation between knowledge modeling effort and text mining
Knowledge modeling:
– automatic learning of terms and concepts from text in any language;
– formalization of knowledge in computer-usable format -> wordnets & ontologies
Community software:
– For experts in the field and not knowledge engineers
– Continuous and collaborative effort: adapt to the changing domain; consensus in the field; consensus across languages and cultures
– Produce interoperable, formal, standardized knowledge structures;
– Relate knowledge structure to expressions in languages

12 KYOTO overview (diagram)
Ontology levels: Top, Middle, Domain
Ontology concepts: Abstract, Physical, Process, Substance, H2O, CO2, CO2 Emission, H2O Pollution, Greenhouse Gas
Components: Tybot (term yielding robot), Kybot (knowledge yielding robot), Wordnets, Ontology
Environmental organizations
Distributed, diverse & dynamic data
1 Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
2 CO2 emission
3 Wikyoto: maintain terms & concepts
4 Index facts: Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe
5 Text & fact index, semantic search
6 Citizens, governments, companies
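
The fact indexed in step 4 can be pictured as a small structured record that a semantic search then filters on. The following is only a minimal sketch under stated assumptions: the `Fact` class, its field names and the `search` helper are hypothetical illustrations, not the project's FactAF format or API, and the second record is invented for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One indexed fact; field names mirror the labels on the slide (hypothetical)."""
    process: str
    involves: str
    properties: tuple
    when: str
    where: str


def search(facts, **criteria):
    """Return the facts whose fields equal every given criterion."""
    return [
        fact for fact in facts
        if all(getattr(fact, name) == value for name, value in criteria.items())
    ]


# The first record is the slide's example; the second is invented for contrast.
FACTS = [
    Fact("Emission", "CO2", ("increase", "sudden"), "2008", "Europe"),
    Fact("Pollution", "H2O", (), "2008", "Asia"),
]

hits = search(FACTS, process="Emission", where="Europe")
print(hits)
```

Searching on structured fields rather than on keywords is what lets a query such as "emissions in Europe" ignore surface wording.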

13 System architecture

14 Data flow diagram of the Kyoto system (diagram)
Original Document Base: keyword search by end user
Linguistic Processor
Semantic & Syntactic Base in Kyoto Annotation Format (KAF): semantic search by end user
Term Extractor (Tybot), Term Base
Fact Extractor (Kybot), Fact Base, Fact User
Wiki Term Editor (Wikyoto), Concept User
Multilingual Knowledge Base: wordnets and ontologies, interlinked

15 Kyoto Annotation Format
KAF, Kyoto Annotation Format (Level 1), a multi-layered annotation format for:
– Tokenization and word form segmentation
– POS tagging
– Lemmatization and term extraction
– Constituency tagging
– Dependency tagging
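
A multi-layered format of this kind is usually built as stand-off annotation: each layer points at items of a lower layer by id instead of copying the text. The sketch below illustrates only that idea. It is not the KAF schema; the dictionary keys, id scheme and POS tags are assumptions made for the example.

```python
text = "the emission of greenhouse gases"

# Layer 1: word forms, one id per token.
tokens = [{"id": f"w{i + 1}", "form": form} for i, form in enumerate(text.split())]

# Layer 2: terms with lemma and POS, pointing at token ids (hypothetical tags).
terms = [
    {"id": "t1", "lemma": "the", "pos": "D", "span": ["w1"]},
    {"id": "t2", "lemma": "emission", "pos": "N", "span": ["w2"]},
    {"id": "t3", "lemma": "of", "pos": "P", "span": ["w3"]},
    {"id": "t4", "lemma": "greenhouse gas", "pos": "N", "span": ["w4", "w5"]},
]

# Layer 3: constituents, pointing at term ids.
chunks = [
    {"id": "c1", "phrase": "NP", "span": ["t1", "t2"]},
    {"id": "c2", "phrase": "PP", "span": ["t3", "t4"]},
]

# One lookup table across all layers, keyed by id.
layers = {item["id"]: item for item in tokens + terms + chunks}


def resolve(item, layers):
    """Follow span references down to the word forms they cover."""
    forms = []
    for ref in item["span"]:
        target = layers[ref]
        if "form" in target:
            forms.append(target["form"])
        else:
            forms.extend(resolve(target, layers))
    return forms


print(resolve(chunks[1], layers))
```

Because upper layers only hold references, a new layer (for example word senses) can be added without touching the layers below it.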

16 Semantic Annotation
KAF level 2 (SemKAF), semantic annotation format for:
– Named Entity Recognition (time, events, quant. …)
– Word Sense Disambiguation (D-WSD)
– Semantic Role Labeling (SRL) no synsets

17 KAF annotation: WSD

18 Data formats

Level of annotation            Standard format
1. Morpho-syntax annotation    KAF <= (MAF, SYNAF, SEMAF)
2. Semantic annotation         KAF <= (MAF, SYNAF, SEMAF)
3. Terms representation        TMF
4. Facts annotation            KAF
5. Wordnets                    Wordnet-LMF
6. Ontologies                  OWL

19 Knowledge mining

20 Knowledge mining
Concept mining (Tybots):
– Extract terms and relations in a language
– Map the terms to an existing wordnet
– Ontologize terms to concepts and axioms
Fact mining (Kybots):
– Define logical patterns
– Define expression rules in a language

21 What Tybots do...
Input: text documents
Linguistic processors generate KAF annotation (sequential):
– morpho-syntactic analysis
– semantic roles
– named entities
– wordnet and ontology mappings
Output: term hierarchies in TMF (generic):
– structural parent relations
– quantified structural and semantic relations
– statistical data

22 Concept mining by Tybots (diagram)
Source documents
Linguistic processors, morpho-syntactic analysis: [[the emission] NP [of greenhouse gases] PP [in agricultural areas] PP] NP
TYBOT concept miners: synthesize, ontologize, axiomatize
Term hierarchy (relations: in, of): emission, gas, greenhouse gas, area, agricultural area, CO2
English Wordnet: natural process:1, emission:2, emission:3, gas:1, greenhouse gas:1, substance:1, area:1, rural area:1, geographical area:1, region:3, location:3, farmland:2, CO2
Ontology: Abstract, Physical, Process, Substance, Chemical Reaction, GlobalWarming, GreenhouseGas, CO2Emission, WaterPollution, H2O, CO2
Axioms: (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1)
Conceptual modeling
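
One simple way to obtain "structural parent relations" between extracted terms is to attach a multiword term to the longest known term that forms its right-hand end, so that "greenhouse gas" becomes a child of "gas". The sketch below illustrates that heuristic on the terms from this slide. It is an assumption made for illustration (English head-final compounds), not a description of how the Tybots are implemented.

```python
def structural_parent(term, vocabulary):
    """Return the longest proper suffix of `term` that is itself a known term."""
    words = term.split()
    for start in range(1, len(words)):
        candidate = " ".join(words[start:])
        if candidate in vocabulary:
            return candidate
    return None


def build_hierarchy(terms):
    """Map each parent term to the list of terms it structurally contains."""
    vocabulary = set(terms)
    children = {}
    for term in terms:
        parent = structural_parent(term, vocabulary)
        if parent is not None:
            children.setdefault(parent, []).append(term)
    return children


# Terms taken from the slide's term hierarchy.
TERMS = ["emission", "gas", "greenhouse gas", "area", "agricultural area", "CO2"]

hierarchy = build_hierarchy(TERMS)
print(hierarchy)
```

Single-word terms such as "emission" and "CO2" have no structural parent here; linking them to broader concepts is the job of the wordnet and ontology mapping steps.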

23 What Kybots do
Input:
– KAF annotations of text: sequential & encoded by language
– Conceptual frame from the ontology
– Expression rules for frame to language mapping: wordnet in a language, morpho-syntactic mapping rules
Output: a database of facts in FactAF (generic):
– aggregated facts
– inferred facts
– language neutral

24 Fact mining
KYBOT = Knowledge Yielding Robot
Logical expression:
– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1, e1)
Expression rules per language:
– [N[s1]V[e1]] S, e.g. "CO2 is emitted", "fine dust blocks sun-light"
– [N[s1]N[e1]] N, e.g. "CO2 emission", "sun-light blocking"
– [[N[e1]][prep][N[s2]]] NP, e.g. "emission of CO2", "sun light blocking by fine dust"
Ontology * Wordnets:
– Capabilities: WNT -> adjectives ("explosive", "toxic"), WNT -> nouns ("explosive", "poison")
– Causes: WNT -> verbs ("eat"), WNT -> nouns ("consumption")
– Process: DamageProcess, ProduceProcess
Kybot compiler:
– kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
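
An expression rule of this kind can be read as a sequence of part-of-speech slots, some of which bind a variable of the logical pattern. The sketch below matches two of the slide's rules against pre-tagged phrases. The rule encoding, the tag names (`N`, `P`) and the `match` function are hypothetical and only illustrate the idea; the real Kybot compiler also draws on the ontology and the wordnets.

```python
# Each rule is a list of (POS tag, variable) slots; None means "match but do not bind".
RULES = {
    "N-N": [("N", "s1"), ("N", "e1")],                   # "CO2 emission"
    "N-prep-N": [("N", "e1"), ("P", None), ("N", "s2")], # "emission of CO2"
}


def match(rule, tagged):
    """Return variable bindings if the tagged phrase fits the rule, else None."""
    if len(rule) != len(tagged):
        return None
    bindings = {}
    for (pos, variable), (word, tag) in zip(rule, tagged):
        if pos != tag:
            return None
        if variable is not None:
            bindings[variable] = word
    return bindings


compound = [("CO2", "N"), ("emission", "N")]
prepositional = [("emission", "N"), ("of", "P"), ("CO2", "N")]

print(match(RULES["N-N"], compound))
print(match(RULES["N-prep-N"], prepositional))
```

Both surface forms end up binding the event variable `e1` to "emission", which is the point of keeping the logical pattern separate from the per-language expression rules.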

25 Fact mining by Kybots (diagram)
Source documents
Linguistic processors, morpho-syntactic analysis (KAF): [[the emission] NP [of greenhouse gases] PP [in agricultural areas] PP] NP
Ontology: Abstract, Physical, Process, Substance, Chemical Reaction, H2O, CO2, CO2 emission, water pollution
Wordnets & linguistic expressions
Generic logical expressions
Domain semantic role labelling:
– [the emission] NP: Process: e1
– [of greenhouse gases] PP: Patient: s2
– [in agricultural areas] PP: Location: a3
Fact analysis:
– time & place
– aggregation from all relevant phrases and documents
– inferencing
– adding trust and reliability
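
The "aggregation from all relevant phrases and documents" step can be pictured as grouping identical role assignments and remembering which documents support each one. A minimal sketch, assuming each extracted fact is a plain dictionary; the keys, the document ids and the second fact are made up for the example.

```python
from collections import defaultdict

# Role-labelled facts as they might come out of individual phrases (hypothetical data).
extracted = [
    {"doc": "doc1", "process": "emission", "patient": "greenhouse gases",
     "location": "agricultural areas"},
    {"doc": "doc2", "process": "emission", "patient": "greenhouse gases",
     "location": "agricultural areas"},
    {"doc": "doc2", "process": "emission", "patient": "CO2",
     "location": "Europe"},
]


def aggregate(facts):
    """Group identical (process, patient, location) facts and collect their sources."""
    support = defaultdict(set)
    for fact in facts:
        key = (fact["process"], fact["patient"], fact["location"])
        support[key].add(fact["doc"])
    return dict(support)


aggregated = aggregate(extracted)
for key, docs in aggregated.items():
    print(key, "supported by", len(docs), "document(s)")
```

The number of supporting documents is one possible input for the trust and reliability scoring named on the slide.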

26 Wikyoto

27 Wikyoto (diagram)
Text fragments: "... populations declined ...", "... terrestrial and marine species ... in forests ... declined", "... populations such as terrestrial and marine species ..."
Term list: A ... decline ... population ... Z
Simplified term fragment: population, marine species, terrestrial species
Simplified ontology fragment: ?Population, Group
Interview questions:
– Do populations always consist of marine species?
– Are terrestrial species never marine species?
– Do populations consist of marine species?
– Are terrestrial species a type of populations?
Kyoto server (hidden / shown): Smart Kytext, pdf, KAF, Tybots, Kybots plugin, FactAF, facts in RDF, DE-TN, DE-KON, DE-WN, G-WN, G-KON, wordnets in LMF, ontologies in OWL-DL
External resources: WIKIPEDIA, SUMO, DOLCE, GEO, FRAMENET

28 Kyoto Knowledge Base (diagram)
Wordnets, each with a domain part: WnIT, WnEN, WnEU, WnNL, WnJP, WnCH, WnES
Ontology, Domain Ontology

29 Potential impact

30 Ultimate goal
Global standardization and anchoring of meaning such that:
– Machines can start to approach text understanding -> semantic web connects to the current web
– Communities can dynamically maintain knowledge, concepts and their terms in an easy-to-use system
– Cross-linguistic and cross-cultural sharing and communication of knowledge is enabled
Establish a Global-Wordnet-Grid: formalization of Wikipedia for humans AND machines across languages

31 Global WordNet Grid (diagram)
Inter-Lingual Ontology: Object, Device, TransportDevice
English words: vehicle, car, train
Czech words: dopravní prostředník, auto, vlak
French words: véhicule, voiture, train
Estonian words: liiklusvahend, auto, killavoor
German words: Fahrzeug, Auto, Zug
Spanish words: vehículo, auto, tren
Italian words: veicolo, auto, treno
Dutch words: voertuig, auto, trein
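
The grid idea is that words are never linked to each other directly: each word points at a shared, language-neutral concept, and two words correspond when they point at the same one. The sketch below shows this with a few of the slide's words. The concept ids (`c_vehicle` and so on) and the `GRID` layout are invented for the example and are not identifiers from the actual inter-lingual ontology.

```python
# Per-language lexicons mapping a word to a shared concept id (ids are hypothetical).
GRID = {
    "en": {"vehicle": "c_vehicle", "car": "c_car", "train": "c_train"},
    "nl": {"voertuig": "c_vehicle", "auto": "c_car", "trein": "c_train"},
    "de": {"Fahrzeug": "c_vehicle", "Auto": "c_car", "Zug": "c_train"},
    "es": {"vehículo": "c_vehicle", "auto": "c_car", "tren": "c_train"},
}


def translate(word, source, target):
    """Find target-language words that share a concept with the source word."""
    concept = GRID[source].get(word)
    if concept is None:
        return []
    return [w for w, c in GRID[target].items() if c == concept]


print(translate("car", "en", "de"))
print(translate("trein", "nl", "es"))
```

Adding a language then means adding one lexicon that points at existing concepts, rather than one bilingual dictionary per language pair.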

32 Linking Open Data dataset cloud (diagram)
Environment: Wordnet environment terms (shown five times), Ontology environment concepts, environment facts
Medical: Wordnet medical terms, Ontology medical concepts, medical facts
Legal: Wordnet legal terms, Ontology legal concepts, legal facts
Sailing: Wordnet sailing terms, Ontology sailing concepts

33 Conclusions

34 Kyoto main assets
Wiki platform (WIKYOTO) for connecting, transferring and controlling knowledge and information across people and computers
Term yielding robots (TYBOT): software that extracts terms and concepts from documents
Knowledge yielding robots (KYBOT): fact extraction software that generates a comprehensive list of facts from a collection of sources
Fact repositories & fact alert: reports changes in facts on a collection of sources
Domain WORDNETS and domain ONTOLOGIES
Create the backbone for the Global Wordnet Grid

35 What makes KYOTO unique?
Integrates & combines knowledge engineering, language engineering, wikis, term & concept learning, fact mining from text in and across languages, & standardization
Direct relation between concept modeling and text mining makes it worth the effort
Wikyoto community tool hides technology and complex knowledge and language representation
Operated by community people and not by knowledge engineers and language technology people
Exploits massive labor force of communities all over the world

36 What makes KYOTO unique?
Text mining and ontology learning are developed for separate languages
– KYOTO is multi- and cross-lingual & cultural: cross-lingual and cross-cultural semantic interoperability
Text mining and ontology learning are often limited to a specific domain and/or application
– KYOTO works for any domain and application
Text mining and ontology learning do not relate the terms and concepts to generic language and knowledge resources
– KYOTO anchors knowledge from a community to general vocabulary and likewise to other communities

37 Free, open source license (GPL)
Thank you for your attention

38 Contribution of KYOTO (diagram)
Hundreds of thousands of sources (html, xls, pdf) in the environment domain, in many different languages, spread all over the world, changing every day
KYOTO learns terms and concepts from text documents
– Stored as structures that people and computers understand: Wordnet environment terms, Ontology environment concepts
KYOTO delivers a Web 2.0 environment for community-based control
– Connects people across languages and cultures
– Establishes consensus and knowledge transition
KYOTO enables semantic search and fact extraction
– Software can partially understand language and exploit web 1 data
– Understanding is helped by the terms and concepts defined for each language
– Result: environment facts
Components: TYBOT, KYBOT, WIKYOTO

