Asian Language Resources Summit, Phuket, March, 2009 KYOTO (ICT-211423) Yielding Ontologies for Transition-Based Organization FP7: Intelligent Content.


KYOTO (ICT-211423): Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
Piek Vossen, VU University Amsterdam

Overview
– Background information
– Baseline for retrieval in the environment domain
– System architecture
– Knowledge mining
– Conclusions

KYOTO (ICT-211423) overview
Title: Knowledge Yielding Ontologies for Transition-Based Organization
Funded:
– 7th Framework Program ICT of the European Union: Intelligent Content and Semantics
– Taiwan and Japan: funded by national grants
Goal:
– Open and free platform for knowledge sharing across languages and cultures
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
– Bootstrapping through open text mining and concept learning
– Enables knowledge transition and information search across different target groups, transgressing linguistic, cultural and geographic boundaries
– Enables deep semantic search for facts and knowledge
URL: (
Duration: March 2008 – March 2011
Effort: 364 person-months of work

Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)

KYOTO (ICT-211423) overview
Languages: English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
Domain: the environmental domain, BUT usable in any domain
Global: both European and non-European languages
Available: free, as an open source system and data (GPL)
Future perspective: content standardization that supports worldwide communication

State of the art in the environment domain

Baseline for the environment domain
– Mainly Google: first 10 hits, no advanced options
– Textual search with linguistic enhancements but no real semantic search: "polluted water…" vs. "polluting water…"
– Growing time and information pressure:
– deliver up-to-date information from diverse and dynamic sources
– regional and local situations: no general source
– various subdomains: government, legal, biology, health, industry
– difficult access to scientific publications
– no time to read: too much information and work pressure
– dependent on trust: scientists, environmentalists, government, general public

High-level targets & low-level questions
High-level target (about 300 questions collected), e.g.:
– Are there huge negative effects with regard to ecological networks and alien invasive species?
Low-level facts that support answering the high-level targets:
– cases of alien invasion
– number of species
– causal relations associated with these (increments of) invasions
– causes related to ecological networks
– limited to the same time and location boundaries


Baseline retrieval results (6 persons, 30 high-level questions): table of retrieved results per result rank, judged CONFIRMED, DISAPPROVED or UNDECIDED, with counts and percentages.

KYOTO's solution
Text mining:
– massive and accurate indexing of facts from vast amounts of text
– in any language or culture, from scattered sources
– repeated again and again to detect trends and changes
– direct relation between the knowledge modeling effort and text mining
Knowledge modeling:
– automatic learning of terms and concepts from text in any language
– formalization of knowledge in a computer-usable format: wordnets and ontologies
Community software:
– for experts in the field, not knowledge engineers
– continuous and collaborative effort: adapting to the changing domain; consensus in the field; consensus across languages and cultures
– producing interoperable, formal, standardized knowledge structures
– relating knowledge structures to expressions in languages

KYOTO overview (figure): environmental organizations produce distributed, diverse and dynamic data. A Tybot (term yielding robot) and a Kybot (knowledge yielding robot) work with wordnets and a layered ontology (top, middle and domain levels, with concepts such as Abstract, Physical, Substance, Process, H2O, CO2, CO2 Emission, H2O Pollution and Greenhouse Gas).
1. Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
2. Term extracted: "CO2 emission"
3. Wikyoto: maintain terms and concepts
4. Index facts: Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe
5. Text and fact index
6. Semantic search by citizens, governments and companies
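The fact in step 4 can be pictured as a structured record that is searched by concept rather than by keyword. A minimal Python sketch, assuming a flat record whose field names (`process`, `involves`, `properties`, `when`, `where`) are hypothetical and simply mirror the slide; KYOTO's own fact format (FactAF) is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Fact:
    """A fact record; field names are hypothetical and mirror step 4 of the figure."""
    process: str
    involves: str
    properties: Tuple[str, ...] = ()
    when: Optional[str] = None
    where: Optional[str] = None
    source: str = ""

class FactIndex:
    """Toy in-memory index: facts are retrieved by concept, not by keyword."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def add(self, fact: Fact) -> None:
        self._facts.append(fact)

    def search(self, process: Optional[str] = None, involves: Optional[str] = None,
               where: Optional[str] = None) -> List[Fact]:
        """Return the facts that satisfy every constraint that is not None."""
        return [f for f in self._facts
                if (process is None or f.process == process)
                and (involves is None or f.involves == involves)
                and (where is None or f.where == where)]

index = FactIndex()
index.add(Fact(process="Emission", involves="CO2", properties=("increase", "sudden"),
               when="2008", where="Europe",
               source="Sudden increase of CO2 emissions in 2008 in Europe"))
hits = index.search(process="Emission", involves="CO2")
print(hits[0].when, hits[0].where)  # 2008 Europe
```

A query for emissions of CO2 would match this fact even if a document phrased it as "CO2 is emitted", provided both phrasings were mapped to the same concepts during fact mining.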

System architecture

Data Flow Diagram of Kyoto System (figure):
– Original Document Base: keyword search by the end user
– Linguistic Processor: converts documents into the Semantic & Syntactic Base in Kyoto Annotation Format (KAF), used for semantic search by the end user
– Term Extractor (Tybot): builds the Term Base
– Wiki Term Editor (Wikyoto): lets the concept user maintain the Multilingual Knowledge Base of interlinked wordnets and ontologies
– Fact Extractor (Kybot): builds the Fact Base for the fact user

Kyoto Annotation Format (KAF)
Kyoto Annotation Format (Level 1): a multi-layered annotation format for:
– Tokenization and word form segmentation
– POS tagging
– Lemmatization and term extraction
– Constituency tagging
– Dependency tagging
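A multi-layered format like this is typically stand-off XML, where each higher layer points back to word-form IDs instead of repeating the text. A minimal Python sketch using the standard library's ElementTree, assuming element and attribute names (`wf`, `wid`, `term`, `tid`, `span`, `target`) loosely modeled on KAF's word-form and term layers; treat them and the POS letters as illustrative, not as the normative KAF schema.

```python
import xml.etree.ElementTree as ET

def build_kaf(tokens):
    """Build a KAF-like document for one sentence.

    tokens: list of (word_form, lemma, pos) triples. Each term refers back to
    its word form by ID (stand-off annotation) instead of repeating the text.
    """
    kaf = ET.Element("KAF")
    text = ET.SubElement(kaf, "text")    # layer: tokenization / word forms
    terms = ET.SubElement(kaf, "terms")  # layer: lemmas and POS tags
    for i, (form, lemma, pos) in enumerate(tokens, start=1):
        wf = ET.SubElement(text, "wf", {"wid": f"w{i}", "sent": "1"})
        wf.text = form
        term = ET.SubElement(terms, "term", {"tid": f"t{i}", "lemma": lemma, "pos": pos})
        span = ET.SubElement(term, "span")
        ET.SubElement(span, "target", {"id": f"w{i}"})
    return kaf

doc = build_kaf([("The", "the", "D"), ("emissions", "emission", "N"), ("rose", "rise", "V")])
print(ET.tostring(doc, encoding="unicode"))
```

Further layers (chunks, dependencies, semantic annotation) would be added as sibling elements that reference the same term or word-form IDs.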

Semantic annotation
KAF Level 2 (SemKAF): a semantic annotation format for:
– Named Entity Recognition (time, events, quantities, …)
– Word Sense Disambiguation (D-WSD)
– Semantic Role Labeling (SRL), no synsets
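The slide lists word sense disambiguation without detailing the method. To illustrate what the step does, here is a sketch of simplified Lesk, a classic gloss-overlap method swapped in for illustration; it is not claimed to be KYOTO's D-WSD. The sense IDs, glosses and stopword list are hypothetical.

```python
import re

# Hypothetical mini sense inventory: lemma -> {sense id: gloss}.
SENSES = {
    "emission": {
        "emission#release": "the release of gas or other substances into the air",
        "emission#broadcast": "the act of sending out a radio or television signal",
    },
}
STOPWORDS = {"the", "a", "an", "of", "or", "into", "in", "and", "other", "was"}

def content_words(text):
    """Lowercased alphabetic tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(lemma, context):
    """Return the sense whose gloss shares the most content words with the context.
    Ties go to the sense listed first."""
    ctx = content_words(context)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in SENSES[lemma].items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

print(simplified_lesk("emission", "the emission of greenhouse gas into the air"))
# emission#release
```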

KAF annotation: WSD

Data formats
Level of annotation → standard format:
1. Morpho-syntactic annotation
2. Semantic annotation
(levels 1 and 2: KAF, derived from MAF, SynAF and SemAF)
3. Term representation → TMF
4. Fact annotation → KAF
5. Wordnets → Wordnet-LMF
6. Ontologies → OWL

Knowledge mining

Knowledge mining
Concept mining (Tybots):
– extract terms and relations in a language
– map the terms to an existing wordnet
– ontologize terms to concepts and axioms
Fact mining (Kybots):
– define logical patterns
– define expression rules in a language

What Tybots do
Input: text documents
Linguistic processors generate KAF annotation (sequential):
– morpho-syntactic analysis
– semantic roles
– named entities
– wordnet and ontology mappings
Output: term hierarchies in TMF (generic):
– structural parent relations
– quantified structural and semantic relations
– statistical data
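One way to obtain structural parent relations is to treat each multiword term as a child of the term left after dropping its leftmost modifier, so "agricultural area" falls under "area". A minimal sketch assuming English, head-final noun phrases given as lemma strings; the function name and the nested counting scheme are hypothetical, not Tybot's actual algorithm.

```python
from collections import Counter

def build_term_hierarchy(noun_phrases):
    """Return (parent_of, freq) for a list of lemmatized English noun phrases.

    Hypothetical scheme: a multiword term's structural parent is the term left
    after dropping its leftmost modifier ("agricultural area" -> "area").
    freq counts every occurrence of a term, including nested ones.
    """
    parent_of = {}
    freq = Counter()
    for phrase in noun_phrases:
        words = phrase.split()
        for start in range(len(words)):
            term = " ".join(words[start:])
            freq[term] += 1
            if start + 1 < len(words):
                parent_of[term] = " ".join(words[start + 1:])
    return parent_of, freq

parents, freq = build_term_hierarchy(["greenhouse gas", "agricultural area", "gas", "emission"])
print(parents)  # {'greenhouse gas': 'gas', 'agricultural area': 'area'}
```

Counting nested occurrences lets a head such as "gas" accumulate evidence from all of its more specific terms, which is one simple kind of statistical data to attach to the hierarchy.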

Concept mining by Tybots (figure)
– Source documents are processed by linguistic processors into a morpho-syntactic analysis: [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP
– Synthesize: a term hierarchy built from the "of" and "in" relations (parent > child): emission; gas > greenhouse gas; area > agricultural area; CO2
– Conceptual modeling against the English Wordnet: emission:2, emission:3, natural process:1, gas:1, greenhouse gas:1, substance:1, area:1, rural area:1, geographical area:1, region:3, location:3, farmland:2
– Ontologize: concepts such as Abstract, Physical, Substance, H2O, CO2, GreenhouseGas, Process, Chemical Reaction, CO2Emission, WaterPollution, GlobalWarming
– Axiomatize: (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1)
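The figure maps terms to English Wordnet synsets. A minimal sketch of one plausible strategy, assumed here and not taken from the slides: use an exact match when the term is in the wordnet, otherwise attach the term as a hyponym of the synset of its longest known head. The toy wordnet is hypothetical apart from the sense labels, which are copied from the figure; the figure's own mapping may differ (for instance via farmland:2).

```python
# Hypothetical toy wordnet: lemma -> synset label (labels copied from the figure).
WORDNET = {
    "emission": "emission:2",
    "gas": "gas:1",
    "greenhouse gas": "greenhouse gas:1",
    "area": "area:1",
}

def map_to_wordnet(term):
    """Return (synset, relation) for a lemmatized term.

    Exact match -> "equivalent"; otherwise the synset of the longest known
    right-hand sub-term (the head) -> "hyponym_of"; otherwise (None, "unmapped").
    """
    if term in WORDNET:
        return WORDNET[term], "equivalent"
    words = term.split()
    for start in range(1, len(words)):  # longest suffix first
        head = " ".join(words[start:])
        if head in WORDNET:
            return WORDNET[head], "hyponym_of"
    return None, "unmapped"

for t in ["greenhouse gas", "agricultural area", "CO2"]:
    print(t, map_to_wordnet(t))
```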

What Kybots do
Input:
– KAF annotations of text: sequential and encoded by language
– conceptual frames from the ontology
– expression rules for mapping frames to language: the wordnet of the language and morpho-syntactic mapping rules
Output: a database of facts in FactAF (generic):
– aggregated facts
– inferred facts
– language-neutral

Fact mining
KYBOT = Knowledge Yielding Robot
Logical expressions:
– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1, e1)
Expression rules per language:
– [N[s1] V[e1]]S, e.g. "CO2 is emitted", "fine dust blocks sun-light"
– [N[s1] N[e1]]N, e.g. "CO2 emission", "sun-light blocking"
– [[N[e1]] [prep] [N[s2]]]NP, e.g. "emission of CO2", "sun light blocking by fine dust"
Ontology * Wordnets:
– Capabilities: WNT -> adjectives ("explosive", "toxic"), WNT -> nouns ("explosive", "poison")
– Causes: WNT -> verbs ("eat"), WNT -> nouns ("consumption")
– Process: DamageProcess, ProduceProcess
Kybot compiler:
– kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
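The idea that one logical pattern is realized by several language-specific expression rules can be sketched with two of the English rules above, [N[s1] N[e1]]N and [[N[e1]] [prep] [N[s2]]]NP. The lexicon, the `patient` relation name and the matcher are hypothetical, and variables are simply numbered per match; the point is that both surface forms yield the same language-neutral triples.

```python
# Hypothetical English lexicon: lemma -> (semantic type, ontology class).
LEXICON = {
    "co2": ("substance", "CO2"),
    "methane": ("substance", "Methane"),
    "emission": ("process", "Emission"),
}

def lookup(token, kind):
    """Ontology class of a noun token of the given semantic type, else None."""
    lemma, pos = token
    entry = LEXICON.get(lemma)
    if pos == "N" and entry is not None and entry[0] == kind:
        return entry[1]
    return None

def apply_rules(tokens):
    """tokens: list of (lemma, pos). Returns language-neutral triples."""
    facts, n, i = [], 0, 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        # Rule [N[s] N[e]]N, e.g. "CO2 emission"
        if len(window) >= 2 and lookup(window[0], "substance") and lookup(window[1], "process"):
            sub, proc, step = lookup(window[0], "substance"), lookup(window[1], "process"), 2
        # Rule [[N[e]] [prep] [N[s]]]NP, e.g. "emission of CO2"
        elif (len(window) == 3 and lookup(window[0], "process")
              and window[1][1] == "P" and lookup(window[2], "substance")):
            sub, proc, step = lookup(window[2], "substance"), lookup(window[0], "process"), 3
        else:
            i += 1
            continue
        n += 1
        s, e = f"s{n}", f"e{n}"
        facts += [("instance", s, sub), ("instance", e, proc), ("patient", e, s)]
        i += step
    return facts

print(apply_rules([("co2", "N"), ("emission", "N")]))
print(apply_rules([("emission", "N"), ("of", "P"), ("co2", "N")]))
```

Adding another language would mean adding its lexicon and expression rules while keeping the output triples unchanged.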

Fact mining by Kybots (figure)
– Source documents are processed by linguistic processors into a morpho-syntactic analysis (KAF): [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP
– Kybots combine the ontology (Abstract, Physical, Substance, Process, Chemical Reaction, H2O, CO2, CO2 emission, water pollution), wordnets and linguistic expressions, and generic logical expressions
– Fact analysis by domain semantic role labelling: [the emission]NP → Process: e1; [of greenhouse gases]PP → Patient: s2; [in agricultural areas]PP → Location: a3
– Further steps: time and place aggregation from all relevant phrases and documents, inferencing, adding trust and reliability
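The aggregation step can be illustrated by merging facts that agree on process, patient and location across documents and counting distinct sources as a crude support score. This is a sketch under assumptions: the dictionary keys and the use of a source count as a stand-in for trust and reliability are hypothetical.

```python
from collections import defaultdict

def aggregate(facts):
    """Merge facts that agree on process, patient and location.

    facts: iterable of dicts with hypothetical keys process, patient, location, source.
    Returns {(process, patient, location): number of distinct sources}.
    """
    merged = defaultdict(set)
    for f in facts:
        key = (f["process"], f["patient"], f["location"])
        merged[key].add(f["source"])  # a set, so repeats in one document count once
    return {key: len(sources) for key, sources in merged.items()}

facts = [
    {"process": "Emission", "patient": "GreenhouseGas", "location": "AgriculturalArea", "source": "doc1"},
    {"process": "Emission", "patient": "GreenhouseGas", "location": "AgriculturalArea", "source": "doc2"},
    {"process": "Emission", "patient": "GreenhouseGas", "location": "AgriculturalArea", "source": "doc2"},
    {"process": "Pollution", "patient": "Water", "location": "Europe", "source": "doc3"},
]
print(aggregate(facts))
```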

Wikyoto

Wikyoto (figure)
– Hidden from the user (Kyoto server): Kytext converts source documents (pdf) into KAF; Tybots produce term networks (DE-TN); Kybots produce facts (FactAF, facts in RDF); wordnets (DE-WN, G-WN; wordnets in LMF) and ontologies (DE-KON, G-KON; ontologies in OWL-DL) are linked to Wikipedia, SUMO, DOLCE, GEO and FrameNet
– Shown to the user: text evidence such as ".... populations declined.....terrestrial and marine species.. in forests.....declined" and ".... populations such as terrestrial and marine species.....", with a simplified term fragment (population, marine species, terrestrial species) and a simplified ontology fragment (Population, Group)
– Interview questions, e.g. "Do populations consist of marine species?", "Do populations always consist of marine species?", "Are terrestrial species a type of populations?", "Are terrestrial species never marine species?"
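Wikyoto hides the formal model and asks the expert plain questions instead. A minimal sketch of that interaction, assuming candidate relations arrive as (term, relation, term) triples from text mining; the relation names and question templates are hypothetical, chosen so that they reproduce example questions from the figure.

```python
# Hypothetical templates turning candidate relations into yes/no questions.
TEMPLATES = {
    "subclass_of": "Are {a} a type of {b}?",
    "consists_of": "Do {a} consist of {b}?",
    "disjoint_with": "Are {a} never {b}?",
}

def interview_questions(candidates):
    """Render each (a, relation, b) candidate as a plain-language question."""
    return [TEMPLATES[rel].format(a=a, b=b) for a, rel, b in candidates]

def confirmed(candidates, answers):
    """Keep only the candidates the expert answered 'yes' to."""
    return [c for c, yes in zip(candidates, answers) if yes]

candidates = [
    ("populations", "consists_of", "marine species"),
    ("terrestrial species", "subclass_of", "populations"),
    ("terrestrial species", "disjoint_with", "marine species"),
]
for q in interview_questions(candidates):
    print(q)
```

Only confirmed candidates would then be written into the formal ontology, so the expert never has to edit axioms directly.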

Kyoto Knowledge Base (figure): wordnets for Italian (WnIT), English (WnEN), Basque (WnEU), Dutch (WnNL), Japanese (WnJP), Chinese (WnCH) and Spanish (WnES), each with a domain part, and an ontology with a domain part

Potential impact

Ultimate goal
Global standardization and anchoring of meaning such that:
– machines can start to approach text understanding, so that the semantic web connects to the current web
– communities can dynamically maintain knowledge, concepts and their terms in an easy-to-use system
– cross-linguistic and cross-cultural sharing and communication of knowledge is enabled
Establish a Global Wordnet Grid: a formalization of Wikipedia for humans AND machines across languages

Inter-lingual ontology: Global WordNet Grid (figure)
Shared ontology concepts (Object, Device, TransportDevice) are linked to words in each language:
– English: vehicle, car, train
– Czech: dopravní prostředek, auto, vlak
– French: véhicule, voiture, train
– Estonian: liiklusvahend, auto, killavoor
– German: Fahrzeug, Auto, Zug
– Spanish: vehículo, auto, tren
– Italian: veicolo, auto, treno
– Dutch: voertuig, auto, trein
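Because words in every language point to shared concepts, translating a word amounts to going up to its concept and back down into another language. A minimal sketch using a subset of the words from the figure; linking "vehicle" to `TransportDevice` is read off the figure, while the concept labels `Car` and `Train` are hypothetical (the figure names only TransportDevice, Device and Object).

```python
# Words from the figure (subset of languages) linked to shared concepts.
# "Car" and "Train" are hypothetical concept labels.
GRID = {
    "en": {"vehicle": "TransportDevice", "car": "Car", "train": "Train"},
    "nl": {"voertuig": "TransportDevice", "auto": "Car", "trein": "Train"},
    "de": {"Fahrzeug": "TransportDevice", "Auto": "Car", "Zug": "Train"},
    "it": {"veicolo": "TransportDevice", "auto": "Car", "treno": "Train"},
    "es": {"vehículo": "TransportDevice", "auto": "Car", "tren": "Train"},
}

def translate(word, source, target):
    """Map a word to its shared concept, then list the target-language words for it."""
    concept = GRID[source].get(word)
    if concept is None:
        return []
    return sorted(w for w, c in GRID[target].items() if c == concept)

print(translate("trein", "nl", "de"))    # ['Zug']
print(translate("vehicle", "en", "es"))  # ['vehículo']
```

With n languages, linking each to the shared concepts needs n mappings rather than a separate bilingual dictionary for every language pair.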

Linking Open Data dataset cloud (figure): domain wordnets (several for environment terms, plus legal, medical and sailing terms), domain ontologies (environment, legal, medical and sailing concepts) and fact collections (environment, legal and medical facts)

Conclusions

Kyoto main assets
– Wiki platform (WIKYOTO) for connecting, transferring and controlling knowledge and information across people and computers
– Term yielding robots (TYBOT): software that extracts terms and concepts from documents
– Knowledge yielding robots (KYBOT): fact extraction software that generates a comprehensive list of facts from a collection of sources
– Fact repositories and fact alert: reports changes in facts across a collection of sources
– Domain WORDNETS and domain ONTOLOGIES
– Creating the backbone for the Global Wordnet Grid
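A fact alert can be reduced to comparing two successive harvests of facts from the same sources. A minimal sketch, assuming facts are hashable tuples; the tuple layout and function name are hypothetical.

```python
def fact_alert(previous, current):
    """Report facts that appeared or disappeared between two harvests.

    previous, current: sets of hashable fact tuples (hypothetical layout:
    (process, involves, where, when)).
    """
    return {"added": sorted(current - previous), "removed": sorted(previous - current)}

last_week = {("Emission", "CO2", "Europe", "2007")}
this_week = {("Emission", "CO2", "Europe", "2007"), ("Emission", "CO2", "Europe", "2008")}
print(fact_alert(last_week, this_week))
```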

What makes KYOTO unique?
– It integrates knowledge engineering, language engineering, wikis, term and concept learning, fact mining from text in and across languages, and standardization
– The direct relation between concept modeling and text mining makes the modeling effort worthwhile
– The Wikyoto community tool hides the technology and the complex knowledge and language representations
– It is operated by community people, not by knowledge engineers and language technology people, and so exploits the massive labor force of communities all over the world

What makes KYOTO unique?
– Text mining and ontology learning are usually developed for separate languages; KYOTO is multilingual, cross-lingual and cross-cultural, aiming at cross-lingual and cross-cultural semantic interoperability
– Text mining and ontology learning are often limited to a specific domain and/or application; KYOTO works for any domain and application
– Text mining and ontology learning do not relate terms and concepts to generic language and knowledge resources; KYOTO anchors the knowledge of a community to the general vocabulary and, likewise, to other communities

Free, open source license (GPL) Thank you for your attention

Contribution of KYOTO
Hundreds of thousands of sources (html, xls, pdf) in the environment domain, in many different languages, spread all over the world and changing every day:
– TYBOT: KYOTO learns terms and concepts from text documents, stored as structures that people and computers understand (wordnets of environment terms, an ontology of environment concepts)
– WIKYOTO: KYOTO delivers a Web 2.0 environment for community-based control; it connects people across languages and cultures and establishes consensus and knowledge transition
– KYBOT: KYOTO enables semantic search and fact extraction (environment facts); software can partially understand language and exploit Web 1.0 data, with understanding helped by the terms and concepts defined for each language