OKKAM – Enabling the Web of Entities A SCALABLE AND SUSTAINABLE SOLUTION FOR SYSTEMATIC AND GLOBAL IDENTIFIER REUSE IN DECENTRALIZED INFORMATION ENVIRONMENTS.


2 OKKAM – Enabling the Web of Entities A SCALABLE AND SUSTAINABLE SOLUTION FOR SYSTEMATIC AND GLOBAL IDENTIFIER REUSE IN DECENTRALIZED INFORMATION ENVIRONMENTS KnowDive Seminar April 11, 2007 Trento, Italy

3 Background: KR goes Global Knowledge representation is a field which currently seems to have the reputation of being initially interesting, but which did not seem to shake the world to the extent that some of its proponents hoped. It made sense but was of limited use on a small scale, but never made it to the large scale. This is exactly the state which the hypertext field was in before the Web. Each field had made certain centralist assumptions -- if not in the philosophy, then in the implementations, which prevented them from spreading globally. But each field was based on fundamentally sound ideas about the representation of knowledge. The Semantic Web is what we will get if we perform the same globalization process to Knowledge Representation that the Web initially did to Hypertext. We remove the centralized concepts of absolute truth, total knowledge, and total provability, and see what we can do with limited knowledge. [Tim Berners-Lee, What the Semantic Web can represent, 1998]

4 In practice … [Figure: the Web of Links, with Web pages (www.google.com, www.unitn.it, www.l3s.org, www.ryanair.com, www.paolobouquet.net, ockham.org, www.trento.it) connected by href links, shown alongside the Web of Meanings, with entities (Bouquet, UniTN, Niederee, L3S, VIKEF) connected by relations (Works-for, Knows, Coordinates, Is_involved_in)]

5 What went wrong (personal view) The Web of Meanings (the Semantic Web) is not happening, at least not in the way the WWW happened during the 1990s. Enabling factors for the Web of Links (the WWW): ◦ Any available resource has a global URL, which allows Web clients to address it ◦ The same identifier can be resolved to retrieve the resource through the HTTP protocol (running on top of TCP/IP) ◦ Creating href links is easy on top of this infrastructure What about the Web of Meanings? ◦ Non-addressable resources have no infrastructure supporting the use of global identifiers (more about this later) ◦ Non-addressable resources cannot be retrieved ◦ Creating global links between non-addressable resources is difficult Outcome: we lack the preconditions for the Web of Meanings to happen!

6 Further (strategic) errors On top of these infrastructural issues, a big strategic error was made (personal opinion!): ◦ The AI people came in and tried to “recycle” their logical know-how on the Semantic Web ◦ The plan was to build the Semantic Web starting from representations (theories, currently known as ontologies) and not from resources (entities) ◦ This led to a scalability issue: reasoning is hard even for local theories, so forget about going global! [Heard about semantic heterogeneity, ontology mapping, alignment, distributed reasoning, …?]

7 My vision Back to the building blocks: entities! ◦ First, create the infrastructure for enabling in practice a global space of identifiers (e.g. URIs) ◦ Second, show how we can create value simply from linking globally identified entities ◦ Third, specify vocabularies and ontologies for (subsets of) globally identified entities ◦ Fourth, link ontologies to each other on top of the already integrated domain of globally identified entities Hopefully, this will lead to the Web of Entities, namely a global digital space in which any knowledge expressed in any local web of entities can be seamlessly integrated and reasoned about

8 OKKAM overall goal The goal of the OKKAM project is to implement the first part of this plan: ◦ Establishing a scalable and sustainable infrastructure for the storage and reuse of global identifiers for non-addressable entities in decentralized information environments ◦ Enabling different forms of OKKAMization of old and new content ◦ Creating a primitive index which links global identifiers to OKKAMized content ◦ Building applications which can showcase the potential value of this approach

9 But why “OKKAM”? Ockham's Razor (14th century): “entities should not be multiplied beyond necessity” OKKAM’s Razor (21st century): “entity identifiers should not be multiplied beyond necessity”

10 1. Infrastructure Cornerstone: large-scale Entity Repository (ER) ◦ Architecture: distributed, supports federation of local ERs, replicated (no single point of failure) ◦ ER vs. Entity Base (or Knowledge Base): supporting identifier reuse vs. collecting and providing knowledge about entities ◦ Basic schema: set of attribute/value pairs (called “labels”) with no predefined semantics Features: ◦ Size: unbelievable (billions of identifiers + profiles stored) ◦ Network traffic: massive (up to millions of requests per minute) ◦ Quality: hard to ensure ◦ Update: grows monotonically (no deletion). Aging mechanism?
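The repository idea on this slide can be illustrated with a minimal in-memory sketch. It is not the OKKAM implementation: the class name, the `http://example.org/entity/` prefix, and the exact-match lookup are assumptions made for illustration. It only shows the two properties the slide names, a profile made of untyped attribute/value labels and monotonic growth (there is no delete operation), together with the "look up before minting" behaviour that identifier reuse implies.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class EntityRepository:
    """Toy entity repository: identifiers plus untyped attribute/value labels."""

    # Hypothetical URI prefix, not taken from the slides.
    prefix: str = "http://example.org/entity/"
    profiles: dict = field(default_factory=dict)

    def find(self, labels):
        """Return ids whose profile contains every given attribute/value pair."""
        return [
            entity_id
            for entity_id, profile in self.profiles.items()
            if all(profile.get(key) == value for key, value in labels.items())
        ]

    def get_or_create(self, labels):
        """Reuse an existing identifier if one matches, otherwise mint a new one."""
        matches = self.find(labels)
        if matches:
            return matches[0]
        entity_id = self.prefix + uuid.uuid4().hex
        self.profiles[entity_id] = dict(labels)
        return entity_id

    # Deliberately no delete method: the store only grows.


repo = EntityRepository()
first = repo.get_or_create({"name": "Paolo Bouquet", "affiliation": "UniTN"})
second = repo.get_or_create({"name": "Paolo Bouquet", "affiliation": "UniTN"})
```

Here `first` and `second` are the same identifier, so the second request does not multiply identifiers. A real repository would need approximate entity matching rather than the exact comparison used in this sketch.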

11 2. OKKAMization Enabling the runtime or “ex post” OKKAMization of data in various formats (from unstructured to structured) Examples: ◦ Office tools (named entity recognition and annotation) ◦ Databases (annotating records with OKKAM ids) ◦ Ontologies (replacing local URIs with global URIs) ◦ HTML pages ◦ … Objective: creating the critical mass of OKKAMized content
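As a rough illustration of the ontology case above (replacing local URIs with global URIs), the following sketch rewrites subject and object terms of simple triples using a lookup table. The function name, the triple representation as Python tuples, and all URIs are hypothetical; how the local-to-global mapping is obtained is outside the sketch.

```python
def okkamize(triples, local_to_global):
    """Replace local subject/object URIs with global ones where a mapping exists."""

    def swap(term):
        # Terms without a known global identifier are left unchanged.
        return local_to_global.get(term, term)

    return [(swap(s), p, swap(o)) for s, p, o in triples]


# Hypothetical local data and mapping, for illustration only.
local_triples = [
    ("local:bouquet", "works-for", "local:unitn"),
    ("local:bouquet", "knows", "local:niederee"),
]
mapping = {
    "local:bouquet": "http://example.org/entity/1",
    "local:unitn": "http://example.org/entity/2",
}
global_triples = okkamize(local_triples, mapping)
```

Predicates are left untouched on purpose: in the plan described earlier, vocabularies and ontologies are aligned only after the entities themselves share global identifiers.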

12 3. Indexing The model of knowledge devolution: ◦ The ER stores only IDs + simple labels ◦ Knowledge about entities must be developed outside Idea: use OKKAM to store and index pointers to external resources which mention an OKKAM id Different types of pointers: ◦ Informal: pointing to a document which contains an OKKAM id as a simple annotation for a piece of text ◦ Formal: pointing to formal resources (e.g. ontologies) in which an OKKAM id is used as a URI of an instance Using this index also for entity resolution / matching
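The pointer index described on this slide could be sketched as a mapping from an OKKAM id to the external resources that mention it, tagged as informal or formal. This is an assumption-laden toy: the class and method names and the example URLs are invented for illustration, and nothing here reflects the actual OKKAM index design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    resource: str  # location of the external resource mentioning the id
    kind: str      # "informal" (annotated text) or "formal" (e.g. ontology instance)


class MentionIndex:
    """Toy index from OKKAM ids to pointers at external resources."""

    KINDS = ("informal", "formal")

    def __init__(self):
        self._by_id = defaultdict(set)

    def add(self, okkam_id, resource, kind):
        if kind not in self.KINDS:
            raise ValueError(f"unknown pointer kind: {kind!r}")
        self._by_id[okkam_id].add(Pointer(resource, kind))

    def lookup(self, okkam_id, kind=None):
        """Return the sorted resources mentioning okkam_id, optionally by kind."""
        pointers = self._by_id.get(okkam_id, set())
        return sorted(p.resource for p in pointers if kind is None or p.kind == kind)


index = MentionIndex()
index.add("E1", "http://example.org/docs/report.html", "informal")
index.add("E1", "http://example.org/onto/people.owl", "formal")
formal_only = index.lookup("E1", kind="formal")
```

Keeping only pointers in the index matches the devolution model on the slide: the knowledge itself stays in the external resources.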

13 Okkam Architecture

14 Okkam Applications Three exemplary applications on top of the OKKAM infrastructure: ◦ Entity-centric search engine ◦ Entity-centric organizational knowledge management ◦ Multimedia authoring based on OKKAMized content Purpose: ◦ Show benefits of entity-centric approach ◦ Trigger the development of further applications ◦ Contribute to building a community around the OKKAM approach

15 Entity-centric search engine Starting point: Different types of OKKAMized content collections, e.g. knowledge bases, document collections, metadata repositories, image collections, etc. Goal: ◦ Enable completely new methods for browsing and searching large collections of data and documents (including the Web itself) ◦ Enable new forms of intelligent entity-centric search that exploit the OKKAMization of content RTD Challenges ◦ Retrieval indexing that takes into account the OKKAM IDs ◦ Combination of entity-centric and semantic search ◦ Combined ranking ◦ Adequate combination and visualization of the results from different kinds of resources (e.g. knowledge base + document collection) ◦ ...
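The slide lists combined ranking as an open challenge and does not say how it should be done. Purely as an illustration of what "combined" could mean, the sketch below mixes a keyword-overlap score with a binary "document mentions the queried OKKAM id" score. The function name, the linear weighting with `alpha`, and the sample data are all assumptions, not a method proposed in the talk.

```python
def combined_rank(docs, query_terms, query_entity, alpha=0.5):
    """Rank documents by a weighted mix of keyword overlap and entity match.

    docs maps a document id to a pair (set of terms, set of OKKAM ids).
    """
    query_terms = set(query_terms)
    scored = []
    for doc_id, (terms, entities) in docs.items():
        # Fraction of query terms that occur in the document.
        keyword = len(terms & query_terms) / len(query_terms) if query_terms else 0.0
        # 1.0 if the document is annotated with the queried entity id.
        entity = 1.0 if query_entity in entities else 0.0
        scored.append((alpha * keyword + (1 - alpha) * entity, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical collection: "bouquet" is ambiguous as a keyword, "E1" is not.
docs = {
    "d1": ({"bouquet", "flowers"}, set()),
    "d2": ({"bouquet", "trento"}, {"E1"}),
    "d3": ({"trento"}, {"E1"}),
}
ranking = combined_rank(docs, ["bouquet"], "E1")
```

In this toy collection the document that matches both the keyword and the entity id ranks first, which is the kind of disambiguation an entity-centric search is meant to provide.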

16 Entity-centric organizational knowledge management Idea: Exploit OKKAM benefits in an organizational context ◦ Managing and structuring corporate knowledge using entity identifiers as pivots for aggregating information not only from structured sources, but also from poorly structured or unstructured sources, like electronic documents, email messages, slide presentations, video and audio files, etc. ◦ Using and interlinking a local organizational entity repository

17 Multimedia Content Authoring Idea: Creation of an authoring environment which makes use of the OKKAMization of content Variants: an authoring environment which helps the scientific author by providing targeted additional information during the writing process; support for the creation of value-added artefacts on the basis of OKKAMized content (text, video) ◦ Creation template for task-specific (selective) enrichment with information about the entities found in the content object (“semantic infusion”) ◦ Tool for publishers, broadcasters

18 Example: Semantic Infusion

19 RTD Challenges ◦ Building a scalable entity repository in which a massive and growing number of entity IDs and profiles can be collected, stored and indexed ◦ Guaranteeing security and privacy for the data stored in the repository ◦ Making the repository efficiently searchable and usable by Web users as well as through APIs ◦ Supporting effective and reliable methods for entity matching and for ranking results ◦ Enabling several channels through which the repository can be populated, either manually or automatically (import filters, crawling, harvesting, …) ◦ Supporting the integration of OKKAM with a variety of content creation applications (e.g. text editors, office applications, HTML and XML editors, ontology editors, DBMS, etc.) ◦ Ensuring the quality of data in the repository ◦ Enabling a virtuous circle of trust and collaboration with users

20 Conclusions There are many critical issues: ◦ Size and performance ◦ Quality of entity search and matching ◦ Critical mass of data and applications ◦ Trust and community building ◦ Sustainability and exploitation ◦ … … but it’s fun and I want to give it a try!!

