OKKAM – Enabling the Web of Entities A SCALABLE AND SUSTAINABLE SOLUTION FOR SYSTEMATIC AND GLOBAL IDENTIFIER REUSE IN DECENTRALIZED INFORMATION ENVIRONMENTS.


OKKAM – Enabling the Web of Entities
A SCALABLE AND SUSTAINABLE SOLUTION FOR SYSTEMATIC AND GLOBAL IDENTIFIER REUSE IN DECENTRALIZED INFORMATION ENVIRONMENTS
KnowDive Seminar, April 11, 2007, Trento, Italy

Background: KR goes Global
“Knowledge representation is a field which currently seems to have the reputation of being initially interesting, but which did not seem to shake the world to the extent that some of its proponents hoped. It made sense but was of limited use on a small scale, but never made it to the large scale. This is exactly the state which the hypertext field was in before the Web. Each field had made certain centralist assumptions -- if not in the philosophy, then in the implementations, which prevented them from spreading globally. But each field was based on fundamentally sound ideas about the representation of knowledge. The Semantic Web is what we will get if we perform the same globalization process to Knowledge Representation that the Web initially did to Hypertext. We remove the centralized concepts of absolute truth, total knowledge, and total provability, and see what we can do with limited knowledge.”
[Tim Berners-Lee, What the Semantic Web can represent, 1998]

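In practice …
[Figure: the Web of Links (a Web_page such as ockham.org, connected by href) contrasted with the Web of Meanings (entities such as Bouquet, UniTN, Niederee, L3S and VIKEF, connected by relations such as Works-for, Knows, Coordinates and Is_involved_in)]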

What went wrong (personal view)
The Web of Meanings (the Semantic Web) is not happening, at least not the way the WWW happened during the 1990s.
Enabling factors for the Web of Links (the WWW):
◦ Any available resource has a global URL, which allows Web clients to address it
◦ The same identifier can be resolved to retrieve the resource through the HTTP protocol (running on top of TCP/IP)
◦ Creating href links is easy on top of this infrastructure
What about the Web of Meanings?
◦ Non-addressable resources have no infrastructure supporting the use of global identifiers (more about this later)
◦ Non-addressable resources cannot be retrieved
◦ Creating global links between non-addressable resources is difficult
Outcome: we lack the preconditions for the Web of Meanings to happen!

Further (strategic) errors
On top of these infrastructural issues, a big strategic error was made (personal opinion!):
◦ The AI people came in and tried to “recycle” their logical know-how on the Semantic Web
◦ The plan was to build the Semantic Web starting from representations (theories, currently known as ontologies) and not from resources (entities)
◦ This led to a scalability issue: reasoning is hard for local theories, forget about going global! [Heard about semantic heterogeneity, ontology mapping, alignment, distributed reasoning, …?]

My vision
Back to the building blocks: entities!
◦ First, create the infrastructure that enables, in practice, a global space of identifiers (e.g. URIs)
◦ Second, show how we can create value simply by linking globally identified entities
◦ Third, specify vocabularies and ontologies for (subsets of) globally identified entities
◦ Fourth, link ontologies to each other on top of the already integrated domain of globally identified entities
Hopefully, this will lead to the Web of Entities, namely a global digital space in which any knowledge expressed in any local web of entities can be seamlessly integrated and reasoned about.

OKKAM overall goal
The goal of the OKKAM project is to implement the first part of this plan:
◦ Establishing a scalable and sustainable infrastructure for the storage and reuse of global identifiers for non-addressable entities in decentralized information environments
◦ Enabling different forms of OKKAMization of old and new content
◦ Creating a primitive index which links global identifiers to OKKAMized content
◦ Building applications which can showcase the potential value of this approach

But why “OKKAM”?
Ockham's Razor (14th century): “entities should not be multiplied beyond necessity”
OKKAM’s Razor (21st century): “entity identifiers should not be multiplied beyond necessity”

1. Infrastructure
Cornerstone: large-scale Entity Repository (ER)
Architecture: distributed, supports federation of local ERs, replicated (no single point of failure)
ER vs. Entity Base (or Knowledge Base): supporting reuse vs. collecting and providing knowledge about entities
Basic schema: set of attribute/value pairs (called “labels”) with no predefined semantics
Features:
◦ Size: huge (billions of identifiers + profiles stored)
◦ Network traffic: massive (up to millions of requests per minute)
◦ Quality: hard to ensure
◦ Update: grows monotonically (no deletion). Aging mechanism?
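The basic schema described on this slide (identifiers plus free-form attribute/value labels, with monotonic growth) can be illustrated with a small sketch. This is only a toy, in-memory stand-in for a single local ER: the class name `EntityRepository`, its methods, and the `http://example.org/entity/` namespace are assumptions made for illustration, not the OKKAM API, and distribution, federation and replication are not modelled at all.

```python
import uuid
from typing import Dict, List, Tuple

# A label is an (attribute, value) pair with no predefined semantics.
Label = Tuple[str, str]


class EntityRepository:
    """Toy in-memory stand-in for one local ER (hypothetical API)."""

    def __init__(self, namespace: str = "http://example.org/entity/") -> None:
        self._namespace = namespace
        self._profiles: Dict[str, List[Label]] = {}

    def mint(self, labels: List[Label]) -> str:
        """Create a new identifier with an initial profile of labels."""
        entity_id = self._namespace + uuid.uuid4().hex
        self._profiles[entity_id] = list(labels)
        return entity_id

    def add_label(self, entity_id: str, attribute: str, value: str) -> None:
        """Append a label; profiles only grow, nothing is ever deleted."""
        self._profiles[entity_id].append((attribute, value))

    def profile(self, entity_id: str) -> List[Label]:
        """Return a copy of the labels stored for an identifier."""
        return list(self._profiles[entity_id])

    def find(self, text: str) -> List[str]:
        """Return identifiers with a label value containing `text`,
        so that an existing identifier can be reused instead of minting
        a new one."""
        needle = text.lower()
        return [
            entity_id
            for entity_id, labels in self._profiles.items()
            if any(needle in value.lower() for _, value in labels)
        ]
```

A client would typically call `find` first and only fall back to `mint` when no suitable identifier exists, which is the reuse behaviour that the razor on the previous slide asks for.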

2. OKKAMization
Enabling the runtime or “ex post” OKKAMization of data in various formats (from unstructured to structured)
Examples:
◦ Office tools (named entity recognition and annotation)
◦ Databases (annotating records with OKKAM ids)
◦ Ontologies (replacing local URIs with global URIs)
◦ HTML pages
◦ …
Objective: creating the critical mass of OKKAMized content
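As a rough illustration of the ontology case (replacing local URIs with global URIs), the sketch below rewrites subject and object terms in a list of triples. The function name `okkamize`, the plain-tuple triple representation and all URIs are assumptions chosen for the example; in practice the mapping would come from a lookup against the repository.

```python
from typing import Dict, List, Tuple

# (subject, predicate, object) as plain strings; a deliberately simple model.
Triple = Tuple[str, str, str]


def okkamize(triples: List[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Replace local URIs in subject/object position with global ids.

    Terms without an entry in `mapping` (literals, unknown URIs) are
    left unchanged, and predicates are never rewritten.
    """
    def swap(term: str) -> str:
        return mapping.get(term, term)

    return [(swap(s), p, swap(o)) for s, p, o in triples]


# Hypothetical local data and a hypothetical local-to-global mapping.
local_triples = [
    ("http://local.example/people#bouquet", "works-for",
     "http://local.example/orgs#unitn"),
    ("http://local.example/people#bouquet", "name", "Bouquet"),
]
id_mapping = {
    "http://local.example/people#bouquet": "http://example.org/entity/e1",
    "http://local.example/orgs#unitn": "http://example.org/entity/e2",
}
okkamized = okkamize(local_triples, id_mapping)
```

Two datasets that have been rewritten against the same mapping source share identifiers, so they can be merged by simple concatenation rather than by ontology alignment.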

3. Indexing
The model of knowledge devolution:
◦ The ER stores only IDs + simple labels
◦ Knowledge about entities must be developed outside
Idea: use OKKAM to store and index pointers to external resources which mention an OKKAM id
Different types of pointers:
◦ Informal: pointing to a document which contains an OKKAM id as a simple annotation for a piece of text
◦ Formal: pointing to formal resources (e.g. ontologies) in which an OKKAM id is used as a URI of an instance
Using this index also for entity resolution / matching
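The pointer index described on this slide can be sketched as a mapping from an OKKAM id to the external resources that mention it, with each pointer tagged as informal or formal. The names `Pointer` and `MentionIndex` and the example URLs are hypothetical; this shows only the data structure, not how pointers would be harvested.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional

POINTER_KINDS = ("informal", "formal")


@dataclass(frozen=True)
class Pointer:
    resource_url: str  # external resource that mentions the id
    kind: str          # "informal" (text annotation) or "formal" (instance URI)


class MentionIndex:
    """Maps an entity id to pointers at resources that mention it."""

    def __init__(self) -> None:
        self._by_id: DefaultDict[str, List[Pointer]] = defaultdict(list)

    def add(self, entity_id: str, resource_url: str, kind: str) -> None:
        if kind not in POINTER_KINDS:
            raise ValueError(f"unknown pointer kind: {kind!r}")
        self._by_id[entity_id].append(Pointer(resource_url, kind))

    def lookup(self, entity_id: str, kind: Optional[str] = None) -> List[Pointer]:
        """Return all pointers for an id, optionally filtered by kind."""
        pointers = self._by_id.get(entity_id, [])
        return [p for p in pointers if kind is None or p.kind == kind]
```

Keeping only pointers in the index is consistent with the devolution model: the knowledge itself stays in the external documents and ontologies, and the index merely says where to look.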

Okkam Architecture

Okkam Applications
Three exemplary applications on top of the OKKAM infrastructure:
◦ Entity-centric search engine
◦ Entity-centric organizational knowledge management
◦ Multimedia content authoring
Purpose:
◦ Show the benefits of the entity-centric approach
◦ Trigger the development of further applications
◦ Contribute to building a community around the OKKAM approach

Entity-centric search engine
Starting point: different types of OKKAMized content collections, e.g. knowledge bases, document collections, metadata repositories, image collections, etc.
Goal:
◦ Enable completely new methods for browsing and searching large collections of data and documents (including the Web itself)
◦ Enable new forms of intelligent entity-centric search that exploit the OKKAMization of content
RTD Challenges:
◦ Retrieval indexing that takes the OKKAM IDs into account
◦ Combination of entity-centric and semantic search
◦ Combined ranking
◦ Adequate combination and visualization of the results from different kinds of resources (e.g. knowledge base + document collection)
◦ ...

Entity-centric organizational knowledge management
Idea: exploit OKKAM benefits in an organizational context
◦ Managing and structuring corporate knowledge using entity identifiers as pivots for aggregating information, not only from structured sources but also from poorly structured or unstructured sources such as electronic documents, messages, slide presentations, video and audio files, etc.
◦ Using and interlinking a local organizational entity repository

Multimedia Content Authoring
Idea: creation of an authoring environment which makes use of the OKKAMization of content
Variants:
◦ Authoring environment which helps the scientific author by providing targeted additional information during the writing process
◦ Support for the creation of value-added artefacts on the basis of OKKAMized content (text, video)
  ◦ creation template for task-specific (selective) enrichment with information about the entities found in the content object (“semantic infusion”)
  ◦ tool for publishers, broadcasters

Example: Semantic Infusion

RTD Challenges
◦ Building a scalable entity repository in which a massive and growing number of entity IDs and profiles can be collected, stored and indexed
◦ Guaranteeing security and privacy for the data stored in the repository
◦ Making the repository efficiently searchable and usable by Web users as well as through APIs
◦ Supporting effective and reliable methods for entity matching and for ranking results
◦ Enabling several channels through which the repository can be populated, either manually or automatically (import filters, crawling, harvesting, …)
◦ Supporting the integration of OKKAM with a variety of content creation applications (e.g. text editors, office applications, HTML and XML editors, ontology editors, DBMS, etc.)
◦ Ensuring the quality of data in the repository
◦ Enabling a virtuous circle of trust and collaboration with users
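To make the entity matching and ranking challenge concrete, here is a deliberately naive baseline: candidate profiles are scored by the Jaccard overlap between the tokens of their label values and those of the query. The slides do not say which matching method OKKAM uses, so the scoring function, the names `label_tokens` and `rank_candidates`, and the short ids are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple

Label = Tuple[str, str]  # (attribute, value)


def label_tokens(labels: List[Label]) -> Set[str]:
    """Lower-cased whitespace tokens of all label values (attributes ignored)."""
    return {tok for _, value in labels for tok in value.lower().split()}


def rank_candidates(
    query: List[Label], profiles: Dict[str, List[Label]]
) -> List[Tuple[str, float]]:
    """Rank stored profiles by Jaccard similarity to the query labels.

    Returns (entity_id, score) pairs with score > 0, best first;
    ties are broken by entity id so the order is deterministic.
    """
    query_tokens = label_tokens(query)
    scored = []
    for entity_id, labels in profiles.items():
        candidate_tokens = label_tokens(labels)
        union = query_tokens | candidate_tokens
        score = len(query_tokens & candidate_tokens) / len(union) if union else 0.0
        if score > 0:
            scored.append((entity_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Because labels have no predefined semantics, this baseline ignores attribute names entirely; a realistic matcher would need to cope with spelling variants, ambiguous names and the scale described earlier.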

Conclusions
There are many critical issues:
◦ Size and performance
◦ Quality of entity search and matching
◦ Critical mass of data and applications
◦ Trust and community building
◦ Sustainability and exploitation
◦ …
… but it’s fun and I want to give it a try!!