ICLP-09: Enabling serendipitous search on the Web of Data using Prolog
Jan Wielemaker, VU University Amsterdam

Issues addressed
- Recent developments have reshaped the Web:
  - The Web moved from a “Web of documents” to a “Web of data” and a “Web of applications”.
  - “Open” and “Linked” data make massive amounts of data available for processing by machines.
How can we deploy Prolog in this environment?

Overview
- Introducing the semantic search engine “ClioPatria” and the problem it addresses
- Why (not) use Prolog for semantic web applications?
- Processing RDF data
- Applying Prolog in web servers
- Creating interactive web applications
- Wrap-up

Part I: The ClioPatria use case
Integrate the digital collections of multiple museums and connect them to background knowledge.

[Diagram: collection, meta-data schema, vocabularies]

Background knowledge

The Web: documents and links
[Diagram: documents identified by URLs, connected by Web links (untyped hyperlinks)]

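The Semantic, or Data, Web: data and links
[Diagram: resources identified by URLs and connected by typed Web links. The painting “Green Stripe (Mme Matisse)” (Royal Museum of Fine Arts, Copenhagen) is linked by the Dublin Core creator relation to the painter “Henri Matisse” in the Getty ULAN.]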

… a nice graph, but what about semantics? What about structure?

Semantic Web data model: RDF
- 1 fact = R(O1, O2) = a labelled edge O1 -R-> O2 = 1 “triple”
- Many facts = a labelled graph = RDF
- URIs as identifiers; typed relations between typed objects
- Many different syntaxes (XML (W3C), N3, Turtle, graphical, etc.). That doesn't matter: it's a data model.
(Slide by Frank van Harmelen)

Semantic Web data model: RDF Schema
- Hierarchy of types, hierarchy of relations, domain/range constraints
- Simple: no negation, no disjunction, no universal quantification
(Slide by Frank van Harmelen)

Semantic Web data model: OWL and SWRL
- OWL: everything you wanted to say but cannot say in RDF(S): negation, disjunction, cardinality, limited universal quantification, relational algebra (transitivity, symmetry)
- Still no composition of relations (DL-based)
- SWRL: rules with DL concepts as atoms
- OWL variants: Full, DL, Lite
(Slide by Frank van Harmelen)

Structure for thesauri

Structure for works of art

From meta-data to semantic meta-data
- Thesaurus schema mapping (SKOS)
- Meta-data schema mapping (VRA)
- Thesaurus alignment
- Meta-data mapping
Result: 5 collections → 11,000,000 triples

Part of a large cloud of linked data!

The challenge
- How can we use this network for search?
- Can we search better?
- Can we present the results better?

ClioPatria
- A Prolog web server with an RDF store
- Developed to explore this challenge
- Explores the graph using best-first search based on semantic distance
- Clusters the results based on their relation to the query
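
The slide only names the technique. Purely as an illustration of best-first search by semantic distance, the sketch below starts at the literal that matched the keyword, always expands the cheapest node found so far, and stops at a distance cutoff. The predicate weights, the cutoff, the node names and the choice to follow links in both directions are assumptions made for the example, not ClioPatria's actual metric.

```javascript
// Hypothetical cost of following each predicate: lower means "semantically closer".
const WEIGHTS = { label: 1, creator: 5, locatedIn: 8, style: 10 };
const DEFAULT_WEIGHT = 20; // any other predicate is expensive

// Best-first search from `start` over triples [s, p, o], following edges in
// both directions. Returns [node, distance] pairs in order of increasing distance.
function bestFirst(triples, start, maxCost) {
  const adj = new Map();
  const link = (a, p, b) => {
    if (!adj.has(a)) adj.set(a, []);
    adj.get(a).push([p, b]);
  };
  for (const [s, p, o] of triples) {
    link(s, p, o);
    link(o, p, s);
  }

  const best = new Map([[start, 0]]); // cheapest known distance per node
  const done = new Set();
  const frontier = [[0, start]];      // a sorted array stands in for a priority queue
  const result = [];
  while (frontier.length > 0) {
    frontier.sort((a, b) => a[0] - b[0]);
    const [cost, node] = frontier.shift();
    if (done.has(node)) continue;
    done.add(node);
    result.push([node, cost]);
    for (const [p, next] of adj.get(node) || []) {
      const c = cost + (WEIGHTS[p] ?? DEFAULT_WEIGHT);
      if (c <= maxCost && c < (best.get(next) ?? Infinity)) {
        best.set(next, c);
        frontier.push([c, next]);
      }
    }
  }
  return result;
}

// Made-up triples around the keyword match '"Matisse, Henri"'.
const museumTriples = [
  ["ulan:Matisse", "label", '"Matisse, Henri"'],
  ["work:GreenStripe", "creator", "ulan:Matisse"],
  ["work:GreenStripe", "style", "aat:Fauvism"],
  ["work:Other", "style", "aat:Fauvism"],
];
console.log(bestFirst(museumTriples, '"Matisse, Henri"', 25));
```

Raising the cutoff lets the search reach results along longer, less obvious paths (here work:Other, via a shared style), which is where unexpected finds come from.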

ClioPatria: “Matisse”
Result clusters:
- “Matisse” in the title
- Located in “Musee Matisse”
- Created by “Matisse”
- Paintings in the same style as used by “Matisse”

Serendipitous?
Serendipity is the effect by which one accidentally discovers something fortunate, especially while looking for something else entirely unrelated (Wikipedia).
- The search is not based on any schema.
- It can find results through unexpected paths.
- It often finds many unintended results (i.e., it answers multiple “graph” queries).
- This remains manageable thanks to clustering → “post-query disambiguation”.
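
How ClioPatria clusters its results is not detailed here. A minimal sketch of the idea, assuming each hit records the predicates on the path from the matching literal to the result: results reached through the same kind of path land in the same cluster, which lets the user pick the intended interpretation after the query. The hit format and predicate names are hypothetical.

```javascript
// Group hits by the chain of predicates that connects them to the keyword
// match, so that each cluster corresponds to one kind of "graph query".
function clusterByPath(hits) {
  const clusters = new Map();
  for (const { target, path } of hits) {
    const key = path.join(" / ");
    if (!clusters.has(key)) clusters.set(key, []);
    clusters.get(key).push(target);
  }
  return clusters;
}

// Hypothetical hits for the keyword "Matisse".
const matisseHits = [
  { target: "work:A", path: ["title"] },
  { target: "work:B", path: ["label", "creator"] },
  { target: "work:C", path: ["label", "creator"] },
  { target: "work:D", path: ["label", "locatedIn"] },
];
for (const [path, targets] of clusterByPath(matisseHits)) {
  console.log(`${path}: ${targets.join(", ")}`);
}
```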

Serendipitous … “Picasso”
- Things made from “Picasso marble”

ClioPatria fact sheet
Prolog          246 files, 67,500 lines
Developers      3 core, about 10 occasional
Triples loaded  Used with up to 22,000,000; scales to 300,000,000 in 64 GB of memory
Usage           Known to be in use in 6 projects

Part II: Using Prolog for the Semantic Web

The neats vs. the scruffies
- Neats: (DL-)logic background
  - In search of expressive logics and correct, efficient resolution techniques
  - LP: F-Logic, ASP, ALP, FO(.), … (Marc Denecker)
- Scruffies: “webby” background
  - In search of doing something useful with huge amounts of shallow and inconsistent facts
  - Simple logics; techniques need be neither sound nor complete

Why NOT Prolog?
The core concepts in the Web community are:
- Networking
- Concurrency
- Web-page generation
- Internationalization
- ...
These are typically not associated with Prolog.

Why Prolog?
- RDF fits nicely with the relational model of Prolog
  - With a little work it does everything SPARQL can
  - … but it is much more flexible
- Most languages in the Semantic Web community can be translated into Horn clauses:
  - OWL (a large subset)
  - Rule languages: SWRL, RIF
  - ...

The Semantic Web seen from Prolog
Pure predicate rdf/3:

    rdf(?Subject, ?Predicate, ?Object) is nondet.

- URI → atom
- Literal → literal(Atom), literal(lang(Code, Atom)) or literal(type(URI, Atom))
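
To show what a pure, nondeterministic rdf/3 offers, here is a rough JavaScript analogue in which a generator plays the role of backtracking: each yielded binding is one solution, and a conjunction nests the generators left to right. The store contents (including work:Dance), the quoted-string literals and the "?"-prefixed variable convention are invented for the example; this is not how SWI-Prolog's RDF store works internally.

```javascript
// A tiny in-memory triple store; terms starting with "?" are variables.
// URIs are plain strings; literals are written as quoted strings here.
const isVar = (t) => typeof t === "string" && t.startsWith("?");

// Like rdf(?S, ?P, ?O): yields one binding object for each matching triple.
function* rdf(store, s, p, o) {
  for (const triple of store) {
    const binding = {};
    const matches = [s, p, o].every((term, i) => {
      if (!isVar(term)) return term === triple[i];
      if (term in binding) return binding[term] === triple[i]; // same variable twice
      binding[term] = triple[i];
      return true;
    });
    if (matches) yield binding;
  }
}

// Replace a variable by its value if it is already bound.
const subst = (term, b) => (isVar(term) && term in b ? b[term] : term);

// A conjunction of patterns, solved left to right with backtracking.
function* query(store, patterns, binding = {}) {
  if (patterns.length === 0) {
    yield binding;
    return;
  }
  const [[s, p, o], ...rest] = patterns;
  for (const b of rdf(store, subst(s, binding), subst(p, binding), subst(o, binding))) {
    yield* query(store, rest, { ...binding, ...b });
  }
}

const store = [
  ["ulan:Matisse", "rdf:type", "ulan:Person"],
  ["ulan:Matisse", "rdfs:label", '"Matisse, Henri"'],
  ["work:GreenStripe", "dc:creator", "ulan:Matisse"],
  ["work:Dance", "dc:creator", "ulan:Matisse"],
];
for (const b of query(store, [
  ["?X", "rdf:type", "ulan:Person"],
  ["?X", "rdfs:label", '"Matisse, Henri"'],
  ["?Work", "dc:creator", "?X"],
])) {
  console.log(b["?Work"]);
}
```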

URI: XML namespaces
Namespaces are expanded at compile time by rules for goal_expansion/2, so a goal that spells out the full URIs can be written as

    rdf(S, rdf:type, ulan:'Person').

Toplevel and debugger results are made readable again using portray/1.
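
The same idea can be sketched outside Prolog: expand prefix:local names once, when a query is loaded, and contract full URIs only when printing. The rdf, rdfs and dc namespace URIs below are the standard ones; the ulan URI is a hypothetical placeholder, since the slide does not give it, and expand/contract are illustrative helpers, not the SWI-Prolog API.

```javascript
// Prefix table. rdf, rdfs and dc are the standard namespace URIs; the ulan
// entry is a hypothetical placeholder.
const PREFIXES = new Map([
  ["rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"],
  ["rdfs", "http://www.w3.org/2000/01/rdf-schema#"],
  ["dc", "http://purl.org/dc/elements/1.1/"],
  ["ulan", "http://example.org/ulan#"],
]);

// Compile-time step (cf. goal_expansion/2): "prefix:local" -> full URI.
// Names with an unknown prefix are returned unchanged.
function expand(qname) {
  const i = qname.indexOf(":");
  if (i < 0) return qname;
  const ns = PREFIXES.get(qname.slice(0, i));
  return ns === undefined ? qname : ns + qname.slice(i + 1);
}

// Printing step (cf. portray/1): full URI -> "prefix:local", using the
// longest matching namespace.
function contract(uri) {
  let best = null;
  for (const [prefix, ns] of PREFIXES) {
    if (uri.startsWith(ns) && (best === null || ns.length > best[1].length)) {
      best = [prefix, ns];
    }
  }
  return best === null ? uri : `${best[0]}:${uri.slice(best[1].length)}`;
}

console.log(expand("rdf:type"));
console.log(contract("http://purl.org/dc/elements/1.1/creator"));
```

Doing the expansion at load time means the store only ever compares full URIs; the prefixes exist purely for the human reading or writing queries.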

A simple example

    ?- module(rdfs_entailment).
    rdfs_entailment: ?- rdf(X, rdf:type, ulan:'Person'),
                        rdf(X, rdfs:label, literal('Matisse, Henri')),
                        rdf(Work, dc:creator, X).

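Optimising

    ?- In = ( rdf(X, rdf:type, ulan:'Person'),
              rdf(X, rdfs:label, literal('Matisse, Henri')),
              rdf(Work, dc:creator, X)
            ),
       rdf_optimise(In, Goal).
    Goal = ( rdf(X, rdfs:label, literal('Matisse, Henri')),
             rdfql_carthesian([ bag([], rdf(X, rdf:type, ulan:'Person')),
                                bag([Work], rdf(Work, dc:creator, X))
                              ])
           ).

As the result shows, the optimiser moves the most selective pattern (the label lookup) to the front; once X is bound, the two remaining patterns are independent, so their solutions are combined as a Cartesian product.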
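
rdf_optimise/2 relies on statistics kept by the store. The sketch below shows only the basic idea of cost-based reordering: greedily pick the pattern with the lowest estimated number of answers, given the variables bound so far. The statistics object and its numbers are invented for illustration, and unlike the output above the sketch does not detect independent (Cartesian) sub-queries.

```javascript
const isVar = (t) => typeof t === "string" && t.startsWith("?");

// Estimated number of answers for pattern [s, p, o], given the set of
// variables that earlier patterns will already have bound.
function estimate([s, p, o], bound, stats) {
  const known = (t) => !isVar(t) || bound.has(t);
  if (known(s) && known(o)) return 1;                         // just a check
  if (!isVar(o)) return stats.objectCount[`${p} ${o}`] ?? 1;  // constant object
  if (known(o)) return stats.avgPerObject[p] ?? 1;            // object bound at run time
  if (known(s)) return stats.avgPerSubject[p] ?? 1;           // subject bound
  return stats.tripleCount[p] ?? Infinity;                    // nothing bound
}

// Greedy reordering: repeatedly take the cheapest remaining pattern and
// treat its variables as bound when costing the rest.
function optimise(patterns, stats) {
  const remaining = [...patterns];
  const bound = new Set();
  const ordered = [];
  while (remaining.length > 0) {
    let bestIdx = 0;
    for (let i = 1; i < remaining.length; i++) {
      if (estimate(remaining[i], bound, stats) < estimate(remaining[bestIdx], bound, stats)) {
        bestIdx = i;
      }
    }
    const [best] = remaining.splice(bestIdx, 1);
    best.filter(isVar).forEach((v) => bound.add(v));
    ordered.push(best);
  }
  return ordered;
}

// Hypothetical statistics for the example query.
const stats = {
  tripleCount: { "rdf:type": 500000, "rdfs:label": 300000, "dc:creator": 200000 },
  objectCount: { "rdf:type ulan:Person": 50000, 'rdfs:label "Matisse, Henri"': 1 },
  avgPerObject: { "dc:creator": 20 },
  avgPerSubject: { "rdfs:label": 2 },
};
const plan = optimise([
  ["?X", "rdf:type", "ulan:Person"],
  ["?X", "rdfs:label", '"Matisse, Henri"'],
  ["?Work", "dc:creator", "?X"],
], stats);
console.log(plan.map((t) => t.join(" ")));
```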

Advantages of Prolog over SPARQL
Flexibility and reuse:
- We can mix queries with arbitrary Prolog code
- We can name and combine queries
- We can do recursion
This is similar to SQL vs. Prolog, but:
- Processing RDF involves pattern matching, rules and recursion, while datatypes are less important.
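
Recursion is where naming and combining queries pays off. The sketch below, in JavaScript with a generator standing in for Prolog backtracking, computes all broader concepts of a thesaurus term along skos:broader; the visited set keeps it finite on cyclic data, where naive recursion would loop. The concept names and the deliberate cycle are made up.

```javascript
// All concepts reachable from `concept` via `pred` (e.g. skos:broader),
// depth-first. The shared `visited` set makes it terminate on cyclic data.
function* ancestors(triples, concept, pred, visited = new Set([concept])) {
  for (const [s, p, o] of triples) {
    if (s === concept && p === pred && !visited.has(o)) {
      visited.add(o);
      yield o;
      yield* ancestors(triples, o, pred, visited);
    }
  }
}

const thesaurus = [
  ["aat:Fauvism", "skos:broader", "aat:ModernStyles"],
  ["aat:ModernStyles", "skos:broader", "aat:Styles"],
  ["aat:Styles", "skos:broader", "aat:ModernStyles"], // deliberate cycle
];
console.log([...ancestors(thesaurus, "aat:Fauvism", "skos:broader")]);
```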

Prolog ↔ SPARQL
- One SPARQL query to get the result
- Multiple SPARQL queries
- Fetch triple by triple and process in the client

Reasoning
- Reasoning is tied to a language (RDFS, OWL, SWRL, …)
- Reasoning derives facts from the triple store that are not explicitly provided in the dataset

Options for reasoning (I)
Reasoning adds (virtual) triples (entailment):
- The only API is rdf(S, P, O)
- Forward reasoning
  - Easy to implement
  - Difficult to handle database updates
  - Can explode with richer languages (e.g., OWL)
- Backward reasoning
  - Risk of non-termination under SLD resolution
  - Conjunctions need to be optimised
  - Easy to provide alternative reasoners
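
Forward reasoning can be illustrated with a naive fixpoint computation that materialises two RDFS entailment rules: rdfs9 (type inheritance along rdfs:subClassOf) and rdfs11 (transitivity of rdfs:subClassOf). This JavaScript sketch is far from a complete RDFS reasoner and the class names are made up. It does show the trade-off listed above: computing the closure is easy, but after an update it has to be recomputed or carefully maintained.

```javascript
// Forward reasoning: add derived triples until nothing new appears.
function forwardClosure(triples) {
  const key = (t) => t.join(" ");
  const seen = new Set(triples.map(key));
  const all = [...triples];
  let changed = true;
  while (changed) {
    changed = false;
    const sub = all.filter((t) => t[1] === "rdfs:subClassOf");
    const derived = [];
    for (const [a, , b] of sub) {
      for (const [x, p, c] of all) {
        // rdfs9: x rdf:type A, A rdfs:subClassOf B => x rdf:type B
        if (p === "rdf:type" && c === a) derived.push([x, "rdf:type", b]);
        // rdfs11: C rdfs:subClassOf A, A rdfs:subClassOf B => C rdfs:subClassOf B
        if (p === "rdfs:subClassOf" && c === a) derived.push([x, "rdfs:subClassOf", b]);
      }
    }
    for (const t of derived) {
      if (!seen.has(key(t))) {
        seen.add(key(t));
        all.push(t);
        changed = true;
      }
    }
  }
  return all;
}

const facts = [
  ["ulan:Matisse", "rdf:type", "ulan:Painter"],
  ["ulan:Painter", "rdfs:subClassOf", "ulan:Artist"],
  ["ulan:Artist", "rdfs:subClassOf", "ulan:Person"],
];
for (const t of forwardClosure(facts)) console.log(t.join(" "));
```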

Alternative entailment reasoners as Prolog modules
Each module provides rdf/3:
- Core RDF-DB
- RDFS
- OWL-Horst
- ...

Options for reasoning (II)
Based on abstract syntax:
- Dedicated high-level API
- Forward reasoning
  - Transformation (Thea OWL(-2) library)
- Backward reasoning
Thea: by Vangelis Vassiliadis and Chris Mungall

Reasoning with an abstract-syntax API
- Core RDF-DB: rdf/3, ...
- Thea (OWL 2): subClassOf/2, ...; derived from the core store by forward transformation
- RDFS: rdfs_individual_of/2, rdfs_subclass_of/2, ...; built on the core store with backward Prolog rules

Options for reasoning (summary)
- Entailment-based
  - Uniform query API → the application can switch entailment
  - Query API is low-level
  - (Using forward reasoning) the entailed graph is added to the database → difficult to deal with multiple languages
- Abstract-syntax-based
  - Each language has its own query API
  - Query API is high-level
  - Easy to deal with multiple languages

A closer look at the RDF store: requirements
- Efficient for any instantiation pattern (full indexing)
- Handle the property hierarchy
- Handle owl:sameAs
- Literal indexing (prefix, full-text, ...)
- Scalable to millions of triples
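
One straightforward, memory-hungry way to be efficient for any instantiation pattern is to keep a hash index for every combination of bound positions. The sketch below does exactly that; IndexedStore and its key scheme are illustrative assumptions, not the design of ClioPatria's store.

```javascript
const POS = { s: 0, p: 1, o: 2 };
const MASKS = ["s", "p", "o", "sp", "so", "po", "spo"];

// Index key: the terms at the positions named in `mask`, joined by a NUL char.
const keyFor = (mask, terms) => [...mask].map((c) => terms[POS[c]]).join("\u0000");

// Keeps one hash index per combination of bound positions, so a lookup with
// any instantiation pattern goes straight to the matching triples.
class IndexedStore {
  constructor() {
    this.all = [];
    this.indexes = new Map(MASKS.map((m) => [m, new Map()]));
  }

  add(triple) {
    this.all.push(triple);
    for (const mask of MASKS) {
      const index = this.indexes.get(mask);
      const k = keyFor(mask, triple);
      if (!index.has(k)) index.set(k, []);
      index.get(k).push(triple);
    }
  }

  // Pass null for an unbound position.
  lookup(s, p, o) {
    const terms = [s, p, o];
    const mask = [..."spo"].filter((c) => terms[POS[c]] !== null).join("");
    if (mask === "") return this.all;
    return this.indexes.get(mask).get(keyFor(mask, terms)) ?? [];
  }
}

const db = new IndexedStore();
db.add(["work:GreenStripe", "dc:creator", "ulan:Matisse"]);
db.add(["work:Dance", "dc:creator", "ulan:Matisse"]);
db.add(["ulan:Matisse", "rdf:type", "ulan:Person"]);
console.log(db.lookup(null, "dc:creator", "ulan:Matisse").length);
```

Every triple is referenced from seven indexes here, which is exactly the kind of space cost a dedicated store tries to avoid.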

Options for rdf/3 (I: using Prolog)
Prolog dynamic database:
- We need multiple indexes (e.g., YAP)
- Cannot exploit domain-specific aspects:
  - Property-hierarchy matching
  - Facts are ground, unordered and use only a limited set of types
- Hard to provide statistics for the optimizer, because these are also domain-specific

Options for rdf/3 (II: using an external store)
External store:
- Slow connection (URIs must be interned/externed as atoms)
- We do not want (most of) its reasoning

Options for rdf/3 (III: dedicated C)
Using a dedicated C library:
- Can optimize for space based on the limited datatypes
- Use atom handles in the database (no intern/extern)
- Sort literals in an AVL tree (prefix search)
- Keep counts (for query optimization)
- Fast binary load/save format
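
Prefix search over literals needs them in sorted order. The sketch below swaps the AVL tree for a sorted array with binary search, which answers the same queries for a fixed set of literals (a balanced tree additionally supports cheap insertion). Matching is case-sensitive and the literals are made up.

```javascript
// Index of the first element >= key in a sorted array (binary search).
function lowerBound(sorted, key) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// All literals starting with `prefix`: jump to the first candidate, then
// scan forward while the prefix still matches.
function prefixSearch(sorted, prefix) {
  const out = [];
  for (let i = lowerBound(sorted, prefix); i < sorted.length && sorted[i].startsWith(prefix); i++) {
    out.push(sorted[i]);
  }
  return out;
}

// Default sort() and "<" both compare UTF-16 code units, so they agree.
const literals = ["Matisse, Henri", "Picasso marble", "Picasso, Pablo", "Matisse Museum"].sort();
console.log(prefixSearch(literals, "Picasso"));
```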

RDF processing (summary)
- Expressing graph patterns mixed with auxiliary Prolog is easy
  - This is enough for a large part of RDF processing in semantic web applications
- Reasoning
  - Forward closure (easy, big, no changes)
  - Backward: termination issues (tabling can help)
  - Extending rdf/3 ↔ using an abstract-syntax API

Part III: Web applications

Database web-application reference architecture (three-tier model)
[Diagram: presentation generation and application logic tiers on top of the database; Web 2.0: JavaScript in the web browser; Web 3.0 (Semantic Web): RDF, Linked Data]

Protocols and standards
[Diagram: RDF database and application logic, with the protocols between the components: HTTP, SPARQL, Prolog, and HTTP ?]

Prolog-to-HTTP
[Diagram: HTTP → web server (Tomcat, .NET, ...) → interface (JPL, InterProlog, PrologBeans, ...) → Prolog application]
- Need to program in Tomcat/.NET/... and in Prolog
- Difficult deployment
- JPL: one process (JNI/C interface); fast, but hard to debug
- InterProlog/PrologBeans/...: proprietary network protocol

Prolog-to-HTTP
[Diagram: HTTP → Prolog web server, in which a Prolog HTTP library provides the interface to the application]
- Easy debugging
- Easy to extend the HTTP interface
- Not 'industry standard'
- But … many languages provide an HTTP server library

Apache deployment
Use Apache as a reverse proxy and load balancer, configured with ServerName and ProxyPass / directives:
- Apache listens on port 80
- The Prolog server listens on port 3040
- The Prolog console is reachable through VNC

VNC server console

/api/search?q=picasso&count=100

    :- use_module(library(http/http_dispatch)).
    :- use_module(library(http/http_parameters)).
    :- use_module(library(http/http_json)).

    :- http_handler('/api/search', search, []).

    % Handle /api/search: read the parameters, run the query, reply in JSON.
    search(Request) :-
        http_parameters(Request,
                        [ q(Q, []),
                          start(S, [integer, default(0)]),
                          count(C, [integer, default(25)])
                        ]),
        search(Q, S, C, Results),     % application-specific search/4 (not shown)
        reply_json(Results).
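
For readers more at home in other web stacks, the function below restates the parameter handling of this handler in plain JavaScript: q is required, while start and count default to 0 and 25 and, as an assumption, are treated as integers. searchParameters is a hypothetical name; it is not part of any Prolog library.

```javascript
// Restates the handler's parameter declaration: q is required; start and count
// are optional integers defaulting to 0 and 25. Throws on invalid input.
function searchParameters(requestUrl) {
  // The base URL only serves to parse a path-only request URL.
  const params = new URL(requestUrl, "http://localhost").searchParams;
  const q = params.get("q");
  if (q === null) throw new Error("missing parameter: q");
  const integer = (name, defaultValue) => {
    const raw = params.get(name);
    if (raw === null) return defaultValue;
    if (!/^-?\d+$/.test(raw)) throw new Error(`parameter ${name} must be an integer`);
    return Number(raw);
  };
  return { q, start: integer("start", 0), count: integer("count", 25) };
}

console.log(searchParameters("/api/search?q=picasso&count=100"));
```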

Summary: HTTP support
Writing the HTTP server in Prolog gives us:
- A good single-language development environment
- Incremental compilation: live updating of the server
- Deployment either directly or through a proxy
Not so big: 12,000 lines for
- Core HTTP client and server
- HTML and JSON read/write
- Parameters, sessions, authorization, logging

Part IV: Creating interactive web applications using Prolog

Web of documents (original drawing by Tim Berners-Lee)

Interactive web applications
- The server needs to keep track of the client (sessions)
- The client needs lightweight updates of the interface
… but HTTP is stateless …

Introducing state
Negotiate a session key between client and server:
- The server associates state with this key
- The client modifies the interface using JavaScript → AJAX
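
A minimal sketch of the server side of this scheme, assuming Node.js: the server creates an unguessable random key, hands it to the client (typically in a cookie) and keeps the client's state under that key. SessionStore is a hypothetical class, not the SWI-Prolog session API; expiry and persistence are left out.

```javascript
const { randomBytes } = require("crypto");

// Server-side session table: the random key goes to the client (e.g. in a
// cookie); the state stays on the server under that key.
class SessionStore {
  constructor() {
    this.sessions = new Map();
  }

  create(initialState = {}) {
    const key = randomBytes(16).toString("hex"); // 128 random bits
    this.sessions.set(key, { ...initialState });
    return key;
  }

  get(key) {
    return this.sessions.get(key);
  }

  update(key, changes) {
    const state = this.sessions.get(key);
    if (state === undefined) throw new Error("unknown session key");
    Object.assign(state, changes);
    return state;
  }
}

const sessionStore = new SessionStore();
const key = sessionStore.create({ boardsize: 8 });
sessionStore.update(key, { algorithm: "queens" });
console.log(key, sessionStore.get(key));
```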

What is AJAX not?

Case: create a web interface for the N-queens problem
- Interaction:
  - Select the size of the board
  - Select the implementation (Prolog ↔ clp(FD))
  - Get the first solution
  - Get the next solution, or stop
- State in a backtrackable Prolog program
By Torbjörn Lager, Markus Triska and Jan Wielemaker
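
The interaction on this slide (first solution, next solution, stop) maps naturally onto a generator, which, like a Prolog goal, can be resumed for its next solution. The sketch below is a plain backtracking N-queens search in JavaScript, for illustration only: it is neither the demo's Prolog code nor a clp(FD) formulation, and it numbers rows and columns from 1.

```javascript
// N-queens as a generator: each next() resumes the search where it stopped.
// A solution lists, for rows 1..n, the (1-based) column of the queen.
function* queens(n, placed = []) {
  if (placed.length === n) {
    yield [...placed]; // copy, because `placed` is modified on backtracking
    return;
  }
  const row = placed.length;
  for (let col = 1; col <= n; col++) {
    // Safe if no earlier queen shares the column or a diagonal.
    const safe = placed.every((c, r) => c !== col && Math.abs(c - col) !== row - r);
    if (safe) {
      placed.push(col);
      yield* queens(n, placed);
      placed.pop();
    }
  }
}

const board8 = queens(8);
console.log(board8.next().value); // "First": the first solution
console.log(board8.next().value); // "Next": resumes the search
board8.return();                  // "Stop": abandon the search
```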

Step I: create the initial page
[Diagram: the web application server (HTTP) sends the initial HTML page with JavaScript; the browser builds the initial DOM]

Step II: add local interaction
[Diagram: as in step I, plus local interaction between the JavaScript code and the DOM in the browser]

Options...

    <input type="button" id='opts' name="options" value="Options …"
           onClick="showOptions(true)">

    function showOptions(show) {
      document.getElementById("options").style.display = show ? "block" : "none";
    }

OK: applyOptions()

    // Read the options form, store the client state in global variables and
    // update the interface by changing the DOM: no server interaction.
    function applyOptions() {
      var size = parseInt(document.getElementById("size").value, 10);

      if ( document.getElementById("queens").checked ) {
        algorithm = "queens";
      } else {
        algorithm = "clpfd_queens";
      }

      if ( isNaN(size) || size < 2 || size > 40 ) {
        alert("Size must be in the range 2..40");
      } else {
        boardsize = size;
        showOptions(false);
        document.getElementById("N").innerHTML = size;
        document.getElementById("who").innerHTML =
          (algorithm == "queens" ? "Prolog" : "clp(FD)");
        document.getElementById("board").innerHTML = board(boardsize, boardwidth);
      }
    }

Step III: add server interaction
[Diagram: as in step II, plus interaction between the browser's JavaScript and the web application server (HTTP)]

First...

    // Server request: ask for the first solution of, e.g., queens(8,L).
    // update() is called when the server responds.
    function first() {
      working();
      YAHOO.util.Connect.asyncRequest(
        'GET',
        "/prolog/first?goal=" + encodeURIComponent(algorithm + "(" + boardsize + ",L)"),
        { success: update });
    }

    <input type="button" id='first' name="first" value="First" onClick="first()">

Client code fragment: handle the response

    // Process the reply as JSON and update the DOM accordingly.
    function update(o) {
      var solution = YAHOO.lang.JSON.parse(o.responseText);

      if ( solution.solution ) {
        setButtons(solution.next == true);
        clearBoard();
        setQueens(solution.solution.args[1].value);
        document.getElementById("msg").innerHTML =
          "CPU: " + solution.time.toPrecision(2) + " sec.";
      } else if ( solution.error ) {
        setButtons(false);
        document.getElementById("msg").innerHTML = " " + solution.error + " ";
      } else {
        setButtons(false);
        document.getElementById("msg").innerHTML = "There are no more solutions.";
      }
    }

setQueens(): replace a DOM fragment

    // squareList holds the queen's column for each row; square ids have the
    // form "row-column".
    function setQueens(squareList) {
      for (var i = 1; i <= boardsize; i++) {
        var id = i + "-" + squareList[i-1].value;
        document.getElementById(id).innerHTML = " ";
      }
    }

Backtracking state in the server
[Diagram: one Prolog thread per session (session-1 … session-N) holds the backtrackable state. An HTTP worker thread handles GET /prolog/next with session-id=1, passes a Prolog term to the thread of session 1, which backtracks into its goal; the worker returns the result as a JSON document.]

Backtracking state

    % Runs in the session thread. Each solution is sent to the thread that
    % asked for it; then we wait for the next command. Any command other than
    % stop makes the final test fail, which backtracks into solve_2/3 for the
    % next solution. solve_2/3, solution_time/2 and solution_type/2 are not
    % shown on the slide.
    solve(Goal, Bindings, ThreadID) :-
        thread_self(Me),
        thread_statistics(Me, cputime, T0a),
        State = client(ThreadID, T0a),
        solve_2(Goal, Bindings, Solution),          % (guarded) actual goal
        State = client(Client, T0),
        thread_statistics(Me, cputime, T1),
        Time is T1 - T0,
        solution_time(Solution, Time),
        nb_setarg(2, State, T1),                    % survives backtracking
        debug(prolog_server, 'Sending: ~q', [Solution]),
        thread_send_message(Client, Solution),      % send reply
        solution_type(Solution, Type),
        (   Type == last
        ->  true
        ;   Type == true
        ->  catch(thread_get_message(command(From, Command)),   % wait for user
                  _, Command = stop),
            debug(prolog_server, 'Command: ~q', [Command]),
            nb_setarg(1, State, From),              % reply to the new requester
            Command == stop
        ;   true
        ).
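
The same first/next/stop protocol can be sketched with JavaScript generators in place of Prolog threads: the server keeps one suspended computation per session and resumes it on each "next" request. GoalSessions and the reply fields (loosely modelled on what update() reads: solution, next, time) are assumptions for illustration. One difference: a generator cannot tell whether its last solution was the final one, so next stays true until a request finds no more solutions, whereas the Prolog version can report a last solution (Type == last).

```javascript
// One suspended computation (a generator) per session: "first" starts it,
// "next" resumes it, "stop" abandons it.
class GoalSessions {
  constructor() {
    this.running = new Map(); // session id -> generator
  }

  first(sessionId, generator) {
    this.stop(sessionId); // abandon any previous goal of this session
    this.running.set(sessionId, generator);
    return this.next(sessionId);
  }

  next(sessionId) {
    const gen = this.running.get(sessionId);
    if (gen === undefined) return { error: "no running goal" };
    const t0 = process.hrtime.bigint();
    const { value, done } = gen.next();
    // Wall-clock seconds; the Prolog version measures thread CPU time instead.
    const time = Number(process.hrtime.bigint() - t0) / 1e9;
    if (done) {
      this.running.delete(sessionId);
      return { solution: null, next: false, time };
    }
    return { solution: value, next: true, time };
  }

  stop(sessionId) {
    const gen = this.running.get(sessionId);
    if (gen !== undefined) gen.return(); // runs any finally blocks in the generator
    this.running.delete(sessionId);
  }
}

function* colours() {
  yield "red";
  yield "green";
}
const sessions = new GoalSessions();
console.log(sessions.first("s1", colours())); // first solution
console.log(sessions.next("s1"));             // next solution
console.log(sessions.next("s1"));             // no more solutions
```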

AJAX has many architectures

Where does the JavaScript come from?
- Widget library: AjaxAnywhere, MochiKit, YUI, ...
- User code:
  - Instantiation
  - Set attributes
  - Refine methods

Options for generating application JavaScript
- Write a JavaScript file and link it from the HTML page
  - Code is in two places → a good split if the API is stable, but poor for prototyping and frequently changing APIs
- Write JavaScript in Prolog strings and include it in the page
  - Messy syntax (cf. Python's """long string""")

Generate from Prolog terms?
- Works well for HTML (e.g., html_write, PiLLoW)
- But JavaScript customization often places code fragments in object properties
  - There is no simple interface such as, e.g., XPCE's create / set property / call method
  - A full mapping of JS code to Prolog syntax is probably not transparent enough for users
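
To illustrate why term-based generation works well for HTML, here is a small JavaScript analogue: the page is described as nested data and rendered with escaping, so markup is always well-formed and text is always quoted. The ["tag", {attributes}, ...children] convention is an assumption chosen for the example; it only mimics the spirit of html_write, whose actual term syntax differs, and it does not special-case void elements such as <br>.

```javascript
// Escape text for use in HTML content and attribute values.
const escapeHtml = (s) =>
  String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");

// Render a node: a string/number, or ["tag", {attributes}?, ...children].
function html(node) {
  if (typeof node === "string" || typeof node === "number") return escapeHtml(node);
  const [tag, second, ...rest] = node;
  const hasAttrs = second !== null && typeof second === "object" && !Array.isArray(second);
  const attrs = hasAttrs ? second : {};
  const children = hasAttrs || second === undefined ? rest : [second, ...rest];
  const attrText = Object.entries(attrs)
    .map(([name, value]) => ` ${name}="${escapeHtml(value)}"`)
    .join("");
  return `<${tag}${attrText}>${children.map(html).join("")}</${tag}>`;
}

console.log(html(["p", { class: "msg" }, "CPU: ", ["b", "0.01"], " sec."]));
```

The same approach breaks down for JavaScript, as the slide notes: widget code is mostly expressions and functions stored in object properties, which nested data like this does not capture naturally.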

Wrap-up
- The “Web of data” is out there
  - Prolog is an excellent tool for processing RDF
- The interactive “Web 2.0” is out there
  - Web 2.0 is (relatively) language-independent
  - Prolog is a suitable server component for Web 2.0

Future directions
- Enhance RDF support:
  - Improve scalability
  - Higher-level reasoning
  - Provide tabling
  - Generalise optimizers
- Enhance web-programming support:
  - Explore cleaner integration with AJAX
- Merge into the Prolog Commons initiative

Links
culture.multimedian.nl/software/ClioPatria.shtml

Part of the Dutch knowledge-economy project MultimediaN
- Partners: VU, CWI, UvA, DEN, ICN
- People: Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga
- Collections: Artchive.com, RKD, Rijksmuseum Amsterdam, Dutch ethnology museums (Amsterdam, Leiden), National Library (Bibliopolis)