Semantic Web Query Processing with Relational Databases Artem Chebotko Department of Computer Science Wayne State University.

Presentation transcript:

Semantic Web Query Processing with Relational Databases Artem Chebotko Department of Computer Science Wayne State University

1/23/2007 Outline
- The Semantic Web
- RDF
- SPARQL
- Relational Storage of RDF data
- SPARQL-to-SQL Translation
- Relational Nested Optional Join

My Web page as seen by a Human

My Web page as seen by a Computer

My Web page with Semantics

The Semantic Web
- A Web of data (vs. a Web of documents)
- … machine-processable/readable data
- Framework for integration and combination of data from various sources
- Data reuse across application, organization, and community boundaries

The Semantic Web “Stack”

RDF
- RDF (Resource Description Framework) provides a common framework for representing resources and the relations among them. Anything can be a resource (e.g., a person, a file, etc.).
- RDF provides a data model and a syntax.

RDF Model
- An RDF statement is a triple that consists of a subject, a predicate, and an object.

RDF Model
- RDF's graph model: RDF models statements as nodes and edges in a graph
Figure: graph for Artem Chebotko with edges labeled foaf:name, foaf:homepage, foaf:img, and foaf:workplaceHomepage
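The triple and graph views of RDF can be sketched in a few lines of Python. This is only an illustration: the subject identifier `ex:artem` and all object values are hypothetical placeholders, because the URIs from the original slide are not preserved in this transcript.

```python
from collections import defaultdict

# Each RDF statement is a (subject, predicate, object) triple.
# All identifiers below are hypothetical placeholders.
triples = [
    ("ex:artem", "foaf:name", "Artem Chebotko"),
    ("ex:artem", "foaf:homepage", "ex:homepage"),
    ("ex:artem", "foaf:img", "ex:photo"),
    ("ex:artem", "foaf:workplaceHomepage", "ex:workplace"),
]


def build_graph(triples):
    """Index triples as a labeled directed graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


graph = build_graph(triples)
```

Here subjects and objects play the role of nodes, and each predicate labels the edge between them.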

SPARQL
- SPARQL is an RDF query language
- Graph pattern matching
- Basic graph patterns, optional graph patterns, etc.
Query 1: Find the homepage URL of Artem Chebotko
    PREFIX foaf: <...>
    SELECT ?url
    FROM <...>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }
Result 1: ?url is bound to the value "..."

SPARQL
Query 2: Find both the homepage and weblog of Artem Chebotko
    PREFIX foaf: <...>
    SELECT ?url ?log
    FROM <...>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      ?someone foaf:weblog ?log .
    }
Result 2: ?url and ?log are unbound (the foaf:weblog pattern has no match, so the whole pattern fails)

SPARQL
Query 3: Find (1) the homepage of Artem Chebotko and (2) his weblog if this information is available
    PREFIX foaf: <...>
    SELECT ?url ?log
    FROM <...>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL { ?someone foaf:weblog ?log . }
    }
Result 3: ?url is bound to "..." and ?log is unbound

SPARQL
- Basic semantics of OPTIONAL patterns
  - The evaluation of an OPTIONAL clause is not required to succeed; if it fails, no value is returned for the variables in the SELECT clause that it leaves unbound.
- Semantics of shared variables
  - In general, shared variables must be bound to the same values. Variables can be shared among subjects, predicates, and objects, and across these positions.
- More complicated semantics follows …
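The basic OPTIONAL semantics and the shared-variable rule can be illustrated with a small sketch over solution mappings (dictionaries from variable names to values). The function names `compatible` and `optional_join` and the sample values are hypothetical, chosen for illustration only; an unbound variable is modeled as a missing key.

```python
def compatible(m1, m2):
    """Two solution mappings are compatible if their shared variables have equal values."""
    return all(m2[var] == value for var, value in m1.items() if var in m2)


def optional_join(left, right):
    """Extend each left mapping with every compatible right mapping.

    A left mapping with no compatible partner is kept unchanged, which
    leaves the optional variables unbound instead of dropping the row.
    """
    result = []
    for m1 in left:
        matches = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        result.extend(matches if matches else [m1])
    return result


# Hypothetical placeholder values.
required = [{"someone": "ex:artem", "url": "ex:homepage"}]
no_weblog = []
with_weblog = [{"someone": "ex:artem", "log": "ex:blog"}]

unbound_log = optional_join(required, no_weblog)
bound_log = optional_join(required, with_weblog)
```

With `no_weblog`, the required mapping survives and `log` stays unbound, as in Query 3; with `with_weblog`, the shared variable `someone` must agree before `log` is added.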

SPARQL
- Semantics of parallel OPTIONAL patterns
  - The failure of an OPTIONAL clause does not block the evaluation of a following parallel OPTIONAL clause. However, when an OPTIONAL clause succeeds, any variables it shares with the following parallel OPTIONAL clauses must be bound to the same values there.

SPARQL
Query 4: Find (1) the homepage of Artem Chebotko, (2) his weblog if this information is available, and (3) his workplace homepage if this information is available
    PREFIX foaf: <...>
    SELECT ?url ?log ?work
    FROM <...>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL { ?someone foaf:weblog ?log . }
      OPTIONAL { ?someone foaf:workplaceHomepage ?work . }
    }
Result 4: table with columns ?url, ?log, ?work
What if … OPTIONAL { ?someone foaf:workplaceHomepage ?log . }

SPARQL
- Semantics of nested OPTIONAL patterns
  - Before an OPTIONAL clause is evaluated, all containing basic graph patterns or OPTIONAL clauses must have succeeded.

SPARQL
Query 5: Find (1) the homepage of Artem Chebotko, (2) his weblog if this information is available, and (3) his workplace homepage if this information is available and the weblog is available
    PREFIX foaf: <...>
    SELECT ?url ?log ?work
    FROM <...>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL {
        ?someone foaf:weblog ?log .
        OPTIONAL { ?someone foaf:workplaceHomepage ?work . }
      }
    }
Result 5: ?url is bound to "..." and ?log is unbound

Relational Storage of RDF data
- The increasing amount of RDF data on the Web highlights the need for its efficient and effective management.
- Using relational database technology as a basis for storing and querying RDF data is a reasonable choice, as this technology is well understood and known to have good performance.

Relational Storage of RDF data
- The simplest schema: a single table Triples with columns subject, predicate, and object
- More complicated (and more efficient) storage schemas are possible
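A minimal sketch of the single-table schema, using Python's built-in `sqlite3` module as a stand-in relational database. The row values are hypothetical placeholders, since the sample rows from the slide are not preserved in this transcript.

```python
import sqlite3

# In-memory database holding the three-column Triples table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")

# Hypothetical placeholder data: one row per RDF statement.
conn.executemany(
    "INSERT INTO Triples VALUES (?, ?, ?)",
    [
        ("ex:artem", "foaf:name", "Artem Chebotko"),
        ("ex:artem", "foaf:homepage", "ex:homepage"),
        ("ex:artem", "foaf:img", "ex:photo"),
    ],
)

# Looking up one property of one resource is a simple selection.
homepages = conn.execute(
    "SELECT object FROM Triples WHERE subject = ? AND predicate = ?",
    ("ex:artem", "foaf:homepage"),
).fetchall()
```

Every triple pattern in a query turns into one more reference to this same table, which is why multi-pattern queries become self-joins.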

SPARQL-to-SQL Translation
- Problem: relational databases “know” SQL, but not SPARQL
- Solution: translate SPARQL queries into equivalent SQL queries in order to access RDF data stored in a relational database
- Algorithm BGPtoSQL translates a SPARQL basic graph pattern to its SQL equivalent
- Algorithm SPARQLtoSQL translates SPARQL queries with arbitrarily complex optional graph patterns

BGPtoSQL
- Basic idea, Step 1:
  - Assign a unique table alias to every triple pattern, e.g., t1 and t2
  - Construct the FROM clause to contain all the table aliases
SPARQL:
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }
SQL:
    FROM Triples t1, Triples t2

BGPtoSQL
- Step 2:
  - Construct the SELECT clause to contain every relational attribute that corresponds to a distinct variable
SPARQL:
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }
SQL:
    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2

BGPtoSQL
- Step 3:
  - Construct the WHERE clause to restrict attribute values to the corresponding URIs and literals
SPARQL:
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }
SQL:
    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2
    WHERE t1.predicate = 'foaf:name'
      AND t1.object = 'Artem Chebotko'
      AND t2.predicate = 'foaf:homepage'

BGPtoSQL
- Step 4:
  - Create an inverted list for variables
  - Finish the WHERE clause: attributes that correspond to shared variables must have the same values
SPARQL:
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }
Inverted list:
    ?someone: t1.subject, t2.subject
    ?url: t2.object
SQL:
    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2
    WHERE t1.predicate = 'foaf:name'
      AND t1.object = 'Artem Chebotko'
      AND t2.predicate = 'foaf:homepage'
      AND t1.subject = t2.subject
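The four BGPtoSQL steps can be put together in a short Python sketch. This is an illustrative reading of the slides, not the authors' implementation: the function name `bgp_to_sql`, the representation of triple patterns as string tuples, and the convention that a leading `?` marks a variable are all assumptions.

```python
def bgp_to_sql(patterns, table="Triples"):
    """Translate a list of (subject, predicate, object) patterns into one SQL query.

    Terms starting with '?' are variables; all other terms are constants.
    """
    columns = ("subject", "predicate", "object")
    # Step 1: one unique table alias per triple pattern.
    aliases = [f"t{i}" for i in range(1, len(patterns) + 1)]
    inverted = {}    # inverted list: variable name -> attributes it occurs in
    conditions = []  # WHERE conditions
    for alias, pattern in zip(aliases, patterns):
        for column, term in zip(columns, pattern):
            attr = f"{alias}.{column}"
            if term.startswith("?"):
                inverted.setdefault(term[1:], []).append(attr)
            else:
                # Step 3: restrict the attribute to the constant (quotes escaped).
                escaped = term.replace("'", "''")
                conditions.append(f"{attr} = '{escaped}'")
    # Step 4: attributes of a shared variable must hold the same value.
    for attrs in inverted.values():
        conditions.extend(f"{attrs[0]} = {other}" for other in attrs[1:])
    # Step 2: project one attribute per distinct variable.
    select = ", ".join(f"{attrs[0]} AS {var}" for var, attrs in inverted.items())
    if not select:
        select = "1"  # pattern without variables: only test for existence
    sql = f"SELECT {select} FROM " + ", ".join(f"{table} {a}" for a in aliases)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


sql = bgp_to_sql([
    ("?someone", "foaf:name", "Artem Chebotko"),
    ("?someone", "foaf:homepage", "?url"),
])
```

For the two-pattern example from the slides, the generated string matches the SQL shown in Step 4, up to line breaks.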

SPARQLtoSQL
- Step 1:
  - Translate all BGPs to SQL with BGPtoSQL, e.g., into q1, q2, q3, q4
    SELECT ?url ?log ?topic
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL {
        ?someone foaf:weblog ?log .
        OPTIONAL { ?url foaf:topic ?topic . }
      }
      OPTIONAL { ?someone <...> ?log . }
    }

SPARQLtoSQL
- Step 2:
  - Join the 'relations' (q1, q2, q3, q4) in the order in which their corresponding graph patterns appear in the query
  - Use LEFT OUTER JOIN
    Q = SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
        FROM (q1) r1 LEFT OUTER JOIN (q2) r2
          ON (r1.someone = r2.someone)

SPARQLtoSQL
- Step 2 (continued): join the previous result Q with q3, the nested OPTIONAL; the IS NOT NULL check requires the enclosing OPTIONAL to have bound ?log
    Q = SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
        FROM (Q) r11 LEFT OUTER JOIN (q3) r22
          ON (r11.url = r22.url AND r11.log IS NOT NULL)

SPARQLtoSQL
- Step 2 (continued): join Q with q4, the parallel OPTIONAL that shares ?log; COALESCE takes ?log from whichever clause bound it
    Q = SELECT r111.someone AS someone, r111.url AS url,
               COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
        FROM (Q) r111 LEFT OUTER JOIN (q4) r222
          ON (r111.someone = r222.someone AND (r111.log = r222.log OR r111.log IS NULL))

SPARQLtoSQL
- Step 3:
  - Project only the required attributes (variables)
    SELECT r.url AS url, r.log AS log, r.topic AS topic
    FROM (Q) r

SPARQLtoSQL
- Almost complete query (q1, q2, q3, q4 still need to be replaced by their SQL)
    SELECT r.url AS url, r.log AS log, r.topic AS topic
    FROM (
      SELECT r111.someone AS someone, r111.url AS url,
             COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
      FROM (
        SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
        FROM (
          SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
          FROM (q1) r1 LEFT OUTER JOIN (q2) r2
            ON (r1.someone = r2.someone)
        ) r11 LEFT OUTER JOIN (q3) r22
          ON (r11.url = r22.url AND r11.log IS NOT NULL)
      ) r111 LEFT OUTER JOIN (q4) r222
        ON (r111.someone = r222.someone AND (r111.log = r222.log OR r111.log IS NULL))
    ) r
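To see why the nested OPTIONAL needs the `IS NOT NULL` check, the first two joins of the translation can be run against a toy database. The sketch below uses Python's `sqlite3` module rather than Oracle, covers only q1, q2, and q3 (the predicate of the fourth pattern is not preserved in this transcript), and uses hypothetical placeholder data in which the homepage has a topic but the person has no weblog. The helper name `nested_optional_sql` is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", [
    ("ex:artem", "foaf:name", "Artem Chebotko"),
    ("ex:artem", "foaf:homepage", "ex:homepage"),
    ("ex:homepage", "foaf:topic", "ex:databases"),  # note: no foaf:weblog triple
])

# SQL for the basic graph patterns, in the style produced by BGPtoSQL.
q1 = """SELECT t1.subject AS someone, t2.object AS url
        FROM Triples t1, Triples t2
        WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko'
          AND t2.predicate = 'foaf:homepage' AND t1.subject = t2.subject"""
q2 = "SELECT t.subject AS someone, t.object AS log FROM Triples t WHERE t.predicate = 'foaf:weblog'"
q3 = "SELECT t.subject AS url, t.object AS topic FROM Triples t WHERE t.predicate = 'foaf:topic'"


def nested_optional_sql(check_log):
    """Build the two-level left outer join, with or without the NOT NULL guard."""
    guard = " AND r11.log IS NOT NULL" if check_log else ""
    return f"""
        SELECT r11.url AS url, r11.log AS log, r22.topic AS topic
        FROM (SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
              FROM ({q1}) r1 LEFT OUTER JOIN ({q2}) r2
                ON (r1.someone = r2.someone)) r11
        LEFT OUTER JOIN ({q3}) r22
          ON (r11.url = r22.url{guard})"""


with_guard = conn.execute(nested_optional_sql(True)).fetchall()
without_guard = conn.execute(nested_optional_sql(False)).fetchall()
```

Without the guard, the topic is returned even though the enclosing OPTIONAL failed to bind the weblog, which contradicts the nested OPTIONAL semantics; with the guard, both `log` and `topic` come back as NULL.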

Experimental Study
- Dataset: WordNet, 700,000+ triples
- The translation algorithms are very efficient and scalable.
  - For example, SPARQLtoSQL translated queries with fewer than 50 OPTIONAL clauses, each containing one triple pattern, in less than ... sec., regardless of the clause tree layout
- The evaluation of most sample queries in Oracle proved unsatisfactory (on the order of seconds), mainly because of the simple relational schema.
- Note that this does not imply that the algorithms are impractical. SPARQLtoSQL does not directly depend on a particular database schema as long as the BGPtoSQL stub for the database is provided, which we believe is a reasonable expectation of existing RDF storage systems.

Experimental Study
- The evaluation of the sample queries in an in-memory relational database showed much better results.
- In these experiments, we were able to try different implementations of the left outer join, based on nested-loops, sort-merge, and simple hash methods.

Relational Nested Optional Join

New Example

New Example
- Retrieve:
  - (1) every graduate student in the RDF graph;
  - (2) the student's advisor if this information is available;
  - (3) the student's coadvisor if this information is available and if the student's advisor has been successfully retrieved in the previous step.
- In other words, the query returns students and as many advisors as possible; there is no point in returning a coadvisor if a student has no advisor.

Motivation: Computation Waste with LOJ

Nested Optional Join
- A novel relational operator to translate nested optional patterns
- An alternative to the left outer join
- Joins twin relations (base relation + optional relation)
  - A base relation: tuples that have the potential to satisfy a join condition if used in a nested optional join.
  - An optional relation: tuples that are guaranteed to fail a join condition if used in a nested optional join.
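One way to read the twin-relation idea is sketched below as a nested-loops join in Python. This is an interpretation of the definitions above, not the authors' algorithm: the function name `nested_optional_join`, its parameters, and the student/advisor sample data are hypothetical. Rows are dictionaries, and NULL is modeled as `None`.

```python
def nested_optional_join(base, optional, right, on, right_attrs):
    """Join the twin relation (base, optional) with `right` on the attributes in `on`.

    Returns a new twin relation (new_base, new_optional). `right_attrs` lists
    the attributes that `right` contributes beyond the join attributes.
    """
    padding = {attr: None for attr in right_attrs}
    # Rows in the optional relation cannot match: pad them without comparing.
    new_optional = [{**row, **padding} for row in optional]
    new_base = []
    for row in base:
        matches = [{**row, **r} for r in right if all(row[a] == r[a] for a in on)]
        if matches:
            new_base.extend(matches)                  # still able to match later joins
        else:
            new_optional.append({**row, **padding})   # failed: moves to the optional part
    return new_base, new_optional


# Hypothetical data: s1 has an advisor and a coadvisor, s2 only a coadvisor.
students = [{"student": "s1"}, {"student": "s2"}]
advisors = [{"student": "s1", "advisor": "a1"}]
coadvisors = [{"student": "s1", "coadvisor": "c1"}, {"student": "s2", "coadvisor": "c2"}]

base1, opt1 = nested_optional_join(students, [], advisors, ["student"], ["advisor"])
base2, opt2 = nested_optional_join(base1, opt1, coadvisors, ["student"], ["coadvisor"])
```

In the second join, the row for `s2` is already in the optional relation, so it is NULL-padded in a single pass and never compared against `coadvisors`; no `IS NOT NULL` check on the advisor is needed to keep `c2` out of the result.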

SPARQL-to-SQL Translation with NOJ

Nested Optional Join
- NOJ vs. LOJ
  - The NOJ processes the tuples that are guaranteed to be NULL-padded very efficiently, in linear time
  - The NOJ does not require the NOT NULL check to return correct results
- NOJ algorithms
  - Nested-loops NOJ algorithm: NL-NOJ
  - Sort-merge NOJ algorithm: SM-NOJ
  - Simple hash NOJ algorithm: SH-NOJ

Nested Optional Join
- Queries with joins with low selectivity factors (<0.0002)

Nested Optional Join
- For in-memory evaluation, by join selectivity factor (JSF):
  - JSF <= 0.005: SH-NOJ
  - JSF >= 0.8: NL-NOJ
  - ... < JSF < 0.8: SM-NOJ

Possible Future Work
- Extending our work to support other SPARQL constructs, such as UNION, FILTER, etc.
- Adding intelligence to our SPARQL-to-SQL translation to support the nested optional join.
- Investigating possible optimizations of parallel optional graph patterns.
- Defining the relational algebra for SPARQL with the support of nested and parallel optional joins.
- … and more

References
- Artem Chebotko, Mustafa Atay, Shiyong Lu, and Farshad Fotouhi. "Extending Relational Databases with a Nested Optional Join for Efficient Semantic Web Query Processing". Technical Report TR-DB CLJF, Department of Computer Science, Wayne State University, November.
- Artem Chebotko, Shiyong Lu, Hasan M. Jamil, and Farshad Fotouhi. "Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns". Technical Report TR-DB CLJF, Department of Computer Science, Wayne State University, May.

Acknowledgements
Dr. Shiyong Lu, Dr. Farshad Fotouhi, Dr. Hasan Jamil, Dr. Mustafa Atay, Oracle DBA Shwetal Joshi
Questions? Thank you!