Semantic Web Query Processing with Relational Databases
Artem Chebotko
Department of Computer Science, Wayne State University
1/23/2007
Outline
  The Semantic Web
  RDF
  SPARQL
  Relational Storage of RDF data
  SPARQL-to-SQL Translation
  Relational Nested Optional Join
My Web page as seen by a Human
My Web page as seen by a Computer
My Web page with Semantics
The Semantic Web
  A Web of data (vs. a Web of documents): machine-processable, machine-readable data.
  A framework for integration and combination of data from various sources.
  Data reuse across application, organization, and community boundaries.
The Semantic Web "Stack"
RDF
  RDF (Resource Description Framework) provides a common framework for representing resources and the relations among them.
  Anything can be a resource (e.g., a person, a file, etc.).
  RDF provides a data model and a syntax.
RDF Model
  An RDF statement is a triple that consists of a subject, a predicate, and an object.
  [Figure: example triple about Artem Chebotko using the foaf namespace]
RDF Model
  RDF's graph model: RDF models statements as nodes and edges in a graph.
  [Figure: RDF graph for Artem Chebotko with edges foaf:name, foaf:homepage, foaf:img, foaf:workplaceHomepage]
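As a minimal sketch of the triple and graph views above, statements can be held as (subject, predicate, object) tuples and grouped by subject to obtain the outgoing edges of each node. The subject identifier and URLs are hypothetical placeholders, since the values shown on the slide are not preserved, and `to_graph` is an illustrative helper rather than part of any RDF library.

```python
from collections import defaultdict

# Hypothetical placeholder values; the slide's real URIs are not available.
PERSON = "_:artem"
TRIPLES = [
    (PERSON, "foaf:name", "Artem Chebotko"),
    (PERSON, "foaf:homepage", "http://example.org/home"),
    (PERSON, "foaf:img", "http://example.org/photo.jpg"),
    (PERSON, "foaf:workplaceHomepage", "http://example.org/work"),
]


def to_graph(triples):
    """Group triples by subject: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(TRIPLES)
for predicate, obj in graph[PERSON]:
    print(f"{PERSON} --{predicate}--> {obj}")
```

Each triple becomes one labeled edge, which is the same information the Triples table stores row by row later in the talk.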
SPARQL
  SPARQL is an RDF query language based on graph pattern matching (basic graph patterns, optional graph patterns, etc.).
  Query 1: Find the homepage URL of Artem Chebotko.
    PREFIX foaf: <...>
    SELECT ?url
    FROM <...>
    WHERE { ?someone foaf:name "Artem Chebotko" .
            ?someone foaf:homepage ?url . }
  Result 1: ?url is bound to the value "..."
SPARQL
  Query 2: Find both the homepage and the weblog of Artem Chebotko.
    PREFIX foaf: <...>
    SELECT ?url ?log
    FROM <...>
    WHERE { ?someone foaf:name "Artem Chebotko" .
            ?someone foaf:homepage ?url .
            ?someone foaf:weblog ?log . }
  Result 2: ?url and ?log are unbound (the data contains no foaf:weblog triple, so the whole basic graph pattern fails to match).
SPARQL
  Query 3: Find (1) the homepage of Artem Chebotko and (2) his weblog, if this information is available.
    PREFIX foaf: <...>
    SELECT ?url ?log
    FROM <...>
    WHERE { ?someone foaf:name "Artem Chebotko" .
            ?someone foaf:homepage ?url .
            OPTIONAL { ?someone foaf:weblog ?log . } }
  Result 3: ?url is bound to "..." and ?log is unbound.
SPARQL
  Basic semantics of OPTIONAL patterns: the evaluation of an OPTIONAL clause is not obligated to succeed; in the case of failure, no value is returned for the unbound variables in the SELECT clause.
  Semantics of shared variables: in general, shared variables must be bound to the same values. A variable can be shared among subject, predicate, and object positions, and across them.
  More complicated semantics follows.
SPARQL
  Semantics of parallel OPTIONAL patterns: the failure of an OPTIONAL clause does not block the evaluation of a following parallel OPTIONAL clause, but the success of an OPTIONAL clause obligates the same variables in the following parallel OPTIONAL clauses to be bound to the same values.
SPARQL
  Query 4: Find (1) the homepage of Artem Chebotko, (2) his weblog, if this information is available, and (3) his workplace homepage, if this information is available.
    PREFIX foaf: <...>
    SELECT ?url ?log ?work
    FROM <...>
    WHERE { ?someone foaf:name "Artem Chebotko" .
            ?someone foaf:homepage ?url .
            OPTIONAL { ?someone foaf:weblog ?log . }
            OPTIONAL { ?someone foaf:workplaceHomepage ?work . } }
  Result 4 (columns): ?url, ?log, ?work
  What if the second clause were OPTIONAL { ?someone foaf:workplaceHomepage ?log . } instead?
SPARQL
  Semantics of nested OPTIONAL patterns: before an OPTIONAL clause is evaluated, all containing basic graph patterns and OPTIONAL clauses must have succeeded.
SPARQL
  Query 5: Find (1) the homepage of Artem Chebotko, (2) his weblog, if this information is available, and (3) his workplace homepage, if this information is available and the weblog is available.
    PREFIX foaf: <...>
    SELECT ?url ?log ?work
    FROM <...>
    WHERE { ?someone foaf:name "Artem Chebotko" .
            ?someone foaf:homepage ?url .
            OPTIONAL { ?someone foaf:weblog ?log .
                       OPTIONAL { ?someone foaf:workplaceHomepage ?work . } } }
  Result 5: ?url is bound to "..." and ?log is unbound; ?work also stays unbound, because the nested clause is not evaluated once the weblog clause fails.
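The OPTIONAL semantics described above (plain, parallel, and nested) can be illustrated with a small evaluator over in-memory triples. This is only a sketch of the behavior stated on the slides, not a SPARQL engine: the data values are hypothetical placeholders, and `match`, `eval_bgp`, and `eval_group` are names invented for this example.

```python
# Hypothetical data: the subject and URLs are placeholders, not the slide's values.
P, HOME, WORK = "_:artem", "http://example.org/home", "http://example.org/work"
TRIPLES = [
    (P, "foaf:name", "Artem Chebotko"),
    (P, "foaf:homepage", HOME),
    (P, "foaf:workplaceHomepage", WORK),
]


def match(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, else None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A shared variable must keep the value it is already bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def eval_bgp(bgp, triples, bindings):
    """Match every triple pattern of a basic graph pattern in turn."""
    for pattern in bgp:
        extended = []
        for b in bindings:
            for t in triples:
                m = match(pattern, t, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings


def eval_group(bgp, optionals, triples, bindings=None):
    """Evaluate a BGP followed by OPTIONAL groups, each given as (bgp, optionals)."""
    results = eval_bgp(bgp, triples, [{}] if bindings is None else bindings)
    for opt_bgp, nested in optionals:
        kept = []
        for b in results:
            # A failed OPTIONAL keeps b unchanged; its nested clauses are never tried.
            kept.extend(eval_group(opt_bgp, nested, triples, [b]) or [b])
        results = kept
    return results


MAIN = [("?someone", "foaf:name", "Artem Chebotko"),
        ("?someone", "foaf:homepage", "?url")]
LOG = [("?someone", "foaf:weblog", "?log")]
WORKPLACE = [("?someone", "foaf:workplaceHomepage", "?work")]

query2 = eval_group(MAIN + LOG, [], TRIPLES)                      # plain BGP
query3 = eval_group(MAIN, [(LOG, [])], TRIPLES)                   # one OPTIONAL
query4 = eval_group(MAIN, [(LOG, []), (WORKPLACE, [])], TRIPLES)  # parallel
query5 = eval_group(MAIN, [(LOG, [(WORKPLACE, [])])], TRIPLES)    # nested
```

With this data, the parallel form (Query 4) binds ?work even though the weblog is missing, while the nested form (Query 5) leaves ?work unbound because its enclosing clause failed.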
Relational Storage of RDF data
  The increasing amount of RDF data on the Web highlights the need for its efficient and effective management.
  Using relational database technology as a basis for storing and querying RDF data is a reasonable choice, as this technology is well understood and known to have good performance.
Relational Storage of RDF data
  The simplest schema: a single table Triples with the columns subject, predicate, and object.
  [Table: sample rows of Triples; surviving fragments: Chebotko, /welcome/welcome.jpg, Homepage]
  More complicated (and more efficient) storage schemas are possible.
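A minimal sketch of the single-table schema, using Python's built-in sqlite3 module in place of the Oracle and in-memory databases used in the talk. The rows are hypothetical placeholders, since the slide's table contents are not preserved.

```python
import sqlite3

# Hypothetical rows; the slide's actual URIs are not preserved in this text.
ROWS = [
    ("_:artem", "foaf:name", "Artem Chebotko"),
    ("_:artem", "foaf:homepage", "http://example.org/home"),
    ("_:artem", "foaf:img", "http://example.org/photo.jpg"),
    ("_:artem", "foaf:workplaceHomepage", "http://example.org/work"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", ROWS)

# Every statement about a resource is one row, so a lookup is a plain selection.
predicates = [p for (p,) in conn.execute(
    "SELECT predicate FROM Triples WHERE subject = ? ORDER BY predicate",
    ("_:artem",))]
print(predicates)
```

Because every triple pattern reads the same table, a query with several patterns turns into a self-join of Triples, which is what the translation in the following slides constructs.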
SPARQL-to-SQL Translation
  Problem: relational databases "know" SQL, but not SPARQL.
  Solution: translate SPARQL queries into equivalent SQL queries in order to access RDF data stored in a relational database.
  Algorithm BGPtoSQL translates a SPARQL basic graph pattern to its SQL equivalent.
  Algorithm SPARQLtoSQL translates SPARQL queries with arbitrarily complex optional graph patterns.
BGPtoSQL
  Basic idea, Step 1: Assign a unique table alias to every triple pattern (e.g., t1 and t2), and construct the FROM clause to contain all the table aliases.
    WHERE { ?someone foaf:name "Artem Chebotko" .
            ?someone foaf:homepage ?url . }
    FROM Triples t1, Triples t2
BGPtoSQL
  Step 2: Construct the SELECT clause to contain every relational attribute that corresponds to a distinct variable.
    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2
BGPtoSQL
  Step 3: Construct the WHERE clause to restrict attribute values to the corresponding URIs and literals.
    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2
    WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko'
      AND t2.predicate = 'foaf:homepage'
BGPtoSQL
  Step 4: Create an inverted list for variables and finish the WHERE clause: attributes that correspond to a shared variable must have the same value.
    Inverted list:
      ?someone: t1.subject, t2.subject
      ?url: t2.object
    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2
    WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko'
      AND t2.predicate = 'foaf:homepage'
      AND t1.subject = t2.subject
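The four steps above can be sketched as a short function that builds the SQL string for a basic graph pattern over the single Triples table. This is an illustration of the steps as stated on the slides, not the authors' implementation; `bgp_to_sql` and its parameters are hypothetical names, and values are inlined with only minimal quote escaping.

```python
def bgp_to_sql(bgp, table="Triples"):
    """Translate a list of (subject, predicate, object) patterns to SQL."""
    columns = ("subject", "predicate", "object")
    # Step 1: one table alias per triple pattern.
    aliases = [f"t{i}" for i in range(1, len(bgp) + 1)]
    inverted = {}    # Step 4: variable name -> attributes where it occurs
    conditions = []  # Step 3: restrictions to URIs and literals
    for alias, pattern in zip(aliases, bgp):
        for column, term in zip(columns, pattern):
            attribute = f"{alias}.{column}"
            if term.startswith("?"):
                inverted.setdefault(term[1:], []).append(attribute)
            else:
                escaped = term.replace("'", "''")  # minimal escaping only
                conditions.append(f"{attribute} = '{escaped}'")
    # Step 4: attributes of a shared variable must hold the same value.
    for attributes in inverted.values():
        conditions += [f"{attributes[0]} = {a}" for a in attributes[1:]]
    # Step 2: one output column per distinct variable.
    select = ", ".join(f"{attrs[0]} AS {var}" for var, attrs in inverted.items())
    sql = f"SELECT {select} FROM " + ", ".join(f"{table} {a}" for a in aliases)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


sql = bgp_to_sql([
    ("?someone", "foaf:name", "Artem Chebotko"),
    ("?someone", "foaf:homepage", "?url"),
])
print(sql)
```

For the two-pattern example, the function produces the same query as the final slide of this step sequence, with the join condition t1.subject = t2.subject derived from the inverted list.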
SPARQLtoSQL
  Step 1: Translate all basic graph patterns (BGPs) to SQL with BGPtoSQL, e.g., into q1, q2, q3, q4.
    SELECT ?url ?log ?topic
    WHERE { ?someone foaf:name "Artem Chebotko" .
            ?someone foaf:homepage ?url .
            OPTIONAL { ?someone foaf:weblog ?log .
                       OPTIONAL { ?url foaf:topic ?topic . } }
            OPTIONAL { ?someone <...> ?log . } }
SPARQLtoSQL
  Step 2: Join the relations (q1, q2, q3, q4) with LEFT OUTER JOIN, in the order in which their corresponding graph patterns appear in the query.
    Q = SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
        FROM (q1) r1 LEFT OUTER JOIN (q2) r2 ON (r1.someone = r2.someone)
SPARQLtoSQL
  Step 2 (continued): Join q3, the OPTIONAL clause nested inside the weblog clause. The NOT NULL check on log ensures that q3 is joined only when the enclosing clause has succeeded.
    Q = SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
        FROM (Q) r11 LEFT OUTER JOIN (q3) r22
          ON (r11.url = r22.url AND r11.log IS NOT NULL)
SPARQLtoSQL
  Step 2 (continued): Join q4, the parallel OPTIONAL clause that shares ?log. Its log value must agree with an earlier binding, or it supplies the binding when log is still NULL.
    Q = SELECT r111.someone AS someone, r111.url AS url,
               COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
        FROM (Q) r111 LEFT OUTER JOIN (q4) r222
          ON (r111.someone = r222.someone
              AND (r111.log = r222.log OR r111.log IS NULL))
SPARQLtoSQL
  Step 3: Project only the required attributes (variables).
    SELECT r.url AS url, r.log AS log, r.topic AS topic
    FROM (Q) r
SPARQLtoSQL
  Almost complete query (q1, q2, q3, q4 still need to be replaced):
    SELECT r.url AS url, r.log AS log, r.topic AS topic
    FROM (
      SELECT r111.someone AS someone, r111.url AS url,
             COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
      FROM (
        SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
        FROM (
          SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
          FROM (q1) r1 LEFT OUTER JOIN (q2) r2 ON (r1.someone = r2.someone)
        ) r11 LEFT OUTER JOIN (q3) r22
          ON (r11.url = r22.url AND r11.log IS NOT NULL)
      ) r111 LEFT OUTER JOIN (q4) r222
        ON (r111.someone = r222.someone
            AND (r111.log = r222.log OR r111.log IS NULL))
    ) r
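To see how the nested query above behaves, the sketch below runs it in SQLite with q1 to q4 materialized as small tables instead of BGPtoSQL subqueries. The table contents are invented for illustration, and plain table names replace the parenthesized (q1) placeholders so that the statement is valid SQLite; an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (someone TEXT, url TEXT);   -- name and homepage pattern
    CREATE TABLE q2 (someone TEXT, log TEXT);   -- weblog OPTIONAL
    CREATE TABLE q3 (url TEXT, topic TEXT);     -- nested OPTIONAL
    CREATE TABLE q4 (someone TEXT, log TEXT);   -- parallel OPTIONAL sharing log
    -- Hypothetical rows: person a has a weblog, person b does not.
    INSERT INTO q1 VALUES ('a', 'url_a'), ('b', 'url_b');
    INSERT INTO q2 VALUES ('a', 'log_a');
    INSERT INTO q3 VALUES ('url_a', 'topic_a'), ('url_b', 'topic_b');
    INSERT INTO q4 VALUES ('a', 'log_a'), ('a', 'other'), ('b', 'log_b');
""")

QUERY = """
SELECT r.url AS url, r.log AS log, r.topic AS topic
FROM (
  SELECT r111.someone AS someone, r111.url AS url,
         COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
  FROM (
    SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
    FROM (
      SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
      FROM q1 r1 LEFT OUTER JOIN q2 r2 ON (r1.someone = r2.someone)
    ) r11 LEFT OUTER JOIN q3 r22
      ON (r11.url = r22.url AND r11.log IS NOT NULL)
  ) r111 LEFT OUTER JOIN q4 r222
    ON (r111.someone = r222.someone
        AND (r111.log = r222.log OR r111.log IS NULL))
) r
ORDER BY r.url
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Person b has a topic row in q3, yet topic stays NULL because the enclosing weblog clause failed; b's log is then filled from q4 through COALESCE. For person a, the q4 row with a different log value is rejected by the shared-variable condition.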
Experimental Study
  Dataset: WordNet, 700,000+ triples.
  The translation algorithms are very efficient and scalable. For example, SPARQLtoSQL translated queries with fewer than 50 OPTIONAL clauses, each containing one triple pattern, in less than ... sec, regardless of the clause tree layout.
  The evaluation of most sample queries in Oracle was unsatisfactory (on the order of seconds); the most important reason is the simple relational schema.
  Note that this does not imply that the algorithms are impractical. SPARQLtoSQL does not directly depend on a particular database schema as long as a BGPtoSQL stub for the database is provided, which we believe is a reasonable expectation of existing RDF storage systems.
Experimental Study
  The evaluation of the sample queries in an in-memory relational database showed much better results.
  In these experiments, we were able to try different implementations of the left outer join, based on the nested-loops, sort-merge, and simple hash methods.
Relational Nested Optional Join
New Example
  Retrieve: (1) every graduate student in the RDF graph; (2) the student's advisor, if this information is available; (3) the student's coadvisor, if this information is available and the student's advisor has been successfully retrieved in the previous step.
  In other words, the query returns students and as many advisors as possible; there is no point in returning a coadvisor for a student who does not even have an advisor.
Motivation: Computation Waste with the Left Outer Join (LOJ)
Nested Optional Join
  A novel relational operator for translating nested optional patterns; an alternative to the left outer join.
  It joins twin relations (base relation + optional relation):
    A base relation holds tuples that have the potential to satisfy a join condition if used in a nested optional join.
    An optional relation holds tuples that are guaranteed to fail a join condition if used in a nested optional join.
SPARQL-to-SQL Translation with NOJ
Nested Optional Join
  NOJ vs. LOJ:
    The NOJ processes the tuples that are guaranteed to be NULL padded very efficiently, in linear time.
    The NOJ does not require the NOT NULL check to return correct results.
  NOJ algorithms:
    NL-NOJ: nested-loops NOJ algorithm
    SM-NOJ: sort-merge NOJ algorithm
    SH-NOJ: simple hash NOJ algorithm
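The following nested-loops sketch shows one reading of the twin-relation idea described above, applied to the student, advisor, and coadvisor example. It is an assumption-laden illustration and not the NL-NOJ algorithm from the technical report: the function `nl_noj`, its parameters, and the sample data are all hypothetical.

```python
def nl_noj(base, optional, s, shared, new_cols):
    """Join the twin relation (base, optional) with s; return a new twin relation.

    base and optional are lists of dict tuples; shared lists the join columns;
    new_cols lists the columns that s contributes.
    """
    pad = dict.fromkeys(new_cols)  # NULL padding for the contributed columns
    out_base, out_optional = [], []
    for r in base:
        matches = [t for t in s if all(r[c] == t[c] for c in shared)]
        if matches:
            out_base.extend({**r, **t} for t in matches)
        else:
            # Failed now, so it is guaranteed to fail in deeper nested joins.
            out_optional.append({**r, **pad})
    # Tuples already known to fail are padded in one linear pass,
    # without comparisons and without a NOT NULL check.
    out_optional.extend({**r, **pad} for r in optional)
    return out_base, out_optional


students = [{"student": "s1"}, {"student": "s2"}, {"student": "s3"}]
advisors = [{"student": "s1", "advisor": "a1"}, {"student": "s2", "advisor": "a2"}]
coadvisors = [{"student": "s1", "coadvisor": "c1"}, {"student": "s3", "coadvisor": "c3"}]

base1, opt1 = nl_noj(students, [], advisors, ["student"], ["advisor"])
base2, opt2 = nl_noj(base1, opt1, coadvisors, ["student"], ["coadvisor"])
result = base2 + opt2  # the final answer is the union of both relations
```

Student s3 has a coadvisor row but no advisor, so it sits in the optional relation after the first join and is never compared against the coadvisor data; it is only padded.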
Nested Optional Join
  Queries with joins with low selectivity factors (< 0.0002)
Nested Optional Join
  For in-memory evaluation, by join selectivity factor (JSF):
    JSF <= 0.005: SH-NOJ
    JSF >= 0.8: NL-NOJ
    ... < JSF < 0.8: SM-NOJ
Possible Future Work
  Extending our work to support other SPARQL constructs, such as UNION, FILTER, etc.
  Adding intelligence to our SPARQL-to-SQL translation to support the nested optional join.
  Investigating possible optimizations of parallel optional graph patterns.
  Defining the relational algebra for SPARQL with the support of nested and parallel optional joins.
  ... and more
References
  Artem Chebotko, Mustafa Atay, Shiyong Lu and Farshad Fotouhi. "Extending Relational Databases with a Nested Optional Join for Efficient Semantic Web Query Processing". Technical Report TR-DB CLJF, Department of Computer Science, Wayne State University, November.
  Artem Chebotko, Shiyong Lu, Hasan M. Jamil and Farshad Fotouhi. "Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns". Technical Report TR-DB CLJF, Department of Computer Science, Wayne State University, May.
Acknowledgements
  Dr. Shiyong Lu, Dr. Farshad Fotouhi, Dr. Hasan Jamil, Dr. Mustafa Atay, Oracle DBA Shwetal Joshi
  Questions? Thank you!