1
Semantic Web Query Processing with Relational Databases
Artem Chebotko (artem@cs.wayne.edu)
Department of Computer Science, Wayne State University
2
1/23/2007
Outline
- The Semantic Web
- RDF
- SPARQL
- Relational Storage of RDF Data
- SPARQL-to-SQL Translation
- Relational Nested Optional Join
3
4
My Web page as seen by a Human
5
My Web page as seen by a Computer
6
My Web page with Semantics
[Figure with the label "Artem Chebotko"]
7
The Semantic Web
- A Web of data (vs. a Web of documents): machine-processable, machine-readable data
- A framework for integrating and combining data from various sources
- Data reuse across application, organization, and community boundaries
8
The Semantic Web "Stack"
9
RDF
- RDF (Resource Description Framework) provides a common framework for representing resources and the relations among them.
- Anything can be a resource (e.g., a person, a file, etc.).
- RDF provides a data model and a syntax.
[Figure with the label "Artem Chebotko"]
10
RDF Model
- An RDF statement is a triple that consists of a subject, a predicate, and an object.
- Namespace prefix: foaf = "http://xmlns.com/foaf/0.1/"
[Figure: example RDF statement with the literal "Artem Chebotko"]
11
RDF Model
RDF's graph model: RDF models statements as nodes and edges in a graph.
[Figure: graph with the subject node http://www.cs.wayne.edu/~artem/ID and the edges
  foaf:name → "Artem Chebotko"
  foaf:homepage → http://www.cs.wayne.edu/~artem
  foaf:img → http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg
  foaf:workplaceHomepage → http://www.cs.wayne.edu]
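As a minimal Python sketch, assuming each statement is a plain (subject, predicate, object) tuple of strings with the foaf: prefix left abbreviated and no distinction between URIs and literals, the graph above can be kept as a list of triples and turned into an adjacency map from subject nodes to their outgoing (predicate, object) edges. `TRIPLES` and `to_graph` are illustrative names, not part of any RDF library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

ARTEM = "http://www.cs.wayne.edu/~artem/ID"

# The four statements drawn in the graph (foaf: prefix left abbreviated).
TRIPLES: List[Triple] = [
    (ARTEM, "foaf:name", "Artem Chebotko"),
    (ARTEM, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    (ARTEM, "foaf:img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
    (ARTEM, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
]


def to_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject node to its outgoing (predicate edge, object node) pairs."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(TRIPLES)
for predicate, obj in graph[ARTEM]:
    print(f"{ARTEM} --{predicate}--> {obj}")
```

Every statement becomes one labeled edge, so the same subject node carries all four edges.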
12
SPARQL
- SPARQL is an RDF query language based on graph pattern matching
- Basic graph patterns, optional graph patterns, etc.
Query 1: Find the homepage URL of Artem Chebotko
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?url
  FROM <…>
  WHERE {
    ?someone foaf:name "Artem Chebotko".
    ?someone foaf:homepage ?url.
  }
Result 1: ?url is bound to the value "http://www.cs.wayne.edu/~artem"
  ?url
  http://www.cs.wayne.edu/~artem
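Basic graph pattern matching can be sketched in a few lines of Python. This is a naive, illustrative evaluator (`match_pattern` and `eval_bgp` are our names), assuming the four triples from the RDF Model slides are the only data: it extends a list of partial variable bindings one triple pattern at a time, and a variable that is already bound (a shared variable) must keep its value. It reproduces Result 1 and also shows that Query 2, which adds a foaf:weblog pattern, has no solutions on this data.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]  # variable name -> bound value

ARTEM = "http://www.cs.wayne.edu/~artem/ID"
DATA: List[Triple] = [
    (ARTEM, "foaf:name", "Artem Chebotko"),
    (ARTEM, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    (ARTEM, "foaf:img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
    (ARTEM, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return binding extended so that pattern matches triple, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable that is already bound must keep the same value.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # a constant URI or literal that does not match
    return extended


def eval_bgp(bgp: List[Triple], data: List[Triple]) -> List[Binding]:
    """Naive evaluation: extend every partial solution by every matching triple."""
    solutions: List[Binding] = [{}]
    for pattern in bgp:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


query1 = [("?someone", "foaf:name", "Artem Chebotko"),
          ("?someone", "foaf:homepage", "?url")]
query2 = query1 + [("?someone", "foaf:weblog", "?log")]

print([s["?url"] for s in eval_bgp(query1, DATA)])  # ['http://www.cs.wayne.edu/~artem']
print(eval_bgp(query2, DATA))                        # []
```

A real engine would use indexes and join ordering instead of scanning all triples per pattern; the nested loops here only make the matching semantics explicit.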
13
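SPARQL
Query 2: Find both the homepage and the weblog of Artem Chebotko
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?url ?log
  FROM <…>
  WHERE {
    ?someone foaf:name "Artem Chebotko".
    ?someone foaf:homepage ?url.
    ?someone foaf:weblog ?log.
  }
Result 2: the data contain no foaf:weblog statement, so the whole basic graph pattern fails and the query returns no solutions; the result table (columns ?url, ?log) is empty.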
14
SPARQL
Query 3: Find (1) the homepage of Artem Chebotko and (2) his weblog, if this information is available
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?url ?log
  FROM <…>
  WHERE {
    ?someone foaf:name "Artem Chebotko".
    ?someone foaf:homepage ?url.
    OPTIONAL { ?someone foaf:weblog ?log. }
  }
Result 3: ?url is bound to "http://www.cs.wayne.edu/~artem" and ?log is unbound
  ?url | ?log
  http://www.cs.wayne.edu/~artem |
15
SPARQL
Basic semantics of OPTIONAL patterns
- The evaluation of an OPTIONAL clause is not obliged to succeed. If it fails, the variables that only this clause could bind remain unbound, and no value is returned for them in the SELECT clause.
Semantics of shared variables
- In general, all occurrences of a shared variable must be bound to the same value.
- Variables can be shared among subjects, among predicates, among objects, and across these positions.
More complicated semantics follow.
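One way to make these two rules concrete, assuming solution mappings are Python dicts from variable to value, is the left join used in the SPARQL algebra: two mappings are compatible when every shared variable has the same value in both, and an OPTIONAL extends each required solution with every compatible optional solution, keeping the required solution unextended if there is none. The helper names and the example.org IRIs below are hypothetical.

```python
from typing import Dict, List, Optional

Mapping = Dict[str, str]  # a solution mapping: variable -> value


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Shared variables must be bound to the same values."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def left_join(required: List[Mapping], optional: List[Mapping]) -> List[Mapping]:
    """OPTIONAL: extend each required solution with every compatible optional
    solution; if there is none, keep the required solution unextended."""
    result: List[Mapping] = []
    for m1 in required:
        extended = [{**m1, **m2} for m2 in optional if compatible(m1, m2)]
        result.extend(extended or [m1])
    return result


def project(solutions: List[Mapping], variables: List[str]) -> List[Dict[str, Optional[str]]]:
    """SELECT clause; None marks an unbound variable."""
    return [{v: s.get(v) for v in variables} for s in solutions]


ARTEM = "http://www.cs.wayne.edu/~artem/ID"
required = [{"?someone": ARTEM, "?url": "http://www.cs.wayne.edu/~artem"}]

# Query 3: there is no foaf:weblog triple, so the OPTIONAL has no solutions.
print(project(left_join(required, []), ["?url", "?log"]))

# A weblog of a different (hypothetical) ?someone is incompatible and ignored.
other = [{"?someone": "http://example.org/other/ID", "?log": "http://example.org/blog"}]
print(project(left_join(required, other), ["?url", "?log"]))

# A weblog of the same ?someone is compatible and binds ?log.
mine = [{"?someone": ARTEM, "?log": "http://example.org/artem-blog"}]
print(project(left_join(required, mine), ["?url", "?log"]))
```

The first two calls both print the single row with ?log unbound (None); only the third binds ?log.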
16
SPARQL
Semantics of parallel OPTIONAL patterns
- The failure of an OPTIONAL clause does not block the evaluation of a following parallel OPTIONAL clause.
- The success of an OPTIONAL clause obliges the variables it shares with the following parallel OPTIONAL clauses to be bound to the same values in those clauses.
17
SPARQL
Query 4: Find (1) the homepage of Artem Chebotko, (2) his weblog, if this information is available, and (3) his workplace homepage, if this information is available
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?url ?log ?work
  FROM <…>
  WHERE {
    ?someone foaf:name "Artem Chebotko".
    ?someone foaf:homepage ?url.
    OPTIONAL { ?someone foaf:weblog ?log. }
    OPTIONAL { ?someone foaf:workplaceHomepage ?work. }
  }
Result 4:
  ?url | ?log | ?work
  http://www.cs.wayne.edu/~artem | | http://www.cs.wayne.edu
What if the second OPTIONAL clause were OPTIONAL { ?someone foaf:workplaceHomepage ?log. } instead?
18
SPARQL
Semantics of nested OPTIONAL patterns
- Before an OPTIONAL clause is evaluated, all basic graph patterns and OPTIONAL clauses that contain it must have succeeded.
19
SPARQL
Query 5: Find (1) the homepage of Artem Chebotko, (2) his weblog, if this information is available, and (3) his workplace homepage, if this information is available and the weblog is available
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?url ?log ?work
  FROM <…>
  WHERE {
    ?someone foaf:name "Artem Chebotko".
    ?someone foaf:homepage ?url.
    OPTIONAL {
      ?someone foaf:weblog ?log.
      OPTIONAL { ?someone foaf:workplaceHomepage ?work. }
    }
  }
Result 5: ?url is bound to "http://www.cs.wayne.edu/~artem"; ?log and ?work are unbound, because the weblog OPTIONAL fails and the nested OPTIONAL is therefore not evaluated
  ?url | ?log | ?work
  http://www.cs.wayne.edu/~artem | |
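With the compatibility-based left join (redefined here so the snippet runs on its own), parallel and nested OPTIONALs differ only in how the joins are nested: parallel clauses are applied one after another to the running result, while a nested clause is left-joined inside its enclosing OPTIONAL group before that group is left-joined to the outer pattern, as in the SPARQL algebra. This Python sketch assumes only the three triples listed and uses illustrative helper names; it reproduces Results 4 and 5 and answers the "What if" question of Query 4: since the weblog OPTIONAL fails, ?log is still free, so the second clause binds it to http://www.cs.wayne.edu.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]

ARTEM = "http://www.cs.wayne.edu/~artem/ID"
URL = "http://www.cs.wayne.edu/~artem"
DATA: List[Triple] = [
    (ARTEM, "foaf:name", "Artem Chebotko"),
    (ARTEM, "foaf:homepage", URL),
    (ARTEM, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
]


def match(pattern: Triple, triple: Triple, mapping: Mapping) -> Optional[Mapping]:
    """Extend mapping so that pattern matches triple, or return None."""
    m = dict(mapping)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if m.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return m


def bgp(patterns: List[Triple]) -> List[Mapping]:
    """Naive basic graph pattern evaluation over DATA."""
    solutions: List[Mapping] = [{}]
    for pattern in patterns:
        extended: List[Mapping] = []
        for mapping in solutions:
            for triple in DATA:
                m = match(pattern, triple, mapping)
                if m is not None:
                    extended.append(m)
        solutions = extended
    return solutions


def compatible(m1: Mapping, m2: Mapping) -> bool:
    return all(m2[v] == x for v, x in m1.items() if v in m2)


def left_join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    out: List[Mapping] = []
    for m1 in left:
        out.extend([{**m1, **m2} for m2 in right if compatible(m1, m2)] or [m1])
    return out


def project(solutions: List[Mapping], variables: List[str]) -> List[Dict[str, Optional[str]]]:
    return [{v: s.get(v) for v in variables} for s in solutions]


required = bgp([("?someone", "foaf:name", "Artem Chebotko"),
                ("?someone", "foaf:homepage", "?url")])
weblog = bgp([("?someone", "foaf:weblog", "?log")])  # no solutions
work = bgp([("?someone", "foaf:workplaceHomepage", "?work")])
work_as_log = bgp([("?someone", "foaf:workplaceHomepage", "?log")])

# Query 4: parallel OPTIONALs, applied left to right to the running result.
result4 = project(left_join(left_join(required, weblog), work), ["?url", "?log", "?work"])
# "What if": the first OPTIONAL failed, so ?log is still free for the second one.
what_if = project(left_join(left_join(required, weblog), work_as_log), ["?url", "?log"])
# Query 5: the nested OPTIONAL is joined inside its group first; the group has no
# solutions because foaf:weblog fails, so ?log and ?work both stay unbound.
result5 = project(left_join(required, left_join(weblog, work)), ["?url", "?log", "?work"])

print(result4)
print(what_if)
print(result5)
```

Had the data contained a weblog, the "What if" clause could only have matched a workplace homepage equal to that weblog, which is the parallel-OPTIONAL rule from the previous slides.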
20
Relational Storage of RDF Data
- The increasing amount of RDF data on the Web highlights the need to manage it efficiently and effectively.
- Relational database technology is a reasonable basis for storing and querying RDF data, because it is well understood and known to perform well.
21
Relational Storage of RDF Data
The simplest schema is a single table, Triples:
  subject | predicate | object
  http://www.cs.wayne.edu/~artem/ID | foaf:name | Artem Chebotko
  http://www.cs.wayne.edu/~artem/ID | foaf:homepage | http://www.cs.wayne.edu/~artem
  http://www.cs.wayne.edu/~artem/ID | foaf:img | http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg
  http://www.cs.wayne.edu/~artem/ID | foaf:workplaceHomepage | http://www.cs.wayne.edu
More complicated (and more efficient) storage schemas are possible.
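To experiment with this schema, the table can be created in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a convenient stand-in for the relational database here, and the TEXT column types are an assumption; the slides do not specify types.

```python
import sqlite3

ID = "http://www.cs.wayne.edu/~artem/ID"
ROWS = [
    (ID, "foaf:name", "Artem Chebotko"),
    (ID, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    (ID, "foaf:img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
    (ID, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
]

conn = sqlite3.connect(":memory:")
# One row per RDF statement; URIs and literals are both stored as text.
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", ROWS)
conn.commit()

for predicate, obj in conn.execute(
        "SELECT predicate, object FROM Triples WHERE subject = ? ORDER BY predicate", (ID,)):
    print(predicate, obj)
```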
22
SPARQL-to-SQL Translation
- Problem: relational databases "know" SQL, but not SPARQL.
- Solution: translate SPARQL queries into equivalent SQL queries in order to access RDF data stored in a relational database.
  - Algorithm BGPtoSQL translates a SPARQL basic graph pattern into its SQL equivalent.
  - Algorithm SPARQLtoSQL translates SPARQL queries with arbitrarily complex optional graph patterns.
23
BGPtoSQL
Basic idea, illustrated on the basic graph pattern
  WHERE { ?someone foaf:name "Artem Chebotko". ?someone foaf:homepage ?url. }
Step 1: Assign a unique table alias (e.g., t1 and t2) to every triple pattern, and construct the FROM clause to contain all the table aliases:
  FROM Triples t1, Triples t2
24
BGPtoSQL
Step 2: Construct the SELECT clause to contain one relational attribute for each distinct variable, named after that variable:
  SELECT t1.subject AS someone, t2.object AS url
  FROM Triples t1, Triples t2
25
BGPtoSQL
Step 3: Construct the WHERE clause to restrict attribute values to the corresponding URIs and literals:
  SELECT t1.subject AS someone, t2.object AS url
  FROM Triples t1, Triples t2
  WHERE t1.predicate = 'foaf:name'
    AND t1.object = 'Artem Chebotko'
    AND t2.predicate = 'foaf:homepage'
26
BGPtoSQL
Step 4: Create an inverted list that maps each variable to the attributes where it occurs:
  ?someone: t1.subject, t2.subject
  ?url: t2.object
Finish the WHERE clause: attributes that correspond to a shared variable must have the same value:
  SELECT t1.subject AS someone, t2.object AS url
  FROM Triples t1, Triples t2
  WHERE t1.predicate = 'foaf:name'
    AND t1.object = 'Artem Chebotko'
    AND t2.predicate = 'foaf:homepage'
    AND t1.subject = t2.subject
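The four BGPtoSQL steps can be sketched in Python as follows. This is our illustration, not the authors' implementation: it assumes the single Triples table, that variables start with `?`, that the pattern contains at least one variable, and that each variable name is a valid SQL column alias; constants are emitted as SQL string literals with single quotes doubled. For the running example it produces the same SQL as the Step 4 slide, which SQLite then evaluates.

```python
import sqlite3
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
COLUMNS = ("subject", "predicate", "object")


def sql_literal(term: str) -> str:
    """Quote a URI or literal as an SQL string constant."""
    return "'" + term.replace("'", "''") + "'"


def bgp_to_sql(bgp: List[Triple]) -> str:
    # Step 1: one alias per triple pattern; all of them go into FROM.
    aliases = [f"t{i}" for i in range(1, len(bgp) + 1)]
    from_clause = ", ".join(f"Triples {a}" for a in aliases)

    inverted: Dict[str, List[str]] = {}  # Step 4's inverted list: variable -> attributes
    conditions: List[str] = []
    for alias, pattern in zip(aliases, bgp):
        for column, term in zip(COLUMNS, pattern):
            attribute = f"{alias}.{column}"
            if term.startswith("?"):
                inverted.setdefault(term, []).append(attribute)
            else:
                # Step 3: constants restrict the attribute value.
                conditions.append(f"{attribute} = {sql_literal(term)}")

    # Step 2: one SELECT attribute per distinct variable (its first occurrence).
    select = ", ".join(f"{attrs[0]} AS {var[1:]}" for var, attrs in inverted.items())
    # Step 4: all attributes of a shared variable must have the same value.
    for attrs in inverted.values():
        conditions.extend(f"{attrs[0]} = {other}" for other in attrs[1:])

    sql = f"SELECT {select} FROM {from_clause}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


pattern = [("?someone", "foaf:name", "Artem Chebotko"),
           ("?someone", "foaf:homepage", "?url")]
sql = bgp_to_sql(pattern)
print(sql)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", [
    ("http://www.cs.wayne.edu/~artem/ID", "foaf:name", "Artem Chebotko"),
    ("http://www.cs.wayne.edu/~artem/ID", "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
])
print(conn.execute(sql).fetchall())
```

Generating the join conditions from the inverted list, rather than comparing every pair of occurrences, keeps the WHERE clause to one equality per extra occurrence of a variable.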
27
SPARQLtoSQL
Step 1: Translate every basic graph pattern (BGP) into SQL with BGPtoSQL, giving q1 (the two required triple patterns), q2 (foaf:weblog), q3 (foaf:topic), and q4 (http://www.example.org/blog). Running example:
  SELECT ?url ?log ?topic
  WHERE {
    ?someone foaf:name "Artem Chebotko".
    ?someone foaf:homepage ?url.
    OPTIONAL {
      ?someone foaf:weblog ?log.
      OPTIONAL { ?url foaf:topic ?topic. }
    }
    OPTIONAL { ?someone <http://www.example.org/blog> ?log. }
  }
28
SPARQLtoSQL
Step 2: Join the "relations" q1, q2, q3, q4 with LEFT OUTER JOIN, in the order in which their graph patterns appear in the query. First, q2 (the weblog OPTIONAL) is joined to q1 on the shared variable ?someone:
  Q = SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
      FROM (q1) r1 LEFT OUTER JOIN (q2) r2
        ON (r1.someone = r2.someone)
29
SPARQLtoSQL
Next, q3 is joined to Q on ?url. Because the topic pattern is nested inside the weblog OPTIONAL, it may only match where that OPTIONAL succeeded, hence the condition r11.log IS NOT NULL:
  Q = SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
      FROM (Q) r11 LEFT OUTER JOIN (q3) r22
        ON (r11.url = r22.url AND r11.log IS NOT NULL)
30
SPARQLtoSQL
Finally, q4 is joined to Q. The blog pattern is a parallel OPTIONAL that shares ?log with the weblog pattern: if ?log is already bound, the value from q4 must equal it; if ?log is unbound (NULL), q4 may bind it, and COALESCE returns whichever value is present:
  Q = SELECT r111.someone AS someone, r111.url AS url,
             COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
      FROM (Q) r111 LEFT OUTER JOIN (q4) r222
        ON (r111.someone = r222.someone
            AND (r111.log = r222.log OR r111.log IS NULL))
31
SPARQLtoSQL
Step 3: Project only the required attributes (the variables in the SELECT clause):
  SELECT r.url AS url, r.log AS log, r.topic AS topic
  FROM (Q) r
32
SPARQLtoSQL
Almost complete query (q1, q2, q3, q4 still need to be replaced by their SQL translations):
  SELECT r.url AS url, r.log AS log, r.topic AS topic
  FROM (
    SELECT r111.someone AS someone, r111.url AS url,
           COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
    FROM (
      SELECT r11.someone AS someone, r11.url AS url,
             r11.log AS log, r22.topic AS topic
      FROM (
        SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
        FROM (q1) r1 LEFT OUTER JOIN (q2) r2
          ON (r1.someone = r2.someone)
      ) r11 LEFT OUTER JOIN (q3) r22
        ON (r11.url = r22.url AND r11.log IS NOT NULL)
    ) r111 LEFT OUTER JOIN (q4) r222
      ON (r111.someone = r222.someone
          AND (r111.log = r222.log OR r111.log IS NULL))
  ) r
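To check the translated query end to end, the Python sketch below fills in q1 to q4 with BGPtoSQL-style translations of the four patterns and runs the whole statement in SQLite. The topic, blog, and weblog triples are hypothetical data chosen to exercise both special join conditions. Without a weblog, the nested topic pattern is skipped (IS NOT NULL) even though the homepage has a topic, and the parallel blog pattern supplies ?log through COALESCE. With a weblog, the topic is returned and the blog value is rejected because it differs from the already bound ?log.

```python
import sqlite3

# BGPtoSQL-style translations of the four basic graph patterns.
Q1 = ("SELECT t1.subject AS someone, t2.object AS url FROM Triples t1, Triples t2 "
      "WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko' "
      "AND t2.predicate = 'foaf:homepage' AND t1.subject = t2.subject")
Q2 = ("SELECT t1.subject AS someone, t1.object AS log FROM Triples t1 "
      "WHERE t1.predicate = 'foaf:weblog'")
Q3 = ("SELECT t1.subject AS url, t1.object AS topic FROM Triples t1 "
      "WHERE t1.predicate = 'foaf:topic'")
Q4 = ("SELECT t1.subject AS someone, t1.object AS log FROM Triples t1 "
      "WHERE t1.predicate = 'http://www.example.org/blog'")

# The nested query from the slide, with q1..q4 substituted.
QUERY = f"""
SELECT r.url AS url, r.log AS log, r.topic AS topic
FROM (
  SELECT r111.someone AS someone, r111.url AS url,
         COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
  FROM (
    SELECT r11.someone AS someone, r11.url AS url,
           r11.log AS log, r22.topic AS topic
    FROM (
      SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
      FROM ({Q1}) r1 LEFT OUTER JOIN ({Q2}) r2 ON (r1.someone = r2.someone)
    ) r11 LEFT OUTER JOIN ({Q3}) r22
      ON (r11.url = r22.url AND r11.log IS NOT NULL)
  ) r111 LEFT OUTER JOIN ({Q4}) r222
    ON (r111.someone = r222.someone AND (r111.log = r222.log OR r111.log IS NULL))
) r
"""

ID = "http://www.cs.wayne.edu/~artem/ID"
URL = "http://www.cs.wayne.edu/~artem"
BASE = [
    (ID, "foaf:name", "Artem Chebotko"),
    (ID, "foaf:homepage", URL),
    (URL, "foaf:topic", "Semantic Web"),                                   # hypothetical
    (ID, "http://www.example.org/blog", "http://blog.example.org/artem"),  # hypothetical
]
WEBLOG = (ID, "foaf:weblog", "http://weblog.example.org/artem")           # hypothetical


def run(rows):
    """Load rows into a fresh in-memory Triples table and evaluate QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


print(run(BASE))             # no weblog: topic skipped, ?log taken from the blog pattern
print(run(BASE + [WEBLOG]))  # weblog present: topic returned, blog value rejected
```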
33
Experimental Study
- Dataset: WordNet, 700,000+ triples.
- The translation algorithms are efficient and scalable. For example, SPARQLtoSQL translated queries with fewer than 50 OPTIONAL clauses, each containing one triple pattern, in less than 0.001 s, regardless of the layout of the clause tree.
- The evaluation of most sample queries in Oracle was unsatisfactory (on the order of seconds), mainly because of the simple relational schema.
- This does not imply that the algorithms are impractical: SPARQLtoSQL does not depend directly on a particular database schema, as long as a BGPtoSQL stub is provided for the database, which we believe is a reasonable expectation of existing RDF storage systems.
34
Experimental Study
- The evaluation of the sample queries in the in-memory relational database gave much better results.
- In these experiments, we were able to try different implementations of the left outer join, based on nested-loops, sort-merge, and simple hash methods.
35
Relational Nested Optional Join
36
New Example
37
New Example
Retrieve:
(1) every graduate student in the RDF graph;
(2) the student's advisor, if this information is available;
(3) the student's coadvisor, if this information is available and the student's advisor has been successfully retrieved in step (2).
In other words, the query returns students and as many advisors as possible; there is no point in returning a coadvisor if a student does not even have an advisor.
38
Motivation: Computation Waste with LOJ
39
Nested Optional Join
- A novel relational operator for translating nested optional patterns
- An alternative to the left outer join
- Joins twin relations (a base relation plus an optional relation)
  - Base relation: tuples that can potentially satisfy the join condition of a nested optional join.
  - Optional relation: tuples that are guaranteed to fail the join condition of a nested optional join.
40
SPARQL-to-SQL Translation with NOJ
41
Nested Optional Join
NOJ vs. LOJ
- The NOJ processes tuples that are guaranteed to be NULL-padded very efficiently, in linear time.
- The NOJ does not require the NOT NULL check to return correct results.
NOJ algorithms
- Nested-loops NOJ algorithm (NL-NOJ)
- Sort-merge NOJ algorithm (SM-NOJ)
- Simple hash NOJ algorithm (SH-NOJ)
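The Python function below is a simplified, hash-based illustration of the idea, in the spirit of SH-NOJ but not the authors' algorithm. It assumes the twin relation arrives as two lists of tuples, that the right relation arrives as (join key, remaining values) pairs, and that the output is again a twin relation: joined tuples form the new base part and NULL-padded tuples the new optional part. Optional tuples are padded in one linear pass without probing the hash table, so no NOT NULL check is needed. The student, advisor, and coadvisor values follow the New Example and are hypothetical; note that s3 gets no coadvisor even though one exists, because s3 has no advisor.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[object, ...]


def hash_nested_optional_join(
    base: List[Row],
    optional: List[Row],
    right: Sequence[Tuple[Hashable, Row]],
    key_index: int,
    right_width: int,
) -> Tuple[List[Row], List[Row]]:
    """Join a twin relation (base, optional) with `right`, whose items are
    (join key, remaining values). Returns a new twin relation."""
    pad: Row = (None,) * right_width

    # Optional tuples are guaranteed to fail the join: pad them in one linear
    # pass, without probing and without any IS NOT NULL test.
    new_optional: List[Row] = [row + pad for row in optional]

    # Build phase: hash the right relation on its join key.
    table: Dict[Hashable, List[Row]] = defaultdict(list)
    for key, values in right:
        table[key].append(values)

    # Probe phase: only base tuples can find a match.
    new_base: List[Row] = []
    for row in base:
        matches = table.get(row[key_index], [])
        if matches:
            new_base.extend(row + values for values in matches)
        else:
            new_optional.append(row + pad)
    return new_base, new_optional


# Hypothetical data: (student, advisor) tuples after the first OPTIONAL,
# split into a base part (advisor found) and an optional part (no advisor).
base = [("s1", "a1"), ("s2", "a2")]
optional = [("s3", None)]
coadvisors = [("s1", ("c1",)), ("s3", ("c3",))]

new_base, new_optional = hash_nested_optional_join(
    base, optional, coadvisors, key_index=0, right_width=1)
print(new_base)      # [('s1', 'a1', 'c1')]
print(new_optional)  # [('s3', None, None), ('s2', 'a2', None)]
```

With a left outer join, s3 would instead be probed against the coadvisor relation and then filtered out by the NOT NULL condition, which is the computation the NOJ avoids.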
42
Nested Optional Join
Queries whose joins have low selectivity factors (< 0.0002)
43
Nested Optional Join
Choice of NOJ algorithm for in-memory evaluation, by join selectivity factor (JSF):
- JSF ≤ 0.005: SH-NOJ
- 0.005 < JSF < 0.8: SM-NOJ
- JSF ≥ 0.8: NL-NOJ
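This rule of thumb can be encoded directly. The Python sketch assumes the usual definition of the join selectivity factor, |R ⋈ S| / (|R| × |S|), which the slides do not spell out; both function names are hypothetical.

```python
def join_selectivity_factor(result_size: int, left_size: int, right_size: int) -> float:
    """|R join S| / (|R| * |S|); both input relations must be non-empty."""
    if left_size <= 0 or right_size <= 0:
        raise ValueError("input relations must be non-empty")
    return result_size / (left_size * right_size)


def choose_noj_algorithm(jsf: float) -> str:
    """In-memory thresholds from the slide above."""
    if not 0.0 <= jsf <= 1.0:
        raise ValueError("a join selectivity factor lies in [0, 1]")
    if jsf <= 0.005:
        return "SH-NOJ"
    if jsf >= 0.8:
        return "NL-NOJ"
    return "SM-NOJ"


jsf = join_selectivity_factor(50, 100, 1000)  # 50 / 100,000 = 0.0005
print(jsf, choose_noj_algorithm(jsf))
```

In a real system the result size would be an optimizer estimate rather than a known count.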
44
Possible Future Work
- Extending our work to support other SPARQL constructs, such as UNION and FILTER
- Extending our SPARQL-to-SQL translation to support the nested optional join
- Investigating possible optimizations of parallel optional graph patterns
- Defining a relational algebra for SPARQL that supports nested and parallel optional joins
- And more
46
Acknowledgements
Dr. Shiyong Lu, Dr. Farshad Fotouhi, Dr. Hasan Jamil, Dr. Mustafa Atay, and Oracle DBA Shwetal Joshi
Questions?
Thank you!