SPARQLing Constraints for RDF Michael Schmidt, 20.03.2008 joint work with Prof. Georg Lausen, Michael Meier.

Presentation transcript:

SPARQLing Constraints for RDF Michael Schmidt, joint work with Prof. Georg Lausen, Michael Meier

About… Michael Schmidt
- Studies of Applied Computer Science in Saarbrücken
- 2006: Started my PhD in Saarbrücken with Prof. Christoph Koch (focus on XML, XQuery, Streams)
- Since 2007: at Freiburg University with Prof. Georg Lausen (focus on SPARQL, RDF)

Table of Contents
SPARQLing Constraints for RDF
- Constraints for RDF
  - Types of constraints
  - Encoding of constraints in RDF
  - Satisfiability
- SPARQL in the context of constraints
  - Extracting constraints with SPARQL
  - Checking constraints with SPARQL
  - Exploiting constraints: Semantic Query Optimization
SP²Bench: A SPARQL Performance Benchmark

PART I SPARQLing Constraints for RDF

SPARQLing Constraints for RDF
- RDF Data Format: machine-readable information, established in the Semantic Web
- SPARQL Query Language: W3C Recommendation since January
- Constraints: Primary and Foreign Keys, Cardinality Constraints, …
[Figure: the three components connected by arrows labeled "based on"]

Why Constraints?
- Restricting the state space of the database
- Maintenance of data consistency (e.g. when data is updated)
- Semantic Query Optimization
- Better understanding of the data
In our scenario: Translation of Relational Schemata to RDF without loss of information

Our Contribution
Extension of RDF by constraints
- Key constraints, cardinality constraints, …
- Seamless integration into the RDF Framework
Study of the role of SPARQL in this context
- Checking constraints with SPARQL
- Specification of user-defined constraints
- Optimization of SPARQL queries under constraints (Semantic Query Optimization)

The RDF Data Format
Three Types of Elements
- URIs (U): represent physical or logical resources
- Blank nodes (B): resources without fixed URI
- Literals (L): represent values
RDF Triples: (subject, predicate, object)
- subject ∈ U ∪ B
- predicate ∈ U
- object ∈ U ∪ B ∪ L
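The position rules for triples can be illustrated with a minimal Python sketch. The class names `URI`, `BlankNode`, `Literal` and the function `is_valid_triple` are hypothetical names chosen for this illustration; they are not part of the talk or of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(subject, predicate, obj) -> bool:
    """Check: subject in U ∪ B, predicate in U, object in U ∪ B ∪ L."""
    return (
        isinstance(subject, (URI, BlankNode))
        and isinstance(predicate, URI)
        and isinstance(obj, (URI, BlankNode, Literal))
    )


person1 = URI("Person1")
name = URI("name")
print(is_valid_triple(person1, name, Literal("Joe")))  # True
print(is_valid_triple(Literal("Joe"), name, person1))  # False: literal as subject
```

Modeling the three element types as separate classes keeps the check a plain type test per position.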

Example RDF Triple
  Subject    Predicate   Object
  Person1    name        "Joe"
  (URI)                  (Literal)
Graph Representation: Person1 --name--> "Joe"

RDF Databases
- RDF Databases are Collections of Triples
- Currently no support for specification of primary/foreign key constraints
[Figure: example RDF graph with nodes Person1, Person2, Student, Person and literals "Joe", "Pete", "1234", "2345"; edges labeled name, knows, ssn, rdf:type, rdfs:subClassOf]

Mapping Relational Data to RDF

Teachers
  name   faculty
  Joe    CS
  Fred   CS

Students
  matric   name
  11111    John
  22222    Ed

Courses
  taught_by   name
  Joe         DB
  Fred        Web

Participants
  c_id   s_id
  Fred   11111
  Fred   22222

+ NOT NULL constraint

A Naive Translation Approach
[Figure: RDF graph with one node per tuple (t1, t2, s1, s2, c1, c2, p1, p2), each linked via rdf:type to its table (Teachers, Students, Courses, Participants); attribute values appear as literals ("Joe", "Fred", "CS", "11111", "22222", "John", "Ed", "DB", "Web") on edges labeled name, faculty, matric, taught_by, c_id, s_id]
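A rough sketch of such a naive translation, assuming each tuple becomes one node typed with its table name and every attribute value becomes a literal. The function `naive_translate`, the tuple-of-strings triple representation, and the node-naming scheme (`t1`, `t2`, …) are assumptions made for this illustration.

```python
def naive_translate(table_name, prefix, rows):
    """Translate relational rows (a list of dicts) into a set of (s, p, o) triples."""
    triples = set()
    for i, row in enumerate(rows, start=1):
        node = f"{prefix}{i}"  # one node per tuple, e.g. t1, t2
        triples.add((node, "rdf:type", table_name))
        for column, value in row.items():
            if value is not None:
                # every attribute value is stored as a quoted literal
                triples.add((node, column, f'"{value}"'))
    return triples


teachers = [
    {"name": "Joe", "faculty": "CS"},
    {"name": "Fred", "faculty": "CS"},
]
graph = naive_translate("Teachers", "t", teachers)
for triple in sorted(graph):
    print(triple)
```

Because foreign-key values are copied as plain literals, nothing in the resulting graph records that a course's taught_by value refers to a teacher; that information has to be added separately as a constraint.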

Improving the Translation
[Figure: revised RDF graph over Teachers, Students, Courses, Participants with nodes t1, t2, s1, s2, c1, c2, p1, p2; labels shown: Joe, Fred, "CS", "John", "Ed", "DB", "Web"; edges labeled name, matric, faculty, taught_by, s_id, c_id, rdf:type]

Encoding Primary Key Constraints
- Encoding of constraints in the schema layer
- New namespace "rdfc"
- RDF Bags
[Figure: Teachers graph (t1, t2; name values Joe, Fred; faculty "CS") extended by a key node T_Key; labels shown: rdfc:Key, rdf:_1, name, rdf:Bag]

[Figure: foreign key encoding. Courses graph (c1, c2; name values "DB", "Web"; taught_by edges) and Teachers graph (t1, t2; Joe, Fred; faculty "CS") with key node T_Key (rdfc:Key, rdf:_1 name, rdf:Bag) and foreign key node C_FKey (rdfc:FKey, rdf:_1 taught_by, rdf:Bag, rdfc:ref)]

Other Types of Constraints
Let C, C₁, C₂ be classes and Qᵢ, Rᵢ properties
- Primary Keys: Key(C, [Q₁,…,Qₙ])
- Foreign Keys: FKey(C₁, [Q₁,…,Qₙ], C₂, [R₁,…,Rₙ])
- Cardinality Constraints: Min(C, n, R), Max(C, n, R) for n ∈ ℕ
- Functionality/Totality Constraints: Func(C, Q), Total(C, Q)
- Singleton Constraints: Single(C)
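The cardinality-style constraints can be sketched as checks over a set of triples. The sketch assumes the following reading, which the slides do not spell out: Min(C, n, R) and Max(C, n, R) bound the number of distinct R-values of every instance of C, Func means "at most one value", and Total means "at least one value". All function names are hypothetical.

```python
def instances(graph, cls):
    """All nodes x with (x, rdf:type, cls) in the graph."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def value_count(graph, node, prop):
    """Number of distinct values of prop for node."""
    return len({o for (s, p, o) in graph if s == node and p == prop})


def holds_min(graph, cls, n, prop):
    return all(value_count(graph, x, prop) >= n for x in instances(graph, cls))


def holds_max(graph, cls, n, prop):
    return all(value_count(graph, x, prop) <= n for x in instances(graph, cls))


def holds_func(graph, cls, prop):
    # assumed reading: Func(C, Q) is Max(C, 1, Q)
    return holds_max(graph, cls, 1, prop)


def holds_total(graph, cls, prop):
    # assumed reading: Total(C, Q) is Min(C, 1, Q)
    return holds_min(graph, cls, 1, prop)


graph = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", '"Joe"'), ("t1", "faculty", '"CS"'),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", '"Fred"'),
}
print(holds_total(graph, "Teachers", "name"))     # True
print(holds_total(graph, "Teachers", "faculty"))  # False: t2 has no faculty
```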

RDFS Constraints
Let Cᵢ denote classes, Qᵢ denote properties
- Subclass Constraint: SubC(C₁, C₂)
- Subproperty Constraint: SubP(Q₁, Q₂)
- Property Domain/Range: PropD(Q, C), PropR(Q, C)
These restrict the state space of the database; they are not "axioms" used for inferencing

Satisfiability
Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?
In general: undecidable
Always satisfiable:
- Primary keys + Foreign Keys
- Singleton
- Max-Cardinality
- Subclass + Subproperty
- Property Domain + Property Range

Satisfiability
Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?
In general: undecidable
Undecidable:
- Primary keys + Foreign Keys
- Singleton
- Max-Cardinality
- Subclass + Subproperty
- Property Domain + Property Range
- Min-Cardinality

Satisfiability
Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?
In general: undecidable
Decidable in ExpTime:
- Unary primary keys
- Unary foreign keys
- Min-Cardinality + Max-Cardinality
- Subclass + Subproperty
- Property Domain + Property Range

The SPARQL Query Language: Operator AND (".")
  SELECT ?name ?faculty
  WHERE {
    ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
  }
Data: [Figure: Teachers graph with nodes t1, t2; name values Joe, Fred; faculty value "CS"]
Result:
  ?name   ?faculty
  Joe     "CS"
  Fred    "CS"

The SPARQL Query Language: Operator FILTER
  SELECT ?name ?faculty
  WHERE {
    ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
    FILTER (?name="Joe")
  }
Data: [Figure: Teachers graph with nodes t1, t2; name values Joe, Fred; faculty value "CS"]
Result:
  ?name   ?faculty
  Joe     "CS"

The SPARQL Query Language: Operator OPTIONAL
  SELECT ?name ?faculty ?title
  WHERE {
    ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
    OPTIONAL { ?teacher title ?title. }
  }
Data: [Figure: Teachers graph with nodes t1, t2; name values Joe, Fred; faculty value "CS"; one title edge with value "Professor"]
Result:
  ?name   ?faculty   ?title
  Joe     "CS"
  Fred    "CS"       "Professor"

Extracting Primary Key Constraints
  SELECT ?keyname ?class ?keyatt
  WHERE {
    ?class rdfc:Key ?keyname.
    ?keyname rdf:type rdfc:Key.
    ?keyname ?bagrel ?keyatt.
    FILTER (?bagrel!=rdf:type)
  }
Data: [Figure: Teachers with key node T_Key; labels shown: rdfc:Key, rdf:_1, name, rdf:Bag]
Result:
  ?keyname   ?class     ?keyatt
  T_Key      Teachers   name
  …          …          …
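The same extraction can be mimicked in plain Python over a set of schema-layer triples. This is only a sketch: the three triples in `schema` are an assumption derived from the query's own patterns and result row, and `extract_keys` is a hypothetical helper, not part of the talk.

```python
def extract_keys(schema):
    """Return (keyname, class, key attribute) rows, mirroring the SELECT query."""
    rows = []
    for (cls, pred, keyname) in schema:
        if pred != "rdfc:Key":
            continue  # pattern: ?class rdfc:Key ?keyname
        if (keyname, "rdf:type", "rdfc:Key") not in schema:
            continue  # pattern: ?keyname rdf:type rdfc:Key
        for (s, bagrel, keyatt) in schema:
            # pattern: ?keyname ?bagrel ?keyatt with FILTER (?bagrel != rdf:type)
            if s == keyname and bagrel != "rdf:type":
                rows.append((keyname, cls, keyatt))
    return sorted(rows)


schema = {
    ("Teachers", "rdfc:Key", "T_Key"),
    ("T_Key", "rdf:type", "rdfc:Key"),
    ("T_Key", "rdf:_1", "name"),
}
print(extract_keys(schema))  # [('T_Key', 'Teachers', 'name')]
```

A composite key would simply contribute one row per bag member (rdf:_1, rdf:_2, …).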

Extracting Foreign Key Constraints
  SELECT ?keyname ?class ?keyatt ?ref
  WHERE {
    ?class rdfc:FKey ?keyname.
    ?keyname rdf:type rdfc:FKey.
    ?keyname ?bagrel ?keyatt.
    ?keyname rdfc:ref ?ref.
    FILTER (?bagrel!=rdf:type && ?bagrel!=rdfc:ref)
  }
  ORDER BY ?keyname
Data: [Figure: Courses with foreign key node C_FKey (rdfc:FKey, rdf:_1 taught_by, rdf:Bag, rdfc:ref) referring to the Teachers key node T_Key (rdfc:Key, rdf:_1 name, rdf:Bag)]
Result:
  ?keyname   ?class    ?keyatt     ?ref
  C_FKey     Courses   taught_by   T_Key
  …          …         …           …

Checking Constraints with SPARQL
- Use the SPARQL "ASK" query form (returns "yes" exactly if the query has a result, "no" otherwise)
- Constraint checks are possible for many natural constraints
  - Primary Keys + Foreign Keys
  - Cardinality Constraints
  - …
Definition: A SPARQL query checks a constraint C if it returns "yes" for each graph that violates C, and "no" otherwise.

Checking Constraints with SPARQL
Checking primary key constraints: Key(C,[p1,...,pn])
  ASK {
    ?x rdf:type C.
    ?y rdf:type C.
    ?x p1 ?p1; [...]; pn ?pn.
    ?y p1 ?p1; [...]; pn ?pn.
    FILTER (?x!=?y)
  }
Returns "yes" exactly if the constraint is violated.
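The logic of this ASK query can be reproduced in a short Python sketch over a set of (s, p, o) triples: look for two distinct instances of the class that share a value on every key property. The function `key_violated` and the string-based triple representation are assumptions made for illustration only.

```python
from itertools import combinations


def key_violated(graph, cls, key_props):
    """True exactly if two distinct instances of cls agree on all key properties."""
    members = sorted(s for (s, p, o) in graph if p == "rdf:type" and o == cls)

    def values(node, prop):
        return {o for (s, p, o) in graph if s == node and p == prop}

    for x, y in combinations(members, 2):  # pairs with x != y
        # each key property must have at least one value shared by x and y
        if all(values(x, q) & values(y, q) for q in key_props):
            return True
    return False


teachers = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", '"Joe"'),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", '"Fred"'),
}
print(key_violated(teachers, "Teachers", ["name"]))  # False: the key holds
```

As in the query, an instance that lacks a value for some key property never takes part in a violation.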

Checking Constraints with SPARQL
Checking primary key constraints (example)
  ASK {
    ?x rdf:type Teachers.
    ?y rdf:type Teachers.
    ?x name ?name.
    ?y name ?name
    FILTER (?x!=?y)
  }
Data: [Figure: Teachers graph with nodes t1, t2; name values Joe, Fred; faculty value "CS"]
Returns "no" (i.e., the constraint holds)

Checking Constraints with SPARQL
Checking foreign key constraints: FKey(C,[p1,...,pn],D,[q1,...,qn])
  ASK {
    ?x rdf:type C; p1 ?p1; [...]; pn ?pn.
    OPTIONAL {
      ?y rdf:type D; q1 ?p1; [...]; qn ?pn.
    }
    FILTER (!bound(?y))
  }
Returns "yes" exactly if the constraint is violated.
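The OPTIONAL / !bound pattern amounts to searching for a referencing instance without a matching referenced instance. A minimal Python sketch of that search over a set of (s, p, o) triples follows; `fkey_violated` and the sample data (values stored as quoted literals on both sides) are assumptions for illustration.

```python
from itertools import product


def fkey_violated(graph, cls, props, ref_cls, ref_props):
    """True exactly if some instance of cls has key values with no match in ref_cls."""

    def members(c):
        return {s for (s, p, o) in graph if p == "rdf:type" and o == c}

    def values(node, prop):
        return {o for (s, p, o) in graph if s == node and p == prop}

    targets = members(ref_cls)
    for x in members(cls):
        # every combination of x's values for p1..pn is one candidate solution
        for combo in product(*(values(x, p) for p in props)):
            matched = any(
                all(v in values(y, q) for v, q in zip(combo, ref_props))
                for y in targets
            )
            if not matched:  # corresponds to FILTER (!bound(?y))
                return True
    return False


graph = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", '"Joe"'),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", '"Fred"'),
    ("c1", "rdf:type", "Courses"), ("c1", "taught_by", '"Joe"'),
    ("c2", "rdf:type", "Courses"), ("c2", "taught_by", '"Fred"'),
}
print(fkey_violated(graph, "Courses", ["taught_by"], "Teachers", ["name"]))  # False
```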

Semantic Query Optimization
- Idea: use constraint knowledge to find a more efficient query execution plan
- Has been studied in the context of relational and datalog databases…
- … and might now be applicable in the context of RDF and SPARQL

Semantic Query Optimization
  SELECT ?teachername ?coursename ?studentname
  WHERE {
    ?course rdf:type Courses;
            taught_by ?teachername;
            name ?coursename.
    ?participant rdf:type Participants;
                 c_id ?teachername;
                 s_id ?studentmatric.
    ?teacher rdf:type Teachers;
             name ?teachername.
    OPTIONAL {
      ?student rdf:type Students;
               matric ?studentmatric;
               name ?studentname.
    }
  }

A Solution Candidate Subgraph
[Figure: the translated RDF graph (Teachers, Students, Courses, Participants with nodes t1, t2, s1, s2, c1, c2, p1, p2 and edges name, matric, faculty, taught_by, s_id, c_id) with one candidate solution subgraph highlighted]

Semantic Query Optimization
  SELECT ?teachername ?coursename ?studentname
  WHERE {
    ?course rdf:type Courses;
            taught_by ?teachername;
            name ?coursename.
    ?participant rdf:type Participants;
                 c_id ?teachername;
                 s_id ?studentmatric.
    ?teacher rdf:type Teachers;
             name ?teachername.
    OPTIONAL {
      ?student rdf:type Students;
               matric ?studentmatric;
               name ?studentname.
    }
  }
Constraints:
- Key(Students, [matric])
- FKey(Participants, [s_id], Students, [matric])
- Total(Students, name)
Under these constraints every participant has a matching student with a name, so the OPTIONAL block always finds a match.

Semantic Query Optimization
  SELECT ?teachername ?coursename ?studentname
  WHERE {
    ?course rdf:type Courses;
            taught_by ?teachername;
            name ?coursename.
    ?participant rdf:type Participants;
                 c_id ?teachername;
                 s_id ?studentmatric.
    ?teacher rdf:type Teachers;
             name ?teachername.
    ?student rdf:type Students;
             matric ?studentmatric;
             name ?studentname.
  }
Constraints:
- Key(Teachers, [name])
- FKey(Courses, [taught_by], Teachers, [name])
Under these constraints every taught_by value identifies exactly one teacher, so the Teachers pattern is redundant.

Semantic Query Optimization
  SELECT ?teachername ?coursename ?studentname
  WHERE {
    ?course rdf:type Courses;
            taught_by ?teachername;
            name ?coursename.
    ?participant rdf:type Participants;
                 c_id ?teachername;
                 s_id ?studentmatric.
    ?student rdf:type Students;
             matric ?studentmatric;
             name ?studentname.
  }
Other optimizations possible:
- Rewriting of filter expressions
- Elimination of redundant rdf:type specifications
- …

Future Work
- Study of other types of constraints and the interaction between constraints
- Development of a schematic approach to Semantic Query Optimization
  - Mapping to SQL/Datalog?
  - SPARQL-specific semantic optimizations?
- Efficient constraint checking algorithms

PART II SP²B – A SPARQL Performance Benchmark

PART II: SP²Bench
To date, no dedicated benchmark for SPARQL has been proposed
- LUBM: focus on OWL and reasoning
- Loose collection of benchmark queries for LUBM
SP²B fills this gap
- Set in the DBLP scenario
- Data generator for creating arbitrarily large datasets + 16 benchmark queries
Currently submitted for publication, will be made available online soon

The SP²Bench Data Generator
Creates bibliography documents similar to DBLP
Mirrors key characteristics found in the original DBLP data
- Structure of entities (Articles, Journals, Books, …)
- Relations between authors
- Quantity of entities (development over time)
- Citation system
Combines the benefits of a real-world scenario with the possibility to generate arbitrarily large documents.

The DBLP RDF Schema
[Figure: schema diagram]

The SP²Bench Queries
Operate on top of the characteristics that are mirrored by the data generator
Designed to test…
- … typical SPARQL operators and combinations
- … SPARQL solution modifiers
- … existing (but also obvious future) optimizations
- … RDF data access patterns
- … the impact of indices on data
- … and many other characteristics such as result size, different graph patterns, etc.

Benchmark Queries: Q1
  SELECT ?yr
  WHERE {
    ?proc rdf:type bench:Journal.
    ?proc dc:title "Journal 1 (1940)"^^xsd:string.
    ?proc dcterms:issued ?yr.
  }
- Simple
- Constant result size (exactly 1 result)
- Might be answered very fast with an index

Benchmark Queries: Q5
Q5a:
  SELECT DISTINCT ?person ?name
  WHERE {
    ?article rdf:type bench:Article.
    ?article dc:creator ?person.
    ?inproc rdf:type bench:Inproceedings.
    ?inproc dc:creator ?person2.
    ?person foaf:name ?name.
    ?person2 foaf:name ?name2.
    FILTER(?name=?name2).
  }
Q5b:
  SELECT DISTINCT ?person ?name
  WHERE {
    ?article rdf:type bench:Article.
    ?article dc:creator ?person.
    ?inproc rdf:type bench:Inproceedings.
    ?inproc dc:creator ?person.
    ?person foaf:name ?name.
  }
- Equivalent in our scenario
- Tests implicit vs. explicit joins
- We found that Q5a is much more challenging for current engines

Benchmark Queries: Q7
  SELECT DISTINCT ?title
  WHERE {
    ?class rdfs:subClassOf foaf:Document.
    ?doc rdf:type ?class.
    ?doc dc:title ?title.
    ?bag2 ?member2 ?doc.
    ?doc2 dcterms:references ?bag2.
    OPTIONAL {
      ?class3 rdfs:subClassOf foaf:Document.
      ?doc3 rdf:type ?class3.
      ?doc3 dcterms:references ?bag3.
      ?bag3 ?member3 ?doc.
      OPTIONAL {
        ?class4 rdfs:subClassOf foaf:Document.
        ?doc4 rdf:type ?class4.
        ?doc4 dcterms:references ?bag4.
        ?bag4 ?member4 ?doc3.
      }
      FILTER (!bound(?doc4)).
    }
    FILTER (!bound(?doc3)).
  }
- Double Closed-World Negation
- Returns all publications that are cited at least once, but only by publications that are themselves cited

Benchmark Results
We tested several SPARQL engines
- ARQ
- Sesame
- Virtuoso
- …
Results demonstrate that …
- … there are differences between engines
- … there is still room for improvement in current implementations
- … there is poor support for several SPARQL specifics

Thank you for your attention!

PART III Additional Resources

The SPARQL Query Language: Operator UNION
  SELECT ?name ?faculty
  WHERE {
    {
      ?teacher rdf:type Teachers.
      ?teacher name ?name.
      ?teacher faculty ?faculty.
      FILTER (?name="Joe").
    }
    UNION
    {
      ?teacher rdf:type Teachers.
      ?teacher name ?name.
      ?teacher faculty ?faculty.
      FILTER (?name="Fred").
    }
  }
Data: [Figure: Teachers graph with nodes t1, t2; name values Joe, Fred; faculty value "CS"]
Result:
  ?name   ?faculty
  Joe     "CS"
  Fred    "CS"