
Slide 1: On Explicit Provenance Management in RDF/S Graphs
TAPP-09, 23/02/2009, Giorgos Flouris
Panagiotis Pediaditis, Giorgos Flouris, Irini Fundulaki, Vassilis Christophides
{pped, fgeo, fundul, christop}@ics.forth.gr
Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece

Slide 2: Provenance Management in RDF/S
- Provenance management problem
  - Mostly addressed in the database context
  - We are dealing with why provenance in RDF/S graphs
    - Why provenance: identifying the source data that had some influence on the existence of the target data
- Three main characteristics (peculiarities of RDF/S)
  - Triple-based representation
    - Use quadruples to talk about triples' provenance
  - Inference
    - Assign provenance information to implicit data
  - Coherence semantics (in updates)
    - Implicit data is a first-class citizen and should be retained during change, along with its provenance information

Slide 3: Characteristic #1: Triple-based Representation

Slide 4: RDF Graphs
RDF graph = set of RDF triples
- Define classes
  [Paper rdf:type rdfs:Class]
  [PaperTAPP rdf:type rdfs:Class]
  [Person rdf:type rdfs:Class]
  [Author rdf:type rdfs:Class]
- Define properties
  [writes rdf:type rdf:Property]
  [writes rdfs:domain Author]
  [writes rdfs:range Paper]
- Instantiate (and define) individuals
  [Paper10 rdf:type PaperTAPP]
  [Giorgos rdf:type Author]
  [Giorgos writes Paper10]
- Define hierarchies
  [PaperTAPP rdfs:subClassOf Paper]
  [Author rdfs:subClassOf Person]
- And other stuff…
[Figure: the example RDF graph over Paper10, PaperTAPP, Paper, Giorgos, Author, Person and the property writes. Legend: instance = rdf:type, subclass = rdfs:subClassOf.]

Slide 5: Provenance in RDF Graphs
PUB:[Paper rdf:type rdfs:Class]
TAPP:[PaperTAPP rdf:type rdfs:Class]
PUB:[Person rdf:type rdfs:Class]
PUB:[Author rdf:type rdfs:Class]
PUB:[writes rdf:type rdf:Property]
PUB:[writes rdfs:domain Author]
PUB:[writes rdfs:range Paper]
TAPP:[Paper10 rdf:type PaperTAPP]
TAPP:[Giorgos rdf:type Author]
TAPP:[Giorgos writes Paper10]
TAPP:[PaperTAPP rdfs:subClassOf Paper]
PUB:[Author rdfs:subClassOf Person]
[Figure: the example RDF graph from slide 4, split into the Publications Graph (PUB) and the TAPP Graph (TAPP). Legend: instance = rdf:type, subclass = rdfs:subClassOf.]

Slide 6: Named Graphs and Provenance
- Create two named graphs and assign an ID (URI) to each
  - Publications graph (URI: PUB)
  - TAPP graph (URI: TAPP)
- Each named graph corresponds to a different source
- Need some method to associate named graphs with triples
  - Triples become quadruples
  - Fourth element is the URI of the named graph (origin)
[Figure: the example RDF graph from slide 4]

Slide 7: Quadruples for Provenance
[Paper rdf:type rdfs:Class PUB]
[PaperTAPP rdf:type rdfs:Class TAPP]
[Person rdf:type rdfs:Class PUB]
[Author rdf:type rdfs:Class PUB]
[writes rdf:type rdf:Property PUB]
[writes rdfs:domain Author PUB]
[writes rdfs:range Paper PUB]
[Paper10 rdf:type PaperTAPP TAPP]
[Giorgos rdf:type Author TAPP]
[Giorgos writes Paper10 TAPP]
[PaperTAPP rdfs:subClassOf Paper TAPP]
[Author rdfs:subClassOf Person PUB]
- All quadruples of the form [s p o PUB] originate from named graph PUB (Publications graph)
- All quadruples of the form [s p o TAPP] originate from named graph TAPP (TAPP graph)
[Figure: the example RDF graph from slide 4]
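
The quadruple representation on this slide can be sketched in plain Python as 4-tuples. This is only an illustration, not the authors' implementation: the `Quad` type and the helpers `triples_in` and `origins_of` are hypothetical names, and only a subset of the slide's quadruples is included.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus the URI of the named graph it originates from."""
    s: str
    p: str
    o: str
    g: str


# A few of the quadruples listed on the slide.
QUADS = {
    Quad("Paper", "rdf:type", "rdfs:Class", "PUB"),
    Quad("PaperTAPP", "rdf:type", "rdfs:Class", "TAPP"),
    Quad("Giorgos", "rdf:type", "Author", "TAPP"),
    Quad("Giorgos", "writes", "Paper10", "TAPP"),
    Quad("PaperTAPP", "rdfs:subClassOf", "Paper", "TAPP"),
    Quad("Author", "rdfs:subClassOf", "Person", "PUB"),
}


def triples_in(quads, graph):
    """Return the (s, p, o) triples whose origin is the given named graph."""
    return {(q.s, q.p, q.o) for q in quads if q.g == graph}


def origins_of(quads, triple):
    """Return every named graph that asserts the given triple."""
    return {q.g for q in quads if (q.s, q.p, q.o) == triple}
```

Because the origin is just a fourth component, a triple asserted by several sources appears as several quadruples, and `origins_of` returns all of them.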

Slide 8: Properties of Named Graphs
- The named graph URI can be used to refer to the named graph
  - Can be used for assignment of metadata: [TAPP hasAuthor JamesCheney G]
- Granularity of provenance
  - A triple is the smallest bit of information
  - The granularity of provenance achieved by named graphs is at the triple level
  - Flexible
    - A named graph can contain 0, 1, or many triples
    - A triple can belong to 0, 1, or many named graphs
[Figure: the example RDF graph from slide 4]

Slide 9: Characteristic #2: Inference

Slide 10: RDF/S Graphs
- RDF Schema: add-on to RDF
- RDFS adds inference semantics
  - Transitivity of subclass/subproperty
  - Implicit instantiations
- Example
  - [Giorgos rdf:type Author]
  - [Author rdfs:subClassOf Person]
  - Inference: [Giorgos rdf:type Person]
- Inferred knowledge is implicit
[Figure: the example RDF graph from slide 4]
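
The two inference patterns named on this slide can be sketched as a small fixpoint computation. This is a minimal illustration that assumes only subclass transitivity and instance inheritance; it leaves out subproperties and the rest of the RDFS rules, and `rdfs_closure` is a hypothetical helper name.

```python
def rdfs_closure(triples):
    """Saturate a set of (s, p, o) triples under two rules:
    subclass transitivity and instance inheritance along rdfs:subClassOf."""
    closure = set(triples)
    while True:
        subclass = {(a, b) for (a, p, b) in closure if p == "rdfs:subClassOf"}
        new = set()
        # (A subClassOf B) and (B subClassOf C) imply (A subClassOf C).
        for (a, b) in subclass:
            for (c, d) in subclass:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
        # (x type A) and (A subClassOf B) imply (x type B).
        for (x, p, a) in closure:
            if p == "rdf:type":
                for (c, b) in subclass:
                    if c == a:
                        new.add((x, "rdf:type", b))
        new -= closure
        if not new:
            return closure
        closure |= new


EXPLICIT = {
    ("Giorgos", "rdf:type", "Author"),
    ("Author", "rdfs:subClassOf", "Person"),
}
IMPLICIT = rdfs_closure(EXPLICIT) - EXPLICIT
```

Here `IMPLICIT` holds exactly the slide's inferred triple, [Giorgos rdf:type Person].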

Slide 11: Provenance and Inference
- Quadruples:
  - [Giorgos rdf:type Author TAPP]
  - [Author rdfs:subClassOf Person PUB]
  - [Giorgos rdf:type Person ???]
- Needs:
  - Shared ownership
  - A more sophisticated, compound structure
  - Keeping the connection with the components
- Composition operator (PT = PUB ● TAPP)
  - [Giorgos rdf:type Person PT]
  - Ok, but see characteristic #3
[Figure: the example RDF graph from slide 4]
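
The "shared ownership" requirement can be sketched by giving each inferred quadruple the set of named graphs its premises came from. This is an illustrative sketch only: modeling the composed origin as a Python `frozenset` is an assumption, `infer_with_provenance` is a hypothetical helper, and the origins of the two premises follow the quadruple list on slide 7.

```python
def infer_with_provenance(quads):
    """One pass of instance inheritance over (s, p, o, g) quadruples.

    Each inferred quadruple carries the set of named graphs that
    contributed a premise, so the link to its components is kept.
    """
    inferred = set()
    for (x, p1, a, g1) in quads:
        if p1 != "rdf:type":
            continue
        for (c, p2, b, g2) in quads:
            if p2 == "rdfs:subClassOf" and c == a:
                inferred.add((x, "rdf:type", b, frozenset({g1, g2})))
    return inferred


PREMISES = {
    ("Giorgos", "rdf:type", "Author", "TAPP"),
    ("Author", "rdfs:subClassOf", "Person", "PUB"),
}
INFERRED = infer_with_provenance(PREMISES)
```

The single inferred quadruple has origin {PUB, TAPP}, which plays the role of PT on the slide.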

Slide 12: Characteristic #3: Coherence Semantics (in Updates)

Slide 13: Foundational Semantics
- Foundational viewpoint (pyramid):
  - Knowledge consists of the explicitly represented knowledge
  - Only explicit knowledge can be changed
  - Implicit knowledge is affected indirectly, through the changes in the explicit knowledge (so that the resulting "pyramid" is "stable")
  - Explicit knowledge is more important than implicit knowledge
[Figure: pyramid diagram with the labels Basic Knowledge, Supported Knowledge, Explicit Knowledge, Implicit Knowledge]

Slide 14: Coherence Semantics
- Coherence viewpoint (raft):
  - No discrimination between explicit and implicit knowledge
  - Both explicit and implicit knowledge can be changed
  - Changes should be made coherently in order for the resulting knowledge to make sense (so that the "raft" is "stable")
  - Explicit and implicit knowledge are of the same value
[Figure: raft diagram labeled Knowledge (includes both implicit and explicit knowledge)]

Slide 15: Deletes
- Under coherence semantics
  - Inferred knowledge needs to be made explicit (when in danger of being lost)
  - Explicit assignment of shared origin to triples
- Explicit shared origin assignment
  - Cannot use any composition operator
  - Must be a first-class construct (autonomous)
  - Retain the connection with its constituents
- A need, but also a useful feature
[Figure: the example RDF graph from slide 4]
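
A rough sketch of what "making inferred knowledge explicit" could look like when an explicit quadruple is deleted. It is not the authors' algorithm: it assumes a single inference rule (instance inheritance), models shared origins as `frozenset` values, handles only deletion of explicitly stored quadruples, and does not restore validity or remove redundancy afterwards. The names `closure` and `delete` are hypothetical.

```python
def closure(quads):
    """Saturate under one rule:
    (x type A, g1) and (A subClassOf B, g2) imply (x type B, g1 | g2)."""
    out = set(quads)
    while True:
        new = {
            (x, "rdf:type", b, g1 | g2)
            for (x, p1, a, g1) in out if p1 == "rdf:type"
            for (a2, p2, b, g2) in out if p2 == "rdfs:subClassOf" and a2 == a
        } - out
        if not new:
            return out
        out |= new


def delete(explicit, quad):
    """Delete an explicit quadruple, first making explicit every inferred
    quadruple that would otherwise disappear together with it."""
    if quad not in explicit:
        return set(explicit)  # nothing stored to delete in this sketch
    before = closure(explicit)
    remaining = set(explicit) - {quad}
    lost = before - closure(remaining) - {quad}
    return remaining | lost


TAPP, PUB = frozenset({"TAPP"}), frozenset({"PUB"})
EXPLICIT = {
    ("Giorgos", "rdf:type", "Author", TAPP),
    ("Author", "rdfs:subClassOf", "Person", PUB),
}
AFTER = delete(EXPLICIT, ("Giorgos", "rdf:type", "Author", TAPP))
```

After the delete, [Giorgos rdf:type Person] is stored explicitly with the shared origin {PUB, TAPP}. Since its premise is gone, that origin can no longer be recomputed by a composition operator, which is why the slide asks for a first-class construct.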

Slide 16: RDF/S Graphsets
- Graphsets are like named graphs
  - Have IDs (URIs)
  - Used in quadruples
    - Association of triples with graphsets: [Giorgos rdf:type Person PT]
    - Can be referred to (metadata): [PT rdf:type Confidential G]
- Encode origin or shared origin
  - [Giorgos rdf:type Person PT]
  - URI association (via Skolem function)
    - PT is the URI of {PUB, TAPP}
    - PUB is the URI of {PUB}
- A named graph is a graphset
  - PUB corresponds to {PUB}
[Figure: the example RDF graph from slide 4, with the graphset PT marked]
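
One way to picture the URI association is a registry that maps each set of named graphs to a stable URI and back. This is a sketch under assumptions: the `GraphsetRegistry` class and the `gs:` URI scheme are invented for illustration, and the slides do not say how the Skolem function actually builds its URIs.

```python
class GraphsetRegistry:
    """Maps each set of named-graph URIs to one stable graphset URI and back."""

    def __init__(self):
        self._uri_of = {}      # frozenset of named-graph URIs -> graphset URI
        self._members_of = {}  # graphset URI -> frozenset of named-graph URIs

    def uri_for(self, members):
        """Return the URI of the graphset with these members, creating it once."""
        members = frozenset(members)
        if members not in self._uri_of:
            if len(members) == 1:
                # A named graph is itself a graphset: PUB is the URI of {PUB}.
                uri = next(iter(members))
            else:
                # Hypothetical scheme: a deterministic URI built from the members.
                uri = "gs:" + "+".join(sorted(members))
            self._uri_of[members] = uri
            self._members_of[uri] = members
        return self._uri_of[members]

    def members_of(self, uri):
        """Recover the constituent named graphs of a graphset URI."""
        return self._members_of[uri]


registry = GraphsetRegistry()
PT = registry.uri_for({"PUB", "TAPP"})
```

The URI depends only on the members, so asking for {TAPP, PUB} again yields the same `PT`, and `members_of` keeps the connection to the constituents.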

Slide 17: Querying With RDF/S Graphsets
- Standard queries (original RQL)
  - Give me the Persons: [Giorgos]
- Provenance queries (extended RQL)
  - Give me the Persons per {PUB}: [ ]
  - Give me the Persons per {TAPP, PUB}: [Giorgos]
  - Give me the sources per which Author is a subclass of Person: [{PUB}]
  - Give me all the individual sources: [{TAPP}, {PUB}]
[Figure: the example RDF graph from slide 4]
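
The example queries on this slide can be mimicked over an in-memory list of quadruples. This is plain Python, not RQL syntax, and the helper names (`instances_of`, `sources_of`, `individual_sources`) are hypothetical; graphsets are modeled as `frozenset` values and the data is a small subset of the running example.

```python
TAPP = frozenset({"TAPP"})
PUB = frozenset({"PUB"})
PT = frozenset({"PUB", "TAPP"})

QUADS = [
    ("Giorgos", "rdf:type", "Author", TAPP),
    ("Author", "rdfs:subClassOf", "Person", PUB),
    ("Giorgos", "rdf:type", "Person", PT),
]


def instances_of(quads, cls, graphset=None):
    """Instances of cls, optionally restricted to one exact graphset."""
    wanted = None if graphset is None else frozenset(graphset)
    return sorted({
        s for (s, p, o, g) in quads
        if p == "rdf:type" and o == cls and (wanted is None or g == wanted)
    })


def sources_of(quads, triple):
    """Graphsets per which the given (s, p, o) triple holds."""
    return {g for (s, p, o, g) in quads if (s, p, o) == triple}


def individual_sources(quads):
    """Every named graph that occurs in some graphset, as a singleton."""
    return {frozenset({name}) for (_, _, _, g) in quads for name in g}
```

Restricting to {PUB} returns no Persons because the only Person quadruple belongs to the graphset {PUB, TAPP}, not to PUB alone.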

Slide 18: Validity and Redundancy Elimination
- Two invariants for RDF/S graphs
  - Valid (per some validity rules)
  - Redundancy-free (space considerations)
- The invariants allow optimized execution of queries
- These invariants are imposed during change
  - Improve query speed, but make updates more difficult
  - Trade-off between having query overhead or update overhead

Slide 19: Updating With RDF/S Graphsets
- Updates supported through an extended version of RUL
  - INSERT and DELETE
  - Only for data (class and property instances)
  - Implicit or explicit knowledge
  - Take into account and update graphset (provenance) information
- Main considerations
  - Apply the change (INSERT or DELETE)
  - Respect invariants
    - Non-redundancy (INSERT) and validity (DELETE)
  - Make minimal changes (under coherence viewpoint)
    - No unnecessary loss of information
  - Take into account and preserve graphset (provenance) information
    - Applicable upon quadruples

Slide 20: Conclusion
- Objective: assign provenance information to RDF/S graphs to capture why provenance
  - Triple-based representation
    - Turned triples into quadruples and used named graphs to record the origin
  - Inference (per RDFS)
    - Composed named graphs
  - Coherence semantics in updates (deletes)
    - Used graphsets for composed named graphs (cannot use an operator)
- Proposed query and update languages for graphsets
  - Based on RQL, RUL
  - Can be used to query/update provenance information
  - Provided syntax and semantics, as well as an implementation
    - Demo at: http://139.91.183.30:3026/RULdemo/named_graph_demo/


Slide 22: EXTRA SLIDES

Slide 23: RDF/S Graphset Properties
- Three types of triples in a graphset:
  - Explicitly assigned triples
  - Implicitly assigned triples (from the constituent named graphs)
  - Implications of the above (per RDFS)
[Figure: the example RDF graph from slide 4, with the graphset PT marked]

Slide 24: Inserts and Deletes: General Process
- INSERT
  - Validity respected
  - Must verify non-redundancy
  - Process
    - If INSERT is redundant, ignore it
    - Remove all redundant information (after insert)
- DELETE
  - Must verify validity
  - Non-redundancy respected
  - Issues with inference and the coherence viewpoint
  - Process
    - If DELETE is void, ignore it
    - Make explicit all originally redundant information that will be lost otherwise
    - Restore validity by removing property instances if necessary
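
The INSERT process above can be sketched on plain triples. This is an illustration under simplifying assumptions, not the RUL semantics: it ignores graphsets and validity rules, uses only subclass transitivity and instance inheritance as inference rules, and removes redundant triples greedily in sorted order. The names `closure` and `insert` are hypothetical.

```python
def closure(triples):
    """Saturate under subclass transitivity and instance inheritance."""
    out = set(triples)
    while True:
        sub = {(a, b) for (a, p, b) in out if p == "rdfs:subClassOf"}
        new = {(a, "rdfs:subClassOf", d)
               for (a, b) in sub for (c, d) in sub if b == c}
        new |= {(x, "rdf:type", b)
                for (x, p, a) in out if p == "rdf:type"
                for (a2, b) in sub if a2 == a}
        new -= out
        if not new:
            return out
        out |= new


def insert(explicit, triple):
    """Insert a triple into a redundancy-free set of explicit triples."""
    if triple in closure(explicit):
        return set(explicit)  # redundant insert: ignore it
    result = set(explicit) | {triple}
    # Drop every stored triple that the rest of the graph now implies.
    for t in sorted(result):
        if t in closure(result - {t}):
            result.discard(t)
    return result


BEFORE = {
    ("Giorgos", "rdf:type", "Person"),
    ("Author", "rdfs:subClassOf", "Person"),
}
AFTER = insert(BEFORE, ("Giorgos", "rdf:type", "Author"))
```

In this example, inserting [Giorgos rdf:type Author] makes the stored [Giorgos rdf:type Person] redundant, so it is removed from the explicit set while remaining derivable.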

