Light-weight Ontology Versioning with Multi-temporal RDF Schema


1 Light-weight Ontology Versioning with Multi-temporal RDF Schema
Fifth International Conference on Advances in Semantic Processing - SEMAPRO 2011
Fabio Grandi
Alma Mater Studiorum - Università degli Studi di Bologna

2 Introduction
Some application fields require the maintenance of past versions of an ontology after changes
For instance, in the legal domain:
  Ontologies evolve as a natural consequence of the dynamics involved in normative systems
  Agents must often deal with a past perspective (e.g. a Court judging today on some fact committed in the past)
Moreover, several time dimensions are usually important for applications in such domains
SEMAPRO 2011 – F. Grandi – Light-weight Ontology Versioning with Multi-temporal RDF Schema

3 Multi-temporal versioning
Time dimensions of interest in the legal domain:
  Validity time is the time a norm is in force in the real world
  Efficacy time is the time a norm can be applied to a concrete case; while such cases exist, the norm retains its efficacy even though it is no longer in force
  Transaction time is the time a norm is stored in the computer system
  Publication time is the time a norm is published in the Official Journal

4 Temporal RDF Data Models
Temporal RDF data models have recently been proposed; notable proposals include:
  [Gutierrez, Hurtado & Vaisman, 2007]
  [Pugliese, Udrea & Subrahmanian, 2008]
  [Tappolet & Bernstein, 2009]
Interval timestamping of RDF triples is adopted
A single time dimension (valid time) is usually considered
Index structures (e.g. tGRIN and keyTree) have been proposed for efficient processing of temporal queries

5 A Multi-temporal RDF Database Model
N-dimensional time domain:
  𝒯 = T1 × T2 × … × TN
  Ti = [0,UC)i
Multi-temporal RDF triple: ( s,p,o | T )
  s is a subject
  p is a predicate
  o is an object
  T ⊆ 𝒯 is a timestamp
Multi-temporal RDF database:
  RDF-TDB = { ( s,p,o | T ) | T ⊆ 𝒯 }

6 Multi-temporal RDF Triples
A temporal triple ( s,p,o | T ) assigns a temporal pertinence to an RDF triple ( s,p,o )
The non-temporal triple ( s,p,o ) is the value (or the contents) of the temporal triple ( s,p,o | T )
The temporal pertinence T is a subset of the time domain 𝒯, represented by a temporal element

7 Temporal Elements
A temporal element [Gadia 98] is a disjoint union of temporal intervals
Multi-temporal intervals are obtained as the Cartesian product of one interval for each temporal dimension
  T = ∪(1≤j≤m) Ij = ∪(1≤j≤m) [tjs, tje)1 × [tjs, tje)2 × … × [tjs, tje)N
  Ij ∩ Ik = Ø for all 1≤j<k≤m
where [tjs, tje)i is the extent of the j-th interval along dimension i
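
One possible in-memory encoding of a temporal element, sketched in Python: each multi-temporal interval is a tuple of half-open (start, end) pairs, one per dimension, and a temporal element is a list of such intervals. The function names and the use of float("inf") to stand for UC are assumptions made for illustration; they do not come from the slides.

```python
from itertools import combinations

UC = float("inf")  # assumed encoding of the open upper bound UC


def boxes_overlap(a, b):
    """True if two N-dimensional half-open intervals share at least one point."""
    return all(s1 < e2 and s2 < e1 for (s1, e1), (s2, e2) in zip(a, b))


def is_temporal_element(boxes):
    """True if the intervals are pairwise disjoint (Ij ∩ Ik = Ø for j < k)."""
    return not any(boxes_overlap(a, b) for a, b in combinations(boxes, 2))


# A two-dimensional element made of two disjoint rectangles
element = [((0, 10), (0, UC)), ((10, UC), (0, 10))]
print(is_temporal_element(element))  # True
```

Half-open bounds make adjacency unambiguous: [0, 10) and [10, UC) touch but do not overlap.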

8 Integrity Constraint
No value-equivalent distinct triples exist:
  ∀ ( s,p,o | T ), ( s′,p′,o′ | T′ ) ∈ RDF-TDB: s=s′ ∧ p=p′ ∧ o=o′ ⇒ T=T′
The constraint is made possible by the adoption of temporal element timestamping
Temporal elements lead to space saving whenever the temporal pertinence of a triple is not a convex interval

9 Memory Saving with Temporal Elements
For example, even with a monodimensional time domain, the two value-equivalent triples with interval timestamping ( t2 < t3 ):
  ( s,p,o | [t1, t2) ) and ( s,p,o | [t3, t4) )
can be merged into a single triple with element timestamping:
  ( s,p,o | [t1, t2) ∪ [t3, t4) )
The same space is required for the timestamps in both cases (i.e. the space needed by 4 time points), but the contents of the triple are stored twice in the former case and only once in the latter
Each distinct triple version is stored only once with a complex timestamp, instead of as multiple copies (value-equivalent triples) with simple timestamps
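
The merge described above can be sketched in Python by keying a dictionary on the triple contents, which also rules out value-equivalent duplicates by construction. The function name and the row layout (s, p, o, interval) are illustrative assumptions; the sketch assumes the input intervals of a triple are already disjoint and does no coalescing.

```python
from collections import defaultdict


def merge_value_equivalent(interval_rows):
    """Group interval-stamped rows by content: one entry per distinct (s, p, o)."""
    merged = defaultdict(list)
    for s, p, o, interval in interval_rows:
        merged[(s, p, o)].append(interval)
    return dict(merged)


rows = [
    ("s", "p", "o", (1, 2)),  # ( s,p,o | [t1, t2) ) with t1=1, t2=2
    ("s", "p", "o", (3, 4)),  # ( s,p,o | [t3, t4) ) with t3=3, t4=4
]
db = merge_value_equivalent(rows)
print(db)  # {('s', 'p', 'o'): [(1, 2), (3, 4)]}
```

The four time points are still stored, but the contents ("s", "p", "o") now appear once instead of twice.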

10 An Example
The memory saving obtained with temporal elements grows with the dimensionality of the time domain!
The saving is also larger the bigger the triple is relative to the timestamp
  In very large RDF benchmark datasets, the average triple size ranges from 80-140 bytes (DBpedia, UScensus, LUBM, BSBM) to more than 600 bytes (UniProtKB)
  The timestamp (date+time) data size in SQL is 6-8 bytes
In the example which follows we assume a bitemporal domain (valid + transaction time)

11 Representation of the Evolution of a Triple
[Figure: evolution of a triple in the bitemporal plane; axis ticks t0, t1, t2, UC; regions labelled (s, p, o1 ), (s, p, o2 ), (s, p, o3 )]
With temporal intervals (5 triples needed)
  ( s, p, o1 | [t0,t1)×[t0,UC) )
  ( s, p, o1 | [t1,UC)×[t0,t1) )
  ( s, p, o2 | [t1,t2)×[t1,UC) )
  ( s, p, o2 | [t2,UC)×[t1,t2) )
  ( s, p, o3 | [t2,UC)×[t2,UC) )
With temporal elements (3 triples needed)
  ( s, p, o1 | [t0,t1)×[t0,UC) ∪ [t1,UC)×[t0,t1) )
  ( s, p, o2 | [t1,t2)×[t1,UC) ∪ [t2,UC)×[t1,t2) )
  ( s, p, o3 | [t2,UC)×[t2,UC) )

12 Memory Saving Figures
Percentage space saving with temporal element vs interval timestamping. Avg. number of versions per triple in columns, triple size in bytes in rows. We assume 8-byte timestamps.
  Triple size (bytes)      2       5       8      11
   80                  27.78   37.04   38.89   39.68
  120                  29.41   39.22   41.18   42.02
  160                  30.30   40.40   42.42   43.29
  200                  30.86   41.15   43.21   44.09
For instance, with 120-byte triples and 5 versions per triple on average, we have a 39.22% space saving. With 1 billion triples, this means an RDF-TDB size of
  721 GB with temporal elements
  1.14 TB with temporal intervals
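
The slides do not state the formula behind these percentages. The Python sketch below uses one cost model that happens to reproduce every entry of the table; both of its ingredients are inferences, not statements from the source: v versions of a triple are assumed to need 2v - 1 stored intervals (as in the bitemporal example, where 3 versions need 5), and each stored interval is assumed to cost 16 bytes of timestamp. It does not attempt to reproduce the absolute sizes quoted for 1 billion triples.

```python
TIMESTAMP_BYTES = 16  # assumed timestamp cost per stored interval (inferred)


def saving_percent(triple_bytes, versions):
    """Percentage saved by element timestamping under the assumed cost model."""
    intervals = 2 * versions - 1  # assumed: v versions -> 2v - 1 intervals
    # Interval timestamping repeats the triple contents once per interval.
    with_intervals = intervals * (triple_bytes + TIMESTAMP_BYTES)
    # Element timestamping stores the contents once per version.
    with_elements = versions * triple_bytes + intervals * TIMESTAMP_BYTES
    return 100 * (with_intervals - with_elements) / with_intervals


for size in (80, 120, 160, 200):
    print(size, [f"{saving_percent(size, v):.2f}" for v in (2, 5, 8, 11)])
```

Under this model the saving simplifies to (v - 1)·S / ((2v - 1)·(S + 16)) for triple size S, which grows with both S and v, in line with the trend of the table.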

13 Query Operators
The only retrieval operator we consider in this work is a snapshot extraction operator, which can be used to extract an ontology version from a multi-version ontology represented as a temporal RDF database
Given a time point t = (t1, t2,…, tN) ∈ 𝒯, we define the RDF database snapshot valid at t as
  RDF-TDB(t) = { ( s,p,o ) | ( s,p,o | T ) ∈ RDF-TDB ∧ t ∈ T }
The result is a (non-temporal) RDF graph, which can be used to represent the ontology version valid at t
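
A minimal Python sketch of snapshot extraction, assuming the database is a dictionary from (s, p, o) to a list of half-open multi-dimensional intervals. The encoding, the function names and the concrete values t0=0, t1=10, t2=20 are illustrative assumptions; the data mirror the bitemporal example with objects o1, o2, o3.

```python
UC = float("inf")  # assumed encoding of UC


def contains(element, t):
    """True if time point t lies in at least one interval of the element."""
    return any(all(s <= x < e for x, (s, e) in zip(t, box)) for box in element)


def snapshot(db, t):
    """RDF-TDB(t): the non-temporal triples whose timestamp contains t."""
    return {spo for spo, element in db.items() if contains(element, t)}


db = {
    ("s", "p", "o1"): [((0, 10), (0, UC)), ((10, UC), (0, 10))],
    ("s", "p", "o2"): [((10, 20), (10, UC)), ((20, UC), (10, 20))],
    ("s", "p", "o3"): [((20, UC), (20, UC))],
}
print(snapshot(db, (15, 15)))  # {('s', 'p', 'o2')}
```

Because the three timestamps do not overlap, every time point selects at most one of the three versions.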

14 Modification Operators – Insertion
Assuming an (N-1)-dimensional temporal element tv (for any modification, transaction time [now, UC) is implied), the insertion operation
  INSERT DATA { s,p,o } VALID tv
can be defined via its effects on the database state as follows (using a triple calculus)
  RDF-TDB′ = RDF-TDB
    ∪ { ( s,p,o | T′ ) | ∃ ( s,p,o | T ) ∈ RDF-TDB ∧ T′ = coalesce( T ∪ tv × [now, UC) ) }
    ∪ { ( s,p,o | tv × [now, UC) ) | ¬∃ ( s,p,o | T ) ∈ RDF-TDB }
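
The effect of an insertion can be illustrated with a deliberately simplified Python model: a bitemporal domain over a small finite grid, where a timestamp is the set of (valid, transaction) points it covers. With this point-set encoding the union is already in canonical form, so coalesce is the identity. HORIZON (a finite stand-in for UC), the function names and the choice to overwrite the stored timestamp (so that no value-equivalent duplicate remains) are assumptions of the sketch.

```python
from itertools import product

HORIZON = 6  # toy stand-in for UC: time points are 0..5 in both dimensions


def stamp(valid_points, now):
    """The set of (valid, transaction) points in tv x [now, UC)."""
    return set(product(valid_points, range(now, HORIZON)))


def insert_data(db, spo, valid_points, now):
    """New state after INSERT DATA { s,p,o } VALID tv at transaction time now."""
    new_db = dict(db)
    # Existing triple: extend its timestamp. New triple: start from the empty set.
    new_db[spo] = db.get(spo, set()) | stamp(valid_points, now)
    return new_db


db = insert_data({}, ("s", "p", "o"), range(0, 3), now=1)
db = insert_data(db, ("s", "p", "o"), range(2, 5), now=3)
print(len(db), len(db[("s", "p", "o")]))  # 1 21
```

The second insertion overlaps the first on 3 points, so the single stored triple covers 15 + 9 - 3 = 21 points.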

15 Maintenance of temporal elements
In order to ensure the results are still temporal elements, union and difference operations must be carefully defined
In particular, if Ti (i=1,2) are temporal elements defined as
  Ti = ∪(1≤j≤mi) Iij
where Iij are multidimensional intervals, then the difference can be computed as follows
  T1 \ T2 = ∪(1≤j≤m1) ( I1j \ T2 )
and is ensured to be a temporal element if I1j \ T2 is a temporal element for each j
Given the difference, the union can be computed as follows
  T1 ∪ T2 = T1 ∪ (T2 \ T1)
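
One way to realise these definitions is sketched below in Python, assuming a multidimensional interval is a tuple of half-open (start, end) pairs and a temporal element is a list of disjoint intervals. The slab-by-slab splitting in box_minus_box is a common technique chosen for the sketch, not necessarily the author's; merging adjacent intervals into larger ones is not attempted.

```python
def box_minus_box(a, b):
    """Split a \\ b into disjoint intervals by peeling one dimension at a time."""
    if not all(s1 < e2 and s2 < e1 for (s1, e1), (s2, e2) in zip(a, b)):
        return [a]  # no overlap: nothing to remove
    pieces, rest = [], list(a)
    for i, ((s1, e1), (s2, e2)) in enumerate(zip(a, b)):
        if s1 < s2:  # slab of a lying before b in dimension i
            pieces.append(tuple(rest[:i] + [(s1, s2)] + rest[i + 1:]))
        if e2 < e1:  # slab of a lying after b in dimension i
            pieces.append(tuple(rest[:i] + [(e2, e1)] + rest[i + 1:]))
        rest[i] = (max(s1, s2), min(e1, e2))  # keep only the overlap in dim i
    return pieces


def difference(t1, t2):
    """T1 \\ T2: remove every interval of T2 from every interval of T1."""
    result = list(t1)
    for b in t2:
        result = [piece for a in result for piece in box_minus_box(a, b)]
    return result


def union(t1, t2):
    """T1 U T2 computed as T1 U (T2 \\ T1), so the result stays disjoint."""
    return list(t1) + difference(t2, t1)


print(difference([((0, 10),)], [((3, 5),)]))  # [((0, 3),), ((5, 10),)]
```

Each piece produced for dimension i is restricted to the overlap in all earlier dimensions, which is what keeps the pieces pairwise disjoint.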

16 Modification Operators - Deletion
Assuming an (N-1)-dimensional temporal element tv and a selection predicate pred(s,p,o), the deletion operation
  DELETE { s,p,o } VALID tv WHERE pred(s,p,o)
can be defined via its effects on the database state as follows
  RDF-TDB′ = RDF-TDB
    \ { ( s,p,o | T ) | ∃ ( s,p,o | T ) ∈ RDF-TDB ∧ pred(s,p,o) ∧ T ∩ tv × [now, UC) ≠ Ø }
    ∪ { ( s,p,o | T′ ) | ∃ ( s,p,o | T ) ∈ RDF-TDB ∧ pred(s,p,o) ∧ T ∩ tv × [now, UC) ≠ Ø ∧ T′ = coalesce( T \ tv × [now, UC) ) }
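
A simplified Python illustration of the deletion semantics, using a toy bitemporal grid in which a timestamp is the set of (valid, transaction) points it covers, so that set difference plays the role of coalesce( T \ tv × [now, UC) ). HORIZON, the function names and the decision to drop a triple whose timestamp becomes empty are assumptions of the sketch, not part of the slides.

```python
from itertools import product

HORIZON = 6  # toy stand-in for UC: time points are 0..5 in both dimensions


def stamp(valid_points, now):
    """The set of (valid, transaction) points in tv x [now, UC)."""
    return set(product(valid_points, range(now, HORIZON)))


def delete(db, pred, valid_points, now):
    """New state after DELETE { s,p,o } VALID tv WHERE pred(s,p,o)."""
    region = stamp(valid_points, now)
    new_db = {}
    for spo, element in db.items():
        if pred(*spo) and element & region:
            element = element - region  # T' = T \ (tv x [now, UC))
        if element:  # assumption: triples left with an empty timestamp are dropped
            new_db[spo] = element
    return new_db


db = {("s", "p", "o"): stamp(range(0, 3), 1)}  # 15 points
db = delete(db, lambda s, p, o: p == "p", [1], now=3)
print(len(db[("s", "p", "o")]))  # 12
```

The point (valid=1, transaction=2) survives the deletion: what was recorded before `now` stays in the database, so past states remain queryable.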

17 Modification Operators - Update
Assuming an (N-1)-dimensional temporal element tv, the update operation
  UPDATE { s,p,o } SET { s’,p’,o’ } VALID tv WHERE pred(s,p,o)
is not primitive, as it can be defined as a delete operation followed by an insert operation as follows
  DELETE { s,p,o } VALID tv WHERE pred(s,p,o);
  INSERT DATA { s’,p’,o’ } VALID tv

18 Derivation of a new Ontology Version (1)
We assume the new version is obtained by applying changes to an existing ontology version. The parameters needed are:
  OS_Validity: the valid time point used to select the ontology version used as the base for the derivation
  The sequence of schema changes to be applied to the selected version in order to produce the new ontology version
  OC_Validity: the valid time interval used to assign the validity to the new version (possibly in the past or future)

19 Derivation of a new Ontology Version (2)
[Figure: two valid-time axes, labelled OS_Validity and SC_Validity = [ t4, UC ], with "schema changes" between them]

20 Transaction On …
BEGIN TRANSACTION ;
CREATE GRAPH <workVersion> ;
INSERT INTO <workVersion> { ?s, ?p, ?o }
  WHERE { TGRAPH <tOntology> { ?s, ?p, ?o | ?t } .
          FILTER ( VALID(?t) CONTAINS OS_Validity &&
                   TRANSACTION(?t) CONTAINS current-date() ) } ;
=> a sequence of ontology changes acting on the (non-temporal) workVersion graph goes here
DELETE FROM <tOntology> { ?s, ?p, ?o } VALID OC_Validity ;
INSERT INTO <tOntology> { ?s, ?p, ?o } VALID OC_Validity
  WHERE { GRAPH <workVersion> { ?s, ?p, ?o } } ;
DROP GRAPH <workVersion> ;
COMMIT TRANSACTION
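
The four steps of this transaction (extract the base version, apply the changes, clear the target validity, stamp the result) can be mimicked in Python on a toy bitemporal grid where a timestamp is a set of (valid, transaction) points. Everything concrete here is hypothetical: HORIZON as a finite stand-in for UC, the function names, the example triples and the representation of the ontology changes as a function on a set of triples.

```python
from itertools import product

HORIZON = 8  # toy stand-in for UC


def snapshot(db, point):
    """Triples whose timestamp contains the (valid, transaction) point."""
    return {spo for spo, element in db.items() if point in element}


def derive_version(db, os_validity, changes, oc_validity, now):
    # 1. workVersion: the version valid at OS_Validity and current at `now`
    work = changes(snapshot(db, (os_validity, now)))  # 2. apply the changes
    # 3. DELETE ... VALID OC_Validity: clear the target validity everywhere
    target = set(product(oc_validity, range(now, HORIZON)))
    new_db = {spo: element - target for spo, element in db.items()}
    # 4. INSERT ... VALID OC_Validity: stamp the working graph
    for spo in work:
        new_db[spo] = new_db.get(spo, set()) | target
    return {spo: element for spo, element in new_db.items() if element}


base = ("ex:Norm", "rdf:type", "rdfs:Class")          # hypothetical triples
added = ("ex:Decree", "rdfs:subClassOf", "ex:Norm")
db = {base: set(product(range(HORIZON), range(HORIZON)))}
db = derive_version(db, 2, lambda g: g | {added}, range(4, HORIZON), now=3)
print(sorted(snapshot(db, (5, 3))))
```

After the derivation, a snapshot at valid time 5 contains both triples, while a snapshot at valid time 2, or at transaction time 2, still contains only the base triple.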

21 Operators for Ontology Management
On the basis of the primitives introduced so far, high-level macro operators for the management of a multi-version RDF ontology can also be defined
  CREATE_CLASS(Name,Validity)
  RENAME_CLASS(Class,NewName,Validity)
  DROP_CLASS(Class,Validity)
  ADD_SUBCLASS(SubClass,Class,Validity)
  DEL_SUBCLASS(SubClass,Class,Validity)
  CREATE_PROPERTY(Name,Range,Validity)
  RENAME_PROPERTY(Property,NewName,Validity)
  CHANGE_PROPERTY_RANGE(Property,NewRange,Validity)
  DROP_PROPERTY(Property,Validity)
  ADD_PROPERTY(Class,Property,Validity)
  DEL_PROPERTY(Class,Property,Validity)
  ADD_SUBPROPERTY(SubProperty,Property,Validity)
  DEL_SUBPROPERTY(SubProperty,Property,Validity)
  …………

22 Sample Operator Definitions
For example, the definitions of some of the property management operators are the following
ADD_PROPERTY(Class,Property,Range,Validity)
  INSERT DATA { Property rdfs:domain Class ; rdfs:range Range . } VALID Validity
CHANGE_PROPERTY_RANGE(Property,NewRange,Validity)
  UPDATE { Property rdfs:range ?range } SET { Property rdfs:range NewRange } VALID Validity
DEL_PROPERTY(Class,Property,Validity)
  DELETE { Property rdfs:domain Class ; rdfs:range ?range . } VALID Validity
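
To make the macro expansion concrete, the Python sketch below turns two of these operators into the primitive statements they stand for, as plain strings. It only generates text and does not execute anything; the helper names, the ex: identifiers and the validity literal "[t4, UC)" are hypothetical. The UPDATE inside CHANGE_PROPERTY_RANGE is expanded into a DELETE followed by an INSERT DATA, following the definition of update as a non-primitive operation.

```python
def insert_data(pattern, validity):
    """Text of a primitive INSERT DATA statement."""
    return f"INSERT DATA {{ {pattern} }} VALID {validity}"


def delete(pattern, validity):
    """Text of a primitive DELETE statement."""
    return f"DELETE {{ {pattern} }} VALID {validity}"


def add_property(cls, prop, rng, validity):
    """ADD_PROPERTY(Class,Property,Range,Validity) as a list of primitives."""
    return [insert_data(f"{prop} rdfs:domain {cls} ; rdfs:range {rng} .", validity)]


def change_property_range(prop, new_range, validity):
    """CHANGE_PROPERTY_RANGE: the UPDATE becomes a DELETE plus an INSERT DATA."""
    return [
        delete(f"{prop} rdfs:range ?range", validity),
        insert_data(f"{prop} rdfs:range {new_range}", validity),
    ]


for stmt in change_property_range("ex:issuedBy", "ex:Court", "[t4, UC)"):
    print(stmt)
# DELETE { ex:issuedBy rdfs:range ?range } VALID [t4, UC)
# INSERT DATA { ex:issuedBy rdfs:range ex:Court } VALID [t4, UC)
```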

23 Conclusions
We presented a temporal RDF database model whose distinctive features with respect to previously proposed models are:
  It is defined on a multi-dimensional time domain
  It employs triple timestamping with temporal elements
The adoption of temporal elements in the multi-temporal setting best preserves the scalability property enjoyed by triple storage technologies, as it minimizes the database growth (the absence of value-equivalent triples is an integrity constraint)
The data model has been equipped with manipulation operators for the extraction of a temporal snapshot and for the maintenance of the database; moreover, high-level operators can also be defined to manage a multi-version RDF ontology

24 Future Work
Some design choices were motivated by application requirements of an ontology-based personalization service in the legal (or medical) domain. We plan to explore the applicability of the approach also in application fields with more generic requirements
We also plan to consider extensions of the proposed RDF database model, including the development of a complete multi-temporal SPARQL-like query language and the adoption of suitable multi-temporal index structures

