Presentation is loading. Please wait.

Presentation is loading. Please wait.

WP3: Data Provenance and Access Control Irini Fundulaki, FORTH December 11-12, 2012, Luxembourg.

Similar presentations


Presentation on theme: "WP3: Data Provenance and Access Control Irini Fundulaki, FORTH December 11-12, 2012, Luxembourg."— Presentation transcript:

1 WP3: Data Provenance and Access Control Irini Fundulaki, FORTH December 11-12, 2012, Luxembourg

2 Slide 2 of 25 Extended example scenario relational DBMS stream DB RDF (stream) DB CSVtwitter C-SPARQL/SPARQL-STR/HTTP registry Quality control Provenance and access control ontologies SSN GADM-RDF, Geospecies

3 Slide 3 of 25 First Year Review Comments Collaboration ◦Provenance and Data Quality (with UPM, FUB for WP2) ◦Collaboration with KIT for a privacy aware access control framework Real versus syntheric data ◦PlanetData datasets (GADM-RDF 1, GeoSpecies 2 enhanced with links from DBPedia using LDSpider), GO 3 and CIDOC 4 ontologies Scalability ◦ Real datasets with up to 11M triples 1 GADM-RDF http://gadm.geovocab.org/http://gadm.geovocab.org/ 2 GeoSpecies: http://lod.geospecies.org/http://lod.geospecies.org/ 3 Gene Ontology: http://www.geneontology.org/http://www.geneontology.org/ 4 CIDOC-CRM: http://www.cidoc-crm.org/http://www.cidoc-crm.org/

4 Slide 4 of 25 Access Control Access Control: ◦Refers to the ability to allow or deny the use of a particular resource by a particular entity ◦Crucial for sensitive content since it ensures the selective exposure of information to different classes of users Access Control Enforcement techniques for RDF data in the presence of RDFS inference: ◦Materialization (forward chaining) ◦Query Rewriting (backward chaining) ◦Static Analysis

5 Slide 5 of 25 Access Control Models Standard access control models associate a concrete label to a triple to indicate whether the triple can be accessed or not Built in rule: An implied RDF triple can be accessed if and only if all its implying triples can be accessed s p o &atypeStudent label yes Student sc Personyes &atypePersonyes spolabel

6 Slide 6 of 25 Access Control Models In the case of any kind of update, the implied triples & their labels must be re-computed An implied RDF triple can be accessed if and only if all its implying triples can be accessed s p o &atypeStudent label yes Student sc Personyes no &atypePersonyes spolabel no the overhead can be substantial in large datasets when updates occur frequently

7 Slide 7 of 25 Access Control Annotations Annotation models are easy to handle but are not amenable to changes since there is no knowledge of the affected triples Any change leads to the re-computation of inferred triples and their labels ◦if the access label of one triple changes ◦if a triple is deleted, modified or added ◦if the semantics according to which the labels of inferred triples are computed change ◦if the policy changes (a liberal policy becomes conservative)

8 Slide 8 of 25 Our Approach Fine-grained Abstract Access Control Model comprised of abstract tokens and operators ◦focus at the RDF Triple level ◦focus on permissions for read-only queries ◦labeled triples are encoded as quadruples ◦supports RDFS inference and propagation of labels across the RDFS hierarchies ◦encodes how the access label of an implied quadruple is computed ◦concrete policies are used to assign specific values to the abstract tokens and operators Implementation of a fine-grained, repository independent, portable across platforms access control framework on top of the MonetDB column store

9 Slide 9 of 25 Annotating Triples with Access Tokens s p o &atypeStudent label l1l1 StudentscPerson l2l2 l 1, l 2 : abstract tokens Persontype l3l3 class q1: q2: q3: &atypePerson spolabel l1 ⊙ l2l1 ⊙ l2 ⊙ : entailment operator records the triples used in the implication Studenttypeclass l2 ⊙ l3l2 ⊙ l3 q4: q5: &a typePerson spolabel  l3 l3  : propagation operator q6: &afnameAlice s polabel   : default access token q7:

10 Slide 10 of 25 Annotating Triples with Access Tokens Authorization Rules A 1 : (construct {?x lname ?y} where {?x type Student}, l 1 ) A 2 :(construct {?x sc ?y}, l 2 ) A 3 :(construct {?x type Student}, l 3 ) Authorizations (Query, abstract token) A 4 :(construct {?x type class}, l 4 ) A 5 :(construct {Student sc ?y}, l 5 ) RDF quadruples q1:q1: q2:q2: q3:q3: q4:q4: q5:q5: StudentscPerson scAgent &atypeStudent &afnameAlice Agenttypeclass l2l2 l2l2 l3l3  l4l4 q6:q6:StudentscPerson l5l5 spol

11 Slide 11 of 25 Computing the labels of implied triples RDFS Inference: generate new knowledge RDF quadruples q1:q1: q2:q2: q3:q3: q5:q5: StudentscPerson scAgent &atypeStudent Agenttype class l2l2 l2l2 l3l3 l4l4 q6:q6:Student sc Person l5l5 spol q8:q8: q9:q9: q 10 : q 11 : q 12 : StudentscAgent StudentscAgent &atypePerson &atypeAgent &atypeAgent l2 ⊙ l2l2 ⊙ l2 l5 ⊙ l2l5 ⊙ l2 l3 ⊙ l2l3 ⊙ l2 ( l 3 ⊙ l 2 ) ⊙ l 2 ( l 5 ⊙ l 2 ) ⊙ l 2 spol (A 1, sc, A 2, l 1 )(A 2, sc, A 3, l 2 ) (A 1, sc, A 3, l 1 ⊙ l 2 ) (&r 1, type, A 1, l 1 )(A 1, sc, A 2, l 2 ) (&r 1, type, A 2, l 1 ⊙ l 2 ) - subClassOf - typeOf

12 Slide 12 of 25 Computing the labels of implied triples RDFS Inference: generate new knowledge RDF quadruples q1:q1: q2:q2: q3:q3: q5:q5: StudentscPerson scAgent &atypeStudent Agenttype class l2l2 l2l2 l3l3 l4l4 q6:q6:Student sc Person l5l5 spol q8:q8: q9:q9: q 10 : q 11 : q 12 : StudentscAgent StudentscAgent &atypePerson &atypeAgent &atypeAgent l2 ⊙ l2l2 ⊙ l2 l5 ⊙ l2l5 ⊙ l2 l3 ⊙ l2l3 ⊙ l2 ( l 3 ⊙ l 2 ) ⊙ l 2 ( l 5 ⊙ l 2 ) ⊙ l 2 spol (A 1, sc, A 2, l 1 )(A 2, sc, A 3, l 2 ) (A 1, sc, A 3, l 1 ⊙ l 2 ) (&r 1, type, A 1, l 1 )(A 1, sc, A 2, l 2 ) (&r 1, type, A 2, l 1 ⊙ l 2 ) - subClassOf - typeOf

13 Slide 13 of 25 Computing the propagated labels Propagating labels along the RDFS hierarchies: no new knowledge created (A 1, type, class, l 1 ) (A 2, type, A 1,  ( l 1 )) (A 2, type, A 1, l 2 ) RDF quadruples q1:q1: q2:q2: q3:q3: q5:q5: StudentscPerson scAgent &atypeStudent Agenttype class l2l2 l2l2 l3l3 l4l4 q6:q6:Student sc Person l5l5 sp ol q8:q8: q9:q9: q 10 : q 11 : q 12 : StudentscAgent StudentscAgent &atypePerson &atypeAgent &atypeAgent l2 ⊙ l2l2 ⊙ l2 l5 ⊙ l2l5 ⊙ l2 l3 ⊙ l2l3 ⊙ l2 ( l 3 ⊙ l 2 ) ⊙ l 2 ( l 5 ⊙ l 2 ) ⊙ l 2 q 13 :&atypeAgent  l4  l4 spol

14 Slide 14 of 25 From Abstract Models to Concrete Policies Concrete Policies implement Abstract Model: ◦Set of Concrete Tokens ◦Mapping from abstract to concrete tokens ◦Set of concrete operators that implement the abstract ones ◦Conflict resolution operator to resolve cases where the same triple is assigned different labels ◦Access Function to decide when a triple is accessible

15 Slide 15 of 25 Concrete Policy C1 Concrete Tokens: LP = { true, false} Concrete Operators: ◦Entailment Operator ⊙ ◦Propagation Operator  : Identity ◦Conflict Resolution Operator  Access function: triples with label true are accessible, otherwise, inaccessible l 1  l 2 if l 1 and l 2 are different from  l 1 if l 2 is   if l 1, l 2 are equal to  l 1 ⊙ l 2 = false if value false is in S true if false does not belong in S  otherwise  S =

16 Slide 16 of 25 Concrete Policy C1 q1:q1: q2:q2: q3:q3: q5:q5: StudentscPerson scAgent &atypeStudent Agenttypeclass l2l2 l2l2 l3l3 l4l4 q6:q6:StudentscPerson l5l5 spo l q8:q8: q9:q9: StudentscAgent StudentscAgent l2 ⊙ l2l2 ⊙ l2 l5 ⊙ l2l5 ⊙ l2 q 13 :&atypeAgent l4 l4 Mapping : l 2  true l 4  true l 5  false l 3  true q1:q1: q2:q2: q3:q3: q5:q5: StudentscPerson scAgent &atypeStudent Agenttypeclass true q6:q6: StudentscPerson false sp o l q8:q8: q9:q9: StudentscAgent StudentscAgent true false q 13 : &atypeAgent true

17 Slide 17 of 25 Evaluation Implementation: ◦Relational Schema:  Authorization(auth_id, query, access_token)  ExplicitTriples(qid, s, p, o, auth_id): stores explicit triples  InferopTriples(qid, s, p, o): stores implicit quadruples  PropOp(qid, s, p, o, l): stores the propagated quadruples  LabelStore(qid, qid_uses): stores for each implied and propagated quadruple, the explicit quadruples that are used in its implication; ◦Query Engine: CWI’s MonetDB Open Source Column Store  Stored Procedures are used for the computation of the access labels

18 Slide 18 of 25 Benchmark Datasets: ◦Synthetic Datasets: produced by PowerGen [1] ◦Real Datasets: CIDOC, GO, GeoSpecies and GADM Datasets Authorizations: ◦15 authorizations Concrete Policy: ◦Access Tokens: {low, medium, high}  low < medium < high ◦Mapping:  same triple is assigned different labels and the same label is assigned to different triples ◦ 5 authorizations - assign to 35% of triples low label ◦ 5 authorizations - assign to 55% of triples medium label ◦ 5 authorizations - assign to 45% of triples high label ◦Operators:  Entailment, Conflict Resolution: min(), Propagation: identity

19 Slide 19 of 25 Experiments Experiment 1: Annotation Time ◦the time required to compute the implied quadruples and the propagated labels Experiment 2: Evaluation Time ◦the time needed to compute for a concrete policy, the concrete access label of a percentage of the RDF triples

20 Slide 20 of 25 Real Datasets Characteristics Experiment 1: Annotation Time 35451

21 Slide 21 of 25 Real Datasets: Experiment 2

22 Slide 22 of 25 Synthetic Datasets: Experiment 1 Annotation time does not depend on the number of explicit (i.e, initial triples in the dataset) but on the number of produced implied quadruples

23 Slide 23 of 25 Synthetic Datasets: Experiment 2 Characteristics Observations ◦Dataset D2 has a larger complexity than D1 (due to the larger depth) leading to more complex abstract expressions

24 Slide 24 of 25 Pros & Cons of Abstract Access Control Models Pros: ◦Same application can experiment with different concrete policies over the same dataset  e.g., liberal vs conservative policies for different classes of users ◦Different applications can experiment with different concrete policies for the same data ◦In the case of updates there is no need re-compute the access labels of the inferred triples Cons: ◦overhead in the required storage space  expressions that describe how a label is computed can become quite complex depending on the structure of the dataset

25 Slide 25 of 25 WP3: Work Plan View 18 24 30366 12 0 Task 3.1 Provenance Management Task 3.2 Privacy, DRM and Access Control Task 3.3 Trust management D 3.4 Trust management and inference system FORTH 42 D 3.2 Provenance management and propagation through SPARQL query and update languages D 3.2 Provenance management and propagation through SPARQL query and update languages D 3.1 Access control specification language, reasoning and enforcement mechanisms FORTH EPFL D 3.3 Access control system and privacy-aware language D 3.3 Access control system and privacy-aware language

26 Slide 26 of 25 BackUp Slides

27 Slide 27 of 25 Abstract Access Control Model Access Control Model defined by a set of abstract tokens and abstract operators to model ◦computation of access labels of implicit RDF triples ◦propagation of access labels ◦missing access labels Access Control Authorizations associate triples in the RDF/S graph with abstract tokens: modeled as quadruples RDFS entailment rules for computing the access labels of implied quadruples Propagation rules to specify how access labels are propagated along the subclassOf and subpropertyOf RDFS relations.

28 Slide 28 of 25 Computing the labels of implied triples RDFS Inference: triple generating rules (A 1, sc, A 2, l 1 )(A 2, sc, A 3, l 2 ) (A 1, sc, A 3, l 1 ⊙ l 2 ) (&r 1, type, A 1, l 1 )(A 1, sc, A 2, l 2 ) (&r 1, type, A 2, l 1 ⊙ l 2 ) RDF quadruples q1:q1: q2:q2: q3:q3: q5:q5: StudentscPerson scAgent &atypeStudent Agenttype class l2l2 l2l2 l3l3 l4l4 q6:q6:Student sc Person l5l5 spol q8:q8: q9:q9: q 10 : q 11 : q 12 : StudentscAgent StudentscAgent &atypePerson &atypeAgent &atypeAgent l2 ⊙ l2l2 ⊙ l2 l5 ⊙ l2l5 ⊙ l2 l3 ⊙ l2l3 ⊙ l2 ( l 3 ⊙ l 2 ) ⊙ l 2 ( l 5 ⊙ l 2 ) ⊙ l 2 - subClassOf - typeOf spol

29 Slide 29 of 25 Computing the labels of implied triples RDFS Inference: triple generating rules RDF quadruples q1:q1: q2:q2: q3:q3: q5:q5: StudentscPerson scAgent &atypeStudent Agenttype class l2l2 l2l2 l3l3 l4l4 q6:q6:Student sc Person l5l5 spol q8:q8: q9:q9: q 10 : q 11 : q 12 : StudentscAgent StudentscAgent &atypePerson &atypeAgent &atypeAgent l2 ⊙ l2l2 ⊙ l2 l5 ⊙ l2l5 ⊙ l2 l3 ⊙ l2l3 ⊙ l2 ( l 3 ⊙ l 2 ) ⊙ l 2 spol ( l 5 ⊙ l 2 ) ⊙ l 2 (A 1, sc, A 2, l 1 )(A 2, sc, A 3, l 2 ) (A 1, sc, A 3, l 1 ⊙ l 2 ) (&r 1, type, A 1, l 1 )(A 1, sc, A 2, l 2 ) (&r 1, type, A 2, l 1 ⊙ l 2 ) - subClassOf - typeOf


Download ppt "WP3: Data Provenance and Access Control Irini Fundulaki, FORTH December 11-12, 2012, Luxembourg."

Similar presentations


Ads by Google