
Privacy Framework for RDF Data Mining Master’s Thesis Project Proposal By: Yotam Aron.


1 Privacy Framework for RDF Data Mining Master’s Thesis Project Proposal By: Yotam Aron

2 Overview: Motivation and Goal; Background; Proposed Solution and Design; Example; Conclusion.

3 Motivation: Data mining continues to become more widespread. ◦ Useful for research, public policy, etc. We want to maintain the privacy of participants in the database. Little work has been done on privacy for semantic web data.

4 Previous Work: Anonymization: K-Anonymity [1]. Differential privacy systems: PINQ [2], AIRAVAT [3]. Drawbacks: ◦ Do not apply to semantic web data. ◦ Do not support SPARQL.

5 Goal: Develop a system that protects dataset participants’ personal data when it is queried through SPARQL. The system should integrate well with existing SPARQL endpoints and be relatively easy for both the user and the administrator to use.

6 Background: Rule-based Privacy Policies in AIR; Differential Privacy.

7 Rule-based Privacy Policies in AIR [4]: Rules define patterns in a SPARQL query. If a pattern is matched, the rule infers compliance or non-compliance of the incoming SPARQL query.

8 AIR Example [5]
AIR Policy (extract):
air:if { :W s:TriplePattern :T. :T log:includes { :X type:F :V }. };
air:then [ air:description ("type:F was selected in " q:QUERY) ; air:assert { q:QUERY air:non-compliant-with q:Policy4. } ].
Query:
SELECT ?s WHERE { ?s type:F ?p }
AIR will show that the query is non-compliant with Policy4.
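The matching step in this example can be illustrated outside AIR. The following is a minimal Python sketch, not the AIR reasoner: it assumes a policy is simply a set of forbidden predicates and uses a small regular expression to pull triple patterns out of a flat WHERE clause. The names `POLICY4_FORBIDDEN`, `find_predicates`, and `check_compliance` are hypothetical and do not come from AIR or the slides.

```python
import re

# Hypothetical stand-in for Policy4: selecting this predicate is non-compliant.
POLICY4_FORBIDDEN = {"type:F"}

def find_predicates(query: str) -> list[str]:
    """Return the predicate of each triple pattern in a flat WHERE clause."""
    match = re.search(r"\{(.*)\}", query, flags=re.DOTALL)
    if match is None:
        return []
    predicates = []
    # Triple patterns are separated by '.' followed by whitespace or end of clause.
    for pattern in re.split(r"\.\s+|\.$", match.group(1).strip()):
        terms = pattern.split()
        if len(terms) == 3:  # subject, predicate, object
            predicates.append(terms[1])
    return predicates

def check_compliance(query: str, forbidden: set[str]) -> bool:
    """True if no triple pattern in the query uses a forbidden predicate."""
    return not any(p in forbidden for p in find_predicates(query))

# The query from the slide selects type:F, so it is reported as non-compliant.
print(check_compliance("SELECT ?s WHERE {?s type:F ?p}", POLICY4_FORBIDDEN))
```

A real policy engine reasons over the parsed query rather than its text; the regular expression here only handles simple, unnested clauses.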

9 Differential Privacy Overview: Minimize the probability of a privacy breach. Maximize statistical accuracy. The definition requires that, given two similar datasets, a query function evaluated on each of them gives similar results with high probability. Makes no assumptions about the underlying dataset.

10 Differential Privacy Definition: We say a randomized computation M provides ɛ-differential privacy if for any two data sets A and B, and any set of possible outputs S ⊆ Range(M), Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] × exp(ɛ × |A ⊕ B|), where A ⊕ B is the symmetric difference of A and B (the records that appear in exactly one of the two).
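For intuition, consider the case the definition is most often applied to: two data sets that differ in a single record, so that |A ⊕ B| = 1. The bound then reduces to a fixed multiplicative factor. The value ɛ = 0.1 below is illustrative only; ɛ = 1.0 matches the sample query later in the slides.

```latex
\[
|A \oplus B| = 1 \;\Longrightarrow\; \Pr[M(A) \in S] \le e^{\varepsilon}\,\Pr[M(B) \in S],
\qquad e^{0.1} \approx 1.105, \quad e^{1.0} \approx 2.718 .
\]
```

A smaller ɛ forces the two output distributions closer together, which in practice means adding more noise.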

11 Differential Privacy in Practice

12 Limitations of Differential Privacy: Only statistical data is protected. High variance in the data yields poor query results. Theory is not always perfect in practice. ◦ Assumes no collusion among users. ◦ Covert channel attacks [6]. ◦ What value of ɛ to choose?

13 Example, No DP
Name | Salary
Alice | 31,000
Bob | 47,000
Charlie | 20,000
David | 21,000
SELECT COUNT(Name) WHERE (Salary < 25000)
Result: 2

14 Example, No DP
Name | Salary
Alice | 31,000
Bob | 47,000
Charlie | 20,000
SELECT COUNT(Name) WHERE (Salary < 25000)
Result: 1
Big difference in answers!

15 Example, With DP
Name | Salary
Alice | 31,000
Bob | 47,000
Charlie | 20,000
David | 21,000
SELECT COUNT(Name) WHERE (Salary < 25000)
Result: 2 + noise = ~2 (with high probability)

16 Example, With DP
Name | Salary
Alice | 31,000
Bob | 47,000
Charlie | 20,000
SELECT COUNT(Name) WHERE (Salary < 25000)
Result: 1 + noise = ~2 (with high probability)
With high probability, records are indistinguishable!
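One common way to produce the noise in this example is the Laplace mechanism. The slides do not say which mechanism is used, so treat that choice as an assumption. The sketch below counts salaries below 25,000, which reproduces the exact answers 2 and 1 from the slides, and then adds Laplace noise with scale 1/ɛ, since adding or removing one person changes a count by at most 1. The function names and the value ɛ = 1.0 are illustrative.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(salaries: dict[str, int], threshold: int, epsilon: float,
                rng: random.Random) -> float:
    """Count salaries below the threshold, then add Laplace(1/epsilon) noise."""
    true_count = sum(1 for s in salaries.values() if s < threshold)
    # A count has sensitivity 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

with_david = {"Alice": 31_000, "Bob": 47_000, "Charlie": 20_000, "David": 21_000}
without_david = {k: v for k, v in with_david.items() if k != "David"}

rng = random.Random(0)  # fixed seed only so the demo is repeatable
print(noisy_count(with_david, 25_000, epsilon=1.0, rng=rng))
print(noisy_count(without_david, 25_000, epsilon=1.0, rng=rng))
```

Because both answers are drawn from overlapping distributions, a single noisy result does not reveal with certainty whether David is in the table.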

17 Practical Consequences of DP: An individual’s inclusion in the dataset is unlikely to be a privacy risk. The answers to the queries can still be useful.

18 Achieving Differential Privacy in RDF: Current techniques for differential privacy were developed for relational databases. As a first approximation, reduce the triple-store to a relational database. Improve the mechanism as the project progresses.

19 Example of RDF-RDBS Reduction
RDF:
:Person1 foaf:name "Alice"; foaf:member :DIG; foaf:age "21"; foaf:knows :Person2, :Person3.
:Person2 foaf:name "Bob"; foaf:member :DIG; foaf:knows :Person3.
:Person3 foaf:name "Charlie"; foaf:age "22".
Relational table:
ID | foaf:name | foaf:member | foaf:knows | foaf:age
Person1 | "Alice" | DIG | [Person2, Person3] | "21"
Person2 | "Bob" | DIG | [Person3] | None
Person3 | "Charlie" | None | None | "22"
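The reduction in this example can be sketched as a grouping of triples by subject, with one column per predicate. This is an illustrative Python sketch under simplifying assumptions (triples are already parsed into plain tuples, and a predicate becomes a list-valued column if any subject has more than one value for it); `triples_to_rows` is a hypothetical name, not part of SPIM.

```python
from collections import defaultdict

# The triples from the example, as (subject, predicate, object) tuples.
TRIPLES = [
    ("Person1", "foaf:name", "Alice"),
    ("Person1", "foaf:member", "DIG"),
    ("Person1", "foaf:age", "21"),
    ("Person1", "foaf:knows", "Person2"),
    ("Person1", "foaf:knows", "Person3"),
    ("Person2", "foaf:name", "Bob"),
    ("Person2", "foaf:member", "DIG"),
    ("Person2", "foaf:knows", "Person3"),
    ("Person3", "foaf:name", "Charlie"),
    ("Person3", "foaf:age", "22"),
]

def triples_to_rows(triples):
    """Return {subject: {predicate: value}} with one column per predicate."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    columns = sorted({p for _, p, _ in triples})
    # A column holds lists if any subject has several values for it.
    multi = {p for by_pred in grouped.values()
             for p, objs in by_pred.items() if len(objs) > 1}
    rows = {}
    for s, by_pred in grouped.items():
        row = {}
        for p in columns:
            objs = by_pred.get(p, [])
            if not objs:
                row[p] = None          # no triple for this predicate
            elif p in multi:
                row[p] = objs          # multi-valued column keeps the list
            else:
                row[p] = objs[0]       # single-valued column keeps a scalar
        rows[s] = row
    return rows

for subject, row in triples_to_rows(TRIPLES).items():
    print(subject, row)
```

Once the data is in this shape, techniques developed for relational tables can be applied row by row, one row per individual.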

20 Proposed Solution: SPARQL Privacy Insurance Module (SPIM). Build a layer between the user and the endpoint. Integrate both AIR and differential privacy. Integrate a credential-checking system. Modify an existing differential privacy framework for use with triple-stores.

21 Contributions: Complete privacy protection for triplestores. Differential privacy sensitivity for SPARQL 1.1 aggregate functions, including count, sum, avg, min, and max.
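The slides do not give the sensitivity values themselves. As a rough illustration of what such a table could look like, the sketch below returns conservative upper bounds under two assumptions that are not from the source: every value is clamped to a known range [lo, hi], and neighbouring non-empty datasets differ by adding or removing one record. The function name `sensitivity_bound` is hypothetical.

```python
def sensitivity_bound(aggregate: str, lo: float, hi: float) -> float:
    """Upper bound on how much one record can change the aggregate."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    if aggregate == "count":
        # One record more or less changes a count by exactly 1.
        return 1.0
    if aggregate == "sum":
        # The added or removed value has magnitude at most max(|lo|, |hi|).
        return max(abs(lo), abs(hi))
    if aggregate in ("avg", "min", "max"):
        # The result always lies in [lo, hi], so it can move by at most hi - lo.
        return hi - lo
    raise ValueError(f"unsupported aggregate: {aggregate}")

for name in ("count", "sum", "avg", "min", "max"):
    print(name, sensitivity_bound(name, lo=0, hi=500))
```

These bounds are deliberately loose for avg, min, and max; tighter, data-dependent bounds exist but need more care to keep the privacy guarantee.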

22 System Overview

23 [Architecture diagram. Components: SPIM Privacy Module; TAAC Credential Checking; AIR Rule Based Privacy; Differential Privacy Module; SPARQL Endpoint; User Interface; Policy Files; User Data; Service Description.]


25 TAAC will: Verify that the user has permission to access the data. Send the central module data about the user.

26 SPIM: Controls the order of privacy operations. Interfaces with the SPARQL endpoint.

27 AIR: Reasoner that uses rule-based policies to check queries for privacy hazards. Extracts information for differential privacy.

28 Policy Files: Contain the rules for AIR.

29 Differential Privacy Module: Checks query limits (based on ɛ use). Applies noise to statistical data.

30 User Data: Contains user ɛ data.

31 SPIM: Controls the order of privacy operations. Interfaces with the SPARQL endpoint.

32 Service Description: Contains information to be used for the addition of noise.

33 Miscellaneous: Interface to SPARQL Endpoint; Transaction File; Improved Differential Privacy Output; Service Description Generator.

34 Potential Extensions: Robustness against attacks; Concurrency; Optimization for large systems; Customizable UI; Accountability.

35 Sample Scenario: Triplestore data mining in biotechnological applications. Biofirm provides data about hospitals in the US. Alice is a PhD student at MIT. Alice would like to query Biofirm’s database for research purposes. She received permission yesterday and is logging in for the first time.

36 Preprocessing: Biofirm installs SPIM and runs the service description generation code. ◦ May need to create the correct interface. Biofirm makes sure the UI is accessible online.

37 Sample Compliant Query: Alice would like to know the total number of visits that Boston hospitals received.
SELECT (SUM(?s) AS ?people) WHERE { ?h a biofirm:Hospital . ?h biofirm:visits ?s . ?h biofirm:location geo:Boston . }
Epsilon value: 1.0

38 Alice enters the query into the provided user interface.

39 TAAC ensures that Biofirm has given Alice access to its triple-store.

40 The query request arrives at the SPIM central module.

41 Policyrunner is called upon to check the query for triple patterns that are in violation. No violations are found. Since this is Alice’s first time, AIR extracts what type of permissions Alice has.

42 SPIM creates a profile for Alice. Gives her an ɛ value (suppose it is 2.0). Stores it in the triple store.

43 SPIM extracts which variables will yield statistical results and will have differential privacy applied.

44 The Differential Privacy module ensures that the query will not exceed the given epsilon value.

45 This is Alice’s first time, her epsilon value is 2.0, and the epsilon for this query is 1.0. Everything looks good.
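The budget check in this step can be sketched as simple bookkeeping: each query spends its ɛ from the user's remaining allowance, relying on the standard result that the ɛ values of successive queries add up. The class below is a hypothetical illustration, not SPIM's actual data model; only the numbers 2.0 and 1.0 come from the slides.

```python
class EpsilonBudget:
    """Tracks how much epsilon a user may still spend."""

    def __init__(self, total: float):
        self.remaining = total

    def can_afford(self, cost: float) -> bool:
        """A query is allowed only if its epsilon fits in what is left."""
        return 0 < cost <= self.remaining

    def charge(self, cost: float) -> None:
        """Deduct the query's epsilon, refusing queries that do not fit."""
        if not self.can_afford(cost):
            raise ValueError("query would exceed the remaining epsilon budget")
        self.remaining -= cost

alice = EpsilonBudget(total=2.0)   # Alice's profile value from the scenario
print(alice.can_afford(1.0))       # her first query asks for epsilon 1.0
alice.charge(1.0)
print(alice.remaining)
```

After this query Alice has 1.0 left, so a later query asking for more than that would be refused rather than answered with less noise than the policy allows.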

46 The query is sent to the endpoint. Results are received.

47 The Differential Privacy module adds noise to the appropriate fields and updates the epsilon values.

48 SPIM is ready to return the results.

49 Alice receives the results.
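The order of operations in this walk-through can be summarized in a short orchestration sketch. Everything here is hypothetical: the function `handle_query` and its callable parameters stand in for TAAC, the AIR policy check, the SPARQL endpoint, and the noise step, none of whose real interfaces are given in the slides. The default initial budget of 2.0 mirrors the value supposed for Alice.

```python
from typing import Callable

def handle_query(user: str, query: str, epsilon: float, *,
                 has_access: Callable[[str], bool],
                 is_compliant: Callable[[str], bool],
                 budgets: dict[str, float],
                 run_on_endpoint: Callable[[str], float],
                 add_noise: Callable[[float, float], float],
                 initial_budget: float = 2.0) -> float:
    """Run one query through the checks in the order the scenario describes."""
    # 1. Credential check (TAAC in the slides).
    if not has_access(user):
        raise PermissionError(f"{user} has no access to the triple-store")
    # 2. Rule-based policy check (AIR in the slides).
    if not is_compliant(query):
        raise ValueError("query violates a privacy policy")
    # 3. Create a profile on first use, then check the epsilon budget.
    remaining = budgets.setdefault(user, initial_budget)
    if epsilon <= 0 or epsilon > remaining:
        raise ValueError("query would exceed the user's epsilon budget")
    # 4. Query the endpoint, add noise, and only then update the budget.
    exact = run_on_endpoint(query)
    noisy = add_noise(exact, epsilon)
    budgets[user] = remaining - epsilon
    return noisy

budgets: dict[str, float] = {}
result = handle_query(
    "alice", "SELECT (SUM(?s) AS ?people) WHERE { ... }", 1.0,
    has_access=lambda u: u == "alice",
    is_compliant=lambda q: "type:F" not in q,
    budgets=budgets,
    run_on_endpoint=lambda q: 1200.0,   # made-up stand-in for the endpoint
    add_noise=lambda value, eps: value, # identity stub; a real module adds noise
)
print(result, budgets)
```

Passing the components in as callables keeps the sketch independent of any particular endpoint or reasoner, which matches the goal of sitting as a layer between the user and an existing SPARQL endpoint.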

50 Summary: The system will combine rule-based privacy with differential privacy. Develop differential privacy techniques for semantic web data. Make the privacy module client- and administrator-friendly.

51 References
[1] K-Anonymity: http://spdp.dti.unimi.it/papers/k-Anonymity.pdf
[2] PINQ: http://research.microsoft.com/pubs/80218/sigmod115-mcsherry.pdf
[3] AIRAVAT: http://www.cs.utexas.edu/~shmat/shmat_nsdi10.pdf
[4] AIR: http://dig.csail.mit.edu/TAMI/2008/12/AIR/
[5] AIR Policy Example: http://dig.csail.mit.edu/2009/IARPA-PIR/usecase1/generic-policies.n3
[6] Differential Privacy Under Fire: http://www.usenix.org/events/sec11/tech/full_papers/Haeberlen.pdf

