DOE Scientific Data Management Center – Scientific Process Automation A Model for Sharing of Confidential Provenance Information in a Query Based System.


1 DOE Scientific Data Management Center – Scientific Process Automation
A Model for Sharing of Confidential Provenance Information in a Query Based System
Meiyappan Nagappan, Mladen A. Vouk
North Carolina State University
IPAW 2008, June 17th, 2008

2 Agenda
• Problem Motivation
• A scenario: Sharing Provenance
• Research Objective
• Implementation Model
• Discussions
• Conclusion
• Future Work

3 Problem Motivation
• Provenance is increasingly being used as part of analyses to speed up the process, extend its scope beyond raw data, and enable handling of very large data sets.
• Attendant problem:
  - Sharing of provenance information
  - Keeping this information appropriately but selectively confidential/protected
• Confidentiality: “Ensuring that information is accessible only to those authorized to have access” (ISO/IEC)

4 Problem Motivation
• Unauthorized access to provenance could be used to
  - Reverse engineer a process
  - Compromise the privacy of the user
  - Etc.
• On the other hand, lack of sharing for the sake of confidentiality could hinder scientific discovery
• Frequent current solution: export and mail the data that is to be shared
  - Duplication of data: metadata sets are large and growing
  - A typical simulation may generate ~1 GB of metadata
  - Cannot revoke access

5 Scenario: Sharing Provenance
[Figure: scenario diagram with nodes A, B, C; R1, R2, R3; S11, S12, S21, S31, S32, S33]

6 Research Goal
The goal of the current work is to develop a model, in the context of provenance for scientific simulations, that
• Enables easy sharing of provenance data
• Allows for dynamic changes in the confidentiality levels to serve multiple and different users
• Does not compromise the confidentiality of the provenance data (including privacy)

7 Implementation Model - Architecture
[Figure: architecture diagram. Components: Super Computer running Simulations; Laptop running Kepler; Provenance Store; Web Interface to Query Provenance; Authorization Service; API; MGMT. API. Flow labels: Query, Record]

8 Sub Goals
• The goal is to build a model that enables sharing provenance in an environment where the confidentiality level changes dynamically
• We attempt to achieve the goal through the following 5 objectives (sub-goals):
  - Sub Goal 1: The person who generates simulation data is the owner of the original provenance data
  - Sub Goal 2: Users cannot edit/delete; the administrator can, but must leave an audit trail
  - Sub Goal 3: Owner can annotate their data
  - Sub Goal 4: Owner can choose collaborators
  - Sub Goal 5: Auditors have full read-only access

9 Sub Goal 1
• What? The person who generates simulation data is the owner of the original provenance data
• Why? Each dataset is clearly traced to one owner
• What is the risk? Dispute over who has the authority to share the data in the first place
• Implementation? 3-tiered approach: Client, Application Logic, Database

10 Sub Goal 2
• What? Editing and Audit Trail
  - No edits/deletes by owner, collaborator, or other users
  - Administrator can edit, but must leave an audit trail
• Why?
  - Consistency of data (particularly shared data)
  - Auditing
• Risk? The collaborator may get different results each time
• How?
  - Restrict privileges at DB level
  - Log all super user actions
[Figure: Provenance Store, MGMT. API]
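
A minimal sketch of the two "How?" items of Sub Goal 2, assuming a SQLite-backed store. The application layer rejects edits from anyone who is not an administrator, and each administrator edit writes an audit row in the same transaction. The table names, column names, role strings, and the helpers open_store and edit_record are hypothetical and do not come from the presentation. SQLite has no user privileges, so the role check here only stands in for DB-level privilege restriction.

```python
import sqlite3
from datetime import datetime, timezone


def open_store():
    """Create an in-memory store with a provenance table and an audit log."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE provenance (
            id INTEGER PRIMARY KEY,
            owner TEXT NOT NULL,
            detail TEXT NOT NULL
        );
        CREATE TABLE audit_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            actor TEXT NOT NULL,
            action TEXT NOT NULL,
            record_id INTEGER NOT NULL,
            old_detail TEXT,
            new_detail TEXT,
            at TEXT NOT NULL
        );
        """
    )
    return conn


def edit_record(conn, actor, role, record_id, new_detail):
    """Edit a provenance record; only administrators may do so, and every edit is logged."""
    if role != "administrator":
        raise PermissionError(f"{actor} ({role}) may not edit provenance records")
    row = conn.execute(
        "SELECT detail FROM provenance WHERE id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    # The edit and its audit entry commit together or not at all.
    with conn:
        conn.execute(
            "UPDATE provenance SET detail = ? WHERE id = ?", (new_detail, record_id)
        )
        conn.execute(
            "INSERT INTO audit_log (actor, action, record_id, old_detail, new_detail, at) "
            "VALUES (?, 'edit', ?, ?, ?, ?)",
            (actor, record_id, row[0], new_detail,
             datetime.now(timezone.utc).isoformat()),
        )
```

Keeping the old value in the audit row lets a later reviewer reconstruct what a collaborator would have seen before the edit.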

11 Sub Goal 3
• What? Data Annotation
• Why?
  - User-specified metadata
  - Collaborator may have a different interpretation
• Risk?
  - Loss of valuable metadata about provenance
  - Cannot flag inaccurate data, and would therefore need delete privileges
• How?
  - Annotation field in all tables of the schema
  - Through the Web Interface (WI), annotate Provenance Data and Saved Queries
[Figure: Provenance Store, WI, API, Query]

12 Sub Goal 4
• What? Data Sharing with dynamically changing confidentiality levels
• Why?
  - To share data on a “What You See Is What You Want To Share” basis
  - Each time a different subset of the data
• Risk?
  - Share entire data set or nothing
  - Disk space wasted for saving a separate copy of the subset
• How? Query Sharing

13 Sub Goal 4 (contd.)
[Figure: query-sharing flow between User, Authorization, API, and DB]
• Authentication labels: Username, Password, Authenticate
• Operation labels: Request Data, Execute Query, Return Data, Save data for Collaborator, Save the Query, View Queries Saved for me by other Collaborators, View Data Saved in Query for me by other Collaborators, Annotate the Query
• Query Table labels: Query ID, Saved by, Saved for, Query, Timestamp, Allow Cascading, Revoke, Active
• Annot Table labels: Query ID, User ID, Annotation, Viewable
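
The query-sharing flow on this slide can be sketched as follows, assuming a SQLite-backed store. The owner saves a query for a collaborator, the collaborator sees whatever that query returns at viewing time, and the owner can switch the share off. The schema is a simplified guess loosely based on the Query Table labels (Allow Cascading is omitted), and the function names, column names, and sample rows are hypothetical. Executing stored SQL text as-is is done only to keep the sketch short.

```python
import sqlite3
from datetime import datetime, timezone


def make_store():
    """Create an in-memory store with provenance rows and a table of shared queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE provenance (run_id TEXT, step TEXT, owner TEXT);
        CREATE TABLE query_table (
            query_id INTEGER PRIMARY KEY AUTOINCREMENT,
            saved_by TEXT NOT NULL,
            saved_for TEXT NOT NULL,
            query TEXT NOT NULL,
            saved_at TEXT NOT NULL,
            active INTEGER NOT NULL DEFAULT 1
        );
        """
    )
    return conn


def save_query_for(conn, saved_by, saved_for, query):
    """Share a subset of provenance by saving the query that selects it."""
    with conn:
        cur = conn.execute(
            "INSERT INTO query_table (saved_by, saved_for, query, saved_at) "
            "VALUES (?, ?, ?, ?)",
            (saved_by, saved_for, query, datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid


def revoke(conn, query_id, saved_by):
    """Deactivate a share; only the user who saved the query can do so."""
    with conn:
        conn.execute(
            "UPDATE query_table SET active = 0 WHERE query_id = ? AND saved_by = ?",
            (query_id, saved_by),
        )


def view_shared(conn, query_id, user):
    """Re-run a saved query for the collaborator it was saved for, if still active."""
    row = conn.execute(
        "SELECT query FROM query_table "
        "WHERE query_id = ? AND saved_for = ? AND active = 1",
        (query_id, user),
    ).fetchone()
    if row is None:
        raise PermissionError("no active shared query for this user")
    return conn.execute(row[0]).fetchall()
```

Because only the query text is stored, no copy of the shared subset is written to disk, and revoking access is a single flag update rather than a request to delete mailed data.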

14 Sub Goal 4 (contd.): Why Query Sharing
• Dynamically decide what to share
• Size of the set of information to be shared is large
• Subset of information rather than individual records

15 Sub Goal 5
• What? Data Audit and Verification
• Why?
  - Prevent tampering by malicious users
  - Maintain accuracy
• Risk?
  - Collaborators may try to break the system
  - Administrators may misuse super user privileges
• How?
  - Authorized and authenticated auditors
  - Full read-only access to original data, provenance data, annotations, edit trails, and logs of super user actions
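
One way to approximate full read-only auditor access, sketched here under the assumption of a SQLite file store, is to give auditors a connection on which the database engine itself refuses writes. The sketch uses SQLite's query_only pragma. The function names and the three table names are hypothetical, and authenticating the auditor is left out.

```python
import sqlite3


def open_auditor_connection(db_path):
    """Open a connection on which SQLite rejects every write statement."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA query_only = ON")
    return conn


def audit_snapshot(conn):
    """Read everything an auditor may inspect: data, annotations, and edit logs."""
    return {
        "provenance": conn.execute(
            "SELECT id, owner, detail FROM provenance ORDER BY id"
        ).fetchall(),
        "annotations": conn.execute(
            "SELECT record_id, author, note FROM annotations ORDER BY record_id"
        ).fetchall(),
        "audit_log": conn.execute(
            "SELECT actor, action, record_id FROM audit_log ORDER BY id"
        ).fetchall(),
    }
```

Enforcing the restriction on the connection, rather than trusting the auditing tool to issue only SELECT statements, means a buggy or compromised auditor client still cannot alter the store through that connection.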

16 Issues
• The model is query-centric
• Automatic run-time collection of provenance data is required
• Restricted to provenance data from scientific workflow systems
• Collaborator can annotate the shared subset only as a whole
• Does not address issues in long-term storage and scalability

17 Conclusion
• With the increasing emphasis on provenance data collection in scientific workflows, the issue of its confidentiality becomes more important
• Not much research has been done in this area of provenance
• This model addresses confidentiality in a collaborative environment
• Tradeoff: Disk Space : Time :: Query Sharing : Data Sharing

18 Future Work
• Validating our model against other solutions using different threat scenarios
• Responsibility for sharing data is with the user
  - Privacy of the user is at stake
  - Tools required to foresee inferences from provenance data
• Large data sets: provenance data and shared queries grow steadily in size
  - Accessing them will be difficult
  - Tools required to improve the HCI aspect

19 Questions?

20 Related Work: References
[1] Hasan, R., Sion, R. and Winslett, M.: Introducing secure provenance: problems and challenges. Proceedings of the 2007 ACM Workshop on Storage Security and Survivability, ACM, Alexandria, Virginia, USA (2007)
[2] Griffiths, P.P. and Wade, B.W.: An authorization mechanism for a relational database system. ACM Transactions on Database Systems 1 (3) (Sep. 1976)
[3] Sandhu, R. and Samarati, P.: Authentication, access control, and audit. ACM Computing Surveys 28 (1) (Mar. 1996)
[4] Tan, V., Groth, P., Miles, S., Jiang, S., Munroe, S., Tsasakou, S. and Moreau, L.: Security Issues in a SOA-Based Provenance System. LNCS, Volume 4145 (Provenance and Annotation of Data). Springer Berlin / Heidelberg (2006)

