A Propagation Model for Provenance Views of Public/Private Workflows. Susan Davidson (U. of Pennsylvania), Tova Milo (Tel Aviv U.), Sudeepa Roy (U. of Washington).

Presentation transcript:

1 A Propagation Model for Provenance Views of Public/Private Workflows. Susan Davidson (U. of Pennsylvania), Tova Milo (Tel Aviv U.), Sudeepa Roy (U. of Washington). ICDT 2013, 3/19/2013.

3 Workflows. A workflow is a visual representation of a number of processes that interact to produce one or more outputs given some inputs. It is modeled as a directed acyclic graph: vertices are modules/processes and edges are dataflow. In an execution of the workflow, data values (d1, d2, d3) appear on the edges. [Figure: example workflow with modules start, Split Entries, Align Sequences, Functional Data, Curate Annotations, Format, Construct Trees, end.]

4 Data Provenance in Workflows. Provenance answers questions such as "Which processes were executed?" and "How has this tree been generated?" Tracking provenance means recording and showing all data values in all executions (Run 1, Run 2, ...). It helps validate the experiment and supports repeatability and debugging. But workflows contain many private/proprietary elements. Our focus: module privacy. [Figure: the example workflow (start, Split Entries, Align Sequences, Functional Data, Curate Annotations, Format, Construct Trees, end) with provenance values d1, d2, d3 recorded per run.]

5 Motivation: Module Privacy. Revealing all data as provenance in an execution can reveal module behavior. Goal: partially hide provenance to protect the privacy of modules when they belong to a workflow.

6 Public/Private Workflows. Private modules (the user has no a priori knowledge of them), e.g. modules for gene sequencing, drug synthesis, etc. Public modules (the user has full knowledge of them), e.g. modules for reformatting, sorting, display, etc. [Figure: a workflow whose modules are labeled Public or Private; one module is labeled Reformatting.]

7 Definition: Module Privacy. A module f takes input x and produces output y = f(x); in the example, f has inputs x1, x2, x3, x4 and outputs y1, y2, y3. Given a privacy requirement L, for all inputs x to a private module f, f(x) has ≥ L ‘equivalent’ candidate values w.r.t. the visible provenance data (similar to L-diversity [MKGV’07]). [Figure: the relation of f(x1, x2, x3, x4); table contents not recoverable.]
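
The privacy requirement can be checked mechanically on a small module. The sketch below is an illustration only, not the authors' implementation. It assumes boolean attributes, stores the module as a list of rows, and assumes that a value y is a candidate output for x when some row agrees with x on the visible inputs and with y on the visible outputs (hidden outputs are unconstrained). The module in RELATION, the attribute names, and both function names are hypothetical.

```python
from itertools import product

# Hypothetical private module: y1 = x1 AND x2, y2 = x1 XOR x2 (boolean attributes).
INPUTS = ["x1", "x2"]
OUTPUTS = ["y1", "y2"]
RELATION = [
    {"x1": a, "x2": b, "y1": a & b, "y2": a ^ b}
    for a, b in product((0, 1), repeat=2)
]


def candidate_outputs(relation, inputs, outputs, visible, x):
    """Outputs that input row x could map to, given only the visible attributes.

    Assumption: y is a candidate if some row agrees with x on the visible
    inputs and with y on the visible outputs; hidden outputs may take any
    boolean value.
    """
    vis_in = [a for a in inputs if a in visible]
    vis_out = [a for a in outputs if a in visible]
    hid_out = [a for a in outputs if a not in visible]

    # Visible-output projections of all rows indistinguishable from x.
    seen = {
        tuple(row[a] for a in vis_out)
        for row in relation
        if all(row[a] == x[a] for a in vis_in)
    }

    candidates = set()
    for vis_vals in seen:
        # Fill the hidden outputs with every possible boolean combination.
        for hid_vals in product((0, 1), repeat=len(hid_out)):
            y = dict(zip(vis_out, vis_vals))
            y.update(zip(hid_out, hid_vals))
            candidates.add(tuple(y[a] for a in outputs))
    return candidates


def is_standalone_private(relation, inputs, outputs, visible, L):
    """True if every input row has at least L candidate outputs."""
    return all(
        len(candidate_outputs(relation, inputs, outputs, visible, row)) >= L
        for row in relation
    )
```

For this toy module, hiding only y2 leaves two candidates per input, so the requirement holds for L = 2 but not for L = 3; with nothing hidden, each input has exactly one candidate.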

8 ‘Equivalent Candidates’ and Provenance Views. Module executions are viewed as relations with functional dependencies; for a workflow x → y → z the dependencies are x → y and y → z. The output is a provenance view (incomplete provenance): the projection on the visible attributes. Possible worlds are the relations that have the same projection and respect the functional dependency. Standalone-private view: each input maps to L = 2 different outputs across the possible worlds. Workflow-private view: possible worlds should respect all functional dependencies (x → y and y → z); a relation that violates one of them is not a possible world. [Figure: relations over x, y, z for Run 1 and Run 2, with their possible worlds and one relation marked as not a possible world.]
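
The possible-world test described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions: relations are lists of rows with set-semantics projection, and the two-run data, the attribute names, and all function names are hypothetical rather than taken from the slides.

```python
def project(relation, attrs):
    """Set-semantics projection of a list of rows onto attrs."""
    attrs = sorted(attrs)
    return {tuple(row[a] for a in attrs) for row in relation}


def satisfies_fd(relation, lhs, rhs):
    """True if equal values on lhs always imply equal values on rhs."""
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, val) != val:
            return False
    return True


def is_possible_world(original, candidate, visible, fds):
    """Same visible projection as the original, and every dependency holds."""
    return project(candidate, visible) == project(original, visible) and all(
        satisfies_fd(candidate, lhs, rhs) for lhs, rhs in fds
    )


# Hypothetical two-run provenance for a chain x -> y -> z, with y hidden.
ORIGINAL = [{"x": 0, "y": 0, "z": 1}, {"x": 1, "y": 1, "z": 0}]
VISIBLE = {"x", "z"}
STANDALONE_FDS = [(["x"], ["y"])]               # first module on its own
WORKFLOW_FDS = [(["x"], ["y"]), (["y"], ["z"])]  # whole workflow

SWAPPED = [{"x": 0, "y": 1, "z": 1}, {"x": 1, "y": 0, "z": 0}]
COLLAPSED = [{"x": 0, "y": 0, "z": 1}, {"x": 1, "y": 0, "z": 0}]
```

Here SWAPPED is a possible world in both settings, while COLLAPSED is a possible world for the first module standalone but not for the workflow, because it violates y → z.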

9 Previous Work: Module Privacy for Workflow Provenance [Davidson-Khanna-Milo-Panigrahi-R.: PODS’11]. “Composability Theorem” if all modules are private (no public modules): any combination of standalone-private views gives workflow-private views for all of them. The workflow solution hides the union of the hidden attributes in the standalone solutions. [Figure: a workflow over x, y, z with no public modules.]

10 Why care about composability? Local standalone-private solutions can be composed arbitrarily to get a global workflow-private solution (which is hard to find directly). Local solutions are NP-hard too, but in the number of attributes of a single module, which is smaller than the number of attributes in the whole workflow. We can do preprocessing, or exploit module designers’ knowledge. But composability fails with public modules, which are common in workflows.

11 Problem with Public Modules. With public modules, the composability theorem does not hold any more. Our solution in [DKMPR ’11]: “privatize” some public modules. This does not work when a module’s identity can be guessed from attribute names, connections, etc. This work: propagate hiding through public modules.

12 This paper: A Propagation Model. Find a standalone-private solution for the private modules (only outputs are hidden; hiding inputs may not work in public/private workflows). In a workflow, propagate the hiding of attributes through public successors. Repeatedly propagate hiding. Can we stop at a private successor? Yes, for single-predecessor workflows. No, for general workflows.

13 Single-Predecessor Workflows. (Intuitively) every public module has at most one private predecessor. Such workflows can still have complex structure; special cases are chains and trees. Hiding is propagated in the “public closure” (reachable through an undirected public path from a hidden output attribute). Next: how much to hide.
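
The public closure can be computed with a breadth-first search. The sketch below follows the slide's wording literally and rests on assumptions that are not in the slides: the workflow is a dictionary from module name to (inputs, outputs, is_public), two modules are adjacent when an output of one is an input of the other (direction ignored), and the search starts at public modules that consume a hidden attribute. The example workflow and the function name are hypothetical.

```python
from collections import deque

# Hypothetical workflow: module name -> (input attrs, output attrs, is_public).
WORKFLOW = {
    "m1": ({"a0"}, {"a1", "a2"}, False),  # private module whose outputs get hidden
    "m2": ({"a1"}, {"a3"}, True),
    "m3": ({"a3"}, {"a4"}, True),
    "m4": ({"a2"}, {"a5"}, True),
    "m5": ({"a4"}, {"a6"}, False),        # private successor
}


def public_closure(workflow, hidden_outputs):
    """Public modules reachable from the hidden attributes via public modules only.

    Assumption: modules are adjacent when an output of one feeds an input of
    the other, and edge direction is ignored during the search.
    """
    hidden = set(hidden_outputs)

    def linked(a, b):
        ins_a, outs_a, _ = workflow[a]
        ins_b, outs_b, _ = workflow[b]
        return bool(outs_a & ins_b or ins_a & outs_b)

    public = [m for m, (_, _, is_pub) in workflow.items() if is_pub]
    # Start at public modules that read a hidden attribute.
    start = [m for m in public if workflow[m][0] & hidden]

    closure, queue = set(start), deque(start)
    while queue:
        current = queue.popleft()
        for other in public:
            if other not in closure and linked(current, other):
                closure.add(other)
                queue.append(other)
    return closure
```

In this toy workflow, hiding a1 gives the closure {m2, m3}: the search never passes through the private modules m1 and m5, so m4 is reached only if a2 is hidden as well.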

14 Upstream/Downstream Safety for Public Modules. Visible attributes of public modules should not reveal any information. Upstream/downstream-safe (UD-safe): equivalent inputs ⇒ equivalent outputs, and equivalent outputs ⇒ (all) equivalent inputs. Hiding everything is trivially UD-safe. [Figure: two hidings of a module over attributes a1, a2, a3, a4 (inputs and outputs), one UD-safe and one not UD-safe.]
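
The two implications can be tested directly on a small public module. The sketch below reads them literally and assumes that "equivalent" means agreeing on the visible attributes; that reading, the boolean module in RELATION, and the function names are assumptions made for illustration, not details from the slides.

```python
from itertools import combinations, product

# Hypothetical public module: a3 = a1, a4 = a1 XOR a2 (boolean attributes).
INPUTS = ["a1", "a2"]
OUTPUTS = ["a3", "a4"]
RELATION = [
    {"a1": p, "a2": q, "a3": p, "a4": p ^ q}
    for p, q in product((0, 1), repeat=2)
]


def agree(t, u, attrs):
    """True if rows t and u have equal values on every attribute in attrs."""
    return all(t[a] == u[a] for a in attrs)


def is_ud_safe(relation, inputs, outputs, visible):
    """Check both directions for every pair of executions.

    Assumption: two inputs (or outputs) are equivalent when they agree on the
    visible input (or output) attributes. The module is UD-safe when, for every
    pair of rows, the inputs are equivalent exactly when the outputs are.
    """
    vis_in = [a for a in inputs if a in visible]
    vis_out = [a for a in outputs if a in visible]
    return all(
        agree(t, u, vis_in) == agree(t, u, vis_out)
        for t, u in combinations(relation, 2)
    )
```

For this module, hiding everything is UD-safe, as the slide notes, and so is hiding a2 and a4 together. Hiding only a4 is not: the inputs (0, 0) and (0, 1) stay distinguishable while their visible outputs coincide.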

15 Composability Theorem for Single-Predecessor Workflows. Theorem: each private module is workflow-private if the hidden attributes satisfy:
1. The private module is standalone-private.
2. Public modules in the public closure are UD-safe.
3. No unnecessary hiding.
Two levels of composability:
1. Inside the public closure for a given private module.
2. Among different private modules.
The single-predecessor property and UD-safety are both necessary.

16 Optimal Composition for Single-Predecessor Workflows. Theorem: each private module is workflow-private if the hidden attributes satisfy:
1. The private module is standalone-private.
2. Public modules in the public closure are UD-safe.
3. No unnecessary hiding.
Procedure: find a list of standalone-private solutions for the private modules. Find a list of UD-safe solutions for the public modules. Optimally compose them to find a solution for a single private module. Arbitrarily compose these to find a solution for all private modules.
Complexity annotations on the slide: easy for single-predecessor workflows; NP-hard for general DAGs; PTIME for trees/chains.

17 Proof Sketch of Composability Theorem. Step 1: assume only one composite module in the public closure. If the individual modules are UD-safe, the composite module is also UD-safe (by induction). Analysis for a single private module is sufficient, because public closures are disjoint.

18 Proof Sketch of Composability Theorem. Step 2: standalone to workflow privacy. Privacy ⇒ many candidates for f(x). If y is a candidate for f(x) when f is standalone, y is still a candidate when f is in a workflow. Show the existence of possible worlds by redefining private modules. New conflicts at other inputs/outputs need to be handled. Public modules cannot be redefined; UD-safety helps here. The structure is more complex in general. [Figure: modules f, g, h over x, y, z, comparing expected and observed values (conflict vs. no conflict).]

19 About General Workflows. Find a standalone-private solution for the private modules (only outputs are hidden; hiding inputs may not work with public modules). In a workflow, propagate the hiding of attributes through public successors. Repeatedly propagate hiding. Can we stop at a private successor? Yes, for single-predecessor workflows. No, for general workflows. Solution: propagate through private successors as well.

20 Related Work.
Workflow privacy (mainly access control): Chebotko et al. ’08; Gil et al. ’07, ’10.
Secure provenance: Tan et al. ’06; Hasan et al. ’07; Braun et al. ’08; Ni et al. ’09; Chong ’09; Cadenhead et al. ’11; Cheney ’11.
L-diversity and its limitations: Machanavajjhala et al. ’06; Ganta et al. ’08; Kifer ’09; Fang et al. ’08; Cormode et al. ’11; Xiao et al. ’10; Wong et al. ’07.
Privacy-preserving data mining: surveys by Aggarwal-Yu ’08 and Verykios et al. ’04.
Differential privacy / privacy in statistical databases: survey by Dwork ’08.

21 Conclusions. Workflow privacy of modules by data hiding in public/private workflows. Propagating hiding through public modules. Composability theorem and optimization problems. Future work: extend to a stronger notion of privacy. Differential privacy? Randomization may not work for scientific experiments. Can our possible-world model be useful? Applicability in practice.

22 Thank You. Questions?

