1 Reverse Data Management … and the case for Reverse What-If queries
University of Washington Database Group
Alexandra Meliou, Wolfgang Gatterbauer, Dan Suciu
http://db.cs.washington.edu/causality/

2 Forward and Backward Paradigm
Forward transformations (source data to target data), e.g.: query processing, data integration, data mining, clustering, indexing

3 Forward and Backward Paradigm
Forward transformations (source data to target data), e.g.: query processing, data integration, data mining, clustering, indexing
Backward transformations (target data to source data), e.g.: data cleaning, provenance, causality, data generation, view updates
The backward direction is Reverse Data Management (RDM)

4 The Problem Space of RDM
Target data:
- explicit specification: a specific data instance, or diffs between versions, e.g. before and after a view update
- implicit specification: described indirectly, through constraints and statistics, e.g. declarative data generation

5 The Problem Space of RDM
Target data: explicit specification or implicit specification
Source data:
- reference source: source data needs to be modified in order to achieve the desired effect in the output, e.g. view updates
- no source: no source data is provided as a reference; it needs to be computed from scratch, e.g. inverse schema mappings

6 The Problem Space of RDM
Axes: target data (explicit or implicit specification), source data (reference source or no source)
View updates: modify the source data to achieve the desired effect, while minimizing side-effects
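
To make "minimizing side-effects" concrete, here is a minimal sketch of deleting one tuple from a join view. The relations `emp_dept` and `dept_mgr`, their contents, and the function names are all hypothetical and not taken from the slides; the brute-force search is only an illustration, not the authors' method.

```python
def view(emp_dept, dept_mgr):
    """Forward transformation: join on department, project (employee, manager)."""
    return {(e, m) for (e, d) in emp_dept for (d2, m) in dept_mgr if d == d2}


def best_deletion(emp_dept, dept_mgr, target):
    """Find the single source-tuple deletion that removes `target` from the
    view while losing the fewest other view tuples (the side effects)."""
    before = view(emp_dept, dept_mgr)
    if target not in before:
        return None
    candidates = []
    for t in emp_dept:
        after = view(emp_dept - {t}, dept_mgr)
        if target not in after:
            # Every lost view tuple other than the target is a side effect.
            candidates.append((len(before - after) - 1, "emp_dept", t))
    for t in dept_mgr:
        after = view(emp_dept, dept_mgr - {t})
        if target not in after:
            candidates.append((len(before - after) - 1, "dept_mgr", t))
    return min(candidates)


# Hypothetical source data.
EMP_DEPT = {("ann", "db"), ("bob", "db")}
DEPT_MGR = {("db", "dan")}

print(best_deletion(EMP_DEPT, DEPT_MGR, ("ann", "dan")))
```

Deleting the `dept_mgr` tuple would also remove bob's view tuple, so the search prefers deleting ann's `emp_dept` tuple, which has no side effects.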

7 The Problem Space of RDM
Axes: target data (explicit or implicit specification), source data (reference source or no source)
View updates
Provenance, Causality: trace the source tuples that correspond to the target tuples of interest
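
Tracing source tuples can be sketched as recording, for each view tuple, the combinations of source tuples that derive it. The relation names and data below are hypothetical, and this is only one simple notion of lineage, not the definition used by the authors.

```python
def view_with_lineage(emp_dept, dept_mgr):
    """Join on department and project (employee, manager), remembering
    which pairs of source tuples produced each view tuple."""
    lineage = {}
    for e, d in emp_dept:
        for d2, m in dept_mgr:
            if d == d2:
                derivation = (("emp_dept", (e, d)), ("dept_mgr", (d2, m)))
                lineage.setdefault((e, m), set()).add(derivation)
    return lineage


# Hypothetical source data: ann works in two departments with the same manager.
EMP_DEPT = [("ann", "db"), ("ann", "sys"), ("bob", "db")]
DEPT_MGR = [("db", "dan"), ("sys", "dan")]

LINEAGE = view_with_lineage(EMP_DEPT, DEPT_MGR)
for derivation in sorted(LINEAGE[("ann", "dan")]):
    print(derivation)
```

The view tuple `("ann", "dan")` has two independent derivations here, which is the kind of information a backward trace has to surface.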

8 The Problem Space of RDM
Axes: target data (explicit or implicit specification), source data (reference source or no source)
View updates
Provenance, Causality
Constraint-based repair: repair a data instance in order to satisfy a constraint

9 The Problem Space of RDM
Axes: target data (explicit or implicit specification), source data (reference source or no source)
Problems placed in the space: inversion mappings, view updates, provenance, causality, data generation, constraint-based repair

10 Introducing Reverse What-If Queries
Axes: target data (explicit or implicit specification), source data (reference source or no source)
Problems placed in the space: inversion mappings, view updates, provenance, causality, data generation, constraint-based repair
New in the space: Reverse What-If, or How-To, queries

11 Hypothetical (What-If) Queries
- Example from [Balmin et al. VLDB 2000]:
- “An analyst of a brokerage company wants to know what would be the effect on the return of customers’ portfolios if during the last 3 years they had suggested Intel stocks instead of Motorola”
How would the target data change, given a change in the source?
Forward: change something in the source (hypothesis), then observe the effect in the target
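
A what-if query runs in the forward direction: apply the hypothesis to the source, then recompute the target. The sketch below illustrates this for the portfolio example; the holdings, the growth factors, and the function names are invented for illustration and do not come from the slides or from the cited paper.

```python
def portfolio_return(holdings, growth):
    """Relative return over rows of (customer, stock, amount_invested)."""
    invested = sum(amount for _, _, amount in holdings)
    final_value = sum(amount * growth[stock] for _, stock, amount in holdings)
    return final_value / invested - 1.0


def what_if(holdings, old_stock, new_stock):
    """Hypothesis: every recommendation of old_stock had been new_stock."""
    return [(c, new_stock if s == old_stock else s, a) for c, s, a in holdings]


# Made-up data: value after 3 years of one unit invested in each stock.
GROWTH = {"INTC": 1.5, "MOT": 1.0, "IBM": 1.2}
HOLDINGS = [("c1", "MOT", 100.0), ("c2", "IBM", 100.0)]

actual = portfolio_return(HOLDINGS, GROWTH)
hypothetical = portfolio_return(what_if(HOLDINGS, "MOT", "INTC"), GROWTH)
print(f"actual: {actual:.2%}, hypothetical: {hypothetical:.2%}")
```

The source change is fixed by the analyst; the system only has to evaluate its effect.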

12 Reverse What-If, or How-To queries
- Modified example:
- “An analyst wants to figure out how to achieve a 10% return in customer portfolios, with the least number of trades”
What is the best hypothetical scenario that achieves the desired outcome?
Reverse: declare a desired effect in the target, then find changes to the source that achieve the desired effect
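
The reverse direction can be sketched as a search over hypothetical source changes. The brute-force search below tries 0 trades, then 1, then 2, and so on, and stops at the first plan that reaches the target return. The data, the definition of a "trade" (moving one holding to a different stock), and all names are assumptions made for illustration; exhaustive search is only a stand-in for a real optimizer.

```python
from itertools import combinations, product


def reaches_target(holdings, final_value_per_100, target_pct):
    """True if the portfolio return is at least target_pct percent.
    Uses integer arithmetic: sum(amount * value_per_100) is 100x the final value."""
    invested = sum(amount for _, _, amount in holdings)
    scaled_value = sum(a * final_value_per_100[s] for _, s, a in holdings)
    return scaled_value >= invested * (100 + target_pct)


def how_to(holdings, final_value_per_100, target_pct):
    """Return a smallest list of (row_index, new_stock) trades that reaches
    the target, or None if no plan does."""
    stocks = sorted(final_value_per_100)
    for k in range(len(holdings) + 1):
        for positions in combinations(range(len(holdings)), k):
            for new_stocks in product(stocks, repeat=k):
                # Replacing a stock by itself is not a trade.
                if any(holdings[i][1] == s for i, s in zip(positions, new_stocks)):
                    continue
                changed = list(holdings)
                for i, s in zip(positions, new_stocks):
                    customer, _, amount = holdings[i]
                    changed[i] = (customer, s, amount)
                if reaches_target(changed, final_value_per_100, target_pct):
                    return list(zip(positions, new_stocks))
    return None


# Made-up data: final value of 100 units invested in each stock.
VALUE_PER_100 = {"INTC": 150, "MOT": 100, "IBM": 120}
HOLDINGS = [("c1", "MOT", 100), ("c2", "IBM", 100), ("c3", "MOT", 100)]

plan = how_to(HOLDINGS, VALUE_PER_100, target_pct=10)
print(plan)
```

Unlike the what-if case, the analyst states only the desired outcome and the objective (fewest trades); the changes to the source are the answer.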

13 Example
- Company reorganization: a company going through financial strain wants to reduce operational costs by 10% (constraints), through:
  - lay-offs, salary decreases, or department and project merging (variables),
- within certain constraints specified by the company’s requirements (constraints):
  - any salary decreases should be uniform across employees of the same department,
  - every project should have at least a certain number of employee hours devoted to it,
- the solution should be achieved with the minimum number of employee reassignments (optimization objective)

14 Declarative Problem Specification
Variable definitions:
    CREATE REPLACEMENT LineItem_N AS
      (SELECT ok, pk, sk, VAR(quant) AS quant'
       FROM LineItem)
Problem constraints:
    CREATE CONSTRAINT Constr1 AS
      NOT EXISTS (SELECT ok, sum(quant') AS c
                  FROM LineItem_N
                  GROUP BY ok
                  HAVING c > 100)
Optimization criterion:
    CREATE OBJECTIVE Obj1 AS
      SELECT sum(*)
      FROM (SELECT quant - quant'
            FROM LineItem AS L1, LineItem_N AS L2
            WHERE L1.ok = L2.ok AND L1.pk = L2.pk AND L1.sk = L2.sk)
Problem statement query:
    HOW TO minimize(Obj1)
    SUBJECT TO Constr1
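
One way to read the specification on this slide: LineItem_N is a copy of LineItem whose quantities are free variables, Constr1 requires every order's total quantity to stay at or below 100, and Obj1 is the total amount removed. The Python sketch below evaluates that reading on a made-up table. It adds one assumption that the slide does not state, namely that quantities may only decrease (0 <= quant' <= quant); under that assumption a greedy cut of each order's excess is optimal, because any feasible answer must remove at least that excess.

```python
from collections import defaultdict


def constr1(line_item_n, cap=100):
    """Constr1: no order (ok) has a total new quantity above the cap."""
    totals = defaultdict(int)
    for ok, _pk, _sk, quant_new in line_item_n:
        totals[ok] += quant_new
    return all(total <= cap for total in totals.values())


def obj1(line_item, line_item_n):
    """Obj1: sum of quant - quant' over rows matched on (ok, pk, sk)."""
    new_by_key = {(ok, pk, sk): q for ok, pk, sk, q in line_item_n}
    return sum(q - new_by_key[(ok, pk, sk)] for ok, pk, sk, q in line_item)


def greedy_replacement(line_item, cap=100):
    """Assumed repair strategy: cut each order's excess from its first rows."""
    totals = defaultdict(int)
    for ok, _pk, _sk, q in line_item:
        totals[ok] += q
    excess = {ok: max(0, total - cap) for ok, total in totals.items()}
    result = []
    for ok, pk, sk, q in line_item:
        cut = min(q, excess[ok])
        excess[ok] -= cut
        result.append((ok, pk, sk, q - cut))
    return result


# Made-up rows of (ok, pk, sk, quant).
LINE_ITEM = [(1, 10, 100, 80), (1, 11, 100, 50), (2, 10, 101, 40)]
LINE_ITEM_N = greedy_replacement(LINE_ITEM)
print(LINE_ITEM_N, constr1(LINE_ITEM_N), obj1(LINE_ITEM, LINE_ITEM_N))
```

Without the extra bound, minimizing Obj1 would also raise the quantities of orders that are below the cap, which is probably not the intended reading.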

15 System Architecture
Components: How-To query, How-To parser, How-To Engine (variables, constraints, objectives; How-To evaluation), DB, How-To answer

16 System Architecture
Components: How-To query, How-To parser, How-To Engine (variables, constraints, objectives; How-To evaluation), DB, How-To answer
User input:
- Support variable, constraint and objective specifications
- Maintain declarativity

17 System Architecture
Components: How-To query, How-To parser, How-To Engine (variables, constraints, objectives; How-To evaluation), DB, How-To answer
Evaluation requirements: Efficiency!

18 Evaluation
How-To evaluation (LP reduction): User input, LP/IP transformation, LP/IP solver, map LP/IP solution to data, How-To answer
The DB is used during evaluation
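
The three evaluation stages can be sketched end to end on a toy instance: turn rows into integer variables with linear constraints, solve, then map the solution back to rows. Everything below is an assumption made for illustration: the row layout (ok, pk, sk, quant), the per-order cap, the bound 0 <= quant' <= quant, and above all the "solver", which is plain exhaustive enumeration standing in for a real LP/IP solver. It is not the encoding used by the authors.

```python
from itertools import product


def to_integer_program(line_item, cap):
    """Transformation step: one integer variable per row, bounded by the
    original quantity, and one constraint 'sum of variables <= cap' per order."""
    variables = [(ok, pk, sk) for ok, pk, sk, _ in line_item]
    upper = [q for _, _, _, q in line_item]
    groups = {}
    for idx, row in enumerate(line_item):
        groups.setdefault(row[0], []).append(idx)
    constraints = [(idxs, cap) for idxs in groups.values()]
    return variables, upper, constraints


def solve_by_enumeration(upper, constraints):
    """Solver stand-in: try every assignment, keep the feasible one that
    minimizes the total reduction sum(upper[i] - x[i])."""
    best = None
    for x in product(*(range(u + 1) for u in upper)):
        if all(sum(x[i] for i in idxs) <= bound for idxs, bound in constraints):
            cost = sum(u - xi for u, xi in zip(upper, x))
            if best is None or cost < best[0]:
                best = (cost, x)
    return best


def map_to_data(variables, solution):
    """Mapping step: turn the variable assignment back into table rows."""
    return [(ok, pk, sk, q) for (ok, pk, sk), q in zip(variables, solution)]


# Made-up rows with small quantities so enumeration stays tiny.
ROWS = [(1, 10, 100, 3), (1, 11, 100, 2), (2, 10, 101, 1)]
VARIABLES, UPPER, CONSTRAINTS = to_integer_program(ROWS, cap=4)
COST, SOLUTION = solve_by_enumeration(UPPER, CONSTRAINTS)
NEW_ROWS = map_to_data(VARIABLES, SOLUTION)
print(COST, NEW_ROWS)
```

Enumeration grows exponentially with the number of rows, which is why the slide hands this step to an LP/IP solver.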

19 Conclusions
- Reverse Data Management
  - Encompasses many important database problems
  - Harder in general: the inverse of a function is not always a function
- How-To queries (reverse what-if)
  - Implement optimization problems within a DBMS
  - Plenty of challenges:
    - Declarative input specification
    - Efficient evaluation
    - Optimization (combination of Integer Prog. and DB techniques)
    - Under-specified and over-specified problem handling
    - Solution “stability” and “sensitivity”

