University of Washington Database Group: Reverse Data Management … and the case for Reverse What-If queries. Alexandra Meliou, Wolfgang Gatterbauer, Dan Suciu.

Presentation transcript:

University of Washington Database Group
Reverse Data Management … and the case for Reverse What-If queries
Alexandra Meliou, Wolfgang Gatterbauer, Dan Suciu

Forward and Backward Paradigm
Source data → Target data: forward transformations
Forward transformations, e.g.: query processing, data integration, data mining, clustering, indexing

Forward and Backward Paradigm
Source data → Target data: forward transformations
Forward transformations, e.g.: query processing, data integration, data mining, clustering, indexing
Target data → Source data: backward transformations
Backward transformations, e.g.: data cleaning, provenance, causality, data generation, view updates
The backward direction is Reverse Data Management (RDM).

The Problem Space of RDM
Target data:
- explicit specification: a specific data instance, or diffs between versions (e.g. before and after a view update)
- implicit specification: described indirectly, through constraints and statistics (e.g. declarative data generation)

The Problem Space of RDM
Target data: explicit specification or implicit specification
Source data:
- reference source: source data needs to be modified in order to achieve the desired effect in the output (e.g. view updates)
- no source: no source data is provided as a reference; it needs to be computed from scratch (e.g. inverse schema mappings)

The Problem Space of RDM
Axes: target data (explicit or implicit specification) × source data (no source or reference source)
- View updates: modify the source data to achieve the desired effect, while minimizing side-effects

The Problem Space of RDM
Axes: target data (explicit or implicit specification) × source data (no source or reference source)
- View updates
- Provenance, Causality: trace the source tuples that correspond to the target tuples of interest

The Problem Space of RDM
Axes: target data (explicit or implicit specification) × source data (no source or reference source)
- View updates
- Provenance, Causality
- Constraint-based repair: repair a data instance in order to satisfy a constraint

The Problem Space of RDM
Target data \ Source data   | no source          | reference source
explicit specification      | Inversion mappings | View updates; Provenance, Causality
implicit specification      | Data Generation    | Constraint-based repair

Introducing Reverse What-If Queries
Target data \ Source data   | no source          | reference source
explicit specification      | Inversion mappings | View updates; Provenance, Causality
implicit specification      | Data Generation    | Constraint-based repair
New in this problem space: Reverse What-If or How-To queries

Hypothetical (What-If) Queries
- Example from [Balmin et al. VLDB 2000]:
- “An analyst of a brokerage company wants to know what would be the effect on the return of customers’ portfolios if during the last 3 years they had suggested Intel stocks instead of Motorola”
How would the target data change, given a change in the source?
Forward: change something in the source (the hypothesis), then observe the effect in the target.

Reverse What-If, or How-To queries
- Modified example:
- “An analyst wants to figure out how to achieve a 10% return in customer portfolios, with the least number of trades”
What is the best hypothetical scenario that achieves the desired outcome?
Reverse: declare a desired effect in the target, then find changes to the source that achieve it.
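The contrast between the two query types can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the holdings, weights, returns and candidate trades below are hypothetical toy values, and the exhaustive search stands in for whatever evaluation strategy a real system would use. The forward function applies a hypothesis and reports the outcome; the reverse function searches for the smallest hypothesis that reaches a declared outcome.

```python
from itertools import combinations

# Hypothetical toy data: ticker -> (portfolio weight, return). All values are invented.
portfolio = {"MOT": (0.5, 0.02), "IBM": (0.3, 0.08), "GE": (0.2, 0.05)}
# Hypothetical candidate trades: held ticker -> (replacement ticker, its return).
candidate_trades = {"MOT": ("INTC", 0.15), "IBM": ("MSFT", 0.12), "GE": ("XOM", 0.06)}


def portfolio_return(holdings):
    """Target data: the weighted return of the portfolio."""
    return sum(weight * ret for weight, ret in holdings.values())


def what_if(holdings, trades, chosen):
    """Forward: apply the chosen trades to the source, observe the target."""
    modified = dict(holdings)
    for old in chosen:
        new, new_ret = trades[old]
        weight, _ = modified.pop(old)
        modified[new] = (weight, new_ret)
    return portfolio_return(modified)


def how_to(holdings, trades, target):
    """Reverse: find a smallest set of trades whose outcome reaches the target."""
    for k in range(len(trades) + 1):  # try fewer trades first
        for chosen in combinations(sorted(trades), k):
            if what_if(holdings, trades, chosen) >= target - 1e-12:
                return chosen
    return None  # no combination of candidate trades reaches the target


best_trades = how_to(portfolio, candidate_trades, 0.10)
```

With these toy numbers a single trade is enough to reach a 10% return, while a 50% target is unreachable and yields `None`. The search is exponential in the number of candidate trades, which is why an actual system would not evaluate how-to queries this way.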

Example
- Company reorganization: a company going through financial strain wants to reduce operational costs by 10% (constraints), through:
- lay-offs, salary decreases, or department and project merging (variables),
- within certain constraints specified by the company’s requirements (constraints):
  - any salary decreases should be uniform across employees of the same department,
  - every project should have at least a certain number of employee hours devoted to it,
- the solution should be achieved with the minimum number of employee reassignments (optimization objective)

Declarative Problem Specification
Problem constraints:
    CREATE CONSTRAINT Constr1 AS
    NOT EXISTS (SELECT ok, sum(quant’) AS c
                FROM LineItem_N
                GROUP BY ok
                HAVING c > 100)
Optimization criterion:
    CREATE OBJECTIVE Obj1 AS
    SELECT sum(*)
    FROM (SELECT quant - quant’
          FROM LineItem AS L1, LineItem_N AS L2
          WHERE L1.ok = L2.ok AND L1.pk = L2.pk AND L1.sk = L2.sk)
Variable definitions:
    CREATE REPLACEMENT LineItem_N AS
    (SELECT ok, pk, sk, VAR(quant) AS quant’
     FROM LineItem)
Problem statement query:
    HOW TO minimize(Obj1)
    SUBJECT TO Constr1
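To make the intent of this specification concrete, here is a minimal Python sketch of what an answer to it could look like. It rests on assumptions the slide does not state: the new quantities are taken to lie between 0 and the original quantity, the rows are invented, and the function name `how_to_cap_orders` is hypothetical. Under those bounds the minimum total reduction for an order is simply its excess over the cap, so any way of spreading that excess over the order's rows is optimal; the sketch picks one arbitrarily.

```python
from collections import defaultdict

# Hypothetical LineItem rows: (ok, pk, sk, quant). All values are invented.
line_item = [(1, 10, 100, 70), (1, 11, 101, 50), (2, 12, 102, 30)]


def how_to_cap_orders(rows, cap=100):
    """Build LineItem_N: per-order quantity sums at most `cap`, minimal total cut.

    Assumes 0 <= new quantity <= original quantity (not stated in the source).
    """
    totals = defaultdict(int)
    for ok, _, _, quant in rows:
        totals[ok] += quant
    # Amount each order must shed to satisfy the constraint.
    excess = {ok: max(0, total - cap) for ok, total in totals.items()}
    result = []
    for ok, pk, sk, quant in rows:
        cut = min(quant, excess[ok])  # take as much of the excess as this row allows
        excess[ok] -= cut
        result.append((ok, pk, sk, quant - cut))
    return result


def objective(rows, rows_n):
    """Obj1: total quantity removed, summed over matching rows."""
    return sum(q - qn for (_, _, _, q), (_, _, _, qn) in zip(rows, rows_n))


line_item_n = how_to_cap_orders(line_item)
total_cut = objective(line_item, line_item_n)
```

In the toy data only order 1 exceeds the cap (120 units), so 20 units are removed from it and order 2 is left untouched. This shortcut works only because the constraint is separable per order; the general case is what the LP/IP machinery is for.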

System Architecture
How-To query → How-To Engine → How-To answer
How-To Engine: How-To parser → (variables, constraints, objectives) → How-To evaluation, working against the DB

System Architecture
How-To query → How-To Engine → How-To answer
How-To Engine: How-To parser → (variables, constraints, objectives) → How-To evaluation, working against the DB
User input:
- Support variable, constraint and objective specifications
- Maintain declarativity

System Architecture
How-To query → How-To Engine → How-To answer
How-To Engine: How-To parser → (variables, constraints, objectives) → How-To evaluation, working against the DB
Evaluation requirements: Efficiency!

How-To Evaluation
User input → LP/IP transformation → LP/IP solver → Map LP/IP solution to data → How-To answer
The transformation and the mapping step both work against the DB.
LP reduction
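The three evaluation stages can be sketched as three small Python functions. Everything here is an illustrative assumption rather than the system's actual design: the rows are invented, the encoding (one bounded integer variable per row, one sum constraint per order) is just one plausible reduction, and the exhaustive `solve_brute_force` is a stand-in for a dedicated LP/IP solver, usable only on tiny inputs.

```python
from itertools import product

# Hypothetical rows: (order key, part key, quantity). All values are invented.
rows = [(1, "a", 6), (1, "b", 7), (2, "c", 3)]


def to_integer_program(rows, cap):
    """Stage 1: encode the problem as variables, constraints and objective costs."""
    bounds = [(0, quant) for (_, _, quant) in rows]  # one variable per row
    groups = {}
    for i, (ok, _, _) in enumerate(rows):
        groups.setdefault(ok, []).append(i)
    # Each constraint reads: sum of x[i] over the listed indices <= cap.
    constraints = [(idxs, cap) for idxs in groups.values()]
    costs = [quant for (_, _, quant) in rows]  # minimize sum(costs[i] - x[i])
    return bounds, constraints, costs


def solve_brute_force(bounds, constraints, costs):
    """Stage 2 (stand-in for a real solver): enumerate all integer assignments."""
    best, best_val = None, None
    for x in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        if all(sum(x[i] for i in idxs) <= limit for idxs, limit in constraints):
            val = sum(c - xi for c, xi in zip(costs, x))
            if best_val is None or val < best_val:
                best, best_val = x, val
    return best, best_val


def map_to_data(rows, solution):
    """Stage 3: turn the solver's variable assignment back into tuples."""
    return [(ok, pk, new_q) for (ok, pk, _), new_q in zip(rows, solution)]


solution, value = solve_brute_force(*to_integer_program(rows, cap=10))
answer = map_to_data(rows, solution)
```

Separating the stages this way keeps the solver interchangeable: only `to_integer_program` and `map_to_data` need to know about tuples, which is one reason a reduction to LP/IP is attractive inside a DBMS.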

Conclusions
- Reverse Data Management
  - Encompasses many important database problems
  - Harder in general: the inverse of a function is not always a function
- How-To queries (reverse what-if)
  - Implement optimization problems within a DBMS
- Plenty of challenges:
  - Declarative input specification
  - Efficient evaluation
  - Optimization (combination of Integer Programming and DB techniques)
  - Handling of under-specified and over-specified problems
  - Solution “stability” and “sensitivity”