Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle.

Slides:



Advertisements
Similar presentations
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
Advertisements

A First Look at Materialized Query Table (MQT) in DB2 LUW
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
TURKISH STATISTICAL INSTITUTE 1 /34 SQL FUNDEMANTALS (Muscat, Oman)
1 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
1 DynaMat A Dynamic View Management System for Data Warehouses Vicky :: Cao Hui Ping Sherman :: Chow Sze Ming CTH :: Chong Tsz Ho Ronald :: Woo Lok Yan.
CS4432: Database Systems II
Maintenance of Materialized Views: Problems, Techniques, and Applications Ashish Gupta IBM almaden Research Center Inderpal Singh Mumick AT&T Bell Laboratories.
Query Optimization May 31st, Today A few last transformations Size estimation Join ordering Summary of optimization.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
1 Query-by-Example (QBE). 2 v A “GUI” for expressing queries. –Based on the Domain Relational Calulus (DRC)! –Actually invented before GUIs. –Very convenient.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
IBM Software Group ® Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor (Automating Physical Database Design) Jarek Gryz.
1 RDF Aggregate Queries and Views Edward Hung, Yu Deng, V.S. Subrahmanian University of Maryland, College Park ICDE 2005, April 7, Tokyo, Japan.
Advanced Querying OLAP Part 2. Context OLAP systems for supporting decision making. Components: –Dimensions with hierarchies, –Measures, –Aggregation.
Slides adapted from A. Silberschatz et al. Database System Concepts, 5th Ed. SQL - part 2 - Database Management Systems I Alex Coman, Winter 2006.
Database Systems More SQL Database Design -- More SQL1.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Data Warehouse View Maintenance Presented By: Katrina Salamon For CS561.
Query Processing Presented by Aung S. Win.
Data Cube Computation Model dependencies among the aggregates: most detailed “view” can be computed from view (product,store,quarter) by summing-up all.
Database Programming Sections 5– GROUP BY, HAVING clauses, Rollup & Cube Operations, Grouping Set, Set Operations 11/2/10.
Practical Database Design and Tuning. Outline  Practical Database Design and Tuning Physical Database Design in Relational Databases An Overview of Database.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25, Part B.
1 CUBE: A Relational Aggregate Operator Generalizing Group By By Ata İsmet Özçelik.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
CSE314 Database Systems More SQL: Complex Queries, Triggers, Views, and Schema Modification Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
Access Path Selection in a Relational Database Management System Selinger et al.
OnLine Analytical Processing (OLAP)
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
Part Two: - The use of views. 1. Topics What is a View? Why Views are useful in Data Warehousing? Understand Materialised Views Understand View Maintenance.
Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.
BI Terminologies.
Data Warehousing. Databases support: Transaction Processing Systems –operational level decision –recording of transactions Decision Support Systems –tactical.
Triggers. Why Triggers ? Suppose a warehouse wishes to maintain a minimum inventory of each item. Number of items kept in items table Items(name, number,...)
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
View Materialization & Maintenance Strategies By Ashkan Bayati & Ali Reza Vazifehdoost.
BACS 287 Structured Query Language 1. BACS 287 Visual Basic Table Access Visual Basic provides 2 mechanisms to access data in tables: – Record-at-a-time.
IS 230Lecture 6Slide 1 Lecture 7 Advanced SQL Introduction to Database Systems IS 230 This is the instructor’s notes and student has to read the textbook.
E.Bertino, L.Matino Object-Oriented Database Systems 1 Chapter 5. Evolution Seoul National University Department of Computer Engineering OOPSLA Lab.
Advanced SQL: Triggers & Assertions
View 1. Lu Chaojun, SJTU 2 View Three-level vision of DB users Virtual DB views DB Designer Logical DB relations DBA DBA Physical DB stored info.
SqlExam1Review.ppt EXAM - 1. SQL stands for -- Structured Query Language Putting a manual database on a computer ensures? Data is more current Data is.
Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing.
Written By: Presented By: Swarup Acharya,Amr Elkhatib Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy Join Synopses for Approximate Query Answering.
1 Introduction to Database Systems, CS420 SQL Views and Indexes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
Data Warehousing and OLAP Outline u Models & operations u Implementing a warehouse u Future directions.
Views / Session 3/ 1 of 40 Session 3 Module 5: Implementing Views Module 6: Managing Views.
9 Copyright © 2006, Oracle. All rights reserved. Summary Management.
Lecture 6- Query Optimization (continued)
More SQL: Complex Queries, Triggers, Views, and Schema Modification
More SQL: Complex Queries,
Query-by-Example (QBE)
Chapter 3 Introduction to SQL(3)
A paper on Join Synopses for Approximate Query Answering
Informix Red Brick Warehouse 5.1
DATA CUBE Advanced Databases 584.
Database Vs. Data Warehouse
Advanced SQL: Views & Triggers
More SQL: Complex Queries, Triggers, Views, and Schema Modification
Data Warehousing and Decision Support
Structured Query Language (3)
Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor (Automating Physical Database Design) Jarek Gryz.
Probabilistic Databases
SQL Views Presented by: Dr. Samir Tartir
Chapter 8 Views and Indexes
Presentation transcript:

Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh

UoA, Sep 2002Themis Palpanas - U of Toronto2 Motivation large amounts of data stored in databases often times data warehouses are used  consolidate data from many sources  offer more general and descriptive view of data queried by business intelligence tools and decision support systems  produce expensive OLAP queries these OLAP queries have nice properties:  based on same set of tables  perform similar aggregations

UoA, Sep 2002Themis Palpanas - U of Toronto3 Motivation (cont’d) can efficiently support such queries with Automatic Summary Tables (ASTs) materialized queries defined over a set of base tables  precomputed once, used many times  answer complex queries fast must maintain ASTs when base tables change inserts, updates, deletes

4 Motivation (cont’d) base tables AST insert/update/delete AST definition

UoA, Sep 2002Themis Palpanas - U of Toronto5 Aggregate Functions characterization of functions wrt insertion and deletion operations updates are series of deletions and insertions distributive aggregate functions new value computed based on old value and value of operation  SUM() non-distributive aggregate functions above property does not hold  STDDEV()  MIN() (because of deletions)

UoA, Sep 2002Themis Palpanas - U of Toronto6 Problem Statement given ASTs with aggregate functions distributive  SUM, COUNT non-distributive  STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG, … when base tables change incrementally maintain affected ASTs efficient maintenance of ASTs with non-distributive aggregate functions

UoA, Sep 2002Themis Palpanas - U of Toronto7 Outline Current Approach Our Solution Experimental Evaluation Related Work Conclusions

8 AST Current Approach base tables insert/update/delete delta combine old and new values AST definition Propagate phase Apply phase

UoA, Sep 2002Themis Palpanas - U of Toronto9 Current Approach (cont’d) works for distributive SUM, COUNT does not work for non-distributive STDDEV, CORRELATION, REGRESSION MIN/MAX XMLAGG need new way to deal with these functions

UoA, Sep 2002Themis Palpanas - U of Toronto10 Our Solution selective recomputation no longer enough to compute delta must recompute some aggregation groups minimize work to be done choose which groups to recompute optimize query plan

11 Our Solution (cont’d) AST base tables insert/update/delete delta recompute affected groups combine old and new values Propagate phase AST definition Apply phase

UoA, Sep 2002Themis Palpanas - U of Toronto12 Our Solution (cont’d) the 5 steps 1. compute new aggregate values 2. change column derivation 3. recompute only affected groups 4. eliminate unnecessary operations 5. optimize for special cases

UoA, Sep 2002Themis Palpanas - U of Toronto13 Initial Query Plan prop UDI LOJ AST Query Graph Model (QGM)

UoA, Sep 2002Themis Palpanas - U of Toronto14 1. Compute New Aggregate Values compute delta for distributive functions recompute non-distributive functions get those values only for affected groups duplicate computation for distributive functions! prop UDI LOJ AST LOJ

UoA, Sep 2002Themis Palpanas - U of Toronto15 2. Change Column Derivation change column derivation rewrite phase projects out unused columns entire AST gets recomputed! prop UDI LOJ AST LOJ non-distributive only distributive only

UoA, Sep 2002Themis Palpanas - U of Toronto16 2. Change Column Derivation example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id result of COUNT() computed from old propagate phase results of MAX() and STDDEV() from AST definition

UoA, Sep 2002Themis Palpanas - U of Toronto17 3. Recompute Affected Groups push join predicate down in AST only affected groups are recomputed special rules for super- aggregates GROUPING SETS ROLLUP CUBE prop UDI LOJ AST AST* LOJ non-distributive only distributive only T1Tk … JJ

UoA, Sep 2002Themis Palpanas - U of Toronto18 3. Recompute Affected Groups special treatment for ASTs with super-aggregates predicates not pushdownable caution not to compute totals of totals build special join predicate ensure correct aggregations change rewrite rules allow predicate pushdown through super aggregates applicable only for special join predicate

UoA, Sep 2002Themis Palpanas - U of Toronto19 4. Remove Unnecessary Operations outerjoin not always needed when changes are only inserts reroute columns from propagate phase through AST remove outerjoin operator same for updates not referencing AST grouping columns and predicates prop UDI LOJ AST T1Tk … JJ all columns distributive only

UoA, Sep 2002Themis Palpanas - U of Toronto20 4. Remove Unnecessary Operations example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id modification on base tables: UPDATE employees SET salary=10 WHERE age>40 outerjoin operation will not be built update does not refer to grouping column ( dept_id ), and no predicate in AST refers to updated column ( salary) certain that no tuples in AST will be deleted only STDDEV() will be recomputed the rest are not affected by changes

UoA, Sep 2002Themis Palpanas - U of Toronto21 5. Optimize for Special Cases recomputation step not needed when only insertions and only MIN/MAX functions  build predicate in apply phase  check if new min/max should replace old values only deletions referring only to grouping columns of AST  can only cause entire groups to be deleted  handled in apply phase

UoA, Sep 2002Themis Palpanas - U of Toronto22 5. Optimize for Special Cases example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id modification on base tables: DELETE FROM employees WHERE dept_id>40 selective recomputation step not needed deletion refers only to grouping column ( dept_id ) certain that entire groups will be deleted from AST no other groups will be affected

UoA, Sep 2002Themis Palpanas - U of Toronto23 Experimental Evaluation prototype implementation in IBM DB2 UDB star schema database sales of products over 5 year time period fact table: 10 million tuples AST with non-distributive aggregate function 240,000 tuples workload simulates nightly updates 1. add/delete data for first day of month 2. add/delete data for second day of month 3. add/delete data for full month

UoA, Sep 2002Themis Palpanas - U of Toronto24 Experimental Evaluation (cont’d) workload 1workload 2workload 3 incremental full refresh deletions require 40-60% of full refresh time workload 1workload 2workload 3 incremental3n/a31 full refresh optimized deletions require 1-4% of full refresh time

UoA, Sep 2002Themis Palpanas - U of Toronto25 Experimental Evaluation (cont’d) workload 1workload 2workload 3 incremental full refresh insertions/updates require 20-25% of full refresh time

UoA, Sep 2002Themis Palpanas - U of Toronto26 Related Work incremental view maintenance differential refresh algorithms  Lindsay et al. 1986, Blakeley et al. 1986, Qian and Wiederhold 1991, Ceri and Widom 1991 deferred incremental maintenance  Colby et al. 1996, Salem et al views with aggregation  Quass 1996, Mumick et al. 1997

UoA, Sep 2002Themis Palpanas - U of Toronto27 Conclusions incremental maintenance for ASTs with non- distributive aggregate functions support MIN/MAX, STDDEV, CORRELATION, REGRESSION, XMLAGG, … efficient selective recomputation recompute only affected groups optimize query plan customize for special cases significant performance improvements

UoA, Sep 2002Themis Palpanas - U of Toronto28 Future Work examine use of work areas temporary storage space store intermediate values maintenance without recomputation  STDDEV, MIN/MAX(?), … very helpful for ASTs defined with super-aggregates ASTs with HAVING clauses do not know when groups will enter/leave AST