Presentation is loading. Please wait.

Presentation is loading. Please wait.

Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle.

Similar presentations


Presentation on theme: "Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle."— Presentation transcript:

1 Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh

2 UoA, Sep 2002Themis Palpanas - U of Toronto2 Motivation large amounts of data stored in databases often times data warehouses are used  consolidate data from many sources  offer more general and descriptive view of data queried by business intelligence tools and decision support systems  produce expensive OLAP queries these OLAP queries have nice properties:  based on same set of tables  perform similar aggregations

3 UoA, Sep 2002Themis Palpanas - U of Toronto3 Motivation (cont’d) can efficiently support such queries with Automatic Summary Tables (ASTs) materialized queries defined over a set of base tables  precomputed once, used many times  answer complex queries fast must maintain ASTs when base tables change inserts, updates, deletes

4 4 Motivation (cont’d) base tables AST insert/update/delete AST definition

5 UoA, Sep 2002Themis Palpanas - U of Toronto5 Aggregate Functions characterization of functions wrt insertion and deletion operations updates are series of deletions and insertions distributive aggregate functions new value computed based on old value and value of operation  SUM() non-distributive aggregate functions above property does not hold  STDDEV()  MIN() (because of deletions)

6 UoA, Sep 2002Themis Palpanas - U of Toronto6 Problem Statement given ASTs with aggregate functions distributive  SUM, COUNT non-distributive  STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG, … when base tables change incrementally maintain affected ASTs efficient maintenance of ASTs with non-distributive aggregate functions

7 UoA, Sep 2002Themis Palpanas - U of Toronto7 Outline Current Approach Our Solution Experimental Evaluation Related Work Conclusions

8 8 AST Current Approach base tables insert/update/delete delta combine old and new values AST definition Propagate phase Apply phase

9 UoA, Sep 2002Themis Palpanas - U of Toronto9 Current Approach (cont’d) works for distributive SUM, COUNT does not work for non-distributive STDDEV, CORRELATION, REGRESSION MIN/MAX XMLAGG need new way to deal with these functions

10 UoA, Sep 2002Themis Palpanas - U of Toronto10 Our Solution selective recomputation no longer enough to compute delta must recompute some aggregation groups minimize work to be done choose which groups to recompute optimize query plan

11 11 Our Solution (cont’d) AST base tables insert/update/delete delta recompute affected groups combine old and new values Propagate phase AST definition Apply phase

12 UoA, Sep 2002Themis Palpanas - U of Toronto12 Our Solution (cont’d) the 5 steps 1. compute new aggregate values 2. change column derivation 3. recompute only affected groups 4. eliminate unnecessary operations 5. optimize for special cases

13 UoA, Sep 2002Themis Palpanas - U of Toronto13 Initial Query Plan prop UDI LOJ AST Query Graph Model (QGM)

14 UoA, Sep 2002Themis Palpanas - U of Toronto14 1. Compute New Aggregate Values compute delta for distributive functions recompute non-distributive functions get those values only for affected groups duplicate computation for distributive functions! prop UDI LOJ AST LOJ

15 UoA, Sep 2002Themis Palpanas - U of Toronto15 2. Change Column Derivation change column derivation rewrite phase projects out unused columns entire AST gets recomputed! prop UDI LOJ AST LOJ non-distributive only distributive only

16 UoA, Sep 2002Themis Palpanas - U of Toronto16 2. Change Column Derivation example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id result of COUNT() computed from old propagate phase results of MAX() and STDDEV() from AST definition

17 UoA, Sep 2002Themis Palpanas - U of Toronto17 3. Recompute Affected Groups push join predicate down in AST only affected groups are recomputed special rules for super- aggregates GROUPING SETS ROLLUP CUBE prop UDI LOJ AST AST* LOJ non-distributive only distributive only T1Tk … JJ

18 UoA, Sep 2002Themis Palpanas - U of Toronto18 3. Recompute Affected Groups special treatment for ASTs with super-aggregates predicates not pushdownable caution not to compute totals of totals build special join predicate ensure correct aggregations change rewrite rules allow predicate pushdown through super aggregates applicable only for special join predicate

19 UoA, Sep 2002Themis Palpanas - U of Toronto19 4. Remove Unnecessary Operations outerjoin not always needed when changes are only inserts reroute columns from propagate phase through AST remove outerjoin operator same for updates not referencing AST grouping columns and predicates prop UDI LOJ AST T1Tk … JJ all columns distributive only

20 UoA, Sep 2002Themis Palpanas - U of Toronto20 4. Remove Unnecessary Operations example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id modification on base tables: UPDATE employees SET salary=10 WHERE age>40 outerjoin operation will not be built update does not refer to grouping column ( dept_id ), and no predicate in AST refers to updated column ( salary) certain that no tuples in AST will be deleted only STDDEV() will be recomputed the rest are not affected by changes

21 UoA, Sep 2002Themis Palpanas - U of Toronto21 5. Optimize for Special Cases recomputation step not needed when only insertions and only MIN/MAX functions  build predicate in apply phase  check if new min/max should replace old values only deletions referring only to grouping columns of AST  can only cause entire groups to be deleted  handled in apply phase

22 UoA, Sep 2002Themis Palpanas - U of Toronto22 5. Optimize for Special Cases example AST: SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salary) FROM employees GROUP BY dept_id modification on base tables: DELETE FROM employees WHERE dept_id>40 selective recomputation step not needed deletion refers only to grouping column ( dept_id ) certain that entire groups will be deleted from AST no other groups will be affected

23 UoA, Sep 2002Themis Palpanas - U of Toronto23 Experimental Evaluation prototype implementation in IBM DB2 UDB star schema database sales of products over 5 year time period fact table: 10 million tuples AST with non-distributive aggregate function 240,000 tuples workload simulates nightly updates 1. add/delete data for first day of month 2. add/delete data for second day of month 3. add/delete data for full month

24 UoA, Sep 2002Themis Palpanas - U of Toronto24 Experimental Evaluation (cont’d) workload 1workload 2workload 3 incremental286294420 full refresh699702692 deletions require 40-60% of full refresh time workload 1workload 2workload 3 incremental3n/a31 full refresh699702692 optimized deletions require 1-4% of full refresh time

25 UoA, Sep 2002Themis Palpanas - U of Toronto25 Experimental Evaluation (cont’d) workload 1workload 2workload 3 incremental151158180 full refresh702 721 insertions/updates require 20-25% of full refresh time

26 UoA, Sep 2002Themis Palpanas - U of Toronto26 Related Work incremental view maintenance differential refresh algorithms  Lindsay et al. 1986, Blakeley et al. 1986, Qian and Wiederhold 1991, Ceri and Widom 1991 deferred incremental maintenance  Colby et al. 1996, Salem et al. 2000 views with aggregation  Quass 1996, Mumick et al. 1997

27 UoA, Sep 2002Themis Palpanas - U of Toronto27 Conclusions incremental maintenance for ASTs with non- distributive aggregate functions support MIN/MAX, STDDEV, CORRELATION, REGRESSION, XMLAGG, … efficient selective recomputation recompute only affected groups optimize query plan customize for special cases significant performance improvements

28 UoA, Sep 2002Themis Palpanas - U of Toronto28 Future Work examine use of work areas temporary storage space store intermediate values maintenance without recomputation  STDDEV, MIN/MAX(?), … very helpful for ASTs defined with super-aggregates ASTs with HAVING clauses do not know when groups will enter/leave AST


Download ppt "Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle."

Similar presentations


Ads by Google