Presentation is loading. Please wait.

Presentation is loading. Please wait.

® IBM Software Group © 2009 IBM Corporation A First Look at Materialized Query Table (MQT) in DB2 LUW Petrus Chan, IBM Toronto Lab,

Similar presentations


Presentation on theme: "® IBM Software Group © 2009 IBM Corporation A First Look at Materialized Query Table (MQT) in DB2 LUW Petrus Chan, IBM Toronto Lab,"— Presentation transcript:

1 ® IBM Software Group © 2009 IBM Corporation A First Look at Materialized Query Table (MQT) in DB2 LUW Petrus Chan, IBM Toronto Lab,

2 IBM Software Group | DB2 information management software 2 Agenda Motivation Sample scenario How MQT can help Different types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT

3 IBM Software Group | DB2 information management software 3 Database Schema for Our Sample Scenario Product_id Store_id Date_id Amount Quantity … SALES Store_id Region_id … STORE Date_id Month_id Quarter_id Year_id TIME Product_id Class_id Group_id Family_id Line_id Division_id … PRODUCT Star Schema

4 IBM Software Group | DB2 information management software 4 Dimensions Hierarchy Store Month Product Class Group Family Line Division Quarter Year Region Sales Fact Product Dimension Time Dimension Store Dimension Level 5 Level 1 Level 2 Level 3 Level 4 Level 0 Day

5 IBM Software Group | DB2 information management software 5 Sample Query Q9: Total sales of each product group at each store for December SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = GROUP BY t.month_id, f.store_id, p.group_id

6 IBM Software Group | DB2 information management software 6 Benefits of MQTs Fact GB JOIN Dim2 Dim1 JOIN Q9 Month= Fact GB JOIN Dim2 Dim1 JOIN Joes Query Group=Laser Printer Fact GB JOIN Dim2 Dim1 JOIN Bobs Query Store=McLean Fact GB JOIN Dim2 Dim1 JOIN MQT Q9 Bobs Q Joes Q Compute Once & Reuse Month= Store=McLean Group=Laser Printer

7 IBM Software Group | DB2 information management software 7 What is an MQT? MQT stands for Materialized Query Table. Basic Idea: Query results caching - pre-compute and store (in database tables) the results of frequently issued queries Query optimizer automatically and transparently replace portions of future queries with direct references to the pre-computed (i.e., materialized) results. This is referred to as MQT routing or matching. Standard technique in data warehousing, also known as aggregate tables and aggregate (re-)routing, or alternatively, as materialized views.

8 IBM Software Group | DB2 information management software 8 Usages of MQTs – At a High Level Aggregate data over one or more dimensions Joins and aggregate data over a group of tables Data from a commonly accessed subset of data, that is, from a "hot" horizontal or vertical database partition Repartitioned data from a table, or part of a table, in a partitioned database environment

9 IBM Software Group | DB2 information management software 9 How query can be routed to MQT? MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, f.store_id, p.group_id Q9: Original Query SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = GROUP BY t.month_id, f.store_id, p.group_id Q9: Rewritten Query SELECT m.month_id, m.store_id, m.group_id, m.total_sales FROM MQT_SalesByMthStrGrp m WHERE m.month_id = Route Query to MQT

10 IBM Software Group | DB2 information management software 10 MQT Creation MQTs are created using the CREATE TABLE statement Immediately after creation, the MQT is empty and not accessible. It is populated using the REFRESH TABLE statement After running the REFRESH, the MQT becomes accessible CREATE TABLE MQT_SalesByMthStrGrp AS (SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, f.store_id, p.group_id) DATA INITIALLY DEFERRED REFRESH DEFERRED; REFRESH TABLE MQT_SalesByMthStrGrp; RUNSTATS ON TABLE petrus.MQT_SalesByMthStrGrp WITH DISTRIBUTION AND DETAILED INDEXES ALL; -- Confirm that MQT is being used by -- (a) setting CURRENT REFRESH AGE to ANY -- (b) explaining the query used as -- the MQTs definition

11 IBM Software Group | DB2 information management software 11 Agenda Motivation Sample scenario How MQT can help Types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT

12 IBM Software Group | DB2 information management software 12 Types of MQTs MQT System Maintained User Maintained Refresh IMMEDIATE Refresh DEFERRED WITH staging table WITHOUT staging table

13 IBM Software Group | DB2 information management software 13 MQT Maintenance Process of synchronizing the MQT data with changes to the underlying tables Incremental Maintenance Immediate Propagate Immediate Applied (IPIA) Full Maintenance Hybrid Immediate Propagate Deferred Applied (IPDA)

14 IBM Software Group | DB2 information management software 14 Incremental Maintenance MQT base tables base tables base tables Base Tables Delta select/join Delta Aggregate Immediate Propagate Immediate Apply Pair Delta to MQT Combine old and new values Primary- key: grouping columns the ability to maintain the MQT without the need to access the modified base table, only using the delta information (newly inserted/updated/deleted data) and the MQT itself. REFRESH IMMEDIDATE MQTs are maintained incrementally during: INSERT/UPDATE/DELETE of base table processing REFRESH TABLE processing after LOAD INSERT into the base table. Immediate Propagate, Immediate Apply For performance reasons, only incrementally maintainable MQTs can be defined as REFRESH IMMEDIATE (hence more restrictions in MQT definition)

15 IBM Software Group | DB2 information management software 15 Full Maintenance MQTs are fully rebuilt by re-evaluating the query in the MQT definition REFRESH DEFERRED MQTs (without staging table): not maintained during INSERT/UPDATE/DELETE operations on the base table. synchronized with base table using the REFRESH TABLE statement MQT base tables base tables base tables Base Tables select/join Aggregate

16 IBM Software Group | DB2 information management software 16 Hybrid - Immediate Propagate, Deferred Applied AST base tables base tables base tables Base Tables Delta select/join Delta Aggregate Immediate Propagate Deferred Apply Pair Delta to AST Combine old and new values Primary- key: grouping columns Staging Staging Tables Net-effect REFRESH DEFERRED MQTs with staging table Delta changes to MQT are computed upon INSERT/UPDATE/DELETE of base tables Delta aggregate is recorded in a separate staging table Subsequent REFRESH of MQT can be done incrementally Immediate Propagate, Deferred Apply One staging table per MQT CREATE TABLE SALES_BY_YEAR_STG FOR SALES_BY_YEAR PROPAGATE IMMEDIATE;

17 IBM Software Group | DB2 information management software 17 User Maintained MQT User is responsible for maintaining MQT content REFRESH TABLE disallowed Insert/update/delete on MQT allowed Must be REFRESH DEFERRED Scenarios where user maintained MQTs are For efficiency reasons, when the user is convinced that (s)he can implement MQT maintenance far more efficiently than the mechanisms used by DB2. For leveraging existing user maintained summaries, where the user wants DB2 to automatically consider them for optimization for new ad hoc queries being executed against the base tables.

18 IBM Software Group | DB2 information management software 18 Refresh Considerations Refresh ImmediateRefresh Deferred (WITH staging table) Refresh Deferred (NO staging table) Latency of DataAlways in syncUp to last REFRESH Transaction Overhread upon base table modification Yes No Incremental Refresh No need, except after LOAD INSERT YesNo Lock Contention during base table modifications PotentialLessNo

19 IBM Software Group | DB2 information management software 19 Agenda Motivation Sample scenario How MQT can help Types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT

20 IBM Software Group | DB2 information management software 20 MQT Matching Process to automatically use the MQT to derive results for a query for improved performance MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id Q9: Original Query SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = GROUP BY t.month_id, f.store_id, p.group_id Q9: Rewritten Query SELECT m.month_id, m.store_id, m.group_id, m.total_sales FROM MQT_SalesByMthStrGrp m WHERE m.month_id = Route Query to MQT

21 IBM Software Group | DB2 information management software 21 What queries can benefit from an MQT? Queries that matches an MQT perfectly! with predicates that are a subset of those in MQT with more table joins than MQT (re-join) with fewer table joins than MQT (need RI constraint) with columns not in MQT definition (need functional dependencies) at a higher aggregation level than an MQT

22 IBM Software Group | DB2 information management software 22 Query with predicates that are a subset of those in MQT Fact GB JOIN Dim2 Dim1 JOIN Q9 Month= Fact GB JOIN Dim2 Dim1 JOIN Joes Query Group=Laser Printer Fact GB JOIN Dim2 Dim1 JOIN Bobs Query Store=McLean Fact GB JOIN Dim2 Dim1 JOIN MQT Q9 Bobs Q Joes Q Compute Once & Reuse Month= Store=McLean Group=Laser Printer

23 IBM Software Group | DB2 information management software 23 Query with more table joins than MQT MQT: Original Query: Sales Product Time GB Sales Product Time Store GB If the extra table join is through a column in group-by list, match and re-join with tables not in MQT Rewritten Query: MQT Store GB Re-join

24 IBM Software Group | DB2 information management software 24 Query with more table joins than MQT (Example) MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id Original Query SELECT t.month_id, s.region_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t, store s WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND f.store_id = s.store_id GROUP BY t.month_id, s.region_id, p.group_id Rewritten Query SELECT m.month_id, s.region_id, m.group_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m, store s WHERE m.stored_id = s.store_id GROUP BY m.month_id, s.region_id, m.group_id Route Query to MQT

25 IBM Software Group | DB2 information management software 25 Queries with fewer joins than MQT In general, we cant route queries to an MQT that has more table joins because the extra joins affects the MQT content However, if the extra joins in MQT are RI-joins, i.e. joins with RI parent table with join predicate of the form: child.fk = parent.pk They are look-up joins that do not add or delete rows RI-join is common between Fact and Dimension tables. Need to declare RI constraints (even just informational constraints) to tell DB2 about it Query: MQT: Sales Time GB Sales Time Product GB RI-join

26 IBM Software Group | DB2 information management software 26 Queries with fewer joins than MQT (Example) Original Query SELECT t.month_id, f.store_id, sum(amount) as total_sales FROM sales f, time t WHERE f.date_id = t.date_id GROUP BY t.month_id, f.store_id Rewritten Query SELECT m.month_id, m.store_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m GROUP BY m.month_id, m.store_id Route Query to MQT MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id alter table sales add constraint fk_product foreign key (product_id) references product; Sales (product_id int NOT NULL)

27 IBM Software Group | DB2 information management software 27 Importance of NOT NULL Constraint on Foreign Key Columns Declaring NOT NULL constraint on the foreign key column of the child table is important! If a row of column SALES.product_id has null value: the row will not qualify the join predicate f.product_id = p.product_id. The MQT would be missing some rows that are needed by the query, hence incorrect to route. MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id

28 IBM Software Group | DB2 information management software 28 Queries at higher aggregation level than an MQT Store Month Product Class Group Family Line Division Quarter Year Region Sales Fact Product Dimension Time Dimension Store Dimension Level 5 Level 1 Level 2 Level 3 Level 4 Level 0 Day MQT for the Store-Month-Group slice Queries that access data at or above that slice can be satisfied by the MQT with some additional aggregating. Querying more detailed data below the slice, such as Day or Product, cannot be routed to this MQT.

29 IBM Software Group | DB2 information management software 29 MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id Queries at a higher level aggregation - Example Original Query SELECT t.quarter_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.quarter_id, f.store_id, p.group_id Rewritten Query SELECT t.quarer_id, m.store_id, m.group_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m, time t WHERE m.month_id = t.month_id GROUP BY t.quarter_id, m.store_id, m.group_id Route Query to MQT Functional Dependency TIME (month_idquarter_id)

30 IBM Software Group | DB2 information management software 30 Functional Dependencies X functionally determines Y iff each X value is associated with precisely one Y value. By definition, all columns of a relation are functionally dependent on the unique/primary key. De-normalization introduces functional dependencies between non-key columns. Between ID and descriptive columns Example: group_id group_desc Between different levels of the hierarchy Example: group_id (family_id, line_id, division_id) Product Class Group Family Line Division Product Dimension

31 IBM Software Group | DB2 information management software 31 Functional Dependencies in DB2 User can help DB2 identifies the functional dependencies among non-key columns by declaring NOT ENFORCED FD constraints: ALTER TABLE PRODUCT ADD CHECK((group_desc) DETERMINED BY group_id) NOT ENFORCED ENABLE QUERY OPTIMIZATION During MQT routing, DB2 will re-join with the table to pick up the dependents

32 IBM Software Group | DB2 information management software 32 Functional Dependencies - Example Query at year-region-division level: Select t.year_id, s.region_desc, p.division_desc, sum(amount) as total_sales... GROUP BY t.year_id, s.region_desc, p.division_desc; MQT at the month-store- group aggregate level: SELECT t.month_id, t.year_id, f.store_id, s.region_id, s.region_desc, p.group_id, p.division_id, p.division_desc, sum(amount) as total_sales FROM sales f, product p, time t,stored s WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND f.store_id = s.store_id GROUP BY t.month_id, t.year_id, s.store_id, s.region_id, s.region_desc, p.group_id,p.division_id, p.division_desc FD: TIME: month_id -> year_id UK: STORE store_id -> (region_id, region_desc) FD: PRODUCT: group_id -> (division_id, division_desc) MQT PRODUCT TIME GB (year-region-division) FD re-join PRODUCT FD re-join (month_id, year_id) (group_id, division_desc) (store_id, region_desc) (month-store-group)

33 IBM Software Group | DB2 information management software 33 Functional Dependencies in MQT It is a good practice to declare functional dependencies to keep the minimum amount of dimensional information in the MQT. This will: keep the size of the MQT small Keep the MQT unaffected by dimension table updates - minimize the need for maintenance

34 IBM Software Group | DB2 information management software 34 MQT Matching – Insurance Policy MQT matching performed during Query Rewrite Rewrite transformations mostly rule/heuristic based Insurance Policy: Optimizer Cost based comparison between: Access plan with MQT routing Access plan WITHOUT MQT routing Cheapest plan wins - against choosing a wrong heuristic Parser Query Graph Semantics Query Rewrite Optimizer Code Generator SQL Query With and without Aggregate MQT

35 IBM Software Group | DB2 information management software 35 Agenda Motivation Sample scenario How MQT can help Types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT

36 IBM Software Group | DB2 information management software 36 Design Considerations On usage of MQT Space requirement – MQTs are materialized tables On MQT definition Level of aggregation vs. query processing time Too high level – few queries can route to it Too low level – costly compensation time Workload-based vs. Model-based Numbers of MQTs Store Month Product Class Group Family Line Division Quarter Year Region Sales Fact Product Dim. Time Dim. Store Dim. Level 5 Level 1 Level 2 Level 3 Level 4 Level 0 Day

37 IBM Software Group | DB2 information management software 37 Refreshing with finer granularity MQTs DB2 9 allows the use of existing MQTs to be used by higher level MQTs during REFRESH Matching done during REFRESH It is very important to aggregate in the right order Example : MQT1 : GROUP BY DATE, MONTH, YEAR MQT2 : GROUP BY MONTH, YEAR REFRESH MQT1 REFRESH MQT2 uses MQT1

38 IBM Software Group | DB2 information management software 38 Another Use of Replicated MQTs In partitioned database, colocation of rows involved in different tables => efficient join (colocated joins) In a typical star schema setup, the fact table can be co-located with at most one dimension table. The remaining (smaller) dimension tables are frequently replicated to all nodes of the fact table Replicated MQTs are used to achieve co-location of data for efficient join processing in partitioned database environment Product ProdID1 Sales ProdID1 Product ProdID2 Sales ProdID2 Product ProdID3 Sales ProdID3 Store

39 IBM Software Group | DB2 information management software 39 Summary An MQT is a table that stores pre-computed query results. Queries can be matched to an MQT to improve performance. Different types of MQTs and their maintenance: immediate, deferred (with/without staging table), user maintained replicated Queries can be matched to an MQT that has: Predicates that are superset of those in queries Finer grained aggregation level More/few table joins Not null constraint, foreign key constraints and functional dependencies can be defined to help MQT matching

40 IBM Software Group | DB2 information management software 40 References Redbooks (http://www.redbooks.ibm.com) DB2 UDB's High Function Business Intelligence in e-Business (SG ) Related Papers Wolfgang Lehner, Richard Sidle, Hamid Pirahesh, Roberta Cochrane: Maintenance of Automatic Summary Tables. SIGMOD Conference 2000: Markos Zaharioudakis, Roberta Cochrane, George Lapis, Hamid Pirahesh, Monica Urata: Answering Complex SQL Queries Using Automatic Summary Tables. SIGMOD Conference 2000:

41 IBM Software Group | DB2 information management software 41 Thank You

42 IBM Software Group | DB2 information management software 42 Backup Slides

43 IBM Software Group | DB2 information management software 43 Explain Messages Sample Explain Messages: EXP0054W The following REFRESH DEFERRED MQT was not considered for rewrite matching because the CURRENT REFRESH AGE register was not set to ANY: PETRUS.MQT1". EXP0078W The following MQT was not used in rewrite matching because an equivalent or better candidate was available: PETRUS.MQT1. EXP0073W (1) The following MQT was not used in rewrite matching because one or more predicates from the query could not be matched with the MQT: PETRUS.MQT3". EXP0079W The following MQT was not used in the final access plan, because the plan cost with this MQTs was more expensive: PETRUS.MQT3".

44 IBM Software Group | DB2 information management software 44 Grouping Sets and Super-groups GROUPING SETS = UNION of GROUP BY Example: GROUPING SETS ((a,b,c), (a,d,e)) = GROUP BY a,b,c UNION ALL GROUP BY a,d,e Simple GROUP BY = GROUP SETS with single element Example: GROUP BY a,b,c = GROUP BY GROUPING SETS ((a,b,c)) ROLLUP (a,b,c) = GROUPING SETS ((a,b,c), (a,b), (a), ()) CUBE – cross tabulation CUBE (a,b,c) = GROUPING SETS ((a,b,c), (a,b), (b,c), (a,c), (a), (b), (c), ())


Download ppt "® IBM Software Group © 2009 IBM Corporation A First Look at Materialized Query Table (MQT) in DB2 LUW Petrus Chan, IBM Toronto Lab,"

Similar presentations


Ads by Google