Presentation is loading. Please wait.

Presentation is loading. Please wait.

A First Look at Materialized Query Table (MQT) in DB2 LUW

Similar presentations


Presentation on theme: "A First Look at Materialized Query Table (MQT) in DB2 LUW"— Presentation transcript:

1 A First Look at Materialized Query Table (MQT) in DB2 LUW
Petrus Chan, IBM Toronto Lab,

2 Agenda Motivation Different types of MQTs & Maintenance
Sample scenario How MQT can help Different types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT

3 Database Schema for Our Sample Scenario
STORE Star Schema Store_id Region_id PRODUCT SALES Product_id Class_id Group_id Family_id Line_id Division_id Product_id Store_id Date_id Amount Quantity TIME Date_id Month_id Quarter_id Year_id

4 Dimensions Hierarchy Product Dimension Division Level 5 Line Level 4
Time Dimension Family Year Level 3 Store Dimension Quarter Group Level 2 Region Month Class Level 1 Store Day Product Level 0 Sales Fact

5 Sample Query Q9: Total sales of each product group at each store for December 2008. SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = GROUP BY t.month_id, f.store_id, p.group_id

6 Benefits of MQTs Q9 Q9 Joe’s Query Joe’s Q Group=Laser Printer Bob’s Q
Fact GB JOIN Dim2 Dim1 MQT Q9 Bob’s Q Joe’s Q Compute Once & Reuse Month=200812 Store=McLean Group=Laser Printer Fact GB JOIN Dim2 Dim1 Q9 Month=200812 Fact GB JOIN Dim2 Dim1 Joe’s Query Group=Laser Printer Bob’s Query Store=McLean In a data warehouse environment, users often issue queries repetitively against large volumes of data with minor variations in query's predicates.

7 What is an MQT? MQT stands for Materialized Query Table. Basic Idea:
Query results caching - pre-compute and store (in database tables) the results of frequently issued queries Query optimizer automatically and transparently replace portions of future queries with direct references to the pre-computed (i.e., materialized) results. This is referred to as MQT routing or matching. Standard technique in data warehousing, also known as aggregate tables and aggregate (re-)routing, or alternatively, as materialized views.

8 Usages of MQTs – At a High Level
Aggregate data over one or more dimensions Joins and aggregate data over a group of tables Data from a commonly accessed subset of data, that is, from a "hot" horizontal or vertical database partition Repartitioned data from a table, or part of a table, in a partitioned database environment

9 How query can be routed to MQT?
MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, f.store_id, p.group_id Q9: Original Query SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = GROUP BY t.month_id, f.store_id, p.group_id Q9: Rewritten Query SELECT m.month_id, m.store_id, m.group_id, m.total_sales FROM MQT_SalesByMthStrGrp m WHERE m.month_id = Route Query to MQT

10 MQT Creation MQTs are created using the CREATE TABLE statement
Immediately after creation, the MQT is empty and not accessible. It is populated using the REFRESH TABLE statement After running the REFRESH, the MQT becomes accessible CREATE TABLE MQT_SalesByMthStrGrp AS (SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, f.store_id, p.group_id) DATA INITIALLY DEFERRED REFRESH DEFERRED; REFRESH TABLE MQT_SalesByMthStrGrp; RUNSTATS ON TABLE petrus.MQT_SalesByMthStrGrp WITH DISTRIBUTION AND DETAILED INDEXES ALL; -- Confirm that MQT is being used by -- (a) setting CURRENT REFRESH AGE to ANY -- (b) explaining the query used as the MQT’s definition

11 Agenda Motivation Types of MQTs & Maintenance
Sample scenario How MQT can help Types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT

12 Types of MQTs MQT User Maintained System Maintained Refresh DEFERRED
Refresh IMMEDIATE Refresh DEFERRED Explain each one means WITHOUT staging table WITH staging table

13 MQT Maintenance Process of synchronizing the MQT data with changes to the underlying tables Incremental Maintenance Immediate Propagate Immediate Applied (IPIA) Full Maintenance Hybrid Immediate Propagate Deferred Applied (IPDA)

14 Incremental Maintenance
the ability to maintain the MQT without the need to access the modified base table, only using the delta information (newly inserted/updated/deleted data) and the MQT itself. REFRESH IMMEDIDATE MQTs are maintained incrementally during: INSERT/UPDATE/DELETE of base table processing REFRESH TABLE processing after LOAD INSERT into the base table. Immediate Propagate, Immediate Apply For performance reasons, only incrementally maintainable MQTs can be defined as REFRESH IMMEDIATE (hence more restrictions in MQT definition) Immediate Propagate Immediate Apply Pair Delta to MQT Combine old and new values Delta Aggregate MQT base tables Base Tables Delta select/join Primary-key: grouping columns

15 Full Maintenance MQTs are fully rebuilt by re-evaluating the query in the MQT definition REFRESH DEFERRED MQTs (without staging table): not maintained during INSERT/UPDATE/DELETE operations on the base table. synchronized with base table using the REFRESH TABLE statement MQT Aggregate select/join base tables Base Tables

16 Hybrid - Immediate Propagate, Deferred Applied
REFRESH DEFERRED MQTs with staging table Delta changes to MQT are computed upon INSERT/UPDATE/DELETE of base tables Delta aggregate is recorded in a separate staging table Subsequent REFRESH of MQT can be done incrementally Immediate Propagate, Deferred Apply One staging table per MQT CREATE TABLE SALES_BY_YEAR_STG FOR SALES_BY_YEAR PROPAGATE IMMEDIATE; Immediate Propagate Deferred Apply Pair Delta to AST Combine old and new values Delta Aggregate AST Net-effect base tables base tables Delta select/join Base Tables base tables Primary-key: grouping columns Staging Tables Staging

17 User Maintained MQT User is responsible for maintaining MQT content
REFRESH TABLE disallowed Insert/update/delete on MQT allowed Must be REFRESH DEFERRED Scenarios where user maintained MQTs are For efficiency reasons, when the user is convinced that (s)he can implement MQT maintenance far more efficiently than the mechanisms used by DB2. For leveraging existing “user maintained” summaries, where the user wants DB2 to automatically consider them for optimization for new ad hoc queries being executed against the base tables.

18 Refresh Considerations
Refresh Immediate Refresh Deferred (WITH staging table) Refresh Deferred (NO staging table) Latency of Data Always in sync Up to last REFRESH Transaction Overhread upon base table modification Yes No Incremental Refresh No need, except after LOAD INSERT Lock Contention during base table modifications Potential Less

19 Agenda Motivation Types of MQTs & Maintenance
Sample scenario How MQT can help Types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT

20 MQT Matching Route Query to MQT
MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id Process to automatically use the MQT to derive results for a query for improved performance Q9: Original Query SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = GROUP BY t.month_id, f.store_id, p.group_id Q9: Rewritten Query SELECT m.month_id, m.store_id, m.group_id, m.total_sales FROM MQT_SalesByMthStrGrp m WHERE m.month_id = Route Query to MQT

21 What queries can benefit from an MQT?
that matches an MQT perfectly! with predicates that are a subset of those in MQT with more table joins than MQT (re-join) with fewer table joins than MQT (need RI constraint) with columns not in MQT definition (need functional dependencies) at a higher aggregation level than an MQT

22 Query with predicates that are a subset of those in MQT
Fact GB JOIN Dim2 Dim1 MQT Q9 Bob’s Q Joe’s Q Compute Once & Reuse Month=200812 Store=McLean Group=Laser Printer Fact GB JOIN Dim2 Dim1 Q9 Month=200812 Fact GB JOIN Dim2 Dim1 Joe’s Query Group=Laser Printer Bob’s Query Store=McLean In a data warehouse environment, users often issue queries repetitively against large volumes of data with minor variations in query's predicates.

23 Query with more table joins than MQT
If the extra table join is through a column in group-by list, match and re-join with tables not in MQT Rewritten Query: GB Original Query: GB Re-join Store MQT Store Time GB MQT: In a data warehouse environment, users often issue queries repetitively against large volumes of data with minor variations in query's predicates. Sales Product Time Sales Product

24 Query with more table joins than MQT (Example)
Rewritten Query SELECT m.month_id, s.region_id, m.group_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m, store s WHERE m.stored_id = s.store_id GROUP BY m.month_id, s.region_id, m.group_id Original Query SELECT t.month_id, s.region_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t, store s WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND f.store_id = s.store_id GROUP BY t.month_id, s.region_id, p.group_id Route Query to MQT MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id

25 Queries with fewer joins than MQT
In general, we can’t route queries to an MQT that has more table joins because the extra joins affects the MQT content However, if the extra joins in MQT are RI-joins, i.e. joins with RI parent table with join predicate of the form: child.fk = parent.pk They are “look-up” joins that do not add or delete rows RI-join is common between Fact and Dimension tables. Need to declare RI constraints (even just informational constraints) to tell DB2 about it GB Query: MQT: GB RI-join Product Sales Time Sales Time

26 Queries with fewer joins than MQT (Example)
Rewritten Query SELECT m.month_id, m.store_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m GROUP BY m.month_id, m.store_id Original Query SELECT t.month_id, f.store_id, sum(amount) as total_sales FROM sales f, time t WHERE f.date_id = t.date_id GROUP BY t.month_id, f.store_id Route Query to MQT alter table sales add constraint fk_product foreign key (product_id) references product; Sales (product_id int NOT NULL) MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id

27 Importance of NOT NULL Constraint on Foreign Key Columns
Declaring NOT NULL constraint on the foreign key column of the child table is important! If a row of column SALES.product_id has null value: the row will not qualify the join predicate f.product_id = p.product_id. The MQT would be missing some rows that are needed by the query, hence incorrect to route. MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id

28 Queries at higher aggregation level than an MQT
MQT for the Store-Month-Group slice Queries that access data at or above that slice can be satisfied by the MQT with some additional aggregating. Querying more detailed data below the slice, such as Day or Product, cannot be routed to this MQT. Product Dimension Division Level 5 Line Level 4 Time Dimension Family Level 3 Year Store Dimension Quarter Group Level 2 Region Month Class Level 1 Store Day Product Level 0 Sales Fact

29 Queries at a higher level aggregation - Example
MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id Functional Dependency TIME (month_id→quarter_id) Original Query SELECT t.quarter_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.quarter_id, f.store_id, p.group_id Rewritten Query SELECT t.quarer_id, m.store_id, m.group_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m, time t WHERE m.month_id = t.month_id GROUP BY t.quarter_id, m.store_id, m.group_id Route Query to MQT

30 Functional Dependencies
Product Dimension X functionally determines Y iff each X value is associated with precisely one Y value. By definition, all columns of a relation are functionally dependent on the unique/primary key. De-normalization introduces functional dependencies between non-key columns. Between ID and descriptive columns Example: group_id → group_desc Between different levels of the hierarchy Example: group_id → (family_id, line_id, division_id) Division Line Family Group Class Product

31 Functional Dependencies in DB2
User can help DB2 identifies the functional dependencies among non-key columns by declaring NOT ENFORCED FD constraints: ALTER TABLE PRODUCT ADD CHECK((group_desc) DETERMINED BY group_id) NOT ENFORCED ENABLE QUERY OPTIMIZATION During MQT routing, DB2 will re-join with the table to pick up the “dependents”

32 Functional Dependencies - Example
Query at year-region-division level: Select t.year_id, s.region_desc, p.division_desc, sum(amount) as total_sales ... GROUP BY t.year_id, s.region_desc, p.division_desc; MQT at the month-store- group aggregate level: SELECT t.month_id, t.year_id, f.store_id, s.region_id, s.region_desc, p.group_id, p.division_id, p.division_desc, FROM sales f, product p, time t ,stored s WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND f.store_id = s.store_id GROUP BY t.month_id, s.store_id, p.group_id ,p.division_id, p.division_desc MQT PRODUCT TIME GB (year-region-division) FD re-join (month_id, year_id) (group_id, division_desc) (store_id, region_desc) (month-store-group) FD: TIME: month_id -> year_id UK: STORE store_id -> (region_id, region_desc) FD: PRODUCT: group_id -> (division_id, division_desc)

33 Functional Dependencies in MQT
It is a good practice to declare functional dependencies to keep the minimum amount of dimensional information in the MQT. This will: keep the size of the MQT small Keep the MQT unaffected by dimension table updates - minimize the need for maintenance

34 MQT Matching – Insurance Policy
SQL Query MQT matching performed during Query Rewrite Rewrite transformations mostly rule/heuristic based Insurance Policy: Optimizer Cost based comparison between: Access plan with MQT routing Access plan WITHOUT MQT routing Cheapest plan wins - against choosing a wrong heuristic Parser Query Graph Semantics With and without Aggregate MQT Query Rewrite Optimizer Code Generator

35 Agenda Motivation Types of MQTs & Maintenance
Sample scenario How MQT can help Types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT

36 Design Considerations
On usage of MQT Space requirement – MQTs are materialized tables On MQT definition Level of aggregation vs. query processing time Too high level – few queries can route to it Too low level – costly compensation time Workload-based vs. Model-based Numbers of MQTs Store Month Product Class Group Family Line Division Quarter Year Region Sales Fact Product Dim. Time Dim. Store Dim. Level 5 Level 1 Level 2 Level 3 Level 4 Level 0 Day Model-based – Infosphere Warehouse – Optimization Advisor DB2 Design Advisor

37 Refreshing with finer granularity MQTs
DB2 9 allows the use of existing MQTs to be used by higher level MQTs during REFRESH Matching done during REFRESH It is very important to aggregate in the right order Example : MQT1 : GROUP BY DATE, MONTH, YEAR MQT2 : GROUP BY MONTH, YEAR REFRESH MQT1 REFRESH MQT2  uses MQT1

38 Another Use of Replicated MQTs
In partitioned database, colocation of rows involved in different tables => efficient join (colocated joins) In a typical star schema setup, the fact table can be co-located with at most one dimension table. The remaining (smaller) dimension tables are frequently replicated to all nodes of the fact table Replicated MQTs are used to achieve co-location of data for efficient join processing in partitioned database environment Store Store Product ProdID1 Product ProdID2 Product ProdID3 Sales ProdID1 Sales ProdID2 Sales ProdID3

39 Summary An MQT is a table that stores pre-computed query results.
Queries can be matched to an MQT to improve performance. Different types of MQTs and their maintenance: immediate, deferred (with/without staging table), user maintained replicated Queries can be matched to an MQT that has: Predicates that are superset of those in queries Finer grained aggregation level More/few table joins Not null constraint, foreign key constraints and functional dependencies can be defined to help MQT matching

40 References Redbooks (http://www.redbooks.ibm.com) Related Papers
DB2 UDB's High Function Business Intelligence in e-Business (SG ) Related Papers Wolfgang Lehner, Richard Sidle, Hamid Pirahesh, Roberta Cochrane: Maintenance of Automatic Summary Tables. SIGMOD Conference 2000: Markos Zaharioudakis, Roberta Cochrane, George Lapis, Hamid Pirahesh, Monica Urata: Answering Complex SQL Queries Using Automatic Summary Tables. SIGMOD Conference 2000:

41 Thank You

42 Backup Slides

43 Explain Messages Sample Explain Messages:
EXP0054W The following REFRESH DEFERRED MQT was not considered for rewrite matching because the CURRENT REFRESH AGE register was not set to ANY: “PETRUS.MQT1". EXP0078W The following MQT was not used in rewrite matching because an equivalent or better candidate was available: “PETRUS.MQT1”. EXP0073W (1) The following MQT was not used in rewrite matching because one or more predicates from the query could not be matched with the MQT: “PETRUS.MQT3". EXP0079W The following MQT was not used in the final access plan, because the plan cost with this MQTs was more expensive: “PETRUS.MQT3".

44 Grouping Sets and Super-groups
GROUPING SETS = UNION of GROUP BY Example: GROUPING SETS ((a,b,c), (a,d,e)) = GROUP BY a,b,c UNION ALL GROUP BY a,d,e Simple GROUP BY = GROUP SETS with single element Example: GROUP BY a,b,c = GROUP BY GROUPING SETS ((a,b,c)) ROLLUP (a,b,c) = GROUPING SETS ((a,b,c), (a,b), (a), ()) CUBE – cross tabulation CUBE (a,b,c) = GROUPING SETS ((a,b,c), (a,b), (b,c), (a,c), (a), (b), (c), ())


Download ppt "A First Look at Materialized Query Table (MQT) in DB2 LUW"

Similar presentations


Ads by Google