Star Transformations
Tony Hasler, Anvil Computer Services Ltd., UKOUG Birmingham 2012

Who is Tony Hasler?
Google “Tony Hasler”! My blog contains all the material from this presentation on the front page.
Acknowledgments
− SQL Tuning by Dan Tow
− Optimizer team blog (Sunil Chakkappen, 2010)
Examples are based on Enterprise Edition.

Problem statement: distributed filters
Generally, we want to access very large tables using an index on the most selective filter we can find.
− Small(ish) tables may instead be accessed by a full table scan using multi-block reads.
− Partitioning by column X is an alternative to creating an index on X (and a local index on Y is an alternative to a multi-column index on X,Y).
This is because we want to avoid reading data from a table only to reject it later on.
The most selective filter may or may not be a join condition.
However, sometimes a combination of filters provides much stronger selectivity than any one filter alone. When such a combination involves join conditions, Dan Tow calls the filters distributed filters.

Query using the SH example schema
SELECT prod_name, cust_first_name, time_id, amount_sold
  FROM customers c, products p, sales s
 WHERE s.cust_id = c.cust_id
   AND s.prod_id = p.prod_id
   AND c.cust_last_name = 'Everett'
   AND p.prod_category = 'Electronics';
NOTE: For convenience I will refer to the SALES table as a FACT table and to the CUSTOMERS and PRODUCTS tables as DIMENSION tables, but star transformations are NOT restricted to star schemas!

What join order do we want?
There are 918,843 rows in the SALES table:
− 116,267 of these rows match the product filter;
− 740 match the customer filter;
− 115 match both the customer and product filters.
If we begin by joining the SALES table with either PRODUCTS or CUSTOMERS we end up reading far too many rows from SALES: over 80% of the rows read from SALES will subsequently be discarded. If we instead start with a Cartesian join of PRODUCTS and CUSTOMERS we end up with 1,040 (13 × 80) combinations, making even a multi-column index on SALES expensive.
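
The discard percentages follow directly from the row counts above. As an illustration only, this small Python sketch (the function name `discarded_fraction` is ours, not Oracle's) computes the fraction of SALES rows that each single-dimension driving order reads and then throws away, using the counts quoted on the slide.

```python
# Row counts quoted on the slide for the SH schema.
TOTAL_SALES = 918_843
MATCH_PRODUCT = 116_267  # SALES rows with an 'Electronics' product
MATCH_CUSTOMER = 740     # SALES rows for customers named 'Everett'
MATCH_BOTH = 115         # SALES rows satisfying both filters


def discarded_fraction(rows_read: int, rows_kept: int) -> float:
    """Fraction of the SALES rows read that a later join rejects."""
    return 1 - rows_kept / rows_read


if __name__ == "__main__":
    for driver, rows_read in [("PRODUCTS", MATCH_PRODUCT),
                              ("CUSTOMERS", MATCH_CUSTOMER)]:
        lost = discarded_fraction(rows_read, MATCH_BOTH)
        print(f"Driving from {driver}: read {rows_read:,} SALES rows, discard {lost:.1%}")
    print(f"Combined filter keeps {MATCH_BOTH / TOTAL_SALES:.4%} of SALES")
```

Driving from CUSTOMERS discards about 84.5% of the SALES rows read, and driving from PRODUCTS about 99.9%, whereas the combined filter needs only the 115 rows that are actually returned.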

What join order do we want?
1. PRODUCTS (13 rows) → SALES (116,267 of 918,843 rows) → CUSTOMERS (115 rows)
2. CUSTOMERS (80 rows) → SALES (740 of 918,843 rows) → PRODUCTS (115 rows)
3. CUSTOMERS (80 rows) × PRODUCTS (13 rows) = 1,040 combinations → SALES (115 of 918,843 rows)
They are all bad!

Wouldn’t it be nice if…
PRODUCTS (13) → SALES_PROD_IDX (116,267 of 918,843 rows)
CUSTOMERS (115) → SALES_CUST_IDX (740 of 918,843 rows)
Merged bitmap (115 of 918,843 rows) → SALES (115 of 918,843 rows)

Rewrite attempt 1
SELECT time_id, amount_sold
  FROM sales
 WHERE cust_id = 4117
   AND prod_id = 20;
This query returns too few rows because it picks only one customer and one product.
It also returns too few columns because we don’t have the details from the dimension tables.

Enterprise Edition execution plan
| Id  | Operation                           | Name           |
|   0 | SELECT STATEMENT                    |                |
|   1 |  PARTITION RANGE ALL                |                |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID | SALES          |
|   3 |    BITMAP CONVERSION TO ROWIDS      |                |
|   4 |     BITMAP AND                      |                |
|*  5 |      BITMAP INDEX SINGLE VALUE      | SALES_CUST_BIX |
|*  6 |      BITMAP INDEX SINGLE VALUE      | SALES_PROD_BIX |
Here we see selection filters being combined in a straightforward way.
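
As a toy illustration of this plan (Python, with invented rows, and not Oracle's real bitmap format), assume a bitmap index is a dictionary mapping each key value to an integer whose bit n is set when row n holds that value. BITMAP INDEX SINGLE VALUE is then a dictionary lookup, BITMAP AND is the `&` operator, and BITMAP CONVERSION TO ROWIDS lists the positions of the set bits. The helper names are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Toy bitmap index: value -> int bitmask of the row positions holding it."""
    index = defaultdict(int)
    for pos, row in enumerate(rows):
        index[row[column]] |= 1 << pos
    return dict(index)


def bitmap_to_rowids(bitmap):
    """BITMAP CONVERSION TO ROWIDS: positions of the set bits, in order."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]


# Invented sample rows standing in for SALES.
sales = [
    {"cust_id": 4117, "prod_id": 20, "amount_sold": 10.0},
    {"cust_id": 4117, "prod_id": 30, "amount_sold": 12.5},
    {"cust_id": 5000, "prod_id": 20, "amount_sold": 7.0},
    {"cust_id": 4117, "prod_id": 20, "amount_sold": 3.0},
]

sales_cust_bix = build_bitmap_index(sales, "cust_id")
sales_prod_bix = build_bitmap_index(sales, "prod_id")

# Two BITMAP INDEX SINGLE VALUE lookups followed by BITMAP AND.
matching = sales_cust_bix.get(4117, 0) & sales_prod_bix.get(20, 0)
rowids = bitmap_to_rowids(matching)

if __name__ == "__main__":
    for rid in rowids:  # TABLE ACCESS BY ... ROWID
        print(rid, sales[rid])
```

Only rows 0 and 3 belong to both customer 4117 and product 20, so only those two rows are fetched from the table.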

Rewrite attempt 2
SELECT time_id, amount_sold
  FROM sales s
 WHERE s.cust_id IN (SELECT cust_id FROM customers WHERE cust_last_name = 'Everett')
   AND s.prod_id IN (SELECT prod_id FROM products WHERE prod_category = 'Electronics');
This query returns fewer columns than the original query because the columns from the dimension tables are missing.
It returns the same number of rows as the original query, provided that CUST_ID is a unique key in CUSTOMERS and PROD_ID is unique within PRODUCTS.
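
The proviso about unique keys is easy to check on a toy database. This Python sketch uses the standard-library `sqlite3` module (not Oracle) with an invented two-table subset of the schema. It counts the rows returned by the join form and by the IN-subquery form, first with a unique CUST_ID and then after inserting a duplicate key.

```python
import sqlite3

JOIN_SQL = """
    SELECT COUNT(*) FROM customers c, sales s
     WHERE s.cust_id = c.cust_id AND c.cust_last_name = 'Everett'
"""
SEMI_JOIN_SQL = """
    SELECT COUNT(*) FROM sales s
     WHERE s.cust_id IN (SELECT cust_id FROM customers
                          WHERE cust_last_name = 'Everett')
"""


def make_db():
    """In-memory database with invented rows; CUST_ID is deliberately unconstrained."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cust_id INTEGER, cust_last_name TEXT);
        CREATE TABLE sales (cust_id INTEGER, amount_sold REAL);
        INSERT INTO customers VALUES (1, 'Everett'), (2, 'Smith');
        INSERT INTO sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
    """)
    return conn


def row_counts(conn):
    """Return (row count of the join form, row count of the IN-subquery form)."""
    return (conn.execute(JOIN_SQL).fetchone()[0],
            conn.execute(SEMI_JOIN_SQL).fetchone()[0])


if __name__ == "__main__":
    conn = make_db()
    print("unique key:   ", row_counts(conn))
    conn.execute("INSERT INTO customers VALUES (1, 'Everett')")  # duplicate key
    print("duplicate key:", row_counts(conn))
```

With a unique key both forms return the same count. Once CUST_ID 1 appears twice, the join returns each matching sale twice while the IN form still returns each sale once.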

Enterprise Edition execution plan
| Id  | Operation                       | Name                 |
|   0 | SELECT STATEMENT                |                      |
|   1 |  VIEW                           | VW_ST_EABF22F5       |
|   2 |   NESTED LOOPS                  |                      |
|   3 |    PARTITION RANGE ALL          |                      |
|   4 |     BITMAP CONVERSION TO ROWIDS |                      |
|   5 |      BITMAP AND                 |                      |
|   6 |       BITMAP MERGE              |                      |
|   7 |        BITMAP KEY ITERATION     |                      |
|   8 |         BUFFER SORT             |                      |
|*  9 |          TABLE ACCESS FULL      | CUSTOMERS            |
|* 10 |         BITMAP INDEX RANGE SCAN | SALES_CUST_BIX       |
|  11 |       BITMAP MERGE              |                      |
|  12 |        BITMAP KEY ITERATION     |                      |
|  13 |         BUFFER SORT             |                      |
|* 14 |          VIEW                   | index$_join$_051     |
|* 15 |           HASH JOIN             |                      |
|* 16 |            INDEX RANGE SCAN     | PRODUCTS_PROD_CAT_IX |
|  17 |            INDEX FAST FULL SCAN | PRODUCTS_PK          |
|* 18 |         BITMAP INDEX RANGE SCAN | SALES_PROD_BIX       |
|  19 |    TABLE ACCESS BY USER ROWID   | SALES                |

Star transformations make an appearance
Multi-column indexes and the INDEX_COMBINE hint (and the deprecated AND_EQUAL hint) can be used to deal with combined selection filters.
Star transformations are used to solve the distributed filter problem when at least one of the filter predicates is a join condition with another table.
They require Enterprise Edition and the initialisation parameter STAR_TRANSFORMATION_ENABLED set to TRUE or TEMP_DISABLE.
They are frequently used with “star” or “snowflake” schemas, hence the name, but can be used elsewhere.
BITMAP KEY ITERATION is the signature of a star transformation.
BITMAP KEY ITERATION runs a loop performing lookups on indexes (usually, but not necessarily, bitmap indexes).
The BITMAP MERGE operation merges the output of the loop.
In the example, the BITMAP AND operation combines the results of two BITMAP KEY ITERATION operations.
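
The key iteration, merge and AND steps can be modelled in a few lines of Python. This is a toy sketch with invented data, not Oracle's implementation: it assumes a bitmap index on the fact table is a dictionary from key value to an integer bitmask of row positions, and the helper names are hypothetical. For each dimension it loops over the keys of the rows that pass the filter, looks up each key's bitmap and ORs them together; the per-dimension results are then ANDed.

```python
from collections import defaultdict
from functools import reduce


def build_bitmap_index(rows, column):
    """Toy bitmap index on the fact table: value -> int bitmask of row positions."""
    index = defaultdict(int)
    for pos, row in enumerate(rows):
        index[row[column]] |= 1 << pos
    return dict(index)


def key_iteration_merge(dim_rows, predicate, key, fact_index):
    """BITMAP KEY ITERATION plus BITMAP MERGE for one dimension."""
    merged = 0
    for row in dim_rows:                           # driving row source
        if predicate(row):
            merged |= fact_index.get(row[key], 0)  # one index probe per key
    return merged


def star_rowids(bitmaps):
    """BITMAP AND across dimensions, then BITMAP CONVERSION TO ROWIDS."""
    combined = reduce(lambda a, b: a & b, bitmaps)
    return [pos for pos in range(combined.bit_length()) if (combined >> pos) & 1]


# Invented sample data shaped like the SH tables.
customers = [{"cust_id": 1, "cust_last_name": "Everett"},
             {"cust_id": 2, "cust_last_name": "Smith"},
             {"cust_id": 3, "cust_last_name": "Everett"}]
products = [{"prod_id": 10, "prod_category": "Electronics"},
            {"prod_id": 20, "prod_category": "Software"},
            {"prod_id": 30, "prod_category": "Electronics"}]
sales = [{"cust_id": 1, "prod_id": 10},   # matches both filters
         {"cust_id": 1, "prod_id": 20},   # customer filter only
         {"cust_id": 2, "prod_id": 30},   # product filter only
         {"cust_id": 3, "prod_id": 30},   # matches both filters
         {"cust_id": 2, "prod_id": 20}]   # matches neither

cust_bitmap = key_iteration_merge(
    customers, lambda c: c["cust_last_name"] == "Everett", "cust_id",
    build_bitmap_index(sales, "cust_id"))
prod_bitmap = key_iteration_merge(
    products, lambda p: p["prod_category"] == "Electronics", "prod_id",
    build_bitmap_index(sales, "prod_id"))

if __name__ == "__main__":
    print(star_rowids([cust_bitmap, prod_bitmap]))
```

Rows 1 and 2 each satisfy only one filter, so they drop out at the AND and are never fetched from the fact table.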

Rewrite 3: getting our missing columns back
WITH q1 AS
  (SELECT /*+ no_merge */ time_id, amount_sold, prod_id, cust_id
     FROM sales s
    WHERE s.cust_id IN (SELECT cust_id FROM customers WHERE cust_last_name = 'Everett')
      AND s.prod_id IN (SELECT prod_id FROM products WHERE prod_category = 'Electronics'))
SELECT /*+ leading(p s c) use_hash(c) use_hash(p) swap_join_inputs(c) no_swap_join_inputs(s) */
       prod_name, cust_first_name, time_id, amount_sold
  FROM customers c, products p, q1 s
 WHERE s.cust_id = c.cust_id
   AND s.prod_id = p.prod_id
   AND p.prod_category = 'Electronics';
Note: the hints and the last predicate are for demonstration purposes only.
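
Rewrite 3 gets the dimension columns back by joining the already filtered rows of Q1 to PRODUCTS and CUSTOMERS. As a rough Python illustration with invented rows, each hash join in the hinted plan amounts to building a dictionary on one input and probing it with the other; `hash_join` is our own helper, not an Oracle API, and the PRODUCTS rows are assumed to be already restricted to 'Electronics'.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Hash the build input, then stream the probe input through it."""
    table = {}
    for b in build_rows:
        table.setdefault(b[build_key], []).append(b)
    for p in probe_rows:
        for b in table.get(p[probe_key], []):
            yield {**b, **p}


# Invented rows: Q1 stands for what the star transformation returns from SALES.
q1 = [{"time_id": "2001-01-10", "amount_sold": 10.0, "prod_id": 10, "cust_id": 1},
      {"time_id": "2001-02-11", "amount_sold": 4.0, "prod_id": 30, "cust_id": 3}]
products = [{"prod_id": 10, "prod_name": "Camera"},
            {"prod_id": 30, "prod_name": "Radio"}]
customers = [{"cust_id": 1, "cust_first_name": "Ann"},
             {"cust_id": 3, "cust_first_name": "Bob"}]

# leading(p s c): PRODUCTS is joined to Q1 first; CUSTOMERS is then the
# build input of the second hash join, as swap_join_inputs(c) requests.
with_products = hash_join(products, q1, "prod_id", "prod_id")
joined = hash_join(customers, with_products, "cust_id", "cust_id")
result = [(r["prod_name"], r["cust_first_name"], r["time_id"], r["amount_sold"])
          for r in joined]

if __name__ == "__main__":
    for row in result:
        print(row)
```

Because Q1 has already been cut down to the final rows, the join-back touches only a handful of fact rows; the cost is the extra visit to each dimension.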

Execution plan for rewrite 3
| Id  | Operation                          | Name                 |
|   0 | SELECT STATEMENT                   |                      |
|*  1 |  HASH JOIN                         |                      |
|   2 |   TABLE ACCESS FULL                | CUSTOMERS            |
|*  3 |   HASH JOIN                        |                      |
|   4 |    TABLE ACCESS BY INDEX ROWID     | PRODUCTS             |
|*  5 |     INDEX RANGE SCAN               | PRODUCTS_PROD_CAT_IX |
|   6 |    VIEW                            |                      |
|   7 |     VIEW                           | VW_ST_EABF22F5       |
|   8 |      NESTED LOOPS                  |                      |
|   9 |       PARTITION RANGE ALL          |                      |
|  10 |        BITMAP CONVERSION TO ROWIDS |                      |
|  11 |         BITMAP AND                 |                      |
|  12 |          BITMAP MERGE              |                      |
|  13 |           BITMAP KEY ITERATION     |                      |
|  14 |            BUFFER SORT             |                      |
|* 15 |             TABLE ACCESS FULL      | CUSTOMERS            |
|* 16 |            BITMAP INDEX RANGE SCAN | SALES_CUST_BIX       |
|  17 |          BITMAP MERGE              |                      |
|  18 |           BITMAP KEY ITERATION     |                      |
|  19 |            BUFFER SORT             |                      |
|* 20 |             VIEW                   | index$_join$_060     |
|* 21 |              HASH JOIN             |                      |
|* 22 |               INDEX RANGE SCAN     | PRODUCTS_PROD_CAT_IX |
|  23 |               INDEX FAST FULL SCAN | PRODUCTS_PK          |
|* 24 |            BITMAP INDEX RANGE SCAN | SALES_PROD_BIX       |
|  25 |       TABLE ACCESS BY USER ROWID   | SALES                |
Operations 7 to 25 are the same as the plan for rewrite 2.

Thoughts so far?
Do we really need to rewrite our simple query in such a complex way?
Answer: no. If you set STAR_TRANSFORMATION_ENABLED=TEMP_DISABLE, the un-hinted execution plan for the original query is (almost) identical to that of rewrite 3!
But why do we need to access the CUSTOMERS table twice? Couldn’t we cache the columns from the select list the first time round?
Answer: the optimizer can do exactly that, using a temporary table, if STAR_TRANSFORMATION_ENABLED=TRUE.
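
The difference between TEMP_DISABLE and TRUE here is whether the filtered CUSTOMERS rows are read twice (once to drive the key iteration, once for the final join) or loaded once into a temporary table and reused. The deliberately simplified Python model below uses a hypothetical `CountingTable` class that just counts full scans; it ignores the cost of writing the temporary table, which in Oracle is part of a cost-based decision.

```python
class CountingTable:
    """Hypothetical stand-in for a table that counts its full scans."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.scans = 0

    def full_scan(self):
        self.scans += 1
        return iter(self._rows)


def everett(rows):
    """Apply the customer filter from the example query."""
    return [r for r in rows if r["cust_last_name"] == "Everett"]


def customer_inputs(customers, use_temp_table):
    """Return (keys for the key iteration, rows for the final join)."""
    if use_temp_table:
        temp = everett(customers.full_scan())    # LOAD AS SELECT, done once
        return [r["cust_id"] for r in temp], temp
    keys = [r["cust_id"] for r in everett(customers.full_scan())]
    return keys, everett(customers.full_scan())  # second scan for the join


def make_customers():
    return CountingTable([{"cust_id": 1, "cust_last_name": "Everett"},
                          {"cust_id": 2, "cust_last_name": "Smith"}])


if __name__ == "__main__":
    for flag in (False, True):
        table = make_customers()
        customer_inputs(table, use_temp_table=flag)
        print(f"use_temp_table={flag}: CUSTOMERS scanned {table.scans} time(s)")
```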

Execution plan for the original query when STAR_TRANSFORMATION_ENABLED=TRUE
| Id  | Operation                          | Name                      |
|   0 | SELECT STATEMENT                   |                           |
|   1 |  TEMP TABLE TRANSFORMATION         |                           |
|   2 |   LOAD AS SELECT                   | SYS_TEMP_0FD9D6609_38DC9C |
|*  3 |    TABLE ACCESS FULL               | CUSTOMERS                 |
|*  4 |   HASH JOIN                        |                           |
|   5 |    TABLE ACCESS FULL               | SYS_TEMP_0FD9D6609_38DC9C |
|*  6 |    HASH JOIN                       |                           |
|   7 |     TABLE ACCESS BY INDEX ROWID    | PRODUCTS                  |
|*  8 |      INDEX RANGE SCAN              | PRODUCTS_PROD_CAT_IX      |
|   9 |     VIEW                           | VW_ST_62EEF96F            |
|  10 |      NESTED LOOPS                  |                           |
|  11 |       PARTITION RANGE ALL          |                           |
|  12 |        BITMAP CONVERSION TO ROWIDS |                           |
|  13 |         BITMAP AND                 |                           |
|  14 |          BITMAP MERGE              |                           |
|  15 |           BITMAP KEY ITERATION     |                           |
|  16 |            BUFFER SORT             |                           |
|  17 |             TABLE ACCESS FULL      | SYS_TEMP_0FD9D6609_38DC9C |
|* 18 |            BITMAP INDEX RANGE SCAN | SALES_CUST_BIX            |
|  19 |          BITMAP MERGE              |                           |
|  20 |           BITMAP KEY ITERATION     |                           |
|  21 |            BUFFER SORT             |                           |
|* 22 |             VIEW                   | index$_join$_051          |
|* 23 |              HASH JOIN             |                           |
|* 24 |               INDEX RANGE SCAN     | PRODUCTS_PROD_CAT_IX      |
|  25 |               INDEX FAST FULL SCAN | PRODUCTS_PK               |
|* 26 |            BITMAP INDEX RANGE SCAN | SALES_PROD_BIX            |
|  27 |       TABLE ACCESS BY USER ROWID   | SALES                     |

So is that it?
Star transformations are only a good idea if the selectivity improvement outweighs the cost of the bitmap operations.
Star transformations may occasionally be used in mixed OLTP environments by converting the output of b-tree index scans to bitmaps.
But execution plans using star transformations in a data warehouse can almost always be optimized further:
− Denormalize: e.g. add columns from CUSTOMERS and PRODUCTS to the SALES table. You can then create multi-column indexes or use INDEX_COMBINE operations.
− Or, if you don’t like wasting disk space…

Bitmap join indexes
ALTER TABLE products MODIFY CONSTRAINT products_pk VALIDATE;
ALTER TABLE customers MODIFY CONSTRAINT customers_pk VALIDATE;
CREATE BITMAP INDEX sales_prod_category_bjx
    ON sales (p.prod_category)
  FROM products p, sales s
 WHERE s.prod_id = p.prod_id
 LOCAL;
CREATE BITMAP INDEX sales_cust_ln_bjx
    ON sales (c.cust_last_name)
  FROM customers c, sales s
 WHERE c.cust_id = s.cust_id
 LOCAL;
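
Conceptually, a bitmap join index stores, for each value of a dimension column, the bitmap of fact rows that join to a dimension row with that value, so the join is paid for once at index-build time. The Python sketch below models this with invented data and hypothetical helper names, assuming a bitmap is an integer whose bit n marks fact row n. Its uniqueness check mirrors why the slide validates the dimension primary keys first: each fact row must map to exactly one dimension value.

```python
from collections import defaultdict


def build_bitmap_join_index(fact_rows, fact_fk, dim_rows, dim_pk, dim_column):
    """Dimension column value -> bitmask of the fact rows joining to that value."""
    value_by_key = {}
    for d in dim_rows:
        if d[dim_pk] in value_by_key:  # the join key must be unique
            raise ValueError(f"duplicate dimension key {d[dim_pk]!r}")
        value_by_key[d[dim_pk]] = d[dim_column]
    index = defaultdict(int)
    for pos, f in enumerate(fact_rows):
        if f[fact_fk] in value_by_key:
            index[value_by_key[f[fact_fk]]] |= 1 << pos
    return dict(index)


def bitmap_to_rowids(bitmap):
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]


customers = [{"cust_id": 1, "cust_last_name": "Everett"},
             {"cust_id": 2, "cust_last_name": "Smith"},
             {"cust_id": 3, "cust_last_name": "Everett"}]
products = [{"prod_id": 10, "prod_category": "Electronics"},
            {"prod_id": 20, "prod_category": "Software"},
            {"prod_id": 30, "prod_category": "Electronics"}]
sales = [{"cust_id": 1, "prod_id": 10}, {"cust_id": 1, "prod_id": 20},
         {"cust_id": 2, "prod_id": 30}, {"cust_id": 3, "prod_id": 30},
         {"cust_id": 2, "prod_id": 20}]

sales_cust_ln_bjx = build_bitmap_join_index(
    sales, "cust_id", customers, "cust_id", "cust_last_name")
sales_prod_category_bjx = build_bitmap_join_index(
    sales, "prod_id", products, "prod_id", "prod_category")

# Query time: two single-value lookups and a BITMAP AND, no dimension scan.
rowids = bitmap_to_rowids(sales_cust_ln_bjx.get("Everett", 0)
                          & sales_prod_category_bjx.get("Electronics", 0))

if __name__ == "__main__":
    print(rowids)
```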

Execution plan with bitmap join indexes
| Id  | Operation                              | Name                    |
|   0 | SELECT STATEMENT                       |                         |
|   1 |  NESTED LOOPS                          |                         |
|   2 |   NESTED LOOPS                         |                         |
|   3 |    HASH JOIN                           |                         |
|   4 |     TABLE ACCESS BY INDEX ROWID        | PRODUCTS                |
|   5 |      INDEX RANGE SCAN                  | PRODUCTS_PROD_CAT_IX    |
|   6 |     PARTITION RANGE ALL                |                         |
|   7 |      TABLE ACCESS BY LOCAL INDEX ROWID | SALES                   |
|   8 |       BITMAP CONVERSION TO ROWIDS      |                         |
|   9 |        BITMAP AND                      |                         |
|  10 |         BITMAP INDEX SINGLE VALUE      | SALES_CUST_LN_BJX       |
|  11 |         BITMAP INDEX SINGLE VALUE      | SALES_PROD_CATEGORY_BJX |
|  12 |    INDEX UNIQUE SCAN                   | CUSTOMERS_PK            |
|  13 |   TABLE ACCESS BY INDEX ROWID          | CUSTOMERS               |

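What about Standard Edition users?
CREATE TABLE sales2 AS SELECT * FROM sales;
CREATE INDEX sales2_cust_id_idx ON sales2 (cust_id);
CREATE INDEX sales2_prod_id_idx ON sales2 (prod_id);
SALES2 is an unpartitioned copy of SALES with ordinary b-tree indexes, standing in for what a Standard Edition user would have.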

Standard Edition alternative
WITH q1 AS
  (SELECT /*+ no_merge */ s.ROWID rid
     FROM customers c, sales2 s
    WHERE s.cust_id = c.cust_id
      AND c.cust_last_name = 'Everett'),
q2 AS
  (SELECT /*+ no_merge */ s.ROWID rid
     FROM products p, sales2 s
    WHERE s.prod_id = p.prod_id
      AND p.prod_category = 'Electronics')
SELECT /*+ no_star_transformation leading(q1 q2 s) use_nl(s) use_nl(p) use_nl(c) */
       prod_name, cust_first_name, time_id, amount_sold
  FROM q1, q2, sales2 s, products p, customers c
 WHERE q1.rid = q2.rid
   AND q1.rid = s.ROWID
   AND s.cust_id = c.cust_id
   AND s.prod_id = p.prod_id;
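
This rewrite replaces the bitmap AND with a join on ROWID: Q1 and Q2 each produce the rowids of SALES2 rows that pass one dimension filter via an ordinary b-tree index, and the hash join of Q1 and Q2 keeps only the rowids present in both. A toy Python version follows, with invented data, hypothetical helper names, and dictionaries of row-position lists standing in for the b-tree indexes.

```python
from collections import defaultdict


def build_index(rows, column):
    """Stand-in for a b-tree index: value -> list of row positions (rowids)."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return dict(index)


def rowids_via_dimension(dim_rows, predicate, key, fact_index):
    """Like Q1/Q2: filter the dimension, then probe the fact-table index."""
    return [rid for d in dim_rows if predicate(d)
            for rid in fact_index.get(d[key], [])]


def hash_join_rowids(q1, q2):
    """HASH JOIN on rowid: build a set from Q1, keep the Q2 rowids found in it."""
    build = set(q1)
    return [rid for rid in q2 if rid in build]


customers = [{"cust_id": 1, "cust_last_name": "Everett"},
             {"cust_id": 2, "cust_last_name": "Smith"},
             {"cust_id": 3, "cust_last_name": "Everett"}]
products = [{"prod_id": 10, "prod_category": "Electronics"},
            {"prod_id": 20, "prod_category": "Software"},
            {"prod_id": 30, "prod_category": "Electronics"}]
sales2 = [{"cust_id": 1, "prod_id": 10}, {"cust_id": 1, "prod_id": 20},
          {"cust_id": 2, "prod_id": 30}, {"cust_id": 3, "prod_id": 30},
          {"cust_id": 2, "prod_id": 20}]

q1 = rowids_via_dimension(customers, lambda c: c["cust_last_name"] == "Everett",
                          "cust_id", build_index(sales2, "cust_id"))
q2 = rowids_via_dimension(products, lambda p: p["prod_category"] == "Electronics",
                          "prod_id", build_index(sales2, "prod_id"))
matches = sorted(hash_join_rowids(q1, q2))

if __name__ == "__main__":
    for rid in matches:  # TABLE ACCESS BY USER ROWID
        print(rid, sales2[rid])
```

Only the index entries are visited for rows 1 and 2; the table itself is read just for the rowids that survive the join.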

Standard Edition execution plan (simulated)
| Id  | Operation                       | Name                 |
|   0 | SELECT STATEMENT                |                      |
|   1 |  NESTED LOOPS                   |                      |
|   2 |   NESTED LOOPS                  |                      |
|   3 |    NESTED LOOPS                 |                      |
|   4 |     NESTED LOOPS                |                      |
|*  5 |      HASH JOIN                  |                      |
|   6 |       VIEW                      |                      |
|   7 |        NESTED LOOPS             |                      |
|*  8 |         TABLE ACCESS FULL       | CUSTOMERS            |
|*  9 |         INDEX RANGE SCAN        | SALES2_CUST_ID_IDX   |
|  10 |       VIEW                      |                      |
|  11 |        NESTED LOOPS             |                      |
|* 12 |         VIEW                    | index$_join$_003     |
|* 13 |          HASH JOIN              |                      |
|* 14 |           INDEX RANGE SCAN      | PRODUCTS_PROD_CAT_IX |
|  15 |           INDEX FAST FULL SCAN  | PRODUCTS_PK          |
|* 16 |         INDEX RANGE SCAN        | SALES2_PROD_ID_IDX   |
|  17 |      TABLE ACCESS BY USER ROWID | SALES2               |
|* 18 |     TABLE ACCESS BY INDEX ROWID | PRODUCTS             |
|* 19 |      INDEX UNIQUE SCAN          | PRODUCTS_PK          |
|* 20 |    INDEX UNIQUE SCAN            | CUSTOMERS_PK         |
|  21 |   TABLE ACCESS BY INDEX ROWID   | CUSTOMERS            |

Some final pieces of trivia
The deprecated STAR hint has nothing to do with star transformations. DON’T USE IT.
The STAR_TRANSFORMATION and NO_STAR_TRANSFORMATION hints can be used to control star transformations.
The FACT hint can be used to specify the table to which the star transformation is applied (although it is usually obvious). The fact table can also be supplied as an argument to the STAR_TRANSFORMATION hint (undocumented, unsupported etc.).
Enabled integrity constraints are not strictly required on the dimension table(s). However, if they are absent, an extra join may be seen to ensure the row count is accurate.
Enabled and validated integrity constraints are required on dimension tables used in bitmap join indexes.
Bitmaps from BITMAP KEY ITERATION operations can be combined with any other bitmaps, including those from bitmap join indexes.

Summary
Star transformations are used to solve the problem of distributed filters when at least one of the filters is a join predicate.
Star transformations can be recognised by the presence of the BITMAP KEY ITERATION operation in an execution plan.
Star transformations are usually found using bitmap indexes in “star” schemas in a data warehouse, but can occasionally be useful in mixed OLTP environments with traditional b-tree indexes and in non-star schemas.
Star transformations, like all bitmap combination operations, rely on the cost of the operation being outweighed by the increased selectivity.
STAR_TRANSFORMATION_ENABLED=TEMP_DISABLE (which stops the optimizer using temporary tables) is a workaround for bugs and should rarely be needed these days.
Performance of queries using star transformations in a data warehouse can almost always be improved by the use of bitmap join indexes.

Questions?