Teradata Join Processing

Teradata Join Processing
Center of Excellence Data Warehousing
Wipro Technologies

Join Processing
- Rows to be joined must be on the same AMP.
- For join processing, copies of some or all of the rows may have to be moved to a common AMP.
- Join plans: product join, merge join, nested join.

Join Processing: general scenarios
- The join column is the primary index (PI) of both tables.
- The join column is the PI of one of the tables.
- The join column is not the PI of either table.

Case 1 - PI of both tables
- Rows taking part in the join are already on the same AMP.
- No data movement is necessary.
- Rows are already in sorted order (within the block).
- This is the best-case scenario.

Case 2 - PI of one of the tables
- One table already has its rows on the target AMPs.
- Rows of the other table need to be redistributed to their target AMPs by the hash code of the join column value.
- If the table is small, the optimizer may choose to duplicate it on all AMPs.
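The redistribution described for Case 2 can be sketched in Python. This is an illustration only: `amp_for` is a toy stand-in for Teradata's row hash, `NUM_AMPS = 4` is an assumed system size, and the rows are a subset of the example tables used later in this deck.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # assumed AMP count for the illustration


def amp_for(value, num_amps=NUM_AMPS):
    """Toy stand-in for the row hash: map a column value to an AMP number."""
    return zlib.crc32(str(value).encode()) % num_amps


def distribute(rows, key_index, num_amps=NUM_AMPS):
    """Place each row on the AMP chosen by hashing the column at key_index."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[key_index], num_amps)].append(row)
    return amps


# Table 1 rows are (Col1, Col2, Col3); Table 2 rows are (Col1, Col2).
table1 = [(100, "P", 600), (200, "Q", 600), (300, "R", 700), (900, "A", 800)]
table2 = [(600, "P"), (700, "Q"), (800, "R")]

# Table 2 is already on its target AMPs because the join column (Col1) is its PI.
t2_by_amp = distribute(table2, key_index=0)
# Table 1 is stored by its own PI (Col1), so it is redistributed into spool by Col3.
t1_spool = distribute(table1, key_index=2)

# After redistribution every AMP can join using only its local rows.
joined = []
for amp in range(NUM_AMPS):
    for r1 in t1_spool.get(amp, []):
        for r2 in t2_by_amp.get(amp, []):
            if r1[2] == r2[0]:
                joined.append(r1 + r2)
```

Because equal join values always hash to the same AMP, no match is lost by joining locally on each AMP.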

Case 3 - not a PI of either table
- Rows of both tables need to be redistributed to their target AMPs by the hash code of the join column value.
- The optimizer might choose to duplicate the smaller table on all AMPs.
- This join scenario involves the most data movement.

Nested Join
- The optimizer chooses this join strategy when the query has:
  - An equality value for a unique index (UPI or USI) on table 1.
  - A join on a column of that single row to any index on table 2.
- This join uses the least system resources.
[Diagram labels: data value = UPI or USI, data column; 1 row returned; PI, USI, NUSI; 2 AMPs, 3 AMPs, 4 AMPs, all AMPs; 1 or more rows returned.]
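A minimal Python sketch of the nested join idea follows. The tables, column names, and the use of dictionaries as stand-ins for a unique index and a secondary index are all hypothetical; the point is only that one unique-index lookup yields a single row, whose column value then drives an index lookup on the second table.

```python
# Hypothetical table 1: a dict keyed by emp_id stands in for UPI/USI access.
employee_by_id = {
    1: {"emp_id": 1, "name": "Ann", "dept_id": 10},
    2: {"emp_id": 2, "name": "Raj", "dept_id": 20},
}

# Hypothetical table 2: an index on dept_id that may return several rows per value.
department_index = {
    10: [{"dept_id": 10, "dept_name": "Sales"}],
    20: [{"dept_id": 20, "dept_name": "Support"}],
}


def nested_join(emp_id):
    """Join the single row found by unique index to its matches in table 2."""
    left = employee_by_id.get(emp_id)  # equality on a unique index: at most one row
    if left is None:
        return []
    matches = department_index.get(left["dept_id"], [])  # index lookup on table 2
    return [{**left, **right} for right in matches]
```

No table is scanned or redistributed in this sketch, which is why this plan touches so few rows.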

Product Join
- The most general form of join.
- The optimizer chooses a product join under the following conditions:
  - The WHERE clause is missing.
  - The join condition is not based on equality.
  - Join conditions are ORed together.
  - Table aliases are used incorrectly.
  - The optimizer determines that it is less expensive than the other join types.
- Steps:
  1. Identify the smaller table and duplicate it in spool on all AMPs.
  2. Join each spool row of the smaller table to every row of the larger table.
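The product join steps can be sketched in Python as a comparison of every row of one table with every row of the other. The tables (`bands`, `orders`) and the range predicate are hypothetical and chosen only to show a join condition that is not an equality.

```python
def product_join(small_rows, large_rows, predicate):
    """Compare every row of the smaller table with every row of the larger one."""
    return [
        (small, large)
        for large in large_rows
        for small in small_rows
        if predicate(small, large)
    ]


# Hypothetical data: price bands (name, low, high) and orders (order_id, amount).
bands = [("low", 0, 100), ("high", 100, 1000)]
orders = [(1, 50), (2, 250), (3, 100)]

# A non-equality condition: the order amount falls inside the band's range.
result = product_join(bands, orders, lambda b, o: b[1] <= o[1] < b[2])

# The number of row comparisons is the product of the two table sizes.
comparisons = len(bands) * len(orders)
```

The comparison count grows with the product of the table sizes, which is why the other join types are usually cheaper when an equality condition is available.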

Merge Join
- Commonly done when the join conditions are based on equality.
- Generally more efficient than a product join because fewer row comparisons are needed.
- Steps:
  1. Identify the smaller table.
  2. Put the qualifying rows from one or both tables into spool.
  3. Move the spool rows to the AMPs based on the join column hash (if required).
  4. Sort the spool rows by join column hash value (if necessary).
  5. Compare those rows with matching join column hash values.

Merge Join (illustration)
Left table (Row Hash, Col1, Col2….): 110A 111B 203C 110E
Right table (Row Hash, Col1): 110A 210D
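The sort-and-compare part of a merge join on a single AMP can be sketched in Python as below. This is a simplified illustration: `row_hash` is a toy hash rather than Teradata's, and the function names and the tuple row layout are assumptions.

```python
import zlib


def row_hash(value):
    """Toy stand-in for the row hash of a join column value."""
    return zlib.crc32(str(value).encode())


def merge_join(left, right, left_key, right_key):
    """Sort both inputs by join column hash, then match rows in one pass."""
    left_sorted = sorted(left, key=lambda r: row_hash(r[left_key]))
    right_sorted = sorted(right, key=lambda r: row_hash(r[right_key]))
    out = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lh = row_hash(left_sorted[i][left_key])
        rh = row_hash(right_sorted[j][right_key])
        if lh < rh:
            i += 1
        elif lh > rh:
            j += 1
        else:
            # Find the run of rows with this hash value on each side.
            i_end = i
            while i_end < len(left_sorted) and row_hash(left_sorted[i_end][left_key]) == lh:
                i_end += 1
            j_end = j
            while j_end < len(right_sorted) and row_hash(right_sorted[j_end][right_key]) == rh:
                j_end += 1
            for l_row in left_sorted[i:i_end]:
                for r_row in right_sorted[j:j_end]:
                    # Equal hashes may be collisions, so confirm the values match.
                    if l_row[left_key] == r_row[right_key]:
                        out.append(l_row + r_row)
            i, j = i_end, j_end
    return out
```

Only rows whose hash values match are ever compared, which is where the saving over a product join comes from.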

Example

Table 1
Col1 (PK)  Col2  Col3 (FK)
100        P     600
200        Q     600
300        R     700
400        S     200
500        T     500
600        X     200
700        Y     300
800        Z     500
900        A     800
1000       B     300
2000       C     300
3000       D     300
4000       E     200

Table 2
Col1 (PK)  Col2……
100        K
200        L
300        M
400        N
500        O
600        P
700        Q
800        R

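Example: rows of both tables as distributed across the AMPs (Table 1 rows are Col1 Col2 Col3; Table 2 rows are Col1 Col2)
AMP 1: Table 1: 100 P 600 | 800 Z 500 | 1000 B 300; Table 2: 100 K | 800 R
AMP 2: Table 1: 400 S 200 | 700 Y 300 | 2000 C 300 | 4000 E 200; Table 2: 400 N | 700 Q
AMP 3: Table 1: 300 R 700 | 600 X 200 | 900 A 800; Table 2: 300 M | 600 P
AMP 4: Table 1: 200 Q 600 | 500 T 500 | 3000 D 300; Table 2: 200 L | 500 O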

Row Distribution Strategy 1
- No distribution needed.
- No sorting needed.
- The join columns of both tables are PIs.
- Rows involved in the join are located on the same AMP.

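Case 1 - Example
    SELECT *
    FROM Table1 t1
    INNER JOIN Table2 t2
    ON t1.Col1 = t2.Col1
Row distribution (Table 1 rows are Col1 Col2 Col3; Table 2 rows are Col1 Col2):
AMP 1: Table 1: 100 P 600 | 800 Z 500 | 1000 B 300; Table 2: 100 K | 800 R
AMP 2: Table 1: 400 S 200 | 700 Y 300 | 2000 C 300 | 4000 E 200; Table 2: 400 N | 700 Q
AMP 3: Table 1: 300 R 700 | 600 X 200 | 900 A 800; Table 2: 300 M | 600 P
AMP 4: Table 1: 200 Q 600 | 500 T 500 | 3000 D 300; Table 2: 200 L | 500 O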

Row Distribution Strategy 2
- Redistribute and sort one of the tables on the join column row hash.
- The join column is the PI of one of the tables.
- That table is already distributed on the join column row hash.
- The optimizer redistributes the other table and sorts it on the join column row hash.

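Case 2 – Example
    SELECT *
    FROM Table1 t1
    INNER JOIN Table2 t2
    ON t1.Col3 = t2.Col1
Row distribution before the join (Table 1 rows are Col1 Col2 Col3; Table 2 rows are Col1 Col2):
AMP 1: Table 1: 100 P 600 | 800 Z 500 | 1000 B 300; Table 2: 100 K | 800 R
AMP 2: Table 1: 400 S 200 | 700 Y 300 | 2000 C 300 | 4000 E 200; Table 2: 400 N | 700 Q
AMP 3: Table 1: 300 R 700 | 600 X 200 | 900 A 800; Table 2: 300 M | 600 P
AMP 4: Table 1: 200 Q 600 | 500 T 500 | 3000 D 300; Table 2: 200 L | 500 O
SPOOL (Table 1 rows redistributed by Col3, shown next to the local Table 2 rows):
AMP 1: Table 1: 900 A 800; Table 2: 100 K | 800 R
AMP 2: Table 1: 300 R 700; Table 2: 400 N | 700 Q
AMP 3: Table 1: 1000 B 300 | 3000 D 300 | 700 Y 300 | 2000 C 300 | 200 Q 600 | 100 P 600; Table 2: 300 M | 600 P
AMP 4: Table 1: 600 X 200 | 400 S 200 | 4000 E 200 | 800 Z 500 | 500 T 500; Table 2: 200 L | 500 O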

Row Distribution Strategy 3
- Duplicate and sort the smaller table on all AMPs, and build the larger table locally and sort it.
- The optimizer considers this strategy if it finds that redistributing the larger table is more expensive than duplicating the smaller table.
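The trade-off behind Strategy 3 can be illustrated with a toy cost model in Python that counts only the rows moved between AMPs. This is an assumption made for illustration; the real optimizer weighs more than row counts, and the function names are hypothetical.

```python
def rows_moved_by_redistribution(large_rows):
    """Each row of the larger table is sent once to the AMP chosen by its join hash."""
    return large_rows


def rows_moved_by_duplication(small_rows, num_amps):
    """Every AMP receives a full copy of the smaller table."""
    return small_rows * num_amps


def cheaper_strategy(small_rows, large_rows, num_amps):
    """Pick the option that moves fewer rows under this toy model."""
    duplicate_cost = rows_moved_by_duplication(small_rows, num_amps)
    redistribute_cost = rows_moved_by_redistribution(large_rows)
    return "duplicate" if duplicate_cost < redistribute_cost else "redistribute"
```

Under this model, duplicating an 8-row table on 4 AMPs moves 32 rows, so it only pays off when the other table has more than 32 rows to redistribute.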

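Case 2 – Example (Row Distribution Strategy 3)
Row distribution before the join (Table 1 rows are Col1 Col2 Col3; Table 2 rows are Col1 Col2):
AMP 1: Table 1: 100 P 600 | 800 Z 500 | 1000 B 300; Table 2: 100 K | 800 R
AMP 2: Table 1: 400 S 200 | 700 Y 300 | 2000 C 300 | 4000 E 200; Table 2: 400 N | 700 Q
AMP 3: Table 1: 300 R 700 | 600 X 200 | 900 A 800; Table 2: 300 M | 600 P
AMP 4: Table 1: 200 Q 600 | 500 T 500 | 3000 D 300; Table 2: 200 L | 500 O
SPOOL:
Table 2 duplicated on every AMP: 100 K | 200 L | 300 M | 400 N | 500 O | 600 P | 700 Q | 800 R
Table 1 built locally and sorted:
AMP 1: 1000 B 300 | 100 P 600 | 800 Z 500
AMP 2: 400 S 200 | 4000 E 200 | 700 Y 300 | 2000 C 300
AMP 3: 600 X 200 | 300 R 700 | 900 A 800
AMP 4: 3000 D 300 | 500 T 500 | 200 Q 600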

Row Distribution Strategy 4
- Duplicate the smaller table on every AMP.
- The optimizer chooses this strategy when the join condition is not based on equality.
- This is the product join scenario.

Explain Facility
- Provides an English translation of the steps chosen by the optimizer.
- Very helpful for estimating the performance of complex queries.
- Helps physical designers with index selection by showing the execution strategy chosen by the optimizer.

Explaining the EXPLAIN
EXPLAIN output is generally clear and easy to understand, but it contains a few phrases one needs to be familiar with:
- “…with no residual conditions…”: no conditions remain other than those used to locate the row.
- “…eliminating duplicates…”: a DISTINCT operation is being done.
- “…we do a SMS…”: set manipulations such as UNION or EXCEPT are being done.
- “…we do a BMSMS…”: NUSI bit mapping is being used.
- “…distributed by hash code to all AMPs…”
- “…duplicated on all AMPs…”

Statistics
- The optimizer needs demographic information to create the best execution plan for a query:
  - Number of rows in the table.
  - Row size.
  - Number of rows per value.
  - Index information and demographics.
- Based on the statistics, the optimizer estimates the cost and creates the best plan.
- Statistics must be collected for the columns and indexes that are accessed frequently.
- If statistics are not provided, the optimizer does dynamic sampling (random AMP).
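The idea of estimating from a random AMP can be sketched in Python as follows. This is a deliberately simplified assumption, not Teradata's actual sampling procedure: it reads the row count on one randomly chosen AMP and scales it by the number of AMPs. The function name and parameters are hypothetical.

```python
import random


def estimate_row_count(rows_per_amp, rng):
    """Estimate the table's row count from one randomly chosen AMP."""
    sampled_amp = rng.randrange(len(rows_per_amp))
    return rows_per_amp[sampled_amp] * len(rows_per_amp)


# With an even distribution the estimate is exact.
even_estimate = estimate_row_count([5, 5, 5, 5], random.Random(0))

# With 13 rows spread 3/4/3/3 over four AMPs the estimate depends on
# which AMP happens to be sampled.
skewed_estimates = {
    estimate_row_count([3, 4, 3, 3], random.Random(seed)) for seed in range(20)
}
```

The second case shows why collected statistics are preferable: a sample from a single AMP can overestimate or underestimate a table whose rows are not spread evenly.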

Questions?