Chapter 10 Joins and Subqueries

Presentation transcript:

1 Chapter 10 Joins and Subqueries

2 Joins & Subqueries
Joins
– Methods to combine data from multiple tables
– Optimizer information can be limited based on
    Algorithms used
    Knowledge of data
Subqueries
– Complex by nature
– Difficult for Optimizer to determine best plan

3 Types of Joins
Equi-join (equality condition, i.e. “=”)
Non-equi or Theta (non-equality, e.g. “<>”, BETWEEN)
Cross (Cartesian, i.e. no join condition)
Outer (joining data not matching in other table)
– Left
– Right
– Full
Self (joining table to itself)
Hierarchical (type of self-join)
Anti (rows from one table without match from other)
Semi (only one row from matching table returned)

4 Join Methods
Nested Loops
– Performs a search of the inner table for each row found in the outer table
– Optimizer will choose only if index exists on inner table
– Nested table scan: scan of entire inner table for each outer table row if no index on inner table
– Generally least efficient join method
Sort-Merge
– Each table sorted by value of the join columns
– After sort, data merged
– Best when
    Large amount of data needed
    No index on inner table
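
The difference between an indexed nested loops join and a nested table scan can be sketched in a few lines of Python. This is only an illustration of the access pattern, not Oracle's implementation: the table contents, the `dept_id` join column, and both function names are made up for the example, and a dictionary stands in for an index that already exists on the inner table.

```python
from collections import defaultdict


def nested_loops_scan(outer, inner, key):
    """Nested table scan: read every inner row for each outer row."""
    rows, probes = [], 0
    for o in outer:
        for i in inner:
            probes += 1  # one inner row visited
            if o[key] == i[key]:
                rows.append((o, i))
    return rows, probes


def nested_loops_indexed(outer, inner, key):
    """Nested loops where the inner table has an index on the join column."""
    # The dict plays the role of a pre-existing index; its build cost is
    # deliberately not counted as a probe.
    index = defaultdict(list)
    for i in inner:
        index[i[key]].append(i)

    rows, probes = [], 0
    for o in outer:
        probes += 1  # one index lookup per outer row
        for i in index.get(o[key], []):
            rows.append((o, i))
    return rows, probes


# Hypothetical data: 3 departments (outer) joined to 9 employees (inner).
departments = [{"dept_id": d, "name": f"dept{d}"} for d in range(3)]
employees = [{"emp_id": e, "dept_id": e % 3} for e in range(9)]

scan_rows, scan_probes = nested_loops_scan(departments, employees, "dept_id")
idx_rows, idx_probes = nested_loops_indexed(departments, employees, "dept_id")
print(len(scan_rows), scan_probes, idx_probes)
```

Both functions return the same joined rows; only the amount of work on the inner table differs, which is why the missing index matters so much for this join method.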

5 Join Methods (cont.)
Hash
– Hash table built for one of the tables
– Hash table used to find matching rows in other table
– Also good for large amounts of data
– Can be similar in performance to sort-merge
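
The build and probe phases of a hash join can be sketched as follows. This is a simplified in-memory illustration, not the database's algorithm: the function name `hash_join`, the sample rows, and the `dept_id` join column are assumptions made for the example.

```python
def hash_join(build_rows, probe_rows, key):
    """Equi-join two row lists by hashing one of them on the join key."""
    # Build phase: hash one table (ideally the smaller one) on the join key.
    hash_table = {}
    for b in build_rows:
        hash_table.setdefault(b[key], []).append(b)

    # Probe phase: look up each row of the other table in the hash table.
    result = []
    for p in probe_rows:
        for b in hash_table.get(p[key], []):
            result.append((b, p))
    return result


# Hypothetical data: the small departments table is the build input.
departments = [
    {"dept_id": 10, "name": "Sales"},
    {"dept_id": 20, "name": "IT"},
]
employees = [
    {"emp": "King", "dept_id": 10},
    {"emp": "Chen", "dept_id": 20},
    {"emp": "Diaz", "dept_id": 10},
    {"emp": "Osei", "dept_id": 99},  # no matching department
]

joined = [(b["name"], p["emp"]) for b, p in hash_join(departments, employees, "dept_id")]
print(joined)
```

Because the lookup is an exact match on the hashed key, the sketch only works for equality conditions, which mirrors the restriction of hash joins to equi-joins.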

6 Choosing Join Method
See Table 10-1 (p. 296)
Sort-Merge/Hash vs. Nested Loops
– Nested Loops
    Better response time
    Smaller amounts of data
    Indexes needed
– Sort-Merge/Hash
    Better throughput
    Larger amounts of data
    More memory needed for sorting or building hash table
    Better with parallel operations (especially Hash)

7 Choosing Join Method (cont.)
Sort-Merge vs. Hash
– Hash
    Generally performs better than Sort-Merge
    Only applicable to equi-joins
    Only has to process all of one table (creating the hash table)
– Sort-Merge
    Applies to more situations than Hash
    Applies to equi and non-equi joins
    Both tables processed (sorted)
    More memory and CPU generally needed
    Outperforms Hash if data is pre-sorted
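
For comparison with the hash approach, a sort-merge equi-join might be sketched like this. It is a minimal in-memory illustration under assumed data: the function name `sort_merge_join`, the key column `k`, and the sample rows are hypothetical. Both inputs are sorted first, which is the step that pre-sorted data would let the database skip.

```python
def sort_merge_join(left, right, key):
    """Equi-join two row lists by sorting both and merging them."""
    # Sort phase: both inputs must be ordered on the join key.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])

    # Merge phase: advance through both sorted inputs in step.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result


left_rows = [{"k": 2, "l": "a"}, {"k": 1, "l": "b"}, {"k": 2, "l": "c"}]
right_rows = [{"k": 2, "r": "x"}, {"k": 3, "r": "y"}, {"k": 2, "r": "z"}]

pairs = [(l["l"], r["r"]) for l, r in sort_merge_join(left_rows, right_rows, "k")]
print(pairs)
```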

9 Choosing Join Method (cont.)
When Joining A to B
– Both are small
– Small subset from B
– Want first rows quickly
– Want all rows quickly
– FTS of A / parallelism
– Limited memory
NL
– Depends
– Yes
– Depends
– Yes, if..
– Yes
SM/Hash
– Yes
– No
– Depends
– Yes
– Maybe not

10 Optimizing Nested Loops Joins
Nested Loops
– Ensure index is on inner table
– Join column is selective (each lookup matches few rows)
Sort-Merge & Hash
– Need enough memory in PGA to perform well
– Best if entire structure constructed in memory
    Avoid “multi-pass” operations to disk
– Sort-Merge is the most resource intensive
    Two sorted tables
    Merge operation

11 Avoiding Joins
Maintaining denormalized data from one table to another
– Requires application process to copy data
– Data integrity needs to be carefully maintained
Storing tables in index cluster
– Reduces IO by combining into single segment
– SIZE parameter must be set appropriately
– FTS operations still slow
– Rarely used
Creating Materialized Views
Creating bitmap join index

12 Avoiding Joins (cont.)
Creating Materialized Views
– Allows transparent query rewrite
– Keeps transaction data in log tables
– Avoids join overhead for frequently used queries
Creating bitmap join index
– Efficient method of matching values between indexes
– Higher frequency of locking can occur

13 Join Order
Optimizer calculates join possibilities
– Factorial of number of tables being joined
– Only two tables joined in single operation
– Temporary result sets created for three or more tables
– Let optimizer decide join order, but..
    Ensure statistics are current
    Create histograms where appropriate
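
To see how quickly the number of possible join orders grows, the factorial can be tabulated directly. This is plain arithmetic on the slide's "factorial of number of tables" figure; the function name `join_order_count` is made up for the example, and the count ignores any pruning a real optimizer performs.

```python
from math import factorial


def join_order_count(table_count):
    """Number of possible join orders for the given number of tables (n!)."""
    return factorial(table_count)


counts = {n: join_order_count(n) for n in (2, 4, 6, 8, 10)}
for n, count in counts.items():
    print(f"{n} tables: {count} join orders")
```

Going from 4 tables to 10 takes the search space from 24 orders to over 3.6 million, which is why current statistics and histograms matter: they are what the optimizer uses to choose among the candidates.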

14 Join Order (cont.)
If you don’t trust the optimizer
– The driving table (first table in join)
    Should be most selective
    Should have most efficient WHERE clause
– Eliminate rows from final result set as early as possible during join operations
    Try to process filtering conditions early on in the join
– For small tables with indexes
    Use nested loops join
    Ensure all columns of WHERE clause are indexed

15 Outer Joins
Rows are returned from one table in a join, even if there are no matching rows in the other table
Three types
– Left Outer Join (rows missing from one table)
– Right Outer Join (rows missing from one table)
– Full Outer Join (shows rows missing from both tables)
Optimizer joins table with missing rows last
Specified with
– Proprietary Oracle syntax (+)
– ANSI syntax (e.g. LEFT OUTER JOIN, etc.)
Inner Join
– Shows only matching rows from both tables
– This is the “default”
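
The difference between an inner join and a left outer join can be demonstrated with a small runnable sketch. It uses Python's sqlite3 module and ANSI join syntax rather than Oracle, so the proprietary (+) notation is not shown; the table definitions and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT,
                            department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT'), (30, 'Legal');
    INSERT INTO employees VALUES (1, 'King', 10), (2, 'Chen', 20), (3, 'Diaz', 10);
""")

# Inner join: only departments that have at least one employee appear.
inner_rows = conn.execute("""
    SELECT d.department_name, e.last_name
    FROM departments d
    JOIN employees e ON e.department_id = d.department_id
    ORDER BY d.department_id, e.employee_id
""").fetchall()

# Left outer join: every department appears; missing employees show as NULL.
outer_rows = conn.execute("""
    SELECT d.department_name, e.last_name
    FROM departments d
    LEFT OUTER JOIN employees e ON e.department_id = d.department_id
    ORDER BY d.department_id, e.employee_id
""").fetchall()

print(inner_rows)
print(outer_rows)
conn.close()
```

The Legal department has no employees, so it is absent from the inner join result and present with a NULL (Python `None`) employee name in the outer join result.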

16 Star Joins
Common in the data warehouse
Star schema consists of
– Large Fact table containing detailed rows and foreign keys
– Dimension tables that categorize fact items (e.g. time, product, etc.)
Oracle’s default approach is to:
– Query all dimensions to retrieve foreign key values
– Merge dimension result sets using Cartesian join
– Use the resulting foreign keys to identify fact table rows
Requires many concatenated indexes

17 Star Transformation
Cartesian join approach has drawbacks
– Assumes small dimension tables, which may not be true
– Concatenated index requirements across all dimension keys may not be practical
Oracle created “Star Transformation” optimization
– Uses bitmap indexes on fact table
– Requires setting parameter STAR_TRANSFORMATION_ENABLED=TRUE
– Also can use OPT_PARAM hint
– Can validate star transformation via the execution plan
– Easier to configure and manage
– Supports widest range of possible WHERE clause conditions
– Possible lock overhead with bitmap indexes still applies

18 Hierarchical Joins
Special case of self-join
Column in table points to the primary key of another row in the same table
Next row points to a further row and so on
Cascading effect
Avoid indexes in execution plan
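
The cascading self-join can be illustrated with a small runnable sketch. Note the substitution: Oracle typically expresses hierarchical queries with CONNECT BY, which SQLite does not support, so this example uses a recursive common table expression through Python's sqlite3 module instead. The `employees` table layout, the `manager_id` column, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY,
                            manager_id INTEGER,   -- points at another employee row
                            last_name TEXT);
    INSERT INTO employees VALUES
        (1, NULL, 'King'),
        (2, 1, 'Kochhar'),
        (3, 1, 'De Haan'),
        (4, 2, 'Greenberg');
""")

# Start at the row with no manager, then repeatedly join the table to the
# rows found so far to walk down the hierarchy one level at a time.
hierarchy = conn.execute("""
    WITH RECURSIVE org (employee_id, last_name, depth) AS (
        SELECT employee_id, last_name, 1
        FROM employees
        WHERE manager_id IS NULL
        UNION ALL
        SELECT e.employee_id, e.last_name, o.depth + 1
        FROM employees e
        JOIN org o ON e.manager_id = o.employee_id
    )
    SELECT last_name, depth FROM org ORDER BY depth, employee_id
""").fetchall()

print(hierarchy)
conn.close()
```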

19 Subqueries
A subquery is a SELECT statement contained within another SQL statement
Types include
– Simple
– Correlated
– Anti-join
– Semi-join

20 Simple Subqueries
Inner query makes no reference to parent query
Example to count the employees with the lowest salary
    SELECT COUNT(*)
    FROM employees
    WHERE salary = (SELECT MIN(salary) FROM employees);
Each query can and should be tuned independently
Generally use more resources than running queries separately within a program

21 Correlated Subqueries
Subquery refers to values in the parent query
Subquery is logically executed once for each row returned by the parent query
Usually accomplished via a join method
    SELECT employee_id, first_name, last_name, salary
    FROM employees a
    WHERE salary = (SELECT MIN(salary)
                    FROM employees b
                    WHERE b.department_id = a.department_id);
Can generate inefficient plans
Consider rewriting as joins or using analytic functions
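
One possible join rewrite of the correlated subquery is sketched below, run against SQLite through Python's sqlite3 module so the two forms can be compared. The sample rows are assumptions made for the example, and equivalent results on this data say nothing about which form Oracle would execute faster; that still has to be checked in the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT,
                            last_name TEXT, salary INTEGER, department_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann',  'Lee',   1000, 10),
        (2, 'Bo',   'Chen',  2000, 10),
        (3, 'Cy',   'Diaz',  1500, 20),
        (4, 'Dee',  'Osei',  1500, 20),
        (5, 'Eli',  'Novak', 3000, 20);
""")

# Correlated form: the subquery is logically evaluated per parent row.
correlated = conn.execute("""
    SELECT employee_id
    FROM employees a
    WHERE salary = (SELECT MIN(salary)
                    FROM employees b
                    WHERE b.department_id = a.department_id)
    ORDER BY employee_id
""").fetchall()

# Join form: compute each department's minimum once, then join to it.
joined = conn.execute("""
    SELECT a.employee_id
    FROM employees a
    JOIN (SELECT department_id, MIN(salary) AS min_salary
          FROM employees
          GROUP BY department_id) m
      ON m.department_id = a.department_id
     AND a.salary = m.min_salary
    ORDER BY a.employee_id
""").fetchall()

print(correlated, joined)
conn.close()
```

Employees 3 and 4 tie for the lowest salary in department 20, so both forms return them along with employee 1.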

22 Anti-join Subqueries
As named, is the opposite of a join
– Returns rows in one table that do not match rows from another
– Expressed with ‘NOT IN’ or ‘NOT EXISTS’ subquery
– Example: Google customers who are not Microsoft customers
    SELECT COUNT(*)
    FROM google_customers
    WHERE (cust_first_name, cust_last_name) NOT IN
          (SELECT cust_first_name, cust_last_name
           FROM microsoft_customers)
Optimizer generally uses HASH JOIN ANTI method
May be beneficial to add index to subquery table
Avoid NOT IN unless join keys are NOT NULL
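
The warning about NOT IN and nullable join keys can be demonstrated with a short runnable sketch. It uses SQLite through Python's sqlite3 module and, to keep it small, joins on a single `cust_last_name` column instead of the two-column key in the slide; the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE google_customers (cust_last_name TEXT);
    CREATE TABLE microsoft_customers (cust_last_name TEXT);  -- nullable key
    INSERT INTO google_customers VALUES ('Smith'), ('Jones'), ('Lee');
    INSERT INTO microsoft_customers VALUES ('Smith'), (NULL);
""")

# NOT IN: a single NULL in the subquery makes the predicate unknown for
# every non-matching row, so no rows qualify.
not_in_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers
    WHERE cust_last_name NOT IN (SELECT cust_last_name FROM microsoft_customers)
""").fetchone()[0]

# NOT EXISTS: the NULL row simply never matches, so the result is as expected.
not_exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    WHERE NOT EXISTS (SELECT 1
                      FROM microsoft_customers m
                      WHERE m.cust_last_name = g.cust_last_name)
""").fetchone()[0]

print(not_in_count, not_exists_count)
conn.close()
```

Jones and Lee are not Microsoft customers, yet the NOT IN form counts nobody because of the NULL key, while the NOT EXISTS form counts both.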

23 Semi-join Subqueries
Expressed as ‘WHERE IN’ or ‘WHERE EXISTS’ subquery
    SELECT COUNT(*)
    FROM google_customers
    WHERE (cust_first_name, cust_last_name) IN
          (SELECT cust_first_name, cust_last_name
           FROM microsoft_customers)
Returns rows from first table only once
– Even if there is more than one matching row in second table
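
The "only once" behaviour of a semi-join, compared with an ordinary join, can be shown with a small runnable sketch. It uses SQLite through Python's sqlite3 module and a single `cust_last_name` key rather than the two-column key in the slide; the sample rows, including the duplicated Microsoft customer, are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE google_customers (cust_last_name TEXT);
    CREATE TABLE microsoft_customers (cust_last_name TEXT);
    INSERT INTO google_customers VALUES ('Smith'), ('Jones');
    -- 'Smith' appears twice in the second table.
    INSERT INTO microsoft_customers VALUES ('Smith'), ('Smith');
""")

# Semi-join with IN: each qualifying first-table row is counted once.
in_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers
    WHERE cust_last_name IN (SELECT cust_last_name FROM microsoft_customers)
""").fetchone()[0]

# Semi-join with EXISTS: same result.
exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    WHERE EXISTS (SELECT 1
                  FROM microsoft_customers m
                  WHERE m.cust_last_name = g.cust_last_name)
""").fetchone()[0]

# Ordinary inner join: the first-table row repeats once per match.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    JOIN microsoft_customers m ON m.cust_last_name = g.cust_last_name
""").fetchone()[0]

print(in_count, exists_count, join_count)
conn.close()
```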