
1 Chapter 10 Joins and Subqueries

2 Joins & Subqueries
Joins
– Methods to combine data from multiple tables
– Optimizer information can be limited based on
    Algorithms used
    Knowledge of data
Subqueries
– Complex by nature
– Difficult for Optimizer to determine best plan

3 Types of Joins
Equi-join (equality condition, i.e. “=”)
Non-equi or Theta (non-equality, e.g. “<>”, BETWEEN)
Cross (Cartesian, i.e. no join condition)
Outer (joining data not matching in other table)
– Left
– Right
– Full
Self (joining table to itself)
Hierarchical (type of self-join)
Anti (rows from one table without match from other)
Semi (only one row from matching table returned)

4 Join Methods
Nested Loops
– Performing search of inner table for each row found in outer table
– Optimizer will choose only if index exists on inner table
– Nested table scan: scan of entire inner table for each outer table row if no index on inner table
– Generally least effective join method
Sort-Merge
– Each table sorted by value of the join columns
– After sort, data merged
– Best when
    Large amount of data needed
    No index on inner table
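The two methods on this slide can be sketched in a few lines of Python. This is only an illustration of the general algorithms, not Oracle's implementation; the `emps` and `depts` rows and the `dept_id` key are made-up sample data.

```python
def nested_loops_join(outer, inner, key):
    """Nested table scan: scan the whole inner table for each outer row."""
    result = []
    for o in outer:
        for i in inner:  # an index on the inner table would turn this scan into a lookup
            if o[key] == i[key]:
                result.append((o, i))
    return result


def sort_merge_join(left, right, key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair this left row with every right row that shares the key.
            k = j
            while k < len(right) and right[k][key] == lk:
                result.append((left[i], right[k]))
                k += 1
            i += 1
    return result


# Hypothetical sample rows.
emps = [
    {"dept_id": 10, "name": "Ann"},
    {"dept_id": 20, "name": "Bob"},
    {"dept_id": 10, "name": "Cy"},
]
depts = [
    {"dept_id": 10, "dname": "Sales"},
    {"dept_id": 20, "dname": "IT"},
    {"dept_id": 30, "dname": "HR"},
]
```

Both functions return the same matching pairs; they differ only in how much work is done to find them.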

5 Join Methods (cont.)
Hash
– Hash table built for one of the tables
– Hash table used to find matching rows in other table
– Also good for large amounts of data
– Can be similar in performance to sort-merge
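A minimal Python sketch of the build and probe phases described above. It shows the general technique only, not Oracle's internals; the sample rows and the `dept_id` key are invented for the example.

```python
from collections import defaultdict


def hash_join(build, probe, key):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build:  # build phase: read all of one table
        table[b[key]].append(b)
    result = []
    for p in probe:  # probe phase: one hash lookup per row
        for b in table.get(p[key], []):
            result.append((b, p))
    return result


# Hypothetical sample rows; the smaller input is used as the build side.
depts = [
    {"dept_id": 10, "dname": "Sales"},
    {"dept_id": 20, "dname": "IT"},
    {"dept_id": 30, "dname": "HR"},
]
emps = [
    {"dept_id": 10, "name": "Ann"},
    {"dept_id": 20, "name": "Bob"},
    {"dept_id": 10, "name": "Cy"},
]
matches = hash_join(depts, emps, "dept_id")
```

Because the lookup relies on key equality, the same approach does not extend to non-equi joins.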

6 Choosing Join Method
See Table 10-1 (p. 296)
Sort-Merge/Hash vs. Nested Loops
– Nested Loops
    Better response time
    Smaller amounts of data
    Indexes needed
– Sort-Merge/Hash
    Better throughput
    Larger amounts of data
    More memory needed for sorting or building hash table
    Better with parallel operations (especially Hash)

7 Choosing Join Method (cont.)
Sort-Merge vs. Hash
– Hash
    Generally performs better than Sort-Merge
    Only applicable to equi-joins
    Only has to process all of one table (creating the hash table)
– Sort-Merge
    Applies to more situations than Hash
    Applies to equi and non-equi joins
    Both tables processed (sorted)
    More memory and CPU generally needed
    Outperforms Hash if data is pre-sorted


9 Choosing Join Method (cont.)
When Joining A to B
– Both are small
– Small subset from B
– Want first rows quickly
– Want all rows quickly
– FTS of A / parallelism
– Limited memory
NL
– Depends
– Yes
– Depends
– Yes, if..
– Yes
SM/Hash
– Yes
– No
– Depends
– Yes
– Maybe not

10 Optimizing Nested Loops Joins
Nested Loops
– Ensure index is on inner table
– Join column is selective (high cardinality)
Sort-Merge & Hash
– Needs enough memory in PGA to perform well
– Best if entire structure constructed in memory
    Avoid “multi-pass” operations to disk
– Sort-Merge is the most resource intensive
    Two sorted tables
    Merge operation

11 Avoiding Joins
Maintaining denormalized data from one table to another
– Requires application process to copy data
– Data integrity needs to be carefully maintained
Storing tables in index cluster
– Reduces IO by combining into single segment
– SIZE parameter must be set appropriately
– Full table scan (FTS) operations still slow
– Rarely used
Creating Materialized Views
Create bitmap join index

12 Avoiding Joins (cont.)
Creating Materialized Views
– Allows transparent query rewrite
– Keeps transaction data in log tables
– Avoid join overhead for frequently used queries
Create bitmap join index
– Efficient method of matching values between indexes
– Higher frequency of locking can occur

13 Join Order
Optimizer calculates join possibilities
– Factorial of number of tables being joined
– Only two tables joined in single operation
– Temporary result sets created for three or more tables
– Let optimizer decide join order, but:
    Ensure statistics are current
    Create histograms where appropriate
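To see why the factorial matters, a short Python sketch can enumerate the candidate join orders. The table names are hypothetical, and a real optimizer typically prunes this search space rather than costing every permutation.

```python
from itertools import permutations
from math import factorial

# Hypothetical tables taking part in one query.
tables = ["customers", "orders", "order_items", "products"]

# Every ordering of the tables is a candidate join order.
join_orders = list(permutations(tables))
order_count = len(join_orders)       # 4! orderings for four tables

# The count grows very quickly as tables are added.
ten_table_orders = factorial(10)
```

Four tables give 24 possible orders, while ten tables give 3,628,800, which is why current statistics and histograms matter so much to the optimizer's choice.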

14 Join Order (cont.)
If you don’t trust the optimizer
– The driving table (first table in join)
    Should be most selective
    Should have most efficient WHERE clause
– Eliminate rows from final result set as early as possible during join operations
    Try to process filtering conditions early on in the join
– For small tables with indexes
    Use nested loops join
    Ensure all columns of WHERE clause are indexed

15 Outer Joins
Rows returned from one table in a join, even if there are no matching rows in the other table
Three types
– Left Outer Join (rows missing from one table)
– Right Outer Join (rows missing from one table)
– Full Outer Join (shows rows missing from both tables)
Optimizer joins table with missing rows last
Specified with
– Proprietary Oracle syntax (+)
– ANSI syntax (e.g. LEFT OUTER JOIN, etc.)
Inner Join
– Shows only matching rows from both tables
– This is the “default”
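The difference between the default inner join and a left outer join can be tried out with a small script. This sketch uses Python's built-in `sqlite3` module as a stand-in for Oracle, so only the ANSI syntax is shown (SQLite does not accept the `(+)` notation); the tables and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        last_name TEXT,
        department_id INTEGER
    );
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT'), (30, 'HR');
    INSERT INTO employees VALUES (1, 'King', 10), (2, 'Chen', 20);
""")

# Inner join: only departments that have a matching employee.
inner_rows = conn.execute("""
    SELECT d.name, e.last_name
    FROM departments d
    JOIN employees e ON e.department_id = d.department_id
    ORDER BY d.department_id
""").fetchall()

# Left outer join: every department, with NULL where no employee matches.
outer_rows = conn.execute("""
    SELECT d.name, e.last_name
    FROM departments d
    LEFT OUTER JOIN employees e ON e.department_id = d.department_id
    ORDER BY d.department_id
""").fetchall()
```

The HR department has no employees, so it appears only in `outer_rows`, paired with `None`.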

16 Star Joins
Common in the data warehouse
Star schema consists of
– Large Fact table containing detailed rows and foreign keys
– Dimension tables that categorize fact items (e.g. time, product, etc.)
Oracle’s default approach is to:
– Query all dimensions to retrieve foreign key values
– Merge dimension result sets using Cartesian join
– Use the resulting foreign keys to identify fact table rows
Requires many concatenated indexes

17 Star Transformation
Cartesian join approach has drawbacks
– Assumes small dimension tables, which may not be true
– Concatenated index requirements across all dimension keys may not be practical
Oracle created “Star Transformation” optimization
– Uses bitmap indexes on fact table
– Requires setting parameter STAR_TRANSFORMATION_ENABLED=TRUE
– Also can use OPT_PARAM hint
– Can validate star transformation via the execution plan
– Easier to configure and manage
– Supports widest range of possible WHERE clause conditions
– Possible lock overhead with bitmap indexes still applies

18 Hierarchical Joins
Special case of self-join
Column in table points to the primary key of another row in the same table
Next row points to a further row and so on
Cascading effect
Avoid indexes in execution plan
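The cascading self-reference can be illustrated with a small script. This is a sketch only: it uses SQLite's `WITH RECURSIVE` through Python's `sqlite3` module as a stand-in for an Oracle hierarchical query, and the `employees` rows with their `manager_id` column are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        last_name TEXT,
        manager_id INTEGER REFERENCES employees (employee_id)
    );
    INSERT INTO employees VALUES
        (1, 'King', NULL),
        (2, 'Kochhar', 1),
        (3, 'Greenberg', 2),
        (4, 'De Haan', 1);
""")

# Start at the row with no manager, then repeatedly join the table to itself:
# each step finds the rows whose manager_id points at a row already found.
hierarchy = conn.execute("""
    WITH RECURSIVE chain (employee_id, last_name, depth) AS (
        SELECT employee_id, last_name, 1
        FROM employees
        WHERE manager_id IS NULL
        UNION ALL
        SELECT e.employee_id, e.last_name, c.depth + 1
        FROM employees e
        JOIN chain c ON e.manager_id = c.employee_id
    )
    SELECT last_name, depth FROM chain ORDER BY depth, employee_id
""").fetchall()
```

Each row's `depth` records how many self-join steps were needed to reach it from the top of the hierarchy.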

19 Subqueries
A SELECT statement contained within another SQL statement
Types include
– Simple
– Correlated
– Anti-join
– Semi-join

20 Simple Subqueries
Inner query makes no reference to parent query
Example to find employees with lowest salary
    SELECT COUNT(*)
    FROM employees
    WHERE salary = (SELECT MIN(salary) FROM employees);
Each query can and should be tuned independently
Generally use more resources than running queries separately within a program

21 Correlated Subqueries
Subquery refers to values in the parent query
Subquery is logically executed once for each row returned by the parent query
Usually accomplished via a join method
    SELECT employee_id, first_name, last_name, salary
    FROM employees a
    WHERE salary = (SELECT MIN(salary)
                    FROM employees b
                    WHERE b.department_id = a.department_id);
Can generate inefficient plans
Consider rewriting as joins or using analytic functions
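One possible join rewrite of the query above is sketched below, run against SQLite through Python's `sqlite3` module as a stand-in for Oracle. The sample rows are invented, the `first_name` column is omitted for brevity, and whether the rewrite is actually faster depends on the optimizer and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        last_name TEXT,
        salary INTEGER,
        department_id INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'King', 9000, 10),
        (2, 'Chen', 4000, 10),
        (3, 'Abel', 5000, 20),
        (4, 'Diaz', 5000, 20);
""")

# Correlated form: the subquery refers to the parent row's department.
correlated = conn.execute("""
    SELECT employee_id, last_name, salary
    FROM employees a
    WHERE salary = (SELECT MIN(salary)
                    FROM employees b
                    WHERE b.department_id = a.department_id)
    ORDER BY employee_id
""").fetchall()

# Join form: compute each department's minimum once, then join to it.
joined = conn.execute("""
    SELECT a.employee_id, a.last_name, a.salary
    FROM employees a
    JOIN (SELECT department_id, MIN(salary) AS min_salary
          FROM employees
          GROUP BY department_id) m
      ON m.department_id = a.department_id
     AND m.min_salary = a.salary
    ORDER BY a.employee_id
""").fetchall()
```

Both queries return the lowest-paid employees of each department, including ties.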

22 Anti-join Subqueries
As named, is the opposite of a join
– Returns rows in one table that do not match rows from another
– Expressed with ‘NOT IN’ or ‘NOT EXISTS’ subquery
– Example: Google customers who are not Microsoft customers
    SELECT COUNT(*)
    FROM google_customers
    WHERE (cust_first_name, cust_last_name) NOT IN
          (SELECT cust_first_name, cust_last_name
           FROM microsoft_customers)
Optimizer generally uses HASH JOIN ANTI method
May be beneficial to add index to subquery table
Avoid NOT IN unless join keys are NOT NULL
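The warning about NOT IN and nullable keys is easy to reproduce. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in for Oracle and a simplified single-column version of the customer tables; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE google_customers (cust_last_name TEXT);
    CREATE TABLE microsoft_customers (cust_last_name TEXT);
    INSERT INTO google_customers VALUES ('Adams'), ('Baker'), ('Clark');
    INSERT INTO microsoft_customers VALUES ('Baker'), (NULL);
""")

# NOT IN: a NULL in the subquery result makes the test unknown for every
# non-matching row, so no rows qualify.
not_in_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers
    WHERE cust_last_name NOT IN
          (SELECT cust_last_name FROM microsoft_customers)
""").fetchone()[0]

# NOT EXISTS: the NULL row simply never matches, so it is ignored.
not_exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    WHERE NOT EXISTS
          (SELECT 1
           FROM microsoft_customers m
           WHERE m.cust_last_name = g.cust_last_name)
""").fetchone()[0]
```

With one NULL in `microsoft_customers`, the NOT IN query counts 0 rows while the NOT EXISTS query counts the 2 customers that have no match.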

23 Semi-join Subqueries
Expressed as ‘WHERE IN’ or ‘WHERE EXISTS’ subquery
    SELECT COUNT(*)
    FROM google_customers
    WHERE (cust_first_name, cust_last_name) IN
          (SELECT cust_first_name, cust_last_name
           FROM microsoft_customers)
Returns rows from first table only once
– Even if there is more than one matching row in second table
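The "only once" behaviour is what separates a semi-join from an ordinary join. A small sketch, again using SQLite through Python's `sqlite3` module as a stand-in for Oracle, with simplified single-column tables and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE google_customers (cust_last_name TEXT);
    CREATE TABLE microsoft_customers (cust_last_name TEXT);
    INSERT INTO google_customers VALUES ('Adams'), ('Baker');
    -- 'Baker' appears twice in the second table.
    INSERT INTO microsoft_customers VALUES ('Baker'), ('Baker');
""")

# Semi-join: each google_customers row is returned at most once.
semi_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    WHERE EXISTS
          (SELECT 1
           FROM microsoft_customers m
           WHERE m.cust_last_name = g.cust_last_name)
""").fetchone()[0]

# Ordinary join: the row is repeated once per match in the second table.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    JOIN microsoft_customers m ON m.cust_last_name = g.cust_last_name
""").fetchone()[0]
```

Here the semi-join counts 'Baker' once, while the plain join counts it twice.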

