Published by Garey Hawkins Modified over 8 years ago
1
Query Processing – Implementing Set Operations and Joins Chap. 19
2
Relational Model most uniform data structures most formal files + mathematical foundation Set theory Tables – relations, rows – tuples, columns – attributes
3
Relational Model Constraints 1st normal form Key constraint Entity integrity Referential integrity – Relationships – 1:1, 1:N, N:M
5
SQL – Structured English Query Language Select From Where
6
SQL Select called Subselect Select expr {, expr} From tablename [alias] {, tablename [alias]} [Where search_condition] [Group By col {, col}] [Having search_condition]
7
Having For each project on which more than two employees work, retrieve the project number, project name, and the number of employees who work on that project. Select pnumber, pname, COUNT(*) From Project, Works_on Where pnumber =pno Group By pnumber, pname Having COUNT(*) > 2
8
SQL Select Select is really: Subselect {Set_Operation [all] Subselect} [Order By col [asc | desc] {, col [asc | desc]}]
9
Set Operations The Set Operations are: – UNION, MINUS and INTERSECT The resulting relations are sets of tuples; duplicate tuples are eliminated. Operations apply only to union compatible relations. The two relations must have the same number of attributes and the attributes must be of the same type.
10
Set Operation Select ssn from employee Minus Select essn from works_on Order by ssn
11
Relational Algebra Relational algebra (algebraic notation) and relational calculus (logical notation) created to demonstrate the potential for a query language of the relational model algebra and calculus are equivalent in expressive power Provide DML and DDL SQL based on relational algebra
12
Relational Algebra In relational algebra, a series of operations are combined to form a relational algebra expression (query) DML Operations: set theory operations relational db operations
13
Relational Algebra Set theory operations: – Union – Intersection – Difference – Cartesian Product Relational db operations: – Select – Project – Join
14
Select Statement Select - chooses columns (project operation in relational algebra) From - combines tables if > 1 table (join operation |X| in relational algebra) Where - chooses rows (select operation in relational algebra) – Result of a query is usually considered another relation This is always true, even if it is a single value (1 row, 1 col) – Results may contain duplicate tuples
15
SQL and relational algebra Select ssn, lname, fname From Employee, Department Where dno=dnumber and sex='F' Select salary From Employee Select distinct salary From Employee
16
Set Operation Select ssn from employee Minus Select essn from works_on
17
Relational algebra set Operations Union U Intersection ∩ Minus (R-S)
18
Algorithms for implementing relational ops Set operations Union U – how to implement this? – Sort Sort both files scan both files concurrently (sort-merge) Add all tuples to result file, but no duplicates or – Hash Hash one file hash the other file no duplicates in result file
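The sort-based union on this slide can be sketched as follows. This is a minimal in-memory illustration, assuming each relation is a Python list of tuples; the function name `sort_merge_union` is made up for this example, and a real system would scan sorted files on disk rather than lists.

```python
def sort_merge_union(r, s):
    """Union of two union-compatible relations (lists of tuples).

    Both inputs are sorted, then scanned concurrently; a tuple is
    written only if it differs from the last tuple written, so the
    result contains no duplicates.
    """
    r, s = sorted(r), sorted(s)
    result = []
    i = j = 0
    while i < len(r) or j < len(s):
        # Take the smaller of the two current tuples (or whatever is left).
        if j >= len(s) or (i < len(r) and r[i] <= s[j]):
            t = r[i]
            i += 1
        else:
            t = s[j]
            j += 1
        if not result or result[-1] != t:
            result.append(t)
    return result
```

Because both inputs arrive in sorted order, duplicates are always adjacent, which is why comparing against the last output tuple is enough.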
19
Set Op Algorithms cont’d Intersection ∩ - how to implement this? – Sort Sort both files scan both files concurrently (sort merge) only tuples found in both files added to result file or – Hash Hash one file hash the other only tuples found in both files added to result file
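A hash-based intersection might look like the sketch below. It assumes relations are lists of hashable tuples and uses a Python `set` as a stand-in for the hash file; the name `hash_intersection` is hypothetical.

```python
def hash_intersection(r, s):
    """Intersection of two union-compatible relations (lists of tuples)."""
    hashed_r = set(r)      # hash one file
    emitted = set()        # tuples already written to the result
    result = []
    for t in s:            # probe with the other file
        if t in hashed_r and t not in emitted:
            emitted.add(t)
            result.append(t)
    return result
```

The `emitted` set keeps the result a set even when the second file itself contains repeated tuples.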
20
Set Op Algorithms Minus (R-S) - how to implement this? – hash R to hash file – hash each record of S; if a matching record of R is found in the hash file, remove that tuple OR can use sort-merge
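The hash approach to R − S can be sketched like this, again assuming lists of tuples in memory. The function name `hash_difference` is an assumption of this example; a `dict` plays the role of the hash file so that the order of R is preserved.

```python
def hash_difference(r, s):
    """R - S for union-compatible relations (lists of tuples)."""
    hashed_r = dict.fromkeys(r)    # hash R; repeated tuples collapse to one
    for t in s:
        hashed_r.pop(t, None)      # tuple also appears in S: remove it
    return list(hashed_r)
```

For the slide's query (ssn of employees minus essn in works_on), `r` would hold the employee ssn values and `s` the essn values.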
21
Relational algebra operations Project op π Equi-join |X| Select
22
Project Algorithms Project op π – more expensive if must eliminate duplicate tuples – sort or hash to do so
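The extra cost of duplicate elimination can be seen in a small sketch. It assumes rows are Python dicts keyed by column name; `project` and its `distinct` flag are names invented for this illustration.

```python
def project(rows, columns, distinct=False):
    """Project rows (dicts) onto the given columns.

    Without distinct this is a single pass. With distinct, every
    projected tuple is also hashed so duplicates can be dropped.
    """
    projected = [tuple(row[c] for c in columns) for row in rows]
    if distinct:
        projected = list(dict.fromkeys(projected))  # hash-based dedup, keeps first occurrence
    return projected
```

This mirrors the difference between `Select salary From Employee` and `Select distinct salary From Employee`.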
23
External Sorting Used for order by and join, union, distinct, etc. Large size files that do not fit into memory Sort-merge – Sort small subfiles (runs) – Uses buffer space in memory to sort a run – Sorted runs merged – take several passes (n-way merging) E.g. for a 4-way merge: 205 initial sorted runs merged 4 at a time into 52 larger subfiles, then merged 4 at a time into 13 sorted files, then into 4 sorted files, then 1
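The run counts in the 4-way merge example follow from repeatedly dividing by the merge degree and rounding up. The sketch below reproduces that arithmetic and adds a toy in-memory version of the sort and merge phases; `merge_passes`, `external_sort`, `run_size` and `fan_in` are names assumed for this example, and real runs would live in files rather than lists.

```python
import heapq
import math

def merge_passes(initial_runs, fan_in):
    """Number of sorted runs remaining after each merge pass."""
    counts = [initial_runs]
    while counts[-1] > 1:
        counts.append(math.ceil(counts[-1] / fan_in))
    return counts

def external_sort(records, run_size, fan_in):
    """Toy sort-merge: sort small runs, then merge fan_in runs at a time.

    Requires fan_in >= 2, otherwise the merge phase never shrinks.
    """
    # Sort phase: each run is small enough to sort in the buffer.
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    # Merge phase: repeat passes until one sorted run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0] if runs else []

print(merge_passes(205, 4))  # [205, 52, 13, 4, 1]
```

The four merge passes for 205 runs match the slide: 52, then 13, then 4, then 1.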
24
Joins Different ways to implement joins (equi- joins)? – Nested loop – Sort merge – Hashing
25
Equi-Join Algorithms |X| 1.nested (inner-outer) loop – for each record t in R retrieve every record s from S and test if satisfy join condition – If match, combine records and write to output file – CPU time: n*m
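A nested loop equi-join is short enough to write out directly. The sketch assumes records are dicts with distinct column names in R and S; `nested_loop_join`, `r_key` and `s_key` are illustrative names.

```python
def nested_loop_join(r, s, r_key, s_key):
    """Equi-join by testing every pair of records: n * m comparisons."""
    output = []
    for t in r:                       # outer loop over R
        for u in s:                   # inner loop over all of S
            if t[r_key] == u[s_key]:  # join condition
                output.append({**t, **u})
    return output
```

The inner loop runs in full for every outer record, which is where the n*m cost comes from.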
26
Equi-join 2. Sort-merge join – records of R and S ordered by value of join attribute – both files scanned in order, need to scan each file only once if duplicate values, have an inner loop and must back up the pointer – When match, combine records and write to output file CPU time: n+m plus time to sort (nlogn)
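The sort-merge join, including the handling of duplicate join values, can be sketched as below. Records are assumed to be dicts with distinct column names; `sort_merge_join` is a hypothetical name. Instead of literally backing up a pointer, this version finds the end of the group of equal keys in S once and reuses that group for each matching record of R, which has the same effect.

```python
def sort_merge_join(r, s, r_key, s_key):
    """Equi-join of two lists of dicts by sorting and merging."""
    r = sorted(r, key=lambda t: t[r_key])
    s = sorted(s, key=lambda u: u[s_key])
    output = []
    i = j = 0
    while i < len(r) and j < len(s):
        a, b = r[i][r_key], s[j][s_key]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Locate the run of S records sharing this join value.
            j_end = j
            while j_end < len(s) and s[j_end][s_key] == a:
                j_end += 1
            # Pair every R record with that value against the whole run.
            while i < len(r) and r[i][r_key] == a:
                for k in range(j, j_end):
                    output.append({**r[i], **s[k]})
                i += 1
            j = j_end
    return output
```

When the inputs are already sorted on the join attribute, the two `sorted` calls are the part a real system would skip.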
27
Equi-join 3. Hash join – records of both R and S hashed to same hash file – use same hashing function on join attributes – hash smaller file first (all fits in memory or partitioning used) – single hash of second file, – if match combine record with matching records of first file in output file – CPU time: (assume good hash function) n+m but no sorting
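A hash join for the case where the smaller file fits in memory might be sketched as follows. The names `hash_join`, `build` and `probe` are assumptions of this example, and Python's built-in dict hashing stands in for the shared hash function; partitioning for larger inputs is not shown.

```python
from collections import defaultdict

def hash_join(build, probe, build_key, probe_key):
    """Equi-join: hash the smaller input, then probe with the larger."""
    table = defaultdict(list)
    for b in build:                    # build phase: one pass over the smaller file
        table[b[build_key]].append(b)
    output = []
    for p in probe:                    # probe phase: one pass over the other file
        for b in table.get(p[probe_key], []):
            output.append({**p, **b})
    return output
```

Each input is read once, which gives the n+m behavior when the hash function spreads keys well.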
28
Different room for Wed. only!!! Shelby 2021
29
Nested Loop Join in Oracle Nested loop joins are useful when the following conditions are true: – The database joins small subsets of data – The join condition is an efficient method of accessing the second table Inner table dependent on outer table Else hash join used Must identify driving table as outer table – Most identify driving table as one with lowest filter ratio (smallest number of rows after any selects applied)
30
Hash Join in Oracle The optimizer uses a hash join if equijoin and if either of the following conditions are true: – A large amount of data must be joined – A large fraction of a small table must be joined – Uses smaller of 2 tables to build hash table on the join key (best if table fits in memory)
31
Sort Merge Join in Oracle Hash joins generally perform better than sort merge joins. However, sort merge joins can perform better than hash joins if both of the following conditions exist: – The row sources are sorted already – A sort operation does not have to be done However, if a sort merge join involves choosing a slower access method (e.g. an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >=
32
Join Comparisons in Oracle Nested loop for small data sets with dependent tables Sort merge joins perform better than nested loop joins for large data sets You cannot use hash joins unless there is an equality condition
33
Access structures (index) Access structure to retrieve matching records – if an index on join attribute (of S) retrieve each record t in R and use access structure to directly retrieve all records in S that satisfy the join condition – CPU time: n+m – if secondary indexes on both join attributes No need to sort, but data may not be in order of join attribute -- problems?
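An index-based join can be illustrated with a sorted list standing in for a secondary index on the join attribute of S. This is only a sketch: `build_index` and `index_join` are made-up names, the index is a list of (key, position) pairs searched with `bisect`, and a real index would be a B-tree or similar structure on disk.

```python
import bisect

def build_index(s, key):
    """Secondary index: sorted (join value, position in S) pairs."""
    return sorted((row[key], pos) for pos, row in enumerate(s))

def index_join(r, s, r_key, index):
    """For each record of R, use the index to fetch matching S records."""
    keys = [k for k, _ in index]
    output = []
    for t in r:
        lo = bisect.bisect_left(keys, t[r_key])
        hi = bisect.bisect_right(keys, t[r_key])
        for _, pos in index[lo:hi]:     # all index entries with this join value
            output.append({**t, **s[pos]})
    return output
```

The positions stored in the index point into S in its original order, so matching records may be scattered across the file, which is the concern the slide raises.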
34
Index versus table scan in Oracle Full table scans can be cheaper than indexes when accessing a large fraction of the blocks in a table Full table scans can use larger I/O calls which is cheaper than making many smaller calls with index
35
Nested joins Transforming nested queries to join queries – possible to transform most nested queries into equivalent unnested queries (join queries) – join query allows optimizer to use additional algorithms to optimize – will talk more about this later DB2 rules: – can transform nested query to join query when: 1) 'in' or '=any' connects subquery 2) subquery target list is single column, unique value (unique index indicates this) 3) 'not exists' predicate not transformed
36
What if correlated? Correlated nested query – Use nested loop join – Would sort merge or hash join be good?
37
Outer Join Modify a join algorithm, e.g. also return rows from one table with no match in the other table or Combination of relational algebra operations: join, minus, cartesian product and union operations
38
Outer Join Select lname, fname, dname // assume lname, fname unique From (Employee left outer join Department on dno=dnumber) Temp1 ← π lname, fname, dname (Employee |X| dno=dnumber Department) Temp2 ← π lname, fname (Employee) ─ π lname, fname (Temp1) Temp3 ← Temp2 X NULL Result ← Temp1 U Temp3
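The first option, modifying a join algorithm, can be sketched by extending a hash join so that unmatched rows of the left table are still returned. The names `left_outer_hash_join` and `right_columns` are assumptions of this example, and Python's `None` stands in for SQL NULL.

```python
from collections import defaultdict

def left_outer_hash_join(left, right, left_key, right_key, right_columns):
    """Left outer equi-join of two lists of dicts."""
    table = defaultdict(list)
    for row in right:
        table[row[right_key]].append(row)
    output = []
    for row in left:
        matches = table.get(row[left_key])
        if matches:
            for m in matches:
                output.append({**row, **m})
        else:
            # No match: keep the left row and pad the right columns with NULLs.
            output.append({**row, **dict.fromkeys(right_columns)})
    return output
```

The padded rows correspond to Temp3 in the relational algebra formulation, and the matched rows to Temp1.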
39
Select – For operation (will talk about this more later) Choose an index or not
40
Logical order of Evaluation Select pnumber, pname, COUNT(*) From Project, Works_on Where pnumber =pno Group By pnumber, pname Having COUNT(*) > 2 Order by pname – Apply Cartesian product to tables, – Join and select conditions – then group by and having. – Apply the select clause – order the result for the display.
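The logical order above can be traced step by step for the example query. The sketch assumes Project and Works_on are lists of dicts; the function name is invented for this illustration, and it follows the logical order literally rather than anything an optimizer would actually do.

```python
from collections import defaultdict
from itertools import product

def projects_with_more_than_two_workers(project, works_on):
    # 1. From: Cartesian product of the tables.
    pairs = product(project, works_on)
    # 2. Where: join and select conditions.
    joined = [(p, w) for p, w in pairs if p["pnumber"] == w["pno"]]
    # 3. Group By pnumber, pname.
    groups = defaultdict(list)
    for p, w in joined:
        groups[(p["pnumber"], p["pname"])].append(w)
    # 4. Having COUNT(*) > 2.
    kept = {k: rows for k, rows in groups.items() if len(rows) > 2}
    # 5. Select pnumber, pname, COUNT(*).
    selected = [(pnumber, pname, len(rows))
                for (pnumber, pname), rows in kept.items()]
    # 6. Order by pname.
    return sorted(selected, key=lambda row: row[1])
```

Building the full Cartesian product first is what makes the logical order a poor execution plan, which sets up the question on the next slide.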
41
Order of evaluation Actual order of evaluation? – Any guesses?