ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 14 – Join Processing.

Query Evaluation Techniques
Read:
– Chapter 12, sec
– Chapter 13
Purpose:
– Study different algorithms to execute (evaluate) SQL relational operators: selection, projection, joins, aggregates, etc.

Join Processing
The DBMS assumes that all projections and selections on single tables are applied first
– Project the attributes needed for the join plus the attributes to be projected in the actual results
Then the joins are computed
We shall study 5 types of join algorithms
– Nested Loops Join
– Block Nested Loops Join
– Index Nested Loops Join
– Sort-merge Join
– Hash Join

Nested Loops Join
Input:
– Tables R and S
– Equijoin condition: r[i] = s[j]
  Compares the ith attribute of r with the jth attribute of s to join tuples
Algorithm:
  for each r ∈ R do
    for each s ∈ S do
      if r[i] = s[j] then add <r, s> to result
Notation
– R is called the outer relation (scanned only once)
– S is called the inner relation (scanned multiple times)
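The nested loops algorithm above can be sketched in Python as follows. This is a minimal in-memory illustration, not the lecture's implementation: tables are plain lists of tuples, and the sample rows and the choice of joining Driver.did with Car.owner are assumptions made for the example.

```python
def nested_loops_join(R, S, i, j):
    """Tuple-at-a-time join of R and S on r[i] == s[j]."""
    result = []
    for r in R:              # outer relation: scanned once
        for s in S:          # inner relation: scanned once per outer tuple
            if r[i] == s[j]:
                result.append(r + s)   # concatenated tuple <r, s>
    return result

# Hypothetical sample data (did, dname, dage) and (cid, owner, make, year)
drivers = [("d1", "Ana", 30), ("d2", "Luis", 41)]
cars = [("c1", "d1", "Toyota", 2001),
        ("c2", "d1", "Ford", 1999),
        ("c3", "d9", "Kia", 2003)]

joined = nested_loops_join(drivers, cars, 0, 1)
for row in joined:
    print(row)
```

Because the comparison is an arbitrary Python expression, swapping `==` for another predicate would give a theta-join with the same loop structure.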

Nested Loops Join (2)
[Figure: tuples r1 to r8 of R next to tuples s1 to s6 of S. S must be fully scanned for each tuple in R.]

Nested Loops Join (3)
Cost of the join R ⋈ S:
– Cost = NPages(R) + |R|*NPages(S)
Usually you want the outer table to be the smaller table
– But the cost difference is marginal
Works for any type of join (natural, equi-join, theta-join)
Example:
– Driver(did: char(10), dname: char(20), dage: integer); Cardinality = 100,000, NPages = 2,000
– Car(cid: char(6), owner: char(10), make: char(10), year: integer); Cardinality = 40,000, NPages = 800

Nested Loops Join (4)
Example: join Driver with Car, assuming 1 I/O takes 10 ms
– Option 1: Driver is outer and Car is inner
  Cost = 2,000 + 100,000*800 = 80,002,000 I/Os (9.3 days!)
– Option 2: Car is outer and Driver is inner
  Cost = 800 + 40,000*2,000 = 80,000,800 I/Os (9.3 days!)
– Option 2 saves 1,200 I/Os
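The arithmetic in this example can be checked with a few lines of Python. The sketch applies the formula Cost = NPages(outer) + |outer| * NPages(inner) to the Driver and Car statistics and converts I/Os to days at the 10 ms per I/O stated in the slide. The function names are made up for the illustration.

```python
IO_MS = 10  # time per I/O in milliseconds, as given in the example

def nlj_cost(outer_pages, outer_tuples, inner_pages):
    """I/O cost of tuple-at-a-time nested loops join."""
    return outer_pages + outer_tuples * inner_pages

def io_days(ios):
    """Convert a number of I/Os to days at IO_MS per I/O."""
    return ios * IO_MS / 1000 / 86400

opt1 = nlj_cost(2000, 100_000, 800)   # Driver outer, Car inner
opt2 = nlj_cost(800, 40_000, 2000)    # Car outer, Driver inner

print(opt1, round(io_days(opt1), 1))
print(opt2, round(io_days(opt2), 1))
print(opt1 - opt2)
```

The |outer| * NPages(inner) term is the same in both orders (100,000 * 800 = 40,000 * 2,000), which is why swapping the tables only changes the cost by the difference in the single outer scan.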

Block Nested Loops Join
Idea
– Join a block from the outer table with blocks of the inner table
– Two schemes: Page-at-a-Time and Block-oriented
Page-at-a-Time Algorithm:
  for each block C in R do
    for each block D in S do
      for each r ∈ C do
        for each s ∈ D do
          if r[i] = s[j] then add <r, s> to result
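A small Python sketch of the page-at-a-time scheme follows. It assumes a page can be modeled as a list of tuples and counts page reads so the result can be compared with NPages(R) + NPages(R)*NPages(S). The `paginate` helper and the sample data are hypothetical.

```python
def paginate(tuples, per_page):
    """Split a list of tuples into fixed-size pages (lists of tuples)."""
    return [tuples[k:k + per_page] for k in range(0, len(tuples), per_page)]

def page_nested_loops_join(R_pages, S_pages, i, j):
    """Join page by page; returns (result, number of page reads)."""
    result, page_reads = [], 0
    for C in R_pages:                 # read one page of the outer table
        page_reads += 1
        for D in S_pages:             # rescan the inner table for each outer page
            page_reads += 1
            for r in C:
                for s in D:
                    if r[i] == s[j]:
                        result.append(r + s)
    return result, page_reads

R_pages = paginate([(k, "r%d" % k) for k in range(1, 9)], per_page=4)  # 2 pages
S_pages = paginate([(k, "s%d" % k) for k in range(1, 7)], per_page=2)  # 3 pages

matches, reads = page_nested_loops_join(R_pages, S_pages, 0, 0)
print(len(matches), reads)
```

With 2 outer pages and 3 inner pages the counter reports 2 + 2*3 = 8 page reads, matching the cost formula for this scheme.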

Block Nested Loops Join (2)
[Figure: pages of R (tuples r1 to r8) and pages of S (tuples s1 to s6). S must be fully scanned for each page in R; 1 page of R is joined with 1 page of S at a time.]

Block Nested Loops Join (3)
Cost of the join R ⋈ S:
– Cost = NPages(R) + NPages(R)*NPages(S)
Works for any type of join (natural, equi-join, theta-join)
Example: join Driver with Car
– Option 1: Driver is outer and Car is inner
  Cost = 2,000 + 2,000*800 = 1,602,000 I/Os (4.45 hours!)
– Option 2: Car is outer and Driver is inner
  Cost = 800 + 800*2,000 = 1,600,800 I/Os (4.45 hours!)
– Option 2 saves 1,200 I/Os

Block Nested Loops Join (4)
We can do better by leveraging buffers
Load a run of pages from R into memory, call it T
Join this set of pages T with one page of S at a time
Need B buffers for this
– B - 2 for the run T, 1 for the page of S, 1 for the output page
Algorithm for Block-oriented NLJ:
  for each run T of B - 2 pages of R do
    build in-memory hash table H for T using the B - 2 buffers
    for each block D in S do
      for each s ∈ D do
        Iterator I = H.get(s[j])   // probe the hash table
        for each r ∈ I do          // iterate over matching tuples
          if r[i] = s[j] then add <r, s> to result
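The block-oriented variant can be sketched as below, under the assumption that pages are lists of tuples and that a Python dictionary stands in for the in-memory hash table H. The page-read counter and sample data are illustrative additions.

```python
from collections import defaultdict

def block_nested_loops_join(R_pages, S_pages, i, j, B):
    """Join runs of B-2 pages of R against each page of S.

    Returns (result, number of page reads)."""
    if B < 3:
        raise ValueError("need at least 3 buffers: run, S page, output")
    run_size = B - 2
    result, page_reads = [], 0
    for start in range(0, len(R_pages), run_size):
        run = R_pages[start:start + run_size]    # run T of up to B-2 pages
        page_reads += len(run)
        H = defaultdict(list)                    # in-memory hash table on r[i]
        for page in run:
            for r in page:
                H[r[i]].append(r)
        for D in S_pages:                        # one full scan of S per run
            page_reads += 1
            for s in D:
                for r in H.get(s[j], []):        # probe; bucket holds equal keys
                    result.append(r + s)
    return result, page_reads

# Hypothetical data: 4 pages of R and 3 pages of S, 2 tuples per page
R_pages = [[(2 * p + 1, "r"), (2 * p + 2, "r")] for p in range(4)]
S_pages = [[(2 * p + 1, "s"), (2 * p + 2, "s")] for p in range(3)]

matches, reads = block_nested_loops_join(R_pages, S_pages, 0, 0, B=4)
print(len(matches), reads)
```

With B = 4 the run holds 2 pages, so R is processed in 2 runs and S is scanned twice: 4 + 2*3 = 10 page reads instead of the 4 + 4*3 = 16 of the page-at-a-time scheme. The hash lookup only supports equality, which is why this variant is limited to natural joins and equi-joins.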

Block Nested Loops Join (6)
[Figure: pages of R (tuples r1 to r12) and pages of S (tuples s1 to s6) on disk. A run T of pages of R is held in a hash table in the buffer pool and joined with 1 page of S at a time.]

Block Nested Loops Join (7)
Cost of the join R ⋈ S:
– Cost = NPages(R) + ceil(NPages(R)/(B-2))*NPages(S)
– When B - 2 = 1, we get the page-at-a-time join
Works on natural joins and equi-joins
Example: join Driver with Car, B = 22
– Option 1: Driver is outer and Car is inner
  Cost = 2,000 + (2,000/20)*800 = 82,000 I/Os (13.7 min)
– Option 2: Car is outer and Driver is inner
  Cost = 800 + (800/20)*2,000 = 80,800 I/Os (13.5 min)
– Option 2 saves 1,200 I/Os
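The worked numbers in this example are consistent with the cost NPages(R) + ceil(NPages(R)/(B-2)) * NPages(S), where R is the outer table. The short Python check below evaluates that expression for both options. The function name is an assumption for the illustration.

```python
import math

def bnlj_cost(outer_pages, inner_pages, B):
    """I/O cost of block-oriented nested loops join with B buffers."""
    runs = math.ceil(outer_pages / (B - 2))   # number of runs of the outer table
    return outer_pages + runs * inner_pages

opt1 = bnlj_cost(2000, 800, 22)    # Driver outer, Car inner
opt2 = bnlj_cost(800, 2000, 22)    # Car outer, Driver inner

print(opt1, opt2, opt1 - opt2)
print(bnlj_cost(2000, 800, 3))     # B - 2 = 1 degenerates to page-at-a-time
```

Setting B = 3 reproduces the page-at-a-time cost of 1,602,000 I/Os, which is the special case the slide mentions.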

Index Nested Loops Join
Idea:
– If a table has an index and its search key K matches the join predicate, then the index can be used to find the matching tuples in this table
– The table that has the index becomes the inner table
Algorithm:
  Index I = S.getIndex()   // get handle for the index on S
  for each r ∈ R do
    Iterator T = I.search(r[i])
    for each s ∈ T do
      add <r, s> to result
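As a rough illustration of the index nested loops idea, the Python sketch below uses a dictionary as a stand-in for an index on the inner table. A real DBMS would use a disk-resident hash index or B+ tree; `build_index`, the sample rows, and the choice of Driver.did = Car.owner as join condition are assumptions made for the example.

```python
from collections import defaultdict

def build_index(S, j):
    """Stand-in for an existing index on attribute j of S."""
    index = defaultdict(list)
    for s in S:
        index[s[j]].append(s)
    return index

def index_nested_loops_join(R, index, i):
    """For each outer tuple, look up matches in the inner table's index."""
    result = []
    for r in R:                          # outer table: scanned once
        for s in index.get(r[i], []):    # index search replaces the inner scan
            result.append(r + s)
    return result

# Hypothetical sample data
drivers = [("d1", "Ana", 30), ("d2", "Luis", 41), ("d3", "Eva", 25)]
cars = [("c1", "d1", "Toyota", 2001),
        ("c2", "d3", "Ford", 1999),
        ("c3", "d1", "Kia", 2003)]

car_index = build_index(cars, 1)         # index on Car.owner
joined = index_nested_loops_join(drivers, car_index, 0)
print(joined)
```

The inner table is never scanned in full here; each outer tuple only touches the inner tuples that the index returns for its key.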

Index Nested Loops Join (2)
[Figure: tuples r1 to r8 of R and tuples s1 to s6 of S.]

Index Nested Loops Join (3)
Cost of the join R ⋈ S:
– Clustered
  Hash index: Cost = NPages(R) + |R|*2
  B+ tree: Cost = NPages(R) + |R|*4
– Un-clustered
  Hash index: Cost = NPages(R) + |R|*3
  B+ tree: Cost = NPages(R) + |R|*4*NTuplesPerPage(S)

Index Nested Loops Join (4)
Example: join Driver with Car
– Scenario 1: Clustered B+ tree on Car
  Cost = 2,000 + (100,000*4) = 402,000 I/Os (1.12 hr)
– Scenario 2: Clustered B+ tree on Driver
  Cost = 800 + (40,000*4) = 160,800 I/Os (26.8 min)
– Consider Scenario 1. Suppose Driver is sorted on the join attribute. What happens?

Sort-Merge Join
Idea:
– If tables are sorted on the join attribute, we can traverse them and join the matching tuples
– In fact, it might be worth sorting the tables if they are not already sorted
Algorithm has two stages:
– Sorting phase
  Both tables are sorted on the join attribute
  Use external sorting for this
– Merging phase
  Both tables are scanned and matching tuples are joined

Sort-Merge Join (2)
[Figure: tuples r1 to r8 of R and tuples s1 to s6 of S, sorted by the join column. Both tables are scanned concurrently and runs of matches are joined.]

Sort-Merge Join (2)
Algorithm: assume R is the smaller relation
  R2 = sort table R on attribute i; S2 = sort table S on attribute j
  I1 = R2.scanIterator(); r = I1.next()
  I2 = S2.scanIterator(); s = I2.next()
  while there are tuples left in R2 and S2 do
    while r[i] < s[j] do r = I1.next()
    while s[j] < r[i] do s = I2.next()
    sOld = s                        // remember where the run of matches in S2 starts
    while r[i] = sOld[j] do
      s = sOld                      // rewind I2 to the start of the run
      while s[j] = r[i] do
        add <r, s> to result
        s = I2.next()
      r = I1.next()
  (every loop also stops as soon as its iterator runs out of tuples)
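The merge logic is easier to follow in runnable form. The sketch below is a minimal in-memory version in Python: `sorted` stands in for external sorting, list positions stand in for scan iterators, and `mark` plays the role of the saved start of the matching run in S. The sample data is made up to exercise duplicate join values on both sides.

```python
def sort_merge_join(R, S, i, j):
    """Equi-join R and S on r[i] == s[j] by sorting and merging."""
    R2 = sorted(R, key=lambda r: r[i])   # sorting phase (in memory here)
    S2 = sorted(S, key=lambda s: s[j])
    result = []
    a = b = 0                            # scan positions in R2 and S2
    while a < len(R2) and b < len(S2):
        if R2[a][i] < S2[b][j]:
            a += 1                       # advance R until it catches up
        elif S2[b][j] < R2[a][i]:
            b += 1                       # advance S until it catches up
        else:
            key = R2[a][i]
            mark = b                     # start of the run of matches in S2
            while a < len(R2) and R2[a][i] == key:
                b = mark                 # rewind S to the start of the run
                while b < len(S2) and S2[b][j] == key:
                    result.append(R2[a] + S2[b])
                    b += 1
                a += 1
    return result

R = [(2, "b"), (1, "a"), (4, "d"), (2, "c")]
S = [(3, "z"), (2, "x"), (4, "w"), (2, "y")]
joined = sort_merge_join(R, S, 0, 0)
for row in joined:
    print(row)
```

Rewinding to `mark` for each R tuple with the same key is what produces all four combinations for the duplicated value 2.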

Sort-Merge Join (3)
Cost of the join R ⋈ S, having B buffers for sorting
– Parameters:
– Cost:

Sort-Merge Join
Example: join Driver with Car, B = 22
– Cost = 14,000 I/Os (2.3 min)

Hash Join
Idea:
– Hash both tables on the join attribute
– Matching tuples must hash to the same corresponding buckets
– You can simply inspect corresponding buckets on each table to find matching tuples
– For this to work you need a lot of memory
  To fit the partitions into an in-memory hash table for probing

Hash Join: Partition Phase (Phase I)
[Figure: the input relation is read through an input buffer and hash function H distributes its tuples over output buffers 0, 1, …, B-1, building B-1 disk-resident partitions of variable size.]

Hash Join: Probing Phase (Phase II)
[Figure: an R partition is loaded into an in-memory hash table using hash function H2; tuples from the S input probe the hash table, and the resulting joined tuples go to the output.]

Hash Join (3)
Algorithm:
  T1 = Hash(R)   // partitions B_0, …, B_k of R
  T2 = Hash(S)   // partitions C_0, …, C_k of S
  for each l = 0, 1, …, k do
    empty the in-memory hash table T3
    for each tuple r ∈ partition B_l of T1 do
      insert r into T3
    for each tuple s ∈ partition C_l of T2 do
      Iterator I = T3.get(s[j])
      for each r ∈ I do
        add <r, s> to result
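A compact Python sketch of the two phases follows. It keeps everything in memory, so the partitions are lists rather than disk-resident files, and Python's built-in `hash` and dictionary stand in for the partitioning function and the in-memory hash table. The number of partitions `k` and the sample data are assumptions for the illustration.

```python
from collections import defaultdict

def partition(tuples, col, k):
    """Phase I: split a table into k partitions by hashing the join column."""
    parts = [[] for _ in range(k)]
    for t in tuples:
        parts[hash(t[col]) % k].append(t)
    return parts

def hash_join(R, S, i, j, k=4):
    """Equi-join R and S on r[i] == s[j] with partition and probe phases."""
    T1 = partition(R, i, k)
    T2 = partition(S, j, k)
    result = []
    for l in range(k):                  # only corresponding partitions can match
        T3 = defaultdict(list)          # Phase II: build table on partition l of R
        for r in T1[l]:
            T3[r[i]].append(r)
        for s in T2[l]:                 # probe with partition l of S
            for r in T3.get(s[j], []):
                result.append(r + s)
    return result

R = [(1, "a"), (2, "b"), (2, "c"), (5, "e")]
S = [(2, "x"), (5, "y"), (7, "z")]
joined = sorted(hash_join(R, S, 0, 0))
print(joined)
```

The output is sorted only to make the printed order independent of how tuples fall into partitions. Tuples with equal join values always land in partitions with the same number, which is why partition l of R never needs to be compared with any other partition of S.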

Hash Join (4)
Cost of the join R ⋈ S
– Cost = (NPages(R)+NPages(S)) + (NPages(R)+NPages(S)) + (NPages(R)+NPages(S))
  (read both tables to partition them, write the partitions, read the partitions back for probing)
– Cost = 3(NPages(R)+NPages(S))
Example: join Driver with Car
– Option 1: Driver is outer and Car is inner
  Cost = 3 * (2,000 + 800) = 8,400 I/Os (1.4 min)
– Option 2: Car is outer and Driver is inner
  Cost = 3 * (800 + 2,000) = 8,400 I/Os (1.4 min)
– You need to pick the one whose partitions fit in memory

Some Issues
Hash join is very memory consuming
– It should only be used if enough buffers are available
How many buffers are enough?
– Let B be the number of buffers to use
– At the partitioning phase, we need 1 buffer for the input pages
  We are left with B-1 buffers for the partitions of R
  Hence we will have B-1 partitions
– Let M be the number of pages with tuples in table R
– We have M/(B-1) pages in each partition of R
– The hash table size will be (M/(B-1)) * f, where f is a fudge factor to compensate for the extra space needed
– We must have B > (f*M/(B-1)) + 2
– Thus, B must be roughly larger than sqrt(f*M)