CS4432: Database Systems II
Query Processing - Part 3



Covered So Far…
- One-Pass Operator Evaluation: Join, Duplicate Elimination, Group By, Set Union
- Two-Pass Evaluation (Sort-Based): Sort, Duplicate Elimination, Join [Sort-Join, Sort-Merge-Join]

What's Next…
- Two-Pass Evaluation (Hash-Based): Duplicate Elimination, Join
- Index-Based Evaluation: Join
- Nested-Loop Join

Two-Pass Hash-Based Algorithms

Hash-Based Algorithms: Main Idea
- Data is too large to fit in memory
- Partition the data (using hashing) into buckets
- Work on each bucket individually, using either a One-Pass or a Two-Pass algorithm

Partitioning a Relation
- Read one block at a time -> 1 buffer
- Keep M-1 buffers for the hash table (M-1 buckets)
- Hash each tuple to its bucket's buffer
- If the buffer of bucket X is full -> write it to disk
- Good hash function -> each bucket size = B(R)/(M-1)
[Figure: R on disk -> 1 input buffer (one block at a time) -> hash table (M-1 buffers)]
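The partitioning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation is modeled as a list of blocks (lists of tuples), the "disk" is an in-memory dictionary, and the names `partition`, `num_buffers`, `block_size`, and `key` are hypothetical, not from the slides.

```python
from collections import defaultdict

def partition(blocks, num_buffers, block_size, key=lambda t: t):
    """Hash-partition a relation into num_buffers - 1 buckets.

    Returns (buckets, writes): buckets maps a bucket id to the list of
    blocks written for it; writes counts simulated block writes.
    """
    num_buckets = num_buffers - 1      # one buffer is reserved for input
    buffers = defaultdict(list)        # one in-memory buffer per bucket
    disk = defaultdict(list)           # simulated disk (assumption)
    writes = 0
    for block in blocks:               # read R one block at a time
        for t in block:
            b = hash(key(t)) % num_buckets
            buffers[b].append(t)
            if len(buffers[b]) == block_size:   # buffer full: write it out
                disk[b].append(buffers[b])
                buffers[b] = []
                writes += 1
    for b, buf in buffers.items():     # flush partially filled buffers
        if buf:
            disk[b].append(buf)
            writes += 1
    return dict(disk), writes
```

With M = 4 buffers and 12 integer tuples stored 4 per block, the sketch produces 3 buckets of one block each, which matches the B(R)/(M-1) estimate for a well-behaved hash function.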

Hash-Based Two-Pass Duplicate Elimination
Pass 1: Partition R using hashing
Pass 2: Load each bucket into memory and eliminate duplicates
- Identical tuples must exist in the same bucket
What are the constraints for this algorithm to work in two passes?
- Each bucket size = B(R)/(M-1)
- A bucket must fit in memory: B(R)/(M-1) <= M -> B(R) < M^2
What is the I/O cost?
- Pass 1 -> 2 B(R), Pass 2 -> B(R)
- Total -> 3 B(R)
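A minimal Python sketch of the two passes, together with the cost and constraint from this slide. The function names are hypothetical, and the relation is kept in memory purely for illustration; a real system would write the buckets to disk between the passes.

```python
def distinct_two_pass(tuples, num_buffers):
    """Duplicate elimination by hash partitioning (illustrative sketch)."""
    num_buckets = num_buffers - 1
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:                   # pass 1: partition R by hashing
        buckets[hash(t) % num_buckets].append(t)
    result = []
    for bucket in buckets:             # pass 2: one bucket at a time
        seen = set()                   # identical tuples share a bucket
        for t in bucket:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result

def distinct_io_cost(b_r, m):
    """I/O cost 3 * B(R), valid only if each bucket fits in memory."""
    if b_r > m * (m - 1):              # needs B(R)/(M-1) <= M
        raise ValueError("B(R) too large for a two-pass algorithm")
    return 3 * b_r
```

For example, with B(R) = 1000 blocks and M = 101 buffers, each bucket holds about 10 blocks and the total cost is 3000 I/Os.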

Hash-Based Two-Pass Join
Phase 1: Partition each relation into buckets using the same hash function
- The hash function must be on the join key
Phase 2: Join bucket x from R with bucket x from S
[Figure: Phase 1 partitions R and S into R's buckets and S's buckets]

Hash-Based Two-Pass Join (Cont'd)
Phase 2: Join buckets R.x and S.x
- Move the smaller bucket to memory -> (M-2) buffers
- Read from the other bucket one block at a time and join -> 1 buffer
[Figure: M main-memory buffers. R's bucket is read from disk through one input buffer, a hash table or search tree is built for S's bucket (labeled "M-1 buffers" in the figure), and the join result is written to disk through one output buffer.]
What is the I/O cost?
What are the constraints for this algorithm to work in two passes?
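The two phases can be sketched as follows. This is an illustration, not the course's reference implementation: relations are plain Python lists, the hypothetical `r_key` and `s_key` functions extract the join key, and the in-memory table is always built on S's bucket (as in the figure) rather than on whichever bucket is smaller.

```python
from collections import defaultdict

def hash_join(r, s, num_buffers, r_key, s_key):
    """Equi-join of r and s by hash partitioning (illustrative sketch)."""
    num_buckets = num_buffers - 1
    # Phase 1: partition both relations with the same hash function
    r_buckets = defaultdict(list)
    s_buckets = defaultdict(list)
    for t in r:
        r_buckets[hash(r_key(t)) % num_buckets].append(t)
    for t in s:
        s_buckets[hash(s_key(t)) % num_buckets].append(t)
    # Phase 2: join bucket x of R with bucket x of S
    out = []
    for b in range(num_buckets):
        table = defaultdict(list)      # hash table on S's bucket
        for st in s_buckets.get(b, []):
            table[s_key(st)].append(st)
        for rt in r_buckets.get(b, []):    # stream R's bucket past it
            for st in table.get(r_key(rt), []):
                out.append((rt, st))
    return out
```

Tuples with equal join keys always hash to the same bucket number, so no matching pair is missed by joining only bucket x with bucket x.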

Hash-Based Two-Pass Join
What is the I/O cost?
- Partition R -> 2 B(R)
- Partition S -> 2 B(S)
- Join the buckets -> B(R) + B(S)
- Total I/O cost = 3 (B(R) + B(S))

Hash-Based Two-Pass Join
What are the constraints?
- The smaller bucket must fit in memory -> B(S)/M <= M (approximation) -> B(S) <= M^2
- Or, in general, min(B(R), B(S)) <= M^2
- No constraints on the other relation, whose buckets are read one block at a time
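The cost and the constraint can be combined into one small helper. The function name is hypothetical, and the check uses the slide's approximation min(B(R), B(S)) <= M^2.

```python
def hash_join_io_cost(b_r, b_s, m):
    """Two-pass hash join cost 3 * (B(R) + B(S)), if the constraint holds."""
    if min(b_r, b_s) > m * m:          # smaller relation's buckets must fit
        raise ValueError("min(B(R), B(S)) exceeds M^2")
    # 2*B(R) + 2*B(S) to partition, then B(R) + B(S) to join the buckets
    return 3 * (b_r + b_s)
```

For instance, B(R) = 1000, B(S) = 500, and M = 101 gives 3 x 1500 = 4500 I/Os.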

Index-Based Evaluation

Clustered vs. Unclustered Index
Clustered index: data records are sorted on the index key
- If the index returns N tuples, how many I/Os? -> N / (number of tuples per block)
- Number of tuples per block = T(R)/B(R)
Unclustered index: data records are randomly stored
- If the index returns N tuples, how many I/Os? -> N
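The two cases can be written as a small estimator. The function name is hypothetical, and rounding up to a whole block is an added assumption; the slide gives only the ratio.

```python
import math

def index_fetch_ios(n, t_r, b_r, clustered):
    """Estimated data-block I/Os to fetch n tuples returned by an index."""
    if not clustered:
        return n                       # one I/O per tuple in the worst case
    tuples_per_block = t_r / b_r       # T(R) / B(R)
    return math.ceil(n / tuples_per_block)
```

With T(R) = 10000 and B(R) = 1000 (10 tuples per block), fetching N = 100 tuples costs about 10 I/Os through a clustered index and about 100 through an unclustered one.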

Index-Based Algorithms
- Very useful and effective for selection and join
- Important property: if R is read by following its index pointers -> tuples will be sorted on the indexed column
- Can therefore be used for the Sorting, Duplicate Elimination, and Grouping operators

Need to Remember…
- B(R): # of blocks to hold all R tuples
- T(R): # of tuples in R
- S(R): # of bytes in each of R's tuples
- V(R, A): # of distinct values in attribute R.A
- M: # of memory buffers available
- R is "clustered" -> R's tuples are packed into blocks -> accessing R requires B(R) I/Os
- R is "not clustered" -> R's tuples are distributed over the blocks -> accessing R requires T(R) I/Os

Index-Based Join
Assume joining R and S on column Y, and assume S.Y has an index
- R (the one with no index) becomes the outer relation
- S (the one with the index) becomes the inner relation

For each r in R do
    X <- index-on-S.Y-lookup(r.Y)
    For each s in X do
        Output (r, s) pair

The lookup follows the pointers from the index and retrieves the data tuples X from S.
What is the I/O cost?
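The pseudocode above maps directly onto Python if a dictionary stands in for the index on S.Y. That substitution is an assumption made for illustration (a real DBMS would use a B+-tree or hash index on disk), and the names `build_index` and `index_join` are hypothetical.

```python
from collections import defaultdict

def build_index(s, key):
    """Stand-in for an index on S.Y: maps a key value to matching S tuples."""
    index = defaultdict(list)
    for t in s:
        index[key(t)].append(t)
    return index

def index_join(r, s_index, r_key):
    """For each r in R, look up the matching S tuples and emit the pairs."""
    for rt in r:                                  # R is the outer relation
        for st in s_index.get(r_key(rt), []):     # X <- lookup(r.Y)
            yield (rt, st)
```

Each outer tuple triggers exactly one lookup, which is why the number of lookups in the cost analysis is T(R).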

Index-Based Join

For each r in R do
    X <- index-on-S.Y-lookup(r.Y)
    For each s in X do
        Output (r, s) pair

What is the I/O cost?
- Read R a block at a time -> B(R) if R is clustered; T(R) if R is not clustered
- How many lookups do we do? -> T(R)
- What is the index I/O cost per lookup? (index height = H)
  -> 0 if the index is in memory
  -> H if none of it is in memory
  -> (H-z) if the first z levels of the index are in memory
- What is the expected size of X? -> T(S)/V(S,Y) (we assume a uniform distribution)
- Translates to how many I/Os? -> T(S)/V(S,Y) if the index is unclustered; B(S)/V(S,Y) if the index is clustered

Index-Based Join
What is the I/O cost?
- Assume R is clustered, the S.Y index is in memory, and the index is clustered. What is the cost?
  -> B(R) + T(R) x (B(S)/V(S,Y))
- Assume R is unclustered, the S.Y index has height 3, the root is in memory, and the index is clustered. What is the cost?
  -> T(R) + T(R) x (2 + B(S)/V(S,Y))
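Both cases follow from one general formula: cost of scanning R, plus T(R) lookups, each paying the index I/O and the data I/O. A sketch, with hypothetical parameter names and made-up statistics in the example:

```python
def index_join_cost(b_r, t_r, b_s, t_s, v_s_y, r_clustered,
                    index_clustered, index_height=0, levels_in_memory=0):
    """Estimated I/O cost of an index-based join with R as the outer relation."""
    scan_r = b_r if r_clustered else t_r
    # Index levels that must be read from disk on every lookup (H - z)
    index_io = max(index_height - levels_in_memory, 0)
    # Expected data I/Os per lookup, assuming a uniform distribution
    data_io = (b_s if index_clustered else t_s) / v_s_y
    return scan_r + t_r * (index_io + data_io)
```

Taking, purely as an example, B(R) = 1000, T(R) = 10000, B(S) = 500, T(S) = 5000, and V(S,Y) = 100: the first case costs 1000 + 10000 x 5 = 51000 I/Os, and the second costs 10000 + 10000 x (2 + 5) = 80000 I/Os.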

Block Nested-Loop Join


Block Nested-Loop Join
Sometimes other techniques do not help. Examples:
- Join based on R.x <> S.x
- Join based on R.x = F(S.x), where F is a black-box function

If the smaller relation fits in memory
- Use the "One-Pass Iteration Join" covered before
If not…
- Allocate M-1 buffers for the smaller relation S (outer relation)
- For each (M-1) blocks from S: use 1 buffer to scan R (inner relation) one block at a time, and join with the M-1 blocks of S
- Repeat with the next (M-1) blocks of S until all is done

What is the I/O cost?
- S will be read once -> B(S)
- For each M-1 blocks from S, R is read once -> (B(S)/(M-1)) x B(R)
- Total = B(S) + (B(S)/(M-1)) x B(R)

Exercise: Compute the cost if R is the outer relation.
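A minimal sketch of the algorithm and its cost formula. Relations are modeled as lists of blocks, `predicate` is an arbitrary join condition, all names are hypothetical, and rounding the number of outer chunks up to an integer is an added assumption.

```python
import math

def block_nested_loop_join(s_blocks, r_blocks, num_buffers, predicate):
    """Join with S as the outer relation, M-1 blocks of S at a time."""
    chunk = num_buffers - 1            # buffers holding the outer relation
    out = []
    for i in range(0, len(s_blocks), chunk):
        # Load the next M-1 blocks of S into memory
        outer = [t for blk in s_blocks[i:i + chunk] for t in blk]
        for r_block in r_blocks:       # scan R one block at a time
            for rt in r_block:
                for st in outer:
                    if predicate(rt, st):
                        out.append((rt, st))
    return out

def bnl_io_cost(b_outer, b_inner, m):
    """B(outer) + ceil(B(outer) / (M-1)) * B(inner)."""
    return b_outer + math.ceil(b_outer / (m - 1)) * b_inner
```

Because the predicate is an opaque function, the same loop handles conditions such as R.x <> S.x that hashing and indexes cannot exploit. With B(S) = 500, B(R) = 1000, and M = 101, the cost is 500 + 5 x 1000 = 5500 I/Os.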

Covered So Far…
- One-Pass Operator Evaluation: Join, Duplicate Elimination, Group By, Set Union
- Two-Pass Evaluation (Sort-Based): Sort, Duplicate Elimination, Join [Sort-Join, Sort-Merge-Join]
- Two-Pass Evaluation (Hash-Based): Duplicate Elimination, Join
- Index-Based Evaluation: Join
- Nested-Loop Join