Query Execution Two-pass Algorithms based on Hashing

Query Execution: Two-pass Algorithms based on Hashing
By Swathi Vegesna

At a glimpse
Introduction; Partitioning Relations by Hashing; Algorithm for Duplicate Elimination; Grouping and Aggregation; Union, Intersection, and Difference; Hash-Join Algorithm; Sort-based vs. Hash-based; Summary

Introduction
Two-pass, hash-based algorithms are used when the data is too big to fit in the main-memory buffers. Hash all the tuples of the argument(s) using an appropriate hash key. For all the common operations, there is a way to select the hash key so that all the tuples that need to be considered together when we perform the operation have the same hash value. Each bucket can then be processed on its own, which reduces the size of the operand(s) by a factor equal to the number of buckets.

Partitioning Relations by Hashing
Algorithm:
initialize M-1 buckets using M-1 empty buffers;
FOR each block b of relation R DO BEGIN
    read block b into the Mth buffer;
    FOR each tuple t in b DO BEGIN
        IF the buffer for bucket h(t) has no room for t THEN BEGIN
            copy the buffer to disk;
            initialize a new empty block in that buffer;
        END;
        copy t to the buffer for bucket h(t);
    END;
END;
FOR each bucket DO
    IF the buffer for this bucket is not empty THEN
        write the buffer to disk;
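The partitioning algorithm can be sketched in Python as follows. This is a minimal in-memory simulation under stated assumptions, not a real storage engine: the relation is a list of blocks, each block is a list of tuples (modeled as plain Python values), "disk" is a list of written blocks per bucket, and Python's built-in hash(x) % num_buckets stands in for h(t). The function name partition_by_hash and its parameters are hypothetical.

```python
def partition_by_hash(blocks, num_buckets, block_size, key=lambda t: t):
    """Hypothetical sketch: split relation R (a list of blocks) into
    num_buckets (M-1) buckets, flushing a bucket's one-block buffer to
    simulated disk whenever it is full."""
    # One in-memory buffer per bucket; the input block plays the Mth buffer.
    buffers = [[] for _ in range(num_buckets)]
    disk = [[] for _ in range(num_buckets)]  # blocks "written" per bucket
    for block in blocks:                      # read block b
        for t in block:
            i = hash(key(t)) % num_buckets    # stand-in for h(t)
            if len(buffers[i]) == block_size: # no room for t
                disk[i].append(buffers[i])    # copy the buffer to disk
                buffers[i] = []               # new empty block in that buffer
            buffers[i].append(t)              # copy t to bucket h(t)
    for i, buf in enumerate(buffers):         # flush non-empty buffers
        if buf:
            disk[i].append(buf)
    return disk
```

For example, partitioning the blocks [[1, 2, 3], [4, 5, 6], [7, 8, 9]] into 2 buckets with 2 tuples per block places the even values in bucket 0 and the odd values in bucket 1, each split into blocks of at most 2 tuples.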

Duplicate Elimination
For the operation δ(R), hash R to M-1 buckets. Two copies of the same tuple t always hash to the same bucket. Do duplicate elimination on each bucket Ri independently, using the one-pass algorithm. The result is the union of δ(Ri), where Ri is the portion of R that hashes to the ith bucket.
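The two passes of hash-based duplicate elimination can be sketched as below. This is a minimal in-memory illustration, assuming tuples are hashable Python values and that Python's built-in hash() plays the role of the hash function; the function name hash_distinct is hypothetical, and buckets are plain lists rather than disk blocks.

```python
def hash_distinct(tuples, num_buckets):
    """Hypothetical sketch of two-pass, hash-based δ(R)."""
    # Pass 1: hash every tuple to one of num_buckets (M-1) buckets.
    # Identical tuples get the same hash value, so they share a bucket.
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)
    # Pass 2: one-pass duplicate elimination on each bucket R_i,
    # keeping an in-memory set of the tuples already seen in it.
    result = []
    for bucket in buckets:
        seen = set()
        for t in bucket:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result  # union of δ(R_i) over all buckets
```

Because duplicates never span buckets, the per-bucket set only has to hold one bucket's distinct tuples at a time, which is what makes the one-pass step fit in memory.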

Requirements
Number of disk I/O's: $3B(R)$ (read R once, write the buckets, and read each bucket back). The two-pass, hash-based algorithm works only if $B(R) \le M(M-1)$. For this to hold, we need a hash function h that distributes the tuples evenly among the buckets, and each bucket Ri must fit in main memory (to allow the one-pass algorithm), i.e., approximately $B(R) \le M^2$.
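The cost and size condition above can be checked with a small worked calculation. The sketch assumes the slide's cost model (output-writing cost not counted) and a perfectly even hash function; the function name two_pass_hash_unary is hypothetical.

```python
def two_pass_hash_unary(b_r, m):
    """Hypothetical helper: (feasible, disk_ios) for a unary two-pass
    hash algorithm such as δ(R), given B(R) blocks and M buffers."""
    # Each of the M-1 buckets holds about B(R)/(M-1) blocks and must fit
    # in M buffers for the one-pass step, so B(R) <= M(M-1).
    feasible = b_r <= m * (m - 1)
    # Read R (B), write the buckets (B), read each bucket back (B).
    disk_ios = 3 * b_r
    return feasible, disk_ios
```

With M = 101 buffers the limit is 101 × 100 = 10,100 blocks, so a 10,000-block relation qualifies at 30,000 disk I/O's, while a 20,000-block relation does not.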

Grouping and Aggregation
Hash all the tuples of relation R to M-1 buckets, using a hash function that depends only on the grouping attributes (all tuples in the same group end up in the same bucket). Use the one-pass algorithm to process each bucket independently. Uses $3B(R)$ disk I/O's and requires $B(R) \le M^2$.
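A minimal sketch of hash-based grouping, assuming SUM as the aggregate (the slide does not fix one) and in-memory lists as buckets. The function name hash_group_sum and its group_key / value accessors are hypothetical.

```python
from collections import defaultdict


def hash_group_sum(tuples, group_key, value, num_buckets):
    """Hypothetical sketch of two-pass, hash-based grouping with SUM."""
    # Pass 1: partition on the grouping attributes only, so every tuple
    # of a given group lands in the same bucket.
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(group_key(t)) % num_buckets].append(t)
    # Pass 2: one-pass grouping per bucket, one running total per group.
    result = {}
    for bucket in buckets:
        totals = defaultdict(int)
        for t in bucket:
            totals[group_key(t)] += value(t)
        result.update(totals)  # a group never spans two buckets
    return result
```

Hashing only on the grouping attributes is the key design choice: it guarantees that each group is complete within a single bucket, so per-bucket results can simply be concatenated.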

Union, Intersection, and Difference
For a binary operation, we use the same hash function to hash the tuples of both arguments. For R ∪ S, R ∩ S, or R − S, hash R and S to M-1 buckets each (2(M-1) buckets in all), giving R1, …, RM-1 and S1, …, SM-1, then apply the corresponding one-pass algorithm to each pair Ri, Si. Requires $3(B(R)+B(S))$ disk I/O's. The two-pass, hash-based algorithm requires $\min(B(R), B(S)) \le M^2$.
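The bucket-pair structure for set operations can be sketched as below, assuming set semantics and hashable tuples; the names partition and hash_set_op are hypothetical. For brevity the one-pass step builds in-memory sets for both Ri and Si, whereas a real one-pass algorithm would keep only the smaller of the two resident and stream the other.

```python
def partition(tuples, num_buckets):
    # Same hash function for both arguments, so equal tuples of R and S
    # always land in buckets with the same index.
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)
    return buckets


def hash_set_op(r, s, op, num_buckets):
    """Hypothetical sketch: set-semantics R ∪ S, R ∩ S, or R − S."""
    r_buckets = partition(r, num_buckets)  # R_1 .. R_{M-1}
    s_buckets = partition(s, num_buckets)  # S_1 .. S_{M-1}
    result = []
    for r_i, s_i in zip(r_buckets, s_buckets):
        r_set, s_set = set(r_i), set(s_i)  # one-pass step on the pair
        if op == "union":
            result.extend(r_set | s_set)
        elif op == "intersection":
            result.extend(r_set & s_set)
        elif op == "difference":
            result.extend(r_set - s_set)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result
```

Only bucket pairs with the same index are ever compared, because a tuple present in both R and S must have been sent to bucket i on both sides.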

Hash-Join Algorithm
Use the same hash function for both relations; the hash function should depend only on the join attributes. Hash R to M-1 buckets R1, R2, …, RM-1. Hash S to M-1 buckets S1, S2, …, SM-1. Do a one-pass join of Ri and Si, for all i. Cost: $3(B(R)+B(S))$ disk I/O's; requires $\min(B(R), B(S)) \le M^2$.
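A minimal in-memory sketch of the two-pass hash join, assuming tuples are Python tuples and r_key / s_key extract the join attribute(s); the function name hash_join is hypothetical. In the second pass it builds an index on Si (treated here as the smaller side) and probes it with the tuples of Ri.

```python
from collections import defaultdict


def hash_join(r, s, r_key, s_key, num_buckets):
    """Hypothetical sketch of a two-pass hash join of R and S."""
    # Pass 1: hash both relations on the join attribute(s) only,
    # using the same hash function.
    r_buckets = [[] for _ in range(num_buckets)]
    s_buckets = [[] for _ in range(num_buckets)]
    for t in r:
        r_buckets[hash(r_key(t)) % num_buckets].append(t)
    for u in s:
        s_buckets[hash(s_key(u)) % num_buckets].append(u)
    # Pass 2: one-pass join of each pair (R_i, S_i).
    out = []
    for r_i, s_i in zip(r_buckets, s_buckets):
        index = defaultdict(list)  # in-memory index on S_i
        for u in s_i:
            index[s_key(u)].append(u)
        for t in r_i:  # probe with each tuple of R_i
            for u in index.get(r_key(t), []):
                out.append(t + u)  # concatenate the joining tuples
    return out
```

Joining R = [(1, "a"), (2, "b"), (3, "c")] with S = [(1, "x"), (1, "y"), (3, "z")] on the first attribute yields (1, "a", 1, "x"), (1, "a", 1, "y"), and (3, "c", 3, "z"), in an order that depends on the bucket layout.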

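Sort-based vs. Hash-based
For binary operations, hash-based algorithms limit only the size of the smaller argument, $\min(B(R), B(S)) \le M^2$, whereas sort-based algorithms limit the sum of the argument sizes. Sort-based algorithms can produce output in sorted order, which can be helpful for later operations. Hash-based algorithms depend on the buckets being of roughly equal size. Sort-based algorithms can write sorted sublists to consecutive disk blocks, which can reduce rotational latency or seek time.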
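The "min versus sum" difference can be made concrete with a small calculation. The sketch assumes the approximate $M^2$ bound used on these slides for both families of algorithms; the function name fits_two_pass is hypothetical.

```python
def fits_two_pass(b_r, b_s, m):
    """Hypothetical helper: (hash_ok, sort_ok) for a binary operation
    on R and S with M buffers, under the approximate M^2 bound."""
    limit = m * m
    # Hash-based: only the smaller argument's buckets must fit in memory.
    hash_ok = min(b_r, b_s) <= limit
    # Sort-based: the sorted sublists of both arguments are merged
    # together, so the sum of the sizes is what counts.
    sort_ok = b_r + b_s <= limit
    return hash_ok, sort_ok
```

With M = 101 the bound is 10,201 blocks, so joining a 1,000-block relation with a 50,000-block relation fits the hash-based requirement but not the sort-based one.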

Summary
Partitioning Relations by Hashing; Algorithm for Duplicate Elimination; Grouping and Aggregation; Union, Intersection, and Difference; Hash-Join Algorithm; Sort-based vs. Hash-based

Thank you