Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016


15.2 One-Pass Algorithms Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016

Key questions: How should we execute each of the individual steps of a logical query plan? What is a one-pass algorithm? How does a one-pass algorithm work for different operators?

How do we execute each of the individual steps of a logical query plan?
Each step of the plan is an operation such as a join, selection, projection, or grouping. These operations are the operators.
Algorithms for the operators are broadly classified into 3 classes: sorting-based (covered in Section 15.4); hash-based (covered in Sections 15.5 and 20.1); index-based (covered in Section 15.6).

How do we execute each of the individual steps of a logical query plan?
Algorithms for operators are also divided into 3 "degrees" of difficulty and cost: one-pass algorithms (covered in this section); two-pass algorithms (covered in Sections 15.4 and 15.5); multi-pass algorithms (covered in Section 15.8).

What is a one-pass algorithm?
It is an algorithm that reads its data from disk only once. Usually, the algorithm requires that at least one of the operator's arguments fit in main memory. The exceptions are the selection and projection operators.

How does a one-pass algorithm work for different operators?
Operators are classified into 3 broad groups: tuple-at-a-time, unary operations; full-relation, unary operations; full-relation, binary operations.

How does a one-pass algorithm work for tuple-at-a-time, unary operations?
These are selection (σ) and projection (π). They do not require the entire relation, or even a large part of it, in memory at once, which is why they are the exceptions among one-pass algorithms. Read one block at a time into 1 main-memory buffer and produce output.

How does a one-pass algorithm work for tuple-at-a-time, unary operations?
2 buffers are involved: 1 input buffer and 1 output buffer. Read the blocks of R, one at a time, into the input buffer. Perform the operation on each tuple (keep or discard it for a selection, trim it for a projection). Move the selected or projected tuples to the output buffer.
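The block-at-a-time loop for selection and projection can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: a relation is modeled as a list of blocks, each block a list of tuples, and the names one_pass_select and one_pass_project are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


def one_pass_select(blocks: Iterable[Block], keep: Callable[[Row], bool]) -> Iterator[Row]:
    """Selection: only the current block is held, in the single input buffer."""
    for block in blocks:      # read one block of R at a time
        for row in block:
            if keep(row):     # keep or discard the tuple
                yield row     # passed on to the output buffer or consumer


def one_pass_project(blocks: Iterable[Block], columns: Sequence[int]) -> Iterator[Row]:
    """Projection: keep only the listed column positions of each tuple."""
    for block in blocks:
        for row in block:
            yield tuple(row[i] for i in columns)


R = [[(1, "a"), (2, "b")], [(3, "c")]]
selected = list(one_pass_select(R, lambda r: r[0] >= 2))
projected = list(one_pass_project(R, [1]))
```

Because the functions are generators, each tuple is handed to the consumer as soon as it is produced, which mirrors the output buffer feeding another operator.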

How does a one-pass algorithm work for tuple-at-a-time, unary operations?
Space requirements: M ≥ 1, for the input buffer only, regardless of B(R).
Note: the output buffer is not counted as needed space, because it may be serving as the input buffer of another operation or sending data to the end user.

How does a one-pass algorithm work for tuple-at-a-time, unary operations?
Disk I/O requirements depend on how R is provided. If R is initially on disk, the cost is that of a table scan or index scan of R: typically B(R) if R is clustered and T(R) if R is not clustered.

How does a one-pass algorithm work for full-relation, unary operations?
These one-argument operations require seeing all or most of the tuples in memory at once. One-pass algorithms are therefore applicable only to relations whose size is approximately M (the number of main-memory buffers available) or less. The operators in this group are grouping (γ) and duplicate elimination (δ).

How does a one-pass algorithm work for full-relation, unary operations? Duplicate-elimination operator
1 buffer is used for the incoming block of R. The remaining M - 1 buffers store one copy of every distinct tuple seen so far.

One-pass algorithm for the duplicate-elimination operator (δ)
We read the blocks of R one at a time, and for each tuple we decide which case applies. If this is the first time we have seen the tuple, copy it to the output buffer and remember it. If we have seen the tuple before, do not copy it to the output buffer.

One-pass algorithm for the duplicate-elimination operator (δ)
Naive data structure (a list): with n tuples in memory, each new tuple takes processor time proportional to n, so the complete operation takes time proportional to n².
A hash table or a balanced binary search tree can be used instead. Both introduce some space overhead, but it is small compared with the space needed to store the tuples.
Memory requirement: approximately B(δ(R)) ≤ M, because at most M - 1 buffers are available to hold the distinct tuples.
In general, the size of δ(R) cannot be computed without computing δ(R) itself.
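A minimal Python sketch of one-pass duplicate elimination follows. It assumes a relation is modeled as a list of blocks (lists of tuples) and uses a Python set, which is hash-based, to stand in for the M - 1 buffers of seen tuples; the function name is hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


def one_pass_distinct(blocks: Iterable[Block]) -> Iterator[Row]:
    """Emit each distinct tuple of R the first time it is seen."""
    seen: Set[Row] = set()      # plays the role of the M - 1 buffers
    for block in blocks:        # one block in the input buffer at a time
        for row in block:
            if row not in seen: # expected constant-time lookup, not a list scan
                seen.add(row)
                yield row       # first occurrence goes to the output


R_dup = [[(1, "a"), (2, "b")], [(1, "a"), (3, "c")], [(2, "b")]]
distinct_rows = list(one_pass_distinct(R_dup))
```

With a plain list in place of the set, the membership test would scan all stored tuples, which is the quadratic behavior described for the naive structure.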

One-pass algorithm for the grouping operator (γ_L)
The operator gives us zero or more grouping attributes and one or more aggregated attributes.
Create one entry in main memory for each group, that is, for each value of the grouping attributes, while scanning the tuples of R one block at a time.
The entry for a group consists of the values of the grouping attributes and the accumulated value(s) for the aggregation(s).
When all tuples of R have been read into the input buffer and have contributed to the aggregation(s) for their group, the output is produced by writing one tuple for each group.
Note: until the last tuple is seen, we cannot begin to create output for a γ operation.

One-pass algorithm for the grouping operator (γ_L): aggregate operations
MIN(a) or MAX(a): record the minimum or maximum value, respectively, of attribute a seen for any tuple in the group so far. Change this minimum or maximum, if appropriate, each time a tuple of the group is seen.
COUNT: add one for each tuple of the group that is seen.
SUM(a): add the value of a to the accumulated sum for its group, provided a is not NULL.
AVG(a) (the hard case): maintain 2 accumulations, the count of tuples in the group (computed as for COUNT) and the sum of the a-values of these tuples (computed as for SUM). After all tuples of R have been seen, the quotient of the sum and the count is the average.
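The accumulation rules for MIN, MAX, COUNT, SUM, and AVG can be sketched as follows. This is an illustrative Python version, assuming a relation is a list of blocks of tuples, a single aggregated column, and no NULL values; the function name and the output layout are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


def one_pass_group(blocks: Iterable[Block], group_cols: Sequence[int], agg_col: int) -> Iterator[Row]:
    """Yield (group values..., MIN, MAX, COUNT, SUM, AVG) per group."""
    groups: Dict[Row, list] = {}   # one in-memory entry per group
    for block in blocks:           # scan R one block at a time
        for row in block:
            key = tuple(row[i] for i in group_cols)
            a = row[agg_col]       # assumption: a is never NULL (None)
            entry = groups.get(key)
            if entry is None:
                groups[key] = [a, a, 1, a]      # min, max, count, sum
            else:
                entry[0] = min(entry[0], a)
                entry[1] = max(entry[1], a)
                entry[2] += 1
                entry[3] += a
    # Output can only start once every tuple of R has been seen.
    for key, (lo, hi, count, total) in groups.items():
        yield key + (lo, hi, count, total, total / count)


R = [[("x", 4), ("y", 10)], [("x", 6)]]
grouped = list(one_pass_group(R, group_cols=[0], agg_col=1))
```

AVG is never stored directly; it is derived from the count and sum accumulators at output time, as described above.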

Why does the one-pass algorithm for the grouping operator (γ_L) not fit the iterator framework well?
No output can be produced before the last tuple is seen, so the entire grouping has to be done by Open() before the first tuple can be retrieved by GetNext().
The main-memory data structure must be able to find the entry for each group, given values for the grouping attributes. Hash tables or balanced trees are commonly used. The search key for these structures is the grouping attributes only.
Disk I/Os needed: B(R) if R is clustered, T(R) if R is not clustered.
The number of memory buffers M required is not related to B(R) in any simple way, although M is typically less than B(R).
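To make the point about Open() and GetNext() concrete, here is a small Python sketch of a blocking grouping iterator that computes COUNT per group. The class and method names (GroupCountIterator, open, get_next, close) are hypothetical stand-ins for the iterator interface, and a relation is again assumed to be a list of blocks of tuples.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


class GroupCountIterator:
    """COUNT(*) per group, written in an Open/GetNext/Close style."""

    def __init__(self, blocks: Iterable[Block], group_col: int) -> None:
        self.blocks = blocks
        self.group_col = group_col
        self._results: Optional[Iterator] = None

    def open(self) -> None:
        # All the work happens here: every block of R is consumed
        # before a single output tuple is available.
        counts: Dict[object, int] = {}
        for block in self.blocks:
            for row in block:
                key = row[self.group_col]
                counts[key] = counts.get(key, 0) + 1
        self._results = iter(counts.items())

    def get_next(self) -> Optional[Tuple]:
        # Merely hands out the tuples prepared by open(); None means "no more".
        return next(self._results, None)

    def close(self) -> None:
        self._results = None


it = GroupCountIterator([[("x", 1), ("y", 2)], [("x", 3)]], group_col=0)
it.open()
first = it.get_next()
second = it.get_next()
done = it.get_next()
it.close()
```

Contrast this with selection, where each call to a GetNext-style method could read just enough input to produce one tuple.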

How does a one-pass algorithm work for full-relation, binary operations?
Binary operations discussed in the book: union, intersection, and difference (each with bag and set variants); product; join (natural join).
Equi-joins can be implemented the same way as the natural join after the attributes are renamed appropriately.
Theta-joins can be implemented as a product or equi-join followed by a selection for the conditions that cannot be expressed in an equi-join.

How does a one-pass algorithm work for full-relation, binary operations?
Bag union can be achieved with M = 1, regardless of the sizes of R and S.
The other operations require the smaller of R and S to be in memory, held in a data structure that supports fast inserts and searches. Hash tables and balanced trees are commonly used.
Approximate memory requirement for the other operations: min(B(R), B(S)) ≤ M. This bound is measured in blocks of memory, so it does not depend on whether the relations are clustered on disk; clustering only affects the number of disk I/Os.
1 buffer is used to read the blocks of the larger relation, and the remaining M - 1 buffers store the smaller relation in its main-memory data structure.
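The memory condition can be written as a small check. This Python sketch uses the exact form of the bound, with one buffer reserved for reading the larger relation, and assumes clustered relations for the I/O count; the function names are hypothetical.

```python
def fits_one_pass(b_r: int, b_s: int, m: int) -> bool:
    """True if the smaller relation fits in the M - 1 buffers left over
    after reserving 1 buffer for reading the larger relation."""
    return min(b_r, b_s) <= m - 1


def one_pass_io_cost(b_r: int, b_s: int) -> int:
    """Disk I/Os to read both operands once, assuming both are clustered."""
    return b_r + b_s


ok = fits_one_pass(b_r=1000, b_s=99, m=100)        # 99 blocks fit in 99 buffers
too_big = fits_one_pass(b_r=1000, b_s=100, m=100)  # 100 blocks do not
cost = one_pass_io_cost(1000, 99)
```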

One-pass algorithm for the union operation
Bag and set variants of union [∪B and ∪S]:
R ∪B S: copy each tuple of R to the output buffer, and then copy every tuple of S to the output buffer. Number of disk I/Os: B(R) + B(S) if the relations are clustered, T(R) + T(S) if they are not. This can be achieved with M = 1, regardless of the sizes of R and S.
R ∪S S: assuming R is the larger relation, read S into M - 1 buffers of memory, build a search structure whose search key is the entire tuple, and copy all tuples of S to the output buffer. Then read each block of R into the Mth buffer, one at a time. For each tuple t of R, check whether t is in S; if not, copy t to the output buffer, otherwise skip t.
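Both union variants can be sketched in Python as below. The sketch assumes relations are lists of blocks of tuples, that S is the smaller relation, and, for the set variant, that R and S themselves contain no duplicates; the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


def bag_union(r_blocks: Iterable[Block], s_blocks: Iterable[Block]) -> Iterator[Row]:
    """Copy every tuple of R, then every tuple of S; one buffer suffices."""
    for blocks in (r_blocks, s_blocks):
        for block in blocks:
            yield from block


def set_union(r_blocks: Iterable[Block], s_blocks: Iterable[Block]) -> Iterator[Row]:
    """Load S (the smaller relation) into memory, then stream R past it."""
    in_memory: Set[Row] = set()     # search key is the entire tuple
    for block in s_blocks:
        for row in block:
            in_memory.add(row)
            yield row               # every tuple of S goes to the output
    for block in r_blocks:          # R is read one block at a time
        for row in block:
            if row not in in_memory:
                yield row           # only tuples of R not already in S


R = [[(1,), (2,)], [(3,)]]
S = [[(2,), (4,)]]
bag_result = list(bag_union(R, S))
set_result = list(set_union(R, S))
```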

One-pass algorithm for the intersection operation
Bag and set variants of intersection [∩B and ∩S]:
R ∩B S: read S into M - 1 buffers and associate a count with each distinct tuple of S. Multiple copies of a tuple t are not stored individually; the structure looks like {(t, c), …}. Read each block of R, one at a time, and for each tuple t of R check whether t occurs in S. If it does not, skip t. If it does and the count of t is greater than 0, output t and decrement the count by 1. If the count of t is 0, skip t. Space assumption: approximately B(S) ≤ M.
R ∩S S: read S into M - 1 buffers and build a search structure with full tuples as the search key. Read each block of R, and for each tuple t of R check whether t is also in S. If yes, copy t to the output buffer; otherwise skip t.
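The counting scheme for bag intersection maps naturally onto Python's collections.Counter. The sketch below assumes relations are lists of blocks of tuples and that S is the relation held in memory; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


def bag_intersection(r_blocks: Iterable[Block], s_blocks: Iterable[Block]) -> Iterator[Row]:
    """Each tuple appears min(copies in R, copies in S) times in the output."""
    # One (tuple, count) entry per distinct tuple of S, not one per copy.
    counts = Counter(row for block in s_blocks for row in block)
    for block in r_blocks:          # stream R one block at a time
        for row in block:
            if counts[row] > 0:     # missing tuples read as count 0
                counts[row] -= 1    # use up one copy from S
                yield row


R = [[("a",), ("a",)], [("a",), ("b",)]]
S = [[("a",), ("a",), ("c",)]]
common = list(bag_intersection(R, S))
```

The set variant is the same loop with a plain membership test and no decrement.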

One-pass algorithm for the difference operation
Bag and set variants of difference [-B and -S]:
Set difference is not commutative (R -S S ≠ S -S R). In both cases, read S into M - 1 buffers and build a search structure with the full tuple as the search key.
R -S S: read each tuple t of R and check whether t is in S. If yes, skip t; otherwise copy it to the output buffer.
S -S R: read each tuple t of R and check whether t is in S. If yes, delete t from the copy of S in memory; otherwise skip t. At the end, copy the remaining tuples of S to the output buffer.
Bag difference is not commutative either (R -B S ≠ S -B R). In both cases, read S into M - 1 buffers and keep a count for each distinct tuple.
S -B R: read each tuple t of R and check whether t occurs in S. If yes, decrement its associated count. At the end, copy each tuple in main memory whose count is greater than 0 to the output, as many times as that count.
R -B S: read each tuple t of R and check whether it occurs in S. If yes, look at the current count c associated with t: if c = 0, copy t to the output buffer; if c > 0, do not copy t but decrement c by 1. If t does not occur in S, copy t to the output buffer.
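The two directions of bag difference can be sketched in Python as follows, with S held in memory as counts in both cases. Relations are assumed to be lists of blocks of tuples, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


def r_minus_s_bag(r_blocks: Iterable[Block], s_blocks: Iterable[Block]) -> Iterator[Row]:
    """R -B S: each copy in S cancels one copy of the same tuple in R."""
    counts = Counter(row for block in s_blocks for row in block)
    for block in r_blocks:
        for row in block:
            if counts[row] > 0:
                counts[row] -= 1   # cancelled by a copy in S, not output
            else:
                yield row          # not in S, or S's copies are used up


def s_minus_r_bag(r_blocks: Iterable[Block], s_blocks: Iterable[Block]) -> Iterator[Row]:
    """S -B R: decrement counts while streaming R, then emit what is left."""
    counts = Counter(row for block in s_blocks for row in block)
    for block in r_blocks:
        for row in block:
            if counts[row] > 0:
                counts[row] -= 1
    for row, c in counts.items():  # output only after all of R is read
        for _ in range(c):
            yield row


R = [[("a",), ("a",), ("a",)], [("b",)]]
S = [[("a",), ("c",), ("c",)]]
r_minus_s = list(r_minus_s_bag(R, S))
s_minus_r = list(s_minus_r_bag(R, S))
```

Note the asymmetry: R -B S can emit tuples while R is being read, whereas S -B R produces nothing until R has been read completely.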

One-pass algorithm for the product operation
Read S into M - 1 buffers of memory. No special data structure is needed.
Read each block of R, and for each tuple t of R, concatenate t with each tuple of S in memory and copy each resulting tuple to the output as it is formed.
This algorithm may take a considerable amount of processor time per tuple of R, because each such tuple is paired with every tuple of S held in memory; the total work is proportional to T(R) × T(S).
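A minimal Python sketch of the one-pass product, assuming relations are lists of blocks of tuples and S is the relation that fits in memory; the function name is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


def one_pass_product(r_blocks: Iterable[Block], s_blocks: Iterable[Block]) -> Iterator[Row]:
    """Pair every tuple of R with every tuple of S held in memory."""
    # A plain list is enough: no lookups are needed, only a full scan.
    s_rows = [row for block in s_blocks for row in block]
    for block in r_blocks:         # R is read one block at a time
        for r in block:
            for s in s_rows:       # T(S) concatenations per tuple of R
                yield r + s        # emitted as soon as it is formed


R = [[(1,), (2,)]]
S = [[("x",), ("y",)]]
pairs = list(one_pass_product(R, S))
```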

One-pass algorithm for the natural join
We assume R(X,Y) and S(Y,Z) are being joined, where Y represents all the attributes in common, X represents all attributes in R but not in S, and Z represents all attributes in S but not in R. We also assume S is the smaller relation.
Read all tuples of S and form them into a main-memory search structure with the attributes of Y as the search key. Use M - 1 blocks of memory for this.
Read each block of R into the Mth buffer. For each tuple t of R, find the tuples of S that agree with t on all attributes of Y, using the search structure. For each matching tuple of S, form a tuple by concatenating it with t, and move the resulting tuple to the output buffer.
Reading the operands takes B(R) + B(S) disk I/Os if the relations are clustered and T(R) + T(S) if they are not.
The algorithm works as long as B(S) ≤ M - 1, or approximately B(S) ≤ M.
An equi-join checks for equality in the same way, although the attributes can have different names. A theta-join is an equi-join or product followed by a selection operation.
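The build-then-probe structure of the one-pass natural join can be sketched in Python with a dictionary keyed on the Y attributes. The sketch assumes relations are lists of blocks of tuples and that the caller supplies the column positions of Y in each relation; the function name and the parameters r_y and s_y are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple
Block = List[Row]  # assumption: a disk block is modeled as a list of tuples


def one_pass_natural_join(
    r_blocks: Iterable[Block],
    s_blocks: Iterable[Block],
    r_y: Sequence[int],   # positions of the shared attributes Y in R
    s_y: Sequence[int],   # positions of the same attributes in S
) -> Iterator[Row]:
    """Join R(X,Y) with S(Y,Z); output tuples have the layout of R followed by Z."""
    # Build phase: S, the smaller relation, goes into a structure keyed on Y.
    index: Dict[Row, List[Row]] = defaultdict(list)
    for block in s_blocks:
        for s in block:
            index[tuple(s[i] for i in s_y)].append(s)
    # Probe phase: stream R one block at a time.
    for block in r_blocks:
        for r in block:
            key = tuple(r[i] for i in r_y)
            for s in index.get(key, []):   # .get avoids creating empty entries
                z = tuple(v for i, v in enumerate(s) if i not in s_y)
                yield r + z                # Y is kept once, from R


R = [[(1, "k1"), (2, "k2")], [(3, "k1")]]
S = [[("k1", "z1"), ("k3", "z3")]]
joined = list(one_pass_natural_join(R, S, r_y=[1], s_y=[0]))
```

Passing different column positions for r_y and s_y is all that is needed to treat differently named attributes as an equi-join in this model.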

References
Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom. Database Systems: The Complete Book. Chapter 15, Section 15.2, One-Pass Algorithms.