CPSC-608 Database Systems


CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #29

Algorithms Implementing Relational Algebraic Operations
Projection and selection: π, σ
Set/bag operations: ∪S, ∩S, −S, ∪B, ∩B, −B
Join operations: ×, ⋈C, ⋈
Extended operations: γ, δ, τ, table-scan

One-pass algorithms
Condition: the main memory M is sufficiently large.
General framework:
1. Read in an entire relation R;
2. Process R;
3. If the operation is binary, read in the other relation S block by block;
4. Send the results to an output block.

One-pass algorithms for binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
General framework:
1. Read the entire smaller relation Rsmall from disk into main memory;
2. Process Rsmall;
3. Read in the other relation Rlarge block by block;
4. Send the results to an output block.

Rlarge ∩B Rsmall:
1. Make Rsmall a balanced tree;
2. FOR each tuple t in Rlarge DO
     IF t is in Rsmall THEN output t and remove a copy of t from Rsmall.
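The bag-intersection steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name and the list-of-blocks input are hypothetical, and a `Counter` stands in for the balanced tree named on the slide.

```python
from collections import Counter

def one_pass_bag_intersection(r_small, r_large_blocks):
    """Rlarge ∩B Rsmall with Rsmall held entirely in memory."""
    # Assumption: a Counter replaces the balanced tree; it maps each tuple
    # of Rsmall to the number of copies that are still unmatched.
    remaining = Counter(r_small)
    output = []
    for block in r_large_blocks:      # Rlarge arrives one block at a time
        for t in block:
            if remaining[t] > 0:      # t is (still) in Rsmall
                output.append(t)
                remaining[t] -= 1     # remove one copy of t from Rsmall
    return output
```

Each tuple of Rlarge is read exactly once, which matches the one-pass cost B(Rsmall) + B(Rlarge).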

Note: −B is not commutative, so Rlarge −B Rsmall and Rsmall −B Rlarge need different algorithms.

Rlarge −B Rsmall:
1. Make Rsmall a balanced tree;
2. FOR each tuple t in Rlarge DO
     IF t is not in Rsmall THEN output t
     ELSE remove a copy of t from Rsmall.

Rsmall −B Rlarge:
1. Make Rsmall a balanced tree;
2. FOR each tuple t in Rlarge DO
     IF t is in Rsmall THEN remove a copy of t from Rsmall;
3. Output Rsmall.
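A minimal sketch of the two bag-difference directions, assuming the same hypothetical list-of-blocks input and using a `Counter` in place of the balanced tree. It shows why the two directions differ: one outputs while scanning Rlarge, the other outputs what is left in memory at the end.

```python
from collections import Counter

def large_minus_small(r_small, r_large_blocks):
    """Rlarge −B Rsmall: output t unless it cancels a copy held in memory."""
    remaining = Counter(r_small)
    output = []
    for block in r_large_blocks:
        for t in block:
            if remaining[t] > 0:
                remaining[t] -= 1     # t cancels one copy of Rsmall
            else:
                output.append(t)      # no copy left in Rsmall: t survives
    return output

def small_minus_large(r_small, r_large_blocks):
    """Rsmall −B Rlarge: delete matched copies, then output what is left."""
    remaining = Counter(r_small)
    for block in r_large_blocks:
        for t in block:
            if remaining[t] > 0:
                remaining[t] -= 1
    # elements() repeats each tuple as many times as its remaining count.
    return list(remaining.elements())
```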

Rlarge × Rsmall:
1. FOR each tuple t in Rlarge DO
     cross join t with each tuple in Rsmall and send the results to the output.

Rlarge ⋈C Rsmall:
1. FOR each tuple t in Rlarge DO
     cross join t with each tuple in Rsmall;
     IF the join satisfies C THEN send it to the output.

Rlarge ⋈ Rsmall:
1. Sort Rsmall by the join attributes A;
2. FOR each tuple t in Rlarge DO
     find the tuples in Rsmall with the same A-value;
     join them with t and put the results in the output block.
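The sorted-lookup join above might look as follows in Python. The function name, the key-extractor parameters, and the choice to output (t, s) pairs instead of merged tuples are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def one_pass_join(r_small, r_large_blocks, key_small, key_large):
    """Join on equal A-values, with Rsmall sorted and held in memory."""
    # Step 1: sort Rsmall by its join attribute so matches are contiguous.
    rows = sorted(r_small, key=key_small)
    keys = [key_small(row) for row in rows]
    output = []
    # Step 2: probe with each tuple of Rlarge, read block by block.
    for block in r_large_blocks:
        for t in block:
            a = key_large(t)
            lo, hi = bisect_left(keys, a), bisect_right(keys, a)
            for s in rows[lo:hi]:     # all Rsmall tuples with the same A-value
                output.append((t, s))
    return output
```

Binary search keeps each probe logarithmic in the size of Rsmall; a hash table on A would serve the same purpose.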

Summary (one-pass algorithms for binary operations):
Memory: M ≥ B(Rsmall)
Cost: B(Rsmall) + B(Rlarge)

For larger relations
When relations cannot fit in main memory, one-pass algorithms cannot be used.
A generic algorithm for binary operations:
Nested-loop (R □ S):
FOR each tuple tR in R DO
  FOR each tuple tS in S DO
    Apply the operation □ on tR and tS
Binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈

Nested-loop for R ∪S S:
1. \\ in the first execution of the tR-loop
   output tS;
2. \\ in an execution of the tR-loop
   IF tR = tS THEN mark tR;
3. \\ at the end of that execution of the tR-loop
   IF tR is unmarked THEN output tR.

Nested-loop for R ∩S S:
1. \\ in an execution of the tR-loop
   IF tR = tS THEN mark tR;
2. \\ at the end of that execution of the tR-loop
   IF tR is marked THEN output tR.

Nested-loop for R −S S:
1. \\ in an execution of the tR-loop
   IF tR = tS THEN mark tR;
2. \\ at the end of that execution of the tR-loop
   IF tR is unmarked THEN output tR.

Note: with R in the outer loop, this does not work for S −S R, because −S is not commutative.
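The marking scheme for the three set operations can be sketched in one Python function. The function name and the `op` parameter are hypothetical; R and S are assumed to be duplicate-free lists. As in the slide's scheme, the union branch only emits S if R has at least one tuple.

```python
def nested_loop_set_op(r, s, op):
    """op is 'union', 'intersection', or 'difference' (R op S) on sets."""
    output = []
    first_pass = True
    for t_r in r:
        marked = False
        for t_s in s:
            if op == "union" and first_pass:
                output.append(t_s)    # emit S once, during the first tR pass
            if t_r == t_s:
                marked = True         # tR also occurs in S
        first_pass = False
        # The decision for tR is taken only after S has been fully scanned.
        if op == "intersection" and marked:
            output.append(t_r)
        if op in ("union", "difference") and not marked:
            output.append(t_r)
    return output
```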


R ∩B S and R −B S:
Nested-loop does not seem to be effective for R ∩B S and R −B S.
Remark: we cannot simply mark tR, since with bags the result depends on how many copies of each tuple occur in R and in S.


Nested-loop is particularly simple for join operations.
R join S, inside the nested loop:
IF tR and tS are joinable THEN
  Join tR and tS;
  IF the join is × or ⋈ THEN output the join;
  ELSE \\ the join is ⋈C
    output the join if it satisfies C
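A small sketch of the join body inside the nested loop. It assumes tuples are Python dicts keyed by attribute name, and that the operands of a theta-join have disjoint attribute names; the function names are hypothetical.

```python
def joinable(t_r, t_s):
    # Natural-join test: the tuples agree on every attribute they share.
    # With no shared attributes this is always true, which yields R × S.
    return all(t_r[a] == t_s[a] for a in t_r.keys() & t_s.keys())

def nested_loop_join(r, s, condition=None):
    """× or natural join when condition is None, theta-join otherwise."""
    output = []
    for t_r in r:
        for t_s in s:
            if not joinable(t_r, t_s):
                continue
            joined = {**t_r, **t_s}
            # × and ⋈ output every joined pair; ⋈C also checks C.
            if condition is None or condition(joined):
                output.append(joined)
    return output
```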

Summary (tuple-based nested-loop: FOR each tuple tR in R, FOR each tuple tS in S):
Memory: M ≥ 2
Cost: T(R)*T(S) + T(R)
Very bad.

Nested-loop (R □ S), reading S by blocks:
FOR each tuple tR in R DO
  FOR each block bS in S DO
    Apply the operation □ on tR and the tuples in bS
Summary:
Memory: M ≥ 2
Cost: T(R)*B(S) + T(R)
Still large.

Nested-loop (R □ S), reading both relations by blocks:
FOR each block bR in R DO
  FOR each block bS in S DO
    Apply the operation □ on the tuples in bR and the tuples in bS
Summary:
Memory: M ≥ 2
Cost: B(R)*B(S) + B(R)
Can it be further improved?

Nested-loop (R □ S), reading R in memory-sized chunks:
FOR each chunk of R (the max # blocks fitting in M) DO
  FOR each block bS in S DO
    Apply the operation □ on the tuples in the chunk of R and the tuples in bS
Summary:
Memory: M ≥ 2
Cost: B(R)*B(S)/M + B(R)
Very good if B(R) or B(S) is only slightly larger than M.
Should pick the smaller relation for the outer loop (not working for −S).
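To compare the four nested-loop variants, the cost formulas from the summaries above can be evaluated directly. The function name and the sample sizes below are made up for illustration; the formulas are the approximate ones given on the slides.

```python
def nested_loop_costs(t_r, t_s, b_r, b_s, m):
    """Disk I/O estimates with R as the outer relation and S as the inner.

    t_* = number of tuples, b_* = number of blocks, m = memory blocks.
    """
    return {
        "tuple/tuple": t_r * t_s + t_r,
        "tuple/block": t_r * b_s + t_r,
        "block/block": b_r * b_s + b_r,
        "chunk/block": b_r * b_s / m + b_r,
    }

# Hypothetical sizes: R has 1,000 blocks, S has 500 blocks, M = 100.
costs = nested_loop_costs(t_r=10_000, t_s=5_000, b_r=1_000, b_s=500, m=100)
# Same relations with the smaller one (S) in the outer loop.
swapped = nested_loop_costs(t_r=5_000, t_s=10_000, b_r=500, b_s=1_000, m=100)
```

With these sample numbers the chunk-based estimate is 6,000 I/Os with R outside and 5,500 with the smaller relation S outside, against 501,000 for the block/block variant.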

Algorithms Implementing Relational Algebraic Operations
Quick Review: What We Did

Operations requiring almost no space: π, σ, ∪B, table-scan
  Memory: M = 2
  Cost: π(R), σ(R), table-scan(R): B(R); ∪B(R, S): B(R) + B(S)

One-pass algorithms: γ, δ, τ, ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
  Unary: Memory: M ≥ B(R); Cost: B(R)
  Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)

Nested-loop algorithms for binary operations: ∪S, ∩S, −S, ×, ⋈C, ⋈
  Memory: M ≥ 2
  Cost: B(R)*B(S)/M + B(R)

Two-pass algorithms
Condition: large relations that cannot fit into the main memory M, but not extremely large.
Basic ideas:
1. Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk;
2. Apply the operation based on "merging" the blocks from the smaller and more organized pieces.
Two main techniques: sort-based; hash-based.

Two-pass sort-based algorithms
General framework:
1. Repeat (Phase 1. Making sorted sublists)
   Fill the main memory with tuples of a relation;
   Make them a sorted sublist;
   Write the sorted sublist back to disk.
2. Repeat (Phase 2. Merging)
   Bring in a block from each of the sorted sublists;
   Apply the operation by "merging" the sorted blocks.

Review: Two-phase Multiway MergeSort
Phase 1. Making sorted sublists
  Repeat:
    Fill the main memory with remaining tuples in R and sort them;
    Write the sorted sublist back to disk.
Phase 2. Merging
  Bring in a block from each of the sorted sublists;
  Merge them and put the result in the output block.
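The two phases can be simulated in memory with a short Python sketch. The function name and the list-of-blocks representation are assumptions; Python lists stand in for the disk, and the output buffer is ignored when checking that every sublist gets one memory block.

```python
import heapq

def two_phase_merge_sort(blocks, memory_blocks):
    """blocks: list of blocks (lists of tuples); memory_blocks: M."""
    # Phase 1: fill memory with up to M blocks, sort them, and "write back"
    # the result as one sorted sublist.
    sublists = []
    for i in range(0, len(blocks), memory_blocks):
        chunk = [t for block in blocks[i:i + memory_blocks] for t in block]
        sublists.append(sorted(chunk))
    # Phase 2 needs one memory block per sublist.
    if len(sublists) > memory_blocks:
        raise ValueError("too many sublists for a single merge pass")
    # heapq.merge repeatedly takes the smallest head among the sublists.
    return list(heapq.merge(*sublists))
```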

[Figure: animation of Two-phase Multiway MergeSort between main memory and disk. First phase: main memory is repeatedly filled from disk, sorted ("Sort it"), and written back to disk as a sorted sublist. Second phase: main memory holds one block per sublist while the sublists are merged.]

Unary operations: γ(R), δ(R), τ(R)
[Figure: relation R on disk and the main memory during Phase I, at the end of Phase I, and during Phase II.]

δ(R), Phase II:
1. Among the minimum tuples of the blocks in main memory, remove all copies except one;
2. Output the minimum.
Remark (applies to all algorithms): read in the next block from a sublist if its block is exhausted.
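Phase II of duplicate elimination can be sketched as follows, assuming Phase I has already produced the sorted sublists (passed in here as Python lists). The function name is hypothetical, and `heapq.merge` plays the role of repeatedly finding the minimum across sublists.

```python
import heapq

def sort_based_distinct(sorted_sublists):
    """δ(R): merge the sorted sublists and keep one copy of each tuple."""
    output = []
    for t in heapq.merge(*sorted_sublists):   # tuples in global sorted order
        # All copies of the current minimum arrive consecutively; keep one.
        if not output or output[-1] != t:
            output.append(t)
    return output
```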

γ(R), Phase II: \\ sublists are sorted by the grouping attributes
1. Group all tuples with the minimum grouping attributes;
2. Calculate the aggregation value;
3. Output a grouping tuple.
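A sketch of Phase II of grouping, using SUM as the example aggregate. The function name and the `group_key` and `value` extractor parameters are assumptions; the sublists are expected to be already sorted by the grouping attributes.

```python
import heapq
from itertools import groupby

def sort_based_group_sum(sorted_sublists, group_key, value):
    """γ(R) with SUM: one output tuple per distinct grouping value."""
    # Merging by the grouping key brings each group's tuples together.
    merged = heapq.merge(*sorted_sublists, key=group_key)
    return [
        (g, sum(value(t) for t in group))     # aggregate, then emit the group
        for g, group in groupby(merged, key=group_key)
    ]
```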

Summary (two-pass sort-based unary operations γ(R), δ(R), τ(R)):
Memory: M ≥ √B(R)
Cost: 3B(R)
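One way to see where the memory bound and the cost come from, assuming each Phase 1 sublist fills the whole memory, the output buffer is ignored, and writing the final result is not counted:

```latex
\begin{align*}
\text{number of sublists} &= \left\lceil \frac{B(R)}{M} \right\rceil \le M
  \;\Longrightarrow\; B(R) \le M^2
  \;\Longrightarrow\; M \ge \sqrt{B(R)}, \\
\text{cost} &= \underbrace{B(R)}_{\text{read in Phase 1}}
  + \underbrace{B(R)}_{\text{write sublists}}
  + \underbrace{B(R)}_{\text{read in Phase 2}}
  = 3B(R).
\end{align*}
```

The first line holds because Phase 2 needs one memory block for each sorted sublist.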

Binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
[Figure: relations R and S on disk and the main memory during Phase I, at the end of Phase I, and during Phase II.]
May build an efficient data structure for searching the minimum.

R ∪S S, Phase II (Rmin and Smin are the smallest remaining tuples of R and S):
REPEAT
  IF Rmin = Smin THEN send one copy to output and delete both;
  ELSE send the smaller to output and delete it.
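The merge step for set union can be sketched in Python, assuming Phase I has produced sorted sublists for R and for S (given here as lists) and that no tuple is `None`, which is used as the end-of-input marker. The function name is hypothetical.

```python
import heapq

def sort_based_set_union(r_sublists, s_sublists):
    """R ∪S S from sorted sublists of two duplicate-free relations."""
    r = heapq.merge(*r_sublists)      # stream of R in sorted order
    s = heapq.merge(*s_sublists)      # stream of S in sorted order
    output = []
    r_min, s_min = next(r, None), next(s, None)
    while r_min is not None and s_min is not None:
        if r_min == s_min:            # same tuple on both sides: one copy
            output.append(r_min)
            r_min, s_min = next(r, None), next(s, None)
        elif r_min < s_min:           # send the smaller and advance it
            output.append(r_min)
            r_min = next(r, None)
        else:
            output.append(s_min)
            s_min = next(s, None)
    # One input is exhausted; the rest of the other goes to the output.
    while r_min is not None:
        output.append(r_min)
        r_min = next(r, None)
    while s_min is not None:
        output.append(s_min)
        s_min = next(s, None)
    return output
```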