CPSC-608 Database Systems

Fall 2018
Instructor: Jianer Chen
Office: HRBB 315C
Notes #31

Algorithms Implementing Relational Algebraic Operations
Projection and selection: π, σ
Set/bag operations: ∪_S, ∩_S, −_S, ∪_B, ∩_B, −_B
Join operations: ×, ⋈_C, ⋈
Extended operations: γ, δ, τ, table-scan

Two-pass algorithms
Condition: large relations that cannot fit into the main memory M, but are not extremely large.
Basic ideas:
1. Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk;
2. Apply the operation by "merging" the blocks from the smaller and more organized pieces.
Two main techniques: sort-based; hash-based.

Two-pass hash-based algorithms
General framework:
1. (Phase 1: making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk.
2. (Phase 2: bucketwise operation) Apply the operation bucket by bucket.
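A minimal Python sketch of Phase 1, under simplifying assumptions: the relation is an in-memory list of tuples, each inner list stands in for a bucket that would be written back to disk, and the names `partition` and `num_buckets` are illustrative rather than taken from the notes.

```python
def partition(tuples, num_buckets, key=lambda t: t):
    """Phase 1: hash each tuple into one of num_buckets buckets.

    Each inner list stands in for a bucket written back to disk.
    """
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        # Equal keys always hash to the same bucket index.
        buckets[hash(key(t)) % num_buckets].append(t)
    return buckets


r = [(1, "a"), (2, "b"), (1, "a"), (3, "c")]
buckets = partition(r, num_buckets=3)
```

Every tuple lands in exactly one bucket, and identical tuples always share a bucket, which is what lets Phase 2 work on one bucket at a time.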

One-pass algorithms (γ, δ, τ, ∪_S, ∩_S, −_S, ∩_B, −_B, ×, ⋈_C, ⋈):
Unary: Memory: M ≥ B(R); Cost: B(R)
Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)

Unary operations: γ(R), δ(R)
Assumed condition: M is large enough to hold an entire bucket (assuming a good hash function).
Phase 2 reads one bucket at a time into main memory and applies the operation to it.
Summary: Memory: M ≥ √B(R); Cost: 3B(R) (read R, write the buckets, read the buckets back)
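As an illustration of the unary case, here is a hedged Python sketch of two-pass hash-based duplicate elimination δ(R). It assumes in-memory lists in place of disk buckets; `partition` and `two_pass_distinct` are hypothetical helper names.

```python
def partition(tuples, num_buckets):
    """Phase 1: hash tuples into buckets (lists stand in for disk buckets)."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)
    return buckets


def two_pass_distinct(tuples, num_buckets):
    """Phase 2: run one-pass duplicate elimination on one bucket at a time."""
    out = []
    for bucket in partition(tuples, num_buckets):
        seen = set()  # only the current bucket has to fit in memory
        for t in bucket:
            if t not in seen:
                seen.add(t)
                out.append(t)
    return out


r = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
distinct = two_pass_distinct(r, num_buckets=3)
```

Duplicates can only occur inside a single bucket, so no comparison across buckets is needed. The output order depends on the hash function, matching the remark that hash-based output is not sorted.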

Binary operations on two relations R and S: ∪_S, ∩_S, −_S, ∩_B, −_B, ×, ⋈_C, ⋈
[Figure: Phase I. R is hashed through main memory into Bucket 1, Bucket 2, ..., Bucket M on disk; then S is hashed the same way into its own Bucket 1, Bucket 2, ..., Bucket M.]

[Figure: Phase II. The R-bucket and the S-bucket with the same index are brought into main memory together.]
Again we can apply the one-pass algorithm if the smaller bucket fits in M.

Assumed condition: M is large enough to hold the smaller bucket

The hash-based algorithm does not work for × and ⋈_C: tuples of R and S that must be combined can fall into buckets with different indices.
Binary operations that remain: ∪_S, ∩_S, −_S, ∩_B, −_B, ⋈

R ∪_S S (the same procedure works for ∩_S, −_S, ∩_B, −_B):
    FOR each bucket index i DO
        call the one-pass algorithm on the Ri-bucket and Si-bucket
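The bucketwise loop can be sketched in Python as follows. This is an illustrative sketch for the set versions only (bag versions would need per-tuple counts); lists stand in for disk buckets, Python set operators stand in for the one-pass algorithm, and `partition` and `bucketwise` are hypothetical names.

```python
def partition(tuples, num_buckets):
    """Phase 1: hash tuples into buckets (lists stand in for disk buckets)."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)
    return buckets


def bucketwise(r, s, num_buckets, one_pass_op):
    """Phase 2: apply a one-pass set operation to each (Ri, Si) bucket pair."""
    out = set()
    r_buckets = partition(r, num_buckets)
    s_buckets = partition(s, num_buckets)  # same hash function as for R
    for r_i, s_i in zip(r_buckets, s_buckets):
        out |= one_pass_op(set(r_i), set(s_i))
    return out


r = [(1,), (2,), (3,)]
s = [(2,), (3,), (4,)]
union = bucketwise(r, s, 2, lambda a, b: a | b)
intersection = bucketwise(r, s, 2, lambda a, b: a & b)
difference = bucketwise(r, s, 2, lambda a, b: a - b)
```

Because both relations use the same hash function, a tuple of R can only match tuples in the S-bucket with the same index, so the per-bucket results can simply be combined.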

R ⋈ S (hash on the join attributes):
    FOR each bucket index i DO
        call the one-pass algorithm on the Ri-bucket and Si-bucket
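A hedged Python sketch of the hash join, assuming in-memory lists in place of disk buckets and assuming the S-bucket is the smaller one of each pair. The names `two_pass_hash_join`, `r_key` and `s_key` are illustrative, and the output keeps the join attribute from both sides.

```python
from collections import defaultdict


def partition(tuples, num_buckets, key):
    """Phase 1: hash tuples on the join attribute into buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(key(t)) % num_buckets].append(t)
    return buckets


def two_pass_hash_join(r, s, r_key, s_key, num_buckets):
    """Phase 2: one-pass join of each (Ri, Si) bucket pair."""
    out = []
    r_buckets = partition(r, num_buckets, r_key)
    s_buckets = partition(s, num_buckets, s_key)
    for r_i, s_i in zip(r_buckets, s_buckets):
        # Build an in-memory table on the (assumed smaller) S-bucket.
        table = defaultdict(list)
        for u in s_i:
            table[s_key(u)].append(u)
        # Probe it with every tuple of the matching R-bucket.
        for t in r_i:
            for u in table.get(r_key(t), []):
                out.append(t + u)
    return out


r = [(1, 10), (2, 20), (3, 10)]   # R(a, b)
s = [(10, "x"), (30, "y")]        # S(b, c)
joined = two_pass_hash_join(r, s, lambda t: t[1], lambda u: u[0], num_buckets=4)
```

Hashing on the join attribute guarantees that matching tuples share a bucket index, which is exactly what fails for × and ⋈_C.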

Summary: Memory: M ≥ √B(Rsmall); Cost: 3(B(R) + B(S))
Comments: memory use is better than sort-based; the output is not sorted; requires a good hash function.
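To make the memory comparison concrete, the following Python sketch evaluates the bounds for assumed sizes B(R) = 1000 and B(S) = 500 blocks. The sort-based bound M ≥ √(B(R) + B(S)) is the one listed in the summary of these notes; the function names are illustrative.

```python
import math


def min_memory_hash(b_r, b_s):
    """Smallest integer M with M >= sqrt(B(Rsmall))."""
    return math.isqrt(min(b_r, b_s) - 1) + 1


def min_memory_sort(b_r, b_s):
    """Smallest integer M with M >= sqrt(B(R) + B(S))."""
    return math.isqrt(b_r + b_s - 1) + 1


def two_pass_cost(b_r, b_s):
    """Disk I/O cost 3(B(R) + B(S)) shared by both two-pass methods."""
    return 3 * (b_r + b_s)


m_hash = min_memory_hash(1000, 500)
m_sort = min_memory_sort(1000, 500)
cost = two_pass_cost(1000, 500)
```

For these sizes the hash-based bound depends only on the smaller relation, so it asks for fewer memory blocks than the sort-based bound at the same I/O cost.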

A trick to save some disk I/O's
1. Use fewer buckets to leave some free M space;
2. Use the free M space to hold some buckets so they are not written back to disk (this saves disk I/O's);
3. Operations on these buckets are performed directly.
Example: M = 5; keep k = 2 buckets in M and write only D = M − k = 3 buckets back to disk.
[Figure: Phase I. The k S-buckets kept in main memory are not written back to disk. When R is hashed, the R-tuples that fall into those buckets are operated on directly with the S-tuples held there. At the end of Phase I only D pairs of buckets are left on disk.]
So each block of tuples in these k = 2 buckets saves two disk I/O's (one write in Phase I and one read in Phase II).
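The trick can be sketched in Python for set intersection. This is a simplified illustration, not the notes' exact procedure: it keeps a single bucket (index 0) of S resident, uses lists in place of disk buckets, hashes on the first attribute only, and counts spilled tuples as a rough proxy for disk writes. All names are hypothetical.

```python
def hybrid_intersection(r, s, num_buckets):
    """Set intersection keeping S-bucket 0 in memory during Phase 1."""
    def h(t):
        return hash(t[0]) % num_buckets

    resident = set()                               # S-bucket 0, never written out
    s_disk = [[] for _ in range(num_buckets)]      # stand-ins for disk buckets
    r_disk = [[] for _ in range(num_buckets)]
    spilled = 0                                    # tuples "written to disk"

    for t in s:
        if h(t) == 0:
            resident.add(t)
        else:
            s_disk[h(t)].append(t)
            spilled += 1

    out = set()
    for t in r:
        if h(t) == 0:
            if t in resident:                      # operate directly, no disk I/O
                out.add(t)
        else:
            r_disk[h(t)].append(t)
            spilled += 1

    # Phase 2 only has to process the bucket pairs that were spilled.
    for i in range(1, num_buckets):
        s_i = set(s_disk[i])
        out.update(t for t in r_disk[i] if t in s_i)
    return out, spilled


r = [(i,) for i in range(12)]
s = [(i,) for i in range(6, 18)]
common, spilled = hybrid_intersection(r, s, num_buckets=3)
```

In this run 8 of the 24 input tuples are handled without ever being spilled, which is where the saved writes and reads come from.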

How many (k) buckets should be left in M?
1. The number itself does not matter. More critical is the amount of M space used for holding these buckets;
2. larger k → smaller buckets → more buckets → more bucket blocks in M → less M space for holding buckets;
3. So k should be as small as possible: k = 1.
The tuples in this one bucket save two disk I/O's.

How many disk I/O's have we saved?
1. Let k = 1, bucket size = M/2;
2. #buckets = M/2 + 1 ≈ M/2 (so B(S) ≈ M²/4);
3. S saves 2·(M/2) = M = B(S)·(M/B(S)) disk I/O's;
4. R saves 2B(R)/(M/2) = B(R)·(M/B(S)) disk I/O's;
5. Original cost without these savings: 3(B(R) + B(S));
6. So now the cost is (3 − M/B(S))(B(R) + B(S)), with a memory requirement M ≥ 2√B(S).
The trick can also be applied to unary operations, reducing the cost to 3B(R) − M.
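A small worked check of the k = 1 accounting in Python, using assumed sizes M = 40, B(S) = 400 (so B(S) = M²/4) and B(R) = 1000. The function name is illustrative.

```python
import math


def hybrid_savings_k1(b_r, b_s, m):
    """Itemize the k = 1 savings: one resident bucket of size M/2."""
    s_saved = 2 * (m / 2)            # S: resident bucket is neither written nor re-read
    r_saved = 2 * b_r / (m / 2)      # R: one of about M/2 buckets is processed directly
    base_cost = 3 * (b_r + b_s)      # plain two-pass hash cost
    return s_saved, r_saved, base_cost - s_saved - r_saved


s_saved, r_saved, cost = hybrid_savings_k1(b_r=1000, b_s=400, m=40)
# Closed form from the notes: (3 - M/B(S)) (B(R) + B(S))
closed_form = (3 - 40 / 400) * (1000 + 400)
```

With these numbers S saves 40 I/O's and R saves 100, so the cost drops from 4200 to 4060, in agreement with the closed form.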

How many disk I/O's have we saved? In general:
1. Let bucket size = c·M, with 0 < c < 1; #buckets = (1−c)M + 1 ≈ (1−c)M, so B(S) = c(1−c)M², where c is the larger root of c² − c + B(S)/M² = 0;
2. S saves 2c·M = 2c·B(S)·(M/B(S)) disk I/O's;
3. R saves 2B(R)/((1−c)M) = 2c·B(R)·(M/B(S)) disk I/O's;
4. Original cost without these savings: 3(B(R) + B(S));
5. So now the cost is (3 − 2c·M/B(S))(B(R) + B(S)), with a memory requirement M ≥ √(B(S)/(c(1−c))).
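The general formula can be evaluated numerically. The Python sketch below solves c² − c + B(S)/M² = 0 for its larger root and plugs it into the cost expression; the sizes used are assumptions chosen for illustration, and `hybrid_general` is a hypothetical name.

```python
import math


def hybrid_general(b_r, b_s, m):
    """Return (c, cost) for the hybrid hash-based binary operation."""
    disc = 1 - 4 * b_s / m**2
    if disc < 0:
        # No real root: memory is below 2 * sqrt(B(S)).
        raise ValueError("requires M >= 2 * sqrt(B(S))")
    c = (1 + math.sqrt(disc)) / 2               # larger root of c^2 - c + B(S)/M^2
    cost = (3 - 2 * c * m / b_s) * (b_r + b_s)
    return c, cost


c_50, cost_50 = hybrid_general(b_r=1000, b_s=400, m=50)
c_40, cost_40 = hybrid_general(b_r=1000, b_s=400, m=40)
```

With M = 50 the resident bucket takes c = 0.8 of memory and the cost is about 3920; with M = 40 = 2√B(S) the root is c = 1/2 and the cost is about 4060, which is the k = 1, bucket size M/2 case.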

Algorithms Implementing Relational Algebraic Operations
Summary

Operations requiring almost no space: π, σ, ∪_B, table-scan
  Memory: M = 2
  Cost: π(R), σ(R), table-scan(R): B(R); ∪_B(R, S): B(R) + B(S)

One-pass algorithms: γ, δ, τ, ∪_S, ∩_S, −_S, ∩_B, −_B, ×, ⋈_C, ⋈
  Unary: Memory: M ≥ B(R); Cost: B(R)
  Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)

Nested-loop algorithms, for binary operations: ∪_S, ∩_S, −_S, ×, ⋈_C, ⋈
  Memory: M ≥ 2; Cost: B(R)·B(S)/M + B(R)

Two-pass sort-based algorithms: γ, δ, τ, ∪_S, ∩_S, −_S, ∩_B, −_B, ⋈
  Unary: Memory: M ≥ √B(R); Cost: 3B(R)
  Binary: Memory: M ≥ √(B(R) + B(S)); Cost: 3(B(R) + B(S))

Two-pass hash-based algorithms: γ, δ, ∪_S, ∩_S, −_S, ∩_B, −_B, ⋈
  Unary: Memory: M ≥ √B(R); Cost: 3B(R)
  Binary: Memory: M ≥ √B(Rsmall); Cost: 3(B(Rsmall) + B(Rlarge))

Two-pass hybrid hash-based algorithms: γ, δ, ∪_S, ∩_S, −_S, ∩_B, −_B, ⋈
  Unary: Memory: M ≥ 2√B(R); Cost: 3B(R) − M
  Binary: Memory: M ≥ √(B(Rsmall)/(c(1−c))); Cost: (3 − 2c·M/B(Rsmall))·(B(Rsmall) + B(Rlarge))
  (c is the larger root of c² − c + B(S)/M² = 0, with S = Rsmall)

Query Optimization
Prepare a collection C of efficient algorithms for operations in relational algebra; take care of issues in optimization and security.
An input database program P:
    SELECT c FROM S(a,b), T(b,c) WHERE S.b = T.b AND a > 4;
1. parser → parse tree
   [Figure: parse tree of the query, rooted at <statement> and <select-statement>, with subtrees <select-list>, <tbl-list> and <search-condition>, and leaves c, S(a,b), T(b,c), S.b = T.b, a > 4.]
2. preprocessing (view processing, semantic checking) → parse tree
3. parse tree-lqp convertor → logic query plan: π_c(σ_{S.b=T.b AND a>4}(S(a,b) × T(b,c)))
4. apply logic laws (push selections, group joins) to reduce the size of intermediate results → logic query plan. This is optimization via logic and size.
5. lqp-pqp convertor (choices of algorithms, data structures, and computational modes) → physical query plan: ScanTable(S(a,b)), ScanTable(T(b,c)) → Alg-CrossProd → Select(S.b=T.b & a>4) → Project(c) → output. This is optimization via algorithms and cost.
6. Machine executable code
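The physical query plan for the example query can be mimicked with Python generators. This is only an illustrative sketch: the operator functions mirror the names in the plan (ScanTable, Alg-CrossProd, Select, Project), the rows are made-up sample data, and columns are qualified as "S.a", "T.b" and so on by assumption.

```python
def scan_table(rows):
    """ScanTable: stream the stored rows one by one."""
    yield from rows


def cross_prod(left, right):
    """Alg-CrossProd: pair every left row with every right row."""
    right_rows = list(right)          # the inner input is materialized once
    for l in left:
        for r in right_rows:
            yield {**l, **r}


def select(rows, predicate):
    """Select: keep rows that satisfy the condition."""
    return (row for row in rows if predicate(row))


def project(rows, columns):
    """Project: keep only the requested columns."""
    return (tuple(row[c] for c in columns) for row in rows)


S = [{"S.a": 5, "S.b": 1}, {"S.a": 3, "S.b": 1}, {"S.a": 7, "S.b": 2}]
T = [{"T.b": 1, "T.c": "x"}, {"T.b": 2, "T.c": "y"}, {"T.b": 3, "T.c": "z"}]

plan = project(
    select(
        cross_prod(scan_table(S), scan_table(T)),
        lambda row: row["S.b"] == row["T.b"] and row["S.a"] > 4,
    ),
    ["T.c"],
)
result = list(plan)
```

This unoptimized plan builds all 9 pairs before filtering; pushing the selection a > 4 below the cross product would shrink the intermediate result, which is the point of the logic-law step.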
