CPSC-608 Database Systems

1 CPSC-608 Database Systems
Fall 2018
Instructor: Jianer Chen
Office: HRBB 315C
Phone:
Notes #30

2 Algorithms Implementing Relational Algebraic Operations
Projection and selection: π, σ
Set/bag operations: ∪S, ∩S, −S, ∪B, ∩B, −B
Join operations: ×, ⋈C, ⋈
Extended operations: γ, δ, τ, table-scan
Operations: π, σ, ∪S, ∩S, −S, ∪B, ∩B, −B, γ, δ, τ, table-scan, ×, ⋈C, ⋈

3 Algorithms Implementing Relational Algebraic Operations
Quick Review: What We Did
Operations requiring almost no space: π, σ, ∪B, table-scan
  Memory: M = 2
  Cost: π(R), σ(R), table-scan(R): B(R); ∪B(R, S): B(R) + B(S)
One-pass algorithms: γ, δ, τ, ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
  Unary: Memory: M ≥ B(R); Cost: B(R)
  Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)
Nested-loop algorithms for binary operations: ∪S, ∩S, −S, ×, ⋈C, ⋈
  Memory: M ≥ 2
  Cost: B(R)·B(S)/M + B(R)

4 Two-pass algorithms
Condition: large relations that cannot fit into the main memory M, but are not extremely large.
Basic ideas:
1. Break the relations into smaller pieces that fit in main memory, make them more organized, and store them back to disk;
2. Apply the operation by “merging” blocks from the smaller, more organized pieces.
Two main techniques: sort-based and hash-based.

5 Two-pass sort-based algorithms
General framework:
1. Repeat (Phase 1: making sorted sublists)
   Fill the main memory with tuples of a relation; make them a sorted sublist; write the sorted sublist back to disk.
2. Repeat (Phase 2: merging)
   Bring in a block from each of the sorted sublists; apply the operation by “merging” the sorted blocks.
[Figure: relation R.]

6 Two-pass sort-based algorithms
Unary operations: γ(R), δ(R), τ(R)
Summary: Memory: M ≥ √B(R); Cost: 3B(R)
[Figure: Phase II; labels: main memory, R.]
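The two phases of the framework can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: a relation is modeled as a list of blocks (each block a list of tuples), `M` counts memory buffers in blocks, and the names `make_sorted_sublists` and `two_pass_sort` are hypothetical. The standard-library `heapq.merge` stands in for the Phase 2 merge that repeatedly takes the smallest head among the sublists.

```python
import heapq


def make_sorted_sublists(blocks, M):
    """Phase 1: load M blocks at a time, sort their tuples, keep one sublist per load."""
    sublists = []
    for start in range(0, len(blocks), M):
        loaded = [t for block in blocks[start:start + M] for t in block]
        sublists.append(sorted(loaded))  # stands in for "write the sorted sublist to disk"
    return sublists


def two_pass_sort(blocks, M):
    """Compute tau(R): build sorted sublists, then merge them in a single pass."""
    sublists = make_sorted_sublists(blocks, M)
    # Phase 2 needs one buffer per sublist, so about B(R)/M <= M, i.e. M >= sqrt(B(R)).
    if len(sublists) > M:
        raise ValueError("relation too large for two passes with this M")
    return list(heapq.merge(*sublists))


R = [[5, 3], [9, 1], [4, 8], [2, 7]]  # B(R) = 4 blocks, so M = 2 suffices
print(two_pass_sort(R, M=2))
```

The check on the number of sublists is where the condition M ≥ √B(R) comes from: Phase 1 produces about B(R)/M sublists, and Phase 2 needs one buffer for each.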

7 Two-pass sort-based algorithms
Binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
[Figure: labels: main memory, R, S.]

8 Two-pass sort-based algorithms
[Figure: animation across slides 8-10 showing Phase I, the end of Phase I, and Phase II for the binary operations on R and S; labels: main memory, R, S.]

11 Two-pass sort-based algorithms
R ∪S S (Rmin and Smin denote the smallest remaining tuples of R and S):
REPEAT
  IF Rmin = Smin THEN send one copy to output; delete both;
  ELSE send the smaller to output, and delete it.
[Figure: Phase II; labels: main memory, R, S.]
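The merge loop for set union can be written out as follows. This is a sketch under assumptions: the two inputs are plain Python lists that model the already merged, sorted, duplicate-free streams of R and S, the function name `merge_union` is hypothetical, and copying out the leftover tuples once one input runs dry is an assumption the slide leaves implicit.

```python
def merge_union(r_sorted, s_sorted):
    """Set union of two sorted, duplicate-free lists via the merge loop."""
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        if r_sorted[i] == s_sorted[j]:    # Rmin = Smin: one copy out, delete both
            out.append(r_sorted[i])
            i += 1
            j += 1
        elif r_sorted[i] < s_sorted[j]:   # send the smaller to output, delete it
            out.append(r_sorted[i])
            i += 1
        else:
            out.append(s_sorted[j])
            j += 1
    # Assumption: once one input is exhausted, the rest of the other is copied out.
    return out + r_sorted[i:] + s_sorted[j:]


print(merge_union([1, 3, 5], [2, 3, 6]))
```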

12 Two-pass sort-based algorithms
R ∩S S:
REPEAT
  IF Rmin = Smin THEN send one copy to output; delete both;
  ELSE delete the smaller.
[Figure: Phase II; labels: main memory, R, S.]

13 Two-pass sort-based algorithms
The same algorithm works for R ∩B S!
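A small sketch shows why one loop serves both the set and the bag version of intersection: deleting exactly one copy from each side on a match leaves min(count in R, count in S) copies in the output. The inputs are assumed to be sorted Python lists standing in for the merged sublists, and `merge_intersect` is a hypothetical name.

```python
def merge_intersect(r_sorted, s_sorted):
    """Intersection of two sorted lists; handles sets and bags alike."""
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        if r_sorted[i] == s_sorted[j]:    # Rmin = Smin: one copy out, delete one from each
            out.append(r_sorted[i])
            i += 1
            j += 1
        elif r_sorted[i] < s_sorted[j]:   # delete the smaller
            i += 1
        else:
            j += 1
    return out


print(merge_intersect([1, 3, 5], [3, 4, 5]))        # set inputs
print(merge_intersect([1, 2, 2, 3], [2, 2, 2, 4]))  # bag inputs
```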

14 Two-pass sort-based algorithms
R −S S:
REPEAT
  IF Rmin = Smin THEN delete both;
  ELSE IF Rmin > Smin THEN delete Smin;
  ELSE \\ Rmin < Smin
    send Rmin to output, and delete Rmin.
[Figure: Phase II; labels: main memory, R, S.]

15 Two-pass sort-based algorithms
The same algorithm works for R −B S!
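The difference loop can be sketched the same way. As before, the sorted Python lists model the merged sublists and the name `merge_difference` is hypothetical; emitting the remaining R tuples after S is exhausted is an assumption the slide does not spell out. On bags, cancelling one copy per match leaves max(0, count in R − count in S) copies.

```python
def merge_difference(r_sorted, s_sorted):
    """R minus S for two sorted lists; handles sets and bags alike."""
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        if r_sorted[i] == s_sorted[j]:    # Rmin = Smin: delete both
            i += 1
            j += 1
        elif r_sorted[i] > s_sorted[j]:   # Rmin > Smin: delete Smin
            j += 1
        else:                             # Rmin < Smin: output Rmin, delete it
            out.append(r_sorted[i])
            i += 1
    # Assumption: tuples of R left after S runs out all belong to the result.
    return out + r_sorted[i:]


print(merge_difference([1, 2, 3], [2, 3, 4]))   # set inputs
print(merge_difference([1, 2, 2, 3], [2, 4]))   # bag inputs
```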

16 Two-pass sort-based algorithms
R × S
[Figure: Phase II; labels: main memory, R, S.]

17 Two-pass sort-based algorithms
R × S: sorted sublists do not seem to help in computing R × S, because each tuple of R must be joined with all tuples of S.

18 Two-pass sort-based algorithms
R ⋈C S: as with R × S, sorted sublists do not seem to help.

19 Two-pass sort-based algorithms
R × S and R ⋈C S: simply use the nested-loop algorithm.
Memory: M ≥ 2; Cost: B(R)B(S)/M + B(R)
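To see where the nested-loop cost comes from, the sketch below counts block reads while computing R × S. It is an illustration under assumptions: relations are lists of blocks, one of the M buffers is reserved for the current block of S (so R is read in chunks of M − 1 blocks, which the formula on the slide approximates by M), and the name `nested_loop_product` is hypothetical.

```python
def nested_loop_product(r_blocks, s_blocks, M):
    """Block-based nested-loop product; returns (result pairs, block reads)."""
    if M < 2:
        raise ValueError("need at least two buffers")
    chunk = M - 1                 # assumption: one buffer is kept for scanning S
    reads, out = 0, []
    for start in range(0, len(r_blocks), chunk):
        r_chunk = r_blocks[start:start + chunk]
        reads += len(r_chunk)     # every block of R is read exactly once: B(R)
        r_tuples = [t for block in r_chunk for t in block]
        for s_block in s_blocks:  # one full scan of S per chunk of R
            reads += 1
            for s in s_block:
                for r in r_tuples:
                    out.append((r, s))
    return out, reads


pairs, reads = nested_loop_product([[1], [2], [3], [4]], [["a"], ["b"]], M=3)
print(len(pairs), reads)
```

With B(R) = 4, B(S) = 2 and M = 3, the sketch reads R once (4 blocks) and scans S once per chunk of R (2 chunks × 2 blocks), which matches the shape B(R) + (B(R)/M)·B(S) of the stated cost.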

20 Two-pass sort-based algorithms
How about R ⋈ S?
[Figure: Phase II; labels: main memory, R, S.]

21 Two-pass sort-based algorithms
How about R ⋈ S?
In extreme cases (a large number of tuples are joinable), the sort-based algorithm does not work for R ⋈ S. On the other hand, most practical and interesting cases are not that extreme.

22 Two-pass sort-based algorithms
R ⋈ S:
\\ sublists are sorted by the join attributes.
REPEAT
  IF Rmin = Smin THEN collect all Rmin-tuples and all Smin-tuples, and send their join to the output; delete all Rmin-tuples and Smin-tuples;
  ELSE delete the smaller;
[Figure: Phase II; labels: main memory, R, S.]
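The join loop differs from the set operations in that a match collects whole groups of equal keys from both sides. A minimal sketch, assuming each tuple is a `(key, payload)` pair sorted by `key` (the join attribute) and using the hypothetical name `merge_join`:

```python
def merge_join(r_sorted, s_sorted):
    """Join two lists of (key, payload) pairs, both sorted by key."""
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        r_key, s_key = r_sorted[i][0], s_sorted[j][0]
        if r_key < s_key:         # delete the smaller
            i += 1
        elif r_key > s_key:
            j += 1
        else:
            # Collect all tuples on both sides that share this key.
            i_end = i
            while i_end < len(r_sorted) and r_sorted[i_end][0] == r_key:
                i_end += 1
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][0] == r_key:
                j_end += 1
            for _, r_payload in r_sorted[i:i_end]:
                for _, s_payload in s_sorted[j:j_end]:
                    out.append((r_key, r_payload, s_payload))
            i, j = i_end, j_end   # delete both groups
    return out


R = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
S = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(R, S))
```

The two collected groups are held in memory at once in this sketch, which is where the extreme case breaks down: if very many tuples share one join value, the groups no longer fit.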

23 Two-pass sort-based algorithms
Summary: Memory: M ≥ √(B(R) + B(S)); Cost: 3(B(R) + B(S))
[Figure: Phase II; labels: main memory, R, S.]

25 Two-pass sort-based algorithms
Summary: Memory: M ≥ √(B(R) + B(S)); Cost: 3(B(R) + B(S))
Comments:
1. Applicable unless the relations are extremely large; in that case the method can be extended to multiple passes.

26 Two-pass sort-based algorithms
Comments (continued):
2. The output is sorted.

27 Two-pass algorithms
Two main techniques: sort-based and hash-based.
Two-pass sort-based algorithms: γ, δ, τ, ∪S, ∩S, −S, ∩B, −B, ⋈
  Unary: Memory: M ≥ √B(R); Cost: 3B(R)
  Binary: Memory: M ≥ √(B(R) + B(S)); Cost: 3(B(R) + B(S))
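The memory and cost bounds for the sort-based algorithms are easy to evaluate for concrete relation sizes. The helper names below (`min_memory`, `sort_based_unary`, `sort_based_binary`) are hypothetical; they simply restate the bounds M ≥ √B(R) with cost 3B(R), and M ≥ √(B(R) + B(S)) with cost 3(B(R) + B(S)), with sizes measured in blocks.

```python
import math


def min_memory(total_blocks):
    """Smallest integer M with M >= sqrt(total_blocks)."""
    m = math.isqrt(total_blocks)
    return m if m * m == total_blocks else m + 1


def sort_based_unary(b_r):
    """(minimum M, I/O cost) for gamma, delta, tau on R."""
    return min_memory(b_r), 3 * b_r


def sort_based_binary(b_r, b_s):
    """(minimum M, I/O cost) for the binary sort-based operations on R and S."""
    return min_memory(b_r + b_s), 3 * (b_r + b_s)


print(sort_based_unary(10_000))
print(sort_based_binary(6_000, 4_000))
```

For example, a relation of 10,000 blocks needs only 100 buffers and costs 30,000 block transfers.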

29 Two-pass hash-based algorithms
General framework:
1. (Phase 1: making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk.
2. (Phase 2: bucketwise operation) Apply the operation based on the buckets.
[Figure: relation R.]
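Phase 1 of the hash-based framework can be sketched as follows. The sketch keeps one in-memory buffer block per bucket and writes a block out whenever it fills up. Everything here is illustrative: the name `partition`, the `block_size` parameter (tuples per block), and the use of Python's built-in `hash` as the hash function are assumptions. Note that `hash` is deterministic for small integers but randomized per process for strings.

```python
def partition(tuples, num_buckets, block_size, key=lambda t: t):
    """Phase 1: hash tuples into buckets, one buffer block per bucket."""
    buffers = [[] for _ in range(num_buckets)]  # in-memory block for each bucket
    disk = [[] for _ in range(num_buckets)]     # blocks written back, per bucket
    for t in tuples:
        b = hash(key(t)) % num_buckets
        buffers[b].append(t)
        if len(buffers[b]) == block_size:       # buffer full: write the block out
            disk[b].append(buffers[b])
            buffers[b] = []
    for b, buf in enumerate(buffers):           # flush partially filled blocks
        if buf:
            disk[b].append(buf)
    return disk


print(partition(range(1, 8), num_buckets=3, block_size=2))
```

Equal tuples always hash to the same bucket, which is the property Phase 2 relies on.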

30 Two-pass hash-based algorithms
[Figure: animation across slides 30-33 showing Phase I, in which tuples of R pass through main memory into the buckets; labels: main memory, R.]

34 Two-pass hash-based algorithms
One-pass algorithms: γ, δ, τ, ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
  Unary: Memory: M ≥ B(R); Cost: B(R)
  Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)

35 Two-pass hash-based algorithms
Unary operations: γ(R), δ(R), τ(R)
[Figure: labels: main memory, R.]

36 Two-pass hash-based algorithms
[Figure: animation across slides 36-39. Phase I hashes R through main memory into Bucket 1, Bucket 2, ..., Bucket M; Phase II processes one bucket at a time.]

40 Two-pass hash-based algorithms
τ(R)
[Figure: Phase II; Bucket 1, Bucket 2, ..., Bucket M, one bucket at a time.]

41 Two-pass hash-based algorithms
τ(R): the hash-based algorithm does not apply to sorting, because hashing does not preserve the order of tuples across buckets.

42 Two-pass hash-based algorithms
δ(R):
\\ all duplicates are in the same bucket.
Call the one-pass algorithm on each bucket.
[Figure: Phase II; Bucket 1, Bucket 2, ..., Bucket M, one bucket at a time.]
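A sketch of hash-based duplicate elimination, assuming tuples are hashable Python values and using the hypothetical name `hash_distinct`. Phase 1 partitions the tuples; Phase 2 runs the one-pass algorithm (here a `set` of tuples seen so far) on each bucket separately, which is safe because equal tuples always land in the same bucket.

```python
def hash_distinct(tuples, num_buckets):
    """delta(R): partition by hash, then deduplicate one bucket at a time."""
    # Phase 1: all copies of a tuple hash to the same bucket.
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)
    # Phase 2: one-pass duplicate elimination per bucket.
    out = []
    for bucket in buckets:
        seen = set()              # must fit in memory together with the bucket
        for t in bucket:
            if t not in seen:
                seen.add(t)
                out.append(t)
    return out


print(sorted(hash_distinct([4, 1, 4, 2, 1, 3, 2], num_buckets=2)))
```

Unlike the sort-based version, the result comes out bucket by bucket rather than in sorted order.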

43 Two-pass hash-based algorithms
δ(R): M must be large enough to hold an entire bucket.

44 Two-pass hash-based algorithms
Assumed condition: M is large enough to hold an entire bucket.

45 Two-pass hash-based algorithms
Assumed condition: M is large enough to hold an entire bucket.
The same approach also works for γ(R) if the hash is based on the grouping attributes.
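Grouping follows the same pattern as duplicate elimination once the hash is taken on the grouping attribute. The sketch below assumes rows of the form `(group_key, value)` and a SUM aggregate; both choices, and the name `hash_group_sum`, are illustrative only.

```python
def hash_group_sum(rows, num_buckets):
    """gamma(R) with SUM: partition on the grouping key, aggregate per bucket."""
    # Phase 1: hash on the grouping attribute so each group stays in one bucket.
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[hash(key) % num_buckets].append((key, value))
    # Phase 2: one-pass aggregation, one bucket at a time.
    result = {}
    for bucket in buckets:
        sums = {}
        for key, value in bucket:
            sums[key] = sums.get(key, 0) + value
        result.update(sums)       # groups never span buckets, so no key collides
    return result


rows = [(1, 10), (2, 5), (1, 7), (3, 1), (2, 2)]
print(hash_group_sum(rows, num_buckets=2))
```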

46 Two-pass hash-based algorithms
Unary operations: γ(R), δ(R)
Assumed condition: M is large enough to hold an entire bucket (assume a good hash function).
Summary: Memory: M ≥ √B(R); Cost: 3B(R)
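The memory bound and the cost in the summary can be derived in two lines. The derivation assumes that a good hash function spreads R evenly over the M buckets and that writing the final output is not counted, as in the one-pass costs.

```latex
\begin{align*}
\text{bucket size} \approx \frac{B(R)}{M} \le M
  &\iff B(R) \le M^2 \iff M \ge \sqrt{B(R)} \\
\text{Cost} &= \underbrace{B(R)}_{\text{read } R}
  + \underbrace{B(R)}_{\text{write buckets}}
  + \underbrace{B(R)}_{\text{read buckets}} = 3B(R)
\end{align*}
```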

