CPSC-608 Database Systems


1 CPSC-608 Database Systems
Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #29

2 Algorithms Implementing Relational Algebraic Operations
Projection and selection: π, σ
Set/bag operations: ∪S, ∩S, −S, ∪B, ∩B, −B
Join operations: ×, ⋈C, ⋈
Extended operations: γ, δ, τ, table-scan
Operations covered: π, σ, ∪S, ∩S, −S, ∪B, ∩B, −B, γ, δ, τ, table-scan, ×, ⋈C, ⋈

3 One-pass algorithms
Condition: the main memory M is sufficiently large.
General framework:
Read in an entire relation R;
Process R;
If the operation is binary, read in the other relation S block by block;
Send the results to an output block.

4 One-pass algorithms
Condition: the main memory M is sufficiently large.
General framework:
1. Read in an entire relation Rsmall;
2. Process Rsmall;
3. Read in the other relation Rlarge block by block;
4. Send the results to an output block.
Binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
Rlarge ∩B Rsmall:
1. Make Rsmall a balanced tree;
2. FOR each tuple t in Rlarge DO
   IF t is in Rsmall THEN output t and remove a copy of t from Rsmall.
[Figure: Rsmall is held in main memory; Rlarge is read from disk and processed against it.]
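The one-pass bag intersection can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: a `Counter` (hash table) stands in for the balanced tree, plain iterables stand in for disk blocks, and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator


def one_pass_bag_intersection(r_small: Iterable[Hashable],
                              r_large: Iterable[Hashable]) -> Iterator[Hashable]:
    # Steps 1-2: load R_small into memory as a multiset (tuple -> copies left).
    # Assumption: a hash table replaces the slide's balanced tree.
    remaining = Counter(r_small)
    # Step 3: stream R_large (one tuple at a time here, rather than by block).
    for t in r_large:
        if remaining[t] > 0:
            remaining[t] -= 1  # remove one copy of t from R_small
            yield t            # step 4: send t to the output


print(list(one_pass_bag_intersection([1, 1, 2, 3], [1, 1, 1, 2, 4])))
```

Each output tuple consumes one copy from Rsmall, so a tuple appears in the result as many times as the smaller of its two multiplicities.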

6 One-pass algorithms
−B is not commutative.
Rlarge −B Rsmall:
1. Make Rsmall a balanced tree;
2. FOR each tuple t in Rlarge DO
   IF t is not in Rsmall THEN output t
   ELSE remove a copy of t from Rsmall.

7 One-pass algorithms
−B is not commutative.
Rsmall −B Rlarge:
1. Make Rsmall a balanced tree;
2. FOR each tuple t in Rlarge DO
   IF t is in Rsmall THEN remove a copy of t from Rsmall;
3. Output Rsmall.
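Because −B is not commutative, the two directions of bag difference need different one-pass procedures. The sketch below shows both, assuming a `Counter` in place of the balanced tree and lists in place of disk blocks; the function names are hypothetical.

```python
from collections import Counter


def large_minus_small(r_small, r_large):
    """R_large -B R_small: output t unless a copy in R_small cancels it."""
    remaining = Counter(r_small)
    out = []
    for t in r_large:
        if remaining[t] > 0:
            remaining[t] -= 1  # a copy in R_small cancels this occurrence
        else:
            out.append(t)      # no copy left to cancel it: output t
    return out


def small_minus_large(r_small, r_large):
    """R_small -B R_large: cancel copies, then output what is left in memory."""
    remaining = Counter(r_small)
    for t in r_large:
        if remaining[t] > 0:
            remaining[t] -= 1
    # Output R_small only after R_large has been read completely.
    return list(remaining.elements())


print(large_minus_small([1, 1, 2], [1, 3, 3, 2, 2]))
print(small_minus_large([1, 1, 2], [1, 3, 3, 2, 2]))
```

In the first direction results can be emitted while Rlarge is streaming; in the second, nothing can be output until the whole of Rlarge has been seen.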

8 One-pass algorithms
Rlarge × Rsmall:
1. FOR each tuple t in Rlarge DO
   cross join t and each tuple in Rsmall and send to the output.

9 One-pass algorithms
Rlarge ⋈C Rsmall:
1. FOR each tuple t in Rlarge DO
   cross join t and each tuple in Rsmall;
   IF the join satisfies C THEN send to the output.

10 One-pass algorithms
Rlarge ⋈ Rsmall:
1. Sort Rsmall by the join attributes A;
2. FOR each tuple t in Rlarge DO
   find the tuples in Rsmall with the same A-value;
   join them with t and put the results in the output block.
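A minimal sketch of the one-pass natural join: Rsmall is sorted by the join attribute and each tuple of Rlarge is matched by binary search. The `key_small` and `key_large` extractor parameters and the pair-shaped output are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def one_pass_join(r_small, r_large, key_small, key_large):
    # 1. Sort R_small by the join attribute A; keep the A-values alongside.
    ordered = sorted(r_small, key=key_small)
    keys = [key_small(s) for s in ordered]
    # 2. Stream R_large and binary-search the run of equal A-values.
    for t in r_large:
        a = key_large(t)
        lo, hi = bisect_left(keys, a), bisect_right(keys, a)
        for s in ordered[lo:hi]:
            yield (s, t)  # the joined pair; a real join would merge the columns


first = lambda row: row[0]
small = [(1, "a"), (2, "b"), (1, "c")]
large = [(1, "x"), (3, "y"), (2, "z")]
print(list(one_pass_join(small, large, first, first)))
```

A hash table keyed on A would serve equally well; sorting is used here only because the slide states it.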

11 One-pass algorithms
Summary (binary operations):
Memory: M ≥ B(Rsmall)
Cost: B(Rsmall) + B(Rlarge)

12 For larger relations
When relations cannot fit in main memory, one-pass algorithms cannot be used.

13 For larger relations
A generic algorithm for binary operations:
Nested-loop (R □ S):
FOR each tuple tR in R DO
   FOR each tuple tS in S DO
      Apply the operation □ on tR and tS

15 For larger relations
Nested-loop for R ∪S S:
1. \\ in the first execution of the tR-loop
   output tS;
2. \\ in an execution of the tR-loop
   IF tR = tS THEN mark tR;
3. \\ at the end of the tR-loop
   IF tR is unmarked THEN output tR.

16 For larger relations
Nested-loop for R ∩S S:
1. \\ in an execution of the tR-loop
   IF tR = tS THEN mark tR;
2. \\ at the end of the tR-loop
   IF tR is marked THEN output tR.

17 For larger relations
Nested-loop for R −S S:
1. \\ in an execution of the tR-loop
   IF tR = tS THEN mark tR;
2. \\ at the end of the tR-loop
   IF tR is unmarked THEN output tR.

18 For larger relations
The nested-loop algorithm for R −S S does not work for S −S R, because −S is not commutative.
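The marking scheme for set union, intersection, and difference can be sketched in one function. This is an illustrative sketch that assumes R and S are lists without internal duplicates (sets); the function name, the `op` parameter, and the guard for an empty R are additions that do not appear on the slides.

```python
def nested_loop_set_op(r, s, op):
    """op is 'union', 'intersection', or 'difference' (meaning R op S)."""
    if op == "union" and not r:
        # Added guard: with an empty R the tR-loop never runs, so S would be lost.
        return list(s)
    out = []
    for i, t_r in enumerate(r):
        marked = False
        for t_s in s:
            if op == "union" and i == 0:
                out.append(t_s)  # first execution of the tR-loop: output tS
            if t_r == t_s:
                marked = True    # mark tR
        # End of the tR-loop: decide whether to output tR.
        if op == "intersection" and marked:
            out.append(t_r)
        if op in ("union", "difference") and not marked:
            out.append(t_r)
    return out


print(nested_loop_set_op([1, 2, 3], [2, 4], "union"))
print(nested_loop_set_op([1, 2, 3], [2, 4], "intersection"))
print(nested_loop_set_op([1, 2, 3], [2, 4], "difference"))
```

Only tuples of the outer relation R can be marked, which is why the same loop cannot compute S −S R.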

20 For larger relations
R ∩B S and R −B S:
Nested-loop does not seem to be effective for R ∩B S and R −B S.
Remark: we cannot simply mark tR.

24 Nested-loop is particularly simple for join operations
Nested-loop for R join S (body of the inner loop):
IF tR and tS are joinable THEN
   Join tR and tS;
   IF the join is × or ⋈ THEN output the join
   ELSE \\ the join is ⋈C
      output the join if it satisfies C.
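A minimal sketch of the nested-loop join for the × and ⋈C cases, with tuples represented as Python tuples and the join formed by concatenation. The optional `condition` callback is a hypothetical parameter; the natural join, which would also have to drop the duplicated join columns, is left out.

```python
def nested_loop_join(r, s, condition=None):
    for t_r in r:
        for t_s in s:
            # condition is None: cross product, every pair is output.
            # Otherwise: theta join, output the pair only if it satisfies C.
            if condition is None or condition(t_r, t_s):
                yield t_r + t_s


r = [(1,), (2,)]
s = [(1, "a"), (3, "b")]
print(list(nested_loop_join(r, s)))
print(list(nested_loop_join(r, s, lambda a, b: a[0] < b[0])))
```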

26 For larger relations
Summary (nested-loop, tuple by tuple):
Memory: M ≥ 2
Cost: T(R)*T(S) + T(R)
Very bad.

28 For larger relations
Nested-loop (R □ S), reading S block by block:
FOR each tuple tR in R DO
   FOR each block bS in S DO
      Apply the operation □ on tR and the tuples in bS
Summary:
Memory: M ≥ 2
Cost: T(R)*B(S) + T(R)
Still large.

30 For larger relations
Nested-loop (R □ S), reading both relations block by block:
FOR each block bR in R DO
   FOR each block bS in S DO
      Apply the operation □ on the tuples in bR and the tuples in bS
Summary:
Memory: M ≥ 2
Cost: B(R)*B(S) + B(R)
Can it be further improved?

31 For larger relations
Nested-loop (R □ S), reading R in memory-sized chunks:
FOR each chunk of R (the max # blocks fitting in M) DO
   FOR each block bS in S DO
      Apply the operation □ on the tuples in the chunk of R and the tuples in bS
Summary:
Memory: M ≥ 2
Cost: B(R)*B(S)/M + B(R)

32 For larger relations
The chunk-based nested-loop (cost B(R)*B(S)/M + B(R)) is very good if B(R) or B(S) is only slightly larger than M.

33 For larger relations
For the chunk-based nested-loop (cost B(R)*B(S)/M + B(R)), we should pick the smaller relation for the outer loop. This does not work for −S, which is not commutative.
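To compare the four nested-loop variants, the cost formulas from the slides can be evaluated directly. The relation sizes below are hypothetical numbers chosen only for illustration, and the function name is an assumption.

```python
def nested_loop_costs(t_r, t_s, b_r, b_s, m):
    """I/O cost of each nested-loop variant, using the formulas on the slides.

    t_r, t_s: number of tuples T(R), T(S); b_r, b_s: number of blocks B(R), B(S);
    m: number of memory blocks M.
    """
    return {
        "tuple/tuple": t_r * t_s + t_r,
        "tuple/block": t_r * b_s + t_r,
        "block/block": b_r * b_s + b_r,
        "chunk/block": b_r * b_s / m + b_r,
    }


# Hypothetical sizes: R has 10,000 tuples in 1,000 blocks, S has 5,000 tuples
# in 500 blocks, and 100 memory blocks are available.
costs = nested_loop_costs(t_r=10_000, t_s=5_000, b_r=1_000, b_s=500, m=100)
for name, cost in costs.items():
    print(f"{name}: {cost:,.0f}")

# Swapping the roles puts the smaller relation S in the outer loop.
swapped = nested_loop_costs(t_r=5_000, t_s=10_000, b_r=500, b_s=1_000, m=100)
print(f"chunk/block with S outside: {swapped['chunk/block']:,.0f}")
```

With these numbers the chunk-based variant needs 6,000 I/Os with R in the outer loop and 5,500 with the smaller relation S in the outer loop, which is the effect the last slide points out.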

34 Algorithms Implementing Relational Algebraic Operations
Quick Review: What We Did
Operations requiring almost no space: π, σ, ∪B, table-scan
   Memory: M = 2
   Cost: π(R), σ(R), table-scan(R): B(R); ∪B(R, S): B(R) + B(S)
One-pass algorithms: γ, δ, τ, ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈
   Unary: Memory: M ≥ B(R); Cost: B(R)
   Binary: Memory: M ≥ B(Rsmall); Cost: B(Rsmall) + B(Rlarge)
Nested-loop algorithms for binary operations: ∪S, ∩S, −S, ×, ⋈C, ⋈
   Memory: M ≥ 2
   Cost: B(R)*B(S)/M + B(R)

35 Two-pass algorithms

36 Two-pass algorithms
Condition: large relations that cannot fit into the main memory M, but are not extremely large.

37 Two-pass algorithms
Basic ideas:
Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk;
Apply the operation based on “merging” the blocks from the smaller and more organized pieces.

38 Two-pass algorithms
Two main techniques:
Sort-based;
Hash-based.

40 Two-pass sort-based algorithms

41 Two-pass sort-based algorithms
General framework:
1. Repeat (Phase 1. Making sorted sublists)
   Fill the main memory with tuples of a relation;
   Make them a sorted sublist;
   Write the sorted sublist back to disk.
2. Repeat (Phase 2. Merging)
   Bring in a block from each of the sorted sublists;
   Apply the operation by “merging” the sorted blocks.

42 Review: Two-phase Multiway MergeSort
Phase 1. Making sorted sublists
Repeat
   Fill the main memory with remaining tuples in R and sort them;
   Write the sorted sublist back to disk.
Phase 2. Merging
Repeat
   Bring in a block from each of the sorted sublists;
   Merge them and put the result in the output block.
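The two phases can be sketched in a few lines of Python. This is an in-memory illustration only: Python lists stand in for disk-resident sublists, `memory_capacity` (a hypothetical parameter) counts tuples rather than blocks, and the standard-library `heapq.merge` performs the multiway merge.

```python
import heapq


def two_phase_multiway_mergesort(tuples, memory_capacity):
    # Phase 1: fill "memory", sort it, and write the sorted sublist "to disk".
    sublists = []
    for start in range(0, len(tuples), memory_capacity):
        sublists.append(sorted(tuples[start:start + memory_capacity]))
    # Phase 2: repeatedly take the smallest head element among all sublists.
    # heapq.merge consumes its inputs lazily, like one buffer per sublist.
    return list(heapq.merge(*sublists))


print(two_phase_multiway_mergesort([7, 3, 9, 1, 8, 2, 6, 4, 5], memory_capacity=3))
```

A real two-pass sort additionally requires that the number of sublists not exceed the number of available memory buffers, which this sketch does not check.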

43-63 Two-phase Multiway MergeSort
[Animation between main memory and disk. First phase: main memory is filled from disk, sorted ("Sort it"), and written back, repeatedly. Second phase: main memory holds one block per sublist.]

65 Two-pass sort-based algorithms
Unary operations: γ(R), δ(R), τ(R)

66-69 Two-pass sort-based algorithms
[Figure sequence for the unary operations γ(R), δ(R), τ(R), showing R and main memory: Phase I, end of Phase I, Phase II.]

70 Two-pass sort-based algorithms
δ(R), in Phase II:
1. Remove all minimums except one;
2. Output the minimum.
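Duplicate elimination δ(R) on top of the sorted sublists can be sketched as follows. Lists again stand in for disk-resident sublists, `memory_capacity` is a hypothetical parameter counting tuples, and `heapq.merge` supplies the stream of successive minimums.

```python
import heapq


def sort_based_distinct(tuples, memory_capacity):
    # Phase 1: sorted sublists, each small enough to fit in "memory".
    sublists = [sorted(tuples[i:i + memory_capacity])
                for i in range(0, len(tuples), memory_capacity)]
    # Phase 2: walk the merged stream; equal tuples arrive next to each other,
    # so only the first copy of each minimum is output.
    previous = object()  # sentinel that compares unequal to every tuple
    for t in heapq.merge(*sublists):
        if t != previous:
            yield t
            previous = t


print(list(sort_based_distinct([3, 1, 2, 3, 1, 1, 2], memory_capacity=3)))
```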

71 Two-pass sort-based algorithms
Remark: read in the next block from a sublist if its block is exhausted. (This applies to all the algorithms.)

72 Two-pass sort-based algorithms
γ(R), in Phase II:
\\ sublists are sorted by the grouping attributes
1. Group all tuples with the minimum grouping attributes;
2. Calculate the aggregation value;
3. Output a grouping tuple.
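Grouping and aggregation γ(R) follows the same pattern. The sketch below assumes rows of the form `(group_key, value)` and uses SUM as the aggregate; both choices, like the function name and the tuple-counting `memory_capacity`, are illustrative assumptions.

```python
import heapq
from itertools import groupby


def sort_based_group_sum(rows, memory_capacity):
    """Return (group_key, SUM(value)) pairs in key order."""
    key = lambda row: row[0]
    # Phase 1: sublists sorted by the grouping attribute.
    sublists = [sorted(rows[i:i + memory_capacity], key=key)
                for i in range(0, len(rows), memory_capacity)]
    # Phase 2: merge by the grouping attribute; each group is then contiguous.
    merged = heapq.merge(*sublists, key=key)
    return [(k, sum(value for _, value in group))
            for k, group in groupby(merged, key=key)]


rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4), ("c", 5)]
print(sort_based_group_sum(rows, memory_capacity=2))
```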

73 Two-pass sort-based algorithms
Summary (unary operations γ(R), δ(R), τ(R)):
Memory: M ≥ √B(R)
Cost: 3B(R)
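The two summary figures follow from a short count. The derivation below assumes, as is conventional but not stated on the slide, that M counts memory blocks and that writing the final result is not charged.

```latex
\begin{align*}
\text{number of sublists after Phase 1} &= \left\lceil \frac{B(R)}{M} \right\rceil, \\
\text{Phase 2 needs one buffer per sublist:}\quad \frac{B(R)}{M} \le M
  &\;\Longrightarrow\; M \ge \sqrt{B(R)}, \\
\text{cost} &= \underbrace{B(R)}_{\text{read } R}
  + \underbrace{B(R)}_{\text{write sublists}}
  + \underbrace{B(R)}_{\text{read sublists}} = 3B(R).
\end{align*}
```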

74 Two-pass sort-based algorithms
Binary operations on two relations R and S: ∪S, ∩S, −S, ∩B, −B, ×, ⋈C, ⋈

75-78 Two-pass sort-based algorithms
[Figure sequence for binary operations, showing R, S, and main memory: Phase I, end of Phase I, Phase II.]

79 Two-pass sort-based algorithms
In Phase II, we may build an efficient data structure for searching the minimum.

80 Two-pass sort-based algorithms
R ∪S S, in Phase II:
REPEAT
   IF Rmin = Smin THEN send one copy to output and delete both
   ELSE send the smaller to output and delete it.
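The merge step for R ∪S S can be sketched as below, assuming R and S are sets (no duplicates within either relation). Lists stand in for disk-resident sublists; `sorted_sublists`, `memory_capacity`, and the `done` sentinel are hypothetical helpers for this illustration.

```python
import heapq


def sorted_sublists(tuples, memory_capacity):
    # Phase 1: cut the relation into memory-sized sorted sublists.
    return [sorted(tuples[i:i + memory_capacity])
            for i in range(0, len(tuples), memory_capacity)]


def sort_based_set_union(r, s, memory_capacity):
    # Phase 2: each stream yields the current minimum of its relation.
    r_stream = heapq.merge(*sorted_sublists(r, memory_capacity))
    s_stream = heapq.merge(*sorted_sublists(s, memory_capacity))
    done = object()  # marks an exhausted stream
    r_min, s_min = next(r_stream, done), next(s_stream, done)
    out = []
    while r_min is not done or s_min is not done:
        if s_min is done or (r_min is not done and r_min < s_min):
            out.append(r_min)  # R holds the smaller tuple: output and delete it
            r_min = next(r_stream, done)
        elif r_min is done or s_min < r_min:
            out.append(s_min)  # S holds the smaller tuple: output and delete it
            s_min = next(s_stream, done)
        else:
            out.append(r_min)  # Rmin = Smin: output one copy, delete both
            r_min, s_min = next(r_stream, done), next(s_stream, done)
    return out


print(sort_based_set_union([5, 1, 3], [3, 4, 1, 6], memory_capacity=2))
```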

