Bhargav Vadher (208), April 9th, 2008. Submitted to: Dr. T Y Lin, Computer Science Department, San Jose State University.


1 Bhargav Vadher (208), April 9th, 2008. Submitted to: Dr. T Y Lin, Computer Science Department, San Jose State University

2 Introduction
Multipass sort-based algorithm
Performance of the multipass sort-based algorithm
Multipass hash-based algorithm
Performance of the multipass hash-based algorithm

3 So far, most of the algorithms we have seen require two passes.
But what if relation R is so large that two passes are not enough?
  › Multipass sort-based algorithm
  › Multipass hash-based algorithm

4 Assume that
  › the number of memory buffers is M
  › we have relations R and S
BASIS: if B(R) ≤ M, then
  › read R into main memory
  › sort R with any main-memory sorting algorithm
  › write R back to disk
INDUCTION: if B(R) > M, then
  › partition the blocks of R into M groups (R1, R2, ..., RM)
  › sort each Ri recursively, for i = 1, 2, ..., M
  › merge the M sorted sublists into one
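The basis and induction steps above can be sketched in Python. This is a minimal in-memory simulation, not a disk-based implementation: the relation is assumed to be a list of blocks (each block a list of comparable values), disk reads and writes are not modeled, and the function name `multipass_sort` is hypothetical.

```python
import heapq


def multipass_sort(blocks, M):
    """Sort a relation given as a list of blocks, using M memory buffers.

    Returns one flat sorted list. Disk I/O is only simulated: a "block"
    is just a Python list, and M is assumed to be at least 2.
    """
    if M < 2:
        raise ValueError("need at least 2 buffers")
    # BASIS: B(R) <= M, so the whole relation fits in memory.
    if len(blocks) <= M:
        return sorted(t for block in blocks for t in block)
    # INDUCTION: split the blocks into at most M groups of nearly equal
    # size, sort each group recursively, then merge the sorted sublists.
    size = -(-len(blocks) // M)  # ceiling division
    groups = [blocks[i:i + size] for i in range(0, len(blocks), size)]
    sublists = [multipass_sort(group, M) for group in groups]
    return list(heapq.merge(*sublists))
```

With M = 2 and five blocks, the call recurses twice before the groups fit in memory, which mirrors a three-pass sort.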

5 If we are not just sorting but also want to perform a unary operation, modify the previous algorithm to compute δ or γ:
  › for δ, output one copy of each distinct tuple and discard the rest
  › for γ, sort only on the grouping attributes and combine the tuples of each group
For a binary operation on R and S, divide the M buffers between R and S in proportion to the number of blocks each relation occupies:
  › R gets M × B(R) / (B(R) + B(S)) buffers
  › S gets the rest of the available buffers
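The proportional buffer split can be illustrated with a small helper. The function name `split_buffers` is hypothetical, and the rounding and the rule that each relation gets at least one buffer are assumptions added here, since the slide does not say how fractional shares are handled.

```python
def split_buffers(M, B_R, B_S):
    """Divide M buffers between R and S in proportion to their block counts.

    Returns (buffers_for_R, buffers_for_S). Assumes M >= 2 and B_R + B_S > 0.
    Rounding to the nearest integer is an assumption, not from the slides.
    """
    share_R = round(M * B_R / (B_R + B_S))
    # Assumption: keep at least one buffer for each relation.
    share_R = min(max(share_R, 1), M - 1)
    return share_R, M - share_R
```

For example, with M = 100, B(R) = 3000, and B(S) = 1000, R receives three quarters of the buffers.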

6 Let S(M, k) = the maximum size, in blocks, of a relation that can be sorted with M buffers and k passes.
BASIS: if k = 1, only one pass is allowed, so B(R) ≤ M and
  › S(M, 1) = M
INDUCTION: if k > 1, multiple passes are allowed
  › partition R into M pieces, each of which must be sortable in k-1 passes
  › S(M, k) = M × S(M, k-1), which expands to S(M, k) = M^k
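The recurrence can be checked numerically. The sketch below follows the basis and induction directly; the function names are hypothetical, and `passes_needed` assumes M ≥ 2 so that the capacity grows with each extra pass.

```python
def max_sortable_blocks(M, k):
    """S(M, k): largest relation, in blocks, sortable with M buffers in k passes."""
    if k == 1:
        return M  # BASIS: the relation must fit in memory
    return M * max_sortable_blocks(M, k - 1)  # INDUCTION: M pieces, k-1 passes each


def passes_needed(B, M):
    """Smallest k such that a relation of B blocks satisfies B <= S(M, k)."""
    if M < 2:
        raise ValueError("need at least 2 buffers")
    k = 1
    while max_sortable_blocks(M, k) < B:
        k += 1
    return k
```

With 100 buffers, two passes cover up to 10,000 blocks, and one more block forces a third pass.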

7 Each pass of the algorithm
  › reads the data from disk
  › sorts it
  › writes it back to disk
So a k-pass sorting algorithm requires
  › 2k B(R) disk I/O operations
A multipass sort-based binary operation on R and S requires
  › 2(k-1)(B(R) + B(S)) disk I/O operations to sort the sublists, plus
  › B(R) + B(S) disk I/O operations to merge the sorted sublists in the final pass
  › for a total of (2k-1)(B(R) + B(S)) disk I/O operations
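As a worked example of these cost formulas, the two helpers below simply evaluate them. The function names are hypothetical; the formulas are the ones stated on the slide.

```python
def sort_io(B_R, k):
    """Disk I/Os for a k-pass sort of R: each pass reads and writes every block."""
    return 2 * k * B_R


def binary_sort_io(B_R, B_S, k):
    """Disk I/Os for a k-pass sort-based binary operation on R and S.

    The first k-1 passes read and write both relations; the final pass
    only reads the sorted sublists while producing the result.
    """
    total_blocks = B_R + B_S
    return 2 * (k - 1) * total_blocks + total_blocks
```

For B(R) = 1000, B(S) = 500, and k = 3, the binary operation costs 2 × 2 × 1500 + 1500 = 7500 disk I/Os, which equals (2k-1)(B(R) + B(S)).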

8 Basics:
  › an alternative approach to multipass algorithms
  › hash the relations into M-1 buckets, where M is the number of memory buffers
  › for a unary operation, apply the operation to each bucket individually
  › for a binary operation, apply the operation to each pair of corresponding buckets

9 The approach can be described as follows.
BASIS, unary operation: if the relation fits into M memory blocks
  › read it into memory from disk
  › perform the operation on it
BASIS, binary operation: if either relation fits into M-1 memory blocks
  › read that relation into M-1 blocks of main memory
  › read the second relation one block at a time into the Mth block
  › perform the operation

10 INDUCTION: if no relation fits into the main-memory buffers
  › hash each relation into M-1 buckets
  › use the Mth buffer to read the relation being hashed, one block at a time
  › recursively perform the operation on each bucket or each pair of corresponding buckets
  › accumulate the output from each bucket
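For the unary case, the recursion can be sketched as duplicate elimination (δ) over an in-memory list. This is a simulation under stated assumptions: `hash_distinct` and `block_size` are hypothetical names, disk I/O is not modeled, and the per-level hash `hash((depth, t))` and the fallback for buckets that fail to shrink are choices made here, not taken from the slides.

```python
def hash_distinct(tuples, M, block_size, depth=0):
    """Duplicate elimination by recursive hash partitioning.

    tuples: list of hashable tuples; M: number of buffers (>= 2);
    block_size: tuples per block. Returns one copy of each distinct tuple.
    """
    if M < 2:
        raise ValueError("need at least 2 buffers")
    # BASIS: the relation fits in M blocks, so process it in memory.
    if len(tuples) <= M * block_size:
        return list(dict.fromkeys(tuples))
    # INDUCTION: hash into M-1 buckets, using a different hash function
    # at each level so that oversized buckets can split further.
    buckets = [[] for _ in range(M - 1)]
    for t in tuples:
        buckets[hash((depth, t)) % (M - 1)].append(t)
    output = []
    for bucket in buckets:
        if len(bucket) == len(tuples):
            # Assumption: a bucket that did not shrink (heavy skew) is
            # handled directly instead of recursing without progress.
            output.extend(dict.fromkeys(bucket))
        else:
            output.extend(hash_distinct(bucket, M, block_size, depth + 1))
    return output
```

Because equal tuples always hash to the same bucket, no duplicate can survive across buckets, so the bucket outputs can simply be concatenated.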

11 For unary operations, assume
  › the operations are δ and γ
  › the relation is R
  › the number of buffers is M
  › u(M, k) = the number of blocks in the largest relation that a k-pass hash-based algorithm can handle
BASIS: u(M, 1) = M, since R must fit in M buffers, i.e., B(R) ≤ M

12 INDUCTION:
Assume that the first step divides R into M-1 equal buckets.
The buckets for the next pass must be small enough to be handled in k-1 passes.
So the buckets are of size u(M, k-1).
Since R is divided into M-1 buckets, we have
  › u(M, k) = (M-1) u(M, k-1)
Expanding the recurrence gives u(M, k) = M (M-1)^(k-1), or approximately M^k, so we can perform a unary operation on relation R in k passes with M buffers
  › provided that M ≥ (B(R))^(1/k)
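The expansion of the recurrence can be written out step by step. The approximation in the last line assumes M is large, so that M-1 and M are interchangeable.

```latex
\begin{align*}
u(M, k) &= (M-1)\,u(M, k-1) \\
        &= (M-1)^{2}\,u(M, k-2) \\
        &\;\;\vdots \\
        &= (M-1)^{k-1}\,u(M, 1) \\
        &= M\,(M-1)^{k-1} \;\approx\; M^{k}
\end{align*}
```

A relation R can therefore be processed in k passes when B(R) ≤ u(M, k) ≈ M^k, which is the same condition as M ≥ (B(R))^(1/k). For example, with M = 101 and k = 2, u(M, k) = 101 × 100 = 10,100 blocks.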

13 For binary operations:
BASIS: if we use the one-pass algorithm to join, then
  › either R or S must fit into M-1 blocks
  › j(M, 1) = M-1
INDUCTION:
  › on the first of k passes, divide each relation into M-1 buckets, so each bucket is 1/(M-1) of its entire relation
  › so j(M, k) = (M-1) j(M, k-1), which expands to j(M, k) = (M-1)^k
  › so we can join R(X, Y) ⋈ S(Y, Z) using k passes and M buffers, provided that, approximately, M^k ≥ min(B(R), B(S))
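The join recurrence can be evaluated the same way. The sketch below uses the exact form j(M, k) = (M-1)^k rather than the M^k approximation; the function names are hypothetical, and M ≥ 3 is assumed so that each pass produces at least two buckets.

```python
def max_joinable_blocks(M, k):
    """j(M, k): largest size, in blocks, of the smaller relation in a k-pass hash join."""
    if k == 1:
        return M - 1  # BASIS: one relation must fit in M-1 blocks
    return (M - 1) * max_joinable_blocks(M, k - 1)  # INDUCTION: M-1 buckets


def join_passes_needed(B_R, B_S, M):
    """Smallest k such that min(B(R), B(S)) <= j(M, k)."""
    if M < 3:
        raise ValueError("need at least 3 buffers")
    smaller = min(B_R, B_S)
    k = 1
    while max_joinable_blocks(M, k) < smaller:
        k += 1
    return k
```

Only the smaller relation matters: with M = 101, a 100-block relation can be joined with a relation of any size in one pass.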

14 Q & A Thank You

