CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.


Slide 1: CSCI 5708: Query Processing II
Pusheng Zhang, University of Minnesota
Email: pusheng@cs.umn.edu
Feb 5, 2004

Slide 2: Outline
- Strategies for relational operations
  - Select Operation
  - Join Operation
  - Other Operations
- Evaluation of Expressions
- External Sorting

Slide 3: Join Operation
- Several different algorithms to implement joins:
  - Nested-loop join
  - Block nested-loop join
  - Indexed nested-loop join
  - Sort-merge join
  - Hash join
- Choice based on cost estimate.
- We do not include the cost of writing output to disk in our cost formulas; that cost is roughly (b_s * b_r * join selectivity) / bfr blocks.
- Examples use the following information:
  - Number of records: DEPARTMENT: 50, EMPLOYEE: 6,000
  - Number of blocks: DEPARTMENT: 10, EMPLOYEE: 2,000

Slide 4: Nested-Loop Join
- To compute the theta join r ⋈_θ s:

    for each tuple t_r in r do begin
      for each tuple t_s in s do begin
        test pair (t_r, t_s) to see if they satisfy the join condition θ
        if they do, add t_r · t_s to the result.
      end
    end

- r is called the outer relation and s the inner relation of the join.
- Requires no indices and can be used with any kind of join condition.
- Expensive, since it examines every pair of tuples in the two relations.
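The pseudocode above can be sketched as a minimal in-memory example. Relations are modeled as lists of dicts, `theta` is an arbitrary join predicate, and the relation and attribute names below are illustrative, not taken from the slides' schema:

```python
# Tuple-at-a-time nested-loop join: every pair (t_r, t_s) is tested.
def nested_loop_join(r, s, theta):
    result = []
    for tr in r:                  # outer relation
        for ts in s:              # inner relation
            if theta(tr, ts):     # test the join condition
                result.append({**tr, **ts})  # concatenate matching tuples
    return result

department = [{"dname": "HQ", "mgrssn": 1}, {"dname": "Lab", "mgrssn": 2}]
employee = [{"ename": "Ann", "ssn": 1}, {"ename": "Bob", "ssn": 3}]
print(nested_loop_join(department, employee,
                       lambda tr, ts: tr["mgrssn"] == ts["ssn"]))
```

No index is needed and the predicate can be arbitrary, which is exactly why the algorithm is both universally applicable and expensive.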

Slide 5: Nested-Loop Join (Cont.)
- In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is n_r * b_s + b_r disk accesses.
- If the smaller relation fits entirely in memory, use it as the inner relation; this reduces the cost to b_r + b_s disk accesses.
- For example (n_DEP = 50, b_DEP = 10, n_EMP = 6,000, b_EMP = 2,000):
  - DEPARTMENT as the outer relation: 50 * 2,000 + 10 = 100,010
  - EMPLOYEE as the outer relation: 6,000 * 10 + 2,000 = 62,000
  - Pick the ordering with the lower estimate.
- If DEPARTMENT fits entirely in memory, the cost estimate drops to 10 + 2,000 = 2,010 disk accesses.
- The block nested-loop algorithm (next slide) is preferable.
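The worst-case formula n_r * b_s + b_r is simple enough to check directly; a small helper reproduces both orderings for the DEPARTMENT/EMPLOYEE numbers:

```python
# Worst-case nested-loop join cost (disk accesses), assuming one buffer
# block per relation: n_outer * b_inner + b_outer.
def nested_loop_cost(n_outer, b_outer, b_inner):
    return n_outer * b_inner + b_outer

# DEPARTMENT: 50 tuples / 10 blocks; EMPLOYEE: 6,000 tuples / 2,000 blocks
print(nested_loop_cost(50, 10, 2000))    # DEPARTMENT as outer: 100010
print(nested_loop_cost(6000, 2000, 10))  # EMPLOYEE as outer: 62000
```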

Slide 6: Block Nested-Loop Join
- Variant of nested-loop join in which every block of the inner relation is paired with every block of the outer relation.

    for each block B_r of r do begin
      for each block B_s of s do begin
        for each tuple t_r in B_r do begin
          for each tuple t_s in B_s do begin
            check if (t_r, t_s) satisfy the join condition
            if they do, add t_r · t_s to the result.
          end
        end
      end
    end
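The four nested loops above translate directly; here blocks are modeled as lists of tuples (dicts), so the inner relation is scanned once per outer *block* rather than once per outer tuple:

```python
# Block-at-a-time nested-loop join over pre-blocked relations.
def block_nested_loop_join(r_blocks, s_blocks, theta):
    result = []
    for Br in r_blocks:           # each block of the outer relation
        for Bs in s_blocks:       # each block of the inner relation
            for tr in Br:
                for ts in Bs:
                    if theta(tr, ts):
                        result.append({**tr, **ts})
    return result

r_blocks = [[{"a": 1}], [{"a": 2}]]
s_blocks = [[{"b": 1}, {"b": 2}]]
print(block_nested_loop_join(r_blocks, s_blocks,
                             lambda tr, ts: tr["a"] == ts["b"]))
```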

Slide 7: Block Nested-Loop Join (Cont.)
- Worst-case estimate: b_r * b_s + b_r block accesses.
  - Each block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relation).
- Best case: b_r + b_s block accesses.

Slide 8: Block Nested-Loop Join (Cont.)
- What happens if the number of buffer blocks is greater than 3? e.g., M = 5.
[Figure: buffer layout with one block each for r, s, and output, and the remaining blocks idle; r and s reside on disk]

Slide 9: Block Nested-Loop Join (Cont.)
[Figure: buffer layout with the extra blocks assigned to r, plus blocks for s and output; r and s reside on disk]

Slide 10: Improvements
- Improvement on buffer use (M = memory size in blocks):
  - Use M - 2 disk blocks as the blocking unit for the outer relation
  - Use one block for the inner relation
  - Use one block for output
  - Cost = ceil(b_r / (M - 2)) * b_s + b_r
- Other improvements:
  - If the equi-join attribute forms a key on the inner relation, stop the inner loop on the first match
  - Scan the inner loop forward and backward alternately, to make use of the blocks remaining in the buffer (with LRU replacement)
  - Use an index on the inner relation if available (next slide)

Slide 11: Example of Block Nested-Loop
- r ⋈ s: Cost = ceil(b_r / (M - 2)) * b_s + b_r
- For example, DEPARTMENT and EMPLOYEE (b_DEP = 10, b_EMP = 2,000), with M = 5:
  - With EMPLOYEE as the outer relation: ceil(2000 / 3) * 10 + 2000 = 6670 + 2000 = 8670
  - With DEPARTMENT as the outer relation: ceil(10 / 3) * 2000 + 10 = 4 * 2000 + 10 = 8010
- Use the relation with fewer blocks as the outer relation.
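The blocked cost formula from this slide, as a one-line helper reproducing both orderings of the example:

```python
import math

# Cost = ceil(b_outer / (M - 2)) * b_inner + b_outer block accesses,
# with M - 2 buffer blocks devoted to the outer relation.
def block_nested_loop_cost(b_outer, b_inner, M):
    return math.ceil(b_outer / (M - 2)) * b_inner + b_outer

print(block_nested_loop_cost(2000, 10, 5))  # EMPLOYEE as outer: 8670
print(block_nested_loop_cost(10, 2000, 5))  # DEPARTMENT as outer: 8010
```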

Slide 12: Indexed Nested-Loop Join
- Index lookups can replace file scans if:
  - the join is an equi-join or natural join, and
  - an index is available on the inner relation's join attribute
  - (an index can even be constructed just to compute a join)
- For each tuple t_r in the outer relation r, use the index to look up tuples in s that satisfy the join condition with t_r.
- Worst case: the buffer has space for only one page of r, and for each tuple in r we perform an index lookup on s.
- Cost of the join: b_r + n_r * c
  - where c is the cost of traversing the index and fetching all matching s tuples for one tuple of r
  - c can be estimated as the cost of a single selection on s using the join condition
- If indices are available on the join attributes of both r and s, use the relation with fewer tuples as the outer relation.

Slide 13: Example of Nested-Loop Join Costs
- r ⋈ s: b_r + n_r * c
- For example, DEPARTMENT ⋈_{MGRSSN = SSN} EMPLOYEE (n_DEP = 50, b_DEP = 10, n_EMP = 6,000, b_EMP = 2,000)
- Suppose secondary indexes exist on both SSN of EMPLOYEE and MGRSSN of DEPARTMENT, with index levels x_SSN = 4 and x_MGRSSN = 2.
- Cost of indexed nested-loop join:
  - With EMPLOYEE as the outer relation (probing the MGRSSN index): Cost = 2,000 + 6,000 * (2 + 1) = 2,000 + 18,000 = 20,000
  - With DEPARTMENT as the outer relation (probing the SSN index): Cost = 10 + 50 * (4 + 1) = 10 + 250 = 260
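In this example c is approximated as (index levels + 1) block accesses per lookup; the same helper reproduces both numbers from the slide:

```python
# Indexed nested-loop join cost: b_r + n_r * c, with c = index_levels + 1
# (traverse the index, then fetch the matching record block).
def indexed_nested_loop_cost(b_outer, n_outer, index_levels):
    return b_outer + n_outer * (index_levels + 1)

print(indexed_nested_loop_cost(2000, 6000, 2))  # EMPLOYEE as outer: 20000
print(indexed_nested_loop_cost(10, 50, 4))      # DEPARTMENT as outer: 260
```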

Slide 14: Merge-Join
1. Sort both relations on their join attribute (if not already physically sorted on the join attributes).
2. Merge the sorted relations to join them:

    copy a pair of blocks of r and s into memory buffers
    let r(i) be the i-th record in r and s(j) the j-th record in s
    let pr(r) and pr(s) be pointers; i = 1, j = 1
    if r(i)[a1] > s(j)[a1] then pr(s)++
    else if r(i)[a1] < s(j)[a1] then pr(r)++
    else  // equality: output the matching pair(s), then
      pr(s)++; pr(r)++
    end
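A runnable sketch of the merge step, with both inputs already sorted on the join attribute `key`. Unlike the three-way comparison above, duplicates on either side are handled explicitly by pairing each r-tuple with the whole run of equal s-tuples:

```python
# Merge-join of two relations sorted on `key` (lists of dicts).
def merge_join(r, s, key):
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] < s[j][key]:
            i += 1                      # advance the r pointer
        elif r[i][key] > s[j][key]:
            j += 1                      # advance the s pointer
        else:
            j0 = j                      # start of the run of equal s-tuples
            while i < len(r) and r[i][key] == s[j0][key]:
                j = j0                  # rescan the run for each equal r-tuple
                while j < len(s) and s[j][key] == r[i][key]:
                    result.append({**r[i], **s[j]})
                    j += 1
                i += 1
    return result

r = [{"k": 1, "x": "a"}, {"k": 2, "x": "b"}]
s = [{"k": 2, "y": "c"}, {"k": 3, "y": "d"}]
print(merge_join(r, s, "k"))
```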

Slide 15: Merge-Join (Cont.)
- Can be used only for equi-joins and natural joins.
- Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory).
- Thus the number of block accesses for merge-join is b_r + b_s, plus the cost of sorting if the relations are unsorted.
- This strategy is very efficient when the records of r and s are physically sorted by the values of the join attributes.

Slide 16: Hash-Join
- Applicable for equi-joins and natural joins.
- A hash function h is used to partition the tuples of BOTH relations.
- h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the common attributes of r and s used in the natural join.
  - r_0, r_1, ..., r_n denote partitions of r tuples: each tuple t_r ∈ r is put in partition r_i, where i = h(t_r[JoinAttrs]).
  - s_0, s_1, ..., s_n denote partitions of s tuples: each tuple t_s ∈ s is put in partition s_i, where i = h(t_s[JoinAttrs]).

Slide 17: Hash-Join (Cont.)
[Figure: both relations partitioned into r_0, ..., r_n and s_0, ..., s_n by the same hash function h]

Slide 18: Hash-Join (Cont.)
- r tuples in r_i need only be compared with s tuples in s_i; they need not be compared with s tuples in any other partition, since:
  - an r tuple and an s tuple that satisfy the join condition have the same value for the join attributes, and
  - if that value hashes to some value i, the r tuple has to be in r_i and the s tuple in s_i.

Slide 19: Hash-Join Algorithm
The hash-join of r and s is computed as follows. Relation r is called the build input and s the probe input.
1. Partition the relation r using the hash function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition.
2. Partition s similarly.
3. For each i:
   (a) Load r_i into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one, h.
   (b) Read the tuples in s_i from disk one by one. For each tuple t_s, locate each matching tuple t_r in r_i using the in-memory hash index. Output the concatenation of their attributes.
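The three steps above can be sketched in memory: both relations are partitioned with the same hash function h, then for each partition a dict-based index on r_i (standing in for the second, in-memory hash function) is probed with s_i. The join attribute is assumed to have the same name `key` in both relations:

```python
from collections import defaultdict

# Two-phase hash join: partition, then build-and-probe per partition.
def hash_join(r, s, key, n_parts=4):
    h = lambda v: hash(v) % n_parts           # partitioning hash function
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for tr in r:                              # step 1: partition r
        r_parts[h(tr[key])].append(tr)
    for ts in s:                              # step 2: partition s
        s_parts[h(ts[key])].append(ts)
    result = []
    for i in range(n_parts):                  # step 3: per-partition join
        index = defaultdict(list)             # in-memory hash index on r_i
        for tr in r_parts[i]:
            index[tr[key]].append(tr)
        for ts in s_parts[i]:                 # probe with s_i
            for tr in index[ts[key]]:
                result.append({**tr, **ts})
    return result

r = [{"k": 1, "x": "a"}]
s = [{"k": 1, "y": "b"}, {"k": 2, "y": "c"}]
print(hash_join(r, s, "k"))
```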

Slide 20: Hash-Join Algorithm (Cont.)
- The value n and the hash function h are chosen such that each r_i fits in memory.
- It is best to choose the smaller relation as the build relation.
- Partition hash join:
  - Instead of partitioning n ways, use M - 1 partitions for r, with one block buffered for input
  - Whenever a buffer for a partition fills up, write that partition out to disk
  - Further partition the M - 1 partitions using a different hash function if applicable
  - Use the same partitioning method on r and s
  - Recursive partitioning is rarely required given large memory

Slide 21: Handling of Overflows
- Hash-table overflow occurs in partition r_i if r_i does not fit in memory. Reasons could be:
  - Many tuples in r with the same value for the join attributes
  - A bad hash function
- Partitioning is said to be skewed (nonuniform) if some partitions have significantly more tuples than others.
- Overflow resolution can be done in the partitioning phase:
  - Partition r_i is further partitioned using a different hash function
  - Partition s_i must be similarly partitioned
- This remains a challenge with large numbers of duplicates:
  - Fallback option: use block nested-loop join on the overflowed partitions

Slide 22: Cost of Hash-Join
- If recursive partitioning is not required, the cost of hash join is 3(b_r + b_s):
  - Each record is read once and written once in the partitioning phase
  - Each record is read a second time to perform the join in the probing phase
- When one relation fits entirely in the memory buffer:
  - No need for partitioning; use a nested-loop join based on hashing and probing
  - The cost estimate goes down to b_r + b_s
  - For example: the whole of DEPARTMENT can be read into memory and organized into a hash table on the join attribute; each EMPLOYEE block is then read into the buffer, each EMPLOYEE record in the buffer is hashed on its join attribute, and the corresponding in-memory bucket is probed.

Slide 23: Hybrid Hash-Join
- Useful when memory size is relatively large and the build input is bigger than memory.
- Main feature of hybrid hash join:
  - Keep the first partition of the build relation r in memory, so the join phase for the first partition is folded into the partitioning phase.
- For example:
  - The first partition of DEPARTMENT stays wholly in main memory; the other partitions of DEPARTMENT reside on disk.
  - EMPLOYEE is partitioned; if a record hashes to the first partition, it is joined immediately with the matching records.
- Goal: join as many records as possible during the partitioning phase, saving the cost of writing those records back to disk and rereading them for the probing phase.
- In general, hybrid hash join is a good choice when neither file has a good search structure.

Slide 24: Complex Joins
- Join with a conjunctive condition, r ⋈_{θ1 ∧ θ2 ∧ ... ∧ θn} s:
  - Either use nested loops / block nested loops, or
  - Compute the result of one of the simpler joins r ⋈_{θi} s; the final result comprises those tuples in the intermediate result that satisfy the remaining conditions θ1 ∧ ... ∧ θi-1 ∧ θi+1 ∧ ... ∧ θn
- Join with a disjunctive condition, r ⋈_{θ1 ∨ θ2 ∨ ... ∨ θn} s:
  - Either use nested loops / block nested loops, or
  - Compute the union of the records in the individual joins r ⋈_{θi} s: (r ⋈_{θ1} s) ∪ (r ⋈_{θ2} s) ∪ ... ∪ (r ⋈_{θn} s)
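The conjunctive strategy can be sketched directly: join on the first predicate, then filter the intermediate result on the remaining ones. Each predicate here takes the concatenated tuple; the attribute names are illustrative:

```python
# "One simpler join, then filter" for a conjunctive join condition.
def conjunctive_join(r, s, predicates):
    first, rest = predicates[0], predicates[1:]
    # join on the first (ideally most selective) condition
    intermediate = [{**tr, **ts} for tr in r for ts in s
                    if first({**tr, **ts})]
    # keep only tuples satisfying all remaining conditions
    return [t for t in intermediate if all(p(t) for p in rest)]

r = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
s = [{"c": 1, "d": 10}, {"c": 2, "d": 99}]
print(conjunctive_join(r, s, [lambda t: t["a"] == t["c"],
                              lambda t: t["b"] == t["d"]]))
```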

Slide 25: Other Operations
- Duplicate elimination can be implemented via hashing or sorting.
  - On sorting, duplicates come adjacent to each other, and all but one copy of each set of duplicates can be deleted. Optimization: duplicates can be deleted during run generation as well as at intermediate merge steps in external sort-merge.
  - Hashing is similar: duplicates will land in the same bucket.
- Projection is implemented by performing projection on each tuple, followed by duplicate elimination.
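The sort-based variant fits in a few lines: after sorting, duplicates are adjacent, so one linear scan keeps a single copy of each:

```python
# Duplicate elimination by sorting plus a single adjacent-compare pass.
def eliminate_duplicates(tuples):
    out = []
    for t in sorted(tuples):
        if not out or t != out[-1]:   # keep only the first of each run
            out.append(t)
    return out

print(eliminate_duplicates([3, 1, 2, 1, 3]))  # [1, 2, 3]
```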

Slide 26: Other Operations: Aggregation
- Aggregation can be implemented in a manner similar to duplicate elimination.
  - Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied to each group.
  - Optimization: combine tuples in the same group during run generation and intermediate merges by computing partial aggregate values.
    - For count, min, max, sum: keep aggregate values on the tuples found so far in the group. When combining partial aggregates for count, add up the counts.
    - For avg, keep sum and count, and divide sum by count at the end.
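The avg case can be sketched with a (sum, count) partial aggregate per group, as the slide suggests: one pass accumulates the pair, and the division happens only at the end. The attribute names are illustrative:

```python
# Grouped average via (sum, count) partial aggregates.
def grouped_avg(tuples, group_key, value_key):
    partial = {}                               # group -> (sum, count)
    for t in tuples:
        s, c = partial.get(t[group_key], (0, 0))
        partial[t[group_key]] = (s + t[value_key], c + 1)
    return {g: s / c for g, (s, c) in partial.items()}

rows = [{"dept": "A", "sal": 10}, {"dept": "A", "sal": 30},
        {"dept": "B", "sal": 20}]
print(grouped_avg(rows, "dept", "sal"))  # {'A': 20.0, 'B': 20.0}
```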

Slide 27: Other Operations: Set Operations
- Set operations (∪, ∩, and -) can use either a variant of merge-join after sorting, or a variant of hash-join.
- E.g., set operations using hashing:
  1. Partition both relations using the same hash function, creating r_0, r_1, ..., r_n and s_0, s_1, ..., s_n.
  2. Process each partition i as follows: using a different hash function, build an in-memory hash index on r_i after it is brought into memory.
  3. Then:
     - r ∪ s: add tuples in s_i to the hash index if they are not already in it; at the end of s_i, add the tuples in the hash index to the result.
     - r ∩ s: output tuples in s_i to the result if they are already in the hash index.
     - r - s: for each tuple in s_i, if it is in the hash index, delete it from the index; at the end of s_i, add the remaining tuples in the hash index to the result.
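The three per-partition rules from step 3 can be sketched with a Python set standing in for the in-memory hash index on r_i (a single partition, for brevity):

```python
# Set operations via an in-memory hash index on r (here, a Python set).
def hash_union(r, s):
    index = set(r)
    for t in s:
        index.add(t)           # add s-tuples not already in the index
    return index               # the index now holds the union

def hash_intersection(r, s):
    index = set(r)
    return {t for t in s if t in index}   # output s-tuples found in the index

def hash_difference(r, s):     # r - s
    index = set(r)
    for t in s:
        index.discard(t)       # delete matching tuples from the index
    return index               # the remaining tuples form r - s

r, s = [1, 2, 3], [2, 3, 4]
print(hash_union(r, s), hash_intersection(r, s), hash_difference(r, s))
```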

Slide 28: Other Operations: Outer Join
- Outer join can be computed either as:
  - a join followed by the addition of null-padded non-participating tuples, or
  - by modifying the join algorithms.
- Modifying merge-join to compute the left outer join r ⟕ s:
  - In r ⟕ s, the non-participating tuples are those in r - π_R(r ⋈ s).
  - During merging, for every tuple t_r from r that does not match any tuple in s, output t_r padded with nulls.
  - Right outer-join and full outer-join can be computed similarly.
- Modifying hash join to compute r ⟕ s:
  - If r is the probe relation, output non-matching r tuples padded with nulls.
  - If r is the build relation, keep track while probing of which r tuples matched s tuples; at the end of s_i, output the non-matched r tuples padded with nulls.
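The probe-side case can be sketched as a hash-based left outer join: build an index on s, probe with r, and pad non-matching r tuples with nulls (None). `s_attrs` lists the s-side attributes to null-pad; the attribute names are illustrative:

```python
from collections import defaultdict

# Left outer join with r as the probe relation.
def hash_left_outer_join(r, s, key, s_attrs):
    index = defaultdict(list)          # in-memory hash index on s
    for ts in s:
        index[ts[key]].append(ts)
    result = []
    for tr in r:                       # probe with r
        matches = index.get(tr[key], [])
        for ts in matches:
            result.append({**tr, **ts})
        if not matches:                # non-participating tuple: pad with nulls
            result.append({**tr, **{a: None for a in s_attrs}})
    return result

r = [{"k": 1, "x": "a"}, {"k": 2, "x": "b"}]
s = [{"k": 1, "y": "c"}]
print(hash_left_outer_join(r, s, "k", ["y"]))
```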

Slide 29: Evaluation of Expressions
- So far we have seen algorithms for individual operations.
- Alternatives for evaluating an entire expression tree:
  - Materialization: generate the result of an expression whose inputs are relations or are already computed, materialize (store) it on disk, and repeat.
  - Pipelining: pass on tuples to parent operations even as an operation is being executed.
- We study these alternatives in more detail.

Slide 30: Materialization
- Materialized evaluation: evaluate one operation at a time, starting at the lowest level. Use intermediate results materialized into temporary relations to evaluate next-level operations.
- E.g., for σ_{SALARY < 50K}(π_SALARY(EMPLOYEE)), store the intermediate relation π_SALARY(EMPLOYEE).
- Materialized evaluation is always applicable.
- The cost of writing results to disk and reading them back can be quite high. Our cost formulas for operations ignore the cost of writing results to disk, so:
  - Overall cost = sum of the costs of the individual operations + cost of writing intermediate results to disk

Slide 31: Pipelining
- Pipelined (stream-based) evaluation: evaluate several operations simultaneously, passing the results of one operation on to the next.
- Much cheaper than materialization: no need to store a temporary relation to disk.
- Pipelining may not always be possible, e.g., for merge join or hash join: these always write intermediate results to disk and read them back.
- For pipelining to be effective, use evaluation algorithms that generate output tuples even as tuples are received for the operation's inputs.

Slide 32: Sorting
- We may build an index on the relation and then use the index to read the relation in sorted order. This may lead to one disk block access for each tuple.
- Q: Why sorting?
  - The user may want answers in some order, e.g., employees retrieved by increasing age
  - Duplicate elimination (projection)
  - Sort-merge join
- For relations that fit in memory, techniques like quicksort can be used.
- For relations that don't fit in memory, external sort-merge is a good choice.

Slide 33: External Sort-Merge
Let M denote the memory size (in pages).
1. Create sorted runs. Let i be 0 initially. Repeatedly do the following until the end of the relation:
   (a) Read M blocks of the relation into memory
   (b) Sort the in-memory blocks
   (c) Write the sorted data to run R_i; increment i
   Let the final value of i be N.
2. Merge the runs (N-way merge). We assume (for now) that N < M.
   (a) Use N blocks of memory to buffer input runs and 1 block to buffer output. Read the first block of each run into its buffer page.
   (b) Repeat:
       i. Select the first record (in sort order) among all buffer pages.
       ii. Write the record to the output buffer; if the output buffer is full, write it to disk.
       iii. Delete the record from its input buffer page; if the buffer page becomes empty, read the next block (if any) of the run into the buffer.
   (c) Until all input buffer pages are empty.
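The two phases can be simulated in memory: phase 1 sorts M blocks at a time into runs, and phase 2 performs the N-way merge, with `heapq.merge` playing the role of repeatedly selecting the smallest record across the run buffers (assuming N < M, so one merge pass suffices):

```python
import heapq

# In-memory simulation of external sort-merge over a blocked relation.
def external_sort_merge(blocks, M):
    runs = []
    for i in range(0, len(blocks), M):          # phase 1: create sorted runs
        run = sorted(t for b in blocks[i:i + M] for t in b)
        runs.append(run)
    return list(heapq.merge(*runs))             # phase 2: N-way merge

blocks = [[7, 3], [9, 1], [4, 8], [2, 6]]
print(external_sort_merge(blocks, 2))  # [1, 2, 3, 4, 6, 7, 8, 9]
```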

Slide 34: External Sort-Merge (Cont.)
- If N ≥ M, several merge passes are required.
  - In each pass, contiguous groups of M - 1 runs are merged.
  - A pass reduces the number of runs by a factor of M - 1 and creates runs longer by the same factor.
  - E.g., if M = 11 and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs.
  - Repeated passes are performed until all runs have been merged into one.

Slide 35: Example: External Sorting Using Sort-Merge
[Figure: run generation and merge passes on an example relation]

Slide 36: External Merge Sort (Cont.)
- Cost analysis:
  - Total number of merge passes required: ceil(log_{M-1}(b_r / M)).
  - Disk accesses for initial run creation, as well as in each merge pass, total 2 * b_r (each block is read once and written once).
  - Thus the total number of disk accesses for external sorting is: b_r * (2 * ceil(log_{M-1}(b_r / M)) + 2)
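The closing formula as a helper, with an illustrative set of numbers (b_r = 10,000 and M = 11 are assumptions, not from the slides):

```python
import math

# Total disk accesses: b_r * (2 * ceil(log_{M-1}(b_r / M)) + 2).
def external_sort_cost(b_r, M):
    merge_passes = math.ceil(math.log(b_r / M, M - 1))
    return b_r * (2 * merge_passes + 2)

# 10,000 blocks with M = 11: ceil(log_10(10000/11)) = 3 merge passes,
# so 10000 * (2*3 + 2) = 80000 disk accesses
print(external_sort_cost(10000, 11))
```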

