3 Query Execution Plans Query Plan: logical plan (declarative) manfSELECT B. manfFROM Beers B, Sells SWHERE B.name=S.beer ANDS.price < 20price < 20( nested loops)(Table scan)(Index scan)name=beerQuery Plan:logical plan (declarative)physical plan (procedural)– procedural implementation of each logical operator– scheduling of operationsBeersSells
4 Logical versus physical operators Logical operatorsRelational Algebra OperatorsJoin, Selection, Projection, Union, …Physical operatorsAlgorithms to implement logical operators.Hash join, nested loop join, …More than one physical operator for each logical operator
5 Communication between operators: iterator model Each physical operator implements three functions:Open: initializes the data structures.GetNext: returns the next tuple in the result.Close: ends the operation and frees the resources.It enables pipelining
6 Physical operators Logical operator: selection read the entire or selected tuples of relation R.tuples satisfy some predicateTable-scan: R resides in the secondary storage, read its blocks one by one.Index-scan: If there is an index on R, use the index to find the blocks.more efficientOther operators for join, union, group by, ...join is the most important one.focus of our lecture
7 Both relations fit in main memory Internal memory join algorithmsNested-loop join: check for every record in R and every record in S; time = O(|R||S|)Sort-merge join: sort R, S followed by merging;time = O(|S|*log|S|) (if |R|<|S|)Hash join: build a hash table for R; for every record in S, probe the hash table; time =O(|S|)
8 External memory join algorithms At least one relation does not fit into main memoryI/O access is the dominant costB(R): number of blocks of R.|R| or T(R) : number of tuples in R.Memory requirementM: number of blocks that fit in main memoryExample: internal memory join algorithms : B(R) + B(S)We do not consider the cost of writing the output.The results may be pipelined and never written to disk.
9 Nested-loop join of R and S For each block of R, and for each tuple r in the block:For each block of S, and for each tuple s in the block:Output rs if join condition evaluates to true over r and sR is called the outer table; S is called the inner tablecost: B(R) + |R| · B(S)Memory requirement: 4 (if double buffering is used)block-based nested-loop join- For each block of R, and for each block of S:For each r in the R block, and for each s in the S block: …cost: B(R) + B(R) · B(S)Memory requirement: 4 (if double buffering is used)
10 Improving nested-loop join Use up the available memory buffers MRead M - 2 blocks from RRead blocks of S one by one and join its tuples with R tuples in main memoryCost: B(R) + [ B(R) / (M – 2) ] B(S)almost B(R) B(S) / MMemory requirement: M
11 Index-based (zig-zag) join Join R and S on R.A = S.BUse ordered indexes over R.A and S.B to join the relations.B+ treeUse current indexes or build new ones.Similar to sort-merge join without sorting step.Use the larger value to probe the other index
12 Two Pass, multi-way merge sort Problem: sort relation R that does not fit in main memoryPhase 1: Read R in groups of M blocks, sort, and write them as runs of size M on disk.Main memoryDiskRelation Rruns12M-1. . .B(R)M Buffers. . .. . .
13 Two Pass, multi-way merge sort Phase 2: Merge M – 1 blocks at a time and write the results to disk.Read one block from each run.Keep one block for the output.Relation R (sorted)Diskruns12M-1M-1 Buffers1Output buffer2. . .. . .B(R)DiskMain Memory
14 Two pass, multi-way merge Sort Cost: 2B(R) in the first pass + B(R) in the second pass.Memory requirement: MB(R) <= M (M – 1) or simply B(R) <= M2How can we improve this bound?
15 General multi-way merge sort Pass 0: read M blocks of R at a time, sort them, and write out a level-0 runThere are [B(R) / M ] level-0 sorted runsPass i: merge (M – 1) level-(i-1) runs at a time, and write out a level-i run(M – 1) memory blocks for input, 1 to buffer output# of level-i runs = # of level-(i–1) runs / (M – 1)Final pass produces 1 sorted run
17 Analysis of multi-way merge sort Number of passes:cost#passes · 2 · B(R): each pass reads the entire relation once and writes it onceSubtract B(R) for the final passSimply O( B(R) · log M B(R) )Memory requirement: M
18 Sort-merge join algorithm Sort R and S according to the join attribute, then merge themr, s = the first tuples in sorted R and SRepeat until one of R and S is exhausted:If r.A > s.B then s = next tuple in Selse if r.A < s.B then r = next tuple in Relse output all matching tuples, andr, s = next in R and SCostsorting + 2 B(R)+ 2 B(S) (join on foreign key primary key)B(R) B(S) if everything joinsMemory Requirement: B(R) <= M2 , B(S) <= M2
19 Optimized sort-merge join algorithm Combine join with the merge phase of sortSort R and S in M runs (overall) of size M on disk.Merge and join the tuples in one pass.DiskRuns of R and SRSmerge. . .joinmerge. . .Main Memory
20 Optimized two-pass sort-merge join algorithm Cost: 3B(R) + 3B(S)Memory Requirement: B(R) + B(S) <= M2because we merge them in one passMore efficient but more strict requirement.
21 Hash join algorithmIdea: partition each relation into buckets by hashing their join attributes and consider corresponding buckets from R and S.If tuples of R and S are not assigned to corresponding buckets, they do not joinMain memoryDiskRelation RBufferhashfunctionBuckets12M-1. . .B(R)Buffers. . .. . .
22 (Partitioned) Hash join or R and S Step 1:Hash S into M bucketssend all buckets to diskStep 2Hash R into M bucketsSend all buckets to diskStep 3Join corresponding bucketsIf tuples of R and S are not assigned to corresponding buckets, they do not join
23 bucket Si ( < M-1 pages) Hash JoinM main memory buffersDiskOriginalRelationOUTPUT2INPUT1hashfunctionhM-1Buckets. . .Partition both relations using hash fn h: R tuples in partition i will only match S tuples in partition i.Bucketsof R & SInput bufferfor RiHash table forbucket Si ( < M-1 pages)M main memory buffersDiskOutputbufferJoin ResultRead in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, search for matches.hashfn h2h214
24 Bucket Si ( < M-1 pages) Hash joinCost: 3 B(R) + 3 B(S).Memory Requirement:The smaller bucket must fit in main memory.Let min( B(R), B(S)) = B(S)B(S) / (M – 1) <= M, roughly B(S) <= M2Bucketsof R & SInput bufferfor RiBucket Si ( < M-1 pages)M main memory buffersDiskOutputbufferJoin Result24
25 Hybrid Hash joinWhen partitioning S, keep the records of the first bucket in memory as a hash table;When partitioning R, for records of the first bucket, probe the hash table directly;Saving: no need to write R1 and S1 to disk or read them back to memory.
26 Handle partition overflow Overflow on disk: an S partition is larger than memory size (note: don’t care about the size of S partitions)Solution: recursive partition.Overflow in memory: the in-memory hash table of S becomes too large.Solution: revise the partitioning scheme and keep a smaller partition in memory.
27 Hash-based versus sort-based join Hash join need smaller amount of main memorysqrt (min(B(R), B(S))) < sqrt (B(R) + B(S) )Hash join wins if the relations have different sizesHash join performance depends on the quality of hashingIt may be hard to generate balanced buckets for hash joinSort-based join wins if the relations are in sorted orderSort-based join generates sorted resultsuseful when there is Order By in the queryuseful the following operators need sorted inputSort-based join can handle inequality join predicates
28 Duality of Sort and Hash Divide-and-conquer paradigmSorting: physical division, logical combinationHashing: logical division, physical combinationHandling very large inputsSorting: multi-level mergeHashing: recursive partitioning
29 What you should knowHow nested-loop, zig-zag, sort-merge, and hash join processing algorithms workHow to compute their costs and memory requirementsWhat are their advantages and disadvantages
30 Carry Away MessagesOld conclusions/assumptions regularly need to be re-examined because of the change in the worldSystem R abandoned hash join, but the availability of large main memories changed the storyObserve changes in the world and question some relevant assumptionsIn general, changes in the world bring opportunities for innovations, so be alert about any changesHow have the needs for data management changed over time?Have the technologies for data management been tracking such changes?