Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.


Review
- We studied relational algebra
  - Many equivalent queries produce the same result
  - Which expression is most efficient?
- We studied file organizations
  - Hash files, sorted files
  - Clustered and unclustered indexes
  - Sorting, searches, insert, delete

Problem Definition
- SQL is a declarative language
  - It describes the query result, but not how to get it
- Relational algebra describes how to get results
  - Many relational algebra queries are equivalent
- Questions:
  - How to choose the right one for an SQL query?
  - How does a DBMS proceed to execute a query?
- The DBMS generates a variety of possible plans (relational algebra queries) and finds the cheapest one to execute.

Review: Relational Algebra
- Selection (σ): selects a subset of rows from a relation (horizontal).
- Projection (π): retains only wanted columns from a relation (vertical).
- Cross-product (×): allows us to combine two relations.
- Set-difference (−): tuples in r1, but not in r2.
- Union (∪): tuples in r1 and/or in r2.
- Intersection (∩)
- Join (⋈)
- Division (/)

System Catalogs
- For each index:
  - structure (e.g., B+ tree) and search key fields
- For each relation:
  - name, file name, file structure (e.g., heap file)
  - attribute name and type, for each attribute
  - index name, for each index
  - integrity constraints
- For each view:
  - view name and definition
- Plus statistics, authorization, buffer pool size, etc.
- Note: catalogs are themselves stored as relations!

Statistics and Catalogs
- Need: information about relations and indexes.
- Catalogs typically contain at least:
  - number of tuples (NTuples) and number of pages (NPages) for each relation.
  - number of distinct key values (NKeys) and NPages for each index.
  - index height and low/high key values (Low/High) for each tree index.
- Catalogs are updated periodically.
  - Updated whenever many data changes have occurred.
  - Lots of approximation anyway, so slight inconsistency is OK.

Overview of Query Evaluation
- SQL queries are translated into extended relational algebra.
- Query plan reasoning: a tree of operators
  - with a processing algorithm per operator
  - several algorithms exist for each operator

Query Plan Evaluation

    SELECT sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5

[Figure: operator tree for the query. Leaves Reserves and Sailors feed a join on sid=sid; the selections bid=100 and rating > 5 and the projection on sname sit above the join.]

Overview of Query Evaluation
- Plan: tree of relational algebra operators, with a choice of algorithm for each operator.
- Two main issues in query optimization:
  - For a given query, what plans are considered?
    - Algorithm to search the plan space for the cheapest (estimated) plan.
  - How is the cost of a plan estimated?
    - Cost models based on I/O estimates.
- Ideally: want to find the best plan.
- Practically: avoid the worst plans!

Some Common Techniques
- Algorithms for evaluating relational operators use some simple ideas extensively:
  - Indexing: can use WHERE conditions to retrieve a small set of tuples (selections, joins).
  - Iteration: scan all tuples. Sometimes it is faster to scan all tuples even if there is an index. Sometimes we can scan the data entries in an index instead of the table itself.
  - Partitioning: by using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.
- Watch for these techniques as we discuss query evaluation!

Access Paths
- An access path is a method of retrieving tuples from a table.
- Methods:
  - File scan
  - Index that matches a selection (in the query)
- The choice of access path contributes significantly to the cost of a relational operator.
- Most selective access path: the index or file scan that we estimate will require the fewest page I/Os.

Access Path Method: Matching
- A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
- Example: given a tree index on <a, b, c>, does it match
  - selection a=5 AND b=3 ?
  - selection a=5 AND b>6 ?
  - selection b=3 ?
  - selection a=5 AND c>2 ?
  - selection b=6 AND c=2 ?

Access Path Method: Matching
- A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index.
- Example: given a hash index on <a, b, c>, does it match
  - selection a=5 AND b=3 AND c=5 ?
  - selection c=5 AND b=3 AND a=2 ?
  - selection a>5 AND b=3 AND c=5 ?
  - selection c=5 AND b=6 ?
  - selection a=5 ?
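The two matching rules can be checked mechanically. The sketch below is only an illustration: the function names are hypothetical, the search key <a, b, c> is assumed from the attributes used in the example selections, and a selection is reduced to the set of attributes its terms mention (for the hash index, only attributes that appear in an equality term).

```python
KEY = ("a", "b", "c")  # assumed search key for both example indexes

def tree_index_matches(search_key, term_attrs):
    """True if the attributes used by the terms form a prefix of the search key."""
    attrs = set(term_attrs)
    return len(attrs) > 0 and attrs == set(search_key[:len(attrs)])

def hash_index_matches(search_key, equality_attrs):
    """True if every search-key attribute has an `attribute = value` term."""
    return set(search_key) <= set(equality_attrs)

# Tree index examples from the slide
print(tree_index_matches(KEY, {"a", "b"}))   # a=5 AND b=3
print(tree_index_matches(KEY, {"b"}))        # b=3
print(tree_index_matches(KEY, {"a", "c"}))   # a=5 AND c>2

# Hash index examples: pass only attributes with an equality term
print(hash_index_matches(KEY, {"a", "b", "c"}))  # a=5 AND b=3 AND c=5
print(hash_index_matches(KEY, {"b", "c"}))       # a>5 AND b=3 AND c=5
```

The tree check tests the whole conjunction. When it fails for a selection such as a=5 AND c>2, the index may still be usable for the a=5 term alone, with c>2 applied to the retrieved tuples.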

Query Evaluation: In a Nutshell
- Choose an access path to get at each table.
- Evaluate different algorithms for each relational operator.
- Choose the order in which to apply the relational operators.
- Interrelate the above.

Evaluation of Relational Operators

Selection
- Case 2: no index, data sorted on R.attr
  - What is the most selective path?
  - Binary search for the first tuple meeting the criterion.
  - Scan R for all satisfying tuples. Left? Right? Both?
  - Cost: O(log2 M) for the binary search, plus the pages scanned for qualifying tuples.

Selection Using B+ Tree Index
- Case 3: B+ tree index
  - What is the most selective path?
  - Cost = Cost I (find qualifying data entries) + Cost II (retrieve records)
    - Cost I: depth of the B+ tree, usually 2-3 I/Os.
    - Cost II, clustered index: 1 I/O on average (depends on distribution).
    - Cost II, unclustered index: up to one I/O per qualifying tuple.

Example: B+ Tree Index for Selection

    SELECT * FROM Reserves R WHERE R.rname < 'C%'

- Assume a uniform distribution of names; about 10% of tuples qualify (100 pages, 10,000 tuples).
- Clustered index: little more than 100 I/Os.
- Unclustered index: up to 10,000 I/Os!

Selection: Improve Unclustered B+ Tree
- Key idea: avoid retrieving the same page multiple times.
  1. Find qualifying data entries in the index.
  2. Sort the rids of the data records to be retrieved.
  3. Fetch rids in order.
- However, the number of such pages is likely to be still higher than with clustering.
- Using an unclustered index for a range selection could be expensive. It may be simpler to just scan the data file.

Selection Using Hash Index
- A hash index is good for equality selection.
- Cost = Cost I (retrieve index bucket page) + Cost II (retrieve qualifying tuples from R)
  - Cost I is 1 I/O (on average).
  - Cost II could be up to one I/O per satisfying tuple.

A Note on Complex Selections
- Selection conditions are first converted to conjunctive normal form (CNF). For example,
  (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3
  becomes
  (day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3)
- We only discuss the case with no ORs.
- See the text if you are curious about the general case.
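A quick way to convince yourself that the CNF form selects the same tuples is to compare both expressions over every truth assignment. In this small sketch each argument stands for the truth value of one predicate on a tuple (day<8/9/94, rname='Paul', bid=5, sid=3); the function names are illustrative only.

```python
from itertools import product

def original(day, rname, bid, sid):
    # (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3
    return (day and rname) or bid or sid

def cnf(day, rname, bid, sid):
    # (day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3)
    return (day or bid or sid) and (rname or bid or sid)

# Compare on all 16 combinations of predicate truth values
equivalent = all(original(*v) == cnf(*v)
                 for v in product([False, True], repeat=4))
print(equivalent)
```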

Conjunction
- A condition with several predicates combined by conjunction (AND).
- Example: day<8/9/94 AND bid=5 AND sid=3.

Selection with Conjunction
First approach (utilizing a single index):
- Find the most selective access path and retrieve tuples using it.
  - This reduces the number of tuples retrieved.
- Apply any remaining terms that don't match the index.
  - This discards some retrieved tuples.
  - It does not affect the number of tuples/pages fetched.
- Example: consider day<8/9/94 AND bid=5 AND sid=3.
  - A B+ tree index on day can be used; then bid=5 and sid=3 must be checked for each retrieved tuple.
  - A hash index on <bid, sid> could be used; day<8/9/94 must then be checked on the fly.

Selection with Conjunction
Second approach (utilizing multiple indexes):
- Assume 2 or more matching indexes that use Alternative (2) or (3) for data entries.
  - Get sets of rids of data records using each matching index.
  - Then intersect these sets of rids.
  - Retrieve the records and apply any remaining terms.
- Example: consider day<8/9/94 AND bid=5 AND sid=3.
  - A B+ tree index I on day and an index II on sid, both Alternative (2).
  - Retrieve rids of records satisfying day<8/9/94 using index I.
  - Retrieve rids of records satisfying sid=3 using index II.
  - Intersect the rids.
  - Retrieve the records and check bid=5.
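The rid-intersection approach can be sketched in a few lines. Everything here is made up for illustration: the four records, the representation of an Alternative (2) index as a sorted list of (key, rid) pairs, and the reading of 8/9/94 as 9 August 1994.

```python
from datetime import date

# Hypothetical Reserves records, keyed by rid
records = {
    1: {"day": date(1994, 8, 1), "bid": 5, "sid": 3},
    2: {"day": date(1994, 8, 20), "bid": 5, "sid": 3},
    3: {"day": date(1994, 7, 4), "bid": 7, "sid": 3},
    4: {"day": date(1994, 6, 30), "bid": 5, "sid": 9},
}

# Alternative (2) style data entries: (key, rid) pairs, kept sorted by key
index_day = sorted((r["day"], rid) for rid, r in records.items())
index_sid = sorted((r["sid"], rid) for rid, r in records.items())

# Step 1: collect rids from each matching index
rids_day = {rid for day, rid in index_day if day < date(1994, 8, 9)}
rids_sid = {rid for sid, rid in index_sid if sid == 3}

# Step 2: intersect, then fetch in rid order to avoid re-reading pages
candidates = sorted(rids_day & rids_sid)

# Step 3: retrieve records and apply the remaining term bid=5
result = [rid for rid in candidates if records[rid]["bid"] == 5]
print(candidates, result)
```

Only the records whose rids survive the intersection are fetched, which is the point of the technique: the bid=5 check touches two records here rather than four.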

General Condition: Disjunction
- Disjunction condition: one or more terms (R.attr op value) connected by OR (∨).
- Example: (day<8/9/94) OR (bid=5 AND sid=3)

General Selection (Disjunction)
- Case 1: no index is available for one of the terms.
  - A file scan is needed. Check the other conditions during this file scan.
- E.g., consider day<8/9/94 OR rname='Joe'.
  - No index on day, so a file scan is needed.
  - Even if an index is available on rname, it does not help.

General Selection (Disjunction)
- Case 2: every term has a matching index.
  - Retrieve candidate tuples using each index.
  - Then union the results.
- Example: consider day<8/9/94 OR rname='Joe'.
  - Assume two B+ tree indexes, on day and on rname.
  - Retrieve tuples satisfying day<8/9/94.
  - Retrieve tuples satisfying rname='Joe'.
  - Union the retrieved tuples.

Evaluation of Projection

Projection

    SELECT DISTINCT R.sid, R.bid FROM Reserves R

- The expensive part is removing duplicates.
  - DBMSs don't remove duplicates unless the keyword DISTINCT is specified in a query.
- Sorting: sort on <sid, bid> and remove duplicates.
  - Can optimize this by dropping unwanted information while sorting.
- Hashing: hash on <sid, bid> to create partitions.
  - Load partitions into memory one at a time, build an in-memory hash structure, and eliminate duplicates.
- Index: with both R.sid and R.bid in the search key, it may be cheaper to sort the data entries!

Projection: Sorting Based

    SELECT DISTINCT R.sid, R.bid FROM Reserves R

- Modify Pass 0 of external sort to eliminate unwanted fields.
  - Runs of about 2B pages are produced.
  - Benefit: tuples in runs are smaller than input tuples. The size ratio depends on the number and size of fields that are dropped.
- Modify merging passes to eliminate duplicates.
  - The number of result tuples is smaller than the input. The difference depends on the number of duplicates.

Projection: Sorting Based

    SELECT DISTINCT R.sid, R.bid FROM Reserves R

- Cost:
  - In Pass 0, read the original relation: size M.
  - Write out the same number of smaller tuples, so the number of pages written is << M.
  - In merging passes, fewer tuples are written out in each pass.
- Example:
  - Using Reserves, 1000 input pages are reduced to 250 in Pass 0 if the size ratio is 0.25.

Projection Based on Hashing
- Partitioning phase (with B buffer pages):
  - Read R using one input buffer.
  - For each tuple, discard unwanted fields and apply hash function h1 to choose one of B-1 output buffers.
  - The result is B-1 partitions of tuples with no unwanted fields. Two tuples from different partitions are guaranteed to be distinct.
- Duplicate elimination phase:
  - For each partition, read it and build an in-memory hash table, using hash function h2 (<> h1) on all fields, while discarding duplicates.
  - If a partition does not fit in memory, apply the hash-based projection algorithm recursively to this partition.
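The two phases can be imitated in memory. This is a rough sketch under stated assumptions, not a disk-based implementation: partitions are Python lists, `hash(...) % num_partitions` plays the role of h1, and a Python `set` stands in for the in-memory hash table built with h2. The function name and sample tuples are hypothetical.

```python
def hash_project(tuples, wanted, num_partitions):
    """Project `tuples` (dicts) onto the `wanted` fields and remove duplicates."""
    # Partitioning phase: drop unwanted fields, route each tuple with h1
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        projected = tuple(t[f] for f in wanted)
        partitions[hash(projected) % num_partitions].append(projected)

    # Duplicate elimination phase: one partition at a time
    result = []
    for part in partitions:
        seen = set()  # in-memory hash table for this partition
        for p in part:
            if p not in seen:
                seen.add(p)
                result.append(p)
    return result

reserves = [
    {"sid": 28, "bid": 103, "day": "12/4/96"},
    {"sid": 28, "bid": 103, "day": "11/3/96"},
    {"sid": 31, "bid": 101, "day": "10/10/96"},
    {"sid": 31, "bid": 102, "day": "10/12/96"},
]
distinct = hash_project(reserves, ("sid", "bid"), num_partitions=3)
print(sorted(distinct))
```

Because identical projected tuples always hash to the same partition, duplicates never need to be compared across partitions.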

Projection Based on Hashing
- Cost: M + 2T, where T is the number of pages in all partitions (T << M).
- Why?
  - For partitioning, we read the entire R, hence M pages.
  - We write out each tuple, but with fewer fields, hence T pages.
  - We read all partitions in the next phase, an additional T pages.
- This assumes that each partition fits in memory. Not always true!

Discussion of Projection
- The sort-based approach is the standard.
  - Better handling of skew, and the result is sorted.
- If an index on the relation contains all wanted attributes in its search key, we can do an index-only scan.
  - Apply projection techniques to the data entries (much smaller!).
- If an ordered (i.e., tree) index contains all wanted attributes as a prefix of the search key, we can do even better:
  - Retrieve data entries in order (index-only scan), discard unwanted fields, and compare adjacent tuples to check for duplicates.

Evaluation of Joins

Equality Joins With One Join Column

    SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid

Typical Choices for Evaluating Joins
- Nested Loops Join
  - Simple Nested Loops Join: tuple-oriented
  - Simple Nested Loops Join: page-oriented
  - Block Nested Loops Join
  - Not covered!
  - Index Nested Loops Join
- Sort-Merge Join
- Hash Join

Simple Nested Loops Join
- Tuple-oriented algorithm: for each tuple in the outer relation R, we scan the inner relation S.

    foreach tuple r in R do
      foreach tuple s in S do
        if r_i == s_j then add <r, s> to result

- Cost: scan of outer + for each tuple of outer, a scan of the inner relation.
  - Cost = M + (p_R * M) * N
  - Cost = 1000 + 100*1000*500 I/Os = 50,001,000 I/Os.
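As a small illustration, the tuple-oriented loop and its cost formula can be written out directly. The function names are hypothetical; tuples are plain Python tuples and `i`, `j` are the positions of the join columns. The cost function just evaluates M + (p_R * M) * N for the Reserves/Sailors figures used on the slide.

```python
def simple_nested_loops_join(R, S, i, j):
    """For each tuple r in R, scan all of S and emit matching pairs."""
    result = []
    for r in R:
        for s in S:
            if r[i] == s[j]:
                result.append((r, s))
    return result

def simple_nl_cost(M, N, p_R):
    """Page I/Os: scan of outer, plus one scan of inner per outer tuple."""
    return M + (p_R * M) * N

pairs = simple_nested_loops_join([(28, 103), (31, 101)],
                                 [(28, "yuppy"), (44, "guppy")], 0, 0)
cost = simple_nl_cost(M=1000, N=500, p_R=100)
print(pairs, cost)
```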

Simple Nested Loops Join: Cost
[Figure: R ⋈ S]

Block Nested Loops Join
- One page as the input buffer for scanning the inner S.
- One page as the output buffer.
- Remaining pages hold a "block" of the outer R.
- For each matching tuple r in the R-block and s in the S-page, add <r, s> to the result.
- Then read the next R-block, scan S again, and so on.
- How to find matching tuples? Use in-memory hashing.

[Figure: R and S on disk; in memory, a hash table for a block of R (k < B-1 pages), an input buffer for S, and an output buffer; the join result R ⋈ S is written out.]

Cost of Block Nested Loops
- Cost: scan of outer + #outer blocks * scan of inner
- #outer blocks = ⌈(# of pages of outer) / (block size)⌉

Examples of Block Nested Loops
- Cost: scan of outer + #outer blocks * scan of inner
- With Reserves (R) as outer and 100 pages of R per block:
  - Cost of scanning R is 1000 I/Os; a total of 10 blocks.
  - Per block of R, we scan Sailors (S): 10*500 I/Os.
  - Total: 1000 + 10*500 = 6000 I/Os.
  - E.g., if a block is 90 pages of R, we would scan S 12 times.
- With a 100-page block of Sailors as outer:
  - Cost of scanning S is 500 I/Os; a total of 5 blocks.
  - Per block of S, we scan Reserves: 5*1000 I/Os.
  - Total: 500 + 5*1000 = 5500 I/Os.
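The block nested loops arithmetic is easy to reproduce. This sketch (the function name is illustrative) evaluates scan of outer + #outer blocks * scan of inner, taking the number of outer blocks as the outer page count divided by the block size, rounded up.

```python
def bnl_cost(outer_pages, inner_pages, block_size):
    """Page I/Os for block nested loops join."""
    outer_blocks = -(-outer_pages // block_size)  # integer ceiling division
    return outer_pages + outer_blocks * inner_pages

reserves_outer = bnl_cost(1000, 500, 100)     # 10 blocks of Reserves
reserves_outer_90 = bnl_cost(1000, 500, 90)   # 12 blocks of Reserves
sailors_outer = bnl_cost(500, 1000, 100)      # 5 blocks of Sailors
print(reserves_outer, reserves_outer_90, sailors_outer)
```

With the same block size, putting the smaller relation (Sailors) on the outside is slightly cheaper in this example.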

Examples of Block Nested Loops
- Optimizations?
  - With sequential reads considered, the analysis changes: it may be best to divide buffers evenly between R and S.
  - Double buffering would also be suitable.

Index Nested Loops Join
- There is an index on the join column of one relation.
  - Which is the inner relation?
  - Use S as inner and exploit the index.

    foreach tuple r in R do
      foreach tuple s in S where r_i == s_j do
        add <r, s> to result

- Cost:
  - Scan the outer relation R.
  - For each R tuple, add the cost of finding matching S tuples.
  - Cost: M + ((M * p_R) * cost of finding matching S tuples)

Index Nested Loops Join: Cost
- For each R tuple, the cost of probing the S index is on average:
  - about 1.2 pages for a hash index,
  - 2-4 pages for a B+ tree.
- The cost of retrieving S tuples (assuming Alt. (2) or (3) for data entries) depends on clustering:
  - Clustered: 1 I/O (typical).
  - Unclustered: up to 1 I/O per matching S tuple.

Examples of Index Nested Loops
- Hash index (Alt. 2) on sid of Sailors (as inner):
  - Scan Reserves: 1000 page I/Os; there are 100*1000 = 100,000 tuples.
  - For each Reserves tuple: 1.2 I/Os to get the data entry in the index (the index page), plus 1 I/O to get the (exactly one) matching Sailors tuple. Total: 100,000 * (1.2 + 1) = 220,000 I/Os.
  - In total: 1000 I/Os + 220,000 I/Os = 221,000 I/Os.

Examples of Index Nested Loops
- Hash index (Alt. 2) on sid of Reserves (as inner):
  - Scan Sailors: 500 page I/Os; 80*500 = 40,000 tuples.
  - For each Sailors tuple: 1.2 I/Os to find the index page with data entries for Reserves, plus the cost of retrieving the matching Reserves tuples.
  - Assuming a uniform distribution, there are 2.5 reservations per sailor (100,000 / 40,000). The cost of retrieving them is:
    - Clustered index: 1 page.
    - Unclustered index: up to 2.5 I/Os.
  - Total: between 500 + 40,000 * (1.2 + 1) = 88,500 and 500 + 40,000 * (1.2 + 2.5) = 148,500 I/Os.
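Both index nested loops examples follow the same formula, M + (M * p_R) * (probe cost + retrieval cost). The helper below is a hypothetical sketch that plugs in the slide figures; since the per-tuple costs are fractional, results are floating-point and are best read as approximate.

```python
def inl_cost(outer_pages, tuples_per_page, probe_cost, fetch_cost):
    """Page I/Os: scan the outer, then probe + fetch once per outer tuple."""
    outer_tuples = outer_pages * tuples_per_page
    return outer_pages + outer_tuples * (probe_cost + fetch_cost)

# Sailors as inner: hash probe 1.2 I/Os, exactly one matching tuple
sailors_inner = inl_cost(1000, 100, 1.2, 1)

# Reserves as inner: 2.5 matching tuples per sailor on average
reserves_inner_clustered = inl_cost(500, 80, 1.2, 1)
reserves_inner_unclustered = inl_cost(500, 80, 1.2, 2.5)

print(round(sailors_inner),              # about 221,000
      round(reserves_inner_clustered),   # about 88,500
      round(reserves_inner_unclustered)) # about 148,500
```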

Summary: Nested Loops Joins
- Assume: M pages in R, p_R tuples per page, N pages in S, p_S tuples per page, B buffer pages.
- Simple Nested Loops Join
  - Tuple-oriented: M + p_R * M * N
  - Page-oriented: M + M * N
  - Using the smaller relation as outer can give some improvement.
- Block Nested Loops Join
  - M + N * ⌈M/(B-2)⌉
  - Dividing the buffer evenly between R and S helps.
- Index Nested Loops Join
  - M + ((M * p_R) * cost of finding matching S tuples)
  - cost of finding matching S tuples = cost of index probe + cost of retrieving the tuples
- Discussion.

Sort-Merge Join (R ⋈ S on i=j)
1. Sort R and S on the join column.
2. Scan R and S to do a "merge" on the join column.
3. Output result tuples.
Recall external sort!

Example of Sort-Merge Join
Recall external sort!

Sort-Merge Join (R ⋈ S on i=j)
1. Sort R and S on the join column.
2. Scan R and S to do a "merge" on the join column.
3. Output result tuples.
- Merge on the join column:
  - Advance the scan of R until current R-tuple >= current S-tuple, then advance the scan of S until current S-tuple >= current R-tuple; do this until current R-tuple = current S-tuple.
  - At this point, all R tuples with the same value in Ri (current R group) and all S tuples with the same value in Sj (current S group) match, so output <r, s> for all pairs of such tuples.
  - Then resume scanning R and S (as above).
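The merge step described above can be sketched in memory as follows. This is an illustration only: Python's built-in sort replaces external sort, relations are lists of tuples, `i` and `j` are join column positions, and result tuples are formed by concatenating r and s. The sample data is made up.

```python
def sort_merge_join(R, S, i, j):
    """Join R and S on R[i] == S[j] by sorting both and merging."""
    R = sorted(R, key=lambda r: r[i])
    S = sorted(S, key=lambda s: s[j])
    result = []
    ri = si = 0
    while ri < len(R) and si < len(S):
        if R[ri][i] < S[si][j]:
            ri += 1                      # advance scan of R
        elif R[ri][i] > S[si][j]:
            si += 1                      # advance scan of S
        else:
            key = R[ri][i]
            # Find the end of the current R group and current S group
            r_end = ri
            while r_end < len(R) and R[r_end][i] == key:
                r_end += 1
            s_end = si
            while s_end < len(S) and S[s_end][j] == key:
                s_end += 1
            # Output all pairs from the two groups
            for r in R[ri:r_end]:
                for s in S[si:s_end]:
                    result.append(r + s)
            ri, si = r_end, s_end        # resume scanning after the groups
    return result

reserves = [(31, 101), (28, 103), (58, 103), (28, 104)]
sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber"), (44, "guppy")]
joined = sort_merge_join(reserves, sailors, 0, 0)
print(joined)
```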

Join: Sort-Merge (R ⋈ S on i=j)
- Note:
  - R is scanned once.
  - Each S group is scanned once per matching R tuple.
  - Multiple scans of an S group are likely to find the needed pages in the buffer.

Cost of Sort-Merge Join
- Cost of sort-merge:
  - Sort R
  - Sort S
  - Merge R and S

Example of Sort-Merge Join: Discussion
- Best case?
- Worst case?
- Average case?

Cost of Sort-Merge Join
- Best case cost: M + N
  - Both relations are already sorted.
  - Only the cost of scanning, M + N, remains.
- Worst case cost: M log M + N log N + (M + N)
  - Many pages of R fall in the same partition (at worst, all of them).
  - The pages for this partition in S don't fit into RAM, so S must be re-scanned.
  - Scanning S multiple times is expensive!
  - Could the scan cost be M*N?
- Note: a scan cost of M + N is guaranteed for a key-foreign key join, or when there are no duplicates.

Cost of Sort-Merge Join
- Average cost:
  - In practice, roughly linear in M and N.
  - So, O(M log M + N log N + (M + N)).

Comparison with Sort-Merge Join
- Average cost: O(M log M + N log N + (M + N))
- Assume B in {35, 100, 300}, R = 1,000 pages, S = 500 pages.
- Sort-Merge Join:
  - Both R and S can be sorted in 2 passes (why?), so the log M and log N factors (number of passes) are both 2.
  - Total join cost: 2*2*1000 + 2*2*500 + (1000 + 500) = 7,500 I/Os.
- Block Nested Loops Join: between 2,500 and 15,000 I/Os.
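To see why two passes suffice for every buffer size in the example, the sketch below counts external-sort passes under the usual simple model, which is an assumption here: Pass 0 produces ⌈pages/B⌉ sorted runs and each later pass merges up to B-1 runs. The function names are illustrative. Each pass reads and writes the relation once, giving the 2 * passes * pages term.

```python
def sort_passes(pages, B):
    """Number of passes of external merge sort with B buffer pages."""
    runs = -(-pages // B)            # Pass 0: ceil(pages / B) sorted runs
    passes = 1
    while runs > 1:
        runs = -(-runs // (B - 1))   # each merge pass combines B-1 runs
        passes += 1
    return passes

def sort_merge_cost(M, N, B):
    """Sort both relations, then one merging scan of M + N pages."""
    return 2 * M * sort_passes(M, B) + 2 * N * sort_passes(N, B) + (M + N)

for B in (35, 100, 300):
    print(B, sort_passes(1000, B), sort_passes(500, B),
          sort_merge_cost(1000, 500, B))
```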

Refinement of Sort-Merge Join
- Idea: combine the merging phases of sorting R (or S) with the merging in the join algorithm.
- If we perform Pass 0 of the sort on R, perform Pass 0 of the sort on S, and then merge and join on the fly, the total I/O cost for the join is 3(M + N).
- When is this possible? When all runs of both relations can be merged at once, with one buffer page left for output: M/B + N/B + 1 <= B. (This expression is modified from that in the book.)
- Cost: 3(M + N), made up of (read + write R and S in Pass 0) + (read R and S in the merging pass and join on the fly), plus the writing of result tuples.
- In the example, the cost goes down from 7500 to 4500 I/Os.

Hash-Join
- Partition both relations using the same hash function h: R tuples in partition i will only match S tuples in partition i.

[Figure: partitioning phase. The original relation is read from disk through one input buffer; hash function h routes tuples to B-1 output buffers (B main memory buffers in total), producing partitions 1, 2, ..., B-1 on disk.]

Hash-Join
- Read in a partition of R and hash it using h2 (<> h). Scan the matching partition of S and search for matches.

[Figure: probing phase. Partitions of R and S on disk; in the B main memory buffers, a hash table for partition Ri (k < B-1 pages) built with hash function h2, an input buffer for Si, and an output buffer; the join result is written to disk.]

Cost of Hash-Join
- In the partitioning phase, read + write both relations: 2(M + N).
- In the matching phase, read both relations: M + N.
- Total: 3(M + N).
  - E.g., a total of 4,500 I/Os in our running example.

Observation on Hash-Join
- Memory requirement: when is the total cost 3(M + N)?
  - When each partition fits into the available memory.
- Assume B buffer pages. The number of partitions k <= B-1 (why?); to minimize the size of each partition, we choose k = B-1.
- Assuming uniformly sized partitions and maximal k, we get k = B-1 and partition size M/(B-1), where M is the number of pages of R.
- An in-memory hash table speeds up the matching of tuples, but needs a little more memory: f * M/(B-1).
  - f is a fudge factor that captures the small increase in size between a partition and a hash table for that partition. (You can assume f = 1 unless explicitly specified.)
- The probing phase also needs one buffer for S and one for output, so hash join performs well (i.e., cost = 3(M + N)) when B >= f * M/(B-1) + 2. In other words, (B-1)(B-2) >= f * M.
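The memory condition and the cost formula can be checked with two one-line helpers. The names are hypothetical, and the check assumes uniformly sized partitions as on the slide.

```python
def hash_join_fits(M, B, f=1.0):
    """True if each of the B-1 partitions of R fits in memory during probing."""
    return (B - 1) * (B - 2) >= f * M

def hash_join_cost(M, N):
    """Partitioning reads and writes both relations; matching reads both."""
    return 2 * (M + N) + (M + N)

print(hash_join_fits(1000, 35))    # (34)(33) = 1122 >= 1000
print(hash_join_fits(1000, 30))    # (29)(28) = 812 < 1000
print(hash_join_cost(1000, 500))
```

When the check fails, the partitions overflow and the 3(M + N) figure no longer applies.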

Observation on Hash-Join
- Overflow
  - If the hash function does not partition uniformly, one or more R partitions may not fit in memory.
  - This significantly degrades performance.
  - We can apply the hash-join technique recursively to join this overflow R-partition with the corresponding S-partition.

Hybrid Hash-Join
- Idea: do not write one of the partitions of R and S to disk.
- When is it possible? When we can always keep one of the partitions of the smaller relation in memory:
  - B >= f * M/k (buffers for keeping a partition)
    + (k - 1) (1 buffer page for each of the remaining partitions)
    + 1 (1 buffer page for reading in S, or later R)
    + 1 (1 output page when reading in R)
  - Remember: k = number of partitions.
  - That is, B - (k + 1) >= f * M/k.
- Choose an appropriate k (number of partitions) accordingly.

Hybrid Hash-Join (contd)
- How to perform Hybrid Hash-Join?
- Partitioning S is done as follows:
  - Build an in-memory hash table for the first partition of S during the partitioning phase.
  - Each other partition keeps 1 page in the buffer and writes to disk when needed.
  - 1 buffer page is used for reading in S.
- Partitioning R is done as follows:
  - If a tuple hashes to the partition corresponding to the in-memory partition of S, then join and output tuples.
  - If a tuple hashes to any of the remaining (k - 1) partitions, write it to the buffer page (and write this buffer page to disk as needed).
  - 1 buffer page is used for reading in R; 1 buffer page for output.
- The remaining partitions of R and S are processed as usual.
- Saving: avoid writing the first partitions of R and S to disk.
- E.g., R = 500 pages, S = 1000 pages, B = 300 (we make 2 partitions):
  - Partition phase: scan R and write one partition out: 500 + 250.
  - Partition phase: scan S and write one partition out: 1000 + 500.
  - Probing phase: only the second partition is scanned: 250 + 500.
  - Total = 3000 (Hash Join will take 4500).
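The worked example can be reproduced with a small cost model. This is a sketch under the slide's assumptions (k uniformly sized partitions, one of which stays in memory); the function names are hypothetical, and M here denotes the pages of the relation whose partition is kept in memory.

```python
def hybrid_fits(M, B, k, f=1.0):
    """Memory check from the slide: B - (k + 1) >= f * M / k."""
    return B - (k + 1) >= f * M / k

def hybrid_hash_cost(M, N, k):
    """Page I/Os when 1 of k partitions of each relation never hits disk."""
    spilled_M = M * (k - 1) // k     # pages of the first relation written out
    spilled_N = N * (k - 1) // k     # pages of the second relation written out
    partition_phase = (M + spilled_M) + (N + spilled_N)
    probing_phase = spilled_M + spilled_N
    return partition_phase + probing_phase

print(hybrid_fits(500, 300, 2))        # 297 >= 250
print(hybrid_hash_cost(500, 1000, 2))  # (500+250) + (1000+500) + (250+500)
```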

Hash-Join vs. Block Nested Join
- If the hash table for the entire smaller relation fits in memory, the two are equal.
- Otherwise, Hash-Join is more effective.

[Figure: grid of R1..R5 against S1..S5, comparing Block Nested Join with Hash Join.]

Hash-Join vs. Sort-Merge Join

General Join Conditions
- Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname):
  - INL-Join: build an index on <sid, sname> (if S is inner), or use existing indexes on sid or sname.
  - SM-Join and H-Join: sort/partition on the combination of the two join columns.
- Inequality conditions (e.g., R.rname < S.sname):
  - INL-Join: needs a (clustered!) B+ tree index. Range probes on the inner; the number of matches is likely to be much higher than for equality joins.
  - Hash Join and Sort-Merge Join are not applicable.
  - Block NL is quite likely to be the best join method here.
  - (could be M*N; very unlikely!)

Relational Algebra Equivalences
- Allow us to choose different join orders and to "push" selections and projections ahead of joins.
- Selections:
  - σ_{c1 ∧ ... ∧ cn}(R) ≡ σ_{c1}(...(σ_{cn}(R))) (Cascade)
  - σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R)) (Commute)
- Projections:
  - π_{a1}(R) ≡ π_{a1}(...(π_{an}(R))) (Cascade)
- Joins:
  - R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T (Associative)
  - (R ⋈ S) ≡ (S ⋈ R) (Commute)
- Show that: R ⋈ (S ⋈ T) ≡ (T ⋈ R) ⋈ S

More Equivalences
- Commute projection and selection:
  - π_attr(σ_Cond(R)) ≡ σ_Cond(π_attr(R)), if attr ⊇ all attributes in Cond.
- A selection between attributes of the two arguments of a cross-product converts the cross-product to a join.
- A selection on just attributes of R commutes with R ⋈ S, i.e., σ(R ⋈ S) ≡ σ(R) ⋈ S.
- Similarly, if a projection follows a join R ⋈ S, we can "push" it by retaining only the attributes of R (and S) that are needed for the join or are kept by the projection.

Relational Query Optimization
- Briefly

Highlights of Query Optimizer
- Cost estimation: an approximate art at best.
  - Statistics, maintained in system catalogs, are used to estimate the cost of operations and result sizes.
  - Considers a combination of CPU and I/O costs.
- Plan space: too large, must be pruned.
  - Only the space of left-deep plans is considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation.
  - Cartesian products are avoided.

Cost Estimation
- For each plan considered, we must estimate its cost:
  - Must estimate the cost of each operation in the plan tree.
    - Depends on input cardinalities.
    - We've already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.).
  - Must also estimate the size of the result for each operation in the tree!
    - Use information about the input relations.
    - For selections and joins, assume independence of predicates.

Size Estimation and Reduction Factors
- Consider a query block:

    SELECT attribute list
    FROM relation list
    WHERE term1 AND ... AND termk

- The maximum number of tuples in the result is the product of the cardinalities of the relations in the FROM clause.
- The reduction factor (RF) associated with each term reflects the impact of the term in reducing the result size.
  - Result cardinality = max # tuples * product of all RFs.
  - Implicit assumption that terms are independent!
  - Term col=value has RF 1/NKeys(I), given index I on col.
  - Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2)).
  - Term col>value has RF (High(I)-value)/(High(I)-Low(I)).
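As a worked illustration, the reduction-factor formulas can be applied to the running query (R.sid=S.sid AND R.bid=100 AND S.rating>5). The statistics below are assumptions made for this sketch: 100,000 Reserves tuples and 40,000 Sailors tuples, 40,000 distinct sid values in both indexes, 100 distinct bid values, and ratings ranging from Low=1 to High=10. Exact fractions are used to avoid rounding noise; the function names are hypothetical.

```python
from fractions import Fraction
from math import prod

def rf_col_equals_value(nkeys):
    return Fraction(1, nkeys)

def rf_col_equals_col(nkeys1, nkeys2):
    return Fraction(1, max(nkeys1, nkeys2))

def rf_col_greater_than(value, low, high):
    return Fraction(high - value, high - low)

def estimate_cardinality(cardinalities, reduction_factors):
    """Max # tuples (product of cardinalities) times the product of all RFs."""
    return prod(cardinalities) * prod(reduction_factors)

rfs = [
    rf_col_equals_col(40_000, 40_000),   # R.sid = S.sid
    rf_col_equals_value(100),            # R.bid = 100
    rf_col_greater_than(5, 1, 10),       # S.rating > 5
]
estimate = estimate_cardinality([100_000, 40_000], rfs)
print(estimate, float(estimate))
```

The estimate relies on the independence assumption stated on the slide; correlated predicates would make it less accurate.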

Schema for Examples

    Sailors (sid: integer, sname: string, rating: integer, age: real)
    Reserves (sid: integer, bid: integer, day: dates, rname: string)

- Similar to the old schema; rname added for variations.
- Reserves:
  - Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
- Sailors:
  - Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

Motivating Example

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

[Figure: RA tree and plan. In both, Reserves and Sailors are joined on sid=sid, followed by the selections bid=100 and rating > 5 and the projection on sname. The plan annotates the join as Simple Nested Loops and the selections and projection as on-the-fly.]

- Cost: 500 + 500*1000 I/Os.
- By no means the worst plan!
- Misses several opportunities: selections could have been "pushed" earlier, no use is made of any available indexes, etc.
- Goal of optimization: to find more efficient plans that compute the same answer.

Alternative Plans 1 (No Indexes)

[Figure: plan tree. bid=100 is applied to Reserves (scan; write to temp T1) and rating > 5 to Sailors (scan; write to temp T2); T1 and T2 are joined on sid=sid by Sort-Merge Join; the projection on sname is done on-the-fly.]

- Main difference: push selects.
- With 5 buffers, the cost of the plan is:
  - Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats and a uniform distribution).
  - Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
  - Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250).
  - Total: 3560 page I/Os.
- If we used BNL join, join cost = 10 + 4*250, total cost = 2770.
- If we "push" projections, T1 has only sid, and T2 only sid and sname:
  - T1 fits in 3 pages, the cost of BNL drops to under 250 pages, total < 2000.
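The totals for this plan can be re-derived step by step. The snippet below only restates the slide's arithmetic; the variable names are illustrative, and the block size for BNL is taken as B-2 = 3 pages with 5 buffers.

```python
# Selections pushed below the join, results materialized
scan_reserves, t1_pages = 1000, 10
scan_sailors, t2_pages = 500, 250
materialize = (scan_reserves + t1_pages) + (scan_sailors + t2_pages)

# Sort-merge join on the temps: 2 passes for T1, 3 passes for T2
sort_t1 = 2 * 2 * t1_pages
sort_t2 = 2 * 3 * t2_pages
merge = t1_pages + t2_pages
sort_merge_total = materialize + sort_t1 + sort_t2 + merge

# Block nested loops instead, T1 as outer, blocks of 3 pages
outer_blocks = -(-t1_pages // 3)            # ceil(10 / 3) = 4
bnl_join = t1_pages + outer_blocks * t2_pages
bnl_total = materialize + bnl_join

print(sort_merge_total, bnl_join, bnl_total)
```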

Alternative Plans 2 (With Indexes)

[Figure: plan tree. bid=100 is applied to Reserves (use hash index; do not write result to temp); the result is joined with Sailors on sid=sid (Index Nested Loops, with pipelining); rating > 5 and the projection on sname are done on-the-fly.]

- With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
- INL with pipelining (outer is not materialized).
  - Projecting out unnecessary fields from the outer doesn't help.
- Join column sid is a key for Sailors.
  - At most one matching tuple, so an unclustered index on sid is OK.
- The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
- Cost: selection of Reserves tuples (10 I/Os); for each, we must get the matching Sailors tuple (1000*1.2); total 1210 I/Os.

Summary
- There are several alternative evaluation algorithms for each relational operator.
- A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
- We must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
- Two parts to optimizing a query:
  - Consider a set of alternative plans.
    - Must prune the search space; typically, left-deep plans only.
  - Must estimate the cost of each plan that is considered.
    - Must estimate the size of the result and the cost for each plan node.
    - Key issues: statistics, indexes, operator implementations.
