CPSC 404, Laks V.S. Lakshmanan
Overview of Query Evaluation
Chapter 12, Ramakrishnan & Gehrke (Sections 12.1-12.3)

1 Overview of Query Evaluation: Chapter 12, Ramakrishnan & Gehrke (Sections 12.1-12.3)

2 What will you learn from this lecture?
• Query evaluation basics
• System catalog (revisit)
• Generic techniques for operator evaluation
• Access paths
• Implementation choices for select/project

3 Some basics to be borne in mind
• Recall: 3 alternatives for data entry representation.
• An index may be clustered or unclustered.
• An index may be dense or sparse:
  – E.g., students(ID, name, addr, dept) file sorted on ID and indexed on ID (alternative #2) and on dept. (alternative #3). ID index: sparse; dept. index: dense.
  – The index on ID is a primary index, while the one on dept. is a secondary index.
• A file may have multiple indexes.

4 Schema for Examples
Songs (sid: integer, sname: string, genre: string, year: date)
Ratings (uid: integer, sid: integer, time: date, rating: integer)
• Assume the following sizes:
• Songs:
  – Each tuple is 50 bytes long, 80 tuples per page, 500 pages. (= # index pages if it had a hash index with bucket = page, and with data entries = alternative #1.) [What are we assuming here?]
• Ratings:
  – Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. (= # leaf pages if it had a B+tree index, with data entries = alternative #1, and pages filled to capacity.)
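The table cardinalities used in the later cost estimates follow directly from these sizes. A minimal sketch of the arithmetic; the dictionary layout and function name are illustrative assumptions, only the numbers come from the slide:

```python
# Sizes from the slide; the dict layout is just for illustration.
SONGS = {"tuples_per_page": 80, "pages": 500}
RATINGS = {"tuples_per_page": 100, "pages": 1000}

def cardinality(table):
    """Number of tuples = tuples per page x number of pages (pages assumed full)."""
    return table["tuples_per_page"] * table["pages"]

songs_tuples = cardinality(SONGS)      # 40,000 tuples
ratings_tuples = cardinality(RATINGS)  # 100,000 tuples
```

Note that this assumes every page is filled to capacity, which is one of the simplifications the slide asks about.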

5 System catalog
• Maintains important metadata about all data in the db.
• What is data?
  – application data in tables
  – also all indexes for the tables
• System catalog = catalog tables = data dictionary = catalog.

6 System catalog
• What info. is maintained in the catalog?
• "Static" info.:
  – for each table: table name, file name, file structure (e.g., sorted/heap file), attribute name(s) and type(s), index name(s), primary and foreign key constraints.
  – for each index: index name and structure (e.g., hash vs. B+tree, clustered or not), search key attribute(s).
  – auxiliary info. on views.

7 System catalog
• Statistical info. for each relation:
  – cardinality = #tuples
  – size = #pages
  – index cardinality = #distinct keys
  – index size = #index pages (leaf)
  – index height (only for tree indexes)
  – index range = min/max key values.
• May maintain additional info., e.g., histograms.
• Most of the static info. is stored as a table (already discussed).

8 Generic Techniques for Operator Evaluation
• Indexing: use an index to examine only tuples satisfying a given selection/join condition (also called index probe).
• Iteration: scan a table (i.e., data pages) to retrieve tuples satisfying the condition (table scan) OR scan index pages and examine the data entries therein (index scan).
  – When is the latter feasible? Why is it a good idea?
  – The actual index structure is NOT used (i.e., index scan != index probe).
• Partitioning: sorting and hashing are used for this purpose (commonly needed for group-bys, duplicate elimination, etc.).

9 Generic Techniques reviewed
• Table scan = read the whole table (usually, from disk).
• Index scan = *read* all data entries in the index file (makes sense for alternatives #2 and #3).
  – Have to read much less than for a table scan.
  – Can we use index scan for alternative #1?
  – What if we had alternative #2 but a sparse index?
• Index probe = *use* the index to home in on tuples satisfying the selection condition(s).
  – Typically, the best option for equality selection.

10 Some important terms
• Access path = method for accessing tuples of a table:
  – table/index scan, OR
  – index plus matching selection condition (only consider conjunctions C of terms attr op value) → consider probing the index.
• When does an index match a selection condition C?
  – Hash index: for each search key attribute A, C has a conjunct of the form A = v.
  – B+tree index: for some prefix of the search key, for every attribute A in the prefix, C has a conjunct of the form A op v.
  – Why treat match differently? What does match do for us?
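The two matching rules can be written down as small predicates. This is a sketch under stated assumptions: conjuncts are represented as (attribute, operator, value) triples, and the function names are made up for illustration; only the rules themselves come from the definitions above.

```python
# Conjuncts are (attribute, operator, value) triples; this representation is
# an assumption made for illustration only.
COMPARISONS = {"=", "<", "<=", ">", ">="}

def hash_index_matches(search_key, conjuncts):
    """True if every search key attribute has an equality conjunct A = v."""
    equality_attrs = {attr for attr, op, _ in conjuncts if op == "="}
    return all(attr in equality_attrs for attr in search_key)

def btree_matched_prefix(search_key, conjuncts):
    """Longest prefix of the search key whose attributes each have a conjunct A op v."""
    compared_attrs = {attr for attr, op, _ in conjuncts if op in COMPARISONS}
    prefix = []
    for attr in search_key:
        if attr not in compared_attrs:
            break
        prefix.append(attr)
    return prefix  # an empty list means the B+tree index does not match

key = ("uid", "sid", "time")
full_eq = [("uid", "=", 1), ("sid", "=", 2), ("time", "=", 3)]
partial = [("uid", "=", 1), ("time", ">=", 3)]

print(hash_index_matches(key, full_eq))    # True: equality on all three attributes
print(hash_index_matches(key, partial))    # False: no equality conjunct on sid or time
print(btree_matched_prefix(key, partial))  # ['uid']: the prefix stops at the missing sid
```

The asymmetry is visible in the return types: a hash index either matches or it does not, while a B+tree index can match on any non-empty prefix of its search key.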

11 Access Paths
• Suppose we have a hash index on {uid, sid, time}. Does it match uid=1 & sid=2 & time=3? What about uid=1 & sid=2? And uid=1 & time=3? uid=1 & sid=2 & time>2?
• When a hash index matches a selection condition, we can fetch just those tuples that satisfy it.
• When the match (between the index and selection condition) covers some (but not all) conjuncts in it, we can still fetch those tuples that satisfy all matched conjuncts and check them against the remaining conditions. Clustered vs. unclustered index makes a big difference!

12 Access Paths
• Suppose we have a B+tree index on (uid, sid, time). Does it match uid=1 & sid=2 & time>=3? What about uid=1 & sid=2? And uid=1 & time>=3?
• When there is a match, we can exploit the B+tree index to fetch tuples satisfying the matched conjuncts.
• If we had separate indexes on {uid, sid} and on {time}, what are our options for time>=3 & uid=1 & sid=2?
  – Use one of the indexes and verify the unmatched conditions.
  – Use both indexes and intersect rid sets.
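The two options for time>=3 & uid=1 & sid=2 can be compared on a toy example. Everything here is a hypothetical stand-in: the records, the rid format (page, slot), and the two probe functions, which simulate index probes by filtering an in-memory dictionary rather than walking a real index.

```python
# Hypothetical Ratings records keyed by rid = (page, slot);
# values are (uid, sid, time, rating) with time simplified to an integer.
records = {
    (0, 0): (1, 2, 5, 4),
    (0, 1): (1, 2, 1, 3),
    (1, 0): (1, 7, 9, 5),
    (1, 1): (3, 2, 4, 2),
}

def probe_uid_sid(uid, sid):
    """Stand-in for probing the index on {uid, sid}: returns the matching rids."""
    return {rid for rid, r in records.items() if r[0] == uid and r[1] == sid}

def probe_time_ge(t):
    """Stand-in for probing the index on {time}: returns rids with time >= t."""
    return {rid for rid, r in records.items() if r[2] >= t}

# Option 1: use one index, fetch those records, verify the unmatched conjunct.
option1 = sorted(rid for rid in probe_uid_sid(1, 2) if records[rid][2] >= 3)

# Option 2: use both indexes and intersect the rid sets before fetching anything.
option2 = sorted(probe_uid_sid(1, 2) & probe_time_ge(3))

print(option1, option2)  # both yield [(0, 0)]
```

Both options return the same tuples; they differ in how many data pages are fetched along the way, which is where clustering and the reduction factors come in.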

13 Reduction Factor
• When there is a total match, we can fetch exactly the tuples that satisfy the given condition.
• If not, we can fetch the tuples satisfying the matched conjuncts.
• Reduction factor (RF): fraction of tuples in the table that satisfy a conjunct.
• The smaller the RF ⇒ the more selective the access path using the index.
• The RF of a conjunction is often estimated using the independence assumption.
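Under the independence assumption, the RF of a conjunction is estimated as the product of the RFs of its conjuncts. A minimal sketch; the 0.7 figure is the lecture's estimate for sname < 'S%', while the 0.1 for year=2007 is a made-up value used purely for illustration:

```python
from math import prod

SONGS_TUPLES = 40_000  # 80 tuples/page x 500 pages

def combined_rf(rfs):
    """RF of a conjunction, assuming the conjuncts are statistically independent."""
    return prod(rfs)

rf_sname = 0.7  # estimate for sname < 'S%'
rf_year = 0.1   # hypothetical RF for year=2007 (not given in the lecture)

rf_both = combined_rf([rf_sname, rf_year])
estimated_tuples = round(SONGS_TUPLES * rf_both)  # about 2,800 tuples
```

If the attributes are correlated, this product can be far off, which is one reason catalogs may keep histograms.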

14 Operator Algorithms, Take 1 (Selection)
• σ_{A op val}(R).
  – No index? → table scan (plan 1).
  – If there are indexes, use one to fetch tuples satisfying the matched conjuncts and verify the rest (plan 2).
  – E.g.: sname < 'S%' & year=2007 on Songs.
    ◦ Approx. 70% RF for sname < 'S%' (uniformity assumption; extremely approximate for this example).
    ◦ Clustered B+tree index on sname: 70% x 500 = 350 pages.
    ◦ Unclustered B+tree index: 70% x 40,000 = 28,000 tuples, which could be 28,000 pages (I/Os) in the worst case!
  – Better off scanning the table when RF > 1.25% in this example!
    ◦ Can you construct a similar situation for Ratings?

15 Some explanations
• Where did RF = 70% come from? Assuming snames starting with each letter are equally likely, for sname < 'S%' there are 18 possible first letters in range (A through R) out of a total of 26 (the letters of the alphabet). This gives 18/26, which is approx. 70%.
• Why are we better off scanning the table when RF > 1.25%? Because 40,000 x 1.25% = 500 tuples, which with an unclustered index might cost us 500 I/Os in the worst case (one per tuple). A table scan costs the same 500 I/Os, so for any larger RF the unclustered index can only be worse.
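The numbers in this example can be reproduced with a few lines of arithmetic. This is a rough sketch that, like the lecture's estimate, counts only data page I/Os and ignores the cost of traversing the index itself; the function names are illustrative.

```python
SONGS_PAGES = 500
SONGS_TUPLES = 40_000

def table_scan_cost():
    """Plan 1: read every data page once."""
    return SONGS_PAGES

def clustered_index_cost(rf):
    """Matching tuples are stored contiguously, so roughly rf x #pages I/Os."""
    return round(rf * SONGS_PAGES)

def unclustered_index_worst_cost(rf):
    """Worst case: every matching tuple lives on a different page."""
    return round(rf * SONGS_TUPLES)

# RF at which the unclustered worst case equals a full table scan.
break_even_rf = SONGS_PAGES / SONGS_TUPLES

print(clustered_index_cost(0.7))          # 350
print(unclustered_index_worst_cost(0.7))  # 28000
print(break_even_rf)                      # 0.0125, i.e. 1.25%
```

The break-even RF is simply 1 / (tuples per page) = 1/80, so the same calculation for Ratings, with 100 tuples per page, gives 1%.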

16 Projection: π_{uid,sid}(Ratings)
• Straightforward op; key challenge is duplicate elimination (DE), which can be expensive.
• Avoid DE if you can (e.g., no DISTINCT in the select clause).
  – Simple table scan, or index scan when the index keys include the projected fields.
• DE: Plan 1, use partitioning:
  – Scan Ratings and write out (uid, sid) pairs; handshake with phase 1 of external sorting.
  – Handshake the last pass of phase 2 with DE.

17 Projection (contd.)
• Plan 2: If the index data entries include {uid, sid}, then sort the data entries as opposed to the records.
• Plan 3: Suppose there is a clustered index on (uid,sid, ); simply retrieve the data entries and do DE on the fly (since they are already sorted).
• Note: Plans 2 and 3 both scan the index file rather than the data file! (But you look at the data entries in the index file.)
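Sort-based duplicate elimination can be sketched in a few lines. This is an in-memory stand-in, not an external sort: a real system would spill sorted runs to disk and fold DE into the final merge pass, as Plan 1 describes. The sample tuples and function name are hypothetical.

```python
def project_distinct(ratings, field_positions=(0, 1)):
    """Project each record onto the given positions, sort, and drop adjacent duplicates."""
    projected = sorted(tuple(r[i] for i in field_positions) for r in ratings)
    result = []
    for pair in projected:
        # After sorting, duplicates are adjacent, so one comparison per tuple suffices.
        if not result or result[-1] != pair:
            result.append(pair)
    return result

# Hypothetical Ratings tuples: (uid, sid, time, rating)
sample = [
    (2, 10, 5, 4),
    (1, 10, 3, 5),
    (2, 10, 8, 2),  # same (uid, sid) as the first tuple
    (1, 11, 1, 3),
]

print(project_distinct(sample))  # [(1, 10), (1, 11), (2, 10)]
```

The single pass over sorted input is the same "DE on the fly" step that Plan 3 gets for free when a clustered index already delivers the data entries in (uid, sid) order.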

