CPSC 404, Laks V.S. Lakshmanan. Overview of Query Evaluation. Chapter 12, Ramakrishnan & Gehrke (Sections 12.1-12.3)



Slide 2: What will you learn from this lecture?
- Query evaluation basics
- System catalog (revisit)
- Generic techniques for operator evaluation
- Access paths
- Implementation choices for select/project

Slide 3: Some basics to be borne in mind
- Recall the 3 alternatives for data entry representation.
- An index may be clustered or unclustered.
- An index may be dense or sparse:
  – E.g., a students(ID, name, addr, dept) file sorted on ID, indexed on ID (alternative #2) and on dept (alternative #3). The ID index is sparse; the dept index is dense.
  – The index on ID is a primary index, while that on dept is a secondary index.
- A file may have multiple indexes.
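To make the sparse ID index concrete, here is a minimal sketch. It assumes the data file is a list of pages sorted on ID and that the sparse index keeps one entry per page (the smallest ID on that page). The page contents and the names `build_sparse_index` and `lookup_page` are hypothetical, not from the slides.

```python
from bisect import bisect_right

# Hypothetical data file: pages of student IDs, sorted on ID across pages.
pages = [[1, 3, 5], [8, 9, 12], [15, 20, 21]]


def build_sparse_index(data_pages):
    """One index entry per page: the smallest ID stored on that page."""
    return [page[0] for page in data_pages]


def lookup_page(data_pages, sparse_index, student_id):
    """Return the page number holding student_id, or None if it is absent."""
    # Last page whose smallest ID is <= student_id.
    pos = bisect_right(sparse_index, student_id) - 1
    if pos < 0:
        return None
    # The index only narrows the search to one page; the page itself
    # must still be searched.
    return pos if student_id in data_pages[pos] else None


sparse_index = build_sparse_index(pages)
print(sparse_index)
print(lookup_page(pages, sparse_index, 9))
print(lookup_page(pages, sparse_index, 4))
```

A sparse index of this kind only works because the file is sorted on ID. The dept index cannot be sparse, since records of one department are scattered across pages.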

Slide 4: Schema for Examples
Songs(sid: integer, sname: string, genre: string, year: date)
Ratings(uid: integer, sid: integer, time: date, rating: integer)
- Assume the following sizes:
- Songs:
  – Each tuple is 50 bytes long, 80 tuples per page, 500 pages. (= # index pages if it had a hash index with one bucket per page and data entries = alternative #1.) [What are we assuming here?]
- Ratings:
  – Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. (= # leaf pages if it had a B+tree index with data entries = alternative #1 and pages filled to capacity.)
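The table cardinalities used in later slides follow directly from these sizes. The short sketch below derives them, assuming every page is filled to capacity. The dictionary layout and the function name `cardinality` are illustrative only.

```python
# Page and per-page tuple counts taken from the schema slide.
TABLES = {
    "Songs": {"tuples_per_page": 80, "pages": 500},
    "Ratings": {"tuples_per_page": 100, "pages": 1000},
}


def cardinality(table):
    """Number of tuples, assuming every page is full."""
    t = TABLES[table]
    return t["tuples_per_page"] * t["pages"]


for name in TABLES:
    print(name, cardinality(name))
```

Under that assumption Songs holds 40,000 tuples, the figure used in the selection example, and Ratings holds 100,000.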

Slide 5: System catalog
- Maintains important metadata about all data in the db.
- What is data?
  – application data in tables
  – also all indexes for the tables
- system catalog = catalog tables = data dictionary = catalog.

Slide 6: System catalog
- What info. is maintained in the catalog?
- "Static" info.:
  – Table: table name, file name, file structure (e.g., sorted/heap file), attribute name(s) and type(s), index name(s), primary and foreign key constraints.
  – Index: index name and structure (e.g., hash vs. B+tree, clustered or not), search key attribute(s).
  – Auxiliary info. on views.

Slide 7: System catalog
- Statistical info.:
  – relation cardinality = #tuples
  – relation size = #pages
  – index cardinality = #distinct keys
  – index size = #index pages (leaf)
  – index height (only for tree indexes)
  – index range = min/max key values
- May maintain additional info., e.g., histograms.
- Most of the static info. is stored as a table (already discussed).

Slide 8: Generic Techniques for Operator Evaluation
- Indexing: use an index to examine only tuples satisfying the given selection/join condition. (Also called index probe.)
- Iteration: scan a table (i.e., data pages) to retrieve tuples satisfying the condition (table scan) OR scan index pages and examine the data entries therein (index scan).
  – When is the latter feasible? Why is it a good idea?
  – The actual index structure is NOT used. (I.e., index scan != index probe.)
- Partitioning: sorting and hashing are used for this purpose. (Commonly needed for group-bys, duplicate elimination, etc.)

Slide 9: Generic Techniques reviewed
- Table scan = read the whole table (usually, from disk).
- Index scan = *read* all data entries in the index file (makes sense for alternatives #2 and #3).
  – Have to read much less than for a table scan.
  – Can we use index scan for alternative #1?
  – What if we had alternative #2 but a sparse index?
- Index probe = *use* the index to home in on tuples satisfying the selection condition(s).
  – Typically the best option for equality selection.

Slide 10: Some important terms
- Access path = method for accessing tuples of a table:
  – table/index scan, OR
  – index plus matching selection condition (only consider conjunctions C of terms attr op value) → consider probing the index.
- When does an index match a selection condition C?
  – Hash index: for each search key attribute A, C has a conjunct of the form A = v.
  – B+tree index: for some prefix of the search key, for every attribute A in the prefix, C has a conjunct of the form A op v.
  – Why treat match differently? What does match do for us?

Slide 11: Access Paths
- Suppose we have a hash index on {uid, sid, time}. Does it match uid=1 & sid=2 & time=3? What about uid=1 & sid=2? And uid=1 & time=3? uid=1 & sid=2 & time>2?
- When a hash index matches a selection condition, we can fetch just those tuples that satisfy it.
- When the match (between the index and the selection condition) covers some (but not all) conjuncts in it, we can still fetch those tuples that satisfy all matched conjuncts and check them against the remaining conditions. Clustered vs. unclustered index makes a big difference!
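The hash-index matching rule can be tried mechanically on the four conditions from this slide. The sketch below is one possible encoding: each conjunct is an (attribute, operator, value) triple, and `hash_index_matches` is a hypothetical helper, not part of any DBMS API.

```python
def hash_index_matches(search_key, conjuncts):
    """A hash index matches only if every search key attribute
    appears in an equality conjunct of the condition."""
    eq_attrs = {attr for attr, op, _ in conjuncts if op == "="}
    return all(attr in eq_attrs for attr in search_key)


KEY = ["uid", "sid", "time"]

conditions = {
    "uid=1 & sid=2 & time=3": [("uid", "=", 1), ("sid", "=", 2), ("time", "=", 3)],
    "uid=1 & sid=2": [("uid", "=", 1), ("sid", "=", 2)],
    "uid=1 & time=3": [("uid", "=", 1), ("time", "=", 3)],
    "uid=1 & sid=2 & time>2": [("uid", "=", 1), ("sid", "=", 2), ("time", ">", 2)],
}

for text, conjuncts in conditions.items():
    print(text, "->", hash_index_matches(KEY, conjuncts))
```

Only the first condition matches: a hash function needs a value for every search key attribute, so a missing attribute or a range conjunct rules out a probe.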

Slide 12: Access Paths
- Suppose we have a B+tree index on (uid, sid, time). Does it match uid=1 & sid=2 & time>=3? What about uid=1 & sid=2? And uid=1 & time>=3?
- When there is a match, we can exploit the B+tree index to fetch tuples satisfying the matched conjuncts.
- If we had separate indexes on {uid, sid} and on {time}, what are our options for time>=3 & uid=1 & sid=2?
  – Use one of the indexes and verify the unmatched conditions.
  – Use both indexes and intersect the rid sets.
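A rough sketch of the two ideas on this slide: finding the matched prefix of a B+tree search key, and intersecting rid sets obtained from two indexes. Conjuncts are again (attribute, operator, value) triples. The set of allowed operators, the (page, slot) rid encoding, and both function names are assumptions made for illustration.

```python
COMPARISONS = {"=", "<", "<=", ">", ">="}  # assumed set of usable operators


def btree_matched_prefix(search_key, conjuncts):
    """Longest prefix of the search key in which every attribute
    has a conjunct of the form A op v."""
    attrs = {attr for attr, op, _ in conjuncts if op in COMPARISONS}
    prefix = []
    for attr in search_key:
        if attr not in attrs:
            break
        prefix.append(attr)
    return prefix  # empty prefix means the index does not match


def intersect_rids(rids_a, rids_b):
    """Rids returned by both index probes, in (page, slot) order."""
    return sorted(set(rids_a) & set(rids_b))


KEY = ["uid", "sid", "time"]
print(btree_matched_prefix(KEY, [("uid", "=", 1), ("sid", "=", 2), ("time", ">=", 3)]))
print(btree_matched_prefix(KEY, [("uid", "=", 1), ("sid", "=", 2)]))
print(btree_matched_prefix(KEY, [("uid", "=", 1), ("time", ">=", 3)]))

# Hypothetical rid sets from the {uid, sid} index and the {time} index.
print(intersect_rids([(1, 0), (2, 3), (5, 1)], [(2, 3), (5, 1), (7, 0)]))
```

In the third case only uid is matched, so time>=3 would have to be verified on the fetched tuples. Sorting the intersected rids by page is a common way to avoid fetching the same data page twice.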

Slide 13: Reduction Factor
- When there is a total match, we can fetch exactly the tuples that satisfy the given condition.
- If not, we can fetch the tuples satisfying the matched conjuncts.
- Reduction factor (RF): fraction of tuples in the table that satisfy a conjunct.
- The smaller the RF, the more selective the access path using the index.
- RF is often estimated using the independence assumption.
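Under the independence assumption, the RF of a conjunction is usually estimated as the product of the RFs of its conjuncts. The sketch below shows that arithmetic. The RF of 0.1 for the second conjunct is a made-up number, and the function names are illustrative.

```python
from math import prod


def combined_rf(rfs):
    """RF of a conjunction, assuming the conjuncts are independent."""
    return prod(rfs)


def estimated_result_size(cardinality, rfs):
    """Expected number of qualifying tuples."""
    return cardinality * combined_rf(rfs)


rf_first = 18 / 26   # e.g., a range conjunct covering 18 of 26 letters
rf_second = 0.1      # hypothetical value for an equality conjunct
print(combined_rf([rf_first, rf_second]))
print(estimated_result_size(40_000, [rf_first, rf_second]))
```

If the conjuncts are correlated, the product can be far off, which is one reason a catalog may keep histograms.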

Slide 14: Operator Algorithms, Take 1 (Selection)
σ_{A op val}(R)
- No index? Use a table scan (plan 1).
- If there are indexes, use one to fetch tuples satisfying the matched conjuncts and verify the rest (plan 2).
- E.g.: sname < 'S%' & year=2007 on Songs.
  – Approx. 70% RF for sname < 'S%' (uniformity assumption; extremely approximate for this example).
  – Clustered B+tree index on sname: 70% x 500 = 350 pages.
  – Unclustered B+tree index: 70% x 40,000 = 28,000 tuples, which could be 28,000 pages (I/Os) in the worst case!
  – Better off scanning the table when RF > 1.25% in this example!
- Can you construct a similar situation for Ratings?

Slide 15: Some explanations
- Where did RF = 70% come from? Assuming snames starting with each letter are equally likely, for sname < 'S%' there are 18 possible cases in range (A through R) out of a total of 26 possible cases (the 26 letters of the alphabet). This gives 18/26, which is approx. 70%.
- Why are we better off scanning the table when RF > 1.25%? Because 40,000 x 1.25% = 500 tuples, which with an unclustered index might in the worst case cost us 500 I/Os. A table scan of the 500 pages would cost us the same!
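The page counts in the selection example can be reproduced with a few lines of arithmetic. This is a sketch under the slides' assumptions: data-page I/Os only, index traversal cost ignored, and one I/O per tuple as the worst case for an unclustered index. The function names are hypothetical.

```python
def selection_costs(pages, tuples_per_page, rf):
    """Approximate data-page I/Os for each access path."""
    tuples = pages * tuples_per_page
    return {
        "table_scan": pages,
        # Clustered: qualifying tuples are stored contiguously.
        "clustered_index": round(rf * pages),
        # Unclustered worst case: every qualifying tuple on a different page.
        "unclustered_index_worst": round(rf * tuples),
    }


def break_even_rf(tuples_per_page):
    """RF at which the unclustered worst case equals a table scan."""
    return 1 / tuples_per_page


print(selection_costs(pages=500, tuples_per_page=80, rf=0.7))
print(break_even_rf(80))
```

For Songs the break-even RF is 1/80 = 1.25%, which is where the threshold on the slide comes from. Applying the same formula to Ratings (100 tuples per page) gives 1%.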

Slide 16: Projection
π_{uid,sid}(Ratings)
- Straightforward op; the key challenge is duplicate elimination (DE), which can be expensive.
- Avoid DE if you can (e.g., no DISTINCT in the select clause).
  – Simple table scan, or index scan when the index keys include the projected fields.
- DE: Plan 1, use partitioning:
  – Scan Ratings and write out (uid, sid) pairs; handshake (i.e., combine) this with phase 1 of external sorting.
  – Handshake the last pass of phase 2 with DE.
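Plan 1 can be sketched in memory as follows. This is only an illustration: a real system would use external sorting over disk pages, whereas here Python's built-in sort stands in for it. The sample records and the name `project_distinct_by_sorting` are made up.

```python
def project_distinct_by_sorting(records, fields):
    """Sort-based projection with duplicate elimination (in-memory sketch)."""
    # Step 1 (folded into phase 1 of the sort in a real system):
    # keep only the projected fields.
    projected = [tuple(r[f] for f in fields) for r in records]
    # Step 2: sort, so that duplicates become adjacent.
    projected.sort()
    # Step 3 (folded into the last merge pass in a real system):
    # drop each row equal to the previously emitted one.
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result


ratings = [
    {"uid": 2, "sid": 7, "time": 3, "rating": 4},
    {"uid": 1, "sid": 5, "time": 1, "rating": 5},
    {"uid": 2, "sid": 7, "time": 9, "rating": 2},
    {"uid": 1, "sid": 6, "time": 2, "rating": 3},
]
print(project_distinct_by_sorting(ratings, ["uid", "sid"]))
```

Dropping the unwanted fields before sorting matters in practice because it shrinks the data that the later passes have to read and write.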

Slide 17: Projection (contd.)
- Plan 2: If the index data entries include {uid, sid}, then sort data entries as opposed to records.
- Plan 3: Suppose there is a clustered index on (uid,sid, ); simply retrieve the data entries and do DE on the fly (since they are already sorted).
- Note: both plans above scan the index file rather than the data file! (But you look at the data entries in the index file.)
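Plan 3 amounts to a single pass over already-sorted data entries, emitting a (uid, sid) pair whenever it differs from the previous one. The generator below is a minimal sketch of that idea. The third component of each sample entry and the name `distinct_prefix` are assumptions for illustration.

```python
def distinct_prefix(sorted_entries, prefix_len=2):
    """Yield each distinct key prefix once, given entries sorted on the full key."""
    previous = None
    for entry in sorted_entries:
        key = entry[:prefix_len]
        if key != previous:   # duplicates are adjacent because input is sorted
            yield key
            previous = key


# Hypothetical data entries, sorted on a key that starts with (uid, sid).
entries = [(1, 5, 10), (1, 5, 12), (1, 6, 3), (2, 7, 1), (2, 7, 8)]
print(list(distinct_prefix(entries)))
```

No sort or hash table is needed here, so the memory requirement is constant: only the previous key is remembered.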