Advanced Databases: Lecture 2 Query Optimization (I) 1 Query Optimization (introduction to query processing) Advanced Databases By Dr. Akhtar Ali.

Presentation transcript:


What is Optimization
Best use of resources.
  – Good time management
  – Effective allocation of lecturers and labs to course units
An efficient solution to a problem.
  – Quick response to a user query
Lower cost.
  – Solar vs. nuclear vs. hydro-electric power
  – Minimum I/O, CPU cycles, and memory space

Query Optimization
A classical component of a DBMS.
Choosing the best composition of algebraic operators to answer a query.
  – A query (e.g. in SQL) may have several alternative representations in relational algebra.
  – The optimizer selects the best algebraic representation it can find.
Choosing an efficient, low-cost plan to answer a query.
  – One that takes less time to compute.
  – One with the least cost (in terms of I/Os).
Why Query Optimization?
  – To make query evaluation faster.
  – To reduce the response time of the query processor.
  – To let users write queries without knowing the physical access mechanisms and without having to tell the system how the queries should be evaluated.

Recommended Text
Database Management Systems, by R. Ramakrishnan, Chapters 12, 13 (copy provided)
Fundamentals of Database Systems, 3rd Edition, by R. Elmasri and S. B. Navathe, Chapter 18
An Introduction to Database Systems, 7th Edition, by C. J. Date, Chapter 17

Query Processing – the context

Example database schema
We will use the following schema throughout this lecture:
  Sailors(sid: integer, sname: string, rating: integer, age: real)
  Reserves(sid: integer, bid: integer, day: date, rname: string)
Consider the following statistics about the relations:
  – Each tuple of Reserves is 40 bytes long.
  – A data page can hold 100 Reserves tuples.
  – The size of the Reserves relation is 1000 pages.
  – Each tuple of Sailors is 50 bytes long.
  – A data page can hold 80 Sailors tuples.
  – The size of the Sailors relation is 500 pages.
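
The statistics above imply tuple counts for both relations. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative, and the tuple counts assume every page is completely full, which the slide does not state.

```python
# Statistics taken from the slide; names are illustrative only.
RESERVES = {"tuple_bytes": 40, "tuples_per_page": 100, "pages": 1000}
SAILORS = {"tuple_bytes": 50, "tuples_per_page": 80, "pages": 500}


def total_tuples(stats):
    """Upper bound on the number of tuples, assuming every page is full."""
    return stats["tuples_per_page"] * stats["pages"]


def bytes_used_per_page(stats):
    """Bytes of tuple data stored on one full page."""
    return stats["tuple_bytes"] * stats["tuples_per_page"]


print(total_tuples(RESERVES))         # 100000
print(total_tuples(SAILORS))          # 40000
print(bytes_used_per_page(RESERVES))  # 4000
print(bytes_used_per_page(SAILORS))   # 4000
```

Both relations use 4000 bytes of tuple data per page, so the two sets of statistics are consistent with a single page size.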

Translating SQL into Relational Algebra
Once an SQL query has been parsed and found to be syntactically correct, it is mapped onto a Relational Algebra (RA) expression.
The expression is usually shown as a query tree (read bottom up).
Consider the SQL query:
  SELECT S.sname
  FROM Reserves R, Sailors S
  WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
The same query in RA:
  π_sname(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors))
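
To make the RA expression concrete, here is a small sketch that evaluates join, then selection, then projection over in-memory rows. The sample rows and the helper names (`join`, `select`, `project`) are invented for illustration, the join is a plain nested loop chosen for brevity, and a real projection would also remove duplicates.

```python
# Toy rows, invented for illustration; only the schema comes from the lecture.
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 7, "age": 30.0},
    {"sid": 2, "sname": "Bob", "rating": 4, "age": 25.0},
    {"sid": 3, "sname": "Cy", "rating": 9, "age": 41.0},
]
reserves = [
    {"sid": 1, "bid": 100, "day": "2024-01-01", "rname": "Joe"},
    {"sid": 2, "bid": 100, "day": "2024-01-02", "rname": "Sue"},
    {"sid": 3, "bid": 101, "day": "2024-01-03", "rname": "Joe"},
]


def join(r_rows, s_rows, attr):
    """Nested-loop equijoin on attr; R./S. prefixes keep column names distinct."""
    out = []
    for r in r_rows:
        for s in s_rows:
            if r[attr] == s[attr]:
                row = {"R." + k: v for k, v in r.items()}
                row.update({"S." + k: v for k, v in s.items()})
                out.append(row)
    return out


def select(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [t for t in rows if predicate(t)]


def project(rows, attrs):
    """Keep only the listed attributes (duplicates are not removed here)."""
    return [{a: t[a] for a in attrs} for t in rows]


# Evaluate the tree bottom up: join, then selection, then projection.
joined = join(reserves, sailors, "sid")
selected = select(joined, lambda t: t["R.bid"] == 100 and t["S.rating"] > 5)
result = project(selected, ["S.sname"])
print(result)  # [{'S.sname': 'Ann'}]
```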

Implementation of Relational Operators
We will discuss how to implement:
  – Selection (σ): selects a subset of rows from a relation.
  – Projection (π): keeps only the required attributes and removes unwanted attributes from a relation.
  – Join (⋈): combines two relations.

Access Paths
There is usually more than one way to retrieve tuples from a relation when indexes are available and the query contains selection conditions.
A selection condition comes from a select or a join.
The alternative ways to retrieve tuples from a relation are called access paths.
An access path is either:
  – A file scan (when there is no selection condition or no index can be used).
  – An index plus a matching selection condition, for example attr op value, where op is a comparison operator (<, >, =) and there is an index available on attr.
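
The choice between access paths can be sketched as a small decision function. This is only an illustration: the function name, the string labels, and the preference for a hash index on equality are assumptions, not a rule taken from the lecture or from any particular DBMS.

```python
RANGE_OPS = {"<", "<=", "=", ">=", ">"}


def choose_access_path(op, indexes_on_attr):
    """Pick an access path for the condition `attr op value`.

    indexes_on_attr is the set of index kinds available on attr,
    e.g. {"hash", "btree"}. Assumption: a hash index only matches
    equality, while a B+ tree matches equality and range comparisons.
    """
    if op == "=" and "hash" in indexes_on_attr:
        return "hash index"
    if op in RANGE_OPS and "btree" in indexes_on_attr:
        return "B+ tree index"
    # No matching index (or no usable operator): fall back to a file scan.
    return "file scan"


print(choose_access_path("=", {"hash", "btree"}))  # hash index
print(choose_access_path(">", {"hash"}))           # file scan
print(choose_access_path("<", {"btree"}))          # B+ tree index
print(choose_access_path("=", set()))              # file scan
```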

Implementing the Selection Operator
The implementation depends on the available file organization, that is, whether we have:
  – No index, and the physical file for the relation is unsorted. This is the most expensive case.
  – No index, but the file is sorted on some attribute.
  – A B+ tree index.
  – A hash index.
The selection operator has a different cost in each of these cases, and that is the main thing to know.

Selection Operator: an Example Query
Consider the following query:
  SELECT * FROM Reserves WHERE rname = 'Joe'
Assume that 100 tuples qualify for the result, that is, 100 tuples have rname = 'Joe'.

Selection using no index and no sorting
For a general selection query σ_{R.attr op value}(R), we have to scan the entire file to find the qualifying tuples. Note that op can be <, >, =, <>, etc.
Each tuple is tested to see whether the condition (R.attr op value) holds. If it does, the tuple is added to the result.
The cost of this approach is M I/Os, where M is the number of pages in R.
For the example query, the cost is 1000 I/Os because the Reserves relation has 1000 pages.
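
A full file scan can be simulated by counting one I/O per page read. This is a sketch under stated assumptions: pages are modeled as Python lists of dictionaries, the generated data is invented, and `scan_select` is a hypothetical helper name.

```python
def scan_select(pages, predicate):
    """Return (qualifying tuples, page I/Os) for a full file scan."""
    result, ios = [], 0
    for page in pages:
        ios += 1  # every page is read exactly once
        result.extend(t for t in page if predicate(t))
    return result, ios


TUPLES_PER_PAGE, NUM_PAGES = 100, 1000

# Invented Reserves data: exactly 100 of the 100,000 tuples have rname 'Joe'.
pages = [
    [
        {
            "rid": p * TUPLES_PER_PAGE + i,
            "rname": "Joe" if (p * TUPLES_PER_PAGE + i) % 1000 == 0 else "Other",
        }
        for i in range(TUPLES_PER_PAGE)
    ]
    for p in range(NUM_PAGES)
]

matches, ios = scan_select(pages, lambda t: t["rname"] == "Joe")
print(len(matches), ios)  # 100 1000
```

The I/O count equals the number of pages regardless of how many tuples qualify, which is why this access path is the most expensive one for selective queries.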

Selection using sorting but no index
For a general selection query σ_{R.attr op value}(R), if R is physically sorted on R.attr, we use binary search to locate the first qualifying tuple.
We then scan forward, testing the condition on the tuples in each page and adding them to the result, until the condition fails to hold.
The cost of this approach is the cost of the binary search plus the number of pages read:
  – The cost of the binary search = log_2 M I/Os.
  – The cost of retrieving tuples = T I/Os, where T is the number of pages scanned to retrieve the qualifying tuples.
For the example query, the cost is computed as follows:
  – The binary search cost = log_2 1000 = log 1000 / log 2 ≈ 9.97 ≈ 10 I/Os.
  – Since there are 100 qualifying tuples and a page holds 100 Reserves tuples, 1 page holds them all, and scanning that page costs 1 I/O.
  – So the total cost is 10 + 1 = 11 I/Os.
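
The cost formula for a sorted file can be written as a small function so that other file sizes are easy to try. The function name and parameters are hypothetical, and the sketch makes the same simplifying assumption as the slide: the qualifying tuples are packed into as few pages as possible.

```python
import math


def sorted_file_cost(num_pages, qualifying_tuples, tuples_per_page):
    """Estimated I/Os for a selection on a file sorted on the selection attribute."""
    search_ios = math.ceil(math.log2(num_pages))  # binary search over pages
    scan_ios = math.ceil(qualifying_tuples / tuples_per_page)  # T pages read
    return search_ios + scan_ios


print(sorted_file_cost(1000, 100, 100))  # 11 (the Reserves example)
print(sorted_file_cost(500, 80, 80))     # 10 (a Sailors-sized file)
```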

B+ tree Index
A B+ tree index is a balanced tree in which the internal nodes (the top two levels) direct the search and the leaf nodes contain the data entries.
Searching for a record requires just a traversal from the root to the appropriate leaf node.
The length of the path from the root to a leaf is called the height of the tree (usually 2 or 3).
To search for entry 9*, we follow the leftmost child pointer from the root (as 9 6).
Once at the leaf node, data entries can be found sequentially.
Leaf nodes are linked to one another, which makes the index suitable for range queries.
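
The reason linked leaves suit range queries can be illustrated without building a full tree. In the sketch below a sorted Python list stands in for the chained leaf level and `bisect_left` stands in for the root-to-leaf descent; this is a simplification, not a B+ tree implementation, and the keys are invented.

```python
from bisect import bisect_left

# Stand-in for the leaf level: data entries in key order (invented keys).
leaf_entries = [2, 3, 5, 7, 9, 14, 16, 19, 20, 22]


def range_search(entries, low, high):
    """Return all keys k with low <= k <= high."""
    start = bisect_left(entries, low)  # locate the first qualifying entry
    result = []
    for key in entries[start:]:        # then follow the leaves sequentially
        if key > high:
            break                      # stop as soon as the condition fails
        result.append(key)
    return result


print(range_search(leaf_entries, 9, 19))  # [9, 14, 16, 19]
```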

Selection using B+ tree index
For a general selection query σ_{R.attr op value}(R), a B+ tree index is best when op is not equality (e.g. <, >). It is also good for the = operator.
We search the B+ tree to find the first page that contains a qualifying tuple. Assume that the tree index is clustered.
We then read all the pages that contain qualifying tuples.
The cost of this approach is the sum of the following:
  – The cost of identifying the starting page = 2 or 3 I/Os. We assume 2 I/Os throughout.
  – The cost of retrieving tuples = T I/Os, where T is the number of pages scanned to retrieve the qualifying tuples.
For the example query, the cost is computed as follows:
  – Since there are 100 qualifying tuples, 1 page holds them all, and scanning that page costs 1 I/O.
  – So the total cost is 2 + 1 = 3 I/Os.

Hash Index
A hash function is applied to the hash field value (key field) to get the address of the disk page in which the record is stored.
A bucket is a set of records.
The directory is an array of size n (4 in the figure); each element is a pointer to a bucket.
To search for a data entry:
  – The hash function is applied to the search field, and the last bits of its binary form are used to get a number between 0 and 3.
  – This number gives the array position that holds the pointer to the desired bucket.
  – To locate a record with key field 5 (binary 101), we look at directory element 01 and follow the pointer to the data page (Bucket B).
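
The directory lookup described above can be sketched in a few lines. Two things here are assumptions: the hash function is taken to be the identity on integer keys, and the bucket names other than Bucket B are invented, since the figure is not part of the transcript.

```python
DIRECTORY_BITS = 2  # a directory of size 4 needs the last 2 bits

# Directory slots 00, 01, 10, 11; only "Bucket B" at slot 01 is from the slide.
directory = ["Bucket A", "Bucket B", "Bucket C", "Bucket D"]


def directory_slot(key, hash_fn=lambda k: k):
    """Use the last DIRECTORY_BITS bits of the hash value as the slot index."""
    mask = (1 << DIRECTORY_BITS) - 1
    return hash_fn(key) & mask


slot = directory_slot(5)  # 5 is binary 101, so the last two bits are 01
print(format(slot, "02b"), directory[slot])  # 01 Bucket B
```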

Selection using Hash Index
For a general selection query σ_{R.attr op value}(R), a hash index is best when op is equality (=). It is not good for non-equality operators (e.g. <, >, <>).
We retrieve the index page that contains the rids (record identifiers) of the qualifying tuples. The pages that contain these tuples are then scanned.
The cost of this approach is the sum of the following:
  – The cost to retrieve the index page = 1 I/O.
  – The cost of retrieving tuples = T I/Os, where T is the number of pages scanned to retrieve the qualifying tuples.
  – For non-equality operators, T = the number of qualifying tuples.
For the example query, the cost is computed as follows:
  – Since there are 100 qualifying tuples, 1 page holds them all, and scanning that page costs 1 I/O.
  – So the total cost is 1 + 1 = 2 I/Os.
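
Putting the four estimates for the example query side by side shows how much the access path matters. The sketch below only restates the lecture's arithmetic; the variable names are illustrative, and it keeps the lecture's assumptions of 2 I/Os for the B+ tree descent, 1 I/O for the hash index page, and all 100 qualifying tuples on one page.

```python
import math

M = 1000               # pages in Reserves
TUPLES_PER_PAGE = 100  # Reserves tuples per page
QUALIFYING = 100       # tuples with rname = 'Joe'

# T: pages that hold the qualifying tuples when they are stored together.
T = math.ceil(QUALIFYING / TUPLES_PER_PAGE)

costs = {
    "file scan (unsorted, no index)": M,
    "sorted file (binary search)": math.ceil(math.log2(M)) + T,
    "clustered B+ tree index": 2 + T,
    "hash index": 1 + T,
}

# Print the access paths from cheapest to most expensive.
for path, ios in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{path}: {ios} I/Os")
```

The estimates come out as 2, 3, 11, and 1000 I/Os, matching the totals worked out on the individual slides.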

Summary of Lecture 7
Query Optimization
  – What and why
Query Processing
  – The various stages through which a query goes
Translation of SQL into Relational Algebra
  – Internal representation of the query
Access Paths
  – Different paths and ways to get the same data
Implementation of the Selection Operator
  – Different ways of evaluating selection using different access paths