CS263 Lecture 19 Query Optimisation

LECTURE PLAN
- Motivation for Query Optimisation
- Phases of Query Processing
- Query Trees
- RA Transformation Rules
- Heuristic Processing Strategies
- Cost Estimation for RA Operations

Motivation for Query Optimisation
List all the managers that work in the sales department.
SELECT * FROM emp, dept WHERE emp.deptno = dept.deptno AND emp.job = 'Manager' AND dept.name = 'Sales';
There are at least three alternative ways of representing this query as a Relational Algebra expression:
1. σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}(EMP × DEPT)
2. σ_{(job = 'Manager') ∧ (name = 'Sales')}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)
3. (σ_{job = 'Manager'}(EMP)) ⋈_{emp.deptno = dept.deptno} (σ_{name = 'Sales'}(DEPT))

Motivation for Query Optimisation
σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}(EMP × DEPT)
Metrics:
- 1000 tuples in the EMP relation
- 50 tuples in the DEPT relation
- 50 employees are Managers (one per department)
- 5 separate Sales departments (across the country)
Cost of processing this query alternative:
Cartesian product of EMP and DEPT:
  (1000 + 50) record I/Os to read the relations
  + (1000 * 50) record I/Os to create an intermediate relation to store the result
Selection on the result of the Cartesian product:
  (1000 * 50) record I/Os to read the tuples and compare them against the predicate
Total cost of the query: (1000 + 50) + 2*(1000 * 50) = 101,050 record I/Os.

Motivation for Query Optimisation
σ_{(job = 'Manager') ∧ (name = 'Sales')}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)
Metrics:
- 1000 tuples in the EMP relation
- 50 tuples in the DEPT relation
- 50 employees are Managers (one per department)
- 5 separate Sales departments (across the country)
Cost of processing this query alternative:
Join of EMP and DEPT over deptno:
  (1000 + 50) record I/Os to read the relations
  + (1000) record I/Os to create an intermediate relation to store the join result
Selection on the result of the join:
  (1000) record I/Os to read each tuple and compare it against the predicate
Total cost of the query: (1000 + 50) + 2*(1000) = 3,050 record I/Os.

Motivation for Query Optimisation
(σ_{job = 'Manager'}(EMP)) ⋈_{emp.deptno = dept.deptno} (σ_{name = 'Sales'}(DEPT))
Cost of processing this query alternative:
Select 'Managers' in EMP:
  (1000) record I/Os to read the relation
  + (50) record I/Os to create an intermediate relation to store the select result
Select 'Sales' in DEPT:
  (50) record I/Os to read the relation
  + (5) record I/Os to create an intermediate relation to store the select result
Join of the previous two selections over deptno:
  (50 + 5) record I/Os to read the relations
Total cost of the query: 1000 + 2*(50) + 5 + (50 + 5) = 1,160 record I/Os.
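The three cost calculations above can be reproduced with a few lines of arithmetic. The sketch below uses only the metrics given on the slides; the function and constant names are illustrative assumptions, and the cost model is the same simple "count record I/Os" model used in the slides, not a real optimiser's model.

```python
# Metrics taken from the slides.
EMP_TUPLES = 1000    # tuples in EMP
DEPT_TUPLES = 50     # tuples in DEPT
MANAGERS = 50        # EMP tuples with job = 'Manager'
SALES_DEPTS = 5      # DEPT tuples with name = 'Sales'


def cost_product_then_select() -> int:
    """Cartesian product first, then one selection over the product."""
    read_inputs = EMP_TUPLES + DEPT_TUPLES
    write_product = EMP_TUPLES * DEPT_TUPLES
    read_product = EMP_TUPLES * DEPT_TUPLES
    return read_inputs + write_product + read_product


def cost_join_then_select() -> int:
    """Join on deptno first, then select on job and name."""
    read_inputs = EMP_TUPLES + DEPT_TUPLES
    write_join = EMP_TUPLES   # the slides assume 1000 tuples in the join result
    read_join = EMP_TUPLES
    return read_inputs + write_join + read_join


def cost_select_then_join() -> int:
    """Select on each relation first, then join the two small results."""
    select_emp = EMP_TUPLES + MANAGERS        # read EMP, write managers
    select_dept = DEPT_TUPLES + SALES_DEPTS   # read DEPT, write sales depts
    join_inputs = MANAGERS + SALES_DEPTS      # read both intermediate results
    return select_emp + select_dept + join_inputs


for strategy in (cost_product_then_select,
                 cost_join_then_select,
                 cost_select_then_join):
    print(f"{strategy.__name__}: {strategy():,} record I/Os")
```

Running the sketch gives 101,050, 3,050 and 1,160 record I/Os, so the third strategy is roughly 87 times cheaper than the first under this cost model.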

Phases of Query Processing

Query Processing Stage - 1
Cast the query into internal form
- This involves the conversion of the original (SQL) query into some internal representation more suitable for machine manipulation.
- The internal representation typically chosen is either some kind of 'abstract syntax tree', or a relational algebra 'query tree'.

Relational Algebra Query Trees
A Relational Algebra query can be represented as a 'query tree'. For example, the query to list all the managers that work in the sales department could be described as one of the following:
σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}(EMP × DEPT)
Query tree:
- Root: σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}
- Intermediate operations: ×
- Leaves: EMP, DEPT

Relational Algebra Query Trees
Alternative 'query tree' for the query to list all the managers that work in the sales department:
σ_{(job = 'Manager') ∧ (name = 'Sales')}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)
Query tree:
- Root: σ_{(job = 'Manager') ∧ (name = 'Sales')}
- Intermediate operations: ⋈_{emp.deptno = dept.deptno}
- Leaves: EMP, DEPT

Relational Algebra Query Trees
Alternative 'query tree' for the query to list all the managers that work in the sales department:
(σ_{job = 'Manager'}(EMP)) ⋈_{emp.deptno = dept.deptno} (σ_{name = 'Sales'}(DEPT))
Query tree:
- Root: ⋈_{emp.deptno = dept.deptno}
- Intermediate operations: σ_{job = 'Manager'} (over EMP), σ_{name = 'Sales'} (over DEPT)
- Leaves: EMP, DEPT
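One way to hold such query trees in memory is a small recursive node type. The sketch below is an illustrative assumption, not the representation used by any particular DBMS: the `Node` class and the `render` and `leaves` helpers are hypothetical names, and operators are stored as plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a query tree: an operator, or a base relation at a leaf."""
    op: str
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> str:
    """Return the tree as indented text, root first."""
    lines = ["  " * depth + node.op]
    for child in node.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


def leaves(node: Node) -> List[str]:
    """Collect the base relations at the leaves, left to right."""
    if not node.children:
        return [node.op]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


emp, dept = Node("EMP"), Node("DEPT")

# Selection at the root, Cartesian product below it.
product_tree = Node(
    "σ job='Manager' ∧ name='Sales' ∧ emp.deptno=dept.deptno",
    [Node("×", [emp, dept])],
)

# Join at the root, one selection above each leaf.
pushed_tree = Node(
    "⋈ emp.deptno=dept.deptno",
    [Node("σ job='Manager'", [emp]), Node("σ name='Sales'", [dept])],
)

print(render(product_tree))
print(render(pushed_tree))
```

Both trees have the same leaves (EMP and DEPT); only the operators between the leaves and the root differ, which is what the later transformation rules manipulate.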

Query Processing Stage - 2
Convert to canonical form
- Find a more 'efficient' representation of the query by converting the internal representation into some equivalent (canonical) form through the application of a set of well-defined 'transformation rules'.
- The set of transformation rules to apply will generally be the result of the application of specific heuristic processing strategies associated with particular DBMSs.

Transformation Rules for RA Operations
1. Conjunctive selection operations can cascade into individual selection operations (and vice versa). Sometimes referred to as cascade of selection.
σ_{p ∧ q ∧ r}(R) = σ_p(σ_q(σ_r(R)))
Example: σ_{deptno=10 ∧ sal>1000}(Emp) = σ_{deptno=10}(σ_{sal>1000}(Emp))

Transformation Rules for RA Operations
2. Commutativity of selection.
σ_p(σ_q(R)) = σ_q(σ_p(R))
Example: σ_{sal>1000}(σ_{deptno=10}(Emp)) = σ_{deptno=10}(σ_{sal>1000}(Emp))
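Rules 1 and 2 can be checked on a small in-memory relation. This is a minimal sketch: the rows are invented for illustration, and `select` is a hypothetical helper that models selection as a list filter.

```python
# Toy Emp relation; the rows are invented purely for illustration.
emp = [
    {"name": "Smith", "deptno": 10, "sal": 800},
    {"name": "Jones", "deptno": 10, "sal": 2975},
    {"name": "Blake", "deptno": 30, "sal": 2850},
    {"name": "Clark", "deptno": 10, "sal": 2450},
    {"name": "Ward", "deptno": 30, "sal": 950},
]


def select(pred, relation):
    """Selection: keep the tuples of `relation` that satisfy `pred`."""
    return [t for t in relation if pred(t)]


def in_dept_10(t):
    return t["deptno"] == 10


def high_salary(t):
    return t["sal"] > 1000


# Rule 1: one conjunctive selection equals a cascade of single selections.
conjunctive = select(lambda t: in_dept_10(t) and high_salary(t), emp)
cascade = select(in_dept_10, select(high_salary, emp))

# Rule 2: the two selections can be applied in either order.
swapped = select(high_salary, select(in_dept_10, emp))

print([t["name"] for t in conjunctive])
print(conjunctive == cascade == swapped)
```

All three expressions return the same tuples, so an optimiser is free to apply the more restrictive predicate first.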

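Transformation Rules for RA Operations
3. In a sequence of projection operations, only the last in the sequence (the outermost projection) is required, provided each projection list is contained in the list applied before it.
π_L(π_M(…(π_N(R)))) = π_L(R), where L ⊆ M ⊆ … ⊆ N
Example: π_{deptno}(π_{deptno, name}(Dept)) = π_{deptno}(Dept)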

Transformation Rules for RA Operations
4. Commutativity of selection and projection.
π_{A_1, …, A_m}(σ_p(R)) = σ_p(π_{A_1, …, A_m}(R)), where p involves only attributes in {A_1, A_2, …, A_m}
The selection predicate (p) is only made up of projected attributes.
Example: π_{name, job}(σ_{name='Smith'}(Emp)) = σ_{name='Smith'}(π_{name, job}(Emp))

Transformation Rules for RA Operations
5. Commutativity of theta-join (and Cartesian product).
R ⋈_p S = S ⋈_p R
R × S = S × R
Example: EMP ⋈_{emp.deptno = dept.deptno} DEPT = DEPT ⋈_{emp.deptno = dept.deptno} EMP
NOTE: Theta-join is a generalisation of both the equi-join and the natural join.

Transformation Rules for RA Operations
6. Commutativity of selection and theta-join (or Cartesian product).
(σ_p(R)) ⋈_r S = σ_p(R ⋈_r S), where p involves only attributes {A_1, A_2, …, A_m} of R
The selection predicate (p) must refer only to attributes of the relation it is pushed onto (in the example, the join attribute emp.deptno of EMP).
Example: (σ_{emp.deptno=10}(EMP)) ⋈_{emp.deptno = dept.deptno} DEPT = σ_{emp.deptno=10}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)
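The effect of rule 6 can be seen on two toy relations. In this sketch the rows, the qualified attribute names used as dictionary keys, and the `select` and `theta_join` helpers are all assumptions made for illustration; the join is a naive nested loop.

```python
# Toy relations; rows and qualified attribute names are illustrative only.
emp = [
    {"emp.name": "Smith", "emp.deptno": 10},
    {"emp.name": "Blake", "emp.deptno": 30},
    {"emp.name": "Clark", "emp.deptno": 10},
]
dept = [
    {"dept.deptno": 10, "dept.name": "Accounts"},
    {"dept.deptno": 30, "dept.name": "Sales"},
]


def select(pred, relation):
    """Selection: keep the tuples that satisfy `pred`."""
    return [t for t in relation if pred(t)]


def theta_join(r, s, cond):
    """Naive theta-join: pair every tuple of r with every matching tuple of s."""
    return [{**a, **b} for a in r for b in s if cond(a, b)]


def same_dept(a, b):
    return a["emp.deptno"] == b["dept.deptno"]


def in_dept_10(t):
    return t["emp.deptno"] == 10


# Selection pushed below the join: the join sees 2 EMP tuples instead of 3.
pushed_down = theta_join(select(in_dept_10, emp), dept, same_dept)

# Selection applied after the join: same answer, larger intermediate result.
applied_late = select(in_dept_10, theta_join(emp, dept, same_dept))

print(pushed_down == applied_late)
print(len(pushed_down))
```

The two results are identical; the difference is only in how many tuples the join has to process, which is why the heuristic strategies favour early selection.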

Transformation Rules for RA Operations
7. Commutativity of projection and theta-join (or Cartesian product).
π_L(R ⋈_r S) = (π_{L1}(R)) ⋈_r (π_{L2}(S))
Project attributes L = L1 ∪ L2, where L1 are attributes of R, and L2 are attributes of S. L will also contain the join attributes.
Example: π_{job, location, deptno}(EMP ⋈_{emp.deptno = dept.deptno} DEPT) = (π_{job, deptno}(EMP)) ⋈_{emp.deptno = dept.deptno} (π_{location, deptno}(DEPT))

Transformation Rules for RA Operations
8. Commutativity of union and intersection (but not set difference).
R ∪ S = S ∪ R
R ∩ S = S ∩ R

Transformation Rules for RA Operations
9. Commutativity of selection and set operations (union, intersection, and set difference).
Union: σ_p(R ∪ S) = σ_p(S) ∪ σ_p(R)
Intersection: σ_p(R ∩ S) = σ_p(S) ∩ σ_p(R)
Set difference: σ_p(R − S) = σ_p(R) − σ_p(S)
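Selection distributes over the set operations, which Python's built-in sets make easy to check. In this sketch plain integers stand in for tuples and `is_even` stands in for an arbitrary predicate p; both are assumptions made to keep the example short. For set difference the operands must stay in their original order, since R − S and S − R are different relations.

```python
# Integers stand in for tuples; the values are illustrative only.
R = {1, 2, 3, 4, 5, 6}
S = {4, 5, 6, 7, 8}


def select(pred, relation):
    """Selection over a relation modelled as a Python set."""
    return {t for t in relation if pred(t)}


def is_even(t):
    return t % 2 == 0


union_ok = select(is_even, R | S) == select(is_even, R) | select(is_even, S)
intersection_ok = select(is_even, R & S) == select(is_even, R) & select(is_even, S)
difference_ok = select(is_even, R - S) == select(is_even, R) - select(is_even, S)

# Swapping the operands of the difference gives a different relation.
swapped_difference = select(is_even, S) - select(is_even, R)

print(union_ok, intersection_ok, difference_ok)
print(select(is_even, R - S), swapped_difference)
```

Here the selection over R − S yields {2}, while the swapped difference yields {8}, so the order of the operands in the set-difference rule matters.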

Transformation Rules for RA Operations
10. Commutativity of projection and union.
π_L(R ∪ S) = π_L(S) ∪ π_L(R)

Transformation Rules for RA Operations
11. Associativity of natural join (and Cartesian product).
Natural join: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
Cartesian product: (R × S) × T = R × (S × T)

Transformation Rules for RA Operations
12. Associativity of union and intersection (but not set difference).
Union: (R ∪ S) ∪ T = S ∪ (R ∪ T)
Intersection: (R ∩ S) ∩ T = S ∩ (R ∩ T)

Heuristic Processing Strategies
- Perform selection operations as early as possible.
- Translate a Cartesian product and subsequent selection (whose predicate represents a join condition) into a join operation.
- Use associativity of binary operations to ensure that the most restrictive selection operations are executed first.
- Perform projections as early as possible.
- Compute common expressions once.

Heuristic Processing - Example
[Figure: a sequence of query trees for the query listing all managers in the sales department. The first tree applies σ_{job = 'Manager'}, σ_{name = 'Sales'} and σ_{emp.deptno = dept.deptno} above EMP × DEPT. The second replaces the Cartesian product and join predicate with EMP ⋈_{emp.deptno = dept.deptno} DEPT, with the two remaining selections above the join. The last tree, labelled 'Optimised Canonical Query', applies σ_{job = 'Manager'} to EMP and σ_{name = 'Sales'} to DEPT below the join.]
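The first two heuristics (select early, turn a product plus join predicate into a join) can be sketched for the two-relation case. The `push_down` function below is a hypothetical illustration, not an algorithm from the lecture: each predicate is given together with the set of relations it mentions, single-relation predicates are moved onto that relation, and the remaining predicates become the join condition.

```python
from typing import Dict, FrozenSet, List, Tuple

# A predicate is its text plus the names of the relations it refers to.
Predicate = Tuple[str, FrozenSet[str]]


def push_down(left: str, right: str, preds: List[Predicate]) -> str:
    """Rewrite a selection over left × right as an expression in which
    single-relation predicates sit directly on the relation they filter."""
    local: Dict[str, List[str]] = {left: [], right: []}
    join_conds: List[str] = []
    for text, relations in preds:
        if len(relations) == 1:
            (relation,) = relations
            local[relation].append(text)   # select as early as possible
        else:
            join_conds.append(text)        # becomes the join condition

    def leaf(relation: str) -> str:
        if not local[relation]:
            return relation
        return "σ_{" + " ∧ ".join(local[relation]) + "}(" + relation + ")"

    if join_conds:
        return (leaf(left) + " ⋈_{" + " ∧ ".join(join_conds) + "} "
                + leaf(right))
    return leaf(left) + " × " + leaf(right)


plan = push_down(
    "EMP",
    "DEPT",
    [
        ("job = 'Manager'", frozenset({"EMP"})),
        ("name = 'Sales'", frozenset({"DEPT"})),
        ("emp.deptno = dept.deptno", frozenset({"EMP", "DEPT"})),
    ],
)
print(plan)
```

For the running example this prints the same shape as the optimised canonical query: one selection on each base relation, joined on deptno. A real optimiser works on full query trees with many relations; this sketch only handles one binary product.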

Query Processing Stage - 3
Choose candidate low-level procedures
- Consider the (optimised canonical) query as a series of low-level operations (join, restrict, etc.).
- For each of these operations, generate alternative execution strategies and calculate the cost of such strategies on the basis of statistical information held about the database tables (files).

Query Processing Stage - 4
Generate query plans and choose the cheapest
- Construct a set of 'candidate' Query Execution Plans (QEPs).
- Each QEP is constructed by selecting a candidate implementation procedure for each operation in the canonical query and then combining them to form a string of associated operations.
- Each QEP will have an (estimated) cost associated with it: the sum of the cost of each of its operations.
- Choose the QEP with the least cost.
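The "sum the operation costs, then pick the minimum" step can be sketched as follows. The per-operation costs are the record I/O counts from the motivation slides, split by operation; the plan names and the dictionary layout are illustrative assumptions.

```python
# Each candidate QEP is a list of (operation, estimated cost in record I/Os).
# Costs are the per-operation figures from the motivation example.
candidate_plans = {
    "product then select": [
        ("Cartesian product", (1000 + 50) + 1000 * 50),
        ("selection", 1000 * 50),
    ],
    "join then select": [
        ("join", (1000 + 50) + 1000),
        ("selection", 1000),
    ],
    "select then join": [
        ("select EMP", 1000 + 50),
        ("select DEPT", 50 + 5),
        ("join", 50 + 5),
    ],
}


def plan_cost(operations):
    """Estimated cost of a QEP: the sum of the costs of its operations."""
    return sum(cost for _, cost in operations)


cheapest = min(candidate_plans, key=lambda name: plan_cost(candidate_plans[name]))

for name, operations in candidate_plans.items():
    print(f"{name}: {plan_cost(operations):,}")
print("cheapest:", cheapest)
```

With these estimates the "select then join" plan is chosen, at 1,160 record I/Os.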

Cost Based Optimisation (stages 3 & 4)
- A good declarative query optimiser does not rely solely on heuristic processing strategies.
- It chooses the QEP with the lowest estimated cost.
- After heuristic rules are applied to a query, there still remain a number of alternative ways to execute it.
- The Query Optimiser estimates the cost of executing each one (or at least a number) of these alternatives, and selects the cheapest one.

Costs associated with query execution
Secondary storage access costs:
- Searching for data blocks on disk
- Reading data blocks from disk
- Writing data blocks to disk
Storage costs:
- Cost of storing intermediate (temp) files
Computation costs:
- Cost of CPU usage
Main memory usage costs:
- Cost of buffering data
Communication costs:
- Cost of moving data across

Database statistics used in cost estimation
Information held on each relation:
- number of tuples
- number of blocks
- blocking factor
- primary access method
- primary access attributes
- secondary indexes
- secondary indexing attributes
- number of levels for each index
- number of distinct values of each attribute

Physical Data Structures – File Types
Heap (Sequential, Unordered)
- no key columns
- queries, other than appends, scan every page
- rows are appended at the end
- duplicate rows are allowed
Ordered
- physically sorted data file with no index
Hash (Random, Direct)
- data is located based on the (calculated) value of a hash field (key)
Indexed Sequential (ISAM)
- sorted data file with a primary index
B+ Tree
- dynamic multilevel index
- reuses deleted space on associated data pages

Strategies for implementing the RESTRICT operation
Different access strategies are used depending on the structure of the file in which the relation is stored, and on whether the predicate attribute(s) have been indexed/hashed:
- Linear Search (Heap)
- Binary Search (Ordered)
- Equality on Hash Key
- Equality condition on primary key
- Inequality condition on primary key
- Equality condition on secondary index
- Inequality condition on secondary B+ Tree index
Each uses a different cost algorithm (which refers to specific database statistics).
If the selection predicate is a composite (AND & OR) then there are additional cost considerations!
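The first two strategies in the list can be contrasted in a few lines. This sketch models a heap file as an unsorted list and an ordered file as a list sorted on the search attribute; the records, the helper names, and the use of Python's `bisect` module in place of block-level binary search are all assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Illustrative records: 20 employees spread over departments 10..50.
records = [{"empno": i, "deptno": 10 * (i % 5 + 1)} for i in range(20)]


def linear_select(relation, attribute, value):
    """Linear search (heap file): every record is examined."""
    return [t for t in relation if t[attribute] == value]


# Ordered file: the same records, physically sorted on deptno.
ordered = sorted(records, key=lambda t: t["deptno"])
ordered_keys = [t["deptno"] for t in ordered]


def binary_select(sorted_relation, sorted_keys, value):
    """Binary search (ordered file): locate the run of matching records."""
    lo = bisect_left(sorted_keys, value)
    hi = bisect_right(sorted_keys, value)
    return sorted_relation[lo:hi]


heap_result = linear_select(records, "deptno", 30)
ordered_result = binary_select(ordered, ordered_keys, 30)

print(len(heap_result), heap_result == ordered_result)
```

Both strategies return the same four records, but the linear search inspects all 20 while the binary search needs only a logarithmic number of probes to find the matching run, which is why each strategy has its own cost formula.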

Strategies for implementing the JOIN operation
Different access strategies are used depending on the structure of the files in which the relations to be joined are stored, and on whether the join attributes have been indexed/hashed:
- Block nested loop join
- Indexed nested loop join
- Sort-merge join
- Hash join
Each uses its own cost algorithm (which refers to specific database statistics).
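Two of these strategies can be compared on in-memory data. The sketch below is a simplification made for illustration: `nested_loop_join` works tuple-at-a-time rather than block-at-a-time, `hash_join` keeps its whole hash table in memory, and the relations and helper names are invented.

```python
from collections import defaultdict

# Illustrative relations.
emp = [
    {"name": "Smith", "deptno": 10},
    {"name": "Blake", "deptno": 30},
    {"name": "Clark", "deptno": 10},
    {"name": "Ward", "deptno": 40},
]
dept = [
    {"deptno": 10, "dname": "Accounts"},
    {"deptno": 30, "dname": "Sales"},
]


def nested_loop_join(r, s, r_key, s_key):
    """Compare every tuple of r with every tuple of s."""
    return [(a, b) for a in r for b in s if a[r_key] == b[s_key]]


def hash_join(r, s, r_key, s_key):
    """Build a hash table on s, then probe it once per tuple of r."""
    buckets = defaultdict(list)
    for b in s:                                   # build phase
        buckets[b[s_key]].append(b)
    return [(a, b)                                # probe phase
            for a in r
            for b in buckets.get(a[r_key], [])]


loop_result = nested_loop_join(emp, dept, "deptno", "deptno")
hash_result = hash_join(emp, dept, "deptno", "deptno")

print(loop_result == hash_result)
print([(a["name"], b["dname"]) for a, b in hash_result])
```

The two functions produce the same pairs; the nested loop performs |r| × |s| comparisons, while the hash join reads each input once, which is the kind of difference the cost algorithms are meant to capture.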

Query Optimisation Summary
The aims of query processing are to transform a query written in a high-level language (SQL) into a correct and efficient execution strategy expressed in a low-level language (Relational Algebra), and to execute the strategy to retrieve the required data.
There are many equivalent transformations of the same high-level query; the DBMS has to choose the one that minimises resource usage.
There are two main techniques for query optimisation. The first uses heuristic rules that order the operations in a query. The second compares different execution strategies for those operations, based on their relative costs, and selects the least resource-intensive (cheapest) ones.