Database Management, course 9: Execution of queries



Query evaluation
Query → Parse, compile → Relational algebra → Optimize (using statistics) → Execution plan → Evaluate (using data) → Query output

Query – the SQL statement
Parse – is the SQL query correct?
Relational algebra – a form the computer can work with
Optimize – based on what?

Execution plan – if several plans give the same result, which is the best?
Evaluate – find the proper data records
Query output – give the answer to the user

Optimization example
Data of a bank
select balance from account where balance < 2500
Two equivalent relational algebra representations
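The two representations are not written out in the text. Assuming the usual textbook treatment of this query, they would be the two possible orderings of the selection and the projection, which produce the same result:

```latex
\[
\sigma_{\mathit{balance}<2500}\bigl(\Pi_{\mathit{balance}}(\mathit{account})\bigr)
\qquad\text{and}\qquad
\Pi_{\mathit{balance}}\bigl(\sigma_{\mathit{balance}<2500}(\mathit{account})\bigr)
\]
```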

The cost of an operation depends on the algorithms we can use: e.g. an index speeds up the selection
Primitive: elementary operation (projection, selection, …)
Pipeline: building blocks for evaluations and statistics
Input of a primitive = output of the previous primitive

Catalogue cost approximation
To choose the proper strategy, the cost has to be approximated
Cost can be approximated in terms of several attributes
– Space
– Time
Statistics are stored in the catalogue

Content of the catalogue
Number of records in relation r: n_r
Number of blocks used for relation r: b_r
Size of one record in relation r: s_r
Number of records in one block: f_r
Number of different values of attribute A in relation r: V(A,r) = |π_A(r)|
Average number of records that fulfil an equality selection on attribute A: SC(A,r)
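A minimal sketch of how these catalogue values relate to each other. The formulas b_r = ceil(n_r / f_r) and SC(A,r) = n_r / V(A,r) are assumptions that the slide does not state: they hold when records are packed densely into blocks and the values of A are uniformly distributed. The class name and the sample numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RelationStats:
    """Catalogue statistics for one relation r (symbols follow the slide)."""
    n_r: int  # number of records in r
    f_r: int  # number of records that fit in one block

    def blocks(self) -> int:
        # b_r, assuming records are packed densely into blocks
        return math.ceil(self.n_r / self.f_r)

    def selection_cardinality(self, distinct_values: int) -> float:
        # SC(A, r), assuming the V(A, r) distinct values are uniformly distributed
        return self.n_r / distinct_values


# Hypothetical numbers, for illustration only
account = RelationStats(n_r=10_000, f_r=20)
b_r = account.blocks()
sc = account.selection_cardinality(distinct_values=50)
print(b_r, sc)
```

With skewed data the uniform estimate can be far off, which is one reason real systems also keep histograms.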

Catalogue information about indexes
Hash tables are considered special indexes
Average number of pointers in one node (average number of children): f_i
Height of tree i: HT_i = ⌈log_{f_i} V(A,r)⌉, or HT_i = 1 in the case of a hash
Number of lowest-level index blocks (number of leaf nodes): LB_i
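The height formula can be evaluated without floating-point logarithms. The sketch below follows the formula literally, reading the bars around the logarithm as a ceiling; the function name and the sample numbers are hypothetical.

```python
def index_height(fanout: int, distinct_values: int, is_hash: bool = False) -> int:
    """Return HT_i = ceil(log_{f_i} V(A, r)), or 1 for a hash index."""
    if is_hash:
        return 1
    if fanout < 2 or distinct_values < 1:
        raise ValueError("fanout must be >= 2 and distinct_values >= 1")
    height = 0
    capacity = 1  # number of values reachable with `height` levels
    while capacity < distinct_values:
        capacity *= fanout
        height += 1
    return height


# Hypothetical index: 50 pointers per node, 10,000 distinct key values
height = index_height(fanout=50, distinct_values=10_000)
print(height)
```

Integer multiplication is used instead of `math.log` so that exact powers of the fanout are not affected by rounding.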

Statistics should be updated after every modification → expensive
Updated when the DB has time
Not always consistent, but gives a good approximation

Cost of operations
Just approximations: reading and writing are assumed to take the same time

Equality selection

Range selection

Types of join Distinct

Block nested-loop join

Indexed nested-loop join
Applicable if one of the relations (s) is indexed
No need for a full scan of s
Cost: b_r + n_r * c, where c is the cost of a selection on s
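A minimal in-memory sketch of the idea: the outer relation r is scanned once, and each of its records triggers an index lookup on s instead of a full scan of s. A Python dictionary stands in for the index; the relation contents and all names are hypothetical.

```python
from collections import defaultdict


def build_index(relation, key):
    """Build a lookup structure on `key` (a dict stands in for a real index)."""
    index = defaultdict(list)
    for row in relation:
        index[row[key]].append(row)
    return index


def indexed_nested_loop_join(r, s_index, key):
    """Scan r once; for each record, look up the matching records of s."""
    result = []
    for r_row in r:
        for s_row in s_index.get(r_row[key], []):
            result.append({**r_row, **s_row})
    return result


def indexed_join_cost(b_r, n_r, c):
    """Cost from the slide: b_r + n_r * c."""
    return b_r + n_r * c


# Hypothetical data
depositor = [
    {"customer": "Hayes", "account_no": "A-102"},
    {"customer": "Jones", "account_no": "A-217"},
    {"customer": "Smith", "account_no": "A-999"},
]
account = [
    {"account_no": "A-102", "balance": 400},
    {"account_no": "A-217", "balance": 750},
]
joined = indexed_nested_loop_join(depositor, build_index(account, "account_no"), "account_no")
print(joined)
```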

Merge join
First sort the relations on the join attributes
Reading each relation once is enough
Cost: cost of sorting + b_r + b_s
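The two phases can be sketched as follows: sort both inputs on the join attribute, then advance two cursors so that each relation is read only once. This is an in-memory illustration under the assumption of a single join attribute; the function name and the sample rows are hypothetical.

```python
def merge_join(r, s, key):
    """Sort both inputs on `key`, then merge them in a single pass."""
    r_sorted = sorted(r, key=lambda row: row[key])
    s_sorted = sorted(s, key=lambda row: row[key])
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        r_key = r_sorted[i][key]
        s_key = s_sorted[j][key]
        if r_key < s_key:
            i += 1
        elif r_key > s_key:
            j += 1
        else:
            # Find the end of the group of s rows that share this key
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][key] == r_key:
                j_end += 1
            # Pair every r row of this key with every s row of the group
            while i < len(r_sorted) and r_sorted[i][key] == r_key:
                for s_row in s_sorted[j:j_end]:
                    result.append({**r_sorted[i], **s_row})
                i += 1
            j = j_end
    return result


# Hypothetical data with duplicate join values on both sides
r_rows = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
s_rows = [{"k": 2, "b": 10}, {"k": 3, "b": 20}, {"k": 2, "b": 30}]
merged = merge_join(r_rows, s_rows, "k")
print(merged)
```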

Other operations
Duplicate elimination (distinct)
– Sort
– Delete duplicates
Cost: cost of sorting
Projection: cost of sorting (+ duplicate elimination) + b_r
Union: sort relations + merge + duplicate elimination
Intersection: sort both + select common rows
Difference: sort + delete the rows that also occur in the 2nd relation
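Sort-based duplicate elimination can be sketched in a few lines: after sorting, equal records are adjacent, so one pass that compares each record with the previous one is enough. Records are modelled as tuples here, and the function names are hypothetical.

```python
def distinct_sorted(rows):
    """Sort the rows, then keep a row only if it differs from the previous one."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


def union(r, s):
    """Union as on the slide: sort, merge, eliminate duplicates."""
    return distinct_sorted(r + s)


rows = [(2, "b"), (1, "a"), (2, "b")]
unique_rows = distinct_sorted(rows)
print(unique_rows)
```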

Evaluation - Materialization
Tree of operations
Leaves: relations
Nodes: operations
Cost: storing temporary relations + cost of operations
Parallel processing

Pipelining
Temporary storage is reduced
Result records are passed to the next operation and are not stored
Saves memory (records are stored, not relations)
Sorting is not possible
Demand-driven pipeline: the system requests data when needed
Data-driven pipeline: operations push data into the pipeline without request until the buffer gets full
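Python generators give a compact illustration of a demand-driven pipeline: each operator produces a record only when the operator above it asks for one, and no intermediate relation is stored. The operator names and the sample data are hypothetical.

```python
def scan(relation):
    """Leaf operator: hand out the stored records one by one."""
    for row in relation:
        yield row


def select(rows, predicate):
    """Pass on only the records that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attributes):
    """Keep only the listed attributes of each record."""
    for row in rows:
        yield {a: row[a] for a in attributes}


# Hypothetical data
account = [
    {"account_no": "A-101", "balance": 500},
    {"account_no": "A-215", "balance": 3000},
    {"account_no": "A-305", "balance": 2499},
]

# Nothing is evaluated until the consumer pulls records from the plan
plan = project(select(scan(account), lambda row: row["balance"] < 2500), ["balance"])
balances = list(plan)
print(balances)
```

A data-driven pipeline would invert the control flow: each operator would push records into a bounded buffer and block when the buffer is full.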

Pipeline evaluation
Records arrive one after another
Merge join cannot be used
Indexed nested-loop join can be used

Transformation of relational expressions
Transform to equivalent expressions with smaller evaluation time
Example: give me the names of customers who have an account in Brooklyn
Time consuming (selection after joining 3 tables)
Much better (selection before the join)
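The two expressions are not written out in the text. Assuming the standard bank schema with relations branch, account and depositor (the relation and attribute names below are assumptions), the slow and the improved form would look like this:

```latex
% Time consuming: the selection is applied after joining all three tables
\[
\Pi_{\mathit{customer\_name}}\Bigl(\sigma_{\mathit{branch\_city}=\text{``Brooklyn''}}
  \bigl(\mathit{branch} \bowtie (\mathit{account} \bowtie \mathit{depositor})\bigr)\Bigr)
\]
% Much better: the selection is applied to branch before the join
\[
\Pi_{\mathit{customer\_name}}\Bigl(\sigma_{\mathit{branch\_city}=\text{``Brooklyn''}}(\mathit{branch})
  \bowtie (\mathit{account} \bowtie \mathit{depositor})\Bigr)
\]
```

The second form is cheaper because only the Brooklyn branches take part in the join.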

Equivalence rules
Predicates: θ, θ_1, θ_2
Attributes: L_1, L_2, L_3
Relational algebra expressions: E, E_1, E_2
Cascade of selection
Commutativity of selection
Cascade of projection
Connection of join and Cartesian product

Commutativity of theta-join
Associativity of natural join
Distributivity of selection over join
– when θ_0 involves only attributes of E_1

Commutativity of union and intersection
Associativity of union and intersection
Distributivity of selection over union, intersection, and difference
Distributivity of projection over union
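For reference, the rules named on these slides have the following standard textbook forms. The notation is an assumption built from the symbols the slides introduce (θ for predicates, L for attribute lists, E for expressions), and only one representative is shown where a rule covers several operators.

```latex
\begin{align*}
\sigma_{\theta_1 \wedge \theta_2}(E) &= \sigma_{\theta_1}(\sigma_{\theta_2}(E))
  && \text{cascade of selection} \\
\sigma_{\theta_1}(\sigma_{\theta_2}(E)) &= \sigma_{\theta_2}(\sigma_{\theta_1}(E))
  && \text{commutativity of selection} \\
\Pi_{L_1}(\Pi_{L_2}(\Pi_{L_3}(E))) &= \Pi_{L_1}(E)
  && \text{cascade of projection, } L_1 \subseteq L_2 \subseteq L_3 \\
\sigma_{\theta}(E_1 \times E_2) &= E_1 \bowtie_{\theta} E_2
  && \text{join and Cartesian product} \\
E_1 \bowtie_{\theta} E_2 &= E_2 \bowtie_{\theta} E_1
  && \text{commutativity of theta-join} \\
(E_1 \bowtie E_2) \bowtie E_3 &= E_1 \bowtie (E_2 \bowtie E_3)
  && \text{associativity of natural join} \\
\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) &= \sigma_{\theta_0}(E_1) \bowtie_{\theta} E_2
  && \theta_0 \text{ involves only attributes of } E_1 \\
E_1 \cup E_2 &= E_2 \cup E_1
  && \text{commutativity (likewise for } \cap\text{)} \\
(E_1 \cup E_2) \cup E_3 &= E_1 \cup (E_2 \cup E_3)
  && \text{associativity (likewise for } \cap\text{)} \\
\sigma_{\theta}(E_1 - E_2) &= \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)
  && \text{selection distributes (likewise over } \cup, \cap\text{)} \\
\Pi_{L}(E_1 \cup E_2) &= \Pi_{L}(E_1) \cup \Pi_{L}(E_2)
  && \text{projection distributes over union}
\end{align*}
```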

These are only examples!

Choosing an evaluation plan
Choose an algorithm for each operation of the expression
Fix the order of the operations
Organize them into processes
Example annotations: pipeline; use index 1; use linear scan; sort to remove duplicates

Cost-based optimization
List all the equivalent expressions
Assign an execution plan to every expression
Calculate the cost of every plan
Choose the cheapest (based on approximations and statistics)
Disadvantage: if there are too many plans, there are too many precalculations

Example
Joining 3 relations: 6 orderings, each parenthesized in two ways, so 12 join orders
In general, n relations can be joined in (2(n-1))! / (n-1)! ways
If n = 10, that is more than 17.6 billion plans…
Solution: use some heuristics
Consider: first find the optimal join for the first 3 relations, then join with the rest: plans remain → not good!
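The growth of the plan space is easy to check numerically. The sketch below evaluates (2(n-1))! / (n-1)! with exact integer arithmetic; the function name is hypothetical.

```python
import math


def join_orders(n: int) -> int:
    """Number of join orders for n relations: (2(n-1))! / (n-1)!."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.factorial(2 * (n - 1)) // math.factorial(n - 1)


for n in (3, 7, 10):
    print(n, join_orders(n))
```

For n = 3 this gives 12 (6 orderings times 2 parenthesizations), and for n = 10 it gives 17,643,225,600.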

Rules for heuristics
Do the selections at the beginning to reduce the number of rows
Do the projections as soon as possible to reduce the size of rows
Split a conjunction of selections into a sequence of selections (apply only one selection at a time)
Push the selections down the tree
Use first the selection or join that results in the fewest rows → use associativity of join

If a Cartesian product is followed by a selection, merge them into a join operation: fewer records are generated
Break up the projection lists and push them down the tree (sometimes new projections can be generated)
Search for subtrees where pipelining can be applied

1. By applying the rules, we get several trees
2. Calculate the cost
3. Apply the cheapest
The optimization itself adds a cost → optimize it
The optimal optimizer optimizes the cost of its own work as well as the execution.

Thank you for your attention!