Query processing and optimization


Query processing and optimization Jose M. Peña jose.m.pena@liu.se

ER diagram Relational model MySQL

Relation schema
Attributes: PNumber, Name, Address, Telephone, E-mail, Age
Domain = set of atomic values, for example:
PNumber: yymmdd-xxxx
Name: textual string of fewer than 30 chars
Telephone: rrr - nn nn nn
E-mail: aaaaannn
Age: positive integer, 0<x<150

Relation
PNumber      Name                 Address      Telephone     E-mail    Age
123456-7890  Anders Andersson     Rydsvägen 1  013-11 22 33  andan111  25
112233-4455  Veronika Pettersson  Alsätersg 2  013-22 33 44  verpe222  27
Tuple = list of values in the corresponding domains, or NULL

Key constraints
Relation = set of tuples, so no duplicates are allowed. Hence every tuple is uniquely identifiable (superkey, candidate key, primary key, all of which are time-invariant).
PNumber      Name                 Address      Telephone     E-mail    Age
123456-7890  Anders Andersson     Rydsvägen 1  013-11 22 33  andan111  25
112233-4455  Veronika Pettersson  Alsätersg 2  013-22 33 44  verpe222  27

Integrity constraints
Entity integrity constraint = no primary key value is NULL.
A set of attributes FK in a relation R1 is a foreign key to another relation R2 with primary key PK if (i) domain(FK) = domain(PK), and (ii) FK in R1 takes value NULL or one of the values of PK in R2.
Referential integrity constraint = conditions (i) and (ii) above hold.
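
Condition (ii) can be checked mechanically. The following is a minimal sketch, assuming relations are held as lists of dicts with None standing for NULL; the function name and the sample PROJECT and WORKS_ON rows are hypothetical and only serve as an illustration.

```python
def referential_violations(r1, fk, r2, pk):
    """Return the tuples of r1 whose FK is neither NULL nor a PK value of r2."""
    pk_values = {tuple(t[a] for a in pk) for t in r2}
    violations = []
    for t in r1:
        value = tuple(t[a] for a in fk)
        if all(v is None for v in value):
            continue  # a NULL foreign key is allowed
        if value not in pk_values:
            violations.append(t)
    return violations

# Hypothetical sample data
PROJECT = [{"PNUMBER": 1}, {"PNUMBER": 2}]
WORKS_ON = [
    {"ESSN": "a", "PNO": 1},
    {"ESSN": "b", "PNO": None},
    {"ESSN": "c", "PNO": 3},   # refers to a project that does not exist
]

bad = referential_violations(WORKS_ON, ["PNO"], PROJECT, ["PNUMBER"])
```

Here only the tuple with PNO 3 is reported, since 3 is not a primary key value of PROJECT.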

Relational algebra
Relational algebra = language for querying the relational model. It is a procedural language: it states how to carry out the query, as opposed to a declarative language such as relational calculus, which states what to retrieve. Basis for SQL. Basis for implementation and optimization of queries.

Select Selects the tuples of a relation satisfying some condition over its attributes.
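
As a rough illustration of the select operator, here is a minimal Python sketch. It assumes a relation is stored as a list of dicts; the helper name and the condition (address on Rydsvägen) are assumptions chosen for the example, while the rows are taken from the STUDENT relation used in these slides.

```python
# Assumed representation: a relation is a list of dicts, one dict per tuple.
STUDENT = [
    {"PNum": "112233-4455", "Name": "Elin", "Address": "Rydsvägen 1"},
    {"PNum": "223344-5566", "Name": "Nisse", "Address": "Alsätersgatan 3"},
    {"PNum": "113322-1122", "Name": "Pelle", "Address": "Rydsvägen 2"},
]

def select(relation, condition):
    """Return the tuples of `relation` for which `condition` holds."""
    return [t for t in relation if condition(t)]

# Hypothetical condition: the student lives on Rydsvägen
on_rydsvagen = select(STUDENT, lambda t: t["Address"].startswith("Rydsvägen"))
```

The result keeps whole tuples (all attributes); only the number of tuples shrinks.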

Example: select
STUDENT:
PNum         Name     Address          TelNr
112233-4455  Elin     Rydsvägen 1      112233
223344-5566  Nisse    Alsätersgatan 3  223344
334455-6677           Rydsvägen 3      334455
113322-1122  Pelle    Rydsvägen 2      113322
552233-1144  Monika   Rydsvägen 4      443322
442211-2222  Patrik   Rydsvägen 6      111122
334433-1111  Camilla  Alsätersgatan 1  665544
Result of the selection:
PNum         Name     Address          TelNr
334455-6677  Nisse    Rydsvägen 3      334455
334433-1111  Camilla  Alsätersgatan 1  665544

Project Projects a relation over some attributes. The result must be a relation = duplicates are removed.
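
The duplicate removal is the part of projection that is easy to overlook. A minimal sketch, again assuming relations are lists of dicts; the helper name is an assumption and the third sample row is hypothetical, added only to produce a duplicate address.

```python
def project(relation, attributes):
    """Project `relation` over `attributes`, keeping each result tuple once."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:  # a relation is a set, so duplicates are dropped
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

rows = [
    {"PNum": "112233-4455", "Name": "Elin", "Address": "Rydsvägen 1"},
    {"PNum": "223344-5566", "Name": "Nisse", "Address": "Alsätersgatan 3"},
    {"PNum": "000000-0000", "Name": "Kim", "Address": "Rydsvägen 1"},  # hypothetical row
]

addresses = project(rows, ["Address"])
```

Projecting three tuples over Address yields two tuples here, because two students share an address.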

Example: project
STUDENT:
PNum         Name   Address          TelNr
112233-4455  Elin   Rydsvägen 1      112233
223344-5566  Nisse  Alsätersgatan 3  223344
334455-6677         Rydsvägen 3      334455
Projection of STUDENT over PNum, Name:
PNum         Name
112233-4455  Elin
223344-5566  Nisse
334455-6677

Union, intersection and difference R and S must be compatible, i.e. the same number of attributes and with the same domains. The result must be a relation = duplicates are removed (union).
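
Because relations are sets of tuples, Python's built-in set type gives a compact sketch of the three operations. The rows are the STUDENT and EMPLOYEE tuples from the intersection example in these slides; the `compatible` helper is an assumption and only checks the number of attributes, not the domains.

```python
STUDENT = {
    ("112233-4455", "Elin", "Rydsvägen 1", "112233"),
    ("223344-5566", "Nisse", "Alsätersgatan 3", "223344"),
}
EMPLOYEE = {
    ("884455-4455", "Monika", "Teknikringen 1", "111112"),
    ("223344-5566", "Nisse", "Alsätersgatan 3", "223344"),
    ("668877-7766", "Patrik", "Teknikringen 3", "332211"),
}

def compatible(r, s):
    """Crude check: all tuples have the same number of attributes (domains are not checked)."""
    return len({len(t) for t in r | s}) <= 1

assert compatible(STUDENT, EMPLOYEE)
union = STUDENT | EMPLOYEE          # the shared tuple appears once
intersection = STUDENT & EMPLOYEE   # tuples in both relations
difference = STUDENT - EMPLOYEE     # tuples in STUDENT but not in EMPLOYEE
```

The union has four tuples rather than five, since the tuple for Nisse occurs in both relations.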

Example: intersection
STUDENT:
PNum         Name   Address          TelNr
112233-4455  Elin   Rydsvägen 1      112233
223344-5566  Nisse  Alsätersgatan 3  223344
334455-6677         Rydsvägen 3      334455
EMPLOYEE:
PNum         Name    Office address   TelNr
884455-4455  Monika  Teknikringen 1   111112
223344-5566  Nisse   Alsätersgatan 3  223344
668877-7766  Patrik  Teknikringen 3   332211
Intersection of STUDENT and EMPLOYEE:
PNum         Name   Address          TelNr
223344-5566  Nisse  Alsätersgatan 3  223344

Cartesian product
R:
Name         STATE
Los Angeles  Calif
Oakland      Calif
Atlanta      Ga
Boston       Mass
S:
Key  City
5    San Francisco
7    Oakland
8    Boston
R x S (every tuple of R paired with every tuple of S):
Name         STATE  Key  City
Los Angeles  Calif  5    San Francisco
Los Angeles  Calif  7    Oakland
Los Angeles  Calif  8    Boston
and likewise for Oakland, Atlanta and Boston.

Join
Joins two tuples from two relations if they satisfy some condition over their attributes. Join = Cartesian product followed by selection. Tuples with NULL in the condition attributes do not appear in the result. Recall: Join only on foreign key-primary key attributes.
Example: R ⋈ S with condition R.A1=S.B3 AND R.A5<S.A1
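
The definition "join = Cartesian product followed by selection" translates almost literally into code. The sketch below assumes relations are lists of dicts with distinct attribute names in R and S, and uses None for NULL; the function names are assumptions, and the rows follow the R and S relations of the join examples in these slides.

```python
from itertools import product

R = [
    {"Name": "Los Angeles", "STATE": "Calif"},
    {"Name": "Oakland", "STATE": "Calif"},
    {"Name": "Atlanta", "STATE": "Ga"},
    {"Name": "Boston", "STATE": "Mass"},
]
S = [
    {"Key": 5, "City": "San Francisco"},
    {"Key": 7, "City": "Oakland"},
    {"Key": 8, "City": "Boston"},
]

def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s (attribute names assumed distinct)."""
    return [{**t, **u} for t, u in product(r, s)]

def theta_join(r, s, condition):
    """Join = Cartesian product followed by selection."""
    return [t for t in cartesian_product(r, s) if condition(t)]

def name_equals_city(t):
    # A NULL (None) in a condition attribute never satisfies the condition
    if t["Name"] is None or t["City"] is None:
        return False
    return t["Name"] == t["City"]

joined = theta_join(R, S, name_equals_city)
```

A real DBMS would not materialize the 12-tuple product first; this form only mirrors the definition.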

Example: join
R:
Name         STATE
Los Angeles  Calif
Oakland      Calif
Atlanta      Ga
Boston       Mass
S:
Key  City
5    San Francisco
7    Oakland
8    Boston
R ⋈ S with condition R.Name=S.City:
Name     STATE  Key  City
Oakland  Calif  7    Oakland
Boston   Mass   8    Boston

Example: join
R:
Name         Area
Los Angeles  2
Oakland      9
Atlanta      11
Boston       16
S:
Key  City
5    San Francisco
7    Oakland
8    Boston
R ⋈ S with condition R.Area<=S.Key:
Name         Area  Key  City
Los Angeles  2     5    San Francisco
Los Angeles  2     7    Oakland
Los Angeles  2     8    Boston

Variants of join
Theta join = join. Equijoin = join with only equality conditions. Natural join (written R * S) = equijoin in which one of the duplicate attributes is removed (attributes in the conditions must have the same name). Unless otherwise specified, natural join joins all the attributes with the same name in R and S.
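
A natural join over all same-named attributes can be sketched as follows. This is an illustrative sketch only: relations are assumed to be lists of dicts with None for NULL, and the EMPLOYEE and WORKS_ON rows are hypothetical sample data.

```python
def natural_join(r, s):
    """Join tuples that agree on every same-named attribute, keeping one copy of each."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            # NULLs never match; with no common attributes this degenerates to R x S
            if all(t[a] is not None and t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result

# Hypothetical sample data
EMPLOYEE = [{"SSN": 1, "LNAME": "Smith"}, {"SSN": 2, "LNAME": "Wong"}]
WORKS_ON = [{"SSN": 1, "PNO": 10}, {"SSN": 1, "PNO": 20}, {"SSN": 3, "PNO": 10}]

joined = natural_join(EMPLOYEE, WORKS_ON)
```

Merging the two dicts keeps a single SSN attribute, which is the "duplicate attribute removed" part of the definition.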

Example

Query trees
Tree that represents a relational algebra expression. Leaves = base tables. Internal nodes = relational algebra operators applied to the node's children. The tree is executed from leaves to root.
Example: List the last names of the employees born after 1957 who work on a project named "Aquarius".
    SELECT E.LNAME
    FROM EMPLOYEE E, WORKS_ON W, PROJECT P
    WHERE P.PNAME = 'Aquarius' AND P.PNUMBER = W.PNO AND W.ESSN = E.SSN AND E.BDATE > '1957-12-31'
Canonical query tree for SELECT attributes FROM A, B, C WHERE condition:
    πattributes
      σcondition
        X
          X
            A
            B
          C
Construct the canonical query tree as follows:
1. Cartesian product of the FROM-tables.
2. Select with the WHERE-condition.
3. Project to the SELECT-attributes.
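
The three construction steps can be sketched in a few lines of Python. The nested-tuple tree encoding and the function name are assumptions made for illustration; the query is the "Aquarius" example from the slide.

```python
from functools import reduce

def canonical_query_tree(select_attrs, from_tables, where_condition):
    """Build the canonical tree as nested tuples: (operator, argument, child...)."""
    # 1. Cartesian product of the FROM-tables (left-deep chain of X nodes)
    tables = reduce(lambda left, right: ("X", left, right), from_tables)
    # 2. Select with the WHERE-condition
    selection = ("σ", where_condition, tables)
    # 3. Project to the SELECT-attributes
    return ("π", select_attrs, selection)

tree = canonical_query_tree(
    ["E.LNAME"],
    ["EMPLOYEE E", "WORKS_ON W", "PROJECT P"],
    "P.PNAME = 'Aquarius' AND P.PNUMBER = W.PNO "
    "AND W.ESSN = E.SSN AND E.BDATE > '1957-12-31'",
)
```

The root is the projection, its child the selection, and below that the chain of Cartesian products over the base tables.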

Equivalent query trees

Query processing
[Figure: users send queries and updates to the database management system and receive answers. The DBMS processes the queries and updates and accesses the stored data in the physical database, which is a model of the real world.]

Query processing
StarsIn( movieTitle, movieYear, starName )
MovieStar( name, address, gender, birthdate )
    SELECT movieTitle FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE '%1960' );
Canonical query tree (usually very inefficient)

Parsing and validating
Check the relations used: They have to be declared in FROM. They must exist in the database.
Check and resolve attributes: Attributes must exist in the relations.
Type checking: Attributes that are compared must be of the same type.

Query optimizer
Heuristic: Use joins instead of Cartesian product + selections, and do selection and projection as soon as possible, in order to keep the intermediate tables as small as possible, because
- if the tables do not fit in memory, then we need to perform fewer disk accesses,
- if the tables fit in memory, then we use less memory,
- if the tables are distributed, then we reduce communication, and
- if the tables have to be sorted, joined, etc., then we use less computation power.

Query optimizer
Heuristic algorithm:
1. Break up conjunctive select into a cascade.
2. Move select operations as far down the tree as possible.
3. Rearrange select operations: the most restrictive should be executed first. (Most restrictive = fewest tuples? Smallest size? Smallest selectivity? The DBMS catalog contains the required info.)
4. Convert Cartesian product followed by selection into join.
5. Move project operations as far down the tree as possible. Create new projections so that only the required attributes are involved in the tree.
6. Identify subtrees that can be executed by a single algorithm.
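
The first two steps of the heuristic algorithm (cascade of selects, then pushing selects down) can be illustrated on the WHERE condition of the "Aquarius" query. This is a simplified sketch, not a real optimizer: the helper names are assumptions, the condition is split naively on " AND " (so string literals containing AND would break it), and table aliases are detected with a regular expression.

```python
import re

def split_conjuncts(condition):
    """Step 1: break a conjunctive condition into its conjuncts (naive split)."""
    return [c.strip() for c in condition.split(" AND ")]

def aliases_used(conjunct):
    """Table aliases referenced as alias.attribute, e.g. 'P.PNAME' gives 'P'."""
    return set(re.findall(r"\b([A-Za-z_]\w*)\.[A-Za-z_]\w*", conjunct))

def push_down(condition):
    """Step 2: conjuncts on exactly one alias go to that table, the rest stay higher up."""
    per_table, remaining = {}, []
    for c in split_conjuncts(condition):
        used = aliases_used(c)
        if len(used) == 1:
            per_table.setdefault(next(iter(used)), []).append(c)
        else:
            remaining.append(c)  # candidates for join conditions (step 4)
    return per_table, remaining

per_table, join_conditions = push_down(
    "P.PNAME = 'Aquarius' AND P.PNUMBER = W.PNO "
    "AND W.ESSN = E.SSN AND E.BDATE > '1957-12-31'"
)
```

The two single-table conditions end up directly above PROJECT and EMPLOYEE, and the two remaining equalities are the ones that turn the Cartesian products into joins.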

Equivalence rules

Execution plans Execution plan: Optimized query tree extended with access methods and algorithms to implement the operations.

Query optimizer
Compare the cost estimates of different execution plans and choose the cheapest. The cost estimate decomposes into the following components:
- Access cost to secondary storage. Depends on the access method and file organization. Leading term for large databases.
- Storage cost. Storing intermediate results on disk.
- Computation cost. In-memory searching, sorting, computation. Leading term for small databases.
- Memory usage cost. Memory buffers needed in the server.
- Communication cost. Remote connection cost, network transfer cost. Leading term for distributed databases.
The costs above are estimated via the information in the DBMS catalog (e.g. #records, record size, #blocks, primary and secondary access methods, #distinct values, selectivity, etc.).
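
Choosing the cheapest plan can be sketched as below. The class, the plan names, the numbers, and the choice to simply add the five components in one common unit are all assumptions for illustration; a real optimizer weights the components and derives them from catalog statistics.

```python
from dataclasses import dataclass

@dataclass
class PlanCost:
    name: str
    access: float          # secondary storage access
    storage: float         # intermediate results on disk
    computation: float     # in-memory searching, sorting, computation
    memory: float          # memory buffers
    communication: float   # network transfer

    def total(self):
        # Assumption: all components are expressed in one comparable unit
        return (self.access + self.storage + self.computation
                + self.memory + self.communication)

def cheapest(plans):
    """Return the plan with the lowest estimated total cost."""
    return min(plans, key=PlanCost.total)

# Hypothetical estimates for two execution plans of the same query
plans = [
    PlanCost("full scan + hash join", 1200, 0, 40, 64, 0),
    PlanCost("index scan + nested-loop join", 300, 0, 90, 8, 0),
]
best = cheapest(plans)
```

With these made-up numbers the access cost dominates, which matches the remark that it is the leading term for large databases.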

Exercises True or false ? Optimize the queries below:

Solutions
