CS 338Query Evaluation7-1 Query Evaluation Lecture Topics Query interpretation Basic operations Costs of basic operations Examples Textbook Chapter 12.

Slides:



Advertisements
Similar presentations
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Advertisements

CS CS4432: Database Systems II Operator Algorithms Chapter 15.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Query Evaluation. SQL to ERA SQL queries are translated into extended relational algebra. Query evaluation plans are represented as trees of relational.
Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.
1 Implementation of Relational Operations Module 5, Lecture 1.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
SPRING 2004CENG 3521 Join Algorithms Chapter 14. SPRING 2004CENG 3522 Schema for Examples Similar to old schema; rname added for variations. Reserves:
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing (overview)
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
Chapter 19 Query Processing and Optimization
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
Evaluation of Relational Operations. Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation.
1 Implementation of Relational Operations: Joins.
Query Processing Chapter 12
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 12 Query Processing and Optimization.
CSCE Database Systems Chapter 15: Query Execution 1.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Implementing Natural Joins, R. Ramakrishnan and J. Gehrke with corrections by Christoph F. Eick 1 Implementing Natural Joins.
12.1Database System Concepts - 6 th Edition Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Join Operation Sorting 、 Other.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapter 13: Query Processing.
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
CS4432: Database Systems II Query Processing- Part 3 1.
CS4432: Database Systems II Query Processing- Part 2.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations – Join Chapter 14 Ramakrishnan and Gehrke (Section 14.4)
Query Processing CS 405G Introduction to Database Systems.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Implementation of Database Systems, Jarek Gryz1 Evaluation of Relational Operations Chapter 12, Part A.
CS 540 Database Management Systems
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
Alon Levy 1 Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation. – Projection ( ) Deletes.
Query Processing and Query Optimization CS 157B Dennis Le Weishan Wang.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Evaluation of Relational Operations Chapter 14, Part A (Joins)
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Chapter 13: Query Processing
CS4432: Database Systems II Query Processing- Part 1 1.
CS 540 Database Management Systems
CS 440 Database Management Systems
Database Management System
Evaluation of Relational Operations
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Chapter 15 QUERY EXECUTION.
Evaluation of Relational Operations: Other Operations
Examples of Physical Query Plan Alternatives
File Processing : Query Processing
Relational Operations
Yan Huang - CSCI5330 Database Implementation – Access Methods
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
Overview of Query Evaluation
Implementation of Relational Operations
Lecture 23: Query Execution
Evaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques
Presentation transcript:

CS 338Query Evaluation7-1 Query Evaluation Lecture Topics Query interpretation Basic operations Costs of basic operations Examples Textbook Chapter 12

CS 338Query Evaluation7-2 Steps in query interpretation Translation –check SQL syntax –check existence of relations and attributes –replace views by their definitions –transform query into an internal form (similar to relational algebra) Optimization –generate alternative access plans, i.e., procedure, for processing the query –select an efficient access plan Processing –execute the access plan Data Delivery

CS 338Query Evaluation7-3 Example select V.Vno, Vname, count(*), sum(Amount) from Vendor V, Transaction T where V.Vno = T.Vno and V.Vno between 1000 and 2000 group by V.Vno, Vname having sum(Amount) > 100 The following is a possible access plan for this query: –Scan the Vendor table, select all tuples where Vno is between 1000 and 2000, eliminate attributes other than Vno and Vname, and place the result in a temporary relation R 1 –Join the tables R 1 and Transaction, eliminate attributes other than Vno, Vname, and Amount, and place the result in a temporary relation R 2. This may involve: sorting R 1 on Vno sorting Transaction on Vno merging the two sorted relations to produce R 2

CS 338Query Evaluation7-4 Perform grouping on R 2, and place the result in a temporary relation R 3. This may involve: –sorting R 2 on Vno and Vname –grouping tuples with identical values of Vno and Vname –counting the number of tuples in each group, and adding their Amounts Scan R 3, select all tuples with sum(Amount) > 100 to produce the result.... continued

CS 338Query Evaluation7-5 Pictorial Access Plan Result Vendor Transaction Scan (Sum(Amount) > 100) Grouping (Vno, Vname) Join (V.Vno = T.Vno) Scan (Vno between 1000 and 2000)

CS 338Query Evaluation7-6 Some basic query processing operations Tuple Selection –without an index –with a clustered index –with an unclustered index –with multiple indices Projection Joining –nested loop join –hash join –sort-merge join –and others... Grouping and Duplicate Elimination –by sorting –by hashing Sorting

CS 338Query Evaluation7-7 Tuple Selection sequential scanning: –when no appropriate index is available, we must check all file blocks holding tuples for the relation. –even if an index is available, scanning the entire relation may be faster in certain circumstances: the relation is very small the relation is large, but we expect most of the tuples in the relation to satisfy the selection criteria –E.g.: select Anum from Accounts where Balance > 1.00

CS 338Query Evaluation7-8 Joining relations select C.Cnum, Balance from Customer C, Accounts A where C.Cnum = A.Cnum Two relations can be joined using the “nested loop” join method. Conceptually, that method works like this: for each tuple c in Customer do for each tuple a in Accounts do if c.Cnum = a.Cnum then output c.Cnum,a.Balance end end

CS 338Query Evaluation7-9 Other techniques for join There are many other ways to perform a join. For example, if there is an index on the Cnum attribute of the Accounts relation, an index join could be used. Conceptually, that would work as follows: for each tuple c in Customer do use the index to find Accounts tuples a where a.Cnum matches c.Cnum if there are any such tuples a then output c.Cnum, a.Balance end end

CS 338Query Evaluation7-10 other join techniques: –sort-merge join: sort the tuples of Accounts and of Customer on their Cnum values, then merge the sorted relations. –hash join: assign each tuple of Accounts and of Customer to a “bucket” by applying a hash function to its Cnum value. Within each bucket, look for Account and Customer tuples that have matching Cnum values.... continued

CS 338Query Evaluation7-11 Costs of basic operations Alternative access plans may be compared according to cost. The cost of an access plan is the sum of the costs of its component operations. There are many possible cost metrics. However, most metrics reflect the amounts of system resources consumed by the access plan. System resources may include: –disk block I/O’s –processing time –network bandwidth

CS 338Query Evaluation7-12 A simple cost model The cost of an operation will be an estimate of the number of file block retrievals required to perform that operation. The cost of an operation will depend on the number of input tuples for that operation, and on the tuple blocking factor. The blocking factor is the number of tuples that are stored in each file block. For a relation R: –|R| will represent the number of tuples in R –b(R) will represent the blocking factor for R –|R|/b(R) is the number of blocks used to store R We will assume that an index search has a cost of 2.

CS 338Query Evaluation7-13 Cost of selection relation Mark = (Studnum, Course, Assignnum, Mark) select Studnum, Mark from Mark where Course = PHYS and Studnum = 100 and Mark > 90 Indices: –clustering index CourseInd on Course –non-clustering index StudnumInd on Studnum Assume: –|Mark| = –b(Mark) = 50 –500 different students –100 different courses –10 different assignments

CS 338Query Evaluation7-14 Selection Strategy: use CourseInd Assuming uniform distribution of tuples over the courses, there will be about |Mark|/100 = 100 tuples with Course = PHYS. Searching the CourseInd index has a cost of 2. Retrieval of the 100 matching tuples adds a cost of 100/b(Mark) data blocks, for a total cost of 4. Selection of N tuples from relation R using a clustered index has a cost of 2 + N/b(R)

CS 338Query Evaluation7-15 Selection strategy: use StudnumInd Assuming uniform distribution of tuples over student numbers, there will be about |Mark|/500 = 20 tuples for each student. Searching the StudnumInd has a cost of 2. Since this is not a clustered index, we will make the pessimistic assumption that each matching record is on a separate data block, i.e., 20 blocks will need to be read. The total cost is 22. Selection of N tuples from relation R using a non-clustered index has a cost of 2 + N

CS 338Query Evaluation7-16 Selection strategy: scan the relation The relation occupies 10,000/50 = 200 blocks, so 200 block I/O operations will be required. Selection of N tuples from relation R by scanning the entire relation has a cost of |R|/b(R)

CS 338Query Evaluation7-17 Cost of joining relation Mark = (Studnum, Course, Assignnum, Mark) relation Student = (Studnum, Surname, Initials, Address, Birthdate, Sex) selectSurname, Course, Assignnum, Mark from Mark natural join Student Assume: –|Mark| = 10,000 –b(Mark) = 50 –|Student| = 1,000 –b(Student) = 20

CS 338Query Evaluation7-18 Cost of nested-loop join for each Mark block M do for each Student block S do for each tuple m in M do for each tuple s in S do if m[Studnum] = s[Studnum] then join {m} and {s} and add to the result Each of the 200 blocks of Mark must be retrieved once. Each of the 50 blocks of Student must be retrieved once for each block of Mark, for a total of 200 x 50 = 10,000 student blocks retrieved. The total cost is 10,200 block retrievals.

CS 338Query Evaluation7-19 The cost of a nested-loop join of R 1 and R 2 is either: or depending on which relation is the outer relation continued

CS 338Query Evaluation7-20 Cost of index join Suppose there is a unique, non- clustered index on Student [Studnum] for each Mark block Bm do for each tuple m in Bm do use the index to locate Student tuples s with s[Studnum] = m[Studnum] join {m} and {s} and add to the result Each of Mark’s 200 blocks must be retrieved once. For each tuple of Mark, a single tuple of Student will be retrieved. This will require an index lookup and a data block retrieval, for a cost of 3. The total cost is * = 30,200 block retrievals.

CS 338Query Evaluation7-21 The cost of a unique index join, where R 1 is the outer relation, and R 2 is the inner, indexed relation, is: continued

CS 338Query Evaluation7-22 Summary An access plan is a detailed plan for the execution of a query. It describes: –the sequence of basic operations (select, project, join, etc.) used to process the query –how each operation will be implemented, e.g., which join method will be used, which indices will be used to perform a selection. Each operation may be assigned a cost, and an access plan may be assigned a cost by summing the costs of its component operations. The cost assigned to an operation depends on assumptions about the data in the relations.