
1 Optimization Recap and examples

2 Optimization introduction: Every SQL expression can be implemented in many different ways, and the alternatives can differ enormously in run time. Our aim is to introduce the basic hardware involved and the principles of optimization.

3 Disk-Memory-CPU [Figure: DISK (holding the Sailors and Reserves tables), Main Memory and CPU, illustrated with the query DELETE FROM Sailors WHERE sid=90.]

4 Hardware Recap: The DB is kept on the disk, which is divided into blocks. All processing of the information happens in main memory, so a block we want to access has to be brought from the disk into memory (and perhaps written back). Blocks are read from and written to the disk as single units. Reading or writing one block is an I/O operation, and it takes a lot of time compared with processing data that is already in memory.

5 Hardware Recap: We assume that every disk access takes the same constant time and that disk accesses alone determine the run time; writes to the disk are not counted. Every table in the DB is stored as a file on the disk, i.e., a collection of blocks. We deal with heap files, meaning the tuples in the file are stored in no particular order. Every block contains many tuples, and each tuple has a Record ID (RID) that states its location: (block number, tuple number within the block).

6 Sailors(SID, SNAME, rating, age) stored as a heap file of three blocks, with the RID of each tuple:
Block 1: Joe (1,1), Phil (1,2), Boe (1,3)
Block 2: Bill (2,1), Paul (2,2), Jim (2,3)
Block 3: Vicky (3,1)

7 The Sailors file has B blocks and t tuples. Q: What is the cost of each of the following queries?
SELECT * FROM Sailors
SELECT * FROM Sailors WHERE sname = 'Jim'
SELECT * FROM Sailors WHERE rating > 4
Answer: B I/Os for each of them. With no index and no order in the file, every block must be read to find the qualifying tuples.
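A minimal Python sketch of this cost model, assuming a heap file is just a list of blocks and that every block read counts as one I/O. The class `HeapFile` and its methods are hypothetical names for illustration; the names and RIDs come from the slide 6 layout, and the other columns are left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rid = Tuple[int, int]  # (block number, tuple number within the block), both counted from 1


class HeapFile:
    """Hypothetical heap file: tuples in no particular order, grouped into blocks.
    Every block brought from disk into memory is counted as one I/O."""

    def __init__(self, blocks: List[List[Dict]]) -> None:
        self.blocks = blocks
        self.io_count = 0

    def read_block(self, block_no: int) -> List[Dict]:
        self.io_count += 1
        return self.blocks[block_no - 1]

    def fetch(self, rid: Rid) -> Dict:
        """Direct access by RID costs a single I/O."""
        block_no, slot = rid
        return self.read_block(block_no)[slot - 1]

    def scan(self, predicate: Optional[Callable[[Dict], bool]] = None) -> List[Dict]:
        """Without order or an index, every block must be read, whatever the predicate."""
        result = []
        for block_no in range(1, len(self.blocks) + 1):
            for t in self.read_block(block_no):
                if predicate is None or predicate(t):
                    result.append(t)
        return result


# Slide 6 layout; only sname is filled in.
sailors = HeapFile([
    [{"sname": "Joe"}, {"sname": "Phil"}, {"sname": "Boe"}],
    [{"sname": "Bill"}, {"sname": "Paul"}, {"sname": "Jim"}],
    [{"sname": "Vicky"}],
])

jim = sailors.scan(lambda t: t["sname"] == "Jim")
print(len(jim), sailors.io_count)  # 1 3: one match, yet all B = 3 blocks were read

sailors.io_count = 0
print(sailors.fetch((2, 3))["sname"], sailors.io_count)  # Jim 1
```

The contrast between `scan` (always B I/Os) and `fetch` (one I/O once the RID is known) is exactly what an index is meant to exploit.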

8 Indexes on files: An index on a table is an additional file that helps access the data quickly. The index holds 'data entries' that point into the table file. An index can be structured as a B+ tree or based on a hash function.

9 Tree index on sname of Sailors
Root block: 'A'-'M' -> B1, 'N'-'Z' -> B2
Branch blocks: B1: 'A'-'G' -> L1, 'H'-'M' -> L2; B2: 'N'-'T' -> L3, 'U'-'Z' -> L4
Leaf blocks: L1: 'Bill' (2,1), 'Boe' (1,3); L2: 'Jim' (2,3), 'Joe' (1,1); L3: 'Paul' (2,2), 'Phil' (1,2); L4: 'Vicky' (3,1), …
Each leaf entry holds the RID of the corresponding tuple in the Sailors file of slide 6.

10 Tree index: The tree is kept balanced, and its entries are always kept in sorted order. The leaves point to the exact locations (RIDs) of the tuples. Getting to a leaf typically takes 2 to 3 I/Os. Each leaf points to the next and the previous leaf. A clustered index means that the index and the table are ordered by the same attribute.
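The three-level index of slide 9 can be modelled directly to see where the "2 to 3 I/Os to reach a leaf" comes from. This is a sketch under simplifying assumptions: each node is one block, separator keys are single letters taken from the slide's ranges, every node visit costs one I/O, and the names `Node`, `find_leaf`, `lookup` and `range_scan` are hypothetical. Only index I/Os are counted; fetching the tuples by RID costs extra.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rid = Tuple[int, int]  # (block number, tuple number within the block)


@dataclass
class Node:
    """One index block. Internal nodes use keys/children; leaves use entries/next."""
    keys: List[str] = field(default_factory=list)  # keys[i] is the lowest key under children[i + 1]
    children: List["Node"] = field(default_factory=list)
    entries: List[Tuple[str, Rid]] = field(default_factory=list)  # sorted (sname, RID) pairs
    next: Optional["Node"] = None  # link to the next leaf


def find_leaf(root: Node, key: str) -> Tuple[Node, int]:
    """Walk from the root to the leaf that may hold key; every node visited is one I/O."""
    node, ios = root, 1
    while node.children:
        node = node.children[bisect_right(node.keys, key)]
        ios += 1
    return node, ios


def lookup(root: Node, key: str) -> Tuple[List[Rid], int]:
    """Equality search (assumes all duplicates of key fit in one leaf, as in slide 9)."""
    leaf, ios = find_leaf(root, key)
    return [rid for k, rid in leaf.entries if k == key], ios


def range_scan(root: Node, low: str) -> Tuple[List[Rid], int]:
    """All RIDs with key > low: find the first leaf, then follow the sibling links."""
    leaf, ios = find_leaf(root, low)
    rids: List[Rid] = []
    while True:
        rids.extend(rid for k, rid in leaf.entries if k > low)
        if leaf.next is None:
            return rids, ios
        leaf = leaf.next
        ios += 1


# The index of slide 9: root -> branch blocks B1, B2 -> leaf blocks L1..L4.
l1 = Node(entries=[("Bill", (2, 1)), ("Boe", (1, 3))])
l2 = Node(entries=[("Jim", (2, 3)), ("Joe", (1, 1))])
l3 = Node(entries=[("Paul", (2, 2)), ("Phil", (1, 2))])
l4 = Node(entries=[("Vicky", (3, 1))])
l1.next, l2.next, l3.next = l2, l3, l4
b1 = Node(keys=["H"], children=[l1, l2])    # 'A'-'G' -> L1, 'H'-'M' -> L2
b2 = Node(keys=["U"], children=[l3, l4])    # 'N'-'T' -> L3, 'U'-'Z' -> L4
root = Node(keys=["N"], children=[b1, b2])  # 'A'-'M' -> B1, 'N'-'Z' -> B2

print(lookup(root, "Jim"))    # ([(2, 3)], 3): root, B1, L2
print(range_scan(root, "J"))  # ([(2, 3), (1, 1), (2, 2), (1, 2), (3, 1)], 5)
```

The range scan shows why the leaf-to-leaf links matter: after one descent, the remaining qualifying entries are read in order without going back through the root.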

11 Tree index on sname of Sailors
Sailors file: Block 1: Joe (1,1), Phil (1,2), Boe (1,3); Block 2: Bill (2,1), Joe (2,2), Jim (2,3); Block 3: Joe (3,1)
Leaf entries in key order: 'Bill' (2,1), 'Boe' (1,3), 'Jim' (2,3), 'Joe' (1,1), 'Joe' (2,2), 'Joe' (3,1), 'Phil' (1,2), …
How would the following queries be processed?
SELECT * FROM Sailors WHERE sname = 'Joe'
SELECT * FROM Sailors
SELECT * FROM Sailors WHERE sname > 'J'
Notice: the index is not clustered.

12 Tree index on sname of Sailors
Sailors file, now sorted on sname: Block 1: Bill (1,1), Boe (1,2), Jim (1,3); Block 2: Joe (2,1), Joe (2,2), Joe (2,3); Block 3: Phil (3,1)
Leaf entries in key order: 'Bill' (1,1), 'Boe' (1,2), 'Jim' (1,3), 'Joe' (2,1), 'Joe' (2,2), 'Joe' (2,3), 'Phil' (3,1), …
How would the following queries be processed?
SELECT * FROM Sailors WHERE sname = 'Joe'
SELECT * FROM Sailors
SELECT * FROM Sailors WHERE sname > 'J'
Notice: the index is clustered.
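One common way to compare slides 11 and 12 (a simplifying assumption, not stated on the slides) is to charge an unclustered index one data-block I/O per matching RID, while a clustered index reads each block holding matches only once, because those tuples are stored together. The sketch below applies that rule to the RIDs listed on the two slides, on top of a hypothetical 3 I/Os to reach the leaf and ignoring any extra leaf reads.

```python
from typing import List, Tuple

Rid = Tuple[int, int]
INDEX_IOS = 3  # hypothetical: root + branch + leaf, as in the three-level tree of slide 9


def unclustered_cost(rids: List[Rid]) -> int:
    # Each matching tuple may sit in a different block: one I/O per RID.
    return INDEX_IOS + len(rids)


def clustered_cost(rids: List[Rid]) -> int:
    # Matching tuples are contiguous, so each block that holds them is read once.
    return INDEX_IOS + len({block for block, _ in rids})


# sname = 'Joe'
joe_unclustered = [(1, 1), (2, 2), (3, 1)]  # slide 11
joe_clustered = [(2, 1), (2, 2), (2, 3)]    # slide 12
print(unclustered_cost(joe_unclustered), clustered_cost(joe_clustered))  # 6 4

# sname > 'J' (Jim, Joe, Joe, Joe, Phil)
gt_j_unclustered = [(2, 3), (1, 1), (2, 2), (3, 1), (1, 2)]  # slide 11
gt_j_clustered = [(1, 3), (2, 1), (2, 2), (2, 3), (3, 1)]    # slide 12
print(unclustered_cost(gt_j_unclustered), clustered_cost(gt_j_clustered))  # 8 6
```

On a file of only three blocks, a full scan (3 I/Os) beats both index plans for sname > 'J', and SELECT * simply scans the file. The index pays off only when few tuples qualify, and clustering matters most when many do.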

13 Hash index: Works in a similar way, but uses a hash function instead of a tree. It works only for equality conditions. Reaching the tuple location takes 1.2 I/Os on average.
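A toy static hash index on sname, to show why only equality conditions can use it: the bucket is chosen by hashing the search key, so keys that are neighbours in sort order end up in unrelated buckets. The bucket count and the character-sum hash function are arbitrary choices for illustration; a real system would use a better hash and store each bucket in disk blocks.

```python
from typing import Dict, List, Tuple

Rid = Tuple[int, int]
Buckets = Dict[int, List[Tuple[str, Rid]]]
N_BUCKETS = 4  # hypothetical; each bucket would normally be one disk block (plus overflow)


def bucket_of(key: str) -> int:
    # Deterministic toy hash: sum of character codes modulo the bucket count.
    return sum(map(ord, key)) % N_BUCKETS


def build(entries: List[Tuple[str, Rid]]) -> Buckets:
    buckets: Buckets = {b: [] for b in range(N_BUCKETS)}
    for key, rid in entries:
        buckets[bucket_of(key)].append((key, rid))
    return buckets


def lookup(buckets: Buckets, key: str) -> List[Rid]:
    # Equality search: read exactly one bucket, then filter it.
    return [rid for k, rid in buckets[bucket_of(key)] if k == key]


# Data entries for the Sailors file of slide 6.
index = build([("Joe", (1, 1)), ("Phil", (1, 2)), ("Boe", (1, 3)), ("Bill", (2, 1)),
               ("Paul", (2, 2)), ("Jim", (2, 3)), ("Vicky", (3, 1))])
print(lookup(index, "Jim"))  # [(2, 3)]
# A range condition such as sname > 'J' cannot pick a bucket, so every bucket would be read.
```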

14 Natural Join: We want to compute SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid
Naïve algorithm:
  foreach tuple r in R
    foreach tuple s in S
      if r.sid = s.sid, add (r, s) to the result
Cost: $B_R + t_R \cdot B_S$
Running example data: $t_R = 5{,}000$, $t_S = 10{,}000$, 50 tuples per block (so $B_R = 100$, $B_S = 200$), 12 buffer pages.
Cost: $100 + 5{,}000 \cdot 200 = 1{,}000{,}100$
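The naïve algorithm above, written as runnable Python together with its cost formula. The join is a direct transcription of the two nested loops; the tiny Reserves and Sailors lists are hypothetical sample data (sids taken from the Sailors example on slide 19), and `nested_loops_cost` simply evaluates $B_R + t_R \cdot B_S$.

```python
from typing import Dict, List, Tuple


def nested_loops_join(r: List[Dict], s: List[Dict], key: str) -> List[Tuple[Dict, Dict]]:
    """Tuple-at-a-time nested loops: for every tuple of R, scan all of S."""
    result = []
    for rt in r:
        for st in s:
            if rt[key] == st[key]:
                result.append((rt, st))
    return result


def nested_loops_cost(b_r: int, t_r: int, b_s: int) -> int:
    # Read R once (B_R blocks), and scan all B_S blocks of S once per tuple of R.
    return b_r + t_r * b_s


reserves = [{"sid": 22, "bid": 101}, {"sid": 58, "bid": 103}]  # hypothetical tuples
sailors = [{"sid": 22, "sname": "dustin"}, {"sid": 31, "sname": "lubber"},
           {"sid": 58, "sname": "rusty"}]
print(len(nested_loops_join(reserves, sailors, "sid")))  # 2
print(nested_loops_cost(100, 5_000, 200))                # 1000100
```

The cost is dominated by $t_R \cdot B_S$: S is rescanned once per R tuple, which is what the four algorithms on the next slide avoid.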

15 Natural Join: To compute SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid we have four alternative algorithms:
1. Block Nested Loops Join
2. Index Nested Loops Join
3. Sort-Merge Join
4. Hash Join
All of them assume there is not enough memory to hold the smaller of the two relations plus 2 additional blocks. (If there were, we could load the smaller relation into memory and scan the larger one once, at a cost of $B_R + B_S$.)

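16 Block Nested Loops Join: Suppose there are B available blocks in memory, relation R has $B_R$ blocks, relation S has $B_S$ blocks, and $B_R < B_S$. Until all blocks of R have been read:
– read the next B-2 blocks of R;
– read all blocks of S one by one, and write out the matching pairs.
(The two remaining blocks serve as the input buffer for S and the output buffer.)
Run time: $B_R + B_S \cdot \lceil B_R/(B-2) \rceil$. Running example: $100 + 200 \cdot \lceil 100/10 \rceil = 2{,}100$.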
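A small simulation of the loop above that counts block reads, to check that it agrees with the formula. This is a sketch assuming relations are given as lists of blocks (each block a list of dict tuples) and that writing the output is free, as on the slides; the function names and the toy relations are hypothetical.

```python
from math import ceil
from typing import Dict, List, Tuple

Block = List[Dict]


def block_nested_loops_join(r: List[Block], s: List[Block], key: str,
                            buffers: int) -> Tuple[List[Tuple[Dict, Dict]], int]:
    """Return (result, block reads). B-2 buffers hold a chunk of R, one buffer holds the
    current block of S and one is the output buffer (output writes are not counted)."""
    chunk_size = buffers - 2
    result: List[Tuple[Dict, Dict]] = []
    ios = 0
    for start in range(0, len(r), chunk_size):
        chunk = r[start:start + chunk_size]
        ios += len(chunk)                  # read the next (up to) B-2 blocks of R
        for s_block in s:                  # then scan all of S once for this chunk
            ios += 1
            for st in s_block:
                for r_block in chunk:
                    for rt in r_block:
                        if rt[key] == st[key]:
                            result.append((rt, st))
    return result, ios


def bnl_cost(b_r: int, b_s: int, buffers: int) -> int:
    return b_r + b_s * ceil(b_r / (buffers - 2))


# Toy relations with one tuple per block: R has 15 blocks (sids 0..14),
# S has 20 blocks (even sids 0..38), and B = 12 buffers.
r = [[{"sid": i}] for i in range(15)]
s = [[{"sid": i}] for i in range(0, 40, 2)]
res, ios = block_nested_loops_join(r, s, "sid", buffers=12)
print(len(res), ios, bnl_cost(15, 20, 12))  # 8 55 55
print(bnl_cost(100, 200, 12))               # 2100
```

Using the smaller relation as the outer one keeps $\lceil B_R/(B-2) \rceil$, the number of passes over S, as small as possible.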

17 Index Nested Loops Join: Suppose there is an index on sid of Sailors (S). Until all blocks of R have been read:
– read a block of R;
– for each tuple in the block, use the index on S to locate the matching tuples of S.
Let X be the cost of reading the S tuples that match a single R tuple. Run time: $B_R + t_R \cdot X$. If the index is clustered, X is about 2 to 4; if it is not clustered, X has to be estimated from the number of matching tuples, since each of them may require its own I/O. Running example with X = 3: $100 + 5{,}000 \cdot 3 = 15{,}100$.
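A sketch of the index nested loops join that charges each probe the slide's X. Assumptions: a Python dict stands in for the index on Sailors.sid, and every probe costs exactly `x` I/Os whether or not it finds a match; the function names and the R tuples are hypothetical.

```python
from typing import Dict, List, Tuple


def index_nested_loops_join(r_blocks: List[List[Dict]], s_index: Dict[int, List[Dict]],
                            key: str, x: int) -> Tuple[List[Tuple[Dict, Dict]], int]:
    """Scan R block by block and probe the index on S for every R tuple.
    Each probe is charged x I/Os (the slide's X), whether or not it finds a match."""
    result: List[Tuple[Dict, Dict]] = []
    ios = 0
    for block in r_blocks:
        ios += 1                           # read one block of R
        for rt in block:
            ios += x                       # locate and read the matching S tuples
            for st in s_index.get(rt[key], []):
                result.append((rt, st))
    return result, ios


def inl_cost(b_r: int, t_r: int, x: int) -> int:
    return b_r + t_r * x


# Dict standing in for the index on Sailors.sid (sids and names from slide 19).
sailors = [{"sid": 22, "sname": "dustin"}, {"sid": 31, "sname": "lubber"},
           {"sid": 58, "sname": "rusty"}]
s_index: Dict[int, List[Dict]] = {}
for st in sailors:
    s_index.setdefault(st["sid"], []).append(st)

r_blocks = [[{"sid": 22}, {"sid": 58}], [{"sid": 31}, {"sid": 99}]]  # hypothetical R
res, ios = index_nested_loops_join(r_blocks, s_index, "sid", x=3)
print(len(res), ios, inl_cost(2, 4, 3))  # 3 14 14
print(inl_cost(100, 5_000, 3))           # 15100
```

Unlike block nested loops, the cost here grows with the number of R tuples, not with the size of S.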

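18 Q: So when would we typically choose an index nested loops join over a block nested loops join? A: Look at the inequality $B_R + t_R \cdot X < B_R + B_S \cdot \lceil B_R/(B-2) \rceil$, which simplifies to $t_R \cdot X < B_S \cdot \lceil B_R/(B-2) \rceil$: the index wins when the outer relation has few tuples relative to the number of blocks of the inner relation.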
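The inequality can be checked numerically. This sketch reuses the two cost formulas from slides 16 and 17 with the running example ($B_R = 100$, $t_R = 5{,}000$, B = 12, X = 3) and varies only $B_S$; the function names are hypothetical.

```python
from math import ceil


def bnl_cost(b_r: int, b_s: int, buffers: int) -> int:
    return b_r + b_s * ceil(b_r / (buffers - 2))


def inl_cost(b_r: int, t_r: int, x: int) -> int:
    return b_r + t_r * x


def prefer_index_nl(b_r: int, t_r: int, b_s: int, buffers: int, x: int) -> bool:
    # Both plans read R once, so this is t_R * X < B_S * ceil(B_R / (B - 2)).
    return inl_cost(b_r, t_r, x) < bnl_cost(b_r, b_s, buffers)


print(prefer_index_nl(100, 5_000, 200, 12, 3))      # False: 15,100 vs 2,100
print(prefer_index_nl(100, 5_000, 1_500, 12, 3))    # False: a tie at 15,100
print(prefer_index_nl(100, 5_000, 1_501, 12, 3))    # True
print(prefer_index_nl(100, 5_000, 100_000, 12, 3))  # True: 15,100 vs 1,000,100
```

With these numbers the break-even point is $B_S = t_R \cdot X / \lceil B_R/(B-2) \rceil = 15{,}000/10 = 1{,}500$ blocks; beyond it, probing the index is cheaper than rescanning S.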

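19 Sort-Merge Join: Sort both relations on the join column, then merge them: scan the two sorted relations in parallel and output every pair of tuples with equal sid values.
Example: Reserves (sid, bid, day, agent), sorted on sid, holds six reservations (agents Joe, Frank, Joe, Sam, Sam, Frank). Sailors (sid, sname, rating, age), sorted on sid:
22 dustin 7 45
28 yuppy 9 35
31 lubber 8 55
36 lubber 6 36
44 guppy 5 35
58 rusty 10 35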
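An in-memory sketch of the two phases, assuming tuples are dicts. It shows the logic only: a real DBMS sorts with an external sort and merges block by block. The Sailors rows are the ones on this slide; the Reserves sid values are hypothetical, chosen so that several reservations share a sid and the merge has to join whole groups ("partitions") of equal keys.

```python
from typing import Dict, List, Tuple


def sort_merge_join(r: List[Dict], s: List[Dict], key: str) -> List[Tuple[Dict, Dict]]:
    r = sorted(r, key=lambda t: t[key])   # phase 1: sort both inputs on the join column
    s = sorted(s, key=lambda t: t[key])
    result = []
    i = j = 0
    while i < len(r) and j < len(s):      # phase 2: merge
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            v = r[i][key]
            # Find the partition of equal keys on each side and join them pairwise.
            i_end = i
            while i_end < len(r) and r[i_end][key] == v:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][key] == v:
                j_end += 1
            for rt in r[i:i_end]:
                for st in s[j:j_end]:
                    result.append((rt, st))
            i, j = i_end, j_end
    return result


sailors = [{"sid": 22, "sname": "dustin", "rating": 7, "age": 45},
           {"sid": 28, "sname": "yuppy", "rating": 9, "age": 35},
           {"sid": 31, "sname": "lubber", "rating": 8, "age": 55},
           {"sid": 36, "sname": "lubber", "rating": 6, "age": 36},
           {"sid": 44, "sname": "guppy", "rating": 5, "age": 35},
           {"sid": 58, "sname": "rusty", "rating": 10, "age": 35}]
reserves = [{"sid": 28, "agent": "Joe"}, {"sid": 28, "agent": "Frank"},   # hypothetical sids
            {"sid": 31, "agent": "Joe"}, {"sid": 31, "agent": "Sam"},
            {"sid": 31, "agent": "Sam"}, {"sid": 58, "agent": "Frank"}]
pairs = sort_merge_join(reserves, sailors, "sid")
print(len(pairs))  # 6
```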

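20 Run time of Sort-Merge: Let M and N be the numbers of blocks of the two relations.
Sorting: $M \log M + N \log N$
Merging: $M + N$, provided no partition (group of tuples with the same join value) has to be scanned twice.
Total: $M \log M + N \log N + M + N$
This is especially good if one or both relations are already sorted on the join column. Running example (logarithms base 2, rounded up): $100 \cdot 7 + 200 \cdot 8 + 100 + 200 = 2{,}600$.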
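Putting the running example's numbers into the cost formulas of slides 14, 16, 17 and 20 gives a side-by-side comparison. This is only the slides' simplified model (X = 3, base-2 logarithms rounded up for the sort cost), not a measurement; hash join is left out because no cost formula for it appears here.

```python
from math import ceil, log2


def smj_cost(m: int, n: int) -> int:
    # Slide 20's model: M log M + N log N to sort, M + N to merge (log base 2, rounded up).
    return m * ceil(log2(m)) + n * ceil(log2(n)) + m + n


# Running example: B_R = 100, B_S = 200, t_R = 5,000, 12 buffer pages, X = 3.
b_r, b_s, t_r, buffers, x = 100, 200, 5_000, 12, 3
costs = {
    "nested loops": b_r + t_r * b_s,
    "block nested loops": b_r + b_s * ceil(b_r / (buffers - 2)),
    "index nested loops": b_r + t_r * x,
    "sort-merge": smj_cost(b_r, b_s),
}
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:20} {cost:>9,}")
```

This prints the plans from cheapest to most expensive: block nested loops (2,100), sort-merge (2,600), index nested loops (15,100) and naïve nested loops (1,000,100).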

21 Question: Suppose tuple size = 100 bytes, number of tuples (employees) = 3,000, and page size = 1,000 bytes. You have an unclustered index on hobby, and you know that 50 employees collect stamps. Would you use the index? What if there were 1,000 stamp lovers?
SELECT E.dno FROM Employees E WHERE E.hobby = 'stamps'
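A worked version of the arithmetic behind this question, under the slides' cost model. Assumptions: reaching the leaf costs a hypothetical 3 I/Os (slide 10 says "typically 2 to 3"), extra leaf reads are ignored, and with an unclustered index each matching employee may sit on a different page, so each costs one I/O.

```python
from math import ceil

TUPLE_SIZE, N_TUPLES, PAGE_SIZE = 100, 3_000, 1_000
INDEX_IOS = 3  # hypothetical cost of reaching the leaf

tuples_per_page = PAGE_SIZE // TUPLE_SIZE   # 10
pages = ceil(N_TUPLES / tuples_per_page)    # 300
scan_cost = pages                           # a heap-file scan reads every page


def unclustered_index_cost(matches: int) -> int:
    # Worst case for an unclustered index: one page read per matching tuple.
    return INDEX_IOS + matches


for matches in (50, 1_000):
    cost = unclustered_index_cost(matches)
    choice = "use the index" if cost < scan_cost else "scan the file"
    print(matches, cost, scan_cost, choice)
# 50 53 300 use the index
# 1000 1003 300 scan the file
```

The break-even point under this model is roughly one matching tuple per page of the file: once more than about 300 employees qualify, the unclustered index costs more than reading the whole file.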

22 Question 2: Tuple lengths and counts: Emp: 20 bytes, 20,000 tuples; Dept: 40 bytes, 5,000 tuples. Pages hold 4,000 bytes, and there are 12 buffer pages. Which join algorithm would you use if there is an unclustered tree index on E.eid? And if it is clustered?
SELECT E.ename FROM Employees E, Departments D WHERE E.eid = D.eid
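The sizes work out to 200 Emp tuples per page (100 pages) and 100 Dept tuples per page (50 pages). The sketch below plugs them into the slides' formulas. Assumptions: the index on E.eid makes Emp the inner relation and Dept the outer one; X = 3 for the clustered case is borrowed from slide 17, and an unclustered index can only make X larger.

```python
from math import ceil, log2

PAGE, BUFFERS = 4_000, 12
emp_pages = ceil(20_000 / (PAGE // 20))   # 200 tuples per page -> 100 pages
dept_pages = ceil(5_000 / (PAGE // 40))   # 100 tuples per page -> 50 pages
dept_tuples = 5_000


def bnl(outer: int, inner: int) -> int:
    return outer + inner * ceil(outer / (BUFFERS - 2))


def inl(outer_pages: int, outer_tuples: int, x: int) -> int:
    return outer_pages + outer_tuples * x


def smj(m: int, n: int) -> int:
    return m * ceil(log2(m)) + n * ceil(log2(n)) + m + n


print("BNL, Dept outer:", bnl(dept_pages, emp_pages))               # 550
print("BNL, Emp outer:", bnl(emp_pages, dept_pages))                # 600
print("INL, Dept outer, X=3:", inl(dept_pages, dept_tuples, 3))     # 15050
print("SMJ:", smj(emp_pages, dept_pages))                           # 1150
```

Under this model block nested loops with Dept as the outer relation is cheapest, and index nested loops stays far behind whether or not the index is clustered, because 5,000 probes cost more than a few scans of a 100-page file.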