Query Compiler: 16.7 Completing the Physical Query-Plan CS257 Spring 2009 Professor Tsau Lin Student: Suntorn Sae-Eung ID: 212.

Outline
16.7 Completing the Physical-Query-Plan
I. Choosing a Selection Method
II. Choosing a Join Method
III. Pipelining Versus Materialization
IV. Pipelining Unary Operations
V. Pipelining Binary Operations

Before Completing the Physical-Query-Plan
By this point, the query has been:
1. Parsed and preprocessed (16.1)
2. Converted to logical query plans (16.3)
3. Annotated with estimated costs of operations (16.4)
4. Evaluated by cost-based plan selection (16.5)
5. Given a join order, chosen by weighing the costs of the join operations (16.6)

16.7 Completing the Physical-Query-Plan
Three topics relate to turning a logical plan (LP) into a complete physical plan:
1. Choosing physical implementations, such as selection and join methods
2. Deciding how to handle intermediate results (materialized or pipelined)
3. Notation for physical-query-plan operators

I. Choosing a Selection Method (A)
Algorithm for each selection operator:
1. Can we use an existing index on an attribute? If yes, use an index-scan; otherwise, use a table-scan.
2. After retrieving all tuples that satisfy the condition in (1), filter them with the remaining selection conditions.

Choosing a Selection Method (A) (cont.)
Recall: cost of a query = number of disk I/O's.
How the costs of various plans for a σ_C(R) operation are estimated:
1. Cost of the table-scan algorithm
   a) B(R) if R is clustered
   b) T(R) if R is not clustered
2. Cost of a plan that picks an equality term (e.g. a = 10) with an index-scan
   a) B(R) / V(R, a) with a clustering index
   b) T(R) / V(R, a) with a nonclustering index
3. Cost of a plan that picks an inequality term (e.g. b < 20) with an index-scan
   a) B(R) / 3 with a clustering index
   b) T(R) / 3 with a nonclustering index

Example
Selection: σ_{x=1 AND y=2 AND z<5}(R)
- The parameters of R(x, y, z) are T(R) = 5000, B(R) = 200, V(R, x) = 100, and V(R, y) = 500.
- Relation R is clustered.
- x and y have nonclustering indexes; only the index on z is clustering.

Example (cont.)
Selection options:
1. Table-scan, then filter on x, y, z. Cost is B(R) = 200, since R is clustered.
2. Use the index on x = 1, then filter on y, z. Cost is 50, since T(R) / V(R, x) = 5000/100 = 50 tuples and the index is not clustering.
3. Use the index on y = 2, then filter on x, z. Cost is 10, since T(R) / V(R, y) = 5000/500 = 10 tuples and the index is nonclustering.
4. Index-scan on the clustering index with z < 5, then filter on x, y. Cost is about B(R) / 3 = 67.

Example (cont.)
Costs:
option 1 = 200
option 2 = 50
option 3 = 10
option 4 = 67
The lowest cost is option 3. Therefore, the preferred physical plan
1. retrieves all tuples with y = 2, then
2. filters them on the remaining two conditions (x = 1 and z < 5).
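The comparison of the four options can be reproduced with a short Python sketch. The function names and the dictionary layout are illustrative assumptions, not part of the slides; only the statistics and the cost formulas come from the example, and fractional costs are rounded up as the slides do for B(R)/3.

```python
import math

# Statistics for R(x, y, z), taken from the example slides.
T_R = 5000                  # T(R): number of tuples
B_R = 200                   # B(R): number of blocks
V = {"x": 100, "y": 500}    # V(R, a): number of distinct values of attribute a


def table_scan_cost(clustered):
    """B(R) for a clustered relation, T(R) otherwise."""
    return B_R if clustered else T_R


def equality_index_cost(attr, clustering):
    """B(R)/V(R,a) with a clustering index, T(R)/V(R,a) with a nonclustering one."""
    base = B_R if clustering else T_R
    return math.ceil(base / V[attr])


def inequality_index_cost(clustering):
    """B(R)/3 with a clustering index, T(R)/3 with a nonclustering one."""
    base = B_R if clustering else T_R
    return math.ceil(base / 3)


# Hypothetical labels for the four options listed in the example.
options = {
    "1: table-scan": table_scan_cost(clustered=True),
    "2: index on x": equality_index_cost("x", clustering=False),
    "3: index on y": equality_index_cost("y", clustering=False),
    "4: index on z": inequality_index_cost(clustering=True),
}

best = min(options, key=options.get)
print(options)
print("cheapest option:", best)
```

Under these assumptions the minimum is the index-scan on y, matching the plan chosen in the example.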

II. Choosing a Join Method
Determine the costs associated with each join algorithm:
1. One-pass join or nested-loop join: suitable when enough buffers can be devoted to the join.
2. Sort-join: preferred when the join attributes are already sorted, or when two or more joins are on the same attribute, such as (R(a, b) ⋈ S(a, c)) ⋈ T(a, d), where sorting R and S on a leaves the result of R ⋈ S sorted on a, so it can be used directly in the next join.

Choosing a Join Method (cont.)
3. Index-join: for a join with a good chance of using an index created on the join attribute, such as R(a, b) ⋈ S(b, c).
4. Hash-join: the best choice for unsorted or unindexed relations that need a multipass join.
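The four rules above can be read as a decision procedure. The Python sketch below is one possible encoding, not the textbook's algorithm: the function name, its parameters, and the order in which the rules are tried are all assumptions made for illustration. The one-pass test uses the standard condition that the smaller relation must fit in M - 1 buffers.

```python
def choose_join_method(b_smaller, m_buffers, inputs_sorted=False,
                       shared_join_attribute=False, index_on_join_attr=False):
    """Pick a join method from coarse plan properties (illustrative only).

    b_smaller             -- blocks occupied by the smaller argument
    m_buffers             -- main-memory buffers available (M)
    inputs_sorted         -- arguments are already sorted on the join attribute
    shared_join_attribute -- a later join uses the same join attribute
    index_on_join_attr    -- an index exists on the join attribute
    """
    # Rule 1: a one-pass join needs the smaller relation to fit in M - 1 buffers.
    if b_smaller <= m_buffers - 1:
        return "one-pass join"
    # Rule 2: sorted inputs, or a sort order that a later join can reuse.
    if inputs_sorted or shared_join_attribute:
        return "sort-join"
    # Rule 3: an index on the join attribute is available.
    if index_on_join_attr:
        return "index-join"
    # Rule 4: fall back to hashing for a multipass join.
    return "hash-join"


print(choose_join_method(b_smaller=49, m_buffers=101))
print(choose_join_method(b_smaller=5000, m_buffers=101, inputs_sorted=True))
print(choose_join_method(b_smaller=5000, m_buffers=101, index_on_join_attr=True))
print(choose_join_method(b_smaller=5000, m_buffers=101))
```

A real optimizer would compare estimated costs rather than apply fixed priorities; the fixed order here only makes the slide's rules concrete.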

III. Pipelining Versus Materialization
Materialization (the naïve way)
- Store the (intermediate) result of each operation on disk.
Pipelining (the more efficient way)
- Interleave the execution of several operations; the tuples produced by one operation are passed directly to the operation that uses them.
- Keep the (intermediate) result of each operation in a buffer in main memory.

IV. Pipelining Unary Operations
Unary operations are either tuple-at-a-time or full-relation. Selection and projection are the best candidates for pipelining.
[Figure: R → input buffer → unary operation → output buffer → input buffer → unary operation → output buffer; M-1 buffers]

Pipelining Unary Operations (cont.)
Pipelined unary operations are implemented by iterators.
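As a rough illustration of the iterator idea, the sketch below chains a selection and a projection with Python generators, so each tuple flows through both operators without the intermediate relation ever being stored. Generators stand in for an Open/GetNext/Close style iterator interface; the function names and the sample relation are assumptions made for this example.

```python
def table_scan(relation):
    """Leaf iterator: yield the tuples of a stored relation one at a time."""
    for t in relation:
        yield t


def select(predicate, child):
    """Selection: pass on only the tuples from child that satisfy predicate."""
    for t in child:
        if predicate(t):
            yield t


def project(attrs, child):
    """Projection: keep only the listed attributes of each tuple from child."""
    for t in child:
        yield {a: t[a] for a in attrs}


# Hypothetical sample relation R(x, y, z), represented as a list of dicts.
R = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 5, "z": 9},
    {"x": 4, "y": 2, "z": 1},
]

# Pipelined plan: projection on (x, z) over selection y = 2 over a scan of R.
plan = project(["x", "z"], select(lambda t: t["y"] == 2, table_scan(R)))

# Tuples are pulled through the whole pipeline only as the consumer asks for them.
result = list(plan)
print(result)
```

Nothing runs until the consumer starts pulling from `plan`, which mirrors how a pipelined operator produces output only on demand.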

V. Pipelining Binary Operations
Binary operations: ∪, ∩, −, ⋈, ×
The results of binary operations can also be pipelined.
Use one buffer to pass the result to its consumer, one block at a time.
The extended example shows the tradeoffs and opportunities.

Example
Consider a physical query plan for the expression (R(w, x) ⋈ S(x, y)) ⋈ U(y, z).
Assumptions:
- R occupies 5,000 blocks; S and U each occupy 10,000 blocks.
- The intermediate result R ⋈ S occupies k blocks, for some k.
- Both joins will be implemented as hash-joins, either one-pass or two-pass depending on k.
- There are 101 buffers available.

Example (cont.)
First consider the join R ⋈ S; neither relation fits in the buffers.
- A two-pass hash-join is needed; it partitions R into 100 buckets (the maximum possible), each of 50 blocks.
- The second pass of the hash-join uses 51 buffers, leaving the remaining 50 buffers for joining the result of R ⋈ S with U.
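The buffer arithmetic on this slide can be checked with a few lines of Python. The variable names are illustrative assumptions; the reasoning is that partitioning uses M - 1 output buckets, and the second pass holds one bucket of R in memory plus one buffer for reading the matching bucket of S.

```python
import math

M = 101        # buffers available
B_R = 5000     # blocks of R (the smaller, build relation)

# First pass: one buffer reads input, the other M - 1 each collect one bucket.
num_buckets = M - 1
bucket_size_R = math.ceil(B_R / num_buckets)

# Second pass: one whole bucket of R in memory plus one input buffer for S.
second_pass_buffers = bucket_size_R + 1

# Buffers left over for joining the result of R join S with U.
free_buffers = M - second_pass_buffers

print(num_buckets, bucket_size_R, second_pass_buffers, free_buffers)
```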

Example (cont.)
Case 1: suppose k ≤ 49, so the result of R ⋈ S occupies at most 49 blocks.
Steps:
1. Pipeline the result of R ⋈ S into 49 buffers.
2. Organize them for lookup as a hash table.
3. Use the one remaining buffer to read each block of U in turn.
4. Execute the second join as a one-pass join.

Example (cont.)
The total number of disk I/O's is 55,000:
- 45,000 for the two-pass hash-join of R and S
- 10,000 to read U for the one-pass hash-join of (R ⋈ S) ⋈ U

Example (cont.)
Case 2: suppose 49 < k ≤ 5,000. We can still pipeline, but need another strategy, in which the intermediate result is joined with U in a 50-bucket, two-pass hash-join.
Steps:
1. Before starting on R ⋈ S, hash U into 50 buckets of 200 blocks each.
2. Perform the two-pass hash-join of R and S using 51 buffers as in case 1, placing the results in the 50 remaining buffers to form 50 buckets for the join of R ⋈ S with U.
3. Finally, join R ⋈ S with U bucket by bucket.

Example (cont.)
The number of disk I/O's is:
- 20,000 to read U and write its tuples into buckets
- 45,000 for the two-pass hash-join R ⋈ S
- k to write out the buckets of R ⋈ S
- k + 10,000 to read the buckets of R ⋈ S and U in the final join
The total cost is 75,000 + 2k.

Example (cont.)
Comparing the growth in disk I/O's between case 1 and case 2:
- k ≤ 49 (case 1): 55,000 disk I/O's
- 49 < k ≤ 5,000 (case 2):
  k = 50: 75,000 + (2 × 50) = 75,100
  k = 51: 75,000 + (2 × 51) = 75,102
  k = 52: 75,000 + (2 × 52) = 75,104
Notice: the number of I/O's jumps discontinuously, from 55,000 to 75,100, as k increases from 49 to 50.

Example (cont.)
Case 3: k > 5,000. We cannot perform the two-pass join in the 50 available buffers if the result of R ⋈ S is pipelined.
Steps:
1. Compute R ⋈ S using a two-pass join and store the result on disk.
2. Join the result of (1) with U, using a two-pass join.

Example (cont.)
The number of disk I/O's is:
- 45,000 for the two-pass hash-join of R and S
- k to store R ⋈ S on disk
- 30,000 + 3k for the two-pass join of U with R ⋈ S
The total cost is 75,000 + 4k.

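Example (cont.)
In summary, the cost of the physical plan as a function of the size k of R ⋈ S:

Range of k      | Intermediate result | Final join          | Disk I/O's
k ≤ 49          | pipelined           | one-pass            | 55,000
49 < k ≤ 5,000  | pipelined           | 50-bucket, two-pass | 75,000 + 2k
k > 5,000       | materialized        | two-pass            | 75,000 + 4k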
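The three cases can be folded into a single cost function. The sketch below is an illustration under the example's assumptions (B(R) = 5,000, B(S) = B(U) = 10,000, and a cost of 3(B(X) + B(Y)) for a two-pass hash-join of X and Y); the function name `plan_cost` is hypothetical.

```python
B_R, B_S, B_U = 5000, 10000, 10000   # block counts from the example


def plan_cost(k):
    """Disk I/O's for (R join S) join U when R join S occupies k blocks."""
    join_rs = 3 * (B_R + B_S)            # two-pass hash-join of R and S: 45,000
    if k <= 49:
        # Case 1: pipeline R join S into memory, read U once (one-pass join).
        return join_rs + B_U
    if k <= 5000:
        # Case 2: pre-hash U (read + write), write the k blocks of buckets
        # of R join S, then read both sets of buckets in the final join.
        return 2 * B_U + join_rs + k + (k + B_U)
    # Case 3: materialize R join S (k writes), then two-pass hash-join with U.
    return join_rs + k + 3 * (k + B_U)


for k in (49, 50, 51, 52, 5000, 5001):
    print(k, plan_cost(k))
```

Evaluating the function on either side of each boundary shows the jump at k = 50 and a second jump just above k = 5,000, where the plan switches from pipelining to materialization.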

Questions & Answers

Thank you for your attention

Reference
[1] H. Garcia-Molina, J. Ullman, and J. Widom, "Database Systems: The Complete Book," 2nd ed., p , Prentice Hall, New Jersey, 2008.