CS143:Evaluation and Optimization

Slides:



Advertisements
Similar presentations
Chapter 13: Query Processing
Advertisements

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL.
1 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 R S C C R JOIN S?
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing (overview)
1 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 R S C C R JOIN S?
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
Query Evaluation and Optimization Main Steps 1.Translate into RA: select/project/join 2.Greedy optimization of RA: by pushing selection and projection.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 242 Database Systems II Query Execution.
CPS216: Advanced Database Systems Notes 06:Query Execution (Sort and Join operators) Shivnath Babu.
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 12 Query Processing and Optimization.
CPS216: Advanced Database Systems Notes 07:Query Execution Shivnath Babu.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
CPS216: Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
CPS216: Advanced Database Systems Notes 07:Query Execution (Sort and Join operators) Shivnath Babu.
CS4432: Database Systems II Query Processing- Part 2.
CPS216: Advanced Database Systems Notes 07:Query Execution (Sort and Join operators) Shivnath Babu.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
CS 540 Database Management Systems
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
CS4432: Database Systems II Query Processing- Part 1 1.
Chapter 4: Query Processing
CS 540 Database Management Systems
CS 440 Database Management Systems
Database Management System
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
Chapter 13: Query Processing
Chapter 12: Query Processing
Query Processing.
Chapter 13: Query Processing
Database Management Systems (CS 564)
File Processing : Query Processing
File Processing : Query Processing
Dynamic Hashing Good for database that grows and shrinks in size
Query Processing and Optimization
Yan Huang - CSCI5330 Database Implementation – Access Methods
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Query Processing.
Chapter 13: Query Processing
CPT-S 415 Big Data Yinghui Wu EME B45 1.
Query processing and optimization
Chapter 13: Query Processing
Example R1 R2 over common attribute C
Chapter 12: Query Processing
Chapter 13: Query Processing
Highlights of Query Processing And Optimization
Performance Join Operator Select * from R, S where R.a = S.a;
Module 13: Query Processing
Chapters 15 and 16b: Query Optimization
Chapter 13: Query Processing
Chapter 12: Query Processing
Lecture 2- Query Processing (continued)
Chapter 13: Query Processing
Chapter 13: Query Processing
Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
Query processing and optimization
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Chapter 13: Query Processing
C. Faloutsos Query Optimization – part 2
Chapter 13: Query Processing
Chapter 13: Query Processing
Yan Huang - CSCI5330 Database Implementation – Query Processing
Chapter 12: Query Processing
Presentation transcript:

CS143:Evaluation and Optimization Ch 13-14

Basic Steps in Query Processing Parsing and translation: ... Translation of SQL queries into naïve RA Optimization: of RA expressions by Pushing selection and projection Estimating cost of different join orders, and selecting that of minimum cost Selecting best implementation for each operator. 3. Evaluation

Join Operation Several different algorithms to implement joins Nested-loop join Block nested-loop join Indexed nested-loop join Merge-join Hash-join Choice based on cost estimate

R JOIN S? Nested Join R S 40 T1 60 T2 30 T3 10 T4 20 T5 10 T6 60 T7 40 C C R S 40 T1 60 T2 30 T3 10 T4 20 T5 10 T6 60 T7 40 T8 20 T9

Nested-Loop Join For each r  R do For each s  S do if r.C = s.C then output r,s pair R S 40 T1 60 T2 30 T3 10 T4 20 T5 10 T6 60 T7 40 T8 20 T9 R is the outer relation and S is the inner relation

Block Nested-Loop Join Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation. for each block Br of r do begin for each block Bs of s do begin for each tuple tr in Br do begin for each tuple ts in Bs do begin Check if (tr,ts) satisfy the join condition if they do, add tr • ts to the result. end end end end

Index Join (conceptual) If index supporting S.C does not exist , create one, and (2) For each r  R do X  index-lookup(S.C, r.C) For each s  X, output (r,s) R C 40 T1 60 T2 30 T3 10 T4 20 T5 S C 10 T6 60 T7 40 T8 20 T9

Selection and Indexing If we have a S.C condition supported by an existing index we use the index If we have a conjunction, such as S.A>5 and S.B <10 with indexes on both, then we can select the better of the two (optimization) If there is no index, do we create one?

Cost of Joins Nested Loop: bR +|R| x bS (bRand bS denote the number of blocks in R and S, resp.) this is the worst case. What is the best case? Pre-existing Index on S: bR +|R| x k (k: depth of index) Block-Nested-Loop Join: bR + bR x bS (bS:number of blocks in S)

Sort-Merge Join Sort the relations first and join R S 10 T4 20 T5 30 40 T1 60 T2 10 T6 20 T9 40 T8 60 T7

Merge-Join Can be used only for equi-joins and natural joins Sort both relations on their join attribute (if not already sorted on the join attributes). Merge the sorted relations to join them

Sort-Merge Join (conceptual) (1) if R and S not sorted, sort them (2) i  1; j  1; While (i  |R|)  (j  |S|) do if R[i].C = S[j].C then outputTuples else if R[i].C > S[j].C then j  j+1 else if R[i].C < S[j].C then i  i+1 R S 10 T4 20 T5 30 T3 40 T1 60 T2 10 T6 20 T9 40 T8 60 T7

External Sorting Using Sort-Merge

Hash Join Hash tuples into buckets Join between only matching buckets H(k) = k mod 3 1 2 R 1 2 S 40 T1 60 T2 30 T3 10 T4 20 T5 10 T6 60 T7 40 T8 20 T9

Hash Join (conceptual) Hash function h, range 1  k Algorithm Hashing stage (bucketizing) Hash R tuples into G1,…,Gk buckets Hash S tuples into H1,…,Hk buckets Join stage For i = 1 to k do match tuples in Gi, Hi buckets R S … … …

Hash Join Step (1): Hashing stage buckets R Step (2): Join stage ... Memory buckets G1 G2 Gk R S R H1 G1 H2 G2 ... G3 Gi Hi ... memory

Materialization vs Pipelining Materialized evaluation: evaluate one operation at a time, starting at the lowest-level. Use intermediate results materialized into temporary relations to evaluate next-level operations. Pipelined evaluation: one tuple at the time.

Summary of Join Algorithms Nested-loop join ok for “small” relations (relative to memory size) Hash join usually best for equi-join if relations not sorted and no index Merge join for sorted relations Sort merge join good for non-equi-join Consider index join if index exists DBMS maintains statistics on data

Statistics collection commands DBMS has to collect statistics on tables/indexes For optimal performance Without stats, DBMS does stupid things… DB2 RUNSTATS ON TABLE <userid>.<table> AND INDEXES ALL Oracle ANALYZE TABLE <table> COMPUTE STATISTICS ANALYZE TABLE <table> ESTIMATE STATISTICS (cheaper than COMPUTE) Run the command after major update/index construction