Yan Huang - CSCI5330 Database Implementation – Query Processing

Slides:



Advertisements
Similar presentations
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Advertisements

CS CS4432: Database Systems II Logical Plan Rewriting.
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Query Processing.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
CS 4432query processing - lecture 131 CS4432: Database Systems II Lecture #13 Query Processing Professor Elke A. Rundensteiner.
CS 4432logical query rewriting - lecture 151 CS4432: Database Systems II Lecture #15 Logical Query Rewriting Professor Elke A. Rundensteiner.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Dr. Kalpakis CMSC 461, Database Management Systems Query Processing.
Query Processing Chapter 12
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
©Silberschatz, Korth and Sudarshan7.1 Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Chapter 13: Query Processing Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations.
CPS216: Data-Intensive Computing Systems Introduction to Query Processing Shivnath Babu.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
CPT-S Topics in Computer Science Big Data 1 1 Yinghui Wu EME 49.
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Chapter 13: Query Processing
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Chapters 15-16a1 (Slides by Hector Garcia-Molina, Chapters 15 and 16: Query Processing.
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
Computing & Information Sciences Kansas State University Monday, 03 Nov 2008CIS 560: Database System Concepts Lecture 27 of 42 Monday, 03 November 2008.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Query Processing CS 405G Introduction to Database Systems.
Lecture 3 - Query Processing (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
13.1 Chapter 13: Query Processing n Overview n Measures of Query Cost n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation.
Chapter 13: Query Processing. Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
1 Ullman et al. : Database System Principles Notes 6: Query Processing.
1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Optimization.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Chapter 4: Query Processing
Database Management System
Chapter 13: Query Processing
Chapter 12: Query Processing
Query Processing.
CS 245: Database System Principles
Chapter 13: Query Processing
File Processing : Query Processing
Yan Huang - CSCI5330 Database Implementation – Access Methods
Query Processing.
Chapter 13: Query Processing
Chapter 13: Query Processing
Focus: Relational System
Chapter 12: Query Processing
Chapter 13: Query Processing
CS 245: Database System Principles
Module 13: Query Processing
Chapters 15 and 16b: Query Optimization
Chapter 13: Query Processing
Chapter 12: Query Processing
Lecture 2- Query Processing (continued)
Chapter 13: Query Processing
Chapter 13: Query Processing
Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Chapter 13: Query Processing
Chapter 13: Query Processing
Chapter 13: Query Processing
CPS216: Data-Intensive Computing Systems Query Processing (contd.)
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Chapter 13: Query Processing
Chapter 12: Query Processing
Presentation transcript:

Yan Huang - CSCI5330 Database Implementation – Query Processing 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Exercise Compute depositor customer, with depositor as the outer relation. Customer has a secondary B+-tree index on customer-name Blocking factor 20 keys #customer = 400b/10,000t #depositor =100b/5,000t Merge join log(400) + log(100) + 400 + 100 = 516 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Hybrid Merge-join If one relation is sorted, and the other has a secondary B+-tree index on the join attribute Merge the sorted relation with the leaf entries of the B+-tree . Sort the result on the addresses of the unsorted relation’s tuples Scan the unsorted relation in physical address order and merge with previous result, to replace addresses by the actual tuples Sequential scan more efficient than random lookup 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Hybrid Merge-join r … s … … 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Hash join Hash function h, range 0  k Buckets for R1: G0, G1, ... Gk Buckets for R2: H0, H1, ... Hk Algorithm (1) Hash R1 tuples into G buckets (2) Hash R2 tuples into H buckets (3) For i = 0 to k do match tuples in Gi, Hi buckets 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Simple example hash: even/odd R1 R2 Buckets 2 5 Even 4 4 R1 R2 3 12 Odd: 5 3 8 13 9 8 11 14 2 4 8 4 12 8 14 3 5 9 5 3 13 11 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Example Hash Join R1, R2 contiguous (un-ordered)  Use 100 buckets  Read R1, hash, + write buckets R1  100 ... ... 10 blocks 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

 Then repeat for all buckets -> Same for R2 -> Read one R1 bucket; build memory hash table -> Read corresponding R2 bucket + hash probe R1 Relation R1 is called the build input and R2 is called the probe input. R2 ... R1 ... memory  Then repeat for all buckets 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Cost: “Bucketize:” Read R1 + write Read R2 + write Join: Read R1, R2 Total cost = 3 x [ bR1+bR2] Note: this is an approximation since buckets will vary in size and we have to round up to blocks 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Minimum memory requirements: Size of R1 bucket =(x/k) k = number of memory buffers x = number of R1 blocks So... (x/k) < k k > x Which relation should be the build relation? 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Example of Cost of Hash-Join customer depositor Assume that memory size is 20 blocks bdepositor= 100 and bcustomer = 400. depositor is to be used as build input. Partition it into five partitions, each of size 20 blocks. This partitioning can be done in one pass. Similarly, partition customer into five partitions,each of size 80. This is also done in one pass. Therefore total cost: 3(100 + 400) = 1500 block transfers ignores cost of writing partially filled blocks 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Complex Joins Join with a conjunctive condition: r 1  2...   n s Either use nested loops/block nested loops, or Compute the result of one of the simpler joins r i s final result comprises those tuples in the intermediate result that satisfy the remaining conditions 1  . . .  i –1  i +1  . . .  n Join with a disjunctive condition r 1  2 ...  n s Compute as the union of the records in individual joins r  i s: (r 1 s)  (r 2 s)  . . .  (r n s) 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Other Operations Duplicate elimination can be implemented via hashing or sorting. Projection is implemented by performing projection on each tuple followed by duplicate elimination. 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Other Operations : Aggregation Sorting Hashing 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Other Operations : Set Operations Set operations (,  and ) can either use variant of merge-join after sorting or variant of hash-join. 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Evaluation of Expressions So far: we have seen algorithms for individual operations Alternatives for evaluating an entire expression tree Materialization: generate results of an expression whose inputs are relations or are already computed, materialize (store) it on disk. Repeat. Pipelining: pass on tuples to parent operations even as an operation is being executed 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Q  Query Plan Focus: Relational System Others? 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Example Select B,D From R,S Where R.A = “c”  S.E = 2  R.C=S.C 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Relational Algebra - can be used to describe plans... Ex: Plan I B,D sR.A=“c” S.E=2  R.C=S.C X R S OR: B,D [ sR.A=“c” S.E=2  R.C = S.C (RXS)] 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Another idea: B,D sR.A = “c” sS.E = 2 R S Plan II natural join 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Example: Estimate costs L.Q.P P1 P2 …. Pn C1 C2 …. Cn Pick best! 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Relational algebra optimization Transformation rules (preserve equivalence) What are good transformations? 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Rules: Natural joins & cross products & union R S = S R (R S) T = R (S T) 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Note: Carry attribute names in results, so order is not important Can also write as trees, e.g.: T R R S S T 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Rules: Natural joins & cross products & union R S = S R (R S) T = R (S T) R x S = S x R (R x S) x T = R x (S x T) R U S = S U R R U (S U T) = (R U S) U T 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Rules: Selects sp1p2(R) = sp1vp2(R) = sp1 [ sp2 (R)] [ sp1 (R)] U [ sp2 (R)] 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y pxy (R) = px [py (R)] 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Rules: s + combined Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with only R,S attribs sp (R S) = sq (R S) = [sp (R)] S R [sq (S)] 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Rules: s + combined (continued) Some Rules can be Derived: spq (R S) = spqm (R S) = spvq (R S) = 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing spq (R S) = [sp (R)] [sq (S)] spqm (R S) = sm [(sp R) (sq S)] spvq (R S) = [(sp R) S] U [R (sq S)] 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Rules: p,s combined Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) px[sp (R) ] = px pxz {sp [ px (R) ]} 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Rules: p, combined Let x = subset of R attributes y = subset of S attributes z = intersection of R,S attributes pxy (R S) = pxy{[pxz (R) ] [pyz (S) ]} 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing pxy {sp (R S)} = pxy {sp [pxz’ (R) pyz’ (S)]} z’ = z U {attributes used in P } 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Rules for s, p combined with X similar... e.g., sp (R X S) = ? 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Rules s, U combined: sp(R U S) = sp(R) U sp(S) sp(R - S) = sp(R) - S = sp(R) - sp(S) 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Which are “good” transformations? sp1p2 (R)  sp1 [sp2 (R)] sp (R S)  [sp (R)] S R S  S R px [sp (R)]  px {sp [pxz (R)]} 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Conventional wisdom: do projects early Example: R(A,B,C,D,E) x={E} P: (A=3)  (B=“cat”) px {sp (R)} vs. pE {sp{pABE(R)}} 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

What if we have A, B indexes? But B = “cat” A=3 Intersect pointers to get pointers to matching tuples 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing

Yan Huang - CSCI5330 Database Implementation – Query Processing Bottom line: No transformation is always good Usually good: early selections 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Query Processing