Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Lecture 22: Query Execution Wednesday, March 2, 2005.

Similar presentations


Presentation on theme: "1 Lecture 22: Query Execution Wednesday, March 2, 2005."— Presentation transcript:

1 1 Lecture 22: Query Execution Wednesday, March 2, 2005

2 2 Outline Query execution: 15.1 – 15.5

3 3 Architecture of a Database Engine Parse Query Select Logical Plan Select Physical Plan Query Execution SQL query Query optimization Logical plan Physical plan

4 4 Logical Algebra Operators Union, intersection, difference Selection  Projection  Join |x| Duplicate elimination  Grouping  Sorting 

5 5 Logical Query Plan SELECT city, count(*) FROM sales GROUP BY city HAVING sum(price) > 100 SELECT city, count(*) FROM sales GROUP BY city HAVING sum(price) > 100 sales(product, city, price)  city, sum(price)→p, count(*) → c  p > 100  city, c T1(city,p,c) T2(city,p,c) T3(city, c) T1, T2, T3 = temporary tables

6 6 Logical Query Plan SELECT S.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ SELECT S.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ Purchase Person Buyer=name City=‘seattle’ phone>’5430000’ buyer 

7 7 Physical Query Plan SELECT S.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ SELECT S.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ Query Plan: logical tree implementation choice at every node scheduling of operations. Purchase Person Buyer=name City=‘seattle’ phone>’5430000’ buyer (Simple Nested Loops)  (Table scan)(Index scan) Some operators are from relational algebra, and others (e.g., scan) are not.

8 8 Question in Class Logical operator: Product(pname, cname) |  | Company(cname, city) Propose three physical operators for the join, assuming the tables are in main memory: 1. 2. 3.

9 9 Question in Class Product(pname, cname) |x| Company(cname, city) 1000000 products 1000 companies How much time do the following physical operators take if the data is in main memory ? Nested loop jointime = Sort and merge = merge-jointime = Hash jointime =

10 10 Cost Parameters In database systems the data is on disks, not in main memory The cost of an operation = total number of I/Os Cost parameters: B(R) = number of blocks for relation R T(R) = number of tuples in relation R V(R, a) = number of distinct values of attribute a

11 11 Cost Parameters Clustered table R: –Blocks consists only of records from this table –B(R)  T(R) / blockSize Unclustered table R: –Its records are placed on blocks with other tables –When R is unclustered: B(R)  T(R) When a is a key, V(R,a) = T(R) When a is not a key, V(R,a)

12 12 Cost Cost of an operation = number of disk I/Os needed to: –read the operands –compute the result Cost of writing the result to disk is not included on the following slides

13 13 Scanning a Table Clustered relation: –Table scan: Result may be unsorted: B(R) Result needs to be sorted: 3B(R) –Index scan Unsorted: B(R) Sorted: B(R) or 3B(R) Unclustered relation –Unsorted: T(R) –Sorted: T(R) + 2B(R)

14 14 One-Pass Algorithms Selection  (R), projection  (R) Both are tuple-at-a-time algorithms Cost: B(R) Input bufferOutput buffer Unary operator

15 15 One-pass Algorithms Hash join: R |x| S Scan S, build buckets in main memory Then scan R and join Cost: B(R) + B(S) Assumption: B(S) <= M

16 16 One-pass Algorithms Duplicate elimination  (R) Need to keep tuples in memory When new tuple arrives, need to compare it with previously seen tuples Balanced search tree, or hash table Cost: B(R) Assumption: B(  (R)) <= M

17 17 Question in Class Grouping: Product(name, department, quantity)  department, sum(quantity) (Product)  Answer(department, sum) Question: how do you compute it in main memory ? Answer:

18 18 One-pass Algorithms Grouping:  department, sum(quantity) (R) Need to store all departments in memory Also store the sum(quantity) for each department Balanced search tree or hash table Cost: B(R) Assumption: number of departments fits in memory

19 19 One-pass Algorithms Binary operations: R ∩ S, R ∪ S, R – S Assumption: min(B(R), B(S)) <= M Scan one table first, then the next, eliminate duplicates Cost: B(R)+B(S)

20 20 Question in Class H  emptyHashTable /* scan R */ For each x in R do insert(H, x ) /* scan S */ For each y in S do _____________________ /* collect result */ for each z in H do output(z) H  emptyHashTable /* scan R */ For each x in R do insert(H, x ) /* scan S */ For each y in S do _____________________ /* collect result */ for each z in H do output(z) What do we do in each of these cases: R ∩ S, R ∪ S, R – S

21 21 Nested Loop Joins Tuple-based nested loop R ⋈ S Cost: T(R) B(S) when S is clustered Cost: T(R) T(S) when S is unclustered for each tuple r in R do for each tuple s in S do if r and s join then output (r,s) for each tuple r in R do for each tuple s in S do if r and s join then output (r,s)

22 22 Nested Loop Joins We can be much more clever Question: how would you compute the join in the following cases ? What is the cost ? –B(R) = 1000, B(S) = 2, M = 4 –B(R) = 1000, B(S) = 3, M = 4 –B(R) = 1000, B(S) = 6, M = 4

23 23 Nested Loop Joins Block-based Nested Loop Join for each (M-2) blocks bs of S do for each block br of R do for each tuple s in bs for each tuple r in br do if “r and s join” then output(r,s) for each (M-2) blocks bs of S do for each block br of R do for each tuple s in bs for each tuple r in br do if “r and s join” then output(r,s)

24 24 Nested Loop Joins... R & S Hash table for block of S (M-2 pages) Input buffer for R Output buffer... Join Result

25 25 Nested Loop Joins Block-based Nested Loop Join Cost: –Read S once: cost B(S) –Outer loop runs B(S)/(M-2) times, and each time need to read R: costs B(S)B(R)/(M-2) –Total cost: B(S) + B(S)B(R)/(M-2) Notice: it is better to iterate over the smaller relation first R |x| S: R=outer relation, S=inner relation


Download ppt "1 Lecture 22: Query Execution Wednesday, March 2, 2005."

Similar presentations


Ads by Google