Lecture 11: B+ Trees and Query Execution

Lecture 11: B+ Trees and Query Execution Monday, May 06, 2002

B+ Trees
Search trees.
Idea in B Trees: make 1 node = 1 block.
Idea in B+ Trees: make leaves into a linked list (range queries are easier).

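B+ Trees Basics
Parameter d = the degree.
Each node has >= d and <= 2d keys (except the root).
Example internal node with keys 30, 120, 240: its four pointers lead to keys k < 30, keys 30 <= k < 120, keys 120 <= k < 240, and keys 240 <= k.
Each leaf has >= d and <= 2d keys.
Example leaf with keys 40, 50, 60: one pointer per key to the record, plus a pointer to the next leaf.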

B+ Tree Example (d = 2)
Root: [80]
Internal nodes: [20 60] [100 120 140]
Leaves: [10 15 18] [20 30 40 50] [60 65] [80 85 90]

B+ Tree Design
How large d?
Example: key size = 4 bytes, pointer size = 8 bytes, block size = 4096 bytes.
2d x 4 + (2d+1) x 8 <= 4096
d = 170
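
The sizing rule above can be checked with a few lines of Python. This is only a sketch: the function name `max_degree` and its parameters are illustrative, and it assumes a full node stores exactly 2d keys and 2d+1 pointers with no header overhead.

```python
def max_degree(block_size: int, key_size: int, pointer_size: int) -> int:
    """Largest d such that 2d keys and 2d+1 pointers fit in one block."""
    # 2d*key_size + (2d+1)*pointer_size <= block_size
    # <=> 2d*(key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


if __name__ == "__main__":
    # Values from the slide: 4-byte keys, 8-byte pointers, 4096-byte blocks.
    print(max_degree(4096, 4, 8))
```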

Searching a B+ Tree
Exact key values: start at the root, proceed down to the leaf.
Example: Select name From people Where age = 25
Range queries: as above, then sequential traversal of the leaves.
Example: Select name From people Where 20 <= age and age <= 30
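
A minimal in-memory sketch of both kinds of search follows. The classes `Leaf` and `Inner` and the functions `find_leaf`, `contains`, and `range_query` are hypothetical names chosen for illustration; a real B+ tree stores nodes in disk blocks and leaves hold record pointers, which this sketch omits. The sample tree is a simplified version of the d = 2 example, with a made-up right subtree.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # leaves form a linked list


@dataclass
class Inner:
    keys: List[int]
    children: List[Union["Inner", Leaf]]  # len(children) == len(keys) + 1


def find_leaf(node, key: int) -> Leaf:
    """Walk from the root down to the leaf that may contain key."""
    while isinstance(node, Inner):
        # Child i covers keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    return node


def contains(root, key: int) -> bool:
    """Exact-match search."""
    return key in find_leaf(root, key).keys


def range_query(root, lo: int, hi: int) -> List[int]:
    """Find the leaf for lo, then follow next-leaf pointers until hi is passed."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


# Sample tree (the leaf [100, 120, 140] is an assumption, not from the slides).
l1, l2, l3 = Leaf([10, 15, 18]), Leaf([20, 30, 40, 50]), Leaf([60, 65])
l4, l5 = Leaf([80, 85, 90]), Leaf([100, 120, 140])
l1.next, l2.next, l3.next, l4.next = l2, l3, l4, l5
root = Inner([80], [Inner([20, 60], [l1, l2, l3]), Inner([100], [l4, l5])])

if __name__ == "__main__":
    print(contains(root, 85), contains(root, 25))
    print(range_query(root, 18, 60))
```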

B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
Average fanout = 133.
Typical capacities:
Height 4: 133^4 = 312,900,721 records
Height 3: 133^3 = 2,352,637 records
Can often hold top levels in buffer pool:
Level 1 = 1 page = 8 Kbytes
Level 2 = 133 pages = 1 Mbyte
Level 3 = 17,689 pages = 133 Mbytes

Insertion in a B+ Tree
Insert (K, P):
Find the leaf where K belongs and insert.
If no overflow (2d keys or less), halt.
If overflow (2d+1 keys), split the node and insert (K3, pointer to the new right node) in the parent:
Before the split: P0 K1 P1 K2 P2 K3 P3 K4 P4 K5 P5
After the split: left node P0 K1 P1 K2 P2, right node P3 K4 P4 K5 P5
If the node is a leaf, keep K3 too in the right node.
When the root splits, the new root has 1 key only.
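
The split rule can be sketched on plain Python lists. The helper names `split_leaf` and `split_inner` are hypothetical, and the sketch covers only the split step, not the full recursive insert.

```python
from typing import List, Tuple


def split_leaf(keys: List[int], d: int) -> Tuple[List[int], List[int], int]:
    """Split an overflowing leaf (2d+1 keys).

    The middle key is copied up to the parent and also stays in the right leaf.
    """
    if len(keys) != 2 * d + 1:
        raise ValueError("split is only needed at 2d+1 keys")
    left, right = keys[:d], keys[d:]
    return left, right, right[0]


def split_inner(keys: List[int], children: list, d: int):
    """Split an overflowing internal node (2d+1 keys, 2d+2 pointers).

    The middle key moves up to the parent and is kept in neither half.
    """
    if len(keys) != 2 * d + 1 or len(children) != 2 * d + 2:
        raise ValueError("split is only needed at 2d+1 keys")
    middle = keys[d]
    left = (keys[:d], children[: d + 1])
    right = (keys[d + 1:], children[d + 1:])
    return left, right, middle


if __name__ == "__main__":
    # Leaf from the running example after inserting 25 (d = 2).
    print(split_leaf([20, 25, 30, 40, 50], d=2))
```

With d = 2, the leaf [20 25 30 40 50] splits into [20 25] and [30 40 50], and key 30 goes to the parent.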

Insertion in a B+ Tree: insert K = 19
Root: [80]
Internal nodes: [20 60] [100 120 140]
Leaves: [10 15 18] [20 30 40 50] [60 65] [80 85 90]

Insertion in a B+ Tree: after insertion
Root: [80]
Internal nodes: [20 60] [100 120 140]
Leaves: [10 15 18 19] [20 30 40 50] [60 65] [80 85 90]

Insertion in a B+ Tree: now insert 25
Root: [80]
Internal nodes: [20 60] [100 120 140]
Leaves: [10 15 18 19] [20 30 40 50] [60 65] [80 85 90]

Insertion in a B+ Tree: after insertion
Root: [80]
Internal nodes: [20 60] [100 120 140]
Leaves: [10 15 18 19] [20 25 30 40 50] [60 65] [80 85 90]

Insertion in a B+ Tree: but now have to split!
The leaf [20 25 30 40 50] holds 2d+1 = 5 keys.

Insertion in a B+ Tree: after the split
Root: [80]
Internal nodes: [20 30 60] [100 120 140]
Leaves: [10 15 18 19] [20 25] [30 40 50] [60 65] [80 85 90]

Deletion from a B+ Tree: delete 30
Root: [80]
Internal nodes: [20 30 60] [100 120 140]
Leaves: [10 15 18 19] [20 25] [30 40 50] [60 65] [80 85 90]

Deletion from a B+ Tree: after deleting 30
The key 30 in the internal node may change to 40, or not.
Root: [80]
Internal nodes: [20 30 60] [100 120 140]
Leaves: [10 15 18 19] [20 25] [40 50] [60 65] [80 85 90]

Deletion from a B+ Tree: now delete 25
Root: [80]
Internal nodes: [20 30 60] [100 120 140]
Leaves: [10 15 18 19] [20 25] [40 50] [60 65] [80 85 90]

Deletion from a B+ Tree: after deleting 25
Need to rebalance: rotate.
Root: [80]
Internal nodes: [20 30 60] [100 120 140]
Leaves: [10 15 18 19] [20] [40 50] [60 65] [80 85 90]

Deletion from a B+ Tree: now delete 40
Root: [80]
Internal nodes: [19 30 60] [100 120 140]
Leaves: [10 15 18] [19 20] [40 50] [60 65] [80 85 90]

Deletion from a B+ Tree: after deleting 40
Rotation not possible. Need to merge nodes.
Root: [80]
Internal nodes: [19 30 60] [100 120 140]
Leaves: [10 15 18] [19 20] [50] [60 65] [80 85 90]

Deletion from a B+ Tree: final tree
Root: [80]
Internal nodes: [19 60] [100 120 140]
Leaves: [10 15 18] [19 20 50] [60 65] [80 85 90]

Cost Parameters
B(R) = number of blocks for relation R
T(R) = number of tuples in relation R
V(R, a) = number of distinct values of attribute a

Cost
Cost of an operation = number of disk I/Os needed to:
read the operands,
write any intermediate results,
compute the result.
Cost of writing the result to disk is not included.
Question: the cost of sorting a table with B blocks?
Answer:

Scanning Tables
The table is clustered (i.e. its blocks consist only of records from this table):
Table-scan: if we know where the blocks are.
Index scan: if we have a sparse index to find the blocks.
The table is unclustered (e.g. its records are placed on blocks with other tables):
May need one read for each record.

Sorting While Scanning
Sometimes it is useful to have the output sorted.
Three ways to scan it sorted:
If there is a primary or secondary index on it, use it during the scan.
If it fits in memory, sort there.
If not, use multi-way merge sort.

Cost of the Scan Operator
Clustered relation:
Table scan: B(R); to sort: 3B(R)
Index scan: B(R); to sort: B(R) or 3B(R)
Unclustered relation:
T(R); to sort: T(R) + 2B(R)

One-Pass Algorithms
Selection σ(R), projection π(R).
Both are tuple-at-a-time algorithms.
Cost: B(R)
Figure: input buffer, unary operator, output buffer.

One-pass Algorithms
Duplicate elimination δ(R).
Need to keep tuples in memory.
When a new tuple arrives, need to compare it with previously seen tuples.
Balanced search tree, or hash table.
Cost: B(R)
Assumption: B(δ(R)) <= M

One-pass Algorithms
Grouping: γ city, sum(price) (R).
Need to store all cities in memory.
Also store the sum(price) for each city.
Balanced search tree or hash table.
Cost: B(R)
Assumption: number of cities fits in memory.
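
The two one-pass operators above can be sketched with Python's built-in hash tables. The function names `distinct` and `group_sum` are illustrative, and the input is modeled as an in-memory iterable standing in for a single scan over the blocks of R.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def distinct(tuples: Iterable[Hashable]) -> List[Hashable]:
    """One-pass duplicate elimination: remember every tuple seen so far."""
    seen = set()          # must fit in memory
    out = []
    for t in tuples:      # one scan of R, cost B(R)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


def group_sum(tuples: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """One-pass grouping: keep one running sum(price) per city."""
    sums: Dict[str, int] = {}   # one entry per city, must fit in memory
    for city, price in tuples:  # one scan of R, cost B(R)
        sums[city] = sums.get(city, 0) + price
    return sums


if __name__ == "__main__":
    rows = [("Seattle", 10), ("Boston", 5), ("Seattle", 7), ("Boston", 5)]
    print(distinct(rows))
    print(group_sum(rows))
```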

One-pass Algorithms
Binary operations: R ∩ S, R U S, R – S.
Assumption: min(B(R), B(S)) <= M
Scan one table first, then the next, eliminate duplicates.
Cost: B(R)+B(S)

Nested Loop Joins
Tuple-based nested loop R ⋈ S:
for each tuple r in R do
    for each tuple s in S do
        if r and s join then output (r,s)
Cost: T(R) T(S), sometimes T(R) B(S)

Nested Loop Joins
Block-based Nested Loop Join:
for each (M-1) blocks bs of S do
    for each block br of R do
        for each tuple s in bs do
            for each tuple r in br do
                if r and s join then output(r,s)

Nested Loop Joins
Figure: R and S on disk; in main memory, a hash table for a block of S (k < B-1 pages), an input buffer for R, and an output buffer for the join result.
Question: suppose B(R1) = 1000, B(R2) = 2, M = 3. What is the best way to do a nested loop join? Its cost?

Nested Loop Joins
Block-based Nested Loop Join cost:
Read S once: cost B(S).
Outer loop runs B(S)/(M-1) times, and each time we need to read R: cost B(S)B(R)/(M-1).
Total cost: B(S) + B(S)B(R)/(M-1)
Notice: it is better to iterate over the smaller relation first.
R ⋈ S: R = outer relation, S = inner relation.
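
A rough Python sketch of the block-based loop and its cost formula is given here. Relations are modeled as lists of blocks, each block being a list of tuples; the names `block_nested_loop_join` and `block_nested_loop_cost` are illustrative, and the cost function rounds the number of outer iterations up to a whole number, which the slide formula leaves implicit.

```python
from math import ceil
from typing import Callable, List


def block_nested_loop_join(s_blocks: List[list], r_blocks: List[list],
                           m: int, joins: Callable) -> list:
    """Load M-1 blocks of S at a time, then scan all of R against them."""
    out = []
    for i in range(0, len(s_blocks), m - 1):
        chunk = [s for block in s_blocks[i:i + m - 1] for s in block]
        for br in r_blocks:          # one full read of R per chunk of S
            for r in br:
                for s in chunk:
                    if joins(r, s):
                        out.append((r, s))
    return out


def block_nested_loop_cost(b_outer: int, b_inner: int, m: int) -> int:
    """B(outer) + ceil(B(outer)/(M-1)) * B(inner) disk I/Os."""
    return b_outer + ceil(b_outer / (m - 1)) * b_inner


if __name__ == "__main__":
    # Numbers from the question slide: B(R1) = 1000, B(R2) = 2, M = 3.
    print(block_nested_loop_cost(2, 1000, 3))     # small relation outside
    print(block_nested_loop_cost(1000, 2, 3))     # large relation outside
```

Under these assumptions, putting the 2-block relation in the outer loop costs 1002 I/Os, against 2000 I/Os the other way round.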

Two-Pass Algorithms Based on Sorting
Recall: multi-way merge sort needs only two passes!
Assumption: B(R) <= M^2
Cost for sorting: 3B(R)

Two-Pass Algorithms Based on Sorting
Duplicate elimination δ(R).
Trivial idea: sort first, then eliminate duplicates.
Step 1: sort chunks of size M, write them out. Cost: 2B(R)
Step 2: merge M-1 runs, but include each tuple only once. Cost: B(R)
Total cost: 3B(R)
Assumption: B(R) <= M^2
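
The two steps can be sketched in memory as follows. The function name `sort_based_distinct` is hypothetical, runs are kept in Python lists where a real system would write them to disk, and the merge uses the standard-library `heapq.merge`.

```python
import heapq
from typing import List


def sort_based_distinct(blocks: List[list], m: int) -> list:
    """Two-pass duplicate elimination over a relation given as a list of blocks."""
    # Step 1: sort chunks of M blocks into runs (stand-in for writing runs to disk).
    runs = []
    for i in range(0, len(blocks), m):
        runs.append(sorted(t for block in blocks[i:i + m] for t in block))
    if len(runs) > m - 1:
        raise ValueError("too many runs for a single merge pass")

    # Step 2: merge the runs, emitting each tuple only once.
    out = []
    for t in heapq.merge(*runs):
        if not out or out[-1] != t:
            out.append(t)
    return out


if __name__ == "__main__":
    blocks = [[3, 1], [2, 3], [1, 5], [5, 2]]
    print(sort_based_distinct(blocks, m=3))
```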

Two-Pass Algorithms Based on Sorting
Grouping: γ city, sum(price) (R).
Same as before: sort, then compute the sum(price) for each group.
As before: compute sum(price) during the merge phase.
Total cost: 3B(R)
Assumption: B(R) <= M^2

Two-Pass Algorithms Based on Sorting
Binary operations: R ∩ S, R U S, R – S.
Idea: sort R, sort S, then do the right thing.
A closer look:
Step 1: split R into runs of size M, then split S into runs of size M. Cost: 2B(R) + 2B(S)
Step 2: read one block of each sub-list and merge (semantics depends on operation).
Total cost: 3B(R)+3B(S)
Assumption: B(R)+B(S) <= M^2

Two-Pass Algorithms Based on Sorting
Join R ⋈ S.
Start by sorting both R and S on the join attribute. Cost: 4B(R)+4B(S) (because need to write to disk)
Read both relations in sorted order, match tuples. Cost: B(R)+B(S)
Difficulty: many tuples in R may match many in S.
If at least one set of tuples fits in M, we are OK.
Otherwise need nested loop, higher cost.
Total cost: 5B(R)+5B(S)
Assumption: B(R) <= M^2, B(S) <= M^2

Two-Pass Algorithms Based on Sorting
Join R ⋈ S.
If the number of tuples in R matching those in S is small (or vice versa), we can compute the join during the merge phase.
Total cost: 3B(R)+3B(S)
Assumption: B(R) + B(S) <= M^2
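
The matching step of the sort-based join can be sketched as below. The function name `merge_join` and the `key` parameter are illustrative; both inputs are sorted in memory here, whereas the slides assume sorted runs on disk. Groups of equal keys on both sides are joined with a small nested loop, which corresponds to the case where at least one set of matching tuples fits in memory.

```python
from typing import Callable, List, Tuple


def merge_join(r: List[tuple], s: List[tuple],
               key: Callable = lambda t: t[0]) -> List[Tuple[tuple, tuple]]:
    """Join two relations on key by sorting both and merging."""
    r, s = sorted(r, key=key), sorted(s, key=key)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        kr, ks = key(r[i]), key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the end of the group with this key on each side.
            i_end = i
            while i_end < len(r) and key(r[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(s) and key(s[j_end]) == kr:
                j_end += 1
            # Every tuple in the R group matches every tuple in the S group.
            for rt in r[i:i_end]:
                for st in s[j:j_end]:
                    out.append((rt, st))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    R = [(2, "b"), (1, "a"), (2, "c")]
    S = [(3, "z"), (2, "x"), (2, "y")]
    print(merge_join(R, S))
```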

Two Pass Algorithms Based on Hashing
Idea: partition a relation R into buckets, on disk.
Each bucket has size approx. B(R)/M.
Does each bucket fit in main memory?
Yes if B(R)/M <= M, i.e. B(R) <= M^2
Figure: relation R (blocks 1, 2, ..., B(R)) is read from disk through 1 input buffer; hash function h distributes the tuples over M-1 output buffers among the M main memory buffers, which are written back to disk as M-1 partitions.

Hash Based Algorithms for δ
Recall: δ(R) = duplicate elimination.
Step 1. Partition R into buckets.
Step 2. Apply δ to each bucket (may read in main memory).
Cost: 3B(R)
Assumption: B(R) <= M^2

Hash Based Algorithms for γ
Recall: γ(R) = grouping and aggregation.
Step 1. Partition R into buckets.
Step 2. Apply γ to each bucket (may read in main memory).
Cost: 3B(R)
Assumption: B(R) <= M^2

Hash-based Join
R ⋈ S
Recall the main memory hash-based join:
Scan S, build buckets in main memory.
Then scan R and join.

Partitioned Hash Join
R ⋈ S
Step 1: Hash S into M buckets. Send all buckets to disk.
Step 2: Hash R into M buckets. Send all buckets to disk.
Step 3: Join every pair of buckets.

Hash-Join
Partition both relations using hash fn h: R tuples in partition i will only match S tuples in partition i.
Figure (partitioning): the original relation is read from disk through 1 input buffer; hash function h distributes tuples over M-1 output buffers among the B main memory buffers, giving M-1 partitions on disk.
Read in a partition of S, hash it using h2 (<> h!). Scan the matching partition of R, search for matches.
Figure (joining): partitions of R & S on disk; in the B main memory buffers, a hash table for partition Si (< M-1 pages) built with hash fn h2, an input buffer for Ri, and an output buffer for the join result.

Partitioned Hash Join
Cost: 3B(R) + 3B(S)
Assumption: min(B(R), B(S)) <= M^2
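
The three steps of the partitioned hash join can be sketched in memory as follows. The function name `partitioned_hash_join` and its parameters are illustrative; partitions are Python lists standing in for buckets written to disk, Python's built-in `hash` plays the role of h, and a dictionary plays the role of the in-memory hash table built with h2.

```python
from collections import defaultdict
from typing import Callable, List, Tuple


def partitioned_hash_join(r: List[tuple], s: List[tuple], num_partitions: int,
                          key: Callable = lambda t: t[0]) -> List[Tuple[tuple, tuple]]:
    """Partition both inputs with the same hash function, then join bucket pairs."""

    def partition(rel: List[tuple]) -> List[list]:
        parts = [[] for _ in range(num_partitions)]
        for t in rel:
            parts[hash(key(t)) % num_partitions].append(t)
        return parts

    # Steps 1 and 2: hash S and R into buckets (stand-in for sending them to disk).
    s_parts, r_parts = partition(s), partition(r)

    # Step 3: join every pair of buckets (Ri, Si).
    out = []
    for ri, si in zip(r_parts, s_parts):
        table = defaultdict(list)       # in-memory hash table on bucket Si
        for st in si:
            table[key(st)].append(st)
        for rt in ri:                   # probe with the tuples of Ri
            for st in table.get(key(rt), []):
                out.append((rt, st))
    return out


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (5, "c"), (2, "d")]
    S = [(2, "x"), (5, "y"), (7, "z")]
    print(sorted(partitioned_hash_join(R, S, num_partitions=3)))
```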

Hybrid Hash Join Algorithm
Partition S into k buckets.
But keep the first bucket S1 in memory, and send k-1 buckets to disk.
Partition R into k buckets.
First bucket R1 is joined immediately with S1.
Other k-1 buckets go to disk.
Finally, join k-1 pairs of buckets: (R2,S2), (R3,S3), …, (Rk,Sk)

Hybrid Join Algorithm
How big should we choose k?
Average bucket size for S is B(S)/k.
Need to fit B(S)/k + (k-1) blocks in memory:
B(S)/k + (k-1) <= M
so k is slightly larger than B(S)/M.

Hybrid Join Algorithm
How many I/Os?
Recall: cost of partitioned hash join: 3B(R) + 3B(S)
Now we save 2 disk operations per block for one bucket.
Recall there are k buckets.
Hence we save (2/k)(B(R) + B(S)).
Cost: (3 - 2/k)(B(R) + B(S)) ≈ (3 - 2M/B(S))(B(R) + B(S)), taking k ≈ B(S)/M
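
The choice of k and the resulting cost can be explored numerically. This is a sketch under the slide's assumptions (uniform bucket sizes); the function names and the sample values B(R) = 2000, B(S) = 1000, M = 101 are made up for illustration.

```python
from typing import Optional


def min_buckets(b_s: int, m: int) -> Optional[int]:
    """Smallest k with B(S)/k + (k-1) <= M, or None if no such k exists."""
    for k in range(1, m + 1):
        if b_s / k + (k - 1) <= m:
            return k
    return None


def partitioned_hash_join_cost(b_r: int, b_s: int) -> float:
    """3B(R) + 3B(S)."""
    return 3 * (b_r + b_s)


def hybrid_hash_join_cost(b_r: int, b_s: int, k: int) -> float:
    """(3 - 2/k)(B(R) + B(S)): one of the k bucket pairs never goes to disk."""
    return (3 - 2 / k) * (b_r + b_s)


if __name__ == "__main__":
    b_r, b_s, m = 2000, 1000, 101      # hypothetical sizes
    k = min_buckets(b_s, m)
    print(k, b_s / m)
    print(partitioned_hash_join_cost(b_r, b_s), hybrid_hash_join_cost(b_r, b_s, k))
```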

Index Based Algorithms
Recall that in a clustered index all tuples with the same value of the key are clustered on as few blocks as possible.
Figure: tuples with the same key value a stored next to each other on adjacent blocks.

Index Based Selection
Selection on equality: σ a=v (R)
Clustered index on a: cost B(R)/V(R,a)
Unclustered index on a: cost T(R)/V(R,a)

Index Based Selection
Example: B(R) = 2000, T(R) = 100,000, V(R, a) = 20. Compute the cost of σ a=v (R).
Cost of table scan:
If R is clustered: B(R) = 2000 I/Os
If R is unclustered: T(R) = 100,000 I/Os
Cost of index based selection:
If index is clustered: B(R)/V(R,a) = 100
If index is unclustered: T(R)/V(R,a) = 5000
Notice: when V(R,a) is small, then the unclustered index is useless.
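
The arithmetic of this example can be reproduced with two one-line helpers. The function names are illustrative, and like the slide they assume that values of a are uniformly distributed and ignore the cost of reading the index itself.

```python
def clustered_index_selection_cost(b_r: int, v_ra: int) -> float:
    """B(R)/V(R,a): matching tuples sit on as few blocks as possible."""
    return b_r / v_ra


def unclustered_index_selection_cost(t_r: int, v_ra: int) -> float:
    """T(R)/V(R,a): possibly one block read per matching tuple."""
    return t_r / v_ra


if __name__ == "__main__":
    b_r, t_r, v_ra = 2000, 100_000, 20     # values from the slide
    print(clustered_index_selection_cost(b_r, v_ra))
    print(unclustered_index_selection_cost(t_r, v_ra))
    # The unclustered index costs more than a clustered table scan of B(R) blocks.
    print(unclustered_index_selection_cost(t_r, v_ra) > b_r)
```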

Index Based Join
R ⋈ S
Assume S has an index on the join attribute.
Iterate over R; for each tuple fetch the corresponding tuple(s) from S.
Assume R is clustered. Cost:
If index is clustered: B(R) + T(R)B(S)/V(S,a)
If index is unclustered: B(R) + T(R)T(S)/V(S,a)
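
The two cost formulas can be compared on sample numbers. The function name `index_join_cost` and the sizes used below are hypothetical; the formulas themselves are the ones on the slide.

```python
def index_join_cost(b_r: int, t_r: int, b_s: int, t_s: int, v_sa: int,
                    clustered: bool) -> float:
    """Scan clustered R once, then do one index lookup in S per tuple of R."""
    per_lookup = (b_s if clustered else t_s) / v_sa
    return b_r + t_r * per_lookup


if __name__ == "__main__":
    # Hypothetical sizes: B(R)=1000, T(R)=10,000, B(S)=500, T(S)=5000, V(S,a)=100.
    print(index_join_cost(1000, 10_000, 500, 5000, 100, clustered=True))
    print(index_join_cost(1000, 10_000, 500, 5000, 100, clustered=False))
```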

Index Based Join
Assume both R and S have a sorted index (B+ tree) on the join attribute.
Then perform a merge join (called zig-zag join).
Cost: B(R) + B(S)