1 CPS216: Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu
2 Insertion in a B-Tree 49 n = Insert: 62
3 Insertion in a B-Tree 49 n = Insert: 62 62
4 Insertion in a B-Tree 49 n = Insert: 50
5 Insertion in a B-Tree 49 n = Insert: 50 62
6 Insertion in a B-Tree 49 n = Insert: 75 62
7 Insertion in a B-Tree 49 n = Insert:
8 Insertion
9 Insertion
10 Insertion
11 Insertion
12 Insertion
13 Insertion
14 Insertion
15 Insertion
16 Insertion
17 Insertion
18 Insertion
19 Insertion: Primitives Inserting into a leaf node Inserting into a leaf node Splitting a leaf node Splitting a leaf node Splitting an internal node Splitting an internal node Splitting root node Splitting root node
20 Inserting into a Leaf Node
21 Inserting into a Leaf Node
22 Inserting into a Leaf Node
Splitting a Leaf Node
Splitting a Leaf Node
Splitting a Leaf Node
Splitting a Leaf Node
Splitting a Leaf Node
[ 59, 66)[54, 59) … … [66,74) Splitting an Internal Node
… … [ 59, 66)[54, 59)[66,74) Splitting an Internal Node
… … [66, 99) [ 59, 66)[54, 59) [21,66) [66,74) Splitting an Internal Node
[ 59, 66)[54, 59)[66,74) Splitting the Root
[ 59, 66)[54, 59)[66,74) Splitting the Root
[ 59, 66)[54, 59)[66,74) Splitting the Root
34 Deletion
35 Deletion redistribute
36 Deletion
37 Deletion - II
merge
39 Deletion - II
40 Deletion - II
41 Deletion - II
42 Deletion - II merge Not needed
43 Deletion - II
44 Deletion: Primitives Delete key from a leaf Delete key from a leaf Redistribute keys between sibling leaves Redistribute keys between sibling leaves Merge a leaf into its sibling Merge a leaf into its sibling Redistribute keys between two sibling internal nodes Redistribute keys between two sibling internal nodes Merge an internal node into its sibling Merge an internal node into its sibling
45 Merge Leaf into Sibling …72
46 Merge Leaf into Sibling …7285
47 Merge Leaf into Sibling …7285
48 Merge Leaf into Sibling …72 85
49 Merge Internal Node into Sibling [52, 59) [59,63) … …
50 Merge Internal Node into Sibling [52, 59) [59,63) 59 … …
51 B-Tree Roadmap B-Tree B-Tree Recap Recap Insertion (recap) Insertion (recap) Deletion Deletion Construction Construction Efficiency Efficiency B-Tree variants B-Tree variants Hash-based Indexes Hash-based Indexes
52 Question How does insertion-based construction perform?
53 B-Tree Construction Sort
B-Tree Construction Scan
B-Tree Construction Scan
56 B-Tree Construction Why is sort-based construction better than insertion-based one?
57 Cost of B-Tree Operations Height of B-Tree: H Height of B-Tree: H Assume no duplicates Assume no duplicates Question: what is the random I/O cost of: Question: what is the random I/O cost of: Insertion: Insertion: Deletion: Deletion: Equality search: Equality search: Range Search: Range Search:
58 Height of B-Tree Number of keys: N Number of keys: N B-Tree parameter: n B-Tree parameter: n Height ≈ log N = n log N log n In practice: 2-3 levels
59 Question: How do you pick parameter n? 1. Ignore inserts and deletes 2. Optimize for equality searches 3. Assume no duplicates
60 Roadmap B-Tree B-Tree B-Tree variants B-Tree variants Sparse Index Sparse Index Duplicate Keys Duplicate Keys Hash-based Indexes Hash-based Indexes
61 Roadmap B-Tree B-Tree B-Tree variants B-Tree variants Hash-based Indexes Hash-based Indexes Static Hash Table Static Hash Table Extensible Hash Table Extensible Hash Table Linear Hash Table Linear Hash Table
62 Hash-Based Indexes Adaptations of main memory hash tables Adaptations of main memory hash tables Support equality searches Support equality searches No range searches No range searches
Indexing Problem (recap) a 1 2 a i a n a A = val Index Keys record pointers
64 Main Memory Hash Table buckets 32 (null) key h (key) h (key) = key % 8
65 Adapting to disk 1 Hash Bucket = 1 Block 1 Hash Bucket = 1 Block All keys that hash to bucket stored in the block All keys that hash to bucket stored in the block Intuition: keys in a bucket usually accessed together Intuition: keys in a bucket usually accessed together No need for linked lists of keys … No need for linked lists of keys …
66 Adapting to Disk How do we handle this?
67 Adapting to disk 1 Hash Bucket = 1 Block 1 Hash Bucket = 1 Block All keys that hash to bucket stored in the block All keys that hash to bucket stored in the block Intuition: keys in a bucket usually accessed together Intuition: keys in a bucket usually accessed together No need for linked lists of keys … No need for linked lists of keys … … but need linked list of blocks (overflow blocks) … but need linked list of blocks (overflow blocks)
68 Adapting to Disk
69 Adapting to disk Bucket Id Disk Address mapping Bucket Id Disk Address mapping Contiguous blocks Contiguous blocks Store mapping in main memory Store mapping in main memory Too large? Too large? Dynamic Linear and Extensible hash tables Dynamic Linear and Extensible hash tables
70 Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!