CPS216: Advanced Database Systems. Notes 05: Operators for Data Access (contd.). Shivnath Babu.


1 CPS216: Advanced Database Systems. Notes 05: Operators for Data Access (contd.). Shivnath Babu

2 Insertion in a B-Tree (n = 2) [figure keys: 49; 15 36]. Insert: 62

3 Insertion in a B-Tree (n = 2) [figure keys: 49; 15 36 62]. Insert: 62

4 Insertion in a B-Tree (n = 2) [figure keys: 49; 15 36 62]. Insert: 50

5 Insertion in a B-Tree (n = 2) [figure keys: 49; 15 36 50 62]. Insert: 50

6 Insertion in a B-Tree (n = 2) [figure keys: 49; 15 36 50 62]. Insert: 75

7 Insertion in a B-Tree (n = 2) [figure keys: 49; 15 36 50 62 75]. Insert: 75

8-18 Insertion [figures only]

19 Insertion: Primitives. Inserting into a leaf node; splitting a leaf node; splitting an internal node; splitting the root node.
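The first two primitives can be sketched together in a few lines. This is a minimal illustration, not the course's implementation: the function name `insert_into_leaf`, the `capacity` parameter, and the choice of the right half's smallest key as the separator passed to the parent are all assumptions made here (any value that separates the two halves would serve as a separator).

```python
import bisect


def insert_into_leaf(keys, key, capacity):
    """Insert `key` into a sorted leaf and split the leaf if it overflows.

    Returns (left, right, separator). When no split is needed,
    `right` and `separator` are None.
    Assumption: `capacity` is the maximum number of keys a leaf may hold.
    """
    leaf = list(keys)              # work on a copy; the caller's list is untouched
    bisect.insort(leaf, key)       # keep the leaf sorted
    if len(leaf) <= capacity:
        return leaf, None, None    # primitive 1: plain insertion into a leaf

    # Primitive 2: split the overflowing leaf into two halves.
    mid = (len(leaf) + 1) // 2
    left, right = leaf[:mid], leaf[mid:]
    # Assumed convention: the parent receives the smallest key of the right half.
    return left, right, right[0]


no_split = insert_into_leaf([54, 57, 60, 62], 58, capacity=5)
with_split = insert_into_leaf([54, 57, 58, 60, 62], 61, capacity=5)
print(no_split)
print(with_split)
```

Splitting an internal node or the root follows the same pattern one level up: the separator returned here is inserted into the parent, which may overflow in turn.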

20 Inserting into a Leaf Node [figure keys: 54 57 60 62; 58]

21 Inserting into a Leaf Node [figure keys: 54 57 60 62; 58]

22 Inserting into a Leaf Node [figure keys: 54 57 60 62; 58]

23 Splitting a Leaf Node [figure keys: 61; 54 57 60 62 58; 54 66]

24 Splitting a Leaf Node [figure keys: 61; 54 57 60 62 58; 54 66]

25 Splitting a Leaf Node [figure keys: 61; 54 57 61 62 58; 54 66; 60]

26 Splitting a Leaf Node [figure keys: 61; 54 57 61 62 58; 54 66; 60; 59]

27 Splitting a Leaf Node [figure keys: 61; 54 57 61 62 58; 54 66; 60; 59]

28 Splitting an Internal Node [figure keys: 59; 54 66 40; 74 84; 99 21; ranges [54, 59), [59, 66), [66, 74)]

29 Splitting an Internal Node [figure keys: 59; 54 66 40 74 84; 99 21; ranges [54, 59), [59, 66), [66, 74)]

30 Splitting an Internal Node [figure keys: 59 54; 66; 40 74 84; 99 21; ranges [54, 59), [59, 66), [66, 74), [21, 66), [66, 99)]

31 Splitting the Root [figure keys: 54 66 40 74 84; 59; ranges [54, 59), [59, 66), [66, 74)]

32 Splitting the Root [figure keys: 54 66 40 74 84; 59; ranges [54, 59), [59, 66), [66, 74)]

33 Splitting the Root [figure keys: 54; 66; 40 74 84 59; ranges [54, 59), [59, 66), [66, 74)]

34-36 Deletion [figures only; slide 35 is labeled "redistribute"]

37-43 Deletion - II [figures only; slide 38 is labeled "merge"; slide 42 is labeled "merge" and "Not needed"]

44 Deletion: Primitives. Delete key from a leaf; redistribute keys between sibling leaves; merge a leaf into its sibling; redistribute keys between two sibling internal nodes; merge an internal node into its sibling.
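The choice between redistributing and merging two sibling leaves can be sketched as follows. The function name `rebalance_leaves` and the `min_keys` parameter are hypothetical, and the sketch assumes that `left` holds the smaller keys and that the caller updates or removes the separator key in the parent afterwards.

```python
def rebalance_leaves(left, right, min_keys):
    """Fix an underfull leaf using its adjacent sibling.

    Returns (left, right, action) where action is "none", "redistribute",
    or "merge". After a merge, `right` is None.
    Assumption: `min_keys` is the minimum number of keys a leaf must hold.
    """
    if len(left) >= min_keys and len(right) >= min_keys:
        return left, right, "none"          # neither leaf is underfull

    combined = left + right                 # still sorted: left keys < right keys
    if len(combined) >= 2 * min_keys:
        # Enough keys for both leaves: redistribute them evenly.
        mid = len(combined) // 2
        return combined[:mid], combined[mid:], "redistribute"

    # Not enough keys for two legal leaves: merge into one.
    return combined, None, "merge"


print(rebalance_leaves([54, 58, 60], [64], min_keys=2))
print(rebalance_leaves([64, 68], [75], min_keys=2))
```

The same decision applies to internal nodes, except that the separator key from the parent also takes part in the redistribution or merge.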

45 Merge Leaf into Sibling [figure keys: 54 58 64 68 72 75; 67 85 … 72]

46 Merge Leaf into Sibling [figure keys: 54 58 64 68 75; 67 … 72 85]

47 Merge Leaf into Sibling [figure keys: 54 58 64 68 75; 67 … 72 85]

48 Merge Leaf into Sibling [figure keys: 54 58 64 68 75; … 72 85]

49 Merge Internal Node into Sibling [figure keys: 41 48 52 63 74; 59; ranges [52, 59), [59, 63); … …]

50 Merge Internal Node into Sibling [figure keys: 41 48 52 63; 59; ranges [52, 59), [59, 63); 59; … …]

51 B-Tree Roadmap. B-Tree: recap, insertion (recap), deletion, construction, efficiency. B-Tree variants. Hash-based Indexes.

52 Question: How does insertion-based construction perform?

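53 B-Tree Construction. Sort [figure keys: 11 13 15 21 34 41 48 57 62 75 81 97]

54 B-Tree Construction. Scan [figure keys: unsorted input 75 97 21 41 57 15 11 13 48 34 62 81; sorted leaf keys 11 13 15 21 34 41 48 57 62 75 81 97]

55 B-Tree Construction. Scan [figure keys: 21 48 75 above leaf keys 11 13 15 21 34 41 48 57 62 75 81 97]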
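Sort-based construction can be sketched as a sort followed by a single scan that packs keys into leaves. The function name `bulk_load_leaves` and the `leaf_size` parameter are assumptions for illustration; the sketch builds only the leaf level and the separator keys for one level above it, and it assumes each separator is the first key of a leaf.

```python
def bulk_load_leaves(keys, leaf_size):
    """Build the leaf level of a B-Tree by sorting and scanning.

    Returns (leaves, separators): `leaves` is a list of sorted key lists,
    and `separators` holds the first key of every leaf except the first,
    i.e. the keys that would go into the level above.
    """
    ordered = sorted(keys)                                   # the "Sort" step
    leaves = [ordered[i:i + leaf_size]                       # the "Scan" step:
              for i in range(0, len(ordered), leaf_size)]    # fill leaves left to right
    separators = [leaf[0] for leaf in leaves[1:]]
    return leaves, separators


leaves, separators = bulk_load_leaves(
    [75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81], leaf_size=3)
print(leaves)
print(separators)
```

Each leaf is written once and in key order, whereas inserting the unsorted keys one at a time would touch leaves in random order and trigger splits along the way.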

56 B-Tree Construction. Why is sort-based construction better than the insertion-based one?

57 Cost of B-Tree Operations. Height of B-Tree: H. Assume no duplicates. Question: what is the random I/O cost of insertion, deletion, equality search, and range search?

58 Height of B-Tree. Number of keys: N. B-Tree parameter: n. Height ≈ $\log_n N = \frac{\log N}{\log n}$. In practice: 2-3 levels.
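A small calculation shows why the height stays at 2-3 levels in practice. The function name `approx_height` is hypothetical, and the sketch treats n as the fan-out of every node, which is a simplification. It uses integer arithmetic rather than floating-point logarithms to avoid rounding surprises.

```python
def approx_height(num_keys, fanout):
    """Smallest h such that fanout**h >= num_keys, i.e. ceil(log_fanout(num_keys)).

    Assumption: every node has exactly `fanout` children (fanout >= 2).
    """
    height, reach = 0, 1
    while reach < num_keys:
        reach *= fanout     # one more level multiplies the reachable keys by fanout
        height += 1
    return height


print(approx_height(1_000_000, 100))   # a million keys with fan-out 100
print(approx_height(1_000_000, 2))     # the same keys in a binary tree
```

With a fan-out of 100, a million keys fit in three levels, while a binary tree over the same keys needs twenty.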

59 Question: How do you pick parameter n? 1. Ignore inserts and deletes. 2. Optimize for equality searches. 3. Assume no duplicates.

60 Roadmap. B-Tree. B-Tree variants: Sparse Index, Duplicate Keys. Hash-based Indexes.

61 Roadmap. B-Tree. B-Tree variants. Hash-based Indexes: Static Hash Table, Extensible Hash Table, Linear Hash Table.

62 Hash-Based Indexes. Adaptations of main memory hash tables. Support equality searches. No range searches.

63 Indexing Problem (recap) [figure: index keys a_1, a_2, …, a_i, …, a_n on attribute A; lookup A = val; record pointers]

64 Main Memory Hash Table [figure: buckets 0 1 2 3 4 5 6 7; keys 32, 10, 48, 27, 75, 21, 55; (null)]. key → h(key), with h(key) = key % 8
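The main memory version can be sketched with Python lists standing in for the per-bucket linked lists. The helper names `h`, `build_table`, and `contains` are chosen here for illustration; only the hash function h(key) = key % 8 and the keys come from the slide.

```python
NUM_BUCKETS = 8


def h(key):
    """Hash function from the slide: h(key) = key % 8."""
    return key % NUM_BUCKETS


def build_table(keys):
    """Place each key in the list (chain) of the bucket it hashes to."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key in keys:
        buckets[h(key)].append(key)
    return buckets


def contains(buckets, key):
    """Equality search: only the one bucket h(key) is examined."""
    return key in buckets[h(key)]


table = build_table([32, 10, 48, 27, 75, 21, 55])
for bucket_id, chain in enumerate(table):
    print(bucket_id, chain)
```

Keys 32 and 48 share bucket 0 and keys 27 and 75 share bucket 3, which is where the per-bucket chain comes into play.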

65 Adapting to disk. 1 Hash Bucket = 1 Block. All keys that hash to a bucket are stored in the block. Intuition: keys in a bucket are usually accessed together. No need for linked lists of keys …

66 Adapting to Disk. How do we handle this?

67 Adapting to disk. 1 Hash Bucket = 1 Block. All keys that hash to a bucket are stored in the block. Intuition: keys in a bucket are usually accessed together. No need for linked lists of keys … but need linked list of blocks (overflow blocks).
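The disk adaptation can be sketched by giving each bucket a chain of fixed-capacity blocks. The class name `StaticHashTable`, the parameter `block_capacity`, and the use of `key % num_buckets` as the hash function are assumptions for illustration; the lookup reports how many blocks it had to read, which stands in for the I/O cost.

```python
class StaticHashTable:
    """Hash table where each bucket is a primary block plus overflow blocks."""

    def __init__(self, num_buckets, block_capacity):
        self.num_buckets = num_buckets
        self.block_capacity = block_capacity
        # buckets[b] is a chain of blocks; chain[0] is the primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[key % self.num_buckets]
        if len(chain[-1]) == self.block_capacity:
            chain.append([])            # last block is full: add an overflow block
        chain[-1].append(key)

    def lookup(self, key):
        """Return (found, blocks_read); each block read models one I/O."""
        chain = self.buckets[key % self.num_buckets]
        for blocks_read, block in enumerate(chain, start=1):
            if key in block:
                return True, blocks_read
        return False, len(chain)


table = StaticHashTable(num_buckets=4, block_capacity=2)
for key in [0, 4, 8, 12, 16, 5]:
    table.insert(key)
print(table.lookup(0))
print(table.lookup(16))
```

Five of the six keys land in bucket 0, so finding key 16 means reading the primary block and two overflow blocks: a lookup costs a single I/O only while buckets do not overflow.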

68 Adapting to Disk

69 Adapting to Disk [figure: buckets 0 1 2]. Is there any other issue? Map 'bucket id' to disk location.

70 Adapting to disk. 1 Hash Bucket = 1 Block. Bucket Id → Disk Address mapping. Contiguous blocks. Store mapping in main memory. Too large?

71 Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!

72 Adapting to disk. 1 Hash Bucket = 1 Block (or more than one contiguous block). Bucket Id → Disk Address mapping. Number of buckets ≈ number of keys (main memory version), ≈ number of blocks (disk version). Textbook: Static Hash Table.

73 Assigned Reading. Insertion and Deletion on Static Hash Table: Section 13.4

74 Roadmap. B-Tree. B-Tree variants. Hash-based Indexes: Static Hash Table, Extensible Hash Table, Linear Hash Table.

75 Dynamic Hash Indexes. Static Hash Table: fixed number of buckets; wastes space / inefficient. Dynamic Hash Tables: number of buckets can increase / decrease dynamically.

76 Extensible Hash Table: Main Ideas (Abstract). Hash Function: {Keys} → {Large space of hash values}. Buckets dynamically partition the space of hash values. Insertions: partitioning grows finer, i.e., more buckets. Deletions: partitioning grows coarser, i.e., fewer buckets.

77 Extensible Hash Table: Main Ideas (concrete). Hash Function: {Keys} → bit string of length b. Example: 01110100. Bucket: prefix of bit string. All (keys with) hash values having that prefix fall into that bucket.

78 Hash Value → bucket? [figure: bucket prefixes 11, 0, 10; hash values 01011010, 01100110, 10110001, 10011010, 11011110]

79 [figure: bucket prefixes 11, 0, 10; hash values 01011010, 01100110, 10110001, 10011010, 11011110; directory 00 01 10 11]. i = 2, where i = max length of prefix.
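Mapping a hash value to its bucket through the directory can be sketched as a bit shift. The names `directory_index` and `directory` are hypothetical, and the sketch assumes b = 8 and that directory entries 00 and 01 both point to the bucket with prefix 0, as a directory with i = 2 over buckets 0, 10, and 11 would require.

```python
B = 8   # length of the hash bit string (assumed, matching the 8-bit examples)


def directory_index(hash_value, i):
    """Return the first i bits of a B-bit hash value as an integer."""
    return hash_value >> (B - i)


# Directory for i = 2: entries 00 and 01 share the bucket with prefix "0".
directory = ["0", "0", "10", "11"]

for value in [0b01011010, 0b01100110, 0b10110001, 0b10011010, 0b11011110]:
    entry = directory_index(value, i=2)
    print(format(value, "08b"), "-> bucket with prefix", directory[entry])
```

Two directory entries sharing one bucket is what lets a bucket use a shorter prefix than the directory's i bits.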

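80 Insertion [figure: i = 0]

81 Insertion [figure: i = 0; hash values 10110001]

82 Insertion [figure: i = 0; hash values 10110001]

83 Insertion [figure: i = 0; hash values 10110001, 00110101]

84 Insertion [figure: i = 0; hash values 10110001, 00110101, 11010010]

85 Insertion [figure: i = 0; bucket prefixes 0, 1; hash values 10110001, 00110101, 11010010]

86 Insertion [figure: i = 0; bucket prefixes 0, 1; hash values 10110001, 00110101, 11010010]

87 Insertion [figure: i = 1; bucket prefixes 0, 1; hash values 10110001, 00110101, 11010010; directory 0 1]

88 Insertion [figure: i = 1; bucket prefixes 0, 1; hash values 10110001, 00110101, 11010010; directory 0 1]

89 Insertion [figure: i = 1; bucket prefixes 0, 1; hash values 10110001, 00110101, 11010010, 11001101; directory 0 1]

90 Insertion [figure: i = 1; bucket prefixes 0, 1; hash values 10110001, 00110101, 11010010, 11001101; directory 0 1]

91 Insertion [figure: i = 1; bucket prefixes 0, 10, 11; hash values 10110001, 00110101, 11010010, 11001101; directory 0 1]

92 Insertion [figure: i = 1; bucket prefixes 0, 10, 11; hash values 10110001, 00110101, 11010010, 11001101; directory 0 1]

93 Insertion [figure: i = 2; bucket prefixes 0, 10, 11; hash values 10110001, 00110101, 11010010, 11001101; directory 00 01 10 11]

94 Insertion [figure: i = 2; bucket prefixes 0, 10, 11; hash values 10110001, 00110101, 11010010, 11001101; directory 00 01 10 11]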
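The insertion sequence can be sketched as a small class that splits a full bucket and doubles the directory only when the bucket already uses all i prefix bits. The class and attribute names (`ExtensibleHashTable`, `Bucket`, `depth`) and the bucket capacity of 2 are assumptions for illustration; the sketch stores hash values directly rather than keys and does not implement deletion.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth      # number of prefix bits this bucket covers
        self.values = []


class ExtensibleHashTable:
    def __init__(self, bits=8, capacity=2):
        self.bits = bits            # b: length of each hash bit string
        self.capacity = capacity    # hash values per bucket (assumed)
        self.i = 0                  # number of prefix bits used by the directory
        self.directory = [Bucket(0)]

    def _index(self, value):
        return value >> (self.bits - self.i)    # first i bits of the hash value

    def insert(self, value):
        bucket = self.directory[self._index(value)]
        if len(bucket.values) < self.capacity:
            bucket.values.append(value)
            return
        if bucket.depth == self.bits:
            raise ValueError("bucket cannot be split any further")
        if bucket.depth == self.i:
            # The bucket already uses all i bits: double the directory.
            # Each old entry becomes two adjacent entries pointing to the same bucket.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.i += 1
        # Split the full bucket on its next prefix bit.
        low, high = Bucket(bucket.depth + 1), Bucket(bucket.depth + 1)
        for v in bucket.values:
            bit = (v >> (self.bits - bucket.depth - 1)) & 1
            (high if bit else low).values.append(v)
        # Redirect every directory entry that pointed at the old bucket.
        for idx, b in enumerate(self.directory):
            if b is bucket:
                bit = (idx >> (self.i - bucket.depth - 1)) & 1
                self.directory[idx] = high if bit else low
        self.insert(value)          # retry; may trigger a further split


table = ExtensibleHashTable()
for value in [0b10110001, 0b00110101, 0b11010010, 0b11001101]:
    table.insert(value)
print("i =", table.i)
for idx, b in enumerate(table.directory):
    print(format(idx, "02b"), [format(v, "08b") for v in b.values])
```

With these four hash values the directory doubles twice, ending at i = 2 with entries 00 and 01 sharing one bucket.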

95 Deletion. Inverse of insertion: work out details.

96 Textbook Notation [figure: i = 2; directory 00 01 10 11; labels 1 and 0 marked "Number of bits in prefix"]

97 Extensible Hash Table. One issue: directory doubles in size during some inserts.

98 Roadmap. B-Tree. B-Tree variants. Hash-based Indexes: Static Hash Table, Extensible Hash Table, Linear Hash Table.

99 Linear Hash Table. Differences from Extensible Hash Table: a bucket corresponds to a suffix of the hash value; grows linearly (avoids doubling of directory).

100 Linear Hash Table [figure: bucket suffixes 10, 00, 1; hash values 01011000, 01100100, 10110001, 10011001, 11011110]

101 Linear Growth [figure: buckets 0, 1]

102 Linear Growth [figure: buckets 00, 1, 10; redistribute]

103 Linear Growth [figure: buckets 00, 01, 10, 11; redistribute]
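One step of linear growth can be sketched as adding a single bucket and redistributing the contents of exactly one existing bucket. The function name `grow` is hypothetical, and the sketch follows the common textbook convention in which the new bucket, numbered n, takes entries from bucket n - 2^(i-1); buckets hold hash values directly.

```python
def grow(buckets, i):
    """Add one bucket to a linear hash table and return the new value of i.

    `buckets[m]` holds the hash values whose last bits select bucket m.
    Assumption: the new bucket n splits bucket n - 2**(i - 1).
    """
    n = len(buckets)
    if n == (1 << i):
        i += 1                          # all i-bit suffixes are in use: use one more bit
    source = n - (1 << (i - 1))         # the bucket whose entries may move
    mask = (1 << i) - 1                 # selects the last i bits
    stay = [v for v in buckets[source] if (v & mask) == source]
    move = [v for v in buckets[source] if (v & mask) != source]
    buckets[source] = stay
    buckets.append(move)                # the new bucket, number n
    return i


buckets = [[0b01011000, 0b01100100, 0b11011110], [0b10110001, 0b10011001]]
i = grow(buckets, 1)        # buckets 0, 1  ->  00, 1, 10
i = grow(buckets, i)        # buckets 00, 1, 10  ->  00, 01, 10, 11
print("i =", i)
for number, bucket in enumerate(buckets):
    print(number, [format(v, "08b") for v in bucket])
```

Each call touches only one existing bucket, which is the sense in which growth is linear and no directory needs to double.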

104 What does linear growth buy? [figure: buckets 000, 01, 10, 11, 100; i = 3; directory 101 000 001 010 011 100 110 111]. Redundant if we know # buckets = 5.

105 What does linear growth buy? [figure: buckets 000, 01, 10, 11, 100; i = 3; directory 000 001 010 011 100; i = 3, n = 3]
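Knowing the number of buckets makes a directory unnecessary, because the bucket number can be computed from the hash value alone. The sketch below uses the common textbook rule; the function name `bucket_number` is hypothetical, and `n` here denotes the current number of buckets, which is an assumption about notation rather than something taken from the figure.

```python
def bucket_number(hash_value, i, n):
    """Map a hash value to a bucket using its last i bits.

    Assumptions: buckets are numbered 0 .. n-1 and 2**(i-1) < n <= 2**i.
    If the i-bit suffix names a bucket that does not exist yet,
    drop its leading bit, i.e. subtract 2**(i-1).
    """
    m = hash_value & ((1 << i) - 1)     # last i bits of the hash value
    return m if m < n else m - (1 << (i - 1))


for value in [0b01011000, 0b01100100, 0b10110001, 0b10011001, 0b11011110]:
    print(format(value, "08b"), "-> bucket", bucket_number(value, i=3, n=5))
```

With i = 3 and five buckets, suffixes 101, 110, and 111 fall back to buckets 1, 2, and 3, so no directory entries are needed for them.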

