1 CPS216: Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu.


1 CPS216: Data-intensive Computing Systems. Operators for Data Access (contd.). Shivnath Babu

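2 Insertion in a B-Tree. Figure: B-tree with n = 2 holding keys 15, 36, 49. Insert: 62

3 Insertion in a B-Tree. Figure: same tree with n = 2. Insert: 62, with key 62 now placed in the tree

4 Insertion in a B-Tree. Figure: B-tree with n = 2 holding keys 15, 36, 49, 62. Insert: 50

5 Insertion in a B-Tree. Figure: B-tree with n = 2 holding keys 15, 36, 49, 50, 62 after Insert: 50

6 Insertion in a B-Tree. Figure: B-tree with n = 2 holding keys 15, 36, 49, 50, 62. Insert: 75

7 Insertion in a B-Tree. Figure: B-tree with n = 2 holding keys 15, 36, 49, 50, 62, 75 after Insert: 75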

8-18 Insertion. Figure-only slides; no slide text survives

19 Insertion: Primitives. Inserting into a leaf node; splitting a leaf node; splitting an internal node; splitting the root node
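The first two primitives can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a leaf is modeled as a sorted list of keys, `n` is taken to be the maximum number of keys per leaf, the left half keeps the extra key on an uneven split, and the separator handed to the parent is the smallest key of the new right leaf. The function name `insert_into_leaf` is hypothetical and record pointers are omitted.

```python
import bisect

def insert_into_leaf(keys, key, n):
    """Insert key into a sorted leaf that may hold at most n keys (assumed model).

    Returns (left, right, separator). When the leaf does not overflow,
    right and separator are None and left is the updated leaf.
    """
    keys = list(keys)            # work on a copy; the caller's list is untouched
    bisect.insort(keys, key)     # keep the leaf sorted
    if len(keys) <= n:
        return keys, None, None  # plain insert, no split needed
    mid = (len(keys) + 1) // 2   # left leaf keeps the extra key if the count is odd
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]  # separator to be inserted into the parent
```

With a capacity of n = 5 (an assumption), inserting 58 into the leaf 54, 57, 60, 62 needs no split, while then inserting 61 overflows the leaf and produces two leaves plus a separator that the caller must insert into the parent.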

20-22 Inserting into a Leaf Node. Figure (three animation steps): leaf with keys 54, 57, 60, 62; key to insert: 58

23-24 Splitting a Leaf Node. Figure: full leaf with keys 54, 57, 58, 60, 62; key to insert: 61; parent keys 54, 66

25 Splitting a Leaf Node. Figure: leaf keys 54, 57, 58, 60, 61, 62 after the insert; parent keys 54, 66

26-27 Splitting a Leaf Node. Figure: leaf keys 54, 57, 58, 60, 61, 62; parent keys 54, 66; key 59 also shown

28-29 Splitting an Internal Node. Figure: internal node with keys 40, 54, 66, 74, 84 below a parent with keys 21, 99; incoming key 59; child ranges [54, 59), [59, 66), [66, 74)

30 Splitting an Internal Node. Figure: keys 40, 54, 59, 66, 74, 84 after the split; parent keys 21, 99; child ranges [54, 59), [59, 66), [66, 74); new ranges [21, 66) and [66, 99)

31-32 Splitting the Root. Figure: root with keys 40, 54, 66, 74, 84; incoming key 59; child ranges [54, 59), [59, 66), [66, 74)

33 Splitting the Root. Figure: keys 40, 54, 59, 66, 74, 84 after the split; child ranges [54, 59), [59, 66), [66, 74)
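The internal-node and root splits can be sketched as follows. This is a minimal sketch, not the lecture's code: a node is modeled as a pair (keys, children), the middle key moves up to the parent rather than being copied, and the left half keeps the extra key when the count is uneven. The names `split_internal` and `split_root` and the single-letter child labels are hypothetical.

```python
def split_internal(keys, children):
    """Split an overfull internal node (assumed model: len(children) == len(keys) + 1).

    The middle key is removed from the node and returned so the caller
    can insert it into the parent.
    """
    mid = len(keys) // 2
    up = keys[mid]                                  # key that moves up
    left = (keys[:mid], children[:mid + 1])         # keys and pointers left of it
    right = (keys[mid + 1:], children[mid + 1:])    # keys and pointers right of it
    return left, up, right

def split_root(keys, children):
    """Splitting the root creates a new root with one key and two children."""
    left, up, right = split_internal(keys, children)
    return ([up], [left, right])                    # the tree grows by one level
```

For example, an overfull node holding 40, 54, 59, 66, 74, 84 splits into 40, 54, 59 and 74, 84, with 66 moving up, which agrees with the ranges [21, 66) and [66, 99) that appear at the parent level in the internal-node slides.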

34 Deletion

35 Deletion. Figure annotation: redistribute

36 Deletion

37 Deletion - II

38 Figure annotation: merge

39 Deletion - II

40 Deletion - II

41 Deletion - II

42 Deletion - II. Figure annotations: merge; Not needed

43 Deletion - II

44 Deletion: Primitives. Delete key from a leaf; redistribute keys between sibling leaves; merge a leaf into its sibling; redistribute keys between two sibling internal nodes; merge an internal node into its sibling
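The three leaf-level primitives can be sketched as one decision procedure. The sketch assumes that a leaf must keep at least floor((n + 1) / 2) keys, that only the left sibling is consulted, and that the caller fixes the separator key in the parent afterwards. The function names are hypothetical. The key values are borrowed from the merge slides, but their grouping into leaves and the capacity n = 5 are assumptions.

```python
def min_leaf_keys(n):
    """Assumed minimum occupancy of a leaf that can hold n keys."""
    return (n + 1) // 2

def delete_from_leaf(left_sibling, leaf, key, n):
    """Delete key from leaf, borrowing from or merging into the left sibling.

    Returns (action, left_sibling, leaf); leaf is None after a merge.
    The parent's separator key must be updated by the caller.
    """
    leaf = [k for k in leaf if k != key]
    left = list(left_sibling)
    if len(leaf) >= min_leaf_keys(n):
        return "delete", left, leaf          # leaf is still full enough
    if len(left) > min_leaf_keys(n):
        leaf.insert(0, left.pop())           # borrow the sibling's largest key
        return "redistribute", left, leaf
    return "merge", left + leaf, None        # sibling cannot spare a key

# Hypothetical example: with n = 5 each leaf needs at least 3 keys.
action, left, leaf = delete_from_leaf([54, 58, 64], [68, 72, 75], 72, 5)
```

In the example the leaf drops to two keys and the sibling holds exactly the minimum, so the leaf is merged into its sibling.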

45 Merge Leaf into Sibling. Figure: leaf keys 54, 58, 64, 68, 72, 75; other labels 67, 85, …, 72

46-47 Merge Leaf into Sibling. Figure: leaf keys 54, 58, 64, 68, 75; other labels 67, …, 72, 85

48 Merge Leaf into Sibling. Figure: leaf keys 54, 58, 64, 68, 75; other labels …, 72, 85

49 Merge Internal Node into Sibling. Figure: keys 41, 48, 52, 63, 74, 59; child ranges [52, 59), [59, 63)

50 Merge Internal Node into Sibling. Figure: keys 41, 48, 52, 63, 59; child ranges [52, 59), [59, 63); key 59 shown a second time

51 B-Tree Roadmap. B-Tree; Recap; Insertion (recap); Deletion; Construction; Efficiency; B-Tree variants; Hash-based Indexes

52 Question: How does insertion-based construction perform?

53 B-Tree Construction. Figure: Sort step; sorted keys 11, 13, 15, 21, 34, 41, 48, 57, 62, 75, 81, 97

54 B-Tree Construction. Figure: unsorted keys 75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81; Scan step over the sorted keys 11, 13, 15, 21, 34, 41, 48, 57, 62, 75, 81, 97

55 B-Tree Construction. Figure: Scan step; keys 21, 48, 75 shown above the sorted keys 11, 13, 15, 21, 34, 41, 48, 57, 62, 75, 81, 97

56 B-Tree Construction. Why is sort-based construction better than insertion-based construction?
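Sort-based construction as shown in the Sort and Scan slides can be sketched as below. The sketch builds only the leaf level and one level of separator keys. The leaf size of 3 is an assumption read off the 12-key example, the function name `bulk_load` is hypothetical, and a real bulk loader would repeat the scan for each higher level and handle an underfull last leaf.

```python
def bulk_load(keys, leaf_size):
    """Sort the keys once, then scan them to fill leaves left to right."""
    ordered = sorted(keys)                                 # Sort step
    leaves = [ordered[i:i + leaf_size]                     # Scan step: pack leaves
              for i in range(0, len(ordered), leaf_size)]
    separators = [leaf[0] for leaf in leaves[1:]]          # first key of each later leaf
    return leaves, separators

# Keys in the unsorted order shown on the construction slide.
leaves, separators = bulk_load(
    [75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81], leaf_size=3)
```

Each leaf is written once and sequentially, whereas insertion-based construction descends the tree separately for every key and may split nodes along the way.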

57 Cost of B-Tree Operations. Height of B-Tree: H. Assume no duplicates. Question: what is the random I/O cost of insertion, deletion, equality search, and range search?

58 Height of B-Tree. Number of keys: N. B-Tree parameter: n. $\text{Height} \approx \log_n N = \frac{\log N}{\log n}$. In practice: 2-3 levels
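The height estimate can be checked numerically with a short sketch. It assumes every node is completely full and treats n as the fan-out at every level, so it gives a rough lower bound rather than the height of any real tree. The function name `btree_height` and the sample values of N and n are hypothetical. Integer arithmetic is used to avoid floating-point rounding in the logarithm.

```python
def btree_height(num_keys, n):
    """Smallest h such that n**h >= num_keys (assumes fully packed nodes)."""
    height, capacity = 1, n
    while capacity < num_keys:
        capacity *= n        # one more level multiplies the capacity by n
        height += 1
    return height

# Hypothetical sizes: one million keys with n = 100.
levels = btree_height(1_000_000, 100)
```

With n in the hundreds, even very large key counts need only a few levels, which is the point of the "2-3 levels" remark.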

59 Question: How do you pick parameter n? 1. Ignore inserts and deletes 2. Optimize for equality searches 3. Assume no duplicates

60 Roadmap. B-Tree; B-Tree variants; Sparse Index; Duplicate Keys; Hash-based Indexes

61 Roadmap. B-Tree; B-Tree variants; Hash-based Indexes; Static Hash Table; Extensible Hash Table; Linear Hash Table

62 Hash-Based Indexes. Adaptations of main memory hash tables; support equality searches; no range searches

63 Indexing Problem (recap). Figure: index with keys a_1, a_2, …, a_i, …, a_n and record pointers; search condition A = val

64 Main Memory Hash Table. Figure: array of buckets numbered 0 to 7; a key is placed in bucket h(key), where h(key) = key % 8; keys shown: 32, 10, 48, 27, 75, 21, 55; empty buckets are (null)
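A main-memory hash table with this hash function can be sketched directly. Each bucket is modeled as a Python list standing in for the chain of keys, and the function names are hypothetical. The keys are the ones shown in the figure.

```python
def h(key, num_buckets=8):
    """Hash function from the slide: h(key) = key % 8 for eight buckets."""
    return key % num_buckets

def build_hash_table(keys, num_buckets=8):
    """Place each key in the chain of bucket h(key)."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[h(key, num_buckets)].append(key)
    return buckets

def contains(buckets, key):
    """Equality search: only one bucket has to be examined."""
    return key in buckets[h(key, len(buckets))]

table = build_hash_table([32, 10, 48, 27, 75, 21, 55])
```

Keys 32 and 48 share bucket 0 and keys 27 and 75 share bucket 3, while buckets 1, 4 and 6 stay empty.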

65 Adapting to Disk. 1 Hash Bucket = 1 Block; all keys that hash to a bucket are stored in the block; intuition: keys in a bucket are usually accessed together; no need for linked lists of keys …

66 Adapting to Disk. How do we handle this?

67 Adapting to Disk. 1 Hash Bucket = 1 Block; all keys that hash to a bucket are stored in the block; intuition: keys in a bucket are usually accessed together; no need for linked lists of keys … but need linked list of blocks (overflow blocks)
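The one-bucket-per-block layout with overflow blocks can be sketched as follows. This is an in-memory simulation only: `Block`, `StaticHashTable`, the block capacity of 2 keys, and the count of blocks read are all hypothetical stand-ins for disk blocks and I/Os.

```python
class Block:
    """A simulated disk block holding a bounded number of keys."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.overflow = None     # next block in this bucket's overflow chain

class StaticHashTable:
    def __init__(self, num_buckets, block_capacity):
        self.num_buckets = num_buckets
        self.buckets = [Block(block_capacity) for _ in range(num_buckets)]

    def insert(self, key):
        block = self.buckets[key % self.num_buckets]
        while len(block.keys) >= block.capacity:      # block is full
            if block.overflow is None:
                block.overflow = Block(block.capacity)  # chain a new overflow block
            block = block.overflow
        block.keys.append(key)

    def lookup(self, key):
        """Return (found, blocks_read); each block read models one I/O."""
        block, reads = self.buckets[key % self.num_buckets], 0
        while block is not None:
            reads += 1
            if key in block.keys:
                return True, reads
            block = block.overflow
        return False, reads

table = StaticHashTable(num_buckets=8, block_capacity=2)
for k in (0, 8, 16, 24, 32):     # all five keys hash to bucket 0
    table.insert(k)
```

Once a bucket overflows, a lookup may have to follow the whole chain, so the cost of an equality search is no longer a single block read.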

68 Adapting to Disk

69 Adapting to Disk. Bucket Id → Disk Address mapping; contiguous blocks; store mapping in main memory; too large? Dynamic → Linear and Extensible hash tables

70 Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!

