1 More on Indexes Secondary Indexes B-Trees Source: our textbook, slides by Hector Garcia-Molina.


1 More on Indexes: Secondary Indexes, B-Trees
Source: our textbook, slides by Hector Garcia-Molina

2 Secondary Indexes
- Sometimes we want multiple indexes on a relation.
  - Ex: search Candies(name, manf) both by name and by manufacturer
- Typically the file is sorted on the key (ex: name), and the primary index is on that field.
- The secondary index is on any other attribute (ex: manf).
- A secondary index also helps find records, but it cannot rely on the records being sorted on its attribute.

3 Sparse Secondary Index?
- No!
- Since records are not sorted on that key, we cannot predict the location of a record from the location of any other record.
- Thus the first level of a secondary index is always dense.

4 [Figure: data file with sequence-field values 50 30 70 20 40 80 10 100 60 90, not sorted on the indexed attribute. A sparse index with entries 30 20 80 100 90 ... does not make sense!]

5 Design of Secondary Indexes
- Always dense, usually with duplicates
- Consists of key-pointer pairs ("key" means search key, not relation key)
- Entries in the index file are sorted by key
- Therefore the second-level index is sparse

6 Secondary indexes
[Figure: data file with sequence-field values 50 30 70 20 40 80 10 100 60 90; dense first-level index 10 20 30 40 50 60 70 ...; sparse second-level index 10 50 90 ...]
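
As a rough illustration of slides 5 and 6, the Python sketch below builds a dense first-level index over an unsorted file and then a sparse second level on top of it. The function names, the use of list positions as stand-ins for disk pointers, and the block size of four entries are all assumptions made for the example; they are not taken from the slides.

```python
# Data file as transcribed on slide 6; a record's list position stands in for its disk pointer.
data_file = [50, 30, 70, 20, 40, 80, 10, 100, 60, 90]

def build_dense_index(records):
    """Dense first level: one (search key, pointer) pair per record, sorted by key."""
    return sorted((key, pos) for pos, key in enumerate(records))

def build_sparse_level(dense, entries_per_block):
    """Sparse second level: one (first key, block number) pair per first-level block."""
    return [(dense[i][0], i // entries_per_block)
            for i in range(0, len(dense), entries_per_block)]

dense = build_dense_index(data_file)
sparse = build_sparse_level(dense, entries_per_block=4)  # block size is an assumption
print([key for key, _ in sparse])  # [10, 50, 90]
```

With four entries per block, the second level ends up with the same keys the figure shows (10, 50, 90). The sparse level only works because the first level is sorted, which the data file itself is not.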

7 Secondary Index and Duplicate Keys
- The scheme in the previous diagram wastes space in the presence of duplicate keys
- If a search key value appears n times in the data file, then there are n entries for it in the index.

8 Duplicate values & secondary indexes
[Figure: one option. Data file with values 10 20 40 20 40 10 40 10 40 30, indexed by entries 10 20 30 40 ...]
Problem: excess overhead!
- disk space
- search time

9 Buckets
- To avoid repeating values, use a level of indirection
- Put buckets between the secondary index file and the data file
- One entry in the index for each search key K; its pointer goes to a location in a "bucket file", called the bucket for K
- The bucket holds pointers to all records with search key K

10 Duplicate values & secondary indexes
[Figure: data file with values 10 20 40 20 40 10 40 10 40 30; index entries 10 20 30 40 50 60 ... point into buckets, which point to the records.]
Buckets save space as long as search keys are larger than pointers and the average key appears at least twice.
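
The bucket idea from slides 9 and 10 can be sketched in a few lines of Python. This is only an in-memory model under stated assumptions: `build_bucket_index` is a hypothetical name, a dictionary plays the role of the index file, each list plays the role of one bucket, and list positions stand in for record pointers.

```python
from collections import defaultdict

# Data-file values as transcribed on slide 8; positions stand in for record pointers.
data_file = [10, 20, 40, 20, 40, 10, 40, 10, 40, 30]

def build_bucket_index(records):
    """One index entry per distinct key; the entry leads to a bucket of record pointers."""
    buckets = defaultdict(list)
    for pos, key in enumerate(records):
        buckets[key].append(pos)
    # Keep index entries in key order, as in a sorted index file.
    return {key: buckets[key] for key in sorted(buckets)}

index = build_bucket_index(data_file)
print(len(index), "index entries for", len(data_file), "records")
print(index[40])  # pointers to every record with key 40
```

Each key is stored once in the index however often it occurs in the file; only the pointers are repeated.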

11 Why the "bucket" idea is useful
[Figure: indexes and records for Emp(name, dept, floor, ...)]
- name: primary index
- dept: secondary index
- floor: secondary index

12 Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2
[Figure: dept index (entry Toy), Emp records, floor index (entry 2)]
- Intersect the Toy dept bucket and the floor 2 bucket to get the set of matching Emp's
- Saves disk I/O's
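
To make the intersection step concrete, here is a small Python sketch. The employee rows, the helper `build_buckets`, and the use of row positions as pointers are invented for illustration; the point is only that the pointer sets are intersected before any record is fetched.

```python
# Hypothetical Emp(name, dept, floor) rows; a row's position stands in for its record pointer.
emp = [
    ("Ann", "Toy", 2),
    ("Bob", "Shoe", 2),
    ("Cid", "Toy", 1),
    ("Dee", "Toy", 2),
]
NAME, DEPT, FLOOR = 0, 1, 2

def build_buckets(rows, column):
    """Secondary index with buckets: column value -> set of record pointers."""
    buckets = {}
    for pos, row in enumerate(rows):
        buckets.setdefault(row[column], set()).add(pos)
    return buckets

dept_index = build_buckets(emp, DEPT)
floor_index = build_buckets(emp, FLOOR)

# Intersect the two buckets first; only the surviving pointers are followed to Emp records.
matching = dept_index.get("Toy", set()) & floor_index.get(2, set())
names = sorted(emp[pos][NAME] for pos in matching)
print(names)  # ['Ann', 'Dee']
```

Three rows are in the Toy bucket and three are on floor 2, but only two pointers survive the intersection, so only two records would have to be read.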

13 Summary of Indexes So Far
- Advantages:
  - simple
  - index is a sequential file, good for scans
- Disadvantages:
  - either inserts are expensive
  - or we lose sequentiality (cf. next slide)
- Instead use the B-tree data structure to implement the index

14 Example
[Figure: sequential index with entries 10 20 30 40 50 60 70 80 90 stored in contiguous blocks, some free space, and a non-sequential overflow area holding 39 31 35 36 32 38 34 33]

15 B-Trees
- Several related data structures
- Key features are:
  - automatically adjust the number of levels of indexes as the size of the data file changes
  - storage on blocks is managed to keep every block between half full and full => no overflow blocks needed
- We'll actually study B+ trees

16 B-Tree Structure
- An example of a balanced search tree: every root-to-leaf path has the same length
- Each node (vertex) in the tree is a block, which contains search keys and pointers
- Parameter n is the largest value such that n+1 pointers and n keys fit in one block
  - Ex: If block size is 4096 bytes, keys are 4 bytes, and pointers are 8 bytes, then n = 340.
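
The value n = 340 follows from requiring 4n + 8(n+1) ≤ 4096. A short Python check of that arithmetic, with `max_keys_per_block` as a hypothetical helper name and assuming, as the slide does, that the block carries no header:

```python
def max_keys_per_block(block_bytes, key_bytes, pointer_bytes):
    """Largest n with n keys and n+1 pointers fitting in one block (no header assumed)."""
    # n*key + (n+1)*pointer <= block  <=>  n <= (block - pointer) / (key + pointer)
    return (block_bytes - pointer_bytes) // (key_bytes + pointer_bytes)

n = max_keys_per_block(4096, 4, 8)
used = n * 4 + (n + 1) * 8
print(n, used)  # 340 4088
```

With n = 340 the node uses 4088 of the 4096 bytes; n = 341 would need 4100 bytes and no longer fits.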

17 Constraints on B-Tree Nodes
- Keys in leaf nodes are copies of keys from the data file, in sorted order
- The root contains between 2 and n+1 index node pointers
- Each internal node contains between ⌈(n+1)/2⌉ and n+1 index node pointers
- Each non-leaf node consists of ptr_1, key_1, ptr_2, key_2, …, key_{m-1}, ptr_m, where ptr_i points to the index node with keys between key_{i-1} and key_i

18 Constraints (cont'd)
- Each leaf contains between ⌊(n+1)/2⌋ and n data record pointers, plus a "next leaf" pointer
- Associated with each data record pointer is a key, and the pointer points to the data record with that key
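
The occupancy bounds can be tabulated for the values of n used in these slides. The sketch below assumes the usual textbook reading of the bounds, which is an assumption here because the rounding brackets did not survive in the transcript: ceiling of (n+1)/2 for internal nodes and floor of (n+1)/2 for leaves. `pointer_bounds` is a hypothetical helper.

```python
def pointer_bounds(n):
    """(min, max) pointer counts per node type, assuming ceil for internal nodes, floor for leaves."""
    return {
        "root": (2, n + 1),
        "internal": ((n + 2) // 2, n + 1),  # (n + 2) // 2 equals ceil((n + 1) / 2)
        "leaf": ((n + 1) // 2, n),          # data record pointers; the next-leaf pointer is extra
    }

for n in (3, 4, 340):
    print(n, pointer_bounds(n))
```

For n = 3 both minimums are 2, which agrees with the "min. node" drawings on slide 22 (one key and two pointers in the non-leaf, two keys in the leaf).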

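19 Example B-tree nodes with n = 3
[Figure: a leaf with keys 30 35 and a non-leaf with key 30, each drawn in textbook notation and in a more concise notation. Leaf pointers go to the record with key 30 and to the record with key 35. Non-leaf pointers go to the part of the tree with keys < 30 and to the part of the tree with keys ≥ 30.]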

20 Sample non-leaf
[Figure: non-leaf node with keys 57 81 95. Its four pointers lead to keys k < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, and to keys k ≥ 95.]

21 Sample leaf node
[Figure: leaf node with keys 57 81 95, reached from a non-leaf node. Its pointers go to the record with key 57, to the record with key 81, to the record with key 95, and to the next leaf in sequence.]

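22 Full node vs. min. node, n = 3
[Figure: Non-leaf: full node with keys 120 150 180, min. node with key 30. Leaf: full node with keys 3 5 11, min. node with keys 30 35. Annotation on a pointer: "counts even if null".]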

23 B-Tree Example, n = 3
[Figure: root with key 100; non-leaf nodes with keys 30 and 120 150 180; leaves 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200; leaf pointers go to records.]

24 Insert into B+tree
(a) simple case
  - space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

25 (a) Insert key = 32, n = 3
[Figure: parent with keys 30 100; leaves 3 5 11 and 30 31; the new key 32 goes into the free slot of the leaf 30 31.]

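26 (b) Insert key = 7, n = 3
[Figure: parent with keys 30 100; leaves 3 5 11 and 30 31; the leaf 3 5 11 is full, so it is split, and the new key 7 appears both in a leaf and in the parent.]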
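
Cases (a) and (b) can be sketched for a single leaf as follows. This is a simplified model, not the full insertion algorithm: a leaf is just a sorted list of keys, `insert_into_leaf` is a hypothetical name, and the split rule (the first ⌈(n+1)/2⌉ keys stay, the rest move to a new leaf whose smallest key is handed to the parent) is the textbook convention, assumed here.

```python
from bisect import insort

def insert_into_leaf(keys, key, n):
    """Return (left_keys, right_keys, key_for_parent); the last two are None if no split occurs."""
    keys = list(keys)        # work on a copy
    insort(keys, key)        # keep the leaf sorted
    if len(keys) <= n:       # case (a): space available in leaf
        return keys, None, None
    stay = (len(keys) + 1) // 2              # case (b): leaf overflow, split the n+1 keys
    left, right = keys[:stay], keys[stay:]
    return left, right, right[0]             # parent gets the smallest key of the new leaf

print(insert_into_leaf([30, 31], 32, n=3))    # ([30, 31, 32], None, None)
print(insert_into_leaf([3, 5, 11], 7, n=3))   # ([3, 5], [7, 11], 7)
```

The two calls mirror slides 25 and 26: 32 fits into the leaf 30 31, while 7 overflows the leaf 3 5 11 and sends the key 7 up to the parent. If the parent is itself full, the same kind of split repeats one level up, which is case (c).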

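27 (c) Insert key = 160, n = 3
[Figure: root key 100; non-leaf node 120 150 180; leaves 150 156 179 and 180 200. After the insert, a leaf 160 179 and the keys 160 and 180 are shown where the non-leaf node overflows.]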

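28 (d) New root, insert 45, n = 3
[Figure: old root 10 20 30; leaves 1 2 3 | 10 12 | 20 25 | 30 32 40. After inserting 45, a leaf 40 45 and the key 40 are shown, and a new root with key 30 is created.]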

29 Deletion from B-tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

30 (b) Coalesce with sibling, n = 4
- Delete 50
[Figure: parent with keys 10 40 100; leaves 10 20 30 and 40 50; after the deletion, 40 joins the leaf 10 20 30.]

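31 (c) Redistribute keys, n = 4
- Delete 50
[Figure: parent with keys 10 40 100; leaves 10 20 30 35 and 40 50; after the deletion, 35 moves to the right leaf and replaces 40 in the parent.]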
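
The choice between coalescing and redistributing in slides 30 and 31 can be modelled on two neighboring leaves. The sketch below is a simplification under stated assumptions: leaves are plain sorted lists, `delete_from_leaf` is a hypothetical name, the minimum leaf occupancy is taken as ⌊(n+1)/2⌋ keys, and the code tries coalescing before redistribution simply because that matches the order of the slides.

```python
def delete_from_leaf(left, leaf, key, n):
    """Delete key from `leaf`, repairing underflow with its left sibling `left`.

    Returns (left_keys, leaf_keys, action).
    """
    leaf = [k for k in leaf if k != key]
    min_keys = (n + 1) // 2            # assumed minimum number of keys in a leaf
    if len(leaf) >= min_keys:
        return list(left), leaf, "simple"            # case (a): no underflow
    if len(left) + len(leaf) <= n:
        return list(left) + leaf, [], "coalesced"    # case (b): everything fits in one leaf
    # case (c): borrow the largest key of the sibling; the parent's separator key
    # must then be replaced by the new smallest key of `leaf`.
    return list(left[:-1]), [left[-1]] + leaf, "redistributed"

print(delete_from_leaf([10, 20, 30], [40, 50], 50, n=4))
# ([10, 20, 30, 40], [], 'coalesced')
print(delete_from_leaf([10, 20, 30, 35], [40, 50], 50, n=4))
# ([10, 20, 30], [35, 40], 'redistributed')
```

In the second call the sibling is already full, so merging is impossible and one key has to move instead. After a coalesce the parent loses a key and a pointer, which may push the underflow up a level; that is case (d).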

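32 (d) Non-leaf coalesce, n = 4
- Delete 37
[Figure: leaves 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45 under non-leaf nodes 10 20 and 30 40, with root key 25. After the deletion the non-leaf nodes coalesce and the merged node becomes the new root.]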

33 B-tree deletions in practice
- Often, coalescing is not implemented
  - Too hard and not worth it!

34 Applications of B-Trees
- A B-tree is used to implement indexes
- The data record pointers in the leaves correspond to the data record pointers in sequential indexes
- Some example uses:
  - B-tree search key is the primary key for the data file, leaf pointers form a dense index on the file
  - B-tree search key is the primary key for the data file, leaf pointers form a sparse index on the file
  - B-tree search key is not the primary key, leaf pointers form a dense index on the file

35 B-Trees with Duplicate Keys
Change the definition of B-tree:
- If key K appears in an internal node, then K is the smallest "new" key in the subtree S rooted at the pointer that follows K in the node
- "New" means K does not appear in the part of the B-tree to the left of S but it does appear in S
- Allow a null key in certain situations

36 Example B-Tree with Duplicates
[Figure: B-tree with duplicate keys. Keys shown in the transcript: 17 -- (null key) 37 43 7 | 2 3 5 | 7 13 17 23 37 41 43 47]

37 Lookup in B-Trees
- Assume no duplicate keys.
- Assume the B-tree is a dense index.
- To find the record with key K, search starting at the root and ending at a leaf:
  - if the current node is not a leaf and has keys K_1, K_2, …, K_n, find the largest key K_i in the sequence that is ≤ K
  - follow the (i+1)-st pointer to a node at the next level and repeat; if K < K_1, follow the first pointer
  - when a leaf node is reached, find the key with value K and follow the associated pointer to the data record
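
A minimal Python sketch of this lookup, taking K_i to be the largest key that is ≤ K (so the first pointer is followed when K is smaller than every key in the node). The classes `Leaf` and `Interior`, the helper `leaf`, and the record strings are hypothetical; the tree is laid out like the n = 3 example on slide 23.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Leaf:
    keys: list       # sorted search keys
    records: list    # records[i] stands in for the pointer paired with keys[i]

@dataclass
class Interior:
    keys: list       # K_1 .. K_m, sorted
    children: list   # m + 1 child nodes

def lookup(node, key):
    """Descend from the root to a leaf, then search the leaf for `key`."""
    while isinstance(node, Interior):
        i = bisect_right(node.keys, key)   # number of keys <= key
        node = node.children[i]            # 0-based index i is the (i+1)-st pointer
    for k, record in zip(node.keys, node.records):
        if k == key:
            return record
    return None                            # key is not in the index

def leaf(*keys):
    return Leaf(list(keys), [f"record {k}" for k in keys])

root = Interior([100], [
    Interior([30], [leaf(3, 5, 11), leaf(30, 35)]),
    Interior([120, 150, 180], [leaf(100, 101, 110), leaf(120, 130),
                               leaf(150, 156, 179), leaf(180, 200)]),
])

print(lookup(root, 156))  # record 156
print(lookup(root, 31))   # None
```

Every lookup touches exactly one node per level, which is why the number of levels determines the I/O cost.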

38 Range Queries with B-Trees
- Range query: a query in which a range of values is sought. Examples:
  - SELECT * FROM R WHERE R.k > 40;
  - SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25;
- To find all keys in the range [a,b]:
  - Do a lookup on a: this leads to the leaf where a could be
  - Search the leaf for all keys ≥ a
  - If we find a key > b, we are done
  - Else follow the next-leaf pointer and continue searching in the next leaf
  - Continue until finding a key > b or there are no more leaves
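
The range scan can be sketched in Python in the same spirit. The classes, the function `range_query`, and the small two-level tree are assumptions made for the example; the descent uses the largest-key-≤-a rule to reach the leaf where a could be.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list
    next: Optional["Leaf"] = None   # next-leaf pointer

@dataclass
class Interior:
    keys: list
    children: list

def range_query(root, a, b):
    """Return all keys k with a <= k <= b, in sorted order."""
    node = root
    while isinstance(node, Interior):            # lookup on a
        node = node.children[bisect_right(node.keys, a)]
    result = []
    while node is not None:                      # walk the leaf chain
        for k in node.keys:
            if k > b:
                return result                    # first key past b: done
            if k >= a:
                result.append(k)
        node = node.next
    return result                                # ran out of leaves

leaves = [Leaf([3, 5, 11]), Leaf([30, 35]), Leaf([100, 101, 110]), Leaf([120, 130])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Interior([30, 100, 120], leaves)

print(range_query(root, 10, 101))    # [11, 30, 35, 100, 101]
print(range_query(root, 105, 1000))  # [110, 120, 130]
```

Only the first step uses the upper levels of the tree; the rest of the scan follows next-leaf pointers, so the leaves are read in key order.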

39 Efficiency of B-Trees
- B-trees allow lookup, insertion and deletion of records with very few disk I/Os
- The number of disk I/Os is the number of levels in the B-tree plus the cost of any reorganization
- If n is at least 10, then splitting/merging blocks will be rare and usually limited to the leaves
- For typical sizes of keys, pointers, blocks and files, 3 levels suffice (see next slide)
- Also can keep the root block of the B-tree in memory

40 Size of B-Tree
- Assume
  - 4096 bytes per block
  - 4 bytes per key (e.g., integer)
  - 8 bytes per pointer
  - no header info in the block
- Then n = 340 (can keep n keys and n+1 pointers in a block)
- Assume on average a block has 255 pointers
- Count:
  - one node at level 1 (the root)
  - 255 nodes at level 2
  - 255*255 = 65,025 nodes at level 3 (leaves)
  - each leaf has 255 pointers, so the total number of records is more than 16 million
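
The count on this slide is a power of the average fan-out, which a few lines of Python can confirm. `nodes_per_level` and `records_indexed` are hypothetical helper names, and the calculation assumes every node has exactly the average number of pointers.

```python
def nodes_per_level(fanout, levels):
    """Number of nodes at levels 1..levels when every node has `fanout` pointers."""
    return [fanout ** i for i in range(levels)]

def records_indexed(fanout, levels):
    """Each leaf holds `fanout` record pointers, so records = leaves * fanout."""
    return fanout ** levels

print(nodes_per_level(255, 3))   # [1, 255, 65025]
print(records_indexed(255, 3))   # 16581375
```

Under the same assumption, a fourth level would multiply the capacity by another factor of 255.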

