
1 Indexing Techniques

2 Advanced Databases, Indexing Techniques: The Problem
What can we introduce to make search more efficient?
  – Indices!
What is an index?

3 Definitions
Index: an auxiliary data structure that speeds up record retrieval
Search key: the field or fields of a table that are indexed
Storage: index files that contain index records
  – Each entry stores one of:
    – the actual data record, or
    – search key value k and a record ID, or
    – search key value k and a list of record IDs
Types: ordered and unordered (hash) indices
[Figure: data pages i and i+1 holding records Paul, Anna, Tim]

4 Types of Ordered Indices (1/3)
Assuming ordered data files
Depending on which field is indexed
  – Primary index: search key is the ordering key field
    – Pointer for each page
  – Secondary index: search key is a non-ordering field
[Figure: data file with records (Paul, 00112233), (Anna, 00112234), (Matt, 00112235), (Tim, 00112236), (Carol, 00112237), (Rob, 00112238). Primary index entries: 00112233, 00112235, 00112236, 00112238. Secondary index entries: Anna, Carol, Paul, Tim.]

5 Types of Ordered Indices (2/3)
Depending on the density of index records
  – Dense index: an index record for each distinct search key value, i.e. every record
  – Sparse index: index records for only some search key values
    – search key value for first record in page
    – pointer to page
[Figure: data file with records (Paul, 00112233), (Anna, 00112234), (Matt, 00112235), (Tim, 00112236), (Carol, 00112237), (Rob, 00112238). Sparse index entries: 00112233, 00112235, 00112236, 00112238. Dense index entries: 00112233, 00112234, 00112235, 00112236, 00112237, 00112238.]
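A minimal Python sketch of the dense versus sparse distinction. It assumes a file sorted on the search key with two records per page; the page size, the `pages` layout, and the helper names are illustrative assumptions, not taken from the slides.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted on the search key, two records per page.
pages = [
    [("00112233", "Paul"), ("00112234", "Anna")],
    [("00112235", "Matt"), ("00112236", "Tim")],
    [("00112237", "Carol"), ("00112238", "Rob")],
]

# Dense index: one entry per record, mapping key -> (page number, slot in page).
dense = {
    key: (p, s)
    for p, page in enumerate(pages)
    for s, (key, _) in enumerate(page)
}

# Sparse index: one entry per page, holding the key of the first record on it.
sparse_keys = [page[0][0] for page in pages]


def dense_lookup(key):
    """Jump straight to the record through its own index entry."""
    loc = dense.get(key)
    if loc is None:
        return None
    p, s = loc
    return pages[p][s][1]


def sparse_lookup(key):
    """Find the last page whose first key is <= key, then scan that page."""
    p = bisect_right(sparse_keys, key) - 1
    if p < 0:
        return None  # key sorts before every record in the file
    for k, name in pages[p]:
        if k == key:
            return name
    return None
```

The sparse index here has three entries against six for the dense one, at the cost of scanning one page per lookup.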

6 Types of Ordered Indices (3/3)
Ordering field is nonkey (may have duplicates)
  – Clustered index
  – Unclustered index
[Figure: data file with records (Paul, 00112233), (Anna, 00112234), (Matt, 00112235), (Tim, 00112236), (Carol, 00112237), (Rob, 00112238), (Paul, 01112233), (Tim, 01112236), (Tim, 02112236). Clustered index entries: Anna, Carol, Matt, Paul, Rob, Tim. Unclustered index entries: 00112233, 00112234, 00112235, 00112236, 00112237, 00112238, 01112233, 01112236, 02112236.]

7 Indices Exercise
$2^{15}$ records
128 bytes/record
$2^{10}$ bytes/page
Ordered file
Equality search on ordering field, unspanned organization
  – without an index
  – with a primary index on a field of size 12 bytes
    – assume pointer is 4 bytes long
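One way to work through the exercise, sketched in Python. It assumes binary search over pages and a sparse primary index with one entry per data page; neither assumption is stated on the slide, and the variable names are illustrative.

```python
from math import ceil, log2

records = 2**15
record_size = 128          # bytes
page_size = 2**10          # bytes
key_size, pointer_size = 12, 4

# Unspanned organization: a record never straddles two pages.
records_per_page = page_size // record_size
data_pages = ceil(records / records_per_page)

# Without an index: binary search over the ordered data pages.
cost_without_index = ceil(log2(data_pages))

# With a primary index (assumed sparse: one entry per data page).
entries_per_page = page_size // (key_size + pointer_size)
index_pages = ceil(data_pages / entries_per_page)

# Binary search over the index pages, plus one access to the data page.
cost_with_index = ceil(log2(index_pages)) + 1

print(records_per_page, data_pages, cost_without_index)   # 8 4096 12
print(entries_per_page, index_pages, cost_with_index)     # 64 64 7
```

Under these assumptions the index cuts an equality search from 12 page accesses to 7.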

8 Multi-level Indices (1/2)
If access using the first-level index is still expensive
Build a sparse index on the first-level index
  – Multi-level index
Fan-out: index blocking factor
[Figure: data file with records (Paul, 00112233), (Anna, 00112234), (Matt, 00112235), (Tim, 00112236), (Carol, 00112237), (Rob, 00112238). First-level index entries: 00112233, 00112234, 00112235, 00112236, 00112237, 00112238. Second-level index entries: 00112233, 00112235, 00112236.]

9 Multi-level Indices (2/2)
$2^6$ index records/page (fan-out)
$2^{15}$ index records
1st level
  – $2^9$ pages
2nd level
  – $2^9$ index records
  – $2^3$ pages
3rd level
  – $2^3$ index records
  – 1 page
The top level must fit in one page: $2^{15} / (2^6)^t \le 1$, so $t = \lceil \log_{2^6} 2^{15} \rceil = 3$
In general, $t = \lceil \log_{fo}(\text{number of index records}) \rceil$, where $fo$ is the fan-out
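The level count can be checked with a short Python sketch that builds the levels bottom-up with integer arithmetic instead of evaluating the logarithm. The function name `index_levels` is an assumption made for illustration.

```python
def index_levels(index_records, fan_out):
    """Count index levels until the top level fits in a single page."""
    if index_records < 1 or fan_out < 2:
        raise ValueError("need at least one index record and a fan-out of 2 or more")
    levels = 0
    entries = index_records
    while True:
        pages = -(-entries // fan_out)   # ceiling division
        levels += 1
        if pages == 1:
            return levels
        entries = pages                  # next level: one entry per page below


print(index_levels(2**15, 2**6))   # 3
```

With $2^{15}$ index records and fan-out $2^6$ the loop visits 512 pages, then 8 pages, then 1 page.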

10 Dynamic multi-level indices
So far we assumed indices are physically ordered files
  – expensive insertions and deletions
Dynamic multi-level indices
  – B trees
  – B+ trees

11 Tree-structured Indices
Node layout: $\langle P_1, K_1, \dots, K_{i-1}, P_i, K_i, \dots, K_{q-1}, P_q \rangle$
For each node: $K_1 < K_2 < \dots < K_{q-1}$
For each value $X$ in the subtree pointed to by $P_i$
  – $K_{i-1} < X < K_i$, for $1 < i < q$
  – $X < K_i$, for $i = 1$
  – $K_{i-1} < X$, for $i = q$

12 B tree
Problems: empty nodes, unbalanced trees
  – solution: B trees

13 B tree: Definition
Each node: $\langle P_1, \langle K_1, Pr_1 \rangle, P_2, \langle K_2, Pr_2 \rangle, \dots, \langle K_{q-1}, Pr_{q-1} \rangle, P_q \rangle$
  – $P_i$: tree pointer, $K_i$: search value, $Pr_i$: data pointer
For each node: $K_1 < K_2 < \dots < K_{q-1}$
For each value $X$ in the subtree pointed to by $P_i$
  – $K_{i-1} < X < K_i$, for $1 < i < q$
  – $X < K_i$, for $i = 1$
  – $K_{i-1} < X$, for $i = q$
Each node has at most $q$ pointers
  – the B tree is of order $q$
Each node has at least $\lceil q/2 \rceil$ tree pointers
  – except the root
An internal node with $p$ pointers has $p - 1$ values
All leaves are at the same level
  – balanced tree

14 B tree: Example
[Figure: B tree with root keys 5 and 8 and leaf nodes (1, 3), (6, 7), (9, 12). Legend: tree pointer, data pointer, ø = null pointer.]

15 B+ tree
Most implementations of the B tree are B+ trees
Data pointers only in leaves
  – more entries in internal nodes than in regular B trees
  – fewer internal nodes
  – fewer levels
  – faster access

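16 B+ tree: Definition
Internal nodes: $\langle P_1, K_1, P_2, K_2, \dots, P_{q-1}, K_{q-1}, P_q \rangle$
Leaf nodes: $\langle \langle K_1, Pr_1 \rangle, \langle K_2, Pr_2 \rangle, \dots, \langle K_{q-1}, Pr_{q-1} \rangle, P_{next} \rangle$
$Pr_i$ points to a data record or to a block of pointers to such records
[Figure: example B+ tree with root keys 120, 150, 180 and leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200); label: leaf order]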

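17 B+ tree: Search
At each level, find the smallest $K_i$ larger than the search key
Follow the associated pointer $P_i$; if no key is larger, follow the last pointer $P_q$
[Figure: root 100; internal nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]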
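The search rule can be sketched in Python as follows. The `Node` class and the in-memory tree are assumptions made for illustration: the tree mirrors the figure's keys, leaves hold only keys (data pointers are omitted), and a key equal to a separator is taken to live in the right subtree.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list marks a leaf

    @property
    def is_leaf(self):
        return not self.children


def find_leaf(node, key):
    """Descend from the root to the leaf that would contain key."""
    while not node.is_leaf:
        # Position of the smallest separator strictly larger than key;
        # if there is none, this is the index of the last pointer.
        i = bisect_right(node.keys, key)
        node = node.children[i]
    return node


def search(root, key):
    return key in find_leaf(root, key).keys


# Hypothetical in-memory copy of the tree in the figure.
left = Node([30], [Node([3, 5, 11]), Node([30, 35])])
right = Node(
    [120, 150, 180],
    [Node([100, 101, 110]), Node([120, 130]), Node([150, 156, 179]), Node([180, 200])],
)
root = Node([100], [left, right])

print(search(root, 156))   # True
print(search(root, 42))    # False
```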

18 B+ tree: Insert
Nodes may overflow or underflow
Ignoring overflow or underflow, inserting a data record with search key value k:
  – find the leaf node
  – if k is found
    – add the record to the file
    – create an indirect block if there isn't one
    – add the record pointer to the indirect block
  – if k is not found
    – add the data record to the file
    – insert the record pointer in the leaf node (all search keys in order)

19 B+ tree: Delete
Ignoring overflow or underflow
Find the leaf node with search key value k
Find the data record pointer, delete the record
Delete the index record
  – and the indirect block, if any, if empty

20 B+ tree: Simple Insert
Insert 42
[Figure: root 100; internal nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200); 42 follows the k < 100 branch]

21 B+ tree: Leaf Overflow (1/2)
Insert 9
[Figure: root 100; internal nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35, 42), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200); 9 follows the k < 100 branch]

22 B+ tree: Leaf Overflow (2/2)
First $\lceil n/2 \rceil$ entries stay in the existing node, the rest go in a new leaf node
$n = 3 + 1 = 4$
[Figure: root 100; internal nodes (9, 30) and (120, 150, 180); leaves (3, 5), (9, 11), (30, 35, 42), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]
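A minimal Python sketch of the leaf split rule, assuming a leaf is just a sorted list of keys with capacity 3 as in the example. The function name and the returned triple are illustrative choices; copying the first key of the new leaf up to the parent is inferred from the internal node (9, 30) in the example.

```python
from bisect import insort
from math import ceil


def insert_into_leaf(leaf_keys, key, capacity):
    """Insert key into a sorted leaf.

    Returns (left, right, separator). Without overflow, right and separator
    are None. On overflow the first ceil(n/2) keys stay in the existing leaf,
    the rest move to a new leaf, and the new leaf's first key is the
    separator to add to the parent.
    """
    keys = list(leaf_keys)
    insort(keys, key)
    if len(keys) <= capacity:
        return keys, None, None
    half = ceil(len(keys) / 2)
    left, right = keys[:half], keys[half:]
    return left, right, right[0]


print(insert_into_leaf([30, 35], 42, capacity=3))    # ([30, 35, 42], None, None)
print(insert_into_leaf([3, 5, 11], 9, capacity=3))   # ([3, 5], [9, 11], 9)
```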

23 B+ tree: Internal Node Overflow (1/3)
Insert 210, insert 205
[Figure: root 100; internal nodes (9, 30) and (120, 150, 180); leaves (3, 5), (9, 11), (30, 35, 42), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200, 210)]

24 B+ tree: Internal Node Overflow (2/3)
Leaf split
[Figure: root 100; internal nodes (9, 30) and (120, 150, 180); leaves (3, 5), (9, 11), (30, 35, 42), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), (205, 210)]

25 B+ tree: Internal Node Overflow (3/3)
[Figure: root (100, 150); internal nodes (9, 30), (120), (180, 205); leaves (3, 5), (9, 11), (30, 35, 42), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), (205, 210)]

26 B+ tree: New Root (1/2)
Insert 210, insert 205
[Figure: root (120, 150, 180); leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), (205, 210)]

27 B+ tree: New Root (2/2)
[Figure: new root (150); internal nodes (120) and (180, 205); leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), (205, 210)]

28 Index Insert Exercise
Insert 8, 7, 41
[Figure: internal node (9, 30); leaves (3, 5), (9, 11), (35, 42)]

29 B+ tree: Delete
Simple delete case
Underflow case:
  – redistribute records
  – coalesce with siblings
  – update parents

30 B+ tree: Simple Delete (1/2)
Delete 110
[Figure: root (150); internal nodes (120) and (180, 205); leaves (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), (205, 210, 215)]

31 B+ tree: Simple Delete (2/2)
Leaf updated
[Figure: root (150); internal nodes (120) and (180, 205); leaves (100, 101), (120, 130), (150, 156, 179), (180, 200), (205, 210, 215)]

32 B+ tree: Delete Redistribution (1/2)
Delete 180
[Figure: root (150); internal nodes (120) and (180, 205); leaves (100, 101), (120, 130), (150, 156, 179), (180, 200), (205, 210, 215)]

33 B+ tree: Delete Redistribution (2/2)
Redistribute entries
  – left or right sibling
[Figure: root (150); internal nodes (120) and (179, 205); leaves (100, 101), (120, 130), (150, 156), (179, 200), (205, 210)]
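A minimal Python sketch of deletion with underflow handling at the leaf level. It assumes leaves are sorted key lists, that only the left sibling is considered, and that a leaf underflows below `min_keys` entries (2 for the capacity-3 leaves in the example). The function name and the status strings are illustrative; updating the parent's separator is left to the caller.

```python
def delete_from_leaf(left_sibling, leaf, key, min_keys):
    """Delete key from leaf and repair an underflow using the left sibling.

    Returns (status, new_left, new_leaf) where status is one of:
      "ok"            - no underflow, nothing else changes
      "redistributed" - the largest key of the left sibling moved into the
                        leaf; the parent separator becomes new_leaf[0]
      "coalesced"     - both nodes were merged into new_left; new_leaf is
                        empty and its separator must leave the parent
    """
    left = list(left_sibling)
    remaining = [k for k in leaf if k != key]
    if len(remaining) >= min_keys:
        return "ok", left, remaining
    if len(left) > min_keys:
        # The sibling can spare an entry: borrow its largest key.
        remaining.insert(0, left.pop())
        return "redistributed", left, remaining
    # The sibling is at the minimum too: merge the two leaves.
    return "coalesced", left + remaining, []


# Delete 180 as in the example: 179 moves over from the left sibling.
print(delete_from_leaf([150, 156, 179], [180, 200], 180, min_keys=2))
# ('redistributed', [150, 156], [179, 200])
```

In the redistributed case the new first key of the leaf, 179, replaces 180 in the parent, which matches the internal node (179, 205) in the example.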

34 B+ tree: Delete Coalesce (1/4)
Delete 101
[Figure: root (150); internal nodes (120) and (179, 205); leaves (100, 101), (120, 130), (150, 156), (179, 200), (205, 210, 215)]

35 B+ tree: Delete Coalesce (2/4)
Leaf updated
No redistribution
  – sibling coalesce
[Figure: root (150); internal nodes (120) and (179, 205); leaves (100), (120, 130), (150, 156), (179, 200), (205, 210, 215)]

36 B+ tree: Delete Coalesce (3/4)
Leaf updated
No redistribution
  – sibling coalesce
[Figure: root (150); internal node (179, 205), with key 120 no longer present in the other internal node; leaves (100, 120, 130), (150, 156), (179, 200), (205, 210, 215)]

37 B+ tree: Delete Coalesce (4/4)
Redistribution
[Figure: root (179); internal nodes (150) and (205); leaves (100, 120, 130), (150, 156), (179, 200), (205, 210, 215)]

38 Hashing Techniques

39 Static Hashing (1/2)
Store records in buckets with overflow chains
Allocate a fixed number of buckets M
Problems:
  – small M: long overflow chains, slow search, delete, and insert
[Figure: hash function h mapping keys to buckets, each with a null-terminated overflow chain]

40 Static Hashing (2/2)
Problems:
  – large M: wasted space, slow scan
[Figure: hash function h mapping keys to many buckets, most of them empty (null)]
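The trade-off in choosing M can be seen in a small Python sketch of static hashing with overflow chains. The class, the page capacity of two keys, integer keys, and the hash function `key % M` are all assumptions made for illustration.

```python
class StaticHashTable:
    """Fixed number of buckets; each bucket is a chain of fixed-size pages."""

    def __init__(self, num_buckets, page_capacity=2):
        self.num_buckets = num_buckets
        self.page_capacity = page_capacity
        # Every bucket starts with one empty primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[key % self.num_buckets]

    def insert(self, key):
        chain = self._chain(key)
        if len(chain[-1]) == self.page_capacity:
            chain.append([])          # allocate an overflow page
        chain[-1].append(key)

    def search(self, key):
        """Return (found, number of pages read along the chain)."""
        chain = self._chain(key)
        for pages_read, page in enumerate(chain, start=1):
            if key in page:
                return True, pages_read
        return False, len(chain)

    def empty_buckets(self):
        return sum(1 for chain in self.buckets if not chain[0])


small, large = StaticHashTable(2), StaticHashTable(8)
for k in (0, 2, 4, 6, 8):
    small.insert(k)
    large.insert(k)

print(small.search(8))          # (True, 3): long overflow chain
print(large.search(8))          # (True, 1)
print(large.empty_buckets())    # 4: allocated but unused buckets
```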

41 Dynamic Hashing
Splitting and coalescing buckets as the database grows and shrinks
One scheme: Extendible Hashing
Hash function generates large values, e.g. 32 bits
  – use i bits, change i as database size changes
If overflow, double the number of buckets
  – use i+1 bits of the hash function
  – but expensive: read all M pages and distribute records over 2*M pages
Solution: use a directory and double the size of the directory
  – only split the bucket that overflowed

42 Extendible Hashing (1/4)
$h(18) = 10010_2$
[Figure: directory of global depth 2 with entries 00, 01, 10, 11 pointing to buckets A, B, C, D, each labeled with its local depth; one bucket holds 16 and 20, and 18 is being inserted]

43 Extendible Hashing (2/4)
$h(4) = 00100_2$
[Figure: directory of global depth 2 with entries 00, 01, 10, 11 pointing to buckets A, B, C, D, each labeled with its local depth; one bucket holds 16 and 20, another holds 18]

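44 Extendible Hashing (3/4)
[Figure: directory of global depth 2 with entries 00, 01, 10, 11; bucket A is split into A (holding 16, local depth 3) and A1 (holding 20 and 4, local depth 3); buckets B, C, D are unchanged, with 18 in its bucket]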

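45 Extendible Hashing (4/4)
If bucket full:
  – split bucket
  – increment local depth (LD)
If GD = LD:
  – increment global depth (GD)
  – double directory
[Figure: directory of global depth 3 with entries 000, 001, 010, 011, 100, 101, 110, 111; bucket A holds 16 (local depth 3), bucket A1 holds 20 and 4 (local depth 3); buckets B, C, D keep their local depths, with 18 in its bucket]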
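The split and doubling rules can be sketched in Python as follows. This is a simplified model under stated assumptions: integer keys, the identity hash with the low-order bits used as the directory index (consistent with $h(18) = 10010_2$ in the example), a bucket capacity of 2, distinct keys, and an initial global depth of 1 rather than the depth 2 of the example. The class and method names are illustrative.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Low-order global_depth bits of the (identity) hash value.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        # Assumes distinct keys; duplicates could make splitting loop forever.
        bucket = self.directory[self._index(key)]
        while len(bucket.keys) >= self.capacity:
            self._split(bucket)
            bucket = self.directory[self._index(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory: entry i + 2^GD starts as a copy of entry i.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)        # the split image
        bit = 1 << (bucket.local_depth - 1)       # newly significant bit
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (image if k & bit else bucket).keys.append(k)
        # Redirect the directory entries whose new bit is set.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image


table = ExtendibleHash(capacity=2)
for k in (16, 20, 18, 4):
    table.insert(k)

print(table.global_depth)              # 3
print(table.directory[0b000].keys)     # [16]
print(table.directory[0b100].keys)     # [20, 4]
```

After inserting 4, the bucket holding 16 and 20 splits on the third bit, which reproduces the A and A1 contents of the example: 16 stays under 000 and 20 joins 4 under 100.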

46 Extendible Hashing: Delete
If a deletion makes a bucket empty
  – merge with its split image
If each directory pointer points to the same bucket as its split image
  – the directory is halved

47 Extendible Hashing: Summary
Avoids overflow pages
Directory can get large
Key search requires just 2 page reads
Space utilization fluctuates
  – 59-90% for uniformly distributed records

48 Extendible Hashing: Exercise
Initially GD = LD = 1
M = 2 buckets
Hash function: $h(k) = k \bmod 2^i$
Inserts: 14, 18, 22, 3, 9
Deletes: 9, 22, 3
[Figure: initial state with global depth 1; one bucket holds 12 and 8 (local depth 1), the other holds 5 (local depth 1)]

