Chapter 12: Indexing and Hashing (presentation transcript)


1 Chapter 12: Indexing and Hashing
  Indexing
    Basic Concepts
    Ordered Indices
    B+-Tree Index Files
  Hashing
    Static
    Dynamic Hashing
  More: bitmap indexing

2 Hashing
  Static hashing
  Dynamic hashing

3 Hashing
  key -> h(key) -> buckets (typically 1 disk block each)

4 Example hash function
  Key = 'x1 x2 ... xn', an n-byte character string
  Have b buckets
  h: add x1 + x2 + ... + xn, then compute the sum modulo b
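The byte-sum function on this slide can be sketched in a few lines of Python. This is only an illustration: the name `h`, the UTF-8 encoding, and the sample keys are assumptions, not part of the slides.

```python
def h(key: str, b: int) -> int:
    """Add the byte values x1 + x2 + ... + xn of the key, then take the sum modulo b."""
    return sum(key.encode("utf-8")) % b


# Keys built from the same bytes in a different order always collide,
# because addition ignores order.
bucket_abc = h("abc", 4)   # (97 + 98 + 99) % 4
bucket_cba = h("cba", 4)   # same sum, same bucket
print(bucket_abc, bucket_cba)
```

Because the sum ignores byte order, anagrams always land in the same bucket, which is one way a simple function like this can spread keys unevenly.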

5 This may not be the best function ...
  Good hash function: the expected number of keys per bucket is the same for all buckets

6 Within a bucket:
  Do we keep keys sorted?
  Yes, if CPU time is critical and inserts/deletes are not too frequent

7 Next: example to illustrate inserts, overflows, deletes (figure label: h(K))

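8 EXAMPLE: 2 records/bucket
  INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
  Buckets after these inserts:
    0: d
    1: a, c
    2: b
    3: (empty)
  Then h(e) = 1: bucket 1 is already full, so e goes to an overflow block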
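The insert example above can be replayed with a small in-memory model. This is a hedged sketch, not the slides' implementation: the class `StaticHashFile`, its method names, and the choice to compact a chain on delete are all assumptions, and the bucket numbers are passed in directly because the slide gives h(a) through h(e) as fixed values.

```python
class StaticHashFile:
    """Fixed number of buckets; each bucket is a chain of blocks (primary + overflow)."""

    def __init__(self, num_buckets=4, capacity=2):
        self.capacity = capacity                      # records per block
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key, bucket_no):
        chain = self.buckets[bucket_no]
        for block in chain:
            if len(block) < self.capacity:            # room in an existing block
                block.append(key)
                return
        chain.append([key])                           # all full: add an overflow block

    def delete(self, key, bucket_no):
        chain = self.buckets[bucket_no]
        keys = [k for block in chain for k in block]
        if key not in keys:
            return False
        keys.remove(key)
        cap = self.capacity
        # Repack the chain so later keys "move up" and empty overflow blocks vanish.
        chain[:] = [keys[j:j + cap] for j in range(0, len(keys), cap)] or [[]]
        return True


slide_hash = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}   # values from the slide
f = StaticHashFile()
for key in "abcde":
    f.insert(key, slide_hash[key])
print(f.buckets)
```

Repacking on every delete is only one possible policy; a real file might leave the gap and avoid the extra block writes.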

9 EXAMPLE: deletion
  (Figure: buckets 0 to 3 holding keys a through g; the exact layout is not recoverable from the transcript.)
  Delete: e, f
  After a deletion, maybe move “g” up

10 Rule of thumb:
  Try to keep space utilization between 50% and 80%
  Utilization = (# keys used) / (total # keys that fit)
  If < 50%, wasting space
  If > 80%, overflows are significant; how significant depends on how good the hash function is and on # keys/bucket
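As a quick worked instance of the utilization formula, the sketch below plugs in hypothetical numbers (5 keys stored, 4 buckets, 2 keys per bucket). The function names and the wording of the returned messages are assumptions made for illustration.

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Utilization = (# keys used) / (total # keys that fit)."""
    return keys_used / (num_buckets * keys_per_bucket)


def rule_of_thumb(u: float) -> str:
    """Classify a utilization value against the 50%-80% target range."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows likely significant"
    return "within the 50%-80% target"


u = utilization(keys_used=5, num_buckets=4, keys_per_bucket=2)
print(u, rule_of_thumb(u))
```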

11 How do we cope with growth?
  Overflows and reorganizations
  Dynamic hashing
    Extensible
    Linear

12 Extensible hashing: two ideas
  (a) Use i of the b bits output by the hash function
      h(K) -> a b-bit value, e.g. 00110101
      use the first i bits; i grows over time

13 (b) Use directory
  h(K)[i] -> directory entry -> bucket

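14 Example: h(k) is 4 bits; 2 keys/bucket
  Start: i = 1
    0 -> bucket (uses 1 bit): 0001
    1 -> bucket (uses 1 bit): 1001, 1100
  Insert 1010: the bucket for prefix 1 is full, so it is split on 2 bits and the directory doubles
  New directory: i = 2
    00, 01 -> bucket (uses 1 bit): 0001
    10 -> bucket (uses 2 bits): 1001, 1010
    11 -> bucket (uses 2 bits): 1100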

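15 Example continued
  Before (i = 2):
    00, 01 -> bucket (uses 1 bit): 0001
    10 -> bucket (uses 2 bits): 1001, 1010
    11 -> bucket (uses 2 bits): 1100
  Insert: 0111, 0000
    0111 fits next to 0001; 0000 then overflows that bucket, which is split on 2 bits (the directory stays at i = 2)
  After (i = 2):
    00 -> bucket (uses 2 bits): 0000, 0001
    01 -> bucket (uses 2 bits): 0111
    10 -> bucket (uses 2 bits): 1001, 1010
    11 -> bucket (uses 2 bits): 1100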

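16 Example continued
  Before (i = 2):
    00 -> bucket (uses 2 bits): 0000, 0001
    01 -> bucket (uses 2 bits): 0111
    10 -> bucket (uses 2 bits): 1001, 1010
    11 -> bucket (uses 2 bits): 1100
  Insert: 1001
    the bucket for prefix 10 is full and already uses 2 bits, so the directory doubles and the bucket is split on 3 bits
  After (i = 3), directory entries 000 001 010 011 100 101 110 111:
    100 -> bucket (uses 3 bits): 1001, 1001
    101 -> bucket (uses 3 bits): 1010
    all other buckets unchanged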
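The three insert slides can be replayed with a compact model of extensible hashing. This is a minimal sketch under stated assumptions, not the slides' own code: the class and method names are invented, keys are given directly as their 4-bit hash values, the directory is indexed by the leading i bits, and a bucket that cannot be split any further raises an error instead of chaining an overflow block.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth          # number of leading hash bits this bucket uses
        self.keys = []


class ExtensibleHash:
    def __init__(self, hash_bits=4, capacity=2):
        self.b = hash_bits          # bits produced by the hash function
        self.capacity = capacity    # keys per bucket
        self.i = 1                  # bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, h):
        return h >> (self.b - self.i)       # leading i bits of h

    def insert(self, h):
        while True:
            bucket = self.directory[self._index(h)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            if bucket.depth == self.b:
                raise OverflowError("no hash bits left; an overflow block would be needed")
            if bucket.depth == self.i:
                # Double the directory: old entry j becomes entries 2j and 2j + 1.
                self.directory = [bk for bk in self.directory for _ in range(2)]
                self.i += 1
            self._split(bucket)

    def _split(self, bucket):
        depth = bucket.depth + 1
        zero, one = Bucket(depth), Bucket(depth)
        for k in bucket.keys:               # redistribute on the newly used bit
            bit = (k >> (self.b - depth)) & 1
            (one if bit else zero).keys.append(k)
        for j, entry in enumerate(self.directory):
            if entry is bucket:             # repoint every entry that shared the old bucket
                bit = (j >> (self.i - depth)) & 1
                self.directory[j] = one if bit else zero


table = ExtensibleHash(hash_bits=4, capacity=2)
for h in [0b0001, 0b1001, 0b1100,           # starting contents on slide 14
          0b1010, 0b0111, 0b0000, 0b1001]:  # inserts from slides 14 to 16
    table.insert(h)

for j, bk in enumerate(table.directory):
    print(format(j, "03b"), bk.depth, [format(k, "04b") for k in bk.keys])
```

Note that only the overflowing bucket is split; directory entries that still share a bucket (for example 110 and 111 here) simply point at the same object.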

17 Extensible hashing: deletion
  No merging of blocks
  Merge blocks and cut directory if possible (reverse insert procedure)

18 Deletion example:
  Run through the insert example in reverse!

19 Extensible hashing: summary
  + Can handle growing files
      with less wasted space
      with no full reorganizations
  - Indirection (not bad if the directory is in memory)
  - Directory doubles in size (now it fits, now it does not)

20 Advanced indexing
  Multiple attributes
  Bitmap indexing

21 Multiple-Key Access
  Use multiple indices for certain types of queries.
  Example:
    select account-number
    from account
    where branch-name = “Perryridge” and balance = 1000
  Possible strategies?

22 Indices on Multiple Attributes
  where branch-name = “PP” and balance = 1000
  Suppose we have an index on combined search-key (branch-name, balance).
  (Figure: B+-tree on (branch-name, balance) with keys including AA,2000 AA,2300 AA,2500 AB,200 AC,200 BB,1000 CC,200 DD,200 DD,300 PP,300 PP,800 PP,1000 PP,1300 PP,1500 PP,1560)

23 Indices on Multiple Attributes (continued)
  where branch-name = “PP” and balance < 1000
  Suppose we have an index on combined search-key (branch-name, balance).
  (Figure: the same B+-tree as on the previous slide)
  Search from (PP, 0) up to (PP, 1000)

24 Indices on Multiple Attributes (continued)
  where branch-name < “PP” and balance = 1000?
  Suppose we have an index on combined search-key (branch-name, balance).
  (Figure: the same B+-tree as on the previous slides)
  NO! The matching entries are not contiguous in (branch-name, balance) order, so the index cannot narrow the search to a single range.
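The difference between the three queries on the combined search-key can be seen on a sorted list standing in for the B+-tree leaves. This is a sketch under assumptions: the entries are a sample drawn from keys visible in the slide figure, a plain sorted Python list replaces the tree, and the helper name `range_scan` is invented.

```python
from bisect import bisect_left

# Leaf entries in (branch-name, balance) order; sample data based on the figure.
entries = sorted([
    ("AA", 2000), ("AA", 2300), ("AA", 2500), ("AB", 200), ("AC", 200),
    ("BB", 1000), ("CC", 200), ("DD", 200), ("DD", 300),
    ("PP", 300), ("PP", 800), ("PP", 1000), ("PP", 1300), ("PP", 1500), ("PP", 1560),
])


def range_scan(lo, hi):
    """Return entries e with lo <= e < hi; one contiguous slice of the leaves."""
    return entries[bisect_left(entries, lo):bisect_left(entries, hi)]


# branch-name = "PP" and balance = 1000: a single point lookup.
pos = bisect_left(entries, ("PP", 1000))
exact = entries[pos] == ("PP", 1000)

# branch-name = "PP" and balance < 1000: search (PP, 0) up to (PP, 1000).
pp_below_1000 = range_scan(("PP", 0), ("PP", 1000))

# branch-name < "PP" and balance = 1000: every entry before "PP" must be examined,
# because entries with balance 1000 are scattered among the branches.
before_pp = entries[:bisect_left(entries, ("PP",))]
scattered = [e for e in before_pp if e[1] == 1000]

print(exact, pp_below_1000, len(before_pp), scattered)
```

The last query still gets an answer, but only by filtering all nine entries that precede "PP", which is the behaviour the "NO!" on the slide warns about.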

25 Bitmap Indices
  An index designed for multiple-valued search keys

26 Bitmap Indices (Cont.)
  (Figure labels: unique values of gender; unique values of income-level; each bitmap has size = table size; the income-level value of record 3 is L1)

27 Bitmap Indices (Cont.)
  Some properties of bitmap indices
    Number of bitmaps for each attribute?
    Size of each bitmap?
    When is the bitmap matrix sparse, and what attributes are good for bitmap indices?
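One way to explore the first two questions is to build the bitmaps for a small column and inspect them. The sketch below is illustrative only: the function name `build_bitmaps` and the five-row gender column are assumptions (chosen so the male bitmap comes out as 10010), and each bitmap is stored as a Python integer whose leftmost bit stands for the first record.

```python
def build_bitmaps(column):
    """Return {value: bitmap}; bit p from the left is 1 if record p holds that value."""
    n = len(column)
    bitmaps = {}
    for pos, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << (n - 1 - pos))
    return bitmaps


gender = ["m", "f", "f", "m", "f"]        # hypothetical 5-record table
bitmaps = build_bitmaps(gender)

for value, bits in bitmaps.items():
    print(value, format(bits, f"0{len(gender)}b"))
```

In this sketch there is one bitmap per distinct value and each bitmap has one bit per record, so a column with many distinct values produces many bitmaps that are mostly zeros.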

28 Bitmap Indices (Cont.)
  Bitmap indices are generally very small compared with the relation size
  E.g. if a record is 100 bytes (800 bits), a single bitmap needs 1 bit per record, i.e. 1/800 of the space used by the relation
  If the number of distinct attribute values is 8, the 8 bitmaps together take 8/800 = 1% of the relation size
  What about insertion?
  Deletion?

29 Bitmap Indices Queries
  Sample query: males with income level L1
    10010 AND 10100 = 10000
  What about the number of males with income level L1? Even faster!

30 Bitmap Indices Queries
  Queries are answered using bitmap operations
    Intersection (and)
    Union (or)
    Complementation (not)
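The three bitmap operations map directly onto integer bit operations. The sketch below reuses the two bitmaps from the sample query (10010 for male, 10100 for income level L1); the variable names and the five-record table size are assumptions for illustration.

```python
N_RECORDS = 5
MASK = (1 << N_RECORDS) - 1        # keeps complementation within the table size

male = 0b10010
income_l1 = 0b10100

males_with_l1 = male & income_l1           # intersection (and)
male_or_l1 = male | income_l1              # union (or)
not_male = ~male & MASK                    # complementation (not), masked to 5 bits

# Counting needs only the bitmap, not the records themselves.
count_males_with_l1 = bin(males_with_l1).count("1")

print(format(males_with_l1, "05b"), format(male_or_l1, "05b"),
      format(not_male, "05b"), count_males_with_l1)
```

The mask matters because Python integers are unbounded: without it, `~male` would be a negative number rather than a 5-bit bitmap.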

