COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.


1 COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV

2 Topics: How to choose a hash function? Closed hashing. Linear probing. Quadratic probing. Double hashing.

3 Hash Functions: A good hash function is easy and fast to compute, has a minimal number of clashes (collisions), and spreads data items uniformly throughout the array. Hashing problems reduce to two points: finding a hashing method that minimizes collisions, and resolving collisions when they do happen.

4 Hashing Methods: Integer Type. It is sufficient for a hash function to operate on integers. Any arbitrary integer can be converted into an integer within a certain range, and the index of the hash table lies within a specific range. Solutions: digit selection, folding, modulo arithmetic.

5 Hashing Methods: Digit Selection. Choose a group of digits from the number, using a combination of mod and div operations on the search key. It is one of the most effective hashing methods.

6 Hashing Methods: Digit Selection Example. Assume table size = 1000 and key = 01234567. Choosing the 2nd, 4th, and last digits gives H(key) = 137. For a nine-digit key = d1 d2 d3 d4 d5 d6 d7 d8 d9: choosing the leftmost 3 digits gives H(key) = key div 1000000 = d1 d2 d3; choosing the rightmost 3 digits gives H(key) = key mod 1000 = d7 d8 d9.
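
The digit-selection example can be sketched in a few lines of Python. The function names are hypothetical, chosen only for illustration. The key is passed as a string in the first function so that its leading zero is preserved. For the key 01234567, the 2nd, 4th, and last digits are 1, 3, and 7.

```python
def select_digits(key: str, positions: list[int]) -> int:
    """Build a hash value from the digits at the given 1-based positions."""
    return int("".join(key[p - 1] for p in positions))


def leftmost_three(key: int) -> int:
    """Leftmost three digits of a nine-digit key (key div 1000000)."""
    return key // 1_000_000


def rightmost_three(key: int) -> int:
    """Rightmost three digits of a key (key mod 1000)."""
    return key % 1000


print(select_digits("01234567", [2, 4, 8]))  # 137
print(leftmost_three(123456789))             # 123
print(rightmost_three(123456789))            # 789
```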

7 Hashing Methods: Digit Selection, Mid-square Method (Multiplication), First Variant. The key is squared, then some digits of this square are selected to give the index. Example: k = 54321, k^2 = 2950771041; picking 3 middle digits gives index H(k) = 077.
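
A minimal sketch of the mid-square idea follows. The slide does not say exactly which middle digits to take when the square has an even number of digits; this sketch assumes the window is placed with integer division, which reproduces the slide's 077 for k = 54321. The name `mid_square` is hypothetical.

```python
def mid_square(key: int, num_digits: int = 3) -> int:
    """Square the key and return num_digits digits taken from the middle."""
    squared = str(key * key)
    # Assumed convention: start the window at the rounded-down midpoint.
    start = (len(squared) - num_digits) // 2
    return int(squared[start:start + num_digits])


print(54321 * 54321)      # 2950771041
print(mid_square(54321))  # 77, i.e. index 077
```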

8 Hashing Methods: Folding Method. Digits are added together instead of just being selected. Digits can first be grouped, and the groups are then added. Folding can be done more than once on the search key.

9 Hashing Methods: Folding Method Example. Key = 1234567, H(key) = 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. Disadvantage: all values fall in a small range (at most 7 x 9 = 63 for a seven-digit key). Solution: divide the key into groups, then fold. Key = 1234567, groups: 12 | 345 | 67, fold: 12 + 345 + 67 = 424. Hash again to fit into the table size.
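
Both folding variants can be sketched as follows. The function names and the table size of 101 used for the final "hash again" step are assumptions made for this illustration, not values from the slides.

```python
def digit_sum(key: int) -> int:
    """Fold by adding every digit of the key."""
    return sum(int(d) for d in str(key))


def fold_groups(key: int, group_sizes: list[int]) -> int:
    """Split the key's digits into groups of the given sizes and add the groups."""
    digits = str(key)
    total, pos = 0, 0
    for size in group_sizes:
        total += int(digits[pos:pos + size])
        pos += size
    return total


print(digit_sum(1234567))               # 28
folded = fold_groups(1234567, [2, 3, 2])
print(folded)                           # 424  (12 + 345 + 67)
print(folded % 101)                     # 20, fitted to an assumed table size of 101
```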

10 Hashing Methods: Modulo Arithmetic. Choose a prime table size and reduce the search key modulo the size of the table: h(x) = x mod TableSize. Items will be distributed over the table. Advantages: simple; reduces collisions, since items tend to be evenly distributed if the table size is a prime number.
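
A small experiment suggests why a prime table size helps. The keys below (multiples of 10) and the two table sizes are assumptions picked to make the effect visible. With size 10 every key lands in the same cell, while with the prime size 11 each key gets its own cell.

```python
def mod_hash(key: int, table_size: int) -> int:
    """h(x) = x mod TableSize."""
    return key % table_size


keys = list(range(0, 100, 10))  # 0, 10, 20, ..., 90

cells_size_10 = {mod_hash(k, 10) for k in keys}
cells_size_11 = {mod_hash(k, 11) for k in keys}

print(len(cells_size_10))  # 1: every key collides in cell 0
print(len(cells_size_11))  # 10: every key gets a distinct cell
```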

11 Hashing Methods: What should be done if the search key is a character string? Convert the character string into an integer before applying the hash function. How should we do that? One option is to use the ASCII codes, but combining them in an order-independent way, such as adding them, can lead to duplication (e.g. NOTE and TONE will result in the same hash value). Alternatively, write a numeric value for each character in binary and concatenate the results.

12 Hashing Methods: Example. Key = NOTE. Numeric code for each character (its position in the alphabet, written in 5 bits): N = 14 = (01110), O = 15 = (01111), T = 20 = (10100), E = 5 = (00101). Concatenated binary result: y = (01110 01111 10100 00101). Equivalent decimal: x = 474,757. Apply the hash function h(x) = x mod TableSize.
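
The concatenation scheme can be sketched as below. The sketch assumes the key contains only the letters A to Z, and the function name is hypothetical. It also shows that NOTE and TONE collide when character codes are simply added, but not when the 5-bit codes are concatenated.

```python
def encode_letters(word: str) -> int:
    """Concatenate the 5-bit alphabet positions (A=1 ... Z=26) of each letter."""
    value = 0
    for ch in word.upper():
        position = ord(ch) - ord("A") + 1  # assumes ch is a letter A-Z
        value = (value << 5) | position    # append 5 bits
    return value


print(encode_letters("NOTE"))  # 474757

# Adding character codes ignores letter order, so anagrams collide.
print(sum(map(ord, "NOTE")) == sum(map(ord, "TONE")))    # True
# Concatenation keeps the order, so the two words differ.
print(encode_letters("NOTE") == encode_letters("TONE"))  # False
```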

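13 Closed Hashing (Open Addressing): No secondary data structure; all the data goes inside the table. On collision, try alternate cells until an empty cell is found. How the alternate cells are chosen depends on the probing method. A bigger table is needed, since every item must fit in the table itself.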

14 Linear Probing: Linear search from the position where the collision occurred.

15 Linear Probing: A new record arrives and says "My hash value is [2]." Cell [2] is occupied. This is called a collision, because there is already another valid record at [2]. [Figure: array with cells [0], [1], [2], [3], [4], [5], ..., [100] holding the records Number 506643548, Number 233667136, Number 281942902, Number 155778322, and Number 580625685.]

16 Linear Probing: When a collision occurs, move forward until you find an empty spot.

17 Linear Probing: Cell [5] is empty, so the new record can be inserted there.

18 Linear Probing: The new record (Number 701466868) goes in the empty spot.

19 Linear Probing: Move to the next index in the array until the maximum subscript is reached, then return to the first index (wrap around). In general, try alternate cells: cells h_0(x), h_1(x), h_2(x), ... are tried until a free cell is found, where h_i(x) = (hash(x) + f(i)) mod TSIZE and f(0) = 0. For linear probing, f(i) = i.
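
The probe formula h_i(x) = (hash(x) + i) mod TSIZE can be sketched as an insert routine. This is a minimal illustration that assumes integer keys, hash(x) = x mod TSIZE, and `None` as the marker for an empty cell; the table size of 7 is also an assumption.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using linear probing; return the cell index it was stored in."""
    size = len(table)
    home = key % size                # assumed hash function
    for i in range(size):            # f(i) = i
        index = (home + i) % size    # wraps around past the last cell
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("table is full")


table = [None] * 7
placed = [linear_probe_insert(table, k) for k in (9, 16, 13, 20)]
print(placed)  # [2, 3, 6, 0]
print(table)   # [20, None, 9, 16, None, None, 13]
```

Here 16 collides with 9 at cell 2 and moves to cell 3, while 20 collides with 13 at cell 6 and wraps around to cell 0.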

20 Searching for a Key: The data that's attached to a key can be found fairly quickly. [Figure: the same array, cells [0] to [100], now also holding Number 701466868.]

21 Searching for a Key: Calculate the hash value ("My hash value is [2]"). Check that location of the array for the key. The record there answers "Not me."

22 Searching for a Key: Keep moving forward until you find the key, or you reach an empty spot. The next record also answers "Not me."

23 Searching for a Key: Keep moving forward. The following record answers "Not me" as well.

24 Searching for a Key: The next record answers "Yes!" and the key has been found.

25 Searching for a Key: When the item is found, the information can be copied to the necessary location.
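
The search walk-through can be sketched as follows, under the same assumptions as an integer-keyed linear probing table: hash(x) = x mod TSIZE and `None` for an empty cell. The sample table contents are made up for the illustration.

```python
def linear_probe_search(table: list, key: int) -> int:
    """Return the index of key, or -1 if an empty cell is reached first."""
    size = len(table)
    home = key % size                # assumed hash function
    for i in range(size):
        index = (home + i) % size
        if table[index] is None:     # empty spot: the key cannot be further on
            return -1
        if table[index] == key:      # "Yes!"
            return index
        # otherwise "Not me": keep moving forward
    return -1


table = [20, None, 9, 16, None, None, 13]
print(linear_probe_search(table, 16))  # 3: home cell 2 holds 9, so move to 3
print(linear_probe_search(table, 20))  # 0: home cell 6 holds 13, wrap to 0
print(linear_probe_search(table, 23))  # -1: cells 2 and 3 mismatch, cell 4 is empty
```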

26 Deleting a Record: Records may also be deleted from a hash table. [Figure: the same array; the record Number 506643548 says "Please delete me."]

27 Deleting a Record: The location must not be left as an ordinary "empty spot", since that could interfere with searches.

28 Deleting a Record: The location must be marked in some special way so that a search can tell that the spot used to have something in it.
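
One common way to mark a deleted location is a special sentinel value, often called a tombstone. The sketch below is one possible arrangement, not necessarily the one the course uses. The class name, the `EMPTY` and `DELETED` sentinels, and the hash function key mod size are all assumptions, and `insert` does not check for duplicate keys.

```python
EMPTY = object()    # cell has never held a record
DELETED = object()  # cell used to hold a record (tombstone)


class LinearProbeTable:
    def __init__(self, size: int) -> None:
        self.slots = [EMPTY] * size

    def _probe(self, key: int):
        """Yield the linear probe sequence for key."""
        size = len(self.slots)
        home = key % size
        for i in range(size):
            yield (home + i) % size

    def insert(self, key: int) -> int:
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key  # tombstones may be reused
                return index
        raise RuntimeError("table is full")

    def find(self, key: int) -> int:
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:            # a truly empty cell ends the search
                return -1
            if slot is not DELETED and slot == key:
                return index
        return -1

    def delete(self, key: int) -> bool:
        index = self.find(key)
        if index == -1:
            return False
        self.slots[index] = DELETED      # mark the cell, do not empty it
        return True


t = LinearProbeTable(7)
for k in (9, 16, 23):       # all three keys hash to cell 2
    t.insert(k)
t.delete(16)                # leaves a tombstone in cell 3
print(t.find(23))           # 4: the search passes over the tombstone
```

Had cell 3 been reset to an ordinary empty cell, the search for 23 would have stopped there and wrongly reported that the key is absent.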

29 Linear Probing: Advantage: uses less memory than chaining, since the links don't have to be stored. Disadvantages: can be slower than chaining, since a search may have to walk along the table for a long way; it is difficult to delete a key and its associated record, because deletion has an impact on the search process; clustering, specifically primary clustering, where the table contains groups of consecutively occupied locations.

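30 Quadratic Probing: Linear probing uses f(i) = i; quadratic probing uses f(i) = i^2. Example: insert 10, 40, 60, 20, 30, 70, 80 into a table with cells 0 to 9. The slide's arithmetic (6^2 mod 10 = 6) implies hash(x) = x mod 10, so every key has home cell 0.

Key   Cells probed, (0 + i^2) mod 10              Placed in cell
10    0                                           0
40    0, 1                                        1
60    0, 1, 4                                     4
20    0, 1, 4, 9                                  9
30    0, 1, 4, 9, 6                               6
70    0, 1, 4, 9, 6, 5                            5
80    0, 1, 4, 9, 6, 5, 6 (6^2 mod 10 = 6), ...   none

After 70 is inserted the table holds 10 in cell 0, 40 in cell 1, 60 in cell 4, 70 in cell 5, 30 in cell 6, and 20 in cell 9. For 80, i^2 mod 10 only ever yields 0, 1, 4, 5, 6, or 9, all of which are occupied, so no free cell is found even though cells 2, 3, 7, and 8 are still empty.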
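
The insertion sequence can be replayed with a short sketch. It assumes a table of 10 cells and hash(x) = x mod 10, which is what the slide's arithmetic (6^2 mod 10 = 6) suggests; `None` marks an empty cell and the function name is hypothetical.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key with f(i) = i*i; return the cell used, or -1 if none is found."""
    size = len(table)
    home = key % size  # assumed hash function
    # (i + size)^2 mod size equals i^2 mod size, so size probes cover every
    # cell this sequence can ever reach.
    for i in range(size):
        index = (home + i * i) % size
        if table[index] is None:
            table[index] = key
            return index
    return -1


table = [None] * 10
placed = [quadratic_probe_insert(table, k) for k in (10, 40, 60, 20, 30, 70, 80)]
print(placed)  # [0, 1, 4, 9, 6, 5, -1]
print(table)   # [10, 40, None, None, 60, 70, 30, None, None, 20]
```

The last key, 80, is not placed even though four cells are empty, because the quadratic sequence from cell 0 only reaches cells 0, 1, 4, 5, 6, and 9.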

31 Quadratic Probing: Advantages: easy to compute; avoids primary clustering. Disadvantages: not all entries are searched, so a probe might not encounter a free storage location even when there are locations that are still free; elements that have the same hash value will probe the same set of alternate cells (secondary clustering). Secondary clustering is not a big problem in practice; use a good hash function.

32 Double Hashing: Use two hash functions. The first, as before, generates the 'home' position. The second generates a sequence of offsets from the home position that define the probe sequence: probe = (probe + offset) mod N. If the size of the table N is prime and the offset is not a multiple of N, this method will eventually examine every position in the table.
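
The probe rule probe = (probe + offset) mod N can be sketched as below. The slides do not give the second hash function; `1 + key % (size - 1)` is an assumption, picked because it never yields an offset of 0. The table size of 7 is likewise assumed.

```python
def double_hash_probe_sequence(key: int, size: int) -> list[int]:
    """Return the first `size` cells probed for key under double hashing."""
    home = key % size                # first hash: home position
    offset = 1 + key % (size - 1)    # assumed second hash: step, never 0
    probe = home
    sequence = []
    for _ in range(size):
        sequence.append(probe)
        probe = (probe + offset) % size
    return sequence


seq = double_hash_probe_sequence(10, 7)
print(seq)                            # [3, 1, 6, 4, 2, 0, 5]
print(sorted(seq) == list(range(7)))  # True: every cell is visited once
```

Because 7 is prime and the offset lies between 1 and 6, the sequence visits all seven cells before repeating.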

33 Problems with Closed Hashing: When the table is too full, running time becomes too long and inserts could fail. The table size must be chosen in advance, but the number of elements is often not known. Solution: rehashing. Build a new table that is about twice as big and hash the elements into the new table. The new hash function must be applied to every item in the old hash table.
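
Rehashing can be sketched as follows. The sketch assumes integer keys, linear probing, `None` for empty cells, and a new size chosen as the first prime at or above twice the old size; the helper names are hypothetical.

```python
import math


def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    def is_prime(m: int) -> bool:
        if m < 2:
            return False
        return all(m % d for d in range(2, math.isqrt(m) + 1))

    while not is_prime(n):
        n += 1
    return n


def rehash(old_table: list) -> list:
    """Build a table about twice as big and re-insert every stored key."""
    new_size = next_prime(2 * len(old_table))
    new_table = [None] * new_size
    for key in old_table:
        if key is None:
            continue
        index = key % new_size            # new hash function: new modulus
        while new_table[index] is not None:
            index = (index + 1) % new_size
        new_table[index] = key
    return new_table


old = [20, None, 9, 16, None, None, 13]   # size 7
new = rehash(old)
print(len(new))                           # 17
print([new.index(k) for k in (20, 9, 16, 13)])  # [3, 9, 16, 13]
```

Every key has to be hashed again because its cell depends on the table size: 20 lived in cell 0 of the old table (after wrapping around) but belongs in cell 3 of the new one.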

34 Summary: Hash tables are specialized for dictionary operations: insert, delete, search. Principle: turn the key field of the record into a number, which we use as an index for locating the item in an array; O(1) in the ideal case. Problems: finding a good hash function, collisions, wasted space, no support for ordering queries. Implementations: open hashing, closed hashing, dynamic hashing.

35 Review: What is a perfect hash function? What is a collision? What is meant by clustering? How does clustering affect the overall efficiency of hashing? What is a bucket? What is the time complexity for insertion, deletion, and search in hashing?

