Hash Tables CSIT 402 Data Structures II. Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions.


1 Hash Tables CSIT 402 Data Structures II

2 Hashing
Goal: perform inserts, deletes, and finds in constant average time
Topics:
–Hash table, hash function, collisions
–Collision handling: separate chaining; open addressing (linear probing, quadratic probing, double hashing)
–Rehashing
–Load factor

3 Goal
Develop a structure that will allow the user to insert/delete/find records in constant average time
–structure will be a table (relatively small)
–table completely contained in memory
–implemented by an array
–capitalizes on the ability to access any element of the array in constant time

4 Hash Table Applications
Symbol table in compilers
Accessing tree or graph nodes by name
–E.g., city names in Google Maps
Maintaining a transposition table in games
–Remember previous game situations and the move taken (avoid recomputation)
Dictionary lookups
–Spelling checkers
–Natural language understanding (word sense)
Heavily used in text processing languages
–E.g., Perl, Python, etc.

5 Hash Function
Assume the table (array) size is N.
Function f(x) maps any key x to an int between 0 and N−1.
For example, assume that N = 15, that key x is a non-negative int between 0 and MAX_INT, and that the hash function is f(x) = x % 15.

6 Hash Function
Thus, since f(x) = x % 15:
x:       25  129   35 2501   47   36
f(x):    10    9    5   11    2    6
Storing the keys in the array is not a problem.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _

7 Hash Function
What happens when you try to insert x = 65? f(65) = 5.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
65(?) maps to slot 5.

8 Hash Function
What happens when you try to insert x = 65? f(65) = 5.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
65(?) maps to slot 5, which already holds 35. This is called a collision.
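The hash function from slides 5 to 8 can be sketched in a few lines of C++. The names kTableSize, hashKey, and hashAll are illustrative assumptions, not part of the deck.

```cpp
#include <cstddef>
#include <vector>

// Table size N = 15 as in the slides; the name kTableSize is an assumption.
constexpr std::size_t kTableSize = 15;

// f(x) = x % N maps a non-negative key to a slot in [0, N-1].
std::size_t hashKey(unsigned x, std::size_t n = kTableSize) {
    return x % n;
}

// Hash every key in order, e.g. {25, 129, 35, 2501, 47, 36}.
std::vector<std::size_t> hashAll(const std::vector<unsigned>& keys) {
    std::vector<std::size_t> slots;
    for (unsigned k : keys) slots.push_back(hashKey(k));
    return slots;
}
```

With these definitions, hashKey(65) and hashKey(35) both return 5, which is the collision shown on slide 8.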

9 Handling Collisions Separate Chaining Open Addressing –Linear Probing –Quadratic Probing –Double Hashing

10 Handling Collisions Separate Chaining

11 Let each array element be the head of a chain.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   65   36    _    _  129   25 2501    _    _    _
Chains: slot 5: 65 -> 35
Where would you store: 29, 16, 14, 99, 127?

12 Separate Chaining
Let each array element be the head of a chain:
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _   16   47    _    _   65   36  127    _   99   25 2501    _    _   14
Chains: slot 5: 65 -> 35; slot 9: 99 -> 129; slot 14: 14 -> 29
Where would you store: 29, 16, 14, 99, 127?
Note that we use the insertAtHead() method when inserting new keys.
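A minimal sketch of separate chaining with insert-at-head follows. It assumes unsigned integer keys, the f(x) = x % N hash from the earlier slides, and a table size greater than zero. The class name ChainedTable and its members are hypothetical.

```cpp
#include <cstddef>
#include <forward_list>
#include <vector>

// Separate chaining sketch: each slot is the head of a singly linked chain.
// Assumes n > 0.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t n) : slots_(n) {}

    // Insert at the head of the chain, as the slide notes.
    void insert(unsigned key) { slots_[key % slots_.size()].push_front(key); }

    // Walk only the chain that the key hashes to.
    bool contains(unsigned key) const {
        for (unsigned k : slots_[key % slots_.size()])
            if (k == key) return true;
        return false;
    }

    // Copy of the chain at a slot, head first (for inspection).
    std::vector<unsigned> chain(std::size_t slot) const {
        return std::vector<unsigned>(slots_[slot].begin(), slots_[slot].end());
    }

private:
    std::vector<std::forward_list<unsigned>> slots_;
};
```

Inserting the deck's keys in order (25, 129, 35, 2501, 47, 36, 65, 29, 16, 14, 99, 127) into a 15-slot table leaves the chain 65 -> 35 in slot 5, matching slide 12.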

13 Handling Collisions Linear Probing

14 Let key x be stored in element f(x) = t of the array.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
65(?) maps to slot 5, which is occupied.
What do you do in case of a collision? If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, (t+3)%N … until you find an empty slot.

15 Linear Probing
Where do you store 65?
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36   65    _  129   25 2501    _    _    _
Attempts: slots 5, 6, 7 (65 stored in slot 7)

16 Linear Probing
If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36   65    _  129   25 2501    _    _   29
Attempt: slot 14 (29 stored)
Where would you store: 29, 16, 14, 99, 127?

17 Linear Probing
If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _   16   47    _    _   35   36   65    _  129   25 2501    _    _   29
Attempt: slot 1 (16 stored)
Where would you store: 16, 14, 99, 127?

18 Linear Probing
If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   14   16   47    _    _   35   36   65    _  129   25 2501    _    _   29
Attempts: slots 14, 0 (14 stored in slot 0)
Where would you store: 14, 99, 127?

19 Linear Probing
If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   14   16   47    _    _   35   36   65    _  129   25 2501   99    _   29
Attempts: slots 9, 10, 11, 12 (99 stored in slot 12)
Where would you store: 99, 127?

20 Linear Probing
If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   14   16   47    _    _   35   36   65  127  129   25 2501   99    _   29
Attempts: slots 7, 8 (slot 7 holds 65, so 127 is stored in slot 8)
Where would you store: 127?
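The linear probing walkthrough can be reproduced with a short sketch. It assumes C++17 (for std::optional), unsigned keys, and the hash f(x) = x % N. The function name insertLinear and the Table alias are assumptions made for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// An empty std::optional marks a free slot.
using Table = std::vector<std::optional<unsigned>>;

// Linear probing sketch: try t, (t+1)%N, (t+2)%N, ... until a free slot.
// Returns the slot used, or std::nullopt when every slot is occupied.
std::optional<std::size_t> insertLinear(Table& table, unsigned key) {
    const std::size_t n = table.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t slot = (key % n + i) % n;
        if (!table[slot]) {
            table[slot] = key;
            return slot;
        }
    }
    return std::nullopt;
}
```

Starting from the six keys of slide 6, inserting 65, 29, 16, 14, 99, and 127 in that order places them in slots 7, 14, 1, 0, 12, and 8.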

21 Linear Probing
Leads to problem of clustering. Elements tend to cluster in dense intervals in the array.

22 Drawbacks of Linear Probing
Works until array is full, but as the number of items N approaches TableSize, access time approaches O(N)
Very prone to cluster formation (as in our example)
–If a key hashes anywhere into a cluster, finding a free cell involves going through the entire cluster, and making it grow!
–Primary clustering: clusters grow when keys hash to values close to each other
Can have cases where table is empty except for a few clusters
–Does not satisfy good hash function criterion of distributing keys uniformly

23 Handling Collisions Quadratic Probing

24 Let key x be stored in element f(x) = t of the array.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
65(?) maps to slot 5, which is occupied.
What do you do in case of a collision? If the hash table is not full, attempt to store the key in array elements (t+1^2)%N, (t+2^2)%N, (t+3^2)%N … until you find an empty slot.

25 Quadratic Probing
Where do you store 65? f(65) = t = 5
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36    _    _  129   25 2501    _    _   65
Attempts: t = 5, t+1 = 6, t+4 = 9, t+9 = 14 (65 stored in slot 14)

26 Quadratic Probing
If the hash table is not full, attempt to store the key in array elements (t+1^2)%N, (t+2^2)%N …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   29    _   47    _    _   35   36    _    _  129   25 2501    _    _   65
Attempts: t = 14, (t+1)%15 = 0 (29 stored in slot 0)
Where would you store: 29, 16, 14, 99, 127?

27 Quadratic Probing
If the hash table is not full, attempt to store the key in array elements (t+1^2)%N, (t+2^2)%N …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   29   16   47    _    _   35   36    _    _  129   25 2501    _    _   65
Attempt: t = 1 (16 stored)
Where would you store: 16, 14, 99, 127?

28 Quadratic Probing
If the hash table is not full, attempt to store the key in array elements (t+1^2)%N, (t+2^2)%N …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   29   16   47   14    _   35   36    _    _  129   25 2501    _    _   65
Attempts: t = 14, (t+1)%15 = 0, (t+4)%15 = 3 (14 stored in slot 3)
Where would you store: 14, 99, 127?

29 Quadratic Probing
If the hash table is not full, attempt to store the key in array elements (t+1^2)%N, (t+2^2)%N …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   29   16   47   14    _   35   36    _    _  129   25 2501    _   99   65
Attempts: t = 9, t+1 = 10, t+4 = 13 (99 stored in slot 13)
Where would you store: 99, 127?

30 Quadratic Probing
If the hash table is not full, attempt to store the key in array elements (t+1^2)%N, (t+2^2)%N …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   29   16   47   14    _   35   36  127    _  129   25 2501    _   99   65
Attempt: t = 7 (127 stored)
Where would you store: 127?
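The same walkthrough for quadratic probing might look like the sketch below, again assuming C++17, unsigned keys, and f(x) = x % N. The name insertQuadratic is hypothetical. The loop gives up after N probes, because a quadratic probe sequence is not guaranteed to visit every slot.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// An empty std::optional marks a free slot.
using Table = std::vector<std::optional<unsigned>>;

// Quadratic probing sketch: try t, (t+1^2)%N, (t+2^2)%N, (t+3^2)%N, ...
// Returns the slot used, or std::nullopt if N probes find no free slot
// (which can happen even when the table is not full).
std::optional<std::size_t> insertQuadratic(Table& table, unsigned key) {
    const std::size_t n = table.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t slot = (key % n + i * i) % n;
        if (!table[slot]) {
            table[slot] = key;
            return slot;
        }
    }
    return std::nullopt;
}
```

Starting from the six keys of slide 6, inserting 65, 29, 16, 14, 99, and 127 in that order places them in slots 14, 0, 1, 3, 13, and 7.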

31 Quadratic Probing
Tends to distribute keys better
Alleviates the problem of primary clustering

32 Handling Collisions Double Hashing

33 Let key x be stored in element f(x) = t of the array.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
65(?) maps to slot 5, which is occupied.
What do you do in case of a collision? Define a second hash function f2(x) = d. Attempt to store the key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N … until you find an empty slot.

34 Double Hashing
Collision Resolution Strategy
–Apply if hash function produces collision
Typical second hash function: f2(x) = R − (x % R), where R is a prime number, R < N

35 Double Hashing
Where do you store 65? f(65) = t = 5. A collision at 5 means apply double hashing.
Let f2(x) = 11 − (x % 11), so f2(65) = d = 1. Note: R = 11, N = 15.
Attempt to store the key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N …
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36   65    _  129   25 2501    _    _    _
Attempts: t = 5, (t+d)%15 = (t+1)%15 = 6, (t+2d)%15 = (t+2)%15 = 7 (65 stored in slot 7)

36 Double Hashing
If the hash table is not full, attempt to store the key in array elements (t+d)%N, (t+2d)%N …
f2(x) = 11 − (x % 11), f2(29) = d = 4. But f(29) = 14 => no collision => no f2 this time.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _    _   47    _    _   35   36   65    _  129   25 2501    _    _   29
Attempt: t = 14 (29 stored)
Where would you store: 29, 16, 14, 99, 127?

37 Double Hashing
If the hash table is not full, attempt to store the key in array elements (t+d)%N, (t+2d)%N …
Let f2(x) = 11 − (x % 11), f2(16) = d = 6. But f(16) = 1 => no collision.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:    _   16   47    _    _   35   36   65    _  129   25 2501    _    _   29
Attempt: t = 1 (16 stored)
Where would you store: 16, 14, 99, 127?

38 Double Hashing
If the hash table is not full, attempt to store the key in array elements (t+d)%N, (t+2d)%N …
Let f2(x) = 11 − (x % 11), f2(14) = d = 8. Initially hashes to 14%15 = 14, so f(14) = 14 => collision.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   14   16   47    _    _   35   36   65    _  129   25 2501    _    _   29
Attempts: t = 14, t+8 = (14+8)%15 = 7, t+16 = (14+16)%15 = 0 (14 stored in slot 0)
Where would you store: 14, 99, 127?

39 Double Hashing
If the hash table is not full, attempt to store the key in array elements (t+d)%N, (t+2d)%N …
Let f2(x) = 11 − (x % 11), f2(99) = d = 11. f(99) = 9 => collision (slot 9 holds 129), so the secondary hash function is applied.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   14   16   47    _    _   35   36   65    _  129   25 2501   99    _   29
Attempts: t = 9, (t+11)%15 = 5, (t+22)%15 = 1, (t+33)%15 = 12 (99 stored in slot 12)
Where would you store: 99, 127?

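40 Double Hashing
If the hash table is not full, attempt to store the key in array elements (t+d)%N, (t+2d)%N …
Let f2(x) = 11 − (x % 11), f2(127) = d = 5, and f(127) = 7 => collision.
Index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
Array:   14   16   47    _    _   35   36   65    _  129   25 2501   99    _   29
Attempts: t = 7, (t+5)%15 = 12, (t+10)%15 = 2, all occupied. Because d = 5 shares a factor with N = 15, the probe sequence then repeats 7, 12, 2, so 127 never reaches an empty slot in this table.
Where would you store: 127?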
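A sketch of double hashing with the deck's two functions, f(x) = x % N and f2(x) = R − (x % R), is given here under the assumptions of C++17 and unsigned keys. The name insertDouble and the default R = 11 parameter are illustrative choices.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// An empty std::optional marks a free slot.
using Table = std::vector<std::optional<unsigned>>;

// Double hashing sketch: probe t, (t+d)%N, (t+2d)%N, ... with
// t = x % N and step d = R - (x % R), so d is always in [1, R].
// Returns the slot used, or std::nullopt if N probes find no free slot.
std::optional<std::size_t> insertDouble(Table& table, unsigned key, std::size_t r = 11) {
    const std::size_t n = table.size();
    const std::size_t d = r - key % r;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t slot = (key % n + i * d) % n;
        if (!table[slot]) {
            table[slot] = key;
            return slot;
        }
    }
    return std::nullopt;
}
```

With N = 15 and the deck's insertion order, 65, 29, 16, 14, and 99 land in slots 7, 14, 1, 0, and 12, while inserting 127 returns std::nullopt: its step d = 5 divides 15, so only slots 7, 12, and 2 are ever probed. A prime table size avoids this kind of short cycle.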

41 Delete Element Collision Ramifications Chaining Linear Probing Quadratic Probing Double Hashing

42 Deletion Issues
Chaining/Buckets: deletion from linked list, no issues
Linear Probing: issue is removal of an element within a cluster

43 [Figure: array with slots [0] through [700] holding records Number 233667136, Number 281942902, Number 155778322, ..., Number 580625685, Number 701466868]
The location where the element was must not be left as an ordinary "empty spot" since that could interfere with searches and deletions (why not insertions?). The location must be marked in some special way so that a search can tell that the spot used to have something in it.

44 Deletion Issues
General Solutions
–Each slot can be marked as empty, deleted, or occupied
–Cascade following elements one slot back, according to the collision handling scheme
–Remove all successor elements and reinsert
–Second table of deleted items. Commonly used in search engines.
–Rebuild hash table
–Move element at end of cluster to fill slot
Some of these are not foolproof!
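The first of these solutions, marking each slot as empty, deleted, or occupied, can be sketched for a linear-probing table as follows. This is one possible arrangement under the assumptions of C++17, unsigned keys, and f(x) = x % N. The names SlotState, Slot, and ProbedTable are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Each slot is empty, occupied, or deleted (a "tombstone").
enum class SlotState { Empty, Occupied, Deleted };

struct Slot {
    SlotState state = SlotState::Empty;
    unsigned key = 0;
};

// Linear-probing set of unsigned keys with tombstone deletion.
class ProbedTable {
public:
    explicit ProbedTable(std::size_t n) : slots_(n) {}

    bool contains(unsigned key) const { return find(key).has_value(); }

    // Returns false if the key is already present or no slot is free.
    bool insert(unsigned key) {
        if (contains(key)) return false;
        const std::size_t n = slots_.size();
        for (std::size_t i = 0; i < n; ++i) {
            Slot& s = slots_[(key % n + i) % n];
            if (s.state != SlotState::Occupied) {  // Empty or Deleted: reuse it
                s.state = SlotState::Occupied;
                s.key = key;
                return true;
            }
        }
        return false;
    }

    // Marks the slot Deleted instead of Empty so later searches keep probing.
    bool erase(unsigned key) {
        const auto slot = find(key);
        if (!slot) return false;
        slots_[*slot].state = SlotState::Deleted;
        return true;
    }

private:
    // A search stops at the first Empty slot but walks past Deleted ones.
    std::optional<std::size_t> find(unsigned key) const {
        const std::size_t n = slots_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t idx = (key % n + i) % n;
            const Slot& s = slots_[idx];
            if (s.state == SlotState::Empty) return std::nullopt;
            if (s.state == SlotState::Occupied && s.key == key) return idx;
        }
        return std::nullopt;
    }

    std::vector<Slot> slots_;
};
```

For example, after inserting 35, 36, and 65 into a 15-slot table (65 probes slots 5 and 6 before landing in 7), erasing 36 leaves a tombstone in slot 6, so a later search for 65 still continues past it. Insertions may reuse the tombstone, which is why a deleted mark does not interfere with them.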

45 Searching For a Key
Elephant in the room: if there's a collision, how do you know when you found your item, since all mapped to the same slot?
The key is assumed to be a unique value, so the table stores each key and a search compares the stored key against the key being sought.
Keep in mind:
–The hash function maps sparse data into an array or similar
–It is applied to the key

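46 Hash Tables in C++ STL
STL has hash table containers
–hash_set
–hash_map
These are SGI STL extensions rather than part of the C++ standard; since C++11 the standard equivalents are std::unordered_set and std::unordered_map.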

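47 Hash Set in STL
#include <hash_set>   // SGI STL extension header, not part of standard C++
#include <cstring>
#include <iostream>
using namespace std;

// Key equality test: compare C strings by content, not by pointer
struct eqstr {
  bool operator()(const char* s1, const char* s2) const { return strcmp(s1, s2) == 0; }
};

// Template arguments: Key, Hash fn, Key equality test
void lookup(const hash_set<const char*, hash<const char*>, eqstr>& Set, const char* word) {
  hash_set<const char*, hash<const char*>, eqstr>::const_iterator it = Set.find(word);
  cout << word << ": " << (it != Set.end() ? "present" : "not present") << endl;
}

int main() {
  hash_set<const char*, hash<const char*>, eqstr> Set;
  Set.insert("kiwi");
  lookup(Set, "kiwi");
}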

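48 Hash Map in STL
#include <hash_map>   // SGI STL extension header, not part of standard C++
#include <cstring>
#include <iostream>
using namespace std;

// Key equality test: compare C strings by content, not by pointer
struct eqstr {
  bool operator()(const char* s1, const char* s2) const { return strcmp(s1, s2) == 0; }
};

int main() {
  // Template arguments: Key, Data, Hash fn, Key equality test
  hash_map<const char*, int, hash<const char*>, eqstr> months;
  // operator[] is internally treated like insert (or overwrite if key already present)
  months["january"] = 31;
  months["february"] = 28;
  // …
  months["december"] = 31;
  cout << "january -> " << months["january"] << endl;
}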
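On a current compiler, the same two examples can be approximated with the standard C++11 containers. The sketch below swaps in std::unordered_set and std::unordered_map with std::string keys, so no custom hash or equality functor is needed. The helper names isPresent and makeMonths are assumptions for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Membership test, mirroring lookup() from the hash_set slide.
bool isPresent(const std::unordered_set<std::string>& words, const std::string& word) {
    return words.find(word) != words.end();
}

// Builds a small month -> days map, mirroring the hash_map slide.
std::unordered_map<std::string, int> makeMonths() {
    std::unordered_map<std::string, int> months;
    months["january"] = 31;   // operator[] inserts, or overwrites an existing key
    months["february"] = 28;
    months["december"] = 31;
    return months;
}
```

Using std::string keys lets the library's default std::hash and operator== compare string contents, which is the job eqstr does for const char* keys in the slides.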

49 Simple Hashes (12/26/03, Hashing - Lecture 10)
It's possible to have very simple hash functions if you are certain of your keys
For example,
–suppose we know that the keys s will be real numbers uniformly distributed over 0 ≤ s < 1
–then a very fast, very good hash function is hash(s) = floor(s·m), where m is the size of the table
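As a small illustration of hash(s) = floor(s·m), the following sketch assumes the caller guarantees 0 ≤ s < 1. The function name hashUnit is hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// hash(s) = floor(s * m) for a key s in [0, 1) and table size m.
// The result is a slot in [0, m-1]; behaviour outside [0, 1) is not handled.
std::size_t hashUnit(double s, std::size_t m) {
    return static_cast<std::size_t>(std::floor(s * static_cast<double>(m)));
}
```

With m = 10, the keys 0.0, 0.5, and 0.999 map to slots 0, 5, and 9.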

50 Nonnumerical Keys
Many hash functions assume that the universe of keys is the natural numbers N = {0, 1, …}
Need to find a function to convert the actual key to a natural number quickly and effectively before or during the hash calculation
Generally work with the ASCII character codes when converting strings to numbers
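One common way to turn a string into a natural number from its character codes is a polynomial accumulation. The sketch below is only one such choice: the multiplier 31 and the name hashString are assumptions, not something the slides specify.

```cpp
#include <cstddef>
#include <string>

// Converts a string to a number from its character codes, then reduces it
// to a slot index. Assumes tableSize > 0.
std::size_t hashString(const std::string& s, std::size_t tableSize) {
    std::size_t h = 0;
    for (unsigned char c : s) {
        h = h * 31 + c;  // unsigned arithmetic wraps around on overflow
    }
    return h % tableSize;
}
```

For a 15-slot table, "a" (code 97) maps to 97 % 15 = 7, and "ab" maps to (97·31 + 98) % 15 = 3105 % 15 = 0.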

51 Load Factor of a Hash Table
Let N = number of items to be stored
Load factor λ = N/TableSize
–TableSize = 101 and N = 505, then λ = 5
–TableSize = 101 and N = 10, then λ ≈ 0.1
Average length of chained list = λ, and so average time for accessing an item = O(1) + O(λ)
–Want λ to be smaller than 1 but close to 1 if good hashing function (i.e. TableSize ≈ N)
–With chaining, hashing continues to work for λ > 1

52 Conclusions
Best to choose table size N = prime number
Issue of load factor = fraction of hash table occupied
–should rehash when load factor is between 0.5 and 0.7
Rehashing
–approximately double the size of hash table (select N = nearest prime)
–redefine hash function(s)
–rehash keys into new table
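The rehashing steps can be sketched for a linear-probing table as shown below. The sketch assumes C++17, unsigned keys, and f(x) = x % N, and it picks the first prime at or above twice the old size, which is one reading of "nearest prime". All function names are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// An empty std::optional marks a free slot.
using Table = std::vector<std::optional<unsigned>>;

// Load factor = number of items / table size.
double loadFactor(std::size_t items, std::size_t tableSize) {
    return static_cast<double>(items) / static_cast<double>(tableSize);
}

// Trial-division primality test; adequate for table sizes.
bool isPrime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime greater than or equal to n.
std::size_t nextPrime(std::size_t n) {
    while (!isPrime(n)) ++n;
    return n;
}

// Rehash: allocate a table of prime size at least double the old one, then
// reinsert every key using the hash function for the new size.
Table rehash(const Table& old) {
    Table fresh(nextPrime(2 * old.size()));
    for (const auto& cell : old) {
        if (!cell) continue;
        std::size_t slot = *cell % fresh.size();
        while (fresh[slot]) slot = (slot + 1) % fresh.size();  // linear probing
        fresh[slot] = *cell;
    }
    return fresh;
}
```

For the 15-slot example table, rehash produces a 31-slot table, and keys such as 47 and 35 move to slots 47 % 31 = 16 and 35 % 31 = 4. A caller would typically trigger it when loadFactor reaches the chosen threshold.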

