
1 Chapter 9 Maps and Dictionaries

2 A basic problem
We have to store some records and perform the following:
- add new record
- delete record
- search a record by key
Find a way to do these efficiently!

3 Map (Dictionary)
A Map is a collection of pairs of the form (k, e), where
- k is a key
- e is the element associated with the key
Operations allowed on a map:
- Get the element associated with a particular key
- Insert an element with a specific key
- Delete an element with a specific key
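The three map operations can be tried out with Java's standard java.util.HashMap. This is only an illustrative sketch: the class name MapOperationsDemo and the helper sampleRecords are made up for this example, and the ids and names are sample values from the student table used later in this presentation.

```java
import java.util.HashMap;
import java.util.Map;

class MapOperationsDemo {
    // Hypothetical helper: builds a small map of student id -> name.
    static Map<Integer, String> sampleRecords() {
        Map<Integer, String> records = new HashMap<>();
        records.put(519394413, "andy");   // insert an element with a specific key
        records.put(838203980, "betty");
        records.put(129384343, "bill");
        return records;
    }

    public static void main(String[] args) {
        Map<Integer, String> records = sampleRecords();
        System.out.println(records.get(519394413));         // get the element for a key
        records.remove(129384343);                          // delete the element for a key
        System.out.println(records.containsKey(129384343)); // the key is gone after removal
    }
}
```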

4 Array as table
Consider this problem. We want to store 1000 student records and search them by their social security number.
studid     name   score
234903893  tom    73
43920398   mary   100
93429382   peter  20
643930281  david  56.8
519394413  andy   81.5
838203980  betty  90
129384343  bill   49
...

5 Array as table
One approach would be to store the records in an array (index 0..9999999). The index is used as the student id, i.e. the record of the student with id 0012345 is stored at A[12345].
[Figure: an array with indices running from 0 to 1000000000 and columns studid, name, score; the records shown include 129384343 bill 49, 519394413 andy 81.5, 643930281 david 56.8 and 838203980 betty 90.]

6 Array as table
Store the records in an array where the index corresponds to the key
- add - very fast O(1)
- delete - very fast O(1)
- search - very fast O(1)
Problems?
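A minimal sketch of the array-as-table idea, in which the key itself is the array index. The class name DirectAddressTable and the name stored for id 12345 are made up for illustration. The demo uses a small key range; covering the id range 0..9999999 from the slides would need ten million slots, most of them empty, which hints at the problem this slide asks about.

```java
class DirectAddressTable {
    private final String[] names; // names[id] holds the record for that id, or null

    DirectAddressTable(int keyRange) {
        names = new String[keyRange]; // one slot for every possible key
    }

    void add(int id, String name) { names[id] = name; }  // O(1)
    void delete(int id)           { names[id] = null; }  // O(1)
    String search(int id)         { return names[id]; }  // O(1)

    public static void main(String[] args) {
        // Small illustrative range; the slides' range would need 10,000,000 slots.
        DirectAddressTable table = new DirectAddressTable(100_000);
        table.add(12345, "tom"); // illustrative record for id 0012345
        System.out.println(table.search(12345));
    }
}
```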

7 Typical Solution
Normally it is not possible to make a table large enough to hold the values associated with all possible keys.
- A table used to look up student names might only contain a few thousand entries
This means:
- The table length will be less than the key range

8 Hashing
Two components:
- Table (can be accessed randomly by position)
- Hash function h(key): maps keys into positions in the table
Generally speaking, the range of h << the domain of h. In other words, h maps lots of possible keys to just a (relatively) few possible outputs.
Basic idea:
- Element with the key k is stored in position h(k) of the table

9 Hash function
int Hash(key)
Imagine that we have a magic function that we'll call "Hash". It maps the keys (student ssnum) of the 1000 records into the integers 0..999, one to one. No two different keys map to the same number.
H('001233445') = 134
H('003334333') = 67
H('230056789') = 764
…
H('769908080') = 3

10 Hash table
To store a record, we compute Hash(ssnum) for the record and store it at the location Hash(ssnum) of the array. To search for a student, we only need to peek at the location Hash(ssnum).
[Figure: a hash table with indices 0..999 and columns studid, name, score; the records for betty, bill, andy and david are shown at the indices given by Hash(ssnum), with indices 0, 3, 67, 134, 764 and 999 labelled.]

11 Hash table with Perfect Hash
Such a magic function is called a perfect hash function.
- add - very fast O(1)
- delete - very fast O(1)
- search - very fast O(1)
But it is generally difficult to design a perfect hash function (for example, when the potential key space is large). A hash function should try to mix the information in the key and convert it to an index within the range of the hash table (the hash address).

12 Hash function
A hash function maps a key to an index within a particular range.
A good hash function:
- Ideally provides a one-to-one mapping between keys and table locations
- Is easy and quick to compute
- Achieves an even distribution of the keys that actually occur over the locations in the table
It is often difficult to come up with a good hash function.
- May require a mathematician or statistical analysis of the expected keys

13 Phone Numbers
Consider using a hash function to look up the name of a faculty member given their BSU phone number.
- How would you do it?

14 Common Hash Functions
Division (very common):
- H(key) = key % hashTable.length
- Using a prime number for a modulus usually has the effect of spreading the keys quite uniformly
Truncation:
- Ignore part of the key
- A good example is using the last 4 digits of a SSN
Folding:
- Partition the key into several parts and combine the parts in some way
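The three approaches might look like this in Java. This is a sketch under stated assumptions: the class and method names are invented for illustration, keys are treated as non-negative decimal numbers, and folding is shown as "split into groups of three digits and add them", which is only one of many ways to combine the parts.

```java
class HashFunctionSketches {
    // Division: key % tableLength (floorMod keeps the result non-negative).
    static int division(long key, int tableLength) {
        return (int) Math.floorMod(key, (long) tableLength);
    }

    // Truncation: ignore everything except the last four decimal digits.
    static int lastFourDigits(long key) {
        return (int) (Math.abs(key) % 10_000);
    }

    // Folding (one assumed variant): add up the groups of three decimal digits.
    static int foldByThrees(long key) {
        long remaining = Math.abs(key);
        int sum = 0;
        while (remaining > 0) {
            sum += (int) (remaining % 1000); // take the lowest three digits
            remaining /= 1000;               // and move on to the next group
        }
        return sum;
    }

    public static void main(String[] args) {
        long key = 519394413L; // sample id from the student table
        System.out.println(division(key, 997));   // 997 is a prime modulus
        System.out.println(lastFourDigits(key));  // 4413
        System.out.println(foldByThrees(key));    // 519 + 394 + 413 = 1326
    }
}
```

A folded or truncated value can still be larger than the table, so in practice it would typically be reduced again with the division method.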

15 Collisions
Generally speaking, we cannot avoid collisions.
Collision resolution: what do we do when two different keys map to the same index?
H('0012345') = 134
H('0033333') = 67
H('0056789') = 764
…
H('9903030') = 3
H('9908080') = 3

16 Linear Probing
When a collision occurs, just look for the next available slot.
To find an element in a table:
- Hash to its position
- If it is not there, look at successive slots until
  - You find it
  - You hit an empty slot
  - You return to the original slot
Removing an element is tricky.

17 Linear Probing: empty table (slots labelled 0 to 9). Slides 18 to 27 list the stored values in slot order after each add.
18 Linear Probing: add( 10 ): 10
19 Linear Probing: add( 23 ): 10 23
20 Linear Probing: add( 3 ): COLLISION!! 10 23 3
21 Linear Probing: add( 56 ): 10 23 3 56
22 Linear Probing: add( 44 ): COLLISION!! 10 23 3 44 56
23 Linear Probing: add( 14 ): COLLISION!! 10 23 3 44 56 14
24 Linear Probing: add( 93 ): COLLISION!! 10 23 3 44 56 14 93
25 Linear Probing: add( 81 ): COLLISION!! 10 81 23 3 44 56 14 93
26 Linear Probing: add( 72 ): COLLISION!! 10 81 72 23 3 44 56 14 93
27 Linear Probing: add( 100 ): COLLISION!! 10 81 72 23 3 44 56 14 93 100
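The add sequence from slides 18 to 27 can be replayed with a small linear-probing table. This is a sketch, not the presenter's code: it assumes a 10-slot table and the division hash key % 10, which reproduces the final slot order shown on slide 27. The class name LinearProbingTable and its methods are invented for illustration, and removal is deliberately left out because, as slide 16 notes, it is tricky.

```java
import java.util.Arrays;

class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingTable(int capacity) { slots = new Integer[capacity]; }

    // Assumed hash function: division by the table length.
    private int hash(int key) { return Math.floorMod(key, slots.length); }

    // Stores key in the first free slot at or after its home slot.
    // Returns the slot used, or -1 if the table is full. Duplicates are not checked.
    int add(int key) {
        int home = hash(key);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length; // wrap around at the end
            if (slots[pos] == null) {
                slots[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Probes successive slots until the key is found, an empty slot is hit,
    // or the search has wrapped around the whole table.
    boolean contains(int key) {
        int home = hash(key);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length;
            if (slots[pos] == null) return false;
            if (slots[pos] == key) return true;
        }
        return false;
    }

    Integer[] snapshot() { return slots.clone(); }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(10);
        for (int key : new int[] {10, 23, 3, 56, 44, 14, 93, 81, 72, 100}) {
            table.add(key);
        }
        // prints [10, 81, 72, 23, 3, 44, 56, 14, 93, 100]
        System.out.println(Arrays.toString(table.snapshot()));
    }
}
```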

28 Analysis
The major drawback of linear probing is clustering.
- Values tend to clump up in the table
- Thus the sequential searches required to find an element become longer and longer
Possible solutions:
- Pseudo-random probing
- Quadratic probing ( h+1, h+4, h+9, … )
  - Probe at ( h + i^2 ) mod table.length
- Key dependent increments
  - For example, use the first digit as an increment
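To make the quadratic probe formula concrete, the following sketch lists the first few probe positions for a given home slot h. The class and method names are invented for illustration, and h is assumed to be non-negative.

```java
class QuadraticProbeSequence {
    // Returns the probe positions (home + i*i) mod tableLength for i = 0 .. count-1.
    static int[] probes(int home, int tableLength, int count) {
        int[] positions = new int[count];
        for (int i = 0; i < count; i++) {
            // long arithmetic avoids overflow of i*i for large i
            positions[i] = (int) ((home + (long) i * i) % tableLength);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Home slot 3 in a 10-slot table: 3, 3+1, 3+4, 3+9, 3+16 (all mod 10).
        for (int position : probes(3, 10, 5)) {
            System.out.print(position + " "); // prints 3 4 7 2 9
        }
        System.out.println();
    }
}
```

Unlike linear probing, consecutive probes land further and further apart, which is what breaks up the clusters.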

29 Chained Hash Table
One way to handle collisions is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.
[Figure: an array with indices 0 .. HASHMAX; some entries are nil, others point to linked lists of records such as Key: 9903030, name: tom, score: 73.]

30 Chaining: empty table, all 10 slots (0 to 9) are NULL. Slides 31 to 40 show the chain that changes after each add.
31 Chaining: add( 10 ): slot 0: 10
32 Chaining: add( 23 ): slot 3: 23
33 Chaining: add( 3 ): slot 3: 3 -> 23
34 Chaining: add( 56 ): slot 6: 56
35 Chaining: add( 44 ): slot 4: 44
36 Chaining: add( 14 ): slot 4: 14 -> 44
37 Chaining: add( 93 ): slot 3: 93 -> 3 -> 23
38 Chaining: add( 81 ): slot 1: 81
39 Chaining: add( 72 ): slot 2: 72
40 Chaining: add( 100 ): slot 0: 100 -> 10

41 Analysis
Advantages of chaining:
- Space savings if items are large
- Simple and efficient collision handling
- Deleting items is very easy
Disadvantages of chaining:
- Links take up space
- As chains increase in length, searches take longer

42 Chained Hash table
Hash table, where collided records are stored in a linked list.
- good hash function, appropriate hash size
  - Few collisions. Add, delete, search very fast O(1)
- otherwise …
  - some hash value has a long list of collided records
  - add - just insert at the head, fast O(1)
  - delete a target - delete from unsorted linked list, slow O(n)
  - search - sequential search, slow O(n)
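A chained table along these lines can be sketched in a few lines of Java. The assumptions here are the same as for the chaining slides: a 10-slot table, the division hash key % 10, and insertion at the head of the chain. The class name ChainedHashTable and its methods are invented for illustration, and an empty list stands in for the nil entry.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTable {
    // One linked list (chain) per table slot; an empty list plays the role of nil.
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedHashTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Assumed hash function: division by the table length.
    private LinkedList<Integer> chainFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    void add(int key) { chainFor(key).addFirst(key); }  // insert at the head, O(1)

    boolean contains(int key) { return chainFor(key).contains(key); }  // sequential search

    boolean remove(int key) { return chainFor(key).remove(Integer.valueOf(key)); }

    List<Integer> chain(int slot) { return new ArrayList<>(buckets.get(slot)); }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(10);
        for (int key : new int[] {10, 23, 3, 56, 44, 14, 93, 81, 72, 100}) {
            table.add(key);
        }
        System.out.println(table.chain(3)); // prints [93, 3, 23]
        System.out.println(table.chain(0)); // prints [100, 10]
    }
}
```

Search and delete walk a single chain, so their cost grows with the chain length rather than with the total number of records, which is why a good hash function and an appropriate table size matter.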


48 hashCode()
All Java objects have a method named hashCode() (defined in class Object).
- By default, hashCode() typically returns a value based on the address in memory where the object is stored
General rules for implementing hashCode():
- When invoked more than once on the same object, it must return the same value each time (as long as the information used by equals() has not changed).
- If o1.equals(o2), then o1.hashCode() must be equal to o2.hashCode().
Note that the reverse is not required: if o1.equals(o2) is false, o1.hashCode() and o2.hashCode() may still be equal.
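A small class that follows these rules might look like the sketch below. The class StudentId is hypothetical and wraps a single string key; equals() and hashCode() are both derived from that one field, so equal objects always produce equal hash codes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class StudentId {
    private final String ssn; // immutable, so hashCode() never changes for an object

    StudentId(String ssn) { this.ssn = Objects.requireNonNull(ssn); }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StudentId)) return false;
        return ssn.equals(((StudentId) other).ssn);
    }

    @Override
    public int hashCode() {
        return ssn.hashCode(); // built from the same field that equals() compares
    }

    public static void main(String[] args) {
        Map<StudentId, String> names = new HashMap<>();
        names.put(new StudentId("001233445"), "tom"); // illustrative record
        // A different but equal key object finds the same entry.
        System.out.println(names.get(new StudentId("001233445")));
    }
}
```

Without the hashCode() override, the two StudentId objects in main would usually hash to different buckets and the lookup would return null.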
