Presentation is loading. Please wait.

Presentation is loading. Please wait.

Hashtables.

Similar presentations


Presentation on theme: "Hashtables."— Presentation transcript:

1 Hashtables

2 Picture of a hashtable You can think of this as a dictionary – with words and definitions. KEY e.g. student id VALUE e.g. student name 089 JOHN 045 DAVE 939 STEVE

3 A basic problem We have to store some records and perform the following: add new record delete record search a record by key Find a way to do these efficiently!

4 Unsorted array Use an array to store the records, in unsorted order
add - add the records as the last entry fast O(1) delete a target - slow at finding the target, fast at filling the hole (just take the last entry) O(n) search - sequential search slow O(n)

5 Sorted array Use an array to store the records, keeping them in sorted order add - insert the record in proper position. much record movement slow O(n) delete a target - how to handle the hole after deletion? Much record movement slow O(n) search - binary search fast O(log n)

6 Linked list Store the records in a linked list (unsorted)
add - fast if one can insert node anywhere O(1) delete a target - fast at disposing the node, but slow at finding the target O(n) search - sequential search slow O(n) (if we only use linked list, we cannot use binary search even if the list is sorted.)

7 More approaches have better performance but are more complex
Hash table Tree (BST, Heap, …)

8 What is a Hash Table ? The simplest kind of hash table is an array of records. This example has 701 records. This lecture introduces hash tables, which are an array-based method for implementing a Dictionary. You should recall that we have seen dictionaries implemented in other ways, for example with a binary search tree. The abstract properties of a dictionary remain the same: We can insert items in the dictionary, and each item has a key associated with it. When we want to retrieve an item, we specify only the key, and the retrieval process finds the associated data. What we do now is use an array to implement the dictionary. The array is an array of records. In this example, we could store up to 701 records in the array. [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] . . . An array of records

9 What is a Hash Table ? [ 4 ] Number Each record has a special field, called its key. In this example, the key is a long integer field called Number. Each record in the array contains two parts. The first part is a number that we'll use for the key of the item. We could use something else for the keys, such as a string. But for a hash table, numbers make the most convenient keys. [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] . . .

10 What is a Hash Table ? [ 4 ] Number The number might be a person's identification number, and the rest of the record has information about the person. The numbers might be identification numbers of some sort, and the rest of the record contains information about a person. So the pattern that you see here is the same pattern that you've seen in other dictionaries: Each entry in the dictionary has a key (in this case an identifying number) and some associated data. [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] . . .

11 What is a Hash Table ? When a hash table is in use, some spots contain valid records, and other spots are "empty". When a hash table is being used as a dictionary, some of the array locations are in use, and other spots are "empty", waiting for a new entry to come along. Oftentimes, the empty spots are identified by a special key. For example, if all our identification numbers are positive, then we could use 0 as the Number that indicates an empty spot. With this drawing, locations [0], [4], [6], and maybe some others would all have Number=0. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number

12 Inserting a New Record Number In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key. In order to insert a new entry, the key of the entry must somehow be converted to an index in the array. For our example, we must convert the key number into an index between 0 and 700. The conversion process is called hashing and the index is called the hash value of the key. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number

13 Inserting a New Record Typical way create a hash value:
Number Typical way create a hash value: (Number mod 701) There are many ways to create hash values. Here is a typical approach. a. Take the key mod 701 (which could be anywhere from 0 to 700). So, quick, what is (580,625,685 mod 701) ? What is ( mod 701) ? . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number

14 Inserting a New Record 3 Typical way to create a hash value:
Number Typical way to create a hash value: (Number mod 701) Three. 3 What is ( mod 701) ? . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number

15 Inserting a New Record [3]
Number The hash value is used for the location of the new record. [3] So, this new item will be placed at location [3] of the array. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number

16 Inserting a New Record The hash value is used for the location of the new record. The hash value is always used to find the location for the record. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number

17 Collisions Number Here is another new record to insert, with a hash value of 2. My hash value is [2]. Sometimes, two different records might end up with the same hash value. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number

18 When a collision occurs,
Collisions Number This is called a collision, because there is already another valid record at [2]. When a collision occurs, move forward until you find an empty spot. This is called a collision. When a collision occurs, the insertion process will move forward through the array until an empty spot is found. Sometimes you will have a second collision... . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number

19 When a collision occurs,
Collisions Number This is called a collision, because there is already another valid record at [2]. When a collision occurs, move forward until you find an empty spot. ...and a third collision... . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number

20 When a collision occurs,
Collisions Number This is called a collision, because there is already another valid record at [2]. When a collision occurs, move forward until you find an empty spot. But if there are any empty spots, eventually you will reach an empty spot, and the new item is inserted here. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number

21 Collisions This is called a collision, because there is already another valid record at [2]. The new record goes in the empty spot. The new record is always placed in the first available empty spot, after the hash value. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

22 Where would you be placed in this table, if there is no collision
Where would you be placed in this table, if there is no collision? Use your national insurance number or some other favorite number. Time for another quiz Did anyone end up with a hash value of 700? Well, I did. My ID number is , which has a hash value of 700. (No, not really, but I needed to illustrate another kind of collision. There is a collision with the last location of the array. In this case, you would circle back to the start of the array, and try location number 0, then 1, .... until you find an empty spot. In this example, I would be inserted at location 0, since that location is empty. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

23 Searching for a Key Number The data that's attached to a key can be found fairly quickly. It is fairly easy to search for a particular item based on its key. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

24 Searching for a Key Calculate the hash value.
Number Calculate the hash value. Check that location of the array for the key. My hash value is [2]. Start by computing the hash value, which is 2 in this case. Then check location 2. If location 2 has a different key than the one you are looking for, then move forward... Not me. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

25 Searching for a Key Number Keep moving forward until you find the key, or you reach an empty spot. My hash value is [2]. ...if the next location is not the one we are looking for, then keep moving forward... Not me. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

26 Searching for a Key Number Keep moving forward until you find the key, or you reach an empty spot. My hash value is [2]. Keep moving forward until you find the sought-after key... Not me. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

27 Searching for a Key Number Keep moving forward until you find the key, or you reach an empty spot. My hash value is [2]. In this case we find the key at location [5]. Yes! . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

28 Searching for a Key Number When the item is found, the information can be copied to the necessary location. My hash value is [2]. The data from location [5] can then be copied to to provide the result of the search function. What happens if a search reaches an empty spot? In that case, it can halt and indicate that the key was not in the hash table. Yes! . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

29 Deleting a Record Records may also be deleted from a hash table.
Records can be deleted from a hash table... Please delete me. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number Number

30 Deleting a Record Records may also be deleted from a hash table.
But the location must not be left as an ordinary "empty spot" since that could interfere with searches. But the spot of the deleted record cannot be left as an ordinary empty spot, since that would interfere with searches. (Remember that a search can stop when it reaches an empty spot.) . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number

31 Deleting a Record Records may also be deleted from a hash table.
But the location must not be left as an ordinary "empty spot" since that could interfere with searches. The location must be marked in some special way so that a search can tell that the spot used to have something in it. Instead we must somehow mark the location as "a location that used to have something here, but no longer does." We might do this by using some other special value for the Number field of the record. In any case, a search can not stop when it reaches "a location that used to have something here". A search can only stop when it reaches a true empty spot. . . . [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700] Number Number Number Number Number

32 Array as table 0012345 andy 81.5 0033333 betty 90 0056789 david 56.8
studid name score andy 81.5 betty 90 david 56.8 ... peter 20 mary 100 ... tom 73 bill 49 Consider this problem. We want to store 1,000 student records and search them by student id.

33 Array as table : : : 12345 andy 81.5 : : : 33333 betty 90 : : : 56789
name score : : : One way is to store the records in a huge array (index ). The index is used as the student id, i.e. the record of the student with studid is stored at A[12345] -- Is this a good idea? 12345 andy 81.5 : : : 33333 betty 90 : : : 56789 david 56.8 : : : : : : bill 49 : : : If I have 70 friends, and I want to store their mobile phone numbers, I do not want an array in size. I could use a table about 140 slots in it.

34 Array as table It is also called Direct-address Hash Table.
• Each slot, or position, corresponds to a key in U. • If there’s an element x with key k, then T [k] contains a pointer to x. • Otherwise, T [k] is empty, represented by NIL.

35 Array as table Store the records in a huge array where the index corresponds to the key add - very fast O(1) delete - very fast O(1) search - very fast O(1) But it wastes a lot of memory! Not feasible.

36 function Hash(key: KeyType): integer;
Hash function function Hash(key: KeyType): integer; Imagine that we have such a magic function Hash. It maps the key (studid) of the 1000 records into the integers , one to one. No two different keys maps to the same number. H(‘ ’) = 134 H(‘ ’) = 67 H(‘ ’) = 764 H(‘ ’) = 3

37 Hash Table : : : 9908080 bill 49 : : : 0033333 betty 90 : : : 0012345
name score To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id). : : : 3 bill 49 : : : 67 betty 90 : : : 134 andy 81.5 : : : 764 david 56.8 : : : 999 : : :

38 Hash Table with Perfect Hash
Such magic function is called perfect hash add - very fast O(1) delete - very fast O(1) search - very fast O(1) But it is generally difficult to design perfect hash. (e.g. when the potential key space is large)

39 function Hash(key: KeyType);
Hash function A hash function maps a key to an index within in a range Desirable properties: simple and quick to calculate even distribution, avoid collision as much as possible function Hash(key: KeyType);

40 Division Method h(k) = k mod m Certain values of m may not be good:
When m = 2p then h(k) is the p lower-order bits of the key Good values for m are prime numbers which are not close to exact powers of 2. For example, if you want to store 2000 elements then m=701 (m = hash table length) yields a hash function: h(key) = k mod 701

41 Collision For most cases, we cannot avoid collision
Collision resolution - how to handle when two different keys map to the same index H(‘ ’) = 134 H(‘ ’) = 67 H(‘ ’) = 764 H(‘ ’) = 3 H(‘ ’) = 3

42 Solutions to Collision
The problem arises because we have two keys that hash in the same array entry, a collision. There are two ways to resolve collision: Hashing with Chaining: every hash table entry contains a pointer to a linked list of keys that hash in the same entry Hashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find one empty entry to place the new key

43 Chained Hash Table nil nil nil : nil
One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil. 1 nil 2 nil 3 4 nil 5 : Key: name: tom score: 73 HASHMAX nil Which index has the collisions?

44 Chained Hash Table Put all elements that hash to the same slot into a linked list. • Slot j contains a pointer to the head of the list of all stored elements that hash to j • If there are no such elements, slot j contains NIL.

45 Chained Hash table Hash table, where collided records are stored in linked list good hash function, appropriate hash size Few collisions. Add, delete, search very fast O(1) otherwise… some hash value has a long list of collided records.. add - just insert at the head fast O(1) delete a target - delete from unsorted linked list slow search - sequential search slow O(n) Consider the two extremes.

46 Open Addressing An alternative to chaining for handling collisions.
• Store all keys in the hash table itself. • Each slot contains either a key or NIL. • To search for key k: Compute h(k) and examine slot h(k). Examining a slot is known as a probe. If slot h(k) contains key k, the search is successful. If this slot contains NIL, the search is unsuccessful. There’s a third possibility: slot h(k) contains a key that is not k. We compute the index of some other slot, based on k and on which probe (count from 0: 0th, 1st, 2nd, etc.) we’re on. Keep probing until we either find key k (successful search) or we find a slot holding NIL (unsuccessful search).

47 How to compute probe sequences
Linear probing: Given auxiliary hash function h, the probe sequence starts at slot h(k) and continues sequentially through the table, wrapping after slot m − 1 to slot 0. Given key k and probe number i (0 ≤ i < m), h(k, i ) = (h(k) + i ) mod m. Quadratic probing: As in linear probing, the probe sequence starts at h(k). Unlike linear probing, it examines cells 1,4,9, and so on, away from the original probe point: h(k, i ) = (h(k) + c1i + c2i 2) mod m (if c1=0, c2=1) Double hashing: Use two auxiliary hash functions, h1 and h2. h1 gives the initial probe, and h2 gives the remaining probes: h(k, i ) = (h1(k) + ih2(k)) mod m.

48 Open Addressing Example
Hash( 89, 10) = 9 Hash( 18, 10) = 8 Hash( 49, 10) = 9 Hash( 58, 10) = 8 Hash( 9, 10) = 9

49 Linear Probing: h(k, i ) = (h(k) + i ) mod m.
In linear probing, collisions are resolved by sequentially scanning an array (with wraparound) until an empty cell is found. In following example, table size m = 8, and k: A,P,Q B,O,R C,N,S D,M,T E,L,U F,K,N G,J,W,Z H,I,X,Y h(k): Action Store A Store C Store D Store G Store P Store Q Delete P Delete Q Store B Store R A 1 P B 2 C C C C C C C 3 D D D D D D D D D 4 Q R 5 6 G G G G G G G G # probes 7

50 Choosing a Hash Function
Notice that the insertion of Q required several probes (5). This was caused by A and P mapping to slot 0 which is beside the C and D keys. The performance of the hash table depends on having a hash function which evenly distributes the keys. Choosing a good hash function is a black art.

51 Clustering Even with a good hash function, linear probing has its problems: The position of the initial mapping i 0 of key k is called the home position of k. When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster. As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster’s growth. This tendency of linear probing to place items together is known as primary clustering. As these clusters grow, they merge with other clusters forming even bigger clusters which grow even faster.

52 Quadratic Probing Example
Hash( 89, 10) = 9 Hash( 18, 10) = 8 Hash( 49, 10) = 9 Hash( 58, 10) = 8 Hash( 9, 10) = 9

53 Quadratic Probing: h(k, i ) = (h(k) + c1i + c2i 2) mod m
Quadratic probing eliminates the primary clustering problem of linear probing by examining certain cells away from the original probe point. In the following example, table size m = 8, and c1 = 0 , c2 = 1 k: A,P,Q B,O,R C,N,S D,M,T E,L,U F,K,N G,J,W,Z H,I,X,Y h(k): Action Store A Store C Store D Store G Store P Store Q Delete P Delete Q Store B Store R A 1 P B 2 C C C C C C C 3 D D D D D D D D D 4 Q 5 R 6 G G G G G G G G # probes 3(5) 3(4) 3(6) 7

54 Double Hashing Double hashing: Use two auxiliary hash functions, h1 and h2. h1 gives the initial probe, and h2 gives the remaining probes: h(k, i ) = (h1(k) + ih2(k)) mod m. Quadratic probing solves the primary clustering problem, but it has the secondary clustering problem, in which, elements that hash to the same position probe the same alternative cells. Secondary clustering is a minor theoretical blemish. Double hashing is a hashing technique that does not suffer from secondary clustering. A second hash function is used to drive the collision resolution. Limits are left to ponder 


Download ppt "Hashtables."

Similar presentations


Ads by Google