Hash Tables and Associative Containers CS-212 Dick Steflik.


1 Hash Tables and Associative Containers CS-212 Dick Steflik

2 Hash Tables
A hash table is an array of size Tsize:
– has index positions 0..Tsize-1
Two types of hash tables:
– open (open-addressing) hash table: the array element type is a (key, value) pair; all items are stored in the array itself
– chained hash table: the element type is a pointer to a linked list of nodes containing pairs; items are stored in the linked-list nodes
Keys are used to generate an array index:
– the home address (0..Tsize-1)

3 Faster Searching
"Balanced" search trees guarantee an O(log₂ n) search path by controlling the height of the search tree:
– AVL tree
– 2-3-4 tree
– red-black tree (used by STL associative container classes)
A hash table allows for O(1) search performance on average:
– search time does not increase as n increases, provided the load factor stays bounded

4 Hash Table
A hash table is an array/vector of fixed size:
– has index positions 0..Tsize-1
If we could use the keys as an index, we would have O(1) retrieval:
– hashTable[key]
Keys are used to generate an array index:
– the home address (0..Tsize-1)
– the function that does this is called a hash function
hash(key) returns an int value.
hash(key) % Tsize => 0..Tsize-1
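
The mapping from key to home address can be sketched in Python as follows. The names `simple_hash` and `home_address` and the multiplier 31 are illustrative assumptions; the slides do not prescribe a particular hash function.

```python
def simple_hash(key: str) -> int:
    """Toy polynomial string hash; the multiplier 31 is an arbitrary choice."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h


def home_address(key: str, tsize: int) -> int:
    """Reduce the hash value to an index in 0..tsize-1."""
    return simple_hash(key) % tsize


for k in ["apple", "banana", "cherry"]:
    print(k, "->", home_address(k, 7))
```

Whatever integer the hash function returns, the final `% tsize` step is what guarantees a valid array index.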

5 Collisions
A collision occurs whenever two keys produce the same index (hash to the same location).
Design goal: pick a hash function that produces as few collisions as possible.
Collisions are a way of life with hash tables.
What do you do?
– linear probing: check the next location; if it's empty, use it
– quadratic probing: check 1 away, then 4 away, then 9 away, ...
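
A short sketch of the slot sequences the two probing strategies visit. It assumes the common textbook definition of quadratic probing, in which the i-th probe lands i² slots past the home address; the function names are hypothetical.

```python
def linear_probes(home: int, tsize: int, count: int) -> list:
    """First `count` slots visited by linear probing: home, home+1, home+2, ..."""
    return [(home + i) % tsize for i in range(count)]


def quadratic_probes(home: int, tsize: int, count: int) -> list:
    """First `count` slots visited by quadratic probing: home, home+1, home+4, home+9, ..."""
    return [(home + i * i) % tsize for i in range(count)]


print(linear_probes(3, 7, 4))     # [3, 4, 5, 6]
print(quadratic_probes(3, 7, 4))  # [3, 4, 0, 5]
```

Both sequences wrap around the end of the array with `% tsize`.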

6 A Hash Table of Size 7
Each slot (index 0..6) holds a key, a value, and an "empty" flag; all seven slots start out empty (T).
Some insertions:
hash(K1) % 7 => 3
hash(K2) % 7 => 5
hash(K3) % 7 => 2
hash(K4) % 7 => 3
hash(K5) % 7 => 2
hash(K6) % 7 => 4
Collision resolution strategy: open addressing with a linear probe.

7 Search Performance
Average number of probes needed to retrieve the value with key K?
K    home address   #probes
K1   3              1
K2   5              1
K3   2              1
K4   3              2
K5   2              5
K6   4              4
Successful search: 14/6 = 2.33 probes on average. Unsuccessful search?
Table contents after the insertions (linear probing):
index   empty   key   value
0       F       K6    K6info
1       T
2       F       K3    K3info
3       F       K1    K1info
4       F       K4    K4info
5       F       K2    K2info
6       F       K5    K5info
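
The probe counts on this slide can be reproduced with a small simulation. This is a sketch under stated assumptions: the home addresses are copied from the slides, keys are inserted in the order K1..K6, and an unsuccessful search is assumed to count the probe that finds the empty slot, with every home address equally likely. The helper names are hypothetical.

```python
TSIZE = 7
# Home addresses from the slides: hash(K) % 7 for K1..K6.
HOMES = {"K1": 3, "K2": 5, "K3": 2, "K4": 3, "K5": 2, "K6": 4}


def insert_all(homes, tsize):
    """Insert keys in order with linear probing; None marks an empty slot.
    Assumes fewer keys than slots, so a free slot always exists."""
    table = [None] * tsize
    for key, home in homes.items():
        slot = home
        while table[slot] is not None:   # occupied: try the next slot
            slot = (slot + 1) % tsize
        table[slot] = key
    return table


def probes_to_find(table, key, home):
    """Slots examined to find a key that is known to be in the table."""
    slot, probes = home, 1
    while table[slot] != key:
        slot = (slot + 1) % len(table)
        probes += 1
    return probes


def probes_to_miss(table, home):
    """Slots examined by an unsuccessful search, including the empty slot that ends it."""
    slot, probes = home, 1
    while table[slot] is not None:
        slot = (slot + 1) % len(table)
        probes += 1
    return probes


table = insert_all(HOMES, TSIZE)
hit = [probes_to_find(table, k, h) for k, h in HOMES.items()]
miss = [probes_to_miss(table, h) for h in range(TSIZE)]
print(table)                        # ['K6', None, 'K3', 'K1', 'K4', 'K2', 'K5']
print(hit, sum(hit) / len(hit))     # [1, 1, 1, 2, 5, 4], average 14/6
print(miss, sum(miss) / len(miss))  # [2, 1, 7, 6, 5, 4, 3], average 28/7
```

Under those assumptions an unsuccessful search costs 4.0 probes on average, noticeably more than a successful one, because it has to walk to the end of each cluster.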

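8 Chaining with Separate Lists
Each array slot points to a linked list of synonyms (keys with the same home address).
hash(K1) % 7 => 3
hash(K2) % 7 => 5
hash(K3) % 7 => 2
hash(K4) % 7 => 3
hash(K5) % 7 => 2
hash(K6) % 7 => 4
Resulting chains:
index   chain
0       (empty)
1       (empty)
2       K3 (K3info) -> K5 (K5info)
3       K1 (K1info) -> K4 (K4info)
4       K6 (K6info)
5       K2 (K2info)
6       (empty)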

9 Search Performance
Average number of probes needed to retrieve the value with key K?
K    home address   #probes
K1   3              1
K2   5              1
K3   2              1
K4   3              2
K5   2              2
K6   4              1
Successful search: 8/6 = 1.33 probes on average. Unsuccessful search?
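
The same six keys can be run through a chained table to check the 8/6 figure. In this sketch a Python list per slot stands in for each linked list, and new nodes are assumed to be appended at the end of the chain, which is what the probe counts on the slide imply.

```python
TSIZE = 7
# Home addresses from the slides: hash(K) % 7 for K1..K6.
HOMES = {"K1": 3, "K2": 5, "K3": 2, "K4": 3, "K5": 2, "K6": 4}

# One list per slot plays the role of a linked list of synonyms.
chains = [[] for _ in range(TSIZE)]
for key, home in HOMES.items():
    chains[home].append((key, key + "info"))   # new node goes at the end of the chain


def probes_to_find(key, home):
    """Number of nodes examined to reach `key` in its chain."""
    for position, (k, _value) in enumerate(chains[home], start=1):
        if k == key:
            return position
    raise KeyError(key)


hit = [probes_to_find(k, h) for k, h in HOMES.items()]
print(hit, sum(hit) / len(hit))   # [1, 1, 1, 2, 2, 1], average 8/6
```

A collision only lengthens the chain at one home address, so it never slows down searches for keys that hash elsewhere.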

10 Where Are Hash Tables Used?
Databases
Spelling checkers
Java uses them all over the place (built into the language)
Most scripting languages (ASP, Perl, PHP) have associative arrays
Caching schemes:
– software: browsers, HTTP proxy servers, DNS servers
– hardware: memory caching, instruction caching

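11 Deletions?
Search for the item to be deleted.
Chained hash table:
– delete a node from a linked list
Open hash table:
– just mark the spot as "empty"? No: a later search for a key that was placed past this spot would stop here too early
– must mark the vacated spot as "deleted", which is different from "empty"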
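
A minimal sketch of why open addressing needs a separate "deleted" marker. K1 and K4 share home address 3, as in the earlier example; the sentinel objects and function names are assumptions made for illustration, and insertion is kept deliberately simple (no duplicate check).

```python
EMPTY = object()     # slot has never held an item
DELETED = object()   # slot held an item that was later removed

TSIZE = 7
HOMES = {"K1": 3, "K4": 3}   # K1 and K4 collide at slot 3


def insert(table, key):
    """Linear probing; a DELETED slot may be reused."""
    slot = HOMES[key]
    while table[slot] is not EMPTY and table[slot] is not DELETED:
        slot = (slot + 1) % TSIZE
    table[slot] = key


def find(table, key):
    """Return the slot holding `key`, or None. The search stops only at an EMPTY slot."""
    slot = HOMES[key]
    for _ in range(TSIZE):
        if table[slot] is EMPTY:
            return None
        if table[slot] == key:
            return slot
        slot = (slot + 1) % TSIZE
    return None


def delete(table, key, marker):
    """Remove `key`, leaving `marker` (EMPTY or DELETED) in its slot."""
    slot = find(table, key)
    if slot is not None:
        table[slot] = marker


for marker, label in [(EMPTY, "empty"), (DELETED, "deleted")]:
    table = [EMPTY] * TSIZE
    insert(table, "K1")          # lands in slot 3
    insert(table, "K4")          # slot 3 is taken, so it lands in slot 4
    delete(table, "K1", marker)
    print(label, "->", find(table, "K4"))
# empty -> None    (the search for K4 stops at slot 3 and never reaches slot 4)
# deleted -> 4     (the marker lets the search continue past slot 3)
```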

12 Hash Functions
A hash function is used to map a key to an array index (home address):
– the search starts from here
Insert, retrieve, and delete all start by applying the hash function to the key.
Goals for a hash function:
– fast to compute
– even distribution over the entire collection of keys
All hash functions produce collisions:
– multiple keys hash to the same home address

13 Some Hash Functions...
Division:
– works well in most cases as long as keys are relatively random
– H(key) = key mod m
– if the key is an integer, hash(key) can be the identity function (return key)
– good if keys are random
– not good if keys have similar characteristics
Example: with m = 25, all keys divisible by 5 map into positions 0, 5, 10, 15, 20, causing clustering around those values.
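
The m = 25 example can be checked directly. The comparison with m = 23 is an added assumption, not from the slides: it illustrates the common advice to choose a table size that shares no factor with a pattern in the keys.

```python
def division_hash(key: int, m: int) -> int:
    """H(key) = key mod m."""
    return key % m


keys = range(0, 500, 5)   # 100 keys, all divisible by 5

used_25 = sorted({division_hash(k, 25) for k in keys})
used_23 = sorted({division_hash(k, 23) for k in keys})

print(used_25)        # [0, 5, 10, 15, 20]
print(len(used_23))   # 23
```

With m = 25 the 100 keys crowd into 5 of the 25 slots; with m = 23 the same keys reach every slot.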

14 More Hash Functions...
Mid-Square:
– produces a nearly random distribution of indices
– takes longer to compute but gives a better distribution when keys may have some digits in common
– convert the key to an octal string: A-Z = 01₈..32₈ and 0-9 = 33₈..44₈
– example: key = A1 = 0134₈ = 134₈
  134₈ * 134₈ = 20420₈
  using a table of 1024 elements: 20420₈ = 010000100010000₂
  use the middle 10 bits (dropping the top 2 and bottom 3 bits) as the index
  index = 0000100010₂ = 34₁₀
– note: most collisions will occur for short identifiers
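
A sketch of the mid-square computation for the key A1, using the character codes given on the slide. The function names are hypothetical, and the choice of which 10 of the 15 bits count as the "middle" (drop the top 2 and bottom 3) is an assumption. Working the arithmetic through, 134₈ squared is 20420₈ = 010000100010000₂, so under that choice the index comes out as 34.

```python
import string

# Character codes from the slide: A-Z -> 1..26 (01..32 octal), 0-9 -> 27..36 (33..44 octal).
CODES = {ch: i for i, ch in enumerate(string.ascii_uppercase + string.digits, start=1)}


def key_to_int(key: str) -> int:
    """Concatenate the two-octal-digit codes (6 bits per character)."""
    value = 0
    for ch in key:
        value = (value << 6) | CODES[ch]
    return value


def mid_square(key: str, square_bits: int = 15, index_bits: int = 10) -> int:
    """Square the key and keep `index_bits` bits from the middle of the square.
    square_bits=15 matches the two-character example; longer keys need a wider value."""
    squared = key_to_int(key) ** 2
    low_drop = (square_bits - index_bits + 1) // 2   # 3 of the 5 surplus bits come off the bottom
    return (squared >> low_drop) & ((1 << index_bits) - 1)


n = key_to_int("A1")
print(oct(n), oct(n * n))      # 0o134 0o20420
print(format(n * n, "015b"))   # 010000100010000
print(mid_square("A1"))        # 34
```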

15 More Hash Functions...
Digit Folding:
– assume a 5-digit decimal string (digits 0-9 only)
– H(key) = d1 + d2 + d3 + d4 + d5 (sum of digits); this yields 0 <= h <= 45 for all possible keys
– if we instead fold the digits in pairs: H(key) = d1d2 + d3d4 + d5, with 0 <= h <= 207 (99 + 99 + 9)
Double hashing:
– use two (or more) hash functions serially
– helps overcome the effects of a function that produces a poor distribution of keys
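
The two digit-folding variants can be sketched as below; the function names are hypothetical, and keys are assumed to be strings of decimal digits.

```python
def fold_digits(key: str) -> int:
    """Sum the individual digits: d1 + d2 + d3 + d4 + d5."""
    return sum(int(d) for d in key)


def fold_pairs(key: str) -> int:
    """Sum the digits taken two at a time: d1d2 + d3d4 + d5."""
    return sum(int(key[i:i + 2]) for i in range(0, len(key), 2))


print(fold_digits("12345"))                        # 15
print(fold_pairs("12345"))                         # 12 + 34 + 5 = 51
print(fold_digits("99999"), fold_pairs("99999"))   # 45 207
```

Folding in pairs spreads the keys over a wider range of values (0..207 instead of 0..45), which matters when the table has more than 46 slots.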

16 Clustering
An undesirable effect that is a function of the hash function selected and the collision resolution strategy:
– too many keys hash to the same location, causing a long string of keys that need to be searched
– especially bad when using a division-based function together with linear probing
– insertion/deletion/search can approach O(n)
Remedies:
– pick a different hash function
– pick a different collision resolution strategy
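
A small experiment showing clustering with a division hash and linear probing. The table size of 25 and both key sets are made-up illustration data, not from the slides.

```python
def total_probes(keys, tsize):
    """Insert keys using H(key) = key mod tsize and linear probing.
    Returns the total number of slots examined across all insertions."""
    table = [None] * tsize
    total = 0
    for key in keys:
        slot = key % tsize
        probes = 1
        while table[slot] is not None:   # walk past the cluster
            slot = (slot + 1) % tsize
            probes += 1
        table[slot] = key
        total += probes
    return total


spread = list(range(20))              # keys 0..19: each key has its own home address
clustered = list(range(0, 100, 5))    # keys 0, 5, ..., 95: only 5 distinct home addresses

print(total_probes(spread, 25) / 20)      # 1.0
print(total_probes(clustered, 25) / 20)   # 2.5
```

Both runs put 20 keys into 25 slots, so the load factor is identical; only the distribution of home addresses differs.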

17 Factors Affecting Search Performance
Quality of the hash function:
– how uniform?
– depends on the actual data
Collision resolution strategy used
Load factor of the hash table:
– N/Tsize
– the lower the load factor, the better the search performance

18 Successful Search Performance
Average number of probes per successful search:
load factor   open addressing    open addressing    chaining
              (linear probing)   (double hashing)
0.5           1.50               1.39               1.25
0.7           2.17               1.72               1.35
0.9           5.50               2.56               1.45
1.0           ----               ----               1.50
2.0           ----               ----               2.00
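
The slide does not say where these numbers come from. They agree with the standard textbook approximations for a successful search at load factor a: (1 + 1/(1 - a))/2 for linear probing, (1/a) ln(1/(1 - a)) for double hashing, and 1 + a/2 for chaining. Treating those formulas as the source is an assumption; the sketch below only shows that they reproduce the table.

```python
import math


def linear_probing(a: float) -> float:
    """Expected probes, successful search, open addressing with linear probing."""
    return 0.5 * (1 + 1 / (1 - a))


def double_hashing(a: float) -> float:
    """Expected probes, successful search, open addressing with double hashing."""
    return (1 / a) * math.log(1 / (1 - a))


def chaining(a: float) -> float:
    """Expected probes, successful search, separate chaining."""
    return 1 + a / 2


for a in (0.5, 0.7, 0.9):
    print(a, f"{linear_probing(a):.2f}", f"{double_hashing(a):.2f}", f"{chaining(a):.2f}")
for a in (1.0, 2.0):
    # Open addressing cannot store more items than slots, so only chaining applies here.
    print(a, "----", "----", f"{chaining(a):.2f}")
```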

19 Summary of Hash Tables
Search speed depends on the load factor and the quality of the hash function:
– the load factor should be less than 0.75 for open addressing
– it can be more than 1 for chaining
Items are not kept sorted by key.
Very good for fast access to unordered data with a known upper bound on size:
– the bound is needed to pick a good Tsize

