
1 CS 261 – Data Structures Hash Tables Part 1. Open Address Hashing

2 Can we do better than O(log n)? We have seen how skip lists and AVL trees can reduce the time to perform operations from O(n) to O(log n). Can we do better? Can we find a structure that will provide O(1) operations? Yes. No. Well, maybe…

3 Hash Tables
Hash tables are similar to arrays, except:
– Elements can be indexed by values other than integers
– A single position may hold more than one element
Arbitrary values (hash keys) map to integers by means of a hash function. Computing a hash function is usually a two-step process:
1. Transform the value (or key) to an integer
2. Map that integer to a valid hash table index
Example: storing names
– Compute an integer from a name
– Map the integer to an index in a table (i.e., a vector, array, etc.)
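A minimal sketch of the two-step process in C++ (the function names and the five-slot table size are illustrative, not part of the slides):

```cpp
#include <cstddef>
#include <string>

// Step 1: transform the key into an integer (here, by folding: summing the characters).
unsigned int keyToInt(const std::string& key) {
    unsigned int total = 0;
    for (char c : key)
        total += static_cast<unsigned char>(c);   // cast each char to its numeric value
    return total;
}

// Step 2: map that integer to a valid index for a table of the given size.
std::size_t toIndex(unsigned int hash, std::size_t tableSize) {
    return hash % tableSize;
}

// Example: std::size_t idx = toIndex(keyToInt("Angie"), 5);
```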

4 Hash Tables
Say we're storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John. The hash function maps them to table indices:
0: Angie, Robert
1: Linda
2: Joe, Max, John
3: (empty)
4: Abigail, Mark

5 Hash Function: Transforming to an Integer
Mapping: map (a part of) the key into an integer
– Example: a letter to its position in the alphabet
Folding: the key is partitioned into parts which are then combined using efficient operations (such as add, multiply, shift, XOR, etc.)
– Example: summing the values of each character in a string (sketched below)
Shifting: get rid of high- or low-order bits that are not random
– Example: if keys are always even, shift off the low-order bit
Casts: converting a numeric type into an integer
– Example: casting a character to an int to get its ASCII value
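A hedged sketch of folding by addition, assuming letter values a = 1 through z = 26 as on the next slide. Because addition is commutative, anagrams such as "eat", "ate", and "tea" all fold to the same value:

```cpp
#include <cctype>
#include <string>

// Fold a word by summing the alphabetic position of each letter (a = 1, ..., z = 26).
unsigned int foldAdd(const std::string& word) {
    unsigned int total = 0;
    for (char c : word)
        total += static_cast<unsigned int>(
            std::tolower(static_cast<unsigned char>(c)) - 'a' + 1);
    return total;
}

// foldAdd("eat") == foldAdd("ate") == foldAdd("tea") == 26   -- anagrams collide
```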

6 Hash Function: Combinations
Another use for shifting: in combination with folding, when the fold operator is commutative:

Key   Mapped chars   Folded   Shifted and Folded
eat   5 + 1 + 20     26       20 + 2 + 20 = 42
ate   1 + 20 + 5     26       4 + 40 + 5 = 49
tea   20 + 5 + 1     26       80 + 10 + 1 = 91
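A sketch of shift-and-fold that reproduces the table above: shift the running total left one bit before adding each letter value (a = 1, ..., z = 26), so the order of the letters now matters:

```cpp
#include <cctype>
#include <string>

// Shift-and-fold: shift the running total left before adding each letter value,
// so permutations of the same letters produce different results.
unsigned int shiftFold(const std::string& word) {
    unsigned int total = 0;
    for (char c : word) {
        unsigned int value = static_cast<unsigned int>(
            std::tolower(static_cast<unsigned char>(c)) - 'a' + 1);  // a = 1, ..., z = 26
        total = (total << 1) + value;                                // shift, then fold
    }
    return total;
}

// shiftFold("eat") == 42, shiftFold("ate") == 49, shiftFold("tea") == 91
```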

7 Hash Function: Mapping to a Valid Index
Almost always use the modulus operator (%) with the table size:
– Example: idx = hash(val) % data.size()
Must be sure that the final result is positive:
– Use only positive (unsigned) arithmetic, or take the absolute value
– Watch out for the smallest negative number, whose absolute value overflows an int; possibly use longs
To get a good distribution of indices, prime numbers make the best table sizes:
– Example: if you have 1000 elements, a table size of 997 or 1009 is preferable
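A hedged sketch of the mapping step for a signed hash code; it widens to a long first so that even the smallest negative int can be made positive safely (the function name is illustrative):

```cpp
#include <cstddef>

// Map a possibly negative hash code to a valid index for a table of the given size.
std::size_t validIndex(int hashCode, std::size_t tableSize) {
    long long h = hashCode;   // widen first so negating the smallest int cannot overflow
    if (h < 0)
        h = -h;
    return static_cast<std::size_t>(h) % tableSize;
}

// Example: validIndex(someHash, 997);   // 997 is prime, a good size for ~1000 elements
```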

8 Hash Functions: Some Ideas
Here are some typical hash functions:
– Character: the char value cast to an int (its ASCII value)
– Date: a value associated with the current time
– Double: a value generated from its bitwise representation
– Integer: the int value itself
– String: a folded sum of the character values
– URL: the hash code of the host name
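For instance, a hedged sketch of hashing a double from its bitwise representation (the helper name and the high/low mixing are assumptions, not a prescribed method):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hash a double by reinterpreting its 8 bytes as an unsigned 64-bit integer.
std::size_t hashDouble(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);                   // copy the raw bit pattern
    return static_cast<std::size_t>(bits ^ (bits >> 32));  // mix the high and low halves
}
```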

9 Hash Tables: Collisions
Ideally, we want a perfect hash function, where each data element hashes to a unique hash index. However, unless the data is known in advance, this is usually not possible. A collision occurs when two or more different keys result in the same hash table index.

10 Example: Perfect Hashing
Alfred, Alessia, Amina, Amy, Andy, and Anne have a club. Amy needs to store information in a six-element array. Amy discovers she can convert the third letter of each name (a = 0, ..., z = 25) into an index:
Alfred: F = 5, 5 % 6 = 5
Alessia: E = 4, 4 % 6 = 4
Amina: I = 8, 8 % 6 = 2
Amy: Y = 24, 24 % 6 = 0
Andy: D = 3, 3 % 6 = 3
Anne: N = 13, 13 % 6 = 1
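A minimal sketch of Amy's perfect hash for the six club members (it assumes lowercase handling and at least three letters per name):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Amy's hash: alphabetic position of the third letter (a = 0), modulo the table size of 6.
std::size_t amyHash(const std::string& name) {
    unsigned int letter = static_cast<unsigned int>(
        std::tolower(static_cast<unsigned char>(name.at(2))) - 'a');
    return letter % 6;
}

// amyHash("Alfred") == 5, amyHash("Amy") == 0, amyHash("Anne") == 1
```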

11 Indexing is faster than searching
We can convert a name (e.g., Alessia) into a number (e.g., 4) in constant time, which is even faster than searching. This allows for O(1) operations. Of course, things get more complicated when the input values change: Alan wants to join the club, but his 'a' = 0 collides with Amy, or worse yet Al, who doesn't even have a third letter!

12 Hash Tables: Resolving Collisions
There are several general approaches to resolving collisions:
1. Open address hashing: if a spot is full, probe for the next empty spot
2. Chaining (or buckets): keep a collection at each table entry
3. Caching: save the most recently accessed value; fall back to a slow search otherwise
Today we will examine open address hashing.

13 Open Address Hashing
All values are stored in an array. The hash value is used to find the initial index to try. If that position is filled, the next position is examined, then the next, and so on until an empty position is found. The process of looking for an empty position is termed probing, specifically linear probing. There are other probing algorithms, but we won't consider them.
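A hedged sketch of open address insertion with linear probing. It assumes the table is a vector of strings in which an empty string marks a free slot, and uses Amy's third-letter hash as a stand-in for hashfun(value); none of this is the worksheet's exact interface:

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in hash: alphabetic position of the third letter (a = 0), reduced to the table size.
std::size_t hashIndex(const std::string& value, std::size_t size) {
    return static_cast<std::size_t>(
        std::tolower(static_cast<unsigned char>(value.at(2))) - 'a') % size;
}

// Open address insertion with linear probing; "" marks an empty slot.
void add(std::vector<std::string>& slots, const std::string& value) {
    std::size_t idx = hashIndex(value, slots.size());
    for (std::size_t probes = 0; probes < slots.size(); ++probes) {
        if (slots[idx].empty()) {            // found a free position
            slots[idx] = value;
            return;
        }
        idx = (idx + 1) % slots.size();      // linear probe, wrapping around at the end
    }
    throw std::runtime_error("hash table is full");
}
```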

14 Example
An eight-element table using Amy's hash function (third letter, now modulo 8). The letter groups show which third letters map to each index:
Buckets: 0: a,i,q,y; 1: b,j,r,z; 2: c,k,s; 3: d,l,t; 4: e,m,u; 5: f,n,v; 6: g,o,w; 7: h,p,x
Table: 0: Amina, 1: (empty), 2: (empty), 3: Andy, 4: Alessia, 5: Alfred, 6: (empty), 7: Aspen

15 Now Suppose Anne Wants to Join
Anne's index position (5) is filled by Alfred, so we probe to find the next free location, position 6.
Table: 0: Amina, 1: (empty), 2: (empty), 3: Andy, 4: Alessia, 5: Alfred, 6: Anne, 7: Aspen (letter buckets as on slide 14)

16 Next Comes Agnes
Agnes hashes to position 5, which is filled by Alfred, and the probe finds positions 6 (Anne) and 7 (Aspen) filled as well. When we get to the end of the array, we start again at the beginning, and eventually find position 1.
Table: 0: Amina, 1: Agnes, 2: (empty), 3: Andy, 4: Alessia, 5: Alfred, 6: Anne, 7: Aspen (letter buckets as on slide 14)

17 Finally Comes Alan
Lastly, Alan wants to join. His location, 0, is filled by Amina; probing finds the last free location, position 2. The collection is now completely filled. (More on this later.)
Table: 0: Amina, 1: Agnes, 2: Alan, 3: Andy, 4: Alessia, 5: Alfred, 6: Anne, 7: Aspen (letter buckets as on slide 14)

18 Next Operation: Contains Test
Hash to find the initial index, then move forward, examining each location until the value is found or an empty location is reached. Try searching for Amina, for Anne, and for Albert in the table below; notice that the search time is not uniform.
Table: 0: Amina, 1: (empty), 2: (empty), 3: Andy, 4: Alessia, 5: Alfred, 6: (empty), 7: Aspen (letter buckets as on slide 14)
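Continuing the illustrative string-table conventions from slide 13, a hedged sketch of the contains test: probe forward from the home index until the value or an empty slot is reached, and stop if the probe wraps all the way around a full table:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Linear-probing search; "" marks an empty slot, home is the value's hash index.
bool contains(const std::vector<std::string>& slots, const std::string& value, std::size_t home) {
    std::size_t idx = home;
    do {
        if (slots[idx] == value) return true;    // found it
        if (slots[idx].empty())  return false;   // an empty slot ends the probe sequence
        idx = (idx + 1) % slots.size();
    } while (idx != home);                       // wrapped all the way around: table is full
    return false;
}
```

In the slide-18 table, Amina is found in a single probe at index 0, a search for Anne probes 5 (Alfred) and stops at the empty slot 6, and a search for Albert (third letter 'b', index 1) stops immediately: the search time is not uniform.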

19 Final Operation: Remove
Remove is tricky: we can't just replace the entry with null. What happens if we delete Agnes and then search for Alan?
Table: 0: Amina, 1: (empty), 2: Alan, 3: Andy, 4: Alessia, 5: Alfred, 6: Anne, 7: Aspen (letter buckets as on slide 14)

20 How to Handle Remove
Simple solution: just don't do it (we will do this one). Better: create a tombstone:
– A value that marks a deleted entry
– Can be replaced with a new entry
– But doesn't halt a search
Table: 0: Amina, 1: TOMBSTONE, 2: Alan, 3: Andy, 4: Alessia, 5: Alfred, 6: Anne, 7: Aspen (letter buckets as on slide 14)
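A hedged sketch of remove with a tombstone, again using the string-table conventions from slide 13; the marker value is an arbitrary illustration. Because a tombstone is a non-empty entry, the contains sketch above keeps probing past it, and a fuller add would be able to reuse tombstone slots for new entries:

```cpp
#include <cstddef>
#include <string>
#include <vector>

const std::string TOMBSTONE = "\x01TOMBSTONE";   // illustrative marker for a deleted entry

// Mark the value's slot with a tombstone instead of emptying it; home is the value's hash index.
void removeValue(std::vector<std::string>& slots, const std::string& value, std::size_t home) {
    std::size_t idx = home;
    do {
        if (slots[idx] == value) {               // found: mark it deleted, don't blank it
            slots[idx] = TOMBSTONE;
            return;
        }
        if (slots[idx].empty())                  // a truly empty slot: value is not present
            return;
        idx = (idx + 1) % slots.size();
    } while (idx != home);
}
```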

21 Hash Table Size: Load Factor
Load factor: λ = n / m (the number of elements divided by the size of the table). So the load factor represents the average number of elements at each table entry; the five names in the slide-14 table, for example, give λ = 5/8 ≈ 0.6.
– For open address hashing, the load factor is between 0 and 1 (often somewhere between 0.5 and 0.75)
– For chaining, the load factor can be greater than 1
We want the load factor to remain small.

22 What to Do with a Large Load Factor
Common solution: when the load factor becomes too large (say, bigger than 0.75), reorganize:
– Create a new table with twice the number of positions
– Copy each element, rehashing using the new table size, placing elements in the new table
– Then delete the old table
This is exactly like what you did with the dynamic array, only this time using hashing.
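A hedged sketch of the reorganize step, reusing the illustrative add() from the slide-13 sketch (the 0.75 trigger and the function name are assumptions; a fuller version would also discard tombstones while copying):

```cpp
#include <string>
#include <vector>

// Create a table twice as large and rehash every element into it; "" marks an empty slot.
void reorganize(std::vector<std::string>& slots) {
    std::vector<std::string> old = std::move(slots);       // keep the old contents
    slots.assign(old.size() * 2, std::string());           // new table, twice the positions
    for (const std::string& value : old)
        if (!value.empty())
            add(slots, value);                              // rehash using the new table size
}

// Typical trigger: if (count > 0.75 * slots.size()) reorganize(slots);
```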

23 Hash Tables: Algorithmic Complexity
Assumptions:
– The time to compute the hash function is constant
– Worst-case analysis: all values hash to the same position
– Best-case analysis: the hash function uniformly distributes the values (all buckets have the same number of objects in them)
Find-element operation:
– Worst case for open addressing: O(n)
– Best case for open addressing: O(1)

24 Hash Tables: Average Case
What about the average case? It turns out to be 1/(1 - λ), so keeping the load factor small is very important:
λ      1/(1 - λ)
0.25   1.3
0.5    2.0
0.6    2.5
0.75   4.0
0.85   6.6
0.95   20.0

25 Your Turn
Complete the implementation of the hash table. Use hashfun(value) to get the hash value. Don't do remove. Do add and the contains test first, then do the internal reorganize method.

