1 Hash Tables and Hash Maps

2 Hash Tables (DCS – SWC)
A Set and a Map are both abstract data types; we need a concrete implementation in order to use them.
In the Java library, two implementations of each are available:
– Sets: HashSet, TreeSet
– Maps: HashMap, TreeMap
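As a small illustration of the point above, the sketch below fills a HashSet and a TreeSet through the same Set interface. The class name, the method name and the registration numbers are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class LibraryImplementationsDemo {
    // Adds three (hypothetical) registration numbers, one of them twice.
    static Set<String> fill(Set<String> set) {
        set.add("XY 12 345");
        set.add("AB 98 765");
        set.add("XY 12 345"); // duplicate: ignored by any Set implementation
        return set;
    }

    public static void main(String[] args) {
        Set<String> hashed = fill(new HashSet<>()); // hash table based
        Set<String> sorted = fill(new TreeSet<>()); // tree based
        System.out.println(hashed.size());
        System.out.println(sorted); // a TreeSet iterates in sorted order
    }
}
```

The calling code only depends on the abstract type Set; the concrete implementation is chosen where the object is created.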

3 Hash Tables
The implementations HashSet and HashMap are based on a hash table. A hash table is based on the following ideas:
– Create an array of length N, which can store objects of some type T
– Find a mapping from T to the interval [0; N-1] (a hash function f)
– Store an object t of type T at position f(t)
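The three ideas above can be sketched in a few lines of Java. This is a deliberately naive version, not how HashSet is implemented: the class and method names are hypothetical, and two objects mapped to the same position simply overwrite each other.

```java
class DirectHashTable {
    private final Object[] slots; // the array of length N

    DirectHashTable(int n) {
        slots = new Object[n];
    }

    // Hash function f: maps any object to the interval [0; N-1].
    private int f(Object t) {
        return Math.floorMod(t.hashCode(), slots.length);
    }

    // Stores t at position f(t); anything already stored there is overwritten.
    void store(Object t) {
        slots[f(t)] = t;
    }

    // Looks only at position f(t), so no search through the array is needed.
    boolean contains(Object t) {
        return t.equals(slots[f(t)]);
    }
}
```

A call such as `new DirectHashTable(5).store("Car 1")` places the object directly at its calculated position.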

4 Hash Tables
[Figure: an array with positions 0 to 4. f(Car 1) = 3, f(Car 2) = 0, f(Car 3) = 2, so Car 2 is stored at position 0, Car 3 at position 2 and Car 1 at position 3.]

5 Hash Tables
A hash table is thus "almost" an array. Instead of having an index directly available, we must calculate it.
If the calculation can be done in constant time, then (collisions aside) all basic operations (insert, delete, lookup) can be done in constant time!
This is better than tree-based implementations, where these operations take O(log(N)) time.

6 Hash Tables
However, there are some issues:
– How do we define a good mapping from the objects to [0; N-1]?
– What happens if we try to store two objects at the same position?

7 Hash Functions
Before finding a good mapping, i.e. a good hash function, we must consider the size of the array.
– For good performance, the array should be at least as large as the maximal number of objects stored
– A rule of thumb is about 30 % larger
– The size should be a prime number (???)
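The sizing rules above could be turned into a small helper like the one below. It is only a sketch under assumptions: the names TableSize, isPrime and suggest are invented here, the 30 % margin is applied with integer arithmetic, and the prime requirement is followed even though the slide itself marks it with question marks.

```java
class TableSize {
    // Simple trial division; fine for the small sizes used in examples.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // About 30 % more than the expected number of objects (rounded up),
    // then moved up to the next prime number.
    static int suggest(int expectedObjects) {
        int candidate = expectedObjects + (3 * expectedObjects + 9) / 10;
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }
}
```

For 100 expected objects, the margin gives 130 and the next prime is 131.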

8 Hash Functions
What if the expected number of objects is unknown in advance?
– We can expand a hash table dynamically
– If the hash table is running out of space, double the capacity and recalculate the position of every stored object (the position depends on the array size)
– Start out with a reasonably large array (space is cheap…)
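Dynamic expansion might look like the sketch below. Several things are assumptions made for the example: the class name GrowingHashSet, the starting capacity of 4, the choice to grow when the number of elements would exceed the capacity, and the use of one list per position to cope with objects that share a position.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class GrowingHashSet {
    private List<LinkedList<String>> buckets = newBuckets(4); // small start, for demonstration
    private int size = 0;

    private static List<LinkedList<String>> newBuckets(int capacity) {
        List<LinkedList<String>> b = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) b.add(new LinkedList<>());
        return b;
    }

    private static int index(String s, int capacity) {
        return Math.floorMod(s.hashCode(), capacity);
    }

    int capacity() { return buckets.size(); }
    int size() { return size; }

    boolean contains(String s) {
        return buckets.get(index(s, capacity())).contains(s);
    }

    boolean add(String s) {
        if (contains(s)) return false;
        if (size + 1 > capacity()) grow(); // hypothetical threshold: table is full
        buckets.get(index(s, capacity())).add(s);
        size++;
        return true;
    }

    // Doubles the capacity and re-inserts every element, because the
    // position of an element depends on the array size.
    private void grow() {
        List<LinkedList<String>> old = buckets;
        buckets = newBuckets(2 * old.size());
        for (LinkedList<String> bucket : old) {
            for (String s : bucket) {
                buckets.get(index(s, capacity())).add(s);
            }
        }
    }
}
```

Doubling keeps the number of expansions small: going from 4 to about a million positions takes only 18 doublings.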

9 Hash Functions
Having handled the choice of N, how do we define a proper hash function?
Properties of a hash function:
– Must map all objects of type T to the interval [0; N-1]
– Should map objects as uniformly as possible to the interval [0; N-1]

10 Hash Functions
We can enforce the mapping to [0; N-1] by using the modulo operator: f(t) = g(t) % N
g(t) can then produce any integer value. Note that in Java, g(t) % N is negative when g(t) is negative, so use Math.floorMod(g(t), N) or clear the sign bit first.
How do we achieve a uniform distribution? The theory for this is complicated, but there are some general rules to follow.
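The following sketch shows the mapping from an arbitrary integer to [0; N-1] in Java. The class and method names are invented for the example; Math.floorMod is a standard library method that always returns a non-negative result for a positive N, whereas the % operator keeps the sign of its left operand.

```java
class BucketIndex {
    // Maps any int (including negative values) to the interval [0; n-1].
    static int index(int hashCode, int n) {
        return Math.floorMod(hashCode, n);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 5);       // -2: not a valid array index
        System.out.println(index(-7, 5)); // 3
    }
}
```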

11 Hash Functions
A good hash function should be "almost random", but deterministic:
– "Almost random": values are well distributed in the interval
– Deterministic: always produces the same output for the same input

12 Hash Functions
In Java, all objects have a hashCode method:
– Defined in the Object class
– Can be overridden in any class definition
– Returns an integer (the hash code), which may be negative
– If we implement a hash table ourselves, we must apply modulo to the value ourselves

13 Hash Functions
Hash function for integers:
– The number itself…
Hash function for strings:
    final int HASH_MULTIPLIER = 31;
    int h = 0;
    for (int i = 0; i < s.length(); i++)
        h = (HASH_MULTIPLIER * h) + s.charAt(i);
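Wrapped in a method, the string hash function above becomes runnable. The class name StringHash is made up for the example. The loop computes the same polynomial that the Java documentation specifies for String.hashCode, so the two values can be compared directly.

```java
class StringHash {
    static int hash(String s) {
        final int HASH_MULTIPLIER = 31;
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (HASH_MULTIPLIER * h) + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("abc"));      // 96354
        System.out.println("abc".hashCode()); // 96354
    }
}
```

For "abc" the steps are 97, then 97 * 31 + 98 = 3105, then 3105 * 31 + 99 = 96354.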

14 Hash Functions
The hash code for an object can be calculated by combining the hash codes of its instance fields.
Combine the values in a way similar to the algorithm used to find string hash codes.

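15 Hash Functions
    public int hashCode() {
        final int MULTIPLIER = 31;
        int h1 = regNo.hashCode();  // object field: use its own hash code
        int h2 = mileage;           // int field: the number itself
        int h3 = model.hashCode();
        // Combine the field values as in the string hash function
        int h = h1 * MULTIPLIER + h2;
        h = h * MULTIPLIER + h3;
        return h;
    }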

16 Hash Functions
But wait… what about numeric overflow? We multiply a "random" integer value by a number…?
It does not really matter. Java int arithmetic wraps around silently on overflow, so as long as the algorithm is deterministic, overflow is not a problem. It just helps "scrambling" the value.
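A short sketch can make the overflow argument concrete. The class and method names are hypothetical, and the constant 7 is arbitrary. Ordinary int multiplication wraps around without any error, while Math.multiplyExact (a standard library method) reports the same overflow with an ArithmeticException.

```java
class OverflowDemo {
    // One step of a multiplier-based hash; may overflow, in which case
    // the result wraps around, but it is always the same for the same input.
    static int scramble(int h) {
        return 31 * h + 7;
    }

    // Shows that the multiplication really does overflow for large h.
    static boolean exactMultiplyThrows(int h) {
        try {
            Math.multiplyExact(31, h);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```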

17 Hash Functions
Common pitfalls:
– Remember to define a hashCode method
– If you forget, the hashCode implementation in Object is used
– It is based on the identity of the object (typically derived from its memory location), not on its instance fields
– Two distinct objects with the same values of instance fields will then in general produce different hash codes…
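The pitfall above is easy to reproduce. In this sketch, PlainCar is a hypothetical class that defines neither hashCode nor equals, so a HashSet treats two objects with identical field values as different elements.

```java
import java.util.HashSet;
import java.util.Set;

class PlainCar {
    final String regNo;

    PlainCar(String regNo) {
        this.regNo = regNo;
    }
    // No hashCode and no equals: the versions from Object are used.
}

class IdentityHashPitfall {
    static int countStored() {
        Set<PlainCar> cars = new HashSet<>();
        cars.add(new PlainCar("AB 12 345"));
        cars.add(new PlainCar("AB 12 345")); // same field value, different object
        return cars.size();
    }

    public static void main(String[] args) {
        System.out.println(countStored()); // 2
    }
}
```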

18 Hash Functions
Common pitfalls:
– The hashCode method must be "compatible" with your equals method
– If a.equals(b), it must hold that a.hashCode() == b.hashCode()
– If not, a HashSet may end up containing duplicates!
– The reverse condition is not required; two different objects may have the same hash code

19 Hash Functions
In general, you must remember to:
– Either define both the hashCode and the equals method
– Or define neither of them!
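A class that defines both methods consistently might look like the sketch below. The fields regNo, mileage and model follow the hashCode example from the slides; the constructor, the equals method and the CarSetDemo helper are assumptions added for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class Car {
    private final String regNo;
    private final int mileage;
    private final String model;

    Car(String regNo, int mileage, String model) {
        this.regNo = regNo;
        this.mileage = mileage;
        this.model = model;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Car)) return false;
        Car c = (Car) other;
        return regNo.equals(c.regNo) && mileage == c.mileage && model.equals(c.model);
    }

    // Uses exactly the fields that equals compares, so equal cars
    // always get equal hash codes.
    @Override
    public int hashCode() {
        final int MULTIPLIER = 31;
        int h = regNo.hashCode() * MULTIPLIER + mileage;
        return h * MULTIPLIER + model.hashCode();
    }
}

class CarSetDemo {
    static int countStored() {
        Set<Car> cars = new HashSet<>();
        cars.add(new Car("AB 12 345", 42000, "Model X"));
        cars.add(new Car("AB 12 345", 42000, "Model X")); // equal to the first car
        return cars.size();
    }
}
```

With both methods defined, the second add is recognised as a duplicate and the set holds one car.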

20 Handling collisions
Even with a good hash function, we will still experience collisions.
Collision: two different objects t1 and t2 are mapped to the same position, e.g. because they have the same hash code.
We will then try to store both objects at the same position in the array.
Now what…?

21 Handling collisions
What we store in each position of the array is not the objects themselves, but a linked list of objects.
Objects that are mapped to the same position h are stored in the linked list at position h.
With a good hash function and a sufficiently large array, the average length of the non-empty lists is less than 2.

22 Handling collisions
[Figure: an array with positions 0 to 4, where each position holds a linked list; Car 1 to Car 6 are distributed over these lists.]

23 Handling collisions
Basic operations (insert, delete, lookup) follow this structure:
– Calculate the hash code for the object
– Find the corresponding position in the array
Insert: insert the element at the end of the list.
Delete/Lookup: iterate through the list until the element is found, or the end of the list is reached.
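The structure of the three operations can be sketched as a small set class with one linked list per array position. This is an illustration under assumptions, not the implementation used by HashSet: the class name ChainedHashSet and its method names are invented, the capacity is fixed, and insert first performs a lookup so that the set never stores the same element twice.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet<T> {
    private final List<LinkedList<T>> buckets = new ArrayList<>();

    ChainedHashSet(int n) {
        for (int i = 0; i < n; i++) buckets.add(new LinkedList<>());
    }

    // Steps 1 and 2: calculate the hash code, find the position in the array.
    private LinkedList<T> bucketFor(Object t) {
        return buckets.get(Math.floorMod(t.hashCode(), buckets.size()));
    }

    // Lookup: iterate through the list until found or the end is reached.
    boolean lookup(Object t) {
        for (T element : bucketFor(t)) {
            if (element.equals(t)) return true;
        }
        return false;
    }

    // Insert: add at the end of the list, unless already present.
    boolean insert(T t) {
        if (lookup(t)) return false;
        bucketFor(t).addLast(t);
        return true;
    }

    // Delete: remove the first equal element from the list, if any.
    boolean delete(Object t) {
        return bucketFor(t).remove(t);
    }
}
```

Creating the set with a capacity of 1 forces every element into the same list, which is a simple way to check that collisions are handled.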

24 Handling collisions
Basic operations are thus not done in truly constant time. However, if a proper hash function is used, running time is constant in practice.
Use hash-based implementations unless special circumstances apply:
– It is hard to define a hash/equals function
– More functionality is required (e.g. the sorted iteration order that the tree-based implementations provide)

