1 HashTable

2 Dictionary
A collection of data that is accessed by “key” values
– The keys may be ordered or unordered
– Duplicate keys may or may not be allowed
Supports the following fundamental methods
– void put(Object key, Object data): inserts data into the dictionary using the specified key
– Object get(Object key): returns the data associated with the specified key. An error occurs if the specified key is not in the dictionary
– Object remove(Object key): removes the data associated with the specified key and returns the data. An error occurs if the specified key is not in the dictionary

3 Abstract Dictionary Example
Operation    Output    Dictionary
put(5, A)    None      ((5,A))
put(7, B)    None      ((5,A), (7,B))
put(2, C)    None      ((5,A), (7,B), (2,C))
get(A)       Error     ((5,A), (7,B), (2,C))
get(7)       B         ((5,A), (7,B), (2,C))
put(2, Q)    None      ((5,A), (7,B), (2,C), (2,Q))
get(2)       C or Q    ((5,A), (7,B), (2,C), (2,Q))
remove(Q)    Error     ((5,A), (7,B), (2,C), (2,Q))
remove(2)    C or Q    ((5,A), (7,B), (2,C)) or ((5,A), (7,B), (2,Q))
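The trace above can be replayed with a minimal list-backed sketch. The class name `ListDictionary` is hypothetical, duplicate keys are assumed to be allowed, and where the slide says "C or Q" this version happens to return the oldest matching entry.

```python
class ListDictionary:
    """Unordered dictionary stored as a list of (key, data) pairs.

    Assumption for illustration: duplicate keys are allowed, and get/remove
    act on the oldest matching pair.
    """

    def __init__(self):
        self.pairs = []

    def put(self, key, data):
        self.pairs.append((key, data))

    def get(self, key):
        for k, data in self.pairs:
            if k == key:
                return data
        raise KeyError(key)  # the "Error" rows of the trace

    def remove(self, key):
        for i, (k, data) in enumerate(self.pairs):
            if k == key:
                del self.pairs[i]
                return data
        raise KeyError(key)


d = ListDictionary()
d.put(5, "A")
d.put(7, "B")
d.put(2, "C")
seven = d.get(7)       # "B"
d.put(2, "Q")
removed = d.remove(2)  # "C" here; the slide allows either C or Q
```

Every operation scans the list, so each one costs O(N); this is the baseline that hashtables improve on.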

4 What is a Hashtable?
A hashtable is an unordered dictionary that uses an array to store data
– Each data element is associated with a key
– Each key is mapped into an array index using a hash function
– The key AND the data are then stored in the array
Hashtables are commonly used in the construction of compiler symbol tables.

5 Dictionaries: AVL Trees vs. Hashtables
Method    AVL (worst and average)    Hashtable (worst)    Hashtable (average)
put       O(log N)                   O(N)                 O(1)
get       O(log N)                   O(N)                 O(1)
remove    O(log N)                   O(N)                 O(1)
AVL average: Not Bad. Hashtable average: Astounding!

6 Simple Example
Insert data into the hashtable using characters as keys
– The hashtable is an array of “items”
– The hashtable’s capacity is 7
– The hash function must take a character as input and convert it into a number between 0 and 6
Use the following hash function:
– Let P be the position of the character in the English alphabet (starting with 1)
– The hash function is h(K) = P
– The function must be normalized in order to map into the appropriate range (0-6)
– The normalized hash function is h(K) % 7

7 Example
Operations (subscripts give each key’s alphabet position): put(B_2, Data_1), put(S_19, Data_2), put(J_10, Data_3), put(N_14, Data_4), put(X_24, Data_5), put(W_23, Data_6), put(B_2, Data_7), get(X_24), get(W_23)
Entries shown on the slide: (B_2, Data_1), (S_19, Data_2), (J_10, Data_3), (N_14, Data_4), (X_24, Data_5) ???
This is called a collision
Collisions are handled via a “collision resolution policy”
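To see where the collisions in this example come from, the sketch below computes each key's index, assuming the alphabet-position hash of the Simple Example and a capacity of 7. The helper names `h`, `index_for`, and `find_collisions` are made up for illustration.

```python
CAPACITY = 7  # table capacity from the Simple Example


def h(key):
    """Position of an uppercase letter in the English alphabet (A = 1)."""
    return ord(key) - ord("A") + 1


def index_for(key):
    """Normalized hash: map the alphabet position into the range 0..6."""
    return h(key) % CAPACITY


def find_collisions(keys):
    """Return (key, earlier_key, index) for each key whose slot is already taken."""
    owner = {}
    collisions = []
    for key in keys:
        i = index_for(key)
        if i in owner and owner[i] != key:
            collisions.append((key, owner[i], i))
        else:
            owner.setdefault(i, key)
    return collisions


indices = {key: index_for(key) for key in "BSJNXW"}
collisions = find_collisions("BSJNXWB")
```

Under these assumptions X (24 % 7 = 3) lands on the slot already holding J, and W (23 % 7 = 2) lands on the slot holding B. The final put(B, ...) reuses B's own slot, so it is not counted as a collision.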

8 From Keys to Indices
The mapping of keys to indices of a hash table is called a hash function
A hash function is usually the composition of two maps, a hash code map and a compression map
– An essential requirement of the hash function is to map equal keys to equal indices
– A “good” hash function minimizes the probability of collisions

9 Popular Hash-Code Maps
Integer cast: for numeric types with 32 bits or less, we can reinterpret the bits of the number as an int
Component sum: for numeric types with more than 32 bits (e.g., long and double), we can add the 32-bit components
Polynomial accumulation: for strings of a natural language, combine the character values (ASCII or Unicode) $a_0 a_1 \ldots a_{n-1}$ by viewing them as the coefficients of a polynomial: $a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$
– The polynomial is computed with Horner’s rule, ignoring overflows, at a fixed value x: $a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-2} + x\,a_{n-1}) \cdots ))$
– The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words
Why is the component-sum hash code bad for strings?
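A small sketch of polynomial accumulation with Horner's rule, next to a component-sum code for comparison. The function names are hypothetical, and masking to 32 bits is an assumption that stands in for "ignoring overflows".

```python
MASK_32 = 0xFFFFFFFF  # keep the low 32 bits, mimicking integer overflow


def polynomial_hash_code(s, x=33):
    """Evaluate a_0 + a_1*x + ... + a_{n-1}*x^(n-1) with Horner's rule."""
    code = 0
    for ch in reversed(s):  # start from a_{n-1} and work inwards
        code = (code * x + ord(ch)) & MASK_32
    return code


def component_sum_hash_code(s):
    """Add the character values; the order of the characters is lost."""
    return sum(ord(ch) for ch in s) & MASK_32


ab_code = polynomial_hash_code("ab")  # 97 + 98 * 33
same_sum = component_sum_hash_code("stop") == component_sum_hash_code("pots")
same_poly = polynomial_hash_code("stop") == polynomial_hash_code("pots")
```

This suggests one answer to the question on the slide: a component sum ignores character order, so anagrams such as "stop" and "pots" always collide, while the polynomial weights each position differently.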

10 Popular Compression Maps
Division: h(k) = |k| mod N
– the choice $N = 2^k$ is bad because not all the bits are taken into account
– the table size N is usually chosen as a prime number
– certain patterns in the hash codes are propagated
Multiply, Add, and Divide (MAD): h(k) = |ak + b| mod N
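The two compression maps can be sketched as follows. The function names, the sample hash codes, and the MAD constants a = 3 and b = 2 are arbitrary choices made for illustration.

```python
def division_compress(code, n):
    """Division method: |k| mod N."""
    return abs(code) % n


def mad_compress(code, n, a, b):
    """Multiply, Add, and Divide: |a*k + b| mod N."""
    return abs(a * code + b) % n


# Hash codes that differ only in their high bits (all multiples of 16).
codes = list(range(0, 160, 16))

# With N = 2^4 only the low four bits matter, so every code shares one index.
power_of_two_indices = {division_compress(c, 16) for c in codes}

# With the prime N = 17 the same codes spread over ten different indices.
prime_indices = {division_compress(c, 17) for c in codes}

mad_example = mad_compress(5, 7, a=3, b=2)  # |3*5 + 2| mod 7
```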

11 Details and Definitions
Load factor is the size of the table (the number of entries stored) divided by the capacity of the table
Various means of “collision resolution” can be used. The collision resolution policy determines what is done when two keys map to the same array index.
– Open Addressing: look for an open slot
– Separate Chaining: keep a list of key/value pairs in a slot

12 Example
Operations (subscripts give each key’s alphabet position): put(B_2, Data_1), put(S_19, Data_2), put(J_10, Data_3), put(N_14, Data_4), put(X_24, Data_5), put(W_23, Data_6), get(X_24), get(W_23)
Entries shown on the slide: (B_2, Data_1), (S_19, Data_2), (J_10, Data_3), (N_14, Data_4), (X_24, Data_5), (W_23, Data_7), (X_24, Data_5) ???
Open Addressing: When a collision occurs, probe for an empty slot. In this case, use linear probing (looking “down”) until an empty slot is found.

13 Open Addressing
Uses a “probe sequence” to look for an empty slot to use
– The first location examined is the “hash” address
– The sequence of locations examined when locating data is called the “probe sequence”
The probe sequence {s(0), s(1), s(2), …} can be described as follows: s(i) = norm(h(K) + p(i))
– where h(K) is the “hash function” mapping K to an integer
– p(i) is a “probing function” returning an offset for the i-th probe
– norm is the “normalizing function” (usually division modulo capacity)

14 Open Addressing
Linear probing
– use p(i) = i
– The probe sequence becomes {norm(h(k)), norm(h(k)+1), norm(h(k)+2), …}
Quadratic probing
– use $p(i) = i^2$
– The probe sequence becomes {norm(h(k)), norm(h(k)+1), norm(h(k)+4), …}
– Must be careful to allow full coverage of “empty” array slots
– A theorem states that this method will find an empty slot if the table size is prime and the table is not more than ½ full
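The general form s(i) = norm(h(K) + p(i)) can be sketched with the probing function passed in as a parameter. The names below are hypothetical, and norm is assumed to be the remainder modulo the capacity.

```python
def linear(i):
    """Linear probing offset: p(i) = i."""
    return i


def quadratic(i):
    """Quadratic probing offset: p(i) = i^2."""
    return i * i


def probe_sequence(hash_value, capacity, p, count):
    """First `count` probe locations s(i) = (h(K) + p(i)) mod capacity."""
    return [(hash_value + p(i)) % capacity for i in range(count)]


linear_probes = probe_sequence(3, 7, linear, 4)
quadratic_probes = probe_sequence(3, 7, quadratic, 4)
```

For a key hashing to 3 in a table of capacity 7, linear probing visits 3, 4, 5, 6, while quadratic probing visits 3, 4, 0, 5.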

15 Linear Probing
If the current location is used, try the next table location

linear_probing_insert(K)
    if (table is full) error
    probe = h(K)
    while (table[probe] occupied)
        probe = (probe + 1) mod M
    table[probe] = K

Lookups walk along table until the key or an empty slot is found
Uses less memory than chaining. (Don’t have to store all those links)
Slower than chaining. (May have to walk along table for a long way.)
Deletion is more complex. (Either mark the deleted slot or fill in the slot by shifting some elements down.)
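A runnable version of the pseudocode above might look like the following sketch. It assumes integer keys with h(K) = K mod M, and the class name and the sample keys are illustrative choices, not taken from the slides.

```python
class LinearProbingTable:
    """Open-addressing set of integer keys using linear probing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.size = 0

    def insert(self, key):
        """Store key and return the slot it ended up in."""
        if self.size == self.capacity:
            raise OverflowError("table is full")
        probe = key % self.capacity
        while self.slots[probe] is not None:
            if self.slots[probe] == key:
                return probe  # key is already stored
            probe = (probe + 1) % self.capacity
        self.slots[probe] = key
        self.size += 1
        return probe

    def contains(self, key):
        """Walk along the table until the key or an empty slot is found."""
        probe = key % self.capacity
        for _ in range(self.capacity):
            if self.slots[probe] is None:
                return False
            if self.slots[probe] == key:
                return True
            probe = (probe + 1) % self.capacity
        return False


table = LinearProbingTable(13)
# Sample keys chosen for illustration only.
placed = [table.insert(k) for k in (18, 41, 22, 44, 59, 32, 31, 73)]
```

With capacity 13, the keys 44, 32, 31, and 73 all hit occupied slots and slide forward, which is the clustering that makes long walks possible.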

16 Linear Probing Example h(k) = k mod 13 Insert keys:

17 Linear Probing Example (cont.)

18-22 Linear probing
[Figure: keys hashed by h into an array of N bins. Successive slides probe bins h(key), (h(key) + 1) mod N, (h(key) + 2) mod N, (h(key) + 3) mod N, and (h(key) + 4) mod N.]

23-28 Quadratic probing
[Figure: keys hashed by h into an array of N bins, with N = 17 (prime) on the later slides. Successive slides probe bins h(key), (h(key) + 1) mod N, (h(key) + 4) mod N, (h(key) + 9) mod N, and later (h(key) + 121) mod N and (h(key) + 144) mod N.]

29 Quadratic probing
[Figure: quadratic probing from h(key) over N = 17 (prime) bins]
Theorem: If quadratic probing is used, and the table size is prime, then a new element can always be inserted if the table is at least half empty.

30 Quadratic probing
[Figure: quadratic probing from h(key) over N = 17 (prime) bins]
Theorem: If quadratic probing is used, and the table size is prime, then a new element can always be inserted if the table is at least half empty.
Application: Probing visited only 9 of the 17 bins, but if the table is at least half empty, not all of those 9 bins can be occupied, so we must be able to insert a new element in one of them.
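The count of 9 bins can be checked directly. The helper below is a hypothetical sketch that collects every bin reachable by (start + i^2) mod N; the capacity of 16 is added only as a non-prime contrast.

```python
def quadratic_bins(start, capacity):
    """Distinct bins visited by (start + i*i) mod capacity for i = 0 .. capacity-1."""
    return sorted({(start + i * i) % capacity for i in range(capacity)})


bins_prime = quadratic_bins(0, 17)        # prime capacity
bins_power_of_two = quadratic_bins(0, 16)  # non-prime capacity for contrast

# The first nine probes (i = 0 .. 8) already hit nine different bins.
first_nine = {(i * i) % 17 for i in range(9)}
```

With N = 17 the probes reach 9 bins, more than half the table, which is why an insert must succeed when the table is at least half empty. With N = 16 only 4 bins are ever reached, so an insert can fail in a nearly empty table.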

31 Collisions
Given N people in a room, what are the odds that at least two of them will have the same birthday?
– Table capacity of 365
– After N insertions what are the odds of at least one collision?
Who wants to be a Millionaire? Assume N = 23 (load factor is therefore 23/365 = 6.3%). What are the approximate odds that two of these people have the same birthday?
10%   25%   50%   75%   90%   99%

32 Collisions
Let Q(n) be the probability that when n people are in a room, no two of them share a birthday.
Let P(n) be the probability that when n people are in a room, at least two of them have the same birthday.
$P(n) = 1 - Q(n)$
Consider that:
– $Q(1) = 1$
– Q(2) is the probability Q(1) that there is no collision so far, times the probability that one more person does not “collide”
– $Q(2) = Q(1) \cdot 364/365$
– $Q(3) = Q(2) \cdot 363/365$
– $Q(4) = Q(3) \cdot 362/365$
– …
$Q(n) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365-n+1}{365}$
$Q(n) = \frac{365!}{365^n \, (365-n)!}$
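The product form of Q(n) is easy to evaluate numerically. The function names below are hypothetical, and the capacity parameter defaults to the 365 days used on the slide.

```python
def no_collision_probability(n, capacity=365):
    """Q(n): probability that n insertions into `capacity` slots never collide."""
    q = 1.0
    for i in range(n):
        q *= (capacity - i) / capacity
    return q


def collision_probability(n, capacity=365):
    """P(n) = 1 - Q(n): probability of at least one collision."""
    return 1.0 - no_collision_probability(n, capacity)


p5 = collision_probability(5)    # about 2.7%
p23 = collision_probability(23)  # just over 50%
```

With only 23 people, a load factor of about 6.3%, the chance of a shared birthday is already slightly above one half.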

33 Collisions
[Table: number of people (N) vs. odds of a collision; legible entry: N = 5, 2.7%]
Collisions are more frequent than you might expect, even for low load factors!

34 Hashcodes and table size
Hashcodes should be fast/easy to compute
Keys should evenly distribute across the table
Hashtable capacities are usually kept at prime values to avoid problems with probe sequences
– Consider inserting into the table below using quadratic probing and a key object that hashes to index

35 We need to have a little talk
How to remove an item from a hashtable that uses open addressing?
Consider a table of size 11 with the following sequence of operations using h(K) = K % 11 and linear probing (p(i) = i)
– put(36, D1)
– put(23, D2)
– put(4, D3)
– put(46, D4)
– put(1, D5)
– remove(23)
– remove(36)
– get(1)

36 Removal
If an item is removed from the table, later gets on other items can fail: a get stops probing at the first empty slot, so it never reaches an item that was stored past the removed one.
Fix the problem by using a “tombstone” marker to indicate that while the item has been removed from the array slot, the slot should be considered “occupied” for purposes of later gets.
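The sequence of operations from the previous slide can be replayed with a tombstone-based sketch. The class name `ProbingMap` and the `TOMBSTONE` sentinel are hypothetical; the table uses h(K) = K % 11 with linear probing, as on the slide.

```python
EMPTY = None
TOMBSTONE = object()  # marks a slot whose item was removed


class ProbingMap:
    """Open-addressing map with linear probing and tombstone deletion."""

    def __init__(self, capacity=11):
        self.capacity = capacity
        self.slots = [EMPTY] * capacity

    def _probes(self, key):
        start = key % self.capacity
        for i in range(self.capacity):
            yield (start + i) % self.capacity

    def _find(self, key):
        for i in self._probes(key):
            slot = self.slots[i]
            if slot is EMPTY:
                break  # a truly empty slot ends the search
            if slot is not TOMBSTONE and slot[0] == key:
                return i
        raise KeyError(key)

    def put(self, key, data):
        target = None
        for i in self._probes(key):
            slot = self.slots[i]
            if slot is EMPTY or slot is TOMBSTONE:
                if target is None:
                    target = i  # remember the first reusable slot
                if slot is EMPTY:
                    break
            elif slot[0] == key:
                target = i  # overwrite the existing entry
                break
        if target is None:
            raise OverflowError("table is full")
        self.slots[target] = (key, data)

    def get(self, key):
        return self.slots[self._find(key)][1]

    def remove(self, key):
        i = self._find(key)
        data = self.slots[i][1]
        self.slots[i] = TOMBSTONE  # keep the slot "occupied" for later gets
        return data


table = ProbingMap(11)
for key, data in [(36, "D1"), (23, "D2"), (4, "D3"), (46, "D4"), (1, "D5")]:
    table.put(key, data)
table.remove(23)
table.remove(36)
found = table.get(1)
```

Key 1 hashes to slot 1 but is stored in slot 5 after probing past 23, 46, 36, and 4. Without tombstones, removing 23 would leave slot 1 empty and get(1) would stop there and report the key as missing.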

37 Double Hashing
Another probing strategy is to use “double hashing”
The probe sequence becomes s(k,i) = norm(h(k) + i*h2(k))
The probe locations are determined by two hash functions, which is typically better than linear or quadratic probing.

38 Double Hashing Example
$h_1(K) = K \bmod 13$
$h_2(K) = 8 - (K \bmod 8)$
We want $h_2$ to be an offset to add
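A short sketch of these two functions in use. The helper names and the sample keys 18, 41, 22, and 44 are assumptions made for illustration; the slide's own key list is not part of this transcript.

```python
def h1(k):
    """Primary hash: the home bin."""
    return k % 13


def h2(k):
    """Secondary hash: a step size in the range 1..8, never zero."""
    return 8 - (k % 8)


def double_hash_probes(k, capacity=13, count=4):
    """First `count` probe locations (h1(k) + i*h2(k)) mod capacity."""
    return [(h1(k) + i * h2(k)) % capacity for i in range(count)]


def insert(table, k):
    """Place k in the first free bin of its probe sequence; return the bin."""
    for i in range(len(table)):
        slot = (h1(k) + i * h2(k)) % len(table)
        if table[slot] is None:
            table[slot] = k
            return slot
    raise OverflowError("no free bin found")


table = [None] * 13
placed = [insert(table, k) for k in (18, 41, 22, 44)]
probes_44 = double_hash_probes(44)
```

Key 44 shares home bin 5 with 18, but its step of 4 takes it to bin 9 (occupied by 22) and then to bin 0. Because h2 is never zero and the capacity 13 is prime, each key's probe sequence eventually visits every bin.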

39 Double Hashing Example (cont.)

40 Separate Chaining
A way to “avoid” collisions
Each array slot contains a list of data elements
The fundamental methods then become:
– PUT: hash into array and add to list
– GET: hash into array and search the list
– REMOVE: hash into array and remove from list
Java’s built-in HashMap and Hashtable classes use separate chaining

41 Chaining Example
Operations (subscripts give each key’s alphabet position): put(B_2, Data_1), put(S_19, Data_2), put(J_10, Data_3), put(N_14, Data_4), put(X_24, Data_5), put(W_23, Data_6), put(B_2, Data_7), get(X_24), get(W_23)
Entries shown on the slide: (B_2, Data_1), (S_19, Data_2), (J_10, Data_3), (N_14, Data_4), (X_24, Data_5) ???

42 Chaining Example
Operations (subscripts give each key’s alphabet position): put(B_2, Data_1), put(S_19, Data_2), put(J_10, Data_3), put(N_14, Data_4), put(X_24, Data_5), put(W_23, Data_6), put(B_2, Data_7), get(X_24), get(W_23)
Entries shown on the slide: (B_2, Data_1), (S_19, Data_2), (J_10, Data_3), (N_14, Data_4), (X_24, Data_5)
I’m so relieved!
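The chaining example can be replayed with the sketch below. It assumes the alphabet-position hash and capacity 7 from the Simple Example, and it assumes that a second put with the same key replaces the stored data; the class and function names are hypothetical.

```python
def alphabet_position(ch):
    """Position of a letter in the English alphabet (A = 1)."""
    return ord(ch.upper()) - ord("A") + 1


class ChainedHashTable:
    """Hashtable in which every array slot holds a list of (key, data) pairs."""

    def __init__(self, capacity=7, hash_function=alphabet_position):
        self.hash_function = hash_function
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        return self.buckets[self.hash_function(key) % len(self.buckets)]

    def put(self, key, data):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, data)  # assumption: same key replaces data
                return
        bucket.append((key, data))

    def get(self, key):
        for k, data in self._bucket(key):
            if k == key:
                return data
        raise KeyError(key)

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, data) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return data
        raise KeyError(key)


table = ChainedHashTable()
for key, data in [("B", "Data1"), ("S", "Data2"), ("J", "Data3"), ("N", "Data4"),
                  ("X", "Data5"), ("W", "Data6"), ("B", "Data7")]:
    table.put(key, data)
```

J and X now share the list in slot 3 and B and W share the list in slot 2, so both get(X) and get(W) succeed without any probing.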

43 Theoretical Results
The load factor is N/M: the average number of keys per array index
Analysis is probabilistic, rather than worst-case
[Table: expected number of probes, with columns “found” and “not found”]

44 Expected Number of Probes vs. Load Factor

45 Summary
Dictionaries may be ordered or unordered
– Unordered can be implemented with
  lists (array-based or linked)
  hashtables (best solution)
– Ordered can be implemented with
  lists (array-based or linked)
  trees (AVL (best solution), splay, BST)