Hashing as a Dictionary Implementation

Hashing as a Dictionary Implementation Chapter 13

Chapter Contents What is Hashing? Hash Functions: Computing Hash Codes; Compressing a Hash Code into an Index for the Hash Table. Resolving Collisions: Open Addressing with Linear Probing; Open Addressing with Quadratic Probing; Open Addressing with Double Hashing; A Potential Problem with Open Addressing; Separate Chaining.

Chapter Contents (ctd.) Efficiency: The Load Factor; The Cost of Open Addressing; The Cost of Separate Chaining. Rehashing. Comparing Schemes for Collision Resolution. A Dictionary Implementation that Uses Hashing: Entries in the Hash Table; Data Fields and Constructors; The Methods getValue, remove, and add; Iterators. Java Class Library: The Class HashMap.

What is Hashing? Hashing is a technique that determines an index, or location, for the storage of an item in a data structure. The hash function receives the search key and returns the index of an element in an array called the hash table. This index is known as the hash index. A perfect hash function maps each search key into a different integer suitable as an index to the hash table.

What is Hashing? A hash function indexes its hash table.

What is Hashing? Suppose a small town needs only 700 telephone numbers. If we used the last four digits of a phone number as the index, most of a 10,000-location hash table would be unused, so we want a smaller hash table with only 700 locations.
Algorithm getHashIndex(phoneNumber)
// Returns an index to an array of tableSize locations
i = last four digits of phoneNumber
return i % tableSize
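The pseudocode above can be sketched in Java as follows. This is a minimal sketch, assuming phone numbers are strings in the form "555-1214"; the class name PhoneHash and the sample numbers are hypothetical. With 700 locations, two numbers whose last four digits differ by a multiple of 700 land at the same index, which previews the collisions discussed next.

```java
// Sketch of getHashIndex for phone numbers. Assumptions (not from the
// slides): numbers are strings like "555-1214"; PhoneHash is a made-up name.
class PhoneHash {
    static int getHashIndex(String phoneNumber, int tableSize) {
        // i = last four digits of the phone number
        int i = Integer.parseInt(phoneNumber.substring(phoneNumber.length() - 4));
        return i % tableSize; // compress into 0 .. tableSize - 1
    }

    public static void main(String[] args) {
        int tableSize = 700;
        System.out.println(getHashIndex("555-1214", tableSize)); // 514
        System.out.println(getHashIndex("555-1914", tableSize)); // 514: a collision
    }
}
```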

What is Hashing? A hash function works in two steps: it converts the search key into an integer called the hash code, and then it compresses the hash code into the range of indices for the hash table. Typical hash functions are not perfect: they can map more than one search key into a single index. This is known as a collision.

What is Hashing? A collision caused by the hash function h.

Hash Functions General characteristics of a good hash function: it minimizes collisions, distributes entries uniformly throughout the hash table, and is fast to compute.

Computing Hash Codes The hashCode method inherited from Object returns an int value that is typically based on the invoking object's memory address, so equal but distinct objects will have different hash codes. We will therefore override hashCode. Guidelines: If a class overrides the method equals, it should override hashCode. If the method equals considers two objects equal, hashCode must return the same value for both objects. If an object invokes hashCode more than once during the execution of a program, and the object's data has not changed in the meantime, it must return the same hash code each time.
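A sketch of a search-key class that follows these guidelines. The class Name and its fields are hypothetical; hashCode uses java.util.Objects.hash over exactly the fields that equals compares, so two equal objects always get the same hash code.

```java
import java.util.Objects;

// Hypothetical search-key class that overrides equals and hashCode together.
final class Name {
    private final String first;
    private final String last;

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Name)) return false;
        Name that = (Name) other;
        return first.equals(that.first) && last.equals(that.last);
    }

    // Computed only from the fields that equals compares, so equal Names
    // produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(first, last);
    }

    public static void main(String[] args) {
        Name a = new Name("Ada", "Lovelace");
        Name b = new Name("Ada", "Lovelace");             // distinct object, equal data
        System.out.println(a != b);                       // true: different objects
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```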

Computing Hash Codes Search keys are often strings. There are two typical ways to compute the hash code for a string s. The first sums the Unicode values of its letters (or, for example, assigns the values 1 through 26 to "A" through "Z" and sums those). See any problem? Anagrams such as KSW and WSK get the same hash code. A better approach multiplies the Unicode value of each letter by a factor based on the letter's position in the string. Hash code for a primitive type: if the key is an int, use the key itself; if it is a narrower integer type (byte, short, or char), cast it to int. If the key contains more than 32 bits, such as a long, casting it to int loses its high-order 32 bits. What should we do? Manipulate the key's internal binary representation: divide it into pieces and combine them by folding, for example (int) (key ^ (key >> 32)), where ^ is exclusive-or, >> shifts to the right, and << shifts to the left.
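The two string hash codes and the folding step can be sketched as below. The method names sumHash, polyHash, and foldLong are hypothetical. polyHash weights each character by a power of 31 using Horner's method, which is the formula documented for java.lang.String.hashCode; foldLong uses the unsigned shift >>> as java.lang.Long.hashCode does, which yields the same int as >> here because the cast keeps only the low-order 32 bits.

```java
// Sketch of the string hash codes described above, plus folding a long.
// Method names are hypothetical (not from the slides).
class HashCodes {
    // Sum of the characters' Unicode values: anagrams always collide.
    static int sumHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Weight each character by its position (Horner's method, factor 31):
    // h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Folding: XOR the high-order 32 bits into the low-order 32 bits.
    static int foldLong(long key) {
        return (int) (key ^ (key >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(sumHash("KSW") == sumHash("WSK"));    // true: collision
        System.out.println(polyHash("KSW") == polyHash("WSK"));  // false
        System.out.println(polyHash("KSW") == "KSW".hashCode()); // true
        System.out.println((int) (1L << 32));                    // 0: high bits lost
        System.out.println(foldLong(1L << 32));                  // 1: high bits folded in
    }
}
```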

Compressing a Hash Code We must compress the hash code so that it fits into the range of indices for the hash table. A typical method for a code c is to compute c modulo n, that is, c % n, where n is the size of the table. The index is then between 0 and n - 1. If n is even, c % n has the same parity as c, so even hash codes land only at even indices and odd codes only at odd indices. The size of a hash table should therefore be a prime number n greater than 2, and hence odd. Then, when you compress a positive hash code c into an index for the table by computing c % n, the indices tend to be distributed uniformly between 0 and n - 1.
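A small experiment shows the parity problem. This sketch assumes hypothetical data in which every hash code is even (0, 2, ..., 198); the class name Compression is also hypothetical. With an even table size only half of the indices are ever used, while the prime size 11 reaches all of them.

```java
import java.util.Set;
import java.util.TreeSet;

// Compares which indices c % n produces for even hash codes when n is
// even (10) versus prime (11). Codes are assumed non-negative.
class Compression {
    static Set<Integer> indicesUsed(int[] codes, int n) {
        Set<Integer> used = new TreeSet<>();
        for (int c : codes) {
            used.add(c % n);
        }
        return used;
    }

    public static void main(String[] args) {
        int[] evenCodes = new int[100];
        for (int i = 0; i < evenCodes.length; i++) {
            evenCodes[i] = 2 * i; // 0, 2, 4, ..., 198
        }
        System.out.println(indicesUsed(evenCodes, 10));        // [0, 2, 4, 6, 8]
        System.out.println(indicesUsed(evenCodes, 11).size()); // 11
    }
}
```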

Compressing a Hash Code
    private int getHashIndex(K key)
    {
        int hashIndex = key.hashCode() % hashTable.length;
        if (hashIndex < 0)
            hashIndex = hashIndex + hashTable.length;
        return hashIndex;
    }
One final detail: if the hash code c is negative, c % n lies between 1 - n and 0. If the result is negative, add n to it so that it lies between 1 and n - 1; this is what the if statement does.
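To experiment with getHashIndex outside a dictionary class, it can be written as a static helper. This is a sketch assuming the table length is passed as a parameter instead of being read from a hashTable field; HashIndex is a hypothetical name. The standard method Math.floorMod performs the same adjustment in one call.

```java
// Static version of getHashIndex; tableLength replaces hashTable.length.
class HashIndex {
    static int getHashIndex(Object key, int tableLength) {
        int hashIndex = key.hashCode() % tableLength; // between 1 - n and n - 1
        if (hashIndex < 0) {
            hashIndex = hashIndex + tableLength;      // now between 1 and n - 1
        }
        return hashIndex;
    }

    public static void main(String[] args) {
        // Integer.hashCode() returns the int value itself, so -3 is a negative code.
        System.out.println(getHashIndex(-3, 11));  // 8
        System.out.println(Math.floorMod(-3, 11)); // 8: same result
        System.out.println(getHashIndex(25, 11));  // 3
    }
}
```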

Resolving Collisions Options when the hash function returns a location that is already used in the table: use another location in the table (open addressing), or change the structure of the hash table so that each array location can represent multiple values (separate chaining).

Open Addressing with Linear Probing An open addressing scheme locates an alternate location in the hash table; the new location must be open, that is, available. Linear probing: if a collision occurs at hashTable[k], look successively at locations k + 1, k + 2, …, wrapping around to the beginning of the table when the end is reached. In other words, examine consecutive locations, beginning at the original hash index, to find the next available one.
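The probe sequence itself is easy to generate. This minimal sketch (LinearProbe and probeSequence are hypothetical names) assumes the indices wrap around modulo the table size, so after the last location the search continues at index 0.

```java
import java.util.Arrays;

// Linear probe sequence k, k + 1, k + 2, ... taken modulo the table size.
class LinearProbe {
    static int[] probeSequence(int k, int tableSize, int length) {
        int[] seq = new int[length];
        for (int i = 0; i < length; i++) {
            seq[i] = (k + i) % tableSize;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(5, 7, 4))); // [5, 6, 0, 1]
    }
}
```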

Open Addressing with Linear Probing The effect of linear probing after adding four entries whose search keys hash to the same index. How should retrievals work?

Open Addressing with Linear Probing A revision of the hash table when linear probing resolves collisions; each entry contains a search key and its associated value

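Removals A hash table if remove had used null to remove entries. What happens if we try to retrieve the entry whose search key is 555-2072? The search begins at h(555-2072), but if a location earlier in its probe sequence was set to null, the search stops there and wrongly concludes that the entry is not in the table.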

Removals We need to distinguish among three kinds of locations in the hash table. Occupied: the location references an entry in the dictionary. Empty: the location contains null and always did. Available: the location's entry was removed from the dictionary, and the location is now available for use.

Open Addressing with Linear Probing A linear probe sequence (a) after adding an entry; (b) after removing two entries;

Open Addressing with Linear Probing A linear probe sequence (c) after a search; (d) during the search while adding an entry; (e) after an addition to a formerly occupied location.

Searches that Dictionary Operations Require To retrieve an entry: search the probe sequence for the key, examining the entries that are present and ignoring locations in the available state; stop the search when the key is found or a null location is reached. To remove an entry: search the probe sequence as for retrieval; if the key is found, mark its location as available. To add an entry: search the probe sequence as for retrieval, noting the first available location; if the key is not found, place the new entry in that available location (or, if none was seen, in the null location that ended the search).
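The three rules above can be combined into a small open-addressing dictionary. This is a minimal sketch, not the chapter's implementation: the class LinearProbingDictionary, the shared marker entry that represents the available state, the fixed table size (no rehashing), and the choice to replace the value when add finds an existing key are all assumptions.

```java
// Open addressing with linear probing. null = empty location; the
// availableMarker entry = location whose entry was removed (hypothetical design).
class LinearProbingDictionary<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final Entry<K, V> availableMarker = new Entry<>(null, null);
    private final Entry<K, V>[] table;

    @SuppressWarnings("unchecked")
    LinearProbingDictionary(int tableSize) {
        table = (Entry<K, V>[]) new Entry[tableSize];
    }

    private int getHashIndex(K key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Index of key, or -1. Skips available locations; stops at null.
    private int locate(K key) {
        int index = getHashIndex(key);
        for (int probes = 0; probes < table.length; probes++) {
            Entry<K, V> e = table[index];
            if (e == null) return -1;                  // empty: key is absent
            if (e != availableMarker && e.key.equals(key)) return index;
            index = (index + 1) % table.length;        // next location
        }
        return -1;                                     // examined every location
    }

    V getValue(K key) {
        int index = locate(key);
        return index < 0 ? null : table[index].value;
    }

    V remove(K key) {
        int index = locate(key);
        if (index < 0) return null;
        V old = table[index].value;
        table[index] = availableMarker;                // mark available, not null
        return old;
    }

    // Adds a new entry or replaces an existing value; returns the old value or null.
    V add(K key, V value) {
        int index = getHashIndex(key);
        int firstAvailable = -1;
        for (int probes = 0; probes < table.length; probes++) {
            Entry<K, V> e = table[index];
            if (e == null) break;                      // end of the probe sequence
            if (e == availableMarker) {
                if (firstAvailable < 0) firstAvailable = index; // note first available
            } else if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
            index = (index + 1) % table.length;
        }
        int target = firstAvailable >= 0 ? firstAvailable : index;
        if (table[target] != null && table[target] != availableMarker) {
            throw new IllegalStateException("hash table is full");
        }
        table[target] = new Entry<>(key, value);
        return null;
    }

    public static void main(String[] args) {
        // Integer keys hash to themselves, so 3, 10 and 17 all map to index 3 of 7.
        LinearProbingDictionary<Integer, String> d = new LinearProbingDictionary<>(7);
        d.add(3, "a");
        d.add(10, "b");
        d.add(17, "c");
        d.remove(10);                       // index 4 becomes available, not null
        System.out.println(d.getValue(17)); // c: the search skips the available slot
    }
}
```

Because remove marks the location instead of setting it to null, the later search for 17 continues past index 4 rather than stopping early.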

Linear probing causes primary clustering Linear probing is apt to cause primary clustering. Each cluster is a group of consecutive, occupied locations in the hash table. During an addition, any collision within a cluster causes the cluster to get larger. We can avoid primary clustering by using quadratic probing.

Open Addressing, Quadratic Probing Change the probe sequence: given the original hash index k, probe to $k + 1, k + 2^2, k + 3^2, \ldots, k + j^2$ (each taken modulo the table size). This separates the entries in the probe sequence and so avoids primary clustering. But it can lead to secondary clustering, since entries that collide with an existing entry use the same probe sequence.

Open Addressing, Quadratic Probing A probe sequence of length 5 using quadratic probing. Quadratic probing avoids primary clustering but can lead to secondary clustering.
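A sketch of a quadratic probe sequence of length 5. The starting index 2 and table size 11 are arbitrary illustration values, not the ones in the slide's figure, and QuadraticProbe is a hypothetical name. The code is meant for short sequences, since j * j can overflow an int for very large j.

```java
import java.util.Arrays;

// Quadratic probe sequence k, k + 1, k + 4, k + 9, ... modulo the table size.
class QuadraticProbe {
    static int[] probeSequence(int k, int tableSize, int length) {
        int[] seq = new int[length];
        for (int j = 0; j < length; j++) {
            seq[j] = (k + j * j) % tableSize;
        }
        return seq;
    }

    public static void main(String[] args) {
        // Every key whose hash index is 2 follows this same sequence,
        // which is why secondary clustering can occur.
        System.out.println(Arrays.toString(probeSequence(2, 11, 5))); // [2, 3, 6, 0, 7]
    }
}
```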

Open Addressing with Double Hashing Double hashing resolves a collision by examining locations at the original hash index plus multiples of an increment determined by a second hash function. The second hash function must differ from the first, depend on the search key, and return a nonzero value. If the table size is prime and the increment is smaller than the table size, the probe sequence reaches every location in the hash table. Double hashing avoids both primary and secondary clustering.

Open Addressing with Double Hashing With h1(key) = key mod 7 and h2(key) = 5 - (key mod 5), we get h1(16) = 2 and h2(16) = 4. The first three locations in the probe sequence generated by double hashing for the search key 16 are therefore 2, 6, and 3.
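The example functions h1 and h2 translate directly into code. This sketch assumes a table of size 7 (implied by key mod 7) and non-negative integer keys; DoubleHashing and probeSequence are hypothetical names. It also checks the claim about reaching every location: since 7 is prime and the step lies between 1 and 5, seven probes visit seven distinct indices.

```java
import java.util.Arrays;

// Double hashing with h1(key) = key mod 7 and h2(key) = 5 - (key mod 5).
class DoubleHashing {
    static final int TABLE_SIZE = 7;

    static int h1(int key) { return key % TABLE_SIZE; } // starting index

    static int h2(int key) { return 5 - key % 5; }      // step: always 1..5, never 0

    static int[] probeSequence(int key, int length) {
        int[] seq = new int[length];
        int index = h1(key);
        int step = h2(key);
        for (int i = 0; i < length; i++) {
            seq[i] = index;
            index = (index + step) % TABLE_SIZE;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(16, 3))); // [2, 6, 3]
        System.out.println(Arrays.toString(probeSequence(16, 7))); // [2, 6, 3, 0, 4, 1, 5]
    }
}
```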

Potential Problem with Open Addressing Frequent additions and removals can cause every location in the hash table to reference either a current entry or a former entry, so that no location contains null. If this happens, our approach to searching a probe sequence will not work well: an unsuccessful search normally ends at a null location, but in this case it has to examine every location.

Separate Chaining Alter the structure of the hash table so that each location can represent multiple values. Each location is called a bucket. A bucket can be a list, a sorted list, a chain of linked nodes, an array, or a vector.

Separate Chaining A hash table for use with separate chaining; each bucket is a chain of linked nodes.

Separate Chaining Where a new entry is inserted into a linked bucket when the integer search keys are (a) duplicate and unsorted;

Separate Chaining Where a new entry is inserted into a linked bucket when the integer search keys are (b) distinct and unsorted;

Separate Chaining Where a new entry is inserted into a linked bucket when the integer search keys are (c) distinct and sorted.
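A minimal sketch of separate chaining for case (b), distinct and unsorted keys: each bucket is a chain of linked nodes, and a new entry goes at the end of its chain after a search shows the key is not already present. The class ChainedDictionary, its method names, and replacing the value of an existing key are assumptions for illustration.

```java
// Separate chaining: each table location (bucket) heads a chain of linked nodes.
class ChainedDictionary<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedDictionary(int tableSize) {
        buckets = (Node<K, V>[]) new Node[tableSize];
    }

    private int getHashIndex(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Adds a new entry or replaces an existing value; returns the old value or null.
    V add(K key, V value) {
        int index = getHashIndex(key);
        Node<K, V> node = buckets[index];
        if (node == null) {                       // empty bucket: start a chain
            buckets[index] = new Node<>(key, value);
            return null;
        }
        while (true) {
            if (node.key.equals(key)) {           // key already present: replace value
                V old = node.value;
                node.value = value;
                return old;
            }
            if (node.next == null) break;
            node = node.next;
        }
        node.next = new Node<>(key, value);       // distinct key: append at the end
        return null;
    }

    V getValue(K key) {
        for (Node<K, V> n = buckets[getHashIndex(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    // Unlinks the node, so no "available" marker is needed.
    V remove(K key) {
        int index = getHashIndex(key);
        Node<K, V> prev = null;
        for (Node<K, V> n = buckets[index]; n != null; prev = n, n = n.next) {
            if (n.key.equals(key)) {
                if (prev == null) buckets[index] = n.next;
                else prev.next = n.next;
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedDictionary<Integer, String> d = new ChainedDictionary<>(7);
        d.add(3, "a");
        d.add(10, "b");
        d.add(17, "c");                     // all three share bucket 3
        d.remove(10);
        System.out.println(d.getValue(17)); // c
    }
}
```

Unlike open addressing, removing an entry here simply unlinks its node, so later searches in the same bucket are unaffected.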

A Dictionary Implementation That Uses Hashing A hash table and one of its entry objects

Java Class Library: The Class HashMap HashMap assumes that search-key objects belong to a class that overrides the methods hashCode and equals. Its hash table is a collection of buckets. Constructors: public HashMap(), public HashMap(int initialSize), public HashMap(int initialSize, float maxLoadFactor), public HashMap(Map table).
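A short usage sketch of java.util.HashMap with three of the constructors listed above. The phone-directory data and the names HashMapDemo and buildDirectory are hypothetical, and the initial capacity 101 and load factor 0.75 are illustrative values. String keys work because String overrides hashCode and equals.

```java
import java.util.HashMap;
import java.util.Map;

// Usage sketch of HashMap; data and names are hypothetical.
class HashMapDemo {
    static Map<String, String> buildDirectory() {
        // Initial capacity 101 and maximum load factor 0.75 (illustrative values)
        Map<String, String> phoneBook = new HashMap<>(101, 0.75f);
        phoneBook.put("555-1214", "Ada");
        phoneBook.put("555-2072", "Grace");
        phoneBook.remove("555-1214");
        // The Map constructor copies the remaining entries into a new HashMap
        return new HashMap<>(phoneBook);
    }

    public static void main(String[] args) {
        Map<String, String> directory = buildDirectory();
        System.out.println(directory.get("555-2072"));               // Grace
        System.out.println(directory.containsKey("555-1214"));       // false
        System.out.println(new HashMap<String, String>().isEmpty()); // true: default constructor
    }
}
```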