H ASH TABLES. H ASHING Key indexed arrays had perfect search performance O(1) But required a dense range of index values Otherwise memory is wasted Hashing.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables CSC220 Winter What is strength of b-tree? Can we make an array to be as fast search and insert as B-tree and LL?
Preliminaries Advantages –Hash tables can insert(), remove(), and find() with complexity close to O(1). –Relatively easy to program Disadvantages –There.
Hash Tables.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CSCE 3400 Data Structures & Algorithm Analysis
Lecture 11 oct 6 Goals: hashing hash functions chaining closed hashing application of hashing.
Hashing as a Dictionary Implementation
Hashing21 Hashing II: The leftovers. hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions.
Hashing CS 3358 Data Structures.
1 Hash Tables Gordon College CS Hash Tables Recall order of magnitude of searches –Linear search O(n) –Binary search O(log 2 n) –Balanced binary.
1 Hashing (Walls & Mirrors - end of Chapter 12). 2 I hate quotations. Tell me what you know. – Ralph Waldo Emerson.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Hashing. Hashing as a Data Structure Performs operations in O(c) –Insert –Delete –Find Is not suitable for –FindMin –FindMax –Sort or output as sorted.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
Hash Table March COP 3502, UCF.
1 Joe Meehean 1.  BST easy to implement average-case times O(LogN) worst-case times O(N)  AVL Trees harder to implement worst case times O(LogN)  Can.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
1 Hash table. 2 A basic problem We have to store some records and perform the following:  add new record  delete record  search a record by key Find.
Comp 335 File Structures Hashing.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hash Tables CSIT 402 Data Structures II. Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis.
CPSC 252 Hashing Page 1 Hashing We have already seen that we can search for a key item in an array using either linear or binary search. It would be better.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
1 Data Structures CSCI 132, Spring 2014 Lecture 33 Hash Tables.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
Hashing (part 2) CSE 2011 Winter March 2018.
Data Structures Using C++ 2E
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Hash Tables.
Data Structures and Algorithms
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Pseudorandom number, Universal Hashing, Chaining and Linear-Probing
Data Structures and Algorithm Analysis Hashing
Lecture-Hashing.
Presentation transcript:

H ASH TABLES

H ASHING Key indexed arrays had perfect search performance O(1) But required a dense range of index values Otherwise memory is wasted Hashing allows us to approximate this performance but Allows arbitrary types of keys Map(hash) keys into compact range of index values Items are stored in an array accessed by this index value Allows us to approach the ideal of title[hashfunction(“COMP1927”)] = “Computing 2”;

H ASHING A hash table implementation consists of two main parts: (1) A hash function to map each key to an index in the hash table (array of size N). Key->[0..N-1] (2) A collision resolution so that if hash table at the calculated index is already occupied with an item with a different key, an alternative slot can be found Collisions are inevitable when dom(Key) > N

H ASH FUNCTIONS Requirements: if the table has TableSize entries, we need to hash keys to [0..TableSize-1] the hash function should be cheap to compute the hash function should ideally map the keys evenly to the index values - that is, every index should be generated with approximately the same probability this is easy if the keys have a random distribution, but requires some thought otherwise Simple method to hash keys: modular hash function compute i%TableSize choose TableSize to be prime

H ASHING S TRING K EYS Consider this potential hash function: we can turn a string into an Integer value: int hash (char *v, int TableSize) { int h = 0, i = 0; while (v[i] != ‘\0’) { h = h + v[i]; i++; } return h % TableSize; } int hash (char *v, int TableSize) { int h = 0, i = 0; while (v[i] != ‘\0’) { h = h + v[i]; i++; } return h % TableSize; } What is wrong with this function? How can it be improved?

H ASHING S TRING K EYS A better hash function: int hash (char *v, int TableSize) { int h = 0, i = 0; int a = 127; //prime number while (v[i] != ‘\0’) { h = (a*h + v[i]) % TableSize; i++; } return h; } int hash (char *v, int TableSize) { int h = 0, i = 0; int a = 127; //prime number while (v[i] != ‘\0’) { h = (a*h + v[i]) % TableSize; i++; } return h; }

H ASHING S TRING K EYS Universal hash function for string keys: Uses all of value in hash, with suitable randomization int hashU (char *v, int TableSize) { int h = 0, i = 0; int a = 31415, b = 27183; while (v[i] != ‘\0’) { h = (a*h + v[i]) % TableSize; a = a*b% (TableSize-1); i++; } return h; } int hashU (char *v, int TableSize) { int h = 0, i = 0; int a = 31415, b = 27183; while (v[i] != ‘\0’) { h = (a*h + v[i]) % TableSize; a = a*b% (TableSize-1); i++; } return h; }

C OLLISION R ESOLUTION : S EPARATE C HAINING What do we do if two entries have the same array index? maintain a list of entries per array index (separate chaining) use the next entry in the hash table (linear probing) use a key dependent increment for probing (double hashing)

S EPARATE C HAINING Can be viewed as a generalisation of sequential search Reduces number of comparisons by a factor of TableSize See lecture code for implementation “hi”“ci”“li” “ra” “as” “is” “fr”

S EPARATE C HAINING Cost Analysis: N array entries(slots), M stored items Best case: all lists are the same length M/N Worst case: one list of size M all the rest are size 0 If good hash and M<= N, cost is 1 If good hash and M> N, cost is M/N Ratio of items/slots is called load α = M/N

L INEAR P ROBING Resolve collision in the primary table: if the table is not close to be full, there are many empty slots, even if we have a collision in case of a collision, simply use the next available slot this is an instance of open-addressing hashing

L INEAR P ROBING : D ELETION Need to delete and reinsert all values after the index we delete at, till we reach a slot with no value

L INEAR P ROBING Cost Analysis: Cost to reach location where item is mapped is O(1), but then we may have to scan along to find it in the worst case this could be O(M) affected by the load factor M/N Problems When the table is starting to fill up, we can get clusters Inserting an item with one hash value can increase access time for items with other hash values Linear probing can become slow for near full hash tables

D OUBLE H ASHING To avoid clustering, we use a second hash function to determine a fixed increment to check for empty slots in the table: index determined by first hash function increment determined by second hash function

D OUBLE H ASHING Requirements for second hashing function: must never evaluate to zero increment should be relatively prime to the hash table size This ensures all elements are visited To generate relatively prime set table size to prime e.g. N=127 hash2() in range [1..N1] where N1 < 127 and prime Can be significantly faster than linear probing especially if the table is heavily loaded.

D YNAMIC H ASH T ABLES All the hash table methods we looked at so far have the same problem once the hash table gets full, the search and insertion times increases due to collisions Solution: grow table dynamically this involves copying of table content, amortised over time by reduction of collisions

E VALUATION Choice of the hash function can significantly affect the performance of the implementation, in particular when the hash table starts to fill up Choice of collision methods influences performance as well linear probing (fastest, given table is sufficiently big) double hashing (makes most efficient use of memory, req. 2nd hash function, fastest if table load is higher) separate chaining (easiest to implement. table load can be more than 1 but performance degrades)