Hashing: storing and retrieving an item by its key

Presentation transcript:

Hashing
Problem: storing and retrieving an item using its key (for example, ID number or name)
Linked list takes O(N) time
Binary search tree takes O(log N) time
Array list takes O(1) time
2110211 Intro. to Data Structures Chapter 5 Hashing Veera Muangsin, Dept. of Computer Engineering, Chulalongkorn University

Array
[Figure: array indexed by ID, last index labeled 99999, holding two records]
ID: 4112041, Name: Somsri, Faculty: Science
ID: 4163490, Name: Sompong, Faculty: Engineering
Problem: a lot of empty space

Hashing
[Figure: smaller array, last index labeled 9999, holding the same two records]
ID: 4112041, Name: Somsri, Faculty: Science
ID: 4163490, Name: Sompong, Faculty: Engineering
Map the key into some number between 0 and ArraySize-1

Hashing
Map the key into an array position using a "hash function"
ArrayIndex = hash(key)
Takes O(1) time to access an item
Much less empty space than using a normal array

Hash Function
Must return a valid array index.
Should ideally be a 1-to-1 mapping: if key1 != key2 then hash(key1) != hash(key2).
A collision occurs when two distinct keys hash to the same location in the array.
Should distribute the keys evenly: any key value k is equally likely to hash to any of the m array locations.

Simple Hash Function
ArrayIndex = key mod TableSize
TableSize should be a prime number for even distribution
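The rule ArrayIndex = key mod TableSize can be sketched in Java as follows. This is only an illustration: the class name ModHashDemo and the table size 10007 (a prime) are assumptions made for the example, and the two keys are the student IDs from the earlier slides.

```java
final class ModHashDemo {
    // Maps an integer key to an index in [0, tableSize).
    // Math.floorMod keeps the result non-negative even for negative keys.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 10007; // assumed prime table size, for illustration only
        System.out.println(hash(4112041, tableSize)); // prints 9171
        System.out.println(hash(4163490, tableSize)); // prints 578
    }
}
```

Both seven-digit IDs land inside a table of about ten thousand cells, instead of needing an array with one cell per possible ID.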

Another Hash Function
ArrayIndex = (k0 + 37 k1 + 37^2 k2 + . . . ) mod TableSize
Example: 3-character key
ArrayIndex = (k0 + 37 k1 + 37^2 k2) mod TableSize
ArrayIndex = (k0 + 37 * (k1 + 37 * k2)) mod TableSize

Hash Function
    public static int hash( String key, int tableSize )
    {
        int hashVal = 0;

        // Horner's rule: multiply by 37 and add the next character
        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key.charAt( i );

        hashVal %= tableSize;
        if( hashVal < 0 )   // overflow made the value negative
            hashVal += tableSize;

        return hashVal;
    }

Collision
When an element is inserted, if it hashes to the same value as an already inserted element, then we have a collision.
Collision resolution techniques:
  Separate Chaining
  Open Addressing: Linear Probing, Quadratic Probing, Double Hashing

Separate Chaining
[Figure: separate chaining table, last index labeled 999]

Separate Chaining
Load factor λ = number of elements / table size
Average length of a list = λ
A successful search costs about 1 + (λ/2) link traversals
Cost depends on λ

Separate Chaining: evenly distributed
[Figure: separate chaining table, last index labeled 999]

Separate Chaining: last digit is zero
[Figure: separate chaining table with cell labels 10, 20, ..., 999]
Solution: TableSize is prime
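A small experiment shows why a prime TableSize helps when all keys end in zero. This is a sketch under assumptions: the class name TableSizeDemo, the keys 10, 20, ..., 100, and the table sizes 10 and 11 are chosen only for illustration, and the hash function is assumed to be key mod TableSize.

```java
import java.util.HashSet;
import java.util.Set;

final class TableSizeDemo {
    // Counts how many different cells the keys occupy under key mod tableSize.
    // Assumes non-negative keys.
    static int distinctCells(int[] keys, int tableSize) {
        Set<Integer> cells = new HashSet<>();
        for (int key : keys) {
            cells.add(key % tableSize);
        }
        return cells.size();
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(distinctCells(keys, 10)); // prints 1: every key lands in cell 0
        System.out.println(distinctCells(keys, 11)); // prints 10: every key gets its own cell
    }
}
```

With TableSize 10 all ten keys pile into one list; with the prime 11 they spread over ten different cells.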

Open Addressing
No linked lists. All items are in the array.
If a collision occurs, alternative locations are tried until an empty cell is found:
try h0(x), h1(x), h2(x), ...
hi(x) = (hash(x) + f(i)) mod TableSize
f(i) is the collision resolution strategy
Requires a bigger table; λ should be below 0.5

Linear Probing
If a collision occurs, try the next cell sequentially
f(i) = i
hi(x) = (hash(x) + i) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1) mod TableSize, (hash(x) + 2) mod TableSize, (hash(x) + 3) mod TableSize, . . .

Linear Probing
Insert: 89, 18, 49, 58, 69
Cell:  0   1   2   3   4   5   6   7   8   9
Key:   49  58  69  -   -   -   -   -   18  89
89 is directly inserted into cell 9
18 is directly inserted into cell 8
49 has a collision at cell 9 and is finally put into cell 0
58 has collisions at cells 8, 9, 0 and is finally put into cell 1
69 has collisions at cells 9, 0, 1 and is finally put into cell 2
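The trace above can be reproduced with a few lines of Java. This is a minimal sketch, assuming hash(x) = x mod TableSize with TableSize = 10 and non-negative keys; the class name LinearProbingDemo and its insert method are hypothetical and are not part of the course code.

```java
import java.util.Arrays;

final class LinearProbingDemo {
    // Places key using h_i(x) = (x + i) mod tableSize.
    // Returns the cell used, or -1 if every cell is occupied.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        for (int i = 0; i < n; i++) {
            int pos = (key % n + i) % n;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            insert(table, key);
        }
        // prints [49, 58, 69, null, null, null, null, null, 18, 89]
        System.out.println(Arrays.toString(table));
    }
}
```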

Primary Clustering
Blocks of occupied cells (called clusters) form.
A collision occurs if a key is hashed to anywhere in a cluster.
Then there may be several attempts to resolve the collision before a free space is found.
The new data is added to the cluster.

Linear Probing
Problem: primary clustering
Normal deletion cannot be performed (some later find operations would fail because the chain of collisions that leads to the data is cut)
Use lazy deletion
Insertion cost = number of probes to find an empty cell = 1/(fraction of empty cells) = 1/(1 - λ)
This estimate ignores clustering; with primary clustering, an insertion under linear probing costs closer to (1 + 1/(1 - λ)^2)/2 probes.
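The reason for lazy deletion can be seen in a tiny linear probing table. The sketch below is an assumption-laden illustration, not the course implementation: the class LazyDeletionDemo, the fixed table size 10, hash(x) = x mod 10, and the hardRemove method (which empties the cell outright) are all hypothetical, and the table is assumed never to fill up.

```java
final class LazyDeletionDemo {
    private final Integer[] cells = new Integer[10];
    private final boolean[] deleted = new boolean[10];

    // Probes from key mod 10 and stops at the cell holding key or at the first empty cell.
    private int findPos(int key) {
        int pos = key % 10;
        while (cells[pos] != null && cells[pos] != key) {
            pos = (pos + 1) % 10;
        }
        return pos;
    }

    void insert(int key) {
        int pos = findPos(key);
        cells[pos] = key;
        deleted[pos] = false;
    }

    boolean contains(int key) {
        int pos = findPos(key);
        return cells[pos] != null && !deleted[pos];
    }

    // Lazy deletion: the cell stays occupied, so probe sequences passing through it survive.
    void lazyRemove(int key) {
        int pos = findPos(key);
        if (cells[pos] != null) {
            deleted[pos] = true;
        }
    }

    // Normal deletion: the cell becomes empty, which cuts the probe sequence.
    void hardRemove(int key) {
        cells[findPos(key)] = null;
    }

    public static void main(String[] args) {
        LazyDeletionDemo a = new LazyDeletionDemo();
        a.insert(89);
        a.insert(49);                            // collides at cell 9, goes to cell 0
        a.hardRemove(89);
        System.out.println(a.contains(49));      // prints false: the search stops at empty cell 9

        LazyDeletionDemo b = new LazyDeletionDemo();
        b.insert(89);
        b.insert(49);
        b.lazyRemove(89);
        System.out.println(b.contains(49));      // prints true
        System.out.println(b.contains(89));      // prints false
    }
}
```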

Quadratic Probing
Eliminates primary clustering
f(i) = i^2
hi(x) = (hash(x) + i^2) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1^2) mod TableSize, (hash(x) + 2^2) mod TableSize, (hash(x) + 3^2) mod TableSize, . . .
Table must be at most half full and table size must be prime, otherwise insertion may fail (always have a collision)

Quadratic Probing
Insert: 89, 18, 49, 58, 69
Cell:  0   1   2   3   4   5   6   7   8   9
Key:   49  -   58  69  -   -   -   -   18  89
Insert 89, try cell 9
Insert 18, try cell 8
Insert 49, try cells 9, 0
Insert 58, try cells 8, 9, 2
Insert 69, try cells 9, 0, 3

Quadratic Probing
Insert: 10, 20, 30, 40, 50, 60, 70
Cell:  0   1   2   3   4   5   6   7   8   9
Key:   10  20  -   -   30  60  50  -   -   40
Insert 10, try cell 0
Insert 20, try cells 0, 1
Insert 30, try cells 0, 1, 4
Insert 40, try cells 0, 1, 4, 9
Insert 50, try cells 0, 1, 4, 9, 6 (16)
Insert 60, try cells 0, 1, 4, 9, 6 (16), 5 (25)
Insert 70, try cells 0, 1, 4, 9, 6 (16), 5 (25), 6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121), 4 (144), 9 (169), 6 (196), . . .
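Why 70 can never be inserted becomes clear by listing the cells that the probe sequence can reach at all. The sketch below assumes the home cell is 0 and uses hi = (home + i^2) mod TableSize; the class name QuadraticReachDemo and the comparison table size 11 are assumptions made for illustration.

```java
import java.util.TreeSet;

final class QuadraticReachDemo {
    // Collects every cell visited by (home + i^2) mod tableSize.
    // i = 0 .. tableSize-1 is enough, because i^2 mod tableSize repeats after that.
    static TreeSet<Integer> reachable(int home, int tableSize) {
        TreeSet<Integer> cells = new TreeSet<>();
        for (int i = 0; i < tableSize; i++) {
            cells.add((home + i * i) % tableSize);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(reachable(0, 10)); // prints [0, 1, 4, 5, 6, 9]
        System.out.println(reachable(0, 11)); // prints [0, 1, 3, 4, 5, 9]
    }
}
```

With TableSize 10 only six cells are reachable from cell 0, and the keys 10 to 60 already fill all six, so 70 fails even though four cells are empty. With the prime size 11, six different cells are reachable as well, which is more than half of the table, so an insertion into a table that is at most half full always finds a free cell.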

Quadratic Probing
Secondary clustering: elements that hash to the same position probe the same alternative cells and are put into the next available one, forming a cluster.
In the first example, inserting 89, 49, 69 forms a secondary cluster.
Inserting 18, 58 forms another secondary cluster.

Double Hashing
f(i) = i * hash2(x)
hi(x) = (hash(x) + i * hash2(x)) mod TableSize
Try hash(x) mod TableSize, (hash(x) + hash2(x)) mod TableSize, (hash(x) + 2*hash2(x)) mod TableSize, . . .
Example: hash2(x) = R - (x mod R), where R is a prime number smaller than TableSize

Double Hashing
Insert: 89, 18, 49, 58, 69, 23
hash2(49) = 7 - (49 mod 7) = 7
hash2(58) = 7 - (58 mod 7) = 5
hash2(69) = 7 - (69 mod 7) = 1
hash2(23) = 7 - (23 mod 7) = 5
Insert 49, try 9, (9 + 7) mod 10 = 6
Insert 58, try 8, (8 + 5) mod 10 = 3
Insert 69, try 9, (9 + 1) mod 10 = 0
Insert 23, try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3, (3 + 15) mod 10 = 8, . . . (the probes alternate between the occupied cells 3 and 8, so 23 is never placed)
Cell:  0   1   2   3   4   5   6   7   8   9
Key:   69  -   -   58  -   -   49  -   18  89
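The same trace, written as a short Java sketch. It assumes hash(x) = x mod TableSize with TableSize = 10, R = 7 and non-negative keys, as in the example; the class name DoubleHashingDemo and the convention of returning -1 after TableSize failed probes are assumptions of this illustration.

```java
import java.util.Arrays;

final class DoubleHashingDemo {
    static final int R = 7; // prime smaller than the table size

    static int hash2(int x) {
        return R - (x % R);
    }

    // Places key using h_i(x) = (x + i * hash2(x)) mod tableSize.
    // Returns the cell used, or -1 if tableSize probes all hit occupied cells.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        for (int i = 0; i < n; i++) {
            int pos = (key % n + i * hash2(key)) % n;
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 69}) {
            insert(table, key);
        }
        // prints [69, null, null, 58, null, null, 49, null, 18, 89]
        System.out.println(Arrays.toString(table));
        System.out.println(insert(table, 23)); // prints -1: probes only ever reach cells 3 and 8
    }
}
```

The failure for 23 comes from the step 5 sharing a factor with the table size 10, so only two cells are reachable. A prime TableSize avoids this, because every step from 1 to TableSize-1 then cycles through all cells.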

Rehashing
When the table is too full, create a new table at least twice as big (with a prime size), compute the new hash value of each element, and insert it into the new table.
Rehash when the table is half full, or when an insertion fails, or when a certain load factor is reached.
Because of lazy deletion, deleted cells are also counted when the load factor is calculated.
Rehashing time is O(N), but the cost is shared by the preceding N/2 insertions, so it adds only a constant cost to each insertion.

    public interface Hashable
    {
        int hash( int tableSize );
    }

    public class MyInteger implements Comparable, Hashable
    {
        // Remaining members of MyInteger are not shown on the slide
        public int hash( int tableSize )
        {
            if( value < 0 )
                return -value % tableSize;
            else
                return value % tableSize;
        }
    }

    public static void main( String [ ] args )
    {
        SeparateChainingHashTable H = new SeparateChainingHashTable( );

        final int NUMS = 4000;
        final int GAP = 37;

        // Insert 1 .. NUMS-1 in scrambled order
        for( int i = GAP; i != 0; i = ( i + GAP ) % NUMS )
            H.insert( new MyInteger( i ) );

        // Remove the odd numbers
        for( int i = 1; i < NUMS; i += 2 )
            H.remove( new MyInteger( i ) );

        // Every even number must still be found
        for( int i = 2; i < NUMS; i += 2 )
            if( ((MyInteger)(H.find( new MyInteger( i ) ))).intValue( ) != i )
                System.out.println( "Find fails " + i );
    }

    public class SeparateChainingHashTable
    {
        // Outline only; method bodies appear on the following slides
        private LinkedList[ ] theLists;

        public SeparateChainingHashTable( )
        public SeparateChainingHashTable( int size )
        public void insert( Hashable x )
        public void remove( Hashable x )
        public Hashable find( Hashable x )
        public void makeEmpty( )
        public static int hash( String key, int tableSize )

        private static final int DEFAULT_TABLE_SIZE = 101;
        private static int nextPrime( int n )
        private static boolean isPrime( int n )
    }

    public class SeparateChainingHashTable
    {
        public SeparateChainingHashTable( )
        {
            this( DEFAULT_TABLE_SIZE );
        }

        public SeparateChainingHashTable( int size )
        {
            // Round the size up to a prime and create one empty list per cell
            theLists = new LinkedList[ nextPrime( size ) ];
            for( int i = 0; i < theLists.length; i++ )
                theLists[ i ] = new LinkedList( );
        }

        public void makeEmpty( )
        {
            for( int i = 0; i < theLists.length; i++ )
                theLists[ i ].makeEmpty( );
        }
    }

    public static int hash( String key, int tableSize )
    {
        int hashVal = 0;

        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key.charAt( i );

        hashVal %= tableSize;
        if( hashVal < 0 )
            hashVal += tableSize;

        return hashVal;
    }

    public void insert( Hashable x )
    {
        LinkedList whichList = theLists[ x.hash( theLists.length ) ];
        LinkedListItr itr = whichList.find( x );

        // Insert at the front of the list only if x is not already present
        if( itr.isPastEnd( ) )
            whichList.insert( x, whichList.zeroth( ) );
    }

    public void remove( Hashable x )
    {
        theLists[ x.hash( theLists.length ) ].remove( x );
    }

    public Hashable find( Hashable x )
    {
        return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
    }

    public class Employee implements Hashable
    {
        public int hash( int tableSize )
        {
            return SeparateChainingHashTable.hash( name, tableSize );
        }

        public boolean equals( Object rhs )
        {
            return name.equals( ((Employee)rhs).name );
        }

        private String name;
        private double salary;
        private int seniority;
    }

    public class QuadraticProbingHashTable
    {
        // Outline only; method bodies appear on the following slides
        public static final int DEFAULT_TABLE_SIZE = 11;
        protected HashEntry [ ] array;
        private int currentSize;

        public QuadraticProbingHashTable( )
        public QuadraticProbingHashTable( int size )
        public void makeEmpty( )
        public Hashable find( Hashable x )
        public void insert( Hashable x )
        public void remove( Hashable x )
        public static int hash( String key, int tableSize )
    }

    class HashEntry
    {
        Hashable element;   // the element
        boolean isActive;   // false if deleted

        public HashEntry( Hashable e )
        {
            this( e, true );
        }

        public HashEntry( Hashable e, boolean i )
        {
            element = e;
            isActive = i;
        }
    }

    public class QuadraticProbingHashTable
    {
        public QuadraticProbingHashTable( )
        {
            this( DEFAULT_TABLE_SIZE );
        }

        public QuadraticProbingHashTable( int size )
        {
            allocateArray( size );
            makeEmpty( );
        }

        public void makeEmpty( )
        {
            currentSize = 0;
            for( int i = 0; i < array.length; i++ )
                array[ i ] = null;
        }

        private void allocateArray( int arraySize )
        {
            array = new HashEntry[ arraySize ];
        }
    }

    public Hashable find( Hashable x )
    {
        int currentPos = findPos( x );
        return isActive( currentPos ) ? array[ currentPos ].element : null;
    }

    private int findPos( Hashable x )
    {
        int collisionNum = 0;
        int currentPos = x.hash( array.length );

        while( array[ currentPos ] != null &&
               !array[ currentPos ].element.equals( x ) )
        {
            // Step from the (i-1)th to the ith probe: i^2 - (i-1)^2 = 2i - 1
            currentPos += 2 * ++collisionNum - 1;
            if( currentPos >= array.length )
                currentPos -= array.length;
        }

        return currentPos;
    }

    private boolean isActive( int currentPos )
    {
        return array[ currentPos ] != null && array[ currentPos ].isActive;
    }

    public void insert( Hashable x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;     // x is already present

        array[ currentPos ] = new HashEntry( x, true );

        // Keep the table at most half full
        if( ++currentSize > array.length / 2 )
            rehash( );
    }

    public void remove( Hashable x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            array[ currentPos ].isActive = false;   // lazy deletion
    }

    private void rehash( )
    {
        HashEntry [ ] oldArray = array;

        // Create a new double-sized, empty table
        allocateArray( nextPrime( 2 * oldArray.length ) );
        currentSize = 0;

        // Copy table over, skipping lazily deleted entries
        for( int i = 0; i < oldArray.length; i++ )
            if( oldArray[ i ] != null && oldArray[ i ].isActive )
                insert( oldArray[ i ].element );

        return;
    }

    private static int nextPrime( int n )
    {
        if( n % 2 == 0 )
            n++;

        // Test odd candidates until a prime is found
        for( ; !isPrime( n ); n += 2 )
            ;

        return n;
    }

    private static boolean isPrime( int n )
    {
        if( n == 2 || n == 3 )
            return true;

        if( n == 1 || n % 2 == 0 )
            return false;

        // Trial division by odd numbers up to the square root of n
        for( int i = 3; i * i <= n; i += 2 )
            if( n % i == 0 )
                return false;

        return true;
    }

Summary
Insert and find take constant time on average
Load factor affects performance
Load factor for separate chaining should be close to 1
Load factor for open addressing should not exceed 0.5

Summary
Hashing is good when ordering information is not required
Applications: symbol table, on-line spelling checker