Hash Tables
CSIT 402 Data Structures II


Hashing
Goal: perform inserts, deletes, and finds in constant average time
Topics
– Hash table, hash function, collisions
– Collision handling: separate chaining; open addressing (linear probing, quadratic probing, double hashing)
– Rehashing
– Load factor

Goal
Develop a structure that will allow the user to insert/delete/find records in constant average time
– the structure will be a table (relatively small)
– the table is completely contained in memory
– it is implemented by an array
– it capitalizes on the ability to access any element of the array in constant time

Hash Table Applications
– Symbol table in compilers
– Accessing tree or graph nodes by name, e.g., city names in Google Maps
– Maintaining a transposition table in games: remember previous game situations and the move taken (avoid recomputation)
– Dictionary lookups: spelling checkers; natural language understanding (word sense)
– Heavily used in text-processing languages, e.g., Perl, Python

Hash Function
Assume the table (array) size is N. A hash function f(x) maps any key x to an int between 0 and N−1.
For example, assume that N = 15, that key x is a non-negative int between 0 and MAX_INT, and that the hash function is f(x) = x % 15.
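As a minimal C++ sketch of this hash function, assuming unsigned int keys (the names N and f just mirror the slide's notation):

```cpp
#include <cassert>
#include <cstddef>

// Table size from the running example.
constexpr std::size_t N = 15;

// f(x) = x % N maps any non-negative key to a slot index in [0, N-1].
std::size_t f(unsigned int x) {
    std::size_t t = x % N;
    assert(t < N);  // always true for %, stated to make the range explicit
    return t;
}
```

For example, f(47) is 2 and f(127) is 7. Different keys can share an index: f(20) and f(65) are both 5, which is exactly the collision problem the next slides deal with.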

Hash Function
Thus, since f(x) = x % 15, each key x is stored at index f(x); for example, 47 is stored at index 47 % 15 = 2. As long as no two keys map to the same index, storing the keys in the array is not a problem.

Hash Function
What happens when you try to insert x = 65? f(65) = 65 % 15 = 5, but slot 5 already holds another key, so 65 has nowhere to go. This is called a collision.

Handling Collisions
– Separate Chaining
– Open Addressing
  – Linear Probing
  – Quadratic Probing
  – Double Hashing

Handling Collisions Separate Chaining

Let each array element be the head of a chain: a linked list holding every key that hashes to that index, so colliding keys simply share a chain. Where would you store: 29, 16, 14, 99, 127?

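Separate Chaining
Let each array element be the head of a chain. Where would you store: 29, 16, 14, 99, 127? Since 29 % 15 = 14, 16 % 15 = 1, 14 % 15 = 14, 99 % 15 = 9 and 127 % 15 = 7, each key joins the chain at that index; 29 and 14 share chain 14, with 14 in front. Note that we use the insertAtHead() method when inserting new keys.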
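A minimal separate-chaining sketch, assuming unsigned int keys and std::list as the chain type; ChainedHashTable is a hypothetical name, and push_front plays the role of insertAtHead(). Duplicate keys are not checked, for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Separate chaining: each slot of the array heads a linked list (chain).
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t n) : table_(n) { assert(n > 0); }

    // New keys go at the front of their chain, like insertAtHead().
    void insert(unsigned int x) { table_[x % table_.size()].push_front(x); }

    // Only the chain that x hashes to has to be searched.
    bool find(unsigned int x) const {
        for (unsigned int k : table_[x % table_.size()])
            if (k == x) return true;
        return false;
    }

    // Removing a node from a linked list leaves no gap behind.
    void remove(unsigned int x) { table_[x % table_.size()].remove(x); }

    const std::list<unsigned int>& chain(std::size_t i) const { return table_[i]; }

private:
    std::vector<std::list<unsigned int>> table_;
};
```

Inserting 29, 16, 14, 99, 127 into a table of size 15 leaves chain 14 holding 14 followed by 29.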

Handling Collisions Linear Probing

Let key x be stored in element f(x) = t of the array. What do you do in case of a collision? If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, (t+3)%N, … until you find an empty slot.
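The probing rule above can be sketched in C++ as follows, assuming unsigned int keys and representing an empty slot as std::nullopt; insertLinear is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// An empty slot holds std::nullopt.
using Table = std::vector<std::optional<unsigned int>>;

// Tries t, (t+1)%N, (t+2)%N, ... and returns the slot used,
// or std::nullopt if every slot is already occupied.
std::optional<std::size_t> insertLinear(Table& table, unsigned int x) {
    const std::size_t n = table.size();
    assert(n > 0);
    const std::size_t t = x % n;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t slot = (t + i) % n;  // % n wraps past the end of the array
        if (!table[slot]) {
            table[slot] = x;
            return slot;
        }
    }
    return std::nullopt;  // table is full
}
```

With N = 15, inserting 29 and then 14 puts 29 in slot 14 and makes 14 wrap around to slot 0. Because linear probing visits every slot, it fails only when the table is genuinely full.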

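Linear Probing
Where do you store 65? Its home slot f(65) = 5 is taken, so try slots 6, 7, … in turn and store 65 in the first empty one; the number of attempts is the number of slots examined.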

Linear Probing
If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
Where would you store: 29, 16, 14, 99, 127?
Insert the keys one at a time. Their home slots are 29 % 15 = 14, 16 % 15 = 1, 14 % 15 = 14, 99 % 15 = 9 and 127 % 15 = 7. A key that finds its home slot occupied moves forward one slot at a time; for example, 14 collides with 29 at slot 14, so its next attempt is (14+1) % 15 = 0, wrapping around to the start of the array.

Linear Probing
Leads to the problem of clustering: elements tend to cluster in dense intervals in the array.

Drawbacks of Linear Probing
– Works until the array is full, but as the number of items N approaches TableSize, access time approaches O(N)
– Very prone to cluster formation (as in our example)
  – If a key hashes anywhere into a cluster, finding a free cell involves going through the entire cluster, and making it grow!
  – Primary clustering: clusters grow when keys hash to values close to each other
– Can have cases where the table is empty except for a few clusters
  – Does not satisfy the good-hash-function criterion of distributing keys uniformly

Handling Collisions Quadratic Probing

Let key x be stored in element f(x) = t of the array. What do you do in case of a collision? If the hash table is not full, attempt to store the key in array elements (t+1²)%N, (t+2²)%N, (t+3²)%N, … until you find an empty slot.
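A minimal sketch of quadratic probing, assuming unsigned int keys and std::nullopt for empty slots; insertQuadratic is a hypothetical name. It gives up after N attempts. Unlike linear probing, the offsets i² need not reach every slot, so the loop can fail even while empty slots remain.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Table = std::vector<std::optional<unsigned int>>;

// Tries t, (t+1*1)%N, (t+2*2)%N, (t+3*3)%N, ... for at most N attempts.
std::optional<std::size_t> insertQuadratic(Table& table, unsigned int x) {
    const std::size_t n = table.size();
    assert(n > 0);
    const std::size_t t = x % n;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t slot = (t + i * i) % n;
        if (!table[slot]) {
            table[slot] = x;
            return slot;
        }
    }
    return std::nullopt;  // no free slot found along the probe sequence
}
```

If slots 5, 6 and 9 are already filled, 65 is tried at t = 5, t+1, t+4 and lands at t+9 = 14. With N = 4 the offsets i² % 4 are only 0 or 1, so a key can be rejected while half the table is empty. This is one reason table sizes are usually chosen to be prime.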

Quadratic Probing
Where do you store 65? f(65) = t = 5. Slots t = 5, t+1 = 6 and t+4 = 9 are occupied, so the fourth attempt, t+9 = 14, succeeds: 65 is stored at index 14.

Quadratic Probing
If the hash table is not full, attempt to store the key in array elements (t+1²)%N, (t+2²)%N, …
Where would you store: 29, 16, 14, 99, 127?
– 29: t = 14 is taken by 65, so try (t+1)%15 = 0: stored at 0 (2 attempts)
– 16: t = 1 is free: stored at 1 (1 attempt)
– 14: t = 14 and (t+1)%15 = 0 are taken, so try (t+4)%15 = 3: stored at 3 (3 attempts)
– 99: t = 9 and t+1 = 10 are taken, so try t+4 = 13: stored at 13 (3 attempts)
– 127: t = 7 is free: stored at 7 (1 attempt)

Quadratic Probing
– Tends to distribute keys better
– Alleviates the problem of clustering

Handling Collisions Double Hashing

Let key x be stored in element f(x) = t of the array. What do you do in case of a collision? Define a second hash function f2(x) = d. Attempt to store the key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N, … until you find an empty slot.

Double Hashing
– Collision resolution strategy: apply f2 only if the hash function produces a collision
– Typical second hash function: f2(x) = R − (x % R), where R is a prime number, R < N
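To make the probe sequence concrete, here is a small C++ sketch using the slide's f2(x) = R − (x % R). The helper names probeSequence and reachableSlots are hypothetical. Because f2 is never 0, every probe moves; but the step d only reaches all N slots when d and N share no common factor.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Secondary hash: f2(x) = R - (x % R), always in 1..R, never 0.
std::size_t f2(unsigned int x, std::size_t R) {
    assert(R > 0);
    return R - x % R;
}

// Double hashing probe sequence t, (t+d)%N, (t+2d)%N, ... (N attempts).
std::vector<std::size_t> probeSequence(unsigned int x, std::size_t N, std::size_t R) {
    const std::size_t t = x % N, d = f2(x, R);
    std::vector<std::size_t> seq;
    for (std::size_t i = 0; i < N; ++i) seq.push_back((t + i * d) % N);
    return seq;
}

// How many distinct slots that sequence can ever reach.
std::size_t reachableSlots(unsigned int x, std::size_t N, std::size_t R) {
    std::vector<std::size_t> seq = probeSequence(x, N, R);
    return std::set<std::size_t>(seq.begin(), seq.end()).size();
}
```

With N = 15 and R = 11, 127 gets d = 5, which shares the factor 5 with 15, so its probes cycle through only slots 7, 12 and 2. With a prime table size such as N = 17, the same key can reach all 17 slots.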

Double Hashing
Where do you store 65? f(65) = t = 5. A collision at 5 means we apply double hashing. Let f2(x) = 11 − (x % 11), so R = 11 and N = 15; then f2(65) = d = 1. Attempt to store the key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N, …: (t+1)%15 = 6 is also taken, so 65 goes to (t+2)%15 = 7 (3 attempts).

Double Hashing
If the hash table is not full, attempt to store the key in array elements (t+d)%N, (t+2d)%N, …, with f2(x) = 11 − (x % 11).
Where would you store: 29, 16, 14, 99, 127?
– 29: f(29) = 14 is free => no collision => no f2 this time (f2(29) = 4 is never needed); stored at 14
– 16: f(16) = 1 is free => no collision (f2(16) = 6 is never needed); stored at 1
– 14: initially hashes to 14 % 15 = 14 => collision with 29. f2(14) = d = 8, so try (14+8)%15 = 7 (taken by 65), then (14+16)%15 = 0: stored at 0 (3 attempts)
– 99: f(99) = 9 => collision. f2(99) = d = 11; (t+11)%15 = 5 is the first probe using f2, then (t+22)%15 = 1 and (t+33)%15 = 12: stored at 12 (4 attempts)
– 127: f(127) = 7 => collision with 65. f2(127) = d = 5, so try (t+5)%15 = 12 (taken by 99), then (t+10)%15 = 2 (taken by 47). The next probe, (t+15)%15, is t again: because d = 5 and N = 15 share the factor 5, the sequence only ever visits slots 7, 12 and 2, so 127 cannot be placed even though the table still has empty slots.

Delete Element: Collision Ramifications
– Chaining
– Linear Probing
– Quadratic Probing
– Double Hashing

Deletion Issues
– Chaining/Buckets: deletion from a linked list, no issues
– Linear Probing: issue is the removal of an element within a cluster

(Figure: an array with slots [0] through [700], several of them holding Number records.)
The location where the element was must not be left as an ordinary "empty spot", since that could interfere with searches and deletions (why not insertions?). The location must be marked in some special way so that a search can tell that the spot used to have something in it.

Deletion Issues: General Solutions
– Each slot can be marked as empty, deleted, or occupied
– Cascade following elements one slot back, according to the collision handling scheme
– Remove all successor elements and reinsert them
– Second table of deleted items (commonly used in search engines)
– Rebuild the hash table
– Move the element at the end of the cluster to fill the slot
Some of these are not foolproof!
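The first of these solutions (marking each slot empty, deleted, or occupied, often called lazy deletion with "tombstones") can be sketched on top of linear probing. This is a minimal illustration assuming unsigned int keys; ProbingTable, State and Slot are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each slot is Empty, Occupied, or Deleted (a "tombstone").
enum class State { Empty, Occupied, Deleted };

struct Slot {
    State state = State::Empty;
    unsigned int key = 0;
};

// Linear probing with lazy deletion.
class ProbingTable {
public:
    explicit ProbingTable(std::size_t n) : slots_(n) { assert(n > 0); }

    bool insert(unsigned int x) {
        if (find(x)) return true;  // already present, nothing to do
        const std::size_t n = slots_.size();
        for (std::size_t i = 0; i < n; ++i) {
            Slot& s = slots_[(x % n + i) % n];
            if (s.state != State::Occupied) {  // reuse an empty or deleted slot
                s.state = State::Occupied;
                s.key = x;
                return true;
            }
        }
        return false;  // every slot is occupied
    }

    bool find(unsigned int x) const { return indexOf(x) < slots_.size(); }

    bool remove(unsigned int x) {
        const std::size_t i = indexOf(x);
        if (i == slots_.size()) return false;
        slots_[i].state = State::Deleted;  // mark the slot, do not empty it
        return true;
    }

private:
    // Stops only at a truly Empty slot and skips Deleted ones, so keys that
    // probed past a removed element are still found. The stored key is
    // compared with x, since several keys may share the same home slot.
    std::size_t indexOf(unsigned int x) const {
        const std::size_t n = slots_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t j = (x % n + i) % n;
            if (slots_[j].state == State::Empty) break;
            if (slots_[j].state == State::Occupied && slots_[j].key == x) return j;
        }
        return n;  // not found
    }

    std::vector<Slot> slots_;
};
```

After inserting 29 and 14 into a table of size 15 (14 lands in slot 0) and removing 29, a search for 14 still passes the tombstone at slot 14 and succeeds. If slot 14 were simply emptied instead, that search would stop too early.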

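Searching for a Key
The elephant in the room: if there's a collision, how do you know when you have found your item, since all the colliding keys mapped to the same slot? The table stores the key itself, and the key is assumed to be a unique value, so a search compares the stored key at each probed slot (or chain node) with the key it is looking for.
Keep in mind: the hash function maps sparse data into an array or similar structure, and it is applied to the key.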

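Hash Tables in C++ STL
The original (SGI) STL had hash table containers
– hash_set
– hash_map
These were extensions and never became part of the C++ standard; C++11 added the equivalent standard containers std::unordered_set and std::unordered_map.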

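Hash Set in STL

    #include <cstddef>
    #include <cstring>
    #include <functional>
    #include <iostream>
    #include <string_view>
    #include <unordered_set>

    // Hash fn: hashes the characters of a C string (not the pointer value).
    struct hashstr {
        std::size_t operator()(const char* s) const {
            return std::hash<std::string_view>{}(s);
        }
    };

    // Key equality test: compares the characters of two C strings.
    struct eqstr {
        bool operator()(const char* s1, const char* s2) const {
            return std::strcmp(s1, s2) == 0;
        }
    };

    // Key, hash fn, key equality test. std::unordered_set is the
    // standard (C++11) counterpart of the SGI hash_set.
    using StrSet = std::unordered_set<const char*, hashstr, eqstr>;

    void lookup(const StrSet& Set, const char* word) {
        StrSet::const_iterator it = Set.find(word);
        std::cout << word << ": " << (it != Set.end() ? "present" : "not present") << std::endl;
    }

    int main() {
        StrSet Set;
        Set.insert("kiwi");
        lookup(Set, "kiwi");  // prints "kiwi: present"
    }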

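Hash Map in STL

    #include <cstddef>
    #include <cstring>
    #include <functional>
    #include <iostream>
    #include <string_view>
    #include <unordered_map>

    // Hash fn: hashes the characters of a C string (not the pointer value).
    struct hashstr {
        std::size_t operator()(const char* s) const {
            return std::hash<std::string_view>{}(s);
        }
    };

    // Key equality test: compares the characters of two C strings.
    struct eqstr {
        bool operator()(const char* s1, const char* s2) const {
            return std::strcmp(s1, s2) == 0;
        }
    };

    int main() {
        // Key, data, hash fn, key equality test. std::unordered_map is the
        // standard (C++11) counterpart of the SGI hash_map.
        std::unordered_map<const char*, int, hashstr, eqstr> months;
        // operator[] is internally treated like insert
        // (or overwrite if the key is already present).
        months["january"] = 31;
        months["february"] = 28;
        // …
        months["december"] = 31;
        std::cout << months["january"] << std::endl;  // prints 31
    }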

12/26/03, Hashing - Lecture 10
Simple Hashes
It's possible to have very simple hash functions if you are certain of your keys. For example,
– suppose we know that the keys s will be real numbers uniformly distributed over 0 ≤ s < 1
– then a very fast, very good hash function is hash(s) = floor(s·m), where m is the size of the table
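The formula hash(s) = floor(s·m) translates directly into C++. This sketch assumes double keys in [0, 1); hashUnit is a hypothetical name. The final guard only matters in the rare case where floating-point rounding makes s·m come out equal to m.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// For keys uniformly distributed over 0 <= s < 1, floor(s * m) spreads
// them evenly over slots 0..m-1.
std::size_t hashUnit(double s, std::size_t m) {
    assert(s >= 0.0 && s < 1.0 && m > 0);
    std::size_t i = static_cast<std::size_t>(std::floor(s * m));
    return i < m ? i : m - 1;  // guard against rounding up to m
}
```

For m = 10, the keys 0.0, 0.25 and 0.99 land in slots 0, 2 and 9.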

Nonnumerical Keys
Many hash functions assume that the universe of keys is the natural numbers N = {0, 1, …}. We need a function that converts the actual key to a natural number quickly and effectively, before or during the hash calculation. Strings are generally converted to numbers by working with their ASCII character codes.
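One common way to turn a string into a number from its character codes is to treat the codes as digits of a polynomial and evaluate it with Horner's rule. This is a sketch under stated assumptions: the base 37 is a conventional but arbitrary choice, and hashString is a hypothetical name. Reducing modulo the table size at every step keeps the running value small, assuming tableSize is well below SIZE_MAX / 37.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Evaluates ((c0*37 + c1)*37 + c2)*37 + ... by Horner's rule, where ci are
// the character codes, reducing mod tableSize at each step.
std::size_t hashString(const std::string& key, std::size_t tableSize) {
    assert(tableSize > 0);
    std::size_t h = 0;
    for (unsigned char c : key)
        h = (h * 37 + c) % tableSize;
    return h;
}
```

Because each character's position affects its weight, anagrams such as "ab" and "ba" get different values (51 and 87 for a table of size 101). A plain sum of character codes would map them to the same slot.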

Load Factor of a Hash Table
Let N = number of items to be stored. Load factor λ = N/TableSize
– TableSize = 101 and N = 505, then λ = 5
– TableSize = 101 and N = 10, then λ ≈ 0.1
The average length of a chained list is λ, so the average time for accessing an item is O(1) + O(λ)
– Want λ to be smaller than 1 but close to 1 if the hashing function is good (i.e., TableSize ≈ N)
– With chaining, hashing continues to work for λ > 1
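The claim that the average chain length equals λ follows from the fact that the chain lengths add up to N. A short C++ check, assuming int keys and std::list chains (loadFactor and averageChainLength are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Load factor lambda = N / TableSize, where N is the number of stored items.
double loadFactor(std::size_t items, std::size_t tableSize) {
    assert(tableSize > 0);
    return static_cast<double>(items) / tableSize;
}

// Average chain length over all buckets of a separately chained table.
// Summing the chain lengths counts every item exactly once.
double averageChainLength(const std::vector<std::list<int>>& table) {
    assert(!table.empty());
    std::size_t total = 0;
    for (const auto& chain : table) total += chain.size();
    return static_cast<double>(total) / table.size();
}
```

Inserting 505 keys into 101 chains gives an average chain length of 5, matching λ = 505/101 from the slide. The average says nothing about the longest chain, which depends on how well the hash function spreads the keys.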

Conclusions
– Best to choose N = a prime number
– Issue of load factor = fraction of the hash table occupied
  – should rehash when the load factor is between 0.5 and 0.7
– Rehashing
  – approximately double the size of the hash table (select N = nearest prime)
  – redefine the hash function(s)
  – rehash the keys into the new table
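The rehashing steps can be sketched with linear probing as follows. This is a minimal illustration, not a definitive implementation. It assumes unsigned int keys, picks the first prime ≥ 2 × old size as the "nearest prime", uses a 0.5 threshold by default, and lets the new hash function be x % newSize. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Table = std::vector<std::optional<unsigned int>>;

bool isPrime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime >= n.
std::size_t nextPrime(std::size_t n) {
    while (!isPrime(n)) ++n;
    return n;
}

// Linear-probing insert; a free slot always exists while the load factor < 1.
void insertLinear(Table& table, unsigned int x) {
    const std::size_t n = table.size();
    for (std::size_t i = 0; i < n; ++i) {
        auto& slot = table[(x % n + i) % n];
        if (!slot) { slot = x; return; }
    }
    assert(false && "table full");
}

// New size: first prime >= 2 * old size. The hash function becomes
// x % newSize, and every key is reinserted into the new table.
Table rehash(const Table& old) {
    Table bigger(nextPrime(2 * old.size()));
    for (const auto& slot : old)
        if (slot) insertLinear(bigger, *slot);
    return bigger;
}

// Insert, then rehash once the load factor exceeds maxLoad.
void insertWithRehash(Table& table, std::size_t& count, unsigned int x,
                      double maxLoad = 0.5) {
    insertLinear(table, x);
    ++count;
    if (static_cast<double>(count) / table.size() > maxLoad) table = rehash(table);
}
```

Starting from a table of size 7, the fourth insertion pushes the load factor to 4/7 ≈ 0.57. That triggers a rehash into a table of size 17, the first prime ≥ 14, and all four keys are reinserted.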