Hashing Chapter 20

Hash Table
A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest way to implement a hash table is with an array.
Example: Suppose we want to store integers in a known, limited range.

Create an array “a” with one cell for each possible key, and initialize the array with all zeros.
– insert(i): a[i]++
– find(i): is a[i] > 0?
– remove(i): if (a[i] > 0) a[i]--
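The counting-array idea on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `DirectAddressTable` and the size of 100 are assumptions made for the demo, since the transcript does not preserve the original range.

```python
class DirectAddressTable:
    """Counting array indexed directly by key (keys must lie in 0..size-1)."""

    def __init__(self, size):
        self.counts = [0] * size  # counts[i] = number of copies of key i stored

    def insert(self, i):
        self.counts[i] += 1

    def find(self, i):
        return self.counts[i] > 0

    def remove(self, i):
        # Only decrement when at least one copy is present.
        if self.counts[i] > 0:
            self.counts[i] -= 1


table = DirectAddressTable(100)  # size 100 is an arbitrary choice for the demo
table.insert(42)
table.insert(42)
table.remove(42)
print(table.find(42))  # True: one copy of 42 remains
print(table.find(7))   # False: 7 was never inserted
```

Every operation touches exactly one cell, which is why this works well when the key range is small and poorly when it is huge.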

If our keys are 8-letter alphabetic words, there are 26^8, or about 200 billion, possible keys [about 200 ‘gig’ of keys]. Only a small fraction of these keys will actually occur. Conceptually, this is a very large array with very few cells occupied. We need a better way.

Allow many of the different possible keys to be mapped to the same location by the index function. A hash function takes a key and maps it to some index in the array, where the range of indices is usually much smaller than the range of possible keys. A collision occurs when the hash function maps two actual keys to the same index.

Hash table operations
Given a hash function hash(key) which returns an integer, the simplistic approach is as follows:
– insert(key): A[hash(key)] = object to insert
– find(key): is the object at A[hash(key)]?
– remove(key): remove the object at A[hash(key)]
But what happens on collisions?
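To see what happens on a collision, the simplistic approach can be tried directly. The sketch below assumes integer keys, a hypothetical table of size 10, and `key % TABLE_SIZE` as the hash function; none of these come from the slides.

```python
TABLE_SIZE = 10  # hypothetical small table


def hash_index(key):
    return key % TABLE_SIZE


slots = [None] * TABLE_SIZE


def insert(key):
    slots[hash_index(key)] = key  # overwrites whatever was stored there


def find(key):
    return slots[hash_index(key)] == key


insert(17)
insert(27)       # 27 also hashes to index 7
print(find(27))  # True
print(find(17))  # False: 17 was silently overwritten by 27
```

The second insert destroys the first key, so the simplistic approach is only safe when no two stored keys share an index.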

Choosing a hash function
Desirable properties of a good hash function:
– quick and easy to compute
– uniformly distributes the keys over the range of indices
– minimizes collisions

Methods of building a hash function
– Truncation: ignore part of the key and use the remaining part as the index
– Folding: partition the key into several parts and combine these parts to obtain the index
– Modular arithmetic: convert the key to an integer and mod by the table size

Example
Given 8-digit integers and a table of size 1000:
– Truncation, e.g., use the 4th, 7th and 8th digits to form the index: hash( ) = 394
– Folding, e.g., break into groups of 3, 3, and 2 digits, add the parts and truncate if necessary: hash( ) = ( ) mod 1000 = 1100 mod 1000 = 100

Example cont’d
– Modular arithmetic, e.g., simply mod by the table size: hash( ) = mod 1000 = 194
It seems best to have a table size which is a prime number for modular arithmetic, so a table size of 997 or 1009 would perform a little better. A combination of these techniques may be even better.
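The three methods can be sketched as small Python functions. The key shown in the slides did not survive in this transcript, so the code uses an assumed key, 62538194, picked only because it reproduces the results 394, 100, and 194 quoted above; the function names are hypothetical as well.

```python
def truncation_hash(key):
    """Use the 4th, 7th and 8th digits of an 8-digit key as the index."""
    digits = f"{key:08d}"
    return int(digits[3] + digits[6] + digits[7])


def folding_hash(key, table_size=1000):
    """Split into groups of 3, 3 and 2 digits, add them, reduce to table range."""
    digits = f"{key:08d}"
    parts = [int(digits[0:3]), int(digits[3:6]), int(digits[6:8])]
    return sum(parts) % table_size


def modular_hash(key, table_size=1000):
    """Take the key modulo the table size."""
    return key % table_size


key = 62538194  # assumed key, not taken from the source
print(truncation_hash(key))  # 394
print(folding_hash(key))     # 100  (625 + 381 + 94 = 1100, then mod 1000)
print(modular_hash(key))     # 194
```

Passing a prime such as 997 as `table_size` to `modular_hash` gives the variant the slide recommends.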

Collision Resolution
– Open addressing: the table is an array which holds at most one object per index (contiguous storage)
– Chaining: the table is an array of chains; all elements on a chain have the same index [these chains are sometimes called “buckets”] (dynamic storage)

Open Addressing
Linear probing
– This is the simplest method of collision resolution
– Start with the hash index and perform a linear search for the desired key or an empty location
– The table is considered circular: the search wraps around from the last index to the first

Open Addressing
Clustering
– The major drawback of linear probing is that when the table becomes about half full, there is a tendency toward clustering
– Clustering occurs when records start to appear as long strings of adjacent occupied positions, which may hold keys with several different hash values
– Linear searches for empty locations become longer and longer

An Example
Insert the items 67, 89, 17, 20, 90, 19 into an empty hash table using an array of size 10 and the hash function hash(key) = key mod 10. Use linear probing to handle collisions.
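One way to check the exercise is to run it. The sketch below is a minimal linear-probing insert, assuming `None` marks an empty cell; the function name is hypothetical.

```python
def linear_probe_insert(table, key):
    """Place key at its hash index, or at the next free cell (wrapping around)."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size  # circular table
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in [67, 89, 17, 20, 90, 19]:
    linear_probe_insert(table, key)

print(table)
# [20, 90, 19, None, None, None, None, 67, 17, 89]
```

Here 17 collides with 67 and moves to index 8, 90 collides with 20 and moves to index 1, and 19 wraps around from index 9 past 0 and 1 to land at index 2.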

Open Addressing
Other techniques of collision resolution
– Rehashing: use a second hash function to find an alternative position
– Quadratic probing: if hash(key) = h, probe at locations h+1, h+4, h+9, h+16, etc. [i.e., locations h+i^2 for i = 1, 2, 3, 4, …]
– Random probing: use a seeded pseudo-random number generator to obtain the increment
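The three alternatives differ only in how the next location is computed, which the sketch below makes concrete. The second hash function `1 + key % (table_size - 1)` and the choice to seed the generator with the key are assumptions for illustration, not something the slides specify.

```python
import random


def quadratic_probes(h, table_size, count):
    """Locations h + i^2 (mod table size) for i = 1, 2, ..., count."""
    return [(h + i * i) % table_size for i in range(1, count + 1)]


def rehash_probes(key, table_size, count):
    """Rehashing: a second hash function supplies the step size."""
    h = key % table_size
    step = 1 + key % (table_size - 1)  # assumed second hash; never zero
    return [(h + i * step) % table_size for i in range(1, count + 1)]


def random_probes(key, table_size, count):
    """Random probing: a generator seeded by the key supplies each increment."""
    rng = random.Random(key)  # same key, same seed, same probe sequence
    index = key % table_size
    probes = []
    for _ in range(count):
        index = (index + rng.randrange(1, table_size)) % table_size
        probes.append(index)
    return probes


print(quadratic_probes(3, 11, 4))  # [4, 7, 1, 8]
print(rehash_probes(25, 11, 3))    # [9, 4, 10]
print(random_probes(25, 11, 3) == random_probes(25, 11, 3))  # True
```

Random probing only works because the sequence is reproducible: a later search for the same key must retrace exactly the probes made during insertion.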

Open Addressing
Deletions
– Deletion with open addressing is awkward. (Why?)
– Lazy deletion is the preferred means: that is, marking items as deleted rather than physically removing them from the table.
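One answer to the “why?” is that a search stops at the first empty cell, so emptying a cell can hide keys that were placed beyond it. The sketch below shows lazy deletion with a marker object for a linear-probing table; the class and the `DELETED` sentinel are hypothetical names, and integer keys with `key % size` hashing are assumed.

```python
EMPTY = None
DELETED = object()  # marker meaning "something was here, keep searching"


class LinearProbingTable:
    def __init__(self, size):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        size = len(self.slots)
        home = key % size
        for step in range(size):
            yield (home + step) % size

    def insert(self, key):
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key  # deleted cells can be reused
                return
        raise RuntimeError("hash table is full")

    def find(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return False  # only a truly empty cell ends the search
            if slot is not DELETED and slot == key:
                return True
        return False

    def remove(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return
            if slot is not DELETED and slot == key:
                self.slots[index] = DELETED  # mark, do not empty
                return


t = LinearProbingTable(10)
t.insert(17)       # lands at index 7
t.insert(27)       # collides, lands at index 8
t.remove(17)       # index 7 becomes DELETED, not EMPTY
print(t.find(27))  # True: the search passes over the marker at index 7
print(t.find(17))  # False
```

Had `remove` reset index 7 to `EMPTY`, the search for 27 would have stopped there and wrongly reported it missing.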

Chaining
Advantages to linked storage
– With a good hash function, the linked lists will be short
– Clustering is not a problem: records with different hash indices are on different chains
– The size of the table is of less concern
– Deletions are easy and efficient
– [The chains could be binary search trees or other structures]
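A chained table can be sketched with a list of Python lists standing in for the linked chains. This is a simplification under stated assumptions: the class name is hypothetical, and Python's built-in `hash` is used as the hash function, which for small non-negative integers is the integer itself.

```python
class ChainedHashTable:
    def __init__(self, size):
        self.buckets = [[] for _ in range(size)]  # one chain per index

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:
            bucket.append(key)

    def find(self, key):
        return key in self._bucket(key)

    def remove(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)  # no markers needed, unlike open addressing


t = ChainedHashTable(10)
for key in [17, 27, 5]:
    t.insert(key)
print(t.buckets[7])  # [17, 27]: colliding keys share a chain
t.remove(17)
print(t.find(27))    # True
```

Because each chain grows independently, the table can hold more items than it has indices.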

Load Factor
The load factor of a hash table is the ratio of the number of items in the table to the size of the hash table.
– n: the number of items in the table
– t: the size of the hash table
– the load factor λ = n/t
– λ = 0 indicates an empty table
– λ = 0.5 indicates a table half full

Load Factor
In open addressing, λ may never exceed 1, and in practice, λ > 0.5 will begin to cause clustering problems. In chaining, there is no limit to the size of λ.

Linear Probing
Theorem: The average number of cells examined in an insertion using linear probing is approximately [1 + 1/(1 – k)^2]/2, where k is the load factor.
Theorem: The average number of cells examined in a successful search is approximately [1 + 1/(1 – k)]/2, where k is the load factor.
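Plugging a few load factors into the two formulas shows how quickly the cost grows. The function names below are hypothetical; the expressions are the ones stated in the theorems.

```python
def cells_examined_insert(k):
    """Average cells examined in an insertion, from the first theorem."""
    return (1 + 1 / (1 - k) ** 2) / 2


def cells_examined_hit(k):
    """Average cells examined in a successful search, from the second theorem."""
    return (1 + 1 / (1 - k)) / 2


for k in (0.5, 0.75, 0.9):
    print(f"k = {k}: insert {cells_examined_insert(k):.1f}, "
          f"successful search {cells_examined_hit(k):.1f}")
# k = 0.5: insert 2.5, successful search 1.5
# k = 0.75: insert 8.5, successful search 2.5
# k = 0.9: insert 50.5, successful search 5.5
```

At half full an insertion examines 2.5 cells on average, but at 90% full it examines about 50, which is why linear-probing tables are kept well below full.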

Quadratic Probing
Note that in linear probing, each probe tries a different cell. Does quadratic probing guarantee that, when a cell is tried, we have not already tried it during the course of the current access? Does quadratic probing guarantee that, when we are inserting x and the table is not full, x will be inserted?

Quadratic Probing
Theorem: If quadratic probing is used and the table size is prime, then a new element can always be inserted if the table is at least half empty. Furthermore, in the course of the insertion, no cell is probed twice.
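The theorem can be checked empirically for small tables. The sketch below, with a hypothetical helper name, tests whether the home cell plus the first ⌊size/2⌋ quadratic probes all fall on different cells; it is a spot check, not a proof.

```python
def first_half_probes_distinct(table_size, home=0):
    """True if probes h + i^2 for i = 0 .. table_size // 2 hit distinct cells."""
    half = table_size // 2
    cells = [(home + i * i) % table_size for i in range(half + 1)]
    return len(set(cells)) == len(cells)


for size in (7, 11, 13, 101):             # prime table sizes
    print(size, first_half_probes_distinct(size))  # True in every case

print(16, first_half_probes_distinct(16))  # False: 0 + 4^2 wraps back to cell 0
```

With a prime size, more than half of the cells are reachable without repetition, so a table that is at least half empty must offer a free cell among them. With size 16 the probe sequence revisits a cell after only four steps.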

Hash Table vs. BST
Insert and find operations can be implemented using a BST with an average insert/find time of O(log n). However, a BST is generally a more powerful data structure than a hash table, as it can easily support routines that require order, for example, finding the smallest/largest element.

Hash Table vs. BST
If the input is sorted, an unbalanced BST will perform poorly. Although balanced trees can be used to avoid the O(n) time insert/find, they are quite expensive to implement. Hence, if no ordering information is required and there is any suspicion that the input might be sorted, hashing is the data structure of choice.
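The effect of sorted input on an unbalanced BST is easy to demonstrate. The sketch below uses a hypothetical helper that builds a plain BST, with nodes represented as small lists, and reports its height. The seed value 0 is an arbitrary choice.

```python
import random


def bst_height(keys):
    """Insert keys into an unbalanced BST and return its height in nodes."""
    root = None  # each node is a list: [key, left child, right child]
    height = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                side = 1 if key < node[0] else 2
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        height = max(height, depth)
    return height


sorted_keys = list(range(100))
shuffled_keys = sorted_keys[:]
random.Random(0).shuffle(shuffled_keys)

print(bst_height(sorted_keys))    # 100: the tree degenerates into a linked list
print(bst_height(shuffled_keys))  # much smaller for a typical random order
```

With sorted input every new key goes to the right of the previous one, so a find may visit all n nodes. A hash table's behavior does not depend on the order in which keys arrive.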

Applications of Hash Tables
Hash tables are used in implementing:
– Symbol tables
– Game programs
– Spelling checkers
HW: Problems on page 710