CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Hash Tables.
CS 206 Introduction to Computer Science II 04 / 10 / 2009 Instructor: Michael Eckmann.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Using arrays – Example 2: names as keys How do we map strings to integers? One way is to convert each letter to a number, either by mapping them to 0-25.
Log Files. O(n) Data Structure Exercises 16.1.
Maps, Dictionaries, Hashtables
CSE 250: Data Structures Week 12 March 31 – April 4, 2008.
CS 206 Introduction to Computer Science II 11 / 19 / 2008 Instructor: Michael Eckmann.
Hash Tables and Associative Containers CS-212 Dick Steflik.
CS 206 Introduction to Computer Science II 10 / 20 / 2008 Instructor: Michael Eckmann.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hashing Text Read Weiss, §5.1 – 5.5 Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
CS 206 Introduction to Computer Science II 10 / 26 / 2009 Instructor: Michael Eckmann.
CS2420: Lecture 33 Vladimir Kulyukin Computer Science Department Utah State University.
Introduction to Hashing CS 311 Winter, Dictionary Structure A dictionary structure has the form: (Key, Data) Dictionary structures are organized.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
Hashing General idea: Get a large array
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashing. Hashing as a Data Structure Performs operations in O(c) –Insert –Delete –Find Is not suitable for –FindMin –FindMax –Sort or output as sorted.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University1 Hashing CS 202 – Fundamental Structures of Computer Science II Bilkent.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 5. Abstract Data Structures & Algorithms 5.2 Static Data Structures.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
LECTURE 35: COLLISIONS CSC 212 – Data Structures.
CS 206 Introduction to Computer Science II 11 / 16 / 2009 Instructor: Michael Eckmann.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hash Tables CSIT 402 Data Structures II. Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Chapter 11 Hash Anshuman Razdan Div of Computing Studies
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
CS261 Data Structures Hash Tables Open Address Hashing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
1 What is it? A side order for your eggs? A form of narcotic intake? A combination of the two?
Fundamental Structures of Computer Science II
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Data Structures and Algorithms
Data Structures and Algorithms
Hashing Alexandra Stefan.
CS202 - Fundamental Structures of Computer Science II
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Data Structures and Algorithm Analysis Hashing
Presentation transcript:

CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS Spring 2009 Today’s Topics Questions? Comments? Hash tables –Better hash functions –Linear probing –Quadratic probing –Double hashing –Separate chaining

Last time we created a hash table to store Strings –We used open addressing hashing –With a fairly simple hash function. We got insertion and retrieval implemented well. But removal was problematic because of the open addressing hash scheme. Let's ignore the removal for now and focus on a better hash function. Java already implements a hash function for Strings: Suppose s is a string. Let n=s.length(). hashVal = s.charAt(0)*31^(n-1) + s.charAt(1)*31^(n-2) s.charAt(n - 2)*31 + s.charAt(n – 1)‏ Let's try to implement this now and compare to our other hash function. Hashes

Removal was problematic because of the open addressing hash scheme. Would it be any easier to implement removal with a separate chaining hash method? Hashes

Removal was problematic because of the open addressing hash scheme. Would it be any easier to implement removal with a separate chaining hash method? –Yes! Hashes

Strategies for best performance –want items to be distributed evenly (uniformly) throughout the hash table and we want few (or no) collisions so that depends on our data items, our choice of hash function and our size of the hash table –also need to decide whether to use a probing (linear or quadratic) hash table == open addressing or to use a hash table where collisions are handled by adding the item to a list for the index (hash value)‏ == separate chaining another method is called double hashing. –if choices are done well we get the retrieval time to be a constant, but the worst case is O(n)‏ –we also need to consider the computations needed to insert (computing the hash value)‏--- while this is a constant, a poor computation choice could make it a large constant (which we also wish to avoid)‏ Hashes

Clustering –Why is it bad? –How to avoid it? Quadratic probing (avoids it to some degree)‏ Double Hashing can avoid it Hashes

Is it possible to never find a spot in the table to place an item which uses linear probing? When? Hashes

Is it possible to never find a spot in the table to place an item which uses linear probing? When? - Yes when it is full. With a poor choice of table size, and quadratic probing, it is also possible to end up never finding a spot in the table to place an item. - but this could occur even before it is full. For quadratic probing, if the table size is prime, then a new element can always be inserted if the table is at least half empty. Hashes

Double hashing is another open address hashing method. It uses just an array of the items to represent the hash table. - Create a hash table as an array of items - Create a hash function hash1 that will return a hash value that we will call hv1 to place the item in the table. - If a collision occurs, call a second hash function hash2 which returns an int that we will call hv2. Use this hv2 as the amount of indices to hop to find another place to put the item. Hashes

Example: if an item initially hashes to value hv1=5 and this causes a collision, then using the same item we compute hv2 to be 7 using the second hash function. We would then check if there's an open slot in 5+7=12. If it's open, place the item there. If that's not open, look in 12+7=19, and if that's not open continue checking 7 slots away until we find an open slot. Wrap around when necessary (using mod the size of table). Remember the goals of hashing: - we want the data to be distributed well throughout the hash table - we want few collisions in the average case and the worst case Hashes

Another thing that must be taken care of is a situation such as this. Suppose we have a hash table of size 20 and all of the even numbered slots are filled (that is, index 0, 2,... 18) and we are trying to insert an item with hv1 of 4. This is a collision. So if we compute hv2 to be say 6, Hashes

( 4+6)%20=10 is filled, (10+6)%20=16 is filled, (16+6)%20= 2 is filled, ( 2+6)%20= 8 is filled, ( 8+6)%20=14 is filled, (14+6)%20= 0 is filled, ( 0+6)%20= 6 is filled, ( 6+6)%20=12 is filled, (12+6)%20=18 is filled, (18+6)%20= 4 is filled, ( 4+6)%20=10 (we're back where we started...)‏ Hashes

and this would go on forever, because we came back to the starting index without examining all the slots. Actually in this case we examined only the filled slots, and ignored all the empty slots!! Hashes

Double hashing with a table size that is relatively prime with respect to the value (hv2) returned by hash2 will guarantee that if there's an empty slot, we will eventually find it. Numbers that are relatively prime are those which have no common divisors besides 1. You could have a table_size which is an integer p, such that p and p-2 are prime (twin primes). Then the procedure is to compute hv1 to be some int within the range 0 to table_size-1, inclusive. Then compute hv2 to be some int within the range 1 to table_size-3. This will guarantee that if there's an empty slot, we will find it. This method was devised by Donald Knuth. Hashes