CS 261 – Data Structures Hash Tables Part 1. Open Address Hashing.

Can We Do Better Than O(log n)?
We have seen how skip lists and AVL trees can reduce the time to perform operations from O(n) to O(log n). Can we do better? Can we find a structure that provides O(1) operations? Yes. No. Well, maybe…

Hash Tables
Hash tables are similar to arrays, except:
– Elements can be indexed by values other than integers
– A single position may hold more than one element
Arbitrary values (hash keys) are mapped to integers by means of a hash function. Computing a hash function is usually a two-step process:
1. Transform the value (or key) to an integer
2. Map that integer to a valid hash table index
Example: storing names
– Compute an integer from a name
– Map the integer to an index in a table (e.g., a vector or an array)
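The two steps can be sketched in Python as follows. This is a minimal illustration. Summing character codes is just one possible transform for step 1, and the function names are hypothetical.

```python
def name_to_int(name: str) -> int:
    """Step 1: transform the key to an integer (here, the sum of its character codes)."""
    return sum(ord(ch) for ch in name)


def name_to_index(name: str, table_size: int) -> int:
    """Step 2: map that integer to a valid index in [0, table_size)."""
    return name_to_int(name) % table_size


for name in ["Angie", "Joe", "Abigail", "Linda"]:
    print(name, "->", name_to_index(name, 5))
```

Keeping the two steps separate means the same integer transform can be reused with tables of different sizes.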

Hash Tables
Say we're storing the names Angie, Joe, Abigail, Linda, Mark, Max, Robert, and John. The hash function places them in a five-position table:
0: Angie, Robert
1: Linda
2: Joe, Max, John
3: (empty)
4: Abigail, Mark

Hash Function: Transforming to an Integer
Mapping: map (a part of) the key into an integer
– Example: a letter to its position in the alphabet
Folding: the key is partitioned into parts, which are then combined using efficient operations (such as add, multiply, shift, XOR, etc.)
– Example: summing the values of each character in a string
Shifting: get rid of high- or low-order bits that are not random
– Example: if keys are always even, shift off the low-order bit
Casts: convert a numeric type into an integer
– Example: casting a character to an int to get its ASCII value
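Here is one small Python function per technique. This is a sketch, and the function names are hypothetical. Python has no char type, so `ord()` stands in for the cast.

```python
def map_letter(ch: str) -> int:
    """Mapping: a letter to its position in the alphabet (a = 1, ..., z = 26)."""
    return ord(ch.lower()) - ord("a") + 1


def fold_sum(key: str) -> int:
    """Folding: split the key into characters and combine them with addition."""
    return sum(ord(ch) for ch in key)


def shift_even(key: int) -> int:
    """Shifting: if keys are always even, the low-order bit is always 0, so drop it."""
    return key >> 1


def cast_char(ch: str) -> int:
    """Casts: ord() plays the role of casting a character to an int."""
    return ord(ch)


even_keys = [0, 2, 4, 6]
print([k % 4 for k in even_keys])              # [0, 2, 0, 2]: only half the slots used
print([shift_even(k) % 4 for k in even_keys])  # [0, 1, 2, 3]: every slot used
```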

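Hash Function: Combinations
Another use for shifting: in combination with folding when the fold operator is commutative. Adding the mapped characters gives every anagram the same value, but shifting each character according to its position before folding tells them apart (a = 1, …, z = 26; the first character is shifted left by 2, the second by 1, the third by 0):
Key | Mapped chars | Folded | Shifted and folded
eat | 5, 1, 20 | 26 | 20 + 2 + 20 = 42
ate | 1, 20, 5 | 26 | 4 + 40 + 5 = 49
tea | 20, 5, 1 | 26 | 80 + 10 + 1 = 91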
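The table can be reproduced with a short Python sketch (function names hypothetical). The shift-and-fold loop shifts the running total left by one bit before adding each character. For a three-letter word, that is the same as shifting the first character by 2, the second by 1, and the third by 0.

```python
def mapped(word: str) -> list:
    """Map each letter to its alphabet position (a = 1, ..., z = 26)."""
    return [ord(ch) - ord("a") + 1 for ch in word]


def folded(word: str) -> int:
    """Commutative fold: anagrams always collide."""
    return sum(mapped(word))


def shifted_and_folded(word: str) -> int:
    """Shift the running total before adding each character, so order matters."""
    h = 0
    for value in mapped(word):
        h = (h << 1) + value
    return h


for word in ["eat", "ate", "tea"]:
    print(word, mapped(word), folded(word), shifted_and_folded(word))
```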

Hash Function: Mapping to a Valid Index
Almost always use the modulus operator (%) with the table size:
– Example: idx = hash(val) % data.size()
Make sure the final result is non-negative (in C and Java, % applied to a negative number gives a negative result):
– Use only non-negative arithmetic, or take the absolute value
– Beware the smallest negative number: its absolute value does not fit in an int (in Java, Math.abs(Integer.MIN_VALUE) is still negative), so consider using longs
To get a good distribution of indices, prime numbers make the best table sizes:
– Example: if you want a table of about 1000 positions, a size of 997 or 1009 is preferable to 1000
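Python's `%` already returns a non-negative result for a positive table size. To show the C/Java problem, this sketch therefore simulates a signed 32-bit hash value (an assumption about where the hash comes from). It then clears the sign bit instead of calling abs(). This is a different but equally valid way to get a non-negative value, and it cannot overflow on -2**31. A simple prime-size helper is included as well. All names here are hypothetical.

```python
def to_int32(h: int) -> int:
    """Wrap an arbitrary Python int to the signed 32-bit value a C/Java int would hold."""
    return ((h + 2**31) % 2**32) - 2**31


def to_index(h: int, table_size: int) -> int:
    """Map a (possibly negative) 32-bit hash to an index in [0, table_size)."""
    # Masking off the sign bit keeps the value non-negative, even for -2**31.
    return (to_int32(h) & 0x7FFFFFFF) % table_size


def is_prime(n: int) -> bool:
    """Trial division, fine for table-sized numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n, one way to pick a table size."""
    while not is_prime(n):
        n += 1
    return n


print(to_index(-2**31, 997), next_prime(1000))
```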

Hash Functions: Some Ideas
Here are some typical hash functions:
– Character: the char value cast to an int → its ASCII value
– Date: a value derived from the point in time the date represents
– Double: a value generated from its bitwise representation
– Integer: the int value itself
– String: a folded sum of the character values
– URL: the hash code of the host name
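Each idea can be sketched as a small Python function. These are illustrative stand-ins, not the hash functions of any particular library. The double hash XORs the two 32-bit halves of the IEEE 754 bit pattern, in the style of Java's `Double.hashCode`, and the date hash assumes whole seconds since the Unix epoch.

```python
import struct
from datetime import datetime, timezone
from urllib.parse import urlparse


def hash_char(ch: str) -> int:
    return ord(ch)  # the character's code (its ASCII value for ASCII characters)


def hash_int(n: int) -> int:
    return n  # the int value itself


def hash_double(x: float) -> int:
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit int,
    # then XOR the high and low 32-bit halves together.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits ^ (bits >> 32)) & 0xFFFFFFFF


def hash_string(s: str) -> int:
    return sum(ord(ch) for ch in s)  # folded sum of character values


def hash_date(d: datetime) -> int:
    return int(d.timestamp())  # seconds since the Unix epoch for the moment d represents


def hash_url(url: str) -> int:
    host = urlparse(url).hostname or ""
    return hash_string(host)  # only the host name contributes


print(hash_char("A"), hash_double(1.0), hash_url("http://example.com/index.html"))
```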

Hash Tables: Collisions
Ideally, we want a perfect hash function, where each data element hashes to a unique hash index. However, unless the data is known in advance, this is usually not possible. A collision occurs when two or more different keys result in the same hash table index.

Example: Perfect Hashing
Alfred, Alessia, Amina, Amy, Andy, and Anne have a club. Amy needs to store information in a six-element array. Amy discovers she can convert the 3rd letter of each name to an index (a = 0, b = 1, …):
Alfred: f = 5, 5 % 6 = 5
Alessia: e = 4, 4 % 6 = 4
Amina: i = 8, 8 % 6 = 2
Amy: y = 24, 24 % 6 = 0
Andy: d = 3, 3 % 6 = 3
Anne: n = 13, 13 % 6 = 1
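Amy's hash function is short enough to write out directly. The Python sketch below (the `amy_hash` name is hypothetical) fills the six-element array and confirms that every member lands in a different slot.

```python
def amy_hash(name: str, table_size: int) -> int:
    """Index from the name's 3rd letter, with a = 0, b = 1, ..., z = 25."""
    third = name[2].lower()
    return (ord(third) - ord("a")) % table_size


club = ["Alfred", "Alessia", "Amina", "Amy", "Andy", "Anne"]
table = [None] * 6
for member in club:
    table[amy_hash(member, 6)] = member

print(table)  # ['Amy', 'Anne', 'Amina', 'Andy', 'Alessia', 'Alfred']
```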

Indexing Is Faster Than Searching
We can convert a name (e.g., Alessia) into a number (e.g., 4) in constant time, which is even faster than searching. This allows for O(1)-time operations. Of course, things get more complicated when the input values change: Alan wants to join the club, but his third letter is 'a' = 0, the same index as Amy. Worse yet, Al doesn't have a third letter at all!
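Both problems show up as soon as the function runs on the new names. This Python sketch uses a hypothetical `find_collisions` helper that groups names by index and keeps only the shared indices. Al's missing third letter surfaces as an `IndexError`.

```python
from collections import defaultdict


def amy_hash(name: str, table_size: int) -> int:
    return (ord(name[2].lower()) - ord("a")) % table_size


def find_collisions(names, table_size):
    """Return {index: [names...]} for every index shared by two or more names."""
    slots = defaultdict(list)
    for name in names:
        slots[amy_hash(name, table_size)].append(name)
    return {idx: group for idx, group in slots.items() if len(group) > 1}


club = ["Alfred", "Alessia", "Amina", "Amy", "Andy", "Anne"]
print(find_collisions(club + ["Alan"], 6))  # {0: ['Amy', 'Alan']}

try:
    amy_hash("Al", 6)
except IndexError:
    print("Al has no third letter")
```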

Hash Tables: Resolving Collisions
There are several general approaches to resolving collisions:
1. Open address hashing: if a spot is full, probe for the next empty spot
2. Chaining (or buckets): keep a collection at each table entry
3. Caching: save the most recently accessed value; search slowly otherwise
Today we will examine open address hashing.

Open Address Hashing
All values are stored in an array. The hash value is used to find the initial index to try. If that position is filled, the next position is examined, then the next, and so on until an empty position is found; the value is placed there. The process of looking for an empty position is termed probing, and stepping forward one position at a time is specifically linear probing. There are other probing algorithms, but we won't consider them.
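Here is a minimal Python sketch of an open address table with linear probing. It has no remove and no resizing. The `hashfun` parameter borrows the name used in the exercise at the end of these slides, but the class, its methods, and the choice of set semantics (duplicates are ignored) are assumptions made for illustration.

```python
from typing import Any, Callable, List, Optional


class LinearProbingSet:
    """Open address hash set using linear probing (no remove, no resizing)."""

    def __init__(self, capacity: int, hashfun: Callable[[Any], int]):
        self.slots: List[Optional[Any]] = [None] * capacity
        self.hashfun = hashfun
        self.count = 0

    def _probe(self, value):
        """Yield indices starting at the hashed position, wrapping around once."""
        m = len(self.slots)
        start = self.hashfun(value) % m
        for i in range(m):
            yield (start + i) % m

    def add(self, value) -> None:
        for idx in self._probe(value):
            if self.slots[idx] is None:
                self.slots[idx] = value
                self.count += 1
                return
            if self.slots[idx] == value:
                return  # already present
        raise OverflowError("table is full")

    def contains(self, value) -> bool:
        for idx in self._probe(value):
            if self.slots[idx] is None:
                return False  # an empty slot ends the search
            if self.slots[idx] == value:
                return True
        return False  # every slot examined
```

For example, `LinearProbingSet(8, lambda name: ord(name[2].lower()) - ord("a"))` reproduces the eight-element examples that follow. Probing at most m positions guarantees that both operations terminate even when the table is full.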

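Example
An eight-element table using Amy's hash function (3rd letter, a = 0, taken % 8). The letters that map to each position are:
0: a i q y   1: b j r z   2: c k s   3: d l t   4: e m u   5: f n v   6: g o w   7: h p x
Table: [0] Amina  [1] –  [2] –  [3] Andy  [4] Alessia  [5] Alfred  [6] –  [7] Aspen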

Now Suppose Anne Wants to Join
Her index position (5) is filled by Alfred, so we probe to find the next free location, position 6.
Table: [0] Amina  [1] –  [2] –  [3] Andy  [4] Alessia  [5] Alfred  [6] Anne  [7] Aspen

Next Comes Agnes
Her position, 5 (third letter n), is filled by Alfred, and positions 6 and 7 are filled by Anne and Aspen, so we once more probe. When we get to the end of the array, we start again at the beginning. Position 0 is filled by Amina, and we eventually find position 1.
Table: [0] Amina  [1] Agnes  [2] –  [3] Andy  [4] Alessia  [5] Alfred  [6] Anne  [7] Aspen

Finally Comes Alan
Lastly, Alan wants to join. His location, 0, is filled by Amina, and position 1 is filled by Agnes. Probing finds the last free location, position 2. The collection is now completely filled (more on this later).
Table: [0] Amina  [1] Agnes  [2] Alan  [3] Andy  [4] Alessia  [5] Alfred  [6] Anne  [7] Aspen
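The sequence of inserts can be replayed with a small Python trace that records every position probed. This sketch assumes the first five names were inserted in the listed order. None of those five collide, so the order does not affect where they end up. `insert_with_trace` is a hypothetical helper.

```python
def amy_hash(name: str) -> int:
    """3rd letter of the name, with a = 0, b = 1, ..."""
    return ord(name[2].lower()) - ord("a")


def insert_with_trace(table, name):
    """Linear-probing insert that returns the list of positions examined."""
    m = len(table)
    idx = amy_hash(name) % m
    probed = []
    for _ in range(m):
        probed.append(idx)
        if table[idx] is None:
            table[idx] = name
            return probed
        idx = (idx + 1) % m  # wrap around at the end of the array
    raise OverflowError(f"no room for {name}")


table = [None] * 8
for name in ["Amina", "Andy", "Alessia", "Alfred", "Aspen", "Anne", "Agnes", "Alan"]:
    print(name, "probed", insert_with_trace(table, name))
```

Agnes probes positions 5, 6, 7, 0, and 1 before finding room, which shows the wraparound in action.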

Next Operation: Contains Test
Hash to find the initial index, then move forward examining each location until the value is found or an empty location is found (or until every location has been examined, since a full table has no empty location). Search for Amina, search for Anne, search for Albert. Notice that search time is not uniform.
Table: [0] Amina  [1] –  [2] –  [3] Andy  [4] Alessia  [5] Alfred  [6] –  [7] Aspen
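To make the uneven search times concrete, this Python sketch counts probes on the completely filled table from the Alan slide. The choice of that table is an assumption. Because the table is full, the search for Albert has to stop after examining all eight positions rather than at an empty one.

```python
def amy_hash(name: str) -> int:
    return ord(name[2].lower()) - ord("a")


def contains(table, name):
    """Linear-probing search; returns (found, number_of_positions_examined)."""
    m = len(table)
    idx = amy_hash(name) % m
    for probes in range(1, m + 1):
        if table[idx] is None:
            return False, probes
        if table[idx] == name:
            return True, probes
        idx = (idx + 1) % m
    return False, m  # every position examined: the table is full


full = ["Amina", "Agnes", "Alan", "Andy", "Alessia", "Alfred", "Anne", "Aspen"]
for name in ["Amina", "Anne", "Albert"]:
    print(name, contains(full, name))
```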

Final Operation: Remove
Remove is tricky. We can't just replace the entry with null. What happens if we delete Agnes, then search for Alan?
Table: [0] Amina  [1] –  [2] Alan  [3] Andy  [4] Alessia  [5] Alfred  [6] Anne  [7] Aspen
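This short Python sketch answers the question. Alan hashes to 0, finds Amina, and moves on to position 1. Once Agnes has been replaced with `None`, position 1 is empty, so the search stops there and wrongly reports that Alan is absent.

```python
def amy_hash(name: str) -> int:
    return ord(name[2].lower()) - ord("a")


def contains(table, name) -> bool:
    """Linear-probing search that stops at the first empty (None) slot."""
    m = len(table)
    idx = amy_hash(name) % m
    for _ in range(m):
        if table[idx] is None:
            return False
        if table[idx] == name:
            return True
        idx = (idx + 1) % m
    return False


table = ["Amina", "Agnes", "Alan", "Andy", "Alessia", "Alfred", "Anne", "Aspen"]
print(contains(table, "Alan"))  # True
table[1] = None                 # naive removal of Agnes
print(contains(table, "Alan"))  # False, although Alan is still at index 2
```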

How to Handle Remove
Simple solution: just don't do it (this is the approach we will take).
Better: create a tombstone:
– A value that marks a deleted entry
– Can be replaced with a new entry
– But doesn't halt a search
Table: [0] Amina  [1] TOMBSTONE  [2] Alan  [3] Andy  [4] Alessia  [5] Alfred  [6] Anne  [7] Aspen
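Tombstones can be sketched in Python with a unique sentinel object. Searches skip over tombstones, and inserts may reuse them. The `TOMBSTONE`, `find`, `remove`, and `add` names are hypothetical.

```python
TOMBSTONE = object()  # unique marker for a deleted slot


def amy_hash(name: str) -> int:
    return ord(name[2].lower()) - ord("a")


def find(table, name) -> int:
    """Index of name, or -1. Tombstones are skipped; only a truly empty slot stops the search."""
    m = len(table)
    idx = amy_hash(name) % m
    for _ in range(m):
        slot = table[idx]
        if slot is None:
            return -1
        if slot is not TOMBSTONE and slot == name:
            return idx
        idx = (idx + 1) % m
    return -1


def remove(table, name) -> None:
    idx = find(table, name)
    if idx >= 0:
        table[idx] = TOMBSTONE  # keeps later probe chains intact


def add(table, name) -> None:
    """Insert into the first empty or tombstone slot along the probe sequence."""
    if find(table, name) >= 0:
        return  # already present
    m = len(table)
    idx = amy_hash(name) % m
    for _ in range(m):
        if table[idx] is None or table[idx] is TOMBSTONE:
            table[idx] = name
            return
        idx = (idx + 1) % m
    raise OverflowError("table is full")


table = ["Amina", "Agnes", "Alan", "Andy", "Alessia", "Alfred", "Anne", "Aspen"]
remove(table, "Agnes")
print(find(table, "Alan"))  # 2: the tombstone at index 1 does not halt the search
```

Note that `add` checks for an existing copy before reusing a tombstone. Otherwise, a value stored past the tombstone could end up in the table twice.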

Hash Table Size: Load Factor
Load factor: λ = n / m, where n is the number of elements and m is the size of the table
– So the load factor represents the average number of elements at each table entry
– For open address hashing, the load factor is between 0 and 1 (often kept somewhere between 0.5 and 0.75)
– For chaining, the load factor can be greater than 1
We want the load factor to remain small.

What to Do with a Large Load Factor
Common solution: when the load factor becomes too large (say, bigger than 0.75), reorganize:
– Create a new table with twice the number of positions
– Copy each element, rehashing using the new table size, and place it in the new table
– Then delete the old table
This is exactly like what you did with the dynamic array, only this time using hashing.
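Here is a minimal Python sketch of the reorganize step, using the 0.75 threshold and doubling described above. The class and method names, the default capacity of 8, and the set semantics are assumptions. Checking the load factor before each insert ensures an empty slot always exists, so `_place` can probe without a bound.

```python
class ResizingProbeTable:
    """Linear-probing set that doubles its array when the load factor would exceed max_load."""

    def __init__(self, hashfun, capacity=8, max_load=0.75):
        self.hashfun = hashfun
        self.slots = [None] * capacity
        self.count = 0
        self.max_load = max_load

    def load_factor(self) -> float:
        return self.count / len(self.slots)

    def _place(self, slots, value) -> None:
        """Put value in the first empty slot of its probe sequence in the given array."""
        m = len(slots)
        idx = self.hashfun(value) % m
        while slots[idx] is not None:
            idx = (idx + 1) % m
        slots[idx] = value

    def contains(self, value) -> bool:
        m = len(self.slots)
        idx = self.hashfun(value) % m
        for _ in range(m):
            if self.slots[idx] is None:
                return False
            if self.slots[idx] == value:
                return True
            idx = (idx + 1) % m
        return False

    def add(self, value) -> None:
        if self.contains(value):
            return
        if (self.count + 1) / len(self.slots) > self.max_load:
            self._resize()
        self._place(self.slots, value)
        self.count += 1

    def _resize(self) -> None:
        old = self.slots
        self.slots = [None] * (2 * len(old))  # new table with twice the positions
        for value in old:
            if value is not None:
                self._place(self.slots, value)  # rehash with the new table size
```

Every element must be rehashed because its index depends on the table size. Copying the old array position by position would leave elements where later searches cannot find them.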

Hash Tables: Algorithmic Complexity
Assumptions:
– Time to compute the hash function is constant
– Worst-case analysis → all values hash to the same position
– Best-case analysis → the hash function uniformly distributes the values (all buckets have the same number of objects in them)
Find element operation:
– Worst case for open addressing → O(n)
– Best case for open addressing → O(1)
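The two extremes can be compared directly in Python. This sketch builds one table where every value hashes to position 0 and another where every value gets its own position. It then counts the slots examined to find the last value inserted. The table sizes and helper names are arbitrary choices.

```python
def build(values, m, hashfun):
    """Linear-probing inserts into an m-slot array (assumes len(values) < m)."""
    slots = [None] * m
    for v in values:
        idx = hashfun(v) % m
        while slots[idx] is not None:
            idx = (idx + 1) % m
        slots[idx] = v
    return slots


def probes_to_find(slots, value, hashfun):
    """Number of slots examined until value or an empty slot is found."""
    m = len(slots)
    idx = hashfun(value) % m
    for probes in range(1, m + 1):
        if slots[idx] is None or slots[idx] == value:
            return probes
        idx = (idx + 1) % m
    return m


n, m = 50, 64
values = list(range(n))
worst = build(values, m, lambda v: 0)  # every value hashes to position 0
best = build(values, m, lambda v: v)   # every value gets its own position
print(probes_to_find(worst, n - 1, lambda v: 0))  # 50: scans past every element
print(probes_to_find(best, n - 1, lambda v: v))   # 1
```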

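Hash Tables: Average Case
What about the average case? It turns out to be about 1/(1 − λ) probes (an estimate that assumes an idealized hash function that distributes values uniformly). So keeping the load factor small is very important: as λ approaches 1, 1/(1 − λ) grows without bound.
[Figure: 1/(1 − λ)]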
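Evaluating 1/(1 − λ) at a few load factors shows how quickly the cost rises. This Python sketch only computes the formula from the slide (the function name is hypothetical). It is not a measurement of any particular table.

```python
def expected_probes(load_factor: float) -> float:
    """The slide's average-case estimate 1 / (1 - load_factor)."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 1 / (1 - load_factor)


for lam in (0.25, 0.5, 0.75, 0.9, 0.99):
    print(f"load factor {lam:.2f}: about {expected_probes(lam):.1f} probes")
```

Going from λ = 0.5 to λ = 0.75 doubles the estimate from 2 to 4 probes, and λ = 0.9 raises it to 10. This is why the reorganize threshold is set well below 1.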

Your Turn
Complete the implementation of the hash table:
– Use hashfun(value) to get the hash value
– Don't do remove
– Do add and the contains test first, then do the internal reorganize method