CSCI 2720 Hashing   Spring 2005.


Hashing: Motivation, Techniques, Hash functions

Implementing Dynamic Dictionaries: We want a data structure in which finds/searches are very fast, as close to O(1) as possible (a minimum number of executed instructions per method). Inserts and deletes should be fast too. Objects in the dictionary have unique keys. A key may be a single property/attribute value, or it may be created from multiple properties/values.

Hash Tables vs. Other Data Structures: We want to implement the dictionary operations Insert(), Delete() and Search()/Find() efficiently. Arrays can accomplish these in O(1) time but are not space efficient (this assumes we leave empty space for keys not currently in the dictionary). Binary search trees can accomplish them in O(log n) time when kept balanced, and are space efficient. Hash tables are a generalization of an array that, under some reasonable assumptions, takes O(1) time on average for Insert/Delete/Search of a key.

Array Approach – example: Consider a social security application that keeps track of people, where the primary search key is a person's social security number (SSN). You can use an array to hold references to all the person objects: an array with index range 0 to 999,999,999. Using the SSN as a key, you have O(1) access to any person object. Unfortunately, the number of active keys (Social Security Numbers) is much less than the array size (1 billion entries). With an estimated US population of 294,564,209 (Oct. 20th, 2004), at most about 29% of the entries would be in use, so roughly 70% of the array would be unused.

Hash Tables: A very useful data structure. Good for storing and retrieving key-value pairs. Not good for iterating through a list of items in order. Example applications: storing objects according to ID numbers, when the ID numbers are widely spread out and when you don't need to access items in ID order.

Hash Tables – Conceptual View: [Figure: a table of buckets indexed by hash value/index 1 to 7, holding the objects obj1 (key=15), Obj2 (key=30), Obj3 (key=4), Obj4 (key=2) and Obj5 (key=1).]

Hash Tables: Hash tables solve these problems by using a much smaller array and mapping keys to array indices with a hash function. Let U be the universe of keys and let the array have size m. A hash function h is a function from U to the array indices 0…m−1, that is, h : U → {0, 1, …, m−1}. [Figure: keys k1, k2, k3, k4, k6 from the universe U are mapped into a table with slots up to 7: h(k2) = 2, h(k1) = h(k3) = 3, h(k6) = 5, h(k4) = 7.]

Hash index/value: A hash value or hash index is used to index the hash table (array). A hash function takes a key and returns a hash value/index. The hash index is an integer (so that it can index an array). The key is a specific value associated with a specific object being stored in the hash table. It is important that the key remain constant for the lifetime of the object.

Hash Functions & insert(…): Usage summary:
    int hashValue = hashFunction(key);   // key is an int,
                                         // or key is a String,
                                         // or the argument is the item itself (itemType item)
Insert method (this simple version overwrites whatever is already stored at that index):
    public void insert(int key, itemType item) {
        int hashValue = hashFunction(key);
        table[hashValue] = item;
    }

Hash Function: You want a hash function/algorithm that is fast and that creates a good distribution of hash values, so that the items (based on their keys) are distributed evenly through the array. Hash functions can use as input: integer key values, string key values, and multipart key values (multipart fields and/or multiple fields).

The mod function: "mod" stands for modulo. When you divide x by y, you get a result (the quotient) and a remainder; mod is the remainder. For example, 8 mod 5 = 3, 9 mod 5 = 4, 10 mod 5 = 0, and 15 mod 5 = 0. Thus for key-value mod M, all multiples of M give the same result, 0, but multiples of other numbers do not all give the same result. So what happens when M is a prime number and the keys are not multiples of M?
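The following is a minimal sketch of how the mod examples look in Java. One caveat that the slide does not cover: Java's `%` operator yields a negative remainder for a negative left operand, so a helper such as the hypothetical `indexFor` below uses `Math.floorMod` (Java 8 and later) to stay within 0…m−1. The class and method names are illustrative assumptions.

```java
// Illustrative sketch; ModDemo and indexFor are hypothetical names.
class ModDemo {
    // Maps any int key to an index in 0..m-1, assuming m > 0.
    static int indexFor(int key, int m) {
        return Math.floorMod(key, m);
    }

    public static void main(String[] args) {
        System.out.println(8 % 5);            // 3
        System.out.println(10 % 5);           // 0
        System.out.println(-8 % 5);           // -3: not a valid array index
        System.out.println(indexFor(-8, 5));  // 2
    }
}
```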

Hash Tables: Insert Example: For example, if we hash keys in the range 0…1000 into a hash table with 5 entries (indices 0 to 4) and use h(key) = key mod 5, we get the following sequence of events:
Insert 2: h(2) = 2, so key 2 is stored in entry 2.
Insert 21: h(21) = 1, so key 21 is stored in entry 1.
Insert 34: h(34) = 4, so key 34 is stored in entry 4.
Insert 54: h(54) = 4, but entry 4 already holds key 34. There is a collision at array entry 4.

Dealing with Collisions: A problem arises when two keys hash to the same array entry; this is called a collision. There are two ways to resolve collisions. Hashing with Chaining (a.k.a. "Separate Chaining"): every hash table entry contains a pointer to a linked list of the keys that hash to that entry. Hashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry that is already filled, systematically examine other table entries until you find an empty entry in which to place the new key.

Hashing with Chaining: The problem is that keys 34 and 54 hash to the same entry (4). We resolve this collision by placing all keys that hash to the same hash table entry in a chain (linked list) or bucket (array) pointed to by this entry. [Figure: after Insert 54, entry 1 holds 21 and entry 4 points to a chain holding 54 and 34. After Insert 101, entry 1 points to a chain holding 21 and 101.]

Hashing with Chaining: What is the running time to insert/search/delete? Insert: it takes O(1) time to compute the hash function and insert at the head of the linked list. Search: the time is proportional to the length of the linked list, so the worst case is the maximum list length. Delete: same as search. Therefore, in the unfortunate event that we have a "bad" hash function, all n keys may hash to the same table entry, giving an O(n) run time! So how can we create a "good" hash function?
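As a rough illustration of these costs, here is a minimal separate-chaining sketch for int keys. The class name `ChainedHashTable`, its method names, and the use of key mod table size as the hash function are assumptions made for the example, not part of the original slides.

```java
import java.util.LinkedList;

// Minimal separate-chaining sketch for int keys; all names are hypothetical.
class ChainedHashTable {
    private final LinkedList<Integer>[] table;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        table = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            table[i] = new LinkedList<>();
        }
    }

    // Assumed hash function: key mod table size, kept non-negative.
    private int hash(int key) {
        return Math.floorMod(key, table.length);
    }

    // O(1): compute the index and add at the head of the chain.
    // Assumes the caller does not insert duplicate keys.
    void insert(int key) {
        table[hash(key)].addFirst(key);
    }

    // Time proportional to the length of the chain at the key's index.
    boolean search(int key) {
        return table[hash(key)].contains(key);
    }

    // Same cost as search; returns true if the key was present.
    boolean delete(int key) {
        return table[hash(key)].remove(Integer.valueOf(key));
    }

    // Number of keys currently chained at the given index.
    int chainLength(int index) {
        return table[index].size();
    }
}
```

With a table of size 5, inserting 2, 21, 34, 54 and 101 leaves two keys in the chain at index 4 (34 and 54) and two in the chain at index 1 (21 and 101), matching the earlier example.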

Choosing a Hash Function – 1: The performance of the hash table depends on having a hash function that evenly distributes the keys; uniform hashing is the ideal target. Choosing a good hash function requires taking into account the kind of data that will be used, so the statistics of the key distribution need to be accounted for. E.g., using the first letter of a last name as the hash value will likely cause lots of collisions, with the pattern depending on the nationality of the population. Most programming languages (including Java) have hash functions built in.

Choosing a Hash Function – 2: Division/modulo method: index = key mod m, where m is the array size; in general, m should be a prime number. Multiplication method: index = floor(((key * someFraction) mod 1) * arraySize), where "mod 1" keeps only the fractional part of the product and someFraction is typically 0.618. Java HashMap method: create a "hash" by performing a series of shifts, adds, and xors on the key, then index = hash mod arraySize.
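The division and multiplication methods can be sketched as below. This is only an illustration under stated assumptions: keys are non-negative ints, the class and method names are hypothetical, and the constant 0.6180339887 is used as a more precise value of the 0.618 fraction, which is (√5 − 1)/2.

```java
// Illustrative sketch of two hash methods; all names are hypothetical.
class HashMethods {
    // Division method: key mod m, where m is the array size.
    static int division(int key, int m) {
        return Math.floorMod(key, m);
    }

    // Multiplication method: floor(frac(key * A) * arraySize).
    // Assumes key >= 0 so that the fractional part is in [0, 1).
    static int multiplication(int key, int arraySize) {
        final double A = 0.6180339887; // approximately (sqrt(5) - 1) / 2
        double product = key * A;
        double fraction = product - Math.floor(product); // "mod 1"
        return (int) (fraction * arraySize);
    }
}
```

One commonly cited property of the multiplication method is that the array size does not need to be prime, since the spreading comes from the fractional part rather than from the divisor.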

Prime Number Distribution: For example, assume the keys (key values) are multiples of 5 (5, 10, 15, 20, 25, …), evenly distributed from 5 to 245, and that M (the divisor) is 7. Then the hash values will be evenly distributed from 0 to 6 for the keys (see the table on the original slide). If M were 5, what kind of distribution would you have? hash value = key mod M (M is typically the table size).
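The question on this slide can be checked with a short count. The sketch below tallies how many of the keys 5, 10, …, 245 fall into each slot for a given divisor; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch; PrimeDistribution and bucketCounts are hypothetical names.
class PrimeDistribution {
    // Counts how many of the keys 5, 10, ..., 245 land in each of m slots
    // when the hash value is key mod m.
    static int[] bucketCounts(int m) {
        int[] counts = new int[m];
        for (int key = 5; key <= 245; key += 5) {
            counts[key % m]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bucketCounts(7))); // [7, 7, 7, 7, 7, 7, 7]
        System.out.println(Arrays.toString(bucketCounts(5))); // [49, 0, 0, 0, 0]
    }
}
```

With M = 7 the 49 keys spread evenly, seven per slot. With M = 5 every key is a multiple of M, so all 49 keys collide in slot 0.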

Choosing a Hash Function – 3: If keys are non-random (e.g., part numbers), use all of the data to contribute to the hash function to get a better distribution. Consider folding: sum the natural (or arbitrary) groups of digits in the key. Don't use redundant or non-data fields (e.g., checksum values). Do not use information that might change! Analyze your expected key values (or some representative subset) to make sure your hash function gives a good distribution!
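Folding can be sketched as follows. The choice of three-digit groups, the reduction modulo the table size, and the names `Folding` and `foldHash` are assumptions for illustration; the slides do not prescribe a group size.

```java
// Illustrative folding sketch; Folding and foldHash are hypothetical names.
class Folding {
    // Sums the decimal digits of a non-negative key in groups of three
    // (taken from the low-order end), then reduces modulo the table size.
    static int foldHash(long key, int tableSize) {
        long sum = 0;
        while (key > 0) {
            sum += key % 1000; // lowest three digits
            key /= 1000;       // drop them and continue
        }
        return (int) (sum % tableSize);
    }
}
```

For the part number 123456789, the groups are 123, 456 and 789, which sum to 1368; with a table size of 101 the resulting index is 1368 mod 101 = 55.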

Hash Tables – Open Addressing: [Figure: a table indexed by hash value/index 1 to 7 holding obj1 (key=15), Obj2 (key=30, index=4), Obj3 (key=4, index=4), Obj4 (key=2) and Obj5 (key=1); Obj2 and Obj3 both hash to index 4.]

Hashing with Open Addressing: So far we have studied hashing with chaining, which uses a list to store the items that hash to the same location. Another option is to store all the items (references to single items) directly in the table. In open addressing, collisions are resolved by systematically examining other table indexes i0, i1, i2, … until an empty slot is located.

Open Addressing: The key is first mapped to an array cell using the hash function (e.g., key % array-size). If there is a collision, find an available array cell. There are different algorithms to find (to probe for) the next array cell: linear probing, quadratic probing, and double hashing.

Probe Algorithms (Collision Resolution): Linear probing: choose the next available array cell. First try arrayIndex = hash value + 1, then try arrayIndex = hash value + 2, and so on. Be sure to wrap around the end of the array: arrayIndex = (arrayIndex + 1) % arraySize. Stop when you have tried all possible array indices. If the array is full, you need to throw an exception or, better yet, resize the array. Quadratic probing: a variation of linear probing that uses a more complex function to calculate the next cell to try.
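A minimal linear-probing sketch for int keys is shown below. It assumes key mod table size as the hash function, throws when the table is full rather than resizing, and omits deletion, which in open addressing needs extra care such as marking removed cells. All names are hypothetical.

```java
// Illustrative linear-probing sketch; all names are hypothetical.
class LinearProbingTable {
    private final Integer[] table; // null marks an empty cell

    LinearProbingTable(int size) {
        table = new Integer[size];
    }

    // Stores the key in the first free cell at or after its hash index,
    // wrapping around the end of the array. Returns the index used.
    // Assumes the caller does not insert duplicate keys.
    int insert(int key) {
        int index = Math.floorMod(key, table.length);
        for (int tried = 0; tried < table.length; tried++) {
            if (table[index] == null) {
                table[index] = key;
                return index;
            }
            index = (index + 1) % table.length; // wrap around
        }
        throw new IllegalStateException("hash table is full");
    }

    // Follows the same probe sequence; an empty cell means the key is absent.
    boolean search(int key) {
        int index = Math.floorMod(key, table.length);
        for (int tried = 0; tried < table.length; tried++) {
            if (table[index] == null) {
                return false;
            }
            if (table[index] == key) {
                return true;
            }
            index = (index + 1) % table.length;
        }
        return false; // every cell examined
    }
}
```

With a table of size 5, inserting 2, 21 and 34 places them at indices 2, 1 and 4. Inserting 54 then collides at index 4 and wraps around to index 0.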

Double Hashing: Apply a second hash function after the first to determine the probe step. The second hash function, like the first, depends on the key. The secondary hash function must be different from the first and, obviously, must never generate a zero. A good algorithm: arrayIndex = (arrayIndex + stepSize) % arraySize, where stepSize = constant – (key % constant) and constant is a prime less than the array size.
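The step-size formula above can be sketched as follows. The example assumes non-negative keys, key mod array size as the primary hash function, and a prime array size so that every step visits all cells; the class and method names are hypothetical.

```java
// Illustrative double-hashing sketch; all names are hypothetical.
class DoubleHashing {
    // Secondary hash: always in 1..constant, so it is never zero.
    static int stepSize(int key, int constant) {
        return constant - Math.floorMod(key, constant);
    }

    // Returns the first 'count' array indices probed for the given key.
    static int[] probeSequence(int key, int arraySize, int constant, int count) {
        int[] probes = new int[count];
        int arrayIndex = Math.floorMod(key, arraySize); // primary hash
        int step = stepSize(key, constant);
        for (int i = 0; i < count; i++) {
            probes[i] = arrayIndex;
            arrayIndex = (arrayIndex + step) % arraySize;
        }
        return probes;
    }
}
```

For key 54 with an array size of 11 and a constant of 7, the step size is 7 − (54 % 7) = 2 and the first five probes are 10, 1, 3, 5, 7. Two keys that collide at the same primary index usually have different step sizes, which avoids the clustering seen in linear probing.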

Load Factor: Understanding the expected load factor will help you determine the efficiency of your hash table implementation and hash functions. Load factor = number of items in hash table / array size. For open addressing: if the load factor is < 0.5, you are wasting space; if it is > 0.8, overflows (collisions) become significant. For chaining: if the load factor is < 1.0, you are wasting space; if it is > 2.0, the search time to find a specific item may factor significantly into the [relative] performance.

Open Addressing vs. Separate Chaining: When should you be concerned about open addressing and separate chaining implementations? Note that there are hash libraries: Java supports Hashtable, HashMap, LinkedHashMap, HashSet, … But if you are implementing your own hash table, consider: Do you know the total number of items to be inserted into the table? Do you have plenty of memory? Do you know the expected load factor?

Hash Tables in Java: Java supports a number of hash table classes: Hashtable, HashMap, LinkedHashMap, HashSet, … See the Sun Java API Documentation at http://java.sun.com/j2se/1.4.1/docs/api/. Note that, as with Vector and ArrayList, the items that are put into the hash tables are Objects, so use Java casting when you retrieve or remove items! As a programmer, you don't see the collision detection, chaining, etc. You can set the initial table size and the load factor (the default is .75). You can also supply hashCode(), the hash function for the item to be hashed (you also need to override equals()).
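The sketch below shows these settings with `java.util.HashMap`. It uses generics, which were added in Java 5 and remove the need for the casts mentioned above; the key class `PartKey` and the catalog contents are hypothetical, introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical multi-field key; its fields never change after construction.
final class PartKey {
    private final String vendor;
    private final int number;

    PartKey(String vendor, int number) {
        this.vendor = vendor;
        this.number = number;
    }

    // Keys that are equal must produce the same hash code.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PartKey)) return false;
        PartKey that = (PartKey) other;
        return number == that.number && vendor.equals(that.vendor);
    }

    // Uses every field that equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(vendor, number);
    }
}

class HashMapDemo {
    static Map<PartKey, String> buildCatalog() {
        // Initial capacity 16 and load factor 0.75 (the documented defaults).
        Map<PartKey, String> catalog = new HashMap<>(16, 0.75f);
        catalog.put(new PartKey("acme", 42), "widget");
        return catalog;
    }
}
```

Because `PartKey` overrides both methods, a lookup with a freshly constructed but equal key, such as `buildCatalog().get(new PartKey("acme", 42))`, finds the stored value. Without the overrides, the default identity-based versions would treat the two key objects as different.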