Introduction to Hashing CS 311 Winter, 2013

Dictionary Structure A dictionary structure has the form (Key, Data). Dictionary structures are organized in a manner that optimizes search time for the key. Hashing stores dictionary objects in a table where each location has an address.

Key to Address Hashing is called a Key to Address system because the address of a dictionary object is computed directly from the key using a function called the Hash Function. A good hash function should:
– Be easy to calculate.
– Distribute the objects throughout the table with equal probability.
– Minimize collisions.

A Simple Hash Function An example of a simple hash function for a table of size M (locations 0 to M-1), assuming non-negative keys, is: int hash( int key ) { return key % M; } With a good hash function and few collisions, the expected search time is O(1).
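The slide's function can be sketched as compilable C. The table size of 10 is an assumption chosen only for illustration, and the extra step for negative keys is an addition not in the slide: in C, key % M can be negative when key is negative, which would fall outside locations 0 to M-1.

```c
#define M 10  /* assumed table size; locations 0 to M-1 */

/* Map a key to a table address in the range 0..M-1.
   The remainder of a negative key is negative in C, so it is
   shifted back into range before being returned. */
int hash(int key) {
    int h = key % M;
    return h < 0 ? h + M : h;
}
```

With M set to 10, hash(12) gives 2 and hash(-3) gives 7.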

Collision Resolution A collision occurs when two keys result in the same address. When this happens, we must be able to store the second object in a location that can be quickly found starting from the original hash location. The two basic approaches to collision resolution are called open hashing (or Separate Chaining) and closed hashing (or Open Addressing).

Open Hashing Open Hashing means that collisions are resolved by storing the colliding object in a separate area. In essence, the objects that collide form linked lists, where the head of the list is the original hash location. Thus, the name Separate Chaining. One variation of open hashing is called Bucket Hashing.
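A minimal sketch of separate chaining, assuming integer keys and data and a table of 7 slots. The names Node, chain_insert, and chain_find are hypothetical and chosen for illustration. Each table slot is the head of a linked list holding every object that hashes to that location.

```c
#include <stdlib.h>

#define M 7  /* assumed table size */

typedef struct Node {
    int key;
    int data;
    struct Node *next;
} Node;

static Node *table[M];  /* each slot heads a chain; all start as NULL */

static int hash(int key) {
    int h = key % M;
    return h < 0 ? h + M : h;
}

/* Insert (key, data), or update the data if the key is already present.
   Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(int key, int data) {
    int h = hash(key);
    for (Node *n = table[h]; n != NULL; n = n->next) {
        if (n->key == key) {
            n->data = data;
            return 0;
        }
    }
    Node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->key = key;
    n->data = data;
    n->next = table[h];  /* new node becomes the head of the chain */
    table[h] = n;
    return 0;
}

/* Return a pointer to the data stored for key, or NULL if absent. */
int *chain_find(int key) {
    for (Node *n = table[hash(key)]; n != NULL; n = n->next) {
        if (n->key == key) return &n->data;
    }
    return NULL;
}
```

Keys 3 and 10 both hash to location 3 here, so they end up on the same chain, and a search for either walks that chain only.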

Closed Hashing In closed hashing, objects that collide are stored within the hash table itself. This can create an additional problem called a Secondary Collision. Two general methods to resolve collisions in closed hashing are called Probing and Double Hashing.

Probing In probing, the location examined on the i-th attempt is ( hash( key ) + p( i ) ) % M, where i is an iteration value and p(0) = 0. The simplest form of probing is linear probing where p( i ) = i, for i = 0, 1, 2, … A problem with linear probing, however, is that it can cause clustering: runs of occupied slots grow and merge, so later probes take longer.
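Linear probing can be sketched as follows, assuming non-negative integer keys, a table of 11 slots, and -1 as an empty-slot marker. The names table_init, lp_insert, and lp_find are hypothetical. The probe wraps around the table with % M so that every address stays in range.

```c
#define M 11         /* assumed table size */
#define EMPTY (-1)   /* assumed sentinel; keys must be non-negative */

static int table[M];

/* Mark every slot as empty before first use. */
void table_init(void) {
    for (int i = 0; i < M; i++) table[i] = EMPTY;
}

static int hash(int key) { return key % M; }

/* Insert key using p(i) = i. Returns the slot used, or -1 if the table is full. */
int lp_insert(int key) {
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Search for key. Returns its slot, or -1 if it is not in the table.
   An empty slot ends the search because the key would have been placed there. */
int lp_find(int key) {
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;
        if (table[slot] == EMPTY) return -1;
        if (table[slot] == key) return slot;
    }
    return -1;
}
```

Inserting 5, 16, and 27 (all of which hash to 5) fills slots 5, 6, and 7 in turn, which is a small instance of the clustering the slide mentions.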

Probing II Another common approach to probing that avoids the clustering of linear probing is called quadratic probing. In quadratic probing p(i) = i^2, for i = 0, 1, 2, … However, if the table is more than half full or if the table size is not a prime number, it is possible that quadratic probing will not find an open slot even when there is one.
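The caveat about table size can be checked with a short sketch. The helper names quad_probe and quad_reachable are hypothetical, and the fixed 64-entry scratch array is an assumption that limits the sketch to small tables. The second function counts how many distinct slots the quadratic probe sequence can reach from one home address.

```c
/* Slot visited on the i-th probe from address home in a table of size m. */
int quad_probe(int home, int i, int m) {
    return (home + i * i) % m;
}

/* Count the distinct slots visited in the first m probes.
   Sketch only: assumes 0 < m <= 64. */
int quad_reachable(int home, int m) {
    int seen[64] = {0};
    int count = 0;
    for (int i = 0; i < m; i++) {
        int slot = quad_probe(home, i, m);
        if (!seen[slot]) {
            seen[slot] = 1;
            count++;
        }
    }
    return count;
}
```

For a prime size of 11, quad_reachable(0, 11) is 6, just over half the table. For a size of 16, quad_reachable(0, 16) is only 4, so an insertion can fail while 12 slots are still open.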

Double Hashing A problem with probing is that the probe sequence is the same for all colliding keys. An alternative to probing is double hashing. In this case the location examined on the i-th attempt is ( hash1( key ) + i * hash2( key ) ) % M. If the table size is a prime number M and if R is a prime number less than M, then a good choice for hash2 is: hash2( key ) = R – ( key % R )
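A small sketch of the double hashing probe, assuming non-negative keys, M = 13, and R = 7 (both primes picked only for illustration). The function name dh_probe is hypothetical. Because hash2 always returns a value from 1 to R, the step size is never zero.

```c
#define M 13  /* assumed prime table size */
#define R 7   /* assumed prime smaller than M */

static int hash1(int key) { return key % M; }

/* Step size in the range 1..R, so the probe always moves. */
static int hash2(int key) { return R - (key % R); }

/* Slot examined on the i-th probe for key (non-negative keys assumed). */
int dh_probe(int key, int i) {
    return (hash1(key) + i * hash2(key)) % M;
}
```

Keys 14 and 27 collide at slot 1, but their second probes differ: dh_probe(14, 1) is 8 and dh_probe(27, 1) is 2, so the two keys no longer share a probe sequence.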

Load Factor The load factor is defined to be N/M, where N is the number of objects in the table and M is the size of the table. For open hashing, we want the load factor to be close to 1. For closed hashing, we want the load factor to be less than 0.5.

Deletions When deleting an object from a hash table, there are two important considerations. 1. Deleting an object must not hinder later searches. That is, it must not cut off a chain used for probing. 2. A slot freed by a deletion must remain usable. One solution is to use a tombstone.

Tombstones A tombstone is a special marker that states that a slot is free but used to be part of a chain. A search encountering a tombstone keeps going. When inserting and encountering a tombstone, we must continue to the end of the chain before reusing the tombstone to prevent inserting a duplicate value.
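The tombstone rules can be sketched on top of linear probing. This assumes non-negative integer keys, a table of 11 slots, and the sentinels -1 for empty and -2 for a tombstone. The names ts_init, ts_find, ts_remove, and ts_insert are hypothetical.

```c
#define M 11            /* assumed table size */
#define EMPTY     (-1)  /* assumed sentinel: slot never used */
#define TOMBSTONE (-2)  /* assumed sentinel: slot freed by a deletion */

static int table[M];

void ts_init(void) {
    for (int i = 0; i < M; i++) table[i] = EMPTY;
}

static int hash(int key) { return key % M; }

/* Search passes over tombstones and stops only at an empty slot. */
int ts_find(int key) {
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;
        if (table[slot] == EMPTY) return -1;
        if (table[slot] == key) return slot;
    }
    return -1;
}

/* Deletion leaves a tombstone so the chain is not cut off. */
int ts_remove(int key) {
    int slot = ts_find(key);
    if (slot >= 0) table[slot] = TOMBSTONE;
    return slot;
}

/* Insertion remembers the first tombstone but keeps probing to the end
   of the chain, so a key that is already present is never duplicated. */
int ts_insert(int key) {
    int first_free = -1;
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;
        if (table[slot] == key) return slot;      /* already stored */
        if (table[slot] == TOMBSTONE) {
            if (first_free < 0) first_free = slot;
        } else if (table[slot] == EMPTY) {
            if (first_free < 0) first_free = slot;
            break;                                /* end of the chain */
        }
    }
    if (first_free < 0) return -1;                /* table is full */
    table[first_free] = key;
    return first_free;
}
```

After inserting 5, 16, and 27 and then removing 16, a search for 27 still succeeds at slot 7 because it walks past the tombstone in slot 6. A later insert of 38 reuses slot 6.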

Tombstone II Tombstones do lengthen the size of a chain. An alternative to a tombstone is the following. When a value is removed, continue down the chain, swapping the free slot with the next value in the chain. This shortens the chain by one slot and always puts the freed slot at the end of the chain.

Rehashing When a table gets too full or when chains get too long, Rehashing creates another table at least twice as big as the original. This also requires a new hash function. Then, starting from slot 0, each value in the original table is hashed (using the new function) into the new table.
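One way rehashing might look for a closed table with linear probing. This is a sketch under several assumptions: keys are non-negative and distinct, -1 marks an empty slot, the trigger is a load factor above 0.5, and the new size is 2 * size + 1 (a real implementation would more likely pick the next prime). The names Table, table_init, table_rehash, and table_insert are hypothetical. The hash function is key % size, so changing the size gives the new hash function the slide calls for.

```c
#include <stdlib.h>

#define EMPTY (-1)  /* assumed sentinel; keys must be non-negative */

typedef struct {
    int *slots;
    int size;   /* M: number of slots */
    int count;  /* N: number of stored keys */
} Table;

/* Allocate a table of the given size. Returns 0 on success, -1 on failure. */
int table_init(Table *t, int size) {
    t->slots = malloc((size_t)size * sizeof *t->slots);
    if (t->slots == NULL) return -1;
    for (int i = 0; i < size; i++) t->slots[i] = EMPTY;
    t->size = size;
    t->count = 0;
    return 0;
}

/* Place key by linear probing. The caller guarantees a free slot exists. */
static void place(Table *t, int key) {
    int slot = key % t->size;
    while (t->slots[slot] != EMPTY) slot = (slot + 1) % t->size;
    t->slots[slot] = key;
    t->count++;
}

/* Build a larger table and hash every stored key into it, from slot 0 up. */
int table_rehash(Table *t) {
    Table bigger;
    if (table_init(&bigger, 2 * t->size + 1) != 0) return -1;
    for (int i = 0; i < t->size; i++) {
        if (t->slots[i] != EMPTY) place(&bigger, t->slots[i]);
    }
    free(t->slots);
    *t = bigger;
    return 0;
}

/* Insert key, rehashing first if the load factor would exceed 0.5. */
int table_insert(Table *t, int key) {
    if (2 * (t->count + 1) > t->size && table_rehash(t) != 0) return -1;
    place(t, key);
    return 0;
}
```

Starting from 5 slots, keys 1 and 6 land in slots 1 and 2. Inserting 11 would push the load factor past 0.5, so the table grows to 11 slots first, and the three keys end up in slots 1, 6, and 0.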