E.G.M. PetrakisHashing1  Data organization in main memory or disk  sequential, binary trees, …  The location of a key depends on other keys => unnecessary.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables CSC220 Winter What is strength of b-tree? Can we make an array to be as fast search and insert as B-tree and LL?
Hash Tables.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CSCE 3400 Data Structures & Algorithm Analysis
Hashing Techniques.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hashing (Ch. 14) Goal: to implement a symbol table or dictionary (insert, delete, search)  What if you don’t need ordered keys--pred, succ, sort, select?
Hashing Text Read Weiss, §5.1 – 5.5 Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Design and Analysis of Algorithms - Chapter 71 Hashing b A very efficient method for implementing a dictionary, i.e., a set with the operations: – insert.
Introduction to Hashing CS 311 Winter, Dictionary Structure A dictionary structure has the form: (Key, Data) Dictionary structures are organized.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
E.G.M. Petrakissearching1 Searching  Find an element in a collection in the main memory or on the disk  collection: (K 1,I 1 ),(K 2,I 2 )…(K N,I N )
Lecture 11 oct 7 Goals: hashing hash functions chaining closed hashing application of hashing.
Hashing General idea: Get a large array
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Data Structures Hashing Uri Zwick January 2014.
Spring 2015 Lecture 6: Hash Tables
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
Chapter 5: Hashing Collision Resolution: Separate Chaining Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova, Simpson College.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
Comp 335 File Structures Hashing.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
March 23 & 28, Hashing. 2 What is Hashing? A Hash function is a function h(K) which transforms a key K into an address. Hashing is like indexing.
Data Structures and Algorithms Lecture (Searching) Instructor: Quratulain Date: 4 and 8 December, 2009 Faculty of Computer Science, IBA.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
Hash Tables. 2 Exercise 2 /* Exercise 1 */ void mystery(int n) { int i, j, k; for (i = 1; i
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
Hashing. Search Given: Distinct keys k 1, k 2, …, k n and collection T of n records of the form (k 1, I 1 ), (k 2, I 2 ), …, (k n, I n ) where I j is.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Hashing (part 2) CSE 2011 Winter March 2018.
Hash table CSC317 We have elements with key and satellite data
Hashing - resolving collisions
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Hash Tables.
Collision Resolution Neil Tang 02/18/2010
Introduction to Algorithms 6.046J/18.401J
Resolving collisions: Open addressing
CS202 - Fundamental Structures of Computer Science II
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Collision Resolution Neil Tang 02/21/2008
Ch Hash Tables Array or linked list Binary search trees
Collision Handling Collisions occur when different elements are mapped to the same cell.
DATA STRUCTURES-COLLISION TECHNIQUES
Chapter 13 Hashing © 2011 Pearson Addison-Wesley. All rights reserved.
Collision Resolution: Open Addressing Extendible Hashing
Presentation transcript:

E.G.M. PetrakisHashing1  Data organization in main memory or disk  sequential, binary trees, …  The location of a key depends on other keys => unnecessary key comparisons to find a key  Question: find key with a single comparison  Hashing: the location of a record is computed using its key only  Fast for random accesses - slow for range queries

E.G.M. PetrakisHashing2 Hash Table  Hash Function: transforms keys to array indices n h(key) dataindex h(key): Hash Function m

E.G.M. PetrakisHashing3 positionkeyrecord h(key) = key mod 1000

E.G.M. PetrakisHashing4 Good Hash Functions 1. Uniform: distribute keys evenly in space 2. Perfect: two records cannot occupy the same location or 3. Order preserving:  Difficult to find such hash functions  Property 2 is the most essential  Most functions are no better than h(key) = key mod m  Hash collision:

E.G.M. PetrakisHashing5 Collision Resolution 1. Open Addressing (rehashing): compute new position to store the key in the table (no extra space) i. linear probing ii. double hashing 2. Separate Chaining: lists of keys mapped to the same position (uses extra space)

E.G.M. PetrakisHashing6 Open Addressing  Computes a new address to store the key if it is occupied (rehashing)  if occupied too, compute a new address, … until an empty position is found  primary hash function: i=h(key)  rehash function: rh(i)=rh(h(key))  hash sequence: (h 0,h 1,h 2 …) = (h(key), rh(h(key)), rh(rh(h(key)))…)  To find a key follow the same hash sequence

E.G.M. PetrakisHashing7 Example  i=h(key)=key mod 100  rh(i) = (i+1) mod 100  key: 193  i=h(193)=93  rh(i)=(93+1)=94  Key 193 will occupy position

E.G.M. PetrakisHashing8 Problem 1: Locate Empty Positions  No empty position can be found i. the table is full  check on number of empty positions ii. the hash function fails to find an empty position although the table is not full !!  i=h(key) = key mod 1000  rh(i) = (i + 200) mod 1000 => checks only 5 positions on a table of 1000 positions  rh(i) = (i+1) mod 1000 successive positions  rh(i) = (i+c) mod 1000 where GCD(c,m) = 1

E.G.M. PetrakisHashing9 Problem 2: Primary Clustering  Different keys that hash into different addresses compete with each other in successive rehashes  i=h(key) = key mod 100  rh(i) = (i+1) mod 100  keys: 1990, 1991, 1992, 1993, 1994 => 94

E.G.M. PetrakisHashing10 Problem 3: Secondary Clustering  Different keys which hash to the same hash value have the same rehash sequence  i=h(key) = key mod 10  rh(i,j) = (i + j) mod 10 i. key 23 : h(23) = 3 rh = 4, 6, 9, 3, … ii. key 13 : h(13) = 3 rh = 4, 6, 9, 3, …

E.G.M. PetrakisHashing11 Linear Probing  Store the key into the next free position  h 0 = h(key) usually h 0 = key mod m  h i = (h i-1 + 1) mod m, i >= 1 S = {22, 35, 301, 99, 102, 452}

E.G.M. PetrakisHashing12 Observation 1  Different insertion sequences => different hash sequences  S 1 = {11,3,27,99,8,50,77,2 2,12,31,33,40,53}=>28 probes  S 2 ={53,40,33,31,12,22,7 7,50,8,99,27,3,11}=> 30 probes H(key) = key mod 13 number of probes

E.G.M. PetrakisHashing13 Observation 2  Deletions are not easy:  i=h(key) = key mod 10  rh(i) = (i+1) mod 10  Action: delete(65) and search(5)  Problem: search will stop at the empty position and will not find 5  Solution:  mark position as deleted rather than empty  the marked position can be reused

E.G.M. PetrakisHashing14 Observation 3  Linear probing tends to create long sequences of occupied positions  the longer a sequence is, the longer it tends to become  P: probability to use a position in the cluster Β m

E.G.M. PetrakisHashing15 Observation 4  Linear probing suffers from both primary and secondary clustering  Solution: double hashing  uses two hash functions h 1, h 2 and a  rehashing function rh

E.G.M. PetrakisHashing16 Double Hashing  Two hash functions and a rehashing function  primary hash function i=h 1 (key)= key mod m  secondary hash function h 2 (key)  rehashing function: rh(key) = (i + h 2 (key)) mod m  h 2 (m,key) is some function of m, key  helps rh in computing random positions in the hash table  h 2 is computed once for each key!

E.G.M. PetrakisHashing17 Example of Double Hashing i. hash function:  h 1 (key) = key mod m  q = (key div m) mod m ii. rehash function:  rh(i, key) = (i + h 2 (key)) mod m

E.G.M. PetrakisHashing18 Example (continued) A. m = 10,key = 23 h 1 (23) = 3, h 2 (23) = 2 rh(3,2)=(3+2) mod 10 = 5 rehash sequence: 5, 7, 9, 1, … m = 10, key = 13 h 1 (key)=3, h 2 (13)=1, rh(3,1)=(3+1)mod10=4 rehash sequence: 4, 5, 6,…

E.G.M. PetrakisHashing19 Performance of Open Addressing  Distinguish between  successful and  unsuccessful search  Assume a series of probes to random positions  independent events  load factor: λ = n/m  λ: probability to probe an occupied position  each position has the same probability P=1/m

E.G.M. PetrakisHashing20 Unsuccessful Search  The hash sequence is exhausted  let u be the expected number of probes  u equals the expected length of the hash sequence  P(k): probability to search k positions in the hash sequence

E.G.M. PetrakisHashing21

E.G.M. PetrakisHashing22 independent events u increases with λ => performance drops as λ increases

E.G.M. PetrakisHashing23 Successful Search  The hash sequence is not exhausted  the number of probes to find a key equals the number of probes s at the time the key was inserted plus 1  λ was less at that time  consider all values of λ increases with λ u: equivalent to unsuccessful search approximation

E.G.M. PetrakisHashing24 Performance  The performance drops as λ increases  the higher the value of λ is, the higher the probability of collisions  Unsuccessful search is more expensive than successful search  unsuccessful search exhausts the hash sequence

E.G.M. PetrakisHashing25 Experimental Results SUCCESSFUL UNSUCCESSFUL LOAD FACTOR LINEAR i + bkey DOUBLE LINEAR i + bkey DOUBLE 25% % % % %

E.G.M. PetrakisHashing26 Performance on Full Table

E.G.M. PetrakisHashing27 Separate Chaining  Keys hashing to the same hash value are stored in separate lists  one list per hash position  can store more than m records  easy to implement  the keys in each list can be ordered

E.G.M. PetrakisHashing28 h(key) = key mod m

E.G.M. PetrakisHashing29 Performance of Separate Chaining  Depends on the average chain size  insertions are independent events  let P(c,n,m): probability that a position has been selected c times after n insertions on a table of size m  P(c,n,m): probability that the chain has length c => binomial distribution p=1/m: success case q=1-p: failure case

E.G.M. PetrakisHashing30 => P(c,n,m)=(1/c!)λ c e -λ Poison =>

E.G.M. PetrakisHashing31 Unsuccessful Search  The entire chain is searched  the average number of comparisons equals its average length u

E.G.M. PetrakisHashing32 Successful Search  Not the whole chain is searched  the average number of comparisons equals the length s of the chain at time the key was inserted plus 1  the performance at the time a key was inserted equals that of unsuccessful search!

E.G.M. PetrakisHashing33 Performance  The performance drops with the length of the chains  worst case: all keys are stored in a single chain  worst case performance: O(N)  unsuccessful search performs better than successful search!! WHY ?  no problem with deletions!!

E.G.M. PetrakisHashing34 Coalesced Hashing  The hash sequence is implemented as a linked list within the hash table  no rehash function  the next hash position is the next available position in linked list  extra space for the list h(key) = key mod 10 keys: 19, 29, 49, 59

E.G.M. PetrakisHashing35 avail initially: avail = 9 h(key) = key mod 10 keys: 14,29,34,28,42,39,84,38 initialization List of empty positions Holds lists of rehashing positions and list of empty positions

E.G.M. PetrakisHashing36 Performance of Coalesced Hashing  Unsuccessful search  Successful search probes/search