Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Analysis of Algorithms CS 477/677
Hash Tables.
Hash Tables CIS 606 Spring 2010.
Hashing.
CS 253: Algorithms Chapter 11 Hashing Credit: Dr. George Bebis.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
Hashing CS 3358 Data Structures.
Dictionaries and Their Implementations
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
1 Hashing (Walls & Mirrors - end of Chapter 12). 2 I hate quotations. Tell me what you know. – Ralph Waldo Emerson.
Hash Tables How well do hash tables support dynamic set operations? Implementations –Direct address –Hash functions Collision resolution methods –Universal.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 10: Search Structures and Hashing
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
Data Structures Hash Tables. Hashing Tables l Motivation: symbol tables n A compiler uses a symbol table to relate symbols to associated data u Symbols:
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Hashing is another method for sorting and searching data.
Hashing Amihood Amir Bar Ilan University Direct Addressing In old days: LD 1,1 LD 2,2 AD 1,2 ST 1,3 Today: C
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
1 CSC 211 Data Structures Lecture 30 Dr. Iftikhar Azim Niaz 1.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Chapter 11 (Lafore’s Book) Hash Tables Hwajung Lee.
Hash table CSC317 We have elements with key and satellite data
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Hash Table.
CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.
Dictionaries and Their Implementations
Introduction to Algorithms 6.046J/18.401J
Resolving collisions: Open addressing
Hash Tables – 2 Comp 122, Spring 2004.
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
EE 312 Software Design and Implementation I
DATA STRUCTURES-COLLISION TECHNIQUES
DATA STRUCTURES-COLLISION TECHNIQUES
EE 312 Software Design and Implementation I
Hash Tables – 2 1.
Presentation transcript:

Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis

2 The Search Problem Unsorted list –O(N) Sorted list –O(logN) using arrays (i.e., binary search) –O(N) using linked lists Binary Search tree –O(logN) (i.e., balanced tree) –O(N) (i.e., unbalanced tree) Can we do better than this? –Direct Addressing –Hashing

3 Direct Addressing Assumptions: –Key values are distinct –Each key is drawn from a universe U = { 0, 1,..., n - 1 } Idea: –Store the items in an array, indexed by keys

4 Direct Addressing (cont’d) Direct-address table representation: – An array T[0... n - 1] – Each slot, or position, in T corresponds to a key in U – For an element x with key k, a pointer to x will be placed in location T[k] – If there are no elements with key k in the set, T[k] is empty, represented by NIL Search, insert, delete in O(1) time!

5 Direct Addressing (cont’d) Example 1: Suppose that the are integers from 1 to 100 and that there are about 100 records. Create an array A of 100 items and stored the record whose key is equal to i in in A[i]. |K| = |U| |K|: # elements in K |U|: # elements in U

6 Direct Addressing (cont’d) Example 2: Suppose that the keys are 9-digit social security numbers (SSN) Although we could use the same idea, it would be very inefficient (i.e., use an array of 1 billion size to store 100 records) |K| << |U|

7 Hashing Idea: –Use a function h to compute the slot for each key –Store the element in slot h(k) A hash function h transforms a key into an index in a hash table T[0…m-1]: h : U → {0, 1,..., m - 1} We say that k hashes to slot h(k)

8 Hashing (cont’d) U (universe of keys) K (actual keys) 0 m - 1 h(k 3 ) h(k 2 ) = h(k 5 ) h(k 1 ) h(k 4 ) k1k1 k4k4 k2k2 k5k5 k3k3 h : U → {0, 1,..., m - 1} hash table size: m

9 Hashing (cont’d) Example 2: Suppose that the keys are 9-digit social security numbers (SSN)

Advantages of Hashing Reduce the range of array indices handled: m instead of |U| where m is the hash table size: T[0, …, m-1] Storage is reduced. O(1) search time (i.e., under assumptions).

11 Collisions U (universe of keys) K (actual keys) 0 m - 1 h(k 3 ) h(k 2 ) = h(k 5 ) h(k 1 ) h(k 4 ) k1k1 k4k4 k2k2 k5k5 k3k3 Collisions occur when h(k i )=h(k j ), i≠j

12 Collisions (cont’d) For a given set K of keys: –If |K| ≤ m, collisions may or may not happen, depending on the hash function! –If |K| > m, collisions will definitely happen (i.e., there must be at least two keys that have the same hash value) Avoiding collisions completely might not be easy.

13 Handling Collisions We will discuss two main methods: (1) Chaining (2) Open addressing Linear probing Quadratic probing Double hashing

14 Chaining Idea: –Put all elements that hash to the same slot into a linked list –Slot j contains a pointer to the head of the list of all elements that hash to j

15 Chaining (cont’d) How to choose the size of the hash table m? –Small enough to avoid wasting space. –Large enough to avoid many collisions and keep linked-lists short. –Typically 1/5 or 1/10 of the total number of elements. Should we use sorted or unsorted linked lists? –Unsorted Insert is fast Can easily remove the most recently inserted elements

16 Hash Table Operations Search Insert Delete

17 Searching in Hash Tables Alg.: CHAINED-HASH-SEARCH(T, k) search for an element with key k in list T[h(k)] Running time depends on the length of the list of elements in slot h(k)

18 Insertion in Hash Tables Alg.: CHAINED-HASH-INSERT( T, x ) insert x at the head of list T[h(key[x])] T[h(key[x])] takes O(1) time; insert will take O(1) time overall since lists are unsorted. Note: if no duplicates are allowed, It would take extra time to check if item was already inserted.

19 Deletion in Hash Tables Alg.: CHAINED-HASH-DELETE(T, x) delete x from the list T[h(key[x])] T[h(key[x])] takes O(1) time. Finding the item depends on the length of the list of elements in slot h(key[x])

20 Analysis of Hashing with Chaining: Worst Case How long does it take to search for an element with a given key? Worst case: –All n keys hash to the same slot then O(n) plus time to compute the hash function 0 m - 1 T chain

21 Analysis of Hashing with Chaining: Average Case It depends on how well the hash function distributes the n keys among the m slots Under the following assumptions: (1) n = O(m) (2) any given element is equally likely to hash into any of the m slots (i.e., simple uniform hashing property) then  O(1) time plus time to compute the hash function n 0 = 0 n m – 1 = 0 T n2n2 n3n3 njnj nknk

22 Properties of Good Hash Functions Good hash function properties (1)Easy to compute (2) Approximates a random function i.e., for every input, every output is equally likely. (3) Minimizes the chance that similar keys hash to the same slot i.e., strings such as pt and pts should hash to different slot. We will discuss two methods: –Division method –Multiplication method

23 The Division Method Idea: –Map a key k into one of the m slots by taking the remainder of k divided by m h(k) = k mod m Advantage: –fast, requires only one operation Disadvantage: –Certain values of m are bad (i.e., collisions), e.g., power of 2 non-prime numbers

24 Example If m = 2 p, then h(k) is just the least significant p bits of k –p = 1  m = 2  h(k) =, least significant 1 bit of k –p = 2  m = 4  h(k) =, least significant 2 bits of k  Choose m to be a prime, not close to a power of 2  Column 2:  Column 3: {0, 1} {0, 1, 2, 3} k mod 97 k mod 100 m 97 m 100

25 The Multiplication Method Idea: (1) Multiply key k by a constant A, where 0 < A < 1 (2) Extract the fractional part of kA (3) Multiply the fractional part by m (4) Truncate the result h(k) = =  m (k A mod 1)  Disadvantage: Slower than division method Advantage: Value of m is not critical fractional part of kA = kA -  kA 

26 Example – Multiplication Method Suppose k=6, A=0.3, m=32 (1) k x A = 1.8 (2) fractional part: (3) m x 0.8 = 32 x 0.8 = 25.6 (4) h(6)=25

27 Open Addressing Idea: store the keys in the table itself No need to use linked lists anymore Basic idea: –Insertion: if a slot is full, try another one, until you find an empty one. –Search: follow the same probe sequence. –Deletion: need to be careful! Search time depends on the length of probe sequences! e.g., insert 14 probe sequence:

Generalize hash function notation: A hash function contains two arguments now: (i) key value, and (ii) probe number h(k,p), p=0,1,...,m-1 Probe sequence: Example: e.g., insert 14 Probe sequence: =

Generalize hash function notation: –Probe sequence must be a permutation of –There are m! possible permutations e.g., insert 14 Probe sequence: =

30 Common Open Addressing Methods Linear probing Quadratic probing Double hashing None of these methods can generate more than m 2 different probe sequences!

31 Linear probing: Inserting a key Idea: when there is a collision, check the next available position in the table: h(k,i) = (h 1 (k) + i) mod m i=0,1,2,... i=0: first slot probed: h 1 (k) i=1: second slot probed: h 1 (k) + 1 i=2: third slot probed: h 1 (k)+2, and so on How many probe sequences can linear probing generate? m probe sequences maximum probe sequence: wrap around

32 Linear probing: Searching for a key Given a key, generate a probe sequence using the same procedure. Three cases: (1) Position in table is occupied with an element of equal key  FOUND (2) Position in table occupied with a different element  KEEP SEARCHING (3) Position in table is empty  NOT FOUND 0 m - 1 wrap around

33 Linear probing: Searching for a key Running time depends on the length of the probe sequences. Need to keep probe sequences short to ensure fast search. 0 m - 1 wrap around

34 Linear probing: Deleting a key First, find the slot containing the key to be deleted. Can we just mark the slot as empty? –It would be impossible to retrieve keys inserted after that slot was occupied! Solution –“Mark” the slot with a sentinel value DELETED The deleted slot can later be used for insertion. 0 m - 1 e.g., delete 98

35 Primary Clustering Problem Long chunks of occupied slots are created. As a result, some slots become more likely than others. Probe sequences increase in length.  search time increases!! Slot b: 2/m Slot d: 4/m Slot e: 5/m initially, all slots have probability 1/m

36 Quadratic probing i=0,1,2,... Clustering is less serious but still a problem (secondary clustering) How many probe sequences can quadratic probing generate? m -- the initial position determines probe sequence

37 Double Hashing (1) Use one hash function to determine the first slot. (2) Use a second hash function to determine the increment for the probe sequence: h(k,i) = (h 1 (k) + i h 2 (k) ) mod m, i=0,1,... Initial probe: h 1 (k) Second probe is offset by h 2 (k) mod m, so on... Advantage: handles clustering better Disadvantage: more time consuming How many probe sequences can double hashing generate? m 2 -- why?

38 Double Hashing: Example h 1 (k) = k mod 13 h 2 (k) = 1+ (k mod 11) h(k,i) = (h 1 (k) + i h 2 (k) ) mod 13 Insert key 14: i=0: h(14,0) = h 1 (14) = 14 mod 13 = 1 i=1: h(14,1) = (h 1 (14) + h 2 (14)) mod 13 = (1 + 4) mod 13 = 5 i=2: h(14,2) = (h 1 (14) + 2 h 2 (14)) mod 13 = (1 + 8) mod 13 =