

HASH TABLE

Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string.
Example: a group of people could be arranged in a database like this: Allen, Jane; Moore, Sarah; Smith, Dan.

Figure: HASHING. A HASH FUNCTION maps the HASH KEYS (Allen, Jane; Moore, Sarah; Smith, Dan) to the HASH VALUES (7864, 9802, 1990) used in the hash table.

A hash table is associated with a set of records. It stores them and supports three operations: insert, search and delete.

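Figure: a hash function H maps the key John Smith to bucket index 5, where the record (John Smith, 29) is stored. Other records shown: Bob Miller, 34; Sally Wood, 21.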

Each slot of a hash table is called a bucket, and hash values are called bucket indices. Example: BUCKET INDEX 7864 refers to the BUCKET holding Allen, Jane.

HASH FUNCTION: a mapping of the keys to indices of a hash table. It is the composition of two maps:
Hash code map: key -> integer
Compression map: integer -> [0, N-1]
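The two-step composition can be sketched as follows. This is only an illustration: the polynomial hash code with multiplier 31 and the names `hash_code`, `compress` and `hash_function` are assumptions, not part of the slides.

```python
def hash_code(key: str) -> int:
    """Hash code map: key -> integer (polynomial accumulation, multiplier 31 assumed)."""
    code = 0
    for ch in key:
        code = code * 31 + ord(ch)
    return code


def compress(code: int, n: int) -> int:
    """Compression map: integer -> [0, n-1]."""
    return code % n


def hash_function(key: str, n: int) -> int:
    """Hash function = compression map applied to the hash code."""
    return compress(hash_code(key), n)


print(hash_function("Allen, Jane", 10))
```

Any integer-valued hash code would do here; the compression step is what guarantees a valid table index.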

DIVISION
Map a key k into one of m slots by using this function: h(k) = k mod m
Example: if table size m = 12 and key k = 100, then h(100) = 100 mod 12 = 4
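A one-line sketch of the division method, reproducing the example values (the function name `division_hash` is illustrative):

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


print(division_hash(100, 12))  # 4
```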

MID-SQUARE FUNCTION
The key is squared and the mid part is used as the address.
Ex. k = 3121, then 3121^2 = 9740641, thus h(3121) = 406
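A minimal sketch of the mid-square method. It assumes the address is taken as the middle `digits` decimal digits of the square; the function name and the `digits` parameter are illustrative.

```python
def mid_square_hash(k: int, digits: int = 3) -> int:
    """Square the key and return its middle `digits` decimal digits."""
    squared = str(k * k)
    start = (len(squared) - digits) // 2  # left edge of the middle window
    return int(squared[start:start + digits])


print(mid_square_hash(3121))  # 406, since 3121^2 = 9740641
```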

Folding
The key is divided into several parts. There are 2 types:
1. shift folding
2. boundary folding

Shift Folding
Ex. (SSN) 123-45-6789
1. Divide into 3 parts: 123, 456 and 789.
2. Add them. 123 + 456 + 789 = 1368
3. h(k) = k mod M where M = 1000: h(1368) = 1368 mod 1000 = 368
Alternatively:
1. Divide into five parts: 12, 34, 56, 78 and 9.
2. Add them. 12 + 34 + 56 + 78 + 9 = 189
3. h(k) = k mod M where M = 1000: h(189) = 189 mod 1000 = 189
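The shift-folding steps above can be sketched as follows; the function name `shift_fold` and its parameters are illustrative, and non-digit characters such as the dashes in the SSN are simply dropped.

```python
def shift_fold(key: str, part_len: int, m: int) -> int:
    """Split the key's digits into parts of part_len, add them, reduce mod m."""
    digits = "".join(ch for ch in key if ch.isdigit())
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % m


print(shift_fold("123-45-6789", 3, 1000))  # 368
print(shift_fold("123-45-6789", 2, 1000))  # 189
```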

Extraction
Only a part of the key is used to compute the address.
Ex. (SSN) 123-45-6789
1st 4 digits = 1234
Last 4 digits = 6789
1st 2 combined with the last 2 = 1289 (address)
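A small sketch of the last variant (first two digits combined with the last two); the function name `extract_hash` is illustrative.

```python
def extract_hash(key: str) -> int:
    """Use only the first two and last two digits of the key as the address."""
    digits = "".join(ch for ch in key if ch.isdigit())
    return int(digits[:2] + digits[-2:])


print(extract_hash("123-45-6789"))  # 1289
```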

Hash Method: Folding
Chop the key in two parts and add the two parts to generate the hash. The leading digit of the sum will be ignored.
Example:
Key      3205    7148    2345
Parts    32 05   71 48   23 45
H(x)     37      19      68
Option: rotate the digits of the second part.
Parts    32 50   71 84   23 54
H(x)     82      55      77
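A sketch of this two-part folding, assuming four-digit keys and interpreting "leading digit ignored" as keeping the sum modulo 100. The function name and the `rotate` flag are illustrative.

```python
def fold_two_parts(key: int, rotate: bool = False) -> int:
    """Chop a 4-digit key into two 2-digit parts and add them, dropping any carry."""
    text = f"{key:04d}"
    first, second = text[:2], text[2:]
    if rotate:
        second = second[::-1]  # swap the two digits of the second part
    return (int(first) + int(second)) % 100


print([fold_two_parts(k) for k in (3205, 7148, 2345)])               # [37, 19, 68]
print([fold_two_parts(k, rotate=True) for k in (3205, 7148, 2345)])  # [82, 55, 77]
```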

Radix Transformation
K is transformed into another number base.
Ex. 212 (base 10) = 255 (base 9)
M = 100
H(k) = k mod M
H(255) = 255 mod 100 = 55

To convert, divide 212 by 9.
9 divides into 212 23 times with remainder 5: 212 = 9(23) + 5
9 divides into 23 twice with remainder 5: 23 = 9(2) + 5
So 212 = 9(9(2) + 5) + 5 = 2(9^2) + 5(9) + 5.
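The repeated division above can be sketched in code. The helper names `to_base` and `radix_hash` are illustrative; the base-9 digits are read back as a decimal number before the final mod, as in the example.

```python
def to_base(n: int, base: int) -> int:
    """Digits of n in `base` (2..10), read back as a decimal integer."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)  # repeated division, remainders are the digits
        digits.append(str(remainder))
    return int("".join(reversed(digits)) or "0")


def radix_hash(k: int, base: int = 9, m: int = 100) -> int:
    """Radix transformation followed by the division method."""
    return to_base(k, base) % m


print(to_base(212, 9))   # 255
print(radix_hash(212))   # 55
```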

Hash Collision: different keys happen to have the same hash value.

Figure: Collision! Two keys are mapped to the same bucket index (2). Records shown: Bob Miller, 34; Jane Depp, 18; Sally Wood, 21; John Smith, 29.

Collision Resolution
There are two kinds of collision resolution:
1. Chaining makes each entry a linked list so that when a collision occurs the new entry is added to the end of the list.
2. Open Addressing uses probing to discover an empty spot.

Collision Resolution - Open Addressing
The table is probed for an open slot when the first one already has an element.
Linear probing: the interval between probes is fixed, often at 1.
Quadratic probing: the interval between probes increases linearly (hence, the indices are described by a quadratic function).
Double hashing: the interval between probes is fixed for each record but is computed by another hash function.

Linear Probing is a scheme for resolving hash collisions by sequentially searching the hash table for a free location. It takes two values: one as a starting value and one as an interval between successive values.
newLocation = (startingValue + stepSize) % arraySize
H(x,i) = (H(x) + i) mod M

Linear Probing - Example
The table has 10 slots (0 to 9), all empty.
Insert 15, 17, 8:
H(15) = 15 mod 10 = 5
H(17) = 17 mod 10 = 7
H(8) = 8 mod 10 = 8
Result: slot 5 holds 15, slot 7 holds 17, slot 8 holds 8; all other slots are empty.

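Insert 25: H(25) = 25 mod 10 = 5 (occupied); (1 + 5) mod 10 = 6 (free), so 25 goes to slot 6.
Insert 35: H(35) = 35 mod 10 = 5 (occupied); (1 + 5) mod 10 = 6 (occupied); (1 + 6) mod 10 = 7 (occupied); (1 + 7) mod 10 = 8 (occupied); (1 + 8) mod 10 = 9 (free), so 35 goes to slot 9.
Insert 75: H(75) = 75 mod 10 = 5 (occupied); (1 + 5) mod 10 = 6; (1 + 6) mod 10 = 7; (1 + 7) mod 10 = 8; (1 + 8) mod 10 = 9 (all occupied); (1 + 9) mod 10 = 0 (free), so 75 goes to slot 0.
Result: slot 0 holds 75, slot 5 holds 15, slot 6 holds 25, slot 7 holds 17, slot 8 holds 8, slot 9 holds 35; slots 1 to 4 are empty.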
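The insertions above can be replayed with a short sketch of linear probing with step size 1 and H(x) = x mod 10. The function name `linear_probe_insert` and the use of `None` for an empty slot are illustrative choices.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using H(x, i) = (H(x) + i) mod M; return the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is None:  # None marks an empty slot
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in (15, 17, 8, 25, 35, 75):
    linear_probe_insert(table, key)
print(table)  # [75, None, None, None, None, 15, 25, 17, 8, 35]
```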

Has anyone spotted the flaw in the linear probing technique? Think about this: what would happen if we now inserted 85, then 95, then 55?

Each one would probe exactly the same positions as its predecessors. This is known as clustering. It leads to inefficient operations, because it causes the number of collisions to be much greater than it need be.

Quadratic Probing eliminates primary clustering.
p(K, i) = c1 i^2 + c2 i + c3
Simplest case: p(K, i) = i^2 (i.e., c1 = 1, c2 = 0, and c3 = 0)

Quadratic Probing - Example
Table size is 11 (0..10). Hash function: h(x) = x mod 11.
Insert keys:
20 mod 11 = 9
30 mod 11 = 8
2 mod 11 = 2
13 mod 11 = 2 -> 2 + 1^2 = 3
25 mod 11 = 3 -> 3 + 1^2 = 4
24 mod 11 = 2 -> 2 + 1^2 (occupied), 2 + 2^2 = 6
10 mod 11 = 10
9 mod 11 = 9 -> 9 + 1^2 (occupied), 9 + 2^2 mod 11 (occupied), 9 + 3^2 mod 11 = 7
Final table (slot: key): 0: empty, 1: empty, 2: 2, 3: 13, 4: 25, 5: empty, 6: 24, 7: 9, 8: 30, 9: 20, 10: 10
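The example can be checked with a small sketch of quadratic probing using p(K, i) = i^2. The function name `quadratic_probe_insert` and `None` as the empty marker are illustrative.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key at the first free slot (h(key) + i^2) mod M; return the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot on the probe sequence")


table = [None] * 11
for key in (20, 30, 2, 13, 25, 24, 10, 9):
    quadratic_probe_insert(table, key)
print(table)  # [None, None, 2, 13, 25, None, 24, 9, 30, 20, 10]
```

The `RuntimeError` branch can be reached even when the table still has free slots, because the probe sequence does not visit every slot.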

A drawback of quadratic probing: not all hash table slots will be on the probe sequence. Using p(K, i) = i^2 gives particularly inconsistent results. If all slots on that probe cycle happen to be full, the record cannot be inserted at all, even though other slots may be free!

Double Hashing
Instead of stepping by a constant, as in P = (1 + P) mod TABLE_SIZE, increment P by an amount that depends on the key:
P = (P + INCREMENT(Key)) mod TABLE_SIZE

Double Hashing - Example
P = (P + INCR(Key)) mod TABLE_SIZE
Suppose INCR(Key) = 1 + (Key mod 7). Adding 1 guarantees the increment is never 0!
Insert 15, 17, 8 (no collisions: with H(Key) = Key mod 10 they go to slots 5, 7 and 8).

Insert 35: P = H(35) = 5. P = (5 + (1 + 35 mod 7)) mod 10 = 6.
Insert 25: P = H(25) = 5. P = (5 + (1 + 25 mod 7)) mod 10 = 0.

Let's try! Insert 75, using P = (P + INCR(Key)) mod TABLE_SIZE with INCR(Key) = 1 + (Key mod 7).
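A sketch of double hashing under the example's assumptions: H(Key) = Key mod 10, INCR(Key) = 1 + (Key mod 7), and a table of 10 slots. It also works through the "Insert 75" exercise. The function names and the `None` empty marker are illustrative.

```python
def incr(key: int) -> int:
    """Second hash: the step size; adding 1 guarantees it is never 0."""
    return 1 + key % 7


def double_hash_insert(table: list, key: int) -> int:
    """Insert key, stepping by incr(key) after each collision; return the slot used."""
    m = len(table)
    p = key % m
    for _ in range(m):  # give up after m probes
        if table[p] is None:
            table[p] = key
            return p
        p = (p + incr(key)) % m
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 10
for key in (15, 17, 8, 35, 25):
    double_hash_insert(table, key)
print(double_hash_insert(table, 75))  # 1, since (5 + (1 + 75 mod 7)) mod 10 = 1
print(table)  # [25, 75, None, None, None, 15, 35, 17, 8, None]
```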

Chaining/Separate Chaining uses an array as the primary hash table: an array of lists of entries.

Chaining
One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.
Figure: an array with entries 0 to HASHMAX. Most entries point to nil; one entry points to a list node holding Key: 9903030, name: tom, score: 73.
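A minimal sketch of separate chaining. Python lists stand in for the linked lists, and an empty list plays the role of nil; the class and method names are illustrative, and Python's built-in `hash` is assumed as the hash code.

```python
class ChainedHashTable:
    """Array of buckets; each bucket is a list of (key, value) entries."""

    def __init__(self, size: int = 10):
        self.buckets = [[] for _ in range(size)]  # empty list = nil

    def _index(self, key) -> int:
        return hash(key) % len(self.buckets)

    def insert(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:             # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # collision or new key: add to end of list

    def search(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key) -> bool:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


students = ChainedHashTable()
students.insert(9903030, {"name": "tom", "score": 73})
print(students.search(9903030))
```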

Figure: chaining example with the keys 29, 16, 14, 99, 127 and 129.

Coalesced Hashing is a collision resolution method that uses pointers to connect the elements of a synonym chain. It is a hybrid of separate chaining and open addressing: linked lists within the hash table handle collisions. This strategy is effective, efficient and very easy to implement.
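A rough sketch of coalesced hashing, under several assumptions that are not in the slides: integer keys, h(key) = key mod table size, colliding keys placed in the highest-numbered free slot, and -1 used as the "no next element" pointer. All names are illustrative.

```python
class CoalescedHashTable:
    """Open-addressed table whose slots also carry a 'next' pointer."""

    def __init__(self, size: int):
        self.keys = [None] * size
        self.next = [-1] * size   # -1 means end of chain
        self.free = size - 1      # candidate free slot, scanned downwards

    def insert(self, key: int) -> int:
        slot = key % len(self.keys)
        if self.keys[slot] is None:
            self.keys[slot] = key
            return slot
        while self.next[slot] != -1:   # walk to the end of the chain
            slot = self.next[slot]
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1             # find a free slot inside the table
        if self.free < 0:
            raise RuntimeError("hash table is full")
        self.keys[self.free] = key
        self.next[slot] = self.free    # link the new element into the chain
        return self.free

    def search(self, key: int) -> int:
        slot = key % len(self.keys)
        while slot != -1 and self.keys[slot] is not None:
            if self.keys[slot] == key:
                return slot
            slot = self.next[slot]
        return -1


t = CoalescedHashTable(10)
print([t.insert(k) for k in (5, 15, 25, 9)])  # [5, 9, 8, 7]
```

Note how 9 ends up on the same chain as 15 and 25 even though its home slot differs: the chains coalesce, which gives the method its name.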

Figure: coalesced hashing example, shown in three stages. Insert: A5, A2, A3; then B5, A9, B2; then B9, C2.

Figure: a second coalesced hashing table for the same three insertion stages. Insert: A5, A2, A3; then B5, A9, B2; then B9, C2.

Bucket Addressing (using additional space)
A bucket can be defined as a block of space that can be used to store multiple elements that hash to the same position.

Insert: A5, A2, A3, B5, A9, B2, B9
Resulting buckets: position 2: A2, B2; position 3: A3; position 5: A5, B5; position 9: A9, B9

TOMBSTONE DELETION
Deleting a record must not hinder later searches. The search process must still pass through the newly emptied slot to reach records whose probe sequence passed through this slot, so deletion should not mark the slot as empty.
The freed slot should be available to a future insertion.
Both requirements are met by placing a special mark, a TOMBSTONE, in the slot of the deleted record.
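A sketch of tombstone deletion on top of linear probing. The assumptions here are illustrative: integer keys, h(key) = key mod table size, a table of 8 slots, and keys that are never inserted twice. The class and constant names are not from the slides.

```python
EMPTY = None
TOMBSTONE = object()  # special mark left behind by a deletion


class ProbingTable:
    def __init__(self, size: int = 8):
        self.slots = [EMPTY] * size

    def _probe(self, key: int):
        m = len(self.slots)
        for i in range(m):
            yield (key % m + i) % m

    def insert(self, key: int) -> int:
        for slot in self._probe(key):
            # a tombstone slot is available to a future insertion
            if self.slots[slot] is EMPTY or self.slots[slot] is TOMBSTONE:
                self.slots[slot] = key
                return slot
        raise RuntimeError("hash table is full")

    def search(self, key: int) -> int:
        for slot in self._probe(key):
            if self.slots[slot] is EMPTY:
                return -1            # a truly empty slot ends the search
            if self.slots[slot] is not TOMBSTONE and self.slots[slot] == key:
                return slot          # tombstones are passed through
        return -1

    def delete(self, key: int) -> int:
        slot = self.search(key)
        if slot != -1:
            self.slots[slot] = TOMBSTONE
        return slot


t = ProbingTable(8)
for key in (28, 83, 25, 75, 35):
    t.insert(key)
t.delete(83)          # slot 3 becomes a tombstone
print(t.search(35))   # 6: the search passes through the tombstone at slot 3
```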

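Figure: tombstone example on a table with slots 0 to 7. Insert: 28, 83, 25, 75, 35, with collisions resolved along probing sequences that wrap around the table (for example 3 4 5 6 7 0 1 2). After a deletion the slot is marked TOMBSTONE, and a later search still passes through it ("Match Found!").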

Figure: deletion example. Insert: A1, A4, A2, B4, B1; the table is shown before and after a deletion.

Perfect Hash Functions
Quick to compute
Distributes keys uniformly throughout the table
No collisions
Very rare (birthday paradox)

A Perfect Hash Function for Strings
R. J. Cichelli gave an algorithm for finding perfect hash functions for strings. He proposes the hash function:
h(s) = (size + g(s.charAt(0)) + g(s.charAt(size-1))) % n
where size = s.length() and n is the table size. The function g is to be constructed so that h(s) is unique for each string s.

Example 1: Illustrating Perfect Hashing
Use Cichelli's algorithm to build a minimal perfect hash function for the following nine strings: DO, DOWNTO, ELSE, END, IF, IN, TYPE, VAR, WITH

Example 1: Solution
For Step 1 in the algorithm, we find the frequencies of the first and last letters of the words:
Letter     D O E I F N T V R W H
Frequency  3 2 4 2 1 1 1 1 1 1 1
Next, for each word, we find the sum of the frequencies of its first and last letters: DO = 5 (D + O = 3 + 2), DOWNTO = 5, ELSE = 8, END = 7, IF = 3, IN = 3, TYPE = 5, VAR = 2, WITH = 2
Sorting the keywords in decreasing order of this sum yields: ELSE END DOWNTO DO TYPE IN IF VAR WITH
We are now at step 5 of the algorithm, the heart of the algorithm. We try the words in frequency order:

Example 1: Cichelli's Method (cont'd)
(* marks a collision with an already assigned hash value)
s = ELSE: g(E)=0, h(s) = s.length() + g(E) + g(E) = 4
s = END: g(D)=0, h(s) = s.length() + g(E) + g(D) = 3
s = DOWNTO: g(O)=0, h(s) = s.length() + g(D) + g(O) = 6
s = DO: h(s) = s.length() + g(D) + g(O) = 2
s = TYPE: g(T)=0, h(s) = s.length() + g(T) + g(E) = 4*
s = TYPE: g(T)=1, h(s) = s.length() + g(T) + g(E) = 5
s = IN: g(I)=0, g(N)=0, h(s) = s.length() + g(I) + g(N) = 2*
s = IN: g(I)=1, g(N)=0, h(s) = s.length() + g(I) + g(N) = 3*
s = IN: g(I)=2, g(N)=0, h(s) = s.length() + g(I) + g(N) = 4*
s = IN: g(I)=3, g(N)=0, h(s) = s.length() + g(I) + g(N) = 5*
s = IN: g(I)=3, g(N)=1, h(s) = s.length() + g(I) + g(N) = 6*
s = IN: g(I)=3, g(N)=2, h(s) = s.length() + g(I) + g(N) = 7
s = IF: g(F)=0, h(s) = s.length() + g(I) + g(F) = 5*
s = IF: g(F)=1, h(s) = s.length() + g(I) + g(F) = 6*
s = IF: g(F)=2, h(s) = s.length() + g(I) + g(F) = 7*
s = IF: g(F)=3, h(s) = s.length() + g(I) + g(F) = 8

Example 1: Cichelli's Algorithm (cont'd)
Slot  0    1     2   3    4     5     6       7   8
Word  VAR  WITH  DO  END  ELSE  TYPE  DOWNTO  IN  IF
The hash table above is fully occupied, with no empty slots. Note that if there are empty slots or there is a collision, then the g-value assignments are in error.
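The g-values found in the trace can be checked with a short sketch. It assumes n = 9 (one slot per keyword) and uses only the g-values the trace actually shows (E, D, O, T, I, N, F); the values for V, R, W and H are not shown there, so VAR and WITH are left out. The names `G` and `cichelli_hash` are illustrative.

```python
# g-values taken from the trace above; V, R, W, H are not given there
G = {"E": 0, "D": 0, "O": 0, "T": 1, "I": 3, "N": 2, "F": 3}


def cichelli_hash(s: str, g: dict = G, n: int = 9) -> int:
    """h(s) = (length + g(first letter) + g(last letter)) mod n."""
    return (len(s) + g[s[0]] + g[s[-1]]) % n


words = ["ELSE", "END", "DOWNTO", "DO", "TYPE", "IN", "IF"]
slots = {w: cichelli_hash(w) for w in words}
print(slots)
# {'ELSE': 4, 'END': 3, 'DOWNTO': 6, 'DO': 2, 'TYPE': 5, 'IN': 7, 'IF': 8}
```

All seven hash values are distinct, which is the property the g-value search is trying to establish.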

Extendible Hashing: hashing with buckets

Extendible Hashing - Class Example

Figure sequence: extendible hashing with a Directory and Leaf Pages (d = global depth, d1 = local depth). Records 1 to 15 are inserted one after another; the slides mark overflows at records 3, 5, 7, 8, 10 and 13, each followed by a bucket split, and the global depth grows from d = 0 to d = 4. In the final state the directory has 16 entries (0000 to 1111) and the leaf pages have local depths 2, 3 and 4.

Hash Table Uses
driver's license records
Internet search engines
telephone book databases
electronic library catalogs
implementing passwords for systems with multiple users: hash tables allow fast retrieval of the password that corresponds to a given username