
Chapter 5 Hashing: General ideas; methods of implementing the hash table; comparison among these methods; applications of hashing; comparison of hash tables with binary search trees.

5.1 General Ideas: A hash table is a fixed-size array (of size TableSize) containing keys. Each key is mapped to a number in the range 0 to TableSize - 1 and placed in the corresponding cell.

5.1 General Ideas: The mapping is called a hash function. Ideally it should be simple to compute and should ensure that any two distinct keys get different cells. Because there are far more possible keys than cells, this cannot be guaranteed in general, so the hash function should at least distribute the keys evenly among the cells. A collision occurs when 2 or more keys are mapped to the same cell.

5.2 Hash Function: Simple Hash Function. For numeric keys, one simple hash function is Key mod TableSize, where TableSize is a prime number. For example, assume the keys are 9-digit numbers and there are 2500 of them. To reduce collisions, choose the table size so that the load factor is about 50%: select TableSize to be 4999, a prime number close to 5000.
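The division method can be sketched in a few lines of C. This is only an illustration: the function name hash_mod and the sample key 123456789 are assumptions, not taken from the slides; the table size 4999 is the one chosen above.

```c
/* Division-remainder hashing: maps a non-negative numeric key
   to a cell index in the range 0 .. table_size - 1.
   hash_mod is a hypothetical name used only for this sketch. */
unsigned int hash_mod(unsigned long key, unsigned int table_size)
{
    return (unsigned int)(key % table_size);
}
```

With TableSize = 4999, the 9-digit key 123456789 lands in cell 1485. Storing 2500 keys in 4999 cells gives a load factor of 2500 / 4999, which is about 0.5.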

5.2 Hash Function: Hash by Folding. Partition the key into several parts, usually 3 parts of about equal length. The parts are folded over each other and summed. The hash value is the remainder of the sum divided by TableSize.

5.2 Hash Function: Example. Folding is illustrated on a sample key, with TableSize set to a convenient value for ease of illustration.
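A minimal sketch of folding in C follows. It assumes a decimal key split into 3-digit parts and simply adds the parts (shift folding, with no part reversed); the name hash_fold, the sample key 123456789 and the table size 1000 are illustrative assumptions, not values from the slides.

```c
/* Folding: split the decimal key into 3-digit parts, add the parts,
   and reduce the sum modulo table_size.
   Assumption: plain "shift" folding, no part is reversed. */
unsigned int hash_fold(unsigned long key, unsigned int table_size)
{
    unsigned long sum = 0;

    while (key > 0) {
        sum += key % 1000;   /* lowest three decimal digits form one part */
        key /= 1000;         /* move on to the next part */
    }
    return (unsigned int)(sum % table_size);
}
```

For the key 123456789 the parts are 123, 456 and 789, whose sum is 1368; with a table size of 1000 the hash value is 368.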

5.2 Hash Function: Mid-Square Method. The key is multiplied by itself (squared), and the middle few digits of the result are used as the hash value. The exact number of digits to use depends on the size of the table. Example: suppose the middle 3 digits of the squared key are 399. If TableSize is 200, the hash value is 399 mod 200 = 199. Avoid the situation where the middle digits are zeros.
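The mid-square method could be sketched in C as below. The function name mid_square, the way the "middle" is chosen when the digit count does not split evenly, and the sample key 1234 are all assumptions made for illustration.

```c
/* Mid-square: square the key and return the middle `ndigits`
   decimal digits of the square.
   Assumption: when the digits cannot be centred exactly, the
   extra digit is dropped from the high-order side. */
unsigned int mid_square(unsigned int key, unsigned int ndigits)
{
    unsigned long long sq = (unsigned long long)key * key;
    unsigned long long tmp = sq;
    unsigned long long mod = 1;
    unsigned int len = 0, drop, i;

    do {                       /* count the decimal digits of the square */
        len++;
        tmp /= 10;
    } while (tmp > 0);

    drop = (len > ndigits) ? (len - ndigits) / 2 : 0;
    for (i = 0; i < drop; i++) /* discard the low-order digits */
        sq /= 10;
    for (i = 0; i < ndigits; i++)
        mod *= 10;
    return (unsigned int)(sq % mod);  /* keep the next ndigits digits */
}
```

For the key 1234 the square is 1522756 and the middle 3 digits are 227; with TableSize = 200 the final hash value would be 227 mod 200 = 27.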

Hash Function: Character Keys. One simple method to convert keys to numbers is to add up the ASCII values of the characters in the string, e.g., the string HongKong becomes 795 (72 + 111 + 110 + 103 + 75 + 111 + 110 + 103).

Hash Function:

    typedef unsigned int Index;

    /* Fig 5.3 */
    Index Hash1(const char *Key, int TableSize)
    {
        unsigned int HashVal = 0;

        while (*Key != '\0')
            HashVal += *Key++;
        return HashVal % TableSize;
    }
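The additive string hash can be tried out as follows. This sketch repeats Hash1 with one small change that is an assumption of the sketch (each character is read as unsigned char, so bytes above 127 cannot contribute negative values); the table size 10007 is an illustrative prime.

```c
typedef unsigned int Index;

/* Additive string hash, as in Fig 5.3, reading characters as unsigned. */
Index Hash1(const char *Key, int TableSize)
{
    unsigned int HashVal = 0;

    while (*Key != '\0')
        HashVal += (unsigned char)*Key++;
    return HashVal % TableSize;
}
```

Hash1("HongKong", 10007) returns 795. Two weaknesses are easy to observe: strings that are permutations of each other, such as "abc" and "cba", always collide, and short ASCII keys can only reach the low end of a large table because the sum is at most 127 times the string length.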

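Separate Chaining: Keep a list of all elements that hash to the same value. Example: Hash(X) = X mod 10, with new elements inserted at the end of the list, and the data sequence 0, 4, 9, 16, 25, 36, 49, 64, 81.

Separate Chaining: The resulting lists are cell 0: 0; cell 1: 81; cell 4: 4, 64; cell 5: 25; cell 6: 16, 36; cell 9: 9, 49. Cells 2, 3, 7 and 8 are empty.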

Separate Chaining: Type declaration for separate chaining

    /* Fig 5.7 */
    #ifndef _HashSep_H
    #define _HashSep_H

    struct ListNode;
    typedef struct ListNode *Position;
    struct HashTbl;
    typedef struct HashTbl *HashTable;

    HashTable InitializeTable(int TableSize);
    void DestroyTable(HashTable H);
    Position Find(ElementType Key, HashTable H);
    void Insert(ElementType Key, HashTable H);
    ElementType Retrieve(Position P);
    /* Routines such as Delete and MakeEmpty are omitted */

    #endif /* _HashSep_H */

    /* Place in the implementation file */
    struct ListNode
    {
        ElementType Element;
        Position Next;
    };

    typedef Position List;

    /* TheLists is an array of lists, allocated by InitializeTable */
    struct HashTbl
    {
        int TableSize;
        List *TheLists;
    };

Separate Chaining: Initialization routine for separate chaining

    /* Fig 5.8 */
    HashTable InitializeTable(int TableSize)
    {
        HashTable H;
        int i;

        if (TableSize < MinTableSize)
        {
            Error("Table size too small");
            return NULL;
        }

        /* Allocate table */
        H = malloc(sizeof(struct HashTbl));
        if (H == NULL)
            FatalError("Out of space!!!");

        H->TableSize = NextPrime(TableSize);

        /* Allocate array of lists */
        H->TheLists = malloc(sizeof(List) * H->TableSize);
        if (H->TheLists == NULL)
            FatalError("Out of space!!!");

        /* Allocate list headers */
        for (i = 0; i < H->TableSize; i++)
        {
            H->TheLists[i] = malloc(sizeof(struct ListNode));
            if (H->TheLists[i] == NULL)
                FatalError("Out of space!!!");
            else
                H->TheLists[i]->Next = NULL;
        }

        return H;
    }

Separate Chaining: Find routine for separate chaining

    /* Fig 5.9 */
    Position Find(ElementType Key, HashTable H)
    {
        Position P;
        List L;

        L = H->TheLists[Hash(Key, H->TableSize)];
        P = L->Next;   /* skip the header node */
        while (P != NULL && P->Element != Key)
            /* Probably need strcmp!! */
            P = P->Next;
        return P;
    }

Separate Chaining: Insert routine for separate chaining

    /* Fig 5.10 */
    void Insert(ElementType Key, HashTable H)
    {
        Position Pos, NewCell;
        List L;

        Pos = Find(Key, H);
        if (Pos == NULL)   /* Key is not found */
        {
            NewCell = malloc(sizeof(struct ListNode));
            if (NewCell == NULL)
                FatalError("Out of space!!!");
            else
            {
                /* Link the new cell in directly after the header node */
                L = H->TheLists[Hash(Key, H->TableSize)];
                NewCell->Next = L->Next;
                NewCell->Element = Key;   /* Probably need strcpy! */
                L->Next = NewCell;
            }
        }
    }

Separate Chaining: The effort required to perform a search is the constant time required to evaluate the hash function plus the time to traverse the list. The average list length equals the load factor λ (the number of elements divided by TableSize). A successful search requires about 1 + λ/2 links to be traversed. An unsuccessful search requires about 1 + λ links to be traversed.

Separate Chaining A general rule is to make the table size as large as the expected number of elements. Chaining could be through a list or a tree. A disadvantage of separate chaining is that it requires a second data structure for the chains. Time is required for the allocation of new cells on insertion.

Open Addressing: If a collision occurs, alternative cells are tried until an empty cell is found. The cells tried are h_i(X) = (Hash(X) + F(i)) mod TableSize for i = 0, 1, 2, ..., with F(0) = 0. The load factor should be below 0.5. Linear probing tries consecutive locations (with wraparound), i.e., F(i) = i.

Linear Probing Example: Key sequence 89, 18, 49, 58, 69
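The key sequence can be traced with a small linear-probing sketch in C. The transcript does not preserve the table size used on the slide, so a table of 10 cells with Hash(X) = X mod 10 is assumed here; the names EMPTY and linear_insert are also illustrative.

```c
#define EMPTY (-1)   /* marker for an unused cell; keys are assumed non-negative */

/* Insert key using linear probing, F(i) = i.
   Returns the cell where the key was stored, or -1 if the table is full. */
int linear_insert(int table[], int size, int key)
{
    int i;

    for (i = 0; i < size; i++) {
        int pos = (key % size + i) % size;   /* h_i(X) with wraparound */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Under these assumptions 89 goes to cell 9 and 18 to cell 8; 49 collides at cell 9 and wraps around to cell 0; 58 collides at cells 8, 9 and 0 and ends up in cell 1; 69 collides at cells 9, 0 and 1 and ends up in cell 2. The growing run of occupied cells 8, 9, 0, 1, 2 is an instance of primary clustering.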

Linear Probing: Primary clustering. Any key that hashes into a cluster requires several attempts to resolve the collision, and then it adds to the cluster. With load factor λ, the expected number of probes for a successful search is 1/2 (1 + 1/(1 - λ)), and the expected number of probes for an insertion or an unsuccessful search is 1/2 (1 + 1/(1 - λ)^2).

Linear Probing: For a random collision resolution strategy (each probe is independent of the previous probes), the expected number of probes is 1/(1 - λ) for an insertion or an unsuccessful search, and (1/λ) ln(1/(1 - λ)) for a successful search.
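To get a feel for how quickly linear probing degrades, the standard expected-probe formulas can be evaluated for a few load factors. The sketch below uses 1/2 (1 + 1/(1 - λ)) for a successful search and 1/2 (1 + 1/(1 - λ)^2) for an insertion or unsuccessful search under linear probing, and 1/(1 - λ) for an unsuccessful search under random probing; the function names are illustrative.

```c
/* Expected number of probes as a function of the load factor lf (0 <= lf < 1). */

/* Linear probing, successful search. */
double linear_successful(double lf)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - lf));
}

/* Linear probing, insertion or unsuccessful search. */
double linear_unsuccessful(double lf)
{
    return 0.5 * (1.0 + 1.0 / ((1.0 - lf) * (1.0 - lf)));
}

/* Random (independent) probing, insertion or unsuccessful search. */
double random_unsuccessful(double lf)
{
    return 1.0 / (1.0 - lf);
}
```

At λ = 0.5 these give 1.5, 2.5 and 2.0 probes respectively. At λ = 0.9 linear probing needs about 5.5 probes for a successful search and about 50.5 for an insertion, which is why the load factor is kept at or below 0.5.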

Quadratic Probing: Eliminates the primary clustering problem. The collision function is quadratic, e.g., F(i) = i^2. There is no guarantee that all cells are tried. There is no guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime.
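Both the probe sequence and the failure mode for non-prime table sizes can be demonstrated with a short C sketch. The names EMPTY and quadratic_insert, the key sequence 89, 18, 49, 58, 69 and the table sizes 10 and 16 are assumptions chosen for illustration.

```c
#define EMPTY (-1)   /* marker for an unused cell; keys are assumed non-negative */

/* Insert key using quadratic probing, F(i) = i * i.
   Gives up after `size` probes and returns -1; otherwise returns the cell used. */
int quadratic_insert(int table[], int size, int key)
{
    int i;

    for (i = 0; i < size; i++) {
        int pos = (key % size + i * i) % size;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

In a table of 10 cells the keys 89, 18, 49, 58, 69 end up in cells 9, 8, 0, 2 and 3. In a table of 16 cells, i^2 mod 16 only takes the values 0, 1, 4 and 9, so after the keys 0, 1, 4 and 9 are stored, inserting 16 fails even though only a quarter of the table is occupied.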

Quadratic Probing: Type declaration for open addressing

    typedef int ElementType;

    /* Fig */
    #ifndef _HashQuad_H
    #define _HashQuad_H

    typedef unsigned int Index;
    typedef Index Position;

    struct HashTbl;
    typedef struct HashTbl *HashTable;

    HashTable InitializeTable(int TableSize);
    void DestroyTable(HashTable H);
    Position Find(ElementType Key, HashTable H);
    void Insert(ElementType Key, HashTable H);
    ElementType Retrieve(Position P, HashTable H);
    HashTable Rehash(HashTable H);
    /* Delete & MakeEmpty are omitted */

    #endif /* _HashQuad_H */

    /* Place in the implementation file */
    enum KindOfEntry { Legitimate, Empty, Deleted };

    struct HashEntry
    {
        ElementType Element;
        enum KindOfEntry Info;
    };

    typedef struct HashEntry Cell;

    /* Cell *TheCells will be allocated later */
    struct HashTbl
    {
        int TableSize;
        Cell *TheCells;
    };

Quadratic Probing: Routine to initialize open addressing hash table

    /* Fig */
    HashTable InitializeTable(int TableSize)
    {
        HashTable H;
        int i;

        if (TableSize < MinTableSize)
        {
            Error("Table size too small");
            return NULL;
        }

        /* Allocate table */
        H = malloc(sizeof(struct HashTbl));
        if (H == NULL)
            FatalError("Out of space!!!");

        H->TableSize = NextPrime(TableSize);

        /* Allocate array of Cells */
        H->TheCells = malloc(sizeof(Cell) * H->TableSize);
        if (H->TheCells == NULL)
            FatalError("Out of space!!!");

        /* Mark every cell as unused */
        for (i = 0; i < H->TableSize; i++)
            H->TheCells[i].Info = Empty;

        return H;
    }

Quadratic Probing: Routine for hashing with quadratic probing

    /* Fig */
    Position Find(ElementType Key, HashTable H)
    {
        Position CurrentPos;
        int CollisionNum;

        CollisionNum = 0;
        CurrentPos = Hash(Key, H->TableSize);
        while (H->TheCells[CurrentPos].Info != Empty &&
               H->TheCells[CurrentPos].Element != Key)
            /* Probably need strcmp!! */
        {
            CurrentPos += 2 * ++CollisionNum - 1;
            if (CurrentPos >= H->TableSize)
                CurrentPos -= H->TableSize;
        }
        return CurrentPos;
    }

If the table size is prime, a new element can always be inserted if the table is at least half empty.
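The update CurrentPos += 2 * ++CollisionNum - 1 avoids a multiplication and a mod per probe. It works because i^2 - (i - 1)^2 = 2i - 1, so adding successive odd numbers to the home position yields Hash(X) + i^2. The sketch below compares the incremental form with the direct formula; the function names and the table size 11 used in the usage note are assumptions.

```c
/* i-th quadratic probe position computed incrementally, as in Find:
   add 2n - 1 at step n and wrap with a single subtraction.
   The single subtraction is only valid while 2n - 1 < size, which holds
   for the first size / 2 probes. */
unsigned int probe_incremental(unsigned int home, unsigned int i, unsigned int size)
{
    unsigned int pos = home, n = 0;

    while (n < i) {
        pos += 2 * ++n - 1;
        if (pos >= size)
            pos -= size;
    }
    return pos;
}

/* i-th quadratic probe position computed directly from the definition. */
unsigned int probe_direct(unsigned int home, unsigned int i, unsigned int size)
{
    return (home + i * i) % size;
}
```

For a table of 11 cells both functions agree for every home position and every i up to 5; for example, starting at cell 3 the second probe lands in cell (3 + 4) mod 11 = 7.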

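Quadratic Probing: Standard deletion cannot be performed in an open addressing hash table, because the cell might have caused a collision to go past it; emptying it would cut off the probe sequence of keys stored beyond it. Cells are therefore marked Deleted rather than emptied. Secondary clustering problem: elements that hash to the same position probe the same alternative cells.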

Double Hashing: F(i) = i * hash2(X), where hash2(X) must never evaluate to zero. An example is hash2(X) = R - (X mod R), where R is a prime number smaller than TableSize (R = 7 in the slide's example table).
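Double hashing with hash2(X) = R - (X mod R) can be sketched as follows. R = 7 comes from the slide; the table size 10, the key sequence 89, 18, 49, 58, 69 and the names EMPTY, hash2 and double_hash_insert are assumptions made for this illustration.

```c
#define EMPTY (-1)   /* marker for an unused cell; keys are assumed non-negative */

/* Second hash function: always in the range 1 .. r, so never zero. */
int hash2(int key, int r)
{
    return r - (key % r);
}

/* Insert key using double hashing, F(i) = i * hash2(key).
   Gives up after `size` probes and returns -1; otherwise returns the cell used. */
int double_hash_insert(int table[], int size, int r, int key)
{
    int i;
    int step = hash2(key, r);

    for (i = 0; i < size; i++) {
        int pos = (key % size + i * step) % size;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Under these assumptions 89 and 18 go to cells 9 and 8. For 49, hash2 is 7, so it moves from cell 9 to cell 6; for 58, hash2 is 5, so it moves from cell 8 to cell 3; for 69, hash2 is 1, so it moves from cell 9 to cell 0. Note that 10 is not prime; with a non-prime table size some step values cannot reach every cell, which is one reason to keep TableSize prime.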

Rehashing: Build another hash table that is about twice as big, with a new hash function, and reinsert every element of the original table into it. Example: the elements 13, 15, 24, 6 and 23 are inserted into a table of 7 cells using the function h(X) = X mod 7, with linear probing. The table is then rehashed into a table with 17 cells, using h(X) = X mod 17.
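The rehashing example can be reproduced with a small self-contained C sketch. It assumes linear probing in both the old table (7 cells) and the new table (17 cells), and that the old table is scanned from cell 0 upward; the names EMPTY, insert_linear and rehash are illustrative, and the sketch works on plain int arrays rather than the HashTable type.

```c
#define EMPTY (-1)   /* marker for an unused cell; keys are assumed non-negative */

/* Insert key with linear probing; returns the cell used, or -1 if full. */
int insert_linear(int table[], int size, int key)
{
    int i;

    for (i = 0; i < size; i++) {
        int pos = (key % size + i) % size;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Clear the new table, then scan the old table from cell 0 upward
   and reinsert every stored key using the new table size. */
void rehash(const int old[], int old_size, int fresh[], int new_size)
{
    int i;

    for (i = 0; i < new_size; i++)
        fresh[i] = EMPTY;
    for (i = 0; i < old_size; i++)
        if (old[i] != EMPTY)
            insert_linear(fresh, new_size, old[i]);
}
```

Inserting 13, 15, 24, 6, 23 into the 7-cell table leaves 6 in cell 0, 15 in cell 1, 23 in cell 2, 24 in cell 3 and 13 in cell 6. After rehashing into 17 cells, 6 is in cell 6, 23 in cell 7, 24 in cell 8, 13 in cell 13 and 15 in cell 15. The size 17 is the first prime at least twice the old size.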

Rehashing: Rehashing for Open Addressing

    /* Fig 5.22 */
    HashTable Rehash(HashTable H)
    {
        int i, OldSize;
        Cell *OldCells;

        OldCells = H->TheCells;
        OldSize = H->TableSize;

        /* Get a new, empty table */
        H = InitializeTable(2 * OldSize);

        /* Scan through old table, reinsert into new */
        for (i = 0; i < OldSize; i++)
            if (OldCells[i].Info == Legitimate)
                Insert(OldCells[i].Element, H);

        free(OldCells);

        return H;
    }

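Rehashing: Rehashing is an O(N) operation, but it is needed only once per N/2 insertions, so it adds only a constant cost to each insertion on average. It can, however, slow down interactive operations, because the one insertion that triggers a rehash takes noticeably longer than the others.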

Rehashing: Strategies for implementing rehashing with quadratic probing
–rehash as soon as the table is half full
–rehash only when an insertion fails
–rehash when the load factor reaches a certain threshold

Chapter 5 Summary: Hash tables can be used to implement the Insert and Find operations in constant average time. For separate chaining, the load factor should be close to 1. For open addressing, the load factor should not exceed 0.5.

Chapter 5 Summary: Rehashing can be implemented to allow the table to grow. Comparison between binary search trees and hash tables
–it is difficult to find the minimum (or maximum) element in a hash table
–a hash table cannot find a range of elements
–O(log N) is not necessarily that much more than O(1), since search trees require no multiplications or divisions
–sorted input can make binary search trees perform poorly

Chapter 5 Summary: Some applications of hash tables
–files for which records are not required to be arranged in a particular order
–compilers use symbol tables to keep track of declared variables (no delete operation)
–on-line spelling checkers can store an entire dictionary in a hash table