
CS121 Data Structures CS121 © JAS 2004

Tables

An abstract table, T, contains table entries that are either empty or pairs of the form (K, I), where K is a key and I is some information associated with key K.

The operations we may require are:

1. Initialise the table, T, to be the empty table. The empty table can be seen as filled with entries (K0, I0), where K0 is a special empty key, distinct from all nonempty keys.
2. Determine whether or not the table, T, is full.
3. Insert a new entry (K, I) into the table, T, provided T is not already full.
4. Delete the table entry (K, I) from the table, T.
5. Given K, retrieve the information, I, from the entry (K, I).
6. Update the table entry (K, I) in table, T, by replacing it with a new table entry (K, I').
7. Enumerate the table entries (K, I) in table, T, in increasing order of their keys, K.

Implementing a Table using a (Binary Search) Tree

Initialise: create empty tree
Empty: test for empty tree (root is nil)
Full: tree is full only when memory is exhausted
Insertion: add entry to tree
Deletion: delete a node (and reshape tree)
Retrieve: search tree
Update: search for entry and modify it
Enumerate: traverse tree in appropriate order (in-order for increasing keys)

Costs
Initialise, Empty: O(1)
Insert, Delete, Retrieve, Update: O(log n), provided the tree stays reasonably balanced (a degenerate tree gives O(n))
Enumerate: O(n)
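The table operations listed for a search tree can be tried out with Java's `java.util.TreeMap`, which is a balanced (red-black) search tree. This is only a sketch: the class name `TreeTableDemo` and the stored values are made up for illustration, and the slides do not say which tree implementation the course uses.

```java
import java.util.Map;
import java.util.TreeMap;

class TreeTableDemo {
    // Builds a small table keyed by word; the integer values are arbitrary illustration data.
    static TreeMap<String, Integer> build() {
        TreeMap<String, Integer> table = new TreeMap<>(); // Initialise: empty tree
        table.put("dog", 26);   // Insert
        table.put("cat", 24);
        table.put("test", 64);
        table.put("cat", 25);   // Update: replaces the information held for "cat"
        table.remove("dog");    // Delete
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> table = build();
        System.out.println(table.get("cat"));   // Retrieve
        // Enumerate: the entry set of a TreeMap is traversed in increasing key order.
        for (Map.Entry<String, Integer> e : table.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```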

Hash Tables

Principle: calculate a value (the hash value) from the key to determine the position of the entry in the table.

If the key field is alphabetic, a simple hash function is to associate an integer value with each letter, sum the values, and return the sum modulo the table size.

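Associate the value 1 with a, 2 with b, and so on up to 26 with z. Let the table size be 10 (indexes 0..9).

cat = 3 + 1 + 20 = 24; 24 mod 10 = 4
dog = 4 + 15 + 7 = 26; 26 mod 10 = 6
test = 20 + 5 + 19 + 20 = 64; 64 mod 10 = 4

So cat is stored at index 4 and dog at index 6; test also hashes to index 4, which cat already occupies.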
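The letter-sum hash from this example can be sketched as follows. The class and method names are hypothetical, and ignoring characters outside a..z is an assumption the slides do not cover.

```java
class LetterSumHash {
    // Sum of letter codes with a=1 ... z=26; other characters are ignored (an assumption).
    static int letterSum(String key) {
        int sum = 0;
        for (char ch : key.toLowerCase().toCharArray()) {
            if (ch >= 'a' && ch <= 'z') {
                sum += ch - 'a' + 1;
            }
        }
        return sum;
    }

    // Hash address: the letter sum reduced modulo the table size.
    static int hash(String key, int tableSize) {
        return letterSum(key) % tableSize;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"cat", "dog", "test"}) {
            System.out.println(word + ": sum " + letterSum(word) + ", address " + hash(word, 10));
        }
    }
}
```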

Unless our table is big enough to contain all possible entries, we will get collisions.

Probability of Collisions

The von Mises Birthday Paradox: for 23 or more people, there is a greater than 50% chance that two or more will have the same birthday. For 88 or more, there is a greater than 50% chance that three or more will have the same birthday.

Hence for a 365-entry table (based on birthdays), after 23 insertions there is a greater than 50% chance that at least one insertion has resulted in a collision. This is counter-intuitive because the table is in fact only 23/365 = 6.3% full. The fraction of the table that is occupied is known as the load factor.

The Theory

For M people, let P(M) be the probability that at least two share a birthday and Q(M) the probability that no two do, so Q(M) = 1 - P(M).

For M = 1: $Q(1) = 1$ (nobody to share with!)
For M = 2: $Q(2) = 364/365$ (only 364 days are not the other person's birthday)
For M = 3: $Q(3) = (364 \times 363) / 365^{2}$
For M > 1: $Q(M) = \dfrac{364 \times 363 \times \cdots \times (366 - M)}{365^{M-1}}$

P(22) = 0.476, P(23) = 0.507
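The product for Q(M) is easy to evaluate numerically. The sketch below generalises 365 to an arbitrary number of slots and assumes every slot is equally likely; the class and method names are hypothetical.

```java
class BirthdayCollision {
    // Q(M): probability that m keys, each landing uniformly at random in one of
    // `slots` cells, all land in different cells.
    static double noCollision(int m, int slots) {
        double q = 1.0;
        for (int i = 1; i < m; i++) {
            q *= (double) (slots - i) / slots;
        }
        return q;
    }

    // P(M) = 1 - Q(M): probability of at least one collision.
    static double collision(int m, int slots) {
        return 1.0 - noCollision(m, slots);
    }

    public static void main(String[] args) {
        for (int m = 20; m <= 25; m++) {
            System.out.printf("M = %d  P = %.3f%n", m, collision(m, 365));
        }
    }
}
```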

When a collision occurs we need to find an alternative space for the entry.

The simplest method is a linear search: look forward (wrapping round at the end of the table) for the next empty space. This is open addressing with linear probing.
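A minimal sketch of the linear search for the next empty space, reusing the letter-sum hash and the table size of 10 from the earlier example. The names are hypothetical, and returning -1 for a full table is an assumption made to keep the example short.

```java
class LinearProbeDemo {
    // Letter-sum key value with a=1 ... z=26 (lowercase keys assumed).
    static int letterSum(String key) {
        int sum = 0;
        for (char ch : key.toCharArray()) {
            sum += ch - 'a' + 1;
        }
        return sum;
    }

    // Stores key in the first free cell at or after its home address, wrapping round.
    // Returns the index used, or -1 if every cell is occupied.
    static int insert(String[] table, String key) {
        int index = letterSum(key) % table.length;
        for (int probes = 0; probes < table.length; probes++) {
            if (table[index] == null) {
                table[index] = key;
                return index;
            }
            index = (index + 1) % table.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] table = new String[10];
        for (String word : new String[] {"cat", "dog", "test"}) {
            System.out.println(word + " stored at " + insert(table, word));
        }
    }
}
```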

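Primary clustering

With a linear search, entries that collide build up contiguous runs of occupied spaces. Any key that hashes into a run must be probed to the end of the run, and then makes the run longer.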

The solution to clustering is to use a different search path for different key values: double hashing.

If the initial hash function is based on the remainder, then a typical second hash would be based on the quotient:

if a <> b and a mod t = b mod t then a div t <> b div t

Recall our initial example and let the rehash be division by the table size.

cat: 24 mod 10 = 4
dog: 26 mod 10 = 6
test: 64 mod 10 = 4 (occupied by cat); 64 div 10 = 6, so the probe step is 6 and test goes to (4 + 6) mod 10 = 0
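The probe path produced by the remainder/quotient pair can be listed with a few lines of Java. This is a sketch with hypothetical names; replacing a step of 0 by 1 is an assumption (the slides do not say what to do when the quotient is a multiple of the table size).

```java
import java.util.Arrays;

class DoubleHashProbe {
    // Returns the first `count` addresses probed for a numeric key value:
    // home address = key mod tableSize, step = key div tableSize (reduced mod tableSize).
    static int[] probeSequence(int keyValue, int tableSize, int count) {
        int index = keyValue % tableSize;
        int step = (keyValue / tableSize) % tableSize;
        if (step == 0) {
            step = 1;   // assumption: a zero step would never leave the home address
        }
        int[] sequence = new int[count];
        for (int i = 0; i < count; i++) {
            sequence[i] = index;
            index = (index + step) % tableSize;
        }
        return sequence;
    }

    public static void main(String[] args) {
        System.out.println("cat:  " + Arrays.toString(probeSequence(24, 10, 3)));
        System.out.println("test: " + Arrays.toString(probeSequence(64, 10, 3)));
    }
}
```

Although cat and test share the home address 4, their steps differ (2 and 6), so their probe paths separate after the first probe.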

The same principles of hashing and collision resolution are used for both insertion and retrieval/update.

The series of entries that are examined to see if they are empty is called the probe sequence.

Insertion terminates when we find an empty entry to insert into.

Retrieval/update terminates when we find the entry we are looking for, reach an empty entry, or have searched the whole table.

We need to ensure that all possible entries are probed. This is obvious for a linear probe sequence (increment by 1). For a double-hash probe sequence it is less obvious unless the table size is a prime number (or the table size is a power of two and the probe step is odd); in general the step must share no common factor with the table size.

To check that the whole table has been searched we can count the number of probes.

An alternative is to define a full table as one with one empty space. A search thus ends when an empty space has been found. This requires the insertion routine to ensure that it does not fill the last space.
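Whether a fixed probe step reaches every entry can be checked directly. The sketch below (hypothetical names) counts the distinct slots a probe sequence visits and compares that with the greatest-common-divisor test: the sequence covers the table exactly when the step and the table size have no common factor.

```java
class ProbeCoverage {
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // True when repeatedly adding `step` (mod tableSize) reaches every slot.
    static boolean visitsEverySlot(int tableSize, int step) {
        return gcd(tableSize, step) == 1;
    }

    // Brute-force count of the distinct slots reached from `start`.
    static int slotsVisited(int tableSize, int start, int step) {
        boolean[] seen = new boolean[tableSize];
        int count = 0;
        int index = start;
        while (!seen[index]) {
            seen[index] = true;
            count++;
            index = (index + step) % tableSize;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("size 10, step 6: " + slotsVisited(10, 4, 6) + " slots");
        System.out.println("size 11, step 6: " + slotsVisited(11, 4, 6) + " slots");
    }
}
```

With table size 10 and step 6 (the rehash for test in the earlier example) only the five even-numbered slots are ever probed, which is why a prime table size is the usual choice.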

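Some code: Hashing by Open Addressing

public class OpenAddress {

    // A table entry: a key plus the information associated with it.
    public static class Entry {
        final Object key;
        Object info;
        public Entry(Object key, Object info) { this.key = key; this.info = info; }
    }

    public static class OAException extends Exception {
        public OAException(String message) { super(message); }
    }

    private final Entry[] elt;      // the table; null marks an empty cell
    private final int tableSize;    // should be prime (and at least 2) so every cell is probed

    public OpenAddress(int tableSize) {
        this.tableSize = tableSize;
        elt = new Entry[tableSize]; // Java initialises every cell to null
    }

    // The slides do not define Hash and Hash2; these two are assumed stand-ins.
    // Primary hash: remainder on division by the table size.
    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    // Secondary hash: based on the quotient, forced into the range 1..tableSize-1.
    private int hash2(Object key) {
        return 1 + Math.floorMod(Math.floorDiv(key.hashCode(), tableSize), tableSize - 1);
    }

    public void store(Entry newElt) throws OAException {
        int index = hash(newElt.key);
        int probe = hash2(newElt.key);
        for (int cnt = 0; cnt < tableSize; cnt++) {   // count probes to detect a full table
            if (elt[index] == null) { elt[index] = newElt; return; }
            if (elt[index].key.equals(newElt.key)) throw new OAException("already in");
            index = (index + probe) % tableSize;
        }
        throw new OAException("table full");
    }

    public Entry retrieve(Object searchKey) throws OAException {
        int index = hash(searchKey);
        int probe = hash2(searchKey);
        for (int cnt = 0; cnt < tableSize; cnt++) {
            if (elt[index] == null) throw new OAException("not in table");
            if (elt[index].key.equals(searchKey)) return elt[index];
            index = (index + probe) % tableSize;
        }
        throw new OAException("not in table");
    }
}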

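An alternative approach is to use chaining: each table entry heads a linked list of the entries that hash to that address. In the running example, cat and test share the chain at address 4 and dog is on the chain at address 6.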
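Chaining can be sketched with one linked list per table address. The class below is a hypothetical illustration that stores only keys and reuses the letter-sum hash from the running example; a real table would store (K, I) pairs.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTable {
    private final List<LinkedList<String>> chains = new ArrayList<>();

    ChainedTable(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            chains.add(new LinkedList<>());   // Initialise: every chain starts empty
        }
    }

    // Letter-sum key value with a=1 ... z=26 (lowercase keys assumed).
    static int letterSum(String key) {
        int sum = 0;
        for (char ch : key.toCharArray()) {
            sum += ch - 'a' + 1;
        }
        return sum;
    }

    private LinkedList<String> chainFor(String key) {
        return chains.get(letterSum(key) % chains.size());
    }

    // Insert: hash to a chain, then search it before appending.
    boolean insert(String key) {
        LinkedList<String> chain = chainFor(key);
        if (chain.contains(key)) {
            return false;
        }
        chain.add(key);
        return true;
    }

    boolean contains(String key) { return chainFor(key).contains(key); }

    // Delete: simply unlink the key from its chain.
    boolean delete(String key) { return chainFor(key).remove(key); }

    int chainLength(int address) { return chains.get(address).size(); }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(10);
        for (String word : new String[] {"cat", "dog", "test"}) {
            table.insert(word);
        }
        System.out.println("chain 4 holds " + table.chainLength(4) + " keys");
    }
}
```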

Operations on a table

Initialisation
- Chaining: set chain pointers to nil, O(n)
- Open addressing: set keys to empty key, O(n)

Insert, Retrieval, Update
- Chaining: use hash function, then search linked list, O(s) where s is the average list length
- Open addressing: use hash functions to find entry, O(k) where k is some constant based on the complexity of the hash functions (the number of probes also grows with the load factor)

Deletion
- Chaining: delete entry from list, O(s)
- Open addressing: must mark the entry as having been deleted (rather than empty) to avoid stopping a later search early, O(k)

Enumeration
- Not normally feasible in key order
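The need for a "deleted" mark in open addressing shows up as soon as two keys collide. The sketch below uses linear probing and a private sentinel object as the mark; all names are hypothetical, and re-using marked cells on insertion is a common choice rather than something the slides prescribe.

```java
class TombstoneTable {
    // Sentinel compared by identity, so it can never be confused with a stored key.
    private static final String DELETED = new String("<deleted>");
    private final String[] cells;   // null = never used

    TombstoneTable(int size) { cells = new String[size]; }

    // Letter-sum key value with a=1 ... z=26 (lowercase keys assumed).
    static int letterSum(String key) {
        int sum = 0;
        for (char ch : key.toCharArray()) {
            sum += ch - 'a' + 1;
        }
        return sum;
    }

    // Returns the index of key, or -1. A deleted mark does NOT stop the search.
    int find(String key) {
        int index = letterSum(key) % cells.length;
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[index] == null) {
                return -1;   // truly empty: the key cannot be further along
            }
            if (cells[index] != DELETED && cells[index].equals(key)) {
                return index;
            }
            index = (index + 1) % cells.length;
        }
        return -1;
    }

    boolean insert(String key) {
        if (find(key) >= 0) {
            return false;    // already in
        }
        int index = letterSum(key) % cells.length;
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[index] == null || cells[index] == DELETED) {
                cells[index] = key;   // deleted cells may be reused
                return true;
            }
            index = (index + 1) % cells.length;
        }
        return false;        // table full
    }

    boolean delete(String key) {
        int index = find(key);
        if (index < 0) {
            return false;
        }
        cells[index] = DELETED;   // mark, do not empty
        return true;
    }

    public static void main(String[] args) {
        TombstoneTable table = new TombstoneTable(10);
        table.insert("cat");
        table.insert("test");     // collides with cat, lands one cell on
        table.delete("cat");
        System.out.println("test found at " + table.find("test"));
    }
}
```

Had delete simply reset the cell to empty, the search for test would stop at cat's old cell and wrongly report that test is not in the table.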

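Design of Hash functions

Step one: generate a key value (preconditioning)

(1) Using codes for letters

a = 1 codes, summed: cat = 3 + 1 + 20 = 24; tac = 20 + 1 + 3 = 24
ASCII codes, summed: cat = 99 + 97 + 116 = 312; tac = 116 + 97 + 99 = 312
Positional (a = 1 codes weighted by powers of 128):
cat = $3 \times 128^{2} + 1 \times 128^{1} + 20 \times 128^{0} = 49300$
tac = $20 \times 128^{2} + 1 \times 128^{1} + 3 \times 128^{0} = 327811$

The two sums cannot tell cat from tac; the positional value can.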
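The three letter-code preconditioning schemes can be compared in a short sketch. The names are hypothetical and lowercase a..z keys are assumed; the positional version weights the a = 1 codes by powers of 128.

```java
class Preconditioning {
    // Letter code with a=1 ... z=26 (lowercase a..z assumed).
    static int code(char ch) {
        return ch - 'a' + 1;
    }

    // Sum of the a=1 letter codes: anagrams give the same value.
    static int letterSum(String key) {
        int sum = 0;
        for (char ch : key.toCharArray()) {
            sum += code(ch);
        }
        return sum;
    }

    // Sum of the ASCII codes: anagrams still give the same value.
    static int asciiSum(String key) {
        int sum = 0;
        for (char ch : key.toCharArray()) {
            sum += ch;
        }
        return sum;
    }

    // Positional value: each letter code is weighted by a power of 128,
    // so the order of the letters matters.
    static long positional(String key) {
        long value = 0;
        for (char ch : key.toCharArray()) {
            value = value * 128 + code(ch);
        }
        return value;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"cat", "tac"}) {
            System.out.println(word + ": " + letterSum(word) + ", "
                    + asciiSum(word) + ", " + positional(word));
        }
    }
}
```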

(2) Using bit patterns

cat = (bit pattern not preserved in the transcript)
tac = (bit pattern not preserved in the transcript)

If the bit patterns get too long, then a section of the bit pattern can be selected for use by the hash function.

Step two: generate an address from the key value

Division by table size (take the remainder)

Folding: the key is divided into sections, and the sections are added together.
- if Key = (value not preserved in the transcript); hash(Key) = ... = 537
- addition, subtraction and multiplication could be used

Midsquare: square the key and select a number of digits (or bits) from the middle
- if Key = 12345; $Key^{2} = 152399025$; select the middle three digits: 399

Truncate: select the last n digits
- if Key = (value not preserved in the transcript); select the last 3 digits: 122
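Folding, midsquare and truncation can each be sketched in a few lines working on decimal digits. The names are hypothetical. Only the midsquare key 12345 comes from the slide; the keys 123456789 and 987122 used for folding and truncation are invented for illustration because the slide's own values are missing.

```java
class AddressGeneration {
    static long pow10(int digits) {
        long p = 1;
        for (int i = 0; i < digits; i++) {
            p *= 10;
        }
        return p;
    }

    // Folding: split the key into sections of `sectionDigits` digits
    // (taken from the right) and add the sections together.
    static long fold(long key, int sectionDigits) {
        long base = pow10(sectionDigits);
        long sum = 0;
        while (key > 0) {
            sum += key % base;
            key /= base;
        }
        return sum;
    }

    // Midsquare: square the key and take `digits` digits from the middle.
    static int midSquare(long key, int digits) {
        String square = Long.toString(key * key);
        if (square.length() <= digits) {
            return Integer.parseInt(square);
        }
        int start = (square.length() - digits) / 2;
        return Integer.parseInt(square.substring(start, start + digits));
    }

    // Truncation: keep only the last `digits` digits.
    static long truncate(long key, int digits) {
        return key % pow10(digits);
    }

    public static void main(String[] args) {
        System.out.println(midSquare(12345, 3));        // key from the slide
        System.out.println(fold(123456789L, 3));        // hypothetical key
        System.out.println(truncate(987122L, 3));       // hypothetical key
    }
}
```

The result of folding or midsquare may still exceed the table size, so in practice it would be reduced again, for example by taking the remainder on division by the table size.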

Hash Table Reordering

When a hash table is nearly full, many items are not at the locations given by their hash addresses, so many comparisons are made to locate them; and if an item is not present, an entire list of rehash positions must be searched.

To address this problem the table can be reordered.

Ordered Hash Tables (Amble and Knuth)

All items that hash to the same address are maintained in descending order of the key. Assume that a nil entry is less than all possible keys.

A search can then terminate whenever a key less than the one being searched for is found.

On insertion, when a collision occurs the keys are compared: the larger key keeps the slot, and the rehash/search for free space continues with the smaller key.

H(cat) = (3 + 1 + 20) mod 9 = 24 mod 9 = 6; rh = 24 div 9 = 2
H(act) = (1 + 3 + 20) mod 9 = 24 mod 9 = 6; rh = 24 div 9 = 2
H(tac) = (20 + 1 + 3) mod 9 = 24 mod 9 = 6; rh = 24 div 9 = 2

(Figure: table holding cat, tac and act.)
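Ordered insertion and the early-terminating search can be sketched as below. To keep the example short it uses linear probing (step 1) instead of the rehash step of 2 shown above, so the slots differ from the slide's figure; that substitution, the class name and the method names are all assumptions. One cell is always left empty so that every search terminates.

```java
class OrderedHashTable {
    private final String[] cells;   // null = empty, treated as less than every key
    private int count = 0;

    OrderedHashTable(int size) { cells = new String[size]; }

    // Letter-sum key value with a=1 ... z=26 (lowercase keys assumed).
    static int letterSum(String key) {
        int sum = 0;
        for (char ch : key.toCharArray()) {
            sum += ch - 'a' + 1;
        }
        return sum;
    }

    boolean insert(String key) {
        if (count >= cells.length - 1) {
            throw new IllegalStateException("table full");
        }
        int index = letterSum(key) % cells.length;
        while (cells[index] != null) {
            int cmp = cells[index].compareTo(key);
            if (cmp == 0) {
                return false;             // already in
            }
            if (cmp < 0) {                // resident key is smaller: it gives up the slot
                String displaced = cells[index];
                cells[index] = key;
                key = displaced;          // carry on with the smaller key
            }
            index = (index + 1) % cells.length;
        }
        cells[index] = key;
        count++;
        return true;
    }

    // Stops as soon as an empty cell or a key smaller than the search key is met.
    boolean contains(String key) {
        int index = letterSum(key) % cells.length;
        while (cells[index] != null && cells[index].compareTo(key) > 0) {
            index = (index + 1) % cells.length;
        }
        return key.equals(cells[index]);
    }

    String at(int index) { return cells[index]; }

    public static void main(String[] args) {
        OrderedHashTable table = new OrderedHashTable(9);
        for (String word : new String[] {"cat", "act", "tac"}) {
            table.insert(word);
        }
        System.out.println(table.at(6) + " " + table.at(7) + " " + table.at(8));
    }
}
```

Whatever order the three keys arrive in, they end up in descending order along the probe path from address 6, so a search for an absent key such as cta stops at the first smaller key rather than at the next empty cell.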

Ordered Hash Tables reduce the average number of probes needed to determine whether an entry is in the table, because an unsuccessful search can stop early.

However, the number of probes for insertion is not reduced and equals the number required for an unsuccessful search in an unordered table.

Also, ordered tables require significant data movement.

Brent's Method

Works with double hashing.

Rehash the search key until an empty slot is found.

Then consider each key on the rehash path and determine whether moving it to an empty slot on its own rehash path would require fewer rehashes overall.

If so, place the search key in that position and move the existing key to its empty rehash slot.


Brent's method reduces the average number of probes for a successful search, but not for an unsuccessful search. Insertion is also more complex.

Brent's method can be applied recursively: the displaced item is again inserted using Brent's method (checking keys on its rehash path for possible replacement), but if carried to the extreme this would yield unacceptably long insertion times.

Consider a hash function that uses the initial letter of the key word and inserts into a table of size 26.

If collisions are resolved by chaining, we have an index table, indexed by initial letter.

If the chains are replaced by the same hash structure, using the second letter of the key word as the hash function, we have a 26-ary tree.

With an index table (or a 26-ary tree!) we regain the property that we can list the entries in sorted form.

However, it is also possible to combine hash tables, trees and index tables.

E.g. use an index table according to first letter, then use hash tables for each letter sub-table.
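A 26-ary tree of this kind, where each level is indexed by the next letter of the key word, can be sketched as follows. The class name is hypothetical and lowercase a..z keys are assumed. Visiting the children in index order lists the stored words in sorted form.

```java
import java.util.ArrayList;
import java.util.List;

class LetterTree {
    private final LetterTree[] child = new LetterTree[26];  // one slot per letter
    private boolean endOfWord;                              // a key word ends at this node

    // Each letter in turn acts as the "hash" into the next level's table of 26.
    void insert(String word) {
        LetterTree node = this;
        for (char ch : word.toCharArray()) {
            int index = ch - 'a';   // assumes lowercase a..z
            if (node.child[index] == null) {
                node.child[index] = new LetterTree();
            }
            node = node.child[index];
        }
        node.endOfWord = true;
    }

    // Enumerates the stored words in increasing (alphabetical) order.
    List<String> sorted() {
        List<String> out = new ArrayList<>();
        collect("", out);
        return out;
    }

    private void collect(String prefix, List<String> out) {
        if (endOfWord) {
            out.add(prefix);
        }
        for (int i = 0; i < 26; i++) {
            if (child[i] != null) {
                child[i].collect(prefix + (char) ('a' + i), out);
            }
        }
    }

    public static void main(String[] args) {
        LetterTree tree = new LetterTree();
        for (String word : new String[] {"dog", "cat", "test", "car"}) {
            tree.insert(word);
        }
        System.out.println(tree.sorted());
    }
}
```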
