
Hashing 8 April 2003

2 Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each student uniquely identified by a student number. The student numbers currently range from about 1,000,000 to above 9,999,999, so an array of 10 million elements would be enough to hold all possible student numbers. Given that each student record is at least 100 bytes long, we would require an array of about 1,000 megabytes to do this.

3 Example - 2 There are fewer than 400 students enrolled in CS at present, so there must be a better way. We could keep a sorted array of 400 elements and retrieve students using binary search, but we want our access to be as fast as possible. In this situation we would use a hash table.

4 Example - 3 Find some way to transform a student number from the range of several million values to a range closer to 400, while avoiding (as much as possible) the case where two numbers transform (or hash) to the same value. We then place the records, according to their transformed keys, into a new array (or hash table) containing at least 400 elements.

5 Example - 4 Make the hash table 479 elements long (479 is prime, which helps the mod operator spread keys evenly). A popular method for transforming keys is to use the mod operator: take the remainder upon integer division of the original key by the size of the hash table.

6 Example - 5 For example, consider student number 949,786,456: 949,786,456 % 479 = 348. Therefore we should place this student in array element 348 of the hash table. (Note: the mod operator is effective because its result can only be a value in the range 0 to 478.)
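The transformation on this slide is a one-liner; here is a minimal Python sketch (the function name is illustrative, not from the slides):

```python
TABLE_SIZE = 479  # size of the hash table from the example

def hash_student(student_number):
    # The remainder upon integer division is always in the range 0..478.
    return student_number % TABLE_SIZE

print(hash_student(949786456))  # 348, as computed on the slide
```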

7 Direct Access Table If we have a collection of n elements whose keys are unique integers in (1,m), where m >= n, then we can store the items in a direct address table, T[m], where T[i] is either empty or contains one of the n elements. Searching a direct address table is an O(1) operation: – for a key k, we access T[k]; 1. if it contains an element, return it, 2. if it doesn't, return NULL. – There are two constraints: 1. the keys must be unique, and 2. the range of the keys must be severely bounded.
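A direct address table can be sketched in a few lines of Python. This version uses 0-based keys 0..m-1 rather than the slide's (1,m), and the class name is illustrative:

```python
class DirectAddressTable:
    """Keys are unique integers in 0..m-1; each slot holds one element or None."""

    def __init__(self, m):
        self.slots = [None] * m

    def insert(self, key, value):
        self.slots[key] = value      # O(1)

    def search(self, key):
        return self.slots[key]       # O(1); None plays the role of NULL

    def delete(self, key):
        self.slots[key] = None       # O(1)
```

For example, `DirectAddressTable(10)` gives ten slots, and every operation is a single array access.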

8 Direct Access Table

9 Using Linked Lists If the keys are not unique, then we can construct a set of m lists and store the heads of these lists in the direct address table. The time to find the head of the appropriate list is still O(1), but if the maximum number of duplicates is n_dup_max, then searching for a specific element is O(n_dup_max).

10 Using Linked Lists If duplicates are the exception rather than the rule, then n_dup_max is much smaller than n and a direct address table will provide good performance. But if n_dup_max approaches n, then the time to find a specific element approaches O(n), and some other structure such as a tree will be more efficient.

11 Using Linked Lists

12 Analysis The range of the keys determines the size of the direct address table, and this may be too large to be practical. For instance, it's not likely that you'll be able to use a direct address table to store elements which have arbitrary 32-bit integers as their keys for a few years yet! Direct addressing is easily generalized to the case where there is a function h(k) → (1,m) which maps each value of the key k to the range (1,m). In this case, we place the element in T[h(k)] rather than T[k], and we can search in O(1) time as before.

13 Mapping Functions The direct address approach requires that the function h(k) is a one-to-one mapping from each k to integers in (1,m). Such a function is known as a perfect hashing function: it maps each key to a distinct integer within some manageable range, and lets us build an O(1) search-time table. Finding a perfect hashing function is not always possible. Sometimes we can find a hash function which maps most of the keys onto unique integers but maps a small number of keys onto the same integer. If the number of collisions is sufficiently small, then hash tables work well and give O(1) search times.

14 Handling Collisions In cases where multiple keys map to the same integer, elements with different keys may be stored in the same "slot" of the hash table; that is, more than one element may belong in a single slot of the table. Techniques used to manage this problem are:
– chaining
– overflow areas
– re-hashing
– using neighboring slots (linear probing)
– quadratic probing
– random probing

15 Chaining One simple scheme is to chain all collisions in lists attached to the appropriate slot. This allows an unlimited number of collisions to be handled and doesn't require a priori knowledge of the number of elements. The tradeoff is the same as with linked-list versus array implementations of sets: linked lists incur overhead in space and, to a lesser extent, in time.

16 Chaining

17 How Chaining Works To insert a new item in the table, we hash the key to determine which list the item goes on, then insert the item at the beginning of that list. (For example, to insert 11 into a table of size 8, we divide 11 by 8, giving a remainder of 3; thus, 11 goes on the list starting at HashTable[3].) To find an item, we hash its key and then follow the links down the corresponding list to see if it is present.

18 How Chaining Works-2 To delete a number, we find it and remove its node from the appropriate linked list. Entries in the hash table are dynamically allocated and entered on a linked list associated with each hash table slot. Alternative methods, in which all entries are stored in the hash table itself, are known as direct or open addressing.
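The insert/find/delete operations described on the last two slides can be sketched as follows (a hypothetical Python class; as in the slides' example, keys are hashed with mod into a table of size 8):

```python
class ChainedHashTable:
    def __init__(self, size=8):
        self.size = size
        self.table = [[] for _ in range(size)]   # one (initially empty) list per slot

    def _slot(self, key):
        return key % self.size                   # e.g. 11 % 8 == 3

    def insert(self, key):
        # Insert at the beginning of the chosen list, as on the slide.
        self.table[self._slot(key)].insert(0, key)

    def find(self, key):
        # Hash, then follow the chain down the list.
        return key in self.table[self._slot(key)]

    def delete(self, key):
        # Find the key and remove its node from the appropriate list.
        chain = self.table[self._slot(key)]
        if key in chain:
            chain.remove(key)
```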

19 Re-hashing Re-hashing schemes use a second hashing operation when there is a collision. If there is a further collision, we re-hash until an empty “slot” in the table is found. The re-hashing function can either be a new function or a re-application of the original one. As long as the functions are applied to a key in the same order, then a sought key can always be found.

20 Re-Hashing

21 Linear probing One of the simplest re-hashing functions is +1 (or -1), i.e., on a collision, look in the neighboring slot in the table. It calculates the new address extremely quickly.

22 Open Addressing 1. Linear Probing In linear probing, when a collision occurs, the new element is put in the next available spot (essentially doing a sequential search). Example: insert 49, 18, 89, 48 into a hash table of size 10: 49 % 10 = 9, 18 % 10 = 8, 89 % 10 = 9 (collision), 48 % 10 = 8 (collision).

23 Open Addressing After the four insertions (49, then 18, then 89, then 48), the table looks like this:
[0] 89  (third insert: 89 collides at 9, wraps around to 0)
[1] 48  (fourth insert: 48 collides at 8, probes 9 and 0, lands at 1)
[2]–[7] empty
[8] 18  (second insert)
[9] 49  (first insert)
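A minimal sketch of linear-probing insertion that reproduces the example above (the function name is illustrative):

```python
def linear_probe_insert(table, key):
    # Start at key % len(table); on a collision, step to the next slot,
    # wrapping around to the front of the table.
    m = len(table)
    i = key % m
    while table[i] is not None:
        i = (i + 1) % m
    table[i] = key

table = [None] * 10
for key in (49, 18, 89, 48):
    linear_probe_insert(table, key)
# Final state: table[9] == 49, table[8] == 18, table[0] == 89, table[1] == 48
```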

24 Problems In linear probing, records tend to cluster around each other: once an element is placed in the hash table, the chances of its adjacent slot being filled are doubled (it may be filled either by a collision or directly). If two adjacent slots are filled, then the chance of the next slot being filled is three times that of a slot with no occupied neighbors.

25 Animation from the Web The animation gives you a practical demonstration of the effect of linear probing; it also implements a quadratic re-hash function so that you can see the differences. 0/hash_tables.html

26 Clustering Linear probing is subject to a clustering phenomenon. Re-hashes from one location occupy a block of slots in the table which “grows” towards slots and blocks to which other keys hash. This exacerbates the collision problem and the number of re-hashes can become large.

27 Quadratic Probing Better behavior is usually obtained with quadratic probing, where the secondary hash function depends on the re-hash index: address = h(key) + c·i^2 on the i-th re-hash. (A more complex function of i can be used.) Quadratic probing is susceptible to secondary clustering, since keys which have the same hash value also have the same probe sequence; however, secondary clustering is not nearly as severe as the clustering caused by linear probing.
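A sketch of quadratic probing under the formula above, with c = 1 (names are illustrative; note that, unlike linear probing, quadratic probing is not guaranteed to examine every slot):

```python
def quadratic_probe_insert(table, key, c=1):
    m = len(table)
    h = key % m
    for i in range(m):
        slot = (h + c * i * i) % m   # address = h(key) + c*i^2 on the i-th re-hash
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found within m probes")

table = [None] * 10
quadratic_probe_insert(table, 49)   # lands in slot 9
quadratic_probe_insert(table, 89)   # slot 9 is taken; i = 1 gives (9 + 1) % 10 == 0
```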

28 Overflow Area When a collision occurs, a slot in an overflow area is used for the new element, and a link from the primary slot is established, as in a chained system. This is essentially the same as chaining, except that the overflow area is pre-allocated and thus may be faster to access. As with re-hashing, the maximum number of elements must be known in advance, but in this case two parameters must be estimated: the optimum sizes of the primary and overflow areas.

29 Overflow Area

30 Comparison
Chaining
– Advantages: unlimited number of elements; unlimited number of collisions
– Disadvantages: overhead of multiple linked lists
Re-hashing
– Advantages: fast re-hashing; fast access through use of main table space
– Disadvantages: maximum number of elements must be known; multiple collisions may become probable
Overflow area
– Advantages: fast access; collisions don't use primary table space
– Disadvantages: two parameters which govern performance need to be estimated

31 Hash Functions If the hash function is uniform (it distributes the data keys equally among the hash table indices), then hashing effectively subdivides the list to be searched. Worst-case behavior occurs when all keys hash to the same index. Why? Because the table then degenerates into a single list and every search takes O(n). It is therefore important to choose a good hash function.

32 Choosing Hash Functions Choice of h: h[x] must be simple and must distribute (spread) the data evenly. Choice of m: m should approximate n (about one item per linked list), where n is the input size.

33 Mod Function Consider a three-digit hash for phone numbers, where x is an integer value and h[x] = x mod m. Choosing the last three digits (738 in the slide's example) is more appropriate than the first three digits (398), as it distributes the data more evenly. The mod function extracts these directly: h[x] = x mod 10^k gives the last k digits, and h[x] = x mod 2^k gives the last k bits.
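Both extractions are direct in Python (the phone number below is a made-up value ending in 738, since the slide's full example number is not shown):

```python
def last_k_digits(x, k):
    return x % 10 ** k

def last_k_bits(x, k):
    return x % 2 ** k            # equivalently: x & ((1 << k) - 1)

print(last_k_digits(5551738, 3))   # 738
print(last_k_bits(0b101101, 3))    # 5, i.e. the low bits 101
```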

34 Middle Digits of an Integer This often yields unpredictable (and thus good) distributions of the data. Assume that you wish to take the two digits three positions from the right of x. If x = … then h[x] = 72. This is obtained by h[x] = (x/1000) mod 100, where x/1000 drops the last three digits and the subsequent mod 100 keeps two digits.
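In Python, integer division makes the extraction explicit (the value of x below is hypothetical, chosen to have 7 and 2 in the required positions, since the slide's own value is missing):

```python
def middle_two_digits(x):
    # x // 1000 drops the last three digits; % 100 keeps the next two.
    return (x // 1000) % 100

print(middle_two_digits(4172345))  # 72
```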

35 Order Preserving Hash Function x < y implies h[x]<= h[y] Application: Sorting

36 Perfect Hashing Function A perfect hashing function is one that causes no collisions. Perfect hashing functions can be found only under certain conditions. One application of a perfect hash function is a static dictionary, where h[x] is designed after having peeked at the data.

37 Retrieval Retrieving a record works the same way as insertion: take the key value, perform the same transformation as for insertion, and then look up the resulting index in the hash table.

38 Issues There are two basic issues when designing a hash algorithm:
– choosing the best hash function
– deciding what to do with collisions

39 Hash Function Strategies If the key is an integer and there is no reason to expect a non-random key distribution, then the modulus operator is a simple, efficient, and effective method. If the key is a string value (e.g. someone's name or C++ reserved words), then it first needs to be transformed into an integer.
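One common way to do that transformation (a sketch, not necessarily what these slides went on to present) is to map each letter to a small code and combine the codes positionally, then reduce the result with the modulus operator as before. All names and the base constant here are illustrative:

```python
def string_to_int(s, base=31):
    # Map each lowercase letter to 0..25 and fold it in positionally
    # (a polynomial hash; base 31 is a conventional choice).
    h = 0
    for ch in s:
        h = h * base + (ord(ch) - ord('a'))
    return h

def hash_string(s, table_size):
    # Reduce the integer to a table index, as with integer keys.
    return string_to_int(s) % table_size

print(hash_string("key", 479))
```

This sketch assumes lowercase ASCII input; a production hash would handle the full character set and guard against overflow in languages with fixed-width integers.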