
1 Hashing (Walls & Mirrors - end of Chapter 12)

2 I hate quotations. Tell me what you know. – Ralph Waldo Emerson

3 Overview
– Hashing
– Data with Multiple Organizations

4 Hashing
Basic idea: define a function that, given an item’s search key, determines the position in a table where the item should be stored. No search-key comparisons are required. In the ideal case, where no two keys map to the same position, finding an item this way takes O(1) time, which is even better than the O(log N) time required by a minimum-height binary search tree!

5 Hashing: Definitions
Recall that a table is an Abstract Data Type (ADT) in which items are stored and retrieved according to their search-key values.
– A hash function is a function that maps the search key of an item into a table location that will contain the item.
– A hash table is an array that contains table items in the locations assigned by a hash function.

6 Hashing: Example
Suppose that flight information for an airline (e.g. origin, destination, departure time, arrival time, available seats, etc.) is to be stored in a table by flight number. If the flight numbers are 3-digit numbers, ranging from 100 to 999, then one might simply store the information for flight k in position k of array a, namely a[k]. However, if flight information must be maintained in an air traffic control system for all airlines serving an airport, array a may become very large, with many empty positions where no flight number has been assigned.

7 Hashing: Choosing a Hash Function
A solution to this problem is to provide a hash function, h, that maps an airline’s flight number into a valid position in a “reasonably-sized” array. (We shall discuss choosing an appropriate size for this array later.) If array a has size N, namely a[0..N–1], then for flight k a simple and effective choice of h is
h(k) = k mod N
For example, if N = 1000, then flight 1234 would be stored in position 1234 mod 1000 of array a, namely a[234]. Note that for any k ≥ 0, 0 ≤ (k mod N) ≤ N–1. Therefore, this approach is guaranteed to produce a valid index for array a[0..N–1].
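A minimal Python sketch of h(k) = k mod N for the flight example. The function name h, the table name a, and the string stored for flight 1234 are illustrative placeholders, not part of the slides.

```python
def h(k: int, n: int) -> int:
    """Map a nonnegative key k to an index in a[0..n-1] using h(k) = k mod n."""
    return k % n


N = 1000
a = [None] * N                          # the hash table a[0..N-1]
a[h(1234, N)] = "info for flight 1234"  # lands in a[234]
print(h(1234, N))                       # 234
```

Because Python’s % returns a result in 0..n-1 for any nonnegative k and positive n, every computed index is valid for a.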

8 Choosing a Hash Function (Cont’d.)
To be effective, a hash function must
a) be fast to compute, and
b) distribute items evenly throughout the hash table (array).
Various hash functions have been proposed, including
– Selecting digits: h(1234) = 23 (select the middle two digits),
– Folding: h(1234) = 12 + 34 = 46.
Research shows that the hash functions best at achieving objective (b)
– involve the entire search key, and
– if h(k) = k mod N is used, have N chosen to be a prime number.
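The two example functions can be sketched as below, assuming 4-digit integer keys as in the slide’s h(1234) examples. The function names are hypothetical, and in practice the result may still need a final mod N to fit the table.

```python
def select_middle_digits(k: int) -> int:
    """Digit selection for a 4-digit key: keep the middle two digits (1234 -> 23)."""
    return (k // 10) % 100


def fold(k: int) -> int:
    """Folding for a 4-digit key: add the two 2-digit halves (1234 -> 12 + 34 = 46)."""
    return k // 100 + k % 100


print(select_middle_digits(1234), fold(1234))  # 23 46
```

This also illustrates objective (b): digit selection ignores the first and last digits, so 1230 and 9239 both select 23, while folding uses the entire key and gives them different values (42 and 131).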

9 Hashing a Character String
If the search key is an array of characters or a string, it may be necessary to convert it into an integer before a hash function can be applied. One way of doing this is to represent each character by its ASCII value and then concatenate the results:
S = 123 octal = 83 decimal
U = 125 octal = 85 decimal
E = 105 octal = 69 decimal
The integer corresponding to “SUE” would then be 123,125,105 octal = 21,801,541 decimal. If we applied h(k) = k mod N to this, with N = 1000, we would obtain array position 21,801,541 mod 1000 = 541.
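A sketch of the concatenation step, assuming (as in the slide) that each character code is written as exactly three octal digits. Appending three octal digits is the same as multiplying by 8^3 = 512 and adding the new code. The function name concat_ascii is hypothetical.

```python
def concat_ascii(s: str) -> int:
    """Concatenate the 3-digit octal ASCII codes of the characters in s."""
    value = 0
    for ch in s:
        code = ord(ch)
        if code >= 512:  # would need more than three octal digits
            raise ValueError(f"{ch!r} does not fit in three octal digits")
        value = value * 512 + code  # shift left three octal digits, append code
    return value


k = concat_ascii("SUE")
print(oct(k), k, k % 1000)  # 0o123125105 21801541 541
```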

10 Hashing a Character String: Caution!
The following must be considered:
– Integer overflow. On a 32-bit computer, the largest int is about 2.1 * 10^9. (The largest unsigned int is about 4.3 * 10^9; long int is often implemented the same as int.) Care must be taken to ensure that the numeric value determined for a string does not exceed the available space. It may be useful to hash every 2-3 characters or to employ folding as well as concatenation.
– Loss of significant digits. In the preceding example, 21,801,541 mod 1000 = 541: only the last three digits, 541, matter, and the leading digits (21,801) are essentially discarded by mod. Care must be taken to ensure that the rightmost digits are not dropped before the hash function is applied. (Otherwise, several strings could map to the same location.)
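One widely used way to avoid overflow, named plainly here as Horner’s rule with a mod at every step (a swap-in technique, not the slide’s own suggestion of hashing 2-3 characters at a time), computes the same result as concatenating and then taking mod N, without ever building the large integer. The sketch assumes character codes below 512, as in the octal scheme above. The function name hash_string is hypothetical.

```python
def hash_string(s: str, n: int) -> int:
    """Return (concatenation of the 9-bit character codes of s) mod n.

    Since (v * 512 + c) mod n == ((v mod n) * 512 + c) mod n, reducing after
    every character keeps each intermediate value below 512 * n
    (assuming every character code is below 512).
    """
    h = 0
    for ch in s:
        h = (h * 512 + ord(ch)) % n
    return h


print(hash_string("SUE", 1000))  # 541, the same slot as 21,801,541 mod 1000
```

Python integers never overflow, so this only demonstrates the arithmetic. In a language with a 32-bit unsigned accumulator, the intermediates stay in range as long as n ≤ 2^23.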

11 Hashing: Resolving Collisions
Suppose that we use hash function h(k) = k mod N, with N = 1009 (which is prime). Although this function tends to distribute items evenly throughout the hash table, note that
1234 mod 1009 = 225
2243 mod 1009 = (225 + 2*1009) mod 1009 = 225
3252 mod 1009 = (225 + 3*1009) mod 1009 = 225
...
Consequently, we can still get multiple, distinct search keys mapping to the same table location. These conditions are called collisions. Two general approaches for resolving collisions are considered.
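A quick check of the collision arithmetic: any keys that differ by a multiple of N land in the same slot, no matter how well N is chosen. The names below are illustrative.

```python
N = 1009  # prime table size from the slide


def h(k: int) -> int:
    return k % N


# Keys 225 + j*N all share slot 225.
keys = [225 + j * N for j in range(1, 4)]  # [1234, 2243, 3252]
print({k: h(k) for k in keys})            # {1234: 225, 2243: 225, 3252: 225}
```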

12 Resolving Collisions (Cont’d.)
1) Open Addressing. If the location indicated by the hash function, h(k), is occupied, search for another open (available) location:
– Linear probing: if table location h(k) is occupied, consider locations h(k)+1, h(k)+2, h(k)+3, … until an available location is found.
– Quadratic probing: if table location h(k) is occupied, consider h(k)+1², h(k)+2², h(k)+3², … = h(k)+1, h(k)+4, h(k)+9, … until an available location is found.
– Double hashing: if table location h(k) is occupied, consider h(k)+g(k), h(k)+2*g(k), h(k)+3*g(k), … until an available location is found, where g(k) is a second hash function.
For table a[0..N–1], all positions are taken mod N, so position h(k)+j = N wraps around to a[0].
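The three probe sequences can be sketched as one generator plus an insert routine, assuming h(k) = k mod N and an empty slot represented by None. The second hash g(k) = 7 − (k mod 7) is a hypothetical choice (it is never 0, so double hashing always moves), and the names probes and insert are illustrative.

```python
from typing import Callable, Iterator, List, Optional


def probes(k: int, n: int, method: str,
           g: Optional[Callable[[int], int]] = None,
           limit: Optional[int] = None) -> Iterator[int]:
    """Yield the indices examined for key k in a table a[0..n-1].

    Probe i (i = 0, 1, 2, ...) is h(k) + i, h(k) + i*i, or h(k) + i*g(k),
    taken mod n so that running off the end wraps around to a[0].
    """
    home = k % n  # h(k) = k mod n
    for i in range(n if limit is None else limit):
        if method == "linear":
            step = i
        elif method == "quadratic":
            step = i * i
        elif method == "double":
            step = i * g(k)  # g is the second hash function
        else:
            raise ValueError(f"unknown method {method!r}")
        yield (home + step) % n


def insert(a: List[Optional[int]], k: int, method: str = "linear",
           g: Optional[Callable[[int], int]] = None) -> int:
    """Store k in the first empty slot on its probe sequence; return that index."""
    for idx in probes(k, len(a), method, g):
        if a[idx] is None:
            a[idx] = k
            return idx
    raise RuntimeError("no empty slot found on the probe sequence")


def g7(k: int) -> int:
    """Hypothetical second hash function; its value is always in 1..7."""
    return 7 - k % 7


# Keys 5, 16 and 27 all hash to slot 5 in a table of size 11.
for method in ("linear", "quadratic", "double"):
    a = [None] * 11
    print(method, [insert(a, k, method, g7) for k in (5, 16, 27)])
# linear [5, 6, 7]
# quadratic [5, 6, 9]
# double [5, 10, 6]
```

Note how linear probing fills the adjacent slots 5, 6, 7, which is the beginning of the clustering discussed on the next slides.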

13 Resolving Collisions (Cont’d.)
2) Restructuring the Hash Table. The structure of the hash table is changed to accommodate multiple items in the same location:
– Bucketing: each location in table a[0..N–1] is an array, called a bucket, that can store multiple items.
– Separate chaining: each location in table a[0..N–1] is the head of a linked list.
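A minimal separate-chaining sketch, assuming integer keys, h(k) = k mod N, and hypothetical class names Node and ChainedHashTable. New entries are prepended to their chain (O(1)); inserting an existing key replaces its item, which is a design choice rather than something the slides specify.

```python
from typing import Any, List, Optional


class Node:
    """One entry in a chain: a key, its item, and a link to the next node."""
    def __init__(self, key: int, item: Any, next_node: Optional["Node"] = None):
        self.key = key
        self.item = item
        self.next = next_node


class ChainedHashTable:
    """Separate chaining: a[i] is the head of a linked list of entries."""
    def __init__(self, size: int = 1009):
        self.size = size
        self.a: List[Optional[Node]] = [None] * size

    def _h(self, key: int) -> int:
        return key % self.size

    def insert(self, key: int, item: Any) -> None:
        i = self._h(key)
        node = self.a[i]
        while node is not None:  # replace the item if key is already present
            if node.key == key:
                node.item = item
                return
            node = node.next
        self.a[i] = Node(key, item, self.a[i])  # otherwise prepend a new node

    def search(self, key: int) -> Optional[Any]:
        node = self.a[self._h(key)]
        while node is not None:
            if node.key == key:
                return node.item
            node = node.next
        return None

    def chain_length(self, i: int) -> int:
        count, node = 0, self.a[i]
        while node is not None:
            count, node = count + 1, node.next
        return count


t = ChainedHashTable(1009)
for flight in (1234, 2243, 3252):  # all three collide at slot 225
    t.insert(flight, f"flight {flight}")
print(t.search(2243), t.chain_length(225))  # flight 2243 3
```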

14 Resolving Collisions: Comparing Approaches
– Linear probing (h(k)+1, h(k)+2, …) often causes items to cluster in the hash table, resulting in additional collisions.
– Quadratic probing (h(k)+1², h(k)+2², …) eliminates the kind of clusters formed by linear probing, although keys with the same h(k) still follow the same probe sequence. It can be effective if the hash table is sufficiently large.
– Double hashing (h(k)+g(k), h(k)+2*g(k), …) can also be effective at eliminating clusters if g is carefully chosen and the hash table is sufficiently large.
– Bucketing performs well if the table and the buckets are sufficiently large, but it can waste space.
– Separate chaining is space efficient, since the linked lists are allocated dynamically, and it is effective at resolving collisions as long as the linked lists do not get too long.

15 Resolving Collisions: Choosing a Table Size
As a hash table fills, the chance of collision increases and the efficiency of locating an item decreases. Specifically, for a hash table of size N, let the load factor be
α = (number of items in the table) / N
When α = 0.5 (the table is half full), the time to access an item is nearly the same for all the methods discussed. As the table fills and α approaches 1, separate chaining is the most efficient. (For separate chaining, α is the average length of the linked lists.) For the open addressing methods, when computing α, deleted items should be counted as remaining in the table, since the positions they occupy must still be visited (and skipped over) when probing for an item. In any case, the size of a hash table should be chosen so that α ≤ 2/3.
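A sketch that combines this slide’s α ≤ 2/3 rule with the earlier advice to make N prime: pick the smallest prime N with N ≥ 1.5 × (expected items). The helper names and the deleted parameter (for counting deleted slots under open addressing) are hypothetical.

```python
def is_prime(m: int) -> bool:
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def load_factor(items: int, n: int, deleted: int = 0) -> float:
    """alpha = (items + deleted) / N; pass deleted > 0 for open addressing."""
    return (items + deleted) / n


def table_size_for(expected_items: int) -> int:
    """Smallest prime N with expected_items / N <= 2/3."""
    n = max((3 * expected_items + 1) // 2, 2)  # ceil(1.5 * expected_items)
    while not is_prime(n):
        n += 1
    return n


n = table_size_for(600)
print(n, round(load_factor(600, n), 3))  # 907 0.662
```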

16 Data with Multiple Organizations
Suppose that you are running a business where customers place orders via the Web. You want to fill the customer orders in the order they were placed, i.e. first-come-first-served. However, if a customer calls to check on the status of an order, you would like to be able to look up their account information quickly, given their name.
One solution would be to maintain two copies of the customer orders: one stored in a queue in FIFO order, and the other in a list sorted by the customers’ last names. An alternative approach is to store one copy of the customer orders, but allow them to be linked in two different ways.

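17 Data with Multiple Organizations (Cont’d.)
[Figure: four order records (Chen, Miller, Smith, Weiss). headPtr leads to the list sorted by name: Chen, Miller, Smith, Weiss. backPtr leads to the FIFO queue: Smith, Chen, Weiss, Miller.]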

18 Data with Multiple Organizations (Cont’d.)
By storing only one copy of the customer orders, we don’t have to worry about keeping multiple copies up to date or in sync.
[Figure: a single set of order records, reached in name order (Chen, Miller, Smith, Weiss) from headPtr and in FIFO order from backPtr.]
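A sketch of one copy of each order threaded onto two lists. Each node carries two links: one for the name-sorted list starting at head_ptr, and one for the FIFO queue. Because the figure shows only a back pointer for the queue, this sketch assumes a circular list in which back_ptr.next_fifo is the front; that choice, and all class and method names, are hypothetical. Filling an order unlinks the same node from both organizations, so nothing can fall out of sync.

```python
class Order:
    """One customer order, stored once but linked into two lists."""
    def __init__(self, name):
        self.name = name
        self.next_sorted = None  # link in the list sorted by name
        self.next_fifo = None    # link in the circular FIFO queue


class OrderBook:
    def __init__(self):
        self.head_ptr = None  # first node of the name-sorted list
        self.back_ptr = None  # last node of the circular FIFO queue

    def place_order(self, name):
        node = Order(name)
        # Insert into the sorted list (after any equal names).
        if self.head_ptr is None or name < self.head_ptr.name:
            node.next_sorted, self.head_ptr = self.head_ptr, node
        else:
            prev = self.head_ptr
            while prev.next_sorted is not None and prev.next_sorted.name <= name:
                prev = prev.next_sorted
            node.next_sorted, prev.next_sorted = prev.next_sorted, node
        # Enqueue at the back; back_ptr.next_fifo stays the front.
        if self.back_ptr is None:
            node.next_fifo = node
        else:
            node.next_fifo = self.back_ptr.next_fifo
            self.back_ptr.next_fifo = node
        self.back_ptr = node
        return node

    def lookup(self, name):
        cur = self.head_ptr
        while cur is not None and cur.name < name:
            cur = cur.next_sorted
        return cur if cur is not None and cur.name == name else None

    def fill_next(self):
        """Remove the oldest order from BOTH organizations and return it."""
        if self.back_ptr is None:
            return None
        front = self.back_ptr.next_fifo
        if front is self.back_ptr:  # it was the only order
            self.back_ptr = None
        else:
            self.back_ptr.next_fifo = front.next_fifo
        if self.head_ptr is front:  # unlink the same node from the sorted list
            self.head_ptr = front.next_sorted
        else:
            prev = self.head_ptr
            while prev.next_sorted is not front:
                prev = prev.next_sorted
            prev.next_sorted = front.next_sorted
        return front

    def sorted_names(self):
        names, cur = [], self.head_ptr
        while cur is not None:
            names.append(cur.name)
            cur = cur.next_sorted
        return names

    def fifo_names(self):
        names = []
        if self.back_ptr is not None:
            cur = self.back_ptr.next_fifo
            while True:
                names.append(cur.name)
                if cur is self.back_ptr:
                    break
                cur = cur.next_fifo
        return names


book = OrderBook()
for customer in ("Smith", "Chen", "Weiss", "Miller"):
    book.place_order(customer)
print(book.sorted_names())  # ['Chen', 'Miller', 'Smith', 'Weiss']
print(book.fifo_names())    # ['Smith', 'Chen', 'Weiss', 'Miller']
```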