Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Hashing General idea Hash function Separate Chaining Open Addressing
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CSCE 3400 Data Structures & Algorithm Analysis
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Log Files. O(n) Data Structure Exercises 16.1.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Techniques.
Hashing CS 3358 Data Structures.
CSC 2300 Data Structures & Algorithms February 27, 2007 Chapter 5. Hashing.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
CS2420: Lecture 33 Vladimir Kulyukin Computer Science Department Utah State University.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
Hashing General idea: Get a large array
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashing. Hashing as a Data Structure Performs operations in O(c) –Insert –Delete –Find Is not suitable for –FindMin –FindMax –Sort or output as sorted.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Hashing 1. Def. Hash Table an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item).
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
1 Chapter 5 Hashing General ideas Methods of implementing the hash table Comparison among these methods Applications of hashing Compare hash tables with.
CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University1 Hashing CS 202 – Fundamental Structures of Computer Science II Bilkent.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
1.  We’ll discuss the hash table ADT which supports only a subset of the operations allowed by binary search trees.  The implementation of hash tables.
DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Appendix E-A Hashing Modified. Chapter Scope Concept of hashing Hashing functions Collision handling – Open addressing – Buckets – Chaining Deletions.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
H ASH TABLES. H ASHING Key indexed arrays had perfect search performance O(1) But required a dense range of index values Otherwise memory is wasted Hashing.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
CE 221 Data Structures and Algorithms
Hashing (part 2) CSE 2011 Winter March 2018.
Hashing Alexandra Stefan.
Handling Collisions Open Addressing SNSCT-CSE/16IT201-DS.
Hashing Alexandra Stefan.
Hash Table.
Hashing Alexandra Stefan.
EE 312 Software Design and Implementation I
Data Structure and Algorithm Analysis 05: Hashing
Data Structures – Week #7
Hashing.
Data Structures and Algorithm Analysis Hashing
EE 312 Software Design and Implementation I
Presentation transcript:

Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:

General Idea Purpose: Design a kind of tables that can perform insertions, deletions, and finds in constant average time. In this chapter, we discuss the Hash table ADT. The ideal hash table data structure is merely an array of some fixed size, containing the keys. Typically, a key is a string with an associated value (for instance, salary information).

General Idea The implementation of hash tables is frequently called hashing. We will refer to the table size as TableSize. The common convention is to have the table run from 0 to TableSize-1.

General Idea Each key is mapped into some number in the range 0 to TableSize-1 and placed in the appropriate cell. The mapping is called a hash function, which ideally should be simple to compute and should ensure that any two distinct keys get different cells. Since there are a finite number of cells and a virtually inexhaustible supply of keys, this is clearly impossible, and thus we seek a hash function that distributes the keys evenly among the cells.

General Idea The only remaining problems deal with choosing a function, deciding what to do when two keys hash to the same value (this is known as a collision), and deciding on the table size. 0 John Phil Dave Mary

Hash Function If the input keys are integers, when simply returning (Key mod TableSize) is generally a reasonable strategy, unless Key happens to have some undesirable properties. In this case, the choice of hash function needs to be carefully considered. For instance, if the table size is 10 and the keys all end in zero, then the standard hash function is a bad choice.

Hash Function To avoid situations like the one above, it is usually a good idea to ensure that the table size is prime. When the input keys are random integers, then this function is not only very simple to compute but also distributes the keys evenly.

Hash Function Usually, the keys are strings; in this case, the hash function needs to be chosen carefully. One option is to add up the ASCII values of the characters in the string. int Hash(char *Key, int TableSize) { unsigned int HashVal=0; while (*Key!= ‘ \0 ’ ) HashVal+=*Key++; return HashVal%TableSize; }

Hash Function However, if the table size is large, the function does not distribute the keys well. For instance, suppose that TableSize=10007 (10007 is a prime number), and all keys are eight or fewer characters long. Since a char has an integer value that is always at most 127, the hash function can only assume values between 0 and 1016, which is 127*8. This is clearly not an equitable distribution!

Hash Function This hash function assumes that Key has at least two characters plus the NULL terminator. The value 27 represents the number of letters in the English alphabet, plus the blank, and 729 is This function examines only the first three characters, but if these are random and the table size is 10,007, as before, then we would expect a reasonably equitable distribution. int Hash(char *Key, int TableSize) { return (Key[0]+27*Key[1]+729*Key[2]) % TableSize; }

Hash Function Unfortunately, English is not random. Although there are 26 3 =17,576 possible combinations of three characters (ignoring blanks), a check of a reasonably large on-line dictionary reveals that the number of different combinations is actually only 2,851. Even if non of these combinations collide, only 28 percent of the table can actually be hashed to.

Collision If, when an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it. There are several methods for dealing with this. We will discuss two of the simplest: separate chaining and open addressing.

Separate Chaining The first strategy, separate chaining, is to keep a list of all elements that hash to the same value. For convenience, our lists have headers. We assume for this section that the keys are the first 10 perfect squares, namely 0, 1, 4, 9, 16, 25, 36, 49, 64, 81. The hashing function is simply Hash(X)=X mod 10 (The table size is not prime but is used here for simplicity).

Separate Chaining / 1 / 81 4 / / 36 9 / / / / / /

Separate Chaining To perform a Find, we use the hash function to determine which list to traverse. We then traverse this list in the normal manner, returning the position where the item is found. To perform an Insert, we traverse down the appropriate list to check whether the element is already in place (if duplicates are expected, an extra field is usually kept, and this field would be incremented in the event of a match). If the element turns out to be new, it is inserted either at the front of the list or at the end of the list, which ever is easiest. The Delete is a straightforward implementation in a linked list.

Open Addressing In an open addressing hashing system, if a collision occurs, alternative cells are tried until an empty cell is found. More formally, cells h 0 (X), h 1 (X), h 2 (X), … are tried in succession, where h i (X)=(Hash(X)+F(i)) mod TableSize, with F(0)=0. The function, F, is the collision resolution strategy. There are three common collision resolution strategies, namely Linear Probing, Quadratic Probing, and Double Hashing.

Linear Probing In linear probing, F is a linear function of i, typically F(i)=i. This amounts to trying cells sequentially (with wraparound) in search of an empty cell.

Linear Probing Example: Keys are {89, 18, 49, 58, 69}. EmptyAfter 89After 18After 49After 58After

Quadratic Probing Quadratic probing is a collision resolution method that eliminates the primary clustering problem of linear probing. Quadratic probing the collision function is quadratic. The popular choice is F(i)=i 2.

Quadratic Probing Empty After 89 After 18 After 49After 58After

Quadratic Probing For linear probing it is a bad idea to let the hash table get nearly full, because performance degrades. For quadratic probing, the situation is even more drastic: There is no guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime. This is because at most half of the table can be used as alternative locations to resolve collisions.

Double Hashing The last collision resolution method we will examine is double hashing. For double hashing, one popular choice is F(i)=i*hash 2 (X) This formula says that we apply a second hash function to X and probe at a distance hash 2 (X), 2hash 2 (X), …, and so on. A function such as hash 2 (X)=R-(X mod R), with R a prime smaller than TableSize, will work well. Here, R is set to 7.

Double Hashing EmptyAfter 89After 18After 49After 58After

Homework 5.1