Hashing Table. Professor Sin-Min Lee, Department of Computer Science.


What is Hashing? Hashing is another approach to storing and searching for values. In the worst case, finding a target by hashing takes linear time, but with some care hashing can be dramatically fast in the average case.

TABLES: Hashing. Hash functions balance the efficiency of direct access with better space efficiency. For example, a hash function f(x) can take numbers in the domain of SSNs and map them into the range 0 to 10. Hash Function Map: the function f(x) takes SSNs and returns indexes in a range we can use for a practical array.

Where is hashing helpful? Anywhere from schools to department stores to manufacturers, hashing can be used to make it simple and easy to insert, delete, or search for a particular record.

Compared to binary search? Hashing makes it easy to add and delete elements from the collection that is being searched, providing an advantage over binary search, since binary search must ensure that the entire list stays sorted when elements are added or deleted.

How does hashing work? Example: suppose the Tractor company sells all kinds of tractors with various stock numbers, prices, and other details. They want us to store information about each tractor in an inventory so that they can later retrieve information about any particular tractor simply by entering its stock number.

Suppose the information about each tractor is an object of the following form, with the stock number stored in the key field:

    struct Tractor
    {
        int key;        // The stock number
        double cost;    // The price, in dollars
        int horsepower; // Size of engine
    };

Suppose we have 50 different stock numbers. If the stock numbers have values ranging from 0 to 49, we could store the records in an array of the type above, placing stock number j in location data[j]. If the stock numbers range from 0 to 4999, we could use an array with 5000 components, but that seems wasteful since only a small fraction of the array would be used.

It is bad to use an array with 5000 components to store and search among only 50 elements. If we are clever, we can store the records in a relatively small array and yet retrieve particular stock numbers much faster than we would by serial search.

Suppose the stock numbers are 0, 100, 200, 300, ..., 4800, 4900. In this case we can store the records in an array called data with only 50 components. The record with stock number j can be stored at location data[j / 100]; for example, the record for stock number 4900 is stored in array component data[49]. This general technique is called HASHING.

Key & hash function. In our example the key was the stock number, stored in a member variable called key. A hash function maps key values to array indexes. Suppose we name our hash function hash. If a record has the key value j, then we will try to store the record at location data[hash(j)], where hash(j) is the expression j / 100.
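As a minimal sketch of this idea, the slide's hash function can be written as an ordinary C++ function (the function name hash_stock is ours, chosen for illustration):

```cpp
#include <cassert>

// The slide's hash function for stock numbers 0, 100, 200, ..., 4900:
// integer division by 100 maps each key to an index in 0..49.
int hash_stock(int key) {
    return key / 100;
}
```

With this function, stock number 4900 lands in data[49] and stock number 300 in data[3], exactly as described above.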

In our example, every key produced a different index value when it was hashed. That is a perfect hash function, but unfortunately a perfect hash function cannot always be found. Suppose we have stock numbers 300 and 399. Stock number 300 would be placed in data[300 / 100] and stock number 399 in data[399 / 100], so both stock numbers are supposed to be placed in data[3]. This situation is known as a COLLISION.

Algorithm to deal with collisions. 1. For a record with key value given by key, compute the index hash(key). 2. If data[hash(key)] does not already contain a record, then store the record in data[hash(key)] and end the storage algorithm.

3. If the location data[hash(key)] already contains a record, then try data[hash(key) + 1]. If that location already contains a record, try data[hash(key) + 2], and so forth until a vacant position is found. When the highest-numbered array position is reached, simply wrap around to the start of the array. This storage algorithm is called Open Address Hashing.

Hash functions to reduce collisions. 1. Division hash function: key % tableSize. With this function, certain table sizes are better than others at avoiding collisions. A good choice is a table size that is a prime number of the form 4k + 3; for example, 811 is a prime number equal to (4 * 202) + 3. 2. Mid-square hash function. 3. Multiplicative hash function.

Linear Probing. With a table of size 10: Hash(89, 10) = 9, Hash(18, 10) = 8, Hash(49, 10) = 9, Hash(58, 10) = 8, Hash(9, 10) = 9. Insert 89, 18, 49, 58, 9 in that order: 89 goes to slot 9 and 18 to slot 8; 49 collides at slot 9 and wraps around to slot 0; 58 collides at slot 8, probes 9 and 0, and lands in slot 1; 9 probes 9, 0, 1 and lands in slot 2. After a collision at position H, linear probing tries H + 1, H + 2, H + 3, H + 4, ..., H + i.

Problem with Linear Probing. When several different keys are hashed to the same location, the result is a small cluster of elements, one after another. As the table approaches its capacity, these clusters tend to merge into larger and larger clusters. Quadratic probing is the most common technique for avoiding clustering.

Quadratic Probing. With the same keys: Hash(89, 10) = 9, Hash(18, 10) = 8, Hash(49, 10) = 9, Hash(58, 10) = 8, Hash(9, 10) = 9. Insert 89, 18, 49, 58, 9 in that order: 89 goes to slot 9 and 18 to slot 8; 49 collides at slot 9 and probes 9 + 1 = 0 (mod 10); 58 collides at slot 8, probes 8 + 1 = 9, then 8 + 4 = 2 (mod 10); 9 collides at slot 9, probes 0, then 9 + 4 = 3 (mod 10). After a collision at position H, quadratic probing tries H + 1*1, H + 2*2, H + 3*3, ..., H + i*i.

Linear and quadratic probing problems. In linear probing and quadratic probing, a collision is handled by probing the array for an unused position. Each array component can hold just one entry; when the array is full, no more items can be added to the table. A better approach is to use a different collision resolution method called CHAINED HASHING.

Chained Hashing. In chained hashing, each component of the hash table's array can hold more than one entry. Each component of the array could be a list; the most common structure is to have each data[j] be a head pointer for a linked list.

CHAIN HASHING. (Diagram: an array data[0..5] of head pointers; each data[j] points to a linked list of the records whose keys hash to j. In the picture, two records hash to 0, two hash to 1, and two hash to 2.)
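The diagram's structure can be sketched with each array component holding a linked list (a sketch; the six-bucket size matches the picture, and the names ChainedTable, insert, and find are ours):

```cpp
#include <forward_list>
#include <vector>

const int N_BUCKETS = 6;

// Chained hashing: data[j] heads a linked list of all keys that hash to j,
// so a bucket can hold any number of colliding entries.
struct ChainedTable {
    std::vector<std::forward_list<int>> data{N_BUCKETS};

    int hash(int key) const { return key % N_BUCKETS; }

    void insert(int key) { data[hash(key)].push_front(key); }

    bool find(int key) const {
        for (int k : data[hash(key)])  // walk only this bucket's chain
            if (k == key) return true;
        return false;
    }
};
```

Unlike open addressing, insertion never fails here; chains simply grow, which is why the table can hold more entries than it has array components.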

Time Analysis of Hashing. The worst case occurs when every key gets hashed to the same array index; in that case we may end up searching through all the items to find the one we are after, a linear operation, just like serial search. The average time for a search of a hash table, however, is dramatically fast.

Time analysis of hashing: 1. The load factor of a hash table. 2. Searching with linear probing. 3. Searching with quadratic probing. 4. Searching with chained hashing.

The load factor of a hash table. We call X the load factor of a hash table: X = (number of occupied table locations) / (size of the table's array).

Searching with Linear Probing. In open address hashing with linear probing, a non-full hash table, and no deletions, the average number of table elements examined in a successful search is approximately (1/2)(1 + 1/(1 - X)), with X != 1.

Searching with Quadratic Probing. In open address hashing with quadratic probing, a non-full hash table, and no deletions, the average number of table elements examined in a successful search is approximately -ln(1 - X) / X, with X != 1.

Searching with Chained Hashing. In chained hashing, the average number of table elements examined in a successful search is approximately 1 + X/2.
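The three average-cost estimates can be compared numerically. This sketch assumes the standard textbook forms of the approximations (the function names are ours):

```cpp
#include <cmath>

// Approximate average number of elements examined in a successful search,
// as a function of the load factor x (0 < x < 1 for the probing formulas).
double linear_probing(double x)    { return 0.5 * (1.0 + 1.0 / (1.0 - x)); }
double quadratic_probing(double x) { return -std::log(1.0 - x) / x; }
double chained(double x)           { return 1.0 + x / 2.0; }
```

At X = 0.5 these give about 1.5, 1.39, and 1.25 probes respectively, and as X approaches 1 linear probing degrades much faster than the others, which is the clustering effect described earlier.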

Summary: open addressing; linear probing; quadratic probing; chained hashing; time analysis of hashing.

Example: h(k) = (k[0] + k[1]) % n is not perfect, since it is possible for two keys to have the same first two letters (assume k is an ASCII string). If a function is not perfect, collisions occur: k1 and k2 collide when h(k1) = h(k2).

A good hash function spreads items evenly throughout the array. A more complex function may still not be perfect. Example: h2(k) = (k[0] + a1 * k[1] + ... + aj * k[j]) % n, where j is strlen(k) - 1 and a1 ... aj are constants.
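A sketch of such an h2 in C++. The slide only requires a1 ... aj to be fixed constants; taking them as successive powers of 31 is our assumption, made so the sum can be folded in left to right:

```cpp
#include <string>

// Weighted sum of all characters mod n: every character position
// contributes with a different constant, so "ab" and "ba" no longer
// collide the way they do under h(k) = (k[0] + k[1]) % n.
unsigned string_hash(const std::string& k, unsigned n) {
    unsigned h = 0;
    for (char c : k)
        h = h * 31 + static_cast<unsigned char>(c);  // fold each character in
    return h % n;
}
```

For example, string_hash("ab", 100) and string_hash("ba", 100) differ, whereas the simple two-letter sum maps both strings to the same index.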

Example: consider the birthdays of 23 people chosen randomly. The probability that every one of the 23 people has a distinct birthday is (365 x 364 x ... x 343) / 365^23 <= 0.5, so the probability that some two of the 23 people have the same birthday is >= 0.5. That is, if you have a table with m = 365 locations and only n = 23 elements to be stored in the table (i.e., load factor lambda = n/m = 0.063), the probability that a collision occurs is more than 50%.

Methods to specify another location for z when h(z) is already occupied by a different element. (1) Chaining: h(z) contains a pointer to a list of the elements mapped to the same location h(z). Variants: separate chaining and coalesced chaining.

(2) Open addressing. Linear probing: look at the next location. Double hashing: on the i-th probe, look at location h(z) + i * g(z), where g is another hash function.
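Double hashing can be sketched as follows (a sketch; the prime table size 11 and the particular g are our choices, picked so that every step size is nonzero and coprime with the table size):

```cpp
#include <optional>
#include <vector>

const int M = 11;  // prime, so every step size visits all M slots

int h(int key) { return key % M; }
int g(int key) { return 1 + key % (M - 1); }  // step size, never zero

// Double hashing: keys that collide at h(z) still follow different
// probe sequences, because each key has its own step size g(z).
int insert(std::vector<std::optional<int>>& table, int key) {
    int pos = h(key), step = g(key);
    for (int i = 0; i < M; ++i) {
        if (!table[pos]) { table[pos] = key; return pos; }
        pos = (pos + step) % M;  // i-th probe is h(z) + i * g(z), mod M
    }
    return -1;  // table full
}
```

For instance, keys 3 and 14 both have h = 3, but 14 steps by g(14) = 5 and lands in a different slot, avoiding the secondary clustering described below.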


Secondary Clustering: the tendency of two elements that have collided to follow the same sequence of locations in the resolution of the collision.