COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.


2 Topics
- How to choose a hash function
- Closed hashing
  - Linear probing
  - Quadratic probing
  - Double hashing

3 Hash Functions
A good hash function:
- Is easy and fast to compute
- Produces a minimal number of collisions
- Spreads data items uniformly throughout the array
Hashing problems reduce to two points:
- Finding a hashing method that minimizes collisions
- Resolving collisions when they do happen

4 Hashing Methods: Integer Type
- It is sufficient for a hash function to operate on integers
- Any arbitrary integer can be converted into an integer within a certain range
- The index of the hash table lies within a specific range
Solutions:
- Digit selection
- Folding
- Modulo arithmetic

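5 Hashing Methods: Digit Selection
- Choose a group of digits from the number
- Use a combination of mod/div operations on the search key
- One of the simplest and fastest hashing methods; it spreads keys evenly only when the chosen digits are themselves evenly distributed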

6 Hashing Methods: Digit Selection Example
Assume table size = 1000.
- Key = …; choose the 2nd, 4th, and last digits: H(key) = 147
For a 9-digit key = d1 d2 d3 d4 d5 d6 d7 d8 d9:
- Choose the leftmost 3 digits: H(key) = key div 1,000,000 = d1 d2 d3
- Choose the rightmost 3 digits: H(key) = key mod 1000 = d7 d8 d9
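
The digit-selection rules above can be sketched in Python as follows. This is only an illustration: the function names and the sample key 123456789 are assumptions, not values from the slides.

```python
def leftmost_three(key: int) -> int:
    """Leftmost 3 digits of a 9-digit key: key div 10^6."""
    return key // 1_000_000


def rightmost_three(key: int) -> int:
    """Rightmost 3 digits of a key: key mod 1000."""
    return key % 1000


def select_digits(key: int, positions: tuple, width: int = 9) -> int:
    """Pick digits by 1-based position (d1 is the leftmost digit)."""
    digits = str(key).zfill(width)  # pad so every key has `width` digits
    return int("".join(digits[p - 1] for p in positions))


key = 123456789  # hypothetical 9-digit key
print(leftmost_three(key))            # 123
print(rightmost_three(key))           # 789
print(select_digits(key, (2, 4, 9)))  # 249 (2nd, 4th, and last digits)
```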

7 Hashing Methods: Digit Selection
Mid-square method (multiplication), first variant:
- The key is squared, then some digits of this square are selected to give the index
Example:
- k = …
- H(k): k^2 = …
- Pick the 3 middle digits: index = 077
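
A minimal sketch of the mid-square idea, assuming the "middle" digits are taken from the centre of the decimal string of the square. The function name and the sample key 4567 are hypothetical and do not come from the slides.

```python
def mid_square(key: int, num_digits: int = 3) -> int:
    """Square the key and return `num_digits` digits from the middle."""
    squared = str(key * key)
    # Start so that the selected window is centred (biased left on ties).
    start = max((len(squared) - num_digits) // 2, 0)
    return int(squared[start:start + num_digits])


# 4567 squared is 20857489; the 3 middle digits are 857.
print(mid_square(4567))  # 857
```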

8 Hashing Methods: Folding Method
- Digits are added together instead of just being selected
- Digits can first be grouped, and the groups are then added
- Folding can be done more than once on the search key

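9 Hashing Methods: Folding Method
Example:
- Key = …
- H(Key) = … = 28 (sum of the digits)
Disadvantage:
- All values will be put in a small range (for example, the digit sum of a 9-digit key always lies between 0 and 81)
Solution: divide into groups, then fold
- Key = …
- Groups: …
- Fold: … = 454
- Hash again to fit into the table size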
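
Both folding variants can be sketched as below. The sample key 1364825, the group size of 3, and the table size 101 are assumptions chosen for illustration; the key used on the slide is not known.

```python
def fold_digits(key: int) -> int:
    """Add the individual decimal digits of the key."""
    return sum(int(d) for d in str(key))


def fold_groups(key: int, group_size: int = 3, width: int = 9) -> int:
    """Split the zero-padded key into groups of digits and add the groups."""
    digits = str(key).zfill(width)
    return sum(int(digits[i:i + group_size])
               for i in range(0, len(digits), group_size))


key = 1364825              # hypothetical key, padded to 001364825
print(fold_digits(key))    # 29  (1+3+6+4+8+2+5)
print(fold_groups(key))    # 1190  (001 + 364 + 825)

table_size = 101           # hypothetical table size
print(fold_groups(key) % table_size)  # 79, hashed again to fit the table
```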

10 Hashing Methods: Modulo Arithmetic
- Choose a prime table size
- Take the search key modulo the size of the table: h(x) = x mod TableSize
- Items will be distributed over the table
Advantages:
- Simple
- Reduces collisions: items tend to be evenly distributed if the table size is a prime number
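
A small sketch of why a prime table size helps, using made-up keys that share a common factor with a non-prime table size. The keys and the sizes 10 and 11 are assumptions for illustration.

```python
def h(x: int, table_size: int) -> int:
    """Modulo hash: h(x) = x mod TableSize."""
    return x % table_size


keys = list(range(10, 101, 10))  # 10, 20, ..., 100 (hypothetical keys)

# With table size 10 every key lands in slot 0.
print(sorted({h(k, 10) for k in keys}))  # [0]

# With the prime table size 11 the same keys use ten different slots.
print(sorted({h(k, 11) for k in keys}))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```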

11 Hashing Methods
What should be done if the search key is a character string?
- Convert the character string into some integer before applying the hash function
How should we do that?
- Use the ASCII codes: simply adding them can lead to duplication (e.g. NOTE and TONE will result in the same hash value)
- Better: write a numeric value for each character in binary, then concatenate the results

12 Hashing Methods
Example: Key = NOTE
Code for each character (its position in the alphabet, written in 5 bits):
- N = 14 = (01110)
- O = 15 = (01111)
- T = 20 = (10100)
- E = 5 = (00101)
Concatenation:
- Binary result: y = (01110 01111 10100 00101)
- Equivalent decimal: x = 474,757
Apply the hash function h(x) = x mod TableSize
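
The NOTE example can be reproduced with a short sketch. It assumes keys made only of the letters A to Z; the function name letters_to_int and the table size 101 are hypothetical.

```python
def letters_to_int(word: str) -> int:
    """Concatenate the 5-bit alphabet positions of the letters (A=1 ... Z=26)."""
    value = 0
    for ch in word.upper():
        position = ord(ch) - ord("A") + 1   # N -> 14, O -> 15, ...
        value = (value << 5) | position     # append 5 bits
    return value


x = letters_to_int("NOTE")
print(x)                 # 474757
print(format(x, "020b")) # 01110011111010000101

# Concatenation keeps the letter order, so anagrams differ,
# whereas simply adding the codes would give 54 for both words.
print(letters_to_int("NOTE") != letters_to_int("TONE"))  # True

table_size = 101         # hypothetical prime table size
print(x % table_size)    # index into the table
```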

13 Closed Hashing (Open Addressing)
- No secondary data structure: all the data goes inside the table
- On collision, try alternate cells until an empty cell is found. How the alternate cells are chosen depends on the probing scheme
- A bigger table is needed

14 Linear Probing
Search linearly from the position where the collision occurred.

15 Linear Probing
[Figure: an array with cells [0], [1], [2], ..., [100]; a new record arrives whose hash value is [2]]
- [2] is already occupied. What should be done?
- This is called a collision, because there is already another valid record at [2].

16 Linear Probing
- When a collision occurs, move forward until you find an empty spot.

17 Linear Probing
- [5] is empty, so the record can be inserted there.

18 Linear Probing
- The new record goes in the empty spot.

19 Linear Probing
- Move to the next index in the array; when the maximum subscript is reached, return to the first index (wrap around)
- Try alternate cells: cells h_0(x), h_1(x), h_2(x), … are tried until a free cell is found
- h_i(x) = (hash(x) + f(i)) mod TSIZE, with f(0) = 0
- Linear probing: f(i) = i
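
Insertion with linear probing might look like the sketch below. It assumes integer keys, hash(x) = x mod TSIZE, and None as the marker for an empty cell; the table size 7 and the keys are made up for illustration.

```python
def probe_sequence(home: int, table_size: int):
    """Yield h_i = (home + f(i)) mod TSIZE with f(i) = i (linear probing)."""
    for i in range(table_size):
        yield (home + i) % table_size


def insert(table: list, key: int) -> int:
    """Place key in the first free cell of its probe sequence; return the index."""
    home = key % len(table)
    for index in probe_sequence(home, len(table)):
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("table is full")


table = [None] * 7
# 9, 16 and 23 all hash to 2; 13 hashes to 6 and wraps around to 0.
print([insert(table, k) for k in (9, 16, 23, 6, 13)])  # [2, 3, 4, 6, 0]
print(table)  # [13, None, 9, 16, 23, None, 6]
```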

20 Searching for a Key
[Figure: an array with cells [0], [1], [2], ..., [100]; the key being searched for has hash value [2]]
The data that's attached to a key can be found fairly quickly.

21 Searching for a Key
- Calculate the hash value. Check that location of the array for the key. Here the record at [2] holds a different key.

22-24 Searching for a Key
- Keep moving forward until you find the key, or you reach an empty spot.

25 Searching for a Key
- When the item is found, the information can be copied to the necessary location.
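
The search procedure described on these slides can be sketched as follows, under the same assumptions as linear-probing insertion (integer keys, hash(x) = x mod table size, None for an empty cell). The sample table is hypothetical.

```python
def search(table: list, key: int):
    """Return the index of key, or None if an empty cell is reached first."""
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i) % size   # move forward, wrapping around
        if table[index] is None:    # empty spot: the key is not in the table
            return None
        if table[index] == key:
            return index
    return None                     # every cell was examined


table = [13, None, 9, 16, 23, None, 6]  # hypothetical table of size 7
print(search(table, 23))  # 4    (probes [2], [3], then [4])
print(search(table, 13))  # 0    (probes [6], wraps around to [0])
print(search(table, 30))  # None (probes [2], [3], [4], stops at empty [5])
```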

26 Deleting a Record
[Figure: an array with cells [0], [1], [2], ..., [100]; one record asks to be deleted]
Records may also be deleted from a hash table.

27 Deleting a Record
- But the location must not be left as an ordinary "empty spot", since that could interfere with searches.

28 Deleting a Record
- The location must be marked in some special way so that a search can tell that the spot used to have something in it.
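
One common way to mark a deleted location is a special sentinel value (often called a tombstone). The sketch below assumes linear probing with integer keys; the names EMPTY, DELETED, find and delete are illustrative, not taken from the slides.

```python
EMPTY = None
DELETED = object()  # special marker: "this spot used to hold something"


def find(table: list, key: int):
    """Linear-probing search that walks past DELETED markers."""
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i) % size
        slot = table[index]
        if slot is EMPTY:
            return None
        if slot is not DELETED and slot == key:
            return index
    return None


def delete(table: list, key: int) -> bool:
    """Replace the key with the DELETED marker; report whether it was found."""
    index = find(table, key)
    if index is None:
        return False
    table[index] = DELETED
    return True


table = [None, None, 9, 16, 23, None, None]  # 9, 16, 23 all hash to 2
delete(table, 16)
print(find(table, 23))  # 4: the search continues past the marker at [3]

# Leaving an ordinary empty spot instead would hide key 23:
broken = [None, None, 9, None, 23, None, None]
print(find(broken, 23))  # None
```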

29 Linear Probing
Advantage:
- Uses less memory than chaining: the links do not have to be stored
Disadvantages:
- Can be slower than chaining: a search may have to walk a long way along the table
- Deleting a key and its associated record is difficult, because it has an impact on the search process
- Primary clustering: the table contains groups of consecutively occupied locations

30 Quadratic Probing
- Linear probing: f(i) = i
- Quadratic probing: f(i) = i^2
- Example: insert 10, 40, 60, 20, 30, 70, 80
[Figure: worked insertions; one surviving step reads "… mod 10 = 6"]

31 Quadratic Probing
Advantages:
- Easy to compute
- Avoids primary clustering
Disadvantages:
- Not all entries are searched: a probe might not encounter a free storage location even when some locations are still free
- Secondary clustering: elements that have the same hash value will probe the same set of alternate cells
  - Not a big problem in practice
  - Use a good hash function
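
The "not all entries are searched" disadvantage can be seen by listing the cells that quadratic probing visits. This sketch assumes h_i = (home + i^2) mod TSIZE; the home positions and the table sizes 10 and 7 are chosen only for illustration.

```python
def quadratic_probes(home: int, table_size: int) -> list:
    """Cells visited by h_i = (home + i*i) mod TSIZE for i = 0 .. TSIZE-1."""
    return [(home + i * i) % table_size for i in range(table_size)]


probes = quadratic_probes(6, 10)
print(probes)               # [6, 7, 0, 5, 2, 1, 2, 5, 0, 7]
print(sorted(set(probes)))  # [0, 1, 2, 5, 6, 7]: cells 3, 4, 8, 9 are never tried

# With a prime table size, the first half of the probes are all distinct.
print(quadratic_probes(2, 7)[:4])  # [2, 3, 6, 4]
```

Any two keys with the same home position produce exactly the same list, which is the secondary clustering mentioned above.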

32 Double Hashing
Use two hash functions:
- The first, as before, generates the 'home' position
- The second generates a sequence of offsets from the home position that define the probe sequence
- probe = (probe + offset) mod N
If the size of the table is prime and the offset is not a multiple of N, this method will eventually examine every position in the table.
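
A minimal sketch of double hashing. The slides do not specify the second hash function, so offset(key) = 1 + (key mod (N - 1)) is an assumption; it is one simple choice that never yields a zero offset. The table size 7 and the keys are also made up.

```python
def home_position(key: int, n: int) -> int:
    """First hash function: the 'home' position."""
    return key % n


def offset(key: int, n: int) -> int:
    """Second hash function (assumed form): always between 1 and n - 1."""
    return 1 + (key % (n - 1))


def probe_sequence(key: int, n: int):
    """Yield probe = (probe + offset) mod N, starting at the home position."""
    probe = home_position(key, n)
    step = offset(key, n)
    for _ in range(n):
        yield probe
        probe = (probe + step) % n


# 16 and 9 share home position 2 but follow different probe sequences,
# and with the prime size 7 each sequence visits every cell once.
print(list(probe_sequence(16, 7)))  # [2, 0, 5, 3, 1, 6, 4]
print(list(probe_sequence(9, 7)))   # [2, 6, 3, 0, 4, 1, 5]
```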

33 Problems with Closed Hashing
When the table is too full:
- Running time becomes too long
- Inserts could fail
The table size must be chosen in advance:
- But the number of elements is not known beforehand
Rehashing:
- Build a new table that is about twice as big
- Hash the elements into the new table
- The new hash function must be applied to every item in the old hash table
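
Rehashing can be sketched as below. The sketch assumes linear probing, integer keys, and a new size equal to the first prime at or above twice the old size; the helper names and the sample keys are hypothetical.

```python
def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def insert(table: list, key: int) -> int:
    """Linear-probing insert using h(x) = x mod len(table)."""
    size = len(table)
    for i in range(size):
        index = (key % size + i) % size
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("table is full")


def rehash(old_table: list) -> list:
    """Build a table about twice as big and re-insert every stored key."""
    new_table = [None] * next_prime(2 * len(old_table))
    for key in old_table:
        if key is not None:
            insert(new_table, key)  # the hash changes with the table size
    return new_table


old = [None] * 7
for k in (9, 16, 23, 6):
    insert(old, k)
new = rehash(old)
print(len(new))                                           # 17
print([(i, k) for i, k in enumerate(new) if k is not None])
# [(6, 23), (7, 6), (9, 9), (16, 16)]
```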

34 Summary
- Hash tables are specialized for dictionary operations: Insert, Delete, Search
- Principle: turn the key field of the record into a number, which we use as an index for locating the item in an array
- O(1) in the ideal case
- Problems: finding a good hash function, collisions, wasted space, no support for ordering queries
- Implementations: open hashing, closed hashing, dynamic hashing

35 Review
- What is a perfect hash function?
- What is a collision?
- What is meant by clustering?
- How does clustering affect the overall efficiency of hashing?
- What is a bucket?
- What is the time complexity for insertion, deletion, and search in hashing?