CS212: DATA STRUCTURES, Lecture 10: Hashing

Outline
• Map Abstract Data Type
• Map Abstract Data Type methods
• What is hashing?
• Hash tables
• Bucket arrays
• Hash functions

Map Abstract Data Type
• A map stores elements so that they can be located quickly using a key.
• A map stores key-value pairs (k, v), where k is the key and v is its corresponding value.
• Each key is unique.
• Motivation: to look up any stored element by its key.

Map Abstract Data Type
• Example: in a map storing student records (student's name, address, and course grades), the key is the student's ID number.
• Keys (labels) are assigned to values (diskettes).
• Labeled diskettes are inserted into the map (file cabinet).
• Keys can be used later to retrieve or remove values.

Map Abstract Data Type methods
• size(): return the number of entries in map M.
• isEmpty(): test whether M is empty.
• get(k): if M contains an entry e with key equal to k, return the value of e; otherwise return null.
• put(k, v): if M has no entry with key equal to k, add the entry (k, v) to M and return null; otherwise replace the existing value of that entry with v and return the old value.
• remove(k): remove from M the entry with key equal to k and return its value; return null if M has no such entry.
• keys(): return an iterable collection of all the keys stored in M.
• values(): return an iterable collection of all the values associated with keys stored in M.
• entries(): return an iterable collection of all the key-value entries in M.

Map Abstract Data Type methods
• Example:

Operation    Output   Map
isEmpty()    true     ∅
put(5,A)     null     {(5,A)}
put(7,B)     null     {(5,A), (7,B)}
put(2,C)     null     {(5,A), (7,B), (2,C)}
put(8,D)     null     {(5,A), (7,B), (2,C), (8,D)}
put(2,E)     C        {(5,A), (7,B), (2,E), (8,D)}
get(7)       B        {(5,A), (7,B), (2,E), (8,D)}
get(4)       null     {(5,A), (7,B), (2,E), (8,D)}
get(2)       E        {(5,A), (7,B), (2,E), (8,D)}
size()       4        {(5,A), (7,B), (2,E), (8,D)}
remove(5)    A        {(7,B), (2,E), (8,D)}
remove(2)    E        {(7,B), (8,D)}
get(2)       null     {(7,B), (8,D)}
isEmpty()    false    {(7,B), (8,D)}
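The map operations above can be sketched in Python. This is only an illustration: the class name ListMap and the snake_case method names are assumptions made for this sketch, None stands in for null, and the unsorted list makes every operation O(n), which is the cost hashing is meant to avoid.

```python
class ListMap:
    """Unsorted list-based map (illustrative; every operation scans the list)."""

    def __init__(self):
        self._entries = []  # each entry is a two-item list [key, value]

    def size(self):
        return len(self._entries)

    def is_empty(self):
        return len(self._entries) == 0

    def get(self, k):
        # Return the value stored under k, or None if k is absent.
        for key, value in self._entries:
            if key == k:
                return value
        return None

    def put(self, k, v):
        # Replace the value of an existing entry and return the old value,
        # or add a new entry and return None.
        for entry in self._entries:
            if entry[0] == k:
                old = entry[1]
                entry[1] = v
                return old
        self._entries.append([k, v])
        return None

    def remove(self, k):
        # Remove the entry with key k and return its value, or None if absent.
        for i, (key, value) in enumerate(self._entries):
            if key == k:
                del self._entries[i]
                return value
        return None

    def keys(self):
        return [key for key, _ in self._entries]

    def values(self):
        return [value for _, value in self._entries]

    def entries(self):
        return [(key, value) for key, value in self._entries]
```

Replaying the operations from the example table against this class (put(5, "A"), put(7, "B"), and so on) should give the same outputs column by column.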

What is hashing?
• Hashing is the process of mapping a large number of data items to a smaller table with the help of a hash function.
• Hashing uses a data structure called a hash table.
• Although hash tables provide fast insertion, deletion, and retrieval by key, operations that depend on the ordering of keys, such as finding the minimum or maximum value, are not performed quickly.
• Hashing is also used in cryptography, for example in digital signatures and message authentication.
• Hash table advantages: moving from linear search to binary search improved search efficiency from O(n) to O(log n). A hash table improves this further to O(1), or constant time, on average.

Hash Table
• A hash table is a data structure in which keys are mapped to array positions by a hash function. An item can be found quickly by using the hash function to compute an address from its key.
• A hash function is a function that, when applied to a key, produces an integer that can be used as an address in a hash table.
• Perfect hash function: maps every key to a distinct position, so no collisions occur.
• Good hash function: is fast to compute and spreads keys evenly over the table, so collisions are rare.
• A collision occurs when two or more keys produce the same hash location, that is, when more than one element tries to occupy the same array position.

Bucket Arrays
• A bucket array for a hash table is an array A of size N, where each cell of A is thought of as a "bucket" (that is, a collection of key-value pairs) and the integer N defines the capacity of the array.
• Example: a bucket array of size 11 for the entries (1,D), (3,C), (3,F), (3,Z), (6,A), (6,C), and (7,Q).
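The example bucket array can be sketched as a list of lists, with each key used directly as the array index. The function name build_bucket_array is an assumption made for this illustration.

```python
def build_bucket_array(entries, n):
    """Place each (key, value) entry in bucket A[key]; keys must lie in [0, n - 1]."""
    buckets = [[] for _ in range(n)]
    for key, value in entries:
        if not 0 <= key < n:
            raise IndexError(f"key {key} is outside the range [0, {n - 1}]")
        buckets[key].append((key, value))
    return buckets


entries = [(1, "D"), (3, "C"), (3, "F"), (3, "Z"), (6, "A"), (6, "C"), (7, "Q")]
A = build_bucket_array(entries, 11)

# Print each bucket index with its contents; most buckets are empty.
for index, bucket in enumerate(A):
    print(index, bucket)
```

Only 4 of the 11 buckets are occupied here, and a key such as 42 could not be stored at all without first being mapped into the range [0, 10].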

Bucket Arrays drawbacks
• When each key is used directly as an index into the bucket array, searches, insertions, and removals take O(1) time. This sounds like a great achievement, but it has two drawbacks.
• First, the space used is proportional to N. Thus, if N is much larger than the number of entries n actually present in the map, space is wasted.
• Second, keys are required to be integers in the range [0, N − 1], which is often not the case.

Hash functions
• The hash function maps the record's key to an integer called the hash index.
• A collision occurs when two keys are mapped to the same hash index.
• One way to resolve collisions is to allow each bucket to store multiple records. This is called chaining.
• Example (figure): records labelled data, information, math, Discrete mathematics, Algebra, and Solid geometry are hashed to indices 1 and 4, with several records sharing index 4.
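A minimal sketch of chaining, assuming each bucket is a Python list and the hash index is computed as hash(k) mod N. The class name ChainedHashMap and the default capacity of 11 are choices made for this illustration, not part of the lecture.

```python
class ChainedHashMap:
    """Hash map that resolves collisions by chaining (one list per bucket)."""

    def __init__(self, capacity=11):
        self._buckets = [[] for _ in range(capacity)]
        self._n = 0  # number of entries stored

    def _index(self, k):
        # Python's % with a positive modulus always yields a value in [0, capacity - 1].
        return hash(k) % len(self._buckets)

    def size(self):
        return self._n

    def get(self, k):
        for key, value in self._buckets[self._index(k)]:
            if key == k:
                return value
        return None

    def put(self, k, v):
        bucket = self._buckets[self._index(k)]
        for entry in bucket:
            if entry[0] == k:
                old = entry[1]
                entry[1] = v
                return old
        bucket.append([k, v])  # a colliding key simply joins the chain
        self._n += 1
        return None

    def remove(self, k):
        bucket = self._buckets[self._index(k)]
        for i, (key, value) in enumerate(bucket):
            if key == k:
                del bucket[i]
                self._n -= 1
                return value
        return None
```

With capacity 11, the integer keys 4 and 15 both map to index 4 and end up in the same chain; each lookup then scans only that one bucket.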

Hash functions
• The search time of each algorithm discussed so far depends on the number n of elements in the collection S of data.
• Hashing, or hash addressing, is a searching technique whose search time is essentially independent of n.
• Comparison of keys was the main operation used by the previously discussed searching methods.
• A different way of searching is to calculate the position of the key from the value of the key itself.
• We need to find a function h that can transform a key K (string, number, record, etc.) into an index in a table used for storing items of the same type as K.
• This function is called a hash function.

Hash functions
1. Division function
• One simple compression function is the division method, which maps an integer i to |i| mod N.
• Example: suppose we want to store a sequence of randomly generated numbers (keys): 5, 17, 37, 20, 42, 3.
• The array A, the hash table in which we want to store the numbers:
  | | | | | | | | | |
• We need a way of mapping the numbers to the array indexes, a hash function, that lets us store the numbers and later recompute the index when we want to retrieve them. There is a natural choice for this.

Hash functions
• Our hash table has 9 cells, and the mod function, which sends every integer to its remainder modulo 9, maps an integer to a number between 0 and 8:
  5 mod 9 = 5
  17 mod 9 = 8
  37 mod 9 = 1
  20 mod 9 = 2
  42 mod 9 = 6
  3 mod 9 = 3
• We store the values:
  index | 0 | 1  | 2  | 3 | 4 | 5 | 6  | 7 | 8  |
  value |   | 37 | 20 | 3 |   | 5 | 42 |   | 17 |
• In this case, computing the hash value n mod 9 of the number n to be stored costs a constant amount of time. So does the actual storage, because n is stored directly in an array cell.
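The worked example can be reproduced with a short sketch. The function names division_hash and store are assumptions made for this illustration, and the sketch deliberately raises an error on a collision, since collision handling is treated separately.

```python
def division_hash(key, table_size):
    """Division method: map an integer key to |key| mod table_size."""
    return abs(key) % table_size


def store(keys, table_size):
    """Store each key at its hash index; this sketch does not resolve collisions."""
    table = [None] * table_size
    for key in keys:
        index = division_hash(key, table_size)
        if table[index] is not None:
            raise ValueError(f"collision at index {index}")
        table[index] = key
    return table


table = store([5, 17, 37, 20, 42, 3], 9)
print(table)  # [None, 37, 20, 3, None, 5, 42, None, 17]
```

To retrieve a key later, the same function recomputes the index, so the lookup is a single array access.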

Hash Functions
1. Division
• A hash function must guarantee that the number it returns is a valid index to one of the table entries.
• The simplest way is to use division modulo the table size: h(K) = K mod TSize, where TSize = sizeof(table).
• It is best if TSize is a prime number.
• Advantages: it is simple, and it is useful if we don't know much about the keys.

Hash Functions
2. Extraction
• Idea: use only part of the key to compute the hash value (address, index).
• Example: the key is the SSN 123-45-6789. This method might use the first four digits (1234), the last four (6789), or the first two combined with the last two (1289) as the index.
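A small sketch of extraction, assuming the key is a nine-digit SSN given as a string such as "123-45-6789" (the value implied by the digits quoted in the example). The function names are hypothetical.

```python
def digits_of(ssn):
    """Strip the hyphens so that only the nine digits remain."""
    return ssn.replace("-", "")


def first_four(ssn):
    return int(digits_of(ssn)[:4])


def last_four(ssn):
    return int(digits_of(ssn)[-4:])


def first_two_last_two(ssn):
    d = digits_of(ssn)
    return int(d[:2] + d[-2:])


key = "123-45-6789"
print(first_four(key), last_four(key), first_two_last_two(key))  # 1234 6789 1289
```

Because only part of the key is used, two keys that differ only in the ignored digits always collide, so the extracted part should be the one that varies most among the keys.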

Hash Functions
3. Folding
• Idea: divide the key into parts, then combine ("fold") the parts to create the index.
• The key is divided into several parts. These parts are combined, or folded together, and are usually transformed in a certain way to create an index (address) into the table.
• The key is first divided into parts, each of which has the same length as the desired index.
• Note: if, after combining the key parts, the resulting index is greater than the desired length, you can apply either division (which is usually used) or extraction.

Hash Functions
There are two types of folding:
1) Shift folding
• The key is divided into several parts, and these parts are added together to create the index.
• Example: the key is the SSN 123-45-6789. It can be divided into three parts, 123, 456, and 789, and these parts can be added: 123 + 456 + 789 = 1,368. The resulting 1,368 can then be divided modulo TSize.

Hash Functions
2) Boundary folding
• Same as shift folding, except that every other part is written backwards.
• Example: the key is the SSN 123-45-6789, with three parts 123, 456, and 789.
  the first part is taken in the same order (123)
  the second part is taken in reverse order (654)
  the third part is taken in the same order (789)
  The result is 123 + 654 + 789 = 1,566, followed by division.
• Example: Key is … Boundary folding: … = 1228
• This process is simple and fast, especially when bit patterns are used instead of numerical values; in that case, the addition in the previous examples is replaced with XOR.
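Both kinds of folding can be sketched as follows, assuming the key is given as a string of digits and split into parts of a fixed width. The function names and the table size of 1,000 are assumptions made for this illustration.

```python
def split_parts(digits, width):
    """Cut the digit string into consecutive parts of the given width."""
    return [digits[i:i + width] for i in range(0, len(digits), width)]


def shift_fold(digits, width):
    """Shift folding: add the parts as they are."""
    return sum(int(part) for part in split_parts(digits, width))


def boundary_fold(digits, width):
    """Boundary folding: reverse every other part (the 2nd, 4th, ...) before adding."""
    total = 0
    for position, part in enumerate(split_parts(digits, width)):
        if position % 2 == 1:
            part = part[::-1]
        total += int(part)
    return total


key = "123456789"
t_size = 1000  # hypothetical table size

print(shift_fold(key, 3), shift_fold(key, 3) % t_size)        # 1368 368
print(boundary_fold(key, 3), boundary_fold(key, 3) % t_size)  # 1566 566
```

The final modulo step corresponds to the division mentioned in the examples, which brings the folded sum back into the valid index range.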

Hash Functions (cont'd)
4. Mid-square function
• Idea: square the key (multiply the key by itself), then use the middle part of the result as the address.
• Note: extraction could be used to extract the middle part.
• Example: the key is 3,121. Square the key: 3,121² = 9,740,641. Then use the middle part (406) as the address. Here, for a 1,000-cell table, h(3,121) = 406.
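A short sketch of the mid-square method, assuming the middle part is taken from the decimal digits of the square. The function name mid_square and the digits parameter are choices made for this illustration.

```python
def mid_square(key, digits=3):
    """Square the key and return the middle `digits` decimal digits as an integer."""
    squared = str(key * key)
    # Centre the window; for short squares fall back to the leading digits.
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


print(3121 * 3121)       # 9740641
print(mid_square(3121))  # 406
```

Three digits give indices from 0 to 999, which matches the 1,000-cell table in the example.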

Hash Functions
• Detecting and resolving collisions
• Even with the methods introduced previously, collisions may still occur.
• Two keys cannot be stored in the same location, so we must find a way to resolve collisions.
• The choice of hash function and the choice of table size may reduce collisions, but will not eliminate them.
• Methods for resolving collisions:
  - open addressing: find another empty position
  - chaining: use linked lists
  - bucket addressing: store colliding elements at the same location
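As one possible illustration of open addressing, the sketch below uses linear probing: when the home position is taken, it tries the next positions in turn. Linear probing is only one way to "find another empty position" and is an assumption of this sketch, as are the class and method names. Deletion is omitted because it needs extra bookkeeping (such as marking removed slots).

```python
class LinearProbingTable:
    """Open-addressing table for integer keys, using the division hash and linear probing."""

    def __init__(self, capacity=9):
        self._slots = [None] * capacity

    def insert(self, key):
        """Store key in the first free slot at or after its home index; return that index."""
        n = len(self._slots)
        home = key % n
        for step in range(n):
            i = (home + step) % n
            if self._slots[i] is None or self._slots[i] == key:
                self._slots[i] = key
                return i
        raise OverflowError("table is full")

    def contains(self, key):
        """Follow the same probe sequence; an empty slot means the key is absent."""
        n = len(self._slots)
        home = key % n
        for step in range(n):
            i = (home + step) % n
            if self._slots[i] is None:
                return False
            if self._slots[i] == key:
                return True
        return False


t = LinearProbingTable(9)
print(t.insert(5), t.insert(14), t.insert(23))  # 5 6 7
```

The keys 5, 14, and 23 all hash to index 5 modulo 9, so the second and third are pushed to the following slots.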

Applications of Hash tables
• Lots of recent research into using distributed hash tables in peer-to-peer networks (searching, lookup)
• Symbol tables (compilers)
• Databases (of phone numbers, IP addresses, etc.)
• Dictionaries

References: Textbook, Chapter 10: Hashing
End of Chapter