1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.

Slides:



Advertisements
Similar presentations
Hash Tables CSC220 Winter What is strength of b-tree? Can we make an array to be as fast search and insert as B-tree and LL?
Advertisements

Preliminaries Advantages –Hash tables can insert(), remove(), and find() with complexity close to O(1). –Relatively easy to program Disadvantages –There.
Hash Tables.
HASH TABLE. HASH TABLE a group of people could be arranged in a database like this: Hashing is the transformation of a string of characters into a.
Data Structures Using C++ 2E
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
Hashing Techniques.
Hashing CS 3358 Data Structures.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
CSE 373 Data Structures Lecture 10
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hash Table March COP 3502, UCF.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Final Review Dr. Bernard Chen Ph.D. University of Central Arkansas Spring 2010.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
1 Hash table. 2 A basic problem We have to store some records and perform the following:  add new record  delete record  search a record by key Find.
Comp 335 File Structures Hashing.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 5. Abstract Data Structures & Algorithms 5.2 Static Data Structures.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.
Hashing Lecture 10. More Efficient Searching Question: Can searching be more efficient than logarithmic O(log n) significantly? Example: a set of students.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hash Tables. 2 Exercise 2 /* Exercise 1 */ void mystery(int n) { int i, j, k; for (i = 1; i
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,
H ASH TABLES. H ASHING Key indexed arrays had perfect search performance O(1) But required a dense range of index values Otherwise memory is wasted Hashing.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
ISOM MIS 215 Module 5 – Binary Trees. ISOM Where are we? 2 Intro to Java, Course Java lang. basics Arrays Introduction NewbieProgrammersDevelopersProfessionalsDesigners.
Data Structure & Algorithm Lecture 8 – Hashing JJCAO Most materials are stolen from Prof. Yoram Moses’s course.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
Chapter 11 (Lafore’s Book) Hash Tables Hwajung Lee.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Hashing (part 2) CSE 2011 Winter March 2018.
Hashtables.
Hash table CSC317 We have elements with key and satellite data
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Hash functions Open addressing
Hash Tables.
CS202 - Fundamental Structures of Computer Science II
EE 312 Software Design and Implementation I
Collision Resolution Neil Tang 02/21/2008
DATA STRUCTURES-COLLISION TECHNIQUES
EE 312 Software Design and Implementation I
Collision Resolution: Open Addressing Extendible Hashing
CSE 373: Data Structures and Algorithms
Presentation transcript:

1 Hash table

2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

3 A basic problem We have to store some records and perform the following:  add new record  delete record  search a record by key Find a way to do these efficiently!

4 Unsorted array Use an array to store the records, in unsorted order  add - add the records as the last entry fast O(1)  delete a target - slow at finding the target, fast at filling the hole (just take the last entry) O(n)  search - sequential search slow O(n)

5 Sorted array Use an array to store the records, keeping them in sorted order  add - insert the record in proper position. much record movement slow O(n)  delete a target - how to handle the hole after deletion? Much record movement slow O(n)  search - binary search fast O(log n)

6 Linked list Store the records in a linked list (sorted / unsorted)  add - fast if one can insert node anywhere O(1)  delete a target - fast at disposing the node, but slow at finding the target O(n)  search - sequential search slow O(n) (if we only use linked list, we cannot use binary search even if the list is sorted.)

7 Array as table tom mary peter david andy betty studidnamescore bill49... Consider this problem. We want to store 1000 student records and search them by student id.

8 Array as table : : : : betty : andy : : 90 : 81.5 : studidnamescore david56.8 : : : : bill : : : 49 : : One ‘naive’ way is to store the records in a huge array (index ). The index is used as the student id, i.e. the record of the student with studid is stored at A[12345]

9 Array as table Store the records in a huge array where the index corresponds to the key  add - very fast O(1)  delete - very fast O(1)  search - very fast O(1) But it wastes a lot of memory! Not feasible.

10 Hash function function Hash(key: KeyType): integer; Imagine that we have such a magic function Hash. It maps the key (stud_id) of the 1000 records into the integers , one to one. No two different keys maps to the same number. H(‘ ’) = 134 H(‘ ’) = 67 H(‘ ’) = 764 … H(‘ ’) = 3

11 Hash table : betty : bill : : 90 : 49 : studidnamescore andy81.5 : : david : : : 56.8 : : : : : : : To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).

12 Hash table with Perfect Hash Such magic function is called perfect hash  add - very fast O(1)  delete - very fast O(1)  search - very fast O(1) But it is generally difficult to design perfect hash. (e.g. when the potential key space is large)

13 Hash function A hash function maps a key to an index within in a range Desirable properties:  simple and quick to calculate  even distribution, avoid collision as much as possible function Hash(key: KeyType);

14 Division Method Certain values of m may not be good:  Good values for m are prime numbers which are not close to exact powers of 2. For example, if you want to store 2000 elements then m=701 (m = hash table length) yields a hash function: h (k) = k mod m h (key) = k mod 701

15 Collision For most cases, we cannot avoid collision Collision resolution - how to handle when two different keys map to the same index H(‘ ’) = 134 H(‘ ’) = 67 H(‘ ’) = 764 … H(‘ ’) = 3 H(‘ ’) = 3

16 The problem arises because we have two keys that hash in the same array entry, a collision. There are two ways to resolve collision:  Hashing with Chaining: every hash table entry contains a pointer to a linked list of keys that hash in the same entry  Hashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find one empty entry to place the new key Hash Tables

17 Open Addressing The key is first mapped to a slot: If there is a collision subsequent probes are performed: If the offset constant, c and m are not relatively prime, we will not examine all the cells. Ex.:  Consider m=4 and c=2, then only every other slot is checked. When c=1 the collision resolution is done as a linear search. This is known as linear probing

18 Linear Probing example Insert: 89, 18, 49, 58, 9 to table size=10, hash function is: %tablesize

19 Linear Probing Example-2 Single character keys, table size, m=8 Hash function (map characters to range 0...7): k: APQ BORCNS DMT ELU FKN GJWZ HIXY h 1 (k): Action Store A Store C Store D Store G Store P Store Q Delete P Delete Q Load B Load R Load Q 0AAAAAAAAAAA0AAAAAAAAAAA 1PP±±BBB1PP±±BBB 2CCCCCCCCCC2CCCCCCCCCC 3DDDDDDDDD3DDDDDDDDD 4QQ±±RR4QQ±±RR 5Q5Q 6GGGGGGGG6GGGGGGGG # probes

20 Choosing a Hash Function Notice that the insertion of Q required several probes (5). This was caused by A and P mapping to slot 0 which is beside the C and D keys. The performance of the hash table depends on a having a hash function which evenly distributes the keys.  The statistics of the key distribution needs to be accounted for. For example, choosing the first letter of a surname will cause problems depending on the nationality of the population; the variable names in a compiler often differ by one character, eg., t1, t2, t3, etc. Consult computer science texts, such as Knuth’s “The Art of Computer Programming”.

21 Clustering Even with a good hash function, linear probing has its problems:  The position of the initial mapping i 0 of key k is called the home position of k.  When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster.  As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster’s growth. This tendency of linear probing to place items together is known as primary clustering.  As these clusters grow, they merge with other clusters forming even bigger clusters which grow even faster.

22 Performance Analysis If n slots in a table of size m are occupied, the load factor is defined as where  =1 means the table is full, and  =0 means the table is empty. It can be shown that the number of probes in a successful search, C, and the number of probes in an unsuccessful search, C’ is given by:

23 Quadratic Probing  h(k)=h(k) + f(i) ( i=0,1,2,…)%TS  h(k)=Rmod TS  f(i)=i 2 Theorem 20.4: If quadratic probing is used and the table size is prime, then a new element can always be inserted if the table is at least half empty. Furthermore, in the course of the insertion, no cell is probed twice.

24 Quadratic probing-example Insert: 89, 18, 49, 58, 9 to table size=10, hash function is: %tablesize

25 Double Hashing Recall that in open addressing the sequence of probes follows We can solve the problem of primary clustering in linear probing by having the keys which map to the same home position use differing probe sequences. In other words, the different values for c should be used for different keys. Double hashing refers to the scheme of using another hash function for c Note that h 1 and h 2 need to be evaluated only once per key.

26 Chained Hash Table nil 5 : HASHMAX Key: name: tom score: 73 One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.

27 Chained Hash table Hash table, where collided records are stored in linked list  good hash function, appropriate hash size Few collisions. Add, delete, search very fast O(1)  otherwise… some hash value has a long list of collided records.. add - just insert at the head fast O(1) delete a target - delete from unsorted linked list slow search - sequential search slow O(n)

28 Common errors (page 749) Providing a poor hash function Not rehashing when load factor reaches 0.5. More errors listed on page 749 of the book

29 In class exercises 20.1, 20.2 and 20.5 in the book on page 750.