Hashing / Hash tables Chapter 20 CSCI 3333 Data Structures.



Outline
- Basic definitions
- Different hashing techniques
  - Linear probing
  - Quadratic probing
  - Separate chaining hashing
- Comparing hashing with binary search trees

Basic definitions
Problem definition: Given a set of items (S) and a given item (i), define a data structure that supports operations such as find/insert/delete i in constant time.
A solution: A hashing function h maps a large data set into a small index set. Typically the function involves the mod operation.
Example hashing function:
- Let S be a set of 10,000 employee records.
- Let LName be the LastName attribute of the employee record.
- Suppose each array item can hold up to k employee records.
- Suppose the array is of size N. (Then Nk ≥ 10,000.)
- Given an employee e, h(e) = e.LName.toInteger() % N.

Design of a hash function
Two concerns:
(a) The hash function should be simple enough.
(b) The hash function should distribute the data items evenly over the whole array.
Why?
For (a): efficiency.
For (b): to avoid collisions and to make good use of array space.

A sample hash function in Java
Exercise: What are the respective hash codes of the following strings: Doe, Smith, Stevenson? Suppose tableSize is 10.
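The code figure for this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of one common polynomial string hash. The class name `StringHashSketch`, the method name `hash`, and the multiplier 37 are assumptions for illustration and may differ from the function the slide showed, so the exercise answers depend on the actual figure.

```java
// Hypothetical stand-in for the missing figure; not necessarily the slide's function.
class StringHashSketch {
    // Maps a string key to an index in the range [0, tableSize).
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            // 37 is an assumed multiplier; int overflow is allowed to wrap around.
            hashVal = 37 * hashVal + key.charAt(i);
        }
        hashVal %= tableSize;
        if (hashVal < 0) {
            hashVal += tableSize; // overflow can leave a negative remainder
        }
        return hashVal;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Doe", "Smith", "Stevenson"}) {
            System.out.println(s + " -> " + hash(s, 10));
        }
    }
}
```

The function is cheap to compute (one multiply and add per character), and the final adjustment keeps the result usable as an array index even after overflow.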

Linear probing
Collision: Given the hash function h, h(x) returns a position that is already occupied.
Linear probing: When a collision occurs, search sequentially in the array until an empty cell is found (to insert the new data item). Wrap around if necessary. Example below.

Linear probing: example
h(k, n) = k % n
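The example figure is not part of the transcript. The insertion rule with h(k, n) = k % n can be sketched roughly as follows. The class and method names are hypothetical, the table stores integer keys directly, `null` marks an empty cell, and duplicate keys are not checked.

```java
// A minimal linear probing insert, assuming integer keys and null for empty cells.
class LinearProbingInsertSketch {
    // Inserts key and returns the index used, or -1 if the table is full.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        int pos = Math.floorMod(key, n); // h(k, n) = k % n, kept non-negative
        for (int probes = 0; probes < n; probes++) {
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
            pos = (pos + 1) % n; // step to the next cell, wrapping around
        }
        return -1; // every cell was examined and none was empty
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int k : new int[] {89, 18, 49, 58, 9}) {
            System.out.println(k + " stored at index " + insert(table, k));
        }
    }
}
```

With a table of size 10, the keys 49, 58, and 9 all collide with earlier keys and are pushed past the end of the array to the cells at the front.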

Linear probing: insert
Q: What’s the worst-case cost of inserting an item using linear probing? N?
Q: What’s the average cost? See Theorem 20.2 (next page).
The performance of the hash table depends on how full the table is.
The load factor (λ) of a hash table is the fraction of the table that is full (between 0 and 1).

Theorem 20.2
The average number of cells examined in an insertion using linear probing is roughly $(1 + 1/(1 - \lambda)^2)/2$, where λ is the load factor.
Exercises: When the table is half full, what’s the average cost of inserting an item? What if the load factor is 25%? What if the load factor is 75%?

Load factor (λ)   Average cost
0.25              1.39
0.50              2.50
0.75              8.50

Why? Primary clustering.
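The exercise values can be checked by evaluating the formula from Theorem 20.2 directly. This is a small sketch; the class and method names are made up for illustration.

```java
// Evaluates the Theorem 20.2 estimate (1 + 1/(1 - lambda)^2) / 2.
class LinearProbingCost {
    static double averageInsertCost(double loadFactor) {
        double emptyFraction = 1.0 - loadFactor;
        return (1.0 + 1.0 / (emptyFraction * emptyFraction)) / 2.0;
    }

    public static void main(String[] args) {
        for (double lf : new double[] {0.25, 0.50, 0.75}) {
            System.out.printf("load factor %.2f -> %.2f cells%n", lf, averageInsertCost(lf));
        }
    }
}
```

For example, at λ = 0.75 the empty fraction is 0.25, so the estimate is (1 + 1/0.0625)/2 = (1 + 16)/2 = 8.5 cells.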

Primary clustering
Large blocks of occupied cells are formed in the hash table.
Impact? Any key that hashes into a cluster requires excessive attempts to resolve the collision. Plus, that insertion increases the size of the cluster.

Linear probing: search/find
Find(k): If the data item k cannot be found at the h(k) position, search sequentially until either k is found or an empty cell is reached; in the latter case k does not exist in the array.
Cost? Theorem 20.3:
- The average number of cells examined in an unsuccessful search using linear probing is roughly $(1 + 1/(1 - \lambda)^2)/2$.
- The average number of cells examined in a successful search using linear probing is roughly $(1 + 1/(1 - \lambda))/2$.
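The Find(k) rule can be sketched as below, under the same assumptions as the rest of these notes: integer keys, h(k) = k % n, and `null` for an empty cell. The class and method names are hypothetical.

```java
// A minimal linear probing search, assuming integer keys and null for empty cells.
class LinearProbingFindSketch {
    // Returns the index holding key, or -1 if key is not in the table.
    static int find(Integer[] table, int key) {
        int n = table.length;
        int pos = Math.floorMod(key, n);
        for (int probes = 0; probes < n; probes++) {
            if (table[pos] == null) {
                return -1; // reached an empty cell: key cannot be further along
            }
            if (table[pos] == key) {
                return pos;
            }
            pos = (pos + 1) % n; // continue sequentially, wrapping around
        }
        return -1; // table is full and key was not found
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        table[8] = 18; table[9] = 89; table[0] = 49; table[1] = 58; table[2] = 9;
        System.out.println("58 found at index " + find(table, 58));
        System.out.println("19 found at index " + find(table, 19));
    }
}
```

An unsuccessful search has to walk to the end of the cluster before it hits an empty cell, which is why it costs as much as an insertion.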

Linear probing: delete
Delete(k)
Cost? The cost of searching for k, plus the cost of filling the vacated cell.
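The slide does not say how the vacated cell is filled. One possible reading, sketched here as an assumption, is to empty the cell and then re-insert every key in the rest of the cluster so that none of them is cut off from its home position. Marking cells as deleted instead (lazy deletion) is a common alternative. All names are hypothetical.

```java
// Deletion by re-inserting the remainder of the cluster (one assumed strategy).
class LinearProbingDeleteSketch {
    static int find(Integer[] table, int key) {
        int n = table.length;
        int pos = Math.floorMod(key, n);
        for (int probes = 0; probes < n; probes++) {
            if (table[pos] == null) return -1;
            if (table[pos] == key) return pos;
            pos = (pos + 1) % n;
        }
        return -1;
    }

    // Assumes the table has at least one empty cell.
    static void insert(Integer[] table, int key) {
        int n = table.length;
        int pos = Math.floorMod(key, n);
        while (table[pos] != null) {
            pos = (pos + 1) % n;
        }
        table[pos] = key;
    }

    // Returns true if key was present and has been removed.
    static boolean delete(Integer[] table, int key) {
        int pos = find(table, key);
        if (pos < 0) return false;
        int n = table.length;
        table[pos] = null; // vacate the cell
        int next = (pos + 1) % n;
        while (table[next] != null) {
            int moved = table[next];
            table[next] = null;   // lift the key out...
            insert(table, moved); // ...and let it settle into the earliest free cell
            next = (next + 1) % n;
        }
        return true;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int k : new int[] {89, 18, 49, 58, 9}) insert(table, k);
        delete(table, 49);
        System.out.println("58 now at index " + find(table, 58));
    }
}
```

The total cost is the search for k plus one re-insertion per key in the rest of the cluster, so deletion also degrades as clusters grow.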

Quadratic probing
Goal: To eliminate the primary clustering problem of linear probing.
Strategy: When a collision occurs, examine cells farther away from the original probe point, using $F(i) = i^2$.
Let H = h(k) = hash(k, n). If H is occupied by an item other than k, search $H + 1^2$, $H + 2^2$, $H + 3^2$, …, until found or all possible locations are exhausted.
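The probe sequence H, H + 1², H + 2², … can be sketched as follows. The sketch assumes integer keys, hash(k, n) = k % n, wrap-around by taking each probe position modulo the table size, and `null` for an empty cell. The class and method names are hypothetical.

```java
// A minimal quadratic probing insert, assuming integer keys and null for empty cells.
class QuadraticProbingSketch {
    // Returns the index where key is stored (new or existing), or -1 if no cell was found.
    static int insert(Integer[] table, int key) {
        int m = table.length;
        int home = Math.floorMod(key, m); // H = hash(k, n)
        for (int i = 0; i < m; i++) {
            int pos = (int) ((home + (long) i * i) % m); // H + i^2, wrapped around
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
            if (table[pos] == key) {
                return pos; // key is already present
            }
        }
        return -1; // probe sequence exhausted
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7]; // prime table size
        for (int k : new int[] {10, 17, 24, 31}) {
            System.out.println(k + " stored at index " + insert(table, k));
        }
    }
}
```

All four keys hash to cell 3, yet they end up in cells 3, 4, 0, and 5 rather than in one contiguous run, which is the point of spreading the probes out.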

Quadratic probing: example

Theorem 20.4
When quadratic probing is used, a new element can always be inserted when the following prerequisites are met:
- The size of the hash table, M, is a prime number.
- At least M/2 of the table entries are empty.
Overhead of quadratic probing: The hash table needs to be at least half empty, so rehashing of the hash table is needed.

Rehashing
A technique to dynamically expand the size of the hash table when, for example, the table is half full under quadratic probing.
Steps:
1. Create a larger table.
2. Create a new hash function (for example, because the table size has changed).
3. Use the new hash function to add the existing data items from the old table to the new table.

Rehashing: example in Java
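The Java figure for this slide is missing from the transcript. A rough sketch of the idea, assuming integer keys, quadratic probing, and growth to the next prime at or above twice the old size, might look like this. The class, the helper names, and the growth policy are assumptions, not the slide's code.

```java
// Rehashing sketch: build a larger prime-sized table and re-insert every key.
class RehashSketch {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    // Quadratic probing insert; returns the index used or -1 if none was found.
    static int insert(Integer[] table, int key) {
        int m = table.length;
        int home = Math.floorMod(key, m); // the hash function depends on the table size
        for (int i = 0; i < m; i++) {
            int pos = (int) ((home + (long) i * i) % m);
            if (table[pos] == null) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Creates the larger table and moves the existing keys into it.
    static Integer[] rehash(Integer[] old) {
        Integer[] bigger = new Integer[nextPrime(2 * old.length)];
        for (Integer key : old) {
            if (key != null) {
                insert(bigger, key); // positions are recomputed with the new size
            }
        }
        return bigger;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];
        for (int k : new int[] {10, 17, 24, 31}) insert(table, k);
        table = rehash(table);
        System.out.println("new table size: " + table.length);
    }
}
```

Because the modulus changes with the table size, keys cannot simply be copied cell by cell; each one has to be hashed again.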

Separate chaining hashing
A more space-efficient hashing method than quadratic probing.
The hash table is implemented as an array of linked lists.
The value returned by the hash function identifies the linked list where the item is to be inserted or found.
Challenge: The linked lists should be kept short.
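A minimal sketch of this layout, assuming integer keys and using `java.util.LinkedList` for the chains, is shown below. The class name and its methods are hypothetical and are not the textbook's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Separate chaining sketch: one linked list per table cell.
class ChainedHashSketch {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();
    private int size = 0;

    ChainedHashSketch(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // The hash value selects the list that may hold the key.
    private LinkedList<Integer> bucketFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    boolean insert(int key) {
        LinkedList<Integer> bucket = bucketFor(key);
        if (bucket.contains(key)) return false; // no duplicates
        bucket.addFirst(key);
        size++;
        return true;
    }

    boolean find(int key) {
        return bucketFor(key).contains(key);
    }

    boolean remove(int key) {
        // Integer.valueOf forces remove(Object) rather than remove(int index).
        boolean removed = bucketFor(key).remove(Integer.valueOf(key));
        if (removed) size--;
        return removed;
    }

    // Average list length: number of items divided by table size.
    double loadFactor() {
        return (double) size / buckets.size();
    }

    public static void main(String[] args) {
        ChainedHashSketch table = new ChainedHashSketch(5);
        for (int k : new int[] {1, 6, 11}) table.insert(k);
        System.out.println("contains 6: " + table.find(6));
        System.out.println("load factor: " + table.loadFactor());
    }
}
```

Colliding keys simply share a list, so the table never fills up; the cost of each operation is the length of the one list that has to be scanned.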

Figure 20.20 (a): See class Node on the next page. Error?

Figure 20.20 (b)

Separate chaining hashing: analysis
Let M be the size of the hash table. Let N be the total number of data items in the hash table. Then:
- The average length of a linked list = N/M. This is also called the load factor (lf). Note: it is different from the load factor (λ) in earlier discussions, since lf can exceed 1.
- The average number of probes for an insertion = lf.
- The average number of probes for an unsuccessful search = lf.
- The average number of probes for a successful search = 1 + lf/2.

Hashing vs binary search tree operations

Operation         Binary search tree   Hashing
Find              O(log N)             Constant
Insert            O(log N)             Constant
Delete            O(log N)             Constant
findMin/findMax   O(log N)             Not supported
printAllSorted    O(N)                 O(N log N)

Overhead?