Chapter 12 Hash Table

● So far, the best worst-case time for searching is O(log n). ● Hash tables give an average search time of O(1) and a worst-case search time of O(n).

Learning Objectives ● Develop the motivation for hashing. ● Study hash functions. ● Understand collision resolution and compare and contrast various collision resolution schemes. ● Summarize the average running times for hashing under various collision resolution schemes. ● Explore the java.util.HashMap class.

12.1 Motivation ● Let's design a data structure using an array whose indices could be the keys of entries. ● Suppose we wanted to store the keys 1, 3, 5, 8, and 10, with guaranteed one-step access to any of them.
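A minimal sketch of such a direct-access table, assuming int keys in a known range [0, maxKey] and String values (both choices are ours, for illustration). Storing the five keys needs an array of 11 slots, because the key itself is the index.

```java
// Direct-access table sketch: each key is used directly as an array index.
// Assumption: keys are ints in [0, maxKey]; values are Strings for illustration.
class DirectAccessTable {
    private final String[] slots;

    DirectAccessTable(int maxKey) {
        // One slot for every possible key, whether or not it is ever used.
        slots = new String[maxKey + 1];
    }

    void put(int key, String value) {
        slots[key] = value; // one step, no searching
    }

    String get(int key) {
        return slots[key]; // one step; null means "no entry with this key"
    }

    int capacity() {
        return slots.length;
    }

    public static void main(String[] args) {
        DirectAccessTable table = new DirectAccessTable(10);
        for (int key : new int[] {1, 3, 5, 8, 10}) {
            table.put(key, "entry" + key);
        }
        System.out.println(table.get(8));     // entry8
        System.out.println(table.get(4));     // null
        System.out.println(table.capacity()); // 11
    }
}
```

Five entries occupy an 11-slot array; the array length is fixed by the largest key, not by how many keys are stored.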

12.1 Motivation ● The space consumption does not depend on the actual number of entries stored. ○ It depends on the range of keys. ● What if we wanted to store strings? ○ For each string, we would first have to compute a numeric key that is equivalent to it. ○ java.lang.String.hashCode() computes the numeric equivalent (or hashcode) of a string by an arithmetic manipulation involving its individual characters.
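To make the "arithmetic manipulation" concrete, the sketch below recomputes a string's hashcode by hand using the formula documented for java.lang.String.hashCode(), s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, and compares it with the library value. The helper name manualHash is ours.

```java
// Recomputes String.hashCode() by hand (helper name is hypothetical).
// Documented formula: s[0]*31^(n-1) + ... + s[n-1], using int arithmetic.
class StringHashDemo {
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule; overflow wraps around exactly like Java int arithmetic.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"cat", "ear", "sad", "aid"}) {
            System.out.println(s + " -> " + manualHash(s)
                    + " (String.hashCode: " + s.hashCode() + ")");
        }
    }
}
```

For example, "cat" gives 99*31^2 + 97*31 + 116 = 98262.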

12.1 Motivation ● Using numeric keys directly as indices is out of the question for most applications. ○ There isn't enough space to allocate a slot for every possible key.

12.1 Motivation

12.2 Hashing ● A simple hash function: ○ table size of 10 ○ h(k) = k mod 10
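The slide's hash function for a table of size 10 can be sketched as below. Using Math.floorMod rather than % is our addition, so that negative keys (for example, negative hashcodes) still land in 0..9.

```java
// h(k) = k mod 10 for a table of size 10.
// Assumption: Math.floorMod keeps negative keys inside 0..9, unlike %.
class ModHash {
    static final int TABLE_SIZE = 10;

    static int h(int k) {
        return Math.floorMod(k, TABLE_SIZE);
    }

    public static void main(String[] args) {
        for (int k : new int[] {21, 37, 40, 51, -3}) {
            System.out.println(k + " -> slot " + h(k));
        }
    }
}
```

Keys 21 and 51 both map to slot 1, which is exactly the kind of collision the next sections deal with.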

12.2 Hashing ● ear collides with cat at position 4. ● There is empty space in the table, and it is up to the collision resolution scheme to find an appropriate position for this string. ● A better mapping function ● For any hash function one could devise, there are always hashcodes that could force the mapping function to be ineffective by generating lots of collisions.

12.2 Hashing

12.3 Collision Resolution ● There are two ways to resolve collisions. ○ Open addressing: find another location for the colliding key within the hash table. ○ Closed addressing: store all keys that hash to the same location in a data structure that “hangs off” that location.

Linear Probing

● As more and more entries are hashed into the table, they tend to form clusters that get bigger and bigger. ● The number of probes on collisions gradually increases, thus slowing down the hash time to a crawl.
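A minimal linear-probing sketch, assuming int keys, a home slot of key mod capacity, and no deletion support (all our simplifications). On a collision it tries h, h+1, h+2, ... wrapping around, and it reports how many probes each insertion needed, so the growing cluster is visible.

```java
import java.util.Arrays;

// Linear-probing sketch for int keys (no deletion support).
// Assumption: the home slot is key mod capacity, as in h(k) = k mod 10.
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingTable(int capacity) {
        slots = new Integer[capacity];
    }

    private int home(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Inserts key and returns how many probes it took, or -1 if the table is full.
    int insert(int key) {
        int n = slots.length;
        int start = home(key);
        for (int i = 0; i < n; i++) {
            int pos = (start + i) % n; // probe h, h+1, h+2, ... wrapping around
            if (slots[pos] == null) {
                slots[pos] = key;
                return i + 1;
            }
            if (slots[pos] == key) {
                return i + 1; // key already present
            }
        }
        return -1;
    }

    boolean contains(int key) {
        int n = slots.length;
        int start = home(key);
        for (int i = 0; i < n; i++) {
            int pos = (start + i) % n;
            if (slots[pos] == null) {
                return false; // an empty slot ends the probe sequence
            }
            if (slots[pos] == key) {
                return true;
            }
        }
        return false;
    }

    @Override
    public String toString() {
        return Arrays.toString(slots);
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(10);
        for (int key : new int[] {12, 22, 32, 13}) {
            System.out.println(key + " placed after " + table.insert(key) + " probe(s)");
        }
        System.out.println(table);
    }
}
```

Key 13 hashes to slot 3, not 2, yet it still needs three probes because it lands inside the cluster started by 12, 22, and 32.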

Linear Probing ● Insert "cat", "ear", "sad", and "aid"

Linear Probing ● Clustering is the downfall of linear probing, so we need to look to another method of collision resolution that avoids clustering.

Quadratic Probing

● Avoids the clustering seen with linear probing. ● When the probing stops with a failure to find an empty spot, as many as half the locations of the table may still be unoccupied. ● For example, with a table size of 11, a key that hashes to location 2 probes locations 2, 3, 6, 0, 7, and 5, which are then repeated endlessly, and the insertion is not done, even though about half the table is empty.
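The repeating sequence can be reproduced with a short sketch, assuming the probe rule (h + i^2) mod N and a table size N = 11 (the size that yields exactly the locations listed above for a key hashing to 2).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Quadratic-probing sketch: probe i of a key whose home slot is h is (h + i*i) mod n.
class QuadraticProbeSequence {
    static List<Integer> probes(int h, int n, int count) {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            seq.add((h + i * i) % n); // assumes 0 <= h < n and small i (no overflow)
        }
        return seq;
    }

    public static void main(String[] args) {
        List<Integer> seq = probes(2, 11, 11);
        System.out.println(seq);                      // [2, 3, 6, 0, 7, 5, 5, 7, 0, 6, 3]
        System.out.println(new LinkedHashSet<>(seq)); // [2, 3, 6, 0, 7, 5]
    }
}
```

Only 6 of the 11 locations are ever visited; if all six are occupied, the insertion fails while 5 slots remain empty.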

Quadratic Probing ● For any given prime table size N, once a location has been examined a second time, every location examined thereafter has also already been examined.

Chaining ● If a collision occurs at location i of the hash table, chaining simply adds the colliding entry to a linked list built at that location.
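A minimal separate-chaining sketch, assuming non-null keys and a slot index of hashCode mod capacity; the class and method names are hypothetical. Each slot holds a java.util.LinkedList of entries, and a colliding key is simply appended to that list.

```java
import java.util.LinkedList;

// Separate-chaining sketch: each slot holds a linked list (chain) of entries.
// Assumptions: keys are non-null; slot index = hashCode mod capacity.
class ChainedTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        buckets = (LinkedList<Entry<K, V>>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    private int index(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = buckets[index(key)];
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // replace existing value
        }
        chain.add(new Entry<>(key, value)); // new key: append to the chain at this slot
    }

    V get(K key) {
        for (Entry<K, V> e : buckets[index(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // not found in this chain
    }

    int chainLength(int slot) {
        return buckets[slot].size();
    }

    public static void main(String[] args) {
        ChainedTable<Integer, String> t = new ChainedTable<>(10);
        t.put(21, "a");
        t.put(51, "b"); // collides with 21 at slot 1
        t.put(40, "c");
        System.out.println(t.get(51));        // b
        System.out.println(t.chainLength(1)); // 2
    }
}
```

Unlike open addressing, a chained table never runs out of slots; a crowded slot just means a longer chain to scan.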

Running times ● We assume that the hashing process itself (hashcode and mapping) takes O(1) time. ○ The running time of insertion is therefore determined by the collision resolution scheme.

12.4 The java.util.HashMap Class ● Consider a university-wide database that stores student records. ○ Every student is assigned a unique id (the key), with which several pieces of information, such as name, address, credits, and gpa, are associated. ○ These pieces of information constitute the value.

12.4 The java.util.HashMap Class ● Consider a StudentInfo dictionary that stores (id, info) pairs for all the students enrolled in the university. ● The operations corresponding to this relationship can be found in java.util.Map.

12.4 The java.util.HashMap Class ● The Map interface also provides operations to enumerate all the keys, enumerate all the values, get the size of the dictionary, check whether the dictionary is empty, and so on. ● The java.util.HashMap class implements the dictionary abstraction as specified by the java.util.Map interface. It resolves collisions using chaining.
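A short usage sketch of the StudentInfo idea with java.util.HashMap. To keep it self-contained we assume the value is a plain String rather than a StudentInfo class, and the ids and records are made up. Only standard Map methods are used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical StudentInfo dictionary: student id -> info string (made-up data).
class StudentInfoDemo {
    static Map<Integer, String> buildSample() {
        Map<Integer, String> studentInfo = new HashMap<>();
        studentInfo.put(1001, "Ada; 30 credits; gpa 3.9");
        studentInfo.put(1002, "Alan; 24 credits; gpa 3.7");
        studentInfo.put(1001, "Ada; 33 credits; gpa 3.9"); // same key: old value replaced
        return studentInfo;
    }

    public static void main(String[] args) {
        Map<Integer, String> studentInfo = buildSample();
        System.out.println(studentInfo.get(1001));         // Ada; 33 credits; gpa 3.9
        System.out.println(studentInfo.containsKey(1003)); // false
        System.out.println(studentInfo.size());            // 2
        System.out.println(studentInfo.isEmpty());         // false
        System.out.println(studentInfo.keySet());          // enumerate keys (order unspecified)
        System.out.println(studentInfo.values());          // enumerate values
        studentInfo.remove(1002);
        System.out.println(studentInfo.size());            // 1
    }
}
```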

Table and Load Factor ● When the no-arg constructor is used: ○ default initial capacity of 16 ○ default load factor of 0.75 ● The size of the table is defined as the actual number of key-value mappings in the hash table.

Table and Load Factor ● We can choose an initial capacity. ○ HashMap only uses capacities that are powers of 2. ● A requested capacity of 101 becomes 128.
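The rounding rule can be sketched with a hypothetical helper that returns the smallest power of 2 that is at least the requested capacity. This is our own illustration built on Integer.highestOneBit, not HashMap's internal code, and it assumes requests no larger than 2^30.

```java
// Hypothetical helper: smallest power of two >= requested.
// Assumption: 1 <= requested <= 2^30 (larger values would overflow int).
class CapacityRounding {
    static int roundUpToPowerOfTwo(int requested) {
        if (requested <= 1) {
            return 1;
        }
        // highestOneBit(requested - 1) is the largest power of two below requested;
        // doubling it gives the smallest power of two >= requested.
        return Integer.highestOneBit(requested - 1) << 1;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {101, 16, 17}) {
            System.out.println(requested + " -> " + roundUpToPowerOfTwo(requested));
        }
    }
}
```

Subtracting 1 first is what keeps an exact power of 2 such as 16 from being doubled to 32.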

Table and Load Factor ● An initial capacity of 128.

Storage of Entries ● Relevant fields in the HashMap class: ○ threshold is the size threshold that triggers rehashing. ● It is the product of the capacity and the load factor threshold (N * t).

Storage of Entries ● Entry[] table sets up an array of chains. ○ Map.Entry is defined inside the Map interface. ○ next holds a reference to the next Entry in its linked list.

Adding an Entry ● Example  Name serves as a key to the phone number value.

Adding an Entry

● If the key argument is null, a special object, NULL_KEY, is returned; otherwise, the argument key is returned as is.

Adding an Entry

● Example  h = 25 and length = 16  The binary representation of h and length-1 (11001 and 01111).

Adding an Entry ● Since length is a power of 2, say $2^k$, its binary representation is a 1 followed by k zeros, so length - 1 is k ones. ● Any h is expressible as $h = c \cdot 2^k + r$, with $0 \le r < 2^k$. ○ r is the result of the bitwise AND of h with length - 1, since the $c \cdot 2^k$ part occupies only higher-order bits, which are zeroed out in the process.
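The masking argument can be checked with a small sketch. The name indexFor mirrors the idea on the slides but is our own helper here. For a power-of-2 length, h & (length - 1) keeps the low k bits, which equals h mod length for non-negative h and Math.floorMod(h, length) for negative h.

```java
// Sketch of mapping a hash to a slot when length is a power of two.
// Assumption: length is a power of two (the helper does not check this).
class IndexFor {
    static int indexFor(int h, int length) {
        return h & (length - 1); // keep only the low k bits of h
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(25) + " & " + Integer.toBinaryString(15));
        System.out.println(indexFor(25, 16)); // 9 (11001 & 01111 = 01001)
        System.out.println(25 % 16);          // 9
    }
}
```

Besides being cheaper than a division, the mask never yields a negative index, whereas h % length can be negative when the hashcode is negative.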

Adding an Entry

● The if statement triggers a rehashing process if the size is equal to or greater than the threshold.
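The trigger can be sketched with a simplified chained set of int keys, assuming (as described above) that the threshold is capacity * load factor and that rehashing doubles the capacity and redistributes every key. This is an illustration, not HashMap's actual source.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Simplified chained table of int keys that rehashes when size >= threshold.
// Assumptions: capacity is a power of two; rehash doubles it; keys hash to themselves.
class RehashingTable {
    private List<LinkedList<Integer>> table;
    private final float loadFactor;
    private int threshold;
    private int size;

    RehashingTable(int capacity, float loadFactor) {
        this.loadFactor = loadFactor;
        this.table = newTable(capacity);
        this.threshold = (int) (capacity * loadFactor);
    }

    private static List<LinkedList<Integer>> newTable(int capacity) {
        List<LinkedList<Integer>> t = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) t.add(new LinkedList<>());
        return t;
    }

    private static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    void add(int key) {
        LinkedList<Integer> chain = table.get(indexFor(key, table.size()));
        if (chain.contains(key)) return; // already present, size unchanged
        chain.add(key);
        size++;
        if (size >= threshold) resize(2 * table.size()); // the rehash trigger
    }

    private void resize(int newCapacity) {
        List<LinkedList<Integer>> old = table;
        table = newTable(newCapacity);
        for (LinkedList<Integer> chain : old) {
            for (int key : chain) {
                table.get(indexFor(key, newCapacity)).add(key); // recompute each slot
            }
        }
        threshold = (int) (newCapacity * loadFactor);
    }

    boolean contains(int key) {
        return table.get(indexFor(key, table.size())).contains(key);
    }

    int capacity() { return table.size(); }

    int size() { return size; }

    public static void main(String[] args) {
        RehashingTable t = new RehashingTable(16, 0.75f); // threshold 12
        for (int key = 0; key < 12; key++) {
            t.add(key);
            System.out.println("size " + t.size() + " -> capacity " + t.capacity());
        }
    }
}
```

With capacity 16 and load factor 0.75, the twelfth insertion reaches the threshold of 12 and the table grows to 32 slots.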

Rehashing

Searching

12.5 Quadratic Probing: Repetition of Probe Locations ● Quadratic probing examines only about N/2 distinct locations of the table before it starts repeating locations. ● Suppose a key is hashed to location h, where there is a collision. ○ The following locations are examined: $h, (h + 1^2) \bmod N, (h + 2^2) \bmod N, \ldots, (h + i^2) \bmod N, \ldots$

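12.5 Quadratic Probing: Repetition of Probe Locations ● What if two different probes, i and j (with i < j), end up at the same location? ○ Then $(h + i^2) \bmod N = (h + j^2) \bmod N$, so N divides $i^2 - j^2 = (i + j)(i - j)$.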

12.5 Quadratic Probing: Repetition of Probe Locations ● Since N is a prime number, it must divide one of the factors, (i + j) or (i - j). ● N divides (i - j) only when at least N probes have already been made. ● N divides (i + j) when i + j = N, at the very least. ● So probe j = N - i repeats probe i, and the first repeat occurs at j = (N + 1)/2, after only about N/2 distinct locations have been examined.
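The argument can be checked numerically with a small sketch, assuming the probe rule (h + i^2) mod N with 0 <= h < N and an odd prime N (for N = 2 the formula (N + 1)/2 does not apply). The helper names are ours.

```java
// Numerical check of the repetition argument for quadratic probing.
// Assumptions: n is an odd prime, 0 <= h < n, probe i lands on (h + i*i) mod n.
class QuadraticRepeatCheck {
    static int probe(int h, int i, int n) {
        return (h + i * i) % n;
    }

    // Index of the first probe that revisits an already examined location.
    static int firstRepeat(int h, int n) {
        boolean[] seen = new boolean[n];
        for (int i = 0; ; i++) { // terminates within n + 1 probes (pigeonhole)
            int p = probe(h, i, n);
            if (seen[p]) return i;
            seen[p] = true;
        }
    }

    // Number of distinct locations among probes 0 .. n-1.
    static int distinctLocations(int h, int n) {
        boolean[] seen = new boolean[n];
        int count = 0;
        for (int i = 0; i < n; i++) {
            int p = probe(h, i, n);
            if (!seen[p]) { seen[p] = true; count++; }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {7, 11, 13, 101}) {
            System.out.println("N = " + n + ": first repeat at probe " + firstRepeat(0, n)
                    + ", distinct locations " + distinctLocations(0, n)
                    + ", (N + 1)/2 = " + (n + 1) / 2);
        }
    }
}
```

For each prime tested, the first repeat happens at probe (N + 1)/2 and only (N + 1)/2 distinct locations are ever reached, matching the j = N - i argument.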

12.6 Summary ● A hash table implements the dictionary operations of insert, search, and delete on (key, value) pairs. ● Given a key, a hash function for a given hash table computes an index into the table as a function of the key by first obtaining a numeric hashcode, and then mapping this hashcode to a table location.

12.6 Summary ● When a new key hashes to a location in the hash table that is already occupied, it is said to collide with the occupying key. ● Collision resolution is the process used upon collision to determine an unoccupied location in the hash table where the colliding key may be inserted. ● In searching for a key, the same hash function and collision resolution scheme must be used as for its insertion.

12.6 Summary ● A good hash function must run in O(1) time and must distribute entries uniformly over the hash table. ● Open addressing relocates a colliding entry in the hash table itself. Closed addressing stores all entries that hash to a location in a data structure that “hangs off” that location. ● Linear probing and quadratic probing are instances of open addressing, while chaining is an instance of closed addressing.

12.6 Summary ● Linear probing leads to clustering of entries with the clusters becoming increasingly larger as more and more collisions occur. Clustering degrades performance significantly. ● Quadratic probing attempts to reduce clustering. On the other hand, quadratic probing may leave as many as half the hash table empty while reporting failure to insert a new entry.

12.6 Summary ● Chaining is the simplest way to resolve collisions and also results in better performance than linear probing or quadratic probing. ● The worst-case search time for linear probing, quadratic probing, and chaining is O(n). ● The load factor of a hash table is the ratio of the number of keys, n, to the capacity, N.

12.6 Summary ● The average performance of chaining depends on the load factor. For a perfect hash function that always distributes keys uniformly, the average search time for chaining is O(1 + n/N), which is O(1) as long as the load factor is bounded by a constant.