Hashing.


Hashing

Motivating Applications
Large collection of datasets
Datasets are dynamic (insert, delete)
Goal: efficient searching, insertion, and deletion
Hashing is applicable ONLY to exact-match searching

Direct Address Tables
If the key domain is U → create an array T of size |U|
For each key K → add the object to T[K]
Supports insertion, deletion, and searching in O(1)

Direct Address Tables
Alg.: DIRECT-ADDRESS-SEARCH(T, k)
    return T[k]
Alg.: DIRECT-ADDRESS-INSERT(T, x)
    T[key[x]] ← x
Alg.: DIRECT-ADDRESS-DELETE(T, x)
    T[key[x]] ← NIL
Running time for these operations: O(1)
Drawbacks
>> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible)
>> Limited to integer keys and does not support duplicate keys
Solution: use hash tables
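
The three direct-address operations can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: it assumes integer keys in the range 0 to universe_size - 1, passes the key and the stored object as separate arguments instead of using key[x], and uses None in place of NIL. The class and parameter names are hypothetical.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in range(universe_size)."""

    def __init__(self, universe_size):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * universe_size

    def search(self, key):
        # DIRECT-ADDRESS-SEARCH: a single array access.
        return self.slots[key]

    def insert(self, key, value):
        # DIRECT-ADDRESS-INSERT: overwrites whatever was stored under key.
        self.slots[key] = value

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: reset the slot to None.
        self.slots[key] = None


table = DirectAddressTable(universe_size=10)
table.insert(3, "Alice")
table.insert(7, "Bob")
found = table.search(3)
table.delete(7)
missing = table.search(7)
```

The array always has one slot per possible key, which is the space drawback noted above when U is large.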

Direct Access Tables: Example
U is the domain (universe of keys)
K is the set of keys actually stored

Hashing
A data structure that maps values from a certain domain or range to another domain or range
[Figure: a hash function maps a domain of string values to a domain of integer values (3, 15, 20, 55)]

Hashing
[Figure: a hash function maps student IDs in the domain [950,000 … 960,000] to numbers in the range [0 … 10,000]]

Hash Tables
When |K| is much smaller than |U|, a hash table requires much less space than a direct-address table
Can reduce storage requirements to O(|K|)
Can still get O(1) search time, but in the average case, not the worst case

Hash Tables: Main Idea
Use a hash function h to compute the slot for each key k
Store the element in slot h(k)
Maintain a hash table of size m → T[0…m-1]
A hash function h transforms a key into an index in a hash table T[0…m-1]:
    h : U → {0, 1, . . . , m - 1}
We say that k hashes to slot h(k)

Hash Tables: Main Idea
[Figure: keys k1, k2, k3, k4, k5 from K (actual keys), inside U (universe of keys), are mapped by h to slots of a hash table of size m (last slot m - 1); h(k2) = h(k5)]
>> m is much smaller than |U| (m << |U|)
>> m can be even smaller than |K|

Example
Back to the example of 100 students, each with a 9-digit SSN
All we need is a hash table of size 100

What About Collisions
[Figure: the same mapping from U (universe of keys) and K (actual keys) into the table; h(k2) = h(k5) is marked as a collision]
A collision means two or more keys hash to the same slot
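
A collision is easy to reproduce with a table of size 100 and 9-digit keys. The sketch below assumes the hash function h(k) = k mod m, which the slides have not fixed at this point, and the three keys are made-up values chosen only for illustration.

```python
def h(key, m=100):
    # Assumed hash function: remainder after dividing by the table size.
    return key % m


# Hypothetical 9-digit keys (not real data).
k1, k2, k3 = 123456789, 987654389, 555443312

slot_of = {k: h(k) for k in (k1, k2, k3)}

# k1 and k2 differ, yet both end in 89, so they land in the same slot.
collision = k1 != k2 and h(k1) == h(k2)
```

Since there are far more possible 9-digit keys than slots, some pairs of keys must share a slot no matter which h is used.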

Handling Collisions
Many ways to handle them:
    Chaining
    Open addressing
        Linear probing
        Quadratic probing
        Double hashing

Chaining: Main Idea
Put all elements that hash to the same slot into a linked list (chain)
Slot j contains a pointer to the head of the list of all elements that hash to j

Chaining - Discussion
Choosing the size of the hash table:
    Small enough not to waste space
    Large enough that the lists remain short
    Typically 10%-20% of the total number of elements
How should we keep the lists: ordered or not?
    Usually each list is an unsorted linked list

Insertion in Hash Tables
Alg.: CHAINED-HASH-INSERT(T, x)
    insert x at the head of list T[h(key[x])]
Worst-case running time is O(1)
Duplicates may or may not be allowed, depending on the application

Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x)
    delete x from the list T[h(key[x])]
Need to find the element to be deleted
Worst-case running time: deletion depends on searching the corresponding list

Searching in Hash Tables
Alg.: CHAINED-HASH-SEARCH(T, k)
    search for an element with key k in list T[h(k)]
Running time is proportional to the length of the list of elements in slot h(k)
What are the worst case and the average case?
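
The chained insert, delete, and search operations above can be sketched with a hand-rolled singly linked list. This is a minimal illustration under stated assumptions: keys are integers, the hash function is key mod m, and delete takes a key (not an element pointer) and removes the first match. The names Node and ChainedHashTable are hypothetical.

```python
class Node:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m  # slot j points to the head of chain j

    def _h(self, key):
        # Assumed hash function for integer keys.
        return key % self.m

    def insert(self, key, value):
        # Insert at the head of the chain: O(1), no duplicate check.
        j = self._h(key)
        self.heads[j] = Node(key, value, self.heads[j])

    def search(self, key):
        # Walk the chain in slot h(key); time grows with the chain length.
        node = self.heads[self._h(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key):
        # Find the node first, then unlink it; True if a node was removed.
        j = self._h(key)
        prev, node = None, self.heads[j]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.heads[j] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False


t = ChainedHashTable(m=7)
t.insert(10, "a")  # 10 mod 7 = 3
t.insert(17, "b")  # 17 mod 7 = 3, collides with 10 and becomes the new head
t.insert(5, "c")   # 5 mod 7 = 5
```

Because insert never searches the chain, it stays O(1) but allows duplicate keys; an application that forbids duplicates would need to search before inserting.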

Analysis of Hashing with Chaining: Worst Case
[Figure: table T with slots up to m - 1 and a single long chain]
All keys go to a single chain
Chain size is O(n)
Searching is O(n) + time to apply h(k)

Analysis of Hashing with Chaining: Average Case
[Figure: table T with slots up to m - 1 and chains of similar length]
With a good hash function and a uniform distribution of keys:
    Any given element is equally likely to hash into any of the m slots
    All chains will have similar sizes
Assume n is the total number of keys and m is the hash table size
Average chain size → O(n/m)
Average search time: O(n/m) (the common case)

Analysis of Hashing with Chaining: Average Case
If m (# of slots) is proportional to n (# of keys): m = Θ(n), so n/m = O(1)
→ Searching takes constant time on average
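
The worst and average cases can be compared by counting chain lengths directly. The sketch below is only an illustration: the keys 0 to n - 1 and the two hash functions (key mod m as the "good" one, a constant as the "bad" one) are assumptions chosen so that the result is easy to check by hand.

```python
def chain_lengths(keys, m, h):
    """Return how many keys land in each of the m slots under hash function h."""
    lengths = [0] * m
    for k in keys:
        lengths[h(k, m)] += 1
    return lengths


n, m = 1000, 100
keys = range(n)

# Keys 0..n-1 under k mod m spread evenly by construction.
good = chain_lengths(keys, m, lambda k, m: k % m)

# A constant hash function sends every key to slot 0 (the worst case).
bad = chain_lengths(keys, m, lambda k, m: 0)

average_chain = n / m
```

With the even spread every chain has n/m = 10 elements, while the constant function produces one chain holding all 1000 keys.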

Hash Functions

Hash Functions
A hash function transforms a key (k) into a table address (0…m-1)
What makes a good hash function?
    (1) Easy to compute
    (2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)
    (3) Reduces the number of collisions

Hash Functions
Goal: map a key k into one of the m slots in the hash table
Make the table size (m) a prime number
    Avoid even numbers and powers of 2
Common function: h(k) = F(k) mod m
    F(k): some function or operation on k (usually generates an integer)
    The output of the "mod" is a number in [0…m-1]
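
One way to see why a prime table size helps is to hash keys that share a common factor. The sketch below assumes F(k) = k and a made-up key set of multiples of 8; it illustrates the advice above and is not a proof.

```python
def slots_used(keys, m):
    """Count the distinct slots hit by h(k) = k mod m."""
    return len({k % m for k in keys})


# Hypothetical keys that are all multiples of 8.
keys = [8 * i for i in range(100)]

used_pow2 = slots_used(keys, 8)   # m = 8, a power of 2
used_prime = slots_used(keys, 7)  # m = 7, a prime
```

With m = 8 every key falls into slot 0, whereas with m = 7 the same keys reach all 7 slots.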

Examples of Hash Functions
Collection of images
    F(k): sum of the pixel colors
    h(k) = F(k) mod m
Collection of strings
    F(k): sum of the ASCII values
    h(k) = F(k) mod m
Collection of numbers
    F(k): just return k
    h(k) = F(k) mod m
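
The three examples above translate almost directly into Python. This is a rough sketch under stated assumptions: an image is represented as rows of integer color values, strings contain only ASCII characters (so ord returns the ASCII value), and the function names and the table size m = 13 are hypothetical.

```python
def hash_image(pixels, m):
    # F(k): sum of all pixel color values; pixels is a list of rows of ints.
    return sum(sum(row) for row in pixels) % m


def hash_string(s, m):
    # F(k): sum of the character codes (ASCII values for ASCII text).
    return sum(ord(c) for c in s) % m


def hash_number(k, m):
    # F(k): the key itself.
    return k % m


m = 13  # a prime table size

image_slot = hash_image([[1, 2], [3, 4]], m)
abc_slot = hash_string("abc", m)
cab_slot = hash_string("cab", m)
number_slot = hash_number(950123, m)
```

Summing the characters ignores their order, so "abc" and "cab" always collide; this keeps the function easy to compute at the cost of more collisions.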