IKI 10100: Data Structures & Algorithms. Ruli Manurung, Fasilkom UI (acknowledgments to Denny & Ade Azurat).

1 Hash Tables
Ruli Manurung (Fasilkom UI), IKI10100 lecture, 8th May 2007

2 Review
- Linked list: insert, find, and delete take O(n).
- Stack & queue: insert, access, and delete take O(1), but access is restricted to the top (stack) or the ends (queue).
- Binary search tree: insert, find, and delete take O(log n) in the average case but O(n) in the worst case.
- AVL tree, red-black tree: insert, find, and delete take O(log n).

3 Review
- Array: access by index takes O(1) time.
- Data is accessed using an integer index.
- The size must be determined in advance; an array is not growable.

4 Outline
Hashing
- Definition
- Hash function
- Collision resolution
  - Open hashing: separate chaining
  - Closed hashing (open addressing): linear probing, quadratic probing, double hashing
  - Primary clustering, secondary clustering
- Access: insert, find, delete

5 Hash Tables
Hashing is used to store relatively large amounts of data in a table called a hash table (an ADT). The table size, H-size, is usually fixed and larger than the amount of data we want to store. We define the load factor λ as the ratio of the number of stored items n to the table size: λ = n / H-size. A hash function maps an item's key to an index in the range 0 to H-size - 1.
[Figure: key → hash function → index of the item in the hash table]

6 Hash Tables (2)
Hashing is a technique for performing insertions, deletions, and finds in constant average time. To insert or find an item, we assign it a key and use a function, called the hash function, to determine the item's location in the table. A hash table is a fixed-size array of cells containing the data, or the keys that correspond to the data. The hash function maps each key to a number in the range 0 to H-size - 1.
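Mapping a key into the range 0 to H-size - 1 and computing the load factor can be sketched in Java as follows. This is an illustration under assumptions. `String.hashCode()` stands in for the hash function, the table size 11 is arbitrary, and the names `HashIndexDemo`, `indexFor`, and `loadFactor` are hypothetical.

```java
// Sketch: map keys into cells 0 .. tableSize-1 and compute the load factor.
// Assumption: String.hashCode() plays the role of the hash function.
class HashIndexDemo {
    // Map any key to a cell index in [0, tableSize).
    static int indexFor(String key, int tableSize) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), tableSize);
    }

    // Load factor: number of stored items divided by the table size.
    static double loadFactor(int itemCount, int tableSize) {
        return (double) itemCount / tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 11;
        String[] keys = {"alpha", "crystal", "dawn", "emerald"};
        for (String k : keys) {
            System.out.println(k + " -> " + indexFor(k, tableSize));
        }
        System.out.println("load factor = " + loadFactor(keys.length, tableSize));
    }
}
```

`Math.floorMod` is used instead of `%` because `hashCode()` can be negative. With a negative value, `%` would produce a negative index.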

7 Hash Function
A hash function should have the following features:
- Easy to compute.
- Two distinct keys map to two different cells in the array. This is not possible in general (why?): there are far more possible keys than cells, so some distinct keys must share a cell. It can be achieved with a direct-address table when the universal set of keys is reasonably small.
- Distributes the keys evenly among the cells.
One simple hash function takes the key modulo a prime number. Any manipulation of digits with low complexity and a good distribution can be used.
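A direct-address table gives every possible key its own cell, so distinct keys never collide. Below is a minimal sketch. It assumes integer keys from a small universe 0 to 999, and the class name `DirectAddressTable` is hypothetical.

```java
// Direct-address table sketch: assumes integer keys from a small universe 0 .. universeSize-1.
class DirectAddressTable {
    private final String[] cells;  // one cell per possible key; null = absent

    DirectAddressTable(int universeSize) {
        cells = new String[universeSize];
    }

    void insert(int key, String value) { cells[key] = value; }  // O(1), the key is the index
    String find(int key) { return cells[key]; }                  // O(1), null if absent
    void delete(int key) { cells[key] = null; }                  // O(1)

    public static void main(String[] args) {
        DirectAddressTable t = new DirectAddressTable(1000);
        t.insert(42, "alpha");
        t.insert(999, "moon");
        System.out.println(t.find(42));  // alpha
        t.delete(42);
        System.out.println(t.find(42));  // null
    }
}
```

For a universe such as all strings, an array this large cannot be allocated. That is why a real hash function must let many keys share a cell.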

8 Hash Function: Truncation
Part of the key is simply ignored, and the remaining part is used, or concatenated, to form the index.
[Table: phone numbers and their truncated indices; the values did not survive extraction]
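A minimal truncation sketch follows. The original example values are lost, so two things here are assumptions: the index is taken to be the last three digits of the phone number (range 0 to 999), and the phone numbers are made up. `TruncationHash` is a hypothetical name.

```java
// Truncation sketch. Assumption: keep only the last three digits of the
// phone number as the index (0..999); the phone numbers are made up.
class TruncationHash {
    static int hash(String phone) {
        String digits = phone.replaceAll("[^0-9]", "");           // drop '-' separators
        String kept = digits.substring(Math.max(0, digits.length() - 3));
        return Integer.parseInt(kept);                           // the other digits are ignored
    }

    public static void main(String[] args) {
        System.out.println(hash("021-786-4321"));   // 321
        System.out.println(hash("0812-555-0007"));  // 7
    }
}
```

Any two numbers that share their last three digits collide here, regardless of the rest of the key.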

9 Hash Function: Folding
The data can be split up into smaller chunks, which are then folded together in some form.
[Table: phone numbers, their 3-digit groups, and the resulting index; the values did not survive extraction]
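A sketch of folding appears below. It assumes the digits are split left to right into groups of three and the groups are added, since the exact grouping on the original slide is not recoverable. `FoldingHash` is a hypothetical name.

```java
// Folding sketch. Assumption: split the digits left to right into groups of
// three and add the groups together.
class FoldingHash {
    static int fold(String phone) {
        String digits = phone.replaceAll("[^0-9]", "");
        int sum = 0;
        for (int i = 0; i < digits.length(); i += 3) {
            int end = Math.min(i + 3, digits.length());       // last group may be shorter
            sum += Integer.parseInt(digits.substring(i, end));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(fold("123-456-789"));   // 123 + 456 + 789 = 1368
        System.out.println(fold("021-786-4321"));  // 21 + 786 + 432 + 1 = 1240
    }
}
```

Unlike truncation, every digit contributes to the result. The sum can still exceed the table size, so it is usually reduced further with the modular method.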

10 Hash Function: Modular Arithmetic
Convert the data into an integer, divide it by the size of the hash table, and take the remainder as the index.
[Table: 3-group values taken % 100; only one resulting index (25) survived extraction]
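Here is a sketch of the modular method. It assumes the whole phone number (digits only) is converted to a `long` and uses table sizes of 100 and 101. `ModularHash` is a hypothetical name.

```java
// Modular-arithmetic sketch: the digits of the phone number form one integer,
// and the index is that integer mod the table size.
class ModularHash {
    static int hash(String phone, int tableSize) {
        long key = Long.parseLong(phone.replaceAll("[^0-9]", ""));
        return (int) (key % tableSize);   // key is non-negative, so the index is too
    }

    public static void main(String[] args) {
        System.out.println(hash("021-786-4321", 100));  // 21
        System.out.println(hash("021-786-4321", 101));  // 49
    }
}
```

With a table size of 100 the index is just the last two decimal digits, which amounts to truncation again. With a prime size such as 101, every digit influences the index, which is one reason the slides recommend a prime.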

11 Choosing a Hash Function
A good hash function should satisfy two criteria:
1. It should be quick to compute.
2. It should minimize the number of collisions.

12 Example of Hash Function
A hash function for a string with characters $A_3A_2A_1A_0$ treats the characters as digits in base $X = 128$:
$h = A_3X^3 + A_2X^2 + A_1X^1 + A_0X^0 = ((A_3X + A_2)X + A_1)X + A_0$
The second form (Horner's rule) needs only three multiplications. The result of the hash function is much larger than the size of the table, so we take the result modulo the size of the hash table.
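As a worked illustration, assume ASCII character codes and a hypothetical prime table size of 10007. The four-character string "data" then has $A_3 = 100$ ('d'), $A_2 = 97$ ('a'), $A_1 = 116$ ('t'), and $A_0 = 97$ ('a'). Evaluating with Horner's rule:

$$
\begin{aligned}
A_3X + A_2 &= 100 \cdot 128 + 97 = 12897\\
12897X + A_1 &= 12897 \cdot 128 + 116 = 1650932\\
1650932X + A_0 &= 1650932 \cdot 128 + 97 = 211319393\\
211319393 \bmod 10007 &= 1574
\end{aligned}
$$

Reducing modulo 10007 after every step instead produces the intermediate values 100, 2890, 9784, and finally 1574. The index is the same, and the numbers stay small.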

13 Example of Hash Function
    int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            // Horner step with X = 128, reduced mod tableSize at every step
            // (no int overflow for tables up to about 16 million cells)
            hashVal = (hashVal * 128 + key.charAt(i)) % tableSize;
        }
        return hashVal;   // already in 0 .. tableSize - 1
    }
Modulo identities that justify reducing at every step:
(A + B) % C = ((A % C) + (B % C)) % C
(A * B) % C = ((A % C) * (B % C)) % C

14 Example of Hash Function
    int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal = hashVal * 37 + key.charAt(i);   // may overflow and wrap around
        }
        hashVal %= tableSize;
        if (hashVal < 0) {         // overflow can leave a negative remainder
            hashVal += tableSize;
        }
        return hashVal;
    }

15 Example of Hash Function
    int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);   // sum of the character codes
        }
        return hashVal % tableSize;
    }
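The additive function above is easy to write but weak. Anagrams always collide because addition ignores character order. Short keys also produce only small sums (eight 7-bit ASCII characters add up to at most 1016), which leaves most cells of a large table unused.

The comparison sketch below reuses the functions from slides 15 and 14. It assumes a table size of 10007, and the class name `AdditiveVsPolynomial` is hypothetical.

```java
// Compares the additive hash (slide 15) with the polynomial hash in 37 (slide 14).
class AdditiveVsPolynomial {
    // Sum of character codes: order-insensitive, so anagrams collide.
    static int additive(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);
        }
        return hashVal % tableSize;
    }

    // Horner's rule in base 37; overflow may make the value negative, so fix it up.
    static int polynomial(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal = hashVal * 37 + key.charAt(i);
        }
        hashVal %= tableSize;
        if (hashVal < 0) {
            hashVal += tableSize;
        }
        return hashVal;
    }

    public static void main(String[] args) {
        int size = 10007;
        System.out.println(additive("stop", size) + " " + additive("pots", size));      // 454 454
        System.out.println(polynomial("stop", size) + " " + polynomial("pots", size));  // 3932 5428
    }
}
```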

16 Collision Resolution
When two keys map to the same cell, we get a collision. Collisions can occur during insertion, so we need a procedure (collision resolution) to resolve them.

17 Closed Hashing
If a collision occurs, try to find an alternative cell within the table. Closed hashing is also known as open addressing. For insertion, we try cells in sequence using an incremented function:
$h_i(x) = (\mathrm{hash}(x) + f(i)) \bmod \text{H-size}$, with $f(0) = 0$.
The function f is the collision resolution strategy. The table must be bigger than the number of data items. Different methods for choosing f:
- Linear probing
- Quadratic probing
- Double hashing

18 Linear Probing
Use a linear function, f(i) = i.
- Finds the first free cell at or after the key's home position, so the key ends up close to where it hashed.
- The least complex function.
- May result in primary clustering: elements that hash to different locations probe the same alternative cells.
- The cost of this probing depends on the load factor λ. We do not use linear probing if λ > 0.5.
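Below is a minimal sketch of closed hashing with linear probing, $h_i(x) = (\mathrm{hash}(x) + i) \bmod \text{H-size}$. It assumes `String.hashCode()` as the hash function and a table size of 17. For brevity it supports only insert and find, with no deletion or resizing. The class name `LinearProbingTable` is hypothetical.

```java
// Linear probing sketch: insert and find only (no deletion, no resizing).
class LinearProbingTable {
    private final String[] cells;   // null marks an empty cell

    LinearProbingTable(int size) {
        cells = new String[size];
    }

    // Home position of a key: hash(x) in 0 .. size-1.
    private int hash(String key) {
        return Math.floorMod(key.hashCode(), cells.length);
    }

    // Try hash(x), hash(x)+1, hash(x)+2, ... (mod size) until we reach the key
    // itself or an empty cell; -1 means the table is full and the key is absent.
    private int findPos(String key) {
        int home = hash(key);
        for (int i = 0; i < cells.length; i++) {
            int p = (home + i) % cells.length;   // f(i) = i
            if (cells[p] == null || cells[p].equals(key)) {
                return p;
            }
        }
        return -1;
    }

    // Returns false if the key is already present or there is no free cell.
    boolean insert(String key) {
        int p = findPos(key);
        if (p < 0 || cells[p] != null) {
            return false;
        }
        cells[p] = key;
        return true;
    }

    boolean contains(String key) {
        int p = findPos(key);
        return p >= 0 && cells[p] != null;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(17);
        String[] words = {"dawn", "emerald", "crystal", "marigold",
                          "alpha", "flamingo", "hallmark", "moon"};
        for (String w : words) {
            t.insert(w);
        }
        System.out.println(t.contains("marigold"));  // true
        System.out.println(t.contains("cobalt"));    // false
    }
}
```

With 8 keys in 17 cells, λ ≈ 0.47 stays below the 0.5 limit mentioned above.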

19 Hashing - Insert
[Figure: inserting dawn, emerald, crystal, marigold, alpha, flamingo, hallmark, and moon into the hash table]

20 Hashing - Lookup
[Figure: looking up cobalt?, marigold?, and private? in the table holding alpha, crystal, dawn, emerald, flamingo, hallmark, moon, marigold, and private]

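21 Hashing - Delete
[Figure: delete emerald, then delete moon, leaving alpha, crystal, dawn, flamingo, hallmark, marigold, and private]
Lazy deletion - why? Emptying a cell would break every probe sequence that passes through it. A later find would stop at the empty cell and wrongly report that a key stored further along is absent. Instead, the cell is only marked as deleted.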

22 Hashing - Operation After Delete
[Figure: insert custom, then look up marigold? in the table holding alpha, crystal, dawn, flamingo, hallmark, marigold, and private]
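Lazy deletion can be sketched with linear probing plus a per-cell state (empty, occupied, deleted). A find skips over deleted cells instead of stopping there, and an insert may reuse a deleted cell. Assumptions: integer keys, hash(x) = x mod size, no resizing, and the hypothetical class name `LazyDeletionTable`.

```java
// Linear probing with lazy deletion (sketch). Assumptions: int keys,
// hash(x) = x mod size, no resizing.
class LazyDeletionTable {
    private static final int EMPTY = 0, OCCUPIED = 1, DELETED = 2;
    private final int[] keys;
    private final int[] state;   // all cells start EMPTY (0)

    LazyDeletionTable(int size) {
        keys = new int[size];
        state = new int[size];
    }

    private int hash(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Returns the cell holding key, or -1. DELETED cells are skipped;
    // only an EMPTY cell ends the probe sequence.
    int find(int key) {
        int home = hash(key);
        for (int i = 0; i < keys.length; i++) {
            int p = (home + i) % keys.length;
            if (state[p] == EMPTY) return -1;
            if (state[p] == OCCUPIED && keys[p] == key) return p;
        }
        return -1;
    }

    // After checking for duplicates, store the key in the first EMPTY or DELETED cell.
    boolean insert(int key) {
        if (find(key) >= 0) return false;
        int home = hash(key);
        for (int i = 0; i < keys.length; i++) {
            int p = (home + i) % keys.length;
            if (state[p] != OCCUPIED) {
                keys[p] = key;
                state[p] = OCCUPIED;
                return true;
            }
        }
        return false;  // table full
    }

    // Lazy deletion: mark the cell instead of emptying it.
    boolean delete(int key) {
        int p = find(key);
        if (p < 0) return false;
        state[p] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        LazyDeletionTable t = new LazyDeletionTable(10);
        t.insert(12); t.insert(22); t.insert(32);   // all hash to 2: cells 2, 3, 4
        t.delete(22);                               // cell 3 becomes DELETED
        System.out.println(t.find(32));             // 4: the search walks past cell 3
        t.insert(42);                               // reuses the deleted cell 3
        System.out.println(t.find(42));             // 3
    }
}
```

If `delete` set the cell back to EMPTY instead, `find(32)` would stop at cell 3 and return -1, even though 32 is still in the table.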

23 Primary Clustering
[Figure: canary, cobalt, and dark added to the table holding alpha, crystal, dawn, custom, flamingo, hallmark, marigold, and private]
Elements that hash to different locations probe the same alternative cells.

24 Quadratic Probing
Quadratic probing eliminates primary clustering by choosing $f(i) = i^2$.
- Problems arise when the hash table is more than half full.
- You have to select an appropriate table size; with a poor choice, the probe sequence can revisit only a few cells.
- We can prove that if the table size is a prime number and the table is at least half empty, quadratic probing will always find a location for a new element.
- The next probe can be computed incrementally, since $f(i) = i^2 = f(i-1) + 2i - 1$.
- Elements that hash to the same location probe the same alternative cells (secondary clustering).
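The following sketch of quadratic probing uses the incremental step instead of computing i² each time. Assumptions: integer keys, hash(x) = x mod size, a prime table size of 11, and the hypothetical class name `QuadraticProbingTable`. The search is limited to the first ⌊size/2⌋ + 1 probes, which are all distinct when the size is prime.

```java
// Quadratic probing sketch. Assumptions: int keys, hash(x) = x mod size,
// prime table size, no deletion or resizing.
class QuadraticProbingTable {
    private final Integer[] cells;  // null marks an empty cell

    QuadraticProbingTable(int primeSize) {
        cells = new Integer[primeSize];
    }

    // Probes hash(x) + i^2 for i = 0, 1, 2, ...; moving from i^2 to (i + 1)^2
    // adds 2i + 1, so each step just adds the next odd number.
    int findPos(int key) {
        int pos = Math.floorMod(key, cells.length);
        int offset = 1;
        for (int i = 0; i <= cells.length / 2; i++) {
            if (cells[pos] == null || cells[pos] == key) {
                return pos;
            }
            pos = (pos + offset) % cells.length;
            offset += 2;
        }
        return -1;  // no free cell among the first size/2 + 1 probes
    }

    boolean insert(int key) {
        int p = findPos(key);
        if (p < 0 || cells[p] != null) {
            return false;   // no room, or key already present
        }
        cells[p] = key;
        return true;
    }

    public static void main(String[] args) {
        QuadraticProbingTable t = new QuadraticProbingTable(11);
        int[] keys = {0, 11, 22, 33};   // all hash to cell 0
        for (int k : keys) {
            t.insert(k);
        }
        for (int k : keys) {
            System.out.println(k + " -> " + t.findPos(k));  // cells 0, 1, 4, 9
        }
    }
}
```

Keys 0, 11, 22, and 33 all follow the same sequence 0, 1, 4, 9, which is the secondary clustering described above.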

25 Double Hashing
The collision resolution function is another hash function: $f(i) = i \cdot \mathrm{hash}_2(x)$. Each probe adds a further multiple of $\mathrm{hash}_2(x)$. The second hash function must be chosen carefully so that it never evaluates to zero and the probes can reach all the cells. A prime table size is essential.
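A minimal sketch of double hashing, $h_i(x) = (\mathrm{hash}(x) + i \cdot \mathrm{hash}_2(x)) \bmod \text{H-size}$, is shown below. It rests on several assumptions:
- Keys are integers, and hash(x) = x mod size.
- The second function is $\mathrm{hash}_2(x) = R - (x \bmod R)$, with R = 7, a prime smaller than the prime table size 11.
- The class name `DoubleHashingTable` is hypothetical.

That second function is one common choice, not necessarily the one used in this lecture. It returns a value in 1..R, so it is never zero. Because the table size is prime, every such step size eventually reaches all the cells.

```java
// Double hashing sketch. Assumptions: int keys, hash1(x) = x mod size,
// hash2(x) = R - (x mod R) with prime R smaller than the prime table size.
class DoubleHashingTable {
    private static final int R = 7;  // assumed prime smaller than the table size
    private final Integer[] cells;   // null marks an empty cell

    DoubleHashingTable(int primeSize) {
        cells = new Integer[primeSize];
    }

    private int hash1(int key) {
        return Math.floorMod(key, cells.length);
    }

    // Result lies in 1..R, so the step is never zero.
    private int hash2(int key) {
        return R - Math.floorMod(key, R);
    }

    // Probes (hash1(x) + i * hash2(x)) mod size for i = 0, 1, 2, ...
    int findPos(int key) {
        int home = hash1(key);
        int step = hash2(key);
        for (int i = 0; i < cells.length; i++) {
            int p = (home + i * step) % cells.length;
            if (cells[p] == null || cells[p] == key) {
                return p;
            }
        }
        return -1;  // table full and key absent
    }

    boolean insert(int key) {
        int p = findPos(key);
        if (p < 0 || cells[p] != null) {
            return false;   // no room, or key already present
        }
        cells[p] = key;
        return true;
    }

    public static void main(String[] args) {
        DoubleHashingTable t = new DoubleHashingTable(11);
        int[] keys = {0, 11, 22};   // all have home cell 0
        for (int k : keys) {
            t.insert(k);
        }
        for (int k : keys) {
            System.out.println(k + " -> " + t.findPos(k));  // cells 0, 3, 6
        }
    }
}
```

Keys 11 and 22 share home cell 0 but step by 3 and 6 respectively. Unlike quadratic probing, they do not follow the same alternative cells.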

26 Double Hashing
[Figure: canary, cobalt, and further keys added with double hashing to the table holding alpha, crystal, dawn, custom, flamingo, hallmark, marigold, and private]

27 Open Hashing
Collision problems are solved by inserting all elements that hash to the same bucket into a single collection of values. Open hashing keeps a linked list of all the elements that hash to the same cell (separate chaining): each cell in the hash table contains a pointer to a linked list containing the data.
Functions and analysis of open hashing:
- Inserting a new element into the table: add the element at the beginning or the end of the appropriate linked list. The choice depends on whether you want to check for duplicates, and on how frequently you expect to access the most recently added elements.

28 Open Hashing
[Figure]

29 Open Hashing
- Search: use the hash function to determine which linked list holds the element, then traverse that list to find it.
- Deletion: remove the element from the appropriate linked list after finding it.
- Other structures, such as a tree or another hash table, can be used in each cell instead of a list to resolve collisions.
- The main advantage of this method is that it can handle any amount of data (dynamic expansion).
- The main disadvantage of this method is the extra memory used by each cell.

30 Analysis of Open Hashing
In general, the average length of a list is the load factor λ.
- Insertion: the cost depends on the hash function and on where in the list the insertion is done. In general it is the cost of inserting into a linked list plus the time to evaluate the hash function.
- Search: the constant time to evaluate the hash function plus the time to traverse the list. The worst case is O(n); the average case depends on λ.
- General rule for open hashing: make λ ≈ 1.
- Used for data of dynamic size.
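The sketch below shows separate chaining with one `java.util.LinkedList` per cell, covering insert (with a duplicate check, at the front), search, delete, and the load factor as the average list length. Assumptions: `String.hashCode()` serves as the hash function, and the class name `SeparateChainingTable` is hypothetical.

```java
import java.util.LinkedList;

// Separate chaining sketch: each cell holds a linked list of the keys that hash to it.
class SeparateChainingTable {
    private final LinkedList<String>[] buckets;  // one list per cell
    private int size;                            // number of stored keys

    @SuppressWarnings("unchecked")
    SeparateChainingTable(int tableSize) {
        buckets = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // The hash function picks which linked list a key belongs to.
    private LinkedList<String> bucketFor(String key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    // Check for duplicates, then add at the front of the list.
    boolean insert(String key) {
        LinkedList<String> list = bucketFor(key);
        if (list.contains(key)) {
            return false;
        }
        list.addFirst(key);
        size++;
        return true;
    }

    // Search: hash to the right list, then traverse it.
    boolean contains(String key) {
        return bucketFor(key).contains(key);
    }

    // Delete: find the element in its list and unlink it.
    boolean remove(String key) {
        boolean removed = bucketFor(key).remove(key);
        if (removed) {
            size--;
        }
        return removed;
    }

    // Average list length, i.e. the load factor n / tableSize.
    double loadFactor() {
        return (double) size / buckets.length;
    }

    public static void main(String[] args) {
        SeparateChainingTable t = new SeparateChainingTable(7);
        for (String w : new String[] {"alpha", "crystal", "dawn", "moon"}) {
            t.insert(w);
        }
        t.remove("dawn");
        System.out.println(t.contains("dawn"));   // false
        System.out.println(t.contains("moon"));   // true
        System.out.println(t.loadFactor());       // 3 keys in 7 lists
    }
}
```

Because lists grow on demand, the table never fills up. A long list only slows down search, which is why λ is kept around 1.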

31 Issues
Other issues common to all closed hashing resolutions:
- Deletion is confusing: it requires lazy deletion.
- Simpler than open hashing.
- Good if we do not expect too many collisions.
- If a search is unsuccessful, we may have to search the whole table.
- Uses a large table compared with the number of data items expected.

32 Summary
- Hash table: an array.
- Hash function: a function that maps a key to a number in [0, size of hash table).
- Collision resolution:
  - Open hashing: separate chaining
  - Closed hashing (open addressing): linear probing, quadratic probing, double hashing
  - Primary clustering, secondary clustering

33 Summary
Advantage:
- Running time O(1) + O(collision resolution).
Disadvantages:
- Difficult (not efficient) to print all elements in the hash table.
- Inefficient to find the minimum or maximum element.
- Not growable (for closed hashing / open addressing).
- Wastes some space, since part of the table is kept empty (load factor).