Hash Tables

What is the strength of a B-tree? Can we make an array support searching and insertion as fast as a B-tree or a linked list?

Introduction to Hash Tables
A hash table is a data structure that offers very fast insertion and searching, close to O(1), and it is relatively easy to program compared to trees. Because it is based on arrays, it is difficult to expand, and there is no convenient way to visit the items in any particular order.

Hashing
A range of key values can be transformed into a range of array index values. In the simplest case, each record occupies one cell of an array, and the cell's index number is the record's key. But keys are rarely arranged that conveniently; in such situations a hash table is used.

Converting Words to Numbers
Adding the digits: add the code numbers of the characters. E.g. for "cats": c = 3, a = 1, t = 20, s = 19, giving 43. The trouble is that the total range of word codes is only 1 to 260 (for words of up to ten letters), while some 50,000 words exist, so there are not enough index numbers to go around.
Multiplying by powers: decompose a word into its letters, convert the letters to their numerical equivalents, multiply them by successive powers of 27, and add the results. E.g. "Leangsuksun" yields a number far larger than 260; every word gets a unique code, but the range is now far too large to use directly as an array index.
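
The sketch below (in Java, since the slides follow Lafore's Java-based treatment) shows the two schemes just described. The letter coding a = 1 ... z = 26 comes from the slide; the class and method names are illustrative choices, not from the original.

    public class WordCodes {

        // Scheme 1: add the letter codes. "cats" -> 3 + 1 + 20 + 19 = 43.
        static int addDigits(String word) {
            int sum = 0;
            for (char c : word.toCharArray()) {
                sum += c - 'a' + 1;
            }
            return sum;
        }

        // Scheme 2: treat the word as a base-27 number, so every word maps
        // to a distinct (but potentially enormous) value.
        static long multiplyByPowers(String word) {
            long value = 0;
            for (char c : word.toCharArray()) {
                value = value * 27 + (c - 'a' + 1);
            }
            return value;
        }

        public static void main(String[] args) {
            System.out.println(addDigits("cats"));          // 43
            System.out.println(multiplyByPowers("cats"));   // 60337, far beyond 260
        }
    }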

Hash Function
We need to compress the huge range of numbers: arrayIndex = hugeNumber % smallRange; This is a hash function: it hashes a number in a large range into a number in a smaller range, corresponding to the index numbers of an array. An array into which data is inserted using a hash function is called a hash table.
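
A minimal sketch of this modulo compression, using the base-27 code computed for "cats" in the previous sketch. The table size 1009 is an arbitrary illustrative (prime) choice, not from the slides.

    public class HashDemo {
        // Compress a huge word code into a valid array index.
        static int hashIndex(long hugeNumber, int arraySize) {
            return (int) (hugeNumber % arraySize);
        }

        public static void main(String[] args) {
            System.out.println(hashIndex(60337L, 1009));  // prints 806, an index in 0..1008
        }
    }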

Collisions
Two words can hash to the same array index, resulting in a collision. Open addressing: when a collision occurs, search the array in some systematic way for an empty cell and insert the new item there. Separate chaining: keep a linked list at each array cell, so that colliding items can simply be added to the list.

Open Addressing
There are three methods for finding the next vacant cell. Linear probing: search sequentially for a vacant cell, incrementing the index until an empty cell is found. Clustering is the problem with linear probing: as the array fills, clusters grow larger, resulting in very long probe lengths. The array can be expanded if it becomes too full.
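
A minimal linear-probing table along the lines of this slide, assuming int keys, no removal or resizing, and a table that never becomes completely full. The class name and details are illustrative, not from the slides.

    public class LinearProbeTable {
        private final Integer[] cells;

        public LinearProbeTable(int size) {
            cells = new Integer[size];
        }

        private int hash(int key) {
            return key % cells.length;
        }

        // Step forward one cell at a time until an empty cell is found.
        public void insert(int key) {
            int index = hash(key);
            while (cells[index] != null) {
                index = (index + 1) % cells.length;   // wrap around at the end
            }
            cells[index] = key;
        }

        // Searching follows the same probe sequence; an empty cell means "not found".
        public boolean find(int key) {
            int index = hash(key);
            while (cells[index] != null) {
                if (cells[index] == key) return true;
                index = (index + 1) % cells.length;
            }
            return false;
        }
    }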

Quadratic Probing
loadFactor = nItems / arraySize; If the load factor is high, clusters can form. In quadratic probing, more widely separated cells are probed: the step is the square of the step number. If the original index is x, the probes go to x+1, x+4, x+9, x+16, and so on. This eliminates primary clustering, but all keys that hash to a particular cell follow the same sequence when trying to find a vacant cell (secondary clustering).
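
A small sketch of the probe sequence just described; the starting index 3 and (prime) table size 23 are illustrative values, not from the slides.

    public class QuadraticProbeDemo {
        public static void main(String[] args) {
            int startIndex = 3, tableSize = 23;
            // The offset from the original hash index is the square of the step number.
            for (int step = 1; step <= 5; step++) {
                int probe = (startIndex + step * step) % tableSize;
                System.out.println("step " + step + " -> cell " + probe);  // 4, 7, 12, 19, 5
            }
        }
    }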

Double Hashing
A better solution is to generate probe sequences that depend on the key instead of being the same for every key: hash the key a second time, using a different hash function, and use the result as the step size. The step size remains constant throughout a probe, but it is different for different keys. The secondary hash function must not be the same as the primary hash function, and it must never output zero. A common form is stepSize = constant - (key % constant); with constant a prime smaller than the array size. Double hashing requires that the size of the hash table be a prime number.
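
A sketch of this scheme, assuming the constant 5 for the secondary hash and a prime table size of 23; both values and the class name are illustrative, not from the slides.

    public class DoubleHashDemo {
        // Secondary hash: always returns a step in 1..5, never zero.
        static int stepSize(int key) {
            return 5 - (key % 5);
        }

        public static void main(String[] args) {
            int key = 42, tableSize = 23;
            int index = key % tableSize;  // primary hash
            int step = stepSize(key);     // key-dependent step size
            for (int probe = 0; probe < 4; probe++) {
                System.out.println("probe cell " + index);  // 19, 22, 2, 5
                index = (index + step) % tableSize;
            }
        }
    }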

Separate Chaining
There is no need to search for empty cells, and the load factor can be 1 or greater. If the lists hold many items, access time increases. Deletion poses no problems, and the table size need not be a prime number. Arrays (buckets) can be used at each location in the hash table instead of linked lists.
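
A minimal separately chained table in the spirit of this slide, assuming int keys and omitting removal; names are illustrative, not from the slides.

    import java.util.LinkedList;

    public class ChainedHashTable {
        private final LinkedList<Integer>[] buckets;

        @SuppressWarnings("unchecked")
        public ChainedHashTable(int size) {
            buckets = new LinkedList[size];
            for (int i = 0; i < size; i++) {
                buckets[i] = new LinkedList<>();
            }
        }

        private int hash(int key) {
            return key % buckets.length;
        }

        public void insert(int key) {
            buckets[hash(key)].add(key);   // a collision simply extends the list
        }

        public boolean find(int key) {
            return buckets[hash(key)].contains(key);
        }
    }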

Hash Functions
A good hash function is simple and can be computed quickly; overall speed degrades if the hash function is slow. Its purpose is to transform a range of key values into index values such that the keys are distributed as randomly as possible across all the indices of the hash table. Keys may be completely random or not so random.

Random Keys
In a perfect world, a perfect hash function would map every key into a different table location, and keys would be evenly distributed; in practice they are not. In most cases a large number of keys must be compressed into a smaller range of index numbers, and the distribution of key values in a particular database determines what the hash function needs to be. For random keys, index = key % arraySize; is sufficient.

Non-random Keys
Consider a part number of the form 033-400-03-94-05-0-535, where every digit serves a purpose. The last three digits are redundant error-checking digits and should not be included in the hash. Every part of the remaining key should contribute to the index, and a prime number should be used as the modulo base.
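
A hedged sketch of this idea: drop the trailing error-checking field of the part number and hash the remaining digits with a prime table size. The field layout, the prime 997, and all names here are assumptions for illustration only.

    public class PartNumberHash {
        static int hash(String partNumber, int primeTableSize) {
            String[] fields = partNumber.split("-");
            StringBuilder digits = new StringBuilder();
            // Keep every field except the last (the redundant check digits).
            for (int i = 0; i < fields.length - 1; i++) {
                digits.append(fields[i]);
            }
            return (int) (Long.parseLong(digits.toString()) % primeTableSize);
        }

        public static void main(String[] args) {
            System.out.println(hash("033-400-03-94-05-0-535", 997));
        }
    }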

Folding
Break the key into groups of digits and add the groups. The number of digits in a group should correspond to the size of the array.
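
A minimal sketch of folding as just described, assuming 4-digit groups for a table of about 10,000 cells; the group size, table size, and names are illustrative choices.

    public class FoldingHash {
        static int fold(long key, int groupDigits, int tableSize) {
            long groupValue = (long) Math.pow(10, groupDigits);  // e.g. 10000 for 4-digit groups
            long sum = 0;
            while (key > 0) {
                sum += key % groupValue;   // take the low-order group
                key /= groupValue;         // move on to the next group
            }
            return (int) (sum % tableSize);
        }

        public static void main(String[] args) {
            // A 10-digit key folded into 4-digit groups: 21 + 4135 + 5551 = 9707.
            System.out.println(fold(2141355551L, 4, 10_000));
        }
    }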

Hashing Efficiency
Insertion and searching can approach O(1) time. If collisions occur, access time depends on the resulting probe lengths: each individual insert or search takes time proportional to the probe length, plus a constant time for the hash function itself. For linear probing, the relationship between the average probe length P and the load factor L is P = (1 + 1/(1 - L)) / 2 for a successful search and P = (1 + 1/(1 - L)^2) / 2 for an unsuccessful search.
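
A quick numeric check of these two formulas at a load factor of 0.5 (the value 0.5 is an illustrative choice): a successful search averages about 1.5 probes and an unsuccessful search about 2.5.

    public class LinearProbeCost {
        public static void main(String[] args) {
            double L = 0.5;
            double successful = (1 + 1 / (1 - L)) / 2;                 // 1.5
            double unsuccessful = (1 + 1 / ((1 - L) * (1 - L))) / 2;   // 2.5
            System.out.println(successful + " " + unsuccessful);
        }
    }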

Hashing Efficiency
Quadratic probing and double hashing share the same performance equations: for a successful search, -log2(1 - loadFactor) / loadFactor, and for an unsuccessful search, 1 / (1 - loadFactor). For separate chaining, a successful search takes 1 + loadFactor/2 steps on average and an unsuccessful search 1 + loadFactor; insertion takes 1 + loadFactor/2 for ordered lists and 1 for unordered lists.
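
A quick numeric check of these formulas at a load factor of 0.5 (again an illustrative value), using the slide's log-base-2 form for open addressing:

    public class HashCostDemo {
        public static void main(String[] args) {
            double L = 0.5;
            double openSuccessful = -(Math.log(1 - L) / Math.log(2)) / L;  // 2.0
            double openUnsuccessful = 1 / (1 - L);                         // 2.0
            double chainSuccessful = 1 + L / 2;                            // 1.25
            double chainUnsuccessful = 1 + L;                              // 1.5
            System.out.println(openSuccessful + " " + openUnsuccessful + " "
                    + chainSuccessful + " " + chainUnsuccessful);
        }
    }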

Open Addressing vs. Separate Chaining
If open addressing is to be used, double hashing is preferred over quadratic probing. If plenty of memory is available and the data set won't expand, linear probing is simpler to implement. If the number of items to be inserted into the hash table isn't known in advance, separate chaining is preferable to open addressing. When in doubt, use separate chaining.

External Storage
A hash table can be stored in main memory; if it is too large, it can be stored externally on disk, with only part of it read into main memory at a time. In external hashing it is important that blocks do not become full, yet even with a good hash function a block may fill up. This situation can be handled with variations of the collision-resolution schemes.