CSCD 326 Data Structures I: Hashing

Presentation transcript:

1 CSCD 326 Data Structures I Hashing

2 Hashing Background. Goal: provide a constant-time method of searching for stored data. The best traditional search time complexity available is O(log2 n) for binary search, and binary search requires that the data be stored in sorted order. The hashing approach to data storage and retrieval: contiguous storage is not used, and memory is sacrificed for speed. Hashing is often used for symbol table management in compilers, assemblers, and linker/loaders.

3 Hashing - Basic Ideas. Data storage: hashing relies primarily on arrays for data storage, but not on contiguous storage within the array. Data storage/retrieval method: use a mathematical function which, when given the key or data value to be stored, returns the array index at which to store the value. This is referred to as a "hash function." The same function is used to retrieve the value later on.

4 Simple Example of Hashing. Employee data is to be stored using the employee number as a key. Employee numbers are unique and run from 10,000 to 19,999. Storage: use an array of size 10,000. Hash function: the employee number maps to a unique index in the array (for example, by subtracting 10,000), and that array location is used to store and retrieve the information for this employee. Problem: in other situations, key values are often not unique or do not fall into a range that allows a reasonably sized array.
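The employee example can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the names `BASE`, `index_for`, `store`, and `retrieve` are hypothetical, and subtracting 10,000 to obtain the index is an assumption about how the employee number is turned into an array position.

```python
BASE = 10_000        # smallest employee number (from the slide)
TABLE_SIZE = 10_000  # one slot per possible employee number

table = [None] * TABLE_SIZE

def index_for(emp_number):
    """Map an employee number in 10,000..19,999 to a slot in 0..9,999."""
    if not BASE <= emp_number < BASE + TABLE_SIZE:
        raise ValueError("employee number out of range")
    return emp_number - BASE  # assumed mapping: offset from the range start

def store(emp_number, record):
    """Place the record in the slot chosen by the hash function."""
    table[index_for(emp_number)] = record

def retrieve(emp_number):
    """Use the same function to find the slot again."""
    return table[index_for(emp_number)]

store(12_345, "hypothetical record for employee 12345")
```

Because every key has its own slot, no collisions can occur here; the cost is an array as large as the whole key range.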

5 Goals for Hashing Functions. The same key value (the value used for insertion) should always return the same index. If it does not, the data can't be retrieved later. As much as possible, different key values should not hash to the same index. This is done by mixing things up with the hash function so that common patterns in key values do not hash to the same locations. Collisions can never be prevented entirely, however, so collision handling becomes an issue.

6 Hash Function Construction Methods. Using numeric ASCII values of characters. Example key: JUNK. Add the ASCII values of the characters (74 + 85 + 78 + 75) to produce a single integer (312). This may suffice, but the integer produced is not unique to "JUNK".
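A short Python sketch of the ASCII-sum method, showing why the result is not unique to one key. The function name `ascii_sum` is hypothetical, and the rearranged key "KNUJ" is only an illustration.

```python
def ascii_sum(key):
    """Add the character codes of the key to produce a single integer."""
    return sum(ord(ch) for ch in key)

junk = ascii_sum("JUNK")   # 74 + 85 + 78 + 75
knuj = ascii_sum("KNUJ")   # same letters, different order, same sum
```

Any rearrangement of the same letters produces the same sum, so keys such as "JUNK" and "KNUJ" collide under this method.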

7 Hash Function Construction Methods (2). Concatenation of ASCII values: represent A - Z as integers 0-25 and concatenate these values. So JUNK becomes J = 9, U = 20, N = 13, K = 10, and the concatenation can be expressed as a weighted sum of these values: 9 * [...] + [...] = [...]
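One way to realize the concatenation idea is sketched below in Python. The figures on the slide did not survive, so the choice of 5 bits per letter (equivalent to treating the letter values as base-32 digits) is an assumption, as is the function name `concat_hash`.

```python
def concat_hash(key, bits_per_letter=5):
    """Concatenate letter codes (A=0 .. Z=25) into one integer.

    bits_per_letter=5 is an assumed width; 5 bits hold values 0..31,
    which is enough for the 26 letter codes.
    """
    value = 0
    for ch in key:
        code = ord(ch) - ord("A")
        if not 0 <= code <= 25:
            raise ValueError("key must contain only the letters A-Z")
        # Shift the digits so far to the left and append the new code.
        value = (value << bits_per_letter) | code
    return value

junk_value = concat_hash("JUNK")  # 9*32**3 + 20*32**2 + 13*32 + 10
```

Unlike the plain sum, the position of each letter now matters, so rearranged keys no longer produce the same value. The price is that the integer grows quickly with key length.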

8 Hash Function Construction Methods (3). Using the mod operator: allows reduction of large values into the range of actual hash table indices. In the example above, if the table is an array of size [...], the index is [...] % [...] = [...]. Note that here the mod operator simply removes the first two digits, which makes the hashed value less unique to the string used to generate it.

9 Hash Function Construction Methods (4). Using the mod operator: problems. The choice of exact table size is very important. If the table size shares many common factors with the key values, many collisions can be generated. For example, with table size 15 and key values 10, 20, 30, 40, 50, 60, 70, the 7 values hash to only three indices: 30 and 60 to 0; 20 and 50 to 5; and 10, 40, and 70 to 10. Solution: use an array size which is prime, so that it shares no common factors with key values (other than keys that are multiples of the prime itself).
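The effect of the table size can be checked with a few lines of Python. The key values and the table size of 15 come from the slide; the prime size 17 and the function name `bucket_indices` are choices made only for this sketch.

```python
def bucket_indices(keys, table_size):
    """Return the index each key hashes to under key % table_size."""
    return [key % table_size for key in keys]

keys = [10, 20, 30, 40, 50, 60, 70]

# Table size 15 shares the factor 5 with every key.
composite = bucket_indices(keys, 15)

# A prime table size (17 is an arbitrary choice) shares no factor with them.
prime = bucket_indices(keys, 17)

distinct_composite = len(set(composite))
distinct_prime = len(set(prime))
```

With size 15 the seven keys land on only three distinct indices, while with size 17 each key gets its own index.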

10 Hash Function Construction Methods (5). Using pseudo-random number generators: given the same starting seed, a pseudo-random number generator always produces the same sequence of values. Here, use a number generated from the key string as the seed, and use the first value of the resulting pseudo-random sequence to generate the hash table index.
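A minimal Python sketch of the seeding idea, using the standard library's `random.Random`. Deriving the seed from the sum of character codes is an assumption made for illustration, and `seed_from_key` and `prng_hash` are hypothetical names.

```python
import random

def seed_from_key(key):
    """Hypothetical seed: the sum of the key's character codes."""
    return sum(ord(ch) for ch in key)

def prng_hash(key, table_size):
    """Seed a generator from the key and use its first value as the index."""
    rng = random.Random(seed_from_key(key))  # same seed, same sequence
    return rng.randrange(table_size)         # first value, within 0..table_size-1

first = prng_hash("JUNK", 101)
second = prng_hash("JUNK", 101)  # repeats the first result
```

Because the generator is reseeded on every call, the same key always yields the same index, which is the property a hash function needs.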

11 Hash Function Construction Methods (6). Folding: scrambles numeric values to remove the effects of recurring patterns, e.g. by adding the numeric values. Boundary folding: breaks numbers into segments and adds digits in the segments, e.g. a social security number [...] breaks at the dashes, giving hash value [...]. Fan folding: like boundary folding, but reverses the digits in every other segment.
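The two folding variants might look like this in Python. The sample number 123-45-6789 is made up for the sketch, the function names are hypothetical, and reading "adds digits in the segments" as adding the segment values is an assumption.

```python
def boundary_fold(ssn):
    """Split at the dashes and add the segments as integers."""
    return sum(int(part) for part in ssn.split("-"))

def fan_fold(ssn):
    """Like boundary_fold, but reverse the digits of every other segment."""
    total = 0
    for position, part in enumerate(ssn.split("-")):
        if position % 2 == 1:      # second, fourth, ... segment
            part = part[::-1]      # reverse its digits
        total += int(part)
    return total

sample = "123-45-6789"                 # made-up number
boundary_value = boundary_fold(sample) # 123 + 45 + 6789
fan_value = fan_fold(sample)           # 123 + 54 + 6789
```

In practice the folded value would still be reduced with the mod operator to fit the table size.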

12 Hash Function Construction Methods (7). Digit or character extraction: another way to scramble similar patterns in multiple keys. It can be used in two ways: 1) Simply remove characters likely to be similar in many keys (or use only dissimilar characters). 2) Mid-square technique: represent the key as a number, square the number, and extract from the middle of the squared value enough bits to form an array index.
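The mid-square technique can be sketched in Python as below. The 16-bit key width, the 8-bit index width, and the name `mid_square` are assumptions chosen for the sketch, not values from the slide.

```python
def mid_square(key, index_bits=8, key_bits=16):
    """Square the key and take index_bits bits from the middle of the square.

    Assumes 0 <= key < 2**key_bits, so the square fits in 2*key_bits bits.
    """
    if not 0 <= key < (1 << key_bits):
        raise ValueError("key does not fit in key_bits bits")
    squared = key * key
    # Drop the low-order bits so the wanted bits sit at the bottom.
    shift = (2 * key_bits - index_bits) // 2
    mask = (1 << index_bits) - 1
    return (squared >> shift) & mask

index = mid_square(1234)  # an index in the range 0..255
```

The middle bits of the square depend on every digit of the key, which is why they scramble keys that share leading or trailing patterns.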

13 Linked Collision Processing. The linked method of collision overflow handling divides memory into two parts: one part for primary storage (the hash table itself), and a separate secondary part for collision overflow (which may be either dynamically allocated or a separate fixed allocation area).

14 Linked Collision Processing (2). Linked collision overflow handling: assume the hash table is an array of objects, each containing an instance variable that is a reference to an object of the same type. On collision: dynamically allocate a new node and place the data into it, then link the new node through the reference. Overflowed items are thus stored in a linked list off the original table item.
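A minimal Python sketch of linked overflow handling follows. The class names `Node` and `ChainedTable`, the table size of 7, and the use of `key % size` as the hash function are all assumptions for illustration.

```python
class Node:
    """A table entry that can reference another entry of the same type."""
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None  # reference used to chain overflowed items

class ChainedTable:
    def __init__(self, size=7):
        self.slots = [None] * size  # primary storage

    def _index(self, key):
        return key % len(self.slots)  # assumed hash function

    def insert(self, key, value):
        i = self._index(key)
        node = self.slots[i]
        if node is None:               # no collision: use the primary slot
            self.slots[i] = Node(key, value)
            return
        while True:                    # collision: walk the chain
            if node.key == key:        # existing key: overwrite its value
                node.value = value
                return
            if node.next is None:
                break
            node = node.next
        node.next = Node(key, value)   # link a newly allocated overflow node

    def find(self, key):
        node = self.slots[self._index(key)]
        while node is not None:        # sequential search through the chain
            if node.key == key:
                return node.value
            node = node.next
        return None

table = ChainedTable()
for k in (10, 17, 24):                 # all three hash to index 3
    table.insert(k, f"record {k}")
```

The loop in `find` is the sequential search through the linked list that makes lookups slower once an index has many overflowed items.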

15 Linked Collision Processing (3). [Figure: primary memory (hash table) and secondary memory (overflow)]

16 Linked Collision Processing (4). Search time with linked overflow: if there have been many collisions, the search is no longer constant time, since a sequential search must be done through the linked list. Thus the time complexity becomes O(n), where n is the number of collided items in the list at that index.

17 Linear Collision Processing. Also called linear probing. There is no separate primary and secondary memory; the original array holds both. When a collision occurs: start at the hashed location (the site of the first collision), then proceed sequentially through the array until available storage is found, and store at this location. The array must be treated circularly, since a probe could reach the end and need to start again at the beginning.

18 Linear Collision Processing. Problem with linear probing: clustering. If the hash function produces one value more often than others, parts of the table will quickly fill up while others remain empty. Clustering causes further collisions later.
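The probing steps above can be sketched in Python as follows. The class name `LinearProbingTable`, the table size of 7, and the hash function `key % size` are assumptions; deletion is left out to keep the sketch short.

```python
class LinearProbingTable:
    def __init__(self, size=7):
        self.slots = [None] * size  # one array holds all items

    def insert(self, key, value):
        """Store the item and return the index where it ended up."""
        size = len(self.slots)
        start = key % size                    # assumed hash function
        for step in range(size):
            i = (start + step) % size         # wrap around: circular array
            entry = self.slots[i]
            if entry is None or entry[0] == key:
                self.slots[i] = (key, value)
                return i
        raise RuntimeError("hash table is full")

    def find(self, key):
        size = len(self.slots)
        start = key % size
        for step in range(size):
            entry = self.slots[(start + step) % size]
            if entry is None:                 # empty slot: key is not stored
                return None
            if entry[0] == key:
                return entry[1]
        return None

table = LinearProbingTable()
placed = [table.insert(k, f"record {k}") for k in (6, 13, 20)]
```

All three keys hash to index 6, so the second and third items spill into the following slots, wrapping from the end of the array back to index 0.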

19 Analysis of Linear Probing. Performance depends on the loading density D of the hash table: D = (number of records in hash table) / (size of hash table array), so D = 1 indicates maximum density. The average number of probes is proportional to 1/2 * (1 + 1/(1 - D)) for a successful search and 1/2 * (1 + 1/(1 - D)^2) for an unsuccessful search. Sample values: for D = [...], [...] and 1.18; for D = [...], [...] and 2.50; for D = [...], [...] and [...]; for D = [...], [...] and [...]. This is why linear probing is referred to as a density-dependent search technique.
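The two probe-count expressions can be evaluated directly, as in the Python sketch below. The densities 0.1, 0.5, 0.75, and 0.9 are chosen here only for illustration and are not taken from the slide; the function names are hypothetical.

```python
def successful_probes(d):
    """Average probes for a successful search at loading density d (0 <= d < 1)."""
    return 0.5 * (1 + 1 / (1 - d))

def unsuccessful_probes(d):
    """Average probes for an unsuccessful search at loading density d (0 <= d < 1)."""
    return 0.5 * (1 + 1 / (1 - d) ** 2)

# Illustrative densities (an assumption, not the slide's values).
for d in (0.1, 0.5, 0.75, 0.9):
    print(f"D = {d:.2f}: successful {successful_probes(d):.2f}, "
          f"unsuccessful {unsuccessful_probes(d):.2f}")
```

Both expressions grow without bound as D approaches 1, and the unsuccessful case grows much faster because of the squared term.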

20 Rehashing. An alternative to linear probing that avoids clustering. After a collision occurs, apply a different hash function to get a new location altogether. If the new location is also taken, either resort to linear probing from there or apply a third or fourth hash function. Eventually some probing method must be used.

21 Quadratic Probing. Another alternative to linear probing. If a collision occurs at initial index k, try to store in index k + 1; for all successive collisions, try to store in index k + r^2, where r is a count of how many collisions have occurred (k + 1, k + 4, k + 9, etc.). Variation on rehashing (double hashing): use the second hash function to determine a fixed increment with which to move through the array.
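The probe sequences for quadratic probing and double hashing can be compared with a short Python sketch. The table size of 11, the wrap-around with the mod operator, and the second hash function `1 + key % (table_size - 1)` are assumptions made for this example; the function names are hypothetical.

```python
def quadratic_probes(k, table_size, attempts):
    """Indices tried after hashing to k: k, k + 1, k + 4, k + 9, ... (wrapped)."""
    return [(k + r * r) % table_size for r in range(attempts)]

def double_hash_probes(key, table_size, attempts):
    """Indices tried when a second hash function fixes the step size."""
    start = key % table_size
    # Hypothetical second hash; adding 1 guarantees the step is never zero.
    step = 1 + key % (table_size - 1)
    return [(start + r * step) % table_size for r in range(attempts)]

quad = quadratic_probes(3, 11, 4)       # starts at index 3
double = double_hash_probes(25, 11, 4)  # 25 % 11 == 3, step is 6
```

Both sequences start at index 3, but they spread out differently. Under double hashing, two keys that collide at the same initial index usually get different step sizes and therefore follow different paths through the array.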