Hashing – Part I CS 367 – Introduction to Data Structures.

Searching
Up to now, the only way to find a key has been to search through all or part of the data:
– linked list: O(n)
– AVL tree: O(log n)
– binary search of a sorted array: O(log n)
If there is a lot of data and/or the data is searched very often, these times can be long.
– given the key, we would like to get to the data directly

Hashing
The solution to this problem is to put the key through a function that says exactly where the data is (or where it should be placed).
– this function is called a hash function: h(key) = integer
– the integer obtained from a hash function can be used as an index into an array
If the hash function is perfect (it always generates a unique integer for different keys), the time to place and access data is O(1).
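To make the O(1) claim concrete, here is a minimal sketch of an array whose slot is chosen by a hash function. It assumes the supplied function is perfect for the keys used, so it does no collision handling. The class name `DirectHashTable` and its methods are hypothetical, for illustration only.

```java
import java.util.function.ToIntFunction;

// Sketch: keys are placed at the index the hash function returns.
// Assumes the hash is perfect (distinct keys -> distinct indices in 0..capacity-1),
// so no collision handling is done.
class DirectHashTable<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private final ToIntFunction<K> hash;   // hypothetical perfect hash function

    DirectHashTable(int capacity, ToIntFunction<K> hash) {
        this.keys = new Object[capacity];
        this.values = new Object[capacity];
        this.hash = hash;
    }

    // O(1): one hash computation and one array write
    void put(K key, V value) {
        int i = hash.applyAsInt(key);
        keys[i] = key;
        values[i] = value;
    }

    // O(1): one hash computation and one array read; null if the key is not stored
    @SuppressWarnings("unchecked")
    V get(K key) {
        int i = hash.applyAsInt(key);
        return key.equals(keys[i]) ? (V) values[i] : null;
    }

    public static void main(String[] args) {
        // Keys 0..9 with h(key) = key is trivially perfect for a table of size 10
        DirectHashTable<Integer, String> t = new DirectHashTable<>(10, k -> k);
        t.put(3, "three");
        System.out.println(t.get(3)); // prints "three"
        System.out.println(t.get(4)); // prints "null"
    }
}
```

Neither operation depends on how many items are stored. That independence is the whole point; the hard part, as the next slides show, is finding a good hash function.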

Hashing
[Figure: key "AMX" → Hashing Function → location in the array]

Hashing Functions
So what is the hashing function?
– the simplest hashing function is to use the division remainder:
  assume the array is 1000 elements in size
  translate the data into a number, n
  h(n) = n % 1000
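A minimal sketch of the division-remainder function in Java follows. It uses `Math.floorMod` instead of the slide's plain `%`. That is a deliberate choice beyond the slide: Java's `%` keeps the sign of the dividend, so a negative n would give a negative (invalid) index.

```java
// Division-remainder hashing: h(n) = n mod tableSize.
// Math.floorMod always returns a value in 0..tableSize-1, even for negative n.
class DivisionHash {
    static int h(int n, int tableSize) {
        return Math.floorMod(n, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(h(1009, 1000)); // 9
        System.out.println(h(-7, 1000));   // 993 (plain -7 % 1000 would be -7)
    }
}
```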

Hashing Functions
Simple example:
– consider a small school
– each student is tracked by a 4-digit ID number
– each student's ID# begins with the year they started: 2000 -> 0, 2001 -> 1, 2002 -> 2, etc.
– all student records are stored in an array, with a maximum of 1000 students per year
– let's look at the records for all sophomores; assume they were freshmen in 2001

Hashing Functions
[Figure: array holding Mary's records, Pete's records, John's records, Amy's records, …]
Mary's ID #: 1000; Pete's ID #: 1004; John's ID #: 1009; Amy's ID #: 1011
To find John's record in the array: 1009 % 1000 = 9. Go to index number 9.
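The student example can be sketched as follows. Each record is stored at index ID % 1000 and found again the same way. As a simplifying assumption, a name string stands in for a full student record, and the method names `store` and `find` are hypothetical.

```java
// Sketch of the school example: the record for ID n lives at index n % 1000.
// A name string stands in for a full student record (assumption for illustration).
class StudentTable {
    static final int TABLE_SIZE = 1000;                     // at most 1000 students per year
    static final String[] records = new String[TABLE_SIZE];

    static int h(int id) { return id % TABLE_SIZE; }        // IDs are positive, so % is safe

    static void store(int id, String record) { records[h(id)] = record; }

    static String find(int id) { return records[h(id)]; }

    public static void main(String[] args) {
        store(1000, "Mary");
        store(1004, "Pete");
        store(1009, "John");
        store(1011, "Amy");
        System.out.println(h(1009) + " -> " + find(1009)); // prints "9 -> John"
    }
}
```

Because every sophomore's ID is 1000 + k with k < 1000, no two of them share an index. For this data, the hash is perfect.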

Generating n
The previous example is rather simplistic in that it hashes integers that are already unique.
– this seems kind of pointless
– maybe not if the integers are large: consider the UW's 10-digit ID numbers
Often it is desirable to hash some other kind of data
– a person's name, for example

Generating n
How is a string converted into an integer?
– the simplest method is to add together the ASCII values of all the characters
– example: convert amy into an integer
– a = 97; m = 109; y = 121
– a + m + y = 327
– there are lots of other ways to convert strings to integers; what are a few of them?
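Below is a sketch of the ASCII-sum method next to one common alternative. The alternative is a positional (polynomial) method that evaluates the characters with Horner's rule in base 31. It is not from the slides; it is the formula documented for `java.lang.String.hashCode()`. The comparison shows one weakness of plain addition: anagrams such as "amy" and "may" always get the same number.

```java
class StringToNumber {
    // The slide's method: add up the character codes.
    static int asciiSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);          // a char widens to its numeric code
        }
        return sum;
    }

    // An alternative: weight each character by its position (Horner's rule, base 31).
    static int polynomial(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);    // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(asciiSum("amy"));   // 327
        System.out.println(asciiSum("may"));   // 327: same letters, same sum
        System.out.println(polynomial("amy")); // 96717
        System.out.println(polynomial("may")); // 107877
    }
}
```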

Hashing Functions
There are millions of possible hashing functions.
– we will not be considering them all
– basically, anything you can think of to generate an integer could be used as a hashing function
Mathematicians have spent a lot of time and effort coming up with some basic methods that work pretty well.

Division
We have already seen the division method.
– it involves taking the remainder of division: h(key) = key % tableSize
A few notes about making this work better:
– the table size should be a prime number
– it is usually a good method if very little is known about the keys
– the remaining methods will all use division as the final step in their calculation
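One way to see why a prime table size helps is to hash keys that share a common factor. The sketch below assumes a hypothetical set of keys that are all multiples of 10 (0, 10, ..., 9990). It counts how many distinct slots those keys reach with a table of size 1000 versus the prime 997.

```java
import java.util.HashSet;
import java.util.Set;

// Counts how many different slots the keys 0, 10, 20, ..., 9990 land in
// under h(key) = key % tableSize (hypothetical keys sharing a factor of 10).
class DivisionTableSize {
    static int slotsUsed(int tableSize) {
        Set<Integer> slots = new HashSet<>();
        for (int key = 0; key < 10_000; key += 10) {
            slots.add(key % tableSize);
        }
        return slots.size();
    }

    public static void main(String[] args) {
        System.out.println(slotsUsed(1000)); // 100: only multiples of 10 are ever hit
        System.out.println(slotsUsed(997));  // 997: every slot of the prime-sized table is hit
    }
}
```

With size 1000, the factor of 10 shared by the keys and the table leaves 90% of the slots unused. A prime size shares no factor with the keys, so they spread over the whole table.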

Folding
Separate the key into several equally sized parts and then recombine them, usually with addition.
Two kinds of folding:
– shift folding: just add the various parts together as they are
– boundary folding: reverse the order of every other part and add them together

Folding
Consider an SSN as a key.
– break it into 3 parts: first 3 digits, second 3, last 3
Shift folding example:
– SSN = 123-45-6789
– first = 123; second = 456; third = 789
– h(key) = (first + second + third) % size
– h(SSN) = (123 + 456 + 789) % tableSize = 1368 % tableSize
Boundary folding example:
– h(key) = (first + R(second) + third) % size, where R reverses the digits
– h(SSN) = (123 + 654 + 789) % tableSize = 1566 % tableSize
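Both folding variants for a 9-digit key are sketched below. The assumption is that the key is already an int with exactly three 3-digit parts; the helper names are hypothetical. The digit reversal keeps the part at 3 digits, so a part like 450 reverses to 054 (54).

```java
class Folding {
    // Split a 9-digit key into three 3-digit parts: 123456789 -> 123, 456, 789
    static int[] parts(int key) {
        return new int[] { key / 1_000_000, (key / 1000) % 1000, key % 1000 };
    }

    // Reverse the digits of a 3-digit part: 456 -> 654, 450 -> 54
    static int reverse3(int p) {
        return (p % 10) * 100 + ((p / 10) % 10) * 10 + p / 100;
    }

    // Shift folding: add the parts as they are
    static int shiftFolding(int key, int tableSize) {
        int[] p = parts(key);
        return (p[0] + p[1] + p[2]) % tableSize;
    }

    // Boundary folding: reverse every other part (here, the middle one) before adding
    static int boundaryFolding(int key, int tableSize) {
        int[] p = parts(key);
        return (p[0] + reverse3(p[1]) + p[2]) % tableSize;
    }

    public static void main(String[] args) {
        int ssn = 123456789;
        System.out.println(shiftFolding(ssn, 10_000));    // 1368
        System.out.println(boundaryFolding(ssn, 10_000)); // 1566
    }
}
```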

Increasing Performance
Consider using shifting and exclusive OR'ing to generate the index.
– exclusive OR parts of the key together to generate the index
Example:
– consider the string abcdefgh
– if each part is a letter, just exclusive OR them: 'a' ^ 'b' ^ 'c' ^ 'd' ^ 'e' ^ 'f' ^ 'g' ^ 'h'
– often, a character is represented by 8 bits; what's the problem with this?
– it might be better to exclusive OR chunks of the string: "abcd" ^ "efgh"; why were four characters chosen in this case?
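The character-by-character XOR is sketched below, and it points at an answer to the slide's question. XOR never sets a bit that none of its inputs had. So XOR'ing 8-bit characters always gives a value from 0 to 255, and at most 256 slots of the table can ever be reached, however large the table is. The test strings are arbitrary examples.

```java
class XorChars {
    // XOR the characters one at a time, as on the slide.
    static int xorEachChar(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            result ^= s.charAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(xorEachChar("abcdefgh")); // 8

        // With 8-bit characters the result never exceeds 255.
        int max = 0;
        for (String s : new String[] { "hash", "table", "zzzz", "Mary", "Pete" }) {
            max = Math.max(max, xorEachChar(s));
        }
        System.out.println(max <= 255); // true
    }
}
```

XOR'ing 4-character chunks instead fills all 32 bits of a Java `int`, which is one reason to pick chunks of four.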

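Increasing Performance
    import java.nio.charset.StandardCharsets;

    int shiftFold(String key, int tableSize) {
        byte[] st = key.getBytes(StandardCharsets.UTF_8);
        int result = 0;
        for (int i = 0; i < st.length; i += 4) {
            int chunk = 0;
            // pack up to 4 bytes into one int; shift BEFORE adding the next byte
            // so the first byte is not pushed out of the 32 bits
            for (int j = 0; j < 4 && i + j < st.length; j++) {
                chunk = (chunk << 8) | (st[i + j] & 0xFF);   // & 0xFF stops sign extension
            }
            result ^= chunk;                                  // XOR the chunks together
        }
        return Math.floorMod(result, tableSize);   // result may be negative; keep index >= 0
    }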

Increasing Performance
The performance could be increased even more if the table size were a power of 2.
– we can get rid of the modulo operation at the end
– modulo is an expensive calculation compared with bitwise operations
– we could just do a subtraction and an AND operation instead: index = n & (tableSize - 1)
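When the table size is 2^k, the low k bits of n are exactly n mod tableSize. The sketch below checks the mask against `Math.floorMod` for a few sample values, including a negative one, where the mask still gives a valid index but plain `%` would not. It assumes the caller passes a power-of-two table size.

```java
class PowerOfTwoIndex {
    // Assumes tableSize is a power of two: then (tableSize - 1) is a mask of k one-bits,
    // and n & mask keeps the low k bits, i.e. n mod tableSize.
    static int indexFor(int n, int tableSize) {
        return n & (tableSize - 1);   // one subtraction and one AND, no division
    }

    public static void main(String[] args) {
        int size = 1024;
        for (int n : new int[] { 0, 1009, 123456789, -7 }) {
            // the two columns agree for every n, including the negative one
            System.out.println(indexFor(n, size) + " " + Math.floorMod(n, size));
        }
    }
}
```

The subtraction depends only on the table size, so in practice `tableSize - 1` is computed once and stored, leaving a single AND per lookup.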

Mid-Square Function
Square the number and take the middle part as the index.
– a string must first be converted to a number before it can be squared
The entire key gets used to generate the address.
– there is less chance of conflicts (more on this later)
This method works best if the table size is a power of two.

Mid-Square Function
Table size equals 1024 ($2^{10}$).
The key is 3121:
– $3121^2 = 9{,}740{,}641 = (1001010\,\mathbf{0101000010}\,1100001)_2$
– the middle 10 bits of this value are shown in bold
Index in array is:
– $(0101000010)_2 = 322$
This is all very quick and easy to calculate using mask and shift operations.

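Mid-Square Function
    // table size must be a power of two
    int tableSize = 1024;
    int mask = tableSize - 1;                                  // low 10 bits set
    int maskBits = Integer.numberOfTrailingZeros(tableSize);   // log2(1024) = 10 bits kept
    int shiftBits = 7;    // low-order bits dropped (fits the 24-bit square of 3121)

    int midSquare(String key) {
        long n = stringToNum(key);   // helper assumed to turn the string into a number
        long square = n * n;         // long avoids int overflow while squaring
        return (int) ((square >>> shiftBits) & mask);   // middle maskBits bits of the square
    }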
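The fixed `shiftBits = 7` only centres the mask for squares that are 24 bits long, like 3121². Below is a minimal sketch that works on an integer key directly. It derives the shift from the bit length of each square, which is an assumption on our part, not something the slides specify. It reproduces the slide's index of 322.

```java
class MidSquare {
    // Mid-square hashing for a table of 2^k slots: keep the middle k bits of key^2.
    // Assumes tableSize is a power of two and |key| <= 3_037_000_499 (square fits in a long).
    static int midSquare(long key, int tableSize) {
        int k = Integer.numberOfTrailingZeros(tableSize);      // tableSize = 2^k
        long square = key * key;
        int bitLength = 64 - Long.numberOfLeadingZeros(square);
        int shift = Math.max(0, (bitLength - k) / 2);          // drop about as many low bits as high bits
        return (int) ((square >>> shift) & (tableSize - 1));
    }

    public static void main(String[] args) {
        System.out.println(midSquare(3121, 1024)); // 322, as in the worked example
    }
}
```

For 3121, the square is 24 bits long, so the computed shift is (24 - 10) / 2 = 7, the same value the slide hard-codes.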

Extraction
Simply pull out a certain part of the key and use it as the index.
– example: SSN = 123-45-6789
  index = middle of key = 456
  alternative index = first, middle, last digits = 159
Try to choose a part of the key that is most likely to be unique.
– consider foreign student SSNs: they start with 999
– it is probably not a great idea to extract the first three digits
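The two extraction choices from the example are sketched below. The key is assumed to be a 9-digit SSN string with the dashes already removed. Working on the string rather than an int keeps any leading zeros in the extracted part. The method names are hypothetical.

```java
class Extraction {
    // Middle three digits: "123456789" -> 456
    static int middleThree(String ssn) {
        return Integer.parseInt(ssn.substring(3, 6));
    }

    // First, middle and last digits: "123456789" -> 1, 5, 9 -> 159
    static int firstMiddleLast(String ssn) {
        String picked = "" + ssn.charAt(0) + ssn.charAt(ssn.length() / 2) + ssn.charAt(ssn.length() - 1);
        return Integer.parseInt(picked);
    }

    public static void main(String[] args) {
        String ssn = "123456789";                 // 123-45-6789 with the dashes removed
        System.out.println(middleThree(ssn));     // 456
        System.out.println(firstMiddleLast(ssn)); // 159
    }
}
```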