ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.

Review
Sorting Algorithms
– Elementary: Insertion Sort, Selection Sort, Bubble Sort
– Efficient: Shell Sort, Heap Sort, Quick Sort, Merge Sort, Radix Sort

Searching so far
We have encountered searching algorithms several times already in the course:
– Binary Search Trees: O(lg n) when the tree is balanced
– Linked List searches: O(n)
The efficiency of a search has depended on how effectively the data was arranged. This week we look at an alternative approach to searching, in which data can be found in constant time, O(1), on average.

Searching in Constant Time
In order to find data in constant time, we need to know where to look for it. Given a ‘key’, which could be in any form (e.g., alphanumeric), we need to return an index into some table (or array). A function that converts a key to an address is known as a hash function.
– If the function returns a unique address for every key, it is a perfect hash function.

Hashing Example
Take a student ID number (IDNum). A possible hash function could be:
– H(IDNum) = IDNum % 1000
Which would return what?
– The last three digits of the ID, a value between 0 and 999. This number could then be used as the array index.

Hashing
If only hashing were that simple! There is a problem with this function: it can return only 1000 different indexes.
– What happens when there are more than 1000 students?
When a hash function returns the same index for more than one key, there is a collision. Collisions can occur long before the table is full: any two students whose IDs end in the same three digits collide. A hash table that stores one element per position needs at least as many positions as the number of elements to be hashed.
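The modular hash from the example can be sketched in C++ as below. This is a minimal illustration, assuming IDs are non-negative integers; the function name idHash and the sample IDs in the usage note are hypothetical.

```cpp
#include <cassert>

// H(IDNum) = IDNum % 1000: keeps the last three decimal digits of the ID,
// giving an index in the range 0..999.
int idHash(long idNum) {
    assert(idNum >= 0);  // assumption: IDs are non-negative
    return static_cast<int>(idNum % 1000);
}
```

Here idHash(5401234) and idHash(5502234) both return 234, so those two (made-up) students would collide even though the table has room for 1000 entries.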

Hashing Example 2
Suppose we need to convert a variable name into a data location.
– int ken = 31;
We need a hash function that could return a unique address for each variable name:
– H(“ken”)
Consider how many different variable names there could be. How large should the hash table be?

Hashing Example cont.
Suppose we set the function H() to sum the values of the letters in the variable name, where a=1, b=2, ..., z=26:
– k=11, e=5, n=14
– H(“ken”) = 11 + 5 + 14 = 30
Therefore we could store the data for ken at index 30. We can use this bad hash function to highlight some problems that hash functions should address.

Hashing Problems
Suppose a program has 4 variables:
– name: H(“name”) = 33
– age: H(“age”) = 13
– gender: H(“gender”) = 53
– mean: H(“mean”) = 33
1) The data is spread out across the table, with many unused, wasted cells.
2) There is a collision between name and mean.
3) Both problems have to be solved by a simple, efficient algorithm.
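A small C++ sketch of the letter-sum function H(), assuming lowercase ASCII names so that c - 'a' + 1 gives a=1, ..., z=26. The name letterSum is our own.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Letter-sum hash from the example: a=1, b=2, ..., z=26, summed over the name.
// Assumption: names contain lowercase ASCII letters only.
int letterSum(const std::string& name) {
    int sum = 0;
    for (char c : name) {
        assert(std::islower(static_cast<unsigned char>(c)));
        sum += c - 'a' + 1;
    }
    return sum;
}
```

letterSum returns 30 for "ken", 13 for "age", 53 for "gender", and 33 for both "name" and "mean", reproducing the collision above.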

Good Hash Functions
A good hash function should:
– be easy and quick to compute
– achieve an even distribution of the key values that actually occur across the index range supported by the table
Typically a hash function will take a key value and:
– chop it up into pieces,
– mix the pieces together in some fashion, and
– compute an index that will be uniformly distributed across the available range.
Note: hash functions are NOT random in any sense; the same key always produces the same index.

Approaches
Truncation:
– Ignore a part of the key value and use the remainder as the table index, e.g., a key might map to 976.
Folding:
– Partition the key, then combine the parts in some simple manner, e.g., the parts might add up to 1256, which is then reduced modulo the table size to 256.
Modular Arithmetic:
– Convert the key to an integer, and then mod that integer by the size of the table, e.g., a key might map to 876.
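The three approaches can be sketched in C++ as follows. The slide's original example key is not preserved. This sketch therefore assumes a hypothetical 8-digit key, 21296876, and a table size of 1000, chosen so that the results match the figures above (976; 1256 then 256; 876). Which digits truncation keeps, and the 3-digit groups used for folding, are likewise our own choices.

```cpp
#include <cassert>
#include <string>

// Truncation: keep only selected digits of the key (here the 4th, 7th and
// 8th digits of an 8-digit key; which digits to keep is an arbitrary choice).
int truncateHash(long key) {
    std::string d = std::to_string(key);
    assert(d.size() == 8);  // assumption: 8-digit keys
    return std::stoi(std::string{d[3], d[6], d[7]});
}

// Folding: split the digits into groups of three (212 | 968 | 76),
// add the groups, then reduce the sum modulo the table size.
int foldHash(long key, int tableSize) {
    assert(key >= 0 && tableSize > 0);
    std::string d = std::to_string(key);
    long sum = 0;
    for (std::string::size_type i = 0; i < d.size(); i += 3) {
        sum += std::stol(d.substr(i, 3));  // last group may be shorter
    }
    return static_cast<int>(sum % tableSize);
}

// Modular arithmetic: the key itself modulo the table size.
int modHash(long key, int tableSize) {
    assert(key >= 0 && tableSize > 0);
    return static_cast<int>(key % tableSize);
}
```

For this key, folding computes 212 + 968 + 76 = 1256, and 1256 % 1000 = 256.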

Truncation Caution
It is a good idea for the entire key to have some impact on the hash function; simply truncating a key may lead to many keys returning the same result when hashed.
– Consider keeping only the last 3 letters of the following keys: hash, mash, bash, trash. All four reduce to “ash”.
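A tiny C++ illustration of the caution, assuming a hypothetical truncation rule (lastThree) that keeps only the last three characters of a key:

```cpp
#include <cassert>
#include <string>

// Truncation that keeps only the last three characters of a key.
std::string lastThree(const std::string& key) {
    assert(key.size() >= 3);
    return key.substr(key.size() - 3);
}
```

All four keys reduce to "ash", so any hash computed from the truncated key alone sends them to the same index.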

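Hash Function
```cpp
#include <string>
using std::string;

// Adds up the character codes of the key, then reduces the total
// modulo the table size to get an index in 0..TableSize-1 (for ASCII keys).
int strHash(const string& toHash, const int TableSize) {
    int hashValue = 0;
    for (string::size_type Pos = 0; Pos < toHash.length(); Pos++) {
        hashValue = hashValue + int(toHash.at(Pos));
    }
    return (hashValue % TableSize);
}
```
Given the key ‘ken’ and a table size of 1000, what would be returned?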

Improving the hash function
The hash function given on the previous slide would return the same result for either of the keys sham and mash, because they contain the same letters.
– The hash function doesn’t take position into account.
This can easily be remedied with the following change:
hashValue = 4*hashValue + int(toHash.at(Pos));
This is known as collision reduction, or rather, reducing the chance of collision.
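The following C++ sketch contrasts the two versions of the string hash. The function names sumHash and positionalHash are our own, and keys are assumed to be ASCII. There is one deliberate change from the slide. The improved version accumulates in an unsigned int, because 4*hashValue overflows a signed int for long keys. Signed overflow is undefined behaviour in C++, while unsigned arithmetic simply wraps around.

```cpp
#include <cassert>
#include <string>

// Original version: sum of character codes, mod table size.
// Anagrams such as "sham" and "mash" always collide.
int sumHash(const std::string& key, int tableSize) {
    assert(tableSize > 0);
    int hashValue = 0;
    for (char c : key) {
        hashValue = hashValue + int(c);
    }
    return hashValue % tableSize;
}

// Improved version: hashValue = 4*hashValue + code, so each character's
// contribution depends on its position. Unsigned arithmetic wraps around
// for long keys instead of overflowing a signed int.
int positionalHash(const std::string& key, int tableSize) {
    assert(tableSize > 0);
    unsigned int hashValue = 0;
    for (char c : key) {
        hashValue = 4 * hashValue + static_cast<unsigned char>(c);
    }
    return static_cast<int>(hashValue % static_cast<unsigned int>(tableSize));
}
```

With a table size of 1000, sumHash("ken", 1000) returns 318, and "sham" and "mash" collide under sumHash (both sum to 425), while positionalHash maps them to 521 and 92.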

Resolving Collisions
Even with a sophisticated hash function, collisions are still likely to occur, so we need a strategy to deal with them. We first detect a collision when we try to insert data into a position that is already filled.
– In this case we can insert the data into a different available position, making sure the data can still be retrieved later.

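Linear Probing
Linear probing deals with collisions by inserting a new value into the next available space after the position returned by the hash function:
– If H(key) is occupied, try H(key)+1, then H(key)+2, and so on, wrapping around to the start of the table (i.e., all positions are taken mod the table size).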

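Linear Probing, problem
Consider a hash table holding a, b, c and d (figure: successive states of the table, a b c d → a b c c d → a b c c d c). When another key hashes to c’s cell, the new value is placed in the next free cell after it; each further key that hashes there must probe past the growing run of occupied cells. This leads to clustering, which contradicts one of our key objectives: an even distribution of keys across the table.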
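Linear probing and its clustering problem can be seen in a minimal C++ sketch. It assumes non-negative int keys, H(key) = key % size, a fixed-size table with no deletion, and -1 as the marker for an empty cell. These are all our own choices, not from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY_CELL = -1;  // assumption: -1 marks an unused cell

// Open-addressing table with linear probing (no deletion).
class LinearProbingTable {
public:
    explicit LinearProbingTable(int size)
        : cells_(static_cast<std::size_t>(size), EMPTY_CELL) {}

    // Try H(key), H(key)+1, H(key)+2, ... wrapping around the end of the
    // table. Returns the index used, or -1 if the table is full.
    int insert(int key) {
        assert(key >= 0);
        int n = static_cast<int>(cells_.size());
        for (int i = 0; i < n; ++i) {
            int idx = (key % n + i) % n;
            if (cells_[idx] == EMPTY_CELL || cells_[idx] == key) {
                cells_[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    // Follow the same probe sequence; reaching an empty cell means "absent".
    int find(int key) const {
        assert(key >= 0);
        int n = static_cast<int>(cells_.size());
        for (int i = 0; i < n; ++i) {
            int idx = (key % n + i) % n;
            if (cells_[idx] == EMPTY_CELL) return -1;
            if (cells_[idx] == key) return idx;
        }
        return -1;
    }

private:
    std::vector<int> cells_;
};
```

In a table of size 10, inserting 12, 22, 13 and 32 fills cells 2, 3, 4 and 5. Key 13 is displaced from its home cell 3 by 22, which was itself displaced from cell 2. Key 32 must probe cells 2, 3 and 4 before landing in 5: the cluster grows and every probe gets longer.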

Quadratic Probing
Quadratic probing is designed to combat the clustering effect of linear probing. Rather than inserting the data into the next available cell, it tries cells in the following sequence (mod the table size):
– k, k+1, k+4, k+9, k+16, ...
While this reduces clustering, it introduces a new problem: the probe sequence may not try every slot in the table. If the table size is prime, it reaches only about half of the cells.
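This C++ sketch makes the coverage problem concrete. It counts the distinct cells visited by the quadratic probe sequence k, k+1, k+4, ... in a table of size m, and its function name is hypothetical. Because (i+m)² ≡ i² (mod m), the first m probes already include every cell the sequence can ever reach.

```cpp
#include <cassert>
#include <set>

// Counts the distinct cells reached by quadratic probing from home cell k
// in a table of size m: k, k+1, k+4, k+9, ... (all mod m).
int quadraticCellsReached(int k, int m) {
    assert(m > 0 && k >= 0 && k < m);
    std::set<int> seen;
    for (long i = 0; i < m; ++i) {
        seen.insert(static_cast<int>((k + i * i) % m));
    }
    return static_cast<int>(seen.size());
}
```

For the prime table sizes 7, 11 and 13 it reports 4, 6 and 7 cells, i.e. (m+1)/2, a little over half the table.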

General Increment Probing
A general increment probing approach tries cells in a sequence based on a formula:
– k, k+s(1), k+s(2), k+s(3), k+s(4), ...
Care must be taken that the formula doesn’t return to the first cell too quickly. What happens if s(i) = i²?
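A C++ sketch lets you plug in an increment function s(i) and count how many distinct cells the first m probes visit. The names are hypothetical, and std::function is used only for convenience.

```cpp
#include <cassert>
#include <functional>
#include <set>

// General increment probing: probe k, k+s(1), k+s(2), ... (mod m).
// Returns how many distinct cells the first m probes visit, which shows
// whether the increment function revisits cells too soon.
int cellsReached(int k, int m, const std::function<long(long)>& s) {
    assert(m > 0);
    std::set<long> seen;
    for (long i = 0; i < m; ++i) {
        long offset = (i == 0) ? 0 : s(i);  // first probe is k itself
        seen.insert(((k + offset) % m + m) % m);
    }
    return static_cast<int>(seen.size());
}
```

With m = 10 and k = 0, s(i) = i visits all 10 cells. s(i) = 5i returns to the first cell after one step and visits only 2. s(i) = i² (quadratic probing) visits 6.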

Key-dependent Probing
Another probing strategy could compute the increment from some part of the original key, perhaps just adding the value of the first segment of the key. However, this could produce inefficient code.
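The slides leave the exact formula open. One well-known key-dependent scheme is double hashing, sketched below in C++ as a swapped-in example rather than the lecture's own method. The step size comes from the key's higher-order part (key / m), in the spirit of using one segment of the key. The sketch assumes non-negative int keys and a prime table size m, so every step from 1 to m-1 reaches all m cells.

```cpp
#include <cassert>
#include <vector>

// Double hashing: keys that share a home cell usually get different
// step sizes, so their probe sequences diverge immediately.
std::vector<int> probeSequence(int key, int m) {
    assert(key >= 0 && m > 1);
    int home = key % m;
    int step = 1 + (key / m) % (m - 1);  // in 1..m-1, never 0
    std::vector<int> seq;
    for (int i = 0; i < m; ++i) {
        seq.push_back((home + i * step) % m);
    }
    return seq;
}
```

Keys 3 and 25 share home cell 3 in a table of size 11, but their sequences diverge at once: 3, 4, 5, ... versus 3, 6, 9, 1, ...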

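Deletion
Given this approach to collision resolution, care needs to be taken when deleting data from a cell.
– Why? If the cell were simply emptied, a later search for a key that had been placed beyond it would stop at the empty cell and wrongly report the key as missing.
The tombstone method marks a deleted cell as available for insertion, but also marks it as having had data in it, so searches continue past it.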
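A minimal C++ sketch of linear probing with tombstone deletion. It assumes non-negative int keys and two sentinel values of our own: -1 for a never-used cell and -2 for a tombstone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY_CELL = -1;  // never used: a search may stop here
const int TOMBSTONE = -2;   // deleted: a search must keep probing past it

class TombstoneTable {
public:
    explicit TombstoneTable(int size)
        : cells_(static_cast<std::size_t>(size), EMPTY_CELL) {}

    bool insert(int key) {
        assert(key >= 0);
        if (find(key) >= 0) return true;  // already present
        int n = size();
        for (int i = 0; i < n; ++i) {
            int idx = (key % n + i) % n;
            // Empty cells and tombstones are both available for insertion.
            if (cells_[idx] == EMPTY_CELL || cells_[idx] == TOMBSTONE) {
                cells_[idx] = key;
                return true;
            }
        }
        return false;  // table full
    }

    // Returns the index holding key, or -1. Only a never-used cell ends
    // the search; a tombstone means "something was here, keep looking".
    int find(int key) const {
        assert(key >= 0);
        int n = size();
        for (int i = 0; i < n; ++i) {
            int idx = (key % n + i) % n;
            if (cells_[idx] == EMPTY_CELL) return -1;
            if (cells_[idx] == key) return idx;
        }
        return -1;
    }

    void erase(int key) {
        int idx = find(key);
        if (idx >= 0) cells_[idx] = TOMBSTONE;
    }

private:
    int size() const { return static_cast<int>(cells_.size()); }
    std::vector<int> cells_;
};
```

Insert 12 and 22 into a table of size 10: both hash to 2, so 22 lands in 3. After erasing 12, find(22) still succeeds because the search steps over the tombstone in cell 2. A later insert of 32 reuses that cell.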

Alternative to Probing
An alternative strategy to probing is separate chaining.
– Here more than one piece of data can be associated with the same cell in the hash table.
– The cell can contain a pointer to a linked list of the data inserted there.
– This is sometimes known as a bucket (see case study!).
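A minimal C++ sketch of separate chaining. It uses std::list for each bucket and the positional 4*hashValue + code hash from earlier in the lecture, computed in unsigned arithmetic. The class name and interface are our own assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Separate chaining: each cell holds a linked list (a "bucket") of every
// key that hashed to that cell.
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t size) : buckets_(size) { assert(size > 0); }

    void insert(const std::string& key) {
        std::list<std::string>& bucket = buckets_[index(key)];
        for (const std::string& k : bucket) {
            if (k == key) return;  // no duplicates
        }
        bucket.push_back(key);
    }

    bool contains(const std::string& key) const {
        for (const std::string& k : buckets_[index(key)]) {
            if (k == key) return true;
        }
        return false;
    }

    // Deletion just unlinks the node; no tombstone is needed.
    void erase(const std::string& key) { buckets_[index(key)].remove(key); }

    std::size_t bucketSize(const std::string& key) const {
        return buckets_[index(key)].size();
    }

private:
    std::size_t index(const std::string& key) const {
        unsigned int h = 0;
        for (char c : key) h = 4 * h + static_cast<unsigned char>(c);
        return h % buckets_.size();
    }
    std::vector<std::list<std::string>> buckets_;
};
```

With 10 buckets, "ken" and "gender" both hash to bucket 6, so they share one list. Erasing "ken" simply removes its node and leaves "gender" reachable.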