Hash Tables and Associative Containers CS-212 Dick Steflik.


Hash Tables
- a hash table is an array of size Tsize
  - has index positions 0 .. Tsize-1
- two types of hash tables
  - open hash table
    - array element type is a key-value pair
    - all items are stored in the array
  - chained hash table
    - element type is a pointer to a linked list of nodes containing key-value pairs
    - items are stored in the linked list nodes
- keys are used to generate an array index
  - home address (0 .. Tsize-1)

Faster Searching
- "balanced" search trees guarantee an O(log2 n) search path by controlling the height of the search tree
  - AVL tree
  - [...] tree
  - red-black tree (used by STL associative container classes)
- a hash table allows for O(1) search performance on average
  - search time does not increase as n increases

Hash Table
- a hash table is an array/vector (fixed size)
  - has index positions 0 .. Tsize-1
- if we could use the keys as an index we would have O(1) retrieval
  - hashTable[key]
- keys are used to generate an array index
  - home address (0 .. Tsize-1)
  - the function that does this is called a hash function
    - hash(key) returns an int value
    - hash(key) % Tsize => 0 .. Tsize-1
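
The mapping from key to home address can be sketched in a few lines of Python. This is only an illustration: the function name `home_address` and the sample keys are assumptions, and Python's built-in `hash` stands in for the slide's hash(key).

```python
def home_address(key, tsize):
    """Map a key to a home address in 0 .. tsize-1 (hash(key) % Tsize)."""
    return hash(key) % tsize

TSIZE = 7  # illustrative table size

# Sample keys chosen for illustration; they are not from the slides.
keys = [15, 22, 100, "apple"]
addresses = {k: home_address(k, TSIZE) for k in keys}

for k, addr in addresses.items():
    print(f"{k!r} -> home address {addr}")
```

Two different keys can land on the same home address (here 15 and 22 both do), which is why a collision strategy is needed.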

Collisions
- collisions occur whenever two keys produce the same index (hash to the same location)
- ideal design goal: pick a hash function that produces no collisions
- in practice, collisions are a way of life with hash tables
- what do you do?
  - linear probing: check the next location; if it's empty, use it
  - quadratic probing: check 1 away, then 4 away, then 9 away, ... (offsets 1^2, 2^2, 3^2, ...)
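
The two probing strategies differ only in the offset added to the home address on each attempt. The sketch below assumes the common textbook form of quadratic probing, with offsets i^2 (1, 4, 9, ...). The function names are illustrative.

```python
def linear_probes(home, tsize):
    """Yield slots visited by linear probing: home, home+1, home+2, ... (mod tsize)."""
    for i in range(tsize):
        yield (home + i) % tsize

def quadratic_probes(home, tsize):
    """Yield slots visited by quadratic probing: home, home+1, home+4, home+9, ... (mod tsize)."""
    for i in range(tsize):
        yield (home + i * i) % tsize

linear = list(linear_probes(3, 7))
quadratic = list(quadratic_probes(3, 7))
print("linear:   ", linear)
print("quadratic:", quadratic)
```

Linear probing visits every slot exactly once. Quadratic probing can revisit slots, so it is not guaranteed to find every free slot.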

A Hash Table of Size 7
- each slot holds a key, a value, and an empty flag; all seven slots start out empty (T)
- some insertions:
  - hash(K1) % 7 => 3
  - hash(K2) % 7 => 5
  - hash(K3) % 7 => 2
  - hash(K4) % 7 => 3
  - hash(K5) % 7 => 2
  - hash(K6) % 7 => 4
- collision resolution strategy: open addressing with linear probing

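Search Performance (linear probing)
- average number of probes needed to retrieve the value with key K?

  K    home address   #probes
  K1   3              1
  K2   5              1
  K3   2              1
  K4   3              2
  K5   2              5
  K6   4              4

- 14/6 = 2.33 probes (successful search)
- unsuccessful search?
- resulting table after inserting K1 through K6 in order:

  slot   empty   key   value
  0      F       K6    K6info
  1      T
  2      F       K3    K3info
  3      F       K1    K1info
  4      F       K4    K4info
  5      F       K2    K2info
  6      F       K5    K5info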
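
The 14/6 figure can be checked with a small simulation. This sketch assumes the keys are inserted in the order K1 through K6 and takes the home addresses directly from the slide. The helper names are illustrative.

```python
TSIZE = 7
# Home addresses as given on the slide (hash(K) % 7).
HOME = {"K1": 3, "K2": 5, "K3": 2, "K4": 3, "K5": 2, "K6": 4}

def insert_linear(table, key, home):
    """Place key in the first empty slot at or after home, wrapping around."""
    for i in range(len(table)):
        slot = (home + i) % len(table)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")

def probes_to_find(table, key, home):
    """Count the slots examined by a successful linear-probe search."""
    for i in range(len(table)):
        if table[(home + i) % len(table)] == key:
            return i + 1
    raise KeyError(key)

table = [None] * TSIZE
for key, home in HOME.items():  # dicts preserve insertion order: K1 .. K6
    insert_linear(table, key, home)

probes = {k: probes_to_find(table, k, h) for k, h in HOME.items()}
average = sum(probes.values()) / len(probes)
print(table)
print(probes, f"average = {average:.2f}")
```

K5 and K6 are the expensive keys: each arrives after its neighborhood has already filled up, so it has to walk past a run of occupied slots.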

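Chaining with Separate Lists
- each table slot points to a linked list of synonyms (keys with the same home address)
- insertions:
  - hash(K1) % 7 => 3
  - hash(K2) % 7 => 5
  - hash(K3) % 7 => 2
  - hash(K4) % 7 => 3
  - hash(K5) % 7 => 2
  - hash(K6) % 7 => 4
- resulting chains:

  slot 2: K3 (K3info) -> K5 (K5info)
  slot 3: K1 (K1info) -> K4 (K4info)
  slot 4: K6 (K6info)
  slot 5: K2 (K2info)
  slots 0, 1, 6: empty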

Search Performance (chaining)
- average number of probes needed to retrieve the value with key K?

  K    home address   #probes
  K1   3              1
  K2   5              1
  K3   2              1
  K4   3              2
  K5   2              2
  K6   4              1

- 8/6 = 1.33 probes (successful search)
- unsuccessful search?
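
The same six keys can be run through a chained table to confirm the 8/6 figure. In this sketch Python lists stand in for the linked lists, new nodes are appended at the tail of a chain (an assumption), and the helper name is illustrative.

```python
TSIZE = 7
# Home addresses as given on the slide (hash(K) % 7).
HOME = {"K1": 3, "K2": 5, "K3": 2, "K4": 3, "K5": 2, "K6": 4}

# One chain per slot; each node is a (key, value) pair.
buckets = [[] for _ in range(TSIZE)]
for key, home in HOME.items():  # insertion order K1 .. K6
    buckets[home].append((key, key + "info"))

def probes_to_find(buckets, key, home):
    """Count the nodes examined along the chain before key is found."""
    for position, (k, _value) in enumerate(buckets[home], start=1):
        if k == key:
            return position
    raise KeyError(key)

probes = {k: probes_to_find(buckets, k, h) for k, h in HOME.items()}
average = sum(probes.values()) / len(probes)
print(probes, f"average = {average:.2f}")
```

A key only ever competes with its own synonyms, so a collision at slot 2 does not lengthen searches that start at slot 4.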

Where are Hash Tables used?
- databases
- spelling checkers
- Java uses them all over the place (built into the language)
- most scripting languages (ASP, Perl, PHP) have associative arrays
- caching schemes
  - software: browsers, HTTP proxy servers, DNS servers
  - hardware: memory caching, instruction caching

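Deletions?
- search for the item to be deleted
- chained hash table
  - delete a node from a linked list
- open hash table
  - just mark the spot as "empty"? no: a search stops at an empty slot, so items stored beyond it would become unreachable
  - must mark the vacated spot as "deleted", which is different from "empty"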
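
Why "deleted" must differ from "empty" in an open hash table is easiest to see with two keys that share a home address. The sketch below assumes linear probing. The sentinel names `EMPTY` and `DELETED` and the helper functions are illustrative.

```python
EMPTY = object()    # slot has never held a key
DELETED = object()  # slot held a key that was removed (a "tombstone")

def find_slot(table, key, home):
    """Return the slot holding key, or None. The search stops only at a truly EMPTY slot."""
    for i in range(len(table)):
        slot = (home + i) % len(table)
        if table[slot] is EMPTY:
            return None  # key cannot be further along the probe path
        if table[slot] is not DELETED and table[slot] == key:
            return slot
    return None

def delete(table, key, home):
    """Replace key with a DELETED marker so later searches keep probing past it."""
    slot = find_slot(table, key, home)
    if slot is not None:
        table[slot] = DELETED
    return slot

# K1 and K4 both have home address 3; K4 was pushed to slot 4 on insertion.
table = [EMPTY, EMPTY, EMPTY, "K1", "K4", EMPTY, EMPTY]
delete(table, "K1", 3)
found = find_slot(table, "K4", 3)  # still reachable past the tombstone

# Counter-example: marking slot 3 as EMPTY instead hides K4.
broken = [EMPTY, EMPTY, EMPTY, EMPTY, "K4", EMPTY, EMPTY]
lost = find_slot(broken, "K4", 3)
print(found, lost)
```

An insert routine may reuse a `DELETED` slot, but a search must treat it as occupied.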

Hash Functions
- a hash function is used to map a key to an array index (home address)
  - the search starts from here
- insert, retrieve, and delete all start by applying the hash function to the key
- goals for a hash function
  - fast to compute
  - even distribution over the entire collection of keys
- all hash functions produce collisions
  - multiple keys hash to the same home address

Some Hash Functions...
- Division
  - H(key) = key mod m
  - if the key is an integer, hash(key) can be the identity function (return key)
  - works well in most cases as long as keys are relatively random
  - not good if keys have similar characteristics
    - ex: with m = 25, all keys divisible by 5 map into positions 0, 5, 10, 15, 20, causing clustering around those values
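
The m = 25 example is easy to reproduce. The sketch below hashes 100 keys that are all divisible by 5. The comparison against a prime table size (23) is an added illustration of a common remedy, not something the slide states, and the names are assumptions.

```python
from collections import Counter

def division_hash(key, m):
    """H(key) = key mod m."""
    return key % m

keys = range(0, 500, 5)  # 100 keys, all divisible by 5

used_25 = Counter(division_hash(k, 25) for k in keys)
used_23 = Counter(division_hash(k, 23) for k in keys)

print("m = 25 uses slots:", sorted(used_25))
print("m = 23 uses", len(used_23), "slots")
```

With m = 25 only five of the 25 slots are ever used, because the keys and the table size share the factor 5.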

More Hash Functions...
- Mid-Squared
  - produces a nearly random distribution of indices
  - takes longer to compute but gives a better distribution when keys may have some digits in common
  - convert the key to an octal string: A-Z = [...] and 0-9 = [...]
  - ex: key = A1 = [...]; [...] * [...] = [...]
  - using a table of 1024 elements, use the middle 10 bits as the index
  - index = [...]
  - note: most collisions will occur for short identifiers
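
A minimal sketch of the mid-square idea for a 1024-slot table follows. It does not reproduce the slide's octal encoding of identifiers. Instead it assumes the key is already a 16-bit unsigned integer, so the square fits in 32 bits and the "middle 10 bits" are bits 11 through 20. `KEY_BITS`, `INDEX_BITS`, and `mid_square` are illustrative names.

```python
KEY_BITS = 16    # assumed key width (not from the slide)
INDEX_BITS = 10  # 2**10 = 1024 table slots

def mid_square(key):
    """Square the key and return the middle INDEX_BITS bits of the result."""
    square = key * key                        # at most 2 * KEY_BITS bits
    shift = (2 * KEY_BITS - INDEX_BITS) // 2  # low-order bits to discard
    return (square >> shift) & ((1 << INDEX_BITS) - 1)

for key in (1000, 1001, 1002, 40000):
    print(key, "->", mid_square(key))
```

The middle bits of the square depend on every bit of the key, which is why keys that share digits tend to spread out better than under division.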

More Hash Functions...
- Digit Folding
  - assume a 5-digit decimal string (digits 0-9 only)
  - H(key) = d1 + d2 + d3 + d4 + d5 (sum of digits)
    - this yields 0 <= h <= 45 for all possible keys
  - if we were to fold the digits in pairs
    - H(key) = d1d2 + d3d4 + d5
    - 0 <= h <= 207 (99 + 99 + 9)
- Double hashing
  - use two (or more) hash functions serially
  - helps overcome the effects of a function that produces a poor distribution of keys
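
Both folding variants can be written directly from the formulas. The sketch assumes the key arrives as a 5-character string of decimal digits. The function names are illustrative.

```python
def digit_sum_fold(key):
    """H(key) = d1 + d2 + d3 + d4 + d5 for a 5-digit decimal string."""
    return sum(int(d) for d in key)

def pair_fold(key):
    """H(key) = d1d2 + d3d4 + d5 for a 5-digit decimal string."""
    return int(key[0:2]) + int(key[2:4]) + int(key[4])

for key in ("00000", "12345", "99999"):
    print(key, digit_sum_fold(key), pair_fold(key))
```

Folding in pairs widens the range of results from 46 values to 208, so it can address a larger table before a final mod is applied.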

Clustering
- an undesirable side effect of the hash function selected and the collision resolution strategy
  - too many keys hash to the same location, causing a long run of keys that needs to be searched
- especially bad when using a division-based function together with linear probing
- insertion/deletion/search can approach O(n)
- remedies
  - pick a different hash function
  - pick a different collision resolution strategy

Factors Affecting Search Performance
- quality of the hash function
  - how uniform?
  - depends on the actual data
- collision resolution strategy used
- load factor of the hash table
  - N/Tsize
  - the lower the load factor, the better the search performance

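Successful Search Performance
[Table: probes for a successful search by load factor, with columns for open addressing (linear probing), open addressing (double hashing), and chaining; the numeric values are missing from the transcript]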
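
As a rough guide to how the three strategies compare, the standard textbook approximations for the expected number of probes in a successful search can be tabulated against the load factor. These formulas come from the usual analysis of hashing (for example in Knuth), not from the slide, and the function names are illustrative.

```python
import math

def linear_probing(alpha):
    """Open addressing, linear probing: about (1/2) * (1 + 1/(1 - alpha))."""
    return 0.5 * (1 + 1 / (1 - alpha))

def double_hashing(alpha):
    """Open addressing, double hashing: about (1/alpha) * ln(1/(1 - alpha))."""
    return (1 / alpha) * math.log(1 / (1 - alpha))

def chaining(alpha):
    """Separate chaining: about 1 + alpha/2."""
    return 1 + alpha / 2

print("load   linear   double   chaining")
for alpha in (0.25, 0.50, 0.75, 0.90):
    print(f"{alpha:4.2f}   {linear_probing(alpha):6.2f}   "
          f"{double_hashing(alpha):6.2f}   {chaining(alpha):8.2f}")
```

The open-addressing formulas only apply for load factors below 1, while the chaining estimate stays meaningful above 1.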

Summary of Hash Tables
- search speed depends on the load factor and the quality of the hash function
  - load factor should be less than 0.75 for open addressing
  - load factor can be more than 1 for chaining
- items are not kept sorted by key
- very good for fast access to unordered data with a known upper bound on size
  - the upper bound is needed to pick a good Tsize