

Hashing: O(1) data access (almost)
- access, insertion, deletion and updating in constant time (on average)
- but at a price…
References: Weiss; Goodrich & Tamassia; Main
“associative memory”

Access to data
- O(n): linked list, array
- O(log n): sorted array, balanced search tree
- O(1): array by index
Index access is O(1) because the data location is found by computation, not by search.

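Computed access to array data
e.g. for an array of objects arr, the location in memory of arr[12] is computed, not searched for:
address of arr[12] = (start address of arr) + (size of one element) * 12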

Access by Hashing
- Hashing applies the same concept at the software level: access operations do not search for data keys; they compute data indexes:
  index = f(data.key)
- performance is “almost” O(1)

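Access example
- the student number is the key
- i = f(key) = (numeric part of the key) % 10000 = 4092
- the data for that student is at location 4092 in the data array: arr[4092].key is that student number
- problems:
  - wasted storage: the array must have 10000 elements, one per possible index
  - competition for space: two different student numbers can map to the same index
  - iterated operations (visiting every stored item) are more difficult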

Access example (diagram): the student-number key is passed to f, f(key) = 4092, and the record with that key is stored at arr[4092].

Hashing terminology
The same example, with the standard terms:
- f is the hash function
- the data array arr is the hash table
- two keys competing for the same location is a collision

Hashing Fact-of-Life
Collisions are unavoidable. Solution strategy:
- minimize the number of collisions
- resolve the collisions that do occur

Hash functions
For a hash table of size n, map key -> {0, 1, ..., n-1}.
Typical function: key -> integer % n
e.g.

    // student number key
    int hash(String stuNo, int n) {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }
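The hash function above can be made runnable as follows. This is a minimal sketch assuming every student number is one letter followed only by decimal digits; the class name StudentHash and the key "s1234567" are hypothetical.

```java
// Sketch: division hashing of student-number keys such as "s1234567".
// Assumption: every key is one letter followed only by decimal digits.
class StudentHash {
    // Map a student number to an index in 0..n-1.
    static int hash(String stuNo, int n) {
        // Drop the leading letter, parse the digits, reduce modulo n.
        return Integer.parseInt(stuNo.substring(1)) % n;
    }

    public static void main(String[] args) {
        String key = "s1234567";                 // hypothetical student number
        System.out.println(hash(key, 10000));    // prints 4567
    }
}
```

Because the digits are non-negative, the `%` result already lies in 0..n-1; for keys that could hash to a negative integer, a different reduction is needed.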

Hash function goals
- as simple as possible (speed)
- distribute keys uniformly over indices (minimize collisions)
Two steps:
1. transform the key to an integer if necessary (hashCode())
2. restrict the integer to the range of the data array (hash())
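The two steps can be sketched as below. This is an illustrative helper (the name TwoStepHash is hypothetical), assuming any object key; it uses Math.floorMod for step 2 because hashCode() may be negative and Java's `%` keeps the sign of the dividend.

```java
// Sketch of the two-step scheme: (1) key -> int via hashCode(),
// (2) int -> index in 0..n-1. Math.floorMod keeps the result
// non-negative even when hashCode() is negative.
class TwoStepHash {
    static int hash(Object key, int n) {
        int code = key.hashCode();          // step 1: any int, possibly negative
        return Math.floorMod(code, n);      // step 2: restrict to table range
    }

    public static void main(String[] args) {
        int n = 11;                          // hypothetical prime table size
        for (String k : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(k + " -> " + hash(k, n));
        }
    }
}
```

For example, an Integer key of -7 has hashCode() -7; `-7 % 5` is -2 (not a valid index), while `Math.floorMod(-7, 5)` is 3.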

Java’s hashCode() method
public int hashCode()
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
- Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
- If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
Returns: a hash code value for this object.

Other hashing methods
- hashCode can be overridden for any class
- hashCode usually should be overridden to:
  - fit the actual data
  - improve performance
  - remove dependence on location in memory

Design model
- equals() is based on the key field: a key match implies the same record
- the hashCode function is also based on the key field, because the key is used for access
- BUT the hash function (hash()) also depends on the table size
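The design model can be sketched with a hypothetical record class, StudentRecord, assuming the student number is the key field. Both equals() and hashCode() look only at the key, which satisfies the contract quoted above; the reduction to a table index is left to the table, since only the table knows its size.

```java
// Hypothetical record class: equality and hashing depend only on the key
// field (the student number), never on the other data it carries.
final class StudentRecord {
    private final String stuNo;   // key field
    private final String name;    // non-key data

    StudentRecord(String stuNo, String name) {
        this.stuNo = stuNo;
        this.name = name;
    }

    String name() { return name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StudentRecord)) return false;
        return stuNo.equals(((StudentRecord) o).stuNo);  // key match => same record
    }

    @Override
    public int hashCode() {
        return stuNo.hashCode();  // consistent with equals: equal keys give equal codes
    }

    // The table, not the record, reduces the code to the table's own size.
    static int index(StudentRecord r, int tableSize) {
        return Math.floorMod(r.hashCode(), tableSize);
    }
}
```

Two records with the same student number but different names are treated as the same entry by java.util.HashSet.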

Resizing table
- if the table is resized, all data must be re-entered into the new array, not just copied
e.g.:

    int hash(String stuNo, int n) {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }

For the same student number stuNo:
hash(stuNo, 10000) => 4092
hash(stuNo, 6667) => 4026
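A minimal sketch of resizing by re-insertion, assuming a linear-probing table of String student numbers and the slide's division hash. The class ResizeDemo and the key "s1234567" are hypothetical. Each key is re-hashed with the new size, because its old array position is meaningless under the new modulus.

```java
// Sketch: resizing a linear-probing table of student-number keys.
// Each key is re-hashed with the NEW size; copying slots by position
// would leave keys where lookups no longer look for them.
class ResizeDemo {
    static int hash(String stuNo, int n) {
        return Integer.parseInt(stuNo.substring(1)) % n;
    }

    // Insert with linear probing; assumes the table has a free slot.
    static void insert(String[] table, String key) {
        int i = hash(key, table.length);
        while (table[i] != null) {
            i = (i + 1) % table.length;
        }
        table[i] = key;
    }

    // Build a new array and re-insert every key under the new hash.
    static String[] resize(String[] old, int newSize) {
        String[] fresh = new String[newSize];
        for (String key : old) {
            if (key != null) insert(fresh, key);
        }
        return fresh;
    }

    public static void main(String[] args) {
        String key = "s1234567";                 // hypothetical key
        System.out.println(hash(key, 10000));    // 4567
        System.out.println(hash(key, 6667));     // 1172: a different slot
    }
}
```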

Resolving collisions
When a collision occurs on insertion:
- internal: store the new element at another location in the table
- external: store the new element outside the table

Linear probing
- sequential search for the next available location to store data when a collision occurs
e.g. hash(key) -> index = 4; if table[4] is occupied, try table[5], then table[6], …, until an empty location is found (wrapping around to table[0] after the last index)
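Linear probing insertion can be sketched for non-negative int keys with h(x) = x % size. The class name and the keys 89, 18, 49, 58, 9 are chosen for illustration only.

```java
import java.util.Arrays;

// Sketch: linear probing in a table of Integer keys, h(x) = x % size.
// On a collision, try the next slot (wrapping around) until one is empty.
class LinearProbing {
    static void insert(Integer[] table, int key) {
        int i = key % table.length;          // home slot (keys assumed non-negative)
        int probes = 0;
        while (table[i] != null) {           // occupied: step to the next slot
            if (++probes == table.length) throw new IllegalStateException("table full");
            i = (i + 1) % table.length;
        }
        table[i] = key;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        for (int key : new int[] {89, 18, 49, 58, 9}) insert(table, key);
        System.out.println(Arrays.toString(table));
        // [49, 58, 9, null, null, null, null, null, 18, 89]
    }
}
```

49 and 9 both hash to 9 and wrap to the front of the table; 58 hashes to 8 and has to pass the whole run 18, 89, 49 before finding slot 1.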

Linear probing hash table after each insertion (figure: Weiss, Data Structures & Problem Solving using JAVA/2E, © 2002 Addison Wesley)

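The Deletion Problem (Weiss, 2002)
(figure: find(58), then delete(89), then find(58) fails)
Emptying the deleted cell breaks the probe sequence: the search for 58 stops at the now-empty cell before reaching 58.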

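Lazy deletion
(figure: find(58), delete(89), find(58), insert(99), find(58) on a table whose cells hold a value and a state)
- each cell stores a value and a state: a (active) or d (deleted); an empty cell holds value -1
- delete(89) changes that cell's state from a to d instead of emptying it, so find(58) can still probe past it
- insert criterion: value == -1 OR state == d (an empty or deleted cell can be reused)
- continue-search criterion: value != -1 (keep probing past active and deleted cells)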
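Lazy deletion can be sketched as follows, assuming int keys, h(x) = x % size, and an explicit EMPTY state in place of the slide's value == -1 marker. The class LazyDeletionTable and its methods are hypothetical names.

```java
// Sketch of lazy deletion for linear probing (int keys, h(x) = x % size).
// A deleted slot is marked DELETED rather than emptied, so searches keep
// probing past it; inserts may reuse it.
class LazyDeletionTable {
    enum State { EMPTY, ACTIVE, DELETED }

    private final int[] keys;
    private final State[] state;

    LazyDeletionTable(int size) {
        keys = new int[size];
        state = new State[size];
        java.util.Arrays.fill(state, State.EMPTY);
    }

    private int home(int key) { return Math.floorMod(key, keys.length); }

    // Search stops only at an EMPTY slot (or after a full cycle).
    int indexOf(int key) {
        int i = home(key);
        for (int n = 0; n < keys.length && state[i] != State.EMPTY; n++) {
            if (state[i] == State.ACTIVE && keys[i] == key) return i;
            i = (i + 1) % keys.length;
        }
        return -1;
    }

    boolean find(int key) { return indexOf(key) != -1; }

    // Insert into the first EMPTY or DELETED slot, unless the key is present.
    void insert(int key) {
        if (find(key)) return;
        int i = home(key);
        for (int n = 0; n < keys.length; n++) {
            if (state[i] != State.ACTIVE) {
                keys[i] = key;
                state[i] = State.ACTIVE;
                return;
            }
            i = (i + 1) % keys.length;
        }
        throw new IllegalStateException("table full");
    }

    // Mark the slot DELETED instead of EMPTY so probe chains stay intact.
    void delete(int key) {
        int i = indexOf(key);
        if (i != -1) state[i] = State.DELETED;
    }

    State stateAt(int i) { return state[i]; }
    int keyAt(int i) { return keys[i]; }
}
```

After inserting 89, 18, 49, 58, 9 into a table of size 10 and deleting 89, find(58) still succeeds because slot 9 is DELETED rather than EMPTY; a later insert(99) reuses slot 9.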

Linear probing performance
Ideal performance depends on the fraction of the table that is full:
- k items in a table of size n
- probability of an insertion collision: $k/n = p$
- average probes to find a free space: $n/(n-k)$, or $1/(1-p)$
e.g. table half full: 2 probes
BUT…

Linear probing performance
Linear probing for insertion produces primary clustering:
- average probes for an insertion: $\tfrac{1}{2}\left(1 + (1-p)^{-2}\right)$
e.g. table half full: 2.5 probes

Illustration of primary clustering in linear probing (b) versus no clustering (a) and the less significant secondary clustering in quadratic probing (c). Long lines represent occupied cells, and the load factor is 0.7. (Figure: Weiss, Data Structures & Problem Solving using JAVA/2E, © 2002 Addison Wesley)

Linear probing performance

Clustering
- primary clustering comes from linear probing
- solution: alternative probing actions
  - e.g. quadratic probing
  - constraint: minimize the computation needed for each probe

Clustering
- primary clustering: linear probing
- secondary clustering: quadratic probing (different probe sequences from different home indices, but keys with the same home index share one sequence)
- even better: different probe sequences for different keys at the same index: secondary hashing

Probing comparison (figure: linear, non-linear, and secondary-hash probe sequences in the same table)

Secondary hashing
- the hash function determines the initial index
- a secondary hash function determines the step size for probes after a collision
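The three probing strategies can be compared by listing the first few indexes each one tries. This is a sketch assuming a table of size n = 11, a home index h, and, for double hashing, a key-dependent step from a second hash function (assumed to satisfy 1 <= step < n). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch comparing the first few probe indexes produced by three strategies
// for a table of size n. h is the home index; for double hashing, step is the
// key-dependent step from a second hash function (assumed 1 <= step < n).
class ProbeSequences {
    static List<Integer> linear(int h, int n, int count) {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < count; i++) seq.add((h + i) % n);
        return seq;
    }

    static List<Integer> quadratic(int h, int n, int count) {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < count; i++) seq.add((h + i * i) % n);
        return seq;
    }

    static List<Integer> doubleHash(int h, int step, int n, int count) {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < count; i++) seq.add((h + i * step) % n);
        return seq;
    }

    public static void main(String[] args) {
        int n = 11;                                  // hypothetical prime table size
        System.out.println(linear(3, n, 5));         // [3, 4, 5, 6, 7]
        System.out.println(quadratic(3, n, 5));      // [3, 4, 7, 1, 8]
        System.out.println(doubleHash(3, 4, n, 5));  // [3, 7, 0, 4, 8]
        System.out.println(doubleHash(3, 2, n, 5));  // [3, 5, 7, 9, 0]
    }
}
```

Any two keys with home index 3 follow the same linear or quadratic sequence, which is secondary clustering. With double hashing, two keys at index 3 whose second hashes give steps 4 and 2 diverge right after the first probe.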

Table class (Main, p. 571)

    public class Table {
        private int manyItems;
        private Object[] keys;
        private Object[] data;
        private boolean[] hasBeenUsed;
        …

constructor

    public Table(int capacity) {
        if (capacity <= 0)
            throw new IllegalArgumentException("Capacity must be positive");
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];   // manyItems starts at 0 by default
    }

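search for an object by key

    public boolean containsKey(Object key) {
        return findIndex(key) != -1;
    }

    // Probe from hash(key) until the key is found, a never-used slot is
    // reached, or every slot has been examined. Slots emptied by remove()
    // keep hasBeenUsed == true, so the search continues past them.
    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i]))
                return i;
            count++;
            i = nextIndex(i);
        }
        return -1;
    }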

wrap-around indexing

    // Step to the next slot, wrapping from the last index back to 0.
    private int nextIndex(int i) {
        if (i + 1 == data.length)
            return 0;
        else
            return i + 1;
    }

get an object

    // Return the object stored for key, or null if the key is absent.
    public Object get(Object key) {
        int index = findIndex(key);
        if (index == -1)
            return null;
        else
            return data[index];
    }

insert a key and object

    public Object put(Object key, Object element) {
        int index = findIndex(key);
        Object answer;
        if (index != -1) {                      // key present: replace its object
            answer = data[index];
            data[index] = element;
            return answer;
        } else if (manyItems < data.length) {   // new key and object
            index = hash(key);
            while (keys[index] != null)         // first free slot (never used or removed)
                index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            manyItems++;
            return null;
        } else {                                // table is full
            throw new IllegalStateException("Table is full.");
        }
    }

remove a key and object

    public Object remove(Object key) {
        int index = findIndex(key);
        Object answer = null;
        if (index != -1) {
            answer = data[index];
            keys[index] = null;        // hasBeenUsed[index] stays true (lazy deletion)
            data[index] = null;
            manyItems--;
        }
        return answer;
    }
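The Table methods above can be assembled into one compilable class, as sketched below. The slides do not show hash(), so this sketch assumes `Math.floorMod(key.hashCode(), data.length)`; keys must be non-null. Usage: with Integer keys 89, 18, 49, 58 in a table of capacity 10, removing 89 leaves its hasBeenUsed flag set, so get(58) still probes past the vacated slot.

```java
// Sketch assembling the Table methods above into one compilable class.
// Assumption: hash() is not shown on the slides; here it is
// Math.floorMod(key.hashCode(), data.length). Keys must be non-null.
class Table {
    private int manyItems;
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;   // stays true after remove()

    Table(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("Capacity must be positive");
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    private int hash(Object key) { return Math.floorMod(key.hashCode(), data.length); }

    private int nextIndex(int i) { return (i + 1 == data.length) ? 0 : i + 1; }

    // Probe until a never-used slot; removed slots do not stop the search.
    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i])) return i;
            count++;
            i = nextIndex(i);
        }
        return -1;
    }

    boolean containsKey(Object key) { return findIndex(key) != -1; }

    Object get(Object key) {
        int index = findIndex(key);
        return (index == -1) ? null : data[index];
    }

    Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {                 // existing key: replace its data
            Object answer = data[index];
            data[index] = element;
            return answer;
        }
        if (manyItems == data.length) throw new IllegalStateException("Table is full.");
        index = hash(key);
        while (keys[index] != null) index = nextIndex(index);  // reuses removed slots
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }

    Object remove(Object key) {
        int index = findIndex(key);
        if (index == -1) return null;
        Object answer = data[index];
        keys[index] = null;                // hasBeenUsed[index] is left true
        data[index] = null;
        manyItems--;
        return answer;
    }
}
```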

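Changing probe strategy: double hash

    // hash1 gives the start index; hash2 gives the step size, which must
    // lie in 1..data.length-1 (and share no factor with data.length)
    // so that the probe can reach every slot.
    private int findIndex(Object key) {
        int count = 0;
        int i = hash1(key);
        int p = hash2(key);
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i]))
                return i;
            count++;
            i = nextIndex(i, p);
        }
        return -1;
    }

    private int nextIndex(int i, int p) {
        return (i + p) % data.length;
    }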

Picking good hash strategies
- division hash functions:
  - a prime table size (n) is required
  - index is hashCode % n
  - stepSize is 1 + hashCode % (n-2) (Knuth: best if (n-2) is also prime)
- mid-square: hashCode², take the ‘middle’ digits
- multiplicative: hashCode * r (0 < r < 1), take the fraction digits
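The three schemes can be sketched as small functions on a non-negative int code. The class HashStrategies, the choice of how many middle digits to keep, and the constant GOLDEN are illustrative assumptions; (√5 − 1)/2 is a commonly suggested r for the multiplicative method.

```java
// Sketch of the division, mid-square and multiplicative schemes,
// applied to a non-negative int code. Parameter choices are illustrative.
class HashStrategies {
    // A commonly suggested multiplier r for the multiplicative method.
    static final double GOLDEN = (Math.sqrt(5) - 1) / 2;

    // Division: table size n should be prime; index = code % n.
    static int divisionIndex(int code, int n) { return code % n; }

    // Double-hashing step size: 1 + code % (n - 2), so it is never 0.
    static int stepSize(int code, int n) { return 1 + code % (n - 2); }

    // Mid-square: square the code, keep `digits` decimal digits from the middle
    // (digits is assumed to be at most 9 so the result fits in an int).
    static int midSquare(int code, int digits) {
        String sq = Long.toString((long) code * code);   // square without int overflow
        if (sq.length() <= digits) return Integer.parseInt(sq);
        int start = (sq.length() - digits) / 2;          // centre the window
        return Integer.parseInt(sq.substring(start, start + digits));
    }

    // Multiplicative: fractional part of code * r, scaled to 0..n-1.
    static int multiplicative(int code, double r, int n) {
        double product = code * r;
        double frac = product - Math.floor(product);     // keep only the fraction
        return (int) (frac * n);
    }
}
```

For example, 13 and 11 are both prime, so with n = 13 a code of 100 gives index 100 % 13 = 9 and step 1 + 100 % 11 = 2. For mid-square, 4567² = 20857489, whose middle four digits are 8574.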

External Hashing (Chaining)
- array of linked lists of objects
- for a map, the objects contain map entry pairs
(diagram: key → hash function → index 0, 1, 2, 3, … → linked list of data pairs)

External Hashing (Chaining)
- less sensitive to load factor
- more memory accesses (following list links)
- easier to manage
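External hashing can be sketched as an array of linked lists of key/value pairs. This is a minimal sketch assuming non-null keys and a fixed number of buckets; the names ChainedMap and Entry are hypothetical. A collision simply lengthens one list, and removal unlinks a node without any lazy-deletion marker.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Sketch of external hashing (separate chaining): each array slot holds a
// linked list of key/value entries that hashed to that index.
class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry<K, V>>> buckets;
    private int size;

    ChainedMap(int capacity) {
        buckets = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) buckets.add(new LinkedList<>());
    }

    private LinkedList<Entry<K, V>> bucketFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    V put(K key, V value) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) { V old = e.value; e.value = value; return old; }
        }
        bucketFor(key).add(new Entry<>(key, value));  // collision: extend the list
        size++;
        return null;
    }

    V get(Object key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Deletion just unlinks the entry from its list.
    V remove(Object key) {
        Iterator<Entry<K, V>> it = bucketFor(key).iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (e.key.equals(key)) { it.remove(); size--; return e.value; }
        }
        return null;
    }

    int size() { return size; }
}
```

Because entries live outside the array, the number of entries may exceed the number of buckets (load factor above 1); lists just grow longer.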

Comparison of Hashing Performance

Analysis of performance
Linear probing:
- $\tfrac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$ comparisons for a successful search, where $\alpha$ is the load factor (Knuth)
- assumptions: uniform hashing, no deletions
- e.g., 1365 entries in a table of 1709: $\alpha \approx 0.80$, expect 3 comparisons

Analysis of performance
Double hashing:
- $-\ln(1-\alpha)/\alpha$ comparisons for a successful search, where $\alpha$ is the load factor (Knuth)
- assumptions: uniform hashing, no deletions
- e.g., 1365 entries in a table of 1709: $\alpha \approx 0.80$, expect 2 comparisons

Analysis of performance
Chained hashing:
- $1 + \alpha/2$ comparisons for a successful search, where $\alpha$ is the load factor
- assumptions: uniform hashing
- e.g., 1365 entries in a table of 1709: $\alpha \approx 0.80$, expect 1.4 comparisons
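The three successful-search formulas can be evaluated directly to check the worked examples. This is a small sketch (the class name ProbeEstimates is hypothetical) that uses the exact load factor 1365/1709 ≈ 0.7987 rather than the rounded 0.80.

```java
// Sketch evaluating the successful-search formulas above (uniform hashing,
// no deletions) for a given load factor alpha.
class ProbeEstimates {
    static double linear(double alpha)        { return 0.5 * (1 + 1 / (1 - alpha)); }
    static double doubleHashing(double alpha) { return -Math.log(1 - alpha) / alpha; }
    static double chained(double alpha)       { return 1 + alpha / 2; }

    public static void main(String[] args) {
        double alpha = 1365.0 / 1709;   // the slides' example, about 0.80
        System.out.printf("alpha=%.3f linear=%.2f double=%.2f chained=%.2f%n",
                alpha, linear(alpha), doubleHashing(alpha), chained(alpha));
    }
}
```

With the exact load factor the results come out at roughly 3, 2 and 1.4 comparisons, matching the rounded figures on the slides.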

Hash table summary
- hash table: an array
- computed access into the array based on the key
- n-to-1 relation of keys to indexes
  - collisions
- collision resolution:
  - open addressing (linear probing)
  - double hashing
  - chained hashing

Java Collections
Interfaces:
- Collection
  - List
  - Queue
  - Set
    - SortedSet
- Map
  - SortedMap
Implementations:
- array (resizable)
- linked list
- balanced search tree
- hash table
- hash table plus linked list

Hashed implementations
- HashSet implements Set
- HashMap implements Map
- constructor parameters: initial capacity and load factor
- both parameters affect performance
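The hashed java.util classes can be used as sketched below. HashMap(int initialCapacity, float loadFactor), HashSet(int initialCapacity, float loadFactor) and LinkedHashMap are standard java.util APIs; the class HashedCollectionsDemo, its methods, and the student-number entries are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch of java.util hashed collections and their constructor parameters.
class HashedCollectionsDemo {
    // HashMap(initialCapacity, loadFactor): the map is resized and all
    // entries rehashed once size exceeds capacity * loadFactor.
    // 0.75f is HashMap's default load factor.
    static Map<String, String> directory() {
        Map<String, String> m = new HashMap<>(2000, 0.75f);
        m.put("s1234567", "Ann");   // hypothetical entries
        m.put("s7654321", "Bo");
        return m;
    }

    // HashSet takes the same two constructor parameters.
    static Set<String> seen() {
        Set<String> s = new HashSet<>(64, 0.5f);
        s.add("s1234567");
        return s;
    }

    // LinkedHashMap: hash table plus a linked list that keeps insertion order.
    static String orderedKeys() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("c", 3);
        m.put("a", 1);
        m.put("b", 2);
        return String.join(",", m.keySet());   // "c,a,b"
    }
}
```

A larger initial capacity avoids repeated rehashing when the number of entries is known in advance; a lower load factor trades memory for shorter probe or chain lengths.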