

COMP 103 Hashing

2 RECAP-TODAY
RECAP
• Bitmaps are a fast way to implement Sets of integers, characters, etc.
TODAY
• Hashing is a similar idea
• Detecting collisions
• Dealing with collisions:
  - Buckets/chaining
  - Probing

3 O(1) Sets with big values?
We need a way to compute an array index for an object:
    add("Show me the book that refers to all the books that don't reference themselves")
“Hashing”: the number is the “hash code” of the object.
[Figure: the hash function maps the sentence to index 581 of a ✔/✗ array, labelled N.]

4 O(1) Sets with big values?
• But there are too many possible sentences!!
• Suppose the hash function always produces a number between (say) 0 and 10,000…
  ⇒ some sentences must end up with the same number!!
  ⇒ “Collision”
[Figure: “Show me the book th…” and “The barber shaves everyone who doesn’t shave themselves” both HASH to the same cell of the ✔/✗ array.]

5 Detecting collisions
• Store the item in the array, instead of just a boolean
• When asked about an item, or adding one
  - need to check if there is a different item there already.
• Questions:
  1. How do you compute the hash code?
  2. What do you do about collisions?
[Figure: “Show me the book th…” is stored in an array cell; “The barber shaves everyone…” HASHes to that same cell.]

6 A HashSet
    private E[] data;

    public boolean contains(E value) {
        int hash = Math.abs(value.hashCode() % data.length);
        if (data[hash] == null) return false;
        else if (data[hash].equals(value)) return true;
        else ... // Collision !!!
    }

    public boolean add(E value) {
        int hash = Math.abs(value.hashCode() % data.length);
        if (data[hash] == null) {
            data[hash] = value;
            size++;
            return true;
        }
        else if (data[hash].equals(value)) return false;
        else ... // Collision !!!
    }
• Cost is independent of the number of items in the Set
• Cost is determined by the cost of hashCode()
• The hash computation must be consistent (the same in contains and add)

7 Computing Hash Codes
Wish list:
• Should produce an integer
• Should distribute the hash codes evenly through the range
  - minimises collisions
• Should be fast to compute
• Should take account of all components of the object
• Must be consistent with equals() !!!
  - two items that are equal must have the same hash value.
Can we avoid clashes altogether? That would be perfect!

8 A Simple Hash Function for Strings
• We could add up the codes of all the characters:
    private int hash(String value) {
        int hash = 0;
        for (int i = 0; i < value.length(); i++)
            hash += value.charAt(i);
        return hash;
    }
So why is this not very good?
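One way to answer the question is to run the additive hash on a few course codes. The sketch below wraps the slide's function in a hypothetical class (`AdditiveHashDemo` is an illustrative name, not part of the lecture code). Because addition ignores character order, any two codes built from the same characters get the same value, and seven-character codes can only spread over a narrow range of sums.

```java
// Illustrative sketch: the additive hash from the slide, in a hypothetical wrapper class.
class AdditiveHashDemo {
    /** Sums the character codes of the string; the order of characters is ignored. */
    static int hash(String value) {
        int hash = 0;
        for (int i = 0; i < value.length(); i++) {
            hash += value.charAt(i);
        }
        return hash;
    }
}
```

For instance, `AdditiveHashDemo.hash("DEAF102")` and `AdditiveHashDemo.hash("DEAF201")` both give 419, since the two codes contain the same characters in a different order.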

9 Example: Hashing course codes
• Course codes: four letters and three digits
    418 ← DEAF…
    419 ← DEAF102 DEAF201
     ⋮
    429 ← BBSC201 MDIA…
    430 ← ECHI410 MDIA102 MDIA…
    431 ← ECHI303 JAPA111 JAPA201 MDIA202 MDIA220 MDIA…
    432 ← ARCH101 ASIA101 BBSC231 BBSC303 BBSC321 CHEM201 ECHI403 ECHI412
          JAPA112 JAPA211 JAPA301 MDIA203 MDIA302 MDIA320
     ⋮
    450 ← ANTH412 ARCH389 ARTH111 BIOL228 BIOL327 BIOL372 CHEM489 COML304
          COML403 COML421 COMP102 COMP201 CRIM313 CRIM421 DESN215 DESN233
          ECON328 ECON409 ECON418 ECON508 EDUC449 EDUC458 EDUC548 EDUC557
          ENGL228 ENGL408 ENGL426 ENGL435 ENGL444 ENGL453 FREN124 FREN331
          FREN403 FREN412 GEOL362 GEOL407 GERM214 GERM403 GERM412 INFO213
          INFO312 INFO402 ITAL206 ITAL215 LALS501 LATI404 LING224 LING323
          LING404 MAOR102 MARK304 MARK403 MATH206 MATH314 MATH323 MATH431
          MOFI403 PHIL104 PHIL203 PHIL302 PHIL320 PHIL401 PHIL410 RELI321
          RELI411 SAMO101
     ⋮
i.e. a lot of collisions!

10 Better Hash Functions
• Make the contribution of each character depend on its position:
    private int hash(String course) {
        int k = 257;
        int hash = 0;
        for (int i = 0; i < course.length(); i++)
            hash = hash*k + course.charAt(i);
        return hash;
    }
For a seven-character code this computes
    $k^6 \times c_0 + k^5 \times c_1 + k^4 \times c_2 + k^3 \times c_3 + k^2 \times c_4 + k^1 \times c_5 + c_6$
(it is best to use a prime number for the constant)
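A small sketch of the positional hash in use, assuming the slide's constant k = 257. The class name `PositionalHashDemo` and the helper `indexFor` are illustrative additions. Since `int` arithmetic wraps around on overflow, the raw hash of a seven-character code may be negative, so the sketch reduces it to a table index with `Math.floorMod` rather than `%`.

```java
// Illustrative sketch: position-dependent string hash with k = 257, as on the slide.
class PositionalHashDemo {
    private static final int K = 257;

    /** Computes k^(n-1)*c0 + ... + k*c(n-2) + c(n-1), wrapping on int overflow. */
    static int hash(String course) {
        int hash = 0;
        for (int i = 0; i < course.length(); i++) {
            hash = hash * K + course.charAt(i);
        }
        return hash;
    }

    /** Hypothetical helper: maps the (possibly negative) hash into 0..tableSize-1. */
    static int indexFor(String course, int tableSize) {
        return Math.floorMod(hash(course), tableSize);
    }
}
```

Unlike the additive version, `hash("DEAF102")` and `hash("DEAF201")` now differ, because swapping two characters changes which power of k each one is multiplied by.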

11 Perfect Hash Functions
• Perfect hash function gives no collisions for a given data set!
• Example - for VUW courses (in …):
    private int hash(String course) {
        int hash = 0;
        for (int i = 0; i < course.length(); i++)
            hash = (hash * 51 + course.charAt(i)) % 72201;
        return hash;
    }
• Building a perfect hash function:
  - very difficult
  - very specific to a particular set of possible values
  - only useful in very specialised circumstances

12 Designing Hash Functions
• Integers:
  - use the value itself (mod table size)
• Strings:
  - use a standard method
• Objects with several fields, Arrays, Lists
  - combine the hash values of the individual fields/cells, each multiplied by a factor that depends on its position (like String)
• Sets (where order doesn’t matter)
  - ?
• Look up a text book
  - This is a specialised and well studied topic!
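The two combination rules above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the multiplier 31 is a common choice rather than something the slides prescribe, and elements are assumed to be non-null. For the unordered case, summing the element hash codes is one possible answer to the "?" (it is the rule `java.util.AbstractSet.hashCode` is documented to use), since a sum does not depend on order.

```java
import java.util.Collection;

// Illustrative sketch: combining component hash codes, ordered and unordered.
class CombineHashDemo {
    /** Ordered combination: each component's weight depends on its position. */
    static int combine(int... componentHashes) {
        int hash = 0;
        for (int h : componentHashes) {
            hash = 31 * hash + h; // 31 is an assumed multiplier, chosen for illustration
        }
        return hash;
    }

    /** Unordered combination: a plain sum, so iteration order does not matter. */
    static int unorderedHash(Collection<?> items) {
        int hash = 0;
        for (Object item : items) {
            hash += item.hashCode(); // assumes no null elements
        }
        return hash;
    }
}
```

With these definitions `combine(1, 2)` is 33 while `combine(2, 1)` is 63, whereas `unorderedHash` gives 195 for both `["a", "b"]` and `["b", "a"]`.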

13 Java and hashCode
• All objects have a hashCode method and an equals method, so:
  - you can call equals on any object
  - and you can put any object into a HashSet, HashMap, …
• Many predefined objects (e.g. String) have good equals and hashCode methods defined.
• The default equals method:
  - are the references identical? i.e. equals is ==
  - if this is not what you want, define your own equals method
• The default hashCode (e.g. for user defined classes):
  - returns an integer based on the reference (pointer value)
• So… if you redefine equals, you should redefine hashCode too!
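A minimal sketch of redefining both methods together. `CourseCode` and its fields are hypothetical names invented for this example; `Objects.hash` is the standard helper in `java.util.Objects`. The point is that two separately created objects with equal fields compare equal and also produce the same hash code, so a HashSet treats them as the same element.

```java
import java.util.Objects;

// Illustrative sketch: a user-defined class that overrides equals and hashCode together.
final class CourseCode {
    private final String subject; // e.g. "COMP"
    private final int number;     // e.g. 103

    CourseCode(String subject, int number) {
        this.subject = Objects.requireNonNull(subject);
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;                 // same reference
        if (!(other instanceof CourseCode)) return false;
        CourseCode that = (CourseCode) other;
        return number == that.number && subject.equals(that.subject);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals compares, so equal objects hash alike.
        return Objects.hash(subject, number);
    }
}
```

If only equals were overridden, two equal `CourseCode` objects would usually get different default hash codes, and a lookup in a HashSet could miss an element that is actually present.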

14 Dealing with Collisions
• Two approaches:
  - Use a collection at each place (“buckets” or “chaining”)
  - Look for an empty place in the hashtable (“probing” or “open addressing”)
[Figure: “The barber shaves everyone…” and “Show me the book th…” HASH to the same cell of the array; one placement is marked ✔ and the other ✘.]

15 Collisions: chaining / buckets
• Store a Set in each cell: hash value → which set
• Performance?
  - if the array is of size k, each subset will be about 1/k-th of size().
  - cost ≈ cost(hashCode) + cost(subset)
• What kind of Sets?
• This is what Java's HashMap does.
• If the sets get too big… just rehash!
[Figure: an array of buckets holding the words ant, fox, hen, dog, bee, kea, cow, elk, owl, pig, sow, tui, ape, bat, bug, cat, eel, gnu, jay, nit, ray, yak, cod, roe.]
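The chaining idea can be sketched as below. This is a simplified illustration, not how `java.util.HashMap` is actually written: the class name `ChainedHashSet`, the use of an `ArrayList` per bucket, and the rule "rehash when the average bucket holds two items" are all assumptions made for the example. The initial capacity is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a hash set that resolves collisions with one bucket (list) per cell.
class ChainedHashSet<E> {
    private List<E>[] buckets;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int capacity) {
        buckets = (List<E>[]) new List[capacity]; // capacity assumed >= 1
    }

    private static int indexFor(Object value, int length) {
        return Math.floorMod(value.hashCode(), length); // never negative
    }

    boolean contains(E value) {
        List<E> bucket = buckets[indexFor(value, buckets.length)];
        return bucket != null && bucket.contains(value); // search only one bucket
    }

    boolean add(E value) {
        if (contains(value)) return false;
        if (size >= 2 * buckets.length) rehash(); // assumed threshold: buckets too big
        insert(buckets, value);
        size++;
        return true;
    }

    int size() {
        return size;
    }

    private static <T> void insert(List<T>[] table, T value) {
        int i = indexFor(value, table.length);
        if (table[i] == null) table[i] = new ArrayList<>();
        table[i].add(value);
    }

    /** Doubles the array and redistributes every item into its new bucket. */
    @SuppressWarnings("unchecked")
    private void rehash() {
        List<E>[] old = buckets;
        buckets = (List<E>[]) new List[old.length * 2];
        for (List<E> bucket : old) {
            if (bucket == null) continue;
            for (E item : bucket) insert(buckets, item);
        }
    }
}
```

The cost of `contains` here matches the slide's estimate: one hashCode call plus a search of a single bucket, which stays short as long as rehashing keeps the buckets small.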

16 Linear Probing
Hash value tells us where to start looking.
• if value.hashCode() → p
  - start at index p
  - if cell is used, try p+1, p+2, p+3 …
  - wrap round to 0 at the end of the array
• “the division method”:
    hash = (name[0]+name[1]) % …
[Figure: the names Sam, Steve, Stig, Stu, Sven and Sun placed in the table, with labels (3) (2) (5) (4) (2).]
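A minimal sketch of linear probing for a set of Strings. The class name `ProbingHashSet` is hypothetical, and the starting index is taken from `String.hashCode()` rather than the slide's two-character division method, whose table size is not given. The table has a fixed size and does not support removal, which would need extra care so that later probes are not cut short.

```java
// Illustrative sketch: open addressing with linear probing in a fixed-size table.
class ProbingHashSet {
    private final String[] data;
    private int size = 0;

    ProbingHashSet(int capacity) {
        data = new String[capacity]; // capacity assumed >= 1
    }

    /**
     * Starts at the hashed index p and tries p, p+1, p+2, ..., wrapping to 0.
     * Returns the index holding the value, or the first empty cell found,
     * or -1 if every cell is occupied by some other value.
     */
    private int findSlot(String value) {
        int p = Math.floorMod(value.hashCode(), data.length);
        for (int step = 0; step < data.length; step++) {
            int i = (p + step) % data.length; // wrap round at the end of the array
            if (data[i] == null || data[i].equals(value)) return i;
        }
        return -1;
    }

    boolean contains(String value) {
        int i = findSlot(value);
        return i >= 0 && data[i] != null;
    }

    boolean add(String value) {
        int i = findSlot(value);
        if (i < 0) throw new IllegalStateException("table is full");
        if (data[i] != null) return false; // already present
        data[i] = value;
        size++;
        return true;
    }

    int size() {
        return size;
    }
}
```

Adding Sam, Steve, Stig, Stu, Sven and Sun to a table of seven cells places each name either at its hashed index or in the next free cell after it, and `contains` retraces the same probe sequence to find it again.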