CSE 373: Data Structures and Algorithms

Slides:



Advertisements
Similar presentations
Hash Tables and Sets Lecture 3. Sets A set is simply a collection of elements Unlike lists, elements are not ordered Very abstract, general concept with.
Advertisements

Hashing as a Dictionary Implementation
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Maps, Dictionaries, Hashtables
Lecture 10 Sept 29 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
Lecture 11 March 5 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
CSE 373 Data Structures and Algorithms Lecture 18: Hashing III.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University1 Hashing CS 202 – Fundamental Structures of Computer Science II Bilkent.
Search Algorithm Lecture Chapter 15-2 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation : Search Methods for Simple Selection:
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Hashing as a Dictionary Implementation Chapter 19.
The Map ADT and Hash Tables. 2 The Map ADT  Map: An abstract data type where a value is "mapped" to a unique key  Need a key and a value to insert new.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
CS 206 Introduction to Computer Science II 11 / 16 / 2009 Instructor: Michael Eckmann.
Building Java Programs Bonus Slides Hashing. 2 Recall: ADTs (11.1) abstract data type (ADT): A specification of a collection of data and the operations.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
CSE 373: Data Structures and Algorithms Lecture 16: Hashing III 1.
Implementing the Map ADT.  The Map ADT  Implementation with Java Generics  A Hash Function  translation of a string key into an integer  Consider.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Building Java Programs Generics, hashing reading: 18.1.
TCSS 342, Winter 2006 Lecture Notes
COMP 53 – Week Eleven Hashtables.
CSE373: Data Structures & Algorithms Lecture 6: Hash Tables
CSCI 210 Data Structures and Algorithms
Recitation 13 Searching and Sorting.
Hashing CSE 2011 Winter July 2018.
Hashing Alexandra Stefan.
Week 8 - Wednesday CS221.
Hashing Alexandra Stefan.
Efficiency add remove find unsorted array O(1) O(n) sorted array
Hash Tables.
Road Map CS Concepts Data Structures Java Language Java Collections
Hash table another data structure for implementing a map or a set
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
Advanced Associative Structures
Building Java Programs
CSE 373: Data Structures and Algorithms
Dictionaries and Their Implementations
Searching Tables Table: sequence of (key,information) pairs
slides created by Marty Stepp and Hélène Martin
CSE 373: Data Structures and Algorithms
EE 422C Sets.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
slides adapted from Marty Stepp and Hélène Martin
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
TCSS 342, Winter 2005 Lecture Notes
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
CSE 373: Data Structures and Algorithms
Advanced Implementation of Tables
CSE 143 Lecture 25 Set ADT implementation; hashing read 11.2
CSE 373 Data Structures and Algorithms
CSE 373 Separate chaining; hash codes; hash maps
slides created by Marty Stepp
Building Java Programs
Hashing based on slides by Marty Stepp
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
slides created by Marty Stepp and Hélène Martin
Data Structures and Algorithm Analysis Hashing
CSE 373 Set implementation; intro to hashing
CSE 373: Data Structures and Algorithms
Presentation transcript:

CSE 373: Data Structures and Algorithms Lecture 14: Hashing

Set ADT set: A collection that does not allow duplicates We don't think of a set as having indexes; we just add things to the set in general and don't worry about order basic set operations: insert: Add an element to the set (order doesn't matter). remove: Remove an element from the set. search: Efficiently determine if an element is a member of the set. set.contains("to") true set "the" "of" "from" "to" "she" "you" "him" "why" "in" "down" "by" "if" set.contains("be") false

Implementing Set ADT Insert Remove Search Unsorted array (1) (n) (log(n)+n) (log(n) + n) (log(n)) Linked list BST (if balanced) O(log n)

A different tactic How do you check to see if a word is in the dictionary? linear search? binary search? A – Z tabs?

Hash tables table maintains b different "buckets" buckets are numbered 0 to b - 1 hash function maps elements to value in 0 to b – 1 operations use hash to determine which bucket an element belongs in and only searches/modifies this one bucket … b - 1 hash func. h(element) elements (e.g., strings) hash table

Hashing, hash functions The idea: somehow we map every element into some index in the array ("hash" it); this is its one and only place that it should go Lookup becomes constant-time: simply look at that one slot again later to see if the element is there insert, remove, search all become O(1) ! For now, let's look at integers (int) a "hash function" h for int is trivial: store int i at index i (a direct mapping) if i >= array.length, store i at index (i % array.length) h(i) = i % array.length

Simple Integer Hash Functions elements = integers TableSize = 10 h(i) = i % 10 Insert: 7, 18, 41, 34 1 2 3 4 5 6 7 8 9 K mod 10 ( K % 10)

Simple Integer Hash Functions elements = integers TableSize = 10 h(i) = i % 10 Insert: 7, 18, 41, 34 1 2 3 4 5 6 7 8 9 K mod 10 ( K % 10)

Simple Integer Hash Functions elements = integers TableSize = 10 h(i) = i % 10 Insert: 7, 18, 41, 34 1 2 3 4 5 6 7 8 9 K mod 10 ( K % 10)

Simple Integer Hash Functions elements = integers TableSize = 10 h(i) = i % 10 Insert: 7, 18, 41, 34 1 2 3 4 5 6 7 8 18 9 K mod 10 ( K % 10)

Simple Integer Hash Functions elements = integers TableSize = 10 h(i) = i % 10 Insert: 7, 18, 41, 34 1 2 3 4 5 6 7 8 18 9 K mod 10 ( K % 10)

Simple Integer Hash Functions elements = integers TableSize = 10 h(i) = i % 10 Insert: 7, 18, 41, 34 1 41 2 3 4 5 6 7 8 18 9 K mod 10 ( K % 10)

Simple Integer Hash Functions elements = integers TableSize = 10 h(i) = i % 10 Insert: 7, 18, 41, 34 1 41 2 3 4 5 6 7 8 18 9 K mod 10 ( K % 10)

Simple Integer Hash Functions elements = integers TableSize = 10 h(i) = i % 10 Insert: 7, 18, 41, 34 1 41 2 3 4 34 5 6 7 8 18 9 K mod 10 ( K % 10)

Hash function example 1 41 2 3 4 34 5 6 7 8 18 9 h(i) = i % 10 1 41 2 3 4 34 5 6 7 8 18 9 h(i) = i % 10 result is constrained to a range distributes keys over a range result is stable constant-time lookup: just look at i % 10 again later We lose all ordering information: getMin, getMax, removeMin, removeMax the various ordered traversals printing items in sorted order

Hash function for strings elements = Strings let's view a string by its letters: String s : s0, s1, s2, …, sn-1 how do we map a string into an integer index? (how do we "hash" it?) one possible hash function: treat first character as an int, and hash on that h(s) = s0 % TableSize Is this a good hash function? When will strings collide?

Better string hash functions view a string by its letters: String s : s0, s1, s2, …, sn-1 another possible hash function: treat each character as an int, sum them, and hash on that h(s) = % TableSize What's wrong with this hash function? When will strings collide? a third option: perform a weighted sum of the letters, and hash on that h(s) = % TableSize

Hash collisions 1 21 2 3 4 34 5 6 7 8 18 9 collision: the event that two hash table elements map into the same slot in the array example: add 7, 18, 41, 34, then 21 21 hashes into the same slot as 41! 21 should not replace 41 in the hash table; they should both be there collision resolution: means for fixing collisions in a hash table

Chaining chaining: All keys that map to the same hash value are kept in a linked list 1 2 3 4 5 6 7 8 9 10 22 12 42 107

Load factor load factor: ratio of elements to capacity load factor = size / capacity = 5 / 10 = 0.5 1 2 3 4 5 6 7 8 9 10 22 12 42 107

Analysis of hash table search analysis of search, with chaining: unsuccessful:  the average length of a list at hash(i) successful: 1 + (/2) one node, plus half the avg. length of a list (not including the item)

Implementing Set with Hash Table Each Set entry adds an element to the table hash function will tell us where to put the element in the hash table Hash table organized for constant time insert, remove, and search

Implementing Set with Hash table public interface StringSet { public boolean add(String value); public boolean contains(String value); public void print(); public boolean remove(String value); public int size(); }

StringHashEntry public class StringHashEntry { public String data; // data stored at this node public StringHashEntry next; // reference to the next entry // Constructs a single hash entry. public StringHashEntry(String data) { this(data, null); } public StringHashEntry(String data, StringHashEntry next) { this.data = data; this.next = next;

StringHashSet class public class StringHashSet implements StringSet { private static final int DEFAULT_SIZE = 11; private StringHashEntry[] table; private int size; ... } Client code talks to the StringHashSet, not to the entry objects stored in it The array (table) is of StringHashEntry each element in the array is a linked list of elements that have the same hash

Set implementation: search public boolean contains(String value) { // figure out where value should be... int valuePosition = hash(value); // check to see if the value is in the set StringHashEntry temp = table[valuePosition]; while (temp != null) { if (temp.data.equals(value)) { return true; } temp = temp.next; // otherwise, the value was not found return false;

Set implementation: insert Similar structure to contains Calculate hash of new element Check if the element is already in the set Add the element to the front of the list that is at table[hash(value)]

Set implementation: insert public boolean add(String value) { int valuePosition = hash(value); // check to see if the value is already in the set StringHashEntry temp = table[valuePosition]; while (temp != null) { if (temp.data.equals(value)) { return false; } temp = temp.next; // add the value to the set StringHashEntry newEntry = new StringHashEntry(value, table[valuePosition]); table[valuePosition] = newEntry; size++; return true;