Hash Tables and Sets Lecture 3

Sets
- A set is simply a collection of elements
- Unlike lists, elements are not ordered
- Very abstract, general concept with broad usefulness:
  - The set of all Google search queries from the past 24 hours
  - The set of all photos with your face in them
  - The set of all files in a folder
- How are sets represented in computers? Consider the following problem:
  - We want to store a large set of approximately 10 million random numbers
  - The following operations are happening constantly:
    - Add: inserting a new number into the set
    - Delete: removing an existing element from the set
    - Lookup: checking if a new random number is in the set

Representing Sets
- Suppose we use an ArrayList for this heavily churning set:
  - Add, Delete, and Lookup are all O(n) (Add must first check that the number is not already in the set)
- Suppose the ArrayList is kept sorted:
  - Lookup is O(log n) using binary search
  - Add and Delete are still O(n), because elements have to be shifted
- Cleverer algorithms:
  - Self-balancing trees: Lookup, Add, and Delete are guaranteed O(log n)
  - Hash tables: Lookup, Add, and Delete are worst-case O(n), but O(1) on average
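The sorted-ArrayList option can be sketched as follows. This is only an illustration, assuming Integer elements; the class name ListBackedSet and its method names are hypothetical and not part of the lecture code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ListBackedSet {
    // Unsorted list: lookup has to scan the elements one by one, O(n).
    static boolean containsUnsorted(List<Integer> list, int x) {
        return list.contains(x);
    }

    // Sorted list: binary search finds the element in O(log n).
    static boolean containsSorted(List<Integer> sorted, int x) {
        return Collections.binarySearch(sorted, x) >= 0;
    }

    // Sorted insert: finding the slot is O(log n), but shifting later elements is O(n).
    static boolean addSorted(List<Integer> sorted, int x) {
        int pos = Collections.binarySearch(sorted, x);
        if (pos >= 0) {
            return false; // already present; a set holds no duplicates
        }
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        sorted.add(-pos - 1, x);
        return true;
    }

    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>();
        addSorted(sorted, 5);
        addSorted(sorted, 1);
        addSorted(sorted, 3);
        System.out.println(sorted + " contains 3? " + containsSorted(sorted, 3));
    }
}
```

Even with binary search, every insert may move up to n elements, which is why Add stays O(n).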

Using Buckets
- Let's go back to ArrayLists, but use a different approach:
  - Create 2 ArrayLists
  - Even numbers go in the first list
  - Odd numbers go in the second list
- Now, Add/Delete/Lookup take only half the work:
  - Check if the number is even or odd
  - Get the right ArrayList
  - Search through about 5 million entries instead of 10 million
- This is promising, but still O(n)

Using Buckets
- Yet another approach: instead of two ArrayLists, let's use 4
  - Multiples of 4 go in the first list; they have the property (x % 4) == 0
  - If (x % 4) == 1, then x goes in the second list
  - If (x % 4) == 2, then x goes in the third list
  - If (x % 4) == 3, then x goes in the fourth list
- Now, Add/Delete/Lookup take only ¼ as much work:
  - Calculate the number mod 4
  - Find the right list
  - Search through about 2.5 million elements instead of 10 million
- This is even better, but still O(n)
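The four-list scheme could look roughly like this in Java. The class name FourBucketSet is hypothetical. The sketch uses Math.floorMod rather than the % operator, an addition not in the slides, because % can return a negative result for negative numbers and that would not be a valid list index.

```java
import java.util.ArrayList;
import java.util.List;

class FourBucketSet {
    private final List<List<Integer>> buckets = new ArrayList<>();

    FourBucketSet() {
        for (int i = 0; i < 4; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // floorMod keeps the index in 0..3 even when x is negative.
    private List<Integer> bucketFor(int x) {
        return buckets.get(Math.floorMod(x, 4));
    }

    boolean add(int x) {
        List<Integer> bucket = bucketFor(x);
        if (bucket.contains(x)) {
            return false; // already in the set
        }
        bucket.add(x);
        return true;
    }

    boolean contains(int x) {
        return bucketFor(x).contains(x);
    }

    boolean remove(int x) {
        // Integer.valueOf selects remove(Object) instead of remove(int index).
        return bucketFor(x).remove(Integer.valueOf(x));
    }
}
```

Each operation only touches one of the four lists, so it scans about a quarter of the data, but the cost still grows linearly with the size of the set.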

Using Buckets
- Yet another approach: use 10 million buckets!
- If the numbers are truly randomly distributed, then:
  - Some buckets may be empty
  - Some buckets may have 2 or even 100 elements
  - On average, each bucket has close to 1 element
- Suddenly, Add/Delete/Lookup become very cheap: O(1)
- As long as we scale up the number of buckets to match the amount of data, we can maintain O(1) lookup
- This is a hash table!
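To get a feel for how evenly random numbers spread over the buckets, a small experiment along these lines may help. It is only a sketch: the class name BucketOccupancy, the seed, and the use of 1 million instead of 10 million numbers are arbitrary choices made here, and repeated random values are counted as separate elements.

```java
import java.util.Random;

class BucketOccupancy {
    // Distributes n pseudo-random ints over n buckets and returns each bucket's size.
    static int[] distribute(int n, long seed) {
        int[] counts = new int[n];
        Random rng = new Random(seed);
        for (int i = 0; i < n; i++) {
            // floorMod maps any int, including negatives, to a bucket in 0..n-1.
            counts[Math.floorMod(rng.nextInt(), n)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] counts = distribute(n, 42L);
        int empty = 0;
        int largest = 0;
        for (int c : counts) {
            if (c == 0) {
                empty++;
            }
            largest = Math.max(largest, c);
        }
        System.out.println("empty buckets:  " + empty);
        System.out.println("largest bucket: " + largest);
    }
}
```

With as many buckets as elements, the average bucket size is exactly 1 by construction; the interesting numbers are how many buckets stay empty and how large the fullest one gets.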

Hash Functions
- In our example, we were only storing integers
- We can use the same idea to store arbitrary data, as long as one thing is provided: a hash function
- What is a hash function?
  - A function that converts any data into an integer
  - This integer determines the bucket in which the data is stored
- The hash function must ensure a fairly even distribution in the table. More on this later.
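In Java, every object already provides a hash function through hashCode(). One common way to turn that integer into a bucket number, shown here as a sketch with the hypothetical names BucketIndex and indexFor, is to reduce it modulo the number of buckets:

```java
class BucketIndex {
    // Maps any non-null object's hash code to a bucket in the range 0..numBuckets-1.
    // floorMod is used because hashCode() may be negative.
    static int indexFor(Object element, int numBuckets) {
        return Math.floorMod(element.hashCode(), numBuckets);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(17, 10));   // an Integer hashes to its own value
        System.out.println(indexFor("a", 16));  // a String hashes its characters
    }
}
```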

Example Hash Function
- Suppose we wish to store a set of strings instead of integers
- We need a hash function
- Here's a simple one:
  - a = 1, b = 2, c = 3, …, z = 26
  - Sum the value of each letter
- "asdf".hashCode() = a + s + d + f = 1 + 19 + 4 + 6 = 30
- "asdf" goes in the 30th bucket
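Written out as code, the letter-sum hash might look like the sketch below. The class name LetterSumHash is hypothetical, and the sketch assumes lowercase English letters only, rejecting anything else.

```java
class LetterSumHash {
    // Sums letter positions in the alphabet: a = 1, b = 2, ..., z = 26.
    static int hash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("only lowercase a-z supported: " + c);
            }
            sum += c - 'a' + 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(hash("asdf")); // 1 + 19 + 4 + 6 = 30
    }
}
```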

Hash Collisions
- This hash function has some problems:
  - It only deals with English letters
    - We can solve this by using the ASCII or Unicode value of the character instead of its index in the English alphabet
  - It is prone to collisions
    - A hash collision is when two or more distinct values have the same hash code
- In the example hash function, all anagrams collide:
  - least = 57
  - steal = 57
  - stale = 57
- Therefore, this hash table would be very bad for storing sets of anagrams!
  - It would degenerate into a single ArrayList, because only one bucket would be used.
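The anagram problem is easy to reproduce. The sketch below compares the letter-sum hash with Java's built-in String.hashCode(), which is not the function from the lecture but a standard alternative that weights each character by its position. The class name AnagramCollisions is hypothetical.

```java
class AnagramCollisions {
    // The letter-sum hash from the example (lowercase a-z assumed).
    static int letterSum(String s) {
        int sum = 0;
        for (char c : s.toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"least", "steal", "stale"}) {
            // letterSum ignores letter order, so anagrams always collide;
            // String.hashCode multiplies by 31 per position, so order matters.
            System.out.println(word + ": letterSum=" + letterSum(word)
                    + ", hashCode=" + word.hashCode());
        }
    }
}
```

All three words get the same letter sum, while their hashCode() values differ, so a position-sensitive hash would spread these anagrams over different buckets.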

Generalizing
- What exactly is a hash table?
  - Given elements that have a hash function, hash tables are just arrays!
  - Each array element is an ArrayList in order to resolve collisions
- The number of buckets is proportional to the number of elements in the set
  - Exploit the time-memory tradeoff to get quick lookup times
  - The array is resized when the hash table gets too full
- Load factor: the ratio of filled hash table slots to total slots
  - The load factor is 0.0 when the hash table is empty and 1.0 when every bucket has at least one element
  - When the load factor reaches a certain value, 0.75 in our case, the array gets larger to maintain sparseness
- Hash tables can get much more complicated than this, but the fundamentals remain the same.
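Putting these pieces together, a hash set with ArrayList buckets and load-factor-based resizing could be sketched as follows. This is one possible design, not the lecture's code: the class name ChainedHashSet, the initial capacity of 8, and doubling on resize are assumptions made for this sketch, and null elements are not supported. The load factor follows the definition above (non-empty buckets divided by total buckets).

```java
import java.util.ArrayList;
import java.util.List;

class ChainedHashSet<T> {
    private static final double MAX_LOAD = 0.75;  // resize threshold from the text
    private List<List<T>> buckets = newTable(8);  // initial capacity: arbitrary choice
    private int size = 0;                         // number of stored elements
    private int filled = 0;                       // number of non-empty buckets

    private static <E> List<List<E>> newTable(int capacity) {
        List<List<E>> table = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            table.add(new ArrayList<>());
        }
        return table;
    }

    // Picks the bucket from the element's hash code; floorMod handles negative codes.
    private List<T> bucketFor(T element) {
        return buckets.get(Math.floorMod(element.hashCode(), buckets.size()));
    }

    public boolean add(T element) {
        List<T> bucket = bucketFor(element);
        if (bucket.contains(element)) {
            return false; // sets hold no duplicates
        }
        if (bucket.isEmpty()) {
            filled++;
        }
        bucket.add(element);
        size++;
        if ((double) filled / buckets.size() >= MAX_LOAD) {
            resize();
        }
        return true;
    }

    public boolean contains(T element) {
        return bucketFor(element).contains(element);
    }

    public boolean remove(T element) {
        List<T> bucket = bucketFor(element);
        if (!bucket.remove(element)) {
            return false;
        }
        size--;
        if (bucket.isEmpty()) {
            filled--;
        }
        return true;
    }

    public int size() {
        return size;
    }

    public int capacity() {
        return buckets.size();
    }

    // Doubles the number of buckets and re-inserts every element,
    // since bucket indices depend on the table size.
    private void resize() {
        List<List<T>> old = buckets;
        buckets = newTable(old.size() * 2);
        filled = 0;
        for (List<T> bucket : old) {
            for (T element : bucket) {
                List<T> target = bucketFor(element);
                if (target.isEmpty()) {
                    filled++;
                }
                target.add(element);
            }
        }
    }
}
```

Resizing costs O(n) when it happens, but because the table doubles each time, it happens rarely enough that the average cost per Add stays constant.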

The Lab
- In this lab, we have implemented a very simple hash table
  - SimpleHashTable.java
- It is so simple that it cannot handle collisions!
  - Each bucket isn't an ArrayList; it's just a single element when full, or null if empty
- Your task is to modify the code and implement collision resolution
  - This means that each array slot should be an ArrayList instead of merely an Object

Java Generics
- You will see some strange angle-bracket notation: ArrayList<String>, SimpleHashTable<Integer>
- If parentheses indicate function arguments, then angle brackets indicate type arguments
- Type arguments are a way of specifying data structures that work on various types:
  - ArrayList<String> has:
    - boolean add(String arg0)
    - String get(int index)
  - SimpleHashSet<Integer> has:
    - void add(Integer arg0)
    - boolean contains(Integer arg0)
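A short illustration of type arguments, using only standard library collections plus one hypothetical generic method, firstOrDefault, written for this sketch:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GenericsDemo {
    // The type parameter <T> lets this one method work for lists of any element type.
    static <T> T firstOrDefault(List<T> list, T fallback) {
        return list.isEmpty() ? fallback : list.get(0);
    }

    public static void main(String[] args) {
        // The type argument String means add takes a String and get returns a String.
        List<String> words = new ArrayList<>();
        words.add("hash");
        String first = words.get(0); // no cast needed

        // The same Set code works with Integer as the type argument.
        Set<Integer> numbers = new HashSet<>();
        numbers.add(42);

        System.out.println(first + " " + numbers.contains(42));
        System.out.println(firstOrDefault(new ArrayList<Integer>(), 7));
    }
}
```

The compiler checks the type argument at every call, so trying to add an Integer to a List<String> is rejected before the program runs.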

Operations to Implement
- SimpleHashSet.java:
  - public void add(T element)
  - public boolean contains(T element)
  - public boolean remove(T element)
  - public void clear()
  - public boolean isEmpty()
  - public int size()
- Some of these may remain unchanged
- You will also have to edit the private members and reimplement some private methods