1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.

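2 Unresolved Question on Heaps. Q: When removing the top of the heap, what happens if there is more than one child to swap with? A: Swap with the larger one. Swapping with the smaller child would leave the larger child below a smaller parent.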

3 Removing the Top of the Heap. Slide copyright 1999 Addison Wesley Longman. –Move the last node onto the root.

4 Slide copyright 1999 Addison Wesley Longman. –Move the last node onto the root. –Push the out-of-place node downward, swapping it with its larger child, until it reaches an acceptable location (no child is larger than it, in this max-heap).

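The remove-top procedure above can be sketched as a small array-based max-heap. This is a minimal sketch, not the textbook's code. The class name `MaxHeap` and the `add` method (a standard sift-up insert) are hypothetical and are included only so the heap can be filled. The focus is `removeTop`, which always swaps with the larger child.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal array-based max-heap sketch: children of index i are 2i+1 and 2i+2.
class MaxHeap {
    private final List<Integer> a = new ArrayList<>();

    boolean isEmpty() { return a.isEmpty(); }

    // Standard sift-up insert, included only so the heap can be filled.
    void add(int x) {
        a.add(x);
        int i = a.size() - 1;
        while (i > 0 && a.get((i - 1) / 2) < a.get(i)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Remove the top: move the last node onto the root, then push it down,
    // always swapping with the LARGER child, until no child is larger.
    int removeTop() {
        if (a.isEmpty()) throw new IllegalStateException("empty heap");
        int top = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (true) {
                int left = 2 * i + 1, right = left + 1, larger = i;
                if (left < a.size() && a.get(left) > a.get(larger)) larger = left;
                if (right < a.size() && a.get(right) > a.get(larger)) larger = right;
                if (larger == i) break; // acceptable location reached
                swap(i, larger);
                i = larger;
            }
        }
        return top;
    }

    private void swap(int i, int j) { int t = a.get(i); a.set(i, a.get(j)); a.set(j, t); }

    public static void main(String[] args) {
        MaxHeap h = new MaxHeap();
        for (int x : new int[] {5, 3, 9, 1, 7, 8}) h.add(x);
        StringBuilder out = new StringBuilder();
        while (!h.isEmpty()) out.append(h.removeTop()).append(' ');
        System.out.println(out.toString().trim()); // 9 8 7 5 3 1
    }
}
```

Repeatedly removing the top yields the keys in decreasing order, which is a quick check that the heap property survives each removal.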

7 Hash Tables. A very useful data structure –Good for storing and retrieving key/value pairs, often in constant time! –Not good for iterating through the items in key order. Example applications: –Storing posting lists in information retrieval: for each word, a list of the documents it occurs in (this assumes you will not be looking up words in alphabetical order). –Storing objects according to ID numbers, when the ID numbers are widely spread out and you don't need to access items in ID order.
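The posting-list application can be sketched with Java's `HashMap`. This is a minimal sketch under the assumption that documents are plain strings split on whitespace. The class `PostingLists` and the sample documents are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PostingLists {
    // Builds word -> list of IDs of the documents containing that word.
    static Map<String, List<Integer>> build(String[] docs) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.length; id++) {
            for (String word : docs[id].toLowerCase().split("\\s+")) {
                List<Integer> postings = index.computeIfAbsent(word, w -> new ArrayList<>());
                // IDs arrive in increasing order, so checking the last entry avoids duplicates.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != id) postings.add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {"hash tables are fast", "search trees keep keys in order", "hash functions spread keys"};
        Map<String, List<Integer>> index = build(docs);
        System.out.println(index.get("hash")); // [0, 2]
        System.out.println(index.get("keys")); // [1, 2]
    }
}
```

Looking up one word is a single hash probe. Listing the words alphabetically would require sorting the keys, or a `TreeMap`, which matches the caveat on the slide.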

8 Why Not Arrays? Slide adapted from lecture by Andreas Veneris. How can you store all social security numbers in an array and have O(1) access? –Use an array with range 0-999,999,999. –This will give you O(1) access time, but … –…considering there are approx. 32,000,000 people in Canada, you waste 1,000,000,000 - 32,000,000 = 968,000,000 array entries! Problem: the range of key values we are mapping (0-999,999,999) is far too large compared to the number of keys actually in use.

9 Hash Tables. Slide adapted from lecture by Andreas Veneris. We want a data structure that, given a collection of n keys, implements the dictionary operations Insert(), Delete() and Search() efficiently. –Balanced binary search trees can do that in O(log n) time and are space efficient. –Arrays can do this in O(1) time, but they are not space efficient. –Hash tables: a generalization of an array that, under some reasonable assumptions, takes O(1) expected time for Insert/Delete/Search of a key.

10 Hash Tables. Slide adapted from lecture by Andreas Veneris. Hash tables solve this problem by using a much smaller array and mapping keys to array positions with a hash function. Let $U$ be the universe of keys and let the array have size $m$. A hash function $h$ is a function from $U$ to $\{0, \ldots, m-1\}$, that is: $h : U \to \{0, \ldots, m-1\}$. [Figure: keys $k_1, k_2, k_3, k_4, k_6$ from the universe $U$ mapped into the table, with $h(k_2) = 2$, $h(k_1) = h(k_3) = 3$, $h(k_6) = 5$, $h(k_4) = 7$.]

11 The mod function. Mod stands for modulo: when you divide x by y, you get a quotient and a remainder, and x mod y is the remainder. –8 mod 5 = 3 –9 mod 5 = 4 –10 mod 5 = 0 –15 mod 5 = 0. Thus for A mod M, all multiples of M give the same result, 0. –But multiples of other numbers do not all give the same result. –So what happens when M is a prime number?
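The closing question on the mod slide can be explored numerically. This is a minimal sketch, and the class `ModDemo` and its method `slotsUsed` are hypothetical. It counts how many distinct slots the keys 0, 4, 8, … land in when M = 8 (which shares the factor 4) and when M = 7 (a prime). It also shows why `Math.floorMod` is safer than Java's `%` for hash values that may be negative.

```java
import java.util.Set;
import java.util.TreeSet;

class ModDemo {
    // Slots (key mod m) hit by the keys 0, step, 2*step, ..., (count-1)*step.
    static Set<Integer> slotsUsed(int step, int count, int m) {
        Set<Integer> slots = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            slots.add(Math.floorMod(i * step, m));
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(8 % 5);                // 3
        System.out.println(-8 % 5);               // -3: Java's % keeps the dividend's sign
        System.out.println(Math.floorMod(-8, 5)); // 2: always in 0..m-1 for positive m
        System.out.println(slotsUsed(4, 20, 8));  // [0, 4]
        System.out.println(slotsUsed(4, 20, 7));  // [0, 1, 2, 3, 4, 5, 6]
    }
}
```

When the keys share a factor with M, they pile into a few slots. With a prime M, the same keys spread over every slot.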

12 Hash Tables: Example. Slide adapted from lecture by Andreas Veneris. For example, if we hash keys 0…1000 into a hash table with 5 entries and use h(key) = key mod 5, we get the following sequence of events: –Insert 2: stored in entry 2. –Insert 21: stored in entry 1 (21 mod 5 = 1). –Insert 34: stored in entry 4. –Insert 54: 54 mod 5 = 4, so there is a collision at array entry #4. What now?

13 Dealing with Collisions. Slide adapted from lecture by Andreas Veneris. The problem arises because two keys hash to the same array entry: a collision. There are two ways to resolve collisions: –Hashing with chaining: every hash table entry contains a pointer to a linked list of the keys that hash to that entry. –Hashing with open addressing: every hash table entry contains only one key. If a new key hashes to a table entry that is already filled, systematically examine other table entries until you find an empty one in which to place the new key.

14 Hashing with Chaining. Slide adapted from lecture by Andreas Veneris. The problem is that keys 34 and 54 hash to the same entry (4). We resolve this collision by placing all keys that hash to the same hash table entry in a LIFO list (a chain, or bucket) pointed to by that entry.

15 Hashing with Chaining. Slide adapted from lecture by Andreas Veneris. What is the running time to insert/search/delete? –Insert: O(1) time to compute the hash function and insert at the head of the linked list. –Search: proportional to the length of the list at that entry (in the worst case, the longest list). –Delete: same as search. Therefore, in the unfortunate event that we have a “bad” hash function, all n keys may hash to the same table entry, giving an O(n) running time! So how can we create a “good” hash function?
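The chaining scheme and its costs can be sketched as follows. This is a minimal sketch assuming int keys, String values, and h(k) = k mod m. `ChainedHashTable` and its methods are hypothetical names, and insert assumes the key is not already present, as on the slide.

```java
import java.util.LinkedList;

class ChainedHashTable {
    private static final class Entry {
        final int key;
        final String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] table;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int m) {
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
    }

    // h(k) = k mod m, kept in 0..m-1 even for negative keys.
    int hash(int key) { return Math.floorMod(key, table.length); }

    // O(1): add at the head of the chain (LIFO); assumes the key is not already present.
    void insert(int key, String value) { table[hash(key)].addFirst(new Entry(key, value)); }

    // Cost proportional to the length of the chain at slot hash(key).
    String search(int key) {
        for (Entry e : table[hash(key)]) {
            if (e.key == key) return e.value;
        }
        return null;
    }

    // Same cost as search: walk the chain and unlink the matching entry.
    boolean delete(int key) { return table[hash(key)].removeIf(e -> e.key == key); }

    int chainLength(int slot) { return table[slot].size(); }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(5);
        for (int k : new int[] {2, 21, 34, 54}) t.insert(k, "record " + k);
        System.out.println(t.chainLength(4)); // 2: keys 34 and 54 share entry 4
        System.out.println(t.search(54));     // record 54
    }
}
```

The usage replays the earlier example. Keys 34 and 54 end up in the same chain, and only that chain is scanned when searching for either of them.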

16 Hashing with Chaining. Slide adapted from lecture by Andreas Veneris. Hash functions are “good” provided that the keys satisfy (approximately) the assumption of simple uniform hashing: –each key is equally likely to hash to any of the m slots, independently of where the other keys hash (sometimes unrealistic).
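Under simple uniform hashing with n keys in m slots, the expected chain length equals the load factor $\alpha$. This gives the standard bound on the expected cost of a search with chaining:

$$
\alpha = \frac{n}{m}, \qquad
\mathbb{E}[\text{chain length}] = \alpha, \qquad
\mathbb{E}[\text{search cost}] = \Theta(1 + \alpha).
$$

For example, with $n = 2000$ keys and $m = 701$ slots, $\alpha = 2000/701 \approx 2.85$, so a search is expected to examine about three keys.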

17 Division Method. Slide adapted from lecture by Andreas Veneris. The division method uses $h(k) = k \bmod m$, where m is the hash table length. Certain values of m may not be good: –When $m = 2^p$, $h(k)$ is just the p lowest-order bits of the key. –Good values for m are primes that are not close to exact powers of 2. For example, if you want to store 2000 elements with chaining, then m = 701 yields the hash function $h(k) = k \bmod 701$, with an average of about 3 keys per chain (2000/701 ≈ 2.85).
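The warning about $m = 2^p$ can be checked directly. This is a minimal sketch, and the class name `DivisionMethod` is hypothetical. The keys 0x100, 0x200 and 0x300 differ only in their high bits, so with m = 16 they all land in slot 0, while m = 701 separates them.

```java
class DivisionMethod {
    // Division method: h(k) = k mod m, kept in 0..m-1.
    static int hash(int k, int m) { return Math.floorMod(k, m); }

    public static void main(String[] args) {
        int p = 4, m = 1 << p; // m = 2^4 = 16
        for (int k : new int[] {0x100, 0x200, 0x300}) {
            // For non-negative k, k mod 2^p equals the p low-order bits, k & (m - 1).
            System.out.println(hash(k, m) + " " + (k & (m - 1)) + " " + hash(k, 701));
        }
        // prints: 0 0 256, then 0 0 512, then 0 0 67
    }
}
```

Any key pattern that varies only above the low p bits collapses into a single slot when m is a power of 2. A prime m makes every bit of the key matter.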

18 Choosing a Hash Function. Slide adapted from lecture by Andreas Veneris. The performance of the hash table depends on having a hash function that evenly distributes the keys. Choosing a good hash function requires taking into account the kind of data that will be used. –The statistics of the key distribution need to be accounted for. –E.g., choosing the first letter of a last name will cause problems, depending on the nationality of the population. Most programming languages (including Java) have hash functions built in.
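The contrast between a first-letter hash and a built-in one can be sketched in Java. This is a minimal sketch, and the class `StringHashing`, its methods, and the sample names are hypothetical. `polyHash` recomputes the formula documented for `java.lang.String.hashCode()`, s[0]*31^(n-1) + ... + s[n-1] in int arithmetic, using Horner's rule.

```java
class StringHashing {
    // Poor hash: only the first letter matters, so names starting alike collide.
    static int firstLetterHash(String s, int m) { return Math.floorMod(s.charAt(0), m); }

    // Horner's rule for s[0]*31^(n-1) + ... + s[n-1], with int overflow like String.hashCode().
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        return h;
    }

    // Bucket from Java's built-in hashCode(); floorMod handles negative hash codes.
    static int bucket(String s, int m) { return Math.floorMod(s.hashCode(), m); }

    public static void main(String[] args) {
        int m = 11;
        for (String name : new String[] {"Smith", "Sanchez", "Singh", "Suzuki"}) {
            System.out.println(name + ": first-letter=" + firstLetterHash(name, m)
                    + " hashCode-based=" + bucket(name, m)
                    + " matches String.hashCode=" + (polyHash(name) == name.hashCode()));
        }
    }
}
```

All four names get first-letter bucket 6 ('S' is 83, and 83 mod 11 = 6). The hashCode-based buckets depend on every character, so such names usually spread across the table, although this sketch does not guarantee it for any particular set of names.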

19 Slide adapted from lecture by Hector Garcia-Molina. Rule of thumb: try to keep space utilization (load factor) between 50% and 80%, where $\text{load factor} = \dfrac{\#\text{ keys used}}{\text{total }\#\text{ slots in table}}$. –If < 50%, you are wasting space. –If > 80%, overflows become significant; how significant depends on how good the hash function is and on the number of keys per bucket.
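The rule of thumb can be turned into a little arithmetic. This is a minimal sketch, and the class `LoadFactorRule` and its methods are hypothetical. `assess` classifies a table against the 50% to 80% band. `minSlots` and `maxSlots` give the table sizes m that keep n keys in that band. Since 0.5 ≤ n/m ≤ 0.8 means 5n/4 ≤ m ≤ 2n, integer arithmetic avoids floating-point rounding.

```java
class LoadFactorRule {
    // Load factor = # keys used / total # slots in table.
    static double loadFactor(int keysUsed, int totalSlots) { return (double) keysUsed / totalSlots; }

    // Classifies a table against the 50%-80% rule of thumb.
    static String assess(int keysUsed, int totalSlots) {
        double a = loadFactor(keysUsed, totalSlots);
        if (a < 0.5) return "wasting space";
        if (a > 0.8) return "overflows significant";
        return "ok";
    }

    // Smallest m with n/m <= 0.8, i.e. ceil(5n/4), for n >= 0.
    static int minSlots(int n) { return (5 * n + 3) / 4; }

    // Largest m with n/m >= 0.5.
    static int maxSlots(int n) { return 2 * n; }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(minSlots(n) + ".." + maxSlots(n)); // 1250..2000
        System.out.println(assess(n, 1000));                  // overflows significant
        System.out.println(assess(n, 1500));                  // ok
        System.out.println(assess(n, 4000));                  // wasting space
    }
}
```

So 1000 keys call for somewhere between 1250 and 2000 slots. With the division method, one would then pick a prime in that range.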

20 Hashing with Open Addressing. Slide adapted from lecture by Andreas Veneris. So far we have studied hashing with chaining, using a list to store keys that hash to the same location. Another option is to store all the keys directly in the table: open addressing. –Collisions are resolved by systematically examining other table indexes, $i_0, i_1, i_2, \ldots$, until an empty slot is located.

21 Open Addressing. Slide adapted from lecture by Andreas Veneris. The key is first mapped to a slot: $i_0 = h(k)$. If there is a collision, subsequent probes are performed: $i_{j+1} = (i_j + c) \bmod m$ for $j \ge 0$. Linear probing: –When c = 1, the collision resolution is done as a linear search through the table.
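The probe sequence above can be sketched as a small open-addressing table of int keys. This is a minimal sketch, and `LinearProbingTable`, `insert`, and `find` are hypothetical names. The step c is a constructor parameter, and c = 1 gives linear probing. The loop stops after m probes, so if c shares a factor with m, some empty slots are never visited.

```java
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot
    private final int c;           // probe step; c = 1 is linear probing

    LinearProbingTable(int m, int c) { slots = new Integer[m]; this.c = c; }

    private int h(int k) { return Math.floorMod(k, slots.length); }

    // Probes i0 = h(k), i_{j+1} = (i_j + c) mod m; returns the slot used, or -1 if none was found.
    int insert(int k) {
        int i = h(k);
        for (int j = 0; j < slots.length; j++) {
            if (slots[i] == null || slots[i] == k) { slots[i] = k; return i; }
            i = (i + c) % slots.length;
        }
        return -1;
    }

    // Follows the same probe sequence; reaching an empty slot means the key is absent.
    int find(int k) {
        int i = h(k);
        for (int j = 0; j < slots.length; j++) {
            if (slots[i] == null) return -1;
            if (slots[i] == k) return i;
            i = (i + c) % slots.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(5, 1);
        for (int k : new int[] {2, 21, 34, 54}) System.out.println(k + " -> slot " + t.insert(k));
        // 54 collides with 34 at slot 4 and wraps around to slot 0
    }
}
```

Replaying the earlier example, 54 finds slot 4 taken and moves on to slot (4 + 1) mod 5 = 0. Deleting keys is trickier here than with chaining, because emptying a slot can break another key's probe sequence. This sketch leaves deletion out.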

22 Double Hashing. After a collision, apply a second hash function to choose the step between probes. We won't worry about the details, but the following charts show that –double hashing is faster than linear probing –but bucket chains are faster than double hashing.
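A double-hashing probe sequence can be sketched in a few lines. The slides do not fix a second hash function, so this sketch assumes a common textbook choice: $h_1(k) = k \bmod m$ and $h_2(k) = 1 + (k \bmod (m-1))$ with m prime, probing $(h_1(k) + j\,h_2(k)) \bmod m$. The class `DoubleHashing` is hypothetical.

```java
class DoubleHashing {
    // Assumed hash functions (a common textbook choice, not taken from the slides).
    static int h1(int k, int m) { return Math.floorMod(k, m); }
    static int h2(int k, int m) { return 1 + Math.floorMod(k, m - 1); } // never 0

    // j-th probe: (h1(k) + j * h2(k)) mod m; long arithmetic avoids overflow.
    static int probe(int k, int j, int m) {
        return (int) ((h1(k, m) + (long) j * h2(k, m)) % m);
    }

    public static void main(String[] args) {
        int m = 7; // prime, so every step size 1..m-1 visits all slots
        for (int k : new int[] {10, 17}) {
            StringBuilder sb = new StringBuilder(k + ":");
            for (int j = 0; j < m; j++) sb.append(' ').append(probe(k, j, m));
            System.out.println(sb);
        }
        // 10: 3 1 6 4 2 0 5
        // 17: 3 2 1 0 6 5 4
    }
}
```

Keys 10 and 17 collide at slot 3, but their second hash values (5 and 6) send them along different paths. Under linear probing, both would march through 3, 4, 5, …, and that clustering is what double hashing avoids.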

23 Slide adapted from lecture by Andreas Veneris [Chart not reproduced in this transcript.]

24 Slide adapted from lecture by Andreas Veneris [Chart not reproduced in this transcript.]

25 Indexing vs Hashing. Slide adapted from lecture by Hector Garcia-Molina. Hashing is good for probes given a key, e.g., SELECT … FROM R WHERE R.A = 5

26 Hash Tables in Java. Java includes a Hashtable class –In java.util.* –Keys are objects; without generics you have to cast what you get back (since Java 5, Hashtable<K,V> removes the casts). As a programmer, you don't see the collision detection, chaining, etc. –It uses open hashing (chaining). You can set –the initial table size –the load factor (default is 0.75). To change the hash function –write your own version of Hashtable that extends java.util.Dictionary, or override hashCode() and equals() in the key class. See
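Here is a minimal sketch of the Java points above. It uses the real `Hashtable(int initialCapacity, float loadFactor)` constructor, and the key type is generic, so no casts are needed. `HashtableDemo` and the key class `StudentId` are hypothetical. `StudentId` shows the common way to control hashing for your own keys: overriding `equals()` and `hashCode()` consistently, so that equal keys always land in the same bucket.

```java
import java.util.Hashtable;
import java.util.Objects;

class HashtableDemo {
    // Hypothetical key class: equal IDs must produce equal hash codes.
    static final class StudentId {
        final String school;
        final int number;
        StudentId(String school, int number) { this.school = school; this.number = number; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof StudentId)) return false;
            StudentId other = (StudentId) o;
            return number == other.number && school.equals(other.school);
        }

        @Override public int hashCode() { return Objects.hash(school, number); }
    }

    public static void main(String[] args) {
        // Initial capacity 101 and load factor 0.75f (0.75 is also the default).
        Hashtable<StudentId, String> names = new Hashtable<>(101, 0.75f);
        names.put(new StudentId("CS", 42), "Ada");
        // A different but equal key object finds the same entry; no cast is needed.
        String name = names.get(new StudentId("CS", 42));
        System.out.println(name); // Ada
    }
}
```

If `hashCode()` were not overridden, the second `StudentId` object would almost certainly hash to a different bucket and the lookup would return null. The Javadoc for Hashtable also recommends `HashMap` when thread safety is not needed.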

28 [Figure: example program, with panels labeled Main, Input File and Output.]

29 Indexing vs Hashing. Slide adapted from lecture by Hector Garcia-Molina. Indexing (including B-trees) is good for range searches, e.g., SELECT … FROM R WHERE R.A > 5

30 Hash Tables vs. Search Trees. Hash tables are great for selecting individual items –Fast search and insert –O(1) expected time if the table size and hash function are chosen well –Good for accessing data that is stored on disk. BUT –Hash tables are inefficient for finding sets of information with similar keys, e.g., searching along a date range –We often need this in text and DBMS applications –Search trees are better for this.
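The trade-off above can be sketched with Java's `HashMap` and `TreeMap` (a red-black search tree). This is a minimal sketch, and `RangeVsPoint` and its sample dates and event names are hypothetical. A point lookup works in both. A date-range query is a direct `subMap` call on the tree, whereas the hash table has to scan every entry and returns results in no particular order.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.HashMap;

class RangeVsPoint {
    // Hypothetical sample data.
    static final String[][] EVENTS = {
        {"2002-10-01", "event A"}, {"2002-10-03", "event B"},
        {"2002-10-08", "event C"}, {"2002-11-05", "event D"}};

    static <M extends Map<LocalDate, String>> M fill(M map) {
        for (String[] e : EVENTS) map.put(LocalDate.parse(e[0]), e[1]);
        return map;
    }

    // Search tree: visits only keys in [from, to], in sorted order.
    static List<String> between(NavigableMap<LocalDate, String> tree, LocalDate from, LocalDate to) {
        return new ArrayList<>(tree.subMap(from, true, to, true).values());
    }

    // Hash table: must examine every entry; result order is unspecified.
    static List<String> betweenByScan(Map<LocalDate, String> table, LocalDate from, LocalDate to) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<LocalDate, String> e : table.entrySet()) {
            if (!e.getKey().isBefore(from) && !e.getKey().isAfter(to)) out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<LocalDate, String> table = fill(new HashMap<LocalDate, String>());
        NavigableMap<LocalDate, String> tree = fill(new TreeMap<LocalDate, String>());
        System.out.println(table.get(LocalDate.parse("2002-10-03"))); // event B
        LocalDate from = LocalDate.parse("2002-10-01"), to = LocalDate.parse("2002-10-31");
        System.out.println(between(tree, from, to)); // [event A, event B, event C]
    }
}
```

For a handful of entries, the scan is harmless. For a large table, or one stored on disk, touching every entry is exactly the cost that search trees and B-trees avoid.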

31 Storing Data on Disk. Hash tables are useful for information stored on disk as well –B-trees are good for this too. You can load the table in memory –with the data items pointing to locations on disk. Or you can keep both the table and the data on disk –loading the parts of the table currently in use into memory as they are accessed. –This is useful for VERY large tables.

32 Next Time B-Trees –Very important for database and IR applications