CSC211 Data Structures Lecture 20 Hashing Instructor: Prof. Xiaoyan Li Department of Computer Science Mount Holyoke College.

Slides:



Advertisements
Similar presentations
Chapter 5 introduces the often- used data structure of linked lists. This presentation shows how to implement the most common operations on linked lists.
Advertisements

CSC211 Data Structures Lecture 9 Linked Lists Instructor: Prof. Xiaoyan Li Department of Computer Science Mount Holyoke College.
Chapter 12 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common approach to the.
Chapter 8 introduces the technique of recursive programming. As you have seen, recursive programming involves spotting smaller occurrences of a problem.
Container Classes A container class is a data type that is capable of holding a collection of items. In C++, container classes can be implemented as.
L l One of the tree applications in Chapter 9 is binary search trees. l l In Chapter 9, binary search trees are used to implement bags and sets. l l This.
 One of the tree applications in Chapter 10 is binary search trees.  In Chapter 10, binary search trees are used to implement bags and sets.  This presentation.
CSC212 Data Structure Lecture 14 Binary Search Trees Instructor: Prof. George Wolberg Department of Computer Science City College of New York.
Searching “It is better to search, than to be searched” --anonymous.
 Chapter 11 has several programming projects, including a project that uses heaps.  This presentation shows you what a heap is, and demonstrates two.
CSC212 Data Structure - Section AB Lecture 20 Hashing Instructor: Edgardo Molina Department of Computer Science City College of New York.
Introduction This chapter explores graphs and their applications in computer science This chapter explores graphs and their applications in computer science.
 Chapter 7 introduces the stack data type.  Several example applications of stacks are given in that chapter.  This presentation shows another use called.
P Chapter 6 introduces the stack data type. p Several example applications of stacks are given in that chapter. p This presentation shows another use called.
  Chapter 12 presents several common algorithms for sorting an array of integers.   Two slow but simple algorithms are Selectionsort and Insertionsort.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
P p Chapter 10 has several programming projects, including a project that uses heaps. p p This presentation shows you what a heap is, and demonstrates.
CSC212 Data Structure - Section FG Lecture 21 Quadratic Sorting Instructor: Zhigang Zhu Department of Computer Science City College of New York.
 Chapter 10 introduces trees.  This presentation illustrates the simplest kind of trees: Complete Binary Trees. Complete Binary Trees Data Structures.
hashing1 Hashing It’s not just for breakfast anymore!
CSC212 Data Structure - Section FG Lecture 9 Linked Lists Instructor: Zhigang Zhu Department of Computer Science City College of New York.
hashing1 Hashing It’s not just for breakfast anymore!
Copyright © 1999, Carnegie Mellon. All Rights Reserved. Chapter 12 presents several common algorithms for sorting an array of integers. Two slow but simple.
L l Chapter 11 discusses several ways of storing information in an array, and later searching for the information. l l Hash tables are a common approach.
 An important topic: preconditions and postconditions.  They are a method of specifying what a function accomplishes. Preconditions and Postconditions.
L l Chapter 4 introduces the often- used linked list data structures l l This presentation shows how to implement the most common operations on linked.
 Chapter 9 introduces the technique of recursive programming.  As you have seen, recursive programming involves spotting smaller occurrences of a problem.
CSC212 Data Structure - Section FG Lecture 12 Stacks and Queues Instructor: Zhigang Zhu Department of Computer Science City College of New York.
 Chapter 8 introduces the queue data type.  Several example applications of queues are given in that chapter.  This presentation describes the queue.
Sorting Sorting Arranging items in a collection so that there is an ordering on one (or more) of the fields in the items Sort Key The field (or fields)
P An important topic: preconditions and postconditions. p They are a method of specifying what a function accomplishes. Preconditions and Postconditions.
 A Collection class is a data type that is capable of holding a group of items.  In Java, Collection classes can be implemented as a class, along with.
P An important topic: preconditions and postconditions. p They are a method of specifying what a function accomplishes. Illinois State University Department.
P p Chapter 11 discusses several ways of storing information in an array, and later searching for the information. p p Hash tables are a common approach.
P p Chapter 11 discusses several ways of storing information in an array, and later searching for the information. p p Hash tables are a common approach.
 An important topic: preconditions and postconditions.  They are a method of specifying what a method accomplishes. Preconditions and Postconditions.
CSC211 Data Structures Lecture 21 Quadratic Sorting Instructor: Prof. Xiaoyan Li Department of Computer Science Mount Holyoke College.
CS201: Data Structures and Discrete Mathematics I Hash Table.
CSC211 Data Structures Lecture 18 Heaps and Priority Queues Instructor: Prof. Xiaoyan Li Department of Computer Science Mount Holyoke College.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
P p Chapter 10 introduces trees. p p This presentation illustrates the simplest kind of trees: Complete Binary Trees. Complete Binary Trees Data Structures.
Container Classes  A container class is a data type that is capable of holding a collection of items.  In C++, container classes can be implemented as.
CSC212 Data Structure Lecture 16 Heaps and Priority Queues Instructor: George Wolberg Department of Computer Science City College of New York.
Department of Computer Engineering Faculty of Engineering, Prince of Songkla University 1 9 – Hash Tables Presentation copyright 2010 Addison Wesley Longman,
 Chapter 6 introduces templates, which are a C++ feature that easily permits the reuse of existing code for new purposes.  This presentation shows how.
Hashtables.
Linked Lists in Action Chapter 5 introduces the often-used data public classure of linked lists. This presentation shows how to implement the most common.
Search by Hashing.
Using a Stack Chapter 6 introduces the stack data type.
Hash Tables Chapter 11 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common approach.
Using a Stack Chapter 6 introduces the stack data type.
Quadratic Sorting Chapter 12 presents several common algorithms for sorting an array of integers. Two slow but simple algorithms are Selectionsort and.
Heaps Chapter 10 has several programming projects, including a project that uses heaps. This presentation shows you what a heap is, and demonstrates two.
Heaps Chapter 10 has several programming projects, including a project that uses heaps. This presentation shows you what a heap is, and demonstrates two.
Hash Tables Chapter 12 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common.
Using a Stack Chapter 6 introduces the stack data type.
Heaps Chapter 11 has several programming projects, including a project that uses heaps. This presentation shows you what a heap is, and demonstrates.
Using a Stack Chapter 6 introduces the stack data type.
Hash Tables Chapter 12 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common.
Hash Tables Chapter 11 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common approach.
Hash Tables Chapter 12 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common.
Hash Tables Chapter 11 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common approach.
CSC212 Data Structure - Section KL
Using a Queue Chapter 8 introduces the queue data type.
Using a Queue Chapter 8 introduces the queue data type.
Heaps Chapter 10 has several programming projects, including a project that uses heaps. This presentation shows you what a heap is, and demonstrates two.
Complete Binary Trees Chapter 9 introduces trees.
Hash Tables Chapter 11 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common approach.
Presentation transcript:

CSC211 Data Structures Lecture 20 Hashing Instructor: Prof. Xiaoyan Li Department of Computer Science Mount Holyoke College

Searching: Most Common Methods p Serial Search p simplest p Binary Search p better average-case performance p Search by Hashing p better average-case performance

p p Chapter 12 discusses several ways of storing information in an array, and later searching for the information. p p Hash tables are a common approach to the storing/searching problem. p p This presentation introduces hash tables. Hash Tables Data Structures and Other Objects Using C++

What is a Hash Table ? p p The simplest kind of hash table is an array of records. p p This example has 701 records. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] An array of records... [ 700]

What is a Hash Table ? p p Each record has a special field, called its key.   In this example, the key is a long integer field called Number. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ]... [ 700] [ 4 ] Number

What is a Hash Table ? p p The number might be a person's identification number, and the rest of the record has information about the person. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ]... [ 700] [ 4 ] Number

What is a Hash Table ? p p When a hash table is in use, some spots contain valid records, and other spots are "empty". [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number

Inserting a New Record p p In order to insert a new record, the key must somehow be converted to an array index. p p The index is called the hash value of the key. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number

Inserting a New Record p p Typical way create a hash value: [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number (Number mod 701) What is ( mod 701) ?

Inserting a New Record p p Typical way to create a hash value: [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number (Number mod 701) What is ( mod 701) ? 3

Inserting a New Record p p The hash value is used for the location of the new record. Number [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number [3]

Inserting a New Record p p The hash value is used for the location of the new record. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number

Collisions p p Here is another new record to insert, with a hash value of __ ? [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number

Collisions p p Here is another new record to insert, with a hash value of 2. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number My hash value is [2].

Collisions p p Here is another new record to insert, with a hash value of 2. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number My hash value is [2]. When a collision occurs, move forward until you find an empty spot. When a collision occurs, move forward until you find an empty spot.

Collisions p p This is called a collision, because there is already another valid record at [2]. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number When a collision occurs, move forward until you find an empty spot. When a collision occurs, move forward until you find an empty spot.

Collisions p p This is called a collision, because there is already another valid record at [2]. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number When a collision occurs, move forward until you find an empty spot. When a collision occurs, move forward until you find an empty spot.

Collisions p p This is called a collision, because there is already another valid record at [2]. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number When a collision occurs, move forward until you find an empty spot. When a collision occurs, move forward until you find an empty spot.

Collisions p p This is called a collision, because there is already another valid record at [2]. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number The new record goes in the empty spot. The new record goes in the empty spot.

A Quiz Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number

Another Kind of Collision Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number Number My hash value is [ ? ].

Another Kind of Collision Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number Number My hash value is [700].

Another Kind of Collision Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number Number My hash value is [700].

Searching for a Key p p The data that's attached to a key can be found fairly quickly. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number

Searching for a Key p p Calculate the hash value. p p Check that location of the array for the key. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number My hash value is [2]. Not me.

Searching for a Key p p Keep moving forward until you find the key, or you reach an empty spot. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number My hash value is [2]. Not me.

Searching for a Key p p Keep moving forward until you find the key, or you reach an empty spot. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number My hash value is [2]. Not me.

Searching for a Key p p Keep moving forward until you find the key, or you reach an empty spot. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number My hash value is [2]. Yes!

Searching for a Key p p When the item is found, the information can be copied to the necessary location. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number My hash value is [2]. Yes!

Deleting a Record p p Records may also be deleted from a hash table. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number Number Please delete me.

Deleting a Record p p Records may also be deleted from a hash table. p p But the location must not be left as an ordinary "empty spot" since that could interfere with searches. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number

Deleting a Record [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number Number Number Number Number p p Records may also be deleted from a hash table. p p But the location must not be left as an ordinary "empty spot" since that could interfere with searches. p p The location must be marked in some special way so that a search can tell that the spot used to have something in it.

Time Analysis p Without any collisions p With collisions

Time Analysis p Without any collisions p constant p With collisions p O(k) where k is the average collisions for items p k << n, size of the problem

Improving Hashing: ? p Size of the hashing table when using division hash function p prime number in the form of 4k+3 p Other hashing functions p mid-square, multiplicative p Double hashing (instead of linear probing) p the 2 nd hash function for stepping through the array p Chained hashing p using a linked list for each component of the hash table Exercise; size = 9? 10? 11?19? What is the best size given n items?

p p Hash tables store a collection of records with keys. p p The location of a record depends on the hash value of the record's key. p p When a collision occurs, the next available location is used. p p Searching for a particular key is generally quick. p p When an item is deleted, the location must be marked in a special way, so that the searches know that the spot used to be used. p p Applications? Summary

Next Lecture: Sorting

T HE E ND Presentation copyright 1997 Addison Wesley Longman, For use with Data Structures and Other Objects Using C++ by Michael Main and Walter Savitch. Some artwork in the presentation is used with permission from Presentation Task Force (copyright New Vision Technologies Inc) and Corel Gallery Clipart Catalog (copyright Corel Corporation, 3G Graphics Inc, Archive Arts, Cartesia Software, Image Club Graphics Inc, One Mile Up Inc, TechPool Studios, Totem Graphics Inc). Students and instructors who use Data Structures and Other Objects Using C++ are welcome to use this presentation however they see fit, so long as this copyright notice remains intact.