COSC 2007 Data Structures II

Chapter 13: Advanced Implementation of Tables III (April 20, 2017)

Topics: Hashing (definition, hash function, key, hash value, collision). Open hashing.

Common Problem: A common pattern in many programs is to store and look up data, for example finding a student record given an ID number, or finding a person's address given a phone number. Because this is so common, many data structures for it have been investigated. How?

Phone Number Problem: A phone company wants to implement caller ID. Given a phone number (the key), look up the person's name or address (the data). There are lots of possible phone numbers (P = 10^7 - 1) in a given area code, but only a small fraction of them are in use; nobody has the phone number 0000000 or 0000001.

Comparison of Time Complexity (average)

Implementation            Insertion   Deletion   Search
Unsorted array            O(1)        O(n)       O(n)
Unsorted reference-based  O(1)        O(n)       O(n)
Sorted array              O(n)        O(n)       O(log n)
Sorted reference-based    O(n)        O(n)       O(n)
BST                       O(log n)    O(log n)   O(log n)

Can we do better than O(log n)?

Can we do better than O(log n)? All the previous searching techniques require time that depends on the number of elements n stored in the table (O(log n) or O(n)). In some situations searching should be almost instantaneous. How? Examples: a 911 emergency system, an air-traffic control system.

Can we do better than O(log n)? Answer: yes, sort of, if we're lucky. General idea: take the key of the data record you're inserting and use that number directly as the item's index in an array. Search is O(1), but a huge amount of space is wasted. How can we solve this? [Figure: an array indexed directly by phone number (000-0000, 000-0001, 000-0002, ...); most slots are null, and only a few, such as 259-1623 and 263-3049, hold records.]
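A minimal Java sketch of this direct-addressing idea, assuming phone numbers are stored as plain ints without the dash (the class and method names are hypothetical). Both operations are a single array access, but the array needs one slot for every possible 7-digit number, used or not.

```java
// Hypothetical sketch: direct addressing, where the key itself is the array index.
class DirectAddressTable {
    // One slot per possible key; slot k holds the name for phone number k, or null.
    private final String[] slots;

    DirectAddressTable(int keySpace) {
        slots = new String[keySpace];
    }

    void insert(int phoneNumber, String name) {
        slots[phoneNumber] = name; // O(1): no search, the key is the index
    }

    String search(int phoneNumber) {
        return slots[phoneNumber]; // O(1), but every unused number still costs a slot
    }

    public static void main(String[] args) {
        // 10^7 slots cover every 7-digit number 000-0000 .. 999-9999.
        DirectAddressTable table = new DirectAddressTable(10_000_000);
        table.insert(2562879, "Linda Kim"); // 256-2879, the record used on the later slides
        System.out.println(table.search(2562879)); // Linda Kim
        System.out.println(table.search(1));       // null
    }
}
```

The table reserves room for all 10^7 numbers even though only a small fraction are in use, which is exactly the waste that hashing addresses.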

Hashing Basic idea: Don't use the data value directly. Given an array of size B, use a hash function, h(x), which maps the given data record x to some (hopefully) unique index ("bucket") in the array. [Figure: key x is mapped by h to bucket h(x) in an array indexed 0 to B-1.]

What is a Hash Table? The simplest kind of hash table is an array of records. This example has 101 records, indexed [0] through [100].

Each record has a special field, called its key. In this example, the key is a long integer field called Number. [Figure: the record at index [4] has Number 256-2879, with the name Linda Kim and the address 8888 Queen St.]

The Number is the person's phone number, and the rest of the record holds the person's name or address.

When a hash table is in use, some spots contain valid records, and other spots are "empty". [Figure: records with Numbers 281942902, 233667136, 506643548, and 155778322 occupy some slots of the array; the rest are empty.]

Inserting a New Record? To insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key. [Figure: a new record with Number 265-1556 waits to be inserted into the array.]

Inserting a New Record? A typical way to create a hash value is (Number mod 101). What is (2651556 mod 101)? The answer is 3: 101 × 26253 = 2651553, and 2651556 - 2651553 = 3.
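The slide's hash function can be sketched in a few lines of Java, assuming non-negative keys stored without the dash (the class name is hypothetical). The same function gives 2 for 2641455, the number used on the collision slide.

```java
// Sketch of the slide's hash function: hash value = Number mod 101.
class ModHash {
    static final int TABLE_SIZE = 101; // the slide's 101-slot array

    // Assumes a non-negative key such as 2651556 (for 265-1556);
    // Java's % would return a negative result for a negative key.
    static int hashValue(long number) {
        return (int) (number % TABLE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(hashValue(2651556L)); // 3, since 101 * 26253 = 2651553
        System.out.println(hashValue(2641455L)); // 2, since 101 * 26153 = 2641453
    }
}
```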

Inserting a New Record? The hash value is used as the location of the new record, so the record with Number 265-1556 is stored at index [3]. [Figure: the array now holds the new record at [3] alongside the earlier records.]

What is Hashing? Each item has a unique key. Use a large array, called a hash table, and a hash function. Hashing is like indexing in that it involves associating a key with a relative record address. Hashing, however, differs from indexing in two important ways: with hashing, there is no obvious connection between the key and the location; and with hashing, two different keys may be transformed to the same address. A hash function is a function h(K) that transforms a key K into an address.

What is Hashing? An address calculator (hash function) is used to determine the location of the item. [Figure: a search key goes into the address calculator (hash function), which produces an index into the array (hash table) with positions 0 to N-1.]

What Can Be Hashed? Anything: numbers, strings, structures, and so on. Java defines a hashing method for general objects, Object.hashCode(), which returns an integer value.
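To turn Java's hashCode() into a table index, one common approach (an assumption here, not something the slides prescribe) is to reduce it modulo the table size with Math.floorMod, which stays non-negative even when hashCode() is negative. The class name is hypothetical.

```java
// Sketch: map any object's hashCode() to a bucket in [0, tableSize).
class HashCodeIndex {
    static int bucketFor(Object key, int tableSize) {
        // floorMod, unlike %, never returns a negative index.
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor("Bill", 101));   // String.hashCode() mixes all characters
        System.out.println(bucketFor(2651556, 101));  // Integer.hashCode() is the value itself: 3
        System.out.println(bucketFor(-7, 101));       // negative hash code still gives 0..100
    }
}
```

Note that Java's String.hashCode() is a different function from the character-sum hash shown later in these slides, so the two give different indices for the same string.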

Where do we use Hashing? Databases (phone book, student name list). Spell checkers. Computer chess games. Compilers.

Hashing and Tables: Hashing gives us another implementation of the Table ADT. Hashing operations: Initialize (all locations in the hash table are empty), Insert, Search, Delete. Each operation hashes the key to get an index, then uses that index to find the value stored in the table in O(1) time on average, a great improvement over O(log n).

Hashing Insert pseudocode:
tableInsert(newItem)
    i = the array index that the address calculator gives for newItem's search key
    table[i] = newItem

Retrieval pseudocode:
tableRetrieve(searchKey)
    i = the array index for searchKey given by the hash function
    if (table[i] is not empty and table[i].getKey() equals searchKey)
        return table[i]
    else
        return null

Hashing Deletion pseudocode:
tableDelete(searchKey)
    i = the array index for searchKey given by the hash function
    success = (table[i] is not empty and table[i].getKey() equals searchKey)
    if (success)
        delete the item from table[i]
    return success
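A minimal Java sketch of the insert, retrieve, and delete pseudocode, assuming String keys and values and, like the pseudocode, no collision handling. The class, the Item record, and hashIndex are hypothetical names; hashIndex plays the role of the address calculator.

```java
// Sketch of tableInsert / tableRetrieve / tableDelete with no collision handling.
class SimpleHashTable {
    // Hypothetical record type: a search key plus its data.
    record Item(String key, String value) {}

    private final Item[] table;

    SimpleHashTable(int size) {
        table = new Item[size]; // all locations start empty (null)
    }

    // The "address calculator": maps a search key to an array index.
    private int hashIndex(String searchKey) {
        return Math.floorMod(searchKey.hashCode(), table.length);
    }

    void tableInsert(Item newItem) {
        int i = hashIndex(newItem.key());
        table[i] = newItem; // overwrites whatever was there: no collision handling
    }

    Item tableRetrieve(String searchKey) {
        int i = hashIndex(searchKey);
        if (table[i] != null && table[i].key().equals(searchKey)) {
            return table[i];
        }
        return null;
    }

    boolean tableDelete(String searchKey) {
        int i = hashIndex(searchKey);
        boolean success = table[i] != null && table[i].key().equals(searchKey);
        if (success) {
            table[i] = null; // delete the item from table[i]
        }
        return success;
    }
}
```

Because a second key that hashes to an occupied slot replaces the first, this version is only correct when no collisions occur.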

Hash Tables
Table size: entries are numbered 0 to TSIZE-1.
Mapping: simple to compute; ideally one-to-one, though that is not possible in general; even distribution.
Main problems: choosing the table size, choosing a good hash function, and deciding what to do on collisions.

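How to choose the Table Size? With H(Key) = Key mod TSIZE, the table size matters. Take the keys 110, 210, 320, 460, 520, 600: with TSIZE = 10, every key ends in 0, so all six hash to index 0. With TSIZE = 11 they spread out: 110 → 0, 210 → 1, 320 → 1, 520 → 3, 600 → 6, 460 → 9. Keys with varied last digits, such as 20, 22, 54, 15, 26, 49, already spread well with TSIZE = 10 (indices 0, 2, 4, 5, 6, 9).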
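The effect of the table size can be checked with a short Java sketch that distributes the slide's keys with H(Key) = Key mod TSIZE (the class and method names are hypothetical; keys are assumed non-negative).

```java
import java.util.ArrayList;
import java.util.List;

class TableSizeDemo {
    // Returns, for each index 0..tsize-1, the keys that Key mod TSIZE sends there.
    static List<List<Integer>> distribute(int[] keys, int tsize) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < tsize; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(key % tsize).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {110, 210, 320, 460, 520, 600};
        System.out.println(distribute(keys, 10)); // all six keys land in bucket 0
        System.out.println(distribute(keys, 11)); // [[110], [210, 320], [], [520], [], [], [600], [], [], [460], []]
    }
}
```

With TSIZE = 10 only the last digit of each key matters, so keys that share a last digit always collide; a size such as 11 (or the 101 used earlier) lets every digit influence the index.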

How to choose a Hashing Function? The hash function we choose depends on the type of the key field (the key we use to do our lookup). Finding a good one can be hard. Rules: it should be easy to calculate, use all of the key, and spread the keys uniformly.

How to choose a Hashing Function? Examples:
Student IDs (integers): h(idNumber) = idNumber % B, e.g. h(678921) = 678921 % 100 = 21.
Names (character strings): h(name) = (sum of the ASCII values) % B, e.g. h("Bill") = (66 + 105 + 108 + 108) % 101 = 387 % 101 = 84.
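Both example hash functions can be sketched in Java as follows, assuming non-negative ID numbers; the method names are hypothetical.

```java
class ExampleHashFunctions {
    // h(idNumber) = idNumber % B
    static int hashId(int idNumber, int b) {
        return idNumber % b;
    }

    // h(name) = (sum of the character codes) % B
    static int hashName(String name, int b) {
        int sum = 0;
        for (int i = 0; i < name.length(); i++) {
            sum += name.charAt(i); // 'B' = 66, 'i' = 105, 'l' = 108
        }
        return sum % b;
    }

    public static void main(String[] args) {
        System.out.println(hashId(678921, 100));   // 21
        System.out.println(hashName("Bill", 101)); // 84, since 66 + 105 + 108 + 108 = 387 = 3 * 101 + 84
    }
}
```

A design caveat of the character-sum hash: it ignores character order, so anagrams such as "Bill" and "lliB" always collide.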

Collision: Here is another new record to insert, Number 2641455, with a hash value of 2 (2641455 mod 101 = 2). But location [2] is already occupied by another record, so the two keys collide. [Figure: the new record's hash value points to [2], which already holds a record.]

What to do on collisions?
Open hashing (separate chaining)
Closed hashing (open addressing): linear probing, quadratic probing, double hashing
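The three open-addressing schemes differ only in which location they examine on the i-th attempt after the home slot h is found occupied. A hedged Java sketch of those probe sequences for a table of size m; the second hash function h2(key) = 7 - (key mod 7) is an assumption for illustration, not taken from these slides.

```java
class ProbeSequences {
    // Linear probing: h, h+1, h+2, ...
    static int linear(int h, int i, int m) {
        return (h + i) % m;
    }

    // Quadratic probing: h, h+1^2, h+2^2, h+3^2, ...
    static int quadratic(int h, int i, int m) {
        return (int) ((h + (long) i * i) % m);
    }

    // Double hashing: h, h+step, h+2*step, ..., where step = h2(key) is never 0.
    static int doubleHash(int h, int step, int i, int m) {
        return (int) ((h + (long) i * step) % m);
    }

    // Hypothetical second hash function (assumption): 7 - (key mod 7), in 1..7.
    static int h2(int key) {
        return 7 - (key % 7);
    }

    public static void main(String[] args) {
        int m = 11;
        int key = 24;
        int home = key % m; // 2
        for (int i = 0; i < 4; i++) System.out.print(linear(home, i, m) + " ");                // 2 3 4 5
        System.out.println();
        for (int i = 0; i < 4; i++) System.out.print(quadratic(home, i, m) + " ");             // 2 3 6 0
        System.out.println();
        for (int i = 0; i < 4; i++) System.out.print(doubleHash(home, h2(key), i, m) + " ");   // 2 6 10 3
        System.out.println();
    }
}
```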

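Open hashing (separate chaining) Keep a list of all elements that hash to the same value. [Figure: the keys 1, 4, 9, 16, 25, 36, 49, 64, 81 in a 10-bucket table; bucket 1 holds 1 and 81, bucket 4 holds 4 and 64, bucket 5 holds 25, bucket 6 holds 16 and 36, and bucket 9 holds 9 and 49.]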

Open hashing (separate chaining) The secondary data structure can be a list, a search tree, or another hash table. We expect few collisions, so the approach is simple and has small overhead.

Operations with Chaining
Insert with chaining: apply the hash function to get a position, then insert the key into the linked list at this position.
Search with chaining: apply the hash function to get a position, then search the linked list at this position.
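These two operations can be sketched in Java with java.util.LinkedList as the chain, assuming int keys and h(key) = key mod B (the class and method names are hypothetical). It reproduces the squares example from the chaining slide.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainingDemo {
    // Separate chaining: bucket i holds a linked list of every key with h(key) == i.
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainingDemo(int b) {
        for (int i = 0; i < b; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private int hash(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Insert: hash to a position, then add the key at the front of that list.
    void insert(int key) {
        buckets.get(hash(key)).addFirst(key);
    }

    // Search: hash to a position, then scan only that one list.
    boolean search(int key) {
        return buckets.get(hash(key)).contains(key);
    }

    LinkedList<Integer> bucket(int i) {
        return buckets.get(i);
    }

    public static void main(String[] args) {
        ChainingDemo table = new ChainingDemo(10);
        for (int k : new int[] {1, 4, 9, 16, 25, 36, 49, 64, 81}) {
            table.insert(k);
        }
        System.out.println(table.bucket(1)); // [81, 1]
        System.out.println(table.bucket(6)); // [36, 16]
        System.out.println(table.search(49) + " " + table.search(50)); // true false
    }
}
```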

Open hashing (separate chaining)
public class ChainNode {
    private KeyedItem item;
    private ChainNode next;

    public ChainNode(KeyedItem newItem, ChainNode nextNode) {
        item = newItem;
        next = nextNode;
    }

    // set and get methods
} // end ChainNode

Open hashing (separate chaining)
public class HashTable {
    private static final int HASH_TABLE_SIZE = 101; // size of hash table
    private ChainNode[] table;                      // hash table
    private int size;                               // number of items in the table

    public HashTable() {
        table = new ChainNode[HASH_TABLE_SIZE];
        size = 0;
    }

    public boolean tableIsEmpty() { return size == 0; }
    public int tableLength() { return size; }
    public void tableInsert(KeyedItem newItem) throws HashException { /* ... */ }
    public boolean tableDelete(Comparable searchKey) { /* ... */ }
    public KeyedItem tableRetrieve(Comparable searchKey) { /* ... */ }
} // end HashTable

Open hashing (separate chaining)
tableInsert(newItem)
    if (table is not full) {
        searchKey = the search key of newItem
        i = hashIndex(searchKey)
        node = reference to a new node containing newItem
        node.setNext(table[i])
        table[i] = node
        size++
    } else { // table full
        throw new HashException()
    }

Open hashing (separate chaining)
tableRetrieve(searchKey)
    i = hashIndex(searchKey)
    node = table[i]
    while ((node != null) && !node.getItem().getKey().equals(searchKey))
        node = node.getNext()
    if (node != null)
        return node.getItem()
    else
        return null
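For a version that compiles and runs on its own, here is a hedged Java sketch of the same chained HashTable design. It assumes String keys and values in place of the textbook's KeyedItem and Comparable types, and it adds a tableDelete (which the slides do not spell out) that unlinks the first matching node from its chain.

```java
// Sketch of a chained hash table with hand-built chains; String keys/values are an assumption.
class ChainedHashTable {
    private static final int HASH_TABLE_SIZE = 101;

    private static class ChainNode {
        final String key;
        final String value;
        ChainNode next;

        ChainNode(String key, String value, ChainNode next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final ChainNode[] table = new ChainNode[HASH_TABLE_SIZE];
    private int size; // number of items stored

    private int hashIndex(String searchKey) {
        return Math.floorMod(searchKey.hashCode(), HASH_TABLE_SIZE);
    }

    boolean tableIsEmpty() { return size == 0; }

    int tableLength() { return size; }

    // Insert at the head of the chain; with chaining the array itself never fills up.
    void tableInsert(String key, String value) {
        int i = hashIndex(key);
        table[i] = new ChainNode(key, value, table[i]);
        size++;
    }

    String tableRetrieve(String searchKey) {
        ChainNode node = table[hashIndex(searchKey)];
        while (node != null && !node.key.equals(searchKey)) {
            node = node.next;
        }
        return node != null ? node.value : null;
    }

    // Unlink the first node whose key matches; return whether one was found.
    boolean tableDelete(String searchKey) {
        int i = hashIndex(searchKey);
        ChainNode prev = null;
        ChainNode node = table[i];
        while (node != null && !node.key.equals(searchKey)) {
            prev = node;
            node = node.next;
        }
        if (node == null) {
            return false;
        }
        if (prev == null) {
            table[i] = node.next;  // removing the head of the chain
        } else {
            prev.next = node.next; // bypass the node in mid-chain
        }
        size--;
        return true;
    }
}
```

Inserting at the head keeps insertion O(1); retrieval and deletion cost time proportional to the length of the one chain they scan.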

Evaluation of Chaining
Disadvantages of chaining: it is more complex to implement, and search and delete are harder. To judge its performance we need to know the number of elements in the table (N), the number of buckets (B), and the quality of the hash function. The worst case for searching is O(n), when every key lands in the same chain.
Advantages of chaining: insertion is easy and quick, the table can store more records than it has buckets, and the size of the table is dynamic.
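To make N, B, and the worst case concrete, here is a small hypothetical Java helper that computes the load factor N/B (the average chain length) and the longest chain that h(key) = key mod B produces, using keys from earlier slides. The clustered key set shows the O(n) worst case.

```java
class ChainStats {
    // Load factor = N / B: the average number of keys per chain.
    static double loadFactor(int n, int b) {
        return (double) n / b;
    }

    // Length of the longest chain produced by h(key) = key mod B.
    static int longestChain(int[] keys, int b) {
        int[] counts = new int[b];
        int max = 0;
        for (int key : keys) {
            int i = Math.floorMod(key, b);
            counts[i]++;
            max = Math.max(max, counts[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] spread = {1, 4, 9, 16, 25, 36, 49, 64, 81};
        int[] clustered = {110, 210, 320, 460, 520, 600};
        System.out.println(loadFactor(spread.length, 10)); // 0.9
        System.out.println(longestChain(spread, 10));      // 2
        System.out.println(longestChain(clustered, 10));   // 6: every key in one chain
    }
}
```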

Review: A(n) ______ maps the search key of a table item into a location that will contain the item.
a. hash function
b. hash table
c. AVL tree
d. heap

Review: A hash table is a(n) ______.
a. stack
b. queue
c. array
d. list

Review: The condition that occurs when a hash function maps two or more distinct search keys into the same location is called a(n) ______.
a. disturbance
b. collision
c. rotation
d. congestion

Review: ______ is a collision-resolution scheme that searches the hash table sequentially, starting from the original location specified by the hash function, for an unoccupied location.
a. Linear probing
b. Quadratic probing
c. Double hashing
d. Separate chaining

Review: ______ is a collision-resolution scheme that searches the hash table for an unoccupied location beginning with the original location that the hash function specifies and continuing at increments of 1², 2², 3², and so on.
a. Linear probing
b. Double hashing
c. Quadratic probing
d. Separate chaining

Review: ______ is a collision-resolution scheme that uses an array of linked lists as a hash table.
a. Linear probing
b. Double hashing
c. Quadratic probing
d. Separate chaining

Review: The load factor of a hash table is calculated as ______.
a. table size + current number of table items
b. table size - current number of table items
c. current number of table items * table size
d. current number of table items / table size