Hashing 15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 18 January 2005.

Plan  Today  Seat assignments  Hash functions  Reading:  For today and next time: Sedgewick Chapter 14  Reminder: HW0 due on Thursday

Hash Tables: An Alternative Representation for Dictionaries

Dictionary Interface
An abstract data type that maintains a dynamic set is called a dictionary.
- Crucial operations: insert, find, remove
- Standard operations: create, destroy, copy, …

Dictionary Interface
- insert: may or may not allow multiple occurrences of the same key
- find: a membership query; often also retrieves the associated information
- remove: may use deferred actions to speed up the amortized running time

Small Universe  Suppose we have a small universe U = {0,1,2,…,M-1} of items.  We want to maintain a subset A of U.  Ease: Use an array of bits (boolean) of size M.  Insert: A[k] = 1  Find: return A[k] != 0  Remove: A[k] = 0 Operations are constant time.

Direct Access Tables  In most applications we do not store simple items but pairs (key, object).  Use an array of pointers (references to objects).  Insert: A[key] = object  Find: return A[key]  Remove: A[key] = null Again operations are constant time.

Large Universe  But what if the universe U of keys is large (and the subset is small)? e.g., names, symbol table of a compiler.  Even when the identifiers are at most 16 long there are some possibilities.

Hashing – the Idea  Map keys into integers in the range 0.. m-1, m<<M and m is the table size.  Pick a “good” mapping from keys to integers:  Easy to compute  Even distribution into the table a b c d e f l h i j k l m n o p q r s t u v w x y z

Hashing – Terminology  The array in which we store the objects is the hash table.  To enter an object into the table, we compute an index from the key.  The map from the key to the index is a hash function h: h(key) = index

Space-Time Tradeoff  A direct table has O(1) operations in the worse case. But space may be prohibitive.  Minimize space by using a sequential search.  Hashing balances space and time (on average) by changing the size of the hash table.

Problem - Collisions  Fundamental problem: Some keys map to the same location, a collision: h(x) = h(y).  Can we prevent collisions?

Pigeonhole Principal  There is no way to avoid collisions.  Since m << M there must be at least two keys that map to the same index.  The famous Pigeonhole Principle: If you put more than k items into k bins, then at least one bin contains more than one item.

Problem - Hash Function  Second problem: How do we find a suitable hash function?  Ideally, we want to distribute the keys uniformly over the hash table to minimize collisions.  That is, we want h to appear random, as though “hashing” the keys.

Hash Functions

Hashing-Efficiency  We also need to make sure h(k) is easy to compute.  Note that k could be a fairly complicated data structure. How do you turn an array of integers into a single integer? Or how about a tree?  Goal: All operations should be constant time.  But things can go badly wrong on rare occasions.

Division method  Assume wlog the keys are integers.  A simple hash function is h(k) = k mod m, where m is the table size.  The choice of m is crucial.  Good choice: m prime.

Division method  Primes are fairly dense, so this is no great restriction on the table size.  In fact, we can nearly double the hash table: 31, 61, 127,251, 509, 1021, 2039,…  Store these values in a table; don’t try to compute on the fly.

Multipication Method  Another hash function is h(x) = floor( m ( k r mod 1) ) where 0 < r < 1 is cleverly chosen.  Advantage: the choice of m is not critical  Ideally should be irrational, then the values (i r mod 1), i = 1, 2,...,M are very evenly distributed over [0,1].  Of course, there is a little problem here.

Random Input  Note that good hash functions are easy to come by if the input is random (as a bit pattern). Then we can take simply a few bits from the input (say, the first or last 16 bits).  However, such a method would fail miserably if the input shows some regularity. No good for general use.

Integer keys?  The assumption objects in U are integers has to be taken with a grain of salt.  Often we have to massage things a bit to extract numbers.  Of course, in the end everything is just one (possibly huge) number written in binary. This can be used in some languages like C to directly extract hash values from these bits.

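Example: Strings

    public static int hashCode(String key, int m) {
        int h = 0;
        for (int i = 0; i < key.length(); i++)
            h = 37 * h + key.charAt(i);  // 37 is a "magic" multiplier
        h %= m;         // reduce into the table range (negative if h is negative)
        if (h < 0)      // int overflow in the loop can leave h negative
            h += m;     // shift the result into 0..m-1
        return h;
    }

This is really an interpretation of the string as a number in base 37 (not ordinary radix notation, though, since the "digits" are character codes, which can be 37 or larger).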
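Once 37·h + c overflows an int, the hashCode method above is hashing the wrapped-around value rather than the exact base-37 number, which still works as a hash but is a different function. A variant sketch, assuming m > 0 (StringHashMod is a hypothetical name): apply Horner's rule modulo m at every step. Because h stays below m, 37·h plus a char code always fits in a long, the result is never negative, and it equals the exact base-37 value mod m. For short strings that never overflow, the two versions agree.

```java
// Base-37 string hash reduced mod m at every step, so nothing overflows.
class StringHashMod {
    static int hash(String key, int m) {
        long h = 0;                                 // invariant: 0 <= h < m
        for (int i = 0; i < key.length(); i++)
            h = (37 * h + key.charAt(i)) % m;       // fits easily in a long
        return (int) h;
    }

    public static void main(String[] args) {
        System.out.println(hash("ab", 101));        // (37*97 + 98) mod 101 = 51
        System.out.println(hash("a fairly long string key", 1021));
    }
}
```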

Hash functions  Desired properties  Approximates a random distribution  Over the range of table index values  Efficient calculation  Approaches  Modular arithmetic  Many  Perfect hashing  When full set of input keys known in advance

Next time: Collisions