Hash Tables COT4810 Ken Pritchard 2 Sep 04.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

David Luebke 1 6/7/2014 ITCS 6114 Skip Lists Hashing.
Hash Tables CSC220 Winter What is strength of b-tree? Can we make an array to be as fast search and insert as B-tree and LL?
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Data Structures and Algorithms
Hash Tables.
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved Hash Tables,
Hash Tables CIS 606 Spring 2010.
CSE 1302 Lecture 23 Hashing and Hash Tables Richard Gesick.
HASH TABLE. HASH TABLE a group of people could be arranged in a database like this: Hashing is the transformation of a string of characters into a.
CSCE 3400 Data Structures & Algorithm Analysis
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Appendix I Hashing. Chapter Scope Hashing, conceptually Using hashes to solve problems Hash implementations Java Foundations, 3rd Edition, Lewis/DePasquale/Chase21.
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Hashing21 Hashing II: The leftovers. hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
1 Hash Tables Gordon College CS Hash Tables Recall order of magnitude of searches –Linear search O(n) –Binary search O(log 2 n) –Balanced binary.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Hashing 1. Def. Hash Table an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item).
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
Hashing Table Professor Sin-Min Lee Department of Computer Science.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Comp 335 File Structures Hashing.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 5. Abstract Data Structures & Algorithms 5.2 Static Data Structures.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.
Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
Hash Tables. Group Members: Syed Husnain Bukhari SP10-BSCS-92 Ahmad Inam SP10-BSCS-06 M.Umair Sharif SP10-BSCS-38.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
CPSC 252 Hashing Page 1 Hashing We have already seen that we can search for a key item in an array using either linear or binary search. It would be better.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Hashing. Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string.
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Hash Tables. Group Members: Syed Husnain Bukhari SP10-BSCS-92 Ahmad Inam SP10-BSCS-06 M.Umair Sharif SP10-BSCS-38.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Hash Table.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Presentation transcript:

Hash Tables COT4810 Ken Pritchard 2 Sep 04

Overview History Description Details Examples Uses

History The term hashing was apparently developed through an analogy to compare the way the key would be mangled in a hash function to the standard meaning of hashing being to chop something up. 1953 – Hashing with chaining was mentioned in an internal IBM memorandum. 1956 – Hashing was mentioned in a publication by Arnold I. Dumey, Computers and Automation 1968 – Random probing with secondary clustering was described by Robert Morris in CACM 11 1973 – Donald Knuth wrote The Art of Computer Programming in which he describes and analyzes hashing in depth.

Description A hash table is a data structure that stores things and allows insertions, lookups, and deletions to be performed in O(1) time. An algorithm converts an object, typically a string, to a number. Then the number is compressed according to the size of the table and used as an index. There is the possibility of distinct items being mapped to the same key. This is called a collision and must be resolved.

Hash Code Generator Compression Key   Number   Index Smith  7 Bob Smith 123 Main St. Orlando, FL 327816 407-555-1111 bob@myisp.com 1 2 3 4 5 6 7 8 9

Collision Resolution There are two kinds of collision resolution: 1 – Chaining makes each entry a linked list so that when a collision occurs the new entry is added to the end of the list. 2 – Open Addressing uses probing to discover an empty spot. With chaining, the table does not have to be resized. With open addressing, the table must be resized when the number of elements is larger than the capacity.

Collision Resolution - Chaining With chaining, each entry is the head of a (possibly empty) linked list. When a new object hashes to an entry, it is added to the end of the list. A particularly bad hash function could create a table with only one non-empty list that contained all the elements. Uniform hashing is very important with chaining. The load factor of a chained hash table indicates how many objects should be found at each location, provided reasonably uniform hashing. The load factor LF = n/c where n is the number of objects stored and c is the capacity of the table. With uniform hashing, a load factor of 2 means we expect to find no more than two objects in one slot. A load factor less than 1 means that we are wasting space.

Chaining Smith  7 1 2 3 4 Bob Smith 123 Main St. Orlando, FL 327816 1 2 3 4 Bob Smith 123 Main St. Orlando, FL 327816 407-555-1111 bob@myisp.com Jim Smith 123 Elm St. Orlando, FL 327816 407-555-2222 jim@myisp.com 5 6 7 8 9

Collision Resolution – Open Addressing With open addressing, the table is probed for an open slot when the first one already has an element. There are different types of probing, some more complicated than others. The simplest type is to keep increasing the index by one until an open slot is found. The load factor of an open addressed hash table can never be more than 1. A load factor of .5 indicates that the table is half full. With uniform hashing, the number of positions that we can expect to probe is 1/(1 – LF). For a table that is half full, LF = .5, we can expect to probe 1/(1 - .5) = 2 positions. Note that for a table that is 95% full, we can expect to probe 20 positions.

Probing Smith  7 Bob Smith 123 Main St. Orlando, FL 327816 Bob Smith 123 Main St. Orlando, FL 327816 407-555-1111 bob@myisp.com 1 2 3 4 5 6 7 8 Jim Smith 123 Elm St. Orlando, FL 327816 407-555-2222 jim@myisp.com 9

Hash Functions Hash Functions perform two separate functions: 1 – Convert the string to a key. 2 – Constrain the key to a positive value less than the size of the table. The best strategy is to keep the two functions separate so that there is only one part to change if the size of the table changes.

Hash Functions - Key Generation There are different algorithms to convert a string to a key. There is usually a trade off between efficiency and completeness. Some very efficient algorithms use bit shifting and addition to compute a fairly uniform hash. Some less efficient algorithms give a weight to the position of each character using a base related to the ASCII codes. The key is guaranteed unique, but can be very long. Also, some of the precision gained will be lost in compression.

Hash Functions - Compression The compression technique generally falls into either a division method or a multiplication method. In the division method, the index is formed by taking the remainder of the key divided by the table size. When using the division method, ample consideration must be given to the size of the table. The best choice for table size is usually a prime number not too close to a power of 2.

Hash Functions - Compression In the multiplication method, the key is multiplied by a constant A in the range: 0 < A < 1 Extract the fractional part of the result and multiply it by the size of the table to get the index. A is usually chosen to be 0.618, which is approximately: (Ö5 – 1)/2 The entire computation for the index follows: index = floor(table_size * ((key * A) % 1))

Hash Functions - Example int hash(char * key) { int val = 0; while(*key != '\0') val = (val << 4) + (*key); key++; } return val; int compress(int index, int size) return abs(index % size);

Dynamic Modification If the total number of items to store in the table are not known in advance, the table may have to be modified. In any event, the insert and delete functions should track the number of items. If the table uses chaining, it will not have to be modified unless the user wants to specify a maximum load value so that performance does not deteriorate. If the table uses open addressing, the table will have to resized when full before another item can be stored; however, it might be resized before that point based on the load factor so that performance does not deteriorate.

Hash Table Uses Compilers use hash tables for symbol storage. The Linux Kernel uses hash tables to manage memory pages and buffers. High speed routing tables use hash tables. Database systems use hash tables.

Example C Example C# Example Java Example

Summary A hash table is a convenient data structure for storing items that provides O(1) access time. The concepts that drive selection of the key generation and compression functions can be complicated, but there is a lot of research information available. There are many areas where hash tables are used. Modern programming languages typically provide a hash table implementation ready for use in applications.

References Knuth, Donald A. The Art of Computer Programming. Philippines: Addison-Wesley Publishing Company, 1973. Loudon, Kyle. Mastering Algorithms with C. Sebastopol: O’Reilly & Associates, 1999 Watt, David A., and Deryck F. Brown. Java Collections. West Sussex: John Wiley & Sons, 2001 Dewdney, A. K. The New Turing Omnibus. New York: Henry Holt and Company, 2001