Comp 122, Spring 2004 Hash Tables – 1. hashtables - 2 Lin / Devi Comp 122, Fall 2003 Dictionary Dictionary: »Dynamic-set data structure for storing items.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

David Luebke 1 6/7/2014 CS 332: Algorithms Skip Lists Introduction to Hashing.
David Luebke 1 6/7/2014 ITCS 6114 Skip Lists Hashing.
FIFO Queues CSE 2320 – Algorithms and Data Structures Vassilis Athitsos University of Texas at Arlington 1.
Hash Tables Many of the slides are from Prof. Plaisteds resources at University of North Carolina at Chapel Hill.
Hash Tables.
Hash Tables CIS 606 Spring 2010.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Lecture 6 : Dynamic Hashing Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University.
Nov 12, 2009IAT 8001 Hash Table Bucket Sort. Nov 12, 2009IAT 8002  An array in which items are not stored consecutively - their place of storage is calculated.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
HashMaps. Overview What are HashMaps? Implementing DictionaryADT with HashMaps HashMaps 2/16.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
Maps, Dictionaries, Hashtables
Comp 122, Spring 2004 Keys into Buckets: Lower bounds, Linear-time sort, & Hashing.
Liang, Introduction to Java Programming, Eighth Edition, (c) 2011 Pearson Education, Inc. All rights reserved Chapter 48 Hashing.
Review for Test 2 i206 Fall 2010 John Chuang. 2 Topics  Operating System and Memory Hierarchy  Algorithm analysis and Big-O Notation  Data structures.
Design and Analysis of Algorithms - Chapter 71 Hashing b A very efficient method for implementing a dictionary, i.e., a set with the operations: – insert.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Algorithms and Data Structures Hash Tables and Associative Arrays.
1 Hash Tables  a hash table is an array of size Tsize  has index positions 0.. Tsize-1  two types of hash tables  open hash table  array element type.
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
ADTs and C++ Ch 2,3,4 May 12, 2015 Abstract Data Type (ADT) An abstract data type is: 1.A collection of data items (storage structures) 2.Basic operations.
Jan 12, 2012 Introduction to Collections. 2 Collections A collection is a structured group of objects Java 1.2 introduced the Collections Framework Collections.
Arrays.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Design and Analysis of Algorithms Hash Tables Haidong Xue Summer 2012, at GSU.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
P p Chapter 11 discusses several ways of storing information in an array, and later searching for the information. p p Hash tables are a common approach.
Hashing Hashing is another method for sorting and searching data.
HASHING PROJECT 1. SEARCHING DATA STRUCTURES Consider a set of data with N data items stored in some data structure We must be able to insert, delete.
1 Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6.
CS201: Data Structures and Discrete Mathematics I Hash Table.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Foundations of Data Structures Practical Session #10 Hash Tables.
Author: Takdir, S.ST. © Sekolah Tinggi Ilmu Statistik.
Hashing is a method to store data in an array so that sorting, searching, inserting and deleting data is fast. For this every record needs unique key.
3 Data. Software And Data Data Data element – a single, meaningful unit of data. Name Social Security Number Data structure – a set of related data elements.
1 Joe Meehean.  List of names  Set of names  Map names as keys phone #’s as values Phil Bill Will Phil Bill Will Phil Bill Will Phil: Bill:
Interface: (e.g. IDictionary) Specification class Appl{ ---- IDictionary dic; dic= new XXX(); application class: Dictionary SortedDictionary ----
Collections Data structures in Java. OBJECTIVE “ WHEN TO USE WHICH DATA STRUCTURE ” D e b u g.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Data Structure & Algorithm Lecture 8 – Hashing JJCAO Most materials are stolen from Prof. Yoram Moses’s course.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Prof. amr Goneid, AUC1 CSCE 110 PROGRAMMING FUNDAMENTALS WITH C++ Prof. Amr Goneid AUC Part 15. Dictionaries (1): A Key Table Class.
Hash Tables ADT Data Dictionary, with two operations – Insert an item, – Search for (and retrieve) an item How should we implement a data dictionary? –
Nov 22, 2010IAT 2651 Java Collections. Nov 22, 2010IAT 2652 Data Structures  With a collection of data, we often want to do many things –Organize –Iterate.
Dictionaries and Hashing CSCI 3333 Data Structures.
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
Tables, hashtables, relations and dictionaries Dr. Andrew Wallace PhD BEng(hons) EurIng
 Array is a data structure were elements are stored in consecutive memory location.in the array once the memory is allocated.it cannot be extend any more.
1 Data Structures CSCI 132, Spring 2014 Lecture 33 Hash Tables.
Computer Science: A Structured Programming Approach Using C1 Objectives ❏ To understand the relationship between arrays and pointers ❏ To understand the.
Container Classes. How do we do better? Tough to delete correctly: startLocation.
3/19/2016 5:40 PMLinked list1 Array-based List v.s. Linked List Yumei Huo Department of Computer Science College of Staten Island, CUNY Spring 2009.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Chapter 27 Hashing Jung Soo (Sue) Lim Cal State LA.
Course Developer/Writer: A. J. Ikuomola
EEE2108: Programming for Engineers Chapter 8. Hashing
Array Array is a variable which holds multiple values (elements) of similar data types. All the values are having their own index with an array. Index.
Chapter 28 Hashing.
Keys into Buckets: Lower bounds, Linear-time sort, & Hashing
Chapter 21 Hashing: Implementing Dictionaries and Sets
Presentation transcript:

Comp 122, Spring 2004 Hash Tables – 1

hashtables - 2 Lin / Devi Comp 122, Fall 2003 Dictionary Dictionary: »Dynamic-set data structure for storing items indexed using keys. »Supports operations Insert, Search, and Delete. »Applications: Symbol table of a compiler. Memory-management tables in operating systems. Large-scale distributed systems. Hash Tables: »Effective way of implementing dictionaries. »Generalization of ordinary arrays.

hashtables - 3 Lin / Devi Comp 122, Fall 2003 Direct-address Tables Direct-address Tables are ordinary arrays. Facilitate direct addressing. »Element whose key is k is obtained by indexing into the k th position of the array. Applicable when we can afford to allocate an array with one position for every possible key. »i.e. when the universe of keys U is small. Dictionary operations can be implemented to take O(1) time. »Details in Sec

hashtables - 4 Lin / Devi Comp 122, Fall 2003 Hash Tables Notation: »U – Universe of all possible keys. »K – Set of keys actually stored in the dictionary. » |K| = n. When U is very large, »Arrays are not practical. »|K| << |U|. Use a table of size proportional to |K| – The hash tables. »However, we lose the direct-addressing ability. »Define functions that map keys to slots of the hash table.

hashtables - 5 Lin / Devi Comp 122, Fall 2003 Hashing Hash function h: Mapping from U to the slots of a hash table T[0..m–1]. h : U {0,1,…, m–1} With arrays, key k maps to slot A[k]. With hash tables, key k maps or hashes to slot T[h[k]]. h[k] is the hash value of key k.

hashtables - 6 Lin / Devi Comp 122, Fall 2003 Hashing 0 m–1 h(k1)h(k1) h(k4)h(k4) h(k 2 )=h(k 5 ) h(k3)h(k3) U (universe of keys) K (actual keys) k1k1 k2k2 k3k3 k5k5 k4k4 collision

hashtables - 7 Lin / Devi Comp 122, Fall 2003 Issues with Hashing Multiple keys can hash to the same slot – collisions are possible. »Design hash functions such that collisions are minimized. »But avoiding collisions is impossible. Design collision-resolution techniques. Search will cost Ө(n) time in the worst case. »However, all operations can be made to have an expected complexity of Ө(1).

hashtables - 8 Lin / Devi Comp 122, Fall 2003 Methods of Resolution Chaining: »Store all elements that hash to the same slot in a linked list. »Store a pointer to the head of the linked list in the hash table slot. Open Addressing: »All elements stored in hash table itself. »When collisions occur, use a systematic (consistent) procedure to store elements in free slots of the table. k2k2 0 m–1 k1k1 k4k4 k5k5 k6k6 k7k7 k3k3 k8k8

hashtables - 9 Lin / Devi Comp 122, Fall 2003 Collision Resolution by Chaining 0 m–1 h(k 1 )=h(k 4 ) h(k 2 )=h(k 5 )=h(k 6 ) h(k 3 )=h(k 7 ) U (universe of keys) K (actual keys) k1k1 k2k2 k3k3 k5k5 k4k4 k6k6 k7k7 k8k8 h(k8)h(k8) X X X

hashtables - 10 Lin / Devi Comp 122, Fall 2003 k2k2 Collision Resolution by Chaining 0 m–1 U (universe of keys) K (actual keys) k1k1 k2k2 k3k3 k5k5 k4k4 k6k6 k7k7 k8k8 k1k1 k4k4 k5k5 k6k6 k7k7 k3k3 k8k8

hashtables - 11 Lin / Devi Comp 122, Fall 2003 Hashing with Chaining Dictionary Operations: Chained-Hash-Insert (T, x) »Insert x at the head of list T[h(key[x])]. »Worst-case complexity – O(1). Chained-Hash-Delete (T, x) »Delete x from the list T[h(key[x])]. »Worst-case complexity – proportional to length of list with singly-linked lists. O(1) with doubly-linked lists. Chained-Hash-Search (T, k) »Search an element with key k in list T[h(k)]. »Worst-case complexity – proportional to length of list.

hashtables - 12 Lin / Devi Comp 122, Fall 2003 Analysis on Chained-Hash-Search Load factor =n/m = average keys per slot. »m – number of slots. » n – number of elements stored in the hash table. Worst-case complexity: (n) + time to compute h(k). Average depends on how h distributes keys among m slots. Assume »Simple uniform hashing. Any key is equally likely to hash into any of the m slots, independent of where any other key hashes to. »O(1) time to compute h(k). Time to search for an element with key k is (|T[h(k)]|). Expected length of a linked list = load factor = = n/m.

hashtables - 13 Lin / Devi Comp 122, Fall 2003 Expected Cost of an Unsuccessful Search Proof: Any key not already in the table is equally likely to hash to any of the m slots. To search unsuccessfully for any key k, need to search to the end of the list T[h(k)], whose expected length is α. Adding the time to compute the hash function, the total time required is Θ(1+α). Theorem: An unsuccessful search takes expected time Θ(1+α).

hashtables - 14 Lin / Devi Comp 122, Fall 2003 Expected Cost of a Successful Search Proof: The probability that a list is searched is proportional to the number of elements it contains. Assume that the element being searched for is equally likely to be any of the n elements in the table. The number of elements examined during a successful search for an element x is 1 more than the number of elements that appear before x in xs list. »These are the elements inserted after x was inserted. Goal: »Find the average, over the n elements x in the table, of how many elements were inserted into xs list after x was inserted. Theorem: A successful search takes expected time Θ(1+α).

hashtables - 15 Lin / Devi Comp 122, Fall 2003 Expected Cost of a Successful Search Proof (contd): Let x i be the i th element inserted into the table, and let k i = key[x i ]. Define indicator random variables X ij = I{h(k i ) = h(k j )}, for all i, j. Simple uniform hashing Pr{h(k i ) = h(k j )} = 1/m E[X ij ] = 1/m. Expected number of elements examined in a successful search is: Theorem: A successful search takes expected time Θ(1+α). No. of elements inserted after x i into the same slot as x i.

hashtables - 16 Lin / Devi Comp 122, Fall 2003 Proof – Contd. (linearity of expectation) Expected total time for a successful search = Time to compute hash function + Time to search = O(2+ /2 – /2n) = O(1+ ).

hashtables - 17 Lin / Devi Comp 122, Fall 2003 Expected Cost – Interpretation If n = O(m), then =n/m = O(m)/m = O(1). Searching takes constant time on average. Insertion is O(1) in the worst case. Deletion takes O(1) worst-case time when lists are doubly linked. Hence, all dictionary operations take O(1) time on average with hash tables with chaining.

hashtables - 18 Lin / Devi Comp 122, Fall 2003 Good Hash Functions Satisfy the assumption of simple uniform hashing. »Not possible to satisfy the assumption in practice. Often use heuristics, based on the domain of the keys, to create a hash function that performs well. Regularity in key distribution should not affect uniformity. Hash value should be independent of any patterns that might exist in the data. »E.g. Each key is drawn independently from U according to a probability distribution P: k:h(k) = j P(k) = 1/m for j = 0, 1, …, m–1. »An example is the division method.

hashtables - 19 Lin / Devi Comp 122, Fall 2003 Keys as Natural Numbers Hash functions assume that the keys are natural numbers. When they are not, have to interpret them as natural numbers. Example: Interpret a character string as an integer expressed in some radix notation. Suppose the string is CLRS: »ASCII values: C=67, L=76, R=82, S=83. »There are 128 basic ASCII values. »So, CLRS = 67· · · ·128 0 = 141,764,947.

hashtables - 20 Lin / Devi Comp 122, Fall 2003 Division Method Map a key k into one of the m slots by taking the remainder of k divided by m. That is, h(k) = k mod m Example: m = 31 and k = 78 h(k) = 16. Advantage: Fast, since requires just one division operation. Disadvantage: Have to avoid certain values of m. »Dont pick certain values, such as m=2 p »Or hash wont depend on all bits of k. Good choice for m: »Primes, not too close to power of 2 (or 10) are good.

hashtables - 21 Lin / Devi Comp 122, Fall 2003 Multiplication Method If 0 < A < 1, h(k) = m (kA mod 1) = m (kA – kA ) where kA mod 1 means the fractional part of kA, i.e., kA – kA. Disadvantage: Slower than the division method. Advantage: Value of m is not critical. »Typically chosen as a power of 2, i.e., m = 2 p, which makes implementation easy. Example: m = 1000, k = 123, A … h(k) = 1000(123 · mod 1) = 1000 · = 18.

hashtables - 22 Lin / Devi Comp 122, Fall 2003 Multiplication Mthd. – Implementation Choose m = 2 p, for some integer p. Let the word size of the machine be w bits. Assume that k fits into a single word. (k takes w bits.) Let 0 < s < 2 w. (s takes w bits.) Restrict A to be of the form s/2 w. Let k s = r 1 ·2 w + r 0. r 1 holds the integer part of kA ( kA ) and r 0 holds the fractional part of kA (kA mod 1 = kA – kA ). We dont care about the integer part of kA. »So, just use r 0, and forget about r 1.

hashtables - 23 Lin / Devi Comp 122, Fall 2003 Multiplication Mthd – Implementation k s = A·2 w r0r0 r1r1 w bits h(k)h(k) extract p bits · We want m (kA mod 1). We could get that by shifting r 0 to the left by p = lg m bits and then taking the p bits that were shifted to the left of the binary point. But, we dont need to shift. Just take the p most significant bits of r 0. binary point

hashtables - 24 Lin / Devi Comp 122, Fall 2003 How to choose A? Another example: On board. How to choose A? »The multiplication method works with any legal value of A. »But it works better with some values than with others, depending on the keys being hashed. »Knuth suggests using A ( 5 – 1)/2.