Tirgul 8: Universal Hashing; Remarks on Programming Exercise 1; Solution to Question 2 in Theoretical Homework 2


We want to manage a set of n elements with the dictionary operations Insert, Search and Delete.
- Each element has a unique key from a universe U of possible keys, |U| >> n.
- Hash table – an array of size m, m << |U|.
- Hash function – a function that maps a key to a cell in the hash table.
- Required property – in order to work fast, the elements in the hash table should be distributed evenly among the cells.
Can you find a hash function that has this property on any input? No – since |U| >> m, for every fixed hash function there is always a bad input.

Quick-sort – the pivot's position is fixed, so there are good inputs and bad inputs. Randomized Quick-sort – the pivot is chosen uniformly at random among all possible pivots, so no input is discriminated against: every input has the same probability of running fast.
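The randomized-pivot idea can be sketched in a few lines of Java (a sketch with assumed names, not the course's reference implementation):

```java
import java.util.Random;

// Sketch of randomized quick-sort: the pivot index is drawn uniformly
// at random from the current range, so no fixed input is consistently bad.
public class RandomizedQuickSort {
    private static final Random rnd = new Random();

    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        // Uniform choice among the possible pivots in a[lo..hi]
        int p = lo + rnd.nextInt(hi - lo + 1);
        swap(a, p, hi);                    // move the pivot to the end
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)      // standard Lomuto partition
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);                    // pivot lands at its final index i
        sort(a, lo, i - 1);
        sort(a, i + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Whatever pivots the random source produces, the output is the same sorted array; only the running time is random.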

Starting point: for every hash function, there is a "really bad" input. A possible solution: just as in quick-sort, randomize the algorithm instead of assuming a random input. The logic behind it: there is no bad input. For every input there is only a small chance of choosing a hash function that is bad for that input, i.e., a function that causes many collisions. Our family of hash functions consists of many specific hash functions, e.g. h_10,5(·), h_2,13(·), h_24,82(·), h_68,53(·).

Order of execution
The order of execution:
1. The input is fixed (everything that will be fed into the program is considered part of the input and is fixed now).
2. The program is run.
3. The hash function is chosen (randomly) at the start of the run and remains fixed for the entire duration of the program run.

Ideal case - take 1
What is our "ideal case"? (the one we always use when analyzing good hash functions) A uniformly random choice of index.
First try – we will call a family of hash functions good if: for every key k and index i it holds that P_h[h(k) = i] = 1/m.
Is that good enough? Look at the family { h_i(·) | for every key k: h_i(k) = i }. A uniform choice from this family satisfies the condition above, yet it is obviously a bad family – whichever h_i is chosen, all keys collide in the same cell.

Ideal case - take 2
We want the probability of collision to be the same as in the truly random case: for every pair of keys k1 ≠ k2 and every pair of indices i1, i2 it holds that
P_h[h(k1) = i1 and h(k2) = i2] = 1/m^2
What is the probability of collision? Sum, over every cell i, the probability that both keys land in cell i: m * (1/m^2) = 1/m.
This is enough to ensure that the expected number of collisions is the same as in the random case.

Ensuring good average performance
The chance that two keys fall into the same slot is 1/m – just as if the hash function were truly random!
Claim: when using a universal hash family H, the expected number of look-ups in any hash operation is n/m (as in the random case).
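The claim follows from linearity of expectation over the n keys already in the table T (notation as in the previous slides):

```latex
\[
\mathbb{E}\big[\#\{\,k' \in T : h(k') = h(k)\,\}\big]
\;=\; \sum_{k' \in T,\; k' \neq k} \Pr_h\big[h(k') = h(k)\big]
\;\le\; \sum_{k' \in T,\; k' \neq k} \frac{1}{m}
\;\le\; \frac{n}{m}.
\]
```

Each of the n stored keys collides with k with probability at most 1/m, so the expected chain length scanned by a Search is at most n/m.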

Hash table vs. balanced tree
Hash table – if we have an estimate of n, the number of elements inserted into the table, we can choose the table size proportional to n. Then we get constant-time performance no matter how many elements we have: 10^6, 10^8, 10^10, or more...
Balanced tree – the performance of a balanced tree, O(log n), is affected by the number of elements we have! The more elements, the slower the operations. For very large numbers, like 10^10, this makes a difference.

Constructing a universal family
Choose p – a prime larger than all keys. For any a, b in Z_p = {0,...,p-1}, define a hash function:
h_a,b(k) = (a * k + b) mod p
The universal family: H_p = { h_a,b(·) | a, b in Z_p }
Theorem: H_p is a universal family of hash functions.
Proof idea: for each k1 ≠ k2 and each i1, i2 there is exactly one solution (a, b) to the equations a*k1 + b = i1 (mod p) and a*k2 + b = i2 (mod p), so every pair of values (i1, i2) is obtained with probability exactly 1/p^2.
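The family above can be sketched in Java as follows (class and method names are my own; p must be a prime larger than all keys, so the small p below is only for demo-sized keys):

```java
import java.util.Random;

// Sketch of the universal family H_p = { h_{a,b}(k) = (a*k + b) mod p }.
// p is a prime larger than all keys; a and b are drawn uniformly from
// Z_p = {0,...,p-1} when the hash function is chosen at program start.
public class UniversalHash {
    private final long p, a, b;

    public UniversalHash(int p, Random rnd) {
        this.p = p;
        // Random choice of one concrete function h_{a,b} from the family
        this.a = rnd.nextInt(p);
        this.b = rnd.nextInt(p);
    }

    // h_{a,b}(k) = (a*k + b) mod p, a value in {0,...,p-1}
    public long hash(long k) {
        return (a * k + b) % p;
    }
}
```

Note that once (a, b) is drawn, the function is fixed and deterministic for the whole run, exactly as in the "order of execution" slide; in a real table the value would be reduced further, mod the table size m, but the slides keep the range Z_p for the analysis.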

Average over inputs – exact analysis
In universal hashing we make no assumptions about the input (i.e., for any input, most hash functions will handle it well). For example, we don't know a priori how the grades are distributed (surely they are not uniform over 0-100).
If we know that the input has a specific distribution, we might want to use that. For example, if the input is uniformly distributed, then the simple division method already achieves simple uniform hashing.
In general we don't know the input's distribution, and so universal hashing is superior!

Programming exercise 1 – issues
- Encapsulation – hide the internal structure from the user! Easy maintenance using building blocks. Protect the system from its users – the user can't cause inconsistencies.
- Reusable data structures – the DS implementation should fit all objects, not just books.
- Sorting in a sorted map – a sorted map is, by definition, sorted only by key. Sorting by the value as well is a violation of the 'contract'.
- Documentation – write informative documentation of methods and data members ("//data members" is not informative).
- Naming – different naming conventions for local variables vs. data members improve readability.

Programming exercise 1 – issues (continued)
- Debugging – a major part of the exercise. Test different inputs as well as load tests.
- equals vs. == : == compares the addresses of objects; equals compares the values of the strings.
The complication – compiler optimizations might place a constant string that is pointed at from different references at the same address. In that case, == and equals give the same result. We cannot count on this: using new allocates different addresses, and different compilers behave differently.
If your code failed many tests because of this mistake, fix the problematic comparisons and resubmit only the fixed files (say exactly where the problem was).
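The == pitfall fits in a few lines (a hypothetical snippet; whether the literal comparison "works" depends on string interning, which is exactly why it must not be relied on):

```java
// Demonstrates == (reference comparison) vs. equals (value comparison).
public class StringCompare {
    public static void main(String[] args) {
        String s1 = new String("hash");  // 'new' always allocates a fresh object
        String s2 = new String("hash");

        System.out.println(s1 == s2);        // false: different addresses
        System.out.println(s1.equals(s2));   // true: same character values

        // Constant strings may be interned to the same address, so ==
        // can "accidentally" agree with equals here - but equals is the
        // only correct way to compare string contents.
        String c1 = "hash";
        String c2 = "hash";
        System.out.println(c1.equals(c2));   // true, and safe
    }
}
```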

Theoretical homework 2, Question 2 – finding an element
Let A be a data structure consisting of integers, and x an integer. The function IsMember gets A and x as input, and returns true if x appears in A, false otherwise.
(a) Assume A is an array. Write pseudo-code for the IsMember function and analyze its worst-case complexity.
Solution: go over the elements of A and compare each to x; stop when x is found or when all the elements of A have been checked. Worst-case time complexity: O(n).
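In Java rather than pseudo-code, the O(n) scan might look like this (class and method names assumed):

```java
// Linear-scan IsMember for an unsorted array: O(n) worst case,
// since in the worst case every element is compared to x.
public class LinearSearch {
    public static boolean isMember(int[] a, int x) {
        for (int v : a)               // compare each element to x
            if (v == x) return true;  // stop as soon as x is found
        return false;                 // all elements checked, x not present
    }
}
```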

Theoretical homework 2, Question 2
(b) Assume we can preprocess the contents of the array A in order to decrease the cost of IsMember. We can store the result of the preprocessing in a new data structure B.
(i) Indicate what data structure B and pre-processing function PreProcess can be used to speed up IsMember. What is the worst-case complexity of PreProcess?
(ii) Write a function IsMember that takes advantage of the data structure B. What is its worst-case complexity?
What is pre-processing and what is it good for? Think about the following scenario: a site keeps many lists of members from different schools, movements, workplaces, etc., some of them very popular.

Theoretical homework 2, Question 2
Once you are registered to a group, you can log in with your unique login as many times as you like and see what's up. Identification is done by calling IsMember. As the lists become longer, identification becomes slower. As the site's manager, you decide to organize the lists more efficiently so that IsMember works fast. What you need is a pre-processing function.
Pre-processing – the idea is to organize the data in a certain way before starting to use it, so that the common / online operations work fast.

Theoretical homework 2, Question 2
Solution to 2-b-i:
Solution 1: PreProcess – sort the array, O(n log n). IsMember – binary search, O(log n).
Do you have another solution?
Solution 2: PreProcess – build a hash table, O(n). IsMember – look for the key in the hash table, O(1) on average.
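Both solutions can be sketched in Java with the standard library (a sketch, not the required submission; names are my own):

```java
import java.util.Arrays;
import java.util.HashSet;

// Two pre-processing strategies for speeding up IsMember.
public class PreprocessedMember {

    // Solution 1: PreProcess = sort, O(n log n);
    // IsMember = binary search on the sorted array, O(log n).
    public static boolean isMemberSorted(int[] sortedA, int x) {
        return Arrays.binarySearch(sortedA, x) >= 0;
    }

    // Solution 2: PreProcess = build a hash set B, O(n) expected.
    public static HashSet<Integer> preprocess(int[] a) {
        HashSet<Integer> b = new HashSet<>();
        for (int v : a) b.add(v);
        return b;
    }

    // IsMember = hash lookup in B, O(1) on average.
    public static boolean isMemberHashed(HashSet<Integer> b, int x) {
        return b.contains(x);
    }
}
```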

Theoretical homework 2, Question 2
(c) Assume now that the array A is sorted but its contents are not known in advance. Write a new function IsMember whose worst-case complexity is O(log i) when x appears in A[i], and O(log n) when x does not appear in A.
Solution: first determine a range containing x (lower and upper bounds) in O(log i).
Can we just run a binary search on all of A? No – a binary search over the whole array costs O(log n) even when i is small, which does not meet the O(log i) requirement.
So, what can we do? Double an upper bound until it passes x:
Initialize: u = 1
While u < A.size and A[u] < x
    u = 2*u
If u > A.size
    u = A.size
Perform binary search for x in A[1…u]
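The doubling search above, written as Java over a 0-indexed array (a sketch; the slides use 1-indexing):

```java
import java.util.Arrays;

// IsMember for a sorted array in O(log i) when x sits at index i:
// first double an upper bound u until a[u] >= x (or u leaves the array),
// then binary-search only the prefix a[0..u].
public class ExponentialSearch {
    public static boolean isMember(int[] a, int x) {
        if (a.length == 0) return false;
        int u = 1;
        // Doubling phase: at most about log(i) + 1 steps
        while (u < a.length && a[u] < x)
            u *= 2;
        u = Math.min(u, a.length - 1);
        // Binary search restricted to a[0..u]: O(log u) = O(log i)
        return Arrays.binarySearch(a, 0, u + 1, x) >= 0;
    }
}
```

The order of the loop conditions matters: `u < a.length` must be checked first so that `a[u]` is never read out of bounds once u doubles past the end.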

Theoretical homework 2, Question 2 – complexity analysis
If x is at index i, then to reach the upper bound u we perform at most ⌈log i⌉ + 1 doubling operations.
Next we perform a binary search on A[1…u], with complexity O(log u).
Since u < 2*i, we get O(log u) = O(log i).