Tirgul 10 Rehearsal about Universal Hashing Solving two problems from theoretical exercises: –T2 q. 1 –T3 q. 2.


Universal Hashing Starting point: for every hash function, there is a “really bad” input. A possible solution: choose the hash function randomly from a family of hash functions. The logic behind it: for any given input, we want “most” of the hash functions in our family to handle it with few collisions. [Figure: our family of hash functions, from which one specific hash function is chosen, e.g. h_{10,5}(), h_{2,13}(), h_{24,82}(), h_{68,53}().]

Demonstration Let us conduct an experiment: –A family of about 10,000 hash functions (the family you saw in class, details later on). –One fixed input (50 keys), inserted into a table of size 70. (Student grades of exercises - dast 2003) –Question: how many functions will behave really badly? The next slide shows the results: the x-axis is the number of collisions (in this case it was also equal to the number of pairs that collide), and the y-axis is how many functions had that number of collisions.

Results

Most functions perform close to the average. The average number of collisions is 8-9 [all entries of the hash table always contained at most two elements, so the number of collisions is actually the number of entries with more than one element]. Very, very few functions had more than twice this number of collisions (or less than half). This is no accident! –We constructed the family of functions so that the average performance of all the functions over any input will be good. –Probability laws (e.g. the Markov inequality you saw in class) tell us that very few elements of a universe will behave much worse than the average.

A good family of hash functions Conclusion: Designing a family with good average performance is enough. We need to know two things: –A criterion that guarantees good average performance. –How to construct a family that satisfies this criterion.

Ensuring good average performance Definition: A family of hash functions H is universal if for any two distinct keys k1 and k2, and any two slots in the table y1 and y2, the probability (over a random choice of h from H) that h(k1) = y1 and h(k2) = y2 is at most 1/m^2 (m is the size of the hash table). Remark: This means that the chance that two keys will fall into the same slot is at most 1/m, just as if the hash function were truly random! Claim: When using a universal hash family H, the average time of any hash operation is at most n/m + 1 (n is the number of elements we insert into the table).
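The remark and the claim can be sketched as a short derivation. This is only an outline under the definition above; the symbols S (the set of n inserted keys) and C_k (the number of other keys that share a slot with key k) are introduced here for illustration and do not appear in the slides.

```latex
% Two distinct keys collide with probability at most 1/m:
\Pr_{h \in H}\bigl[h(k_1) = h(k_2)\bigr]
  = \sum_{y=0}^{m-1} \Pr_{h \in H}\bigl[h(k_1) = y \ \wedge\ h(k_2) = y\bigr]
  \le m \cdot \frac{1}{m^2} = \frac{1}{m}.

% Expected number of keys sharing a slot with k (linearity of expectation):
\mathbb{E}[C_k]
  = \sum_{k' \in S,\ k' \neq k} \Pr_{h \in H}\bigl[h(k) = h(k')\bigr]
  \le \frac{n}{m}.

% An operation on k scans k itself plus the C_k colliding keys:
\mathbb{E}[\text{time}] \le 1 + \frac{n}{m}.
```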

Is this better than a balanced tree? If we have an estimate of n, the number of elements we will insert into the table, and choose the table size m proportional to it, we get constant time performance on average, no matter how many elements we have: 10^6, 10^8, 10^10, or more... In contrast, the performance of a balanced tree, O(log n), is affected by the number of elements we have! The more elements we have, the slower the operations. For very large numbers, like 10^10, this makes a difference.

Constructing a universal family Choose p, a prime larger than all keys. For any a,b in Z_p = {0,...,p-1}, define a hash function: h_{a,b}(k) = ((a * k + b) mod p) mod m The universal family: H_{p,m} = { h_{a,b}() | a,b in Z_p } Theorem: H_{p,m} is a universal family of hash functions. In our demonstration, the set of keys was all possible grades. We chose p=101 and inserted 50 (real) grades into a hash table of size 70 (doing this for all the hash functions in H_{101,70} and counting collisions).
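The experiment can be reproduced roughly as follows. This is a minimal sketch, not the code used for the slides: the helper names (`make_hash`, `colliding_pairs`, `family_histogram`) are hypothetical, and since the real grade data is not available the keys are made-up distinct values in 0..100.

```python
import random
from collections import Counter


def make_hash(a, b, p, m):
    """Return the function h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return lambda k: ((a * k + b) % p) % m


def colliding_pairs(h, keys):
    """Count unordered pairs of keys that h sends to the same slot."""
    slot_sizes = Counter(h(k) for k in keys)
    return sum(s * (s - 1) // 2 for s in slot_sizes.values())


def family_histogram(keys, p, m):
    """Map 'number of colliding pairs' to 'how many functions of H_{p,m} give it'."""
    hist = Counter()
    for a in range(p):
        for b in range(p):
            hist[colliding_pairs(make_hash(a, b, p, m), keys)] += 1
    return hist


if __name__ == "__main__":
    rng = random.Random(0)
    keys = rng.sample(range(101), 50)  # assumption: made-up "grades", all distinct
    hist = family_histogram(keys, p=101, m=70)
    for pairs in sorted(hist):
        print(pairs, hist[pairs])
```

Note that the family as defined on the slide includes a = 0, for which every key lands in the same slot; those 101 functions show up as the far-right outliers of the histogram.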

A second approach - average over inputs In Universal Hashing we make no assumptions about the input (i.e. for any input, most hash functions will handle it well). For example, we don’t know a priori how the grades are distributed (surely they are not uniform over 0-100). If we know that the input has a specific distribution, we might want to use this. For example, if the input is uniformly distributed, then the simple division method will achieve simple uniform hashing. In general, we don’t know the input’s distribution, and so Universal Hashing is superior!

T2 q.1 Reminder - quicksort:
quicksort(A[1..n])
1. Choose a pivot p from A.
2. Re-arrange A s.t. all elements smaller than p are located before it in A, and all larger elements are after it.
3. Suppose now p is in slot k.
4. Recursively sort A[1..k-1] and A[k+1..n].
The connection to the previous discussion: If we choose the pivot randomly, we actually have a family of algorithms, from which we choose one. The average performance is good, and so, for any input, most algorithms will perform well!

T2 q.1 - continued Question: How many calls to the random number generator will we have in the worst case, and in the best case? Answer: The number of these calls will always be Θ(n)! Proof: Let us draw the recursion tree: An internal node represents a call to quicksort with an array of size at least 2. A leaf represents a call to quicksort with an array of size 1. Any internal node is also the father of a leaf that represents the pivot it used.

The recursion tree Any leaf represents a single element in the array. Therefore the number of leaves is exactly n. [The ordered array is actually the leaves, from left to right.] The random number generator is called once in every internal node. Therefore we actually ask: how many internal nodes are there? Let X be the number of internal nodes.

Proof (continued) Observation 1: X is at most n, since every internal node points to at least one leaf of its own (its pivot). Observation 2: X is at least n/3: –Divide the set of leaves into subsets according to their father. –Each subset contains at most 3 leaves (the pivot and at most two subarrays of size 1), and therefore there are at least n/3 subsets. Therefore: –X = no. of subsets >= n/3 Conclusion: n/3 <= X <= n, so X = Θ(n). Q.E.D.
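The bound can be checked empirically. The following is a minimal sketch, assuming distinct elements; for brevity it builds new lists instead of re-arranging A in place, and the `counter` argument is a hypothetical device for counting calls to the random number generator.

```python
import random


def quicksort(items, rng, counter):
    """Return a sorted copy of items (assumed distinct).

    counter[0] is incremented once per call to the random number generator,
    i.e. once per internal node of the recursion tree.
    """
    if len(items) <= 1:
        return list(items)  # leaf (or empty subarray): no random call
    counter[0] += 1
    pivot = items[rng.randrange(len(items))]
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller, rng, counter) + [pivot] + quicksort(larger, rng, counter)


if __name__ == "__main__":
    n = 1000
    data = list(range(n))
    random.Random(1).shuffle(data)
    counter = [0]
    quicksort(data, random.Random(2), counter)
    print(n / 3, "<=", counter[0], "<=", n)
```

Whatever pivots are drawn, the printed count stays between n/3 and n (for n >= 2), as the proof predicts.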

T3 q.2 Reminder - Red-Black trees: A binary search tree, with the following properties:
1. Every node has a color - either red or black.
2. The root is black.
3. Every leaf (empty child) is black.
4. Both children of a red node are black.
5. Every path from a node to a descendant leaf contains the same number of black nodes.
The black height of a tree is the number of black nodes in a path from the root to some leaf (not counting the root).

T3 q.2 - first part Question: What is the maximal number of internal nodes of an RB tree with black height h? First intuition: The path from the root to a leaf must contain exactly h black nodes (not counting the root). We want it to be long, so we can put a red node between every two black nodes. That’s the most we can do, since otherwise we’ll violate property 4. Important: This is just intuition, not a proof! A proof must show, by one or more arguments, without gaps between them, that the claim must be true. For example, in the above intuition there are two gaps: how do we know there is no way to make it even longer? And can we actually construct such a tree?

A maximal tree First part: showing we can actually construct such a tree: Take a complete binary tree with 2h+1 levels (the last level being the leaves). Color the root black, the second level red, the third level black, and so on. Number of internal nodes: 2^(2h) - 1 Notice that this is a valid RB tree (with black height h): Properties 1, 2 & 4 immediately hold. 3 - The number of levels is odd, and we colored the first level black, so the last level (leaves) is black too. 5 - All paths have alternating black & red nodes, and have the same length.

This tree is indeed maximal Claim: Any RB tree with black height h has at most 2h levels (ignoring the leaves). Proof: What is the no. of nodes in some path from the root to a leaf? All paths contain the same no. of black nodes, h in our case. Including the root, we have h+1 black nodes. There can be at most h red nodes by property 4 (each red node sits between two black nodes), therefore the path has at most 2h+1 nodes, or 2h if we ignore the leaf. Remark: A binary tree with 2h levels contains at most 2^(2h)-1 nodes. This happens when the tree is complete. Answer: An RB tree with black height h can have at most 2^(2h)-1 internal nodes.

T3 q.2 - second part Question: What is the minimal number of internal nodes of an RB tree with black height h? Claim: There cannot be any red nodes in the minimal RB tree. Proof: Suppose there is a red node, x. It must have two black sons. We can delete x and one of its sub-trees, T1, and connect x’s father to the other sub-tree, T2. The only property we need to check is 5: x is red, so removing it does not change the number of black nodes on any path through T2, and for any path in T1 there is a path with the same number of black nodes in T2. So property 5 holds, and we get a valid RB tree with the same black height and fewer nodes. Therefore the original tree wasn’t minimal.

T3 q.2 - second part (continued) Claim: An RB tree with no red nodes must be a complete binary tree. Proof: Consider only the internal nodes. If this tree is not complete, there are missing nodes at the last level. Then there is a node with two paths to a leaf, with different lengths. Since all nodes are black, this violates property 5. Answer: There is a single RB tree of black height h with minimal no. of internal nodes. It has 2^h - 1 internal nodes.
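Both answers can be cross-checked by a small recursion over properties 3-5. This is a sketch written for this purpose, not part of the exercise solution; the function names `black_root` and `red_root` are hypothetical. For a subtree root, k denotes the number of black nodes on every path from the root down to a leaf, not counting the root itself.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def black_root(k, best):
    """Extreme (max or min, chosen by `best`) number of internal nodes
    in a valid subtree whose root is black and has k black nodes below it."""
    if k == 0:
        return 0  # a black node with no black node below it is a leaf
    # Each child is either black (k - 1 black nodes below it) or red (still k).
    child = best(black_root(k - 1, best), red_root(k, best))
    return 1 + 2 * child


@lru_cache(maxsize=None)
def red_root(k, best):
    """Same, for a red root (k >= 1); by property 4 both children are black."""
    return 1 + 2 * black_root(k - 1, best)


if __name__ == "__main__":
    for h in range(1, 6):
        # The root of the whole tree is black (property 2).
        print(h, black_root(h, min), black_root(h, max))
```

For each black height h, the two printed numbers can be compared with 2^h - 1 and 2^(2h) - 1.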