Randomized Algorithms CS648


1 Randomized Algorithms CS648
Lecture 11 Hashing - I

2 Problem Definition
$U = \{1, 2, \dots, m\}$ is called the universe; $S \subseteq U$ and $s = |S|$, with $s \ll m$. Example: $m = \dots$, $s = 10^3$. Aim: maintain a data structure storing $S$ that supports the search query “Does $i \in S$?” for any given $i \in U$.

3 Solutions
Solutions with worst-case guarantees: for static $S$, an array storing $S$ in sorted order; for dynamic $S$, height-balanced search trees (AVL trees, red-black trees, …). Time per operation: $O(\log s)$; space: $O(s)$. Alternative: time per operation $O(1)$, space $O(m)$ (an array indexed directly by the elements of $U$). Solutions used in practice with no worst-case guarantees: hashing.
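The two worst-case solutions for static $S$ can be sketched as follows. This is a minimal illustration, assuming $U = \{1, \dots, m\}$; the function names are hypothetical, and the direct-addressing array uses one byte per element of $U$, which is still $O(m)$ space.

```python
from bisect import bisect_left


def build_sorted(S):
    """Static S stored as a sorted array: O(s) space."""
    return sorted(S)


def search_sorted(arr, i):
    """Binary search: O(log s) time."""
    k = bisect_left(arr, i)
    return k < len(arr) and arr[k] == i


def build_bitmap(S, m):
    """Direct addressing over U = {1, ..., m}: O(m) space."""
    bits = bytearray(m + 1)  # index 0 unused
    for i in S:
        bits[i] = 1
    return bits


def search_bitmap(bits, i):
    """A single array access: O(1) time."""
    return bits[i] == 1


S = [42, 7, 1000, 3]
arr = build_sorted(S)
bits = build_bitmap(S, m=1000)
print(search_sorted(arr, 7), search_sorted(arr, 8))
print(search_bitmap(bits, 1000), search_bitmap(bits, 999))
```

The trade-off on the slide is visible here: the sorted array is compact but needs $O(\log s)$ comparisons, while the bitmap answers in one access but its size depends on $m$, not $s$.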

4 Hashing
Hash table $T$: an array of size $n$. Hash function $h : U \to [n]$. Answering the query “Does $i \in S$?”: $k \leftarrow h(i)$; search the list stored at $T[k]$. Properties of $h$: $h(i)$ is computable in $O(1)$ time, and the space required by $h$ is $O(1)$.
[Figure: the elements of $S$ stored in lists attached to the slots of the table $T$.]
How many bits are needed to encode $h$?
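A hash table with chaining, as described on this slide, might look like the sketch below. It assumes $[n] = \{0, \dots, n-1\}$ and uses the hypothetical class name `ChainedHashTable`; any function from $U$ to $[n]$ can be plugged in as `h`.

```python
class ChainedHashTable:
    """Table T of size n; T[k] holds the list of elements hashed to k."""

    def __init__(self, n, h):
        self.n = n
        self.h = h  # any function U -> {0, ..., n-1}
        self.T = [[] for _ in range(n)]

    def insert(self, i):
        k = self.h(i)
        if i not in self.T[k]:
            self.T[k].append(i)

    def contains(self, i):
        # k <- h(i); then search the list stored at T[k]
        k = self.h(i)
        return i in self.T[k]


n = 8
table = ChainedHashTable(n, lambda i: i % n)  # h(i) = i mod n
for x in [3, 11, 20, 5]:
    table.insert(x)
print(table.contains(11), table.contains(19))
print(table.T[3])
```

Here 3 and 11 both land in slot 3, so a query for 19 (also slot 3) must scan that list: the search cost is exactly the number of colliding elements.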

5 Collision
Definition: Two elements $i, j \in U$ are said to collide under hash function $h$ if $h(i) = h(j)$. Worst-case time complexity of searching an item $i$: the number of elements in $S$ colliding with $i$. A discouraging fact: no hash function can be found which is good for all $S$. Proof: by the pigeonhole principle, at least $m/n$ elements of $U$ are mapped to a single index of $T$; if $m/n \ge s$, choosing $S$ among these elements makes all $s$ of them collide.
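The pigeonhole argument is constructive, so it can be sketched directly: for any fixed $h$, group $U$ by hash value and take $S$ from the largest group. The helper `adversarial_set` and the example function $h(i) = (7i + 3) \bmod n$ are hypothetical illustrations, and scanning all of $U$ is only feasible for small $m$.

```python
from collections import defaultdict


def adversarial_set(h, m, s):
    """For any h: {1..m} -> [n], return s elements of U that all collide.

    Some index receives at least m/n elements of U (pigeonhole);
    if m/n >= s, S can be taken entirely from that index.
    """
    preimages = defaultdict(list)
    for i in range(1, m + 1):
        preimages[h(i)].append(i)
    biggest = max(preimages.values(), key=len)
    if len(biggest) < s:
        raise ValueError("m/n is smaller than s for this h")
    return biggest[:s]


m, n, s = 10_000, 100, 50
S = adversarial_set(lambda i: (7 * i + 3) % n, m, s)
print(len({(7 * x + 3) % n for x in S}), "distinct slot(s) used by", len(S), "elements")
```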


7 Hashing
A very popular heuristic since the 1950s. It achieves $O(1)$ search time in practice, but its worst-case guarantee on search time is only $O(s)$. Question: can we have hashing that ensures $O(1)$ worst-case search time, $O(s)$ space, and expected $O(s)$ preprocessing time? The following result gave an answer in the affirmative ☺ Michael Fredman, János Komlós, Endre Szemerédi. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM, Volume 31, Issue 3, 1984.

8 Why does hashing work so well in practice?

9 Why does hashing work so well in practice?
Question: What is the simplest hash function $h : U \to [n]$? Answer: $h(i) = i \bmod n$. Hashing works so well in practice because the set $S$ is usually a uniformly random subset of $U$. Let us give a theoretical justification for this fact.

10 Why does hashing work so well in practice?
Let $y_1, y_2, \dots, y_s$ denote the $s$ elements selected uniformly at random from $U$ to form $S$. Question: What is the expected number of elements colliding with $y_1$? Answer: Let $y_1$ take the value $i$. $P(y_j \text{ collides with } y_1) = \;??$ How many possible values can $y_j$ take? How many possible values can collide with $i$?
[Figure: the universe $1, 2, \dots, m$, with the values $\dots, i-n, i, i+n, i+2n, i+3n, \dots$ highlighted.]

11 Why does hashing work so well in practice?
Let $y_1, y_2, \dots, y_s$ denote the $s$ elements selected uniformly at random from $U$ to form $S$. Question: What is the expected number of elements colliding with $y_1$? Answer: Let $y_1$ take the value $i$. Under the hash function $h(x) = x \bmod n$, the only values that may collide with $i$ are $\dots, i-n, i, i+n, i+2n, i+3n, \dots$, and fewer than $m/n$ of them differ from $i$. Since $y_j$ ($j \ge 2$) takes one of the remaining $m-1$ values uniformly at random,
$$P(y_j \text{ collides with } y_1) \le \frac{m/n}{m-1}.$$
Expected number of elements of $S$ colliding with $y_1$:
$$\sum_{j=2}^{s} P(y_j \text{ collides with } y_1) \le \frac{m/n}{m-1}\,(s-1) = O(1) \quad \text{for } n = \Omega(s).$$
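A small Monte Carlo experiment can make this calculation concrete. The sketch below assumes illustrative values $m = 10^6$ and $n = s = 1000$ (so $n = \Omega(s)$ holds) and a fixed random seed; the function name is hypothetical. It draws $S$ uniformly at random, counts how many of $y_2, \dots, y_s$ collide with $y_1$ under $h(x) = x \bmod n$, and compares the average with the bound above.

```python
import random


def avg_collisions_with_first(m, n, s, trials, rng):
    """Monte Carlo estimate of the expected number of elements of S colliding
    with y_1, for S a uniformly random s-subset of U = {1, ..., m} and
    h(x) = x mod n."""
    total = 0
    for _ in range(trials):
        ys = rng.sample(range(1, m + 1), s)  # y_1, ..., y_s, all distinct
        total += sum(1 for y in ys[1:] if y % n == ys[0] % n)
    return total / trials


m, n, s = 10**6, 1000, 1000  # illustrative values with n = s
rng = random.Random(1)
est = avg_collisions_with_first(m, n, s, trials=1000, rng=rng)
bound = (m / n) / (m - 1) * (s - 1)  # the upper bound derived above
print(f"estimate {est:.3f}, bound {bound:.3f}")
```

The estimate should hover around 1, matching the $O(1)$ conclusion; it only holds because $S$ is random, not chosen with knowledge of $h$.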

12 Why does hashing work so well in practice?
Conclusion: $h(i) = i \bmod n$ works so well because, for a uniformly random subset of $U$, the expected number of collisions at an index of $T$ is $O(1)$. However, it is easy to fool this hash function so that searches take $\Theta(s)$ time (do it as a simple exercise). This makes us ask: “How can we achieve worst-case $O(1)$ search time for a given set $S$?”

13 How to achieve worst-case $O(1)$ search time

14 Key idea to achieve worst-case $O(1)$ search time
Observation: Of course, no single hash function is good for every possible $S$, but we may strive for a hash function which is good for a given $S$. A promising direction: find a set $H$ of hash functions such that, for any given $S$, many of them are good; then select a function randomly from $H$ and try it on $S$. The notion of goodness is captured formally by a universal hash family, defined on the following slide.

15 Universal Hash Family

16 Universal Hash Family
Definition: A collection $H$ of hash functions is said to be universal if there exists a constant $c$ such that for any $i, j \in U$ with $i \ne j$,
$$\mathbf{P}_{h \in_r H}\left[h(i) = h(j)\right] \le \frac{c}{n}.$$
Fact: The set of all functions from $U$ to $[n]$ is a universal hash family (do it as homework). Question: Can we use the set of all functions as a universal hash family in real life? Answer: No. There are $n^m$ possible functions, and every pair of them must have different encodings, i.e., differ in at least one bit. Hence at least one of them requires about $m \log n$ bits to encode (in fact, almost all of them do). So the space occupied by a randomly chosen hash function is too large ☹. Question: Does there exist a universal hash family whose hash functions have a compact encoding?
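The counting argument can be made explicit. There are $n^m$ functions from $U$ to $[n]$, and distinct functions need distinct bit strings. For any integer $k$, fewer than $2^k$ bit strings have length less than $k$; taking $k = \lfloor m \log_2 n \rfloor - t$ for an integer $t \ge 0$,
$$\#\{h : \text{the encoding of } h \text{ has fewer than } k \text{ bits}\} < 2^{\lfloor m \log_2 n \rfloor - t} \le n^m \cdot 2^{-t},$$
so all but a $2^{-t}$ fraction of the functions need at least $\lfloor m \log_2 n \rfloor - t$ bits. For instance, with the illustrative values $m = 2^{32}$ and $n = 2^{10}$ (assumed here, not from the lecture), a randomly chosen function almost surely needs close to $10 \cdot 2^{32}$ bits, i.e., roughly 5 GiB.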

17 Universal Hash Family
There indeed exist many $c$-universal hash families with compact hash functions ☺ Example: let $h_a : U \to [n]$ be defined as $h_a(i) = (a i \bmod p) \bmod n$. Then $H = \{h_a \mid 1 \le a \le p-1\}$ is $c$-universal. This looks complicated. In the next class we shall show that it is very natural and intuitive. For today's lecture, you don't need it ☺
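For small parameters, the universality of $H = \{h_a\}$ can be checked exhaustively: for every pair $i \ne j$, count how many choices of $a$ make them collide. This sketch assumes $p$ is a prime larger than $m$ (a condition the slide leaves implicit) and uses the illustrative values $m = 100$, $p = 101$, $n = 10$; the helper names are hypothetical. It is a check of one instance, not a proof of the constant $c$.

```python
def make_h(a, p, n):
    """Hypothetical helper: returns h_a(i) = ((a * i) mod p) mod n."""
    return lambda i: (a * i % p) % n


def worst_pair_collisions(m, p, n):
    """Max over pairs i < j of U = {1, ..., m} of #{a : h_a(i) == h_a(j)},
    where a ranges over 1, ..., p - 1 (exhaustive, so only for small m and p)."""
    family = [make_h(a, p, n) for a in range(1, p)]
    worst = 0
    for i in range(1, m + 1):
        for j in range(i + 1, m + 1):
            count = sum(1 for h in family if h(i) == h(j))
            worst = max(worst, count)
    return worst


m, p, n = 100, 101, 10  # assumption: p is a prime larger than m
worst = worst_pair_collisions(m, p, n)
print(f"worst pair collides for {worst} of {p - 1} choices of a")
print(f"max collision probability = {worst / (p - 1)}, c/n with c = 2 gives {2 / n}")
```

Note how compact each function is: $h_a$ is described by the single number $a$, i.e., $O(\log p)$ bits, instead of the roughly $m \log n$ bits needed for an arbitrary function.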

18 Static Hashing: worst-case $O(1)$ search time

19 The Journey
One milestone in our journey: a perfect hash function using a hash table of size $O(s^2)$. Tools needed: a universal hash family where $c$ is a small constant, and elementary probability.

20 Perfect hashing using $O(s^2)$ space
Let $H$ be a universal hash family, and let $X$ be the number of collisions for $S$ when $h \in_r H$. Question: What is $\mathbf{E}[X]$?
$$X_{i,j} = \begin{cases} 1 & \text{if } h(i) = h(j) \\ 0 & \text{otherwise} \end{cases} \qquad X = \sum_{i<j,\; i,j \in S} X_{i,j}$$
$$\mathbf{E}[X] = \sum_{i<j,\; i,j \in S} \mathbf{E}[X_{i,j}] = \sum_{i<j,\; i,j \in S} \mathbf{P}[X_{i,j} = 1] \le \sum_{i<j,\; i,j \in S} \frac{c}{n} = \frac{c}{n} \cdot \frac{s(s-1)}{2}$$
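The random variable $X$ can be computed in two equivalent ways: summing the indicators $X_{i,j}$ over pairs, or noting that a slot holding $b$ elements of $S$ contributes $b(b-1)/2$ colliding pairs. The sketch below assumes the family $h_a(i) = (a i \bmod p) \bmod n$ with $p = 101$, $n = 10$, and $c = 2$, and an arbitrary illustrative set $S$; averaging over every $a$ gives $\mathbf{E}[X]$ exactly for that family.

```python
from collections import Counter
from itertools import combinations


def collisions_pairwise(S, h):
    """X = sum of X_{i,j} over pairs i < j in S, X_{i,j} = 1 iff h(i) == h(j)."""
    return sum(1 for i, j in combinations(S, 2) if h(i) == h(j))


def collisions_by_bucket(S, h):
    """Same X: a slot holding b elements of S contributes b(b-1)/2 pairs."""
    sizes = Counter(h(i) for i in S).values()
    return sum(b * (b - 1) // 2 for b in sizes)


# Assumed family: h_a(i) = ((a*i) mod p) mod n, a uniform in {1, ..., p-1}.
p, n = 101, 10
family = [lambda i, a=a: (a * i % p) % n for a in range(1, p)]
S = [3, 14, 15, 92, 65, 35, 89, 79]
s = len(S)

xs = [collisions_pairwise(S, h) for h in family]
assert xs == [collisions_by_bucket(S, h) for h in family]
mean_X = sum(xs) / len(xs)  # exact E[X] over the uniform choice of a
print(f"E[X] = {mean_X}, bound (c/n) s(s-1)/2 with c = 2: {2 / n * s * (s - 1) / 2}")
```

The bucket form is the one worth using in practice: it counts collisions in $O(s + n)$ time instead of enumerating all $s(s-1)/2$ pairs.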

21 Perfect hashing using $O(s^2)$ space
Let $H$ be a universal hash family, and let $X$ be the number of collisions for $S$ when $h \in_r H$. Lemma 1: $\mathbf{E}[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$. Question: How large should $n$ be to achieve no collision? Question: How large should $n$ be to achieve $\mathbf{E}[X] \le \frac{1}{2}$? Answer: pick $n = c s^2$.
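As a quick check of this answer, substituting $n = c s^2$ into the bound of Lemma 1 gives
$$\mathbf{E}[X] \le \frac{c}{c s^2} \cdot \frac{s(s-1)}{2} = \frac{s-1}{2s} < \frac{1}{2}.$$
For example, with $s = 10^3$ (the example size from the problem definition) and an assumed constant $c = 2$, the table has $n = 2 \cdot 10^6$ slots and $\mathbf{E}[X] \le 999/2000$.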

22 Perfect hashing using $O(s^2)$ space
Let $H$ be a universal hash family, and let $X$ be the number of collisions for $S$ when $h \in_r H$. Lemma 1: $\mathbf{E}[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$. Observation: $\mathbf{E}[X] \le \frac{1}{2}$ when $n = c s^2$. Question: What is the probability of no collision when $n = c s^2$? Answer: “No collision” $\iff$ “$X = 0$”. By Markov's inequality, $\mathbf{P}(X \ge 1) \le \mathbf{E}[X] \le \frac{1}{2}$, so
$$\mathbf{P}(\text{no collision}) = \mathbf{P}(X = 0) = 1 - \mathbf{P}(X \ge 1) \ge 1 - \frac{1}{2} = \frac{1}{2}.$$
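The bound $\mathbf{P}(\text{no collision}) \ge \frac{1}{2}$ can be checked exactly for a small instance by enumerating every hash function in the family. This sketch assumes the family $h_a(i) = (a i \bmod p) \bmod n$ with $p = 101$, assumes $c = 2$ for it, and uses an arbitrary 5-element set $S$; the helper name is hypothetical.

```python
from itertools import combinations


def num_collisions(S, a, p, n):
    """X for the assumed family member h_a(i) = ((a*i) mod p) mod n."""
    return sum(1 for i, j in combinations(S, 2)
               if (a * i % p) % n == (a * j % p) % n)


p = 101                  # assumed prime, larger than every key
S = [12, 27, 58, 73, 96]
c, s = 2, len(S)         # c = 2 is an assumption about this family
n = c * s * s            # n = c s^2 = 50
zero = sum(1 for a in range(1, p) if num_collisions(S, a, p, n) == 0)
print(f"P(no collision) = {zero}/{p - 1}")
```

Markov's inequality is usually loose, so the exact fraction of collision-free functions is typically well above one half.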

23 Perfect hashing using $O(s^2)$ space
Let $H$ be a universal hash family. Lemma 2: for $n = c s^2$, there will be no collision with probability at least $\frac{1}{2}$.
Algorithm 1: Perfect hashing for $S$
Repeat
  pick $h \in_r H$;
  $t \leftarrow$ the number of collisions for $S$ under $h$;
Until $t = 0$.
Theorem: A perfect hash function for $S$ can be computed in expected $O(s^2)$ time (by Lemma 2, the expected number of iterations is at most 2, and each iteration takes $O(s^2)$ time). Corollary: a hash table occupying $O(s^2)$ space with worst-case $O(1)$ search time.
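Algorithm 1 translates almost line by line into code. This is a sketch under stated assumptions: $H$ is taken to be the family $h_a(i) = (a i \bmod p) \bmod n$ with $p = 1\,000\,003$ (a prime larger than every key) and $c = 2$ is assumed for it; the name `perfect_hash` is hypothetical.

```python
import random
from itertools import combinations


def perfect_hash(S, p, c, rng):
    """Algorithm 1 (sketch): repeat picking h_a at random until S has no collisions.

    Assumes p is a prime larger than every key and that the family
    h_a(i) = ((a*i) mod p) mod n is c-universal.
    """
    s = len(S)
    n = max(1, c * s * s)  # table size n = c s^2
    trials = 0
    while True:
        trials += 1
        a = rng.randrange(1, p)  # pick h uniformly at random from H
        t = sum(1 for i, j in combinations(S, 2)
                if (a * i % p) % n == (a * j % p) % n)
        if t == 0:
            return a, n, trials


rng = random.Random(0)
S = rng.sample(range(1, 10**6), 100)
p = 1_000_003  # a prime larger than every key
a, n, trials = perfect_hash(S, p, c=2, rng=rng)
table = [None] * n
for x in S:
    table[(a * x % p) % n] = x  # at most one element per slot
print(f"n = {n}, found after {trials} trial(s)")
```

A search for $i$ then computes $(a i \bmod p) \bmod n$ and compares a single slot, which is the worst-case $O(1)$ bound of the corollary; the price is the $c s^2$ slots.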

24 Hashing with $O(s)$ space and $O(1)$ worst-case search time
We have completed almost 90% of our journey. To achieve the goal of $O(s)$ space and worst-case $O(1)$ search time, here is the sketch (the details will be given at the beginning of the next class): use the same hashing scheme as in Algorithm 1, except with $n = O(s)$. Of course, there will be collisions; use an additional level of hash tables to take care of them. In the next class, we shall complete our algorithm for hashing with $O(s)$ space and $O(1)$ worst-case search time, and we shall present a very natural way to design various universal hash families.
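One way to fill in this sketch is shown below: a top-level table with $n = s$ slots (collisions allowed), and for each slot holding $b$ elements a secondary collision-free table of size $c b^2$ built with Algorithm 1. The secondary size $c b^2$, the family $h_a(x) = (a x \bmod p) \bmod n$, $p = 1\,000\,003$, $c = 2$, and all function names are assumptions for illustration; the sketch does not check that the total space stays $O(s)$.

```python
import random
from itertools import combinations


def h(a, p, n, x):
    """Assumed family member h_a(x) = ((a*x) mod p) mod n."""
    return (a * x % p) % n


def perfect_level(keys, p, c, rng):
    """Algorithm 1 on one bucket: a collision-free table of size c*b^2."""
    n = max(1, c * len(keys) ** 2)
    while True:
        a = rng.randrange(1, p)
        if all(h(a, p, n, x) != h(a, p, n, y) for x, y in combinations(keys, 2)):
            table = [None] * n
            for x in keys:
                table[h(a, p, n, x)] = x
            return a, table


def build_two_level(S, p, c, rng):
    """Top level with n = s slots (collisions allowed), then one
    perfect secondary table per slot."""
    n = max(1, len(S))
    a = rng.randrange(1, p)
    buckets = [[] for _ in range(n)]
    for x in S:
        buckets[h(a, p, n, x)].append(x)
    return a, n, [perfect_level(b, p, c, rng) for b in buckets]


def contains(structure, p, x):
    """Two hash evaluations and two array accesses, whatever S is."""
    a, n, second = structure
    a2, table = second[h(a, p, n, x)]
    return table[h(a2, p, len(table), x)] == x


rng = random.Random(0)
p = 1_000_003  # a prime larger than every key
S = rng.sample(range(1, 10**6), 1000)
struct = build_two_level(S, p, c=2, rng=rng)
print(all(contains(struct, p, x) for x in S))
print(sum(len(t) for _, t in struct[2]), "secondary slots for", len(S), "keys")
```

Printing the total number of secondary slots gives an empirical feel for the space used; whether it is $O(s)$ in expectation is exactly what remains to be shown.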

