Theory of Computational Complexity M1 Takao Inoshita Iwama & Ito Lab Graduate School of Informatics, Kyoto University.


Chapter 5 Balls, Bins and Random Graphs. Highlights of my part: ・ Balls-and-Bins Problem ・ Poisson Approximation ・ Some Applications

5.1 Example: The Birthday Paradox ・ There are 30 people in the room. ・ Do some two people share the same birthday, or do no two people share one? ・ The birthday of each person is a random day from a 365-day year, chosen independently and uniformly at random. ・ It is easier to think about the configurations where people do not share a birthday.

Calculation ・ One way to calculate this probability is to directly count the configurations where no two people share a birthday. ・ 30 days must be chosen from the 365, and these days can be assigned to the people in any of the 30! possible orders. ・ The whole sample space consists of $365^{30}$ equally likely patterns, so the probability that all 30 birthdays are distinct is $\binom{365}{30}\,30!/365^{30}$.

We can also calculate this probability one person at a time: the kth person avoids the first k-1 birthdays with probability $1-\frac{k-1}{365}$, so the probability that all 30 birthdays are distinct is $\prod_{k=1}^{29}\left(1-\frac{k}{365}\right) \approx 0.2937$. In general (in the case of m people and n possible birthdays), the probability is $\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right)$.

If m is small compared to n, we can use the approximation $1-\frac{k}{n}\approx e^{-k/n}$. Therefore, $\prod_{k=1}^{m-1}\left(1-\frac{k}{n}\right) \approx \prod_{k=1}^{m-1} e^{-k/n} = e^{-m(m-1)/2n} \approx e^{-m^2/2n}.$

Hence the value of m at which this probability becomes 1/2 is approximately $m = \sqrt{2n\ln 2}$; for n = 365 this gives $m \approx 22.49$.
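
As a quick numerical check (my addition, not part of the original slides), here is a minimal Python sketch comparing the exact product with the $e^{-m^2/2n}$ approximation:

```python
import math

def birthday_distinct_exact(m: int, n: int = 365) -> float:
    """Exact probability that m people all have distinct birthdays."""
    p = 1.0
    for k in range(1, m):
        p *= 1.0 - k / n
    return p

def birthday_distinct_approx(m: int, n: int = 365) -> float:
    """Approximation e^{-m^2 / 2n}, valid when m is small compared to n."""
    return math.exp(-m * m / (2 * n))

for m in (10, 22, 23, 30):
    print(m, birthday_distinct_exact(m), birthday_distinct_approx(m))
# m = 23 is the first value where the exact probability drops below 1/2,
# close to the approximate threshold sqrt(2 * 365 * ln 2) ~ 22.49.
```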

Let $E_k$ be the event that the kth person's birthday does not match any of the birthdays of the first k-1 people. Then the probability that the first k people fail to have distinct birthdays is $\Pr(\bar{E}_1 \cup \bar{E}_2 \cup \cdots \cup \bar{E}_k) \le \sum_{i=1}^{k}\Pr(\bar{E}_i) \le \sum_{i=1}^{k}\frac{i-1}{n} = \frac{k(k-1)}{2n}.$

This probability is less than 1/2 when $k \le \sqrt{n}$. Hence, with $\lfloor\sqrt{n}\rfloor$ people, the probability is at least 1/2 that all birthdays will be distinct.

Now assume that the first $\lceil\sqrt{n}\rceil$ people all have distinct birthdays. Each person after that has probability at least $\lceil\sqrt{n}\rceil/n \ge 1/\sqrt{n}$ of having the same birthday as one of these first $\lceil\sqrt{n}\rceil$ people. Hence the probability that the next $\lceil\sqrt{n}\rceil$ people all have different birthdays from the first $\lceil\sqrt{n}\rceil$ is at most $\left(1-\frac{1}{\sqrt{n}}\right)^{\lceil\sqrt{n}\rceil} \le 1/e$. Once there are $2\lceil\sqrt{n}\rceil$ people, the probability that all birthdays are distinct is at most 1/e.

5.2 Balls into Bins: The Balls-and-Bins Model. The birthday paradox is an example of a balls-and-bins problem. We have m balls that are thrown into n bins, with the location of each ball chosen independently and uniformly at random from the n bins. In this language, the birthday paradox asks whether or not there is a bin with two balls.

Restating the birthday result in balls-and-bins terms: in the case of m balls and n bins, for some $m = \Theta(\sqrt{n})$ (e.g., $m = 2\lceil\sqrt{n}\rceil$), at least one of the bins is likely to have more than one ball in it.

Lemma 5.1: When n balls are thrown independently and uniformly at random into n bins, the probability that the maximum load is more than $3\ln n/\ln\ln n$ is at most 1/n for n sufficiently large.

Proof: The probability that bin 1 receives at least M balls is at most $\binom{n}{M}\left(\frac{1}{n}\right)^M$. This follows from a union bound: there are $\binom{n}{M}$ distinct sets of M balls, and for any set of M balls the probability that all land in bin 1 is $(1/n)^M$. We now use the inequalities $\binom{n}{M}\left(\frac{1}{n}\right)^M \le \frac{1}{M!} \le \left(\frac{e}{M}\right)^M.$

Here the second inequality is a consequence of the following general bound on factorials: $k! \ge \left(\frac{k}{e}\right)^k$, since $e^k = \sum_{i=0}^{\infty}\frac{k^i}{i!} \ge \frac{k^k}{k!}.$

Applying a union bound again allows us to find that, for $M \ge 3\ln n/\ln\ln n$, the probability that any bin receives at least M balls is bounded above by $n\left(\frac{e}{M}\right)^M \le n\left(\frac{e\ln\ln n}{3\ln n}\right)^{3\ln n/\ln\ln n} \le n\left(\frac{\ln\ln n}{\ln n}\right)^{3\ln n/\ln\ln n} = e^{-2\ln n + 3(\ln n)(\ln\ln\ln n)/\ln\ln n} \le \frac{1}{n}$ for n sufficiently large.
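
To make Lemma 5.1 concrete, here is a small simulation sketch (my illustration, not from the slides) that throws n balls into n bins and compares the observed maximum load with the $3\ln n/\ln\ln n$ bound:

```python
import math
import random
from collections import Counter

def max_load(n: int) -> int:
    """Throw n balls into n bins uniformly at random; return the maximum load."""
    loads = Counter(random.randrange(n) for _ in range(n))
    return max(loads.values())

n = 100_000
bound = 3 * math.log(n) / math.log(math.log(n))
trials = [max_load(n) for _ in range(20)]
print("observed max loads:", trials)
print(f"3 ln n / ln ln n = {bound:.2f}")
# Lemma 5.1 says the max load exceeds the bound with probability at most 1/n.
```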

Application: Bucket Sort. Bucket Sort is an example of a sorting algorithm that breaks the $\Omega(n\log n)$ lower bound for standard comparison-based sorting and runs in expected linear time. Assumption: the input is restricted to a set of $n = 2^m$ integers chosen independently and uniformly at random from the range $[0, 2^k)$, where $k \ge m$.

First step of Bucket Sort: the n elements are distributed into n buckets, with an element's leading m binary digits selecting its bucket; the buckets are implemented as linked lists. [Figure: elements being placed into their buckets.]

Second step of Bucket Sort: sort each bucket with any standard quadratic-time algorithm (e.g., insertion sort). Concatenating the sorted lists from each bucket in order gives us the sorted order for the elements.
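
A minimal runnable sketch of this two-step algorithm (my illustration, not the slides' code); it assumes n is a power of two and the inputs are drawn from $[0, 2^k)$:

```python
import random

def bucket_sort(items: list[int], k: int) -> list[int]:
    """Sort n integers from [0, 2**k), n a power of two, in expected O(n) time."""
    n = len(items)
    m = n.bit_length() - 1              # n == 2**m
    buckets: list[list[int]] = [[] for _ in range(n)]
    for x in items:
        buckets[x >> (k - m)].append(x)  # leading m binary digits pick the bucket
    out: list[int] = []
    for b in buckets:
        # Insertion sort: quadratic in the bucket size, but buckets are small.
        for i in range(1, len(b)):
            j = i
            while j > 0 and b[j - 1] > b[j]:
                b[j - 1], b[j] = b[j], b[j - 1]
                j -= 1
        out.extend(b)
    return out

k = 20
data = [random.randrange(2 ** k) for _ in range(2 ** 10)]
assert bucket_sort(data, k) == sorted(data)
```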

Analysis of Bucket Sort: Assuming that each element can be placed in the appropriate bucket in O(1) time, the first step requires only O(n) time. Since the input is uniform at random, the number of elements that land in a specific bucket follows a binomial distribution B(n, 1/n). Bucket Sort falls naturally into the balls-and-bins model: an element is a ball, a bucket is a bin, and each placement costs O(1).

Analysis of Bucket Sort: Let $X_j$ be the number of elements that land in the jth bucket. We can sort the jth bucket in at most $cX_j^2$ time for some constant c. The expected time spent sorting in the second stage is at most $E\left[\sum_{j=1}^{n} cX_j^2\right] = cn\,E[X_1^2]$ by symmetry. Since $X_1$ is a binomial random variable B(n, 1/n), $E[X_1^2] = n(n-1)\frac{1}{n^2} + 1 = 2 - \frac{1}{n} < 2$ (from Section 3.2.1).

Analysis of Bucket Sort: Hence the total expected time spent in the second stage is at most 2cn, so Bucket Sort runs in expected linear time.

5.3 The Poisson Distribution. We want to determine the expected fraction of bins with r balls, for any r. First, we consider the particular case r = 0. The probability that the first bin remains empty when m balls are thrown is $\left(1-\frac{1}{n}\right)^m \approx e^{-m/n}.$

By symmetry this probability is the same for all bins. Let X be a random variable that represents the number of empty bins, and let $X_j$ be a random variable that is 1 when the jth bin is empty and 0 otherwise. Then $E[X] = \sum_{j=1}^{n} E[X_j] = n\left(1-\frac{1}{n}\right)^m \approx n e^{-m/n}.$ Thus, the expected fraction of empty bins is approximately $e^{-m/n}$.

In the general case, the probability that a given bin has r balls is $p_r = \binom{m}{r}\left(\frac{1}{n}\right)^r\left(1-\frac{1}{n}\right)^{m-r}.$ When m and n are large compared to r, this probability $p_r$ is approximately $p_r \approx \frac{e^{-m/n}}{r!}\left(\frac{m}{n}\right)^r.$

Definition 5.1: A discrete Poisson random variable X with parameter μ is given by the following probability distribution on j = 0, 1, 2, …: $\Pr(X = j) = \frac{e^{-\mu}\mu^j}{j!}.$ The probabilities in this distribution sum to 1: $\sum_{j=0}^{\infty}\frac{e^{-\mu}\mu^j}{j!} = e^{-\mu}\sum_{j=0}^{\infty}\frac{\mu^j}{j!} = e^{-\mu}e^{\mu} = 1.$

The expectation of this random variable is $E[X] = \sum_{j=0}^{\infty} j\,\frac{e^{-\mu}\mu^j}{j!} = \mu\sum_{j=1}^{\infty}\frac{e^{-\mu}\mu^{j-1}}{(j-1)!} = \mu.$
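
A short numerical sanity check of Definition 5.1 (my addition): the pmf sums to (nearly) 1 and the mean comes out to μ when truncated at a large j.

```python
import math

def poisson_pmf(j: int, mu: float) -> float:
    """Pr(X = j) for a Poisson random variable with parameter mu."""
    return math.exp(-mu) * mu ** j / math.factorial(j)

mu = 3.5
total = sum(poisson_pmf(j, mu) for j in range(100))
mean = sum(j * poisson_pmf(j, mu) for j in range(100))
print(total)  # ~1.0 (truncating the infinite sum at j = 100)
print(mean)   # ~3.5
```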

Lemma 5.2: The sum of a finite number of independent Poisson random variables is a Poisson random variable.

Proof: For two independent Poisson random variables X and Y with means $\mu_1$ and $\mu_2$, $\Pr(X+Y=j) = \sum_{k=0}^{j}\Pr(X=k)\Pr(Y=j-k) = \sum_{k=0}^{j}\frac{e^{-\mu_1}\mu_1^k}{k!}\cdot\frac{e^{-\mu_2}\mu_2^{j-k}}{(j-k)!} = \frac{e^{-(\mu_1+\mu_2)}}{j!}\sum_{k=0}^{j}\binom{j}{k}\mu_1^k\mu_2^{j-k} = \frac{e^{-(\mu_1+\mu_2)}(\mu_1+\mu_2)^j}{j!}.$ The case of more than two random variables is simply handled by induction.

Lemma 5.3: The moment generating function of a Poisson random variable with parameter μ is $M_X(t) = e^{\mu(e^t-1)}.$ Proof: For any t, $E[e^{tX}] = \sum_{k=0}^{\infty}\frac{e^{-\mu}\mu^k}{k!}e^{tk} = e^{-\mu}\sum_{k=0}^{\infty}\frac{(\mu e^t)^k}{k!} = e^{-\mu}e^{\mu e^t} = e^{\mu(e^t-1)}.$

Lemma 5.3 → Lemma 5.2: Given two independent Poisson random variables X and Y with means $\mu_1$ and $\mu_2$, the moment generating function of the sum is the product of the moment generating functions, so by Lemma 5.3, $M_{X+Y}(t) = M_X(t)M_Y(t) = e^{\mu_1(e^t-1)}e^{\mu_2(e^t-1)} = e^{(\mu_1+\mu_2)(e^t-1)},$ which is the moment generating function of a Poisson random variable with mean $\mu_1+\mu_2$. By Theorem 4.2, the moment generating function uniquely defines the distribution, and hence the sum X + Y is a Poisson random variable with mean $\mu_1+\mu_2$.

Theorem 5.4: Let X be a Poisson random variable with parameter μ. 1. If x > μ, then $\Pr(X \ge x) \le \frac{e^{-\mu}(e\mu)^x}{x^x}$. 2. If x < μ, then $\Pr(X \le x) \le \frac{e^{-\mu}(e\mu)^x}{x^x}$.

Proof of Theorem 5.4: For any t > 0 and x > μ, $\Pr(X \ge x) = \Pr(e^{tX} \ge e^{tx}) \le \frac{E[e^{tX}]}{e^{tx}}.$ Plugging in the expression for the moment generating function of the Poisson distribution, we have $\Pr(X \ge x) \le e^{\mu(e^t-1)-tx}.$ Choosing $t = \ln(x/\mu) > 0$ gives $\Pr(X \ge x) \le e^{x-\mu-x\ln(x/\mu)} = \frac{e^{-\mu}(e\mu)^x}{x^x}.$

Proof of Theorem 5.4 (continued): For any t < 0 and x < μ, $\Pr(X \le x) = \Pr(e^{tX} \ge e^{tx}) \le \frac{E[e^{tX}]}{e^{tx}}.$ Hence $\Pr(X \le x) \le e^{\mu(e^t-1)-tx}.$ Choosing $t = \ln(x/\mu) < 0$ gives $\Pr(X \le x) \le e^{x-\mu-x\ln(x/\mu)} = \frac{e^{-\mu}(e\mu)^x}{x^x}.$
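
A quick numeric check of Theorem 5.4's tail bound (my addition): the exact tail, computed by summing the pmf, indeed sits below the Chernoff-style bound.

```python
import math

def poisson_tail(x: int, mu: float) -> float:
    """Pr(X >= x) for Poisson(mu), computed by summing the pmf up to x - 1."""
    return 1.0 - sum(math.exp(-mu) * mu ** j / math.factorial(j) for j in range(x))

mu, x = 10.0, 20
bound = math.exp(-mu) * (math.e * mu) ** x / x ** x
print(poisson_tail(x, mu), bound)  # exact tail ~0.0035 vs bound ~0.021
```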

5.3.1 Limit of the Binomial Distribution. When throwing m balls randomly into b bins, the probability that a bin has r balls is approximately the Poisson distribution with mean m/b. In the limit, the binomial distribution $\binom{n}{k}p^k(1-p)^{n-k}$ approaches the Poisson distribution $\frac{e^{-np}(np)^k}{k!}$ when n is large and p is small.

Theorem 5.5: Let $X_n$ be a binomial random variable with parameters n and p, where p is a function of n and $\lim_{n\to\infty} np = \lambda$ is a constant that is independent of n. Then, for any fixed k, $\lim_{n\to\infty}\Pr(X_n = k) = \frac{e^{-\lambda}\lambda^k}{k!}.$

Consider the balls-and-bins problem where there are m balls and n bins, m being a function of n with $\lim_{n\to\infty} m/n = \lambda$. Let $X_m$ be the number of balls in a specific bin. Then $X_m$ is a binomial random variable with parameters m and 1/n. Applying Theorem 5.5 to the balls-and-bins problem recovers the earlier approximation $p_r \approx \frac{e^{-m/n}(m/n)^r}{r!}$.

Applications of Theorem 5.5 ・ Counting the number of spelling or grammatical mistakes in a document: each word is a trial, a mistake occurs with small probability p, and the number of words n is large. ・ Counting the number of chocolate chips inside a chocolate chip cookie. ・ Continuous settings (in Chapter 8).
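
A numerical illustration of Theorem 5.5 (my sketch): as n grows with np = λ held fixed, the binomial pmf approaches the Poisson pmf.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Pr(X_n = k) for a binomial random variable with parameters n and p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, k = 2.0, 3
for n in (10, 100, 1000, 10000):
    print(n, binom_pmf(k, n, lam / n), poisson_pmf(k, lam))
# The binomial column converges to the fixed Poisson value e^{-2} 2^3 / 3!.
```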

Proof of Theorem 5.5: We can write $\Pr(X_n = k) = \binom{n}{k}p^k(1-p)^{n-k}.$ Then, since $(n-k+1)^k \le n(n-1)\cdots(n-k+1) \le n^k$, we obtain $\Pr(X_n = k) \le \frac{(np)^k}{k!}\,\frac{(1-p)^n}{(1-p)^k}$ and $\Pr(X_n = k) \ge \frac{((n-k+1)p)^k}{k!}\,\frac{(1-p)^n}{(1-p)^k}.$

Proof of Theorem 5.5 (continued): Combining, we have $\frac{((n-k+1)p)^k}{k!}\,\frac{(1-p)^n}{(1-p)^k} \le \Pr(X_n = k) \le \frac{(np)^k}{k!}\,\frac{(1-p)^n}{(1-p)^k}.$

Proof of Theorem 5.5 (continued): In the limit, as n approaches infinity, p approaches zero because the limiting value of pn is the constant λ. It follows that $\lim_{n\to\infty}(1-p)^n = e^{-\lambda}$ and $\lim_{n\to\infty}(1-p)^k = 1$, while $\lim_{n\to\infty}((n-k+1)p)^k = \lim_{n\to\infty}(np)^k = \lambda^k$ for fixed k, so both bounds converge to $\frac{e^{-\lambda}\lambda^k}{k!}$. Since $\Pr(X_n = k)$ lies between these two values, the theorem follows.

5.4 The Poisson Approximation. In balls-and-bins problems, the load of one bin depends on the loads of the other bins (the loads must sum to m). We would like to treat the bin loads as independent Poisson random variables, for easier analysis. The key fact: the probability of an event computed under this Poisson approximation for all bins, multiplied by $e\sqrt{m}$, gives an upper bound for the probability of the event when m balls are thrown into n bins.

Theorem 5.6: Let $X_i^{(m)}$ be the number of balls in the ith bin when m balls are thrown into n bins, and let $Y_1^{(m)},\ldots,Y_n^{(m)}$ be independent Poisson random variables with mean m/n. The distribution of $(Y_1^{(m)},\ldots,Y_n^{(m)})$ conditioned on $\sum_i Y_i^{(m)} = k$ is the same as the distribution of $(X_1^{(k)},\ldots,X_n^{(k)})$, regardless of the value of m.

[Figure: difference between the exact balls-and-bins model and the approximation. In the exact model the bin loads always sum to m; in the approximation each bin holds an independent Poisson(m/n) number of balls, so the loads sum to m only on average.]

Proof: When throwing k balls into n bins, the probability that $(X_1^{(k)},\ldots,X_n^{(k)}) = (k_1,\ldots,k_n)$ for any $k_1,\ldots,k_n$ satisfying $\sum_i k_i = k$ is given by $\Pr\left((X_1^{(k)},\ldots,X_n^{(k)}) = (k_1,\ldots,k_n)\right) = \frac{k!}{k_1!\,k_2!\cdots k_n!\,n^k}.$

Now, for any $k_1,\ldots,k_n$ with $\sum_i k_i = k$, consider the probability that $(Y_1^{(m)},\ldots,Y_n^{(m)}) = (k_1,\ldots,k_n)$ conditioned on $\sum_i Y_i^{(m)} = k$: $\Pr\left((Y_1^{(m)},\ldots,Y_n^{(m)}) = (k_1,\ldots,k_n)\,\Big|\,\sum_i Y_i^{(m)} = k\right) = \frac{\Pr\left((Y_1^{(m)} = k_1)\cap\cdots\cap(Y_n^{(m)} = k_n)\right)}{\Pr\left(\sum_i Y_i^{(m)} = k\right)}.$

The probability that $Y_i^{(m)} = k_i$ for all i is $\prod_{i=1}^{n}\frac{e^{-m/n}(m/n)^{k_i}}{k_i!}$, since the $Y_i^{(m)}$ are independent Poisson random variables with mean m/n. Also, by Lemma 5.2, the sum of the $Y_i^{(m)}$ is itself a Poisson random variable with mean m. Hence $\Pr\left((Y_1^{(m)},\ldots,Y_n^{(m)}) = (k_1,\ldots,k_n)\,\Big|\,\sum_i Y_i^{(m)} = k\right) = \frac{\prod_{i=1}^{n} e^{-m/n}(m/n)^{k_i}/k_i!}{e^{-m}m^k/k!} = \frac{k!}{k_1!\,k_2!\cdots k_n!\,n^k},$ which matches the exact case.

Theorem 5.7: Let $f(x_1,\ldots,x_n)$ be a nonnegative function. Then $E\left[f(X_1^{(m)},\ldots,X_n^{(m)})\right] \le e\sqrt{m}\,E\left[f(Y_1^{(m)},\ldots,Y_n^{(m)})\right].$ Proof: $E\left[f(Y_1^{(m)},\ldots,Y_n^{(m)})\right] = \sum_{k=0}^{\infty} E\left[f(Y_1^{(m)},\ldots,Y_n^{(m)})\,\Big|\,\sum_i Y_i^{(m)} = k\right]\Pr\left(\sum_i Y_i^{(m)} = k\right) \ge E\left[f(X_1^{(m)},\ldots,X_n^{(m)})\right]\Pr\left(\sum_i Y_i^{(m)} = m\right),$ where the inequality keeps only the term k = m and the equality of that term follows from Theorem 5.6.

Since $\sum_i Y_i^{(m)}$ is Poisson distributed with mean m, we now have $\Pr\left(\sum_i Y_i^{(m)} = m\right) = \frac{e^{-m}m^m}{m!}.$ We use the following loose bound on m!, which we prove as Lemma 5.8: $m! < e\sqrt{m}\left(\frac{m}{e}\right)^m.$ This yields $\Pr\left(\sum_i Y_i^{(m)} = m\right) > \frac{e^{-m}m^m}{e\sqrt{m}(m/e)^m} = \frac{1}{e\sqrt{m}},$ and the theorem follows.

Lemma 5.8: $n! < e\sqrt{n}\left(\frac{n}{e}\right)^n.$ Proof: For $i \ge 2$, concavity of the logarithm gives $\frac{\ln(i-1)+\ln i}{2} \le \int_{i-1}^{i}\ln x\,dx.$ Therefore $\ln(n!) - \frac{\ln n}{2} = \sum_{i=2}^{n}\frac{\ln(i-1)+\ln i}{2} \le \int_{1}^{n}\ln x\,dx = n\ln n - n + 1.$ The result now follows simply by exponentiating.

If the function f is the indicator function that is 1 if some event occurs and 0 otherwise, then Theorem 5.7 gives bounds on the probability of events. Terminology: in the Poisson case, the numbers of balls in the bins are taken to be independent Poisson random variables with mean m/n; in the exact case, m balls are thrown into n bins independently and uniformly at random.

Corollary 5.9: Any event that takes place with probability p in the Poisson case takes place with probability at most $pe\sqrt{m}$ in the exact case. Proof: Let f be the indicator function of the event. In this case, E[f] is just the probability that the event occurs, and the result follows immediately from Theorem 5.7.

Corollary 5.9 relates the two cases: an event with probability p in the Poisson case has probability at most $pe\sqrt{m}$ in the exact case, so any event that is rare in the Poisson case is also rare in the exact case.
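
To see the two cases side by side, here is a small simulation sketch (my addition, under illustrative parameter choices) estimating the probability that no bin is empty in the exact case and under the Poisson approximation:

```python
import math
import random

def poisson_sample(mu: float) -> int:
    """Knuth's multiplication method for sampling Poisson(mu); fine for small mu."""
    L = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def no_bin_empty_exact(m: int, n: int) -> bool:
    """Throw m balls into n bins; report whether every bin got a ball."""
    bins = [0] * n
    for _ in range(m):
        bins[random.randrange(n)] += 1
    return all(b > 0 for b in bins)

def no_bin_empty_poisson(m: int, n: int) -> bool:
    """Same event under independent Poisson(m/n) bin loads."""
    return all(poisson_sample(m / n) > 0 for _ in range(n))

m, n, trials = 200, 30, 2000
exact = sum(no_bin_empty_exact(m, n) for _ in range(trials)) / trials
poisson = sum(no_bin_empty_poisson(m, n) for _ in range(trials)) / trials
print(exact, poisson)
# The two estimates are close; Corollary 5.9 guarantees
# exact probability <= e * sqrt(m) * Poisson probability.
```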

Theorem 5.10: Let $f(x_1,\ldots,x_n)$ be a nonnegative function such that $E\left[f(X_1^{(m)},\ldots,X_n^{(m)})\right]$ is either monotonically increasing or monotonically decreasing in m. Then $E\left[f(X_1^{(m)},\ldots,X_n^{(m)})\right] \le 2\,E\left[f(Y_1^{(m)},\ldots,Y_n^{(m)})\right].$

Corollary 5.11: Let ε be an event whose probability is either monotonically increasing or monotonically decreasing in the number of balls. If ε has probability p in the Poisson case, then ε has probability at most 2p in the exact case.

Lemma 5.12: When n balls are thrown independently and uniformly at random into n bins, the maximum load is at least $\ln n/\ln\ln n$ with probability at least 1 - 1/n for n sufficiently large. Proof: In the Poisson case (each bin load an independent Poisson random variable with mean 1), the probability that bin 1 has load at least $M = \lfloor\ln n/\ln\ln n\rfloor$ is at least $\frac{1}{eM!}$, which is the probability that it has load exactly M.

In the Poisson case, all bins are independent, so the probability that no bin has load at least M is at most $\left(1-\frac{1}{eM!}\right)^n \le e^{-n/(eM!)}.$ If $e^{-n/(eM!)} \le n^{-2}$, then by Theorem 5.7 the probability that the maximum load is not at least M in the exact case is at most $e\sqrt{n}\cdot n^{-2} \le \frac{1}{n}$ for n sufficiently large.

It therefore suffices to show that $e^{-n/(eM!)} \le n^{-2}$, or equivalently that $\frac{n}{eM!} \ge 2\ln n$. From Lemma 5.8, it follows that $\ln M! \le M\ln M - M + \frac{\ln M}{2} + 1 \le \ln n - \frac{(\ln n)(\ln\ln\ln n)}{\ln\ln n} + \frac{\ln\ln n}{2} + 1$ when n is suitably large. Hence, for n suitably large, $\ln M! \le \ln n - \ln(2e\ln n)$, so $eM! \le \frac{n}{2\ln n}$ and $e^{-n/(eM!)} \le e^{-2\ln n} = n^{-2}$.

5.4.1 Example: Coupon Collector's Problem, Revisited. The coupon collector's problem can be thought of as a balls-and-bins problem: coupon types correspond to bins, and cereal boxes correspond to balls. The question becomes: if balls are thrown at random into bins, how many balls are thrown until all bins have at least one ball?

Applying earlier results to this balls-and-bins problem: From Section 2.4.1, the expected number of balls that must be thrown before each bin has at least one ball is nH(n). From Section 3.3.1, when $n\ln n + cn$ balls are thrown, the probability that not all bins have at least one ball is at most $e^{-c}$.

Theorem 5.13: Let X be the number of coupons observed before obtaining one of each of n types of coupons. Then, for any constant c, $\lim_{n\to\infty}\Pr(X > n\ln n + cn) = 1 - e^{-e^{-c}}.$
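
As an empirical check of Theorem 5.13 (my sketch, not from the slides), the following simulation estimates $\Pr(X > n\ln n + cn)$ and compares it with the limit $1-e^{-e^{-c}}$:

```python
import math
import random

def coupons_needed(n: int) -> int:
    """Draw uniform coupons until all n types are seen; return the number drawn."""
    seen: set[int] = set()
    draws = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, c, trials = 1000, 0.5, 1000
threshold = n * math.log(n) + c * n
empirical = sum(coupons_needed(n) > threshold for _ in range(trials)) / trials
print(empirical, 1 - math.exp(-math.exp(-c)))
# Both values should be roughly 0.45 for c = 0.5.
```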

Proof of Theorem 5.13: Outline (throughout, $m = n\ln n + cn$, X is Poisson with mean m, and ε is the event that no bin is empty). 1. We prove $\Pr(|X-m| \ge \sqrt{2m\ln m}) = o(1)$. 2. We prove $\Pr(\varepsilon \mid X = \lceil m+\sqrt{2m\ln m}\rceil) - \Pr(\varepsilon \mid X = \lfloor m-\sqrt{2m\ln m}\rfloor) = o(1)$. 3. We prove that, given 1 and 2, the Poisson approximation is accurate: $\Pr(\varepsilon) = \Pr(\varepsilon \mid X = m) + o(1)$. 4. We compute $\Pr(\varepsilon)$ in the Poisson case; given 3, Theorem 5.13 follows.

Proof of Theorem 5.13: 4. We look at the problem as a balls-and-bins problem. For the Poisson approximation, we suppose that the number of balls in each bin is an independent Poisson random variable with mean $\ln n + c$. The probability that a specific bin is empty is then $e^{-(\ln n + c)} = \frac{e^{-c}}{n}.$ Since all bins are independent under the Poisson approximation, the probability that no bin is empty is $\left(1-\frac{e^{-c}}{n}\right)^n \to e^{-e^{-c}}$ (for sufficiently large n).

Proof of Theorem 5.13: 3. Let ε be the event that no bin is empty, and let X be the number of balls thrown. We compute Pr(ε) by splitting it as follows: $\Pr(\varepsilon) = \sum_{k=0}^{\infty}\Pr(\varepsilon \mid X = k)\Pr(X = k). \quad (5.7)$

Proof of Theorem 5.13 (continued): If $\Pr(|X-m| \ge \sqrt{2m\ln m}) = o(1)$ (step 1) and $\Pr(\varepsilon \mid X = \lceil m+\sqrt{2m\ln m}\rceil) - \Pr(\varepsilon \mid X = \lfloor m-\sqrt{2m\ln m}\rfloor) = o(1)$ (step 2), then from Eqn (5.7), $|\Pr(\varepsilon) - \Pr(\varepsilon \mid X = m)| = o(1)$, and hence the exact-case probability $\Pr(\varepsilon \mid X = m)$ also converges to $e^{-e^{-c}}$, which is what Theorem 5.13 asserts.

Proof of Theorem 5.13: 1. We prove $\Pr(|X-m| \ge \sqrt{2m\ln m}) = o(1)$. Note that X is a Poisson random variable with mean m, since it is a sum of independent Poisson random variables. From Theorem 5.4, for x > m, $\Pr(X \ge x) \le \frac{e^{-m}(em)^x}{x^x} = e^{x-m-x\ln(x/m)}.$

Proof of Theorem 5.13 (continued): For $x = m+\sqrt{2m\ln m}$, we use the fact that $\ln(1+z) \ge z - z^2/2$ for $z \ge 0$ to show $x - m - x\ln\frac{x}{m} \le -\ln m + o(1)$, so $\Pr(X \ge m+\sqrt{2m\ln m}) = o(1)$. A similar argument holds if x < m, so $\Pr(|X-m| \ge \sqrt{2m\ln m}) = o(1).$

Proof of Theorem 5.13: 2. We prove $\Pr(\varepsilon \mid X = \lceil m+\sqrt{2m\ln m}\rceil) - \Pr(\varepsilon \mid X = \lfloor m-\sqrt{2m\ln m}\rfloor) = o(1).$ Since $\Pr(\varepsilon \mid X = k)$ is increasing in k (throwing more balls can only help fill all the bins), it suffices to bound the difference at the two endpoints; write $k_1 = \lfloor m-\sqrt{2m\ln m}\rfloor$ and $k_2 = \lceil m+\sqrt{2m\ln m}\rceil$.

Hence we have the bound $\Pr(\varepsilon \mid X = k_2) - \Pr(\varepsilon \mid X = k_1) \le \Pr(A).$ Here A is the event of the following experiment: we throw $k_1$ balls and there is still at least one empty bin, but after throwing an additional $k_2 - k_1$ balls, all bins are nonempty.

For A to occur, some bin left empty by the first $k_1$ balls must receive its first ball among the last $k_2 - k_1$ balls; in particular, one of the last $k_2 - k_1$ balls must land in the last remaining empty bin, and each specific ball does so with probability at most 1/n. By a union bound, $\Pr(A) \le \frac{k_2 - k_1}{n} \le \frac{2\sqrt{2m\ln m}+2}{n} = o(1)$ for $m = n\ln n + cn$. Hence this difference is o(1) as well.

Combining steps 1, 2, 3, and 4 proves the theorem.