Published by Morgan Seel. Modified about 1 year ago.

1
ENGG 2040C: Probability Models and Applications
Andrej Bogdanov, Spring
Limit theorems

2
Many times we do not need to calculate probabilities exactly: P(magnitude 7+ earthquake within 10 years) = ?
An approximate or qualitative estimate often suffices. This is often a much easier task.

3
What do you think?
I toss a coin 1000 times. The probability that I get a streak of 14 consecutive heads is
(A) < 10%   (B) ≈ 50%   (C) > 90%

4
Consecutive heads
Let N be the number of occurrences of 14 consecutive heads in 1000 coin flips.
N = I_1 + … + I_987, where I_i is an indicator r.v. for the event “14 consecutive heads starting at position i”.
E[I_i] = P(I_i = 1) = 1/2¹⁴
E[N] = 987 ⋅ 1/2¹⁴ = 987/16384 ≈ 0.0602

5
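Markov’s inequality
For every non-negative random variable X and every value a > 0: P(X ≥ a) ≤ E[X] / a.
For the streak count N: E[N] ≈ 0.0602, so P[N ≥ 1] ≤ E[N] / 1 ≈ 6.02%. The probability of a streak of 14 heads is therefore below 10%.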
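The Markov bound on the streak probability can be compared with a direct simulation. The following is a minimal sketch, not part of the original slides; the function names `has_streak` and `estimate_streak_probability` and the trial count are illustrative choices.

```python
import random

def has_streak(n_flips, streak_len, rng):
    """Return True if n_flips fair coin flips contain streak_len consecutive heads."""
    run = 0
    for _ in range(n_flips):
        if rng.random() < 0.5:      # heads
            run += 1
            if run >= streak_len:
                return True
        else:                       # tails resets the current run
            run = 0
    return False

def estimate_streak_probability(trials, n_flips=1000, streak_len=14, seed=0):
    """Fraction of simulated experiments that contain the streak."""
    rng = random.Random(seed)
    hits = sum(has_streak(n_flips, streak_len, rng) for _ in range(trials))
    return hits / trials

# Markov bound: P[N >= 1] <= E[N] = 987 / 2**14
markov_bound = (1000 - 14 + 1) / 2**14
print(f"Markov bound on P[N >= 1]: {markov_bound:.4f}")
print(f"Simulated estimate:        {estimate_streak_probability(1000):.4f}")
```

The simulated frequency should fall below the bound, since Markov's inequality only gives an upper estimate.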

6
Proof of Markov’s inequality
For every non-negative random variable X and every value a > 0: P(X ≥ a) ≤ E[X] / a.
E[X] = E[X | X ≥ a] P(X ≥ a) + E[X | X < a] P(X < a)
Here E[X | X ≥ a] ≥ a, E[X | X < a] ≥ 0 and P(X < a) ≥ 0, so
E[X] ≥ a P(X ≥ a) + 0.

7
Hats
1000 people throw their hats in the air. What is the probability at least 100 people get their hat back?
Solution
N = I_1 + … + I_1000, where I_i is the indicator for the event that person i gets their hat. With n = 1000 people, E[I_i] = P(I_i = 1) = 1/n.
E[N] = n ⋅ 1/n = 1
P[N ≥ 100] ≤ E[N] / 100 = 1%.
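A quick simulation illustrates how loose the 1% bound is. This sketch assumes the hats are returned according to a uniformly random permutation, which the slide does not state explicitly; `count_matches` and `simulate_hats` are hypothetical helper names.

```python
import random

def count_matches(n, rng):
    """Shuffle n hats uniformly at random and count people who get their own hat."""
    hats = list(range(n))
    rng.shuffle(hats)
    return sum(1 for person, hat in enumerate(hats) if person == hat)

def simulate_hats(trials, n=1000, seed=0):
    """Return (average number of matches, largest number of matches seen)."""
    rng = random.Random(seed)
    counts = [count_matches(n, rng) for _ in range(trials)]
    return sum(counts) / trials, max(counts)

mean_matches, most_matches = simulate_hats(trials=500)
print("average number of matches:", mean_matches)
print("largest number of matches seen:", most_matches)
```

The average should come out close to E[N] = 1, and the largest count observed stays far below 100.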

8
Patterns
A coin is tossed 1000 times. Give an upper bound on the probability that the pattern HH occurs:
(a) at least 500 times
(b) at most 100 times

9
Patterns
Let N be the number of occurrences of HH. Last time we calculated E[N] = 999/4 = 249.75.
(a) P[N ≥ 500] ≤ E[N] / 500 = 249.75/500 ≈ 49.95%, so 500+ HHs occur with probability ≤ 49.95%.
(b) P[N ≤ 100] ≤ ? Markov’s inequality bounds upper tails, so apply it to the non-negative variable 999 – N:
P[N ≤ 100] = P[999 – N ≥ 899] ≤ E[999 – N] / 899 = (999 – 249.75)/899 ≈ 83.34%

10
Computer simulation of patterns

    from random import randint

    # toss n coins and count number of consecutive head pairs
    def consheads(n):
        count = 0
        lastone = randint(0, 1)
        thisone = randint(0, 1)
        for i in range(n - 1):
            if lastone == 1 and thisone == 1:
                count = count + 1
            lastone = thisone
            thisone = randint(0, 1)
        return count

    >>> for i in range(100): print(consheads(1000), end=" ")

11
Chebyshev’s inequality
For every random variable X and every t > 0: P(|X – μ| ≥ tσ) ≤ 1 / t², where μ = E[X], σ = √Var[X].

12
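Patterns
E[N] = 999/4 = 249.75
Var[N] = (5 ⋅ 999 – 2)/16 = 4993/16 ≈ 312.06 (each of the 999 indicators has variance 3/16 and each adjacent pair has covariance 1/16), σ = √Var[N] ≈ 17.67
(a) P(N ≥ 500) ≤ P(|N – μ| ≥ 14.16σ) ≤ 1/14.16² ≈ 0.50%
(b) P(N ≤ 100) ≤ P(|N – μ| ≥ 8.47σ) ≤ 1/8.47² ≈ 1.39%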
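The two Chebyshev bounds can be checked numerically. The sketch below recomputes the variance of N from the indicator variances and covariances rather than from a closed form; `chebyshev_bound` is an illustrative helper, not something defined in the slides.

```python
import math

def chebyshev_bound(mean, std, threshold):
    """Chebyshev upper bound on P(|X - mean| >= |threshold - mean|)."""
    t = abs(threshold - mean) / std
    return 1.0 / t**2

positions = 999                # starting positions for HH in 1000 tosses
mean = positions / 4           # each position shows HH with probability 1/4
# Var[I_i] = (1/4)(3/4) = 3/16; Cov[I_i, I_{i+1}] = 1/8 - 1/16 = 1/16;
# indicators that do not overlap are independent.
variance = positions * 3 / 16 + 2 * (positions - 1) / 16
std = math.sqrt(variance)

print(f"mean = {mean}, variance = {variance}, std = {std:.2f}")
print(f"P(N >= 500) <= {chebyshev_bound(mean, std, 500):.2%}")
print(f"P(N <= 100) <= {chebyshev_bound(mean, std, 100):.2%}")
```

Both bounds are far smaller than the corresponding Markov bounds, because Chebyshev's inequality also uses the variance.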

13
Proof of Chebyshev’s inequality
For every random variable X and every t > 0: P(|X – μ| ≥ tσ) ≤ 1 / t², where μ = E[X], σ = √Var[X].
P(|X – μ| ≥ tσ) = P((X – μ)² ≥ t²σ²) ≤ E[(X – μ)²] / (t²σ²) = 1 / t²,
where the inequality is Markov’s inequality applied to (X – μ)².

14
An illustration
Markov’s inequality: P(X ≥ a) ≤ μ / a.
Chebyshev’s inequality: P(|X – μ| ≥ tσ) ≤ 1 / t².
[Figure: two sketches with axis marks 0, μ, a (Markov) and μ – tσ, μ, μ + tσ (Chebyshev)]

15
Polling

16
X_i = 1 if voter i is blue, 0 if voter i is not
X_1, …, X_n are independent Bernoulli(μ), where μ is the fraction of blue voters
X = X_1 + … + X_n
X/n is the pollster’s estimate of μ

17
Polling
How accurate is the pollster’s estimate X/n?
X = X_1 + … + X_n, μ = E[X_i], σ = √Var[X_i]
E[X] = E[X_1] + … + E[X_n] = μn
Var[X] = Var[X_1] + … + Var[X_n] = σ²n

18
Polling
X = X_1 + … + X_n, E[X] = μn, Var[X] = σ²n
P(|X – μn| ≥ tσ√n) ≤ 1 / t²
P(|X/n – μ| ≥ ε) ≤ δ, where ε = tσ/√n is the sampling error and δ = 1/t² = σ²/(ε²n) is the confidence error

19
The weak law of large numbers
X_1, …, X_n are independent with same p.m.f. (p.d.f.)
μ = E[X_i], σ = √Var[X_i], X = X_1 + … + X_n
For every ε, δ > 0 and n ≥ σ²/(ε²δ): P(|X/n – μ| ≥ ε) ≤ δ

20
Polling
Say we want confidence error δ = 10% and sampling error ε = 5%. How many people should we poll?
For ε, δ > 0 and n ≥ σ²/(ε²δ): P(|X/n – μ| ≥ ε) ≤ δ
Here σ²/(ε²δ) = σ²/(0.05² ⋅ 0.1) = 4000σ², so we need n ≥ 4000σ².
For Bernoulli(μ) samples, σ² = μ(1 – μ) ≤ 1/4
This suggests we should poll about 1000 people.
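The sample-size rule from the weak law is a one-line computation. A minimal sketch follows; the function name `weak_law_sample_size` and its default worst-case variance of 1/4 for Bernoulli samples are assumptions made for illustration.

```python
import math

def weak_law_sample_size(sampling_error, confidence_error, variance=0.25):
    """Smallest integer n with n >= variance / (sampling_error**2 * confidence_error)."""
    return math.ceil(variance / (sampling_error**2 * confidence_error))

# sampling error 5%, confidence error 10%, worst-case Bernoulli variance 1/4
print(weak_law_sample_size(0.05, 0.10))
```

Halving the sampling error quadruples the required sample size, while halving the confidence error only doubles it.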

21
A polling simulation
X_1, …, X_n independent Bernoulli(1/2)
[Plot: pollster’s estimate (X_1 + … + X_n)/n against the number of people polled n]

22
A polling simulation
[Plot: pollster’s estimate (X_1 + … + X_n)/n against the number of people polled n, 20 simulations]
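A simulation of this kind could be produced along the following lines. This is a sketch under the assumption of Bernoulli(1/2) voters, printing a few checkpoints instead of drawing a plot; `running_estimates` is a hypothetical name.

```python
import random

def running_estimates(n, p=0.5, seed=None):
    """Poll n voters one at a time; return the estimate X/k after each k = 1..n."""
    rng = random.Random(seed)
    blue = 0
    estimates = []
    for k in range(1, n + 1):
        if rng.random() < p:        # voter k is blue with probability p
            blue += 1
        estimates.append(blue / k)
    return estimates

for run in range(5):
    est = running_estimates(1000, seed=run)
    print(f"run {run}: n=10 -> {est[9]:.3f}, n=100 -> {est[99]:.3f}, n=1000 -> {est[999]:.3f}")
```

Early estimates vary widely between runs, and the spread narrows as more people are polled, as the weak law predicts.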

23
A more precise estimate
X_1, …, X_n are independent with same p.m.f. (p.d.f.)
Let’s assume n is large.
Weak law of large numbers: X_1 + … + X_n ≈ μn with high probability
P(|X – μn| ≥ tσ√n) ≤ 1 / t²
This suggests X_1 + … + X_n ≈ μn + Tσ√n

24
Some experiments
X = X_1 + … + X_n, X_i independent Bernoulli(1/2)
[Plots: distribution of X for n = 6 and n = 40]
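For Bernoulli(1/2) samples the sum X is Binomial(n, 1/2), so its distribution can be drawn exactly. The sketch below prints a rough text histogram; the helper names and the bar width are illustrative choices, not taken from the slides.

```python
import math

def binomial_pmf(n, p=0.5):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p)."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def text_histogram(pmf, width=50):
    """One line per value k, with a bar proportional to P(X = k)."""
    peak = max(pmf)
    return "\n".join(
        f"{k:3d} {'#' * round(width * prob / peak)}" for k, prob in enumerate(pmf)
    )

print(text_histogram(binomial_pmf(6)))
print()
print(text_histogram(binomial_pmf(40)))
```

Already at n = 40 the bars trace out a bell shape centred at n/2.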

25
Some experiments
X = X_1 + … + X_n, X_i independent Poisson(1)
[Plots: distribution of X for n = 3 and n = 20]

26
Some experiments
X = X_1 + … + X_n, X_i independent Uniform(0, 1)
[Plots: distribution of X for n = 2 and n = 10]

27
The normal random variable
f(t) = (2π)^(–½) e^(–t²/2)
[Plot: f(t) against t, the p.d.f. of a normal random variable]

28
The central limit theorem
X_1, …, X_n are independent with same p.m.f. (p.d.f.)
μ = E[X_i], σ = √Var[X_i], X = X_1 + … + X_n
For every t (positive or negative):
lim_{n → ∞} P(X ≤ μn + tσ√n) = P(T ≤ t)
where T is a normal random variable.
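The statement can be checked numerically for Bernoulli samples, where the left-hand side is an exact binomial probability. The following is a sketch using Python's standard `statistics.NormalDist`; the function names are illustrative.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly from the p.m.f."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(0, k + 1))

def clt_comparison(n, t, p=0.5):
    """Return (P(X <= mu*n + t*sigma*sqrt(n)), P(T <= t)) for Bernoulli(p) samples."""
    mu, sigma = p, math.sqrt(p * (1 - p))
    cutoff = math.floor(mu * n + t * sigma * math.sqrt(n))
    return binomial_cdf(cutoff, n, p), NormalDist().cdf(t)

for n in (10, 100, 1000):
    exact, normal = clt_comparison(n, t=1.0)
    print(f"n = {n:5d}: exact = {exact:.4f}, normal = {normal:.4f}")
```

The exact probabilities approach the normal value as n grows; for small n the discreteness of X is still visible.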

29
Polling again
Probability model: X = X_1 + … + X_n, X_i independent Bernoulli(μ), μ = fraction that will vote blue
E[X_i] = μ, σ = √Var[X_i] = √(μ(1 – μ)) ≤ ½.
Say we want confidence error δ = 10% and sampling error ε = 5%. How many people should we poll?

30
Polling again
lim_{n → ∞} P(X ≤ μn – tσ√n) = P(T ≤ –t)
lim_{n → ∞} P(X ≥ μn + tσ√n) = P(T ≥ t)
Setting tσ√n = 5% n, that is t = 5%√n/σ:
lim_{n → ∞} P(X/n is not within 5% of μ) = P(T ≤ –t) + P(T ≥ t) = 2 P(T ≤ –t)

31
The c.d.f. of a normal random variable
[Plot: the c.d.f. F(t) against t, with P(T ≤ –t) and P(T ≥ t) marked at –t and t]

32
Polling again
confidence error = 2 P(T ≤ –t) = 2 P(T ≤ –5%√n/σ) ≤ 2 P(T ≤ –√n/10), because σ ≤ ½
We want a confidence error of ≤ 10%:
We need to choose n so that P(T ≤ –√n/10) ≤ 5%.

33
Polling again
P(T ≤ –√n/10) ≤ 5%
From the c.d.f. F(t), F(–1.645) ≈ 5%, so –√n/10 ≈ –1.645
n ≈ 16.45² ≈ 271
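The same calculation can be done with the inverse normal c.d.f. instead of reading a value off the plot. This is a minimal sketch using `statistics.NormalDist.inv_cdf` from the Python standard library; the function name `clt_sample_size` and the default worst-case σ = ½ are illustrative assumptions.

```python
import math
from statistics import NormalDist

def clt_sample_size(sampling_error, confidence_error, sigma_max=0.5):
    """Approximate n from the central limit theorem.

    Picks t with 2 P(T <= -t) = confidence_error and solves
    t * sigma_max * sqrt(n) = sampling_error * n for n.
    """
    t = -NormalDist().inv_cdf(confidence_error / 2)
    return math.ceil((t * sigma_max / sampling_error) ** 2)

# sampling error 5%, confidence error 10%
print(clt_sample_size(0.05, 0.10))
```

For these parameters the result is 271, compared with about 1000 from the weak law of large numbers.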

34
Party
Ten guests arrive independently at a party between 8pm and 9pm.
Give an estimate of the probability that the average arrival time of a guest is past 8:40pm.
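One way to approach this exercise is the central limit theorem. The sketch below assumes each arrival time is Uniform(0, 60) minutes after 8pm, which the slide does not state, and compares the normal estimate with a simulation; all names are illustrative.

```python
import math
import random
from statistics import NormalDist

GUESTS = 10
MEAN = 30.0                  # mean of Uniform(0, 60), in minutes after 8pm
STD = 60.0 / math.sqrt(12)   # standard deviation of Uniform(0, 60)

def clt_estimate(cutoff_minutes=40.0):
    """Normal approximation to P(average arrival time > cutoff)."""
    # average > cutoff  <=>  X > mu*n + t*sigma*sqrt(n) with this t
    t = (cutoff_minutes - MEAN) * math.sqrt(GUESTS) / STD
    return 1.0 - NormalDist().cdf(t)

def simulate(trials=20_000, cutoff_minutes=40.0, seed=0):
    """Fraction of simulated parties whose average arrival is past the cutoff."""
    rng = random.Random(seed)
    late = 0
    for _ in range(trials):
        average = sum(rng.uniform(0, 60) for _ in range(GUESTS)) / GUESTS
        if average > cutoff_minutes:
            late += 1
    return late / trials

print(f"CLT estimate: {clt_estimate():.4f}")
print(f"Simulation:   {simulate():.4f}")
```

With only ten guests the normal approximation is rough, so the two numbers need not agree closely.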

35
Acute triangles Drop three points at random on a square. What is the probability that they form an acute triangle?

36
Simulation
Idea: Conduct a poll among random triangles!

    from random import uniform

    # indicate whether the triangle with the given vertices is acute
    def is_acute(x1, y1, x2, y2, x3, y3):
        # dot product of the vectors from (x0, y0) to (x1, y1) and to (x2, y2)
        def dot(x1, y1, x2, y2, x0, y0):
            return (x1 - x0) * (x2 - x0) + (y1 - y0) * (y2 - y0)
        a1 = dot(x2, y2, x3, y3, x1, y1)
        a2 = dot(x3, y3, x1, y1, x2, y2)
        a3 = dot(x1, y1, x2, y2, x3, y3)
        return a1 > 0 and a2 > 0 and a3 > 0

    # count the fraction of acute triangles among n random samples
    def simulate_triangles(n):
        count = 0
        for i in range(n):
            if is_acute(uniform(0.0, 1.0), uniform(0.0, 1.0),
                        uniform(0.0, 1.0), uniform(0.0, 1.0),
                        uniform(0.0, 1.0), uniform(0.0, 1.0)):
                count = count + 1
        return 1.0 * count / n

37
Simulation
Want sampling error ε = .01, confidence error δ = […]
Rigorous estimate: By weak law of large numbers, we can choose n = σ²/(ε²δ) ≤ 50,000

    >>> simulate_triangles(50000)
    >>> simulate_triangles(50000)
    >>> simulate_triangles(50000)

38
Simulation
Want sampling error ε = .01, confidence error δ = […]
Non-rigorous (but better) estimate: Central limit theorem suggests choosing n such that tσ√n ≤ εn, P(Normal < –t) = […]
t = […], n = (t/2ε)² ≈ 5366

    >>> simulate_triangles(5366)
    >>> simulate_triangles(5366)
    >>> simulate_triangles(5366)
