ENGG 2040C: Probability Models and Applications Andrej Bogdanov Spring 2014 8. Limit theorems.


2 Many times we do not need to calculate probabilities exactly; an approximate or qualitative estimate often suffices. Example: P(magnitude 7+ earthquake within 10 years) = ? Estimating such a probability is often a much easier task than computing it exactly.

3 What do you think? I toss a coin 1000 times. The probability that I get a streak of 14 consecutive heads is: (A) < 10%, (B) ≈ 50%, (C) > 90%.

4 Consecutive heads. Let $N$ be the number of occurrences of 14 consecutive heads in 1000 coin flips. Then $N = I_1 + \dots + I_{987}$, where $I_i$ is an indicator r.v. for the event “14 consecutive heads starting at position $i$” (a streak can start at positions $1, \dots, 1000 - 14 + 1 = 987$). $E[I_i] = P(I_i = 1) = 1/2^{14}$, so $E[N] = 987 \cdot 1/2^{14} = 987/16384 \approx 0.0602$.

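5 Markov’s inequality. For every non-negative random variable $X$ and every value $a > 0$: $P(X \ge a) \le E[X]/a$. Here $E[N] \approx 0.0602$, so $P(N \ge 1) \le E[N]/1 \approx 6\%$. A streak of 14 heads occurs exactly when $N \ge 1$, so its probability is at most about 6%, well below 10%.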
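As a sanity check on the Markov bound, the sketch below computes $E[N] = 987/2^{14}$ and, for comparison, the exact probability of a run of at least 14 heads using a dynamic program over the length of the current run of heads. The dynamic program and the function names `markov_bound` and `prob_streak` are illustrative assumptions, not part of the lecture.

```python
from fractions import Fraction

def markov_bound(n_flips: int, k: int) -> Fraction:
    """Markov bound P(N >= 1) <= E[N], N = number of k-head runs counted by start position."""
    positions = n_flips - k + 1          # possible starting positions of a run
    return Fraction(positions, 2 ** k)   # E[N] = positions * (1/2)^k

def prob_streak(n_flips: int, k: int) -> float:
    """Exact P(at least k consecutive heads in n_flips fair coin flips)."""
    # probs[j] = P(no streak yet and the current run of heads has length j)
    probs = [1.0] + [0.0] * (k - 1)
    done = 0.0                            # P(a streak of k heads has already occurred)
    for _ in range(n_flips):
        new = [0.0] * k
        new[0] = 0.5 * sum(probs)         # tails resets the run
        for j in range(k - 1):
            new[j + 1] = 0.5 * probs[j]   # heads extends the run
        done += 0.5 * probs[k - 1]        # heads completes a run of length k
        probs = new
    return done

bound = markov_bound(1000, 14)
exact = prob_streak(1000, 14)
print(f"Markov bound: {float(bound):.4f}, exact: {exact:.4f}")
```

The exact value lands below the bound, as it must: Markov’s inequality only promises an upper bound, and it can be loose.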

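6 Proof of Markov’s inequality. For every non-negative random variable $X$ and every value $a > 0$: $P(X \ge a) \le E[X]/a$. Proof: $E[X] = E[X \mid X \ge a]\,P(X \ge a) + E[X \mid X < a]\,P(X < a)$. The first conditional expectation is $\ge a$, and the second is $\ge 0$ because $X$ is non-negative, so $E[X] \ge a\,P(X \ge a) + 0$. Dividing by $a$ gives the inequality.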

7 Hats. 1000 people throw their hats in the air. What is the probability that at least 100 people get their own hat back? Solution: $N = I_1 + \dots + I_{1000}$, where $I_i$ is the indicator for the event that person $i$ gets their own hat. With $n = 1000$, $E[I_i] = P(I_i = 1) = 1/n$, so $E[N] = n \cdot 1/n = 1$ and $P(N \ge 100) \le E[N]/100 = 1\%$.
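A quick Monte Carlo sketch of the hat experiment, modeling the returned hats as a uniformly random permutation (an assumption consistent with $P(I_i = 1) = 1/n$). The function name `hat_matches`, the seed and the trial count are arbitrary choices.

```python
import random

def hat_matches(n: int, rng: random.Random) -> int:
    """Number of people who get their own hat back under a random permutation."""
    hats = list(range(n))
    rng.shuffle(hats)
    return sum(1 for person, hat in enumerate(hats) if person == hat)

rng = random.Random(2014)
trials = 2000
counts = [hat_matches(1000, rng) for _ in range(trials)]
mean = sum(counts) / trials
tail = sum(1 for c in counts if c >= 100) / trials
print(f"average matches: {mean:.3f}  (E[N] = 1)")
print(f"fraction with >= 100 matches: {tail}  (Markov bound: 1%)")
```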

8 Patterns. A coin is tossed 1000 times. Give an upper bound on the probability that the pattern HH occurs: (a) at least 500 times; (b) at most 100 times.

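9 Patterns. Let $N$ be the number of occurrences of HH. Last time we calculated $E[N] = 999/4 = 249.75$. (a) $P(N \ge 500) \le E[N]/500 = 249.75/500 \approx 49.95\%$, so 500 or more HHs occur with probability at most 49.95%. (b) $P(N \le 100) \le\ ?$ Markov’s inequality bounds only upper tails, but $999 - N$ is non-negative (there are only 999 adjacent pairs), so $P(N \le 100) = P(999 - N \ge 899) \le E[999 - N]/899 = (999 - 249.75)/899 \approx 83.34\%$.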
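The two Markov bounds can be reproduced with exact rational arithmetic; the sketch below simply re-evaluates the formulas on this slide, using $E[N] = 999/4$.

```python
from fractions import Fraction

pairs = 999                        # adjacent pairs in 1000 tosses
mean_n = Fraction(pairs, 4)        # E[N]: each pair is HH with probability 1/4

# (a) Markov applied to N itself
bound_a = mean_n / 500
# (b) Markov applied to the non-negative variable 999 - N
bound_b = (pairs - mean_n) / 899

print(f"(a) P(N >= 500) <= {float(bound_a):.4%}")
print(f"(b) P(N <= 100) <= {float(bound_b):.4%}")
```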

10 Computer simulation of patterns

from random import randint

# toss n coins and count the number of consecutive head pairs (HH)
def consheads(n):
    count = 0
    lastone = randint(0, 1)   # 1 = heads, 0 = tails
    thisone = randint(0, 1)
    for i in range(n - 1):    # n coins contain n - 1 adjacent pairs
        if lastone == 1 and thisone == 1:
            count = count + 1
        lastone = thisone
        thisone = randint(0, 1)
    return count

>>> for i in range(100): print(consheads(1000), end=" ")

11 Chebyshev’s inequality. For every random variable $X$ and every $t > 0$: $P(|X - \mu| \ge t\sigma) \le 1/t^2$, where $\mu = E[X]$ and $\sigma = \sqrt{\mathrm{Var}[X]}$.

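12 Patterns. $\mu = E[N] = 999/4 = 249.75$, $\mathrm{Var}[N] = (5 \cdot 1000 - 7)/16 = 4993/16 \approx 312.06$, $\sigma \approx 17.67$. (a) $P(N \ge 500) \le P(|N - \mu| \ge 250.25) = P(|N - \mu| \ge 14.17\sigma) \le 1/14.17^2 \approx 0.50\%$. (b) $P(N \le 100) \le P(|N - \mu| \ge 149.75) = P(|N - \mu| \ge 8.48\sigma) \le 1/8.48^2 \approx 1.39\%$.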
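The sketch below derives $\mathrm{Var}[N]$ from the indicator decomposition rather than quoting it: each of the 999 pair indicators has variance $1/4 - 1/16 = 3/16$, each of the 998 overlapping neighbor pairs has covariance $1/8 - 1/16 = 1/16$, and non-overlapping pairs are independent. It then evaluates the two Chebyshev bounds. The helper name `chebyshev_tail` is illustrative.

```python
from fractions import Fraction
import math

tosses = 1000
pairs = tosses - 1
mean_n = Fraction(pairs, 4)

# Var[N] = sum of variances + 2 * sum of covariances of overlapping neighbours
var_n = pairs * Fraction(3, 16) + 2 * (pairs - 1) * Fraction(1, 16)
sigma = math.sqrt(var_n)

def chebyshev_tail(distance: float) -> float:
    """Chebyshev: P(|N - mu| >= distance) <= Var[N] / distance^2."""
    return float(var_n) / distance ** 2

dist_a = float(500 - mean_n)   # N >= 500 means N - mu >= 250.25
dist_b = float(mean_n - 100)   # N <= 100 means mu - N >= 149.75
print(f"Var[N] = {var_n} ~ {float(var_n):.2f}, sigma ~ {sigma:.2f}")
print(f"(a) t = {dist_a / sigma:.2f}, bound {chebyshev_tail(dist_a):.2%}")
print(f"(b) t = {dist_b / sigma:.2f}, bound {chebyshev_tail(dist_b):.2%}")
```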

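13 Proof of Chebyshev’s inequality. For every random variable $X$ and every $t > 0$: $P(|X - \mu| \ge t\sigma) \le 1/t^2$, where $\mu = E[X]$, $\sigma = \sqrt{\mathrm{Var}[X]}$. Proof: apply Markov’s inequality to the non-negative random variable $(X - \mu)^2$: $P(|X - \mu| \ge t\sigma) = P((X - \mu)^2 \ge t^2\sigma^2) \le E[(X - \mu)^2]/(t^2\sigma^2) = 1/t^2$.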
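Both inequalities can be checked exactly on any small distribution. The sketch below uses a fair six-sided die as an arbitrary example p.m.f. and compares the true tail probabilities with the Markov and Chebyshev bounds.

```python
from fractions import Fraction
import math

die = {k: Fraction(1, 6) for k in range(1, 7)}   # example p.m.f.: fair die

mu = sum(x * p for x, p in die.items())
var = sum((x - mu) ** 2 * p for x, p in die.items())
sigma = math.sqrt(var)

def prob(event) -> Fraction:
    return sum(p for x, p in die.items() if event(x))

# Markov: P(X >= a) <= mu / a  (X is non-negative)
for a in range(1, 7):
    assert prob(lambda x: x >= a) <= mu / a

# Chebyshev: P(|X - mu| >= t*sigma) <= 1/t^2
for t in (1.0, 1.2, 1.4, 2.0):
    tail = prob(lambda x: abs(x - mu) >= t * sigma)
    assert tail <= 1 / t ** 2
    print(f"t = {t}: P(|X - mu| >= t*sigma) = {float(tail):.3f} <= {1 / t**2:.3f}")
```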

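14 An illustration [Figure: Markov’s inequality, $P(X \ge a) \le \mu/a$, bounds the tail of a non-negative $X$ to the right of $a$ (axis marks $0$, $\mu$, $a$). Chebyshev’s inequality, $P(|X - \mu| \ge t\sigma) \le 1/t^2$, bounds both tails outside $\mu - t\sigma$ and $\mu + t\sigma$.]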

15 Polling

16 $X_i = 1$ if polled person $i$ votes blue, and $X_i = 0$ otherwise. $X_1, \dots, X_n$ are independent Bernoulli($\mu$), where $\mu$ is the fraction of blue voters. $X = X_1 + \dots + X_n$, and $X/n$ is the pollster’s estimate of $\mu$.

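17 Polling. How accurate is the pollster’s estimate $X/n$? Let $\mu = E[X_i]$, $\sigma = \sqrt{\mathrm{Var}[X_i]}$ and $X = X_1 + \dots + X_n$. Then $E[X] = E[X_1] + \dots + E[X_n] = \mu n$ and, by independence, $\mathrm{Var}[X] = \mathrm{Var}[X_1] + \dots + \mathrm{Var}[X_n] = \sigma^2 n$.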

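18 Polling. $E[X] = \mu n$, $\mathrm{Var}[X] = \sigma^2 n$. By Chebyshev’s inequality, $P(|X - \mu n| \ge t\sigma\sqrt{n}) \le 1/t^2$. Dividing by $n$ and setting $\varepsilon = t\sigma/\sqrt{n}$ and $\delta = 1/t^2$: $P(|X/n - \mu| \ge \varepsilon) \le \delta$, where $\varepsilon$ is the sampling error and $\delta$ is the confidence error.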

19 The weak law of large numbers. $X_1, \dots, X_n$ are independent with the same p.m.f. (p.d.f.), $\mu = E[X_i]$, $\sigma = \sqrt{\mathrm{Var}[X_i]}$, $X = X_1 + \dots + X_n$. For every $\varepsilon, \delta > 0$ and $n \ge \sigma^2/(\varepsilon^2\delta)$: $P(|X/n - \mu| \ge \varepsilon) \le \delta$.

20 Polling. Say we want confidence error $\delta = 10\%$ and sampling error $\varepsilon = 5\%$. How many people should we poll? By the weak law of large numbers, $n \ge \sigma^2/(\varepsilon^2\delta) = \sigma^2/(0.05^2 \cdot 0.1) = 4000\sigma^2$ suffices. For Bernoulli($\mu$) samples, $\sigma^2 = \mu(1 - \mu) \le 1/4$, so $n \ge 1000$ works. This suggests we should poll about 1000 people.
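The weak-law sample size can be computed directly. This sketch uses exact fractions so that a bound landing exactly on an integer is not pushed up by floating-point rounding; the function name `wlln_sample_size` is hypothetical.

```python
from fractions import Fraction
import math

def wlln_sample_size(var, eps, delta) -> int:
    """Smallest integer n with n >= var / (eps^2 * delta)."""
    return math.ceil(Fraction(var) / (Fraction(eps) ** 2 * Fraction(delta)))

# polling: Bernoulli variance is at most 1/4, eps = 5%, delta = 10%
n = wlln_sample_size(Fraction(1, 4), Fraction(5, 100), Fraction(10, 100))
print(n)
```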

21 A polling simulation [Figure: the pollster’s estimate $(X_1 + \dots + X_n)/n$ plotted against the number of people polled $n$, for $X_1, \dots, X_n$ independent Bernoulli(1/2).]

22 A polling simulation [Figure: the pollster’s estimate $(X_1 + \dots + X_n)/n$ against the number of people polled $n$, for 20 independent simulations.]
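A sketch of the simulation behind these plots, assuming a fair population ($\mu = 1/2$). It records the running estimate $(X_1 + \dots + X_k)/k$ after each respondent; plotting is left out, and `poll_path` is an illustrative name.

```python
import random

def poll_path(n: int, mu: float, rng: random.Random) -> list[float]:
    """Running pollster estimates (X_1 + ... + X_k)/k for k = 1..n."""
    total = 0
    path = []
    for k in range(1, n + 1):
        total += 1 if rng.random() < mu else 0   # X_k ~ Bernoulli(mu)
        path.append(total / k)
    return path

rng = random.Random(0)
paths = [poll_path(1000, 0.5, rng) for _ in range(20)]   # 20 simulations
within = sum(1 for p in paths if abs(p[-1] - 0.5) < 0.05)
print(f"{within} of 20 final estimates are within 5% of 1/2")
```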

23 A more precise estimate. Let’s assume $n$ is large and $X_1, \dots, X_n$ are independent with the same p.m.f. (p.d.f.). Weak law of large numbers: $X_1 + \dots + X_n \approx \mu n$ with high probability. Chebyshev: $P(|X - \mu n| \ge t\sigma\sqrt{n}) \le 1/t^2$. This suggests $X_1 + \dots + X_n \approx \mu n + T\sigma\sqrt{n}$, where $T$ is some random variable.

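24 Some experiments [Figure: distribution of $X = X_1 + \dots + X_n$ for $X_i$ independent Bernoulli(1/2), with $n = 6$ and $n = 40$.]

25 Some experiments [Figure: distribution of $X = X_1 + \dots + X_n$ for $X_i$ independent Poisson(1), with $n = 3$ and $n = 20$.]

26 Some experiments [Figure: distribution of $X = X_1 + \dots + X_n$ for $X_i$ independent Uniform(0, 1), with $n = 2$ and $n = 10$.]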
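A sketch that reproduces these experiments as crude text histograms. Python’s standard library has no Poisson sampler, so the sketch uses Knuth’s multiplication method, a swapped-in technique; the seed, bin widths and trial count are arbitrary.

```python
import math
import random
from collections import Counter

rng = random.Random(1)

def bernoulli_half() -> float:
    return float(rng.random() < 0.5)

def poisson_one() -> float:
    # Knuth's method: multiply uniforms until the product drops to e^-1 or below
    limit, k, prod = math.exp(-1.0), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return float(k)

def uniform01() -> float:
    return rng.random()

def histogram(draw, n: int, trials: int = 5000, width: float = 1.0) -> Counter:
    """Counts of X = X_1 + ... + X_n, grouped into bins of the given width."""
    return Counter(math.floor(sum(draw() for _ in range(n)) / width) for _ in range(trials))

def show(counts: Counter, width: float) -> None:
    top = max(counts.values())
    for b in sorted(counts):
        print(f"{b * width:6.2f} | " + "#" * round(50 * counts[b] / top))

for name, draw, ns, width in [("Bernoulli(1/2)", bernoulli_half, (6, 40), 1.0),
                              ("Poisson(1)", poisson_one, (3, 20), 1.0),
                              ("Uniform(0, 1)", uniform01, (2, 10), 0.25)]:
    for n in ns:
        print(f"{name}, n = {n}")
        show(histogram(draw, n, width=width), width)
```

As $n$ grows, each histogram takes on the same bell shape regardless of the distribution of the $X_i$.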

27 The normal random variable. The p.d.f. of a normal random variable is $f(t) = (2\pi)^{-1/2} e^{-t^2/2}$. [Figure: plot of $f(t)$.]
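A short numerical check of the normal p.d.f.: the sketch evaluates $f(t) = (2\pi)^{-1/2} e^{-t^2/2}$, compares it with `statistics.NormalDist().pdf` from Python’s standard library, and confirms with a midpoint Riemann sum that it integrates to about 1 (the interval $[-10, 10]$ and the number of steps are arbitrary).

```python
import math
from statistics import NormalDist

def f(t: float) -> float:
    """Standard normal p.d.f."""
    return (2 * math.pi) ** -0.5 * math.exp(-t * t / 2)

std = NormalDist()   # mean 0, standard deviation 1
assert all(abs(f(t) - std.pdf(t)) < 1e-12 for t in (-2.0, -0.5, 0.0, 1.0, 3.0))

# midpoint Riemann sum over [-10, 10]; the tails beyond are negligible
steps = 20000
h = 20 / steps
area = sum(f(-10 + (i + 0.5) * h) for i in range(steps)) * h
print(f"f(0) = {f(0):.4f}, total area ~ {area:.6f}")
```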

28 The central limit theorem. $X_1, \dots, X_n$ are independent with the same p.m.f. (p.d.f.), $\mu = E[X_i]$, $\sigma = \sqrt{\mathrm{Var}[X_i]}$, $X = X_1 + \dots + X_n$. For every $t$ (positive or negative): $\lim_{n \to \infty} P(X \le \mu n + t\sigma\sqrt{n}) = P(T \le t)$, where $T$ is a normal random variable.
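The convergence can be watched directly for Bernoulli(1/2) summands, where $X$ is binomial and its c.d.f. can be computed exactly with `math.comb`. The sketch compares $P(X \le \mu n + t\sigma\sqrt{n})$ with $P(T \le t)$ for $t = 1$ and a few values of $n$, chosen so that $\mu n + t\sigma\sqrt{n}$ is an integer; `binomial_cdf_half` is an illustrative name.

```python
import math
from statistics import NormalDist

def binomial_cdf_half(n: int, m: int) -> float:
    """Exact P(X <= m) for X ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, k) for k in range(m + 1)) / 2 ** n

t = 1.0
mu, sigma = 0.5, 0.5          # mean and standard deviation of one Bernoulli(1/2)
limit = NormalDist().cdf(t)   # P(T <= t)
gaps = []
for n in (100, 400, 1600):
    m = math.floor(mu * n + t * sigma * math.sqrt(n))
    p = binomial_cdf_half(n, m)
    gaps.append(abs(p - limit))
    print(f"n = {n:5d}: P(X <= {m}) = {p:.4f}, P(T <= {t}) = {limit:.4f}")
```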

29 Polling again. Probability model: $X = X_1 + \dots + X_n$, $X_i$ independent Bernoulli($\mu$), where $\mu$ is the fraction that will vote blue. $E[X_i] = \mu$, $\sigma = \sqrt{\mathrm{Var}[X_i]} = \sqrt{\mu(1 - \mu)} \le 1/2$. Say we want confidence error $\delta = 10\%$ and sampling error $\varepsilon = 5\%$. How many people should we poll?

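30 Polling again. By the central limit theorem, $\lim_{n \to \infty} P(X \le \mu n - t\sigma\sqrt{n}) = P(T \le -t)$ and $\lim_{n \to \infty} P(X \ge \mu n + t\sigma\sqrt{n}) = P(T \ge t)$. Choose $t$ so that $t\sigma\sqrt{n} = 5\% \cdot n$, that is, $t = 5\%\sqrt{n}/\sigma$. Then for large $n$, $P(X/n \text{ is not within } 5\% \text{ of } \mu) \approx P(T \le -t) + P(T \ge t) = 2P(T \le -t)$, using the symmetry of the normal p.d.f.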

31 The c.d.f. of a normal random variable [Figure: the c.d.f. $F(t)$ of $T$; the area $P(T \le -t)$ to the left of $-t$ equals the area $P(T \ge t)$ to the right of $t$.]

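32 Polling again. Confidence error $= 2P(T \le -t)$. We want a confidence error of at most 10%: $2P(T \le -t) = 2P(T \le -5\%\sqrt{n}/\sigma) \le 2P(T \le -\sqrt{n}/10)$, because $\sigma \le 1/2$ implies $5\%\sqrt{n}/\sigma \ge \sqrt{n}/10$. We need to choose $n$ so that $P(T \le -\sqrt{n}/10) \le 5\%$.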

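33 Polling again. [Figure: the normal c.d.f. $F(t)$.] Reading off the c.d.f., $P(T \le -\sqrt{n}/10) \le 5\%$ when $-\sqrt{n}/10 \le -1.645$, i.e. $\sqrt{n} \ge 16.45$, so $n \approx 16.45^2 \approx 271$.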
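The value of $n$ can also be obtained from the inverse normal c.d.f. in Python’s `statistics.NormalDist` instead of reading it off the plot; this is a sketch of the same calculation.

```python
import math
from statistics import NormalDist

# P(T <= -sqrt(n)/10) <= 5%  <=>  -sqrt(n)/10 <= F^{-1}(0.05)
z = NormalDist().inv_cdf(0.05)     # about -1.645
n = math.ceil((10 * -z) ** 2)      # smallest n with sqrt(n)/10 >= -z
print(f"z = {z:.3f}, n = {n}")
```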

34 Party. Ten guests arrive independently at a party between 8pm and 9pm. Give an estimate of the probability that the average arrival time of a guest is past 8:40pm.
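One way to produce the estimate, as a sketch: assume each arrival time is uniform on [0, 60] minutes after 8pm (the slide does not state the distribution, so this is an assumption). Then each time has mean 30 and variance $60^2/12 = 300$, and by the central limit theorem the average of ten times is approximately normal with mean 30 and standard deviation $\sqrt{300/10} = \sqrt{30}$. The sketch compares this approximation with a Monte Carlo estimate under the same assumption.

```python
import math
import random
from statistics import NormalDist

guests, cutoff = 10, 40.0          # 8:40pm is 40 minutes after 8pm
mu, var = 30.0, 60.0 ** 2 / 12     # assumed Uniform(0, 60) arrival times

# CLT: average ~ Normal(mu, var / guests)
sd_avg = math.sqrt(var / guests)
clt = 1 - NormalDist(mu, sd_avg).cdf(cutoff)

# Monte Carlo check under the same uniform assumption
rng = random.Random(8)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if sum(rng.uniform(0, 60) for _ in range(guests)) / guests > cutoff)
print(f"CLT estimate: {clt:.4f}, simulation: {hits / trials:.4f}")
```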

35 Acute triangles. Drop three points independently and uniformly at random in a square. What is the probability that they form an acute triangle?

36 Simulation

from random import uniform

# indicate whether the triangle with the given vertices is acute
def is_acute(x1, y1, x2, y2, x3, y3):
    # dot product of the vectors from (x0, y0) to (x1, y1) and to (x2, y2)
    def dot(x1, y1, x2, y2, x0, y0):
        return (x1 - x0) * (x2 - x0) + (y1 - y0) * (y2 - y0)
    a1 = dot(x2, y2, x3, y3, x1, y1)   # angle at vertex 1 is acute iff a1 > 0
    a2 = dot(x3, y3, x1, y1, x2, y2)   # angle at vertex 2
    a3 = dot(x1, y1, x2, y2, x3, y3)   # angle at vertex 3
    return a1 > 0 and a2 > 0 and a3 > 0

# count the fraction of acute triangles among n random samples
def simulate_triangles(n):
    count = 0
    for i in range(n):
        if is_acute(uniform(0.0, 1.0), uniform(0.0, 1.0), uniform(0.0, 1.0),
                    uniform(0.0, 1.0), uniform(0.0, 1.0), uniform(0.0, 1.0)):
            count = count + 1
    return count / n

Idea: Conduct a poll among random triangles!

37 Simulation. Want sampling error $\varepsilon = 0.01$, confidence error $\delta = …$ Rigorous estimate: by the weak law of large numbers, we can choose $n = \sigma^2/(\varepsilon^2\delta) \le 50{,}000$.

>>> simulate_triangles(50000)
>>> simulate_triangles(50000)
>>> simulate_triangles(50000)

38 Simulation. Want sampling error $\varepsilon = 0.01$, confidence error $\delta = …$ Non-rigorous (but better) estimate: the central limit theorem suggests choosing $n$ such that $t\sigma\sqrt{n} \le \varepsilon n$, where $P(\text{Normal} < -t) = \delta$. With $\sigma \le 1/2$: $t = …$, $n = (t/(2\varepsilon))^2 \approx 5366$.

>>> simulate_triangles(5366)
>>> simulate_triangles(5366)
>>> simulate_triangles(5366)

