#10 The Central Limit Theorem


1 #10 The Central Limit Theorem
Opinionated Lessons in Statistics by Bill Press. Professor William H. Press, Department of Computer Science, the University of Texas at Austin.

2 The Central Limit Theorem is the reason that the Normal (Gaussian) distribution is uniquely important. We need to understand where it does and doesn't apply.

Let $S = \frac{1}{N}\sum_i x_i$, with $\langle x_i \rangle = 0$. (We can always subtract off the means, then add them back later.) Then the characteristic function of $S$ is the product of the rescaled characteristic functions of the $x_i$:

$$\phi_S(t) = \prod_i \phi_{x_i}\!\left(\frac{t}{N}\right) = \prod_i \left(1 - \frac{\sigma_i^2 t^2}{2N^2} + \cdots\right) = \exp\left[\sum_i \ln\left(1 - \frac{\sigma_i^2 t^2}{2N^2} + \cdots\right)\right] \approx \exp\left(-\frac{t^2}{2}\,\frac{\sum_i \sigma_i^2}{N^2}\right)$$

Whoa! The characteristic function had better have a convergent Taylor series around zero! (Cauchy's, e.g., doesn't.) The higher-order terms decrease with $N$, but how fast?

The limiting form is the characteristic function of a Gaussian, so $S$ is normally distributed:

$$p_S(\cdot) \sim \mathrm{Normal}\left(0,\ \frac{1}{N^2}\sum_i \sigma_i^2\right)$$
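A quick numerical check of this claim may help. The following sketch is mine, not from the slides; the uniform distribution and all parameter choices are illustrative assumptions. It averages N mean-zero uniform r.v.'s and compares the sample variance of S to the CLT prediction:

```python
# Minimal sketch (assumptions mine): averages of N mean-zero r.v.'s
# approach Normal(0, sum(sigma_i^2) / N^2).
import numpy as np

rng = np.random.default_rng(0)
N = 50            # number of r.v.'s averaged per sample
trials = 100_000

# Mean-zero uniform r.v.'s on [-1, 1]; each has variance 1/3.
x = rng.uniform(-1.0, 1.0, size=(trials, N))
S = x.mean(axis=1)

predicted_var = N * (1.0 / 3.0) / N**2   # (1/N^2) * sum_i sigma_i^2
print(f"sample var of S: {S.var():.6f}")
print(f"CLT prediction:  {predicted_var:.6f}")
# A histogram of S is visually indistinguishable from the predicted Gaussian.
```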

3 Intuitively, the product of a lot of arbitrary functions that all start at 1 and have zero derivative there looks like a single narrow peak at the origin. [Figure: several such factor functions, and their much narrower product.] Because the product falls off so fast, it loses all memory of the details of its factors except the starting value 1 and the fact of zero derivative. In characteristic function space, that's basically the CLT.
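To see this numerically, here is a small sketch (my own example, not from the slide; the Lorentzian factor shape and the random widths are assumptions). It multiplies many even functions that equal 1 at t = 0 with zero slope there and shows the product collapsing to a narrow peak:

```python
# The product of many factors that are 1 at t=0 with zero derivative there
# remembers only each factor's curvature at the origin: the CLT mechanism.
import numpy as np

t = np.linspace(-4.0, 4.0, 801)
rng = np.random.default_rng(1)

product = np.ones_like(t)
for _ in range(20):
    a = rng.uniform(0.5, 1.5)             # a random "width" per factor
    product *= 1.0 / (1.0 + (a * t)**2)   # even, equals 1 at 0, zero slope

# Near t = 0, log(product) ~ -(sum_i a_i^2) t^2, i.e. Gaussian-shaped.
half_width = abs(t[np.argmin(np.abs(product - 0.5))])
print(f"product falls to half height at |t| ~ {half_width:.3f}")
```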

4 The CLT is usually stated about the sum of r.v.'s, not the average. We had

$$p_S(\cdot) \sim \mathrm{Normal}\left(0,\ \frac{1}{N^2}\sum_i \sigma_i^2\right)$$

Now, since

$$NS = \sum_i x_i \qquad \text{and} \qquad \mathrm{Var}(NS) = N^2\,\mathrm{Var}(S)$$

it follows that the simple sum of a large number of r.v.'s is normally distributed, with variance equal to the sum of the variances:

$$p_{\sum_i x_i}(\cdot) \sim \mathrm{Normal}\left(0,\ \sum_i \sigma_i^2\right)$$

If N is large enough, and if the higher moments are well-enough behaved, and if the Taylor series expansion exists! Also beware of borderline cases where the assumptions technically hold, but convergence to Normal is slow and/or highly nonuniform. (This can affect p-values for tail tests, as we will soon see.)
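To make the caution about slow tail convergence concrete, here is a sketch (my own example; the lognormal choice and all constants are assumptions, not from the slides) in which every CLT hypothesis holds, yet the upper-tail probabilities of the standardized sum remain noticeably non-Normal:

```python
# Sums of skewed r.v.'s: the CLT hypotheses hold (all moments finite),
# but convergence to Normal is slow in the tails.
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(2)
N, trials = 30, 200_000

x = rng.lognormal(mean=0.0, sigma=1.0, size=(trials, N))  # very skewed
s = x.sum(axis=1)
z = (s - s.mean()) / s.std()       # standardized sums

normal_sf = lambda c: 0.5 * erfc(c / sqrt(2.0))   # Normal upper tail
for cut in (2.0, 3.0):
    print(f"P(Z > {cut}): empirical {np.mean(z > cut):.4f}, "
          f"Normal {normal_sf(cut):.4f}")
```

The empirical tail probabilities come out well above the Normal values, which is exactly the situation that distorts p-values for tail tests.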

5 Since Gaussians are so universal, let's learn to estimate the parameters $\mu$ and $\sigma$ of a Gaussian from a set of points drawn from it. For now, we'll just find the maximum of the posterior distribution of $(\mu, \sigma)$, given some data, under a uniform prior. This is called "maximum a posteriori (MAP)" by Bayesians, and "maximum likelihood (MLE)" by frequentists.

The data are: $x_i,\ i = 1:N$

The statistical model is:

$$P(x \mid \mu, \sigma) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_i - \mu)^2 / 2\sigma^2}$$

With a uniform prior, the posterior is:

$$P(\mu, \sigma \mid x) \propto \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!N} \exp\left[-\sum_i \frac{(x_i - \mu)^2}{2\sigma^2}\right]$$

Now find the MAP (MLE) by setting the derivatives to zero:

$$0 = \frac{\partial P}{\partial \mu} \propto \frac{1}{\sigma^2}\sum_i (x_i - \mu) \quad \Rightarrow \quad \mu = \frac{1}{N}\sum_i x_i$$

$$0 = \frac{\partial P}{\partial \sigma} \propto \frac{1}{\sigma^3}\left[-N\sigma^2 + \sum_i (x_i - \mu)^2\right] \quad \Rightarrow \quad \sigma^2 = \frac{1}{N}\sum_i (x_i - \mu)^2$$

Ha! The MAP mean is the sample mean, and the MAP variance is the sample variance! (Bessel's correction: the unbiased estimate divides by N − 1 instead of N.)
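As a minimal numerical companion (my own sketch; the synthetic data and parameter values are assumptions), the following confirms that the MAP estimates are the sample mean and the divide-by-N sample variance, and shows Bessel's correction:

```python
# MAP/MLE for a Gaussian: sample mean and (biased, divide-by-N) variance.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # synthetic Gaussian data
N = x.size

mu_map = x.mean()                          # (1/N) sum x_i
var_map = np.mean((x - mu_map) ** 2)       # (1/N) sum (x_i - mu)^2, biased
var_bessel = var_map * N / (N - 1)         # Bessel's correction, unbiased

print(f"MAP mean:     {mu_map:.4f}   (true 2.0)")
print(f"MAP variance: {var_map:.4f}  (true {1.5**2})")
print(f"N-1 variance: {var_bessel:.4f}")
```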

6 It won't surprise you that I did the algebra by computer, in Mathematica. [Figure: Mathematica notebook carrying out the maximization.]
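The original slide shows only a Mathematica screenshot. As a stand-in, here is the same symbolic algebra sketched in Python with SymPy (a swapped-in tool, not what the slide used):

```python
# Symbolic derivatives of the Gaussian log-posterior, SymPy version.
import sympy as sp

mu = sp.Symbol('mu', real=True)
sigma, N = sp.symbols('sigma N', positive=True)
i = sp.Symbol('i', integer=True)
x = sp.IndexedBase('x')

# Log-posterior under a uniform prior, additive constants dropped:
Q = sp.Sum((x[i] - mu)**2, (i, 1, N))
logP = -N * sp.log(sigma) - Q / (2 * sigma**2)

# Setting these derivatives to zero reproduces the slide's results:
print(sp.diff(logP, mu))      # zero at mu = (1/N) * Sum(x[i])
print(sp.diff(logP, sigma))   # zero at sigma**2 = (1/N) * Sum((x[i]-mu)**2)
```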

